Microsoft is the only major LLM vendor with a South Africa inference region. For the regulated workloads that have to keep prompts and grounding data in country — banking, healthcare, public sector, FAIS-covered advisory — that single fact is the deciding architectural input. This sub-tree maps the strategy decision (which Copilot surface fits the workload), the implementation patterns (how a Worker actually calls gpt-4o in Joburg), and the auth model that holds it all together.
Three live nodes today: the strategy-level explainer for the in-country LLM decision, the implementation node for the headless Worker pattern, and the AI Gateway upgrade that wraps it with observability. Three more in the pipeline — the Azure AI Agent Service alternative for full-code agents, an M365 Copilot deep dive on the SA M365 Geo, and the Copilot Studio BYO-model playbook.
The headless pattern — a JNB-colocated Worker calling gpt-4o in southafricanorth. Four production-grade concepts: api-key fetch, SSE streaming, throttle handling with KV idempotency, and Entra ID OAuth client credentials.
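The shape of that pattern can be sketched in a few lines. This is a minimal illustration only — the resource name (`contoso-jnb`), deployment name, and `api-version` string are placeholder assumptions, and the production version adds the KV idempotency and Entra auth concepts named above:

```typescript
// Minimal sketch: JNB Worker proxying an SSE stream from Azure OpenAI in
// southafricanorth. Resource/deployment names and api-version are assumptions.
type Env = { AOAI_KEY: string };

// Pure helper for the AOAI endpoint shape
export function aoaiUrl(resource: string, deployment: string): string {
  return `https://${resource}.openai.azure.com/openai/deployments/${deployment}` +
    '/chat/completions?api-version=2024-06-01';
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const { prompt } = (await req.json()) as { prompt: string };
    const upstream = await fetch(aoaiUrl('contoso-jnb', 'gpt-4o'), {
      method: 'POST',
      headers: { 'api-key': env.AOAI_KEY, 'content-type': 'application/json' },
      body: JSON.stringify({
        stream: true, // SSE: tokens arrive as they are generated
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    // Pipe the upstream SSE body straight through to the caller
    return new Response(upstream.body, {
      headers: { 'content-type': 'text/event-stream' },
    });
  },
};
```

Streaming the upstream body through unmodified is what keeps the Worker thin: no buffering, no token re-parsing, just colocation.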
The observability and resilience upgrade. One URL change buys cache, retry, prompt logs, cost dashboards, and per-tenant rate limits — for ~5 ms latency overhead. The default for any LLM workload heading to production.
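"One URL change" is close to literal. A hedged sketch of what the swap looks like — the account ID, gateway name, and resource/deployment names below are placeholders, and the gateway path shape is assumed from Cloudflare's AI Gateway routing convention:

```typescript
// Direct call: Worker -> Azure OpenAI in southafricanorth
const direct =
  'https://contoso-jnb.openai.azure.com/openai/deployments/gpt-4o' +
  '/chat/completions?api-version=2024-06-01';

// Via AI Gateway: same body, same api-key header -- only the host/path
// prefix changes, and cache/retry/logging/rate limits come for free.
const viaGateway =
  'https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway/azure-openai' +
  '/contoso-jnb/gpt-4o/chat/completions?api-version=2024-06-01';
```

Because the request body and auth header are untouched, the swap is reversible — flip the base URL back and the Worker talks to AOAI directly again.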
The full-code escape hatch when Copilot Studio's low-code ceiling isn't enough. Threads, tool calls, file search, code interpreter — orchestrated against the same SA-North gpt-4o substrate.
Tenant-Geo verification, the surfaces that leak data out of country (Bing grounding, plugins, Loop, Whiteboard), Purview controls, and the audit log path for a POPIA-defensible "where does my data go" answer.
Wiring a Copilot Studio agent to a customer-owned Azure OpenAI deployment in SA North. The full audit evidence pack — Power env region, AOAI policy, BYO config, live trace verification.
Azure OpenAI, Microsoft Graph, Power Platform, and Copilot Studio all share the same identity layer: Entra ID (formerly Azure AD). Learn the OAuth client-credentials flow once and it works for every API in this branch. Two scopes cover almost everything an integration needs.
Register an app in Entra ID, generate a client secret, and request a token from login.microsoftonline.com with the scope for the API you want. Tokens expire after 60 minutes by default; cache them in KV with a 55-minute TTL, and rotate the secret in Key Vault without touching code.
```js
// Two scopes cover the whole Microsoft surface
const aiScope = 'https://cognitiveservices.azure.com/.default'; // Azure OpenAI
const graphScope = 'https://graph.microsoft.com/.default';      // M365 / Copilot

// Same flow, different scope
const body = new URLSearchParams({
  client_id: env.AZURE_CLIENT_ID,
  client_secret: env.AZURE_CLIENT_SECRET,
  scope: aiScope, // or graphScope
  grant_type: 'client_credentials',
});
```
The payoff: every leaf in this hub uses the same auth function. The Worker that calls Azure OpenAI is one scope away from being the Worker that reads a user's mailbox. See concept 04 in the Workers explainer for the production-grade KV-cached implementation.
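As a rough sketch of that shared auth function — one client-credentials fetch, KV-cached just under the token lifetime. The binding and variable names (`TOKENS`, `AZURE_*`) are assumptions, not the canonical implementation:

```typescript
// Minimal KV shape (stands in for the Workers KVNamespace binding)
type KV = {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl: number }): Promise<void>;
};

type Env = {
  AZURE_TENANT_ID: string;
  AZURE_CLIENT_ID: string;
  AZURE_CLIENT_SECRET: string;
  TOKENS: KV; // Workers KV binding (assumed name)
};

export async function getToken(env: Env, scope: string): Promise<string> {
  // One cache entry per scope -- AOAI and Graph tokens are not interchangeable
  const cacheKey = `entra:${scope}`;
  const cached = await env.TOKENS.get(cacheKey);
  if (cached) return cached;

  const res = await fetch(
    `https://login.microsoftonline.com/${env.AZURE_TENANT_ID}/oauth2/v2.0/token`,
    {
      method: 'POST',
      body: new URLSearchParams({
        client_id: env.AZURE_CLIENT_ID,
        client_secret: env.AZURE_CLIENT_SECRET,
        scope, // the only thing that changes between APIs
        grant_type: 'client_credentials',
      }),
    },
  );
  const { access_token } = (await res.json()) as { access_token: string };

  // 55-minute TTL keeps cached tokens inside the 60-minute lifetime
  await env.TOKENS.put(cacheKey, access_token, { expirationTtl: 55 * 60 });
  return access_token;
}
```

Swapping `scope` is the entire difference between calling Azure OpenAI and calling Graph, which is why one function covers the whole branch.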
Three different residency commitments, three different boundaries. Azure OpenAI in southafricanorth is fully in country — both the resource and the inference compute are in Joburg. M365 Copilot with the SA M365 Geo stores prompts, completions, and grounding data at rest in country; LLM compute uses the global Azure OpenAI pool. Copilot Studio with BYO model matches the AOAI guarantee — fully in country.
The strict reading (SARB Directive 6, certain DPSA mandates) requires the AOAI guarantee. The pragmatic reading (most POPIA s.72 cases) accepts the M365 Geo guarantee. The decision is in the strategy explainer — implementation lives in the patterns explainer.
Microsoft is the LLM substrate. Cloudflare is the runtime that calls it. ERP / CRM systems are what the agents talk to. The legal branch is the constraint layer. Every Microsoft leaf depends on all four.