A synthesised voice is not a deployed product. Getting a voice agent onto a real phone line takes two more layers. The first is an orchestration platform that wires speech-in, reasoning, and speech-out into a live conversation. The second is a telephony / contact-centre platform that connects the calls, routes them, and manages queues and human agents. This leaf maps both layers and names who leads each. It then works through the South African in-country picture: Amazon Connect in af-south-1, the SA-native incumbents, and the ICASA regulatory frame.
A voice agent on a phone is a stack, and confusing its layers is a common mistake when scoping a voice project. Teams pick a voice vendor and then discover they still have no way to take a call.
A voice agent on a phone line has three layers, listed here top to bottom:

| Layer | What it does | Examples |
|---|---|---|
| Telephony / CCaaS | The phone network, numbers, routing, queues, human-agent desks | Amazon Connect · Twilio · Telviva · Euphoria |
| Agent orchestration | Wires STT + LLM + TTS into a real-time conversation; handles turns, interruptions, tools, hand-off | Vapi · Retell · Bland · LiveKit · Pipecat |
| Voice generation | The synthetic voice itself (TTS) and transcription (STT) | ElevenLabs · Cartesia · Deepgram (see the landscape leaf) |
The bottom layer — voice generation — is covered in the voice AI landscape and the ElevenLabs leaf. This leaf covers the two layers above it. Some products span layers: ElevenAgents covers generation + orchestration; Amazon Connect covers telephony and, with Amazon Nova Sonic, orchestration too. But the layers are real, and a deployment needs all three — whether from one vendor or three.
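The "a deployment needs all three" rule can be made concrete as a coverage check. This is a minimal sketch: the `COVERAGE` mapping is a hypothetical illustration summarising this leaf, not an official product matrix. It counts Amazon Connect as covering all three layers through its native Polly and Nova Sonic voices.

```python
from enum import Enum


class Layer(Enum):
    TELEPHONY = "telephony"
    ORCHESTRATION = "orchestration"
    VOICE_GENERATION = "voice generation"


# Hypothetical coverage table summarising this leaf; not an official vendor matrix.
COVERAGE = {
    "Amazon Connect": {Layer.TELEPHONY, Layer.ORCHESTRATION, Layer.VOICE_GENERATION},
    "Twilio": {Layer.TELEPHONY},
    "Vapi": {Layer.ORCHESTRATION},
    "ElevenAgents": {Layer.ORCHESTRATION, Layer.VOICE_GENERATION},
    "ElevenLabs": {Layer.VOICE_GENERATION},
    "Cartesia": {Layer.VOICE_GENERATION},
}


def missing_layers(products: list[str]) -> set[Layer]:
    """Return the layers that a proposed set of products leaves uncovered."""
    covered: set[Layer] = set()
    for product in products:
        covered |= COVERAGE[product]
    return set(Layer) - covered


# Picking only a voice vendor still leaves no way to take a call.
voice_only = missing_layers(["ElevenLabs"])
# Telephony plus a bundled voice agent covers the whole stack.
bundled = missing_layers(["Twilio", "ElevenAgents"])
```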
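Orchestration platforms manage the hard real-time problem: streaming speech in, running the reasoning, streaming speech out, handling interruptions and turn-taking, calling tools, and handing off to a human. The managed platforms charge per minute of conversation, and Bland adds a monthly fee. The open-source frameworks charge nothing for the code, so you pay only for the infrastructure and components you run. Prices and latency figures below are approximate and vendor-stated.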
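Interruption handling (barge-in) is the easiest of those problems to show in isolation. Below is a hypothetical, simplified turn-taking state machine for a single call. Event names such as `on_user_speech_start` are illustrative, not any platform's API. Real orchestrators trigger these events from streaming voice-activity detection rather than from explicit calls.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TurnState(Enum):
    LISTENING = auto()  # caller has the floor; STT is transcribing
    THINKING = auto()   # LLM is producing a reply
    SPEAKING = auto()   # TTS audio is streaming to the caller


@dataclass
class TurnManager:
    """Hypothetical turn-taking state machine for one call (illustrative only)."""
    state: TurnState = TurnState.LISTENING
    log: list[str] = field(default_factory=list)

    def on_user_speech_start(self) -> None:
        if self.state is TurnState.SPEAKING:
            # Barge-in: the caller talked over the agent, so stop playback at once.
            self.log.append("cancel_tts")
        elif self.state is TurnState.THINKING:
            # The caller resumed talking; discard the half-finished reply.
            self.log.append("cancel_llm")
        self.state = TurnState.LISTENING

    def on_user_turn_end(self, transcript: str) -> None:
        if self.state is TurnState.LISTENING:
            self.log.append(f"llm:{transcript}")
            self.state = TurnState.THINKING

    def on_reply_ready(self, text: str) -> None:
        if self.state is TurnState.THINKING:
            self.log.append(f"tts:{text}")
            self.state = TurnState.SPEAKING

    def on_playback_done(self) -> None:
        if self.state is TurnState.SPEAKING:
            self.state = TurnState.LISTENING


tm = TurnManager()
tm.on_user_turn_end("what are your hours")
tm.on_reply_ready("We open at eight")
tm.on_user_speech_start()  # caller talks over the agent mid-reply
# tm.log == ["llm:what are your hours", "tts:We open at eight", "cancel_tts"]
```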
| Platform | Shape | Pricing | Best for |
|---|---|---|---|
| Vapi | Middleware — bring your own LLM, TTS, STT, telephony | ~$0.05/min platform fee + component costs | Maximum control; custom stacks where you pick every component. |
| Retell AI | Production-ready managed platform | ~$0.07/min | Quality- and compliance-first teams — SOC 2, HIPAA, GDPR; ~600ms response. |
| Bland AI | API-first, outbound-optimised | ~$499/mo + ~$0.11/min | Raw scale — high-volume outbound campaigns, 1M+ concurrent calls. |
| LiveKit Agents | Open-source WebRTC framework (Python, Node) | OSS framework + infra cost | Scalable low-latency real-time agents; adaptive interruption, native MCP tools. |
| Pipecat | Open-source Python framework | OSS — self-host | Rapid prototyping and full developer control; frame-based streaming. |
| ElevenAgents | Voice + orchestration bundled | ElevenLabs credits | Teams already on ElevenLabs voices who want generation and orchestration from one vendor. |
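The per-minute figures in the table compound quickly at contact-centre volumes. This is a minimal cost sketch that assumes the table's approximate, vendor-stated prices. The Retell and Bland figures are treated as all-in. The $0.08/min for Vapi's separately billed LLM, STT, TTS, and telephony is a hypothetical placeholder, not a quote.

```python
from decimal import Decimal
from typing import NamedTuple


class Plan(NamedTuple):
    name: str
    per_minute: Decimal               # platform fee per conversation minute (USD)
    monthly_fee: Decimal = Decimal("0")


# Approximate, vendor-stated figures from the table; verify before use.
PLANS = [
    Plan("Vapi", Decimal("0.05")),
    Plan("Retell AI", Decimal("0.07")),
    Plan("Bland AI", Decimal("0.11"), Decimal("499")),
]


def monthly_cost(plan: Plan, minutes: int,
                 component_per_minute: Decimal = Decimal("0")) -> Decimal:
    """Fixed monthly fee plus (platform + separately billed components) per minute."""
    return plan.monthly_fee + minutes * (plan.per_minute + component_per_minute)


# 10,000 conversation minutes a month; the Vapi component cost is hypothetical.
vapi = monthly_cost(PLANS[0], 10_000, Decimal("0.08"))  # 1300.00
retell = monthly_cost(PLANS[1], 10_000)                 # 700.00
bland = monthly_cost(PLANS[2], 10_000)                  # 1599.00
```

Decimal avoids float rounding on currency. For a real comparison, plug in each vendor's current price sheet and the component costs your stack actually incurs.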
The split that matters: Vapi, LiveKit, and Pipecat give you control — you assemble the stack. Retell and Bland give you a managed path — faster to production, less to tune. ElevenAgents is the bundled option if voice quality is the priority and you have already chosen ElevenLabs. None of these is the telephony layer; they connect to it.
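On the build path, the orchestration layer treats each component as swappable. Here is a minimal sketch of that shape using Python `typing.Protocol` interfaces. The interface names and stand-in components are hypothetical. A real stack streams audio through each stage rather than processing one whole turn at a time.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Reasoner(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


def handle_turn(audio: bytes, stt: SpeechToText, llm: Reasoner, tts: TextToSpeech) -> bytes:
    """One caller turn through the assembled stack: speech in, reasoning, speech out."""
    return tts.synthesize(llm.reply(stt.transcribe(audio)))


# Hypothetical stand-in components so the sketch runs without any vendor SDK.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class TemplateLLM:
    def reply(self, transcript: str) -> str:
        return f"You said: {transcript}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


reply_audio = handle_turn(b"hello", EchoSTT(), TemplateLLM(), BytesTTS())
```

Swapping a vendor means writing one adapter class that satisfies the relevant protocol. That is the control the build path buys, at the cost of owning every adapter.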
Telephony platforms provide PSTN connectivity, phone numbers, call routing, IVR, queues, and, for full contact centres, human-agent desktops and analytics. This is a mature, separate industry that long predates voice AI; voice AI plugs into it.
| Platform | Type | SA availability | Notes |
|---|---|---|---|
| Amazon Connect | Cloud contact centre (CCaaS) | af-south-1 (Cape Town) | Native AI — Amazon Q in Connect, Nova Sonic speech-to-speech. Official ElevenLabs integration. Verified in af-south-1. |
| Twilio | Communications platform (CPaaS) | Widely used in SA | Programmable voice APIs; the general-purpose path. Build-it-yourself rather than a packaged contact centre. |
| Genesys Cloud | Enterprise CCaaS | Available; regional specifics unverified | Heavyweight enterprise contact centre. Strong workforce-management tooling. |
| Vonage | CPaaS | SA support; specifics unverified | Voice / messaging APIs. Comparable positioning to Twilio. |
| Telviva | SA-native UCaaS / CCaaS | SA — Teraco data centres | Market leader, ~17% SA share, ~95,000 users. Fully in-country; ICASA-licensed. |
| Euphoria Telecom | SA-native cloud PBX + contact centre | SA — Cape Town + Joburg | Predictive dialling, skills-based routing, CRM integration. SA-native. |
| SureTel | SA-native hosted VoIP | SA data centres | SMB-focused hosted VoIP with local failover. SA-native. |
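Skills-based routing, listed for Euphoria above and common across contact-centre platforms, is also where an AI agent's hand-off to a human lands. The sketch below is a toy illustration of the idea, not how any listed platform implements it. The `HumanAgent` and `SkillsRouter` names are hypothetical. Real routing also weighs queue time, priority, and agent occupancy.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class HumanAgent:
    name: str
    skills: frozenset[str]
    busy: bool = False


@dataclass
class SkillsRouter:
    agents: list[HumanAgent]
    waiting: Deque[Tuple[str, str]] = field(default_factory=deque)  # (call_id, skill)

    def route(self, call_id: str, skill: str) -> Optional[str]:
        """Give the call to the first idle agent with the skill, else queue it."""
        for agent in self.agents:
            if not agent.busy and skill in agent.skills:
                agent.busy = True
                return agent.name
        self.waiting.append((call_id, skill))
        return None

    def release(self, name: str) -> Optional[str]:
        """Free an agent and hand them the oldest waiting call they can take."""
        agent = next(a for a in self.agents if a.name == name)  # assumes name exists
        agent.busy = False
        for i, (call_id, skill) in enumerate(self.waiting):
            if skill in agent.skills:
                del self.waiting[i]  # safe: we return immediately after deleting
                agent.busy = True
                return call_id
        return None


router = SkillsRouter([
    HumanAgent("Thandi", frozenset({"billing", "afrikaans"})),
    HumanAgent("Sipho", frozenset({"billing"})),
])
first = router.route("call-1", "afrikaans")   # Thandi takes it
second = router.route("call-2", "afrikaans")  # nobody free with the skill: queued
third = router.route("call-3", "billing")     # Sipho takes it
handed = router.release("Thandi")             # Thandi frees up and gets call-2
```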
Single-vendor. Amazon Connect alone — telephony plus its native Nova Sonic voice agents. Or a packaged platform that brokers everything. Simplest to operate; least control over the voice and the model.
Telephony + bundled voice-agent. Amazon Connect (or Twilio) for the phone line, ElevenAgents for generation + orchestration. The official ElevenLabs–Amazon Connect integration is exactly this pattern. Best voice quality; two vendors to manage.
Three-layer custom. Twilio or a SIP trunk for telephony, Vapi or LiveKit for orchestration, ElevenLabs or Cartesia for the voice. Maximum control; you own the integration and the latency budget.
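In the three-layer custom pattern, telephony usually hands off to orchestration through a media stream. With Twilio, one way to do this is TwiML's `<Connect><Stream>`, which sends the live call audio to a WebSocket and accepts audio back on it. The minimal sketch below builds that TwiML with the standard library. It assumes a self-hosted orchestration service at the hypothetical URL `wss://agent.example.com/media`. Vapi and LiveKit also offer their own Twilio and SIP integrations, which may be simpler.

```python
import xml.etree.ElementTree as ET


def stream_to_orchestrator(ws_url: str) -> str:
    """Return TwiML that connects the caller's audio to a WebSocket endpoint."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Connect><Stream> is the bidirectional form: the orchestrator can send
    # synthesised audio back down the same socket.
    ET.SubElement(connect, "Stream", url=ws_url)
    return ET.tostring(response, encoding="unicode")


# Hypothetical endpoint for a self-hosted orchestration service.
twiml = stream_to_orchestrator("wss://agent.example.com/media")
```

The number's voice webhook would return this XML when Twilio asks how to handle an incoming call. From that point the orchestration service owns the conversation and its latency budget.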
ElevenLabs and Amazon Connect are different layers. The question is never “ElevenLabs or Amazon Connect”; it is whether to pair a voice-generation vendor with a telephony platform, and which of each to choose. They overlap in only one place: Amazon Connect now has its own native voice agents (Nova Sonic), so Connect can also be the orchestration layer. There it competes with Vapi and ElevenAgents, not with ElevenLabs the voice vendor.
For South African contact centres, data residency and local regulation are real constraints. Here is the honest picture, layer by layer.
Amazon Connect is available in the AWS Cape Town region (af-south-1, three availability zones). A Connect instance there can claim South African numbers — including 0860 / 0861 shared-cost numbers common in SA business — route SA PSTN, and keep contact-centre data (recordings, transcripts, contact records) resident in Cape Town. For a POPIA data-residency requirement on the telephony and contact data, this is a genuine in-country option.
The caveat: if you plug a global voice-generation vendor (ElevenLabs, Cartesia) into that Connect instance, the voice-synthesis call leaves the country. “In-country” then covers the telephony layer but not the voice-generation layer. Fully in-country synthesis requires a self-hosted open model or a local provider. Amazon Connect's own Nova Sonic voice runs within the AWS region, but check its af-south-1 availability specifically if that matters.
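The residency caveat can be checked mechanically once each layer's processing location is written down. This is a minimal sketch. Region labels other than `af-south-1` are hypothetical placeholders: `za-local` stands for an SA-hosted provider and `outside-za` for a global vendor. Where a given vendor actually processes data must be confirmed with that vendor.

```python
from dataclasses import dataclass

# Labels treated as inside South Africa. "za-local" is a hypothetical tag for
# an SA-hosted provider (for example, one running in Teraco data centres).
IN_COUNTRY_REGIONS = {"af-south-1", "za-local"}


@dataclass(frozen=True)
class Component:
    layer: str
    vendor: str
    region: str  # where this component processes call audio or contact data


def residency_gaps(stack: list[Component]) -> list[str]:
    """Return the layers whose processing leaves South Africa."""
    return [c.layer for c in stack if c.region not in IN_COUNTRY_REGIONS]


# Connect in Cape Town for the phone line, a global vendor for the agent and voice.
stack = [
    Component("telephony", "Amazon Connect", "af-south-1"),
    Component("orchestration", "ElevenAgents", "outside-za"),
    Component("voice generation", "ElevenLabs", "outside-za"),
]
gaps = residency_gaps(stack)  # the telephony layer is in-country; the rest is not
```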
If the whole telephony stack must be in-country and ICASA-licensed without the AWS shared-responsibility framing, South Africa has real local providers: Telviva (market leader, on Teraco data centres), Euphoria Telecom (Cape Town and Joburg), and SureTel (SMB-focused). These are full SA-native contact-centre and PBX platforms. They are less likely to ship cutting-edge native voice AI than Amazon Connect — but they compose with a voice-agent layer the same way, and the telephony is unambiguously local.
VoIP is legal in South Africa. The distinction that matters: an organisation using VoIP for its own internal communication generally does not need a licence; a company providing VoIP / telephony as a service must be ICASA-licensed. The SA-native providers carry their own licensing. POPIA applies to all customer call data regardless of platform — and compliance is a shared responsibility: the platform provides the infrastructure, but the contact-centre operator is accountable for lawful processing, consent, and disclosure. Verify ICASA and POPIA posture for the specific deployment; do not assume the platform vendor carries it for you.
1 · Is data residency a hard requirement? If yes, the telephony layer is Amazon Connect in af-south-1 or an SA-native incumbent (Telviva, Euphoria) — and accept that a global voice-gen vendor breaks strict residency. If no, the field is open.
2 · Build or buy the orchestration? Buy (Retell, Bland, ElevenAgents) for speed to production. Build (Vapi, LiveKit, Pipecat) for control over latency, model choice, and cost at scale.
3 · Inbound contact centre, or outbound campaigns? Inbound traffic that needs human escalation calls for a full CCaaS (Amazon Connect, Genesys, Telviva). Pure high-volume outbound points to Bland or a campaign-tuned stack.
4 · How much does voice quality matter? If the synthetic voice is the product's face, pair the telephony layer with ElevenLabs or Cartesia rather than relying on a platform's built-in voice. If the voice is functional, a platform-native voice (Amazon Polly, Nova Sonic) is fine and simpler.
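The four questions can be encoded as a first-pass shortlist. This is a hypothetical sketch, not a procurement rule. The vendor sets are lifted from this leaf's tables and questions, and Amazon Connect counts as in-country only when deployed in af-south-1. Every output still needs verifying against current pricing and regional availability.

```python
from dataclasses import dataclass

ALL_TELEPHONY = {"Amazon Connect", "Twilio", "Genesys Cloud", "Vonage",
                 "Telviva", "Euphoria Telecom", "SureTel"}
IN_COUNTRY = {"Amazon Connect", "Telviva", "Euphoria Telecom"}  # Connect via af-south-1 only
FULL_CCAAS = {"Amazon Connect", "Genesys Cloud", "Telviva", "Euphoria Telecom"}


@dataclass(frozen=True)
class Answers:
    strict_residency: bool       # Q1
    build_orchestration: bool    # Q2
    outbound_campaigns: bool     # Q3 (False means inbound with human escalation)
    voice_is_product_face: bool  # Q4


def shortlist(a: Answers) -> dict[str, list[str]]:
    """Apply the four questions in order; return a shortlist per layer plus warnings."""
    warnings: list[str] = []
    # Q1: strict residency keeps only in-country telephony.
    telephony = IN_COUNTRY if a.strict_residency else ALL_TELEPHONY
    # Q3: inbound with human escalation needs a full contact centre.
    if not a.outbound_campaigns:
        telephony = telephony & FULL_CCAAS
    # Q2: build for control, buy for speed to production.
    if a.build_orchestration:
        orchestration = ["Vapi", "LiveKit Agents", "Pipecat"]
    else:
        orchestration = ["Retell AI", "Bland AI", "ElevenAgents"]
    # Q3: pure high-volume outbound favours Bland on the buy path.
    if a.outbound_campaigns and not a.build_orchestration:
        orchestration = ["Bland AI"]
    # Q4: a voice that is the product's face calls for a dedicated voice vendor.
    if a.voice_is_product_face:
        voice = ["ElevenLabs", "Cartesia"]
        if a.strict_residency:
            warnings.append("a global voice-gen vendor breaks strict residency")
    else:
        voice = ["platform-native (e.g. Amazon Polly, Nova Sonic)"]
    return {"telephony": sorted(telephony), "orchestration": orchestration,
            "voice": voice, "warnings": warnings}


inbound_sa = shortlist(Answers(strict_residency=True, build_orchestration=False,
                               outbound_campaigns=False, voice_is_product_face=True))
```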
CCaaS and orchestration platforms move fast — verify pricing and regional availability against the date on this leaf.