media · voice & speech · deployment layer

Voice agents & telephony.

A synthesised voice is not a deployed product. Getting a voice agent onto a real phone line takes two more layers: an orchestration platform that wires speech-in, reasoning, and speech-out into a live conversation, and a telephony / contact-centre platform that connects the calls, routes them, and manages queues and human agents. This leaf maps both layers, names who leads each, and works the South African in-country picture — Amazon Connect in af-south-1, the SA-native incumbents, and the ICASA frame.

Deployment layer · Reference knowledge · Hot · quarterly review

Voice generation is one layer of three.

A voice agent on a phone is a stack. Confusing the layers is the most common mistake when scoping a voice project — teams pick a voice vendor and discover they still have no way to take a call.

# A voice agent on a phone line = three layers, top to bottom

  TELEPHONY / CCaaS          the phone network, numbers, routing,
                            queues, human-agent desks
                            Amazon Connect · Twilio · Telviva · Euphoria
       |
  AGENT ORCHESTRATION        wires STT + LLM + TTS into a
                            real-time conversation; turns,
                            interruptions, tools, hand-off
                            Vapi · Retell · Bland · LiveKit · Pipecat
       |
  VOICE GENERATION           the synthetic voice itself (TTS),
                            transcription (STT)
                            ElevenLabs · Cartesia · Deepgram — see the landscape leaf

The bottom layer — voice generation — is covered in the voice AI landscape and the ElevenLabs leaf. This leaf covers the two layers above it. Some products span layers: ElevenAgents covers generation + orchestration; Amazon Connect covers telephony and, with Amazon Nova Sonic, orchestration too. But the layers are real, and a deployment needs all three — whether from one vendor or three.

The layer that turns three APIs into a conversation.

Orchestration platforms manage the hard real-time problem: streaming speech in, running the reasoning, streaming speech out, handling interruptions and turn-taking, calling tools, and handing off to a human. Pricing is per minute of conversation. Prices and latency figures below are approximate and vendor-stated.

| Platform | Shape | Pricing | Best for |
|---|---|---|---|
| Vapi | Middleware: bring your own LLM, TTS, STT and telephony | ~$0.05/min platform fee + component costs | Maximum control; custom stacks where you pick every component. |
| Retell AI | Production-ready managed platform | ~$0.07/min | Quality- and compliance-first teams (SOC 2, HIPAA, GDPR); ~600 ms response. |
| Bland AI | API-first, outbound-optimised | ~$499/mo + ~$0.11/min | Raw scale: high-volume outbound campaigns; vendor claims 1M+ concurrent calls. |
| LiveKit Agents | Open-source WebRTC framework (Python, Node.js) | OSS framework + infrastructure cost | Scalable, low-latency real-time agents; adaptive interruption handling, native MCP tools. |
| Pipecat | Open-source Python framework | OSS; self-host | Rapid prototyping and full developer control; frame-based streaming. |
| ElevenAgents | Voice + orchestration bundled | ElevenLabs credits | Teams already on ElevenLabs voices who want generation and orchestration from one vendor. |
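Per-minute pricing only becomes comparable once it is turned into a monthly figure at your expected volume. A minimal Python sketch, assuming the approximate, vendor-stated rates from the table above and a hypothetical $0.10/min for Vapi's separately billed components (STT, LLM, TTS, telephony). Whether a quoted rate already includes those components differs by vendor, which is why the sketch keeps them as a separate parameter. `PricePlan`, `PLANS` and `monthly_cost` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PricePlan:
    monthly_fee: float  # fixed platform fee, USD per month
    per_minute: float   # platform rate, USD per conversation minute


# Approximate, vendor-stated figures from the table; re-check before relying on them.
PLANS: Dict[str, PricePlan] = {
    "Vapi": PricePlan(0.0, 0.05),  # platform fee only; components billed on top
    "Retell AI": PricePlan(0.0, 0.07),
    "Bland AI": PricePlan(499.0, 0.11),
}

# Hypothetical assumption: Vapi's bring-your-own components add $0.10/min.
VAPI_COMPONENTS_PER_MIN = 0.10


def monthly_cost(plan: PricePlan, minutes: int, component_per_minute: float = 0.0) -> float:
    """Fixed fee + (platform rate + separately billed components) x minutes."""
    return round(plan.monthly_fee + (plan.per_minute + component_per_minute) * minutes, 2)


if __name__ == "__main__":
    for minutes in (1_000, 10_000):
        for name, plan in PLANS.items():
            extra = VAPI_COMPONENTS_PER_MIN if name == "Vapi" else 0.0
            print(f"{minutes:>6} min  {name:<10} ${monthly_cost(plan, minutes, extra):,.2f}")
```

At low volume a fixed monthly fee dominates; at high volume the per-minute rate does. Cost is only one axis, though: Bland is priced for outbound scale, so comparing it to an inbound platform on price alone misses its positioning.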

The split that matters: Vapi, LiveKit, and Pipecat give you control — you assemble the stack. Retell and Bland give you a managed path — faster to production, less to tune. ElevenAgents is the bundled option if voice quality is the priority and you have already chosen ElevenLabs. None of these is the telephony layer; they connect to it.
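The hardest real-time problem in this layer is interruption (barge-in): the caller starts talking while the agent is still speaking, and the reply has to stop mid-stream. A minimal Python sketch of one turn, assuming hypothetical stand-in coroutines `fake_llm` and `fake_tts` (not any vendor's API) in place of a real LLM and streaming TTS, and an `asyncio.Event` standing in for the voice-activity signal from the STT side. Real platforms add turn detection, tool calls and human hand-off on top of a loop like this.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


async def fake_llm(user_text: str) -> str:
    # Hypothetical stand-in for the reasoning step (an LLM call).
    await asyncio.sleep(0)
    return f"You said {user_text}"


async def fake_tts(text: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for streaming TTS: emits one word every 20 ms,
    # the way a real TTS streams audio frames before synthesis finishes.
    for word in text.split():
        await asyncio.sleep(0.02)
        yield word


async def speak(reply: str, barge_in: asyncio.Event, played: List[str]) -> bool:
    """Stream the reply; stop at once if the caller starts talking."""
    async for chunk in fake_tts(reply):
        if barge_in.is_set():
            return False  # interrupted: the rest of the reply is dropped
        played.append(chunk)
    return True


async def handle_turn(user_text: str, barge_in: asyncio.Event, played: List[str]) -> bool:
    # One turn: caller transcript in -> reasoning -> streamed speech out.
    reply = await fake_llm(user_text)
    return await speak(reply, barge_in, played)


async def demo(interrupt_after: Optional[float]) -> Tuple[bool, List[str]]:
    barge_in = asyncio.Event()
    played: List[str] = []

    async def caller() -> None:
        # Simulates the STT / voice-activity side hearing the caller mid-reply.
        if interrupt_after is not None:
            await asyncio.sleep(interrupt_after)
            barge_in.set()

    _, completed = await asyncio.gather(
        caller(), handle_turn("hello there caller", barge_in, played)
    )
    return completed, played


if __name__ == "__main__":
    print(asyncio.run(demo(None)))  # full reply, completed is True
    print(asyncio.run(demo(0.03)))  # caller barges in after ~30 ms, completed is False
```

The design point is that speech out is streamed in small chunks and the barge-in flag is checked between chunks; a platform that synthesised the whole reply before playing it could not stop cleanly.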

The layer that owns the phone number.

Telephony platforms provide PSTN connectivity, phone numbers, call routing, IVR, queues and, for full contact centres, human-agent desktops and analytics. This is a mature, separate industry that long predates voice AI; voice AI plugs into it.

| Platform | Type | SA availability | Notes |
|---|---|---|---|
| Amazon Connect | Cloud contact centre (CCaaS) | af-south-1 (Cape Town) | Native AI: Amazon Q in Connect, Nova Sonic speech-to-speech. Official ElevenLabs integration. Verified in af-south-1. |
| Twilio | Communications platform (CPaaS) | Widely used in SA | Programmable voice APIs; the general-purpose path. Build-it-yourself rather than a packaged contact centre. |
| Genesys Cloud | Enterprise CCaaS | Available; regional specifics unverified | Heavyweight enterprise contact centre with strong workforce-management tooling. |
| Vonage | CPaaS | SA support; specifics unverified | Voice and messaging APIs; positioned similarly to Twilio. |
| Telviva | SA-native UCaaS / CCaaS | SA (Teraco data centres) | Market leader, ~17% SA share, ~95,000 users. Fully in-country; ICASA-licensed. |
| Euphoria Telecom | SA-native cloud PBX + contact centre | SA (Cape Town + Joburg) | Predictive dialling, skills-based routing, CRM integration. SA-native. |
| SureTel | SA-native hosted VoIP | SA data centres | SMB-focused hosted VoIP with local failover. SA-native. |

One vendor or three — the real choice.

Three ways to assemble a voice agent on a phone

Single-vendor. Amazon Connect alone — telephony plus its native Nova Sonic voice agents. Or a packaged platform that brokers everything. Simplest to operate; least control over the voice and the model.

Telephony + bundled voice-agent. Amazon Connect (or Twilio) for the phone line, ElevenAgents for generation + orchestration. The official ElevenLabs–Amazon Connect integration is exactly this pattern. Best voice quality; two vendors to manage.

Three-layer custom. Twilio or a SIP trunk for telephony, Vapi or LiveKit for orchestration, ElevenLabs or Cartesia for the voice. Maximum control; you own the integration and the latency budget.
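The three patterns differ only in how many vendors cover the three layers. A small Python sketch, assuming a stack is just a mapping from layer to vendor; `LAYERS`, `PATTERNS`, `missing_layers` and `vendors_to_manage` are hypothetical names, and the vendors are the examples from the list above. It also encodes the rule that any deployment, whatever its vendor count, must fill all three layers.

```python
from typing import Dict, List

LAYERS = ("telephony", "orchestration", "voice_generation")

# Example stacks for the three assembly patterns (vendors as named in the text).
PATTERNS: Dict[str, Dict[str, str]] = {
    "single-vendor": {
        "telephony": "Amazon Connect",
        "orchestration": "Amazon Connect",     # native Nova Sonic voice agents
        "voice_generation": "Amazon Connect",  # Nova Sonic speech-to-speech
    },
    "telephony + bundled voice agent": {
        "telephony": "Amazon Connect",
        "orchestration": "ElevenLabs",         # ElevenAgents
        "voice_generation": "ElevenLabs",
    },
    "three-layer custom": {
        "telephony": "Twilio",
        "orchestration": "Vapi",
        "voice_generation": "ElevenLabs",
    },
}


def missing_layers(stack: Dict[str, str]) -> List[str]:
    """Layers with no vendor assigned; a deployable stack returns []."""
    return [layer for layer in LAYERS if not stack.get(layer)]


def vendors_to_manage(stack: Dict[str, str]) -> int:
    """Distinct vendors across the three layers."""
    return len({stack[layer] for layer in LAYERS if stack.get(layer)})


if __name__ == "__main__":
    for name, stack in PATTERNS.items():
        print(f"{name}: {vendors_to_manage(stack)} vendor(s), missing={missing_layers(stack)}")

    # The common scoping mistake: a voice vendor alone cannot take a call.
    print(missing_layers({"voice_generation": "ElevenLabs"}))  # ['telephony', 'orchestration']
```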

You do not choose between ElevenLabs and Amazon Connect

They are different layers. The question is never “ElevenLabs or Amazon Connect” — it is whether to use a voice-generation vendor with a telephony platform, and which of each. The only place they overlap is that Amazon Connect now has its own native voice agents (Nova Sonic), so Connect can be the orchestration layer too — competing there with Vapi and ElevenAgents, not with ElevenLabs the voice vendor.

What “in-country” actually buys you.

For South African contact centres, data residency and local regulation are real constraints. Here is the honest picture, layer by layer.

Amazon Connect in af-south-1

Amazon Connect is available in the AWS Cape Town region (af-south-1, three availability zones). A Connect instance there can claim South African numbers — including 0860 / 0861 shared-cost numbers common in SA business — route SA PSTN, and keep contact-centre data (recordings, transcripts, contact records) resident in Cape Town. For a POPIA data-residency requirement on the telephony and contact data, this is a genuine in-country option.
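Searching for and claiming a number on an af-south-1 instance can be scripted. A minimal sketch, assuming boto3's Amazon Connect client (`search_available_phone_numbers` and `claim_phone_number`) and an instance ARN you already have; the helpers `find_za_numbers` and `claim_first` are hypothetical names. The default searches ordinary `DID` numbers. Whether ZA shared-cost (0860 / 0861) numbers are offered to your instance, and under which `PhoneNumberType` value (`SHARED_COST` is assumed here), should be checked against the API reference and the Connect console.

```python
from typing import Any, Dict, List, Optional


def find_za_numbers(
    connect_client: Any,
    instance_arn: str,
    number_type: str = "DID",
    prefix: Optional[str] = None,
    max_results: int = 5,
) -> List[str]:
    """Search for South African numbers the Connect instance could claim."""
    kwargs: Dict[str, Any] = {
        "TargetArn": instance_arn,       # ARN of the af-south-1 Connect instance
        "PhoneNumberCountryCode": "ZA",
        "PhoneNumberType": number_type,  # e.g. "DID", "TOLL_FREE"; "SHARED_COST" assumed
        "MaxResults": max_results,
    }
    if prefix:
        kwargs["PhoneNumberPrefix"] = prefix  # E.164 prefix, e.g. "+2721" for Cape Town
    resp = connect_client.search_available_phone_numbers(**kwargs)
    return [n["PhoneNumber"] for n in resp.get("AvailableNumbersList", [])]


def claim_first(connect_client: Any, instance_arn: str, candidates: List[str]) -> str:
    """Claim the first candidate number and return its phone-number ID."""
    if not candidates:
        raise ValueError("no numbers available for this search")
    resp = connect_client.claim_phone_number(
        TargetArn=instance_arn, PhoneNumber=candidates[0]
    )
    return resp["PhoneNumberId"]


# Usage (needs AWS credentials and an existing instance):
#   import boto3
#   client = boto3.client("connect", region_name="af-south-1")
#   numbers = find_za_numbers(client, instance_arn, prefix="+2721")
#   number_id = claim_first(client, instance_arn, numbers)
```

The client is passed in rather than created inside the helpers, so the same functions can be exercised against a stub without AWS credentials.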

The caveat: if you plug a global voice-generation vendor (ElevenLabs, Cartesia) into that Connect instance, the voice-synthesis call leaves the country. “In-country” then covers the telephony layer, not the voice-generation layer. Fully in-country synthesis needs a self-hosted open model or a local provider. Amazon Connect's own Nova Sonic voice runs within the AWS region; check its af-south-1 availability specifically if that matters.
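The layer-by-layer residency check can be written down explicitly, so that “in-country” is never assumed to cover more than the telephony layer. A Python sketch, assuming each layer is tagged with where it processes call data; the location labels (`af-south-1`, `global`, `za-hosted`, `unverified`), the `Placement` type and the example stacks are hypothetical. Anything not positively confirmed in-country, including Nova Sonic until its af-south-1 availability is checked, counts as leaving the country.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical location labels. Only these count as in-country; anything else,
# including "unverified", is treated as processing outside South Africa.
IN_COUNTRY: FrozenSet[str] = frozenset({"af-south-1", "za-hosted"})


@dataclass(frozen=True)
class Placement:
    layer: str     # "telephony", "orchestration" or "voice_generation"
    vendor: str
    location: str  # where this layer actually processes call audio and data


def layers_leaving_country(stack: List[Placement]) -> List[str]:
    """Return the layers whose processing is not confirmed in-country."""
    return [p.layer for p in stack if p.location not in IN_COUNTRY]


connect_plus_global_voice = [
    Placement("telephony", "Amazon Connect", "af-south-1"),
    Placement("orchestration", "ElevenAgents", "global"),
    Placement("voice_generation", "ElevenLabs", "global"),
]

connect_native = [
    Placement("telephony", "Amazon Connect", "af-south-1"),
    # Not confirmed for af-south-1 here, so it fails the check until verified.
    Placement("orchestration", "Amazon Connect (Nova Sonic)", "unverified"),
    Placement("voice_generation", "Amazon Connect (Nova Sonic)", "unverified"),
]

self_hosted = [
    Placement("telephony", "Telviva", "za-hosted"),
    Placement("orchestration", "LiveKit Agents", "za-hosted"),
    Placement("voice_generation", "self-hosted open TTS model", "za-hosted"),
]

if __name__ == "__main__":
    for name, stack in [("connect + global voice", connect_plus_global_voice),
                        ("connect native", connect_native),
                        ("self-hosted", self_hosted)]:
        print(name, layers_leaving_country(stack))
```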

The SA-native incumbents

If the whole telephony stack must be in-country and ICASA-licensed without the AWS shared-responsibility framing, South Africa has real local providers: Telviva (market leader, on Teraco data centres), Euphoria Telecom (Cape Town and Joburg), and SureTel (SMB-focused). These are full SA-native contact-centre and PBX platforms. They are less likely to ship cutting-edge native voice AI than Amazon Connect — but they compose with a voice-agent layer the same way, and the telephony is unambiguously local.

The ICASA frame

VoIP is legal in South Africa. The distinction that matters: an organisation using VoIP for its own internal communication generally does not need a licence; a company providing VoIP / telephony as a service must be ICASA-licensed. The SA-native providers carry their own licensing. POPIA applies to all customer call data regardless of platform — and compliance is a shared responsibility: the platform provides the infrastructure, but the contact-centre operator is accountable for lawful processing, consent, and disclosure. Verify ICASA and POPIA posture for the specific deployment; do not assume the platform vendor carries it for you.

Four questions that narrow it fast.

1 · Is data residency a hard requirement? If yes, the telephony layer is Amazon Connect in af-south-1 or an SA-native incumbent (Telviva, Euphoria) — and accept that a global voice-gen vendor breaks strict residency. If no, the field is open.

2 · Build or buy the orchestration? Buy (Retell, Bland, ElevenAgents) for speed to production. Build (Vapi, LiveKit, Pipecat) for control over latency, model choice, and cost at scale.

3 · Inbound contact centre, or outbound campaigns? Inbound traffic that needs human escalation points to a full CCaaS (Amazon Connect, Genesys, Telviva). Pure high-volume outbound points to Bland or a campaign-tuned stack.

4 · How much does voice quality matter? If the synthetic voice is the product's face, pair the telephony layer with ElevenLabs or Cartesia rather than relying on a platform's built-in voice. If the voice is functional, a platform-native voice (Amazon Polly, Nova Sonic) is fine and simpler.
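The four questions read as a small decision procedure. A sketch in Python, assuming yes/no answers and using only the candidates named in this section (examples, not an exhaustive market scan); `Answers`, `Shortlist` and `narrow` are hypothetical names. Treat the output as a starting shortlist, not a recommendation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Answers:
    strict_residency: bool    # Q1: is data residency a hard requirement?
    buy_orchestration: bool   # Q2: buy a managed platform rather than build?
    outbound_campaigns: bool  # Q3: pure high-volume outbound rather than inbound?
    voice_is_the_face: bool   # Q4: is the synthetic voice the product's face?


@dataclass
class Shortlist:
    telephony: List[str]
    orchestration: List[str]
    voice: List[str]
    warnings: List[str] = field(default_factory=list)


def narrow(a: Answers) -> Shortlist:
    # Q1 + Q3: the telephony layer.
    if a.strict_residency:
        telephony = ["Amazon Connect (af-south-1)", "Telviva", "Euphoria Telecom"]
    elif not a.outbound_campaigns:
        telephony = ["Amazon Connect", "Genesys Cloud", "Telviva"]  # inbound + escalation
    else:
        telephony = ["any (field is open)"]

    # Q2 + Q3: the orchestration layer.
    if not a.buy_orchestration:
        orchestration = ["Vapi", "LiveKit Agents", "Pipecat"]
    elif a.outbound_campaigns:
        orchestration = ["Bland AI"]
    else:
        orchestration = ["Retell AI", "ElevenAgents"]

    # Q4: the voice layer.
    if a.voice_is_the_face:
        voice = ["ElevenLabs", "Cartesia"]
    else:
        voice = ["platform-native (Amazon Polly, Nova Sonic)"]

    result = Shortlist(telephony, orchestration, voice)
    if a.strict_residency and a.voice_is_the_face:
        result.warnings.append("a global voice-gen vendor breaks strict residency")
    return result


if __name__ == "__main__":
    print(narrow(Answers(strict_residency=True, buy_orchestration=False,
                         outbound_campaigns=False, voice_is_the_face=True)))
```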

CCaaS and orchestration platforms move fast — verify pricing and regional availability against the date on this leaf.