Voice Providers
VoiceInfra integrates with four TTS providers, each with different strengths. You can mix and match across your agents based on the trade-offs that matter for each deployment.

- ElevenLabs: Widest voice library, most natural output. ElevenLabs offers the largest selection of premium voices with the most natural-sounding prosody and emotion. Best choice when voice quality is the top priority. Supports custom voice cloning. Higher per-minute cost than other providers.
- Cartesia: Ultra-low latency. Cartesia is optimized for speed, making it ideal for high-volume deployments where response time is critical. Excellent voice quality with a smaller library than ElevenLabs. Also supports voice cloning.
- Rime Labs: Natural US English voices. Rime Labs specializes in American English with highly natural-sounding voices tuned for conversational use cases. A strong choice for US-focused deployments where accent and intonation in English are the priority.
- Deepgram: Speech-to-text with integrated TTS. Deepgram is primarily a transcription (STT) leader but also offers TTS output. Choosing Deepgram for both transcription and voice keeps your pipeline within a single provider, which can simplify billing and reduce latency on the STT side.
Selecting a Voice
You assign a voice at the agent level in the no-code builder, or at the node level in the workflow builder.

- No-code builder: open the Model & Voice tab and choose your provider and voice from the dropdown. You can preview the voice before saving.
- Workflow builder: set a default voice in the workflow settings, then override it on any individual Conversation node. This lets different stages of the same call use different voices if needed.
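The default-plus-override behavior in the workflow builder can be pictured with a small sketch. This is an illustration only, not VoiceInfra's API: the `Voice` and `ConversationNode` classes, the `resolve_voice` function, and the voice IDs are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Voice:
    provider: str  # e.g. "elevenlabs" or "cartesia"
    voice_id: str  # hypothetical identifier, for illustration only


@dataclass
class ConversationNode:
    name: str
    voice: Optional[Voice] = None  # None means "inherit the workflow default"


def resolve_voice(workflow_default: Voice, node: ConversationNode) -> Voice:
    """Return the node's own voice if one is set, otherwise the workflow default."""
    return node.voice if node.voice is not None else workflow_default


# A workflow with one default voice and one node that overrides it.
default_voice = Voice("cartesia", "support-default")
greeting = ConversationNode("greeting")
billing = ConversationNode("billing", voice=Voice("elevenlabs", "billing-specialist"))

for node in (greeting, billing):
    voice = resolve_voice(default_voice, node)
    print(f"{node.name}: {voice.provider}/{voice.voice_id}")
```

In this sketch the greeting node falls back to the workflow default while the billing node uses its own voice, which mirrors how different stages of one call can sound different.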
Language Support
VoiceInfra agents handle 30+ languages out of the box. You don’t need to build separate agents for each language you serve, because automatic language detection handles the switching for you.

Automatic Language Detection
When a caller speaks, the platform identifies their language from the first utterance and routes both transcription and response generation to match. The agent replies in the caller’s detected language immediately, with no perceptible delay and no manual configuration required. Automatic detection is enabled by default on every agent.

If a caller switches languages mid-conversation (for example, starting in English and continuing in Spanish), the agent detects the change and adapts its responses accordingly.

Supported Languages
VoiceInfra supports 30+ languages, including:

| Region | Languages |
|---|---|
| Americas | English (US/CA), Spanish (ES/LATAM), Portuguese (BR/PT), French (CA) |
| Europe | French, German, Italian, Dutch, Polish, Swedish, Norwegian, Danish, Finnish |
| Middle East & Africa | Arabic (MSA and regional), Turkish, Hebrew |
| South Asia | Hindi, Bengali, Tamil, Telugu, Urdu |
| East & Southeast Asia | Mandarin Chinese, Japanese, Korean, Indonesian, Thai |
| Other | Russian, Ukrainian, Romanian, Czech, Greek, and more |
Language coverage and transcription accuracy vary by provider. For languages with strong regional accents or specialized vocabulary, test multiple transcription providers (Deepgram, Speechmatics, Groq, OpenAI) to find the best match for your audience.
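One common way to compare transcription providers is word error rate (WER) against a hand-checked reference transcript. The sketch below is an assumption about how you might run such a comparison, not a VoiceInfra feature: `word_error_rate` and `rank_providers` are hypothetical helper names, and the transcripts are placeholder strings you would replace with real provider output for the same audio clip.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rank_providers(reference: str, transcripts: dict) -> list:
    """Return (provider, WER) pairs sorted from most to least accurate."""
    scores = {name: word_error_rate(reference, text) for name, text in transcripts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder transcripts; substitute real output from each provider.
reference = "book a table for two"
transcripts = {
    "provider_a": "book the table for",
    "provider_b": "book a table for two",
}
for name, wer in rank_providers(reference, transcripts):
    print(f"{name}: WER = {wer:.2f}")
```

A lower WER means fewer word-level mistakes. For a fair comparison, score every provider on the same set of recordings drawn from your actual callers, including the accents and vocabulary you expect.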
Custom Voice Cloning
If you need a branded voice rather than a stock voice, VoiceInfra supports custom voice profiles created through ElevenLabs and Cartesia. A cloned voice lets you maintain a consistent, recognizable sound across all your AI agents, whether you’re using your own recorded voice or a custom voice designed for your brand.

To set up a custom voice clone:

- Create the voice clone in your ElevenLabs or Cartesia account.
- Navigate to Voice Lab in the VoiceInfra dashboard and click Add Voice Clone.
- Select the provider, enter the voice name and language, and save. VoiceInfra pulls the provider voice ID automatically.
- The cloned voice appears in the voice selector across all your agents and workflows.
Frequently Asked Questions
Can I preview a voice before deploying?
Yes. The voice selector in both the no-code builder and workflow builder includes a preview function. Select a voice and click the play button to hear a sample. You can preview multiple voices before making your selection, and you can change the voice at any time after deployment without rebuilding the agent.
Does language detection affect latency?
No measurable impact. Language detection runs in parallel with the transcription pipeline and adds no perceptible delay to the caller’s experience. The agent’s first response is in the correct language.
Can I use a custom cloned voice?
Yes. Custom voice cloning is supported through ElevenLabs and Cartesia. Create the clone in your provider account, register it in VoiceInfra’s Voice Lab, and it becomes available across all your agents immediately. See the Custom Voice Cloning section above for the full setup steps.
Do I need to configure language per agent?
No. Automatic language detection is on by default for every agent — you don’t need to configure anything to support multilingual callers. If you want to restrict an agent to a specific language (for example, a dedicated Spanish-language support line), you can set a fixed language in the agent configuration to override automatic detection.
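The relationship between automatic detection and a fixed language can be sketched as a simple precedence rule. This is a conceptual illustration under assumed names, not VoiceInfra's configuration schema: `AgentLanguageConfig`, `fixed_language`, and `response_language` are hypothetical, and the language codes stand in for whatever the detector reports per utterance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentLanguageConfig:
    # Hypothetical setting: None leaves automatic detection in charge.
    fixed_language: Optional[str] = None


def response_language(config: AgentLanguageConfig, detected_language: str) -> str:
    """A fixed language, when set, takes precedence over the detected one."""
    if config.fixed_language is not None:
        return config.fixed_language
    return detected_language


def languages_for_call(config: AgentLanguageConfig, detected_per_utterance: list) -> list:
    """Resolve the reply language for each utterance in a call."""
    return [response_language(config, lang) for lang in detected_per_utterance]


# A caller who starts in English and switches to Spanish mid-call.
detected = ["en", "en", "es"]

multilingual_agent = AgentLanguageConfig()
spanish_line = AgentLanguageConfig(fixed_language="es")

print(languages_for_call(multilingual_agent, detected))
print(languages_for_call(spanish_line, detected))
```

With no fixed language the replies follow the caller through the switch. With a fixed language the agent stays in Spanish regardless of what is detected.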