Overview
ElevenLabs provides the text-to-speech models in varg. Multiple model variants are available, balancing quality, speed, and language support.

| Model ID | Quality | Speed | Languages | Credits | ~Cost |
|---|---|---|---|---|---|
| eleven_v3 | Best | Standard | Multi | 25 | $0.25 |
| eleven_multilingual_v2 | Great | Standard | 29 languages | 25 | $0.25 |
| turbo | Good | Fastest | English | 20 | $0.20 |
| eleven_turbo_v2 | Good | Fast | English | 20 | $0.20 |
| eleven_turbo_v2_5 | Good | Fast | Multi | 20 | $0.20 |
| eleven_flash_v2 | Good | Ultra-fast | English | 20 | $0.20 |
| eleven_flash_v2_5 | Good | Ultra-fast | Multi | 20 | $0.20 |
Quick start
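As a starting point, here is a minimal sketch of assembling a speech request from the parameters described under Parameters. It does not call varg itself: `SpeechRequest` and `buildSpeechRequest` are hypothetical helpers, and the field names `text`, `voice`, and `model` are assumptions, since this page only names `stability` and `similarity_boost`. The defaults and the 0 to 1 range come from the parameter descriptions.

```typescript
// Model IDs taken from the table in the Overview.
type SpeechModel =
  | "eleven_v3"
  | "eleven_multilingual_v2"
  | "turbo"
  | "eleven_turbo_v2"
  | "eleven_turbo_v2_5"
  | "eleven_flash_v2"
  | "eleven_flash_v2_5";

// Hypothetical request shape: `text`, `voice`, and `model` are assumed names.
interface SpeechRequest {
  text: string;
  voice: string; // voice name (e.g. "rachel") or an ElevenLabs voice ID
  model: SpeechModel;
  stability: number; // 0 to 1, default 0.5
  similarity_boost: number; // 0 to 1, default 0.75
}

type SpeechInput = Pick<SpeechRequest, "text" | "voice" | "model"> &
  Partial<Pick<SpeechRequest, "stability" | "similarity_boost">>;

// Reject values outside the documented 0 to 1 range (this also rejects NaN).
function checkUnitRange(name: string, value: number): void {
  if (!(value >= 0 && value <= 1)) {
    throw new RangeError(`${name} must be between 0 and 1, got ${value}`);
  }
}

// Fill in the documented defaults and validate the result.
function buildSpeechRequest(input: SpeechInput): SpeechRequest {
  if (input.text.trim() === "") {
    throw new Error("text must not be empty");
  }
  const stability = input.stability ?? 0.5;
  const similarity_boost = input.similarity_boost ?? 0.75;
  checkUnitRange("stability", stability);
  checkUnitRange("similarity_boost", similarity_boost);
  return { ...input, stability, similarity_boost };
}

const request = buildSpeechRequest({
  text: "Welcome to the show.",
  voice: "rachel",
  model: "turbo",
});
console.log(request);
```

Validating before sending keeps out-of-range settings from reaching the provider; how the request is actually submitted depends on the varg client you use.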
Available voices
| Voice | Gender | Style | Best for |
|---|---|---|---|
| rachel | Female | Calm, warm | Narration |
| bella | Female | Soft, gentle | Storytelling |
| domi | Female | Confident | Presentations |
| elli | Female | Young, cheerful | Social media |
| adam | Male | Deep, warm | Narration |
| josh | Male | Young, energetic | Social media |
| sam | Male | Raspy | Character voices |
| antoni | Male | Calm | Podcasts |
| arnold | Male | Authoritative | Announcements |
Parameters
The text to convert to speech.
Voice name or ElevenLabs voice ID.
Speech model variant (see table above).
stability: 0 to 1 (default 0.5). Higher values give more consistent output; lower values are more expressive.
similarity_boost: 0 to 1 (default 0.75). How closely the output matches the chosen voice.
Choosing a model
| Scenario | Recommended model |
|---|---|
| Best quality English | eleven_v3 |
| Multiple languages | eleven_multilingual_v2 |
| Fast English narration | turbo or eleven_turbo_v2 |
| Real-time / interactive | eleven_flash_v2_5 |
| Budget | turbo (20 credits) |
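The recommendations above can be encoded as a small lookup so that calling code picks a model by intent rather than by hard-coded ID. This is only a sketch: the `Scenario` keys and the `recommendModel` helper are hypothetical names, and where the table lists two options for fast English narration the sketch arbitrarily picks `turbo`.

```typescript
// Hypothetical scenario keys mirroring the rows of the table above.
type Scenario =
  | "best-quality-english"
  | "multiple-languages"
  | "fast-english-narration"
  | "real-time"
  | "budget";

// Model IDs are copied from the table; "turbo" stands in for
// "turbo or eleven_turbo_v2" in the fast English narration row.
const RECOMMENDED_MODEL: Record<Scenario, string> = {
  "best-quality-english": "eleven_v3",
  "multiple-languages": "eleven_multilingual_v2",
  "fast-english-narration": "turbo",
  "real-time": "eleven_flash_v2_5",
  budget: "turbo",
};

function recommendModel(scenario: Scenario): string {
  return RECOMMENDED_MODEL[scenario];
}

console.log(recommendModel("real-time"));
```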
Composition example
Use speech in a video composition with captions.
Pricing
| Model | Credits | USD |
|---|---|---|
| eleven_v3 | 25 | $0.25 |
| eleven_multilingual_v2 | 25 | $0.25 |
| turbo / eleven_turbo_v2 | 20 | $0.20 |
| eleven_turbo_v2_5 | 20 | $0.20 |
| eleven_flash_v2 / eleven_flash_v2_5 | 20 | $0.20 |
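For budgeting, the table implies a rate of $0.01 per credit (25 credits for $0.25, 20 credits for $0.20). The sketch below estimates the cost of a batch under the assumption that the listed credits are charged once per generation, which this page does not state explicitly. `estimateCost` is a hypothetical helper, not part of varg.

```typescript
// Credits per model, copied from the pricing table.
const CREDITS: Record<string, number> = {
  eleven_v3: 25,
  eleven_multilingual_v2: 25,
  turbo: 20,
  eleven_turbo_v2: 20,
  eleven_turbo_v2_5: 20,
  eleven_flash_v2: 20,
  eleven_flash_v2_5: 20,
};

// Rate inferred from the table: 100 credits per US dollar.
const CREDITS_PER_USD = 100;

// Assumption: each generation is billed the listed credits once.
function estimateCost(
  model: string,
  generations: number
): { credits: number; usd: number } {
  const perGeneration = CREDITS[model];
  if (perGeneration === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  const credits = perGeneration * generations;
  return { credits, usd: credits / CREDITS_PER_USD };
}

console.log(estimateCost("eleven_v3", 4));
```

Keeping the arithmetic in whole credits and converting to dollars only at the end avoids accumulating floating-point error across a large batch.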
Related models
VEED Fabric
Animate a portrait with generated speech.
Sync Lipsync
Apply speech to existing video.
Whisper
Transcribe audio back to text.