Documentation | ElevenLabs Documentation

Eleven v3

Our most emotionally rich, expressive speech synthesis model

Dramatic delivery and performance

70+ languages supported

5,000 character limit

Support for natural multi-speaker dialogue

Eleven Multilingual v2

Lifelike, consistent quality speech synthesis model

Natural-sounding output

29 languages supported

10,000 character limit

Most stable on long-form generations

Eleven Flash v2.5

Our fast, affordable speech synthesis model

Ultra-low latency (~75ms†)

32 languages supported

40,000 character limit

Faster model, 50% lower price per character for API generations
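
The character limits listed on the three text-to-speech cards above determine how much text fits into a single generation. As a minimal sketch, assuming the limits apply per request and that breaking on spaces is acceptable for your content, longer text could be chunked as shown below. The `CHAR_LIMITS` table and the `split_for_model` helper are hypothetical names for illustration and are not part of any ElevenLabs SDK.

```python
# Character limits copied from the model cards above.
# Assumption: each limit applies to a single generation request.
CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Eleven Multilingual v2": 10_000,
    "Eleven Flash v2.5": 40_000,
}


def split_for_model(text: str, model: str) -> list[str]:
    """Split text into chunks no longer than the model's character limit.

    Hypothetical helper: prefers to break at the last space that fits,
    and falls back to a hard cut when a chunk contains no space.
    """
    limit = CHAR_LIMITS[model]
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for the last space at or before the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found, so cut mid-word
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Calling `split_for_model(script, "Eleven v3")` returns a list of strings, each at most 5,000 characters long, that could then be sent one request at a time. Splitting on sentence boundaries instead of spaces would likely give more natural prosody across chunk edges, at the cost of a slightly more involved helper.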

Scribe v2

State-of-the-art speech recognition model

Accurate transcription in 90+ languages

Keyterm prompting, up to 1,000 terms

Entity detection, up to 56

Precise word-level timestamps

Speaker diarization, up to 32 speakers

Dynamic audio tagging

Smart language detection

Scribe v2 Realtime

Real-time speech recognition model

Accurate transcription in 90+ languages

Real-time transcription

Low latency (~150ms†)

Precise word-level timestamps

Text to Speech

Convert text into lifelike speech

Speech to Text

Transcribe spoken audio into text

Music

Generate music from text

Text to Dialogue

Create natural-sounding dialogue from text

Image & Video

Generate images and videos from text

Voice changer

Modify and transform voices

Voice isolator

Isolate voices from background noise

Dubbing

Dub audio and videos seamlessly

Sound effects

Create cinematic sound effects

Voices

Clone and design custom voices

Voice Remixing

Transform and enhance existing voices

Forced Alignment

Align text to audio

ElevenAgents

Deploy intelligent voice agents

How ElevenLabs works

Choose your path

ElevenCreative

ElevenAgents

ElevenAPI

Meet the models

Browse by capability
