For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Connect
BlogHelp CenterAPI PricingSign up
OverviewElevenCreativeElevenAgentsElevenAPIAPI referenceChangelog
OverviewElevenCreativeElevenAgentsElevenAPIAPI referenceChangelog
    • Introduction
    • Models
  • Capabilities
    • Text to Speech
    • Speech to Text
    • Music
    • Text to Dialogue
    • Image & Video
    • Voice Changer
    • Voice Isolator
    • Dubbing
    • Sound Effects
    • Voices
    • Voice Remixing
    • Forced Alignment
    • Voice Agents
    • Speech Engine
  • Administration
    • Account
    • Billing
    • Pay As You Go
    • Consolidated billing
    • Data Residency
    • Usage analytics
    • Files
LogoLogo
Login
Login
Connect
BlogHelp CenterAPI PricingSign up
On this page
  • Overview
  • How it works
  • When to use Speech Engine
  • Key features
  • FAQ
Capabilities

Speech Engine

Add voice to your own chat agent or LLM with ElevenLabs.
Was this page helpful?
Previous

Account

Create and manage your ElevenLabs account to start generating AI audio
Next
Built with

Overview

ElevenLabs Speech Engine adds voice capabilities to any chat agent. ElevenLabs handles speech-to-text and text-to-speech while your server provides the LLM logic. The SDK manages connection lifecycle, turn-taking, and interruption detection so you can focus on your agent’s behavior.

Quickstart

Build a voice agent with the ElevenLabs SDK.

JavaScript SDK reference

Classes, methods, and events for the JavaScript SDK.

Python SDK reference

Classes, methods, and events for the Python SDK.

How it works

Speech Engine connects your server to ElevenLabs over WebSocket. Each connection represents one conversation.

  1. A user speaks in the browser. ElevenLabs captures the audio and transcribes it.
  2. The transcript is sent to your server along with the full conversation history.
  3. Your server passes the transcript to your LLM and streams the response back.
  4. ElevenLabs converts the text to speech and plays it in the browser.

When to use Speech Engine

Speech Engine is designed for developers who want to bring their own LLM and control the conversation logic on their own server. Use it when you need to:

  • Add voice to an existing text-based chat agent
  • Use a specific LLM, fine-tuned model, or custom inference pipeline
  • Keep full control over conversation routing, context management, and tool calling
  • Integrate voice into an existing server application (Express, FastAPI, etc.)

If you want a fully hosted solution where ElevenLabs provides the LLM, knowledge base, and tools, use ElevenAgents instead.

Key features

  • Any LLM - use OpenAI, Anthropic, Google Gemini, or any model that produces text. The SDK auto-extracts text from OpenAI, Anthropic, and Gemini stream formats.
  • Interruption handling - when the user speaks mid-response, the SDK cancels the in-flight LLM request automatically via an AbortSignal (TypeScript) or task cancellation (Python).
  • Streaming - responses are streamed to the browser as they are generated. Pass a string, an async iterable, or a native LLM stream object.
  • Turn-taking - the SDK manages conversation turns, so your server only needs to respond to transcripts.

FAQ

What LLMs are supported?

Any LLM that produces text. The SDK has built-in stream extraction for OpenAI (Responses API and Chat Completions API), Anthropic Messages API, and Google Gemini API. For other providers, pass a plain string or an async iterable of string chunks.

What is the difference between Speech Engine and ElevenAgents?

ElevenAgents is a fully hosted platform where ElevenLabs provides the LLM, knowledge base, and tools. Speech Engine is for developers who want to bring their own LLM and control the conversation logic on their own server.

What server frameworks are supported?

In TypeScript, you can attach Speech Engine to any Node.js HTTP server (Express, Fastify, or plain http.createServer()), or run a standalone WebSocket server. In Python, the SDK provides a standalone server via engine.serve(), or you can integrate with FastAPI, Starlette, or any ASGI framework using engine.create_session().