Text-to-Speech

Overview

ArgentOS supports text-to-speech (TTS) through ElevenLabs integration, allowing your agent to speak responses aloud through the dashboard. The dashboard handles voice synthesis using the user's selected voice.

TTS Priority System

The dashboard processes speech in this priority order:

1. [TTS:text] markers: The agent explicitly marks text to be spoken
2. Auto-summarize: Long responses are automatically summarized and spoken
3. Short responses: Brief responses are spoken directly
4. MEDIA: fallback: Agent-generated audio files (may use a different voice)

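Speech synthesized by the dashboard (markers, auto-summaries, and short responses) always uses the voice selected in Audio Settings, so the voice stays consistent regardless of how the speech was triggered. Only the MEDIA: fallback, which plays audio the agent generated itself, may use a different voice.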
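The priority order above can be sketched as a small selection function. This is an illustration only, not the dashboard's actual code: the names `choose_speech`, `summarize`, and `LONG_RESPONSE_CHARS` are hypothetical, the 400-character cutoff for "long" is an assumed value (the real threshold is not documented here), and treating an empty response as the MEDIA: fallback case is a simplification.

```python
import re
from typing import Callable, Tuple

# Matches [TTS:...] markers. Assumption: marker text contains no "]".
TTS_MARKER = re.compile(r"\[TTS:([^\]]*)\]")

# Hypothetical cutoff between "short" and "long" responses.
LONG_RESPONSE_CHARS = 400


def choose_speech(response: str, summarize: Callable[[str], str]) -> Tuple[str, str]:
    """Return (source, text_to_speak) following the documented priority order."""
    marked = TTS_MARKER.findall(response)
    if marked:
        # 1. Explicit markers take priority over everything else.
        return ("marker", " ".join(part.strip() for part in marked))

    text = response.strip()
    if len(text) > LONG_RESPONSE_CHARS:
        # 2. Long responses are summarized before being spoken.
        return ("summary", summarize(text))
    if text:
        # 3. Short responses are spoken as they are.
        return ("direct", text)

    # 4. Nothing to synthesize: fall back to agent-generated audio, if any.
    return ("media", "")
```

Passing the summarizer in as a parameter keeps the sketch self-contained; in the dashboard the summarization step is presumably built in.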

TTS Markers

The agent can explicitly control what gets spoken using markers in its response:

Here's the full analysis of your server logs...

[TTS:Your server logs show three critical errors in the last hour. I've documented the details below.]

The text inside [TTS:...] is spoken aloud. The rest of the response is displayed visually but not spoken. This lets the agent provide detailed written content while speaking a concise summary.
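To make the split between spoken and displayed text concrete, here is a minimal sketch of how a response could be separated. The function name `split_tts` is hypothetical, and the sketch assumes marker text never contains a closing bracket and that markers are stripped from the displayed text; the dashboard's real parser may behave differently.

```python
import re
from typing import Tuple

# Matches [TTS:...] markers. Assumption: marker text contains no "]".
TTS_MARKER = re.compile(r"\[TTS:([^\]]*)\]")


def split_tts(response: str) -> Tuple[str, str]:
    """Return (spoken_text, display_text) for an agent response."""
    # Everything inside the markers is joined into the spoken text.
    spoken = " ".join(part.strip() for part in TTS_MARKER.findall(response))
    # The rest of the response is shown but not spoken.
    display = TTS_MARKER.sub("", response).strip()
    return spoken, display
```

With the example response above, `spoken` would hold the sentence about three critical errors and `display` would hold the written analysis.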

Voice Selection

ElevenLabs Voices

Configure your preferred voice in the dashboard's Audio Settings:

1. Open the dashboard
2. Go to Settings > Audio
3. Select a voice (Jessica, Lily, etc.)
4. Adjust speech rate and volume

Configuration

{
  "dashboard": {
    "tts": {
      "provider": "elevenlabs",
      "voice": "jessica",
      "rate": 1.0,
      "volume": 0.8,
      "autoSpeak": true
    }
  }
}
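As a rough illustration of how these settings might be read and sanity-checked, the sketch below parses a config string and pulls out the `dashboard.tts` section. Several things here are assumptions rather than documented behavior: the function name `load_tts_settings` is hypothetical, the example values above are reused as defaults, `volume` is assumed to be a fraction between 0 and 1, and `rate` is assumed to be a positive multiplier.

```python
import json

# Assumption: the example values from the configuration above serve as defaults.
DEFAULTS = {
    "provider": "elevenlabs",
    "voice": "jessica",
    "rate": 1.0,
    "volume": 0.8,
    "autoSpeak": True,
}


def load_tts_settings(raw: str) -> dict:
    """Parse a config string and return the TTS settings with defaults filled in."""
    config = json.loads(raw)
    tts = config.get("dashboard", {}).get("tts", {})
    settings = {**DEFAULTS, **tts}

    # Assumed ranges; the documented limits may differ.
    if not 0.0 <= settings["volume"] <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    if settings["rate"] <= 0:
        raise ValueError("rate must be positive")
    return settings
```

Calling `load_tts_settings('{"dashboard": {"tts": {"voice": "lily"}}}')` would return the defaults with only the voice overridden.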

Requirements

An ElevenLabs API key (configured in dashboard settings)
A browser with Web Audio API support
Speakers or headphones
