Voice
discli’s voice stack lets your bot join a server voice channel, transcribe everyone live, speak via TTS, and play audio. By the end of this guide you’ll have a meeting transcriber that writes speaker-labelled lines to a file and asks Claude to summarise the call when you stop it.
Voice features only work in server voice channels. Discord’s API does not let application bots join DM or group-DM voice calls.
What you’ll build
A bot that joins a voice channel and prints lines like these to the console and to ~/.discli/transcripts/meeting-<timestamp>.md:

- **[14:32:08] Roy:** so what's our plan for the migration
- **[14:32:14] Sara:** I'd start with the read path first
- **[14:32:21] Roy:** good idea — can we ship that this sprint?
- …

When you stop it with Ctrl+C, it asks Claude to produce a structured summary.
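The transcript format above is simple enough to sketch. The helpers below (transcript_path, format_line, append_line) are hypothetical and are not taken from discli. They assume the YYYYMMDD-HHMMSS timestamp that appears in the Step 7 output and one markdown bullet per utterance.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical helpers; discli's own transcriber may format things differently.
TRANSCRIPT_DIR = Path.home() / ".discli" / "transcripts"


def transcript_path(started: datetime, base: Path = TRANSCRIPT_DIR) -> Path:
    """Return <base>/meeting-YYYYMMDD-HHMMSS.md for a session start time."""
    return base / f"meeting-{started:%Y%m%d-%H%M%S}.md"


def format_line(spoken_at: datetime, speaker: str, text: str) -> str:
    """Render one utterance as a speaker-labelled markdown bullet."""
    return f"- **[{spoken_at:%H:%M:%S}] {speaker}:** {text.strip()}"


def append_line(path: Path, line: str) -> None:
    """Append one line, creating the transcript directory on first use."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


if __name__ == "__main__":
    started = datetime(2026, 5, 14, 14, 32, 8)
    print(transcript_path(started).name)
    print(format_line(started, "Roy", "so what's our plan for the migration"))
```

Appending one line at a time means a crash mid-meeting still leaves everything said up to that point on disk.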
Step 1 — Install the voice extras
The core discord-cli-agent install is text-only on purpose — the voice stack pulls in PyNaCl, discord-ext-voice-recv, davey (Discord’s end-to-end voice encryption library), and audioop-lts. Install the extras when you want voice:
pip install 'discord-cli-agent[voice,deepgram]'
# or, with uv:
uv add 'discord-cli-agent[voice,deepgram]'
# or, in a checkout:
uv sync --extra voice --extra deepgram

You'll also need:

- libopus on the system: apt install libopus0 (Debian/Ubuntu) or brew install opus (macOS). On Windows it is bundled in discord.py's wheel.
- ffmpeg on PATH if you want to use voice play (file/URL playback) or TTS playback: apt install ffmpeg / brew install ffmpeg / winget install ffmpeg.
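You can check both system dependencies from Python with something like the snippet below. It is a rough stand-in and not what discli doctor runs internally. It only uses the standard library's ctypes.util.find_library and shutil.which.

```python
from __future__ import annotations

import ctypes.util
import shutil


def check_system_deps() -> dict[str, str | None]:
    """Map each system dependency to where it was found, or None."""
    return {
        # find_library returns a name such as "libopus.so.0", or None.
        # On Windows discord.py bundles its own opus DLL, so None may be harmless there.
        "libopus": ctypes.util.find_library("opus"),
        # which() returns the full path of the first ffmpeg on PATH, or None.
        "ffmpeg": shutil.which("ffmpeg"),
    }


if __name__ == "__main__":
    for name, where in check_system_deps().items():
        status = "ok" if where else "--"
        print(f"[{status}] {name}: {where or 'not found'}")
```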
The available extras:

| Extra | Adds |
|---|---|
| voice | core voice send/receive, Silero VAD, audioop-lts for Python 3.13+ |
| deepgram | Deepgram streaming STT + Aura TTS |
| elevenlabs | ElevenLabs TTS |
| openai-voice | OpenAI Whisper STT + TTS |
| all-voice | voice + elevenlabs + deepgram |
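A similar sketch can report which of the voice extras' Python dependencies are importable. The package-to-module mapping (nacl for PyNaCl, discord.ext.voice_recv for discord-ext-voice-recv, and so on) is an assumption based on the usual import names, and it is not how discli itself decides that extras are missing.

```python
from __future__ import annotations

import importlib.util

# Assumed import names for each dependency the voice extra pulls in.
VOICE_MODULES = {
    "PyNaCl": "nacl",
    "discord-ext-voice-recv": "discord.ext.voice_recv",
    "davey": "davey",
    "audioop-lts": "audioop",  # stdlib before Python 3.13, the backport after
}


def module_available(name: str) -> bool:
    """True if `name` can be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec("a.b") imports parent "a" first and raises if "a" is missing.
        return False


def missing_voice_modules() -> list[str]:
    return [pkg for pkg, mod in VOICE_MODULES.items() if not module_available(mod)]


if __name__ == "__main__":
    missing = missing_voice_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```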
Step 2 — Run the doctor
Before you debug “nothing is happening”, run:
discli doctor

doctor reports the state of every piece the voice stack needs. A working voice install looks like:
CORE
[ok] python — Python 3.12.1
[ok] bot token — configured
[ok] discord.py — v2.7.1

VOICE
[ok] libopus — loaded
[ok] PyNaCl — v1.5.0
[ok] discord-ext-voice-recv — v0.5.2a179
[ok] davey (DAVE crypto) — v0.1.5
[ok] DAVE/Opus patches — install ok

STT
[ok] DEEPGRAM_API_KEY — set
[--] OPENAI_API_KEY (Whisper) — not set

TTS
[--] ELEVENLABS_API_KEY — not set
[--] OPENAI_API_KEY (TTS) — not set
[ok] DEEPGRAM_API_KEY (Aura) — set

TOOLS
[ok] ffmpeg — /usr/bin/ffmpeg

OK — no problems found (3 optional check(s) skipped).

Lines marked [--] are optional providers you haven't configured. discli only fails if something you actually need is broken. If doctor comes back clean, voice will work.
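If you want to scan the human-readable output from a script, a minimal parser might look like this. It assumes each check sits on its own line as a bracketed status, a name, a dash separator and a detail, as in the sample above. For anything load-bearing, --json is the better choice.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed line shape: "[ok] libopus \u2014 loaded" (status, name, em dash, detail).
CHECK_RE = re.compile(r"^\s*\[([^\]]+)\]\s+(.+?)\s+\u2014\s+(.*)$")


def parse_checks(output: str) -> list[tuple[str, str, str]]:
    """Return (status, name, detail) for each check line; skip section headers."""
    return [m.groups() for line in output.splitlines() if (m := CHECK_RE.match(line))]


def status_counts(output: str) -> Counter[str]:
    return Counter(status for status, _, _ in parse_checks(output))


def failed_checks(output: str) -> list[str]:
    # [FAIL] marks a broken check; [--] means an optional provider is unset.
    return [name for status, name, _ in parse_checks(output) if status == "FAIL"]
```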
Use --json for machine-readable output (handy in CI):
discli doctor --json

Step 3 — Join a voice channel
Pick a voice channel ID (right-click → Copy ID in Discord with Developer Mode on, or use discli channel list), then:
discli voice join "<channel name or id>"

voice join is a one-shot command: it connects, confirms, and disconnects. To stay connected you need either an interactive session (next step) or discli serve (covered in Serve Mode).
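This guide does not spell out how discli resolves the "<channel name or id>" argument. As a hypothetical sketch, a resolver could treat an all-digit string of snowflake length as an ID and anything else as a channel name. The resolve_channel function and its channels mapping are illustrative only.

```python
from __future__ import annotations

import re

# Discord IDs ("snowflakes") are 64-bit integers, usually 17 to 20 decimal digits.
SNOWFLAKE_RE = re.compile(r"[0-9]{17,20}")


def resolve_channel(arg: str, channels: dict[int, str]) -> int | None:
    """Resolve an ID or a case-insensitive channel name to a channel ID.

    `channels` maps ID -> name for the server's voice channels. Returns None
    when nothing matches or when a name is ambiguous.
    """
    arg = arg.strip()
    if SNOWFLAKE_RE.fullmatch(arg):
        channel_id = int(arg)
        return channel_id if channel_id in channels else None
    wanted = arg.lstrip("#").casefold()
    matches = [cid for cid, name in channels.items() if name.casefold() == wanted]
    return matches[0] if len(matches) == 1 else None
```

Refusing ambiguous names is a deliberate choice in this sketch. Two channels called "general" should force the caller to pass an ID rather than silently pick one.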
Step 4 — Listen and transcribe
Start a transcription session in two terminals.
Terminal 1 — keep the bot in the channel:
discli voice join "general"

Terminal 2 — listen with Deepgram streaming STT:
export DEEPGRAM_API_KEY=...
discli voice listen

You'll see lines like:
[123456789012345678] hello can you hear me
[987654321098765432] yes loud and clear

Each line is [user_id] text. Hit Ctrl+C to stop.
voice listen uses Deepgram for streaming STT by default. --continuous keeps the session alive after each transcript (this is the default behaviour), and --duration N listens for N seconds and then stops.
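To post-process transcripts in a script, you can parse the [user_id] text lines directly. The sketch assumes one transcript per output line, as in the sample above. The listen_for helper simply wraps discli voice listen --duration N with subprocess and assumes a zero exit status on success. Mapping IDs to display names is left to you; the meeting transcriber example does this.

```python
from __future__ import annotations

import re
import subprocess
from collections.abc import Iterable, Iterator

# One transcript per line: "[<user_id>] <text>", as printed by voice listen.
LINE_RE = re.compile(r"\[(\d+)\]\s*(.*)")


def parse_transcripts(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (user_id, text) pairs, skipping lines in any other shape."""
    for raw in lines:
        m = LINE_RE.fullmatch(raw.strip())
        if m:
            yield int(m.group(1)), m.group(2)


def listen_for(seconds: int) -> list[tuple[int, str]]:
    """Run a bounded listen session and return its parsed transcripts."""
    result = subprocess.run(
        ["discli", "voice", "listen", "--duration", str(seconds)],
        capture_output=True, text=True, check=True,
    )
    return list(parse_transcripts(result.stdout.splitlines()))
```

For live output, pipe discli voice listen into your script and pass sys.stdin to parse_transcripts. Each pair is then handled as soon as its line arrives.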
Step 5 — Speak with TTS
export ELEVENLABS_API_KEY=...
discli voice speak "joining the call now"

Pick a provider with --server config or env var:
| Provider | Env var | Notes |
|---|---|---|
| ElevenLabs | ELEVENLABS_API_KEY | Highest quality, paid |
| OpenAI | OPENAI_API_KEY | Cheap and decent |
| Deepgram Aura | DEEPGRAM_API_KEY | Streaming-friendly |
Customise the voice ID with --voice <id> and the speech rate with --speed (for example, --speed 0.8).
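When several keys are set, something has to decide which provider wins. The sketch below shows one way to do it, using the env vars from the table above. The preference order and the provider labels ("elevenlabs", "openai", "deepgram") are assumptions for illustration. discli's actual selection logic is not documented here.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# (provider label, env var) in a hypothetical preference order.
TTS_PROVIDERS = [
    ("elevenlabs", "ELEVENLABS_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("deepgram", "DEEPGRAM_API_KEY"),
]


def pick_tts_provider(env: Mapping[str, str] = os.environ) -> str | None:
    """Return the first provider whose key is set, or None if none are."""
    for provider, var in TTS_PROVIDERS:
        if env.get(var):  # an empty string counts as unset
            return provider
    return None
```

Passing env explicitly, instead of always reading os.environ, keeps the function easy to test.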
Step 6 — Play audio
discli voice play /path/to/file.mp3
discli voice play https://example.com/stream.opus

Playback goes through ffmpeg, so any format ffmpeg can decode works. Use voice pause, voice resume, and voice stop to control it.
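Here is roughly what going through ffmpeg means with discord.py. Discord voice carries 48 kHz stereo Opus in 20 ms frames, and discord.py's FFmpegPCMAudio asks ffmpeg for raw 16-bit PCM in that shape. The flags below mirror that approach, but discli's exact ffmpeg invocation may differ.

```python
from __future__ import annotations

import subprocess
from collections.abc import Iterator

# Discord voice: 48 kHz, stereo, 20 ms frames, encoded from 16-bit PCM.
SAMPLE_RATE = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # signed 16-bit little-endian (s16le)
FRAME_MS = 20


def frame_bytes() -> int:
    samples = SAMPLE_RATE * FRAME_MS // 1000  # 960 samples per channel
    return samples * CHANNELS * BYTES_PER_SAMPLE  # 3840 bytes


def ffmpeg_pcm_args(source: str) -> list[str]:
    """argv that decodes any ffmpeg-readable file or URL to PCM on stdout."""
    return [
        "ffmpeg", "-i", source,
        "-f", "s16le", "-ar", str(SAMPLE_RATE), "-ac", str(CHANNELS),
        "-loglevel", "warning", "pipe:1",
    ]


def iter_frames(source: str) -> Iterator[bytes]:
    """Yield 20 ms PCM frames; the final frame may be shorter."""
    proc = subprocess.Popen(ffmpeg_pcm_args(source), stdout=subprocess.PIPE)
    assert proc.stdout is not None
    try:
        while chunk := proc.stdout.read(frame_bytes()):
            yield chunk
    finally:
        proc.stdout.close()
        proc.kill()  # no-op if ffmpeg has already exited
        proc.wait()
```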
Step 7 — Build the meeting transcriber
discli ships a complete meeting-transcription example at examples/meeting_transcriber.py. It joins a server voice channel, transcribes everyone with their display names, appends to ~/.discli/transcripts/meeting-<timestamp>.md, and uses the Claude Agent SDK to produce a structured summary on Ctrl+C.
Set up:
pip install 'discord-cli-agent[voice,deepgram]' claude-agent-sdk
discli config set token YOUR_BOT_TOKEN
export DEEPGRAM_API_KEY=...

Run:
python examples/meeting_transcriber.py <voice_channel_id>

Output during the meeting:
Connected as MyBot#1234 (12345)
Listening to #standup. Transcript: /home/me/.discli/transcripts/meeting-20260514-103000.md
Press Ctrl+C to stop and generate a summary.
- **[10:30:14] Roy:** ok let's go around — what did everyone do yesterday
- **[10:30:22] Sara:** finished the auth migration, started on the rate limiter
- **[10:30:35] Roy:** nice

On Ctrl+C:
Stopping listener…
Generating summary from 47 line(s)…
Summary cost: $0.0084
=== Meeting Summary ===
## Summary
The team did a standup covering yesterday's work and today's plan.
Sara finished the auth migration; rate limiter is next.

## Key decisions
- Move forward with read-path-first for the migration this sprint.

## Action items
- Sara: finish rate limiter today.
- Roy: write the migration runbook by Friday.

## Open questions
- Do we backfill old sessions or expire them?

The summary is appended to the same transcript file. You can re-summarise later by piping the markdown back to Claude.
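Before re-summarising, you probably want only the speaker lines and not the summary that was appended last time. A minimal sketch of that filtering step follows. The PROMPT text is hypothetical and is not the example's real prompt. Sending the result to Claude, for instance through the Claude Agent SDK the example already depends on, is left out.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the "- **[HH:MM:SS] Name:** text" bullets the transcriber writes.
LINE_RE = re.compile(r"^- \*\*\[(\d{2}:\d{2}:\d{2})\] (.+?):\*\* (.*)$")

# Hypothetical prompt, loosely modelled on the summary sections shown above.
PROMPT = (
    "Summarise this meeting transcript under the headings Summary, "
    "Key decisions, Action items and Open questions.\n\n"
)


def transcript_lines(markdown: str) -> list[str]:
    """Keep only speaker lines, dropping any previously appended summary."""
    return [line for line in markdown.splitlines() if LINE_RE.match(line)]


def build_prompt(path: Path) -> str:
    lines = transcript_lines(path.read_text(encoding="utf-8"))
    return PROMPT + "\n".join(lines)
```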
Behind the scenes — DAVE encryption
Modern Discord wraps voice in DAVE (Discord Audio and Video Encryption — their end-to-end voice encryption). The off-the-shelf discord-ext-voice-recv library strips the legacy SecretBox layer but doesn't know about DAVE, so libopus rejects every packet as a corrupted stream.
discli patches PacketDecoder._decode_packet at runtime to insert a davey.DaveSession.decrypt(...) call between the SecretBox layer and libopus, and wraps pop_data so a single bad packet can’t kill the listener. The patches install lazily on the first VoiceEngine.listen_start(...) call — you don’t need to do anything.
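The pattern described above (patch a decode method at runtime, guard the pop step against a single bad packet, install once and lazily) can be shown with toy classes. Everything in this sketch is a stand-in: ToyDecoder is not discli's or discord-ext-voice-recv's PacketDecoder, and toy_e2ee_decrypt is a fake prefix check, not davey's API. No real cryptography happens here.

```python
from __future__ import annotations

import functools
import logging

log = logging.getLogger("voice-sketch")


class ToyDecoder:
    """Stand-in decoder: assume the transport layer is already removed."""

    def _decode_packet(self, packet: bytes) -> bytes:
        return packet  # pretend "Opus decode" is the identity

    def pop_data(self, packet: bytes) -> bytes | None:
        return self._decode_packet(packet)


def toy_e2ee_decrypt(payload: bytes) -> bytes:
    """Fake end-to-end layer: frames must carry an "E2E:" prefix."""
    if not payload.startswith(b"E2E:"):
        raise ValueError("not an end-to-end frame")
    return payload[len(b"E2E:"):]


_installed = False


def install_patches() -> None:
    """Patch ToyDecoder once; safe to call at every listen start."""
    global _installed
    if _installed:
        return
    original_decode = ToyDecoder._decode_packet
    original_pop = ToyDecoder.pop_data

    @functools.wraps(original_decode)
    def decode_with_e2ee(self, packet: bytes) -> bytes:
        # The extra decrypt runs before the original (codec) step.
        return original_decode(self, toy_e2ee_decrypt(packet))

    @functools.wraps(original_pop)
    def safe_pop(self, packet: bytes) -> bytes | None:
        try:
            return original_pop(self, packet)
        except Exception:  # drop one bad packet instead of killing the listener
            log.debug("dropping undecodable packet", exc_info=True)
            return None

    ToyDecoder._decode_packet = decode_with_e2ee
    ToyDecoder.pop_data = safe_pop
    _installed = True
```

The _installed guard matters. Without it, a second install would wrap the decrypt twice and every valid packet would fail.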
If discli doctor reports [FAIL] DAVE/Opus patches, voice listening is broken at the framework level. File an issue with the doctor output.
Permissions
The chat profile denies voice: agents running under chat cannot join or speak. Use the voice, moderation, or full profile to allow voice actions:
discli --profile voice voice join general
# or
DISCLI_PROFILE=voice discli voice listen

The readonly profile allows the stateless voice lookups (voice status, voice where, voice members) so a read-only agent can answer “who’s in voice?” without being able to join.
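The voice rules stated above fit in a small lookup. This is a hypothetical model for reasoning about profiles, not discli's permission code. Treating any profile not named here as voice-denied is an assumption of the sketch.

```python
from __future__ import annotations

# Hypothetical model of the voice rules described above.
READONLY_VOICE_COMMANDS = {"voice status", "voice where", "voice members"}
VOICE_PROFILES = {"voice", "moderation", "full"}


def voice_allowed(profile: str, command: str) -> bool:
    """Decide whether `command` (e.g. "voice join") may run under `profile`."""
    if profile in VOICE_PROFILES:
        return True
    if profile == "readonly":
        return command in READONLY_VOICE_COMMANDS  # lookups only, no join/speak
    return False  # "chat", and any profile not modelled here, denies voice
```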
See Permission Profiles for full details.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Voice features need extras that aren't installed | Voice extras missing | pip install 'discord-cli-agent[voice]' |
| discli doctor shows libopus not loaded | Missing system package | apt install libopus0 / brew install opus |
| Bot joins but transcript is silent | (rare) DAVE patch failed to install | Run discli doctor; file an issue if VOICE is red |
| discli voice play fails with FFmpeg not found | ffmpeg not on PATH | apt install ffmpeg / brew install ffmpeg |
| Deepgram returns nothing | Free-tier auth or quota | Check that DEEPGRAM_API_KEY belongs to an active project |
See the full list in Common Issues.
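For the last row, you can test the key directly against Deepgram's REST API. This sketch assumes the GET /v1/projects endpoint, the Authorization: Token <key> header scheme, and a JSON body with a "projects" list. Check these against Deepgram's current API reference. A successful call only proves the key authenticates; it says nothing about remaining quota.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

PROJECTS_URL = "https://api.deepgram.com/v1/projects"


def projects_request(api_key: str) -> urllib.request.Request:
    # Deepgram authenticates REST calls with "Authorization: Token <key>".
    return urllib.request.Request(PROJECTS_URL, headers={"Authorization": f"Token {api_key}"})


def check_deepgram_key(api_key: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (ok, detail) after listing the projects visible to this key."""
    try:
        with urllib.request.urlopen(projects_request(api_key), timeout=timeout) as resp:
            projects = json.load(resp).get("projects", [])
            return True, f"{len(projects)} project(s) visible to this key"
    except urllib.error.HTTPError as err:
        return False, f"HTTP {err.code}"  # 401/403 usually means a bad or revoked key
    except urllib.error.URLError as err:
        return False, f"network error: {err.reason}"
```

Call it as check_deepgram_key(os.environ["DEEPGRAM_API_KEY"]) from a shell where the key is exported.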
What’s next
- Read the Meeting Transcription use case for a deeper walkthrough of the example.
- Build a persistent agent that does voice + text — see Building Agents and Serve Mode.
- Reference all voice commands in CLI Commands.
- See voice serve actions and events in Serve Actions and Serve Events.