Voice Adapter
Pluggable STT/TTS bridge that connects any agenticore instance to any HTTP voice service. Decoupled from specific implementations — the adapter doesn’t know or care whether the backend is ElevenLabs, Deepgram, Whisper, or a custom service.
Enable
Set one environment variable:
VOICE_SERVICE_URL=http://anton-voice.anton-dev.svc:8400
Optional:
VOICE_API_KEY=your-api-key  # forwarded as "Authorization: Bearer <key>"
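The two variables above could be read with a few lines of standard-library code. The sketch below is only an assumption about how such a check might look, not the actual is_enabled() implementation; VoiceConfig and load_voice_config are hypothetical names introduced for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceConfig:  # hypothetical container, not part of agenticore
    service_url: str
    api_key: Optional[str] = None


def load_voice_config(env: Optional[Mapping[str, str]] = None) -> Optional[VoiceConfig]:
    """Return the voice settings, or None when VOICE_SERVICE_URL is unset."""
    env = os.environ if env is None else env
    url = env.get("VOICE_SERVICE_URL", "").strip()
    if not url:
        return None  # voice stays disabled
    # Assumption: an empty VOICE_API_KEY is treated like a missing one.
    api_key = env.get("VOICE_API_KEY", "").strip() or None
    return VoiceConfig(service_url=url, api_key=api_key)


config = load_voice_config({"VOICE_SERVICE_URL": "http://anton-voice.anton-dev.svc:8400"})
print(config)
```

Passing the environment as a mapping keeps the function easy to test without touching os.environ.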
Architecture
agenticore/voice/__init__.py — is_enabled(), get_adapter() singleton
agenticore/voice/adapter.py — VoiceAdapter protocol + HttpVoiceAdapter
The adapter is stateless, instantiated on first use, and reused via a module-level singleton. No ASGI lifespan changes needed.
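The lazy, module-level singleton described above can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the code in agenticore/voice/__init__.py: _StubAdapter and get_cached_adapter are hypothetical names, and the real get_adapter() takes no factory argument.

```python
from typing import Callable, Optional


class _StubAdapter:  # hypothetical stand-in for HttpVoiceAdapter
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


_adapter: Optional[_StubAdapter] = None  # module-level cache


def get_cached_adapter(factory: Callable[[], _StubAdapter]) -> _StubAdapter:
    """Create the adapter on the first call, then return the cached instance."""
    global _adapter
    if _adapter is None:
        _adapter = factory()
    return _adapter


first = get_cached_adapter(lambda: _StubAdapter("http://anton-voice:8400"))
second = get_cached_adapter(lambda: _StubAdapter("http://ignored:8400"))
print(first is second)
```

Because the instance is created on first use, nothing has to be wired into application startup or shutdown.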
Protocol
from typing import Protocol, runtime_checkable

@runtime_checkable
class VoiceAdapter(Protocol):
    async def transcribe(self, audio: bytes, mime_type: str = "audio/ogg") -> str: ...
    async def speak(self, text: str, output_format: str = "ogg") -> tuple[bytes, str]: ...
    async def close(self) -> None: ...
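Because the protocol is structural, any class with these three methods qualifies without inheriting from anything. The sketch below repeats the protocol so that it runs on its own and adds EchoVoiceAdapter, a hypothetical in-memory adapter that might be handy in tests; it is not part of agenticore.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class VoiceAdapter(Protocol):
    async def transcribe(self, audio: bytes, mime_type: str = "audio/ogg") -> str: ...
    async def speak(self, text: str, output_format: str = "ogg") -> tuple[bytes, str]: ...
    async def close(self) -> None: ...


class EchoVoiceAdapter:  # hypothetical test double, no network involved
    async def transcribe(self, audio: bytes, mime_type: str = "audio/ogg") -> str:
        return audio.decode("utf-8")  # pretend the bytes are the transcript

    async def speak(self, text: str, output_format: str = "ogg") -> tuple[bytes, str]:
        return text.encode("utf-8"), f"audio/{output_format}"

    async def close(self) -> None:
        return None


adapter = EchoVoiceAdapter()
# runtime_checkable only verifies that the methods exist, not their signatures.
print(isinstance(adapter, VoiceAdapter))
print(asyncio.run(adapter.speak("Hello world")))
```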
transcribe(audio, mime_type) → str
Sends audio to the voice service for speech-to-text.
- Method: POST /stt
- Content-Type: multipart/form-data
- Field: audio (binary file)
- Returns: transcribed text string
speak(text, output_format) → (bytes, content_type)
Sends text to the voice service for text-to-speech.
- Method: POST /speak
- Body: {"text": "...", "voice": {"output_format": "ogg"}, "store": false}
- Returns: tuple of (audio_bytes, content_type_header)
Setting store: false returns raw audio bytes instead of a stored file reference.
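The request body and the returned tuple can be illustrated without any network code. The helpers below, build_speak_payload and parse_speak_response, are hypothetical names for this sketch; the fallback content type is an assumption, since the text does not say what happens when the header is missing.

```python
import json


def build_speak_payload(text: str, output_format: str = "ogg") -> dict:
    """Build the JSON body for POST /speak."""
    return {
        "text": text,
        "voice": {"output_format": output_format},
        "store": False,  # ask for raw audio bytes, not a stored file reference
    }


def parse_speak_response(body: bytes, headers: dict) -> tuple[bytes, str]:
    """Pair the raw audio with the service's Content-Type header."""
    # Assumption: fall back to a generic binary type if the header is absent.
    content_type = headers.get("content-type", "application/octet-stream")
    return body, content_type


print(json.dumps(build_speak_payload("Hello world")))
```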
HttpVoiceAdapter
The built-in implementation using httpx.AsyncClient:
class HttpVoiceAdapter:
    def __init__(self, base_url: str, api_key: str | None = None, timeout: float = 120.0) -> None: ...
- base_url: voice service root (e.g. http://anton-voice:8400)
- api_key: optional, sent as Authorization: Bearer <key>
- timeout: request timeout in seconds (default 120, accommodates long TTS)
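One way these three parameters might map onto an HTTP client is shown below. build_client_options is a hypothetical helper, not part of HttpVoiceAdapter; it only assembles keyword arguments (base_url, headers, timeout) of the kind httpx.AsyncClient accepts, so the sketch runs without httpx installed.

```python
from typing import Optional


def build_client_options(base_url: str, api_key: Optional[str] = None, timeout: float = 120.0) -> dict:
    """Collect keyword arguments that could be passed to httpx.AsyncClient."""
    headers = {}
    if api_key:
        # The key is only sent when one was configured.
        headers["Authorization"] = f"Bearer {api_key}"
    return {"base_url": base_url, "headers": headers, "timeout": timeout}


print(build_client_options("http://anton-voice:8400", api_key="your-api-key"))
```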
Usage
from agenticore.voice import is_enabled, get_adapter

# Run inside an async function; audio_bytes holds the recorded audio.
if is_enabled():
    adapter = get_adapter()
    text = await adapter.transcribe(audio_bytes, "audio/ogg")
    audio, content_type = await adapter.speak("Hello world")
    await adapter.close()
Error Handling
- VoiceQuotaError: raised when the voice service returns a quota/billing error. The Telegram connector surfaces this to the user instead of silently falling back.
- General HTTP errors raise httpx.HTTPStatusError; callers should handle them gracefully.
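A caller might surface a quota failure along these lines. The sketch defines its own VoiceQuotaError as a local stand-in so that it runs on its own (the real exception lives in agenticore), and QuotaExhaustedAdapter and transcribe_or_notice are hypothetical names; the actual Telegram connector may behave differently.

```python
import asyncio


class VoiceQuotaError(Exception):
    """Local stand-in; use the real exception from agenticore in practice."""


class QuotaExhaustedAdapter:  # hypothetical adapter that always fails
    async def transcribe(self, audio: bytes, mime_type: str = "audio/ogg") -> str:
        raise VoiceQuotaError("voice quota exhausted")


async def transcribe_or_notice(adapter, audio: bytes) -> str:
    """Return the transcript, or a user-facing notice on a quota failure."""
    try:
        return await adapter.transcribe(audio, "audio/ogg")
    except VoiceQuotaError as exc:
        # Tell the user what happened instead of silently falling back.
        return f"Voice is unavailable: {exc}"


print(asyncio.run(transcribe_or_notice(QuotaExhaustedAdapter(), b"fake-ogg-bytes")))
```

A second except clause for httpx.HTTPStatusError can sit next to the first to cover general HTTP failures.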
Compatible Services
Any HTTP service implementing:
- POST /stt: multipart upload, returns {"text": "..."}
- POST /speak: JSON body, returns raw audio bytes
The Anton voice service (anton-voice) implements this interface using ElevenLabs/Deepgram/Cartesia backends.