Platform architecture

Sayna sits between your apps, telecom carriers, and third-party speech providers. Use this page to understand the big-picture data flows before diving into specific APIs.

Need request/response schemas? Pair this overview with the REST API, WebSocket, and SIP configuration references.

System landscape

Traffic entry points

WebSocket (/ws) – The main control plane. Clients connect once, send a config payload (providers + LiveKit settings), and then stream audio, emit speak commands, or relay LiveKit data with sub-second latency.

REST endpoints – Provide complementary one-off operations:

/voices lists provider catalogs for UI pickers.
/speak performs one-shot synthesis when a persistent socket is overkill.
/livekit/token mints participant tokens so browser/mobile clients can join the same LiveKit room that Sayna already inhabits.

SIP ingress – When the sip block is populated, Sayna exposes a SIP domain/IP. Carriers such as Twilio point their Origination URI at that address (e.g., sip:sip.sayna.ai;transport=tcp). Sayna enforces room_prefix constraints, filters source IPs, and forwards LiveKit webhook payloads to the hosts defined in SIP_HOOKS_JSON.
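
To make the /ws handshake more concrete, here is a minimal sketch of how a client might assemble the config message before opening the socket. Only config, stt_config, and tts_config are names taken from this page; the "type" discriminator, the "provider" field, and the "livekit"/"room_name" keys are assumptions for illustration, so check the WebSocket reference for the real schema.

```python
import json


def build_config_message(stt_provider, tts_provider, livekit_room=None):
    """Assemble a hypothetical `config` payload for the /ws control plane."""
    message = {
        "type": "config",  # assumed discriminator field
        "stt_config": {"provider": stt_provider},  # "provider" key is assumed
        "tts_config": {"provider": tts_provider},
    }
    # LiveKit settings are optional: omit them for a WebSocket-only session.
    if livekit_room is not None:
        message["livekit"] = {"room_name": livekit_room}  # assumed key names
    return json.dumps(message)


# Example: a session that also joins a LiveKit room.
payload = build_config_message("deepgram", "elevenlabs", livekit_room="support-42")
```

The client would send payload as the first text frame after connecting, then wait for ready before streaming audio.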

Voice orchestration pipeline

Configuration – Sayna validates the config message, injects server-side API keys, initializes STT/TTS providers, loads DSP assets (turn detection, noise filtering), and emits ready once everything is live.
Streaming – Binary frames flow through the STT connector; stt_result events stream back with is_final and is_speech_final hints so you know when to respond.
Synthesis – speak commands enqueue TTS jobs. Audio streams back as binary frames and, when LiveKit is enabled, mirrors into the room for human listeners.
Caching – TTS outputs are hashed by text + config and stored under CACHE_PATH, so repeated prompts replay instantly.
Error handling – Most faults surface as JSON error events while keeping the socket open, giving clients room to retry.
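
The pipeline stages above map naturally onto a small client-side event handler. The sketch below is one possible shape, not Sayna's client library. It assumes each JSON event carries a "type" field and that stt_result events carry a "transcript" field, and it treats is_final as "this segment is settled" and is_speech_final as "the speaker has finished the utterance". Those interpretations are assumptions, so confirm them against the WebSocket reference.

```python
import json


def new_state():
    """Fresh per-session state for the handler below."""
    return {"ready": False, "transcript": [], "errors": 0}


def handle_event(raw, state):
    """Update session state from one JSON event and return an action name."""
    event = json.loads(raw)
    kind = event.get("type")  # assumed discriminator field
    if kind == "ready":
        state["ready"] = True
        return "start_streaming"
    if kind == "stt_result":
        if event.get("is_final"):
            # Keep only finalized segments; interim results are discarded.
            state["transcript"].append(event.get("transcript", ""))
        # Respond only once the utterance is complete.
        return "respond" if event.get("is_speech_final") else "keep_listening"
    if kind == "error":
        # The socket stays open, so the caller can retry the failed command.
        state["errors"] += 1
        return "retry"
    return "ignore"
```

Returning an action name rather than performing I/O keeps the handler easy to test and leaves the choice of WebSocket library to the application.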

LiveKit & SIP interplay

Sayna runs its own LiveKit participant (identity defaults to sayna-ai). Other clients call /livekit/token to get their own access tokens; Sayna never shares its agent keys.
When enable_recording=true, Sayna asks LiveKit to start composite recordings to the S3 target you configured.
SIP mode auto-provisions the LiveKit SIP trunk + dispatch rule (sayna-{room_prefix}-trunk/dispatch). You only configure the carrier; Sayna takes care of LiveKit.
The SIP dispatcher reads the sip.h.to header from each LiveKit webhook event and forwards the raw JSON payload to whichever hook hostname matches, which supports per-domain routing and downstream analytics.
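
The hostname matching described above can be sketched as follows. This is an illustration under stated assumptions, not Sayna's implementation: it assumes SIP_HOOKS_JSON is a JSON array of objects with "host" and "url" keys, and that the sip.h.to value sits under participant.attributes in the webhook event. Neither detail is specified on this page.

```python
import json
import re

# Captures the host in values such as "<sip:+15551234567@sip.example.com;transport=tcp>".
_TO_HOST = re.compile(r"sips?:(?:[^@>;]*@)?([^;>:\s]+)", re.IGNORECASE)


def load_hooks(sip_hooks_json):
    """Parse SIP_HOOKS_JSON into a {hostname: url} map (format is assumed)."""
    entries = json.loads(sip_hooks_json)
    return {entry["host"].lower(): entry["url"] for entry in entries}


def extract_to_host(event):
    """Return the lowercased host from the sip.h.to header, or None."""
    # The location of the header inside the event is an assumption.
    attributes = event.get("participant", {}).get("attributes", {})
    match = _TO_HOST.search(attributes.get("sip.h.to", ""))
    return match.group(1).lower() if match else None


def route_hook(event, hooks):
    """Pick the hook URL whose hostname matches the To header, if any."""
    host = extract_to_host(event)
    return hooks.get(host) if host else None
```

A dispatcher built this way would POST the unmodified event body to the returned URL and skip forwarding when route_hook returns None.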

Component responsibilities

Component	Purpose	Key considerations
Edge & optional auth	Terminates TLS/WebSockets and enforces API secret or delegated JWT policies before traffic reaches `/ws` or REST handlers.	Pair with `AUTH_REQUIRED=true` when exposing Sayna on the public internet.
API & WebSocket gateway	Hosts `/ws`, `/voices`, `/speak`, `/livekit/token`, validates payloads, and multiplexes JSON/binary frames.	Idle sockets close after ~10 s; errors rarely tear down the connection.
Voice orchestrator	Manages per-session state (providers, caches, DSP), schedules STT/TTS work, and prevents mismatched streams.	Configuration drives everything—use consistent `stt_config` and `tts_config` when you need cache hits.
Provider connectors	Abstract Deepgram STT/TTS and ElevenLabs TTS behind a unified schema.	Outbound HTTPS is required; provide the relevant API keys via env vars.
LiveKit transport	Joins rooms, mirrors Sayna’s audio into WebRTC, relays data-channel messages, and coordinates recordings.	`/livekit/token` keeps your own agentic logic and participants in sync with Sayna’s LiveKit room.
SIP hook router	Enforces `room_prefix`, respects `SIP_ALLOWED_ADDRESSES`, and forwards webhook payloads to domain-specific HTTPS endpoints.	Combine with Twilio SIP setup when routing PSTN callers into Sayna.
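
The advice to keep tts_config consistent follows from how a text + config cache key behaves. The sketch below shows the idea with SHA-256 over canonical JSON. The hash function, the serialization, and the "voice"/"provider" config fields are assumptions, since this page only states that outputs are hashed by text + config.

```python
import hashlib
import json


def tts_cache_key(text, tts_config):
    """Derive a deterministic key from the prompt text and TTS settings."""
    # Sorting keys makes the key independent of dict ordering.
    canonical = json.dumps(
        {"text": text, "config": tts_config},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same settings in a different order still produce the same key...
key_a = tts_cache_key("Hello!", {"provider": "elevenlabs", "voice": "v1"})
key_b = tts_cache_key("Hello!", {"voice": "v1", "provider": "elevenlabs"})
# ...while any change to the settings produces a different key (a cache miss).
key_c = tts_cache_key("Hello!", {"provider": "elevenlabs", "voice": "v2"})
```

Under a scheme like this, even a small change to a voice or model setting leads to a new key, so repeated prompts only replay from cache when both the text and the configuration match exactly.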

Common deployment patterns

Pattern	Flow
WebSocket-only assistant	App connects to `/ws` → sends `config` (no LiveKit) → streams mic audio → reacts to `stt_result` → sends `speak` for replies.
Hybrid WebSocket + LiveKit	Sayna joins LiveKit via config → browsers fetch `/livekit/token` and join the room → LiveKit audio feeds Sayna’s STT, `speak` audio mirrors back to participants, data-channel messages sync UI state.
PSTN ingress via Twilio	Twilio trunk dials `sip.yourdomain.com;transport=tcp` → Sayna validates `room_prefix` and source IPs → auto-provisioned LiveKit trunk/dispatch routes the call → `/livekit/token` lets agents join from browsers while SIP hooks notify backend systems.
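
For the PSTN pattern, the two admission checks (source IP and room_prefix) can be sketched with the standard library. This assumes SIP_ALLOWED_ADDRESSES is a comma-separated list of IPs or CIDR blocks and that the room_prefix check is a simple prefix test on the room name. Both are assumptions, and the addresses below come from documentation-reserved ranges rather than any carrier's real list.

```python
import ipaddress


def is_allowed_source(source_ip, allowed_addresses):
    """Check a caller IP against a comma-separated IP/CIDR allowlist."""
    ip = ipaddress.ip_address(source_ip)
    for entry in allowed_addresses.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # strict=False accepts entries with host bits set, e.g. "203.0.113.5/24".
        if ip in ipaddress.ip_network(entry, strict=False):
            return True
    return False


def accept_call(source_ip, room_name, allowed_addresses, room_prefix):
    """Admit a call only if both the source IP and the room name pass."""
    return is_allowed_source(source_ip, allowed_addresses) and room_name.startswith(room_prefix)
```

For example, with an allowlist of "203.0.113.0/24, 198.51.100.7" and a room_prefix of "sip-", a call from 203.0.113.10 into room sip-call-1 is admitted, while the same call from 192.0.2.1 is rejected.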

Operational checklist

Dependencies – Outbound HTTPS to providers + LiveKit; inbound TCP from carrier IPs when SIP is enabled.
Scaling – /ws sessions are stateful. Run multiple Sayna instances behind a load balancer (with sticky sessions if the proxy terminates WebSockets).
Observability – Monitor ready vs error rates, STT latency, cache hit ratios, SIP provisioning logs, and webhook forwarding success.
Security – Use AUTH_REQUIRED=true plus API secrets or delegated JWT to protect REST + WebSocket traffic, and keep SIP hooks HTTPS-only.
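
As a starting point for the observability item, the sketch below tallies ready and error events and TTS cache outcomes, then derives the two ratios mentioned above. The class and method names are hypothetical, and how a cache hit is detected (server logs, timing, or a metrics endpoint) is left open because this page does not specify it.

```python
from collections import Counter


class SessionMetrics:
    """Hypothetical in-process counters for session health."""

    def __init__(self):
        self.events = Counter()
        self.cache = Counter()

    def record_event(self, kind):
        """Count a WebSocket event by its name, e.g. "ready" or "error"."""
        self.events[kind] += 1

    def record_tts(self, cache_hit):
        """Count one TTS request as a cache hit or miss."""
        self.cache["hit" if cache_hit else "miss"] += 1

    def error_rate(self):
        """Share of error events among ready + error events."""
        total = self.events["ready"] + self.events["error"]
        return self.events["error"] / total if total else 0.0

    def cache_hit_ratio(self):
        """Share of TTS requests served from cache."""
        total = self.cache["hit"] + self.cache["miss"]
        return self.cache["hit"] / total if total else 0.0
```

In production these counters would typically be exported to whatever metrics system is already in place rather than kept in memory.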

With this architecture in mind you can choose the right combination of WebSocket, REST, LiveKit, and SIP capabilities for your deployment before implementing the finer details.
