Documentation
Everything you need to integrate Sayd into your application.
Speech-to-Text
Real-time streaming transcription with < 200ms latency. Supports batch and streaming modes.
Real-time Streaming
WebSocket-based audio streaming for live transcription with word-level timestamps.
Talk API
Push-to-talk voice input with AI-powered transcript cleaning. Send audio, get polished text.
Quick Start
Install the SDK and make your first API call:
import sayd
client = sayd.Client(api_key="sk-your-key")
# Transcribe an audio file
result = client.stt.transcribe(
    model="sayd-v1",
    audio=open("audio.wav", "rb"),
    language="auto",
)
print(result.text)
# Stream audio for real-time transcription
for chunk in client.stt.stream(audio_stream, model="sayd-v1"):
    print(chunk.text, end="", flush=True)

Authentication
All API requests require an API key. Get yours from the dashboard and include it in requests:
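Hard-coding keys in source files makes them easy to leak. One common pattern is to read the key from an environment variable and build the Authorization header from it; the SAYD_API_KEY name below is illustrative for this sketch, not a variable the SDK is documented to read automatically:

```python
import os

# Illustrative only: SAYD_API_KEY is a name chosen for this example,
# not an environment variable the SDK is documented to pick up.
api_key = os.environ.get("SAYD_API_KEY", "")

# The same Bearer scheme the curl example below uses.
auth_header = {"Authorization": f"Bearer {api_key}"}
```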
# Include your API key in the Authorization header
curl -X POST https://api.sayd.dev/v1/stt/transcribe \
-H "Authorization: Bearer sk-your-key" \
-F model="sayd-v1" \
-F language="auto" \
-F audio=@audio.wav

Talk API — Voice Input with AI Cleaning
The Talk API combines real-time speech-to-text with LLM-powered transcript cleaning. Send raw audio via WebSocket, get polished, publication-ready text back. Perfect for push-to-talk interfaces, voice notes, and dictation features.
import sayd
client = sayd.Client(api_key="sk-your-key")
# Create a Talk session for real-time voice input with AI cleaning
session = client.talk.create(
    language="auto",           # "auto", "en", "zh", or "multi"
    sample_rate=16000,         # 8000 or 16000 Hz
    codec="pcm16",             # "pcm16" or "opus"
    cleaning_level="standard", # "light", "standard", "aggressive"
    output_format="paragraph", # "paragraph", "bullets", "raw"
)
# Connect to the WebSocket and stream audio
ws = session.connect()
# Stream PCM16 audio frames
for chunk in audio_source:
    ws.send_bytes(chunk)
# Signal end of recording — the server will automatically
# drain any in-flight audio frames (up to 500ms), wait for
# STT to stabilize, then run LLM cleaning.
ws.send_json({"type": "end"})
# Receive the AI-cleaned transcript
result = ws.receive() # {"type": "cleaned", "text": "..."}
print(result["text"])

WebSocket Protocol
After creating a session via POST /v1/talk, connect to the returned WebSocket URL to stream audio and receive results.
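As a rough illustration (not an official client), the receiving side of this protocol could be handled as below. The message shapes follow the protocol listing in this section; the third-party websocket-client package used in run() is an assumption of this sketch:

```python
import json

def handle_message(raw: str):
    """Parse one server message into (type, text-or-None)."""
    msg = json.loads(raw)
    kind = msg.get("type", "")
    text = msg.get("text") if kind in ("partial", "cleaned") else None
    return kind, text

def run(url: str) -> None:
    # Illustrative only: uses the third-party websocket-client package.
    import websocket
    ws = websocket.create_connection(url)
    kind, _ = handle_message(ws.recv())
    assert kind == "ready"          # session ready, start sending audio
    # ... send binary PCM16 frames with ws.send_binary(frame) ...
    ws.send(json.dumps({"type": "end"}))
    while True:
        kind, text = handle_message(ws.recv())
        if kind == "cleaned":       # the AI-cleaned result
            print(text)
        elif kind == "complete":    # session finished
            break
    ws.close()
```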
# WebSocket Protocol — Talk API
## Connect
ws = WebSocket("wss://api.sayd.dev/v1/talk/stream/{session_id}?api_key=sk-...")
## Server Messages (you receive)
{"type": "ready"} # Session ready, start sending audio
{"type": "partial", "text": "..."} # Interim transcript (may change)
{"type": "sentence", "segments": []} # Final transcript segment
{"type": "cleaned", "text": "..."} # ✨ AI-cleaned result
{"type": "complete", ...} # Session complete
## Client Messages (you send)
[binary PCM16 frames] # Raw audio data
{"type": "end"} # Signal end of recording
{"type": "keepalive"} # Keep connection alive
## End Signal Behavior
# After receiving "end", the server continues accepting
# in-flight audio for up to 500ms (drain window), ensuring
# no trailing speech is lost. Then it waits for STT to
# stabilize and runs LLM cleaning. No client-side delay needed.

API Endpoints
/v1/stt/transcribe       Transcribe an audio file
/v1/stt/stream           Stream audio for real-time transcription
/v1/talk                 Create a Talk session (returns a WebSocket URL)
/v1/talk/stream/{id}     Stream audio with AI cleaning
/v1/talk                 List Talk sessions
/v1/talk/{id}            Get Talk session details and results

SDK
Python
Official Python SDK — supports both sync and async clients.
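Since the SDK advertises an async client, usage might look like the sketch below. The sayd.AsyncClient name and the awaitable transcribe call are assumptions extrapolated from the sync API shown above, not confirmed signatures:

```python
import asyncio

async def transcribe_file(client, path: str) -> str:
    # Assumes an awaitable stt.transcribe mirroring the sync SDK call.
    with open(path, "rb") as audio:
        result = await client.stt.transcribe(
            model="sayd-v1", audio=audio, language="auto"
        )
    return result.text

if __name__ == "__main__":
    import sayd
    # Hypothetical: sayd.AsyncClient is assumed from "supports async clients".
    client = sayd.AsyncClient(api_key="sk-your-key")
    print(asyncio.run(transcribe_file(client, "audio.wav")))
```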