Voice Chat
Talk to your AI naturally with speech-to-text and text-to-speech — choose from local or cloud engines.
Microphone access requires a secure context: browsers block it on pages served over plain http://, so you must access Uplink via HTTPS or localhost. If you're accessing Uplink from another device on your network (e.g. your phone), use a tool like Tailscale, which can provide HTTPS for you, or set up your own TLS certificate. TTS (text-to-speech) playback works over HTTP; only the microphone requires HTTPS.
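The rule above can be written as a small check. This is an illustrative sketch only, not Uplink's code: the function name `mic_allowed` is hypothetical, and it only approximates the browser's secure-context rule by accepting HTTPS URLs and loopback hosts.

```python
from urllib.parse import urlsplit

# Loopback hosts that browsers treat as trustworthy even over plain HTTP.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def mic_allowed(url: str) -> bool:
    """Approximate whether a browser would allow microphone access at this URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == "https":
        return True
    # Plain HTTP is only acceptable when the page is served from the same machine.
    return parts.scheme == "http" and host in LOOPBACK_HOSTS
```

With this check, a LAN address such as http://192.168.1.20:3000 is rejected, which is why a phone on the same network needs an HTTPS URL.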
How It Works
Uplink's voice mode has two independent pipelines:
- Speech-to-Text (STT) — Your voice is transcribed to text, then sent to the AI as a regular message
- Text-to-Speech (TTS) — The AI's text response is converted to audio and played back
Both are configured independently in Settings → Voice & TTS. You can use one without the other — for example, dictate messages but read responses, or type messages but hear responses spoken aloud.
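To make the independence of the two pipelines concrete, here is a minimal sketch of one chat turn. All names (`handle_turn`, `ask_ai`, `stt`, `tts`) are hypothetical and do not reflect Uplink's internals; the point is only that either pipeline can be absent.

```python
from typing import Callable, Optional, Tuple, Union


def handle_turn(
    user_input: Union[bytes, str],                   # recorded audio, or typed text
    ask_ai: Callable[[str], str],                    # sends a message, returns the reply
    stt: Optional[Callable[[bytes], str]] = None,    # None means STT is disabled
    tts: Optional[Callable[[str], bytes]] = None,    # None means TTS is disabled
) -> Tuple[str, Optional[bytes]]:
    """Run one chat turn and return (reply_text, reply_audio_or_None)."""
    if isinstance(user_input, bytes):
        if stt is None:
            raise ValueError("audio input received but no STT engine is configured")
        message = stt(user_input)    # speech becomes a regular text message
    else:
        message = user_input         # typed messages skip STT entirely
    reply = ask_ai(message)
    audio = tts(reply) if tts is not None else None
    return reply, audio
```

Passing only `stt` models "dictate messages but read responses"; passing only `tts` models "type messages but hear responses".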
TTS Engines
Uplink supports five text-to-speech engines: Edge TTS, OpenAI TTS, XTTS, ElevenLabs, and Piper. You select which one to use in Settings → Voice & TTS, and provide the server URL or API key if needed.
ElevenLabs (Cloud)
ElevenLabs is the primary TTS provider in Uplink. It is cloud-based and offers the highest-quality voice synthesis of the supported engines: natural, expressive, and highly realistic.
- Highest quality voice output among all supported engines
- Requires an ElevenLabs API key
- Voice selection available in Settings → Voice & TTS
- Requires internet connection
- Pay-per-character pricing (free tier available with limits)
OpenAI TTS (Cloud)
OpenAI's text-to-speech API. High quality, natural-sounding voices. Requires an OpenAI API key.
- 10 voices: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
- Very natural prosody and intonation
- Requires API key and has per-character pricing
- Configure your API key in Settings → Voice & TTS
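For reference, a request to OpenAI's speech endpoint looks roughly like the sketch below. It is not how Uplink itself calls the API: the helper name `build_speech_request` is hypothetical, and the `tts-1` model is an assumption, since this page does not say which model Uplink uses. The function only builds the request; sending it needs network access and a valid key.

```python
import json
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"
# The ten voices listed above.
VOICES = {"alloy", "ash", "ballad", "coral", "echo",
          "fable", "nova", "onyx", "sage", "shimmer"}


def build_speech_request(api_key: str, text: str, voice: str = "alloy",
                         model: str = "tts-1") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# To synthesize for real, send the request and save the returned audio bytes:
# with urllib.request.urlopen(build_speech_request(key, "Hello")) as resp:
#     open("hello.mp3", "wb").write(resp.read())
```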
Edge TTS (Free Cloud)
Microsoft's Edge TTS API. Free, no API key required, excellent quality with many voice options. Requires installing the node-edge-tts package.
# Install the Edge TTS dependency (required)
npm install node-edge-tts
- Hundreds of voices across dozens of languages
- No API key required — completely free
- Requires internet connection
- Slight latency due to cloud round-trip
Piper (Local)
A fast, local text-to-speech engine that runs entirely on your CPU — no GPU required. Great for offline or low-latency use cases with many available voice models.
- Runs locally — no internet connection needed
- Fast inference on CPU (no GPU required)
- Many community voice models available in .onnx format
- Requires the PIPER_MODEL environment variable pointing to the .onnx model file
- Optional: PIPER_EXECUTABLE to specify a custom Piper binary path
- Optional: PIPER_CONFIG to specify a custom config JSON file
# Set the required environment variable before starting Uplink
export PIPER_MODEL=/path/to/voice-model.onnx
# Optional: custom Piper binary and config
export PIPER_EXECUTABLE=/path/to/piper
export PIPER_CONFIG=/path/to/voice-model.onnx.json
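The sketch below shows one plausible way these three variables map onto a Piper command line. How Uplink actually invokes Piper is not documented here, so treat the helper name `piper_command` and the fallback to a `piper` binary on the PATH as assumptions; the `--model`, `--config`, and `--output_file` flags are Piper's own CLI options.

```python
import os


def piper_command(output_path: str, env=None) -> list:
    """Build a Piper command line from the PIPER_* environment variables."""
    env = os.environ if env is None else env
    model = env.get("PIPER_MODEL")
    if not model:
        raise RuntimeError("PIPER_MODEL must point to a .onnx voice model")
    # Assumption: without PIPER_EXECUTABLE, look for "piper" on the PATH.
    cmd = [env.get("PIPER_EXECUTABLE", "piper"), "--model", model]
    config = env.get("PIPER_CONFIG")
    if config:
        cmd += ["--config", config]
    return cmd + ["--output_file", output_path]

# Piper reads the text to speak from standard input, for example:
# import subprocess
# subprocess.run(piper_command("reply.wav"), input="Hello from Piper",
#                text=True, check=True)
```

Running the commented-out call by hand is a quick way to confirm that the model path is correct before starting Uplink.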
Coqui XTTS (Local GPU)
A high-quality local option — runs entirely on your machine. Requires an NVIDIA GPU with CUDA support and a running XTTS server.
# Install and run Coqui XTTS server (separate from Uplink)
pip install TTS
python -m TTS.server.server --model_name tts_models/multilingual/multi-dataset/xtts_v2
# Enter the server URL in Uplink Settings → Voice & TTS → TTS Server URL
STT Engines
Uplink supports four speech-to-text providers: faster-whisper (local, free), Groq Whisper (cloud, free tier), OpenAI Whisper (cloud, paid), and Browser STT (free, built-in).
OpenAI Whisper (Cloud)
OpenAI's cloud-based speech-to-text API. Highly accurate transcription across many languages using the whisper-1 model.
- Uses your existing OpenAI API key
- Model: whisper-1
- Excellent accuracy across many languages
- Requires internet connection
- Pay-per-minute pricing
Groq Whisper (Cloud)
Groq's cloud-based speech-to-text service. Extremely fast transcription powered by Groq's LPU hardware, with a generous free tier.
- Requires a Groq API key
- Model: whisper-large-v3-turbo
- Very fast transcription with low latency
- Free tier available
- Requires internet connection
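Both cloud providers expose an OpenAI-compatible transcription endpoint, so a request differs only in URL, key, and model name. The sketch below builds such a request with the standard library. It is not Uplink's implementation: `build_transcription_request`, the `PROVIDERS` table, and the default file name are hypothetical, and the request is only constructed, not sent.

```python
import uuid
import urllib.request

# Transcription endpoints paired with the model names listed above.
PROVIDERS = {
    "openai": ("https://api.openai.com/v1/audio/transcriptions", "whisper-1"),
    "groq": ("https://api.groq.com/openai/v1/audio/transcriptions",
             "whisper-large-v3-turbo"),
}


def build_transcription_request(provider: str, api_key: str, audio: bytes,
                                filename: str = "speech.webm") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    url, model = PROVIDERS[provider]
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: which model to use.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'.encode(),
        # Form field: the recorded audio file.
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode(),
        audio,
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

When sent, the default response is a JSON object whose `text` field holds the transcript.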
faster-whisper (Local)
A highly optimized implementation of OpenAI's Whisper model. Runs locally on CPU or GPU with excellent accuracy.
# Install and run faster-whisper server
pip install faster-whisper
# Use a wrapper server that exposes an HTTP API, e.g.:
pip install whisper-asr-webservice
python -m whisper_asr.webservice --model medium --device cuda
# Enter the server URL in Uplink Settings → Voice & TTS → STT Server URL
Browser STT (Built-in)
Built-in browser speech recognition using the Web Speech API. Free and requires no setup, but accuracy and language support vary by browser.
- No configuration needed — works out of the box
- Free and built into modern browsers
- Accuracy depends on browser and device
- Requires internet connection (most browsers use cloud backend)
Voice Configuration Summary
All voice settings are in Settings → Voice & TTS. Here's what each field does:
| Setting | Description | Default |
|---|---|---|
| TTS Engine | Which TTS backend to use | none (disabled) |
| TTS Server URL | URL for XTTS server (if using local) | localhost:8020 |
| TTS Voice | Voice name/ID for the selected engine | Depends on engine |
| STT Engine | Which STT backend to use | none (disabled) |
| STT Server URL | URL for faster-whisper server | localhost:8000 |
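The table can also be read as a settings object with defaults. The sketch below mirrors the defaults listed above; the class name, field names, and the engine identifiers "xtts" and "faster-whisper" are hypothetical and may not match the values Uplink stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSettings:
    tts_engine: Optional[str] = None          # None means TTS is disabled
    tts_server_url: str = "localhost:8020"    # only used by the XTTS engine
    tts_voice: Optional[str] = None           # default depends on the engine
    stt_engine: Optional[str] = None          # None means STT is disabled
    stt_server_url: str = "localhost:8000"    # only used by faster-whisper

    def active_server_urls(self) -> dict:
        """Return only the server URLs that the selected engines actually use."""
        urls = {}
        if self.tts_engine == "xtts":
            urls["tts"] = self.tts_server_url
        if self.stt_engine == "faster-whisper":
            urls["stt"] = self.stt_server_url
        return urls
```

The server URL fields matter only for the self-hosted engines; cloud engines rely on an API key instead.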
Using Voice Mode
Once configured, activate voice mode from the microphone button in the chat input area. Hold to talk (or tap to toggle, depending on your settings). Your speech is transcribed, sent as a message, and the response is spoken back to you.
Voice mode works on desktop browsers, mobile Safari, and mobile Chrome. For the best experience on mobile, install Uplink as a PWA.