Voice Chat

Talk to your AI naturally with speech-to-text and text-to-speech — choose from local or cloud engines.

⚠ HTTPS Required for Microphone: Browsers only allow microphone access on secure origins. Speech-to-text will not work over plain http://, so you must access Uplink via HTTPS or localhost. If you're accessing Uplink from another device on your network (e.g. your phone), use a tool like Tailscale, which can provide HTTPS certificates for you, or set up your own TLS certificate. TTS (text-to-speech) playback works over HTTP; only the microphone requires HTTPS.
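
As a quick sanity check before debugging the microphone, the rule above can be approximated with a small shell helper. This is only a sketch: the function name `is_secure_origin` and the example addresses (including port 3000) are hypothetical, and the patterns approximate the browser's secure-origin rule (HTTPS, or plain HTTP on localhost) rather than reproducing it exactly.

```shell
# Hypothetical helper: returns 0 if the URL would likely be treated as a
# secure origin for microphone access (https://, or http:// on localhost).
is_secure_origin() {
  case "$1" in
    https://*) return 0 ;;
    http://localhost|http://localhost:*|http://localhost/*) return 0 ;;
    http://127.0.0.1|http://127.0.0.1:*|http://127.0.0.1/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with made-up addresses
for url in "https://uplink.example.ts.net" "http://localhost:3000" "http://192.168.1.20:3000"; do
  if is_secure_origin "$url"; then
    echo "$url: microphone allowed"
  else
    echo "$url: microphone blocked (TTS playback still works)"
  fi
done
```

A plain LAN address such as the third one is the typical case where the microphone button stays disabled.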

How It Works

Uplink's voice mode has two independent pipelines:

Speech-to-Text (STT) — Your voice is transcribed to text, then sent to the AI as a regular message
Text-to-Speech (TTS) — The AI's text response is converted to audio and played back

Both are configured independently in Settings → Voice & TTS. You can use one without the other — for example, dictate messages but read responses, or type messages but hear responses spoken aloud.

TTS Engines

Uplink supports five text-to-speech engines: Edge TTS, OpenAI TTS, XTTS, ElevenLabs, and Piper. You select which one to use in Settings → Voice & TTS, and provide the server URL or API key if needed.

ElevenLabs (Cloud)

ElevenLabs is the primary TTS provider in Uplink. It is cloud-based and produces natural, expressive, highly realistic speech, the highest quality among the supported engines.

Highest quality voice output among all supported engines
Requires an ElevenLabs API key
Voice selection available in Settings → Voice & TTS
Requires internet connection
Pay-per-character pricing (free tier available with limits)

OpenAI TTS (Cloud)

OpenAI's text-to-speech API. High quality, natural-sounding voices. Requires an OpenAI API key.

10 voices: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
Very natural prosody and intonation
Requires API key and has per-character pricing
Configure your API key in Settings → Voice & TTS
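
To confirm that an API key and voice name work before entering them in Uplink, one option is to call OpenAI's speech endpoint directly. The sketch below is independent of how Uplink itself calls the API. The helper names, the `OPENAI_API_KEY` variable, the `tts-1` model, the test phrase, and the `sample.mp3` output file are all assumptions made for illustration, and some voices may only be available with certain models.

```shell
# Voices listed above
OPENAI_VOICES="alloy ash ballad coral echo fable nova onyx sage shimmer"

# Hypothetical helper: succeeds only if $1 is one of the listed voices
is_openai_voice() {
  for v in $OPENAI_VOICES; do
    [ "$v" = "$1" ] && return 0
  done
  return 1
}

# Hypothetical helper: synthesize a short test phrase to sample.mp3.
# Assumes OPENAI_API_KEY is exported; the model name is an assumption.
openai_tts_sample() {
  voice="${1:-alloy}"
  is_openai_voice "$voice" || { echo "unknown voice: $voice" >&2; return 1; }
  curl -fsS https://api.openai.com/v1/audio/speech \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"tts-1\", \"voice\": \"$voice\", \"input\": \"Hello from Uplink\"}" \
    -o sample.mp3
}
```

Running `openai_tts_sample nova` and playing the resulting file is a quick way to tell an invalid key apart from a misconfigured Uplink setting.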

Edge TTS (Free Cloud)

Microsoft's Edge TTS API. Free, no API key required, excellent quality with many voice options. Requires installing the node-edge-tts package.

bash

# Install the Edge TTS dependency (required)
npm install node-edge-tts

Hundreds of voices across dozens of languages
No API key required — completely free
Requires internet connection
Slight latency due to cloud round-trip

Piper (Local)

A fast, local text-to-speech engine that runs entirely on your CPU — no GPU required. Great for offline or low-latency use cases with many available voice models.

Runs locally — no internet connection needed
Fast inference on CPU (no GPU required)
Many community voice models available in .onnx format
Requires the PIPER_MODEL environment variable pointing to the .onnx model file
Optional: PIPER_EXECUTABLE to specify a custom Piper binary path
Optional: PIPER_CONFIG to specify a custom config JSON file

bash

# Set the required environment variable before starting Uplink
export PIPER_MODEL=/path/to/voice-model.onnx

# Optional: custom Piper binary and config
export PIPER_EXECUTABLE=/path/to/piper
export PIPER_CONFIG=/path/to/voice-model.onnx.json
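
A small pre-flight check can catch a missing or mistyped model path before Uplink starts. The following is a minimal sketch: `check_piper_env` and `piper_smoke_test` are hypothetical helper names, and the smoke test assumes the standard Piper command-line interface (text on stdin, `--model`, `--output_file`) rather than anything specific to Uplink.

```shell
# Hypothetical pre-flight check for the Piper variables described above
check_piper_env() {
  if [ -z "${PIPER_MODEL:-}" ]; then
    echo "PIPER_MODEL is not set" >&2
    return 1
  fi
  if [ ! -f "$PIPER_MODEL" ]; then
    echo "PIPER_MODEL does not point to a file: $PIPER_MODEL" >&2
    return 1
  fi
  # PIPER_CONFIG is optional, but if it is set the file should exist
  if [ -n "${PIPER_CONFIG:-}" ] && [ ! -f "$PIPER_CONFIG" ]; then
    echo "PIPER_CONFIG does not point to a file: $PIPER_CONFIG" >&2
    return 1
  fi
  return 0
}

# Hypothetical smoke test: synthesize one sentence to test.wav with Piper
piper_smoke_test() {
  check_piper_env || return 1
  echo "Hello from Piper" | "${PIPER_EXECUTABLE:-piper}" \
    --model "$PIPER_MODEL" --output_file test.wav
}
```

If `piper_smoke_test` produces a playable `test.wav`, the model and binary are working and any remaining problem is likely in the Uplink settings.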

Coqui XTTS (Local GPU)

A high-quality local option — runs entirely on your machine. Requires an NVIDIA GPU with CUDA support and a running XTTS server.

bash

# Install and run Coqui XTTS server (separate from Uplink)
pip install TTS
python -m TTS.server.server --model_name tts_models/multilingual/multi-dataset/xtts_v2

# Enter the server URL in Uplink Settings → Voice & TTS → TTS Server URL

ℹ GPU Required: XTTS needs an NVIDIA GPU with at least 4 GB VRAM. It will run on CPU, but with latency too high for real-time conversation.
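
The VRAM requirement can be checked from the command line before installing anything. This sketch assumes the NVIDIA driver's `nvidia-smi` tool is installed and inspects only the first GPU. The helper names are hypothetical, and 4096 MiB is used as the stand-in for "4 GB".

```shell
# Hypothetical helper: succeeds if a VRAM size in MiB meets the 4 GB minimum
has_enough_vram() {
  [ "$1" -ge 4096 ]
}

# Hypothetical helper: total memory of the first GPU in MiB, via nvidia-smi
gpu_vram_mib() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | head -n 1 | tr -d ' '
}

# Hypothetical helper: report whether this machine looks suitable for XTTS
check_xtts_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: no NVIDIA driver detected" >&2
    return 1
  fi
  mib="$(gpu_vram_mib)"
  if has_enough_vram "$mib"; then
    echo "GPU has ${mib} MiB VRAM: OK for XTTS"
  else
    echo "GPU has ${mib} MiB VRAM: below the 4 GB minimum" >&2
    return 1
  fi
}
```

If `check_xtts_gpu` fails, Piper or one of the cloud engines is probably the better choice.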

STT Engines

Uplink supports four speech-to-text providers: faster-whisper (local, free), Groq Whisper (cloud, free tier), OpenAI Whisper (cloud, paid), and Browser STT (free, built-in).

OpenAI Whisper (Cloud)

OpenAI's cloud-based speech-to-text API. Highly accurate transcription across many languages using the whisper-1 model.

Uses your existing OpenAI API key
Model: whisper-1
Excellent accuracy across many languages
Requires internet connection
Pay-per-minute pricing

Groq Whisper (Cloud)

Groq's cloud-based speech-to-text service. Extremely fast transcription powered by Groq's LPU hardware, with a generous free tier.

Requires a Groq API key
Model: whisper-large-v3-turbo
Very fast transcription — low latency
Free tier available
Requires internet connection
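
Both cloud providers can be tested outside Uplink with a single `curl` request, since Groq exposes an OpenAI-compatible transcription endpoint. The sketch below is for verifying a key and an audio file only, and it does not describe how Uplink makes the request. The helper names and the argument order are hypothetical.

```shell
# Hypothetical helper: map a provider name to its transcription endpoint
stt_endpoint() {
  case "$1" in
    openai) echo "https://api.openai.com/v1/audio/transcriptions" ;;
    groq)   echo "https://api.groq.com/openai/v1/audio/transcriptions" ;;
    *)      return 1 ;;
  esac
}

# Hypothetical helper: map a provider name to the model named above
stt_model() {
  case "$1" in
    openai) echo "whisper-1" ;;
    groq)   echo "whisper-large-v3-turbo" ;;
    *)      return 1 ;;
  esac
}

# Usage: transcribe_sample <openai|groq> <api-key> <audio-file>
transcribe_sample() {
  url="$(stt_endpoint "$1")" || { echo "unknown provider: $1" >&2; return 1; }
  curl -fsS "$url" \
    -H "Authorization: Bearer $2" \
    -F "file=@$3" \
    -F "model=$(stt_model "$1")"
}
```

For example, `transcribe_sample groq "$GROQ_API_KEY" clip.wav` should print a JSON response containing the transcribed text, where `GROQ_API_KEY` and `clip.wav` are placeholders for your own key and recording.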

faster-whisper (Local)

A highly optimized implementation of OpenAI's Whisper model. Runs locally on CPU or GPU with excellent accuracy.

bash

# Install and run faster-whisper server
pip install faster-whisper
# Use a wrapper server that exposes an HTTP API, e.g.:
pip install whisper-asr-webservice
python -m whisper_asr.webservice --model medium --device cuda

# Enter the server URL in Uplink Settings → Voice & TTS → STT Server URL

Browser STT (Built-in)

Built-in browser speech recognition using the Web Speech API. Free and requires no setup, but accuracy and language support vary by browser.

No configuration needed — works out of the box
Free and built into modern browsers
Accuracy depends on browser and device
Requires internet connection (most browsers use a cloud backend)

Voice Configuration Summary

All voice settings are in Settings → Voice & TTS. Here's what each field does:

Setting	Description	Default
TTS Engine	Which TTS backend to use	none (disabled)
TTS Server URL	URL for XTTS server (if using local)	localhost:8020
TTS Voice	Voice name/ID for the selected engine	Depends on engine
STT Engine	Which STT backend to use	none (disabled)
STT Server URL	URL for faster-whisper server	localhost:8000
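
Before entering a server URL, it can help to confirm that something is actually listening at that address. The sketch below uses the default addresses from the table. The helper names are hypothetical, and prepending `http://` to a bare `host:port` value is an assumption made so that `curl` can use it, not a statement about the format Uplink expects.

```shell
# Hypothetical helper: prepend http:// when a host:port value has no scheme
normalize_server_url() {
  case "$1" in
    http://*|https://*) echo "$1" ;;
    *) echo "http://$1" ;;
  esac
}

# Hypothetical helper: succeeds if anything answers at the URL within 3 seconds
# (any HTTP status counts; only connection failures and timeouts fail)
server_reachable() {
  curl -sS -o /dev/null --max-time 3 "$(normalize_server_url "$1")"
}

# Usage: check_voice_servers localhost:8020 localhost:8000
check_voice_servers() {
  for addr in "$@"; do
    if server_reachable "$addr" 2>/dev/null; then
      echo "$addr: reachable"
    else
      echo "$addr: not reachable"
    fi
  done
}
```

A "not reachable" result usually means the XTTS or faster-whisper server is not running, or is listening on a different port.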

⚠ User-Provided Servers: Uplink does not bundle or manage TTS/STT servers. You install and run them yourself, then point Uplink to the URLs. This is by design: it keeps Uplink lightweight and gives you full control over models and hardware.

Using Voice Mode

Once configured, activate voice mode from the microphone button in the chat input area. Hold to talk (or tap to toggle, depending on your settings). Your speech is transcribed, sent as a message, and the response is spoken back to you.

Voice mode works on desktop browsers, mobile Safari, and mobile Chrome. For the best experience on mobile, install Uplink as a PWA.

ℹ Microphone on remote devices: If you're accessing Uplink from a phone or tablet over your local network, the microphone button won't work unless you're on a secure origin (HTTPS or localhost). See Remote Access for how to set this up with Tailscale.