Voice Chat

Talk to your AI naturally with speech-to-text and text-to-speech — choose from local or cloud engines.

⚠ HTTPS Required for Microphone: Browsers only allow microphone access on secure origins. Speech-to-text will not work over plain http://, so you must access Uplink via HTTPS or localhost. If you're accessing Uplink from another device on your network (e.g. your phone), use a tool like Tailscale, which provides HTTPS automatically, or set up your own TLS certificate. Text-to-speech (TTS) playback works over HTTP; only the microphone requires HTTPS.
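
The secure-origin rule above can be checked for a given address before you try the microphone. The following is a minimal sketch and not part of Uplink: `is_secure_origin` is a hypothetical helper, and it covers only the cases named above (HTTPS and localhost) plus the loopback address 127.0.0.1, which browsers also treat as secure. The example URLs, including port 3000, are placeholders.

```shell
# Hypothetical helper: succeeds when a URL is a secure origin for microphone access.
is_secure_origin() {
  case "$1" in
    https://*) return 0 ;;
    http://localhost|http://localhost:*|http://localhost/*) return 0 ;;
    http://127.0.0.1|http://127.0.0.1:*|http://127.0.0.1/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder addresses; substitute the URL you actually use to open Uplink.
for url in "https://uplink.example.ts.net" "http://localhost:3000" "http://192.168.1.20:3000"; do
  if is_secure_origin "$url"; then
    echo "microphone allowed: $url"
  else
    echo "microphone blocked: $url"
  fi
done
```

A plain http:// address on a LAN IP is the typical failing case, which is why a phone on the same network needs HTTPS.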

How It Works

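Uplink's voice mode has two independent pipelines: speech-to-text (STT), which transcribes what you say into a message, and text-to-speech (TTS), which reads responses aloud.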

Both are configured independently in Settings → Voice & TTS. You can use one without the other — for example, dictate messages but read responses, or type messages but hear responses spoken aloud.

TTS Engines

Uplink supports five text-to-speech engines: Edge TTS, OpenAI TTS, XTTS, ElevenLabs, and Piper. You select which one to use in Settings → Voice & TTS, and provide the server URL or API key if needed.

ElevenLabs (Cloud)

ElevenLabs is the primary TTS provider in Uplink. It is cloud-based and produces natural, expressive, highly realistic speech.

OpenAI TTS (Cloud)

OpenAI's text-to-speech API. High quality, natural-sounding voices. Requires an OpenAI API key.

Edge TTS (Free Cloud)

Microsoft's Edge TTS API. Free, no API key required, excellent quality with many voice options. Requires installing the node-edge-tts package.

bash
# Install the Edge TTS dependency (required)
npm install node-edge-tts

Piper (Local)

A fast, local text-to-speech engine that runs entirely on your CPU, with no GPU required. It suits offline or low-latency use, and many voice models are available.

bash
# Set the required environment variable before starting Uplink
export PIPER_MODEL=/path/to/voice-model.onnx

# Optional: custom Piper binary and config
export PIPER_EXECUTABLE=/path/to/piper
export PIPER_CONFIG=/path/to/voice-model.onnx.json
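
Because a missing or mistyped path is easy to overlook, it can help to validate these variables before starting Uplink. The following is a sketch under stated assumptions: `check_piper_env` is a hypothetical helper, not something Uplink ships, and it checks only that the paths exist, not that the model is valid.

```shell
# Hypothetical preflight check for the Piper variables described above.
check_piper_env() {
  # PIPER_MODEL is required and must point to an existing file.
  if [ -z "${PIPER_MODEL:-}" ]; then
    echo "PIPER_MODEL is not set" >&2
    return 1
  fi
  if [ ! -f "$PIPER_MODEL" ]; then
    echo "PIPER_MODEL does not point to a file: $PIPER_MODEL" >&2
    return 1
  fi
  # The optional variables are validated only when they are set.
  if [ -n "${PIPER_EXECUTABLE:-}" ] && [ ! -x "$PIPER_EXECUTABLE" ]; then
    echo "PIPER_EXECUTABLE is not executable: $PIPER_EXECUTABLE" >&2
    return 1
  fi
  if [ -n "${PIPER_CONFIG:-}" ] && [ ! -f "$PIPER_CONFIG" ]; then
    echo "PIPER_CONFIG does not point to a file: $PIPER_CONFIG" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_piper_env && echo "Piper environment looks OK"
```

The function reports the first problem it finds on stderr and returns a non-zero status, so it can gate a start script.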

Coqui XTTS (Local GPU)

A high-quality local option — runs entirely on your machine. Requires an NVIDIA GPU with CUDA support and a running XTTS server.

bash
# Install and run Coqui XTTS server (separate from Uplink)
pip install TTS
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2

# Enter the server URL in Uplink Settings → Voice & TTS → TTS Server URL

ℹ GPU Required: XTTS needs an NVIDIA GPU with at least 4 GB VRAM. It will run on CPU, but with latency too high for real-time conversation.
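
To see whether a machine meets the 4 GB VRAM guideline, you can query the NVIDIA driver tools. This is a rough sketch: `vram_ok` and `first_gpu_vram_mib` are hypothetical helper names, and reading "4 GB" as 4096 MiB is an assumption. The `nvidia-smi` query itself is a standard one that prints total memory in MiB.

```shell
# Succeeds when the given VRAM size in MiB meets the 4 GB guideline
# (interpreted here as 4096 MiB, which is an assumption).
vram_ok() {
  [ "$1" -ge 4096 ]
}

# Prints the total memory of the first GPU in MiB, without header or units.
first_gpu_vram_mib() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  mib="$(first_gpu_vram_mib)"
  if vram_ok "$mib"; then
    echo "GPU reports ${mib} MiB: meets the XTTS guideline"
  else
    echo "GPU reports ${mib} MiB: below the XTTS guideline"
  fi
else
  echo "nvidia-smi not found on PATH"
fi
```

Keeping the comparison in its own function makes the threshold easy to adjust if your model or batch settings need more memory.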

STT Engines

Uplink supports four speech-to-text providers: faster-whisper (local, free), Groq Whisper (cloud, free tier), OpenAI Whisper (cloud, paid), and Browser STT (free, built-in).

OpenAI Whisper (Cloud)

OpenAI's cloud-based speech-to-text API. Highly accurate transcription across many languages using the whisper-1 model.

Groq Whisper (Cloud)

Groq's cloud-based speech-to-text service. Extremely fast transcription powered by Groq's LPU hardware, with a generous free tier.

faster-whisper (Local)

A highly optimized implementation of OpenAI's Whisper model. Runs locally on CPU or GPU with excellent accuracy.

bash
# Install and run faster-whisper server
pip install faster-whisper
# Use a wrapper server that exposes an HTTP API, e.g.:
pip install whisper-asr-webservice
python -m whisper_asr.webservice --model medium --device cuda

# Enter the server URL in Uplink Settings → Voice & TTS → STT Server URL

Browser STT (Built-in)

Built-in browser speech recognition using the Web Speech API. Free and requires no setup, but accuracy and language support vary by browser.

Voice Configuration Summary

All voice settings are in Settings → Voice & TTS. Here's what each field does:

| Setting | Description | Default |
| --- | --- | --- |
| TTS Engine | Which TTS backend to use | none (disabled) |
| TTS Server URL | URL for XTTS server (if using local) | localhost:8020 |
| TTS Voice | Voice name/ID for the selected engine | Depends on engine |
| STT Engine | Which STT backend to use | none (disabled) |
| STT Server URL | URL for faster-whisper server | localhost:8000 |

⚠ User-Provided Servers: Uplink does not bundle or manage TTS/STT servers. You install and run them yourself, then point Uplink to the URLs. This is by design: it keeps Uplink lightweight and gives you full control over models and hardware.
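
Since you run these servers yourself, a quick reachability check can save time before you enter a URL in Settings. The sketch below is an illustration only: `normalize_server_url` and `server_reachable` are hypothetical helpers, and the check confirms only that something answers HTTP at the address, not that it is a working TTS or STT server. Whether Uplink itself expects a scheme in the URL field is not stated here, so the scheme handling applies to this script alone.

```shell
# Prefix http:// when the value has no scheme, as with the defaults listed above.
normalize_server_url() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *) printf 'http://%s\n' "$1" ;;
  esac
}

# Succeeds if anything answers at the URL within 3 seconds, whatever the HTTP status.
server_reachable() {
  curl --silent --output /dev/null --max-time 3 "$(normalize_server_url "$1")"
}

# Example:
#   server_reachable localhost:8020 && echo "TTS server is answering"
#   server_reachable localhost:8000 && echo "STT server is answering"
```

A failure here usually means the server is not running, is bound to a different port, or is blocked by a firewall.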

Using Voice Mode

Once configured, activate voice mode from the microphone button in the chat input area. Hold to talk (or tap to toggle, depending on your settings). Your speech is transcribed, sent as a message, and the response is spoken back to you.

Voice mode works on desktop browsers, mobile Safari, and mobile Chrome. For the best experience on mobile, install Uplink as a PWA.

ℹ Microphone on remote devices: If you're accessing Uplink from a phone or tablet over your local network, the microphone button won't work unless you're on a secure origin (HTTPS or localhost). See Remote Access for how to set this up with Tailscale.