Quick start (Docker Compose)
Prerequisites: a Linux host (Ubuntu 24.04), Docker, git, curl.
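Before running make all, you can confirm the tools are on PATH with a short Python pre-flight. This is a sketch and is not part of the repository. It adds make to the list, because the quick start runs make all and make bot. It only checks PATH. It does not check whether the Docker daemon is running or whether the Compose plugin is installed.

```python
import platform
import shutil

# Docker, git and curl come from the prerequisites; make is added because the
# quick start is driven by `make all` and `make bot`.
REQUIRED_TOOLS = ["docker", "git", "curl", "make"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH, in order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if platform.system() != "Linux":
        print(f"note: the quick start targets a Linux host; this is {platform.system()}")
    missing = missing_tools()
    if missing:
        print("missing prerequisites: " + ", ".join(missing))
    else:
        print("all prerequisites found on PATH")
```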
make all seeds .env from .env.example, brings the stack up, and prints an API key plus the
service URLs when it finishes. The meeting bot is built from source (make bot), not pulled. The
published vexaai/vexa-bot:dev image on Docker Hub is the older 0.10 line and is not compatible with
this stack’s lifecycle.v1 (bots reach joining and then fail). make all warns loudly if the bot image
is missing. To get transcripts, set a transcription (STT) token in .env
(TRANSCRIPTION_SERVICE_TOKEN). You can get one at vexa.ai/account, or you can self-host the
transcription service on a GPU for a fully air-gapped install. The gateway then serves the API at
http://localhost:18056, and the terminal web workbench is at http://localhost:13000.
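Once make all finishes, you can smoke-test the two URLs with the Python standard library. This is a minimal sketch that assumes the default ports. Any HTTP answer counts as up, including a 401 from the gateway, because the goal is only to show that something is listening.

```python
import http.client
import urllib.error
import urllib.request

# Default URLs from the quick start; change them if you moved ports in .env.
SERVICE_URLS = {
    "gateway (API)": "http://localhost:18056",
    "terminal (workbench)": "http://localhost:13000",
}

# Ignore any http_proxy/https_proxy settings: these checks target loopback.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=3.0):
    """Return the HTTP status `url` answers with, or None if nothing answers.

    Any HTTP response (including 401 or 404) counts as "up": the aim is only to
    confirm that a service is listening, not that a particular route exists.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a success status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None  # refused, timed out, unresolvable, or not speaking HTTP


if __name__ == "__main__":
    for name, url in SERVICE_URLS.items():
        status = probe(url)
        print(f"{name:22} {url:26} {'DOWN' if status is None else status}")
```

Run it on the host itself. make all binds every service to 127.0.0.1, so these URLs do not answer from another machine.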
The stack
| Service | Role |
|---|---|
| gateway (:18056) | the one front door: auth, scopes, routing |
| admin-api | users + API keys |
| meeting-api | bots, transcripts, recordings (to object storage) |
| runtime | spawns bot + agent containers on demand (via the Docker socket) |
| agent-api | the agent control plane — dispatch, chat, routines, events |
| terminal (:13000) | the web workbench; proxies /ws → gateway and REST/login → agent-api/admin-api |
| redis · postgres · minio | bus + scheduler · metadata · object storage (recordings + workspaces) |
The runtime spawns bot containers (BROWSER_IMAGE) on demand and an agent container per dispatch
(AGENT_IMAGE), then reaps them. BROWSER_IMAGE is built from source here (make bot), and the runtime
spawns it without pulling. It must therefore exist locally before any bot can join. Build it once;
make all checks for it and warns if it is absent.
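You can confirm the bot image is present before a bot tries to join by calling docker image inspect, which only looks at the local image store and never pulls. This sketch assumes BROWSER_IMAGE has been exported into the shell from .env. The function and its messages are illustrative and are not the check that make all runs.

```python
import os
import shutil
import subprocess


def image_exists_locally(image):
    """True if `docker image inspect` finds `image` in the local image store.

    `docker image inspect` exits non-zero for an image it does not have and
    never pulls, which mirrors how the runtime spawns BROWSER_IMAGE.
    """
    if shutil.which("docker") is None:
        return False  # no Docker CLI on PATH: treat the image as missing
    try:
        result = subprocess.run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # CLI could not be run or the daemon did not answer in time
    return result.returncode == 0


if __name__ == "__main__":
    image = os.environ.get("BROWSER_IMAGE")  # assumed to be exported from .env
    if not image:
        print("BROWSER_IMAGE is not set in this shell; export it from .env first")
    elif image_exists_locally(image):
        print(f"{image}: present locally")
    else:
        print(f"{image}: not found locally (or Docker is unreachable); build it with make bot")
```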
Configuration
- Transcription (STT): TRANSCRIPTION_SERVICE_URL / TRANSCRIPTION_SERVICE_TOKEN. If these are unset, bots still join and capture, but produce no transcript.
- Object storage: MinIO (MINIO_*). Meeting recordings and agent workspaces live in your bucket. The default MINIO_HOST_PORT=9000 is a commonly used port. If it is already taken on your host (make all fails with bind … 127.0.0.1:9000 … address already in use), set a free port in .env.
- Agent inference: bring your own. Point the agent at your own endpoint (VEXA_AGENT_MODEL / mounted credentials) so that no inference leaves the network.
- Secrets: ADMIN_TOKEN, INTERNAL_API_SECRET, and the DB credentials. Set real values before exposing the stack.
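The settings above lend themselves to a quick lint. The sketch below is hypothetical and does not ship with the stack. It parses a simple KEY=VALUE .env and flags missing STT settings and empty secrets. It also tries to bind 127.0.0.1 on MINIO_HOST_PORT, which catches the address already in use failure before make all hits it. It assumes the compose .env lives at deploy/compose/.env. It only flags empty secrets; placeholder values copied from .env.example pass unnoticed.

```python
import socket
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines; skips blanks, comments and lines without '='.

    Only a subset of .env syntax: no `export` prefix, no inline comments,
    no multi-line values.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("\"'")
    return env


def port_is_free(port, host="127.0.0.1"):
    """True if a TCP socket can bind host:port, the same bind MinIO's port mapping needs."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
    except OSError:
        return False
    return True


def lint(env):
    """Return warnings for the Configuration settings."""
    warnings = []
    if not env.get("TRANSCRIPTION_SERVICE_URL") or not env.get("TRANSCRIPTION_SERVICE_TOKEN"):
        warnings.append("STT is not configured: bots will join and capture but produce no transcript")
    # Only empty values are caught; placeholders copied from .env.example are not.
    for secret in ("ADMIN_TOKEN", "INTERNAL_API_SECRET"):
        if not env.get(secret):
            warnings.append(f"{secret} is empty: set a real value before exposing the stack")
    port = int(env.get("MINIO_HOST_PORT") or 9000)
    if not port_is_free(port):
        # While the stack is up, MinIO itself holds this port, so this also fires then.
        warnings.append(f"MINIO_HOST_PORT={port} is already in use: set a free port in .env")
    return warnings


if __name__ == "__main__":
    env_file = Path("deploy/compose/.env")  # assumed location, relative to the repo root
    if env_file.exists():
        for warning in lint(parse_env(env_file.read_text())):
            print("warning:", warning)
    else:
        print(f"{env_file} not found (make all seeds it from .env.example)")
```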
Transcription (the separate GPU unit)
Speech-to-text is the only GPU workload, so it is carved out of the main stack. make all runs
GPU-free and anywhere, and the STT service is its own deploy unit at deploy/transcription
(core/meetings/services/transcription is the brick: faster-whisper / CTranslate2 behind an
OpenAI-compatible /v1/audio/transcriptions endpoint). Stand it up wherever a GPU lives (the same
host or a dedicated GPU box). Then point the main stack at it by setting TRANSCRIPTION_SERVICE_URL
and TRANSCRIPTION_SERVICE_TOKEN in deploy/compose/.env.
collector → live fan-out.
Scale by adding workers (one GPU each) to the unit’s docker-compose.yml and nginx.conf.
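Before pointing the main stack at a standalone transcription unit, you can check it by sending one file to its OpenAI-compatible /v1/audio/transcriptions endpoint. This standard-library sketch rests on several assumptions:

- The base URL is wherever you stood the unit up; http://gpu-box:8000 is a made-up example.
- The token goes in an OpenAI-style Authorization: Bearer header. The unit may expect a different header.
- model=whisper-1 only fills the OpenAI request field. The unit's own faster-whisper configuration probably picks the actual model.

```python
import json
import os
import sys
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, data, content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields.items()
    ]
    file_head = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
    )
    parts.append(file_head.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcription_request(base_url, token, filename, audio, model="whisper-1"):
    """Build a POST to <base_url>/v1/audio/transcriptions in the OpenAI request shape.

    Assumptions: Bearer auth and the placeholder model name (see the lead-in).
    """
    body, content_type = build_multipart({"model": model}, "file", filename, audio)
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: stt_check.py BASE_URL TOKEN AUDIO_FILE")
    else:
        base_url, token, path = sys.argv[1:]
        with open(path, "rb") as fh:
            req = transcription_request(base_url, token, os.path.basename(path), fh.read())
        with urllib.request.urlopen(req, timeout=300) as resp:
            # OpenAI-shaped JSON responses carry the transcript under "text".
            print(json.load(resp).get("text"))
```

For example: python3 stt_check.py http://gpu-box:8000 "$TRANSCRIPTION_SERVICE_TOKEN" sample.wav. The script name, host and file here are illustrative.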
Publishing behind a reverse proxy
make all binds every service to 127.0.0.1 (loopback only). To expose the terminal at a public
hostname, put a TLS-terminating reverse proxy in front of the terminal port (TERMINAL_PORT, default
13000). Also tell the terminal its public origin so that auth cookies and OAuth callbacks are correct.
The terminal itself proxies /ws to the gateway, so the reverse proxy only needs the standard
WebSocket-upgrade headers.
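One way to confirm the proxy passes the WebSocket upgrade through to /ws is to perform the RFC 6455 handshake by hand and verify the Sec-WebSocket-Accept value. This sketch assumes the proxy listens on port 443 at your public hostname. The cookie parameter is hypothetical and only matters if /ws requires a logged-in session. Reading the result:

- 101 Switching Protocols with a valid accept value: the upgrade headers survived the proxy.
- 401 or 403: the endpoint may only be asking for authentication.
- 400, or an ordinary 200 page: the proxy is often dropping Upgrade/Connection.

```python
import base64
import hashlib
import os
import socket
import ssl
import sys

# Fixed GUID that RFC 6455 (section 1.3) appends to the client key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key):
    """Sec-WebSocket-Accept value a compliant server returns for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_request(host, path="/ws", key=None, cookie=None):
    """Return (request bytes, key) for an HTTP/1.1 WebSocket upgrade on `path`."""
    key = key or base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Origin: https://{host}",  # what a browser on the public origin would send
    ]
    if cookie:
        lines.append(f"Cookie: {cookie}")  # hypothetical: only if /ws needs a session
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii"), key


def check_upgrade(host, path="/ws", cookie=None, timeout=10.0):
    """Send the upgrade to https://host:443; return (status line, accept header valid)."""
    request, key = upgrade_request(host, path, cookie=cookie)
    context = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)
            response = b""
            while b"\r\n\r\n" not in response:  # read until the end of the headers
                chunk = tls.recv(4096)
                if not chunk:
                    break
                response += chunk
    head = response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers.get("sec-websocket-accept") == expected_accept(key)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: ws_check.py PUBLIC_HOSTNAME")
    else:
        print(check_upgrade(sys.argv[1]))
```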