A speech-to-text transcription API powered by OpenAI's Whisper model. Convert audio files to text with high accuracy.
This API transcribes audio files to text. Simply upload an audio file and get back the transcribed text along with segment-level details including timestamps.
- A machine with ffmpeg installed (for audio format support)
- 2GB+ of disk space for the model file
1. Clone and navigate to the project:

   ```sh
   git clone <repository-url>
   cd whisper-rust-api
   ```

2. Download the transcription model:

   ```sh
   make download-model
   ```

3. Configure (optional):

   ```sh
   cp .env.example .env
   ```

   Edit `.env` if you need to change the port or other settings.

4. Start the API:

   ```sh
   make start
   ```

The API is now running at http://localhost:8000.
Upload an audio file to be transcribed:
```sh
curl -X POST --data-binary @audio.mp3 http://localhost:8000/transcribe
```

Supported formats: WAV, MP3, M4A, FLAC, OGG, and more.
Response:

```json
{
  "result": {
    "text": "Hello world. How are you today?",
    "segments": [
      { "id": 0, "start": 0, "end": 1500, "text_start": 0, "text_end": 12 },
      { "id": 1, "start": 1500, "end": 3200, "text_start": 12, "text_end": 31 }
    ]
  },
  "processing_time_ms": 1234
}
```

Each segment includes audio timestamps (`start`/`end`, in milliseconds) and character offsets (`text_start`/`text_end`) into the full `text` string.
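The character offsets let a client recover each segment's text without re-tokenizing. A minimal sketch using the sample response above:

```python
# Sketch: recover each segment's text from the character offsets in the
# transcription response. The payload below is the sample from this README.
response = {
    "result": {
        "text": "Hello world. How are you today?",
        "segments": [
            {"id": 0, "start": 0, "end": 1500, "text_start": 0, "text_end": 12},
            {"id": 1, "start": 1500, "end": 3200, "text_start": 12, "text_end": 31},
        ],
    },
    "processing_time_ms": 1234,
}

text = response["result"]["text"]
for seg in response["result"]["segments"]:
    # text_start/text_end index into the full text string;
    # start/end are audio timestamps in milliseconds.
    snippet = text[seg["text_start"]:seg["text_end"]].strip()
    print(f'{seg["start"]:>6}ms -> {seg["end"]:>6}ms  {snippet}')
```

Note that offsets may include leading whitespace between segments, hence the `strip()`.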
```sh
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "ok",
  "version": "0.2.0"
}
```

```sh
curl http://localhost:8000/info
```

Shows the current model, configuration, and available endpoints.
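The `/health` response is handy as a readiness probe. A small sketch that parses the body and decides whether the API is up (the HTTP fetch itself is left to the caller):

```python
# Sketch: decide whether a /health response body reports a healthy API.
import json

def is_healthy(body: str) -> bool:
    """True if the body is JSON with "status": "ok", as shown above."""
    try:
        return json.loads(body).get("status") == "ok"
    except ValueError:
        # Non-JSON bodies (e.g. a proxy error page) count as unhealthy.
        return False
```

For example, `is_healthy('{"status": "ok", "version": "0.2.0"}')` returns `True`, while an HTML error page returns `False`.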
Run `make help` to see all available commands:

```sh
make start        # Start the API server
make stop         # Stop the API server
make dev          # Run with auto-reload for development
make build        # Build the application
make test         # Run tests
make clean        # Clean up build files
make docker-run   # Run in Docker
make docker-down  # Stop Docker container
make docker-logs  # View Docker logs
```
Requires Rust to be installed. Then:

```sh
make build
make start
```

The easiest way to deploy:

```sh
make docker-run
```

Or use Docker Compose directly:

```sh
docker-compose up -d
```

To stop:

```sh
make docker-down
```

Create a `.env` file (copy from `.env.example`) to customize:
| Setting | Default | Purpose |
|---|---|---|
| `WHISPER_PORT` | `8000` | Port the API listens on |
| `WHISPER_HOST` | `0.0.0.0` | Host address |
| `WHISPER_THREADS` | `4` | Number of CPU threads to use |
| `WHISPER_MODEL` | `./models/ggml-base.en.bin` | Path to the model file |
| `RUST_LOG` | `info` | Logging detail (`debug`, `info`, `warn`, `error`) |
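The intended resolution order is: value from the environment (or `.env`) if set, otherwise the default from the table. A sketch of that lookup, purely illustrative (the server's actual configuration code may differ):

```python
# Sketch: resolve a setting from the environment, falling back to the
# documented defaults. Illustrative only; not the server's own code.
import os

DEFAULTS = {
    "WHISPER_PORT": "8000",
    "WHISPER_HOST": "0.0.0.0",
    "WHISPER_THREADS": "4",
    "WHISPER_MODEL": "./models/ggml-base.en.bin",
    "RUST_LOG": "info",
}

def setting(name: str) -> str:
    """Environment value if set, otherwise the documented default."""
    return os.environ.get(name, DEFAULTS[name])
```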
The default model (base.en) is optimized for English and is ~140MB.
For other languages or different accuracy/speed tradeoffs, download a different model from Hugging Face:
- tiny.en (75MB) - Fastest, English only
- base.en (140MB) - Default, English only
- small.en (466MB) - Better accuracy, English only
- medium.en (1.5GB) - High accuracy, English only
- tiny (75MB) - Supports 99 languages
- base (140MB) - Supports 99 languages
- small (466MB) - Supports 99 languages
To use a different model:
```sh
# Download the model
wget -O models/ggml-small.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en.bin

# Update .env
echo "WHISPER_MODEL=./models/ggml-small.en.bin" >> .env

# Restart the API
make stop
make start
```

Run `make download-model` to download the default model.
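Every model listed above is fetched from the same Hugging Face path, so the `wget` line generalizes. A tiny helper (the function name is illustrative; the URL pattern matches the example above):

```python
# Sketch: build the download URL for any whisper.cpp GGML model listed
# above, following the URL pattern used in the wget example.
HF_BASE = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

def model_url(name: str) -> str:
    """URL for a model by short name, e.g. "small.en" or "tiny"."""
    return f"{HF_BASE}/ggml-{name}.bin"
```

`model_url("small.en")` reproduces the URL in the `wget` example.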
Install ffmpeg:
```sh
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg
```

Change the port in `.env`:
```sh
WHISPER_PORT=8001
```

- Use a smaller model (e.g., `tiny.en` instead of `base.en`)
- Increase `WHISPER_THREADS` in `.env` (if your CPU has multiple cores)
- Ensure no other heavy processes are running
Use a smaller model:
```sh
WHISPER_MODEL=./models/ggml-tiny.en.bin
```
| Method | Endpoint | Description |
|---|---|---|
| POST | `/transcribe` | Upload audio and get transcription |
| GET | `/health` | Check if API is running |
| GET | `/info` | Get API information and configuration |
- Audio format: WAV files process faster than MP3 (no conversion needed)
- File size: Smaller audio files process faster
- Threads: More threads = faster processing on multi-core systems (up to CPU core count)
- Model size: Smaller models are faster but less accurate
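One way to compare these trade-offs is the real-time factor: processing time divided by audio duration, where values below 1.0 mean faster than real time. A quick sketch using the `processing_time_ms` field from `/transcribe` responses (the audio duration must be known to the caller):

```python
# Sketch: real-time factor (RTF) from a response's processing_time_ms
# and the clip's duration. RTF < 1.0 means faster than real time.
def real_time_factor(processing_time_ms: int, audio_duration_ms: int) -> float:
    return processing_time_ms / audio_duration_ms

# A 3.2 s clip transcribed in 1234 ms (the sample response earlier):
rtf = real_time_factor(1234, 3200)  # ~0.39, faster than real time
```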
- Check the Docker logs: `make docker-logs`
- Review the configuration in `.env`
- Ensure the model file was downloaded: `ls models/ggml-*.bin`
- Verify ffmpeg is installed: `ffmpeg -version`