
STT Evaluation Benchmark

A benchmark for comparing Speech-to-Text (STT) providers over WebSocket streaming, with a focus on:

  • Speaker diarization - identifying who said what
  • Keyterm boosting - improving recognition of specific terms (keyterms on Deepgram, additional vocab on Speechmatics)

Providers

Provider        Diarization   Keyterm Boost      Streaming
------------------------------------------------------------------
Deepgram        Yes           Yes                WebSocket (v1/v2)
Speechmatics    Yes           Additional vocab   WebSocket RT

Setup

# Install dependencies
pip install websockets

# Set API keys
export DEEPGRAM_API_KEY="your-key"
export SPEECHMATICS_API_KEY="your-key"

Note: ffmpeg must be installed; it is used to convert audio to PCM16.

Usage

Run from the project root directory:

# Test both providers with diarization
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize

# Test only Deepgram
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --providers deepgram

# Test with keyterm boosting (Deepgram)
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --keyterms "Dograh" "Pipecat"

# Use a different sample rate (default: 8000 Hz)
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --sample-rate 16000

# Show word-level timings
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --show-words

# Save results to JSON
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --save

CLI Options

Option          Description
--------------------------------------------------------------------------------
audio_file      Path to audio file (relative to evals/stt/ or absolute)
--providers     Providers to test: deepgram, speechmatics (default: both)
--diarize       Enable speaker diarization
--keyterms      Keywords to boost (Deepgram) / additional vocab (Speechmatics)
--language      Language code (default: en)
--sample-rate   Audio sample rate for streaming, in Hz (default: 8000)
--show-words    Show individual word timings
--save          Save results to JSON in results/
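
For readers who want to see how these options might map onto an argument parser, here is a minimal argparse sketch. It is reconstructed from the table above only; the actual parser in benchmark.py may differ. The function name build_parser, the help strings, and the choice of nargs for --providers and --keyterms are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI options table (not the real benchmark.py)."""
    parser = argparse.ArgumentParser(prog="evals.stt.benchmark")
    parser.add_argument("audio_file", help="Path to audio file (relative to evals/stt/ or absolute)")
    parser.add_argument(
        "--providers",
        nargs="+",
        choices=["deepgram", "speechmatics"],
        default=["deepgram", "speechmatics"],  # documented default: both
    )
    parser.add_argument("--diarize", action="store_true", help="Enable speaker diarization")
    parser.add_argument("--keyterms", nargs="*", default=[], help="Terms to boost")
    parser.add_argument("--language", default="en")
    parser.add_argument("--sample-rate", type=int, default=8000)
    parser.add_argument("--show-words", action="store_true")
    parser.add_argument("--save", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["audio/multi_speaker.m4a", "--diarize", "--keyterms", "Dograh", "Pipecat"]
)
print(args.providers, args.keyterms, args.sample_rate)
```

Because --keyterms accepts a variable number of values in this sketch, the positional audio_file has to come before it on the command line, as it does in the usage examples.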

Directory Structure

evals/stt/
├── audio/              # Audio test files
│   └── multi_speaker.m4a
├── results/            # Saved benchmark results (JSON)
├── providers/          # STT provider implementations
│   ├── base.py         # Base classes
│   ├── deepgram_provider.py    # WebSocket streaming
│   └── speechmatics_provider.py # WebSocket streaming
├── audio_streamer.py   # PCM16 audio file streamer
├── benchmark.py        # Main runner script
└── README.md

How It Works

  1. Audio Conversion: The AudioStreamer converts any audio file to raw PCM16 using ffmpeg
  2. WebSocket Connection: Providers connect to their respective WebSocket APIs
  3. Streaming: Audio is sent in chunks at a configurable sample rate (default: 8000 Hz)
  4. Result Collection: Transcripts and speaker info are collected from WebSocket responses
  5. Comparison: Results are parsed into a common format for comparison
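
Steps 1 and 3 can be sketched roughly as follows. This is an illustration, not the code in audio_streamer.py: the function names, the mono downmix (-ac 1), and the 20 ms chunk duration are all assumptions. The ffmpeg flags themselves (-f s16le, -acodec pcm_s16le, -ar) are standard options for emitting raw 16-bit little-endian PCM.

```python
import subprocess
from typing import Iterator, List

BYTES_PER_SAMPLE = 2  # PCM16 uses two bytes per sample (mono assumed)


def ffmpeg_pcm16_command(path: str, sample_rate: int = 8000) -> List[str]:
    """Build an ffmpeg command that writes raw mono PCM16 to stdout."""
    return [
        "ffmpeg", "-v", "error",
        "-i", path,
        "-f", "s16le",            # raw signed 16-bit little-endian output
        "-acodec", "pcm_s16le",
        "-ac", "1",               # assumption: downmix to mono
        "-ar", str(sample_rate),  # resample to the streaming rate
        "pipe:1",
    ]


def convert_to_pcm16(path: str, sample_rate: int = 8000) -> bytes:
    """Run ffmpeg and return the decoded PCM16 bytes (requires ffmpeg on PATH)."""
    result = subprocess.run(
        ffmpeg_pcm16_command(path, sample_rate), check=True, stdout=subprocess.PIPE
    )
    return result.stdout


def iter_chunks(pcm: bytes, sample_rate: int = 8000, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration chunks; the final chunk may be shorter."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence at 8000 Hz stands in for real converted audio.
silence = b"\x00" * (8000 * BYTES_PER_SAMPLE)
chunks = list(iter_chunks(silence))
print(len(chunks), len(chunks[0]))
```

At 8000 Hz, a 20 ms chunk is 160 samples, or 320 bytes. A real streamer would typically also pace the chunks (for example, sleeping for the chunk duration between sends) to approximate real-time audio.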

Output Example

Audio file: /path/to/audio/multi_speaker.m4a
Providers: ['deepgram', 'speechmatics']
Diarization: True
Sample rate: 8000 Hz

============================================================
Provider: DEEPGRAM
============================================================

Duration: 45.32s
Speakers detected: 2 - ['0', '1']

Transcript:
Hello, welcome to the demo...

--- Speaker Segments ---
[0.0s] Speaker 0: Hello, welcome to the demo.
[2.5s] Speaker 1: Thanks for having me.
...

============================================================
COMPARISON SUMMARY
============================================================

Provider        Duration   Speakers   Words
---------------------------------------------
deepgram        45.32      2          312
speechmatics    45.32      2          308

Adding New Providers

  1. Create a new file in providers/ (e.g., whisper_provider.py)
  2. Implement the STTProvider abstract class with WebSocket streaming
  3. Use AudioStreamer for PCM16 conversion
  4. Add it to providers/__init__.py
  5. Add it to the provider choices in benchmark.py
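
As a rough picture of what step 2 involves, the sketch below shows one possible shape for a provider interface and a common result format. It does not reproduce the real STTProvider in providers/base.py, whose methods and fields may differ; every name here (SketchSTTProvider, SketchResult, SketchSegment, EchoProvider, transcribe) is hypothetical, and the stand-in provider makes no network calls.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchSegment:
    start: float            # segment start time in seconds
    speaker: Optional[str]  # speaker label, or None without diarization
    text: str


@dataclass
class SketchResult:
    """Hypothetical common format that every provider would return."""
    provider: str
    transcript: str
    segments: List[SketchSegment] = field(default_factory=list)

    @property
    def speakers(self) -> List[str]:
        return sorted({s.speaker for s in self.segments if s.speaker is not None})


class SketchSTTProvider(ABC):
    """Hypothetical base class; the real one lives in providers/base.py."""
    name: str

    @abstractmethod
    async def transcribe(
        self, pcm_chunks: List[bytes], sample_rate: int, diarize: bool
    ) -> SketchResult:
        ...


class EchoProvider(SketchSTTProvider):
    """Stand-in provider: reports how much audio it received instead of calling an API."""
    name = "echo"

    async def transcribe(
        self, pcm_chunks: List[bytes], sample_rate: int = 8000, diarize: bool = False
    ) -> SketchResult:
        total_bytes = sum(len(chunk) for chunk in pcm_chunks)
        seconds = total_bytes / (2 * sample_rate)  # two bytes per PCM16 sample
        text = f"received {seconds:.2f}s of audio"
        segment = SketchSegment(0.0, "0" if diarize else None, text)
        return SketchResult(self.name, text, [segment])


result = asyncio.run(
    EchoProvider().transcribe([b"\x00" * 320] * 50, sample_rate=8000, diarize=True)
)
print(result.transcript, result.speakers)
```

A real provider would replace the body of transcribe with a WebSocket session: open the connection, send each PCM16 chunk, and translate the provider's responses into the shared result type so that the comparison step can treat all providers alike.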

API Documentation