mirror of https://github.com/dograh-hq/dograh.git synced 2026-06-13 08:15:21 +02:00

Abhishek Kumar 6d589b7452 feat: add stt evals		2026-01-06 13:18:13 +05:30

STT Evaluation Benchmark

A benchmark for comparing Speech-to-Text (STT) providers, with a focus on:

Speaker diarization - identifying who said what
Keyterm boosting - improving recognition of specific terms (Deepgram only)

Providers

Provider	Diarization	Keyterm Boost	Notes
Deepgram	Yes	Yes	`diarize=true`, `keyterm` param
Speechmatics	Yes	No	`diarization: "speaker"` config
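
As a rough illustration of the two settings in the Notes column, the sketch below builds the request parameters each provider expects. It is not taken from `providers/`; the function names are hypothetical, and the `nova-3` model and the Deepgram endpoint URL are assumptions based on Deepgram's public documentation.

```python
import json
import os

# Assumed endpoint for Deepgram pre-recorded audio (not stated in this README).
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def deepgram_request(keyterms=(), diarize=True, language="en"):
    """Hypothetical helper: return (params, headers) for a Deepgram request."""
    # Assumption: keyterm boosting needs a Nova-3 model.
    params = [("model", "nova-3"), ("language", language)]
    if diarize:
        params.append(("diarize", "true"))
    # One `keyterm` query parameter is sent per boosted term.
    params.extend(("keyterm", term) for term in keyterms)
    api_key = os.environ.get("DEEPGRAM_API_KEY", "")
    headers = {"Authorization": f"Token {api_key}"}
    return params, headers


def speechmatics_job_config(diarize=True, language="en"):
    """Hypothetical helper: return a Speechmatics batch job config as JSON."""
    transcription_config = {"language": language}
    if diarize:
        transcription_config["diarization"] = "speaker"
    return json.dumps(
        {"type": "transcription", "transcription_config": transcription_config}
    )


if __name__ == "__main__":
    params, _ = deepgram_request(keyterms=["Dograh", "Pipecat"])
    print(params)
    print(speechmatics_job_config())
```

With httpx, the Deepgram values could then be passed as `httpx.post(DEEPGRAM_URL, params=params, headers=headers, content=audio_bytes)`; the actual provider code may structure this differently.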

Setup

# Install dependencies (httpx is required)
pip install httpx

# Set API keys
export DEEPGRAM_API_KEY="your-key"
export SPEECHMATICS_API_KEY="your-key"

Usage

Run from the project root directory. Relative audio paths are resolved against evals/stt/:

# Test both providers with diarization
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize

# Test only Deepgram
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --providers deepgram

# Test with keyterm boosting (Deepgram only)
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --keyterms "Dograh" "Pipecat"

# Show word-level timings
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --show-words

# Save results to JSON
python -m evals.stt.benchmark audio/multi_speaker.m4a --diarize --save

CLI Options

Option	Description
`audio_file`	Path to audio file (relative to evals/stt/ or absolute)
`--providers`	Providers to test: `deepgram`, `speechmatics` (default: both)
`--diarize`	Enable speaker diarization
`--keyterms`	Terms to boost, space-separated (Deepgram only)
`--language`	Language code (default: en)
`--show-words`	Show individual word timings
`--save`	Save results to JSON in `results/`
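
For readers who want to see how the options above fit together, here is a minimal `argparse` sketch that mirrors the table. It is an assumption about how such a parser could be written, not a copy of `benchmark.py`; `build_parser` is a hypothetical name, and details such as `nargs` are guesses from the usage examples.

```python
import argparse

PROVIDER_CHOICES = ["deepgram", "speechmatics"]


def build_parser():
    """Hypothetical parser mirroring the CLI options table."""
    parser = argparse.ArgumentParser(prog="evals.stt.benchmark")
    parser.add_argument(
        "audio_file",
        help="Path to audio file (relative to evals/stt/ or absolute)",
    )
    parser.add_argument(
        "--providers",
        nargs="+",
        choices=PROVIDER_CHOICES,
        default=list(PROVIDER_CHOICES),  # default: both providers
        help="Providers to test",
    )
    parser.add_argument("--diarize", action="store_true",
                        help="Enable speaker diarization")
    parser.add_argument("--keyterms", nargs="*", default=[],
                        help="Terms to boost (Deepgram only)")
    parser.add_argument("--language", default="en", help="Language code")
    parser.add_argument("--show-words", action="store_true",
                        help="Show individual word timings")
    parser.add_argument("--save", action="store_true",
                        help="Save results to JSON in results/")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```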

Directory Structure

evals/stt/
├── audio/              # Audio test files
│   └── multi_speaker.m4a
├── results/            # Saved benchmark results (JSON)
├── providers/          # STT provider implementations
│   ├── base.py         # Base classes
│   ├── deepgram_provider.py
│   └── speechmatics_provider.py
├── benchmark.py        # Main runner script
└── README.md

Output Example

Provider: DEEPGRAM
Duration: 45.32s
Speakers detected: 2 - ['0', '1']

Transcript:
Hello, welcome to the demo...

--- Speaker Segments ---
[0.0s] Speaker 0: Hello, welcome to the demo.
[2.5s] Speaker 1: Thanks for having me.
...
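
The speaker segments above can be derived from word-level results by merging consecutive words that share a speaker label. The sketch below shows one way to do that; the `Word` type and `speaker_segments` function are hypothetical and may not match the result classes the providers actually return.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level result: text, start time (s), speaker label."""
    text: str
    start: float
    speaker: str


def speaker_segments(words):
    """Merge consecutive same-speaker words into formatted segment lines."""
    segments = []  # each entry: (start_time, speaker, [word texts])
    for word in words:
        if segments and segments[-1][1] == word.speaker:
            # Same speaker as the previous word: extend the open segment.
            segments[-1][2].append(word.text)
        else:
            # Speaker changed (or first word): start a new segment.
            segments.append((word.start, word.speaker, [word.text]))
    return [
        f"[{start:.1f}s] Speaker {speaker}: {' '.join(texts)}"
        for start, speaker, texts in segments
    ]


if __name__ == "__main__":
    demo = [
        Word("Hello,", 0.0, "0"),
        Word("welcome", 0.4, "0"),
        Word("Thanks", 2.5, "1"),
    ]
    for line in speaker_segments(demo):
        print(line)
```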

Adding New Providers

Create a new file in providers/ (e.g., whisper_provider.py)
Subclass STTProvider and implement its abstract methods
Export the new class from providers/__init__.py
Add the provider's name to the --providers choices in benchmark.py
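
The steps above could look roughly like the sketch below. Only the `STTProvider` name comes from this README; the `transcribe` signature, the `TranscriptionResult` fields, the `EchoProvider` class, and the `PROVIDERS` registry are hypothetical stand-ins, so check `providers/base.py` for the real interface before copying anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TranscriptionResult:
    """Hypothetical result container; the real one may have other fields."""
    provider: str
    transcript: str
    speakers: list = field(default_factory=list)


class STTProvider(ABC):
    """Stand-in for the abstract base class (signature is an assumption)."""
    name = "base"

    @abstractmethod
    def transcribe(self, audio_path, *, diarize=False, language="en"):
        """Transcribe the file at audio_path and return a TranscriptionResult."""


class EchoProvider(STTProvider):
    """Placeholder provider that makes no API call; replace with a real one."""
    name = "echo"

    def transcribe(self, audio_path, *, diarize=False, language="en"):
        return TranscriptionResult(
            provider=self.name,
            transcript=f"(no transcript for {audio_path})",
        )


# Hypothetical registry that a runner could use to resolve provider names.
PROVIDERS = {EchoProvider.name: EchoProvider}

if __name__ == "__main__":
    print(PROVIDERS["echo"]().transcribe("audio/multi_speaker.m4a"))
```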

API Documentation

Deepgram Diarization: https://developers.deepgram.com/docs/diarization
Deepgram Keyterms: https://developers.deepgram.com/docs/keyterm
Speechmatics Diarization: https://docs.speechmatics.com/features/diarization