Commit graph

63 commits

Author SHA1 Message Date
Sabiha Khan
951e73a645
feat: add custom sarvam tts voice (#449)
* feat: add custom sarvam tts voice

* chore: refactor registry and add deepgram multi

---------

Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
2026-06-18 12:33:21 +05:30
Sabiha Khan
a2d9ed24ed
fix: add pace option in sarvam tts config (#447)
* fix: add pace option in sarvam tts config

* fix: generate client
2026-06-17 14:45:09 +05:30
Harshita Jain
e79cb42f31
feat: add Smallest AI TTS and STT provider integration (#444)
* feat: add Smallest AI TTS and STT provider integration

Integrates Smallest AI's Waves (TTS) and Pulse (STT) APIs as selectable
providers in the Dograh platform. Dograh's pipecat fork already contains
the pipecat-level service implementations; this wires them into the API
configuration registry and service factory.

- Added `SMALLEST = "smallest"` to `ServiceProviders` enum
- Registered `SmallestAITTSConfiguration` (lightning-v3.1/v2, voices,
  language, speed) and `SmallestAISTTConfiguration` (pulse model, 30+
  languages) Pydantic config classes with the TTS/STT registries
- Added factory branches in `create_tts_service` and `create_stt_service`
  routing to `SmallestTTSService` and `SmallestSTTService` from pipecat
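
A minimal sketch of the wiring described above, assuming a decorator-style registry. `ServiceProviders.SMALLEST`, `SmallestAITTSConfiguration`, `create_tts_service`, and `SmallestTTSService` are named in this commit; the registry dict, the `register_tts` helper, the dataclass fields, and the stub service class are hypothetical stand-ins, not the actual Dograh or pipecat code. The default model and voice use the values the later fixes in this PR settle on.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceProviders(str, Enum):
    OPENAI = "openai"
    SMALLEST = "smallest"  # value taken from the commit message


# Hypothetical registry: maps a provider to its configuration class.
TTS_REGISTRY = {}


def register_tts(provider):
    """Hypothetical helper that records a config class under a provider."""
    def decorator(cls):
        TTS_REGISTRY[provider] = cls
        return cls
    return decorator


@register_tts(ServiceProviders.SMALLEST)
@dataclass
class SmallestAITTSConfiguration:
    api_key: str
    model: str = "lightning_v3.1"  # name after the v4 naming fix
    voice: str = "sophia"          # default after the voice fix
    speed: float = 1.0             # assumed default


class SmallestTTSService:
    """Stub standing in for the pipecat service class."""
    def __init__(self, api_key, model, voice, speed):
        self.api_key = api_key
        self.model = model
        self.voice = voice
        self.speed = speed


def create_tts_service(provider, config):
    # One factory branch per provider; unknown providers fail loudly.
    if provider == ServiceProviders.SMALLEST:
        return SmallestTTSService(
            config.api_key, config.model, config.voice, config.speed
        )
    raise ValueError(f"Unsupported TTS provider: {provider}")
```

Usage under these assumptions: `create_tts_service(ServiceProviders.SMALLEST, SmallestAITTSConfiguration(api_key="..."))` returns a service configured with the registered defaults.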

* fix: update Smallest AI models to v4 naming convention

- TTS: rename lightning-v3.1 → lightning_v3.1, add lightning_v3.1_pro, drop deprecated lightning-v2
- STT: keep pulse only (pulse-pro is not a streaming model)

* fix: change default TTS voice from emily to sophia for lightning_v3.1

emily is not a verified lightning_v3.1 voice; sophia is the pipecat
SmallestTTSService default and confirmed to work with the standard pool.

* fix: replace 9 invalid lightning_v3.1 voice IDs with verified ones

jasmine, james, michael, aria, lara, asel, sarah, rishi, deepika do not
exist in the lightning_v3.1 voice catalog. Replaced with avery, liam,
lucas, olivia, freya, devansh, maya, dhruv, maithili — all verified
against the API.

* fix: smallest ai config validation and tts model compatibility

* chore: ruff fix

* chore: updated smallest ai schema in openapi.json

---------

Co-authored-by: Sabiha Khan <sabihak89@gmail.com>
Co-authored-by: Sabiha Khan <87858386+chewwbaka@users.noreply.github.com>
2026-06-17 12:55:53 +05:30
Abhishek Kumar
3d1886c450 feat: persist split user and bot audio 2026-06-16 15:19:49 +05:30
Abhishek Kumar
dd3f2e7323 feat: add huggingface inference provider endpoint 2026-06-15 22:56:01 +05:30
Abhishek
1f1149f4d5
feat: billing and credit management v2 (#429)
* feat: use mps generated correlation ID

* chore: update pipecat submodule

* feat: add credit purchase URL

* feat: carve out billing page and show credit ledger

* feat: deprecate dograh based quota tracking

* fix: remove cost calculation from dograh codebase

* fix: create mps account on migrate to v2

* chore: update pipecat
2026-06-12 14:55:30 +05:30
Manasseh
e79c3e26f0
feat: add Cartesia Sonic 3.5 TTS model (#423) 2026-06-10 15:18:13 +05:30
Abhishek Kumar
91ac460799 chore: finish renaming UserConfiguration 2026-06-09 16:30:03 +05:30
Abhishek
cdbd06c8d9
feat: add config v2 to simplify billing (#428)
* feat: add model config v2

* chore: centralize user org selection

* chore: move preferences to platform settings

* fix: decouple org preference and ai model preferences
2026-06-09 16:10:26 +05:30
Vishal Dhateria
7ba95c0fbe
feat: add Azure AI multi-provider support (TTS, STT, Embeddings, Realtime) (#381)
* feat: add Azure AI multi-provider support (TTS, STT, Embeddings, Realtime)

Enables Azure AI services across all model layers so users with Azure
credits can consolidate billing on a single provider.

- Voice (TTS): AzureSpeechTTSConfiguration via azure_speech provider
- Transcriber (STT): AzureSpeechSTTConfiguration via azure_speech provider
- Embedding: AzureOpenAIEmbeddingsConfiguration via azure provider
- Realtime: AzureRealtimeLLMConfiguration via azure_realtime provider

New files:
- api/services/pipecat/realtime/azure_realtime.py
- api/services/gen_ai/embedding/azure_openai_service.py
- api/tests/test_azure_speech_service_factory.py

The UI picks up all four providers automatically from the schema —
no frontend changes required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add validation for URL and params

---------

Co-authored-by: Vishal Dhateria <vishal@finela.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
2026-06-02 12:50:00 +05:30
developer603
8a4a2e25db
feat: allow overriding base URL of OpenAI STT and TTS (#377)
Mirrors the LLM treatment from #368 for the OpenAI STT and OpenAI TTS
providers. Users running OpenAI-compatible self-hosted services (vLLM,
Speaches, llama.cpp, custom proxies) can now point Dograh at them via
the OpenAI provider with `base_url`, instead of being forced onto the
Speaches provider as a workaround.

Changes:

* `registry.py` — add `base_url` field (default `https://api.openai.com/v1`)
  to `OpenAISTTConfiguration` and `OpenAITTSService`, identical in shape
  to the existing `OpenAILLMService.base_url` from #368.

* `service_factory.py` — in the OPENAI branches of `create_stt_service`
  and `create_tts_service`, lift `base_url` off the user config, run it
  through `_validate_runtime_service_url`, and forward it as a kwarg to
  `OpenAISTTService` / `OpenAITTSService` (both already accept it). Same
  pattern as the LLM branch.

* `test_user_configured_service_url_security.py` — adds four runtime
  validation tests covering private-IP rejection and localhost rejection
  in SaaS mode for both STT and TTS. Existing OSS-mode permissiveness
  is unchanged (DEPLOYMENT_MODE=oss skips the validator, as before).
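
The kind of check those tests exercise can be sketched roughly as follows. This is an illustration only: `validate_service_url` is a hypothetical stand-in for `_validate_runtime_service_url`, whose real signature and rules are not shown in this commit, and the `deployment_mode` parameter is an assumption modelled on `DEPLOYMENT_MODE`.

```python
import ipaddress
from urllib.parse import urlparse


def validate_service_url(url: str, deployment_mode: str = "saas") -> str:
    """Reject localhost and private-IP targets unless running in OSS mode."""
    if deployment_mode == "oss":
        return url  # OSS mode skips validation, as the commit describes

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")

    host = (parsed.hostname or "").lower()
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("localhost is not allowed in SaaS mode")

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP. A stricter validator might also resolve DNS
        # and check the resulting addresses; this sketch does not.
        return url

    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError(f"Private or loopback address not allowed: {host}")
    return url
```

With this sketch, `https://api.openai.com/v1` passes unchanged, while `http://10.0.0.5:8000/v1` and `http://localhost:8000/v1` raise `ValueError` unless `deployment_mode="oss"`.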

No schema migration needed — Pydantic populates the default; existing
configurations without `base_url` continue to talk to api.openai.com.

`check_validity.py` requires no edits because the per-service validation
loop already iterates `("base_url", "endpoint")` via `getattr`, and the
`_check_openai_api_key` dispatcher already routes OPENAI providers
through the base_url-aware code path (introduced in #368) for STT and
TTS too.

Tests pass locally:

    pytest api/tests/test_user_configured_service_url_security.py
    23 passed in 4.80s   (19 existing + 4 new)

Co-authored-by: developer603 <developer603@users.noreply.github.com>
2026-06-02 12:06:58 +05:30
Matt Van Horn
dd85c4a1b4
fix: support object and array parameters in custom HTTP tools (#373)
* fix: support object and array parameters in custom HTTP tools

* feat(ui): expose object and array types in the custom tool parameter editor

* fix: error handling and schema generation

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
2026-06-02 11:35:38 +05:30
Abhay Babbar
98d2b24cba
Add Sarvam LLM, update Sarvam STT models, expose usage_info on run detail (#351)
* Add Sarvam LLM provider, update Sarvam STT models, expose usage_info on run detail.
Depends on pipecat PR dograh-hq/pipecat#43 for STT string language support.
Submodule bump will follow after that merges.

* test: cover Sarvam STT language mapping; link Sarvam docs

---------

Co-authored-by: Sabiha Khan <sabihak89@gmail.com>
2026-06-01 10:29:31 +05:30
nuthalapativarun
5b61ad645f
feat: stamp API key into model override at save time to survive global provider change (#362)
* fix: stamp API key into model override at save time to survive global provider change

When a workflow overrides the TTS/LLM/STT provider to match the current
global config, the override dict only stores model/voice fields, not the
API key. If the global config later switches to a different provider, the
override can no longer inherit the API key and calls fail.

Fix: enrich_overrides_with_api_keys() copies the global provider's API
key (and other secret fields) into the override dict at workflow-save
time, making the override self-contained regardless of future global
config changes.
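
A rough sketch of that enrichment step, under stated assumptions: the function name comes from the commit, but the signature, the dict shapes, and the `SECRET_FIELDS` tuple are hypothetical and may differ from the real implementation.

```python
from copy import deepcopy

# Assumed list of secret fields; the commit mentions the API key
# "and other secret fields" without naming them.
SECRET_FIELDS = ("api_key",)


def enrich_overrides_with_api_keys(overrides: dict, global_config: dict) -> dict:
    """Copy secrets from the global config into matching overrides.

    An override is enriched only when its provider matches the provider
    currently set globally for the same service (e.g. "tts", "llm", "stt").
    """
    enriched = deepcopy(overrides)  # leave the caller's dict untouched
    for service, override in enriched.items():
        current = global_config.get(service, {})
        if override.get("provider") != current.get("provider"):
            continue  # different provider: nothing to inherit
        for field in SECRET_FIELDS:
            if field in current and field not in override:
                override[field] = current[field]
    return enriched
```

Because the key is copied at save time, a later switch of the global provider leaves the saved override usable on its own.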

* feat: add test coverage and masking logic

---------

Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
2026-05-27 14:01:14 +05:30
Abhishek
8a58b0992d
feat: allow overriding base URL of OpenAI models (#368)
* Add OpenAI-compatible API option in model configuration

Backend-only cherry-pick from 20617db37a.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: harden the base url settings in SaaS mode

---------

Co-authored-by: Chris Briddock <briddockchristopher@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-27 13:07:45 +05:30
Abhishek
3892b58486
feat: add ultravox realtime and fix signature issue in telephony (#345)
* feat: add ultravox realtime and fix signature issue in telephony

- Add UltraVox realtime
- Fix signature issue on telephony

* fix: fix regression for wss_backend_endpoint
2026-05-23 12:51:55 +05:30
Abhishek Kumar
9135c2da13 feat: add xai grok as realtime model 2026-05-22 18:04:59 +05:30
Abhishek Kumar
291264de7b Merge branch 'main' of https://github.com/dograh-hq/dograh 2026-05-22 14:36:54 +05:30
Abhishek Kumar
ad2fa07058 feat: add google stt and tts. add folders to organize agents 2026-05-22 14:36:50 +05:30
Octopus
0e0d3136ca
feat: add MiniMax provider support (Chat + TTS) (#309)
* feat: add MiniMax provider support (Chat + TTS)

- Add MiniMax LLM provider using OpenAI-compatible API
  - Models: MiniMax-M2.7, MiniMax-M2.7-highspeed
  - Default base URL: https://api.minimax.io/v1
  - Uses MINIMAX_API_KEY for authentication
- Add MiniMax TTS provider using Pipecat's MiniMaxHttpTTSService
  - Models: speech-2.8-hd (default), speech-2.8-turbo
  - 6 built-in voices
  - Requires group_id configuration
- Add unit tests for both providers

* fix(minimax): validator, temperature, session cleanup, reasoning filter
  - check_validity.py: wire MiniMax into _validator_map and enforce
    group_id at save time. Without this, saving a config with a valid
    key was rejected.
  - registry.py: surface temperature on the LLM config (gt=0; MiniMax
    rejects 0) and base_url on the TTS config
  - service_factory.py:
    * Plumb temperature through create_llm_service
    * Normalize TTS base_url to include /t2a_v2 — pipecat appends only
      ?GroupId=... to the URL.
    * Use the new MiniMaxLLMService (from pipecat) to strip
      <think>...</think> reasoning that MiniMax-M2.7 emits inline in
      delta.content (otherwise it leaks straight to TTS).
    * Use MiniMaxOwnedSessionTTSService so the per-instance aiohttp
      session gets closed in cleanup() instead of leaking sockets/FDs.
  - minimax_tts.py: small wrapper around MiniMaxHttpTTSService that owns
    the session it was handed (pipecat's caller-owns-session API
    conflicts with the factory's per-instance pattern).
  - pipecat submodule: bumps to a commit that adds MiniMaxLLMService — a
    thin OpenAILLMService subclass with the streaming <think> filter
    (mirrors NvidiaLLMService's pattern for NIM reasoning models).
  - Tests updated/added for all of the above.
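
  To illustrate the streaming `<think>` filter idea mentioned above, here is a
  self-contained sketch. It is not the pipecat `MiniMaxLLMService` code; the
  `ThinkFilter` class and its methods are hypothetical, and it assumes the tags
  arrive as plain text that may be split across streamed chunks.

```python
class ThinkFilter:
    """Drop <think>...</think> spans from a stream of text chunks."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self):
        self._buf = ""        # text not yet emitted (may hold a partial tag)
        self._inside = False  # True while inside a <think> block

    def feed(self, chunk: str) -> str:
        self._buf += chunk
        out = []
        while self._buf:
            tag = self.CLOSE if self._inside else self.OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                if not self._inside:
                    out.append(self._buf[:idx])
                self._buf = self._buf[idx + len(tag):]
                self._inside = not self._inside
                continue
            # No complete tag: hold back the longest suffix that could
            # still turn into the tag once the next chunk arrives.
            keep = 0
            for n in range(min(len(tag) - 1, len(self._buf)), 0, -1):
                if self._buf.endswith(tag[:n]):
                    keep = n
                    break
            cut = len(self._buf) - keep
            if not self._inside:
                out.append(self._buf[:cut])
            self._buf = self._buf[cut:]
            break
        return "".join(out)

    def flush(self) -> str:
        """Emit any held-back text at end of stream (unless mid-think)."""
        rest = "" if self._inside else self._buf
        self._buf = ""
        return rest
```

  Feeding the chunks `"Hi <th"`, `"ink>secret</thi"`, `"nk> there"` one at a
  time yields `"Hi "`, `""`, and `" there"`, so the reasoning text never
  reaches the TTS stage.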

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Sabiha Khan <sabihak89@gmail.com>
2026-05-22 13:09:41 +05:30
Mohamed-Mamdouh
5f28c1b2a9
feat: add Tuner Integration to Dograh (#311)
* Add tuner integration

* bump pipecat version

* chore: update pipecat submodule to match upstream and use tuner-pipecat-sdk 0.2.0

Update pipecat submodule from 0.0.109.dev23 to 13e98d0d9 (the exact commit
upstream dograh-hq/dograh uses after v1.30.1). This installs pipecat-ai as
1.1.0.post277 via setuptools_scm, satisfying tuner-pipecat-sdk 0.2.0's
pipecat-ai>=1.0.0 requirement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* wire tuner

* feat: refactor integrations into self contained packages

* chore: simplify ensure_public_access_token

* fix: remove NodeSpec and make DTOs the source of truth

* feat: send relevant signal to mcp using to_mcp_dict

* fix: fix tests

* cleanup: remove nango integrations

* feat: add agents.md for integrations

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
2026-05-20 14:37:33 +05:30
palinko91
afa78fe859
fix(stt): align Speechmatics language registry with official transcription codes (#317) 2026-05-19 19:00:38 +05:30
Abhishek
2381a803ad
feat: add openai realtime models (#298)
* feat: add openai realtime models

* chore: bump pipecat

* fix: resample telephony audio for openai realtime

* fix: sampling rate fix for openai realtime

* chore: clean up dead code
2026-05-16 18:05:23 +05:30
Abhishek
7f0dac1ad5
feat: configurable ElevenLabs base URL for Data Residency (#278)
* feat: configurable ElevenLabs base URL for Data Residency (#269)

Adds a `base_url` field to `ElevenlabsTTSConfiguration` so users on an
ElevenLabs Data Residency plan (EU, etc.) can point Dograh at the
regional endpoint instead of the hardcoded global one. Defaults to
`https://api.elevenlabs.io`, preserving existing behaviour. The
service factory rewrites the HTTP scheme to WSS when constructing the
WebSocket TTS service.
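
The scheme rewrite can be sketched like this. The helper name `to_websocket_url` is hypothetical; the commit only states that the factory rewrites the HTTP scheme to WSS, so the exact handling (including the `http` to `ws` mapping shown here) is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(base_url: str) -> str:
    """Map http -> ws and https -> wss, leaving host, path and query as-is."""
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit(
        (scheme, parts.netloc, parts.path, parts.query, parts.fragment)
    )
```

For the default `https://api.elevenlabs.io`, this produces `wss://api.elevenlabs.io`; a regional endpoint keeps its own host and gets the same scheme swap.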

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: fix drift

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:01:13 +05:30
Abhishek Kumar
4171ad7a54 feat: add test mode for API trigger 2026-04-25 16:30:26 +05:30
Abhishek
38d1d928b7
feat: agent versioning and model configurations override (#227)
* feat: add tests and migrations

* feat: workflow versioning among published and draft

* feat: add a new settings page to simplify workflow detail page

* fix: fix tsclient generation
2026-04-08 19:20:31 +05:30
Abhishek Kumar
e04ce4e852 chore: add language option for Rime 2026-04-07 18:32:09 +05:30
Abhishek Kumar
e255b33813 feat: add Rime TTS 2026-04-07 14:05:47 +05:30
Abhishek Kumar
c4c4b591db feat: add gladia stt support 2026-04-04 14:47:48 +05:30
Abhishek Kumar
501d06c00d feat: add Assembly AI STT 2026-04-03 07:10:37 +05:30
Abhishek Kumar
f368fe5134 feat: set calculator as custom tool on demand 2026-04-02 14:07:03 +05:30
Abhishek
87e72d5f6f
feat: add gemini live and speaches integration (#220)
* feat: add speaches models

* feat: add gemini realtime and speaches integration

- Add gemini realtime support
- Add speaches support for locally hosted LLMs

* chore: bump pipecat

* feat: add language option

* fix: add skip aggregator types to tts settings

* fix: make API key optional for realtime
2026-03-31 21:42:03 +05:30
Abhishek Kumar
83f05ab146 fix: send auth credentials with validate service keys 2026-03-27 00:07:38 +05:30
Abhishek Kumar
ac0731a374 feat: add support for self hosted llm models 2026-03-24 17:50:45 +05:30
neil from camb.ai
31e075d114
feat: add CAMB AI TTS integration (#187)
Co-authored-by: Abhishek <abhishek@a6k.me>
2026-03-24 12:54:07 +05:30
Abhishek Kumar
f8cf433ba3 feat: add speed configuration for cartesia 2026-03-23 21:51:16 +05:30
Abhishek Kumar
fe84f086ba feat: add AWS Bedrock support 2026-03-19 15:06:59 +05:30
Abhishek
494c60d774
feat: add hybrid text + recording functionality in agents (#191)
* feat: add recording feature in agents

* chore: pin pipecat version

* feat: show usage in UI

* chore: update pipecat
2026-03-16 15:04:08 +05:30
Abhishek Kumar
4d807266a7 feat: download campaign report 2026-03-11 17:57:04 +05:30
Abhishek
57e8768e0b
feat: allow multiple API keys (#186)
* feat: allow multiple API keys

* chore: cleanup

* chore: upgrade pipecat

* feat: make default api_key as list
2026-03-10 15:17:40 +05:30
Abhishek Kumar
e34e4f8f3c chore: upgrade pipecat 2026-03-06 16:49:14 +05:30
Abhishek
a836825b83
feat: add qa node in workflow builder (#172)
* feat: add qa node in workflow builder

* feat: add qa analysis token usage in usage_info

* fix: mask the API key in QA node

* feat: add advanced configuration in QA node
2026-02-25 13:53:30 +05:30
Abhishek Kumar
f1f4830012 fix: fix default voice of cartesia tts 2026-02-23 21:32:03 +05:30
Abhishek Kumar
e111cbb36d feat: add cartesia tts 2026-02-20 20:41:11 +05:30
Sabiha Khan
c711920165
feat: telephony call transfer (#155)
* transfer call

* fix: ignore completed call status

* chore: refactor telephony

* chore: refactor pipecat engine custom tools and other telephony services

* chore: code refactor

* chore: put back office ambient sound files

* chore: remove transport from engine

* fix: fix alembic revision

* chore: remove set_transferring_call from engine

* fix: send OutputAudio frame and let transport chunk it

* fix: reinstate docker compose

* chore: remove unused transfer-twmil route for caller

* chore: update pipecat submodule

---------

Co-authored-by: Abhishek Kumar <abhishek@a6k.me>
2026-02-16 14:33:33 +05:30
Abhishek Kumar
525601088a feat: add languages for deepgram and dograh 2026-02-13 11:44:57 +05:30
Abhishek Kumar
a75bc72cb5 feat: add sarvam v3 voices 2026-02-13 10:11:48 +05:30
Abhishek Kumar
4c936ae57d feat: add openrouter support 2026-02-09 13:31:32 +05:30
Abhishek
911c5ed416
fix: changes to update pipecat version to 0.0.100 (#122)
* feat: add stt evals

* add smart turn as provider

* chore: remove deprecations

* chore: format files

* fix: remove deprecated UserIdleProcessor

* fix: remove deprecated TranscriptProcessor

* chore: update pipecat submodule

* feat: add evals visualisation

* fix: trigger llm generation on client connected and pipeline started

* chore: update pipecat

* chore: update pipecat submodule

* Add tests

* fix: slow loading of workflow page

* chore: update pipecat submodule

* Show version after release

* Fixes #99

* fix: provider check for websocket connection

* Fixes #107

* Fix #96

* chore: fix documentation

* fix: cloudonix campaign call error

---------

Co-authored-by: Sabiha Khan <sabihak89@gmail.com>
2026-01-23 18:53:59 +05:30
Abhishek Kumar
c58aa557de feat: add voices in Dograh configuration 2026-01-19 14:52:54 +05:30