chore: update documentation

Abhishek Kumar 2026-06-19 18:11:35 +05:30
parent 7cc0467cfb
commit da4a8a005a
21 changed files with 314 additions and 179 deletions

@@ -15,5 +15,12 @@ Please note that you must copy and keep the API key secretly, since this is the
### Service Keys
Service Keys are keys you generate for use in [Model Configurations](inference-providers). To generate one, go to `/api-keys` and create a new key.
<Note>
You can use a Service Key created in Dograh Cloud (`https://app.dograh.com/api-keys`) in your self-hosted Dograh deployment. Create the Service Key from your Dograh Cloud account, then paste it into **Model Configurations** in your self-hosted instance to use Dograh-managed inference providers. You can purchase Dograh credits in Dograh Cloud; billing happens on the Cloud account that owns the Service Key.
</Note>
![Create a new Service Key](../images/service-keys.png)
<Warning>
Service Keys are scoped to the Dograh Cloud account that created them. You cannot use a Service Key from one cloud-hosted account in another cloud-hosted account; create a new Service Key from the account where you want to use it.
</Warning>

@@ -1,187 +1,93 @@
---
title: "Model Configurations"
description: "Voice Agents need AI Models to work, like LLM (Large Language Model), TTS (Voice) and STT (Transcriber). You can use any of your favourite providers with Dograh Platform to run your Voice Agent."
description: "Configure the speech-to-speech, Dograh-managed, or bring-your-own-key models your Dograh agents use."
---
## How Model Configuration Works
## How model configurations work
Dograh uses a **two-level configuration system** for AI models:
Model Configurations define the default AI model setup for your organization. Agents use this configuration unless you set agent-level model overrides in the agent settings.
1. **Global configuration** — A single set of model settings (LLM, TTS, STT) that applies to **all agents** by default.
2. **Agent-level overrides** — Optional per-agent settings that override the global configuration for specific services.
To configure models, open **Models** in your Dograh dashboard:
If no overrides are set for an agent, it uses the global configuration as-is.
<Note>
Agent-level overrides are **selective** — you can override only the services you want to change. For example, you can override just the LLM provider for a specific agent while keeping the global TTS and STT settings. There is no need to reconfigure every service.
</Note>
## Global Configuration
The global configuration is the default model setup shared across all your agents. Dograh ships with its own models by default — when you sign up on https://app.dograh.com or set up the platform on your self-hosted infrastructure, you get some Dograh model credits to start with.
To configure the global models, go to **Model Configurations** in your dashboard:
- **Hosted:** `https://app.dograh.com/model-configurations`
- **Self-hosted:** `http://localhost:3010/model-configurations`
- **Local development:** `http://localhost:3000/model-configurations`
![Model Configuration](../images/service-configuration.png)
The Models page has three top-level sections:
From here you can configure each service:
| Service | What it does |
|---------|-------------|
| **LLM** | The language model that generates responses (e.g., OpenAI GPT-4.1, Anthropic Claude) |
| **TTS (Voice)** | The text-to-speech model that converts responses to spoken audio (e.g., ElevenLabs, Cartesia) |
| **STT (Transcriber)** | The speech-to-text model that transcribes user speech (e.g., Deepgram, AssemblyAI) |
| **Realtime** | A single speech-to-speech model that handles LLM, TTS, and STT in one (e.g., Gemini Live) |
Select a provider from the dropdown and configure the API key, model, and any provider-specific settings. For Dograh's own models, see [Service Keys](api-keys) for instructions on creating Service Keys.
## Agent-Level Model Overrides
You can override the global model configuration for any individual agent. This is useful when different agents have different requirements — for example, a customer support agent might use a faster, cheaper LLM while a sales agent uses a more capable one.
### Configuring overrides
1. Open the agent you want to customize.
2. Go to **Settings** in the agent detail page.
3. Select the **Model Overrides** tab.
4. You will see tabs for each service: **LLM**, **Voice** (TTS), and **Transcriber** (STT).
5. Toggle **Override** on for the service you want to change.
6. Configure the provider, model, and other settings as needed.
7. Save your changes.
### Selective overrides
Each service can be toggled independently. When an override is **off** for a service, the agent inherits the global setting for that service. When an override is **on**, the agent uses the override setting instead.
| LLM Override | TTS Override | STT Override | Result |
|---|---|---|---|
| Off | Off | Off | Agent uses global config for all services |
| On | Off | Off | Agent uses custom LLM, global TTS and STT |
| Off | On | Off | Agent uses global LLM and STT, custom TTS |
| On | On | On | Agent uses custom config for all services |
For example, if you only want to change the voice for a specific agent:
1. Leave the LLM and Transcriber overrides **off**.
2. Toggle the Voice override **on**.
3. Select a different TTS provider or voice.
4. The agent will use your custom voice while still using the global LLM and STT.
### Realtime mode override
You can also switch an individual agent to use a **Realtime** provider (such as Gemini Live) even if the global configuration uses standard LLM + TTS + STT. Toggle the **Realtime** switch in the Model Overrides tab, then configure the realtime provider, model, and voice.
| Section | When to use it |
|---------|----------------|
| **Speech to Speech** | Use a realtime speech-to-speech model for the live conversation. You still configure an LLM alongside it for variable extraction and QA. |
| **Dograh** | Use Dograh-managed LLM, voice, and transcriber models behind one Dograh Service Key. |
| **BYOK** | Bring your own provider keys and configure LLM, Voice, Transcriber, and Embedding models separately. |
<Note>
When an agent uses a Realtime provider, it replaces the separate TTS and STT services with a single speech-to-speech model. An **LLM** is still required alongside the Realtime model — it's used for out-of-band tasks like variable extraction and QA analysis, which the realtime service does not handle. Context compaction is not applicable in Realtime mode and is ignored if enabled.
Model settings are organization-scoped. If no agent-level override is set, every agent in the organization uses the saved global configuration.
</Note>
## Gemini 3.1 Live
## Speech to Speech
Gemini 3.1 Live is Google's realtime multimodal API that handles both LLM and voice in a single model. Instead of configuring separate LLM, TTS, and STT services, Gemini Live acts as an all-in-one realtime provider — it processes speech input, generates a response, and speaks it back, all over a single streaming connection.
Use **Speech to Speech** when you want a realtime model to handle the live spoken conversation directly. In this mode, the realtime model handles speech input and speech output, so you do not configure separate Voice and Transcriber services.
Dograh supports Gemini 3.1 Live as a **Realtime** provider. The default model is `gemini-3.1-flash-live-preview`.
![Speech to Speech model configuration](../images/model-configuration-speech-to-speech.png)
### Available Voices
The Speech to Speech section has nested tabs:
You can choose from the following built-in voices:
| Tab | What to configure |
|-----|-------------------|
| **Realtime Model** | The speech-to-speech provider, model, voice, and API key. |
| **LLM** | A standard LLM used for non-realtime tasks such as variable extraction and QA analysis. |
| **Embedding** | An embedding model used by features that need embeddings, such as retrieval from knowledge base content. |
| Voice | Description |
|-------|-------------|
| Puck | Default voice |
| Charon | — |
| Kore | — |
| Fenrir | — |
| Aoede | — |
<Warning>
An LLM is still required when you use Speech to Speech. The realtime model handles the live voice conversation, but Dograh uses the LLM for analysis tasks that happen outside the live audio stream.
</Warning>
### Getting a Gemini API Key
## Dograh
To use Gemini 3.1 Live with Dograh, you need a Google Gemini API key. Follow these steps:
Use **Dograh** when you want Dograh to manage the model providers for you. This path uses one Dograh Service Key for Dograh-managed models instead of separate provider keys for LLM, Voice, and Transcriber.
1. Go to [Google AI Studio](https://aistudio.google.com/).
2. Sign in with your Google account.
3. Click on **Get API Key** in the left sidebar.
4. Click **Create API Key**.
5. Select an existing Google Cloud project or create a new one.
6. Copy the generated API key and store it securely.
![Dograh model configuration](../images/model-configuration-dograh.png)
<Note>
The Gemini API key is different from a Google Cloud service account key. You specifically need a **Gemini API key** from Google AI Studio for use with Dograh.
</Note>
Configure:
### Configuring Gemini 3.1 Live in Dograh
| Field | What it controls |
|-------|------------------|
| **Voice** | The Dograh-managed voice to use. |
| **Speed** | The voice playback speed. |
| **Language** | The language behavior, including multilingual auto-detect when available. |
| **API Key** | Your Dograh Service Key. Create Service Keys from **Developers**. |
1. Go to **Model Configurations** in your Dograh dashboard (`https://app.dograh.com/model-configurations` for hosted or `http://localhost:3010/model-configurations` for self-hosted).
2. Under the **Realtime** section, select `google_realtime` as the provider.
3. Paste your Gemini API key.
4. Select the model (`gemini-3.1-flash-live-preview` is available by default, or you can enter a model name manually).
5. Choose a voice from the dropdown (default is `Puck`).
6. Select the language (currently `en` is supported).
For details on creating and using Service Keys, see [API Keys and Service Keys](api-keys#service-keys).
<Note>
When using a Realtime provider like Gemini Live, you do not need to configure separate TTS and STT services — the realtime model handles speech in and out. However, you **must** still configure an **LLM** under the LLM tab: it powers variable extraction and QA analysis, which the realtime service does not perform.
</Note>
## BYOK
## Gemini Live on Vertex AI
Use **BYOK** when you want to bring your own provider accounts and API keys. This gives you separate control over each model category.
If you want to run Gemini Live through your own Google Cloud project — for billing consolidation, VPC controls, regional residency, or enterprise IAM — Dograh also supports Gemini Live via **Vertex AI** as a separate provider (`google_vertex_realtime`). The default model is `google/gemini-live-2.5-flash-native-audio`.
![BYOK model configuration](../images/model-configuration-byok.png)
Unlike Google AI Studio (which uses a single Gemini API key), Vertex AI authenticates with a **service account** belonging to your Google Cloud project.
The BYOK section has nested tabs:
### Prerequisites
| Tab | What to configure |
|-----|-------------------|
| **LLM** | The chat or reasoning model provider, model, optional base URL, and API key. |
| **Voice** | The text-to-speech provider, voice, model, speed, optional base URL, and API key. |
| **Transcriber** | The speech-to-text provider, model, language, and API key. |
| **Embedding** | The embedding provider, model, and API key. |
1. A Google Cloud project with billing enabled.
2. The Vertex AI API enabled on that project:
Provider-specific fields appear only when they apply. For example, OpenAI-compatible LLM providers can expose a **Base URL** field, ElevenLabs voices can expose a voice ID, and transcribers can expose language options.
```bash
gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
```
## Agent-level model overrides
3. A service account with the **Vertex AI User** role (`roles/aiplatform.user`) on the project:
You can override the organization model configuration for an individual agent. This is useful when different agents need different models, voices, transcribers, or providers.
```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:YOUR_SA@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
```
To configure an override:
4. A **JSON** key for that service account (P12 keys are not supported).
1. Open the agent.
2. Go to **Settings**.
3. Open **Model Overrides**.
4. Enable the override for the service you want to customize.
5. Configure the provider, model, and keys for that service.
6. Save the agent settings.
### Creating the service account key
1. In the GCP Console, go to **IAM & Admin → Service Accounts**.
2. Pick an existing service account (or create a new one).
3. Open the **Keys** tab → **Add Key → Create new key**.
4. Choose **JSON** as the key type and click **Create**.
5. The key file will download to your computer — store it securely and treat it as a secret.
<Note>
Always pick **JSON**, not P12. The Vertex AI client libraries used by Dograh only accept service-account JSON keys; P12 is a legacy format retained for older Google Workspace integrations.
</Note>
### Configuring Vertex AI Realtime in Dograh
1. Go to **Model Configurations** in your Dograh dashboard.
2. Enable the **Realtime** toggle.
3. Under the **Realtime** section, select `google_vertex_realtime` as the provider.
4. Fill in the fields:
| Field | What to put in |
|---|---|
| **Model** | Vertex publisher/model id, e.g. `google/gemini-live-2.5-flash-native-audio` |
| **Voice** | One of the built-in voices (Puck, Charon, Kore, Fenrir, Aoede) |
| **Language** | BCP-47 code (e.g. `en-US`) |
| **Project Id** | The `project_id` value from your service-account JSON |
| **Location** | GCP region where the model is available (e.g. `us-east4`) |
| **Credentials** | Paste the **entire contents** of the service-account JSON file |
| **API Key** | Leave blank — Vertex AI does not use API keys |
5. Save the configuration.
<Note>
Paste the whole JSON file into the **Credentials** field — including `private_key`, `client_email`, and all other entries. Don't try to extract individual fields. If `Credentials` is left blank, Dograh falls back to **Application Default Credentials (ADC)** from the host environment, which is useful when running Dograh on a GCP VM or GKE pod with an attached service account.
</Note>
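Before pasting the key into **Credentials**, a quick local check can confirm that the file is complete. The sketch below is not part of Dograh and does not reflect how Dograh validates credentials. The helper name `check_service_account_json` is hypothetical, and the field list is an assumption based on the standard layout of a Google service-account JSON key.

```python
import json

# Fields a Google service-account JSON key normally carries (assumed minimum set).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(text):
    """Return a list of problems; an empty list means the key looks complete."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A P12 export or another credential type would not carry this marker.
    if data.get("type") and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems


# Illustrative key with placeholder values and the private key left out.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "sa@my-project.iam.gserviceaccount.com",
})
print(check_service_account_json(sample))  # ['missing field: private_key']
```

An empty list only means the file has the expected shape. It does not confirm that the key is active or that the service account holds `roles/aiplatform.user`.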
<Note>
IAM changes can take up to ~60 seconds to propagate. If you see `Permission 'aiplatform.endpoints.predict' denied`, wait a minute and retry — or double-check that the role was granted to the same service account whose JSON you pasted.
</Note>
Agent-level overrides are selective. For example, you can override only the Voice service for one agent while it continues to use the organization-level LLM and Transcriber configuration.
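
The selective behaviour described above can be pictured as a per-service fallback. The following is a minimal sketch of that idea, not Dograh's implementation: `ServiceConfig`, `resolve_models`, the service keys, and the model names are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    provider: str
    model: str


# Hypothetical service keys; Dograh's internal names may differ.
SERVICES = ("llm", "tts", "stt")


def resolve_models(org_config, overrides):
    """Return the effective config per service for one agent.

    A service present in `overrides` wins; any other service falls back
    to the organization-level configuration.
    """
    effective = {}
    for service in SERVICES:
        override = overrides.get(service)
        effective[service] = override if override is not None else org_config[service]
    return effective


# Organization-level defaults (model names are placeholders).
org_config = {
    "llm": ServiceConfig("openai", "gpt-4.1"),
    "tts": ServiceConfig("elevenlabs", "example-voice-model"),
    "stt": ServiceConfig("deepgram", "example-stt-model"),
}

# This agent overrides only the Voice (TTS) service.
agent_overrides = {"tts": ServiceConfig("cartesia", "example-voice-model")}

effective = resolve_models(org_config, agent_overrides)
print(effective["tts"].provider)  # cartesia
print(effective["llm"].provider)  # openai
print(effective["stt"].provider)  # deepgram
```

Resolving each service independently means an agent never has to restate settings it does not change.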

Binary files not shown: three images with an "After" version only (55 KiB, 40 KiB, 68 KiB) and one image with a "Before" version only (90 KiB).