feat: add openai realtime models (#298)

* feat: add openai realtime models
* chore: bump pipecat
* fix: resample telephony audio for openai realtime
* fix: sampling rate fix for openai realtime
* chore: clean up dead code
2026-07-25 12:01:04 +02:00 · 2026-05-16 18:05:23 +05:30 · 2026-05-16 18:05:23 +05:30 · 2381a803ad
commit 2381a803ad
parent 45b00cd5d0
45 changed files with 1991 additions and 173 deletions
--- a/docs/api-reference/openapi.json
+++ b/docs/api-reference/openapi.json
--- a/docs/configurations/inference-providers.mdx
+++ b/docs/configurations/inference-providers.mdx
@@ -120,4 +120,68 @@ To use Gemini 3.1 Live with Dograh, you need a Google Gemini API key. Follow the

 <Note>
  When using a Realtime provider like Gemini Live, you do not need to configure separate TTS and STT services — the realtime model handles speech in and out. However, you **must** still configure an **LLM** under the LLM tab: it powers variable extraction and QA analysis, which the realtime service does not perform.
+</Note>
+
+## Gemini Live on Vertex AI
+
+If you want to run Gemini Live through your own Google Cloud project — for billing consolidation, VPC controls, regional residency, or enterprise IAM — Dograh also supports Gemini Live via **Vertex AI** as a separate provider (`google_vertex_realtime`). The default model is `google/gemini-live-2.5-flash-native-audio`.
+
+Unlike Google AI Studio (which uses a single Gemini API key), Vertex AI authenticates with a **service account** belonging to your Google Cloud project.
+
+### Prerequisites
+
+1. A Google Cloud project with billing enabled.
+2. The Vertex AI API enabled on that project:
+
+   ```bash
+   gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
+   ```
+
+3. A service account with the **Vertex AI User** role (`roles/aiplatform.user`) on the project:
+
+   ```bash
+   gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
+     --member="serviceAccount:YOUR_SA@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
+     --role="roles/aiplatform.user"
+   ```
+
+4. A **JSON** key for that service account (P12 keys are not supported).
+
+### Creating the service account key
+
+1. In the GCP Console, go to **IAM & Admin → Service Accounts**.
+2. Pick an existing service account (or create a new one).
+3. Open the **Keys** tab → **Add Key → Create new key**.
+4. Choose **JSON** as the key type and click **Create**.
+5. The key file will download to your computer — store it securely and treat it as a secret.
+
+<Note>
+  Always pick **JSON**, not P12. The Vertex AI client libraries used by Dograh only accept service-account JSON keys; P12 is a legacy format retained for older Google Workspace integrations.
+</Note>
+
+### Configuring Vertex AI Realtime in Dograh
+
+1. Go to **Model Configurations** in your Dograh dashboard.
+2. Enable the **Realtime** toggle.
+3. Under the **Realtime** section, select `google_vertex_realtime` as the provider.
+4. Fill in the fields:
+
+   | Field | What to put in |
+   |---|---|
+   | **Model** | Vertex publisher/model id, e.g. `google/gemini-live-2.5-flash-native-audio` |
+   | **Voice** | One of the built-in voices (Puck, Charon, Kore, Fenrir, Aoede) |
+   | **Language** | BCP-47 code (e.g. `en-US`) |
+   | **Project Id** | The `project_id` value from your service-account JSON |
+   | **Location** | GCP region where the model is available (e.g. `us-east4`) |
+   | **Credentials** | Paste the **entire contents** of the service-account JSON file |
+   | **API Key** | Leave blank — Vertex AI does not use API keys |
+
+5. Save the configuration.
+
+<Note>
+  Paste the whole JSON file into the **Credentials** field — including `private_key`, `client_email`, and all other entries. Don't try to extract individual fields. If `Credentials` is left blank, Dograh falls back to **Application Default Credentials (ADC)** from the host environment, which is useful when running Dograh on a GCP VM or GKE pod with an attached service account.
+</Note>
+
+<Note>
+  IAM changes are not instant: they usually propagate within about two minutes and occasionally take longer. If you see `Permission 'aiplatform.endpoints.predict' denied`, wait a few minutes and retry, or double-check that the role was granted to the same service account whose JSON you pasted.
 </Note>
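
Note on the **Credentials** field described above: a pasted key can be sanity-checked before it is saved. The sketch below is only an illustration, assuming the standard layout of a Google service-account JSON key (`type`, `project_id`, `private_key`, `client_email`). `check_service_account_json` is a hypothetical helper and is not part of Dograh or of this commit.

```python
import json
from typing import List

# Fields present in a standard Google service-account JSON key.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw: str) -> List[str]:
    """Hypothetical helper: list problems in a pasted key (empty list = looks usable)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    # A key of another credential type (e.g. authorized_user) is not a service-account key.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r} (expected 'service_account')")
    return problems
```

An empty result only means the file has the expected shape. It does not prove that the key is valid or that the service account holds `roles/aiplatform.user`.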
--- a/docs/integrations/telephony/custom.mdx
+++ b/docs/integrations/telephony/custom.mdx
@@ -193,7 +193,6 @@ If your provider POSTs webhooks to Dograh (answer URL, status callbacks, hangup
 ```python
 # providers/your_provider/routes.py
 from fastapi import APIRouter, Request
-from api.services.telephony.factory import get_telephony_provider
 from api.services.telephony.status_processor import (
    StatusCallbackRequest,
    _process_status_update,
@@ -286,7 +285,7 @@ register(SPEC)
 | Field | Used by |
 | --- | --- |
 | `name` | Stored as the discriminator on every `TelephonyConfiguration` row and as the `WorkflowRunMode` value |
-| `provider_cls` | `factory.get_telephony_provider*` |
+| `provider_cls` | `factory.get_default_telephony_provider`, `get_telephony_provider_by_id`, `get_telephony_provider_for_run` |
 | `config_loader` | `factory._normalize_with_phone_numbers` (replaces the old if/elif chain) |
 | `transport_factory` | `run_pipeline_telephony` |
 | `audio_config` | `create_audio_config()` and `run_pipeline_telephony` |
@@ -375,7 +374,7 @@ For end-to-end testing, save your provider through the telephony-configurations

 ## Best Practices

-1. **Trust the registry** — never import another provider's class directly; resolve through `factory.get_telephony_provider*`.
+1. **Trust the registry** — never import another provider's class directly; resolve through the factory helpers (`get_default_telephony_provider`, `get_telephony_provider_by_id`, etc.).
 2. **Sensitive fields** — mark every credential field `sensitive=True` in `ProviderUIMetadata`. The save endpoint masks these on read and preserves the original when the client re-submits a masked value.
 3. **Inbound signature verification** — always validate inbound webhook signatures in `verify_inbound_signature`. Returning `True` when no signature header is present is acceptable; return `False` when a signature *is* present but invalid.
 4. **Transports load credentials lazily** — call `load_credentials_for_transport` with the `telephony_configuration_id` from the workflow run. Don't read the org's default config from `transport.py`.
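
Note on best practice 3: the accept/reject rule for inbound signatures can be sketched as follows. This is a minimal illustration, assuming an HMAC-SHA256 hex digest of the raw request body. The header name `x-signature`, the function signature, and the shared-secret scheme are all hypothetical; a real provider defines its own scheme, and Dograh's actual `verify_inbound_signature` interface may differ.

```python
import hashlib
import hmac
from typing import Mapping

SIGNATURE_HEADER = "x-signature"  # hypothetical header name, provider-specific in practice


def verify_inbound_signature(headers: Mapping[str, str], body: bytes, secret: str) -> bool:
    """Sketch of the policy: no signature -> accept; signature present -> must match."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    provided = lowered.get(SIGNATURE_HEADER)
    if provided is None:
        # No signature header at all: acceptable per the rule above.
        return True

    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(provided.strip().lower().encode("utf-8"), expected.encode("utf-8"))
```

The comparison runs on bytes so that a malformed, non-ASCII header value yields `False` instead of raising.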