feat: add openai realtime models (#298)

* feat: add openai realtime models

* chore: bump pipecat

* fix: resample telephony audio for openai realtime

* fix: sampling rate fix for openai realtime

* chore: clean up dead code
Abhishek 2026-05-16 18:05:23 +05:30 committed by GitHub
parent 45b00cd5d0
commit 2381a803ad
GPG key ID: B5690EEEBB952194
45 changed files with 1991 additions and 173 deletions



@@ -120,4 +120,68 @@ To use Gemini 3.1 Live with Dograh, you need a Google Gemini API key. Follow the
<Note>
When using a Realtime provider like Gemini Live, you do not need to configure separate TTS and STT services — the realtime model handles speech in and out. However, you **must** still configure an **LLM** under the LLM tab: it powers variable extraction and QA analysis, which the realtime service does not perform.
</Note>
## Gemini Live on Vertex AI
If you want to run Gemini Live through your own Google Cloud project — for billing consolidation, VPC controls, regional residency, or enterprise IAM — Dograh also supports Gemini Live via **Vertex AI** as a separate provider (`google_vertex_realtime`). The default model is `google/gemini-live-2.5-flash-native-audio`.
Unlike Google AI Studio (which uses a single Gemini API key), Vertex AI authenticates with a **service account** belonging to your Google Cloud project.
### Prerequisites
1. A Google Cloud project with billing enabled.
2. The Vertex AI API enabled on that project:
```bash
gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
```
3. A service account with the **Vertex AI User** role (`roles/aiplatform.user`) on the project:
```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:YOUR_SA@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
```
4. A **JSON** key for that service account (P12 keys are not supported).
### Creating the service account key
1. In the GCP Console, go to **IAM & Admin → Service Accounts**.
2. Pick an existing service account (or create a new one).
3. Open the **Keys** tab → **Add Key → Create new key**.
4. Choose **JSON** as the key type and click **Create**.
5. The key file will download to your computer — store it securely and treat it as a secret.
<Note>
Always pick **JSON**, not P12. The Vertex AI client libraries used by Dograh only accept service-account JSON keys; P12 is a legacy format retained for older Google Workspace integrations.
</Note>
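Before pasting a key into Dograh, it can help to confirm that the downloaded file really is a service-account JSON key. The check below is a minimal sketch and not part of Dograh: `check_service_account_key` is a hypothetical helper, and the field list reflects the usual layout of Google service-account key files.

```python
import json

# Fields normally present in a Google service-account JSON key.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> list[str]:
    """Return a list of problems found in the key text (empty if it looks valid)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Anything that is not JSON text (for example a P12 key) fails here.
        return ["not valid JSON (P12 keys are not supported)"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty list means the text has the expected shape; it does not prove that the key is still active or that the account holds the right role.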
### Configuring Vertex AI Realtime in Dograh
1. Go to **Model Configurations** in your Dograh dashboard.
2. Enable the **Realtime** toggle.
3. Under the **Realtime** section, select `google_vertex_realtime` as the provider.
4. Fill in the fields:
| Field | What to put in |
|---|---|
| **Model** | Vertex publisher/model id, e.g. `google/gemini-live-2.5-flash-native-audio` |
| **Voice** | One of the built-in voices (Puck, Charon, Kore, Fenrir, Aoede) |
| **Language** | BCP-47 code (e.g. `en-US`) |
| **Project Id** | The `project_id` value from your service-account JSON |
| **Location** | GCP region where the model is available (e.g. `us-east4`) |
| **Credentials** | Paste the **entire contents** of the service-account JSON file |
| **API Key** | Leave blank — Vertex AI does not use API keys |
5. Save the configuration.
<Note>
Paste the whole JSON file into the **Credentials** field — including `private_key`, `client_email`, and all other entries. Don't try to extract individual fields. If `Credentials` is left blank, Dograh falls back to **Application Default Credentials (ADC)** from the host environment, which is useful when running Dograh on a GCP VM or GKE pod with an attached service account.
</Note>
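The fallback described in the note can be pictured as follows. This is a sketch of the decision only, assuming that a whitespace-only field counts as "left blank"; `pick_credential_source` is a hypothetical name and not Dograh's actual code.

```python
import json
from typing import Optional, Tuple


def pick_credential_source(credentials_field: str) -> Tuple[str, Optional[dict]]:
    """Decide between pasted service-account JSON and ADC."""
    if not credentials_field or not credentials_field.strip():
        # Nothing pasted: rely on Application Default Credentials from the host.
        return "adc", None
    # Pasted text must be the full JSON key; malformed input raises ValueError.
    info = json.loads(credentials_field)
    return "service_account_json", info
```

With the `google-auth` library, the two branches would typically map to `google.oauth2.service_account.Credentials.from_service_account_info(info, scopes=[...])` and `google.auth.default()`, respectively.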
<Note>
IAM changes can take about 60 seconds to propagate. If you see `Permission 'aiplatform.endpoints.predict' denied`, wait a minute and retry, or double-check that the role was granted to the same service account whose JSON you pasted.
</Note>
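A script that exercises the configuration right after granting the role can absorb the propagation delay with a short retry loop. The following is an illustrative sketch, not Dograh behavior: `call_with_iam_retry` is a hypothetical helper, and it assumes the failure surfaces as an exception whose message contains `aiplatform.endpoints.predict`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_iam_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only when the error looks like the IAM permission denial."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            denied = "aiplatform.endpoints.predict" in str(exc)
            if not denied or attempt == attempts:
                raise  # unrelated error, or retries exhausted
            sleep(delay_seconds)  # give the IAM change time to propagate
    raise ValueError("attempts must be at least 1")
```

If the error persists after the retries, the role binding itself is the more likely cause, so the loop deliberately re-raises rather than waiting indefinitely.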


@@ -193,7 +193,6 @@ If your provider POSTs webhooks to Dograh (answer URL, status callbacks, hangup
```python
# providers/your_provider/routes.py
from fastapi import APIRouter, Request
from api.services.telephony.factory import get_telephony_provider
from api.services.telephony.status_processor import (
StatusCallbackRequest,
_process_status_update,
@@ -286,7 +285,7 @@ register(SPEC)
| Field | Used by |
| --- | --- |
| `name` | Stored as the discriminator on every `TelephonyConfiguration` row and as the `WorkflowRunMode` value |
-| `provider_cls` | `factory.get_telephony_provider*` |
+| `provider_cls` | `factory.get_default_telephony_provider`, `get_telephony_provider_by_id`, `get_telephony_provider_for_run` |
| `config_loader` | `factory._normalize_with_phone_numbers` (replaces the old if/elif chain) |
| `transport_factory` | `run_pipeline_telephony` |
| `audio_config` | `create_audio_config()` and `run_pipeline_telephony` |
@@ -375,7 +374,7 @@ For end-to-end testing, save your provider through the telephony-configurations
## Best Practices
-1. **Trust the registry** — never import another provider's class directly; resolve through `factory.get_telephony_provider*`.
+1. **Trust the registry** — never import another provider's class directly; resolve through the factory helpers (`get_default_telephony_provider`, `get_telephony_provider_by_id`, etc.).
2. **Sensitive fields** — mark every credential field `sensitive=True` in `ProviderUIMetadata`. The save endpoint masks these on read and preserves the original when the client re-submits a masked value.
3. **Inbound signature verification** — always validate inbound webhook signatures in `verify_inbound_signature`. Returning `True` when no signature header is present is acceptable; return `False` when a signature *is* present but invalid.
4. **Transports load credentials lazily** — call `load_credentials_for_transport` with the `telephony_configuration_id` from the workflow run. Don't read the org's default config from `transport.py`.
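
Practices 2 and 3 can be illustrated with a small sketch. Everything here is hypothetical: the `MASK` placeholder, the `merge_sensitive` helper, the standalone `verify_signature_example` function, and the HMAC-SHA256 hex signature scheme are assumptions made for illustration. A real provider defines its own signature format, and Dograh's masking logic and `verify_inbound_signature` signature may differ.

```python
import hashlib
import hmac
from typing import Optional

MASK = "********"  # assumed placeholder returned for sensitive fields on read


def merge_sensitive(stored: str, submitted: str) -> str:
    """Keep the stored secret when the client re-submits the masked value."""
    return stored if submitted == MASK else submitted


def verify_signature_example(body: bytes, signature: Optional[str], secret: str) -> bool:
    """Accept unsigned requests; reject requests whose signature does not match."""
    if not signature:
        return True  # no signature header present
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest performs a constant-time comparison
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The verification returns `False` only when a signature is supplied and fails to match, which mirrors the rule stated in practice 3.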