mirror of
https://github.com/dograh-hq/dograh.git
synced 2026-06-07 07:55:16 +02:00
feat: add openai realtime models (#298)
* feat: add openai realtime models
* chore: bump pipecat
* fix: resample telephony audio for openai realtime
* fix: sampling rate fix for openai realtime
* chore: clean up dead code
parent 45b00cd5d0
commit 2381a803ad
45 changed files with 1991 additions and 173 deletions
File diff suppressed because one or more lines are too long
|
|
@@ -120,4 +120,68 @@ To use Gemini 3.1 Live with Dograh, you need a Google Gemini API key. Follow the
|
|||
|
||||
<Note>
|
||||
When using a Realtime provider like Gemini Live, you do not need to configure separate TTS and STT services — the realtime model handles speech in and out. However, you **must** still configure an **LLM** under the LLM tab: it powers variable extraction and QA analysis, which the realtime service does not perform.
|
||||
</Note>
|
||||
|
||||
## Gemini Live on Vertex AI
|
||||
|
||||
If you want to run Gemini Live through your own Google Cloud project — for billing consolidation, VPC controls, regional residency, or enterprise IAM — Dograh also supports Gemini Live via **Vertex AI** as a separate provider (`google_vertex_realtime`). The default model is `google/gemini-live-2.5-flash-native-audio`.
|
||||
|
||||
Unlike Google AI Studio (which uses a single Gemini API key), Vertex AI authenticates with a **service account** belonging to your Google Cloud project.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
1. A Google Cloud project with billing enabled.
|
||||
2. The Vertex AI API enabled on that project:
|
||||
|
||||
```bash
gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
```
|
||||
|
||||
3. A service account with the **Vertex AI User** role (`roles/aiplatform.user`) on the project:
|
||||
|
||||
```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:YOUR_SA@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```
|
||||
|
||||
4. A **JSON** key for that service account (P12 keys are not supported).
|
||||
|
||||
### Creating the service account key
|
||||
|
||||
1. In the GCP Console, go to **IAM & Admin → Service Accounts**.
|
||||
2. Pick an existing service account (or create a new one).
|
||||
3. Open the **Keys** tab → **Add Key → Create new key**.
|
||||
4. Choose **JSON** as the key type and click **Create**.
|
||||
5. The key file will download to your computer — store it securely and treat it as a secret.
|
||||
|
||||
<Note>
|
||||
Always pick **JSON**, not P12. The Vertex AI client libraries used by Dograh only accept service-account JSON keys; P12 is a legacy format retained for older Google Workspace integrations.
|
||||
</Note>
|
||||
|
||||
### Configuring Vertex AI Realtime in Dograh
|
||||
|
||||
1. Go to **Model Configurations** in your Dograh dashboard.
|
||||
2. Enable the **Realtime** toggle.
|
||||
3. Under the **Realtime** section, select `google_vertex_realtime` as the provider.
|
||||
4. Fill in the fields:
|
||||
|
||||
| Field | What to put in |
|---|---|
| **Model** | Vertex publisher/model id, e.g. `google/gemini-live-2.5-flash-native-audio` |
| **Voice** | One of the built-in voices (Puck, Charon, Kore, Fenrir, Aoede) |
| **Language** | BCP-47 code (e.g. `en-US`) |
| **Project Id** | The `project_id` value from your service-account JSON |
| **Location** | GCP region where the model is available (e.g. `us-east4`) |
| **Credentials** | Paste the **entire contents** of the service-account JSON file |
| **API Key** | Leave blank (Vertex AI does not use API keys) |
|
||||
|
||||
5. Save the configuration.
|
||||
|
||||
<Note>
|
||||
Paste the whole JSON file into the **Credentials** field — including `private_key`, `client_email`, and all other entries. Don't try to extract individual fields. If `Credentials` is left blank, Dograh falls back to **Application Default Credentials (ADC)** from the host environment, which is useful when running Dograh on a GCP VM or GKE pod with an attached service account.
|
||||
</Note>
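
Before pasting, it can help to confirm that the downloaded file really is a complete service-account key and to read off the value for the **Project Id** field. The following is a minimal sketch using only the Python standard library; the helper name `extract_project_id` and the set of keys it checks are illustrative assumptions, not part of Dograh.

```python
import json

# Keys this sketch expects in a service-account JSON key (an illustrative subset).
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def extract_project_id(raw: str) -> str:
    """Return project_id if `raw` looks like a full service-account JSON key.

    Hypothetical helper for local sanity checks; raises ValueError otherwise.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"unexpected key type: {data['type']!r}")
    return data["project_id"]
```

For example, `extract_project_id(open("key.json").read())` would return the project id to enter in the form, assuming the key was saved as `key.json`.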
|
||||
|
||||
<Note>
|
||||
IAM changes can take up to ~60 seconds to propagate. If you see `Permission 'aiplatform.endpoints.predict' denied`, wait a minute and retry — or double-check that the role was granted to the same service account whose JSON you pasted.
|
||||
</Note>
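
The wait-and-retry advice above can be sketched as a small helper. This is only an illustration: `call_with_iam_retry` is a hypothetical name, and the built-in `PermissionError` stands in for whatever exception your client actually raises on a permission-denied response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_iam_retry(
    call: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying after a pause when permission is denied.

    PermissionError is a stand-in for the real client's permission error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except PermissionError:
            if attempt == attempts:
                raise  # still denied: likely a wrong role or service account
            sleep(delay_s)  # give the IAM change time to propagate
    raise AssertionError("unreachable")
```

If the call is still denied after the last attempt, the error is re-raised, which is the point at which to check that the role was granted to the same service account whose JSON was pasted.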
|
||||
|
|
@@ -193,7 +193,6 @@ If your provider POSTs webhooks to Dograh (answer URL, status callbacks, hangup
|
|||
```python
|
||||
# providers/your_provider/routes.py
|
||||
from fastapi import APIRouter, Request
|
||||
from api.services.telephony.factory import get_telephony_provider
|
||||
from api.services.telephony.status_processor import (
|
||||
StatusCallbackRequest,
|
||||
_process_status_update,
|
||||
|
|
@@ -286,7 +285,7 @@ register(SPEC)
|
|||
| Field | Used by |
| --- | --- |
| `name` | Stored as the discriminator on every `TelephonyConfiguration` row and as the `WorkflowRunMode` value |
| `provider_cls` | `factory.get_telephony_provider*` |
| `provider_cls` | `factory.get_default_telephony_provider`, `get_telephony_provider_by_id`, `get_telephony_provider_for_run` |
| `config_loader` | `factory._normalize_with_phone_numbers` (replaces the old if/elif chain) |
| `transport_factory` | `run_pipeline_telephony` |
| `audio_config` | `create_audio_config()` and `run_pipeline_telephony` |
|
||||
|
|
@@ -375,7 +374,7 @@ For end-to-end testing, save your provider through the telephony-configurations
|
|||
|
||||
## Best Practices
|
||||
|
||||
1. **Trust the registry** — never import another provider's class directly; resolve through `factory.get_telephony_provider*`.
|
||||
1. **Trust the registry** — never import another provider's class directly; resolve through the factory helpers (`get_default_telephony_provider`, `get_telephony_provider_by_id`, etc.).
|
||||
2. **Sensitive fields** — mark every credential field `sensitive=True` in `ProviderUIMetadata`. The save endpoint masks these on read and preserves the original when the client re-submits a masked value.
|
||||
3. **Inbound signature verification** — always validate inbound webhook signatures in `verify_inbound_signature`. Returning `True` when no signature header is present is acceptable; return `False` when a signature *is* present but invalid.
|
||||
4. **Transports load credentials lazily** — call `load_credentials_for_transport` with the `telephony_configuration_id` from the workflow run. Don't read the org's default config from `transport.py`.
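
The signature rule in item 3 (accept when no signature header is present, reject when one is present but invalid) can be sketched as follows. This is a hedged illustration only: the header name, the HMAC-SHA256 hex scheme, and the function name `verify_signature_sketch` are assumptions, and the real `verify_inbound_signature` signature and your provider's signing scheme may differ.

```python
import hashlib
import hmac
from typing import Mapping

# Hypothetical header name; real providers define their own.
SIGNATURE_HEADER = "x-provider-signature"


def verify_signature_sketch(
    headers: Mapping[str, str], body: bytes, secret: bytes
) -> bool:
    """Return True if the webhook is unsigned or correctly signed.

    Assumes lower-cased header keys and an HMAC-SHA256 hex digest of the body.
    """
    provided = headers.get(SIGNATURE_HEADER)
    if provided is None:
        return True  # no signature header present: acceptable per the guideline
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII header values.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

The constant-time comparison via `hmac.compare_digest` is a common choice here because it avoids leaking how many leading characters of a forged signature matched.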
|
||||
|
|
|
|||