mirror of
https://github.com/dograh-hq/dograh.git
synced 2026-06-07 07:55:16 +02:00
87 lines
3.6 KiB
Text
87 lines
3.6 KiB
Text
---
|
|
title: "How Dograh Works"
|
|
description: "The big picture — from API call to phone conversation to transcript"
|
|
---
|
|
|
|
Dograh is a platform for building and running voice AI agents. You define a conversation flow, connect a phone number, and Dograh handles the rest — transcribing the caller's speech (STT), generating intelligent responses (LLM), speaking them back in a natural voice (TTS), and returning structured results when the call ends.
|
|
|
|
## The core loop
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant U as Dashboard / Your App
|
|
participant API as Dograh API
|
|
participant STT as Transcriber (STT)
|
|
participant LLM as LLM Provider
|
|
participant TTS as Voice Synthesizer (TTS)
|
|
participant Tel as Telephony Provider
|
|
participant Cal as Caller / Contact
|
|
|
|
U->>API: Trigger call (dashboard or API)
|
|
API->>Tel: Initiate outbound call
|
|
Tel->>Cal: Phone rings
|
|
Cal-->>Tel: Answers
|
|
Tel-->>API: Raw audio stream
|
|
loop Conversation
|
|
API->>STT: Caller audio
|
|
STT-->>API: Transcribed text
|
|
API->>LLM: Transcript + agent prompt + context
|
|
LLM-->>API: Agent response text
|
|
API->>TTS: Response text
|
|
TTS-->>API: Synthesized audio
|
|
API->>Tel: Audio stream
|
|
Tel->>Cal: Agent speaks
|
|
end
|
|
API->>API: Extract context, run webhooks
|
|
API-->>U: Run record (transcript, recording, gathered data)
|
|
```
|
|
|
|
## Key components
|
|
|
|
**Workflows (Agents)**
|
|
The conversation logic. A workflow is a graph of nodes (conversation steps) connected by edges (conditional transitions). You define what the agent says, when it moves on, and what data it collects.
|
|
|
|
**Runs**
|
|
Every execution of a workflow creates a run. The run record holds the transcript, recording, extracted data, and cost information.
|
|
|
|
**Telephony**
|
|
The phone infrastructure. Dograh connects to your telephony provider (Twilio, Vonage, etc.) to place and receive calls. The audio streams between the caller and Dograh in real time.
|
|
|
|
**Transcriber (STT)**
|
|
Converts the caller's speech to text in real time. Dograh sends the audio stream to your configured speech-to-text provider and uses the transcript to drive both the LLM and the final run record.
|
|
|
|
**LLM Provider**
|
|
Processes the transcript and the active node's prompt to generate the agent's next response. It also evaluates edge conditions to decide when to move the conversation forward.
|
|
|
|
**Voice Synthesizer (TTS)**
|
|
Converts the LLM's text response to audio and streams it back to the caller. The choice of TTS provider and voice is configurable per agent.
|
|
|
|
## How it fits together
|
|
|
|
When you trigger a call:
|
|
|
|
1. Dograh instructs your telephony provider to dial the number
|
|
2. When the caller answers, a real-time audio pipeline opens
|
|
3. The caller's speech is transcribed by the STT provider
|
|
4. The transcript is sent to the LLM with the active node's prompt and conversation history
|
|
5. The LLM responds — the response is synthesized to audio by the TTS provider and streamed to the caller
|
|
6. When an edge condition is met, Dograh transitions to the next node
|
|
7. When an end node is reached, the call ends
|
|
8. Post-call: context is extracted, webhooks fire, the run record is saved
|
|
|
|
## Next steps
|
|
|
|
<CardGroup cols={2}>
|
|
<Card title="Workflows & Agents" href="/core-concepts/workflows-and-agents">
|
|
How the conversation graph works
|
|
</Card>
|
|
<Card title="Calls & Runs" href="/core-concepts/calls-and-runs">
|
|
The lifecycle of a call
|
|
</Card>
|
|
<Card title="Context & Variables" href="/core-concepts/context-and-variables">
|
|
How data flows through a conversation
|
|
</Card>
|
|
<Card title="Campaigns" href="/core-concepts/campaigns">
|
|
Running agents at scale
|
|
</Card>
|
|
</CardGroup>
|