mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-26 00:46:22 +02:00
Add AsyncAPI spec for websocket (#613)
* AsyncAPI for websocket docs * Delete old docs * Update docs/README.md to point to docs site * Add generated API docs
This commit is contained in:
parent
fce43ae035
commit
8a17375603
110 changed files with 8325 additions and 23324 deletions
357
specs/websocket/STREAMING.md
Normal file
357
specs/websocket/STREAMING.md
Normal file
|
|
@ -0,0 +1,357 @@
|
|||
# WebSocket Streaming Message Patterns
|
||||
|
||||
This document describes streaming behavior for TrustGraph WebSocket services.
|
||||
|
||||
## Overview
|
||||
|
||||
Many TrustGraph services support streaming responses, where a single request results in multiple response messages sent progressively over time. This enables:
|
||||
- Real-time output as it's generated
|
||||
- Lower latency for first results
|
||||
- Progressive UI updates
|
||||
- Better user experience for long-running operations
|
||||
|
||||
## Streaming Protocol
|
||||
|
||||
### Request ID Correlation
|
||||
|
||||
All streaming responses for a request share the same `id`:
|
||||
|
||||
```json
|
||||
// Single request
|
||||
{"id": "req-1", "service": "agent", "flow": "my-flow", "request": {...}}
|
||||
|
||||
// Multiple responses with same id
|
||||
{"id": "req-1", "response": {...}} // First chunk
|
||||
{"id": "req-1", "response": {...}} // Second chunk
|
||||
{"id": "req-1", "response": {...}} // Final chunk
|
||||
```
|
||||
|
||||
### Completion Indicators
|
||||
|
||||
Services use different fields to indicate the final message:
|
||||
|
||||
| Service | Completion Field | Final Value |
|
||||
|---------|-----------------|-------------|
|
||||
| agent | `end-of-dialog` | `true` |
|
||||
| document-rag | `end-of-stream` | `true` |
|
||||
| graph-rag | `end-of-stream` | `true` |
|
||||
| text-completion | `end-of-stream` | `true` |
|
||||
| prompt | `end-of-stream` | `true` |
|
||||
|
||||
## Streaming Services
|
||||
|
||||
### Agent Service
|
||||
|
||||
Agent service streams thought processes, actions, observations, and answers:
|
||||
|
||||
```json
|
||||
// Request
|
||||
{
|
||||
"id": "agent-1",
|
||||
"service": "agent",
|
||||
"flow": "my-flow",
|
||||
"request": {
|
||||
"question": "What is quantum computing?",
|
||||
"streaming": true
|
||||
}
|
||||
}
|
||||
|
||||
// Response stream
|
||||
{
|
||||
"id": "agent-1",
|
||||
"response": {
|
||||
"chunk-type": "thought",
|
||||
"content": "I need to explain quantum computing concepts",
|
||||
"end-of-dialog": false
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
"id": "agent-1",
|
||||
"response": {
|
||||
"chunk-type": "answer",
|
||||
"content": "Quantum computing is a type of computing that uses quantum mechanical phenomena...",
|
||||
"end-of-dialog": false
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
"id": "agent-1",
|
||||
"response": {
|
||||
"chunk-type": "answer",
|
||||
"content": "Key principles include superposition and entanglement.",
|
||||
"end-of-dialog": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Chunk Types**:
|
||||
- `thought`: Internal reasoning
|
||||
- `action`: Tool/action being invoked
|
||||
- `observation`: Result from tool/action
|
||||
- `answer`: Final answer content
|
||||
|
||||
### Document RAG Service
|
||||
|
||||
Document RAG streams answer chunks:
|
||||
|
||||
```json
|
||||
// Request
|
||||
{
|
||||
"id": "rag-1",
|
||||
"service": "document-rag",
|
||||
"flow": "my-flow",
|
||||
"request": {
|
||||
"query": "What are the main features?",
|
||||
"streaming": true,
|
||||
"doc-limit": 20
|
||||
}
|
||||
}
|
||||
|
||||
// Response stream
|
||||
{
|
||||
"id": "rag-1",
|
||||
"response": {
|
||||
"content": "The main features include: 1) ",
|
||||
"end-of-stream": false
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
"id": "rag-1",
|
||||
"response": {
|
||||
"content": "Knowledge graph storage, 2) Vector embeddings, ",
|
||||
"end-of-stream": false
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
"id": "rag-1",
|
||||
"response": {
|
||||
"content": "3) RAG capabilities.",
|
||||
"end-of-stream": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Graph RAG Service
|
||||
|
||||
Similar to Document RAG but retrieves from knowledge graph:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "graph-rag-1",
|
||||
"service": "graph-rag",
|
||||
"flow": "my-flow",
|
||||
"request": {
|
||||
"query": "What entities are related to quantum computing?",
|
||||
"streaming": true,
|
||||
"triple-limit": 100
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response stream has same structure as Document RAG.
|
||||
|
||||
### Text Completion Service
|
||||
|
||||
Streams generated text:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "complete-1",
|
||||
"service": "text-completion",
|
||||
"flow": "my-flow",
|
||||
"request": {
|
||||
"prompt": "Once upon a time",
|
||||
"streaming": true,
|
||||
"max-output-tokens": 100
|
||||
}
|
||||
}
|
||||
|
||||
// Response stream
|
||||
{
|
||||
"id": "complete-1",
|
||||
"response": {
|
||||
"content": " there was a ",
|
||||
"end-of-stream": false
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
"id": "complete-1",
|
||||
"response": {
|
||||
"content": "kingdom far away...",
|
||||
"end-of-stream": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Prompt Service
|
||||
|
||||
Streams prompt expansion/generation:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "prompt-1",
|
||||
"service": "prompt",
|
||||
"flow": "my-flow",
|
||||
"request": {
|
||||
"template": "default-template",
|
||||
"variables": {"topic": "quantum"},
|
||||
"streaming": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response stream contains progressive prompt text.
|
||||
|
||||
## Non-Streaming Services
|
||||
|
||||
These services return a single response message:
|
||||
|
||||
- **config**: Configuration operations
|
||||
- **flow**: Flow lifecycle management
|
||||
- **librarian**: Library operations
|
||||
- **knowledge**: Knowledge graph operations
|
||||
- **collection-management**: Collection metadata
|
||||
- **embeddings**: Generate embeddings
|
||||
- **mcp-tool**: Tool invocation
|
||||
- **triples**: Triple pattern queries
|
||||
- **objects**: GraphQL queries
|
||||
- **nlp-query**: NLP-based queries
|
||||
- **structured-query**: Structured queries
|
||||
- **structured-diag**: Diagnostics
|
||||
- **graph-embeddings**: Embedding-based graph search
|
||||
- **document-embeddings**: Embedding-based document search
|
||||
- **text-load**: Text loading (returns status)
|
||||
- **document-load**: Document loading (returns status)
|
||||
|
||||
## Client Implementation Guide
|
||||
|
||||
### Basic Streaming Handler
|
||||
|
||||
```javascript
|
||||
const pendingRequests = new Map();
|
||||
|
||||
// Send request
|
||||
const requestId = generateUniqueId();
|
||||
const request = {
|
||||
id: requestId,
|
||||
service: 'agent',
|
||||
flow: 'my-flow',
|
||||
request: {
|
||||
question: 'What is quantum computing?',
|
||||
streaming: true
|
||||
}
|
||||
};
|
||||
|
||||
pendingRequests.set(requestId, {
|
||||
chunks: [],
|
||||
complete: false
|
||||
});
|
||||
|
||||
websocket.send(JSON.stringify(request));
|
||||
|
||||
// Handle responses
|
||||
websocket.onmessage = (event) => {
|
||||
const message = JSON.parse(event.data);
|
||||
|
||||
if (message.error) {
|
||||
// Handle error
|
||||
console.error(`Request ${message.id} failed:`, message.error);
|
||||
pendingRequests.delete(message.id);
|
||||
return;
|
||||
}
|
||||
|
||||
const pending = pendingRequests.get(message.id);
|
||||
if (!pending) {
|
||||
console.warn(`Unexpected response for ${message.id}`);
|
||||
return;
|
||||
}
|
||||
|
||||
// Accumulate chunk
|
||||
pending.chunks.push(message.response);
|
||||
|
||||
// Check if complete
|
||||
const isComplete =
|
||||
message.response['end-of-stream'] === true ||
|
||||
message.response['end-of-dialog'] === true;
|
||||
|
||||
if (isComplete) {
|
||||
pending.complete = true;
|
||||
console.log(`Request ${message.id} complete:`, pending.chunks);
|
||||
pendingRequests.delete(message.id);
|
||||
} else {
|
||||
// Process intermediate chunk
|
||||
console.log(`Chunk for ${message.id}:`, message.response);
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
## Error Handling in Streaming
|
||||
|
||||
Errors can occur at any point during streaming:
|
||||
|
||||
```json
|
||||
// Mid-stream error
|
||||
{
|
||||
"id": "req-1",
|
||||
"response": {
|
||||
"chunk-type": "thought",
|
||||
"content": "Processing...",
|
||||
"end-of-dialog": false
|
||||
}
|
||||
}
|
||||
|
||||
// Error interrupts stream
|
||||
{
|
||||
"id": "req-1",
|
||||
"error": {
|
||||
"type": "service-error",
|
||||
"message": "Backend timeout"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
When an error occurs, no further response messages will be sent for that request ID. The client should:
|
||||
1. Stop waiting for completion
|
||||
2. Handle the partial results appropriately
|
||||
3. Clean up request state
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Multiplexing Streaming Requests
|
||||
|
||||
Multiple streaming requests can be active simultaneously:
|
||||
|
||||
```json
|
||||
{"id": "req-1", "service": "agent", ...}
|
||||
{"id": "req-2", "service": "document-rag", ...}
|
||||
{"id": "req-3", "service": "text-completion", ...}
|
||||
|
||||
// Responses may interleave
|
||||
{"id": "req-2", "response": {...}}
|
||||
{"id": "req-1", "response": {...}}
|
||||
{"id": "req-3", "response": {...}}
|
||||
{"id": "req-1", "response": {...}}
|
||||
{"id": "req-2", "response": {...}}
|
||||
```
|
||||
|
||||
### Backpressure
|
||||
|
||||
If the client is slow to consume streaming responses, the WebSocket connection may experience:
|
||||
- Buffering on the server side
|
||||
- Increased latency
|
||||
- Potential connection issues
|
||||
|
||||
Clients should process streaming chunks efficiently or implement flow control.
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always check completion flags**: Don't assume a fixed number of chunks
|
||||
2. **Handle partial results**: Be prepared for errors mid-stream
|
||||
3. **Unique request IDs**: Ensure IDs are unique across active requests
|
||||
4. **Timeout handling**: Implement client-side timeouts for streaming requests
|
||||
5. **Memory management**: Don't accumulate unbounded chunks; process incrementally
|
||||
6. **User feedback**: Show progressive results to users as chunks arrive
|
||||
Loading…
Add table
Add a link
Reference in a new issue