# WebSocket Streaming Message Patterns
This document describes streaming behavior for TrustGraph WebSocket services.
## Overview
Many TrustGraph services support streaming responses, where a single request results in multiple response messages sent progressively over time. This enables:
- Real-time output as it's generated
- Lower latency for first results
- Progressive UI updates
- Better user experience for long-running operations
## Streaming Protocol
### Request ID Correlation
All streaming responses for a request share the same `id`:
```json
// Single request
{"id": "req-1", "service": "agent", "flow": "my-flow", "request": {...}}
// Multiple responses with same id
{"id": "req-1", "response": {...}} // First chunk
{"id": "req-1", "response": {...}} // Second chunk
{"id": "req-1", "response": {...}} // Final chunk
```
### Completion Indicators
Services use different fields to indicate the final message:
| Service | Completion Field | Final Value |
|---------|-----------------|-------------|
| agent | `end-of-dialog` | `true` |
| document-rag | `end-of-stream` | `true` |
| graph-rag | `end-of-stream` | `true` |
| text-completion | `end-of-stream` | `true` |
| prompt | `end-of-stream` | `true` |
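Because the completion field differs by service, client code can treat a chunk as final when either flag is set. A minimal sketch (the helper name `isFinalChunk` is illustrative, not part of the protocol):

```javascript
// Returns true when a response message is the final one for its request.
// The agent service uses end-of-dialog; other streaming services use
// end-of-stream. A missing flag means the stream is still open.
function isFinalChunk(response) {
  return response['end-of-stream'] === true ||
         response['end-of-dialog'] === true;
}
```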
## Streaming Services
### Agent Service
Agent service streams thought processes, actions, observations, and answers:
```json
// Request
{
  "id": "agent-1",
  "service": "agent",
  "flow": "my-flow",
  "request": {
    "question": "What is quantum computing?",
    "streaming": true
  }
}
// Response stream
{
  "id": "agent-1",
  "response": {
    "chunk-type": "thought",
    "content": "I need to explain quantum computing concepts",
    "end-of-dialog": false
  }
}
{
  "id": "agent-1",
  "response": {
    "chunk-type": "answer",
    "content": "Quantum computing is a type of computing that uses quantum mechanical phenomena...",
    "end-of-dialog": false
  }
}
{
  "id": "agent-1",
  "response": {
    "chunk-type": "answer",
    "content": "Key principles include superposition and entanglement.",
    "end-of-dialog": true
  }
}
```
**Chunk Types**:
- `thought`: Internal reasoning
- `action`: Tool/action being invoked
- `observation`: Result from tool/action
- `answer`: Final answer content
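A client might dispatch agent chunks to per-type handlers, for example to render thoughts and answers differently in a UI. A sketch, with hypothetical handler names (this dispatch pattern is an assumption, not mandated by the protocol):

```javascript
// Routes one agent response to the handler registered for its chunk-type.
// Unknown chunk types are silently skipped so new types don't break clients.
// Returns true when the dialog is complete.
function routeAgentChunk(response, handlers) {
  const handler = handlers[response['chunk-type']];
  if (handler) handler(response.content);
  return response['end-of-dialog'] === true;
}
```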
### Document RAG Service
Document RAG streams answer chunks:
```json
// Request
{
  "id": "rag-1",
  "service": "document-rag",
  "flow": "my-flow",
  "request": {
    "query": "What are the main features?",
    "streaming": true,
    "doc-limit": 20
  }
}
// Response stream
{
  "id": "rag-1",
  "response": {
    "content": "The main features include: 1) ",
    "end-of-stream": false
  }
}
{
  "id": "rag-1",
  "response": {
    "content": "Knowledge graph storage, 2) Vector embeddings, ",
    "end-of-stream": false
  }
}
{
  "id": "rag-1",
  "response": {
    "content": "3) RAG capabilities.",
    "end-of-stream": true
  }
}
```
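Since each chunk's `content` field is a fragment of one answer, the full text is recovered by concatenating the fragments in arrival order. A minimal sketch (the helper name is illustrative):

```javascript
// Joins a sequence of RAG response chunks into the complete answer text.
// Chunks must be passed in the order they arrived.
function joinRagChunks(chunks) {
  return chunks.map((c) => c.content).join('');
}
```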
### Graph RAG Service
Similar to Document RAG, but retrieves context from the knowledge graph:
```json
{
  "id": "graph-rag-1",
  "service": "graph-rag",
  "flow": "my-flow",
  "request": {
    "query": "What entities are related to quantum computing?",
    "streaming": true,
    "triple-limit": 100
  }
}
```
The response stream has the same structure as the Document RAG stream.
### Text Completion Service
Streams generated text:
```json
{
  "id": "complete-1",
  "service": "text-completion",
  "flow": "my-flow",
  "request": {
    "prompt": "Once upon a time",
    "streaming": true,
    "max-output-tokens": 100
  }
}
// Response stream
{
  "id": "complete-1",
  "response": {
    "content": " there was a ",
    "end-of-stream": false
  }
}
{
  "id": "complete-1",
  "response": {
    "content": "kingdom far away...",
    "end-of-stream": true
  }
}
```
### Prompt Service
Streams prompt expansion/generation:
```json
{
  "id": "prompt-1",
  "service": "prompt",
  "flow": "my-flow",
  "request": {
    "template": "default-template",
    "variables": {"topic": "quantum"},
    "streaming": true
  }
}
```
The response stream contains the progressively generated prompt text.
## Non-Streaming Services
These services return a single response message:
- **config**: Configuration operations
- **flow**: Flow lifecycle management
- **librarian**: Library operations
- **knowledge**: Knowledge graph operations
- **collection-management**: Collection metadata
- **embeddings**: Generate embeddings
- **mcp-tool**: Tool invocation
- **triples**: Triple pattern queries
- **objects**: GraphQL queries
- **nlp-query**: NLP-based queries
- **structured-query**: Structured queries
- **structured-diag**: Diagnostics
- **graph-embeddings**: Embedding-based graph search
- **document-embeddings**: Embedding-based document search
- **text-load**: Text loading (returns status)
- **document-load**: Document loading (returns status)
## Client Implementation Guide
### Basic Streaming Handler
```javascript
const pendingRequests = new Map();

// Send request
const requestId = generateUniqueId();
const request = {
  id: requestId,
  service: 'agent',
  flow: 'my-flow',
  request: {
    question: 'What is quantum computing?',
    streaming: true
  }
};
pendingRequests.set(requestId, {
  chunks: [],
  complete: false
});
websocket.send(JSON.stringify(request));

// Handle responses
websocket.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.error) {
    // Handle error
    console.error(`Request ${message.id} failed:`, message.error);
    pendingRequests.delete(message.id);
    return;
  }
  const pending = pendingRequests.get(message.id);
  if (!pending) {
    console.warn(`Unexpected response for ${message.id}`);
    return;
  }
  // Accumulate chunk
  pending.chunks.push(message.response);
  // Check if complete
  const isComplete =
    message.response['end-of-stream'] === true ||
    message.response['end-of-dialog'] === true;
  if (isComplete) {
    pending.complete = true;
    console.log(`Request ${message.id} complete:`, pending.chunks);
    pendingRequests.delete(message.id);
  } else {
    // Process intermediate chunk
    console.log(`Chunk for ${message.id}:`, message.response);
  }
};
```
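A handler like this waits indefinitely: if the stream stalls and no final chunk or error ever arrives, the pending entry leaks. A sketch of client-side timeout handling, where the names `trackRequest` and `settle` are illustrative assumptions, not part of the protocol:

```javascript
// Registers a pending streaming request with a client-side timeout.
// If no completion arrives within timeoutMs, the entry is removed and
// onTimeout is invoked; calling settle() cancels the timer.
function trackRequest(pendingRequests, requestId, timeoutMs, onTimeout) {
  const timer = setTimeout(() => {
    if (pendingRequests.has(requestId)) {
      pendingRequests.delete(requestId);
      onTimeout(requestId);
    }
  }, timeoutMs);
  pendingRequests.set(requestId, {
    chunks: [],
    complete: false,
    // Call when the final chunk or an error arrives.
    settle: () => clearTimeout(timer),
  });
}
```

The `onmessage` handler would call `settle()` wherever it currently deletes the entry, so a completed or failed request never fires its timeout.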
## Error Handling in Streaming
Errors can occur at any point during streaming:
```json
// Mid-stream error
{
  "id": "req-1",
  "response": {
    "chunk-type": "thought",
    "content": "Processing...",
    "end-of-dialog": false
  }
}
// Error interrupts stream
{
  "id": "req-1",
  "error": {
    "type": "service-error",
    "message": "Backend timeout"
  }
}
```
When an error occurs, no further response messages will be sent for that request ID. The client should:
1. Stop waiting for completion
2. Handle the partial results appropriately
3. Clean up request state
## Performance Considerations
### Multiplexing Streaming Requests
Multiple streaming requests can be active simultaneously:
```json
{"id": "req-1", "service": "agent", ...}
{"id": "req-2", "service": "document-rag", ...}
{"id": "req-3", "service": "text-completion", ...}
// Responses may interleave
{"id": "req-2", "response": {...}}
{"id": "req-1", "response": {...}}
{"id": "req-3", "response": {...}}
{"id": "req-1", "response": {...}}
{"id": "req-2", "response": {...}}
```
### Backpressure
If the client is slow to consume streaming responses, the WebSocket connection may experience:
- Buffering on the server side
- Increased latency
- Potential connection issues
Clients should process streaming chunks efficiently or implement flow control.
## Best Practices
1. **Always check completion flags**: Don't assume a fixed number of chunks
2. **Handle partial results**: Be prepared for errors mid-stream
3. **Unique request IDs**: Ensure IDs are unique across active requests
4. **Timeout handling**: Implement client-side timeouts for streaming requests
5. **Memory management**: Don't accumulate unbounded chunks; process incrementally
6. **User feedback**: Show progressive results to users as chunks arrive
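For the memory-management point above, rather than storing every chunk, a client can fold content into a bounded buffer and process it incrementally. A minimal sketch (the helper name and the truncation policy are illustrative assumptions):

```javascript
// Accumulates chunk content up to a character budget; further content
// is dropped and the truncated flag is set so the UI can indicate it.
function makeAccumulator(maxChars) {
  let text = '';
  let truncated = false;
  return {
    push(chunk) {
      const content = chunk.content ?? '';
      if (text.length + content.length > maxChars) {
        text += content.slice(0, maxChars - text.length);
        truncated = true;
      } else {
        text += content;
      }
    },
    result: () => ({ text, truncated }),
  };
}
```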