trustgraph/specs/websocket/STREAMING.md

WebSocket Streaming Message Patterns

This document describes streaming behavior for TrustGraph WebSocket services.

Overview

Many TrustGraph services support streaming responses, where a single request results in multiple response messages sent progressively over time. This enables:

  • Real-time output as it's generated
  • Lower latency for first results
  • Progressive UI updates
  • Better user experience for long-running operations

Streaming Protocol

Request ID Correlation

All streaming responses for a request share the same id:

// Single request
{"id": "req-1", "service": "agent", "flow": "my-flow", "request": {...}}

// Multiple responses with same id
{"id": "req-1", "response": {...}}  // First chunk
{"id": "req-1", "response": {...}}  // Second chunk
{"id": "req-1", "response": {...}}  // Final chunk

Completion Indicators

Services use different fields to indicate the final message:

| Service         | Completion Field | Final Value |
|-----------------|------------------|-------------|
| agent           | end-of-dialog    | true        |
| document-rag    | end-of-stream    | true        |
| graph-rag       | end-of-stream    | true        |
| text-completion | end-of-stream    | true        |
| prompt          | end-of-stream    | true        |
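The two completion fields can be folded into a single client-side check. A minimal sketch (the helper name isFinalChunk is ours, not part of the spec):

```javascript
// Returns true when a response chunk is the final message for its request,
// covering both completion fields used across services.
function isFinalChunk(response) {
  return response["end-of-stream"] === true ||
         response["end-of-dialog"] === true;
}
```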

Streaming Services

Agent Service

Agent service streams thought processes, actions, observations, and answers:

// Request
{
  "id": "agent-1",
  "service": "agent",
  "flow": "my-flow",
  "request": {
    "question": "What is quantum computing?",
    "streaming": true
  }
}

// Response stream
{
  "id": "agent-1",
  "response": {
    "chunk-type": "thought",
    "content": "I need to explain quantum computing concepts",
    "end-of-dialog": false
  }
}

{
  "id": "agent-1",
  "response": {
    "chunk-type": "answer",
    "content": "Quantum computing is a type of computing that uses quantum mechanical phenomena...",
    "end-of-dialog": false
  }
}

{
  "id": "agent-1",
  "response": {
    "chunk-type": "answer",
    "content": "Key principles include superposition and entanglement.",
    "end-of-dialog": true
  }
}

Chunk Types:

  • thought: Internal reasoning
  • action: Tool/action being invoked
  • observation: Result from tool/action
  • answer: Final answer content
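A client can dispatch on these chunk types with a small lookup table. A sketch (the function and handler names are illustrative, not part of the spec):

```javascript
// Routes one agent response chunk to a per-type handler, if one is
// registered, and reports whether the dialog has ended.
function routeAgentChunk(chunk, handlers) {
  const handler = handlers[chunk["chunk-type"]];
  if (handler) handler(chunk.content);
  return chunk["end-of-dialog"] === true;
}
```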

Document RAG Service

Document RAG streams answer chunks:

// Request
{
  "id": "rag-1",
  "service": "document-rag",
  "flow": "my-flow",
  "request": {
    "query": "What are the main features?",
    "streaming": true,
    "doc-limit": 20
  }
}

// Response stream
{
  "id": "rag-1",
  "response": {
    "content": "The main features include: 1) ",
    "end-of-stream": false
  }
}

{
  "id": "rag-1",
  "response": {
    "content": "Knowledge graph storage, 2) Vector embeddings, ",
    "end-of-stream": false
  }
}

{
  "id": "rag-1",
  "response": {
    "content": "3) RAG capabilities.",
    "end-of-stream": true
  }
}
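Once the final chunk arrives, the full answer is simply the concatenation of the content fields in order. A sketch using the chunk objects shown above (assembleAnswer is our own name):

```javascript
// Joins accumulated streaming chunks back into the complete answer text.
function assembleAnswer(chunks) {
  return chunks.map((c) => c.content).join("");
}
```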

Graph RAG Service

Similar to Document RAG, but retrieves context from the knowledge graph:

{
  "id": "graph-rag-1",
  "service": "graph-rag",
  "flow": "my-flow",
  "request": {
    "query": "What entities are related to quantum computing?",
    "streaming": true,
    "triple-limit": 100
  }
}

The response stream has the same structure as the Document RAG stream.

Text Completion Service

Streams generated text:

{
  "id": "complete-1",
  "service": "text-completion",
  "flow": "my-flow",
  "request": {
    "prompt": "Once upon a time",
    "streaming": true,
    "max-output-tokens": 100
  }
}

// Response stream
{
  "id": "complete-1",
  "response": {
    "content": " there was a ",
    "end-of-stream": false
  }
}

{
  "id": "complete-1",
  "response": {
    "content": "kingdom far away...",
    "end-of-stream": true
  }
}

Prompt Service

Streams prompt expansion/generation:

{
  "id": "prompt-1",
  "service": "prompt",
  "flow": "my-flow",
  "request": {
    "template": "default-template",
    "variables": {"topic": "quantum"},
    "streaming": true
  }
}

The response stream contains the progressively generated prompt text.

Non-Streaming Services

These services return a single response message:

  • config: Configuration operations
  • flow: Flow lifecycle management
  • librarian: Library operations
  • knowledge: Knowledge graph operations
  • collection-management: Collection metadata
  • embeddings: Generate embeddings
  • mcp-tool: Tool invocation
  • triples: Triple pattern queries
  • objects: GraphQL queries
  • nlp-query: NLP-based queries
  • structured-query: Structured queries
  • structured-diag: Diagnostics
  • graph-embeddings: Embedding-based graph search
  • document-embeddings: Embedding-based document search
  • text-load: Text loading (returns status)
  • document-load: Document loading (returns status)
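For these services, a client can correlate each request with its single response using a Promise. A minimal sketch, assuming send is any function that transmits a JSON string (e.g. websocket.send); the names here are illustrative, not part of the spec:

```javascript
// Correlates non-streaming requests with their single responses by id.
function createRequester(send) {
  const pending = new Map();
  return {
    // Sends the request and returns a Promise for the response body.
    request(id, service, flow, body) {
      send(JSON.stringify({ id, service, flow, request: body }));
      return new Promise((resolve, reject) =>
        pending.set(id, { resolve, reject }));
    },
    // Feed each raw websocket message here; resolves or rejects by id.
    handleMessage(raw) {
      const msg = JSON.parse(raw);
      const p = pending.get(msg.id);
      if (!p) return;
      pending.delete(msg.id);
      if (msg.error) p.reject(msg.error);
      else p.resolve(msg.response);
    },
  };
}
```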

Client Implementation Guide

Basic Streaming Handler

const pendingRequests = new Map();

// Send request
const requestId = generateUniqueId();
const request = {
  id: requestId,
  service: 'agent',
  flow: 'my-flow',
  request: {
    question: 'What is quantum computing?',
    streaming: true
  }
};

pendingRequests.set(requestId, {
  chunks: [],
  complete: false
});

websocket.send(JSON.stringify(request));

// Handle responses
websocket.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.error) {
    // Handle error
    console.error(`Request ${message.id} failed:`, message.error);
    pendingRequests.delete(message.id);
    return;
  }

  const pending = pendingRequests.get(message.id);
  if (!pending) {
    console.warn(`Unexpected response for ${message.id}`);
    return;
  }

  // Accumulate chunk
  pending.chunks.push(message.response);

  // Check if complete
  const isComplete =
    message.response['end-of-stream'] === true ||
    message.response['end-of-dialog'] === true;

  if (isComplete) {
    pending.complete = true;
    console.log(`Request ${message.id} complete:`, pending.chunks);
    pendingRequests.delete(message.id);
  } else {
    // Process intermediate chunk
    console.log(`Chunk for ${message.id}:`, message.response);
  }
};

Error Handling in Streaming

Errors can occur at any point during streaming:

// Mid-stream error
{
  "id": "req-1",
  "response": {
    "chunk-type": "thought",
    "content": "Processing...",
    "end-of-dialog": false
  }
}

// Error interrupts stream
{
  "id": "req-1",
  "error": {
    "type": "service-error",
    "message": "Backend timeout"
  }
}

When an error occurs, no further response messages will be sent for that request ID. The client should:

  1. Stop waiting for completion
  2. Handle the partial results appropriately
  3. Clean up request state
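These three steps might be sketched as one cleanup function, assuming the pendingRequests map from the client guide above (failRequest is our own name):

```javascript
// On a mid-stream error: stop tracking the request, and hand back the
// error together with whatever partial chunks had already arrived.
function failRequest(pendingRequests, id, error) {
  const pending = pendingRequests.get(id);
  if (!pending) return null;            // unknown or already-finished request
  pendingRequests.delete(id);           // clean up request state
  return { error, partial: pending.chunks };  // surface partial results
}
```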

Performance Considerations

Multiplexing Streaming Requests

Multiple streaming requests can be active simultaneously:

{"id": "req-1", "service": "agent", ...}
{"id": "req-2", "service": "document-rag", ...}
{"id": "req-3", "service": "text-completion", ...}

// Responses may interleave
{"id": "req-2", "response": {...}}
{"id": "req-1", "response": {...}}
{"id": "req-3", "response": {...}}
{"id": "req-1", "response": {...}}
{"id": "req-2", "response": {...}}
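Because every message carries its request id, interleaved responses can be regrouped without any ordering assumptions. A sketch (groupById is an illustrative name):

```javascript
// Regroups an interleaved message sequence into per-request response lists,
// preserving arrival order within each request.
function groupById(messages) {
  const groups = new Map();
  for (const msg of messages) {
    if (!groups.has(msg.id)) groups.set(msg.id, []);
    groups.get(msg.id).push(msg.response);
  }
  return groups;
}
```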

Backpressure

If the client is slow to consume streaming responses, the WebSocket connection may experience:

  • Buffering on the server side
  • Increased latency
  • Potential connection issues

Clients should process streaming chunks efficiently or implement flow control.
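One possible form of client-side flow control (a sketch, not a spec requirement) is to queue incoming chunks and drain them outside the message handler, bounding memory by dropping the oldest chunks when over budget. Dropping is only one policy; aggregating chunks or pausing the UI are alternatives:

```javascript
// Bounded chunk queue: push() is cheap enough to call from onmessage;
// actual processing is deferred to a later event-loop turn.
function createChunkQueue(process, maxQueued = 1000) {
  const queue = [];
  let draining = false;
  function drain() {
    draining = true;
    while (queue.length > 0) process(queue.shift());
    draining = false;
  }
  return {
    push(chunk) {
      if (queue.length >= maxQueued) queue.shift(); // drop oldest over budget
      queue.push(chunk);
      if (!draining) setTimeout(drain, 0);          // defer off the handler
    },
  };
}
```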

Best Practices

  1. Always check completion flags: Don't assume a fixed number of chunks
  2. Handle partial results: Be prepared for errors mid-stream
  3. Unique request IDs: Ensure IDs are unique across active requests
  4. Timeout handling: Implement client-side timeouts for streaming requests
  5. Memory management: Don't accumulate unbounded chunks; process incrementally
  6. User feedback: Show progressive results to users as chunks arrive
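Practice 4 can be implemented as an idle timer that is reset on every chunk and cancelled on completion; a sketch (the names are illustrative):

```javascript
// Fires onTimeout if no chunk arrives within idleMs of the last one.
function createIdleTimer(idleMs, onTimeout) {
  let timer = setTimeout(onTimeout, idleMs);
  return {
    touch() {                        // call on every chunk received
      clearTimeout(timer);
      timer = setTimeout(onTimeout, idleMs);
    },
    cancel() {                       // call on completion or error
      clearTimeout(timer);
    },
  };
}
```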