Add AsyncAPI spec for websocket (#613)

* AsyncAPI for websocket docs * Delete old docs * Update docs/README.md to point to docs site * Add generated API docs
2026-04-26 00:46:22 +02:00 · 2026-01-15 11:57:16 +00:00 · 2026-01-15 11:57:16 +00:00 · 8a17375603
commit 8a17375603
parent fce43ae035
110 changed files with 8325 additions and 23324 deletions
--- a/specs/websocket/STREAMING.md
+++ b/specs/websocket/STREAMING.md
@ -0,0 +1,357 @@
+# WebSocket Streaming Message Patterns
+
+This document describes streaming behavior for TrustGraph WebSocket services.
+
+## Overview
+
+Many TrustGraph services support streaming responses, where a single request results in multiple response messages sent progressively over time. This enables:
+- Real-time output as it's generated
+- Lower latency for first results
+- Progressive UI updates
+- Better user experience for long-running operations
+
+## Streaming Protocol
+
+### Request ID Correlation
+
+All streaming responses for a request share the same `id`:
+
+```json
+// Single request
+{"id": "req-1", "service": "agent", "flow": "my-flow", "request": {...}}
+
+// Multiple responses with same id
+{"id": "req-1", "response": {...}}  // First chunk
+{"id": "req-1", "response": {...}}  // Second chunk
+{"id": "req-1", "response": {...}}  // Final chunk
+```
+
+### Completion Indicators
+
+Services use different fields to indicate the final message:
+
+| Service | Completion Field | Final Value |
+|---------|-----------------|-------------|
+| agent | `end-of-dialog` | `true` |
+| document-rag | `end-of-stream` | `true` |
+| graph-rag | `end-of-stream` | `true` |
+| text-completion | `end-of-stream` | `true` |
+| prompt | `end-of-stream` | `true` |
+
+## Streaming Services
+
+### Agent Service
+
+Agent service streams thought processes, actions, observations, and answers:
+
+```json
+// Request
+{
+  "id": "agent-1",
+  "service": "agent",
+  "flow": "my-flow",
+  "request": {
+    "question": "What is quantum computing?",
+    "streaming": true
+  }
+}
+
+// Response stream
+{
+  "id": "agent-1",
+  "response": {
+    "chunk-type": "thought",
+    "content": "I need to explain quantum computing concepts",
+    "end-of-dialog": false
+  }
+}
+
+{
+  "id": "agent-1",
+  "response": {
+    "chunk-type": "answer",
+    "content": "Quantum computing is a type of computing that uses quantum mechanical phenomena...",
+    "end-of-dialog": false
+  }
+}
+
+{
+  "id": "agent-1",
+  "response": {
+    "chunk-type": "answer",
+    "content": "Key principles include superposition and entanglement.",
+    "end-of-dialog": true
+  }
+}
+```
+
+**Chunk Types**:
+- `thought`: Internal reasoning
+- `action`: Tool/action being invoked
+- `observation`: Result from tool/action
+- `answer`: Final answer content
+
+### Document RAG Service
+
+Document RAG streams answer chunks:
+
+```json
+// Request
+{
+  "id": "rag-1",
+  "service": "document-rag",
+  "flow": "my-flow",
+  "request": {
+    "query": "What are the main features?",
+    "streaming": true,
+    "doc-limit": 20
+  }
+}
+
+// Response stream
+{
+  "id": "rag-1",
+  "response": {
+    "content": "The main features include: 1) ",
+    "end-of-stream": false
+  }
+}
+
+{
+  "id": "rag-1",
+  "response": {
+    "content": "Knowledge graph storage, 2) Vector embeddings, ",
+    "end-of-stream": false
+  }
+}
+
+{
+  "id": "rag-1",
+  "response": {
+    "content": "3) RAG capabilities.",
+    "end-of-stream": true
+  }
+}
+```
+
+### Graph RAG Service
+
+Similar to Document RAG but retrieves from knowledge graph:
+
+```json
+{
+  "id": "graph-rag-1",
+  "service": "graph-rag",
+  "flow": "my-flow",
+  "request": {
+    "query": "What entities are related to quantum computing?",
+    "streaming": true,
+    "triple-limit": 100
+  }
+}
+```
+
+Response stream has same structure as Document RAG.
+
+### Text Completion Service
+
+Streams generated text:
+
+```json
+{
+  "id": "complete-1",
+  "service": "text-completion",
+  "flow": "my-flow",
+  "request": {
+    "prompt": "Once upon a time",
+    "streaming": true,
+    "max-output-tokens": 100
+  }
+}
+
+// Response stream
+{
+  "id": "complete-1",
+  "response": {
+    "content": " there was a ",
+    "end-of-stream": false
+  }
+}
+
+{
+  "id": "complete-1",
+  "response": {
+    "content": "kingdom far away...",
+    "end-of-stream": true
+  }
+}
+```
+
+### Prompt Service
+
+Streams prompt expansion/generation:
+
+```json
+{
+  "id": "prompt-1",
+  "service": "prompt",
+  "flow": "my-flow",
+  "request": {
+    "template": "default-template",
+    "variables": {"topic": "quantum"},
+    "streaming": true
+  }
+}
+```
+
+Response stream contains progressive prompt text.
+
+## Non-Streaming Services
+
+These services return a single response message:
+
+- **config**: Configuration operations
+- **flow**: Flow lifecycle management
+- **librarian**: Library operations
+- **knowledge**: Knowledge graph operations
+- **collection-management**: Collection metadata
+- **embeddings**: Generate embeddings
+- **mcp-tool**: Tool invocation
+- **triples**: Triple pattern queries
+- **objects**: GraphQL queries
+- **nlp-query**: NLP-based queries
+- **structured-query**: Structured queries
+- **structured-diag**: Diagnostics
+- **graph-embeddings**: Embedding-based graph search
+- **document-embeddings**: Embedding-based document search
+- **text-load**: Text loading (returns status)
+- **document-load**: Document loading (returns status)
+
+## Client Implementation Guide
+
+### Basic Streaming Handler
+
+```javascript
+const pendingRequests = new Map();
+
+// Send request
+const requestId = generateUniqueId();
+const request = {
+  id: requestId,
+  service: 'agent',
+  flow: 'my-flow',
+  request: {
+    question: 'What is quantum computing?',
+    streaming: true
+  }
+};
+
+pendingRequests.set(requestId, {
+  chunks: [],
+  complete: false
+});
+
+websocket.send(JSON.stringify(request));
+
+// Handle responses
+websocket.onmessage = (event) => {
+  const message = JSON.parse(event.data);
+
+  if (message.error) {
+    // Handle error
+    console.error(`Request ${message.id} failed:`, message.error);
+    pendingRequests.delete(message.id);
+    return;
+  }
+
+  const pending = pendingRequests.get(message.id);
+  if (!pending) {
+    console.warn(`Unexpected response for ${message.id}`);
+    return;
+  }
+
+  // Accumulate chunk
+  pending.chunks.push(message.response);
+
+  // Check if complete
+  const isComplete =
+    message.response['end-of-stream'] === true ||
+    message.response['end-of-dialog'] === true;
+
+  if (isComplete) {
+    pending.complete = true;
+    console.log(`Request ${message.id} complete:`, pending.chunks);
+    pendingRequests.delete(message.id);
+  } else {
+    // Process intermediate chunk
+    console.log(`Chunk for ${message.id}:`, message.response);
+  }
+};
+```
+
+## Error Handling in Streaming
+
+Errors can occur at any point during streaming:
+
+```json
+// Mid-stream error
+{
+  "id": "req-1",
+  "response": {
+    "chunk-type": "thought",
+    "content": "Processing...",
+    "end-of-dialog": false
+  }
+}
+
+// Error interrupts stream
+{
+  "id": "req-1",
+  "error": {
+    "type": "service-error",
+    "message": "Backend timeout"
+  }
+}
+```
+
+When an error occurs, no further response messages will be sent for that request ID. The client should:
+1. Stop waiting for completion
+2. Handle the partial results appropriately
+3. Clean up request state
+
+## Performance Considerations
+
+### Multiplexing Streaming Requests
+
+Multiple streaming requests can be active simultaneously:
+
+```json
+{"id": "req-1", "service": "agent", ...}
+{"id": "req-2", "service": "document-rag", ...}
+{"id": "req-3", "service": "text-completion", ...}
+
+// Responses may interleave
+{"id": "req-2", "response": {...}}
+{"id": "req-1", "response": {...}}
+{"id": "req-3", "response": {...}}
+{"id": "req-1", "response": {...}}
+{"id": "req-2", "response": {...}}
+```
+
+### Backpressure
+
+If the client is slow to consume streaming responses, the WebSocket connection may experience:
+- Buffering on the server side
+- Increased latency
+- Potential connection issues
+
+Clients should process streaming chunks efficiently or implement flow control.
+
+## Best Practices
+
+1. **Always check completion flags**: Don't assume a fixed number of chunks
+2. **Handle partial results**: Be prepared for errors mid-stream
+3. **Unique request IDs**: Ensure IDs are unique across active requests
+4. **Timeout handling**: Implement client-side timeouts for streaming requests
+5. **Memory management**: Don't accumulate unbounded chunks; process incrementally
+6. **User feedback**: Show progressive results to users as chunks arrive