# WebSocket Streaming Message Patterns
This document describes streaming behavior for TrustGraph WebSocket services.
## Overview
Many TrustGraph services support streaming responses, where a single request results in multiple response messages sent progressively over time. This enables:
- Real-time output as it's generated
- Lower latency for first results
- Progressive UI updates
- Better user experience for long-running operations
## Streaming Protocol
### Request ID Correlation
All streaming responses for a request share the same `id`:
```json
// Single request
{"id": "req-1", "service": "agent", "flow": "my-flow", "request": {...}}
// Multiple responses with same id
{"id": "req-1", "response": {...}} // First chunk
{"id": "req-1", "response": {...}} // Second chunk
{"id": "req-1", "response": {...}} // Final chunk
```
### Completion Indicators
Services use different fields to indicate the final message:
| Service | Completion Field | Final Value |
|---------|-----------------|-------------|
| agent | `end-of-dialog` | `true` |
| document-rag | `end-of-stream` | `true` |
| graph-rag | `end-of-stream` | `true` |
| text-completion | `end-of-stream` | `true` |
| prompt | `end-of-stream` | `true` |
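Because the completion field differs by service, client code can treat a chunk as final when either flag is set. A minimal sketch (the helper name `isFinalChunk` is illustrative, not part of the protocol):

```javascript
// Returns true when a response message is the final one for its request.
// The agent service uses end-of-dialog; other streaming services use
// end-of-stream. A missing flag means the stream is still open.
function isFinalChunk(response) {
  return response['end-of-stream'] === true ||
         response['end-of-dialog'] === true;
}
```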
## Streaming Services
### Agent Service
Agent service streams thought processes, actions, observations, and answers:
```json
// Request
{
  "id": "agent-1",
  "service": "agent",
  "flow": "my-flow",
  "request": {
    "question": "What is quantum computing?",
    "streaming": true
  }
}
// Response stream
{
  "id": "agent-1",
  "response": {
    "chunk-type": "thought",
    "content": "I need to explain quantum computing concepts",
    "end-of-dialog": false
  }
}
{
  "id": "agent-1",
  "response": {
    "chunk-type": "answer",
    "content": "Quantum computing is a type of computing that uses quantum mechanical phenomena...",
    "end-of-dialog": false
  }
}
{
  "id": "agent-1",
  "response": {
    "chunk-type": "answer",
    "content": "Key principles include superposition and entanglement.",
    "end-of-dialog": true
  }
}
```
**Chunk Types**:
- `thought`: Internal reasoning
- `action`: Tool/action being invoked
- `observation`: Result from tool/action
- `answer`: Final answer content
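A client might dispatch agent chunks to per-type handlers, for example to render thoughts and answers differently in a UI. A sketch, with hypothetical handler names (this dispatch pattern is an assumption, not mandated by the protocol):

```javascript
// Routes one agent response to the handler registered for its chunk-type.
// Unknown chunk types are silently skipped so new types don't break clients.
// Returns true when the dialog is complete.
function routeAgentChunk(response, handlers) {
  const handler = handlers[response['chunk-type']];
  if (handler) handler(response.content);
  return response['end-of-dialog'] === true;
}
```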
### Document RAG Service
Document RAG streams answer chunks:
```json
// Request
{
  "id": "rag-1",
  "service": "document-rag",
  "flow": "my-flow",
  "request": {
    "query": "What are the main features?",
    "streaming": true,
    "doc-limit": 20
  }
}
// Response stream
{
  "id": "rag-1",
  "response": {
    "content": "The main features include: 1) ",
    "end-of-stream": false
  }
}
{
  "id": "rag-1",
  "response": {
    "content": "Knowledge graph storage, 2) Vector embeddings, ",
    "end-of-stream": false
  }
}
{
  "id": "rag-1",
  "response": {
    "content": "3) RAG capabilities.",
    "end-of-stream": true
  }
}
```
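Since each chunk's `content` field is a fragment of one answer, the full text is recovered by concatenating the fragments in arrival order. A minimal sketch (the helper name is illustrative):

```javascript
// Joins a sequence of RAG response chunks into the complete answer text.
// Chunks must be passed in the order they arrived.
function joinRagChunks(chunks) {
  return chunks.map((c) => c.content).join('');
}
```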
### Graph RAG Service
Similar to Document RAG, but retrieves context from the knowledge graph:
```json
{
  "id": "graph-rag-1",
  "service": "graph-rag",
  "flow": "my-flow",
  "request": {
    "query": "What entities are related to quantum computing?",
    "streaming": true,
    "triple-limit": 100
  }
}
```
The response stream has the same structure as the Document RAG stream.
### Text Completion Service
Streams generated text:
```json
{
  "id": "complete-1",
  "service": "text-completion",
  "flow": "my-flow",
  "request": {
    "prompt": "Once upon a time",
    "streaming": true,
    "max-output-tokens": 100
  }
}
// Response stream
{
  "id": "complete-1",
  "response": {
    "content": " there was a ",
    "end-of-stream": false
  }
}
{
  "id": "complete-1",
  "response": {
    "content": "kingdom far away...",
    "end-of-stream": true
  }
}
```
### Prompt Service
Streams prompt expansion/generation:
```json
{
  "id": "prompt-1",
  "service": "prompt",
  "flow": "my-flow",
  "request": {
    "template": "default-template",
    "variables": {"topic": "quantum"},
    "streaming": true
  }
}
```
The response stream contains the progressively generated prompt text.
## Non-Streaming Services
These services return a single response message:
- **config**: Configuration operations
- **flow**: Flow lifecycle management
- **librarian**: Library operations
- **knowledge**: Knowledge graph operations
- **collection-management**: Collection metadata
- **embeddings**: Generate embeddings
- **mcp-tool**: Tool invocation
- **triples**: Triple pattern queries
- **objects**: GraphQL queries
- **nlp-query**: NLP-based queries
- **structured-query**: Structured queries
- **structured-diag**: Diagnostics
- **graph-embeddings**: Embedding-based graph search
- **document-embeddings**: Embedding-based document search
- **text-load**: Text loading (returns status)
- **document-load**: Document loading (returns status)
## Client Implementation Guide
### Basic Streaming Handler
```javascript
const pendingRequests = new Map();

// Send request
const requestId = generateUniqueId();
const request = {
  id: requestId,
  service: 'agent',
  flow: 'my-flow',
  request: {
    question: 'What is quantum computing?',
    streaming: true
  }
};
pendingRequests.set(requestId, {
  chunks: [],
  complete: false
});
websocket.send(JSON.stringify(request));

// Handle responses
websocket.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.error) {
    // Handle error
    console.error(`Request ${message.id} failed:`, message.error);
    pendingRequests.delete(message.id);
    return;
  }
  const pending = pendingRequests.get(message.id);
  if (!pending) {
    console.warn(`Unexpected response for ${message.id}`);
    return;
  }
  // Accumulate chunk
  pending.chunks.push(message.response);
  // Check if complete
  const isComplete =
    message.response['end-of-stream'] === true ||
    message.response['end-of-dialog'] === true;
  if (isComplete) {
    pending.complete = true;
    console.log(`Request ${message.id} complete:`, pending.chunks);
    pendingRequests.delete(message.id);
  } else {
    // Process intermediate chunk
    console.log(`Chunk for ${message.id}:`, message.response);
  }
};
```
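A handler like this waits indefinitely: if the stream stalls and no final chunk or error ever arrives, the pending entry leaks. A sketch of client-side timeout handling, where the names `trackRequest` and `settle` are illustrative assumptions, not part of the protocol:

```javascript
// Registers a pending streaming request with a client-side timeout.
// If no completion arrives within timeoutMs, the entry is removed and
// onTimeout is invoked; calling settle() cancels the timer.
function trackRequest(pendingRequests, requestId, timeoutMs, onTimeout) {
  const timer = setTimeout(() => {
    if (pendingRequests.has(requestId)) {
      pendingRequests.delete(requestId);
      onTimeout(requestId);
    }
  }, timeoutMs);
  pendingRequests.set(requestId, {
    chunks: [],
    complete: false,
    // Call when the final chunk or an error arrives.
    settle: () => clearTimeout(timer),
  });
}
```

The `onmessage` handler would call `settle()` wherever it currently deletes the entry, so a completed or failed request never fires its timeout.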
## Error Handling in Streaming
Errors can occur at any point during streaming:
```json
// Mid-stream error
{
  "id": "req-1",
  "response": {
    "chunk-type": "thought",
    "content": "Processing...",
    "end-of-dialog": false
  }
}
// Error interrupts stream
{
  "id": "req-1",
  "error": {
    "type": "service-error",
    "message": "Backend timeout"
  }
}
```
When an error occurs, no further response messages will be sent for that request ID. The client should:
1. Stop waiting for completion
2. Handle the partial results appropriately
3. Clean up request state
## Performance Considerations
### Multiplexing Streaming Requests
Multiple streaming requests can be active simultaneously:
```json
{"id": "req-1", "service": "agent", ...}
{"id": "req-2", "service": "document-rag", ...}
{"id": "req-3", "service": "text-completion", ...}
// Responses may interleave
{"id": "req-2", "response": {...}}
{"id": "req-1", "response": {...}}
{"id": "req-3", "response": {...}}
{"id": "req-1", "response": {...}}
{"id": "req-2", "response": {...}}
```
### Backpressure
If the client is slow to consume streaming responses, the WebSocket connection may experience:
- Buffering on the server side
- Increased latency
- Potential connection issues
Clients should process streaming chunks efficiently or implement flow control.
## Best Practices
1. **Always check completion flags**: Don't assume a fixed number of chunks
2. **Handle partial results**: Be prepared for errors mid-stream
3. **Unique request IDs**: Ensure IDs are unique across active requests
4. **Timeout handling**: Implement client-side timeouts for streaming requests
5. **Memory management**: Don't accumulate unbounded chunks; process incrementally
6. **User feedback**: Show progressive results to users as chunks arrive
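For the memory-management point above, rather than storing every chunk, a client can fold content into a bounded buffer and process it incrementally. A minimal sketch (the helper name and the truncation policy are illustrative assumptions):

```javascript
// Accumulates chunk content up to a character budget; further content
// is dropped and the truncated flag is set so the UI can indicate it.
function makeAccumulator(maxChars) {
  let text = '';
  let truncated = false;
  return {
    push(chunk) {
      const content = chunk.content ?? '';
      if (text.length + content.length > maxChars) {
        text += content.slice(0, maxChars - text.length);
        truncated = true;
      } else {
        text += content;
      }
    },
    result: () => ({ text, truncated }),
  };
}
```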