mirror of https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00

Compare commits ddd4bd7790...e899370d98 (2 commits: e899370d98, c20e6540ec; 24 changed files with 584 additions and 933 deletions)

@ -1,108 +0,0 @@
# API Gateway Changes: v1.8 to v2.1

## Summary

The API gateway gained new WebSocket service dispatchers for embeddings
queries, a new REST streaming endpoint for document content, and underwent
a significant wire format change from `Value` to `Term`. The "objects"
service was renamed to "rows".

---

## New WebSocket Service Dispatchers

These are new request/response services available through the WebSocket
multiplexer at `/api/v1/socket` (flow-scoped):

| Service Key | Description |
|-------------|-------------|
| `document-embeddings` | Queries document chunks by text similarity. Request/response uses `DocumentEmbeddingsRequest`/`DocumentEmbeddingsResponse` schemas. |
| `row-embeddings` | Queries structured data rows by text similarity on indexed fields. Request/response uses `RowEmbeddingsRequest`/`RowEmbeddingsResponse` schemas. |

These join the existing `graph-embeddings` dispatcher (which was already
present in v1.8 but may have been updated).

### Full list of WebSocket flow service dispatchers (v2.1)

Request/response services (via `/api/v1/flow/{flow}/service/{kind}` or
WebSocket mux):

- `agent`, `text-completion`, `prompt`, `mcp-tool`
- `graph-rag`, `document-rag`
- `embeddings`, `graph-embeddings`, `document-embeddings`
- `triples`, `rows`, `nlp-query`, `structured-query`, `structured-diag`
- `row-embeddings`

---

## New REST Endpoint

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/document-stream` | Streams document content from the library as raw bytes. Query parameters: `user` (required), `document-id` (required), `chunk-size` (optional, default 1MB). Returns the document content in chunked transfer encoding, decoded from base64 internally. |
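
Fetching a document through this endpoint needs no special client. A minimal sketch using only the standard library; the gateway address (`localhost:8088`) and document ID are placeholders, not values from this changelog:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8088"  # assumed gateway address; adjust for your deployment

def document_stream_url(user, document_id, chunk_size=None):
    # Build the query string for GET /api/v1/document-stream.
    params = {"user": user, "document-id": document_id}
    if chunk_size is not None:
        params["chunk-size"] = chunk_size
    return f"{BASE}/api/v1/document-stream?{urlencode(params)}"

def download(user, document_id, out_path):
    # Stream the chunked response to a file without buffering it all in memory.
    from urllib.request import urlopen
    with urlopen(document_stream_url(user, document_id)) as resp, \
            open(out_path, "wb") as f:
        while True:
            chunk = resp.read(65536)
            if not chunk:
                break
            f.write(chunk)

url = document_stream_url("trustgraph", "doc-123", chunk_size=65536)
```

Because the server decodes base64 internally and streams raw bytes, the client just writes what it reads.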

---

## Renamed Service: "objects" to "rows"

| v1.8 | v2.1 | Notes |
|------|------|-------|
| `objects_query.py` / `ObjectsQueryRequestor` | `rows_query.py` / `RowsQueryRequestor` | Schema changed from `ObjectsQueryRequest`/`ObjectsQueryResponse` to `RowsQueryRequest`/`RowsQueryResponse`. |
| `objects_import.py` / `ObjectsImport` | `rows_import.py` / `RowsImport` | Import dispatcher for structured data. |

The WebSocket service key changed from `"objects"` to `"rows"`, and the
import dispatcher key similarly changed from `"objects"` to `"rows"`.

---

## Wire Format Change: Value to Term

The serialization layer (`serialize.py`) was rewritten to use the new `Term`
type instead of the old `Value` type.

### Old format (v1.8 — `Value`)

```json
{"v": "http://example.org/entity", "e": true}
```

- `v`: the value (string)
- `e`: boolean flag indicating whether the value is a URI

### New format (v2.1 — `Term`)

IRIs:
```json
{"t": "i", "i": "http://example.org/entity"}
```

Literals:
```json
{"t": "l", "v": "some text", "d": "datatype-uri", "l": "en"}
```

Quoted triples (RDF-star):
```json
{"t": "r", "r": {"s": {...}, "p": {...}, "o": {...}}}
```

- `t`: type discriminator — `"i"` (IRI), `"l"` (literal), `"r"` (quoted triple), `"b"` (blank node)
- Serialization now delegates to `TermTranslator` and `TripleTranslator` from `trustgraph.messaging.translators.primitives`
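
A client migrating from the old format can map `Value` dicts onto `Term` dicts mechanically. A minimal sketch based on the two formats documented above; this helper is illustrative, not part of the library:

```python
def value_to_term(value):
    """Convert an old v1.8 Value dict to the v2.1 Term wire format.

    Hypothetical migration helper. It covers IRIs and plain literals only:
    the old format carries no datatype or language information, so there is
    nothing to map onto the new "d" and "l" fields.
    """
    if value.get("e"):                  # "e" flagged the value as a URI
        return {"t": "i", "i": value["v"]}
    return {"t": "l", "v": value["v"]}  # plain literal

old = {"v": "http://example.org/entity", "e": True}
new = value_to_term(old)
# new == {"t": "i", "i": "http://example.org/entity"}
```

The reverse direction is lossy: a Term literal's datatype, language tag, quoted triples, and blank nodes have no representation in the old format.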

### Other serialization changes

| Field | v1.8 | v2.1 |
|-------|------|------|
| Metadata | `metadata.metadata` (subgraph) | `metadata.root` (simple value) |
| Graph embeddings entity | `entity.vectors` (plural) | `entity.vector` (singular) |
| Document embeddings chunk | `chunk.vectors` + `chunk.chunk` (text) | `chunk.vector` + `chunk.chunk_id` (ID reference) |

---

## Breaking Changes

- **`Value` to `Term` wire format**: All clients sending/receiving triples, embeddings, or entity contexts through the gateway must update to the new Term format.
- **`objects` to `rows` rename**: WebSocket service key and import key changed.
- **Metadata field change**: `metadata.metadata` (a serialized subgraph) replaced by `metadata.root` (a simple value).
- **Embeddings field changes**: `vectors` (plural) became `vector` (singular); document embeddings now reference `chunk_id` instead of inline `chunk` text.
- **New `/api/v1/document-stream` endpoint**: Additive, not breaking.

docs/api.html (176 lines changed): file diff suppressed because one or more lines are too long

@ -1,112 +0,0 @@
# CLI Changes: v1.8 to v2.1

## Summary

The CLI (`trustgraph-cli`) has significant additions focused on three themes:
**explainability/provenance**, **embeddings access**, and **graph querying**.
Two legacy tools were removed, one was renamed, and several existing tools
gained new capabilities.

---

## New CLI Tools

### Explainability & Provenance

| Command | Description |
|---------|-------------|
| `tg-list-explain-traces` | Lists all explainability sessions (GraphRAG and Agent) in a collection, showing session IDs, type, question text, and timestamps. |
| `tg-show-explain-trace` | Displays the full explainability trace for a session. For GraphRAG: Question, Exploration, Focus, Synthesis stages. For Agent: Session, Iterations (thought/action/observation), Final Answer. Auto-detects trace type. Supports `--show-provenance` to trace edges back to source documents. |
| `tg-show-extraction-provenance` | Given a document ID, traverses the provenance chain: Document -> Pages -> Chunks -> Edges, using `prov:wasDerivedFrom` relationships. Supports `--show-content` and `--max-content` options. |

### Embeddings

| Command | Description |
|---------|-------------|
| `tg-invoke-embeddings` | Converts text to a vector embedding via the embeddings service. Accepts one or more text inputs, returns vectors as lists of floats. |
| `tg-invoke-graph-embeddings` | Queries graph entities by text similarity using vector embeddings. Returns matching entities with similarity scores. |
| `tg-invoke-document-embeddings` | Queries document chunks by text similarity using vector embeddings. Returns matching chunk IDs with similarity scores. |
| `tg-invoke-row-embeddings` | Queries structured data rows by text similarity on indexed fields. Returns matching rows with index values and scores. Requires `--schema-name` and supports `--index-name`. |

### Graph Querying

| Command | Description |
|---------|-------------|
| `tg-query-graph` | Pattern-based triple store query. Unlike `tg-show-graph` (which dumps everything), this allows selective queries by any combination of subject, predicate, object, and graph. Auto-detects value types: IRIs (`http://...`, `urn:...`, `<...>`), quoted triples (`<<s p o>>`), and literals. |
| `tg-get-document-content` | Retrieves document content from the library by document ID. Can output to file or stdout, handles both text and binary content. |
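
The value auto-detection described for `tg-query-graph` can be sketched as a small classifier. This is an illustration of the rules listed above, not the tool's actual implementation (which may differ in detail, e.g. whether `https://` is recognised):

```python
def classify_value(text):
    """Classify a CLI argument per tg-query-graph's documented auto-detection:
    quoted triples, IRIs, or literals. Illustrative sketch only."""
    # Check quoted triples before angle-bracketed IRIs, since "<<...>>"
    # also starts with "<".
    if text.startswith("<<") and text.endswith(">>"):
        return "quoted-triple"                       # RDF-star: <<s p o>>
    if text.startswith("<") and text.endswith(">"):
        return "iri"                                 # <...> explicit IRI
    if text.startswith(("http://", "https://", "urn:")):
        return "iri"                                 # bare IRI forms
    return "literal"                                 # everything else
```

The ordering matters: the quoted-triple test must run before the angle-bracket test, otherwise `<<s p o>>` would be misread as an IRI.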

---

## Removed CLI Tools

| Command | Notes |
|---------|-------|
| `tg-load-pdf` | Removed. Document loading is now handled through the library/processing pipeline. |
| `tg-load-text` | Removed. Document loading is now handled through the library/processing pipeline. |

---

## Renamed CLI Tools

| Old Name | New Name | Notes |
|----------|----------|-------|
| `tg-invoke-objects-query` | `tg-invoke-rows-query` | Reflects the terminology rename from "objects" to "rows" for structured data. |

---

## Significant Changes to Existing Tools

### `tg-invoke-graph-rag`

- **Explainability support**: Now supports a 4-stage explainability pipeline (Question, Grounding/Exploration, Focus, Synthesis) with inline provenance event display.
- **Streaming**: Uses WebSocket streaming for real-time output.
- **Provenance tracing**: Can trace selected edges back to source documents via reification and `prov:wasDerivedFrom` chains.
- Grew from ~30 lines to ~760 lines to accommodate the full explainability pipeline.

### `tg-invoke-document-rag`

- **Explainability support**: Added `question_explainable()` mode that streams Document RAG responses with inline provenance events (Question, Grounding, Exploration, Synthesis stages).

### `tg-invoke-agent`

- **Explainability support**: Added `question_explainable()` mode showing provenance events inline during agent execution (Question, Analysis, Conclusion, AgentThought, AgentObservation, AgentAnswer).
- Verbose mode shows thought/observation streams with emoji prefixes.

### `tg-show-graph`

- **Streaming mode**: Now uses `triples_query_stream()` with configurable batch sizes for lower time-to-first-result and reduced memory overhead.
- **Named graph support**: New `--graph` filter option. Recognises named graphs:
  - Default graph (empty): Core knowledge facts
  - `urn:graph:source`: Extraction provenance
  - `urn:graph:retrieval`: Query-time explainability
- **Show graph column**: New `--show-graph` flag to display the named graph for each triple.
- **Configurable limits**: New `--limit` and `--batch-size` options.

### `tg-graph-to-turtle`

- **RDF-star support**: Now handles quoted triples (RDF-star reification).
- **Streaming mode**: Uses streaming for lower time-to-first-processing.
- **Wire format handling**: Updated to use the new term wire format (`{"t": "i", "i": uri}` for IRIs, `{"t": "l", "v": value}` for literals, `{"t": "r", "r": {...}}` for quoted triples).
- **Named graph support**: New `--graph` filter option.

### `tg-set-tool`

- **New tool type**: `row-embeddings-query` for semantic search on structured data indexes.
- **New options**: `--schema-name`, `--index-name`, `--limit` for configuring row embeddings query tools.

### `tg-show-tools`

- Displays the new `row-embeddings-query` tool type with its `schema-name`, `index-name`, and `limit` fields.

### `tg-load-knowledge`

- **Progress reporting**: Now counts and reports triples and entity contexts loaded per file and in total.
- **Term format update**: Entity contexts now use the new Term format (`{"t": "i", "i": uri}`) instead of the old Value format (`{"v": entity, "e": True}`).

---

## Breaking Changes

- **Terminology rename**: The `Value` schema was renamed to `Term` across the system (PR #622). This affects the wire format used by CLI tools that interact with the graph store. The new format uses `{"t": "i", "i": uri}` for IRIs and `{"t": "l", "v": value}` for literals, replacing the old `{"v": ..., "e": ...}` format.
- **`tg-invoke-objects-query` renamed** to `tg-invoke-rows-query`.
- **`tg-load-pdf` and `tg-load-text` removed**.

@ -911,7 +911,7 @@ results = flow.graph_embeddings_query(

# results contains {"entities": [{"entity": {...}, "score": 0.95}, ...]}
```

- ### `graph_rag(self, query, user='trustgraph', collection='default', entity_limit=50, triple_limit=30, max_subgraph_size=150, max_path_length=2)`
+ ### `graph_rag(self, query, user='trustgraph', collection='default', entity_limit=50, triple_limit=30, max_subgraph_size=150, max_path_length=2, edge_score_limit=30, edge_limit=25)`

Execute graph-based Retrieval-Augmented Generation (RAG) query.

@ -927,6 +927,8 @@ traversing entity relationships, then generates a response using an LLM.

- `triple_limit`: Maximum triples per entity (default: 30)
- `max_subgraph_size`: Maximum total triples in subgraph (default: 150)
- `max_path_length`: Maximum traversal depth (default: 2)
+ - `edge_score_limit`: Max edges for semantic pre-filter (default: 30)
+ - `edge_limit`: Max edges after LLM scoring (default: 25)

**Returns:** str: Generated response incorporating graph context

@ -1216,6 +1218,23 @@ Select matching schemas for a data sample using prompt analysis.

**Returns:** dict with schema_matches array and metadata

### `sparql_query(self, query, user='trustgraph', collection='default', limit=10000)`

Execute a SPARQL query against the knowledge graph.

**Arguments:**

- `query`: SPARQL 1.1 query string
- `user`: User/keyspace identifier (default: "trustgraph")
- `collection`: Collection identifier (default: "default")
- `limit`: Safety limit on results (default: 10000)

**Returns:** dict with query results. Structure depends on query type:

- SELECT: `{"query-type": "select", "variables": [...], "bindings": [...]}`
- ASK: `{"query-type": "ask", "ask-result": bool}`
- CONSTRUCT/DESCRIBE: `{"query-type": "construct", "triples": [...]}`

**Raises:**

- `ProtocolException`: If an error occurs
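
Callers can branch on the `query-type` discriminator when handling results. A minimal sketch of result dispatch using the documented shapes; the sample dicts are illustrative, not output from a live service:

```python
def handle_sparql_result(result):
    """Dispatch on the documented "query-type" field of a sparql_query result."""
    qt = result["query-type"]
    if qt == "select":
        return result["bindings"]    # rows over result["variables"]
    if qt == "ask":
        return result["ask-result"]  # bool
    if qt == "construct":
        return result["triples"]     # CONSTRUCT and DESCRIBE both report "construct"
    raise ValueError(f"unexpected query-type: {qt}")

ask_result = {"query-type": "ask", "ask-result": True}
answer = handle_sparql_result(ask_result)
# answer == True
```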

### `structured_query(self, question, user='trustgraph', collection='default')`

Execute a natural language question against structured data.
@ -1937,54 +1956,24 @@ for triple in results.get("triples", []):

from trustgraph.api import SocketClient
```

- Synchronous WebSocket client for streaming operations.
+ Synchronous WebSocket client with persistent connection.

- Provides a synchronous interface to WebSocket-based TrustGraph services,
- wrapping async websockets library with synchronous generators for ease of use.
- Supports streaming responses from agents, RAG queries, and text completions.
-
- Note: This is a synchronous wrapper around async WebSocket operations. For
- true async support, use AsyncSocketClient instead.
+ Maintains a single websocket connection and multiplexes requests
+ by ID via a background reader task. Provides synchronous generators
+ for streaming responses.

### Methods

### `__init__(self, url: str, timeout: int, token: str | None) -> None`

- Initialize synchronous WebSocket client.
-
- **Arguments:**
-
- - `url`: Base URL for TrustGraph API (HTTP/HTTPS will be converted to WS/WSS)
- - `timeout`: WebSocket timeout in seconds
- - `token`: Optional bearer token for authentication
+ Initialize self. See help(type(self)) for accurate signature.

### `close(self) -> None`

- Close WebSocket connections.
-
- Note: Cleanup is handled automatically by context managers in async code.
+ Close the persistent WebSocket connection.

### `flow(self, flow_id: str) -> 'SocketFlowInstance'`

Get a flow instance for WebSocket streaming operations.

**Arguments:**

- `flow_id`: Flow identifier

**Returns:** SocketFlowInstance: Flow instance with streaming methods

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Stream agent responses
for chunk in flow.agent(question="Hello", user="trustgraph", streaming=True):
    print(chunk.content, end='', flush=True)
```

---

@ -1997,618 +1986,82 @@ from trustgraph.api import SocketFlowInstance

Synchronous WebSocket flow instance for streaming operations.

Provides the same interface as REST FlowInstance but with WebSocket-based
- streaming support for real-time responses. All methods support an optional
- `streaming` parameter to enable incremental result delivery.
+ streaming support for real-time responses.

### Methods

### `__init__(self, client: trustgraph.api.socket_client.SocketClient, flow_id: str) -> None`

- Initialize socket flow instance.
-
- **Arguments:**
-
- - `client`: Parent SocketClient
- - `flow_id`: Flow identifier
+ Initialize self. See help(type(self)) for accurate signature.

### `agent(self, question: str, user: str, state: Dict[str, Any] | None = None, group: str | None = None, history: List[Dict[str, Any]] | None = None, streaming: bool = False, **kwargs: Any) -> Dict[str, Any] | Iterator[trustgraph.api.types.StreamingChunk]`

Execute an agent operation with streaming support.

Agents can perform multi-step reasoning with tool use. This method always
returns streaming chunks (thoughts, observations, answers) even when
streaming=False, to show the agent's reasoning process.

**Arguments:**

- `question`: User question or instruction
- `user`: User identifier
- `state`: Optional state dictionary for stateful conversations
- `group`: Optional group identifier for multi-user contexts
- `history`: Optional conversation history as list of message dicts
- `streaming`: Enable streaming mode (default: False)
- `**kwargs`: Additional parameters passed to the agent service

**Returns:** Iterator[StreamingChunk]: Stream of agent thoughts, observations, and answers

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Stream agent reasoning
for chunk in flow.agent(
    question="What is quantum computing?",
    user="trustgraph",
    streaming=True
):
    if isinstance(chunk, AgentThought):
        print(f"[Thinking] {chunk.content}")
    elif isinstance(chunk, AgentObservation):
        print(f"[Observation] {chunk.content}")
    elif isinstance(chunk, AgentAnswer):
        print(f"[Answer] {chunk.content}")
```

### `agent_explain(self, question: str, user: str, collection: str, state: Dict[str, Any] | None = None, group: str | None = None, history: List[Dict[str, Any]] | None = None, **kwargs: Any) -> Iterator[trustgraph.api.types.StreamingChunk | trustgraph.api.types.ProvenanceEvent]`

Execute an agent operation with explainability support.

Streams both content chunks (AgentThought, AgentObservation, AgentAnswer)
and provenance events (ProvenanceEvent). Provenance events contain URIs
that can be fetched using ExplainabilityClient to get detailed information
about the agent's reasoning process.

Agent trace consists of:
- Session: The initial question and session metadata
- Iterations: Each thought/action/observation cycle
- Conclusion: The final answer

**Arguments:**

- `question`: User question or instruction
- `user`: User identifier
- `collection`: Collection identifier for provenance storage
- `state`: Optional state dictionary for stateful conversations
- `group`: Optional group identifier for multi-user contexts
- `history`: Optional conversation history as list of message dicts
- `**kwargs`: Additional parameters passed to the agent service

**Yields:** Union[StreamingChunk, ProvenanceEvent]: Agent chunks and provenance events

**Example:**

```python
from trustgraph.api import Api, ExplainabilityClient, ProvenanceEvent
from trustgraph.api import AgentThought, AgentObservation, AgentAnswer

socket = api.socket()
flow = socket.flow("default")
explain_client = ExplainabilityClient(flow)

provenance_ids = []
for item in flow.agent_explain(
    question="What is the capital of France?",
    user="trustgraph",
    collection="default"
):
    if isinstance(item, AgentThought):
        print(f"[Thought] {item.content}")
    elif isinstance(item, AgentObservation):
        print(f"[Observation] {item.content}")
    elif isinstance(item, AgentAnswer):
        print(f"[Answer] {item.content}")
    elif isinstance(item, ProvenanceEvent):
        provenance_ids.append(item.explain_id)

# Fetch session trace after completion
if provenance_ids:
    trace = explain_client.fetch_agent_trace(
        provenance_ids[0],  # Session URI is first
        graph="urn:graph:retrieval",
        user="trustgraph",
        collection="default"
    )
```

### `document_embeddings_query(self, text: str, user: str, collection: str, limit: int = 10, **kwargs: Any) -> Dict[str, Any]`

Query document chunks using semantic similarity.

**Arguments:**

- `text`: Query text for semantic search
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `limit`: Maximum number of results (default: 10)
- `**kwargs`: Additional parameters passed to the service

**Returns:** dict: Query results with chunk_ids of matching document chunks

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

results = flow.document_embeddings_query(
    text="machine learning algorithms",
    user="trustgraph",
    collection="research-papers",
    limit=5
)
# results contains {"chunks": [{"chunk_id": "...", "score": 0.95}, ...]}
```

### `document_rag(self, query: str, user: str, collection: str, doc_limit: int = 10, streaming: bool = False, **kwargs: Any) -> str | Iterator[str]`

Execute document-based RAG query with optional streaming.

Uses vector embeddings to find relevant document chunks, then generates
a response using an LLM. Streaming mode delivers results incrementally.

**Arguments:**

- `query`: Natural language query
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `doc_limit`: Maximum document chunks to retrieve (default: 10)
- `streaming`: Enable streaming mode (default: False)
- `**kwargs`: Additional parameters passed to the service

**Returns:** Union[str, Iterator[str]]: Complete response or stream of text chunks

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Streaming document RAG
for chunk in flow.document_rag(
    query="Summarize the key findings",
    user="trustgraph",
    collection="research-papers",
    doc_limit=5,
    streaming=True
):
    print(chunk, end='', flush=True)
```

### `document_rag_explain(self, query: str, user: str, collection: str, doc_limit: int = 10, **kwargs: Any) -> Iterator[trustgraph.api.types.RAGChunk | trustgraph.api.types.ProvenanceEvent]`

Execute document-based RAG query with explainability support.

Streams both content chunks (RAGChunk) and provenance events (ProvenanceEvent).
Provenance events contain URIs that can be fetched using ExplainabilityClient
to get detailed information about how the response was generated.

Document RAG trace consists of:
- Question: The user's query
- Exploration: Chunks retrieved from document store (chunk_count)
- Synthesis: The generated answer

**Arguments:**

- `query`: Natural language query
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `doc_limit`: Maximum document chunks to retrieve (default: 10)
- `**kwargs`: Additional parameters passed to the service

**Yields:** Union[RAGChunk, ProvenanceEvent]: Content chunks and provenance events

**Example:**

```python
import sys

from trustgraph.api import Api, ExplainabilityClient, RAGChunk, ProvenanceEvent

socket = api.socket()
flow = socket.flow("default")
explain_client = ExplainabilityClient(flow)

for item in flow.document_rag_explain(
    query="Summarize the key findings",
    user="trustgraph",
    collection="research-papers",
    doc_limit=5
):
    if isinstance(item, RAGChunk):
        print(item.content, end='', flush=True)
    elif isinstance(item, ProvenanceEvent):
        # Fetch entity details
        entity = explain_client.fetch_entity(
            item.explain_id,
            graph=item.explain_graph,
            user="trustgraph",
            collection="research-papers"
        )
        print(f"Event: {entity}", file=sys.stderr)
```

### `embeddings(self, texts: list, **kwargs: Any) -> Dict[str, Any]`

Generate vector embeddings for one or more texts.

**Arguments:**

- `texts`: List of input texts to embed
- `**kwargs`: Additional parameters passed to the service

**Returns:** dict: Response containing vectors (one set per input text)

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

result = flow.embeddings(["quantum computing"])
vectors = result.get("vectors", [])
```

### `graph_embeddings_query(self, text: str, user: str, collection: str, limit: int = 10, **kwargs: Any) -> Dict[str, Any]`

Query knowledge graph entities using semantic similarity.

**Arguments:**

- `text`: Query text for semantic search
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `limit`: Maximum number of results (default: 10)
- `**kwargs`: Additional parameters passed to the service

**Returns:** dict: Query results with similar entities

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

results = flow.graph_embeddings_query(
    text="physicist who discovered radioactivity",
    user="trustgraph",
    collection="scientists",
    limit=5
)
```

- ### `graph_rag(self, query: str, user: str, collection: str, max_subgraph_size: int = 1000, max_subgraph_count: int = 5, max_entity_distance: int = 3, streaming: bool = False, **kwargs: Any) -> str | Iterator[str]`
+ ### `graph_rag(self, query: str, user: str, collection: str, entity_limit: int = 50, triple_limit: int = 30, max_subgraph_size: int = 1000, max_path_length: int = 2, edge_score_limit: int = 30, edge_limit: int = 25, streaming: bool = False, **kwargs: Any) -> str | Iterator[str]`

Execute graph-based RAG query with optional streaming.

Uses knowledge graph structure to find relevant context, then generates
a response using an LLM. Streaming mode delivers results incrementally.

**Arguments:**

- `query`: Natural language query
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `max_subgraph_size`: Maximum total triples in subgraph (default: 1000)
- `max_subgraph_count`: Maximum number of subgraphs (default: 5)
- `max_entity_distance`: Maximum traversal depth (default: 3)
- `streaming`: Enable streaming mode (default: False)
- `**kwargs`: Additional parameters passed to the service

**Returns:** Union[str, Iterator[str]]: Complete response or stream of text chunks

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Streaming graph RAG
for chunk in flow.graph_rag(
    query="Tell me about Marie Curie",
    user="trustgraph",
    collection="scientists",
    streaming=True
):
    print(chunk, end='', flush=True)
```

- ### `graph_rag_explain(self, query: str, user: str, collection: str, max_subgraph_size: int = 1000, max_subgraph_count: int = 5, max_entity_distance: int = 3, **kwargs: Any) -> Iterator[trustgraph.api.types.RAGChunk | trustgraph.api.types.ProvenanceEvent]`
+ ### `graph_rag_explain(self, query: str, user: str, collection: str, entity_limit: int = 50, triple_limit: int = 30, max_subgraph_size: int = 1000, max_path_length: int = 2, edge_score_limit: int = 30, edge_limit: int = 25, **kwargs: Any) -> Iterator[trustgraph.api.types.RAGChunk | trustgraph.api.types.ProvenanceEvent]`

Execute graph-based RAG query with explainability support.

Streams both content chunks (RAGChunk) and provenance events (ProvenanceEvent).
Provenance events contain URIs that can be fetched using ExplainabilityClient
to get detailed information about how the response was generated.

**Arguments:**

- `query`: Natural language query
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `max_subgraph_size`: Maximum total triples in subgraph (default: 1000)
- `max_subgraph_count`: Maximum number of subgraphs (default: 5)
- `max_entity_distance`: Maximum traversal depth (default: 3)
- `**kwargs`: Additional parameters passed to the service

**Yields:** Union[RAGChunk, ProvenanceEvent]: Content chunks and provenance events

**Example:**

```python
from trustgraph.api import Api, ExplainabilityClient, RAGChunk, ProvenanceEvent

socket = api.socket()
flow = socket.flow("default")
explain_client = ExplainabilityClient(flow)

provenance_ids = []
response_text = ""

for item in flow.graph_rag_explain(
    query="Tell me about Marie Curie",
    user="trustgraph",
    collection="scientists"
):
    if isinstance(item, RAGChunk):
        response_text += item.content
        print(item.content, end='', flush=True)
    elif isinstance(item, ProvenanceEvent):
        provenance_ids.append(item.explain_id)

# Fetch explainability details
for prov_id in provenance_ids:
    entity = explain_client.fetch_entity(
        prov_id,
        graph="urn:graph:retrieval",
        user="trustgraph",
        collection="scientists"
    )
    print(f"Entity: {entity}")
```

### `mcp_tool(self, name: str, parameters: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]`

Execute a Model Context Protocol (MCP) tool.

**Arguments:**

- `name`: Tool name/identifier
- `parameters`: Tool parameters dictionary
- `**kwargs`: Additional parameters passed to the service

**Returns:** dict: Tool execution result

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

result = flow.mcp_tool(
    name="search-web",
    parameters={"query": "latest AI news", "limit": 5}
)
```
|
||||
|
||||
### `prompt(self, id: str, variables: Dict[str, str], streaming: bool = False, **kwargs: Any) -> str | Iterator[str]`

Execute a prompt template with optional streaming.

**Arguments:**

- `id`: Prompt template identifier
- `variables`: Dictionary of variable name to value mappings
- `streaming`: Enable streaming mode (default: False)
- `**kwargs`: Additional parameters passed to the service

**Returns:** Union[str, Iterator[str]]: Complete response or stream of text chunks

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Streaming prompt execution
for chunk in flow.prompt(
    id="summarize-template",
    variables={"topic": "quantum computing", "length": "brief"},
    streaming=True
):
    print(chunk, end='', flush=True)
```

### `row_embeddings_query(self, text: str, schema_name: str, user: str = 'trustgraph', collection: str = 'default', index_name: str | None = None, limit: int = 10, **kwargs: Any) -> Dict[str, Any]`

Query row data using semantic similarity on indexed fields.

Finds rows whose indexed field values are semantically similar to the
input text, using vector embeddings. This enables fuzzy/semantic matching
on structured data.

**Arguments:**

- `text`: Query text for semantic search
- `schema_name`: Schema name to search within
- `user`: User/keyspace identifier (default: "trustgraph")
- `collection`: Collection identifier (default: "default")
- `index_name`: Optional index name to filter search to specific index
- `limit`: Maximum number of results (default: 10)
- `**kwargs`: Additional parameters passed to the service

**Returns:** dict: Query results with matches containing index_name, index_value, text, and score

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Search for customers by name similarity
results = flow.row_embeddings_query(
    text="John Smith",
    schema_name="customers",
    user="trustgraph",
    collection="sales",
    limit=5
)

# Filter to specific index
results = flow.row_embeddings_query(
    text="machine learning engineer",
    schema_name="employees",
    index_name="job_title",
    limit=10
)
```

### `rows_query(self, query: str, user: str, collection: str, variables: Dict[str, Any] | None = None, operation_name: str | None = None, **kwargs: Any) -> Dict[str, Any]`

Execute a GraphQL query against structured rows.

**Arguments:**

- `query`: GraphQL query string
- `user`: User/keyspace identifier
- `collection`: Collection identifier
- `variables`: Optional query variables dictionary
- `operation_name`: Optional operation name for multi-operation documents
- `**kwargs`: Additional parameters passed to the service

**Returns:** dict: GraphQL response with data, errors, and/or extensions

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

query = '''
{
    scientists(limit: 10) {
        name
        field
        discoveries
    }
}
'''
result = flow.rows_query(
    query=query,
    user="trustgraph",
    collection="scientists"
)
```

### `sparql_query_stream(self, query: str, user: str = 'trustgraph', collection: str = 'default', limit: int = 10000, batch_size: int = 20, **kwargs: Any) -> Iterator[Dict[str, Any]]`

Execute a SPARQL query with streaming batches.

### `text_completion(self, system: str, prompt: str, streaming: bool = False, **kwargs) -> str | Iterator[str]`

Execute text completion with optional streaming.

**Arguments:**

- `system`: System prompt defining the assistant's behavior
- `prompt`: User prompt/question
- `streaming`: Enable streaming mode (default: False)
- `**kwargs`: Additional parameters passed to the service

**Returns:** Union[str, Iterator[str]]: Complete response or stream of text chunks

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Non-streaming
response = flow.text_completion(
    system="You are helpful",
    prompt="Explain quantum computing",
    streaming=False
)
print(response)

# Streaming
for chunk in flow.text_completion(
    system="You are helpful",
    prompt="Explain quantum computing",
    streaming=True
):
    print(chunk, end='', flush=True)
```

### `triples_query(self, s: str | Dict[str, Any] | None = None, p: str | Dict[str, Any] | None = None, o: str | Dict[str, Any] | None = None, g: str | None = None, user: str | None = None, collection: str | None = None, limit: int = 100, **kwargs: Any) -> List[Dict[str, Any]]`

Query knowledge graph triples using pattern matching.

**Arguments:**

- `s`: Subject filter - URI string, Term dict, or None for wildcard
- `p`: Predicate filter - URI string, Term dict, or None for wildcard
- `o`: Object filter - URI/literal string, Term dict, or None for wildcard
- `g`: Named graph filter - URI string or None for all graphs
- `user`: User/keyspace identifier (optional)
- `collection`: Collection identifier (optional)
- `limit`: Maximum results to return (default: 100)
- `**kwargs`: Additional parameters passed to the service

**Returns:** List[Dict]: List of matching triples in wire format

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

# Find all triples about a specific subject
triples = flow.triples_query(
    s="http://example.org/person/marie-curie",
    user="trustgraph",
    collection="scientists"
)

# Query with named graph filter
triples = flow.triples_query(
    s="urn:trustgraph:session:abc123",
    g="urn:graph:retrieval",
    user="trustgraph",
    collection="default"
)
```

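Judging from the wire-format examples elsewhere in this changeset, Term dicts appear to use `{"t": "i", "i": ...}` for IRI nodes and `{"t": "l", "v": ...}` for literals. A small helper for building pattern filters might look like this (the helper functions are illustrative, not part of the library):

```python
def iri(value):
    # Term dict for an IRI node, per the gateway's wire-format examples
    return {"t": "i", "i": value}

def literal(value):
    # Term dict for a literal value
    return {"t": "l", "v": value}

# A pattern matching <marie-curie> ?p "Physics":
# subject fixed, predicate wildcarded with None, object a literal
pattern = {
    "s": iri("http://example.org/person/marie-curie"),
    "p": None,  # wildcard predicate
    "o": literal("Physics"),
}
```

These dicts can then be passed as the `s`/`p`/`o` filters in place of plain URI strings.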
### `triples_query_stream(self, s: str | Dict[str, Any] | None = None, p: str | Dict[str, Any] | None = None, o: str | Dict[str, Any] | None = None, g: str | None = None, user: str | None = None, collection: str | None = None, limit: int = 100, batch_size: int = 20, **kwargs: Any) -> Iterator[List[Dict[str, Any]]]`

Query knowledge graph triples with streaming batches.

Yields batches of triples as they arrive, reducing time-to-first-result
and memory overhead for large result sets.

**Arguments:**

- `s`: Subject filter - URI string, Term dict, or None for wildcard
- `p`: Predicate filter - URI string, Term dict, or None for wildcard
- `o`: Object filter - URI/literal string, Term dict, or None for wildcard
- `g`: Named graph filter - URI string or None for all graphs
- `user`: User/keyspace identifier (optional)
- `collection`: Collection identifier (optional)
- `limit`: Maximum results to return (default: 100)
- `batch_size`: Triples per batch (default: 20)
- `**kwargs`: Additional parameters passed to the service
- `Yields`:
  - `List[Dict]`: Batches of triples in wire format

**Example:**

```python
socket = api.socket()
flow = socket.flow("default")

for batch in flow.triples_query_stream(
    user="trustgraph",
    collection="default"
):
    for triple in batch:
        print(triple["s"], triple["p"], triple["o"])
```

---

@@ -2618,17 +2071,35 @@ for batch in flow.triples_query_stream(
from trustgraph.api import AsyncSocketClient
```

Asynchronous WebSocket client
Asynchronous WebSocket client with persistent connection.

Maintains a single websocket connection and multiplexes requests
by ID, routing responses via a background reader task.

Use as an async context manager for proper lifecycle management:

    async with AsyncSocketClient(url, timeout, token) as client:
        result = await client._send_request(...)

Or call connect()/aclose() manually.

### Methods

### `__aenter__(self)`

### `__aexit__(self, exc_type, exc_val, exc_tb)`

### `__init__(self, url: str, timeout: int, token: str | None)`

Initialize self. See help(type(self)) for accurate signature.

### `aclose(self)`

Close WebSocket connection
Close the persistent WebSocket connection cleanly.

### `connect(self)`

Establish the persistent websocket connection.

### `flow(self, flow_id: str)`

@@ -3151,7 +2622,10 @@ Detect whether a session is GraphRAG or Agent type.

Fetch the complete Agent trace starting from a session URI.

Follows the provenance chain: Question -> Analysis(s) -> Conclusion
Follows the provenance chain for all patterns:
- ReAct: Question -> Analysis(s) -> Conclusion
- Supervisor: Question -> Decomposition -> Finding(s) -> Synthesis
- Plan-then-Execute: Question -> Plan -> StepResult(s) -> Synthesis

**Arguments:**

@@ -3162,7 +2636,7 @@ Follows the provenance chain: Question -> Analysis(s) -> Conclusion
- `api`: TrustGraph Api instance for librarian access (optional)
- `max_content`: Maximum content length for conclusion

**Returns:** Dict with question, iterations (Analysis list), conclusion entities
**Returns:** Dict with question, steps (mixed entity list), conclusion/synthesis

### `fetch_docrag_trace(self, question_uri: str, graph: str | None = None, user: str | None = None, collection: str | None = None, api: Any = None, max_content: int = 10000) -> Dict[str, Any]`

@@ -3423,7 +2897,7 @@ Initialize self. See help(type(self)) for accurate signature.
from trustgraph.api import Analysis
```

Analysis entity - one think/act/observe cycle (Agent only).
Analysis+ToolUse entity - decision + tool call (Agent only).

**Fields:**

@@ -3432,11 +2906,33 @@ Analysis entity - one think/act/observe cycle (Agent only).
- `action`: <class 'str'>
- `arguments`: <class 'str'>
- `thought`: <class 'str'>
- `observation`: <class 'str'>

### Methods

### `__init__(self, uri: str, entity_type: str = '', action: str = '', arguments: str = '', thought: str = '', observation: str = '') -> None`
### `__init__(self, uri: str, entity_type: str = '', action: str = '', arguments: str = '', thought: str = '') -> None`

Initialize self. See help(type(self)) for accurate signature.

---

## `Observation`

```python
from trustgraph.api import Observation
```

Observation entity - standalone tool result (Agent only).

**Fields:**

- `uri`: <class 'str'>
- `entity_type`: <class 'str'>
- `document`: <class 'str'>

### Methods

### `__init__(self, uri: str, entity_type: str = '', document: str = '') -> None`

Initialize self. See help(type(self)) for accurate signature.

@@ -3761,10 +3257,11 @@ These chunks show how the agent is thinking about the problem.
- `content`: <class 'str'>
- `end_of_message`: <class 'bool'>
- `chunk_type`: <class 'str'>
- `message_id`: <class 'str'>

### Methods

### `__init__(self, content: str, end_of_message: bool = False, chunk_type: str = 'thought') -> None`
### `__init__(self, content: str, end_of_message: bool = False, chunk_type: str = 'thought', message_id: str = '') -> None`

Initialize self. See help(type(self)) for accurate signature.

@@ -3787,10 +3284,11 @@ These chunks show what the agent learned from using tools.
- `content`: <class 'str'>
- `end_of_message`: <class 'bool'>
- `chunk_type`: <class 'str'>
- `message_id`: <class 'str'>

### Methods

### `__init__(self, content: str, end_of_message: bool = False, chunk_type: str = 'observation') -> None`
### `__init__(self, content: str, end_of_message: bool = False, chunk_type: str = 'observation', message_id: str = '') -> None`

Initialize self. See help(type(self)) for accurate signature.

@@ -3818,10 +3316,11 @@ its reasoning and tool use.
- `end_of_message`: <class 'bool'>
- `chunk_type`: <class 'str'>
- `end_of_dialog`: <class 'bool'>
- `message_id`: <class 'str'>

### Methods

### `__init__(self, content: str, end_of_message: bool = False, chunk_type: str = 'final-answer', end_of_dialog: bool = False) -> None`
### `__init__(self, content: str, end_of_message: bool = False, chunk_type: str = 'final-answer', end_of_dialog: bool = False, message_id: str = '') -> None`

Initialize self. See help(type(self)) for accurate signature.

@@ -3864,7 +3363,7 @@ from trustgraph.api import ProvenanceEvent

Provenance event for explainability.

Emitted during GraphRAG queries when explainable mode is enabled.
Emitted during retrieval queries when explainable mode is enabled.
Each event represents a provenance node created during query processing.

**Fields:**

@@ -3872,10 +3371,12 @@ Each event represents a provenance node created during query processing.
- `explain_id`: <class 'str'>
- `explain_graph`: <class 'str'>
- `event_type`: <class 'str'>
- `entity`: <class 'object'>
- `triples`: <class 'list'>

### Methods

### `__init__(self, explain_id: str, explain_graph: str = '', event_type: str = '') -> None`
### `__init__(self, explain_id: str, explain_graph: str = '', event_type: str = '', entity: object = None, triples: list = <factory>) -> None`

Initialize self. See help(type(self)) for accurate signature.

File diff suppressed because one or more lines are too long

@@ -10,6 +10,7 @@ properties:
      - observation
      - answer
      - final-answer
      - explain
      - error
    example: answer
  content:

@@ -29,6 +30,11 @@ properties:
    type: string
    description: Named graph containing the explainability data
    example: urn:graph:retrieval
  explain_triples:
    type: array
    description: Provenance triples for this explain event (inline, no follow-up query needed)
    items:
      $ref: '../common/Triple.yaml'
  end-of-message:
    type: boolean
    description: Current chunk type is complete (streaming mode)

@@ -18,6 +18,11 @@ properties:
    type: string
    description: Named graph containing the explainability data
    example: urn:graph:retrieval
  explain_triples:
    type: array
    description: Provenance triples for this explain event (inline, no follow-up query needed)
    items:
      $ref: '../common/Triple.yaml'
  end-of-stream:
    type: boolean
    description: Indicates LLM response stream is complete

@@ -18,6 +18,11 @@ properties:
    type: string
    description: Named graph containing the explainability data
    example: urn:graph:retrieval
  explain_triples:
    type: array
    description: Provenance triples for this explain event (inline, no follow-up query needed)
    items:
      $ref: '../common/Triple.yaml'
  end_of_stream:
    type: boolean
    description: Indicates LLM response stream is complete

@@ -2,7 +2,7 @@ openapi: 3.1.0
info:
  title: TrustGraph API Gateway
  version: "2.1"
  version: "2.2"
  description: |
    REST API for TrustGraph - an AI-powered knowledge graph and RAG system.

@@ -28,7 +28,7 @@ info:
    Require running flow instance, accessed via `/api/v1/flow/{flow}/service/{kind}`:
    - AI services: agent, text-completion, prompt, RAG (document/graph)
    - Embeddings: embeddings, graph-embeddings, document-embeddings
    - Query: triples, rows, nlp-query, structured-query, row-embeddings
    - Query: triples, rows, nlp-query, structured-query, sparql-query, row-embeddings
    - Data loading: text-load, document-load
    - Utilities: mcp-tool, structured-diag

@@ -139,6 +139,8 @@ paths:
    $ref: './paths/flow/text-load.yaml'
  /api/v1/flow/{flow}/service/document-load:
    $ref: './paths/flow/document-load.yaml'
  /api/v1/flow/{flow}/service/sparql-query:
    $ref: './paths/flow/sparql-query.yaml'

  # Document streaming
  /api/v1/document-stream:

@@ -29,6 +29,7 @@ post:
      - `action`: Action being taken
      - `observation`: Result from action
      - `answer`: Final response to user
      - `explain`: Provenance event with inline triples (`explain_triples`)
      - `error`: Error occurred

      Each chunk may have multiple messages. Check flags:

@@ -116,6 +117,22 @@ post:
              content: ""
              end-of-message: true
              end-of-dialog: true
            explainEvent:
              summary: Explain event with inline provenance triples
              value:
                chunk-type: explain
                content: ""
                explain_id: urn:trustgraph:agent:abc123
                explain_graph: urn:graph:retrieval
                explain_triples:
                  - s: {t: i, i: "urn:trustgraph:agent:abc123"}
                    p: {t: i, i: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
                    o: {t: i, i: "https://trustgraph.ai/ns/AgentSession"}
                  - s: {t: i, i: "urn:trustgraph:agent:abc123"}
                    p: {t: i, i: "https://trustgraph.ai/ns/query"}
                    o: {t: l, v: "Explain quantum computing"}
                end-of-message: true
                end-of-dialog: false
            legacyResponse:
              summary: Legacy non-streaming response
              value:

@@ -24,8 +24,13 @@ post:
    ## Streaming

    Enable `streaming: true` to receive the answer as it's generated:
    - Multiple messages with `response` content
    - Multiple `chunk` messages with `response` content
    - `explain` messages with inline provenance triples (`explain_triples`)
    - Final message with `end-of-stream: true`
    - Session ends with `end_of_session: true`

    Explain events carry `explain_id`, `explain_graph`, and `explain_triples`
    inline in the stream, so no follow-up knowledge graph query is needed.

    Without streaming, returns complete answer in single response.

@@ -96,6 +101,21 @@ post:
            value:
              response: "The research papers present three"
              end-of-stream: false
          explainEvent:
            summary: Explain event with inline provenance triples
            value:
              message_type: explain
              explain_id: urn:trustgraph:question:abc123
              explain_graph: urn:graph:retrieval
              explain_triples:
                - s: {t: i, i: "urn:trustgraph:question:abc123"}
                  p: {t: i, i: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
                  o: {t: i, i: "https://trustgraph.ai/ns/DocumentRagQuestion"}
                - s: {t: i, i: "urn:trustgraph:question:abc123"}
                  p: {t: i, i: "https://trustgraph.ai/ns/query"}
                  o: {t: l, v: "What are the key findings in the research papers?"}
              end-of-stream: false
              end_of_session: false
          streamingComplete:
            summary: Streaming complete marker
            value:

@@ -25,8 +25,13 @@ post:
    ## Streaming

    Enable `streaming: true` to receive the answer as it's generated:
    - Multiple messages with `response` content
    - Multiple `chunk` messages with `response` content
    - `explain` messages with inline provenance triples (`explain_triples`)
    - Final message with `end-of-stream: true`
    - Session ends with `end_of_session: true`

    Explain events carry `explain_id`, `explain_graph`, and `explain_triples`
    inline in the stream, so no follow-up knowledge graph query is needed.

    Without streaming, returns complete answer in single response.

@@ -116,6 +121,21 @@ post:
            value:
              response: "Quantum physics and computer science intersect"
              end-of-stream: false
          explainEvent:
            summary: Explain event with inline provenance triples
            value:
              message_type: explain
              explain_id: urn:trustgraph:question:abc123
              explain_graph: urn:graph:retrieval
              explain_triples:
                - s: {t: i, i: "urn:trustgraph:question:abc123"}
                  p: {t: i, i: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
                  o: {t: i, i: "https://trustgraph.ai/ns/GraphRagQuestion"}
                - s: {t: i, i: "urn:trustgraph:question:abc123"}
                  p: {t: i, i: "https://trustgraph.ai/ns/query"}
                  o: {t: l, v: "What connections exist between quantum physics and computer science?"}
              end_of_stream: false
              end_of_session: false
          streamingComplete:
            summary: Streaming complete marker
            value:

@@ -2,7 +2,7 @@ asyncapi: 3.0.0
info:
  title: TrustGraph WebSocket API
  version: "2.1"
  version: "2.2"
  description: |
    WebSocket API for TrustGraph - providing multiplexed, asynchronous access to all services.

@@ -31,7 +31,7 @@ info:
    **Flow-Hosted Services** (require `flow` parameter):
    - agent, text-completion, prompt, document-rag, graph-rag
    - embeddings, graph-embeddings, document-embeddings
    - triples, rows, nlp-query, structured-query, structured-diag, row-embeddings
    - triples, rows, nlp-query, structured-query, sparql-query, structured-diag, row-embeddings
    - text-load, document-load, mcp-tool

    ## Schema Reuse

@@ -34,6 +34,7 @@ payload:
  - $ref: './requests/RowEmbeddingsRequest.yaml'
  - $ref: './requests/TextLoadRequest.yaml'
  - $ref: './requests/DocumentLoadRequest.yaml'
  - $ref: './requests/SparqlQueryRequest.yaml'

examples:
  - name: Config service request

@@ -0,0 +1,46 @@
type: object
description: WebSocket request for sparql-query service (flow-hosted service)
required:
  - id
  - service
  - flow
  - request
properties:
  id:
    type: string
    description: Unique request identifier
  service:
    type: string
    const: sparql-query
    description: Service identifier for sparql-query service
  flow:
    type: string
    description: Flow ID
  request:
    type: object
    required:
      - query
    properties:
      query:
        type: string
        description: SPARQL 1.1 query string
      user:
        type: string
        default: trustgraph
        description: User/keyspace identifier
      collection:
        type: string
        default: default
        description: Collection identifier
      limit:
        type: integer
        default: 10000
        description: Safety limit on number of results
examples:
  - id: req-1
    service: sparql-query
    flow: my-flow
    request:
      query: "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
      user: trustgraph
      collection: default

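Per the schema above, a sparql-query websocket message can be assembled as a plain dict and JSON-encoded. The sketch below is stdlib-only; actually sending it requires a websocket client and a running gateway, both omitted here, and the helper function name is illustrative:

```python
import json

def make_sparql_request(req_id, flow, query,
                        user="trustgraph", collection="default",
                        limit=10000):
    # Build a message matching the SparqlQueryRequest schema:
    # top-level id/service/flow plus a nested request object.
    return {
        "id": req_id,
        "service": "sparql-query",
        "flow": flow,
        "request": {
            "query": query,
            "user": user,
            "collection": collection,
            "limit": limit,
        },
    }

msg = make_sparql_request(
    "req-1", "my-flow",
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
payload = json.dumps(msg)  # what would be sent over the websocket
```

The defaults mirror the schema's `default:` values, so a minimal caller only supplies the request id, flow, and query.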
@@ -61,23 +61,21 @@ async def test_subscriber_deferred_acknowledgment_success():
        max_size=10,
        backpressure_strategy="block"
    )

    # Start subscriber to initialize consumer
    await subscriber.start()

    subscriber.consumer = mock_consumer

    # Create queue for subscription
    queue = await subscriber.subscribe("test-queue")

    # Create mock message with matching queue name
    msg = create_mock_message("test-queue", {"data": "test"})

    # Process message
    await subscriber._process_message(msg)

    # Should acknowledge successful delivery
    mock_consumer.acknowledge.assert_called_once_with(msg)
    mock_consumer.negative_acknowledge.assert_not_called()

    # Message should be in queue
    assert not queue.empty()
    received_msg = await queue.get()

@@ -108,9 +106,7 @@ async def test_subscriber_dropped_message_still_acks():
        max_size=1,  # Very small queue
        backpressure_strategy="drop_new"
    )

    # Start subscriber to initialize consumer
    await subscriber.start()
    subscriber.consumer = mock_consumer

    # Create queue and fill it
    queue = await subscriber.subscribe("test-queue")

@@ -151,9 +147,7 @@ async def test_subscriber_orphaned_message_acks():
        max_size=10,
        backpressure_strategy="block"
    )

    # Start subscriber to initialize consumer
    await subscriber.start()
    subscriber.consumer = mock_consumer

    # Don't create any queues - message will be orphaned
    # This simulates a response arriving after the waiter has unsubscribed

@@ -189,9 +183,7 @@ async def test_subscriber_backpressure_strategies():
        max_size=2,
        backpressure_strategy="drop_oldest"
    )

    # Start subscriber to initialize consumer
    await subscriber.start()
    subscriber.consumer = mock_consumer

    queue = await subscriber.subscribe("test-queue")

@@ -81,9 +81,8 @@ class TestTaskGroupConcurrency:

        # Track how many consume_from_queue calls are made
        call_count = 0
        original_running = True

        async def mock_consume(backend_consumer):
        async def mock_consume(backend_consumer, executor=None):
            nonlocal call_count
            call_count += 1
            # Wait a bit to let all tasks start, then signal stop

@@ -107,7 +106,7 @@ class TestTaskGroupConcurrency:
        consumer = _make_consumer(concurrency=1)
        call_count = 0

        async def mock_consume(backend_consumer):
        async def mock_consume(backend_consumer, executor=None):
            nonlocal call_count
            call_count += 1
            await asyncio.sleep(0.01)

@@ -294,9 +293,8 @@ class TestPollTimeout:
            raise type('Timeout', (Exception,), {})("timeout")

        mock_pulsar_consumer.receive = capture_receive
        consumer.consumer = mock_pulsar_consumer

        await consumer.consume_from_queue()
        await consumer.consume_from_queue(mock_pulsar_consumer)

        assert received_kwargs.get("timeout_millis") == 100

@@ -94,7 +94,6 @@ class AsyncProcessor:
            metrics = config_consumer_metrics,

            start_of_messages = False,
            consumer_type = 'exclusive',
        )

        self.running = True

@ -12,6 +12,7 @@
|
|||
import asyncio
|
||||
import time
|
||||
import logging
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
|
||||
from .. exceptions import TooManyRequests
|
||||
|
||||
|
|
@ -110,29 +111,37 @@ class Consumer:
|
|||
logger.info(f"Starting {self.concurrency} receiver threads")
|
||||
|
||||
# Create one backend consumer per concurrent task.
|
||||
# Each gets its own connection — required for backends
|
||||
# like RabbitMQ where connections are not thread-safe.
|
||||
# Each gets its own connection and dedicated thread —
|
||||
# required for backends like RabbitMQ where connections
|
||||
# are not thread-safe (pika BlockingConnection must be
|
||||
# used from a single thread).
|
||||
consumers = []
|
||||
executors = []
|
||||
for i in range(self.concurrency):
|
||||
try:
|
||||
logger.info(f"Subscribing to topic: {self.topic} (worker {i})")
|
||||
c = await asyncio.to_thread(
|
||||
self.backend.create_consumer,
|
||||
topic = self.topic,
|
||||
subscription = self.subscriber,
|
||||
schema = self.schema,
|
||||
initial_position = initial_pos,
|
||||
consumer_type = self.consumer_type,
|
||||
executor = ThreadPoolExecutor(max_workers=1)
|
||||
loop = asyncio.get_event_loop()
|
||||
c = await loop.run_in_executor(
|
||||
executor,
|
||||
lambda: self.backend.create_consumer(
|
||||
topic = self.topic,
|
||||
subscription = self.subscriber,
|
||||
schema = self.schema,
|
||||
initial_position = initial_pos,
|
||||
consumer_type = self.consumer_type,
|
||||
),
|
||||
)
|
||||
consumers.append(c)
|
||||
executors.append(executor)
|
||||
logger.info(f"Successfully subscribed to topic: {self.topic} (worker {i})")
|
||||
except Exception as e:
|
||||
logger.error(f"Consumer subscription exception (worker {i}): {e}", exc_info=True)
|
||||
raise
|
||||
|
||||
async with asyncio.TaskGroup() as tg:
|
||||
for c in consumers:
|
||||
tg.create_task(self.consume_from_queue(c))
|
||||
for c, ex in zip(consumers, executors):
|
||||
tg.create_task(self.consume_from_queue(c, ex))
|
||||
|
||||
if self.metrics:
|
||||
self.metrics.state("stopped")
|
||||
|
|
@ -146,7 +155,10 @@ class Consumer:
|
|||
c.close()
|
||||
except Exception:
|
||||
pass
|
||||
for ex in executors:
|
||||
ex.shutdown(wait=False)
|
||||
consumers = []
|
||||
executors = []
|
||||
await asyncio.sleep(self.reconnect_time)
|
||||
continue
|
||||
|
||||
|
|
@ -157,15 +169,18 @@ class Consumer:
|
|||
c.close()
|
||||
except Exception:
|
||||
pass
|
||||
for ex in executors:
|
||||
ex.shutdown(wait=False)
|
||||
|
||||
async def consume_from_queue(self, consumer):
|
||||
async def consume_from_queue(self, consumer, executor=None):
|
||||
|
||||
loop = asyncio.get_event_loop()
|
||||
while self.running:
|
||||
|
||||
try:
|
||||
msg = await asyncio.to_thread(
|
||||
consumer.receive,
|
||||
timeout_millis=100
|
||||
msg = await loop.run_in_executor(
|
||||
executor,
|
||||
lambda: consumer.receive(timeout_millis=100),
|
||||
)
|
||||
except Exception as e:
|
||||
# Handle timeout from any backend
|
||||
|
|
@ -173,10 +188,11 @@ class Consumer:
|
|||
continue
|
||||
raise e
|
||||
|
||||
await self.handle_one_from_queue(msg, consumer)
|
||||
await self.handle_one_from_queue(msg, consumer, executor)
|
||||
|
||||
async def handle_one_from_queue(self, msg, consumer):
|
||||
async def handle_one_from_queue(self, msg, consumer, executor=None):
|
||||
|
||||
loop = asyncio.get_event_loop()
|
||||
expiry = time.time() + self.rate_limit_timeout
|
||||
|
||||
# This loop is for retry on rate-limit / resource limits
|
||||
|
|
@@ -187,8 +203,11 @@ class Consumer:
                 logger.warning("Gave up waiting for rate-limit retry")

                 # Message failed to be processed, this causes it to
-                # be retried
-                consumer.negative_acknowledge(msg)
+                # be retried. Ack on the consumer's dedicated thread
+                # (pika is not thread-safe).
+                await loop.run_in_executor(
+                    executor, lambda: consumer.negative_acknowledge(msg)
+                )

                 if self.metrics:
                     self.metrics.process("error")
@@ -210,8 +229,11 @@ class Consumer:

                 logger.debug("Message processed successfully")

-                # Acknowledge successful processing of the message
-                consumer.acknowledge(msg)
+                # Acknowledge on the consumer's dedicated thread
+                # (pika is not thread-safe)
+                await loop.run_in_executor(
+                    executor, lambda: consumer.acknowledge(msg)
+                )

                 if self.metrics:
                     self.metrics.process("success")
@@ -237,8 +259,10 @@ class Consumer:
                 logger.error(f"Message processing exception: {e}", exc_info=True)

                 # Message failed to be processed, this causes it to
-                # be retried
-                consumer.negative_acknowledge(msg)
+                # be retried. Ack on the consumer's dedicated thread.
+                await loop.run_in_executor(
+                    executor, lambda: consumer.negative_acknowledge(msg)
+                )

                 if self.metrics:
                     self.metrics.process("error")
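The pattern the Consumer hunks above apply — funnelling every blocking `receive`/`acknowledge` call through a single-worker executor so a thread-affine client (pika is not thread-safe) is only ever touched from one dedicated thread — can be sketched in isolation. All names here are illustrative stand-ins, not TrustGraph or pika APIs:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeConsumer:
    """Stand-in for a thread-affine messaging client: records which
    threads call into it, so we can check there is only one."""
    def __init__(self):
        self.thread_ids = set()

    def receive(self, timeout_millis=100):
        self.thread_ids.add(threading.get_ident())
        return "msg"

    def acknowledge(self, msg):
        self.thread_ids.add(threading.get_ident())

async def consume_once(consumer, executor):
    # Every blocking client call goes through the same single-worker
    # executor, so all of them run on one dedicated thread.
    loop = asyncio.get_running_loop()
    msg = await loop.run_in_executor(
        executor, lambda: consumer.receive(timeout_millis=100))
    await loop.run_in_executor(
        executor, lambda: consumer.acknowledge(msg))
    return msg

consumer = FakeConsumer()
executor = ThreadPoolExecutor(max_workers=1)
result = asyncio.run(consume_once(consumer, executor))
executor.shutdown(wait=False)
```

With `max_workers=1` the executor lazily creates one thread and reuses it for every submitted call, which is what makes the receive and the later ack land on the same thread.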
@@ -7,6 +7,7 @@ import asyncio
 import time
 import logging
 import uuid
+from concurrent.futures import ThreadPoolExecutor

 # Module logger
 logger = logging.getLogger(__name__)
@@ -38,6 +39,7 @@ class Subscriber:
         self.pending_acks = {} # Track messages awaiting delivery

         self.consumer = None
+        self.executor = None

     def __del__(self):

@@ -45,15 +47,6 @@ class Subscriber:

     async def start(self):

-        # Create consumer via backend
-        self.consumer = await asyncio.to_thread(
-            self.backend.create_consumer,
-            topic=self.topic,
-            subscription=self.subscription,
-            schema=self.schema,
-            consumer_type='exclusive',
-        )
-
         self.task = asyncio.create_task(self.run())

     async def stop(self):
@@ -80,6 +73,21 @@ class Subscriber:

         try:

+            # Create consumer and dedicated thread if needed
+            # (first run or after failure)
+            if self.consumer is None:
+                self.executor = ThreadPoolExecutor(max_workers=1)
+                loop = asyncio.get_event_loop()
+                self.consumer = await loop.run_in_executor(
+                    self.executor,
+                    lambda: self.backend.create_consumer(
+                        topic=self.topic,
+                        subscription=self.subscription,
+                        schema=self.schema,
+                        consumer_type='exclusive',
+                    ),
+                )
+
             if self.metrics:
                 self.metrics.state("running")
@@ -128,9 +136,12 @@ class Subscriber:
             # Process messages only if not draining
             if not self.draining:
                 try:
-                    msg = await asyncio.to_thread(
-                        self.consumer.receive,
-                        timeout_millis=250
+                    loop = asyncio.get_event_loop()
+                    msg = await loop.run_in_executor(
+                        self.executor,
+                        lambda: self.consumer.receive(
+                            timeout_millis=250
+                        ),
                     )
                 except Exception as e:
                     # Handle timeout from any backend
@@ -172,15 +183,18 @@ class Subscriber:
                 except Exception:
                     pass # Already closed or error
                 self.consumer = None

+            if self.executor:
+                self.executor.shutdown(wait=False)
+                self.executor = None
+
             if self.metrics:
                 self.metrics.state("stopped")

             if not self.running and not self.draining:
                 return

-            # If handler drops out, sleep a retry
             # Sleep before retry
             await asyncio.sleep(1)

     async def subscribe(self, id):
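The Subscriber change moves consumer creation out of `start()` and into the run loop, so a failed consumer and its dedicated thread can be torn down and rebuilt on the next iteration. A minimal sketch of that create-on-demand / reset-on-failure pattern, with a hypothetical `factory` callable standing in for `backend.create_consumer`:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class LazySubscriber:
    """Sketch only: the consumer and its single-worker executor are
    built on first use, and dropped on failure so the next loop
    iteration rebuilds both."""
    def __init__(self, factory):
        self.factory = factory   # hypothetical blocking constructor
        self.consumer = None
        self.executor = None

    async def ensure_consumer(self):
        # First run, or first run after a teardown: create the
        # dedicated thread, then build the consumer on that thread.
        if self.consumer is None:
            self.executor = ThreadPoolExecutor(max_workers=1)
            loop = asyncio.get_running_loop()
            self.consumer = await loop.run_in_executor(
                self.executor, self.factory)
        return self.consumer

    def teardown(self):
        # On failure, drop both so the next iteration starts fresh.
        self.consumer = None
        if self.executor:
            self.executor.shutdown(wait=False)
            self.executor = None

sub = LazySubscriber(lambda: object())
c1 = asyncio.run(sub.ensure_consumer())
c2 = asyncio.run(sub.ensure_consumer())   # cached, not rebuilt
sub.teardown()
```

The second call returns the cached consumer without touching the factory; only after `teardown()` would another call rebuild it.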
@@ -24,7 +24,7 @@ class Service(ToolService):
             **params
         )

-        self.register_config_handler(self.on_mcp_config, types=["mcp-tool"])
+        self.register_config_handler(self.on_mcp_config, types=["mcp"])

         self.mcp_services = {}

@@ -108,7 +108,7 @@ class Processor(AsyncProcessor):
             flow_config = self,
         )

-        self.register_config_handler(self.on_knowledge_config, types=["kg-core"])
+        self.register_config_handler(self.on_knowledge_config, types=["flow"])

         self.flows = {}

@@ -246,7 +246,10 @@ class Processor(AsyncProcessor):
             taskgroup = self.taskgroup,
         )

-        self.register_config_handler(self.on_librarian_config, types=["librarian"])
+        self.register_config_handler(
+            self.on_librarian_config,
+            types=["flow", "active-flow"],
+        )

         self.flows = {}

@@ -40,7 +40,7 @@ class Processor(FlowProcessor):
             }
         )

-        self.register_config_handler(self.on_cost_config, types=["token-costs"])
+        self.register_config_handler(self.on_cost_config, types=["token-cost"])

         self.register_specification(
             ConsumerSpec(
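The remaining hunks only rename the config type keys that handlers subscribe to (`mcp-tool` to `mcp`, `kg-core` to `flow`, `librarian` to `flow`/`active-flow`, `token-costs` to `token-cost`). The effect of such a rename can be seen in a toy dispatcher (a hypothetical sketch, not the TrustGraph implementation):

```python
# Minimal sketch: handlers are registered under string type keys,
# and config updates are dispatched by exact key match, so a
# renamed key silently stops reaching handlers on the old name.
handlers = {}

def register_config_handler(handler, types):
    for t in types:
        handlers.setdefault(t, []).append(handler)

def on_config(type_key, config):
    for h in handlers.get(type_key, []):
        h(config)

seen = []
register_config_handler(seen.append, types=["flow", "active-flow"])

on_config("flow", {"x": 1})       # dispatched
on_config("librarian", {"y": 2})  # old key: no registered handler
```

Here `seen` ends up as `[{"x": 1}]`: the update published under the old `librarian` key never reaches the handler, which is why the registration and the publisher must agree on the key.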