mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-06-12 08:15:14 +02:00
Compare commits
No commits in common. "master" and "v2.5.2" have entirely different histories.
162 changed files with 4141 additions and 7080 deletions
34
README.md
|
|
@ -3,7 +3,7 @@
|
||||||
|
|
||||||
<img src="TG-fullname-logo.svg" width=100% />
|
<img src="TG-fullname-logo.svg" width=100% />
|
||||||
|
|
||||||
[](https://pypi.org/project/trustgraph/)  
|
[](https://pypi.org/project/trustgraph/) [](LICENSE) 
|
||||||
[](https://discord.gg/sQMwkRz5GX) [](https://deepwiki.com/trustgraph-ai/trustgraph)
|
)](https://discord.gg/sQMwkRz5GX) [](https://deepwiki.com/trustgraph-ai/trustgraph)
|
||||||
|
|
||||||
|
|
@ -11,11 +11,11 @@
|
||||||
|
|
||||||
<a href="https://trendshift.io/repositories/17291" target="_blank"><img src="https://trendshift.io/api/badge/repositories/17291" alt="trustgraph-ai%2Ftrustgraph | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
<a href="https://trendshift.io/repositories/17291" target="_blank"><img src="https://trendshift.io/api/badge/repositories/17291" alt="trustgraph-ai%2Ftrustgraph | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
|
|
||||||
# The semantic deployment platform
|
# The agent runtime platform
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
TrustGraph is a comprehensive semantic infrastructure for agents built around context graphs — structured, queryable representations of your domain knowledge that ground every agent query in verified, explainable facts in private deployments with sovereign control. The platform is the full stack for agentic systems: context graphs, memory, retrieval, orchestration, and inference for deterministic agent workloads.
|
TrustGraph is an agent runtime platform built around context graphs — structured, queryable representations of your domain knowledge that ground every agent query in verified, explainable facts in private deployments with sovereign control. The platform is the full stack for agentic systems: context graphs, memory, retrieval, orchestration, and inference for precision-critical agent workloads.
|
||||||
|
|
||||||
The platform:
|
The platform:
|
||||||
- [x] Multi-model and multimodal database system
|
- [x] Multi-model and multimodal database system
|
||||||
|
|
@ -99,21 +99,23 @@ For a browser based configuration, try the [Configuration Terminal](https://conf
|
||||||
- [**Developer APIs and CLI**](https://docs.trustgraph.ai/reference)
|
- [**Developer APIs and CLI**](https://docs.trustgraph.ai/reference)
|
||||||
- [**Deployment Guides**](https://docs.trustgraph.ai/deployment)
|
- [**Deployment Guides**](https://docs.trustgraph.ai/deployment)
|
||||||
|
|
||||||
## Context Graph UI
|
## Workbench
|
||||||
|
|
||||||
<img width="1389" height="961" alt="Image" src="https://github.com/user-attachments/assets/35c9250d-0f01-40cb-9294-1ee8fd9a1b56" />
|
The **Workbench** provides tools for all major features of TrustGraph. The **Workbench** is on port `8888` by default.
|
||||||
|
|
||||||
The UI provides tools for all major features of TrustGraph. The UI deploys on port `8888` by default.
|
- **Vector Search**: Search the installed knowledge bases
|
||||||
|
- **Agentic, GraphRAG and LLM Chat**: Chat interface for agents, GraphRAG queries, or direct to LLMs
|
||||||
- **Agent Console** — Query your agents directly with streaming responses and live explainability event tracking, so you can watch reasoning unfold in real time
|
- **Relationships**: Analyze deep relationships in the installed knowledge bases
|
||||||
- **GraphRAG View** — Interactive graph RAG queries with a visual explainability DAG and inline provenance display, making it easy to see exactly where answers came from
|
- **Graph Visualizer**: 3D GraphViz of the installed knowledge bases
|
||||||
- **Context Explorer** — An interactive 3D context graph explorer with dynamic graph loading, BFS neighborhood extraction, edge pulse animation, and multiple navigation views
|
- **Library**: Staging area for installing knowledge bases
|
||||||
- **Document Ingestion** — A complete upload and submission workflow with page and chunk inspection and document structure browsing
|
- **Flow Classes**: Workflow preset configurations
|
||||||
- **Ontology Workbench** — A full ontology editor with class and property trees, OWL/XML and Turtle import/export with round-trip fidelity, circular dependency detection, and safe-delete confirmation dialogs
|
- **Flows**: Create custom workflows and adjust LLM parameters during runtime
|
||||||
- **Schema Workbench** — Interactive schema management with list, create, edit, and delete operations including field and index management
|
- **Knowledge Cores**: Manage reusable knowledge bases
|
||||||
- **Flow Management** — Flow creation and detail views with configurable parameters, temperature controls, and grouped storage layout
|
- **Prompts**: Manage and adjust prompts during runtime
|
||||||
- **Workspace UX** — Workspace selection and management surfaced directly in the interface
|
- **Schemas**: Define custom schemas for structured data knowledge bases
|
||||||
- **Prompt Editor** — A dedicated prompt editing workflow
|
- **Ontologies**: Define custom ontologies for unstructured data knowledge bases
|
||||||
|
- **Agent Tools**: Define tools with collections, knowledge cores, MCP connections, and tool groups
|
||||||
|
- **MCP Tools**: Connect to MCP servers
|
||||||
|
|
||||||
## TypeScript Library for UIs
|
## TypeScript Library for UIs
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -23,7 +23,7 @@ RUN pip3 install --no-cache-dir \
|
||||||
langchain==1.2.16 langchain-core==1.3.2 langchain-huggingface==1.2.2 \
|
langchain==1.2.16 langchain-core==1.3.2 langchain-huggingface==1.2.2 \
|
||||||
langchain-community==0.4.1 \
|
langchain-community==0.4.1 \
|
||||||
sentence-transformers==5.4.1 transformers==5.7.0 \
|
sentence-transformers==5.4.1 transformers==5.7.0 \
|
||||||
huggingface-hub==1.13.0 click \
|
huggingface-hub==1.13.0 \
|
||||||
pulsar-client==3.11.0
|
pulsar-client==3.11.0
|
||||||
|
|
||||||
# Most commonly used embeddings model, just build it into the container
|
# Most commonly used embeddings model, just build it into the container
|
||||||
|
|
|
||||||
|
|
@ -7,7 +7,7 @@ FROM docker.io/fedora:42 AS base
|
||||||
|
|
||||||
ENV PIP_BREAK_SYSTEM_PACKAGES=1
|
ENV PIP_BREAK_SYSTEM_PACKAGES=1
|
||||||
|
|
||||||
RUN dnf install -y python3.13 libxcb mesa-libGL poppler-utils && \
|
RUN dnf install -y python3.13 libxcb mesa-libGL && \
|
||||||
alternatives --install /usr/bin/python python /usr/bin/python3.13 1 && \
|
alternatives --install /usr/bin/python python /usr/bin/python3.13 1 && \
|
||||||
python -m ensurepip --upgrade && \
|
python -m ensurepip --upgrade && \
|
||||||
pip3 install --no-cache-dir --upgrade 'pip>=26.0' 'setuptools>=78.1.1' && \
|
pip3 install --no-cache-dir --upgrade 'pip>=26.0' 'setuptools>=78.1.1' && \
|
||||||
|
|
|
||||||
|
|
@ -25,7 +25,7 @@ BUCKET_URL = "https://storage.googleapis.com/trustgraph-library"
|
||||||
INDEX_URL = f"{BUCKET_URL}/index.json"
|
INDEX_URL = f"{BUCKET_URL}/index.json"
|
||||||
|
|
||||||
default_url = os.getenv("TRUSTGRAPH_URL", "http://localhost:8088/")
|
default_url = os.getenv("TRUSTGRAPH_URL", "http://localhost:8088/")
|
||||||
default_workspace = os.getenv("TRUSTGRAPH_WORKSPACE", "default")
|
default_user = "trustgraph"
|
||||||
default_token = os.getenv("TRUSTGRAPH_TOKEN", None)
|
default_token = os.getenv("TRUSTGRAPH_TOKEN", None)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -113,7 +113,7 @@ def convert_metadata(metadata_json):
|
||||||
return triples
|
return triples
|
||||||
|
|
||||||
|
|
||||||
def load_document(api, doc_entry):
|
def load_document(api, user, doc_entry):
|
||||||
"""Fetch metadata and content for a document, then load into TrustGraph."""
|
"""Fetch metadata and content for a document, then load into TrustGraph."""
|
||||||
doc_id = doc_entry["id"]
|
doc_id = doc_entry["id"]
|
||||||
title = doc_entry["title"]
|
title = doc_entry["title"]
|
||||||
|
|
@ -133,6 +133,7 @@ def load_document(api, doc_entry):
|
||||||
api.add_document(
|
api.add_document(
|
||||||
id=doc["id"],
|
id=doc["id"],
|
||||||
metadata=metadata,
|
metadata=metadata,
|
||||||
|
user=user,
|
||||||
kind=doc["kind"],
|
kind=doc["kind"],
|
||||||
title=doc["title"],
|
title=doc["title"],
|
||||||
comments=doc["comments"],
|
comments=doc["comments"],
|
||||||
|
|
@ -143,12 +144,12 @@ def load_document(api, doc_entry):
|
||||||
print(f" done.")
|
print(f" done.")
|
||||||
|
|
||||||
|
|
||||||
def load_documents(api, docs):
|
def load_documents(api, user, docs):
|
||||||
"""Load a list of documents."""
|
"""Load a list of documents."""
|
||||||
print(f"Loading {len(docs)} document(s)...\n")
|
print(f"Loading {len(docs)} document(s)...\n")
|
||||||
for doc in docs:
|
for doc in docs:
|
||||||
try:
|
try:
|
||||||
load_document(api, doc)
|
load_document(api, user, doc)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(f" FAILED: {e}", file=sys.stderr)
|
print(f" FAILED: {e}", file=sys.stderr)
|
||||||
print()
|
print()
|
||||||
|
|
@ -165,8 +166,8 @@ def main():
|
||||||
help=f"TrustGraph API URL (default: {default_url})",
|
help=f"TrustGraph API URL (default: {default_url})",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"-w", "--workspace", default=default_workspace,
|
"-U", "--user", default=default_user,
|
||||||
help=f"Workspace (default: {default_workspace})",
|
help=f"User ID (default: {default_user})",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"-t", "--token", default=default_token,
|
"-t", "--token", default=default_token,
|
||||||
|
|
@ -211,22 +212,22 @@ def main():
|
||||||
return
|
return
|
||||||
|
|
||||||
# Load commands need the API
|
# Load commands need the API
|
||||||
api = Api(args.url, token=args.token, workspace=args.workspace).library()
|
api = Api(args.url, token=args.token).library()
|
||||||
|
|
||||||
if args.command == "load-all":
|
if args.command == "load-all":
|
||||||
load_documents(api, index)
|
load_documents(api, args.user, index)
|
||||||
|
|
||||||
elif args.command == "load-doc":
|
elif args.command == "load-doc":
|
||||||
matches = [d for d in index if str(d.get("id")) == args.id]
|
matches = [d for d in index if str(d.get("id")) == args.id]
|
||||||
if not matches:
|
if not matches:
|
||||||
print(f"No document with ID '{args.id}' found.", file=sys.stderr)
|
print(f"No document with ID '{args.id}' found.", file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
load_documents(api, matches)
|
load_documents(api, args.user, matches)
|
||||||
|
|
||||||
elif args.command == "load-match":
|
elif args.command == "load-match":
|
||||||
results = search_index(index, args.query)
|
results = search_index(index, args.query)
|
||||||
if results:
|
if results:
|
||||||
load_documents(api, results)
|
load_documents(api, args.user, results)
|
||||||
else:
|
else:
|
||||||
print("No matches found.", file=sys.stderr)
|
print("No matches found.", file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
|
||||||
2260
docs/api.html
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
|
|
@ -100,7 +100,6 @@ multi-word subsystems.
|
||||||
| `users:admin` | Assign / remove roles on users within the workspace |
|
| `users:admin` | Assign / remove roles on users within the workspace |
|
||||||
| `keys:self` | Create / revoke / list **own** API keys |
|
| `keys:self` | Create / revoke / list **own** API keys |
|
||||||
| `keys:admin` | Create / revoke / list **any user's** API keys within the workspace |
|
| `keys:admin` | Create / revoke / list **any user's** API keys within the workspace |
|
||||||
| `workspaces:list-own` | List workspaces the caller has access to |
|
|
||||||
| `workspaces:admin` | Create / delete / disable workspaces (system-level) |
|
| `workspaces:admin` | Create / delete / disable workspaces (system-level) |
|
||||||
| `iam:admin` | JWT signing-key rotation, IAM-level operations |
|
| `iam:admin` | JWT signing-key rotation, IAM-level operations |
|
||||||
| `metrics:read` | Prometheus metrics proxy |
|
| `metrics:read` | Prometheus metrics proxy |
|
||||||
|
|
@ -111,7 +110,7 @@ The open-source edition ships three roles:
|
||||||
|
|
||||||
| Role | Capabilities |
|
| Role | Capabilities |
|
||||||
|---|---|
|
|---|---|
|
||||||
| `reader` | `agent`, `graph:read`, `documents:read`, `rows:read`, `llm`, `embeddings`, `mcp`, `collections:read`, `knowledge:read`, `flows:read`, `config:read`, `keys:self`, `workspaces:list-own` |
|
| `reader` | `agent`, `graph:read`, `documents:read`, `rows:read`, `llm`, `embeddings`, `mcp`, `collections:read`, `knowledge:read`, `flows:read`, `config:read`, `keys:self` |
|
||||||
| `writer` | everything in `reader` **+** `graph:write`, `documents:write`, `rows:write`, `collections:write`, `knowledge:write` |
|
| `writer` | everything in `reader` **+** `graph:write`, `documents:write`, `rows:write`, `collections:write`, `knowledge:write` |
|
||||||
| `admin` | everything in `writer` **+** `config:write`, `flows:write`, `users:read`, `users:write`, `users:admin`, `keys:admin`, `workspaces:admin`, `iam:admin`, `metrics:read` |
|
| `admin` | everything in `writer` **+** `config:write`, `flows:write`, `users:read`, `users:write`, `users:admin`, `keys:admin`, `workspaces:admin`, `iam:admin`, `metrics:read` |
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -224,7 +224,6 @@ class ApiKeyRecord:
|
||||||
| `enable-user` | `user_id`, `workspace` (optional integrity check) | — | Re-enables a previously disabled user; does not restore API keys. |
|
| `enable-user` | `user_id`, `workspace` (optional integrity check) | — | Re-enables a previously disabled user; does not restore API keys. |
|
||||||
| `delete-user` | `user_id`, `workspace` (optional integrity check) | — | Hard-delete; removes user record, username lookup, and all the user's API keys. |
|
| `delete-user` | `user_id`, `workspace` (optional integrity check) | — | Hard-delete; removes user record, username lookup, and all the user's API keys. |
|
||||||
| `create-workspace` | `workspace_record` | `workspace` | System-level. |
|
| `create-workspace` | `workspace_record` | `workspace` | System-level. |
|
||||||
| `list-my-workspaces` | `actor` (gateway-injected) | `workspaces` | Returns the workspaces the calling user has access to. OSS: the user's home workspace; if the caller holds the `admin` role, returns all workspaces instead. Enterprise regimes return whatever workspaces the user has been granted access to. |
|
|
||||||
| `list-workspaces` | — | `workspaces` | System-level. |
|
| `list-workspaces` | — | `workspaces` | System-level. |
|
||||||
| `get-workspace` | `workspace_record` (id only) | `workspace` | System-level. |
|
| `get-workspace` | `workspace_record` (id only) | `workspace` | System-level. |
|
||||||
| `update-workspace` | `workspace_record` | `workspace` | System-level. |
|
| `update-workspace` | `workspace_record` | `workspace` | System-level. |
|
||||||
|
|
|
||||||
|
|
@ -1,535 +0,0 @@
|
||||||
---
|
|
||||||
layout: default
|
|
||||||
title: "Knowledge Core Completeness"
|
|
||||||
parent: "Tech Specs"
|
|
||||||
---
|
|
||||||
|
|
||||||
# Knowledge Core Completeness
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
Knowledge cores are portable snapshots of extracted knowledge: triples, graph
|
|
||||||
embeddings, and document embeddings stored in Cassandra's `knowledge` keyspace.
|
|
||||||
They can be downloaded as files, transferred between TrustGraph instances, and
|
|
||||||
loaded back into vector and graph stores.
|
|
||||||
|
|
||||||
Recent additions to TrustGraph — explainability/provenance and named graphs —
|
|
||||||
were not carried through to the knowledge core system. This means that
|
|
||||||
exporting and re-importing a core loses provenance links, graph assignments,
|
|
||||||
and source material, breaking the explainability chain.
|
|
||||||
|
|
||||||
This specification addresses three gaps:
|
|
||||||
|
|
||||||
1. **Named graphs not stored** — The `g` (graph name) field on triples is
|
|
||||||
silently dropped when writing to the core store and comes back as `None`
|
|
||||||
on read.
|
|
||||||
2. **Provenance triples not captured** — Provenance triples (PROV-O) are
|
|
||||||
generated during extraction and flow to graph stores, but never enter
|
|
||||||
the knowledge core store. It is unclear whether they arrive at the store
|
|
||||||
in the correct form.
|
|
||||||
3. **Source material not included** — Documents, text pages, and chunks in
|
|
||||||
the librarian's bucket store are not part of the core. After loading a
|
|
||||||
core on a different instance, provenance links to source material point
|
|
||||||
at nothing.
|
|
||||||
|
|
||||||
## Goals
|
|
||||||
|
|
||||||
- **Self-contained cores**: A downloaded knowledge core file contains
|
|
||||||
everything needed to reconstruct the full knowledge graph including
|
|
||||||
provenance and source attribution on a fresh instance.
|
|
||||||
- **Named graph preservation**: Round-tripping a core preserves graph
|
|
||||||
assignments on all triples.
|
|
||||||
- **Backward compatibility**: Existing core files (without graph names or
|
|
||||||
source material) can still be uploaded and loaded. New fields are optional
|
|
||||||
on import.
|
|
||||||
- **No change to core identity**: A core is still identified by its document
|
|
||||||
ID. The additional data is associated with the same core ID.
|
|
||||||
- **Minimal file format changes**: Extend the existing msgpack record format
|
|
||||||
with new record types rather than restructuring existing ones.
|
|
||||||
|
|
||||||
## Background
|
|
||||||
|
|
||||||
### Current Lifecycle
|
|
||||||
|
|
||||||
```
|
|
||||||
Extraction pipeline
|
|
||||||
│
|
|
||||||
├─ triples ──────────────────► knowledge core store (Cassandra)
|
|
||||||
├─ graph embeddings ─────────► knowledge core store (Cassandra)
|
|
||||||
├─ document embeddings ──────► knowledge core store (Cassandra)
|
|
||||||
├─ provenance triples ───────► graph store (only)
|
|
||||||
└─ source documents ─────────► librarian bucket store (only)
|
|
||||||
|
|
||||||
Download: Cassandra ──► knowledge manager ──► API gateway ──► client file
|
|
||||||
Upload: client file ──► API gateway ──► knowledge manager ──► Cassandra
|
|
||||||
Load: Cassandra ──► knowledge manager ──► Pulsar topics ──► graph/vector stores
|
|
||||||
```
|
|
||||||
|
|
||||||
### Current Core File Format (msgpack)
|
|
||||||
|
|
||||||
A core file is a sequence of concatenated msgpack records. Each record is a
|
|
||||||
2-element tuple: `(type_tag, payload)`.
|
|
||||||
|
|
||||||
| Type tag | Payload | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `"t"` | `{"m": {id, root, collection}, "t": [triple_dicts]}` | Triple batch |
|
|
||||||
| `"ge"` | `{"m": {id, root, collection}, "e": [{entity, vector}]}` | Graph embedding batch |
|
|
||||||
|
|
||||||
### What's Missing
|
|
||||||
|
|
||||||
#### Named Graphs
|
|
||||||
|
|
||||||
The `Triple` dataclass has a `g: str | None` field (graph name IRI), used to
|
|
||||||
separate provenance graphs (`urn:graph:source`, `urn:graph:retrieval`) from
|
|
||||||
the default graph. However:
|
|
||||||
|
|
||||||
- **Cassandra schema** (`knowledge.triples` table): stores a 6-tuple per
|
|
||||||
triple `(s_val, s_is_uri, p_val, p_is_uri, o_val, o_is_uri)` — no graph
|
|
||||||
field.
|
|
||||||
- **`add_triples()`** (`tables/knowledge.py:231`): destructures only `s`,
|
|
||||||
`p`, `o` — `g` is discarded.
|
|
||||||
- **`get_triples()`** (`tables/knowledge.py:396`): reconstructs `Triple`
|
|
||||||
with `g` defaulting to `None`.
|
|
||||||
- **Core file format**: triple dicts do not include a graph field.
|
|
||||||
|
|
||||||
#### Provenance Triples
|
|
||||||
|
|
||||||
Provenance triples are generated in the extraction pipeline
|
|
||||||
(`trustgraph-base/trustgraph/provenance/triples.py`) and published to graph
|
|
||||||
store topics. They use named graphs (`urn:graph:source`,
|
|
||||||
`urn:graph:retrieval`) and PROV-O vocabulary.
|
|
||||||
|
|
||||||
The knowledge core store processor (`storage/knowledge/store.py`) listens on
|
|
||||||
`triples-input` and `graph-embeddings-input`. Whether provenance triples
|
|
||||||
arrive on the same `triples-input` topic or a separate one needs
|
|
||||||
verification. Even if they do arrive, the graph name would be lost (per
|
|
||||||
above).
|
|
||||||
|
|
||||||
#### Source Material
|
|
||||||
|
|
||||||
The librarian stores the full document hierarchy in a separate system:
|
|
||||||
|
|
||||||
- **Blob store** (S3/MinIO): original documents, text pages, chunks —
|
|
||||||
keyed by object UUID under `doc/{object_id}`.
|
|
||||||
- **Cassandra `library` keyspace**: document metadata including `id`,
|
|
||||||
`kind` (MIME type), `title`, `parent_id`, `document_type`
|
|
||||||
(`source`/`extracted`), `object_id` (blob reference).
|
|
||||||
|
|
||||||
Provenance triples link extracted facts back to chunk/page/document IDs.
|
|
||||||
Those IDs resolve through the librarian. When a core is loaded on a
|
|
||||||
different instance, the librarian has no matching documents, so the entire
|
|
||||||
provenance chain is broken.
|
|
||||||
|
|
||||||
### Key Source Files
|
|
||||||
|
|
||||||
| Component | File | Purpose |
|
|
||||||
|-----------|------|---------|
|
|
||||||
| Core Cassandra schema | `trustgraph-flow/trustgraph/tables/knowledge.py` | Table definitions, read/write |
|
|
||||||
| Core manager | `trustgraph-flow/trustgraph/cores/knowledge.py` | API operations, load-to-store |
|
|
||||||
| Core store processor | `trustgraph-flow/trustgraph/storage/knowledge/store.py` | Extraction → Cassandra |
|
|
||||||
| CLI download | `trustgraph-cli/trustgraph/cli/get_kg_core.py` | Core → msgpack file |
|
|
||||||
| CLI upload | `trustgraph-cli/trustgraph/cli/put_kg_core.py` | Msgpack file → core |
|
|
||||||
| CLI load | `trustgraph-cli/trustgraph/cli/load_kg_core.py` | Core → graph/vector stores |
|
|
||||||
| API client | `trustgraph-base/trustgraph/api/knowledge.py` | Client-side knowledge API |
|
|
||||||
| Triple schema | `trustgraph-base/trustgraph/schema/core/primitives.py` | Triple dataclass with `g` field |
|
|
||||||
| Provenance generation | `trustgraph-base/trustgraph/provenance/triples.py` | PROV-O triple creation |
|
|
||||||
| Librarian | `trustgraph-flow/trustgraph/librarian/librarian.py` | Document storage service |
|
|
||||||
| Library tables | `trustgraph-flow/trustgraph/tables/library.py` | Document metadata in Cassandra |
|
|
||||||
| Blob store | `trustgraph-flow/trustgraph/librarian/blob_store.py` | S3/MinIO object storage |
|
|
||||||
|
|
||||||
## Technical Design
|
|
||||||
|
|
||||||
### Change 1: Named Graph Field in Core Storage
|
|
||||||
|
|
||||||
#### Cassandra Schema
|
|
||||||
|
|
||||||
Extend the `triples` tuple from 6 to 7 elements, adding the graph name:
|
|
||||||
|
|
||||||
```
|
|
||||||
triples list<tuple<
|
|
||||||
text, boolean, -- s_val, s_is_uri
|
|
||||||
text, boolean, -- p_val, p_is_uri
|
|
||||||
text, boolean, -- o_val, o_is_uri
|
|
||||||
text -- graph name (empty string = default graph)
|
|
||||||
>>
|
|
||||||
```
|
|
||||||
|
|
||||||
**Migration**: The schema change uses `ALTER TABLE` or is handled by
|
|
||||||
creating a new table version. Existing rows with 6-element tuples must be
|
|
||||||
handled gracefully on read — if the tuple has 6 elements, treat graph as
|
|
||||||
default.
|
|
||||||
|
|
||||||
#### Write Path (`add_triples`)
|
|
||||||
|
|
||||||
Change `tables/knowledge.py:add_triples()` to include `triple.g`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
triples = [
|
|
||||||
(
|
|
||||||
*term_to_tuple(v.s), *term_to_tuple(v.p), *term_to_tuple(v.o),
|
|
||||||
v.g or ""
|
|
||||||
)
|
|
||||||
for v in m.triples
|
|
||||||
]
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Read Path (`get_triples`)
|
|
||||||
|
|
||||||
Change `tables/knowledge.py:get_triples()` to restore the graph name:
|
|
||||||
|
|
||||||
```python
|
|
||||||
Triple(
|
|
||||||
s = tuple_to_term(elt[0], elt[1]),
|
|
||||||
p = tuple_to_term(elt[2], elt[3]),
|
|
||||||
o = tuple_to_term(elt[4], elt[5]),
|
|
||||||
g = elt[6] if len(elt) > 6 and elt[6] else None,
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
The `len(elt) > 6` guard provides backward compatibility with existing
|
|
||||||
6-element rows.
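The guard can be exercised on its own. The following is a minimal sketch, assuming rows come back from Cassandra as plain Python tuples; `graph_of` is a hypothetical helper name used only for illustration and is not a function in `tables/knowledge.py`.

```python
def graph_of(elt):
    """Return the graph name held in a triple tuple, or None for the default graph.

    Covers both legacy 6-element rows and 7-element rows whose last
    slot is the empty string (the stored form of the default graph).
    """
    return elt[6] if len(elt) > 6 and elt[6] else None


# Illustrative rows: (s_val, s_is_uri, p_val, p_is_uri, o_val, o_is_uri[, graph])
legacy_row = ("urn:s", True, "urn:p", True, "value", False)
default_row = legacy_row + ("",)
named_row = legacy_row + ("urn:graph:source",)

print(graph_of(legacy_row), graph_of(default_row), graph_of(named_row))
```

Legacy rows and rows written with an empty graph name both map to `None`, so callers do not need to distinguish the two cases.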
|
|
||||||
|
|
||||||
#### Core File Format
|
|
||||||
|
|
||||||
Extend triple dicts in the `"t"` record to include the graph name:
|
|
||||||
|
|
||||||
```python
|
|
||||||
# In get_kg_core.py write_triple — each triple dict gains "g" key
|
|
||||||
{"s": ..., "p": ..., "o": ..., "g": "urn:graph:source"}
|
|
||||||
```
|
|
||||||
|
|
||||||
On read (`put_kg_core.py`), treat missing `"g"` key as default graph for
|
|
||||||
backward compatibility with old core files.
|
|
||||||
|
|
||||||
### Change 2: Provenance Triples in Cores
|
|
||||||
|
|
||||||
#### Investigation Required
|
|
||||||
|
|
||||||
Before implementation, verify:
|
|
||||||
|
|
||||||
1. Whether provenance triples arrive on the `triples-input` topic that the
|
|
||||||
knowledge core store processor already listens on.
|
|
||||||
2. If not, which topic they use, and whether the store processor should
|
|
||||||
subscribe to it.
|
|
||||||
|
|
||||||
#### If provenance triples already arrive at the store
|
|
||||||
|
|
||||||
The only change needed is Change 1 (named graphs) — the provenance triples
|
|
||||||
are already being stored, just without their graph name. Once graph names
|
|
||||||
are preserved, provenance triples will round-trip correctly.
|
|
||||||
|
|
||||||
#### If provenance triples do NOT arrive at the store
|
|
||||||
|
|
||||||
Two options:
|
|
||||||
|
|
||||||
**Option A — Route provenance to the existing store topic**: Configure the
|
|
||||||
flow so provenance triples are published to the same `triples-input` topic.
|
|
||||||
This is the simpler approach and keeps the store processor unchanged.
|
|
||||||
|
|
||||||
**Option B — Add a subscription**: Add a new `ConsumerSpec` in the store
|
|
||||||
processor for the provenance topic. This keeps provenance routing
|
|
||||||
independent but adds complexity.
|
|
||||||
|
|
||||||
Recommendation: Option A, unless there is a reason provenance triples are
|
|
||||||
intentionally kept off the core store topic.
|
|
||||||
|
|
||||||
### Change 3: Source Material in Cores
|
|
||||||
|
|
||||||
This is the largest change. The goal is that when a core is loaded on a
|
|
||||||
fresh instance, provenance links to source material resolve.
|
|
||||||
|
|
||||||
#### Architecture
|
|
||||||
|
|
||||||
Source material is **not stored in the knowledge core tables**. It lives in
|
|
||||||
the librarian (Cassandra `library` keyspace + S3/MinIO blob store) and is
|
|
||||||
fetched on demand via the librarian's existing service API.
|
|
||||||
|
|
||||||
The knowledge manager acts as a **client of the librarian service** — it
|
|
||||||
calls the librarian's request/response API over pub/sub to retrieve document
|
|
||||||
metadata and content. It does not access the library's Cassandra tables or
|
|
||||||
blob store directly.
|
|
||||||
|
|
||||||
#### Transport
|
|
||||||
|
|
||||||
The librarian's pub/sub API already handles chunking of large documents.
|
|
||||||
This chunking is designed to be websocket-friendly, so library content
|
|
||||||
flowing through the API gateway to external clients does not require
|
|
||||||
re-chunking. The API gateway remains a transport layer.
|
|
||||||
|
|
||||||
```
|
|
||||||
Download:
|
|
||||||
Knowledge manager ──pub/sub──► Librarian (fetch metadata + content)
|
|
||||||
Knowledge manager ──pub/sub──► API gateway ──websocket──► Client
|
|
||||||
|
|
||||||
Upload:
|
|
||||||
Client ──websocket──► API gateway ──pub/sub──► Knowledge manager
|
|
||||||
Knowledge manager ──pub/sub──► Librarian (store metadata + content)
|
|
||||||
```
|
|
||||||
|
|
||||||
#### What to Include
|
|
||||||
|
|
||||||
The provenance chain links facts → chunks → pages → documents. For the
|
|
||||||
chain to resolve, the core must include:
|
|
||||||
|
|
||||||
1. **Document metadata** — the library record for each document in the
|
|
||||||
hierarchy (id, kind, title, parent_id, document_type, etc.)
|
|
||||||
2. **Document content** — the blob data for each document (original file,
|
|
||||||
extracted text pages, text chunks)
|
|
||||||
|
|
||||||
Including the full hierarchy is necessary because:
|
|
||||||
- A user viewing provenance needs to traverse fact → chunk → page → document
|
|
||||||
- The chunk text is needed to show what text a fact was extracted from
|
|
||||||
- The page text provides broader context
|
|
||||||
- The original document is needed for full source attribution
|
|
||||||
|
|
||||||
#### Size Implications
|
|
||||||
|
|
||||||
Source material will significantly increase core file sizes. A rough model:
|
|
||||||
|
|
||||||
| Component | Typical size per document |
|
|
||||||
|-----------|-------------------------|
|
|
||||||
| Triples + embeddings (current) | 1-10 MB |
|
|
||||||
| Chunk text (all chunks) | ~same as original document |
|
|
||||||
| Page text (all pages) | ~same as original document |
|
|
||||||
| Original document (PDF, etc.) | Varies widely (KB to hundreds of MB) |
|
|
||||||
|
|
||||||
For a 10 MB PDF, the core could grow from ~5 MB to ~25 MB (original +
|
|
||||||
derived text + existing data). For large document sets, cores could become
|
|
||||||
very large.
|
|
||||||
|
|
||||||
**Decision needed**: Whether to include original documents or just derived
|
|
||||||
text (pages + chunks). Including only derived text still allows provenance
|
|
||||||
display but loses the ability to serve the original file.
|
|
||||||
|
|
||||||
#### New Core File Record Types
|
|
||||||
|
|
||||||
Add new msgpack record types for library content:
|
|
||||||
|
|
||||||
| Type tag | Payload | Description |
|
|
||||||
|----------|---------|-------------|
|
|
||||||
| `"lm"` | `{"id", "kind", "title", "parent_id", "document_type", "comments", "tags", "metadata"}` | Library document metadata |
|
|
||||||
| `"lb"` | `{"id", "data"}` | Library document blob content (chunked by pub/sub layer) |
|
|
||||||
|
|
||||||
These are emitted after the existing `"t"` and `"ge"` records during
|
|
||||||
download and processed during upload.
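One way a reader could route the four record types is sketched below. It assumes the records have already been decoded from msgpack into `(type_tag, payload)` pairs; `split_records` is a hypothetical name, and collecting unknown tags instead of raising is a choice made for this illustration, not something the spec prescribes.

```python
def split_records(records):
    """Group decoded (type_tag, payload) records by their type tag.

    Unknown tags are collected separately so the caller can decide
    whether to skip or report them.
    """
    known = {"t": [], "ge": [], "lm": [], "lb": []}
    unknown = []
    for tag, payload in records:
        if tag in known:
            known[tag].append(payload)
        else:
            unknown.append((tag, payload))
    return known, unknown


# An old-style core file carries only "t" and "ge" records.
old_file = [("t", {"m": {}, "t": []}), ("ge", {"m": {}, "e": []})]
known, unknown = split_records(old_file)
print(len(known["lm"]), len(known["lb"]), unknown)
```

With an old core file the `"lm"` and `"lb"` groups are simply empty, which matches the backward-compatibility goal that new fields are optional on import.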
|
|
||||||
|
|
||||||
#### Download Path
|
|
||||||
|
|
||||||
Extend `KnowledgeManager.get_kg_core()` to:
|
|
||||||
|
|
||||||
1. Stream triples and graph embeddings from the core store (existing
|
|
||||||
behavior).
|
|
||||||
2. Use the librarian service API to retrieve documents associated with
|
|
||||||
this core ID:
|
|
||||||
a. Fetch the root document metadata and content.
|
|
||||||
b. Use `list-children` to discover child documents (pages, chunks).
|
|
||||||
c. Recursively fetch metadata and content for each child.
|
|
||||||
3. Stream each document as `"lm"` (metadata) and `"lb"` (content) records.
|
|
||||||
|
|
||||||
The knowledge manager gains the librarian service as a pub/sub dependency.
|
|
||||||
Large document content is chunked by the librarian's existing pub/sub
|
|
||||||
transport — the knowledge manager receives and forwards these chunks without
|
|
||||||
buffering the full blob in memory.
|
|
||||||
|
|
||||||
#### Upload Path

Extend `KnowledgeManager.put_kg_core()` to handle the new record types:

1. For `"lm"` records: call the librarian service API to create/update
   the document metadata.
2. For `"lb"` records: call the librarian service API to store the
   document content.

Parent-child relationships are preserved because `parent_id` is stored in
the metadata. Documents should be processed in hierarchy order (parent
before child) to satisfy any ordering constraints.

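One way to guarantee the parent-before-child ordering on upload is to sort the metadata payloads before calling the librarian. The sketch below assumes the `"lm"` payloads are available as a batch; `order_parent_first` is a hypothetical helper, and a streaming implementation would need a different strategy (for example, deferring children whose parent has not yet arrived).

```python
def order_parent_first(lm_payloads: list) -> list:
    """Return metadata payloads sorted so every parent precedes its children.

    Parents that are not part of the batch are assumed to exist already.
    """
    by_id = {p["id"]: p for p in lm_payloads}
    ordered, placed = [], set()

    def place(payload, trail=()):
        doc_id = payload["id"]
        if doc_id in placed:
            return
        if doc_id in trail:
            raise ValueError(f"parent_id cycle at {doc_id!r}")
        parent = by_id.get(payload.get("parent_id"))
        if parent is not None:
            # Emit the parent first, remembering the path to detect cycles.
            place(parent, trail + (doc_id,))
        placed.add(doc_id)
        ordered.append(payload)

    for payload in lm_payloads:
        place(payload)
    return ordered


batch = [
    {"id": "chunk-1", "parent_id": "page-1"},
    {"id": "page-1", "parent_id": "root"},
    {"id": "root", "parent_id": None},
]
ids = [p["id"] for p in order_parent_first(batch)]
```

If the download side already emits parents first, this sort is a no-op; it mainly protects against files produced by other tools.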
#### Load Path

The load path (`_load_kg_core`) publishes triples and embeddings to Pulsar
topics for ingestion into graph/vector stores. Source material does not need
to flow through the load path — it is already in the librarian after the
upload step and can be accessed directly by services that need it.

No changes to the load path for source material.

#### CLI Changes

**`tg-get-kg-core`**: Add handling for `"lm"` and `"lb"` record types in
the file writer.

**`tg-put-kg-core`**: Add handling for `"lm"` and `"lb"` record types in
the file reader. Send library records to the knowledge manager alongside
triple/embedding records.

#### Associating Documents with Cores

The core ID is `metadata.root`, which is the root document ID from the
librarian. This provides a natural join: the core's root document and all
its children (pages, chunks) are the source material for that core.

The librarian's `list-children` API provides the child documents. A
recursive traversal from the root document collects the full hierarchy.

### API Changes

#### KnowledgeResponse Schema

Add optional fields to `KnowledgeResponse` for library data:

```python
@dataclass
class KnowledgeResponse:
    error: Error | None = None
    ids: list | None = None
    eos: bool = False
    triples: Triples | None = None
    graph_embeddings: GraphEmbeddings | None = None
    document_embeddings: DocumentEmbeddings | None = None
    library_metadata: LibraryMetadata | None = None   # new
    library_blob: LibraryBlob | None = None           # new
```

#### New Schema Types

```python
from dataclasses import dataclass

# Triple is the existing schema type used for document metadata triples.

@dataclass
class LibraryMetadata:
    id: str
    kind: str | None = None
    title: str | None = None
    parent_id: str | None = None
    document_type: str | None = None
    comments: str | None = None
    tags: list[str] | None = None
    metadata: list[Triple] | None = None

@dataclass
class LibraryBlob:
    id: str
    data: bytes
```

#### Socket API

The existing streaming protocol for `get-kg-core` / `put-kg-core` carries
these new fields naturally — responses already stream multiple record types.

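As an illustration of how a streamed response could be turned into a core file record, the sketch below picks the type tag from whichever field is populated. The `KnowledgeResponse` here is a simplified stand-in with `Any` payloads, the `FIELD_TO_TAG` mapping is an assumption (the tag used for document embeddings is not stated in this section, so it is left out), and `to_record` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class KnowledgeResponse:
    # Simplified stand-in: payload types are left as Any for the sketch.
    triples: Optional[Any] = None
    graph_embeddings: Optional[Any] = None
    library_metadata: Optional[Any] = None
    library_blob: Optional[Any] = None
    eos: bool = False


# Assumed mapping from populated field to core-file type tag.
FIELD_TO_TAG = {
    "triples": "t",
    "graph_embeddings": "ge",
    "library_metadata": "lm",
    "library_blob": "lb",
}


def to_record(resp: KnowledgeResponse):
    """Return (tag, payload) for the populated field, or None at end of stream."""
    if resp.eos:
        return None
    for field_name, tag in FIELD_TO_TAG.items():
        payload = getattr(resp, field_name)
        if payload is not None:
            return (tag, payload)
    raise ValueError("response carries no known payload")
```

A file writer could call `to_record` on each response and stop when it returns `None`.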
### Dependencies Between Changes

```
Change 1 (named graphs)
  │
  └── Change 2 (provenance triples), depends on Change 1

Change 3 (source material), independent
```

Change 1 is a prerequisite for Change 2 (provenance triples use named
graphs). Change 3 is independent and can be implemented in parallel.

## Security Considerations

- **Workspace isolation**: Core download/upload must respect workspace
  boundaries. Source material from the librarian must only be included if
  it belongs to the same workspace as the core. This is already enforced
  by the existing workspace-scoped queries.
- **Large blob transfer**: Streaming large documents through the API
  is handled by the librarian's existing pub/sub chunking, which is
  designed to be websocket-friendly. No additional chunking layer is
  needed.
- **Cross-instance trust**: When uploading a core from an external source,
  the library content should be treated as untrusted input. Document
  metadata and blob content should be validated before insertion.

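The cross-instance trust point could translate into checks like the ones below, run before any record reaches the librarian. This is only a sketch: the specific rules, the limits `MAX_BLOB_BYTES` and `MAX_TITLE_CHARS`, and the function names are all assumptions, not requirements taken from the spec.

```python
MAX_BLOB_BYTES = 64 * 1024 * 1024   # hypothetical per-record limit
MAX_TITLE_CHARS = 1024              # hypothetical limit


def validate_lm(payload: dict) -> dict:
    """Reject obviously malformed "lm" payloads; return the payload if it passes."""
    doc_id = payload.get("id")
    if not isinstance(doc_id, str) or not doc_id:
        raise ValueError("id must be a non-empty string")
    parent_id = payload.get("parent_id")
    if parent_id is not None and not isinstance(parent_id, str):
        raise ValueError("parent_id must be a string or None")
    if parent_id == doc_id:
        raise ValueError("document cannot be its own parent")
    title = payload.get("title")
    if title is not None and (
        not isinstance(title, str) or len(title) > MAX_TITLE_CHARS
    ):
        raise ValueError("title must be a string within the length limit")
    tags = payload.get("tags")
    if tags is not None and not (
        isinstance(tags, list) and all(isinstance(t, str) for t in tags)
    ):
        raise ValueError("tags must be a list of strings")
    return payload


def validate_lb(payload: dict) -> dict:
    """Reject malformed or oversized "lb" payloads; return the payload if it passes."""
    doc_id = payload.get("id")
    if not isinstance(doc_id, str) or not doc_id:
        raise ValueError("id must be a non-empty string")
    data = payload.get("data")
    if not isinstance(data, (bytes, bytearray)):
        raise ValueError("data must be bytes")
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("blob exceeds the size limit")
    return payload
```

Structural checks like these do not replace workspace authorization; they only keep malformed input out of the store.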
## Performance Considerations

- **Core file size**: Including source material will significantly increase
  core file sizes. Consider adding a flag to download/upload commands to
  optionally exclude source material for use cases where only the knowledge
  graph is needed.
- **Streaming**: All paths already use streaming (paged Cassandra queries,
  msgpack record-at-a-time). Library content should follow the same pattern.
- **Cassandra schema migration**: Changing the tuple width in the `triples`
  table requires careful handling. Cassandra frozen tuples cannot be altered
  in place — a migration strategy is needed (see Migration Plan).

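The optional exclusion flag suggested above could be a simple filter over the record stream. The sketch assumes records are `(type_tag, payload)` pairs; `filter_records` and its `include_sources` parameter are hypothetical names, not existing CLI options.

```python
from typing import Iterable, Iterator

LIBRARY_TAGS = {"lm", "lb"}


def filter_records(records: Iterable[tuple],
                   include_sources: bool = True) -> Iterator[tuple]:
    """Pass records through, dropping library records when sources are excluded."""
    for tag, payload in records:
        if include_sources or tag not in LIBRARY_TAGS:
            yield (tag, payload)


records = [
    ("t", {}),
    ("ge", {}),
    ("lm", {"id": "doc-1"}),
    ("lb", {"id": "doc-1", "data": b""}),
]
compact = [tag for tag, _ in filter_records(records, include_sources=False)]
```

Because the filter is a generator, it fits the record-at-a-time streaming pattern and adds no buffering.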
## Testing Strategy

- **Unit tests**: Triple round-trip with graph name (write → read →
  verify `g` field preserved). Backward compatibility with 6-element tuples.
- **Integration tests**: Full lifecycle — extract with provenance → download
  core → upload to fresh instance → load → verify provenance chain resolves.
- **File format tests**: Read old-format core files (no graph name, no
  library records) and verify they load without error.
- **Library inclusion tests**: Download core with source material → upload →
  verify documents accessible through librarian.

## Migration Plan

### Cassandra Schema

The `triples` table stores tuples in a `list<tuple<...>>` column. Cassandra
does not support altering the type of an existing column. Options:

**Option A — New table**: Create a `triples_v2` table with the 7-element
tuple. Migrate data from `triples` to `triples_v2`. The read path checks
both tables during a transition period, then the old table is dropped.

**Option B — Dual read**: Keep the existing table. The read path handles
both 6-element and 7-element tuples by checking length. New writes use
7-element tuples. This works if Cassandra accepts variable-length tuples in
a list — **needs verification**.

**Option C — Separate graph column**: Instead of extending the tuple, add a
parallel `graphs list<text>` column where `graphs[i]` corresponds to
`triples[i]`. This avoids tuple migration entirely but requires keeping the
two lists in sync.

Recommendation: Verify Option B first (simplest). Fall back to Option A if
Cassandra rejects mixed tuple lengths.

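On the application side, the length check described in Option B might look like the sketch below. It assumes the graph name is appended as the seventh element and that a missing graph is represented as `None`; neither detail is confirmed by the spec, and `split_graph` is a hypothetical helper. Whether Cassandra itself accepts mixed-width tuples remains the open question.

```python
def split_graph(row: tuple):
    """Split a stored tuple into (six_element_row, graph_name_or_None)."""
    if len(row) == 6:
        # Old-format row: no graph name was stored.
        return row, None
    if len(row) == 7:
        # New-format row: assume the graph name is the final element.
        return row[:6], row[6]
    raise ValueError(f"unexpected tuple width {len(row)}")


old_row = ("a", "b", "c", "d", "e", "f")
new_row = old_row + ("graph-1",)
```

The same function also covers Option C if the caller zips the `triples` and `graphs` lists and builds 7-element rows before calling it.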
### Core File Format

Backward compatible by design:
- Old files lack `"g"` in triple dicts and have no `"lm"`/`"lb"` records →
  handled by defaults.
- New files read by old code → old code must skip unknown record types.
  The existing `read_message` raises on unknown types, so it needs a small
  fix to skip them gracefully.

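Both compatibility rules can be sketched on already-decoded records. This is not the existing `read_message`; `read_records` and `graph_of` are hypothetical helpers, the set of known tags is taken from this document, and the triple dict keys other than `"g"` are placeholders.

```python
import logging
from typing import Iterable, Iterator

KNOWN_TAGS = {"t", "ge", "lm", "lb"}
log = logging.getLogger(__name__)


def read_records(decoded: Iterable[tuple]) -> Iterator[tuple]:
    """Yield known records and skip unknown type tags instead of raising."""
    for tag, payload in decoded:
        if tag not in KNOWN_TAGS:
            log.warning("skipping unknown record type %r", tag)
            continue
        yield (tag, payload)


def graph_of(triple: dict, default=None):
    """Return the graph name of a triple dict, defaulting when "g" is absent."""
    return triple.get("g", default)


stream = [("t", {"triples": []}), ("zz", {"future": True}), ("lm", {"id": "doc-1"})]
kept = [tag for tag, _ in read_records(stream)]
```

Logging the skipped tag rather than dropping it silently makes it easier to notice when a reader is older than the file it is loading.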
## Open Questions

1. **Provenance topic routing**: Do provenance triples currently arrive at
   the `triples-input` topic consumed by the knowledge core store? If not,
   what topic are they on?

2. **Include original documents?**: Should cores include the original
   uploaded document (e.g. PDF), or only derived text (pages + chunks)?
   Including originals makes cores fully self-contained but potentially
   very large. Excluding them preserves provenance text display but loses
   the ability to serve the original file.

3. **Optional source material**: Should there be a flag on download/upload
   to include or exclude source material? This would let users choose
   between compact cores (knowledge only) and complete cores (knowledge +
   sources).

4. **Cassandra tuple migration**: Can Cassandra handle mixed-length tuples
   in a `list<tuple<...>>` column, or is a table migration required?

5. **Document embedding cores**: DE cores are managed alongside KG cores.
   Do they need the same treatment (source material inclusion)? The
   document embeddings reference chunk IDs — the same provenance chain
   applies.

6. **Core versioning**: Should the core file include a version marker so
   readers can distinguish old-format from new-format files without
   trial-and-error parsing?

## References

- Extraction-time provenance: `docs/tech-specs/extraction-time-provenance.md`
- Query-time explainability: `docs/tech-specs/query-time-explainability.md`
- Agent explainability: `docs/tech-specs/agent-explainability.md`
- Data ownership model: `docs/tech-specs/data-ownership-model.md`

File diff suppressed because one or more lines are too long
|
|
@ -28,9 +28,8 @@ specs/
|
||||||
Location: `specs/api/openapi.yaml`
|
Location: `specs/api/openapi.yaml`
|
||||||
|
|
||||||
The REST API specification documents:
|
The REST API specification documents:
|
||||||
- **Global Services**: IAM (user management, authentication)
|
- **5 Global Services**: config, flow, librarian, knowledge, collection-management
|
||||||
- **5 Workspace-Scoped Services**: config, flow, librarian, knowledge, collection-management
|
- **16 Flow-Hosted Services**: agent, RAG, embeddings, queries, loading, tools
|
||||||
- **16 Flow-Scoped Services**: agent, RAG, embeddings, queries, loading, tools
|
|
||||||
- **Import/Export**: Bulk data operations
|
- **Import/Export**: Bulk data operations
|
||||||
- **Metrics**: Prometheus monitoring
|
- **Metrics**: Prometheus monitoring
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -2,55 +2,6 @@
|
||||||
|
|
||||||
This directory contains the modular OpenAPI 3.1 specification for the TrustGraph REST API Gateway.
|
This directory contains the modular OpenAPI 3.1 specification for the TrustGraph REST API Gateway.
|
||||||
|
|
||||||
## Authentication

Clients authenticate by passing an opaque bearer token in the
`Authorization` header. The gateway resolves the token to an
authenticated identity and an associated workspace. Tokens are
obtained via the IAM service (e.g. `tg-login` or `tg-create-api-key`).

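A minimal sketch of attaching the bearer token to a request, using only the Python standard library. The base URL and token value are placeholders, `build_request` is a hypothetical helper, and the request is built but not sent. The `/api/v1/{kind}` path and the `list-kg-cores` operation are taken from the OpenAPI description elsewhere in this change set.

```python
import json
import urllib.request


def build_request(base_url: str, kind: str, token: str,
                  body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to a gateway service."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/{kind}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The token is opaque: pass it through without inspecting it.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder gateway address and token, for illustration only.
req = build_request(
    "http://localhost:8088",
    "knowledge",
    "example-token",
    {"operation": "list-kg-cores"},
)
```

No workspace appears in the request body because the gateway resolves it from the token.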
## Service Tiers

API services are organized into three tiers based on their scoping:

### Global services

These services are not scoped to a workspace. They manage
system-wide resources.

- **IAM** — user management, authentication, API key lifecycle

### Workspace-scoped services

These services operate within the workspace associated with the
authenticated token. The workspace is resolved by the gateway from
the bearer token — it is not passed as an explicit parameter.

- **Config** — configuration management (prompts, token costs, etc.)
- **Librarian** — document library management
- **Knowledge** — knowledge graph core management
- **Collection Management** — collection metadata
- **Flow** — flow lifecycle and blueprint management

### Flow-scoped services

These services require a `flow` parameter identifying the processing
flow to use, in addition to the workspace context from the token.

- **Agent** — agentic AI interactions
- **Document RAG** — retrieval-augmented generation over documents
- **Graph RAG** — retrieval-augmented generation over knowledge graphs
- **Text Completion** — LLM text completion
- **Prompt** — prompt template expansion
- **Embeddings** — vector embedding generation
- **SPARQL Query** — SPARQL queries against the knowledge graph
- **Graph Embeddings** — knowledge graph embedding queries
- **Document Embeddings** — document embedding queries
- **Structured Query** — structured data queries
- **Row Embeddings** — structured data embedding queries
- **Rows Query** — row-level data queries
- **Triples Query** — knowledge graph triple queries

## Structure
|
## Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
|
||||||
|
|
@ -14,7 +14,7 @@ properties:
|
||||||
- delete-collection
|
- delete-collection
|
||||||
description: |
|
description: |
|
||||||
Collection operation:
|
Collection operation:
|
||||||
- `list-collections`: List collections in the current workspace (resolved from token)
|
- `list-collections`: List collections in workspace
|
||||||
- `update-collection`: Create or update collection metadata
|
- `update-collection`: Create or update collection metadata
|
||||||
- `delete-collection`: Delete collection
|
- `delete-collection`: Delete collection
|
||||||
collection:
|
collection:
|
||||||
|
|
|
||||||
|
|
@ -1,21 +0,0 @@
|
||||||
type: object
description: |
  API key creation fields. Used with `create-api-key`.
properties:
  user_id:
    type: string
    description: User to create the key for.
    examples:
      - usr_abc123
  name:
    type: string
    description: Operator-facing label for the key (e.g. "laptop", "CI").
    examples:
      - laptop
  expires:
    type: string
    description: |
      Optional expiry timestamp in ISO-8601 UTC. Empty string or
      omitted means the key does not expire.
    examples:
      - "2027-01-01T00:00:00Z"
|
|
@ -1,38 +0,0 @@
|
||||||
type: object
description: API key record returned by IAM operations.
properties:
  id:
    type: string
    description: Key identifier.
    examples:
      - key_xyz789
  user_id:
    type: string
    description: Owning user identifier.
    examples:
      - usr_abc123
  name:
    type: string
    description: Operator-facing label.
    examples:
      - laptop
  prefix:
    type: string
    description: |
      First 4 characters of the plaintext key, for identification
      in listings. Never enough to reconstruct the key.
    examples:
      - tg_a
  expires:
    type: string
    description: Expiry timestamp (ISO-8601 UTC). Empty if no expiry.
    examples:
      - "2027-01-01T00:00:00Z"
  created:
    type: string
    description: Creation timestamp (ISO-8601 UTC).
    examples:
      - "2026-01-15T10:30:00Z"
  last_used:
    type: string
    description: Last-used timestamp (ISO-8601 UTC). Empty if never used.
|
|
@ -1,106 +0,0 @@
|
||||||
type: object
description: |
  IAM service request.

  The IAM service is a **global service** — it operates at system level,
  not scoped to a specific workspace. All operations are dispatched via
  the `operation` field.

  Some operations require admin capabilities; others (like `whoami` and
  `list-my-workspaces`) are available to any authenticated user. See
  the capability vocabulary for details.

  The `actor` field is injected by the gateway and cannot be set by
  the client. It identifies the authenticated caller.
required:
  - operation
properties:
  operation:
    type: string
    enum:
      - whoami
      - list-my-workspaces
      - create-user
      - list-users
      - get-user
      - update-user
      - disable-user
      - enable-user
      - delete-user
      - create-workspace
      - list-workspaces
      - get-workspace
      - update-workspace
      - disable-workspace
      - create-api-key
      - list-api-keys
      - revoke-api-key
      - reset-password
      - rotate-signing-key
    description: |
      Operation to perform.

      **Any authenticated user:**
      - `whoami`: Return the caller's own user record
      - `list-my-workspaces`: List workspaces the caller has access to

      **User management (requires `users:read`/`users:write`/`users:admin`):**
      - `create-user`: Create a new user in a workspace
      - `list-users`: List users (optionally filtered by workspace)
      - `get-user`: Get a specific user record
      - `update-user`: Update user fields (name, email, roles, enabled)
      - `disable-user`: Soft-disable a user and revoke their API keys
      - `enable-user`: Re-enable a previously disabled user
      - `delete-user`: Hard-delete a user and their API keys

      **Workspace management (requires `workspaces:admin`):**
      - `create-workspace`: Create a new workspace
      - `list-workspaces`: List all workspaces (admin view)
      - `get-workspace`: Get a specific workspace record
      - `update-workspace`: Update workspace name or enabled state
      - `disable-workspace`: Disable workspace and all its users

      **API key management (requires `keys:self` or `keys:admin`):**
      - `create-api-key`: Create an API key for a user
      - `list-api-keys`: List API keys for a user
      - `revoke-api-key`: Revoke (delete) an API key

      **Password management:**
      - `reset-password`: Admin-initiated password reset (requires `users:admin`)

      **System (requires `iam:admin`):**
      - `rotate-signing-key`: Rotate the JWT signing key
  workspace:
    type: string
    description: |
      Workspace scope. Required on workspace-scoped operations
      (e.g. `create-user`). Acts as an optional integrity check on
      operations that target a user or key — when supplied, the target's
      home workspace must match.

      Omitted for system-level operations (`list-workspaces`,
      `rotate-signing-key`) and for identity-resolution operations
      (`whoami`, `list-my-workspaces`).
    examples:
      - default
      - production
  user_id:
    type: string
    description: |
      Target user identifier. Required for operations that act on a
      specific user: `get-user`, `update-user`, `disable-user`,
      `enable-user`, `delete-user`, `reset-password`, `list-api-keys`.
    examples:
      - usr_abc123
  user:
    $ref: './UserInput.yaml'
  workspace_record:
    $ref: './WorkspaceInput.yaml'
  key:
    $ref: './ApiKeyInput.yaml'
  key_id:
    type: string
    description: |
      API key identifier. Required for `revoke-api-key`.
    examples:
      - key_xyz789
|
|
@ -1,51 +0,0 @@
|
||||||
type: object
description: |
  IAM service response. Fields are populated depending on the
  operation that was invoked.
properties:
  user:
    $ref: './UserRecord.yaml'
  users:
    type: array
    description: List of user records (populated by `list-users`).
    items:
      $ref: './UserRecord.yaml'
  workspace:
    $ref: './WorkspaceRecord.yaml'
  workspaces:
    type: array
    description: |
      List of workspace records (populated by `list-workspaces` and
      `list-my-workspaces`).
    items:
      $ref: './WorkspaceRecord.yaml'
  api_key_plaintext:
    type: string
    description: |
      Plaintext API key. Returned **once** by `create-api-key`.
      Never populated on any other operation. The caller must
      capture this value — it cannot be retrieved again.
  api_key:
    $ref: './ApiKeyRecord.yaml'
  api_keys:
    type: array
    description: List of API key records (populated by `list-api-keys`).
    items:
      $ref: './ApiKeyRecord.yaml'
  temporary_password:
    type: string
    description: |
      Temporary password returned once by `reset-password`.
  error:
    type: object
    description: Error details (present on failure).
    properties:
      type:
        type: string
        description: |
          Error type. One of: `invalid-argument`, `not-found`,
          `duplicate`, `auth-failed`, `weak-password`, `disabled`,
          `operation-not-permitted`, `internal-error`.
      message:
        type: string
        description: Human-readable error description (not surfaced to end users).
|
|
@ -1,42 +0,0 @@
|
||||||
type: object
description: |
  User creation/update fields. Used with `create-user` and `update-user`.
  The `password` field is only accepted on `create-user`.
properties:
  username:
    type: string
    description: Login username. Unique within a workspace.
    examples:
      - alice
  name:
    type: string
    description: Display name.
    examples:
      - Alice Smith
  email:
    type: string
    description: Email address.
    examples:
      - alice@example.com
  password:
    type: string
    description: |
      Initial password. Only accepted on `create-user`; rejected on
      `update-user`. Use `reset-password` or `change-password` to
      modify passwords.
  roles:
    type: array
    items:
      type: string
    description: |
      Roles to assign. Open-source roles: `reader`, `writer`, `admin`.
    examples:
      - - reader
  enabled:
    type: boolean
    description: Whether the user is enabled.
    default: true
  must_change_password:
    type: boolean
    description: Force password change on next login.
    default: false
|
|
@ -1,46 +0,0 @@
|
||||||
type: object
description: User record returned by IAM operations.
properties:
  id:
    type: string
    description: Unique user identifier.
    examples:
      - usr_abc123
  workspace:
    type: string
    description: User's home workspace.
    examples:
      - default
  username:
    type: string
    description: Login username (unique within workspace).
    examples:
      - alice
  name:
    type: string
    description: Display name.
    examples:
      - Alice Smith
  email:
    type: string
    description: Email address.
    examples:
      - alice@example.com
  roles:
    type: array
    items:
      type: string
    description: Assigned roles.
    examples:
      - - reader
  enabled:
    type: boolean
    description: Whether the user is enabled.
  must_change_password:
    type: boolean
    description: Whether the user must change password on next login.
  created:
    type: string
    description: Creation timestamp (ISO-8601 UTC).
    examples:
      - "2026-01-15T10:30:00Z"
|
|
@ -1,23 +0,0 @@
|
||||||
type: object
description: |
  Workspace creation/update fields. Used with `create-workspace` and
  `update-workspace`.
properties:
  id:
    type: string
    description: |
      Workspace identifier. Required for all workspace operations.
      Immutable after creation.
    examples:
      - default
      - production
  name:
    type: string
    description: Human-readable workspace name.
    examples:
      - Default Workspace
      - Production
  enabled:
    type: boolean
    description: Whether the workspace is enabled.
    default: true
|
|
@ -1,21 +0,0 @@
|
||||||
type: object
description: Workspace record returned by IAM operations.
properties:
  id:
    type: string
    description: Workspace identifier.
    examples:
      - default
  name:
    type: string
    description: Human-readable workspace name.
    examples:
      - Default Workspace
  enabled:
    type: boolean
    description: Whether the workspace is enabled.
  created:
    type: string
    description: Creation timestamp (ISO-8601 UTC).
    examples:
      - "2026-01-01T00:00:00Z"
|
|
@ -18,7 +18,7 @@ properties:
|
||||||
- unload-kg-core
|
- unload-kg-core
|
||||||
description: |
|
description: |
|
||||||
Knowledge core operation:
|
Knowledge core operation:
|
||||||
- `list-kg-cores`: List knowledge cores in the current workspace (resolved from token)
|
- `list-kg-cores`: List knowledge cores in workspace
|
||||||
- `get-kg-core`: Get knowledge core by ID
|
- `get-kg-core`: Get knowledge core by ID
|
||||||
- `put-kg-core`: Store triples and/or embeddings
|
- `put-kg-core`: Store triples and/or embeddings
|
||||||
- `delete-kg-core`: Delete knowledge core by ID
|
- `delete-kg-core`: Delete knowledge core by ID
|
||||||
|
|
|
||||||
|
|
@ -2,44 +2,21 @@ openapi: 3.1.0
|
||||||
|
|
||||||
info:
|
info:
|
||||||
title: TrustGraph API Gateway
|
title: TrustGraph API Gateway
|
||||||
version: "2.4"
|
version: "2.2"
|
||||||
description: |
|
description: |
|
||||||
REST API for TrustGraph - an AI-powered knowledge graph and RAG system.
|
REST API for TrustGraph - an AI-powered knowledge graph and RAG system.
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The API provides access to:
|
The API provides access to:
|
||||||
- **Global Services**: IAM (user management, authentication)
|
- **Global Services**: Configuration, flow management, knowledge storage, library management
|
||||||
- **Workspace-Scoped Services**: Configuration, flow management, knowledge storage, library management
|
- **Flow-Hosted Services**: AI services like RAG, text completion, embeddings (require running flow)
|
||||||
- **Flow-Scoped Services**: AI services like RAG, text completion, embeddings (require running flow)
|
|
||||||
- **Import/Export**: Bulk data operations for triples, embeddings, entity contexts
|
- **Import/Export**: Bulk data operations for triples, embeddings, entity contexts
|
||||||
- **WebSocket**: Multiplexed interface for all services
|
- **WebSocket**: Multiplexed interface for all services
|
||||||
|
|
||||||
## Authentication
|
## Service Types
|
||||||
|
|
||||||
Clients authenticate by passing an opaque bearer token in the
|
|
||||||
`Authorization` header. The token is obtained via the IAM service
|
|
||||||
(e.g. `tg-login` or `tg-create-api-key`).
|
|
||||||
|
|
||||||
```
|
|
||||||
Authorization: Bearer <token>
|
|
||||||
```
|
|
||||||
|
|
||||||
The gateway resolves the token to an authenticated identity and an
|
|
||||||
associated workspace. The token is an opaque string — clients must
|
|
||||||
not make assumptions about its internal structure.
|
|
||||||
|
|
||||||
## Service Tiers
|
|
||||||
|
|
||||||
### Global Services
|
### Global Services
|
||||||
System-wide services with no workspace scoping:
|
|
||||||
- `iam` - User management, authentication, API key lifecycle
|
|
||||||
|
|
||||||
### Workspace-Scoped Services
|
|
||||||
Operate within the workspace associated with the authenticated
|
|
||||||
token. The workspace is resolved by the gateway — it is not
|
|
||||||
passed as an explicit parameter.
|
|
||||||
|
|
||||||
Fixed endpoints accessible via `/api/v1/{kind}`:
|
Fixed endpoints accessible via `/api/v1/{kind}`:
|
||||||
- `config` - Configuration management
|
- `config` - Configuration management
|
||||||
- `flow` - Flow lifecycle and blueprints
|
- `flow` - Flow lifecycle and blueprints
|
||||||
|
|
@ -47,17 +24,24 @@ info:
|
||||||
- `knowledge` - Knowledge graph core management
|
- `knowledge` - Knowledge graph core management
|
||||||
- `collection-management` - Collection metadata
|
- `collection-management` - Collection metadata
|
||||||
|
|
||||||
### Flow-Scoped Services
|
### Flow-Hosted Services
|
||||||
Require a `flow` parameter identifying the processing flow to use.
|
Require running flow instance, accessed via `/api/v1/flow/{flow}/service/{kind}`:
|
||||||
Workspace context comes from the authenticated token.
|
|
||||||
|
|
||||||
Accessed via `/api/v1/flow/{flow}/service/{kind}`:
|
|
||||||
- AI services: agent, text-completion, prompt, RAG (document/graph)
|
- AI services: agent, text-completion, prompt, RAG (document/graph)
|
||||||
- Embeddings: embeddings, graph-embeddings, document-embeddings
|
- Embeddings: embeddings, graph-embeddings, document-embeddings
|
||||||
- Query: triples, rows, nlp-query, structured-query, sparql-query, row-embeddings
|
- Query: triples, rows, nlp-query, structured-query, sparql-query, row-embeddings
|
||||||
- Data loading: text-load, document-load
|
- Data loading: text-load, document-load
|
||||||
- Utilities: mcp-tool, structured-diag
|
- Utilities: mcp-tool, structured-diag
|
||||||
|
|
||||||
|
## Authentication
|
||||||
|
|
||||||
|
Bearer token authentication when `GATEWAY_SECRET` environment variable is set.
|
||||||
|
Include token in Authorization header:
|
||||||
|
```
|
||||||
|
Authorization: Bearer <token>
|
||||||
|
```
|
||||||
|
|
||||||
|
If `GATEWAY_SECRET` is not set, API runs without authentication (development mode).
|
||||||
|
|
||||||
## Field Naming
|
## Field Naming
|
||||||
|
|
||||||
All JSON fields use **kebab-case**: `flow-id`, `blueprint-name`, `doc-limit`, etc.
|
All JSON fields use **kebab-case**: `flow-id`, `blueprint-name`, `doc-limit`, etc.
|
||||||
|
|
@ -89,20 +73,18 @@ security:
|
||||||
- bearerAuth: []
|
- bearerAuth: []
|
||||||
|
|
||||||
tags:
|
tags:
|
||||||
- name: IAM
|
|
||||||
description: Identity and access management (global)
|
|
||||||
- name: Config
|
- name: Config
|
||||||
description: Configuration management (workspace-scoped)
|
description: Configuration management (global service)
|
||||||
- name: Flow
|
- name: Flow
|
||||||
description: Flow lifecycle and blueprint management (workspace-scoped)
|
description: Flow lifecycle and blueprint management (global service)
|
||||||
- name: Librarian
|
- name: Librarian
|
||||||
description: Document library management (workspace-scoped)
|
description: Document library management (global service)
|
||||||
- name: Knowledge
|
- name: Knowledge
|
||||||
description: Knowledge graph core management (workspace-scoped)
|
description: Knowledge graph core management (global service)
|
||||||
- name: Collection
|
- name: Collection
|
||||||
description: Collection metadata management (workspace-scoped)
|
description: Collection metadata management (global service)
|
||||||
- name: Flow Services
|
- name: Flow Services
|
||||||
description: AI and query services hosted within flow instances (flow-scoped)
|
description: Services hosted within flow instances
|
||||||
- name: Import/Export
|
- name: Import/Export
|
||||||
description: Bulk data import and export
|
description: Bulk data import and export
|
||||||
- name: WebSocket
|
- name: WebSocket
|
||||||
|
|
@ -111,11 +93,6 @@ tags:
|
||||||
description: System metrics and monitoring
|
description: System metrics and monitoring
|
||||||
|
|
||||||
paths:
|
paths:
|
||||||
# Global services
|
|
||||||
/api/v1/iam:
|
|
||||||
$ref: './paths/iam.yaml'
|
|
||||||
|
|
||||||
# Workspace-scoped services
|
|
||||||
/api/v1/config:
|
/api/v1/config:
|
||||||
$ref: './paths/config.yaml'
|
$ref: './paths/config.yaml'
|
||||||
/api/v1/flow:
|
/api/v1/flow:
|
||||||
|
|
|
||||||
|
|
@ -1,13 +1,10 @@
|
||||||
post:
|
post:
|
||||||
tags:
|
tags:
|
||||||
- Collection
|
- Collection
|
||||||
summary: Collection metadata management (workspace-scoped)
|
summary: Collection metadata management
|
||||||
description: |
|
description: |
|
||||||
Manage collection metadata for organizing documents and knowledge.
|
Manage collection metadata for organizing documents and knowledge.
|
||||||
|
|
||||||
This is a **workspace-scoped** service. All operations apply to the
|
|
||||||
workspace associated with the authenticated bearer token.
|
|
||||||
|
|
||||||
## Collections
|
## Collections
|
||||||
|
|
||||||
Collections are organizational units for grouping:
|
Collections are organizational units for grouping:
|
||||||
|
|
|
||||||
|
|
@ -1,13 +1,9 @@
|
||||||
post:
|
post:
|
||||||
tags:
|
tags:
|
||||||
- Config
|
- Config
|
||||||
summary: Configuration service (workspace-scoped)
|
summary: Configuration service
|
||||||
description: |
|
description: |
|
||||||
Manage TrustGraph configuration including flows, prompts, token costs,
|
Manage TrustGraph configuration including flows, prompts, token costs, parameter types, and more.
|
||||||
parameter types, and more.
|
|
||||||
|
|
||||||
This is a **workspace-scoped** service. All operations apply to the
|
|
||||||
workspace associated with the authenticated bearer token.
|
|
||||||
|
|
||||||
## Operations
|
## Operations
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,13 +1,10 @@
|
||||||
post:
|
post:
|
||||||
tags:
|
tags:
|
||||||
- Flow
|
- Flow
|
||||||
summary: Flow lifecycle and blueprint management (workspace-scoped)
|
summary: Flow lifecycle and blueprint management
|
||||||
description: |
|
description: |
|
||||||
Manage flow instances and blueprints.
|
Manage flow instances and blueprints.
|
||||||
|
|
||||||
This is a **workspace-scoped** service. All operations apply to the
|
|
||||||
workspace associated with the authenticated bearer token.
|
|
||||||
|
|
||||||
## Important Distinction
|
## Important Distinction
|
||||||
|
|
||||||
The **flow service** manages *running flow instances*.
|
The **flow service** manages *running flow instances*.
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
AI agent that can understand questions, reason about them, and take actions.
|
AI agent that can understand questions, reason about them, and take actions.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Agent Overview
|
## Agent Overview
|
||||||
|
|
||||||
The agent service provides a conversational AI that:
|
The agent service provides a conversational AI that:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Query document embeddings to find similar text chunks by vector similarity.
|
Query document embeddings to find similar text chunks by vector similarity.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Document Embeddings Query Overview
|
## Document Embeddings Query Overview
|
||||||
|
|
||||||
Find document chunks semantically similar to a query vector:
|
Find document chunks semantically similar to a query vector:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Load binary documents (PDF, Word, etc.) into processing pipeline.
|
Load binary documents (PDF, Word, etc.) into processing pipeline.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Document Load Overview
|
## Document Load Overview
|
||||||
|
|
||||||
Fire-and-forget binary document loading:
|
Fire-and-forget binary document loading:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Retrieval-Augmented Generation over document embeddings.
|
Retrieval-Augmented Generation over document embeddings.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Document RAG Overview
|
## Document RAG Overview
|
||||||
|
|
||||||
Document RAG combines:
|
Document RAG combines:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Convert text to embedding vectors for semantic similarity search.
|
Convert text to embedding vectors for semantic similarity search.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Embeddings Overview
|
## Embeddings Overview
|
||||||
|
|
||||||
Embeddings transform text into dense vector representations that:
|
Embeddings transform text into dense vector representations that:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Query graph embeddings to find similar entities by vector similarity.
|
Query graph embeddings to find similar entities by vector similarity.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Graph Embeddings Query Overview
|
## Graph Embeddings Query Overview
|
||||||
|
|
||||||
Find entities semantically similar to a query vector:
|
Find entities semantically similar to a query vector:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Retrieval-Augmented Generation over knowledge graph.
|
Retrieval-Augmented Generation over knowledge graph.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Graph RAG Overview
|
## Graph RAG Overview
|
||||||
|
|
||||||
Graph RAG combines:
|
Graph RAG combines:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Execute MCP (Model Context Protocol) tools for agent capabilities.
|
Execute MCP (Model Context Protocol) tools for agent capabilities.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## MCP Tool Overview
|
## MCP Tool Overview
|
||||||
|
|
||||||
MCP tools provide agent capabilities through standardized protocol:
|
MCP tools provide agent capabilities through standardized protocol:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Convert natural language questions to structured GraphQL queries.
|
Convert natural language questions to structured GraphQL queries.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## NLP Query Overview
|
## NLP Query Overview
|
||||||
|
|
||||||
Transforms user questions into executable GraphQL:
|
Transforms user questions into executable GraphQL:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Execute stored prompt templates with variable substitution.
|
Execute stored prompt templates with variable substitution.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Prompt Service Overview
|
## Prompt Service Overview
|
||||||
|
|
||||||
The prompt service enables:
|
The prompt service enables:
|
||||||
|
|
|
||||||
|
|
@ -4,11 +4,6 @@ post:
|
||||||
summary: Row Embeddings Query - semantic search on structured data
|
summary: Row Embeddings Query - semantic search on structured data
|
||||||
description: |
|
description: |
|
||||||
Query row embeddings to find similar rows by vector similarity on indexed fields.
|
Query row embeddings to find similar rows by vector similarity on indexed fields.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
Enables fuzzy/semantic matching on structured data.
|
Enables fuzzy/semantic matching on structured data.
|
||||||
|
|
||||||
## Row Embeddings Query Overview
|
## Row Embeddings Query Overview
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Query structured data using GraphQL for row-oriented data access.
|
Query structured data using GraphQL for row-oriented data access.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Rows Query Overview
|
## Rows Query Overview
|
||||||
|
|
||||||
GraphQL interface to structured data:
|
GraphQL interface to structured data:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Execute a SPARQL 1.1 query against the knowledge graph.
|
Execute a SPARQL 1.1 query against the knowledge graph.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Supported Query Types
|
## Supported Query Types
|
||||||
|
|
||||||
- **SELECT**: Returns variable bindings as a table of results
|
- **SELECT**: Returns variable bindings as a table of results
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Analyze and understand structured data (CSV, JSON, XML).
|
Analyze and understand structured data (CSV, JSON, XML).
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Structured Diag Overview
|
## Structured Diag Overview
|
||||||
|
|
||||||
Helps process unknown structured data:
|
Helps process unknown structured data:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Ask natural language questions and get results directly.
|
Ask natural language questions and get results directly.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Structured Query Overview
|
## Structured Query Overview
|
||||||
|
|
||||||
Combines two operations in one call:
|
Combines two operations in one call:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Direct text completion using LLM without retrieval augmentation.
|
Direct text completion using LLM without retrieval augmentation.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Text Completion Overview
|
## Text Completion Overview
|
||||||
|
|
||||||
Pure LLM generation for:
|
Pure LLM generation for:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Load text documents into processing pipeline for indexing and embedding.
|
Load text documents into processing pipeline for indexing and embedding.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Text Load Overview
|
## Text Load Overview
|
||||||
|
|
||||||
Fire-and-forget document loading:
|
Fire-and-forget document loading:
|
||||||
|
|
|
||||||
|
|
@ -5,10 +5,6 @@ post:
|
||||||
description: |
|
description: |
|
||||||
Query knowledge graph using subject-predicate-object patterns.
|
Query knowledge graph using subject-predicate-object patterns.
|
||||||
|
|
||||||
This is a **flow-scoped** service. It requires a flow instance
|
|
||||||
and operates within the workspace associated with the
|
|
||||||
authenticated bearer token.
|
|
||||||
|
|
||||||
## Triples Query Overview
|
## Triples Query Overview
|
||||||
|
|
||||||
Query RDF triples with flexible pattern matching:
|
Query RDF triples with flexible pattern matching:
|
||||||
|
|
|
||||||
|
|
@ -1,206 +0,0 @@
|
||||||
post:
|
|
||||||
tags:
|
|
||||||
- IAM
|
|
||||||
summary: IAM service (global)
|
|
||||||
description: |
|
|
||||||
Identity and access management service.
|
|
||||||
|
|
||||||
This is a **global service** — it operates at system level, not
|
|
||||||
scoped to a specific workspace. The `workspace` field in the
|
|
||||||
request body is used as a scope filter or integrity check on
|
|
||||||
certain operations, not as an addressing component.
|
|
||||||
|
|
||||||
## Authentication
|
|
||||||
|
|
||||||
Most operations require a bearer token. The gateway resolves the
|
|
||||||
token to an authenticated identity and injects the `actor` field
|
|
||||||
(the caller's user ID) into the request. Clients cannot set
|
|
||||||
`actor` — the gateway overwrites it.
|
|
||||||
|
|
||||||
## Operations by Capability
|
|
||||||
|
|
||||||
### Any authenticated user
|
|
||||||
- `whoami`: Return the caller's own user record
|
|
||||||
- `list-my-workspaces`: List workspaces the caller has access to.
|
|
||||||
For open-source IAM: returns the caller's home workspace, or all
|
|
||||||
workspaces if the caller has the `admin` role.
|
|
||||||
|
|
||||||
### User management (`users:read` / `users:write` / `users:admin`)
|
|
||||||
- `create-user`: Create a new user in a workspace
|
|
||||||
- `list-users`: List users, optionally filtered by workspace
|
|
||||||
- `get-user`: Get a user record by ID
|
|
||||||
- `update-user`: Update user fields (name, email, roles, enabled)
|
|
||||||
- `disable-user`: Soft-disable a user and revoke their API keys
|
|
||||||
- `enable-user`: Re-enable a disabled user
|
|
||||||
- `delete-user`: Hard-delete a user and their API keys
|
|
||||||
|
|
||||||
### Workspace management (`workspaces:admin`)
|
|
||||||
- `create-workspace`: Create a new workspace
|
|
||||||
- `list-workspaces`: List all workspaces (admin view)
|
|
||||||
- `get-workspace`: Get a workspace record
|
|
||||||
- `update-workspace`: Update workspace name or enabled state
|
|
||||||
- `disable-workspace`: Disable a workspace and all its users
|
|
||||||
|
|
||||||
### API key management (`keys:self` / `keys:admin`)
|
|
||||||
- `create-api-key`: Create an API key (plaintext returned once)
|
|
||||||
- `list-api-keys`: List API keys for a user
|
|
||||||
- `revoke-api-key`: Revoke (delete) an API key
|
|
||||||
|
|
||||||
### Password management (`users:admin`)
|
|
||||||
- `reset-password`: Admin-initiated password reset (returns temporary password)
|
|
||||||
|
|
||||||
### System (`iam:admin`)
|
|
||||||
- `rotate-signing-key`: Rotate the JWT signing key
|
|
||||||
|
|
||||||
operationId: iamService
|
|
||||||
security:
|
|
||||||
- bearerAuth: []
|
|
||||||
requestBody:
|
|
||||||
required: true
|
|
||||||
content:
|
|
||||||
application/json:
|
|
||||||
schema:
|
|
||||||
$ref: '../components/schemas/iam/IamRequest.yaml'
|
|
||||||
examples:
|
|
||||||
whoami:
|
|
||||||
summary: Get the caller's own user record
|
|
||||||
value:
|
|
||||||
operation: whoami
|
|
||||||
listMyWorkspaces:
|
|
||||||
summary: List workspaces the caller has access to
|
|
||||||
value:
|
|
||||||
operation: list-my-workspaces
|
|
||||||
createUser:
|
|
||||||
summary: Create a new user
|
|
||||||
value:
|
|
||||||
operation: create-user
|
|
||||||
workspace: default
|
|
||||||
user:
|
|
||||||
username: alice
|
|
||||||
name: Alice Smith
|
|
||||||
email: alice@example.com
|
|
||||||
password: changeme123
|
|
||||||
roles:
|
|
||||||
- writer
|
|
||||||
listUsers:
|
|
||||||
summary: List users in a workspace
|
|
||||||
value:
|
|
||||||
operation: list-users
|
|
||||||
workspace: default
|
|
||||||
getUser:
|
|
||||||
summary: Get a specific user
|
|
||||||
value:
|
|
||||||
operation: get-user
|
|
||||||
user_id: usr_abc123
|
|
||||||
updateUser:
|
|
||||||
summary: Update a user's roles
|
|
||||||
value:
|
|
||||||
operation: update-user
|
|
||||||
user_id: usr_abc123
|
|
||||||
user:
|
|
||||||
roles:
|
|
||||||
- admin
|
|
||||||
disableUser:
|
|
||||||
summary: Disable a user
|
|
||||||
value:
|
|
||||||
operation: disable-user
|
|
||||||
user_id: usr_abc123
|
|
||||||
createWorkspace:
|
|
||||||
summary: Create a workspace
|
|
||||||
value:
|
|
||||||
operation: create-workspace
|
|
||||||
workspace_record:
|
|
||||||
id: production
|
|
||||||
name: Production Workspace
|
|
||||||
listWorkspaces:
|
|
||||||
summary: List all workspaces (admin)
|
|
||||||
value:
|
|
||||||
operation: list-workspaces
|
|
||||||
createApiKey:
|
|
||||||
summary: Create an API key
|
|
||||||
value:
|
|
||||||
operation: create-api-key
|
|
||||||
key:
|
|
||||||
user_id: usr_abc123
|
|
||||||
name: laptop
|
|
||||||
expires: "2027-01-01T00:00:00Z"
|
|
||||||
listApiKeys:
|
|
||||||
summary: List a user's API keys
|
|
||||||
value:
|
|
||||||
operation: list-api-keys
|
|
||||||
user_id: usr_abc123
|
|
||||||
revokeApiKey:
|
|
||||||
summary: Revoke an API key
|
|
||||||
value:
|
|
||||||
operation: revoke-api-key
|
|
||||||
key_id: key_xyz789
|
|
||||||
resetPassword:
|
|
||||||
summary: Admin-initiated password reset
|
|
||||||
value:
|
|
||||||
operation: reset-password
|
|
||||||
user_id: usr_abc123
|
|
||||||
responses:
|
|
||||||
'200':
|
|
||||||
description: Successful response
|
|
||||||
content:
|
|
||||||
application/json:
|
|
||||||
schema:
|
|
||||||
$ref: '../components/schemas/iam/IamResponse.yaml'
|
|
||||||
examples:
|
|
||||||
whoami:
|
|
||||||
summary: Caller's user record
|
|
||||||
value:
|
|
||||||
user:
|
|
||||||
id: usr_abc123
|
|
||||||
workspace: default
|
|
||||||
username: alice
|
|
||||||
name: Alice Smith
|
|
||||||
email: alice@example.com
|
|
||||||
roles:
|
|
||||||
- writer
|
|
||||||
enabled: true
|
|
||||||
must_change_password: false
|
|
||||||
created: "2026-01-15T10:30:00Z"
|
|
||||||
listMyWorkspaces:
|
|
||||||
summary: Workspaces the caller can access
|
|
||||||
value:
|
|
||||||
workspaces:
|
|
||||||
- id: default
|
|
||||||
name: Default Workspace
|
|
||||||
enabled: true
|
|
||||||
created: "2026-01-01T00:00:00Z"
|
|
||||||
listUsers:
|
|
||||||
summary: Users in a workspace
|
|
||||||
value:
|
|
||||||
users:
|
|
||||||
- id: usr_abc123
|
|
||||||
workspace: default
|
|
||||||
username: alice
|
|
||||||
name: Alice Smith
|
|
||||||
roles:
|
|
||||||
- writer
|
|
||||||
enabled: true
|
|
||||||
created: "2026-01-15T10:30:00Z"
|
|
||||||
createApiKey:
|
|
||||||
summary: New API key (plaintext returned once)
|
|
||||||
value:
|
|
||||||
api_key_plaintext: tg_aBcDeFgHiJkLmNoPqRsTuVwXyZ
|
|
||||||
api_key:
|
|
||||||
id: key_xyz789
|
|
||||||
user_id: usr_abc123
|
|
||||||
name: laptop
|
|
||||||
prefix: tg_a
|
|
||||||
expires: "2027-01-01T00:00:00Z"
|
|
||||||
created: "2026-05-29T14:00:00Z"
|
|
||||||
resetPassword:
|
|
||||||
summary: Temporary password (returned once)
|
|
||||||
value:
|
|
||||||
temporary_password: tmp_xK9mQ2pL
|
|
||||||
'400':
|
|
||||||
description: Bad request (unknown operation, missing required fields)
|
|
||||||
'401':
|
|
||||||
$ref: '../components/responses/Unauthorized.yaml'
|
|
||||||
'403':
|
|
||||||
description: Access denied (insufficient capabilities)
|
|
||||||
'500':
|
|
||||||
$ref: '../components/responses/Error.yaml'
|
|
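The request examples in the removed IAM path file all share one shape: an `operation` name plus operation-specific fields. A small sketch of building such bodies follows, assuming they are sent as JSON to `POST /api/v1/iam` with a bearer token; the helper name `iam_request_body` is hypothetical. Because the description says the gateway overwrites `actor`, the helper drops it rather than sending a value that would be ignored.

```python
import json
from typing import Any


def iam_request_body(operation: str, **fields: Any) -> str:
    """Serialize an IAM request: the operation name plus its own fields."""
    body = {"operation": operation, **fields}
    # The gateway injects `actor` from the token, so a client value is pointless.
    body.pop("actor", None)
    return json.dumps(body)


# Bodies mirroring the `whoami` and `createApiKey` examples in the spec.
whoami = iam_request_body("whoami")
create_key = iam_request_body(
    "create-api-key",
    key={
        "user_id": "usr_abc123",
        "name": "laptop",
        "expires": "2027-01-01T00:00:00Z",
    },
)
```

Field names such as `user_id` and `key` are copied from the examples above; the authoritative schema is `IamRequest.yaml`, which is not shown here.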
||||||
|
|
@ -1,13 +1,9 @@
|
||||||
post:
|
post:
|
||||||
tags:
|
tags:
|
||||||
- Knowledge
|
- Knowledge
|
||||||
summary: Knowledge graph core management (workspace-scoped)
|
summary: Knowledge graph core management
|
||||||
description: |
|
description: |
|
||||||
Manage knowledge graph cores - persistent storage of triples and
|
Manage knowledge graph cores - persistent storage of triples and embeddings.
|
||||||
embeddings.
|
|
||||||
|
|
||||||
This is a **workspace-scoped** service. All operations apply to the
|
|
||||||
workspace associated with the authenticated bearer token.
|
|
||||||
|
|
||||||
## Knowledge Cores
|
## Knowledge Cores
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,13 +1,9 @@
|
||||||
post:
|
post:
|
||||||
tags:
|
tags:
|
||||||
- Librarian
|
- Librarian
|
||||||
summary: Document library management (workspace-scoped)
|
summary: Document library management
|
||||||
description: |
|
description: |
|
||||||
Manage document library: add, remove, list documents, and control
|
Manage document library: add, remove, list documents, and control processing.
|
||||||
processing.
|
|
||||||
|
|
||||||
This is a **workspace-scoped** service. All operations apply to the
|
|
||||||
workspace associated with the authenticated bearer token.
|
|
||||||
|
|
||||||
## Document Library
|
## Document Library
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -26,7 +26,7 @@ get:
|
||||||
|
|
||||||
### Request Message Format
|
### Request Message Format
|
||||||
|
|
||||||
**Workspace-Scoped Service Request** (no flow parameter):
|
**Global Service Request** (no flow parameter):
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"id": "req-123",
|
"id": "req-123",
|
||||||
|
|
@ -38,7 +38,7 @@ get:
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
**Flow-Scoped Service Request** (with flow parameter):
|
**Flow-Hosted Service Request** (with flow parameter):
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"id": "req-456",
|
"id": "req-456",
|
||||||
|
|
@ -54,7 +54,7 @@ get:
|
||||||
**Request Fields**:
|
**Request Fields**:
|
||||||
- `id` (string, required): Client-generated unique identifier for this request within the session. Used to match responses to requests.
|
- `id` (string, required): Client-generated unique identifier for this request within the session. Used to match responses to requests.
|
||||||
- `service` (string, required): Service identifier (e.g., "config", "agent", "document-rag"). Same as `{kind}` in REST URLs.
|
- `service` (string, required): Service identifier (e.g., "config", "agent", "document-rag"). Same as `{kind}` in REST URLs.
|
||||||
- `flow` (string, optional): Flow ID for flow-scoped services. Omit for workspace-scoped and global services.
|
- `flow` (string, optional): Flow ID for flow-hosted services. Omit for global services.
|
||||||
- `request` (object, required): Service-specific request payload. Same structure as REST API request body.
|
- `request` (object, required): Service-specific request payload. Same structure as REST API request body.
|
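The request fields listed above can be assembled as in the following sketch. It assumes only what the list states: `id`, `service`, and `request` are always present, and `flow` is included only for services that run inside a flow. The helper name and the `req-N` id scheme are illustrative; any id that is unique within the session should work. The empty payloads are placeholders, since each service defines its own request body.

```python
import itertools
import json
from typing import Any, Dict, Optional

# Per-session counter used to generate unique request ids.
_request_ids = itertools.count(1)


def make_request(
    service: str, request: Dict[str, Any], flow: Optional[str] = None
) -> Dict[str, Any]:
    """Build a WebSocket request envelope; `flow` is omitted when not given."""
    message: Dict[str, Any] = {
        "id": f"req-{next(_request_ids)}",
        "service": service,
        "request": request,
    }
    if flow is not None:
        message["flow"] = flow
    return message


# Placeholder payloads: the real body is service-specific.
config_msg = make_request("config", {})
agent_msg = make_request("agent", {}, flow="my-flow")
frame = json.dumps(agent_msg)  # text frame to send over the socket
```

Keeping the `id` generation in one place makes it straightforward to store pending requests in a dictionary keyed by `id` and match responses as they arrive.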
||||||
|
|
||||||
### Response Message Format
|
### Response Message Format
|
||||||
|
|
@ -96,14 +96,14 @@ get:
|
||||||
| `POST /api/v1/config` | `{"service": "config"}` |
|
| `POST /api/v1/config` | `{"service": "config"}` |
|
||||||
| `POST /api/v1/flow/{flow}/service/agent` | `{"service": "agent", "flow": "my-flow"}` |
|
| `POST /api/v1/flow/{flow}/service/agent` | `{"service": "agent", "flow": "my-flow"}` |
|
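The two mapping rows above suggest a mechanical translation from a REST path to the WebSocket routing fields. The sketch below covers only those two path shapes, `/api/v1/{service}` and `/api/v1/flow/{flow}/service/{kind}`; the function name is hypothetical, and other endpoints (for example the socket endpoint itself) are not handled.

```python
import re
from typing import Dict

# /api/v1/flow/{flow}/service/{kind} -> service inside a flow
_FLOW_PATH = re.compile(r"^/api/v1/flow/([^/]+)/service/([^/]+)$")
# /api/v1/{service} -> service addressed without a flow
_PLAIN_PATH = re.compile(r"^/api/v1/([^/]+)$")


def rest_path_to_ws_fields(path: str) -> Dict[str, str]:
    """Translate a REST path into the `service` / `flow` envelope fields."""
    match = _FLOW_PATH.match(path)
    if match:
        return {"service": match.group(2), "flow": match.group(1)}
    match = _PLAIN_PATH.match(path)
    if match:
        return {"service": match.group(1)}
    raise ValueError(f"unrecognized REST path: {path}")
```

For example, `rest_path_to_ws_fields("/api/v1/flow/my-flow/service/agent")` yields the same `service` and `flow` values as the second table row.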
||||||
|
|
||||||
**Workspace-Scoped Services** (no `flow` parameter, workspace from token):
|
**Global Services** (no `flow` parameter):
|
||||||
- `config` - Configuration management
|
- `config` - Configuration management
|
||||||
- `flow` - Flow lifecycle and blueprints
|
- `flow` - Flow lifecycle and blueprints
|
||||||
- `librarian` - Document library management
|
- `librarian` - Document library management
|
||||||
- `knowledge` - Knowledge graph core management
|
- `knowledge` - Knowledge graph core management
|
||||||
- `collection-management` - Collection metadata
|
- `collection-management` - Collection metadata
|
||||||
|
|
||||||
**Flow-Scoped Services** (require `flow` parameter, workspace from token):
|
**Flow-Hosted Services** (require `flow` parameter):
|
||||||
- AI services: `agent`, `text-completion`, `prompt`, `document-rag`, `graph-rag`
|
- AI services: `agent`, `text-completion`, `prompt`, `document-rag`, `graph-rag`
|
||||||
- Embeddings: `embeddings`, `graph-embeddings`, `document-embeddings`
|
- Embeddings: `embeddings`, `graph-embeddings`, `document-embeddings`
|
||||||
- Query: `triples`, `objects`, `nlp-query`, `structured-query`
|
- Query: `triples`, `objects`, `nlp-query`, `structured-query`
|
||||||
|
|
@ -146,11 +146,9 @@ get:
|
||||||
|
|
||||||
## Authentication
|
## Authentication
|
||||||
|
|
||||||
The `/api/v1/socket` endpoint uses in-band authentication.
|
When `GATEWAY_SECRET` is set, include bearer token:
|
||||||
The WebSocket handshake is accepted unconditionally. After
|
- As query parameter: `ws://localhost:8088/api/v1/socket?token=<token>`
|
||||||
connecting, the client sends a bearer token as the first frame.
|
- Or in WebSocket subprotocol header
|
||||||
The gateway resolves the token to an identity and workspace.
|
|
||||||
All subsequent requests operate within that workspace context.
|
|
||||||
|
|
||||||
## Benefits Over REST
|
## Benefits Over REST
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -3,19 +3,10 @@ scheme: bearer
|
||||||
description: |
|
description: |
|
||||||
Bearer token authentication.
|
Bearer token authentication.
|
||||||
|
|
||||||
Clients authenticate by passing an opaque token in the
|
Set via `GATEWAY_SECRET` environment variable on the gateway.
|
||||||
`Authorization` header. The token is treated as an opaque string by
|
If `GATEWAY_SECRET` is not set, authentication is disabled (development mode).
|
||||||
clients — its internal structure is a gateway implementation detail
|
|
||||||
and must not be relied upon.
|
|
||||||
|
|
||||||
The gateway resolves the token to an authenticated identity and an
|
|
||||||
associated workspace. All workspace-scoped and flow-scoped operations
|
|
||||||
then execute within that workspace context.
|
|
||||||
|
|
||||||
Tokens are obtained via the IAM service (e.g. `tg-login` or
|
|
||||||
`tg-create-api-key`).
|
|
||||||
|
|
||||||
Example:
|
Example:
|
||||||
```
|
```
|
||||||
Authorization: Bearer <token>
|
Authorization: Bearer your-secret-token
|
||||||
```
|
```
|
||||||
|
|
|
||||||
|
|
@ -24,7 +24,7 @@ echo
|
||||||
# Build WebSocket API documentation
|
# Build WebSocket API documentation
|
||||||
echo "Building WebSocket API documentation (AsyncAPI)..."
|
echo "Building WebSocket API documentation (AsyncAPI)..."
|
||||||
cd ../websocket
|
cd ../websocket
|
||||||
npx --yes @asyncapi/cli generate fromTemplate asyncapi.yaml @asyncapi/html-template -o /tmp/asyncapi-build -p singleFile=true --force-write --use-new-generator
|
npx --yes -p @asyncapi/cli asyncapi generate fromTemplate asyncapi.yaml @asyncapi/html-template -o /tmp/asyncapi-build -p singleFile=true --force-write
|
||||||
mv /tmp/asyncapi-build/index.html ../../docs/websocket.html
|
mv /tmp/asyncapi-build/index.html ../../docs/websocket.html
|
||||||
rm -rf /tmp/asyncapi-build
|
rm -rf /tmp/asyncapi-build
|
||||||
echo "✓ WebSocket API docs generated: docs/websocket.html"
|
echo "✓ WebSocket API docs generated: docs/websocket.html"
|
||||||
|
|
|
||||||
|
|
@ -2,7 +2,7 @@ asyncapi: 3.0.0
|
||||||
|
|
||||||
info:
|
info:
|
||||||
title: TrustGraph WebSocket API
|
title: TrustGraph WebSocket API
|
||||||
version: "2.4"
|
version: "2.2"
|
||||||
description: |
|
description: |
|
||||||
WebSocket API for TrustGraph - providing multiplexed, asynchronous access to all services.
|
WebSocket API for TrustGraph - providing multiplexed, asynchronous access to all services.
|
||||||
|
|
||||||
|
|
@ -14,35 +14,21 @@ info:
|
||||||
- **Efficient**: Lower overhead than HTTP REST
|
- **Efficient**: Lower overhead than HTTP REST
|
||||||
- **Streaming**: Real-time progressive responses
|
- **Streaming**: Real-time progressive responses
|
||||||
|
|
||||||
## Authentication
|
|
||||||
|
|
||||||
The `/api/v1/socket` endpoint uses **in-band authentication**.
|
|
||||||
The WebSocket handshake is accepted unconditionally. The client
|
|
||||||
must authenticate by sending a bearer token as the first message
|
|
||||||
after connecting. The gateway resolves the token to an
|
|
||||||
authenticated identity and workspace.
|
|
||||||
|
|
||||||
All subsequent requests execute within the workspace context
|
|
||||||
established by the authentication frame.
|
|
||||||
|
|
||||||
## Protocol Summary
|
## Protocol Summary
|
||||||
|
|
||||||
All messages are JSON with:
|
All messages are JSON with:
|
||||||
- `id`: Client-generated unique identifier for request/response correlation
|
- `id`: Client-generated unique identifier for request/response correlation
|
||||||
- `service`: Service identifier (e.g., "config", "agent", "document-rag")
|
- `service`: Service identifier (e.g., "config", "agent", "document-rag")
|
||||||
- `flow`: Optional flow ID for flow-scoped services
|
- `flow`: Optional flow ID for flow-hosted services
|
||||||
- `request`/`response`: Service-specific payload (identical to REST API schemas)
|
- `request`/`response`: Service-specific payload (identical to REST API schemas)
|
||||||
- `error`: Error information on failure
|
- `error`: Error information on failure
|
||||||
|
|
||||||
## Service Tiers
|
## Service Types
|
||||||
|
|
||||||
**Global Services** (no workspace scoping):
|
**Global Services** (no `flow` parameter):
|
||||||
- iam
|
|
||||||
|
|
||||||
**Workspace-Scoped Services** (workspace resolved from token):
|
|
||||||
- config, flow, librarian, knowledge, collection-management
|
- config, flow, librarian, knowledge, collection-management
|
||||||
|
|
||||||
**Flow-Scoped Services** (require `flow` parameter, workspace from token):
|
**Flow-Hosted Services** (require `flow` parameter):
|
||||||
- agent, text-completion, prompt, document-rag, graph-rag
|
- agent, text-completion, prompt, document-rag, graph-rag
|
||||||
- embeddings, graph-embeddings, document-embeddings
|
- embeddings, graph-embeddings, document-embeddings
|
||||||
- triples, rows, nlp-query, structured-query, sparql-query, structured-diag, row-embeddings
|
- triples, rows, nlp-query, structured-query, sparql-query, structured-diag, row-embeddings
|
||||||
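The three tiers on the left-hand side can be sketched as a lookup plus a check on the `flow` field. The service names are copied from the lists above. The function names are hypothetical, and rejecting a `flow` on a non-flow service is an assumption of this sketch (the spec only says to omit it).

```python
GLOBAL = {"iam"}
WORKSPACE_SCOPED = {
    "config", "flow", "librarian", "knowledge", "collection-management",
}
FLOW_SCOPED = {
    "agent", "text-completion", "prompt", "document-rag", "graph-rag",
    "embeddings", "graph-embeddings", "document-embeddings",
    "triples", "rows", "nlp-query", "structured-query", "sparql-query",
    "structured-diag", "row-embeddings",
}


def tier_of(service):
    """Return the tier name for a service identifier."""
    if service in GLOBAL:
        return "global"
    if service in WORKSPACE_SCOPED:
        return "workspace-scoped"
    if service in FLOW_SCOPED:
        return "flow-scoped"
    raise ValueError(f"unknown service: {service}")


def check_flow_field(envelope):
    """Check that `flow` is present exactly when the service is flow-scoped."""
    tier = tier_of(envelope["service"])
    has_flow = "flow" in envelope
    if tier == "flow-scoped" and not has_flow:
        raise ValueError(f"{envelope['service']} requires a flow")
    if tier != "flow-scoped" and has_flow:
        # Assumption: treat a stray flow as an error rather than ignoring it
        raise ValueError(f"{envelope['service']} does not take a flow")
    return tier


print(check_flow_field({"id": "1", "service": "graph-rag", "flow": "my-flow"}))
print(check_flow_field({"id": "2", "service": "config"}))
```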
|
|
@ -78,14 +64,11 @@ components:
|
||||||
securitySchemes:
|
securitySchemes:
|
||||||
bearerAuth:
|
bearerAuth:
|
||||||
type: httpApiKey
|
type: httpApiKey
|
||||||
name: Authorization
|
name: token
|
||||||
in: header
|
in: query
|
||||||
description: |
|
description: |
|
||||||
Bearer token authentication. The `/api/v1/socket` endpoint
|
Bearer token authentication when GATEWAY_SECRET is configured.
|
||||||
uses in-band authentication: the WebSocket handshake is
|
Include as query parameter: ws://localhost:8088/api/v1/socket?token=<token>
|
||||||
accepted unconditionally and the client sends a bearer token
|
|
||||||
as the first frame after connecting. The token is an opaque
|
|
||||||
string obtained via the IAM service.
|
|
||||||
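The two sides of this hunk differ in where the token travels. The right-hand side puts it in a `token` query parameter. The left-hand side sends it in-band as the first frame, whose exact format is not shown in this diff and is therefore not sketched. A minimal sketch of the query-parameter form follows; the helper name `socket_url` is hypothetical, and the host and path come from the description above.

```python
from urllib.parse import urlencode


def socket_url(base="ws://localhost:8088", token=None):
    """Build the socket URL, adding ?token=... only when a token is given."""
    url = f"{base}/api/v1/socket"
    if token is not None:
        # urlencode escapes characters that are not safe in a query string
        url += "?" + urlencode({"token": token})
    return url


print(socket_url(token="example-token"))
print(socket_url())
```

A token in the URL can end up in proxy and access logs, which is one practical difference between the two schemes.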
|
|
||||||
messages:
|
messages:
|
||||||
ServiceRequest:
|
ServiceRequest:
|
||||||
|
|
|
||||||
|
|
@ -3,16 +3,8 @@ description: |
|
||||||
Primary WebSocket channel for all TrustGraph services.
|
Primary WebSocket channel for all TrustGraph services.
|
||||||
|
|
||||||
This single channel provides multiplexed access to:
|
This single channel provides multiplexed access to:
|
||||||
- Global services (IAM)
|
- All global services (config, flow, librarian, knowledge, collection-management)
|
||||||
- Workspace-scoped services (config, flow, librarian, knowledge, collection-management)
|
- All flow-hosted services (agent, RAG, embeddings, queries, loading, etc.)
|
||||||
- Flow-scoped services (agent, RAG, embeddings, queries, loading, etc.)
|
|
||||||
|
|
||||||
## Authentication
|
|
||||||
|
|
||||||
The handshake is accepted unconditionally. The client must send a
|
|
||||||
bearer token as the first frame after connecting (in-band auth).
|
|
||||||
The gateway resolves the token to an identity and workspace. All
|
|
||||||
subsequent requests execute within that workspace context.
|
|
||||||
|
|
||||||
## Multiplexing
|
## Multiplexing
|
||||||
|
|
||||||
|
|
@ -21,17 +13,16 @@ description: |
|
||||||
|
|
||||||
## Message Flow
|
## Message Flow
|
||||||
|
|
||||||
1. Client connects and sends bearer token as first frame (authentication)
|
1. Client sends request with unique `id`, `service`, optional `flow`, and `request` payload
|
||||||
2. Client sends requests with unique `id`, `service`, optional `flow`, and `request` payload
|
2. Server processes request asynchronously
|
||||||
3. Server processes request asynchronously
|
3. Server sends response(s) with matching `id` and either `response` or `error`
|
||||||
4. Server sends response(s) with matching `id` and either `response` or `error`
|
4. For streaming services, multiple responses may be sent with the same `id`
|
||||||
5. For streaming services, multiple responses may be sent with the same `id`
|
|
||||||
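The correlation rule in the steps above (responses carry the request's `id`, and a streaming service may send several frames with the same `id`) can be sketched offline. The function name `correlate` and the frame payloads are hypothetical. How a stream signals completion is not stated in this hunk, so the sketch only groups frames.

```python
from collections import defaultdict


def correlate(frames):
    """Group `response` payloads by request id; collect `error` frames separately."""
    responses = defaultdict(list)
    errors = {}
    for frame in frames:
        if "error" in frame:
            errors[frame["id"]] = frame["error"]
        else:
            # Several frames may share one id when the service streams
            responses[frame["id"]].append(frame["response"])
    return dict(responses), errors


# Interleaved frames for two in-flight requests on one multiplexed socket
frames = [
    {"id": "req-1", "response": {"chunk": "a"}},
    {"id": "req-2", "error": {"message": "placeholder error"}},
    {"id": "req-1", "response": {"chunk": "b"}},
]

responses, errors = correlate(frames)
print(responses)
print(errors)
```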
|
|
||||||
## Service Routing
|
## Service Routing
|
||||||
|
|
||||||
Messages are routed to services based on:
|
Messages are routed to services based on:
|
||||||
- `service`: Service identifier (required)
|
- `service`: Service identifier (required)
|
||||||
- `flow`: Flow ID (required for flow-scoped services, omitted for workspace-scoped and global services)
|
- `flow`: Flow ID (required for flow-hosted services, omitted for global services)
|
||||||
|
|
||||||
messages:
|
messages:
|
||||||
request:
|
request:
|
||||||
|
|
|
||||||
|
|
@ -9,17 +9,14 @@ description: |
|
||||||
payload:
|
payload:
|
||||||
description: Service request envelope with id, service, optional flow, and service-specific request payload
|
description: Service request envelope with id, service, optional flow, and service-specific request payload
|
||||||
oneOf:
|
oneOf:
|
||||||
# Global services
|
# Global services (no flow parameter)
|
||||||
- $ref: './requests/IamRequest.yaml'
|
|
||||||
|
|
||||||
# Workspace-scoped services (no flow parameter)
|
|
||||||
- $ref: './requests/ConfigRequest.yaml'
|
- $ref: './requests/ConfigRequest.yaml'
|
||||||
- $ref: './requests/FlowRequest.yaml'
|
- $ref: './requests/FlowRequest.yaml'
|
||||||
- $ref: './requests/LibrarianRequest.yaml'
|
- $ref: './requests/LibrarianRequest.yaml'
|
||||||
- $ref: './requests/KnowledgeRequest.yaml'
|
- $ref: './requests/KnowledgeRequest.yaml'
|
||||||
- $ref: './requests/CollectionManagementRequest.yaml'
|
- $ref: './requests/CollectionManagementRequest.yaml'
|
||||||
|
|
||||||
# Flow-scoped services (require flow parameter)
|
# Flow-hosted services (require flow parameter)
|
||||||
- $ref: './requests/AgentRequest.yaml'
|
- $ref: './requests/AgentRequest.yaml'
|
||||||
- $ref: './requests/DocumentRagRequest.yaml'
|
- $ref: './requests/DocumentRagRequest.yaml'
|
||||||
- $ref: './requests/GraphRagRequest.yaml'
|
- $ref: './requests/GraphRagRequest.yaml'
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for agent service (flow-scoped service)
|
description: WebSocket request for agent service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for collection-management service (workspace-scoped service)
|
description: WebSocket request for collection-management service (global service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for config service (workspace-scoped service)
|
description: WebSocket request for config service (global service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for document-embeddings service (flow-scoped service)
|
description: WebSocket request for document-embeddings service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for document-load service (flow-scoped service)
|
description: WebSocket request for document-load service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for document-rag service (flow-scoped service)
|
description: WebSocket request for document-rag service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for embeddings service (flow-scoped service)
|
description: WebSocket request for embeddings service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for flow service (workspace-scoped service)
|
description: WebSocket request for flow service (global service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for graph-embeddings service (flow-scoped service)
|
description: WebSocket request for graph-embeddings service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for graph-rag service (flow-scoped service)
|
description: WebSocket request for graph-rag service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,25 +0,0 @@
|
||||||
type: object
|
|
||||||
description: WebSocket request for IAM service (global service)
|
|
||||||
required:
|
|
||||||
- id
|
|
||||||
- service
|
|
||||||
- request
|
|
||||||
properties:
|
|
||||||
id:
|
|
||||||
type: string
|
|
||||||
description: Unique request identifier
|
|
||||||
service:
|
|
||||||
type: string
|
|
||||||
const: iam
|
|
||||||
description: Service identifier for IAM service
|
|
||||||
request:
|
|
||||||
$ref: '../../../../api/components/schemas/iam/IamRequest.yaml'
|
|
||||||
examples:
|
|
||||||
- id: req-1
|
|
||||||
service: iam
|
|
||||||
request:
|
|
||||||
operation: whoami
|
|
||||||
- id: req-2
|
|
||||||
service: iam
|
|
||||||
request:
|
|
||||||
operation: list-my-workspaces
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for knowledge service (workspace-scoped service)
|
description: WebSocket request for knowledge service (global service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for librarian service (workspace-scoped service)
|
description: WebSocket request for librarian service (global service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for mcp-tool service (flow-scoped service)
|
description: WebSocket request for mcp-tool service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for nlp-query service (flow-scoped service)
|
description: WebSocket request for nlp-query service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for prompt service (flow-scoped service)
|
description: WebSocket request for prompt service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for row-embeddings service (flow-scoped service)
|
description: WebSocket request for row-embeddings service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for rows service (flow-scoped service)
|
description: WebSocket request for rows service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for sparql-query service (flow-scoped service)
|
description: WebSocket request for sparql-query service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for structured-diag service (flow-scoped service)
|
description: WebSocket request for structured-diag service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for structured-query service (flow-scoped service)
|
description: WebSocket request for structured-query service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for text-completion service (flow-scoped service)
|
description: WebSocket request for text-completion service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for text-load service (flow-scoped service)
|
description: WebSocket request for text-load service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
type: object
|
type: object
|
||||||
description: WebSocket request for triples service (flow-scoped service)
|
description: WebSocket request for triples service (flow-hosted service)
|
||||||
required:
|
required:
|
||||||
- id
|
- id
|
||||||
- service
|
- service
|
||||||
|
|
|
||||||
|
|
@ -23,9 +23,8 @@ properties:
|
||||||
description: |
|
description: |
|
||||||
Service identifier. Same as {kind} in REST API URLs.
|
Service identifier. Same as {kind} in REST API URLs.
|
||||||
|
|
||||||
Global services: iam
|
Global services: config, flow, librarian, knowledge, collection-management
|
||||||
Workspace-scoped services: config, flow, librarian, knowledge, collection-management
|
Flow-hosted services: agent, text-completion, prompt, document-rag, graph-rag,
|
||||||
Flow-scoped services: agent, text-completion, prompt, document-rag, graph-rag,
|
|
||||||
embeddings, graph-embeddings, document-embeddings, triples, objects,
|
embeddings, graph-embeddings, document-embeddings, triples, objects,
|
||||||
nlp-query, structured-query, structured-diag, text-load, document-load, mcp-tool
|
nlp-query, structured-query, structured-diag, text-load, document-load, mcp-tool
|
||||||
examples:
|
examples:
|
||||||
|
|
@ -35,12 +34,10 @@ properties:
|
||||||
flow:
|
flow:
|
||||||
type: string
|
type: string
|
||||||
description: |
|
description: |
|
||||||
Flow ID for flow-scoped services. Required for services accessed via
|
Flow ID for flow-hosted services. Required for services accessed via
|
||||||
/api/v1/flow/{flow}/service/{kind} in REST API.
|
/api/v1/flow/{flow}/service/{kind} in REST API.
|
||||||
|
|
||||||
Omit for global services (iam) and workspace-scoped services
|
Omit this field for global services (config, flow, librarian, knowledge, collection-management).
|
||||||
(config, flow, librarian, knowledge, collection-management).
|
|
||||||
Workspace context is resolved from the authenticated token.
|
|
||||||
examples:
|
examples:
|
||||||
- my-flow
|
- my-flow
|
||||||
- production-flow
|
- production-flow
|
||||||
|
|
|
||||||
|
|
@ -410,56 +410,3 @@ class TestEdgeCases:
|
||||||
assert hosts == ['mixed-host']
|
assert hosts == ['mixed-host']
|
||||||
assert username is None # Stays None
|
assert username is None # Stays None
|
||||||
assert password == 'mixed-pass'
|
assert password == 'mixed-pass'
|
||||||
|
|
||||||
|
|
||||||
class TestReplicationFactorParamPath:
|
|
||||||
|
|
||||||
def test_explicit_kwarg(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config(
|
|
||||||
replication_factor=3,
|
|
||||||
)
|
|
||||||
assert rf == 3
|
|
||||||
|
|
||||||
def test_kwarg_overrides_env(self):
|
|
||||||
with patch.dict(os.environ, {'CASSANDRA_REPLICATION_FACTOR': '5'}, clear=True):
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config(
|
|
||||||
replication_factor=3,
|
|
||||||
)
|
|
||||||
assert rf == 3
|
|
||||||
|
|
||||||
def test_env_fallback_when_kwarg_none(self):
|
|
||||||
with patch.dict(os.environ, {'CASSANDRA_REPLICATION_FACTOR': '5'}, clear=True):
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config(
|
|
||||||
replication_factor=None,
|
|
||||||
)
|
|
||||||
assert rf == 5
|
|
||||||
|
|
||||||
def test_default_when_no_kwarg_no_env(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config()
|
|
||||||
assert rf == 1
|
|
||||||
|
|
||||||
def test_params_dict_path(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
params = {'cassandra_replication_factor': 3}
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config(
|
|
||||||
replication_factor=params.get('cassandra_replication_factor'),
|
|
||||||
)
|
|
||||||
assert rf == 3
|
|
||||||
|
|
||||||
def test_params_dict_overrides_env(self):
|
|
||||||
with patch.dict(os.environ, {'CASSANDRA_REPLICATION_FACTOR': '5'}, clear=True):
|
|
||||||
params = {'cassandra_replication_factor': 3}
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config(
|
|
||||||
replication_factor=params.get('cassandra_replication_factor'),
|
|
||||||
)
|
|
||||||
assert rf == 3
|
|
||||||
|
|
||||||
def test_params_dict_missing_falls_to_env(self):
|
|
||||||
with patch.dict(os.environ, {'CASSANDRA_REPLICATION_FACTOR': '5'}, clear=True):
|
|
||||||
params = {}
|
|
||||||
_, _, _, _, rf = resolve_cassandra_config(
|
|
||||||
replication_factor=params.get('cassandra_replication_factor'),
|
|
||||||
)
|
|
||||||
assert rf == 5
|
|
||||||
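The removed tests above pin down a precedence order for the replication factor: explicit argument first, then `CASSANDRA_REPLICATION_FACTOR`, then a default of 1. The sketch below illustrates that order only. It is not the body of `resolve_cassandra_config`, and the function name and `env` parameter are hypothetical.

```python
import os


def resolve_replication_factor(replication_factor=None, env=None):
    """Resolve the replication factor: kwarg, then environment, then default 1."""
    env = os.environ if env is None else env
    if replication_factor is not None:
        return int(replication_factor)
    value = env.get("CASSANDRA_REPLICATION_FACTOR")
    if value is not None:
        # Environment values arrive as strings
        return int(value)
    return 1


print(resolve_replication_factor(3, {"CASSANDRA_REPLICATION_FACTOR": "5"}))
print(resolve_replication_factor(None, {"CASSANDRA_REPLICATION_FACTOR": "5"}))
print(resolve_replication_factor(None, {}))
```

Passing `params.get('cassandra_replication_factor')` as the argument, as the tests do, yields `None` for a missing key, which is what lets the environment value take over.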
|
|
@ -1,136 +0,0 @@
|
||||||
|
|
||||||
import os
|
|
||||||
import pytest
|
|
||||||
from unittest.mock import patch
|
|
||||||
|
|
||||||
from trustgraph.base.qdrant_config import (
|
|
||||||
get_qdrant_defaults,
|
|
||||||
resolve_qdrant_config,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class TestGetQdrantDefaults:
|
|
||||||
|
|
||||||
def test_defaults_with_no_env_vars(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
defaults = get_qdrant_defaults()
|
|
||||||
assert defaults['url'] == 'http://localhost:6333'
|
|
||||||
assert defaults['api_key'] is None
|
|
||||||
assert defaults['replication_factor'] == 1
|
|
||||||
assert defaults['shard_number'] == 1
|
|
||||||
|
|
||||||
def test_defaults_from_env(self):
|
|
||||||
env = {
|
|
||||||
'QDRANT_URL': 'http://qdrant:6333',
|
|
||||||
'QDRANT_API_KEY': 'secret',
|
|
||||||
'QDRANT_REPLICATION_FACTOR': '3',
|
|
||||||
'QDRANT_SHARD_NUMBER': '5',
|
|
||||||
}
|
|
||||||
with patch.dict(os.environ, env, clear=True):
|
|
||||||
defaults = get_qdrant_defaults()
|
|
||||||
assert defaults['url'] == 'http://qdrant:6333'
|
|
||||||
assert defaults['api_key'] == 'secret'
|
|
||||||
assert defaults['replication_factor'] == 3
|
|
||||||
assert defaults['shard_number'] == 5
|
|
||||||
|
|
||||||
|
|
||||||
class TestResolveQdrantConfig:
|
|
||||||
|
|
||||||
def test_defaults(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
url, api_key, rf, sn = resolve_qdrant_config()
|
|
||||||
assert url == 'http://localhost:6333'
|
|
||||||
assert api_key is None
|
|
||||||
assert rf == 1
|
|
||||||
assert sn == 1
|
|
||||||
|
|
||||||
def test_explicit_kwargs(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
url, api_key, rf, sn = resolve_qdrant_config(
|
|
||||||
url='http://custom:6333',
|
|
||||||
api_key='key',
|
|
||||||
replication_factor=3,
|
|
||||||
shard_number=5,
|
|
||||||
)
|
|
||||||
assert url == 'http://custom:6333'
|
|
||||||
assert api_key == 'key'
|
|
||||||
assert rf == 3
|
|
||||||
assert sn == 5
|
|
||||||
|
|
||||||
def test_kwargs_override_env(self):
|
|
||||||
env = {
|
|
||||||
'QDRANT_URL': 'http://env:6333',
|
|
||||||
'QDRANT_REPLICATION_FACTOR': '10',
|
|
||||||
'QDRANT_SHARD_NUMBER': '10',
|
|
||||||
}
|
|
||||||
with patch.dict(os.environ, env, clear=True):
|
|
||||||
url, _, rf, sn = resolve_qdrant_config(
|
|
||||||
url='http://explicit:6333',
|
|
||||||
replication_factor=3,
|
|
||||||
shard_number=5,
|
|
||||||
)
|
|
||||||
assert url == 'http://explicit:6333'
|
|
||||||
assert rf == 3
|
|
||||||
assert sn == 5
|
|
||||||
|
|
||||||
def test_env_fallback_when_kwargs_none(self):
|
|
||||||
env = {
|
|
||||||
'QDRANT_URL': 'http://env:6333',
|
|
||||||
'QDRANT_REPLICATION_FACTOR': '3',
|
|
||||||
'QDRANT_SHARD_NUMBER': '5',
|
|
||||||
}
|
|
||||||
with patch.dict(os.environ, env, clear=True):
|
|
||||||
url, _, rf, sn = resolve_qdrant_config()
|
|
||||||
assert url == 'http://env:6333'
|
|
||||||
assert rf == 3
|
|
||||||
assert sn == 5
|
|
||||||
|
|
||||||
def test_params_dict_path(self):
|
|
||||||
with patch.dict(os.environ, {}, clear=True):
|
|
||||||
params = {
|
|
||||||
'store_uri': 'http://params:6333',
|
|
||||||
'api_key': 'pkey',
|
|
||||||
'qdrant_replication_factor': 3,
|
|
||||||
'qdrant_shard_number': 5,
|
|
||||||
}
|
|
||||||
url, api_key, rf, sn = resolve_qdrant_config(
|
|
||||||
url=params.get('store_uri'),
|
|
||||||
api_key=params.get('api_key'),
|
|
||||||
replication_factor=params.get('qdrant_replication_factor'),
|
|
||||||
shard_number=params.get('qdrant_shard_number'),
|
|
||||||
)
|
|
||||||
assert url == 'http://params:6333'
|
|
||||||
assert api_key == 'pkey'
|
|
||||||
assert rf == 3
|
|
||||||
assert sn == 5
|
|
||||||
|
|
||||||
def test_params_dict_overrides_env(self):
|
|
||||||
env = {
|
|
||||||
'QDRANT_REPLICATION_FACTOR': '10',
|
|
||||||
'QDRANT_SHARD_NUMBER': '10',
|
|
||||||
}
|
|
||||||
with patch.dict(os.environ, env, clear=True):
|
|
||||||
params = {
|
|
||||||
'qdrant_replication_factor': 3,
|
|
||||||
'qdrant_shard_number': 5,
|
|
||||||
}
|
|
||||||
_, _, rf, sn = resolve_qdrant_config(
|
|
||||||
replication_factor=params.get('qdrant_replication_factor'),
|
|
||||||
shard_number=params.get('qdrant_shard_number'),
|
|
||||||
)
|
|
||||||
assert rf == 3
|
|
||||||
assert sn == 5
|
|
||||||
|
|
||||||
def test_params_dict_missing_falls_to_env(self):
|
|
||||||
env = {
|
|
||||||
'QDRANT_REPLICATION_FACTOR': '3',
|
|
||||||
'QDRANT_SHARD_NUMBER': '5',
|
|
||||||
}
|
|
||||||
with patch.dict(os.environ, env, clear=True):
|
|
||||||
params = {}
|
|
||||||
_, _, rf, sn = resolve_qdrant_config(
|
|
||||||
replication_factor=params.get('qdrant_replication_factor'),
|
|
||||||
shard_number=params.get('qdrant_shard_number'),
|
|
||||||
)
|
|
||||||
assert rf == 3
|
|
||||||
assert sn == 5
|
|
||||||
|
|
@ -11,12 +11,7 @@ from unittest.mock import AsyncMock, Mock, patch, MagicMock
|
||||||
from unittest.mock import call
|
from unittest.mock import call
|
||||||
|
|
||||||
from trustgraph.cores.knowledge import KnowledgeManager
|
from trustgraph.cores.knowledge import KnowledgeManager
|
||||||
from trustgraph.schema import (
|
from trustgraph.schema import KnowledgeResponse, Triples, GraphEmbeddings, Metadata, Triple, Term, EntityEmbeddings, IRI, LITERAL
|
||||||
KnowledgeResponse, Triples, GraphEmbeddings, Metadata, Triple, Term,
|
|
||||||
EntityEmbeddings, IRI, LITERAL,
|
|
||||||
LibraryMetadata, LibraryBlob,
|
|
||||||
LibrarianResponse, DocumentMetadata,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
@pytest.fixture
|
||||||
|
|
@ -386,244 +381,3 @@ class TestKnowledgeManagerOtherMethods:
|
||||||
mock_respond.assert_called_once()
|
mock_respond.assert_called_once()
|
||||||
response = mock_respond.call_args[0][0]
|
response = mock_respond.call_args[0][0]
|
||||||
assert response.error is None
|
assert response.error is None
|
||||||
|
|
||||||
|
|
||||||
class TestKnowledgeManagerLibraryDownload:
|
|
||||||
"""Test get_kg_core streaming of library documents."""
|
|
||||||
|
|
||||||
@pytest.fixture
|
|
||||||
def manager_with_librarian(self, mock_flow_config):
|
|
||||||
with patch('trustgraph.cores.knowledge.KnowledgeTableStore'):
|
|
||||||
mock_librarian = AsyncMock()
|
|
||||||
manager = KnowledgeManager(
|
|
||||||
cassandra_host=["localhost"],
|
|
||||||
cassandra_username="test_user",
|
|
||||||
cassandra_password="test_pass",
|
|
||||||
keyspace="test_keyspace",
|
|
||||||
flow_config=mock_flow_config,
|
|
||||||
librarian=mock_librarian,
|
|
||||||
)
|
|
||||||
manager.table_store = AsyncMock()
|
|
||||||
return manager
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_get_kg_core_streams_library_docs(self, manager_with_librarian):
|
|
||||||
mock_request = Mock()
|
|
||||||
mock_request.id = "root-doc"
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
|
|
||||||
manager_with_librarian.table_store.get_triples = AsyncMock()
|
|
||||||
manager_with_librarian.table_store.get_graph_embeddings = AsyncMock()
|
|
||||||
|
|
||||||
root_meta = DocumentMetadata(
|
|
||||||
id="root-doc", kind="application/pdf", title="Test PDF",
|
|
||||||
document_type="source",
|
|
||||||
)
|
|
||||||
child_meta = DocumentMetadata(
|
|
||||||
id="chunk-1", kind="text/plain", title="Chunk 1",
|
|
||||||
parent_id="root-doc", document_type="chunk",
|
|
||||||
)
|
|
||||||
|
|
||||||
manager_with_librarian.librarian.fetch_document_metadata.return_value = root_meta
|
|
||||||
manager_with_librarian.librarian.request.return_value = LibrarianResponse(
|
|
||||||
document_metadatas=[child_meta],
|
|
||||||
)
|
|
||||||
manager_with_librarian.librarian.fetch_document_content.side_effect = [
|
|
||||||
b"cm9vdCBjb250ZW50",
|
|
||||||
b"Y2h1bmsgY29udGVudA==",
|
|
||||||
]
|
|
||||||
|
|
||||||
await manager_with_librarian.get_kg_core(
|
|
||||||
mock_request, mock_respond, "test-user"
|
|
||||||
)
|
|
||||||
|
|
||||||
responses = [c[0][0] for c in mock_respond.call_args_list]
|
|
||||||
|
|
||||||
lm_responses = [r for r in responses if r.library_metadata is not None]
|
|
||||||
lb_responses = [r for r in responses if r.library_blob is not None]
|
|
||||||
eos_responses = [r for r in responses if r.eos is True]
|
|
||||||
|
|
||||||
assert len(lm_responses) == 2
|
|
||||||
assert lm_responses[0].library_metadata.id == "root-doc"
|
|
||||||
assert lm_responses[0].library_metadata.document_type == "source"
|
|
||||||
assert lm_responses[1].library_metadata.id == "chunk-1"
|
|
||||||
assert lm_responses[1].library_metadata.parent_id == "root-doc"
|
|
||||||
|
|
||||||
assert len(lb_responses) == 2
|
|
||||||
assert lb_responses[0].library_blob.id == "root-doc"
|
|
||||||
assert lb_responses[0].library_blob.data == b"cm9vdCBjb250ZW50"
|
|
||||||
assert lb_responses[1].library_blob.id == "chunk-1"
|
|
||||||
|
|
||||||
assert len(eos_responses) == 1
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_get_kg_core_no_librarian_skips_library(self, mock_flow_config):
|
|
||||||
with patch('trustgraph.cores.knowledge.KnowledgeTableStore'):
|
|
||||||
manager = KnowledgeManager(
|
|
||||||
cassandra_host=["localhost"],
|
|
||||||
cassandra_username="u", cassandra_password="p",
|
|
||||||
keyspace="ks", flow_config=mock_flow_config,
|
|
||||||
)
|
|
||||||
manager.table_store = AsyncMock()
|
|
||||||
manager.table_store.get_triples = AsyncMock()
|
|
||||||
manager.table_store.get_graph_embeddings = AsyncMock()
|
|
||||||
|
|
||||||
mock_request = Mock()
|
|
||||||
mock_request.id = "doc-1"
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
|
|
||||||
await manager.get_kg_core(mock_request, mock_respond, "w")
|
|
||||||
|
|
||||||
responses = [c[0][0] for c in mock_respond.call_args_list]
|
|
||||||
assert all(r.library_metadata is None for r in responses)
|
|
||||||
assert all(r.library_blob is None for r in responses)
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_get_kg_core_librarian_metadata_failure_is_graceful(
|
|
||||||
self, manager_with_librarian,
|
|
||||||
):
|
|
||||||
mock_request = Mock()
|
|
||||||
mock_request.id = "missing-doc"
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
|
|
||||||
manager_with_librarian.table_store.get_triples = AsyncMock()
|
|
||||||
manager_with_librarian.table_store.get_graph_embeddings = AsyncMock()
|
|
||||||
manager_with_librarian.librarian.fetch_document_metadata.side_effect = (
|
|
||||||
RuntimeError("not found")
|
|
||||||
)
|
|
||||||
|
|
||||||
await manager_with_librarian.get_kg_core(
|
|
||||||
mock_request, mock_respond, "test-user"
|
|
||||||
)
|
|
||||||
|
|
||||||
responses = [c[0][0] for c in mock_respond.call_args_list]
|
|
||||||
assert all(r.library_metadata is None for r in responses)
|
|
||||||
assert any(r.eos for r in responses)
|
|
||||||
|
|
||||||
|
|
||||||
class TestKnowledgeManagerLibraryUpload:
|
|
||||||
"""Test put_kg_core handling of library metadata and blob records."""
|
|
||||||
|
|
||||||
@pytest.fixture
|
|
||||||
def manager_with_librarian(self, mock_flow_config):
|
|
||||||
with patch('trustgraph.cores.knowledge.KnowledgeTableStore'):
|
|
||||||
mock_librarian = AsyncMock()
|
|
||||||
manager = KnowledgeManager(
|
|
||||||
cassandra_host=["localhost"],
|
|
||||||
cassandra_username="u", cassandra_password="p",
|
|
||||||
keyspace="ks", flow_config=mock_flow_config,
|
|
||||||
librarian=mock_librarian,
|
|
||||||
)
|
|
||||||
manager.table_store = AsyncMock()
|
|
||||||
return manager
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_put_metadata_then_blob_calls_librarian(
|
|
||||||
self, manager_with_librarian,
|
|
||||||
):
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
manager_with_librarian.librarian.request.return_value = LibrarianResponse()
|
|
||||||
|
|
||||||
# First call: metadata
|
|
||||||
req_meta = Mock()
|
|
||||||
req_meta.triples = None
|
|
||||||
req_meta.graph_embeddings = None
|
|
||||||
req_meta.library_metadata = LibraryMetadata(
|
|
||||||
id="doc-1", kind="application/pdf", title="Test",
|
|
||||||
document_type="source",
|
|
||||||
)
|
|
||||||
req_meta.library_blob = None
|
|
||||||
await manager_with_librarian.put_kg_core(req_meta, mock_respond, "ws")
|
|
||||||
|
|
||||||
# Metadata is buffered, librarian not called yet
|
|
||||||
manager_with_librarian.librarian.request.assert_not_called()
|
|
||||||
|
|
||||||
# Second call: blob
|
|
||||||
req_blob = Mock()
|
|
||||||
req_blob.triples = None
|
|
||||||
req_blob.graph_embeddings = None
|
|
||||||
req_blob.library_metadata = None
|
|
||||||
req_blob.library_blob = LibraryBlob(
|
|
||||||
id="doc-1", data=b"dGVzdA==",
|
|
||||||
)
|
|
||||||
await manager_with_librarian.put_kg_core(req_blob, mock_respond, "ws")
|
|
||||||
|
|
||||||
# Now librarian should have been called with add-document
|
|
||||||
manager_with_librarian.librarian.request.assert_called_once()
|
|
||||||
call_args = manager_with_librarian.librarian.request.call_args[0][0]
|
|
||||||
assert call_args.operation == "add-document"
|
|
||||||
assert call_args.document_metadata.id == "doc-1"
|
|
||||||
assert call_args.document_metadata.kind == "application/pdf"
|
|
||||||
assert call_args.content == b"dGVzdA=="
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_put_child_document_uses_add_child_operation(
|
|
||||||
self, manager_with_librarian,
|
|
||||||
):
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
manager_with_librarian.librarian.request.return_value = LibrarianResponse()
|
|
||||||
|
|
||||||
req_meta = Mock()
|
|
||||||
req_meta.triples = None
|
|
||||||
req_meta.graph_embeddings = None
|
|
||||||
req_meta.library_metadata = LibraryMetadata(
|
|
||||||
id="chunk-1", kind="text/plain", title="Chunk",
|
|
||||||
parent_id="doc-1", document_type="chunk",
|
|
||||||
)
|
|
||||||
req_meta.library_blob = None
|
|
||||||
await manager_with_librarian.put_kg_core(req_meta, mock_respond, "ws")
|
|
||||||
|
|
||||||
req_blob = Mock()
|
|
||||||
req_blob.triples = None
|
|
||||||
req_blob.graph_embeddings = None
|
|
||||||
req_blob.library_metadata = None
|
|
||||||
req_blob.library_blob = LibraryBlob(id="chunk-1", data=b"Y2h1bms=")
|
|
||||||
await manager_with_librarian.put_kg_core(req_blob, mock_respond, "ws")
|
|
||||||
|
|
||||||
call_args = manager_with_librarian.librarian.request.call_args[0][0]
|
|
||||||
assert call_args.operation == "add-child-document"
|
|
||||||
assert call_args.document_metadata.parent_id == "doc-1"
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_put_blob_without_metadata_logs_warning(
|
|
||||||
self, manager_with_librarian,
|
|
||||||
):
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
|
|
||||||
req_blob = Mock()
|
|
||||||
req_blob.triples = None
|
|
||||||
req_blob.graph_embeddings = None
|
|
||||||
req_blob.library_metadata = None
|
|
||||||
req_blob.library_blob = LibraryBlob(id="orphan", data=b"data")
|
|
||||||
await manager_with_librarian.put_kg_core(req_blob, mock_respond, "ws")
|
|
||||||
|
|
||||||
# Librarian should not be called for orphan blob
|
|
||||||
manager_with_librarian.librarian.request.assert_not_called()
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_put_existing_document_is_graceful(
|
|
||||||
self, manager_with_librarian,
|
|
||||||
):
|
|
||||||
mock_respond = AsyncMock()
|
|
||||||
manager_with_librarian.librarian.request.side_effect = RuntimeError(
|
|
||||||
"Document already exists"
|
|
||||||
)
|
|
||||||
|
|
||||||
req_meta = Mock()
|
|
||||||
req_meta.triples = None
|
|
||||||
req_meta.graph_embeddings = None
|
|
||||||
req_meta.library_metadata = LibraryMetadata(
|
|
||||||
id="doc-1", kind="application/pdf", title="Test",
|
|
||||||
document_type="source",
|
|
||||||
)
|
|
||||||
req_meta.library_blob = None
|
|
||||||
await manager_with_librarian.put_kg_core(req_meta, mock_respond, "ws")
|
|
||||||
|
|
||||||
req_blob = Mock()
|
|
||||||
req_blob.triples = None
|
|
||||||
req_blob.graph_embeddings = None
|
|
||||||
req_blob.library_metadata = None
|
|
||||||
req_blob.library_blob = LibraryBlob(id="doc-1", data=b"data")
|
|
||||||
await manager_with_librarian.put_kg_core(req_blob, mock_respond, "ws")
|
|
||||||
|
|
||||||
# Should not raise — "already exists" is handled gracefully
|
|
||||||
|
|
@@ -49,7 +49,7 @@ class TestPdfDecoderProcessor(IsolatedAsyncioTestCase):
|
||||||
async def test_on_message_success(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
async def test_on_message_success(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
||||||
"""Test successful PDF processing"""
|
"""Test successful PDF processing"""
|
||||||
# Mock PDF content
|
# Mock PDF content
|
||||||
pdf_content = b"%PDF-1.7\nfake pdf content"
|
pdf_content = b"fake pdf content"
|
||||||
pdf_base64 = base64.b64encode(pdf_content).decode('utf-8')
|
pdf_base64 = base64.b64encode(pdf_content).decode('utf-8')
|
||||||
|
|
||||||
# Mock PyPDFLoader
|
# Mock PyPDFLoader
|
||||||
|
|
@@ -88,55 +88,13 @@ class TestPdfDecoderProcessor(IsolatedAsyncioTestCase):
|
||||||
# Verify triples were sent for each page (provenance)
|
# Verify triples were sent for each page (provenance)
|
||||||
assert mock_triples_flow.send.call_count == 2
|
assert mock_triples_flow.send.call_count == 2
|
||||||
|
|
||||||
@patch('trustgraph.base.librarian_client.Consumer')
|
|
||||||
@patch('trustgraph.base.librarian_client.Producer')
|
|
||||||
@patch('trustgraph.decoding.pdf.pdf_decoder.PyPDFLoader')
|
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor', MockAsyncProcessor)
|
|
||||||
async def test_on_message_rejects_librarian_content_that_is_not_pdf(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
|
||||||
"""Test rejecting non-PDF content before invoking the PDF loader"""
|
|
||||||
html_content = b"<html><body>Not found</body></html>"
|
|
||||||
html_base64 = base64.b64encode(html_content)
|
|
||||||
|
|
||||||
mock_metadata = Metadata(id="test-doc")
|
|
||||||
mock_document = Document(metadata=mock_metadata, document_id="doc-123")
|
|
||||||
mock_msg = MagicMock()
|
|
||||||
mock_msg.value.return_value = mock_document
|
|
||||||
|
|
||||||
mock_output_flow = AsyncMock()
|
|
||||||
mock_triples_flow = AsyncMock()
|
|
||||||
mock_flow = MagicMock(side_effect=lambda name: {
|
|
||||||
"output": mock_output_flow,
|
|
||||||
"triples": mock_triples_flow,
|
|
||||||
}.get(name))
|
|
||||||
mock_flow.librarian.fetch_document_metadata = AsyncMock(
|
|
||||||
return_value=MagicMock(kind="application/pdf")
|
|
||||||
)
|
|
||||||
mock_flow.librarian.fetch_document_content = AsyncMock(
|
|
||||||
return_value=html_base64
|
|
||||||
)
|
|
||||||
mock_flow.librarian.save_child_document = AsyncMock()
|
|
||||||
|
|
||||||
config = {
|
|
||||||
'id': 'test-pdf-decoder',
|
|
||||||
'taskgroup': AsyncMock()
|
|
||||||
}
|
|
||||||
|
|
||||||
processor = Processor(**config)
|
|
||||||
|
|
||||||
await processor.on_message(mock_msg, None, mock_flow)
|
|
||||||
|
|
||||||
mock_pdf_loader_class.assert_not_called()
|
|
||||||
mock_output_flow.send.assert_not_called()
|
|
||||||
mock_triples_flow.send.assert_not_called()
|
|
||||||
mock_flow.librarian.save_child_document.assert_not_called()
|
|
||||||
|
|
||||||
@patch('trustgraph.base.librarian_client.Consumer')
|
@patch('trustgraph.base.librarian_client.Consumer')
|
||||||
@patch('trustgraph.base.librarian_client.Producer')
|
@patch('trustgraph.base.librarian_client.Producer')
|
||||||
@patch('trustgraph.decoding.pdf.pdf_decoder.PyPDFLoader')
|
@patch('trustgraph.decoding.pdf.pdf_decoder.PyPDFLoader')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor', MockAsyncProcessor)
|
@patch('trustgraph.base.async_processor.AsyncProcessor', MockAsyncProcessor)
|
||||||
async def test_on_message_empty_pdf(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
async def test_on_message_empty_pdf(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
||||||
"""Test handling of empty PDF"""
|
"""Test handling of empty PDF"""
|
||||||
pdf_content = b"%PDF-1.7\nfake pdf content"
|
pdf_content = b"fake pdf content"
|
||||||
pdf_base64 = base64.b64encode(pdf_content).decode('utf-8')
|
pdf_base64 = base64.b64encode(pdf_content).decode('utf-8')
|
||||||
|
|
||||||
mock_loader = MagicMock()
|
mock_loader = MagicMock()
|
||||||
|
|
@@ -168,7 +126,7 @@ class TestPdfDecoderProcessor(IsolatedAsyncioTestCase):
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor', MockAsyncProcessor)
|
@patch('trustgraph.base.async_processor.AsyncProcessor', MockAsyncProcessor)
|
||||||
async def test_on_message_unicode_content(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
async def test_on_message_unicode_content(self, mock_pdf_loader_class, mock_producer, mock_consumer):
|
||||||
"""Test handling of unicode content in PDF"""
|
"""Test handling of unicode content in PDF"""
|
||||||
pdf_content = b"%PDF-1.7\nfake pdf content"
|
pdf_content = b"fake pdf content"
|
||||||
pdf_base64 = base64.b64encode(pdf_content).decode('utf-8')
|
pdf_base64 = base64.b64encode(pdf_content).decode('utf-8')
|
||||||
|
|
||||||
mock_loader = MagicMock()
|
mock_loader = MagicMock()
|
||||||
|
|
|
||||||
|
|
@@ -18,7 +18,7 @@ from trustgraph.embeddings.hf.hf import Processor
|
||||||
class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
"""Test HuggingFace dynamic model loading and caching"""
|
"""Test HuggingFace dynamic model loading and caching"""
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_default_model_loaded_on_init(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_default_model_loaded_on_init(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -39,7 +39,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
assert processor.cached_model_name == "test-model"
|
assert processor.cached_model_name == "test-model"
|
||||||
assert processor.embeddings is not None
|
assert processor.embeddings is not None
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_model_caching_avoids_reload(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_model_caching_avoids_reload(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -63,7 +63,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
mock_hf_class.assert_not_called()
|
mock_hf_class.assert_not_called()
|
||||||
assert processor.cached_model_name == "test-model"
|
assert processor.cached_model_name == "test-model"
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_model_reload_on_name_change(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_model_reload_on_name_change(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -84,7 +84,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
mock_hf_class.assert_called_once_with(model_name="different-model")
|
mock_hf_class.assert_called_once_with(model_name="different-model")
|
||||||
assert processor.cached_model_name == "different-model"
|
assert processor.cached_model_name == "different-model"
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_on_embeddings_uses_default_model(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_on_embeddings_uses_default_model(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -107,7 +107,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
assert processor.cached_model_name == "test-model" # Still using default
|
assert processor.cached_model_name == "test-model" # Still using default
|
||||||
assert result == [[0.1, 0.2, 0.3, 0.4, 0.5]]
|
assert result == [[0.1, 0.2, 0.3, 0.4, 0.5]]
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_on_embeddings_uses_specified_model(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_on_embeddings_uses_specified_model(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -130,7 +130,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
assert processor.cached_model_name == "custom-model"
|
assert processor.cached_model_name == "custom-model"
|
||||||
mock_hf_instance.embed_documents.assert_called_once_with(["test text"])
|
mock_hf_instance.embed_documents.assert_called_once_with(["test text"])
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_multiple_model_switches(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_multiple_model_switches(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -164,7 +164,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
assert call_count_after_b == initial_call_count + 2 # Reload for model-b
|
assert call_count_after_b == initial_call_count + 2 # Reload for model-b
|
||||||
assert call_count_after_a_again == initial_call_count + 3 # Reload back to model-a
|
assert call_count_after_a_again == initial_call_count + 3 # Reload back to model-a
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_none_model_uses_default(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_none_model_uses_default(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
@@ -187,7 +187,7 @@ class TestHuggingFaceDynamicModelLoading(IsolatedAsyncioTestCase):
|
||||||
assert mock_hf_class.call_count == initial_count
|
assert mock_hf_class.call_count == initial_count
|
||||||
assert processor.cached_model_name == "test-model"
|
assert processor.cached_model_name == "test-model"
|
||||||
|
|
||||||
@patch('langchain_huggingface.HuggingFaceEmbeddings')
|
@patch('trustgraph.embeddings.hf.hf.HuggingFaceEmbeddings')
|
||||||
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
@patch('trustgraph.base.async_processor.AsyncProcessor.__init__')
|
||||||
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
@patch('trustgraph.base.embeddings_service.EmbeddingsService.__init__')
|
||||||
async def test_initialization_without_model_uses_default(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
async def test_initialization_without_model_uses_default(self, mock_embeddings_init, mock_async_init, mock_hf_class):
|
||||||
|
|
|
||||||
|
|
@@ -7,7 +7,7 @@ including template rendering, term merging, JSON validation, and error handling.
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
import json
|
import json
|
||||||
from unittest.mock import AsyncMock
|
from unittest.mock import AsyncMock, MagicMock, patch
|
||||||
|
|
||||||
from trustgraph.template.prompt_manager import PromptManager, PromptConfiguration, Prompt
|
from trustgraph.template.prompt_manager import PromptManager, PromptConfiguration, Prompt
|
||||||
|
|
||||||
|
|
@@ -344,42 +344,6 @@ class TestPromptManager:
|
||||||
assert pm.terms == {} # Default empty terms
|
assert pm.terms == {} # Default empty terms
|
||||||
assert len(pm.prompts) == 0
|
assert len(pm.prompts) == 0
|
||||||
|
|
||||||
def test_load_config_does_not_swallow_keyboard_interrupt(self, monkeypatch):
|
|
||||||
"""KeyboardInterrupt should propagate out of config parsing."""
|
|
||||||
pm = PromptManager()
|
|
||||||
|
|
||||||
def interrupt(_value):
|
|
||||||
raise KeyboardInterrupt
|
|
||||||
|
|
||||||
monkeypatch.setattr("trustgraph.template.prompt_manager.json.loads", interrupt)
|
|
||||||
|
|
||||||
with pytest.raises(KeyboardInterrupt):
|
|
||||||
pm.load_config({"system": json.dumps("Test")})
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_json_parse_does_not_swallow_system_exit(self):
|
|
||||||
"""SystemExit should propagate out of JSON response parsing."""
|
|
||||||
pm = PromptManager()
|
|
||||||
config = {
|
|
||||||
"system": json.dumps("Test"),
|
|
||||||
"template-index": json.dumps(["json_response"]),
|
|
||||||
"template.json_response": json.dumps({
|
|
||||||
"prompt": "Generate JSON",
|
|
||||||
"response-type": "json"
|
|
||||||
})
|
|
||||||
}
|
|
||||||
pm.load_config(config)
|
|
||||||
|
|
||||||
def exit_parse(_text):
|
|
||||||
raise SystemExit(2)
|
|
||||||
|
|
||||||
pm.parse_json = exit_parse
|
|
||||||
mock_llm = AsyncMock()
|
|
||||||
mock_llm.return_value = "{}"
|
|
||||||
|
|
||||||
with pytest.raises(SystemExit):
|
|
||||||
await pm.invoke("json_response", {}, mock_llm)
|
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.unit
|
@pytest.mark.unit
|
||||||
class TestPromptManagerJsonl:
|
class TestPromptManagerJsonl:
|
||||||
|
|
|
||||||
|
|
@@ -8,7 +8,6 @@ import pytest
|
||||||
from unittest.mock import Mock, patch, MagicMock, call
|
from unittest.mock import Mock, patch, MagicMock, call
|
||||||
import json
|
import json
|
||||||
|
|
||||||
from trustgraph.api.socket_client import SocketClient
|
|
||||||
from trustgraph.api import (
|
from trustgraph.api import (
|
||||||
Api,
|
Api,
|
||||||
Triple,
|
Triple,
|
||||||
|
|
@@ -223,82 +222,6 @@ class TestSocketClient:
|
||||||
for method in expected_methods:
|
for method in expected_methods:
|
||||||
assert hasattr(flow_instance, method), f"Missing method: {method}"
|
assert hasattr(flow_instance, method), f"Missing method: {method}"
|
||||||
|
|
||||||
def test_socket_client_close_does_not_swallow_base_exceptions(self):
|
|
||||||
"""Test close cleanup does not suppress process-level interrupts."""
|
|
||||||
|
|
||||||
class InterruptingLoop:
|
|
||||||
def is_closed(self):
|
|
||||||
return False
|
|
||||||
|
|
||||||
def run_until_complete(self, awaitable):
|
|
||||||
if hasattr(awaitable, "close"):
|
|
||||||
awaitable.close()
|
|
||||||
raise SystemExit("stop")
|
|
||||||
|
|
||||||
socket = SocketClient(url="http://test/", timeout=60, token=None)
|
|
||||||
socket._loop = InterruptingLoop()
|
|
||||||
|
|
||||||
with pytest.raises(SystemExit):
|
|
||||||
socket.close()
|
|
||||||
|
|
||||||
@pytest.mark.parametrize(
|
|
||||||
("generator_method", "async_method"),
|
|
||||||
[
|
|
||||||
("_streaming_generator", "_send_request_async_streaming"),
|
|
||||||
("_streaming_generator_raw", "_send_request_async_streaming_raw"),
|
|
||||||
],
|
|
||||||
)
|
|
||||||
def test_socket_client_streaming_cleanup_does_not_swallow_base_exceptions(
|
|
||||||
self, generator_method, async_method
|
|
||||||
):
|
|
||||||
"""Test streaming cleanup does not suppress process-level interrupts."""
|
|
||||||
|
|
||||||
class FakeAsyncGenerator:
|
|
||||||
def __anext__(self):
|
|
||||||
return "next"
|
|
||||||
|
|
||||||
def aclose(self):
|
|
||||||
return "close"
|
|
||||||
|
|
||||||
class InterruptingLoop:
|
|
||||||
def run_until_complete(self, awaitable):
|
|
||||||
if awaitable == "next":
|
|
||||||
raise StopAsyncIteration
|
|
||||||
if awaitable == "close":
|
|
||||||
raise SystemExit("stop")
|
|
||||||
raise AssertionError(f"unexpected awaitable: {awaitable!r}")
|
|
||||||
|
|
||||||
socket = SocketClient(url="http://test/", timeout=60, token=None)
|
|
||||||
setattr(socket, async_method, lambda *args, **kwargs: FakeAsyncGenerator())
|
|
||||||
generator = getattr(socket, generator_method)(
|
|
||||||
"agent", "default", {}, InterruptingLoop()
|
|
||||||
)
|
|
||||||
|
|
||||||
with pytest.raises(SystemExit):
|
|
||||||
next(generator)
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
async def test_socket_client_reader_does_not_swallow_base_exceptions(self):
|
|
||||||
"""Test reader error fanout does not suppress process-level interrupts."""
|
|
||||||
|
|
||||||
class FailingSocket:
|
|
||||||
def __aiter__(self):
|
|
||||||
return self
|
|
||||||
|
|
||||||
async def __anext__(self):
|
|
||||||
raise ValueError("reader failed")
|
|
||||||
|
|
||||||
class InterruptingQueue:
|
|
||||||
async def put(self, message):
|
|
||||||
raise SystemExit("stop")
|
|
||||||
|
|
||||||
socket = SocketClient(url="http://test/", timeout=60, token=None)
|
|
||||||
socket._socket = FailingSocket()
|
|
||||||
socket._pending = {"req-1": InterruptingQueue()}
|
|
||||||
|
|
||||||
with pytest.raises(SystemExit):
|
|
||||||
await socket._reader()
|
|
||||||
|
|
||||||
|
|
||||||
class TestBulkClient:
|
class TestBulkClient:
|
||||||
"""Test bulk operations client"""
|
"""Test bulk operations client"""
|
||||||
|
|
|
||||||
|
|
@@ -1,56 +0,0 @@
|
||||||
"""
|
|
||||||
Tests for ontology monitoring metrics.
|
|
||||||
"""
|
|
||||||
|
|
||||||
from trustgraph.query.ontology.monitoring import (
|
|
||||||
PerformanceMonitor,
|
|
||||||
_extract_metric_label,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def test_extract_metric_label_reads_unquoted_label_value():
|
|
||||||
metric_name = "cache_requests_total{cache_type=entity,component=ontology}"
|
|
||||||
|
|
||||||
assert _extract_metric_label(metric_name, "cache_type") == "entity"
|
|
||||||
|
|
||||||
|
|
||||||
def test_extract_metric_label_reads_quoted_label_value():
|
|
||||||
metric_name = 'cache_requests_total{cache_type="entity",component="ontology"}'
|
|
||||||
|
|
||||||
assert _extract_metric_label(metric_name, "cache_type") == "entity"
|
|
||||||
|
|
||||||
|
|
||||||
def test_extract_metric_label_returns_none_when_label_missing():
|
|
||||||
metric_name = "cache_requests_total{component=ontology}"
|
|
||||||
|
|
||||||
assert _extract_metric_label(metric_name, "cache_type") is None
|
|
||||||
|
|
||||||
|
|
||||||
def test_performance_report_ignores_counters_without_cache_type_label():
|
|
||||||
monitor = PerformanceMonitor({"enabled": False})
|
|
||||||
monitor.metrics_collector.increment(
|
|
||||||
"cache_requests_total",
|
|
||||||
labels={"component": "ontology"},
|
|
||||||
)
|
|
||||||
monitor.metrics_collector.increment(
|
|
||||||
"cache_type=not_a_label",
|
|
||||||
labels={"component": "ontology"},
|
|
||||||
)
|
|
||||||
monitor.metrics_collector.increment(
|
|
||||||
"cache_requests_total",
|
|
||||||
labels={"cache_type": "entity"},
|
|
||||||
)
|
|
||||||
monitor.metrics_collector.increment(
|
|
||||||
"cache_hits_total",
|
|
||||||
labels={"cache_type": "entity"},
|
|
||||||
)
|
|
||||||
|
|
||||||
report = monitor.get_performance_report()
|
|
||||||
|
|
||||||
assert report["cache_performance"] == {
|
|
||||||
"entity": {
|
|
||||||
"hit_rate": 1.0,
|
|
||||||
"total_requests": 1.0,
|
|
||||||
"total_hits": 1.0,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
@@ -333,8 +333,8 @@ class TestUnifiedTableQueries:
|
||||||
"""Test queries against the unified rows table"""
|
"""Test queries against the unified rows table"""
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
@patch('trustgraph.query.rows.cassandra.service.async_execute_paged', new_callable=AsyncMock)
|
@patch('trustgraph.query.rows.cassandra.service.async_execute', new_callable=AsyncMock)
|
||||||
async def test_query_with_index_match(self, mock_async_execute_paged):
|
async def test_query_with_index_match(self, mock_async_execute):
|
||||||
"""Test query execution with matching index"""
|
"""Test query execution with matching index"""
|
||||||
processor = MagicMock()
|
processor = MagicMock()
|
||||||
processor.session = MagicMock()
|
processor.session = MagicMock()
|
||||||
|
|
@@ -344,10 +344,10 @@ class TestUnifiedTableQueries:
|
||||||
processor.find_matching_index = Processor.find_matching_index.__get__(processor, Processor)
|
processor.find_matching_index = Processor.find_matching_index.__get__(processor, Processor)
|
||||||
processor.query_cassandra = Processor.query_cassandra.__get__(processor, Processor)
|
processor.query_cassandra = Processor.query_cassandra.__get__(processor, Processor)
|
||||||
|
|
||||||
# Mock async_execute_paged to return test data (list of pages)
|
# Mock async_execute to return test data
|
||||||
mock_row = MagicMock()
|
mock_row = MagicMock()
|
||||||
mock_row.data = {"id": "123", "name": "Test Product", "category": "electronics"}
|
mock_row.data = {"id": "123", "name": "Test Product", "category": "electronics"}
|
||||||
mock_async_execute_paged.return_value = [[mock_row]]
|
mock_async_execute.return_value = [mock_row]
|
||||||
|
|
||||||
schema = RowSchema(
|
schema = RowSchema(
|
||||||
name="products",
|
name="products",
|
||||||
|
|
@@ -370,10 +370,10 @@ class TestUnifiedTableQueries:
|
||||||
|
|
||||||
# Verify Cassandra was connected and queried
|
# Verify Cassandra was connected and queried
|
||||||
processor.connect_cassandra.assert_called_once()
|
processor.connect_cassandra.assert_called_once()
|
||||||
mock_async_execute_paged.assert_called_once()
|
mock_async_execute.assert_called_once()
|
||||||
|
|
||||||
# Verify query structure - should query unified rows table
|
# Verify query structure - should query unified rows table
|
||||||
call_args = mock_async_execute_paged.call_args
|
call_args = mock_async_execute.call_args
|
||||||
query = call_args[0][1]
|
query = call_args[0][1]
|
||||||
params = call_args[0][2]
|
params = call_args[0][2]
|
||||||
|
|
||||||
|
|
@@ -394,8 +394,8 @@ class TestUnifiedTableQueries:
|
||||||
assert results[0]["category"] == "electronics"
|
assert results[0]["category"] == "electronics"
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
@patch('trustgraph.query.rows.cassandra.service.async_scan', new_callable=AsyncMock)
|
@patch('trustgraph.query.rows.cassandra.service.async_execute', new_callable=AsyncMock)
|
||||||
async def test_query_without_index_match(self, mock_async_scan):
|
async def test_query_without_index_match(self, mock_async_execute):
|
||||||
"""Test query execution without matching index (scan mode)"""
|
"""Test query execution without matching index (scan mode)"""
|
||||||
processor = MagicMock()
|
processor = MagicMock()
|
||||||
processor.session = MagicMock()
|
processor.session = MagicMock()
|
||||||
|
|
@@ -406,10 +406,12 @@ class TestUnifiedTableQueries:
|
||||||
processor._matches_filters = Processor._matches_filters.__get__(processor, Processor)
|
processor._matches_filters = Processor._matches_filters.__get__(processor, Processor)
|
||||||
processor.query_cassandra = Processor.query_cassandra.__get__(processor, Processor)
|
processor.query_cassandra = Processor.query_cassandra.__get__(processor, Processor)
|
||||||
|
|
||||||
# Mock async_scan to return filtered test data
|
# Mock async_execute to return test data
|
||||||
mock_row1 = MagicMock()
|
mock_row1 = MagicMock()
|
||||||
mock_row1.data = {"id": "1", "name": "Product A", "price": "100"}
|
mock_row1.data = {"id": "1", "name": "Product A", "price": "100"}
|
||||||
mock_async_scan.return_value = [mock_row1]
|
mock_row2 = MagicMock()
|
||||||
|
mock_row2.data = {"id": "2", "name": "Product B", "price": "200"}
|
||||||
|
mock_async_execute.return_value = [mock_row1, mock_row2]
|
||||||
|
|
||||||
schema = RowSchema(
|
schema = RowSchema(
|
||||||
name="products",
|
name="products",
|
||||||
|
|
@@ -430,16 +432,13 @@ class TestUnifiedTableQueries:
|
||||||
limit=10
|
limit=10
|
||||||
)
|
)
|
||||||
|
|
||||||
# Verify async_scan was called
|
# Query should use ALLOW FILTERING for scan
|
||||||
mock_async_scan.assert_called_once()
|
call_args = mock_async_execute.call_args
|
||||||
|
|
||||||
# Verify query structure
|
|
||||||
call_args = mock_async_scan.call_args
|
|
||||||
query = call_args[0][1]
|
query = call_args[0][1]
|
||||||
|
|
||||||
assert "ALLOW FILTERING" in query
|
assert "ALLOW FILTERING" in query
|
||||||
|
|
||||||
# Should return filtered results
|
# Should post-filter results
|
||||||
assert len(results) == 1
|
assert len(results) == 1
|
||||||
assert results[0]["name"] == "Product A"
|
assert results[0]["name"] == "Product A"
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -259,8 +259,6 @@ class TestGraphEmbeddingsNullProtection:
|
||||||
proc.collection_exists = MagicMock(return_value=True)
|
proc.collection_exists = MagicMock(return_value=True)
|
||||||
proc._cache_lock = asyncio.Lock()
|
proc._cache_lock = asyncio.Lock()
|
||||||
proc._known_collections = set()
|
proc._known_collections = set()
|
||||||
proc.replication_factor = 1
|
|
||||||
proc.shard_number = 1
|
|
||||||
|
|
||||||
msg = MagicMock()
|
msg = MagicMock()
|
||||||
msg.metadata.collection = "graphs"
|
msg.metadata.collection = "graphs"
|
||||||
|
|
|
||||||
|
|
@@ -35,9 +35,9 @@ def _make_store():
|
||||||
class TestGetGraphEmbeddings:
|
class TestGetGraphEmbeddings:
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
@patch('trustgraph.tables.knowledge.async_execute_paged', new_callable=AsyncMock)
|
@patch('trustgraph.tables.knowledge.async_execute', new_callable=AsyncMock)
|
||||||
async def test_row_converts_to_entity_embeddings_with_singular_vector(
|
async def test_row_converts_to_entity_embeddings_with_singular_vector(
|
||||||
self, mock_async_execute_paged
|
self, mock_async_execute
|
||||||
):
|
):
|
||||||
"""
|
"""
|
||||||
Cassandra rows return entities as a list of [entity_tuple, vector]
|
Cassandra rows return entities as a list of [entity_tuple, vector]
|
||||||
|
|
@@ -57,7 +57,7 @@ class TestGetGraphEmbeddings:
|
||||||
store = _make_store()
|
store = _make_store()
|
||||||
store.cassandra = Mock()
|
store.cassandra = Mock()
|
||||||
store.get_graph_embeddings_stmt = Mock()
|
store.get_graph_embeddings_stmt = Mock()
|
||||||
mock_async_execute_paged.return_value = [[fake_row]]
|
mock_async_execute.return_value = [fake_row]
|
||||||
|
|
||||||
received = []
|
received = []
|
||||||
|
|
||||||
|
|
@@ -66,7 +66,7 @@ class TestGetGraphEmbeddings:
|
||||||
|
|
||||||
await store.get_graph_embeddings("alice", "doc-1", receiver)
|
await store.get_graph_embeddings("alice", "doc-1", receiver)
|
||||||
|
|
||||||
mock_async_execute_paged.assert_called_once_with(
|
mock_async_execute.assert_called_once_with(
|
||||||
store.cassandra,
|
store.cassandra,
|
||||||
store.get_graph_embeddings_stmt,
|
store.get_graph_embeddings_stmt,
|
||||||
("alice", "doc-1"),
|
("alice", "doc-1"),
|
||||||
|
|
@@ -96,8 +96,8 @@ class TestGetGraphEmbeddings:
|
||||||
assert ge.entities[2].entity.value == "a literal entity"
|
assert ge.entities[2].entity.value == "a literal entity"
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
@patch('trustgraph.tables.knowledge.async_execute_paged', new_callable=AsyncMock)
|
@patch('trustgraph.tables.knowledge.async_execute', new_callable=AsyncMock)
|
||||||
async def test_empty_entities_blob_yields_empty_list(self, mock_async_execute_paged):
|
async def test_empty_entities_blob_yields_empty_list(self, mock_async_execute):
|
||||||
"""row[3] being None / empty must produce a GraphEmbeddings with
|
"""row[3] being None / empty must produce a GraphEmbeddings with
|
||||||
no entities, not raise."""
|
no entities, not raise."""
|
||||||
fake_row = (None, None, None, None)
|
fake_row = (None, None, None, None)
|
||||||
|
|
@@ -105,7 +105,7 @@ class TestGetGraphEmbeddings:
|
||||||
store = _make_store()
|
store = _make_store()
|
||||||
store.cassandra = Mock()
|
store.cassandra = Mock()
|
||||||
store.get_graph_embeddings_stmt = Mock()
|
store.get_graph_embeddings_stmt = Mock()
|
||||||
mock_async_execute_paged.return_value = [[fake_row]]
|
mock_async_execute.return_value = [fake_row]
|
||||||
|
|
||||||
received = []
|
received = []
|
||||||
|
|
||||||
|
|
@@ -118,8 +118,8 @@ class TestGetGraphEmbeddings:
|
||||||
assert received[0].entities == []
|
assert received[0].entities == []
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
@patch('trustgraph.tables.knowledge.async_execute_paged', new_callable=AsyncMock)
|
@patch('trustgraph.tables.knowledge.async_execute', new_callable=AsyncMock)
|
||||||
async def test_multiple_rows_each_emit_one_message(self, mock_async_execute_paged):
|
async def test_multiple_rows_each_emit_one_message(self, mock_async_execute):
|
||||||
fake_rows = [
|
fake_rows = [
|
||||||
(None, None, None, [
|
(None, None, None, [
|
||||||
(("http://example.org/a", True), [1.0]),
|
(("http://example.org/a", True), [1.0]),
|
||||||
|
|
@ -132,7 +132,7 @@ class TestGetGraphEmbeddings:
|
||||||
store = _make_store()
|
store = _make_store()
|
||||||
store.cassandra = Mock()
|
store.cassandra = Mock()
|
||||||
store.get_graph_embeddings_stmt = Mock()
|
store.get_graph_embeddings_stmt = Mock()
|
||||||
mock_async_execute_paged.return_value = [fake_rows]
|
mock_async_execute.return_value = fake_rows
|
||||||
|
|
||||||
received = []
|
received = []
|
||||||
|
|
||||||
|
|
@ -153,9 +153,9 @@ class TestGetTriples:
|
||||||
the same Metadata construction. Cover it for parity."""
|
the same Metadata construction. Cover it for parity."""
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
@patch('trustgraph.tables.knowledge.async_execute_paged', new_callable=AsyncMock)
|
@patch('trustgraph.tables.knowledge.async_execute', new_callable=AsyncMock)
|
||||||
async def test_row_converts_to_triples(self, mock_async_execute_paged):
|
async def test_row_converts_to_triples(self, mock_async_execute):
|
||||||
# row[3] is a list of (s_val, s_uri, p_val, p_uri, o_val, o_uri, graph)
|
# row[3] is a list of (s_val, s_uri, p_val, p_uri, o_val, o_uri)
|
||||||
fake_row = (
|
fake_row = (
|
||||||
None, None, None,
|
None, None, None,
|
||||||
[
|
[
|
||||||
|
|
@ -163,7 +163,6 @@ class TestGetTriples:
|
||||||
"http://example.org/alice", True,
|
"http://example.org/alice", True,
|
||||||
"http://example.org/knows", True,
|
"http://example.org/knows", True,
|
||||||
"http://example.org/bob", True,
|
"http://example.org/bob", True,
|
||||||
"urn:graph:source",
|
|
||||||
),
|
),
|
||||||
],
|
],
|
||||||
)
|
)
|
||||||
|
|
@ -171,7 +170,7 @@ class TestGetTriples:
|
||||||
store = _make_store()
|
store = _make_store()
|
||||||
store.cassandra = Mock()
|
store.cassandra = Mock()
|
||||||
store.get_triples_stmt = Mock()
|
store.get_triples_stmt = Mock()
|
||||||
mock_async_execute_paged.return_value = [[fake_row]]
|
mock_async_execute.return_value = [fake_row]
|
||||||
|
|
||||||
received = []
|
received = []
|
||||||
|
|
||||||
|
|
@ -192,33 +191,3 @@ class TestGetTriples:
|
||||||
assert t.s.iri == "http://example.org/alice"
|
assert t.s.iri == "http://example.org/alice"
|
||||||
assert t.p.iri == "http://example.org/knows"
|
assert t.p.iri == "http://example.org/knows"
|
||||||
assert t.o.iri == "http://example.org/bob"
|
assert t.o.iri == "http://example.org/bob"
|
||||||
assert t.g == "urn:graph:source"
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
|
||||||
@patch('trustgraph.tables.knowledge.async_execute_paged', new_callable=AsyncMock)
|
|
||||||
async def test_empty_graph_name_becomes_none(self, mock_async_execute_paged):
|
|
||||||
fake_row = (
|
|
||||||
None, None, None,
|
|
||||||
[
|
|
||||||
(
|
|
||||||
"http://example.org/alice", True,
|
|
||||||
"http://example.org/knows", True,
|
|
||||||
"http://example.org/bob", True,
|
|
||||||
"",
|
|
||||||
),
|
|
||||||
],
|
|
||||||
)
|
|
||||||
|
|
||||||
store = _make_store()
|
|
||||||
store.cassandra = Mock()
|
|
||||||
store.get_triples_stmt = Mock()
|
|
||||||
mock_async_execute_paged.return_value = [[fake_row]]
|
|
||||||
|
|
||||||
received = []
|
|
||||||
|
|
||||||
async def receiver(msg):
|
|
||||||
received.append(msg)
|
|
||||||
|
|
||||||
await store.get_triples("w", "d", receiver)
|
|
||||||
|
|
||||||
assert received[0].triples[0].g is None
|
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,5 @@
|
||||||
"""
|
"""
|
||||||
Round-trip unit tests for KnowledgeRequestTranslator and
|
Round-trip unit tests for KnowledgeRequestTranslator.
|
||||||
KnowledgeResponseTranslator.
|
|
||||||
|
|
||||||
Regression coverage: a previous version of the decode side constructed
|
Regression coverage: a previous version of the decode side constructed
|
||||||
EntityEmbeddings(vectors=...) — the schema field is `vector` (singular),
|
EntityEmbeddings(vectors=...) — the schema field is `vector` (singular),
|
||||||
|
|
@ -16,13 +15,9 @@ Triples breaks the test.
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from trustgraph.messaging.translators.knowledge import (
|
from trustgraph.messaging.translators.knowledge import KnowledgeRequestTranslator
|
||||||
KnowledgeRequestTranslator,
|
|
||||||
KnowledgeResponseTranslator,
|
|
||||||
)
|
|
||||||
from trustgraph.schema import (
|
from trustgraph.schema import (
|
||||||
KnowledgeRequest,
|
KnowledgeRequest,
|
||||||
KnowledgeResponse,
|
|
||||||
GraphEmbeddings,
|
GraphEmbeddings,
|
||||||
EntityEmbeddings,
|
EntityEmbeddings,
|
||||||
Triples,
|
Triples,
|
||||||
|
|
@ -30,8 +25,6 @@ from trustgraph.schema import (
|
||||||
Metadata,
|
Metadata,
|
||||||
Term,
|
Term,
|
||||||
IRI,
|
IRI,
|
||||||
LibraryMetadata,
|
|
||||||
LibraryBlob,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -152,161 +145,3 @@ class TestKnowledgeRequestTranslatorTriples:
|
||||||
assert t.s.iri == "http://example.org/alice"
|
assert t.s.iri == "http://example.org/alice"
|
||||||
assert t.p.iri == "http://example.org/knows"
|
assert t.p.iri == "http://example.org/knows"
|
||||||
assert t.o.iri == "http://example.org/bob"
|
assert t.o.iri == "http://example.org/bob"
|
||||||
|
|
||||||
|
|
||||||
class TestKnowledgeRequestTranslatorLibrary:
|
|
||||||
|
|
||||||
def test_roundtrip_preserves_library_metadata(self, translator):
|
|
||||||
request = KnowledgeRequest(
|
|
||||||
operation="put-kg-core",
|
|
||||||
id="doc-1",
|
|
||||||
library_metadata=LibraryMetadata(
|
|
||||||
id="doc-1",
|
|
||||||
kind="application/pdf",
|
|
||||||
title="Test Document",
|
|
||||||
parent_id="",
|
|
||||||
document_type="source",
|
|
||||||
comments="test comments",
|
|
||||||
tags=["tag1", "tag2"],
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
encoded = translator.encode(request)
|
|
||||||
assert "library-metadata" in encoded
|
|
||||||
lm = encoded["library-metadata"]
|
|
||||||
assert lm["id"] == "doc-1"
|
|
||||||
assert lm["kind"] == "application/pdf"
|
|
||||||
assert lm["title"] == "Test Document"
|
|
||||||
assert lm["parent-id"] == ""
|
|
||||||
assert lm["document-type"] == "source"
|
|
||||||
assert lm["comments"] == "test comments"
|
|
||||||
assert lm["tags"] == ["tag1", "tag2"]
|
|
||||||
|
|
||||||
decoded = translator.decode(encoded)
|
|
||||||
assert decoded.library_metadata is not None
|
|
||||||
assert decoded.library_metadata.id == "doc-1"
|
|
||||||
assert decoded.library_metadata.kind == "application/pdf"
|
|
||||||
assert decoded.library_metadata.title == "Test Document"
|
|
||||||
assert decoded.library_metadata.parent_id == ""
|
|
||||||
assert decoded.library_metadata.document_type == "source"
|
|
||||||
assert decoded.library_metadata.comments == "test comments"
|
|
||||||
assert decoded.library_metadata.tags == ["tag1", "tag2"]
|
|
||||||
|
|
||||||
def test_roundtrip_preserves_child_document_metadata(self, translator):
|
|
||||||
request = KnowledgeRequest(
|
|
||||||
operation="put-kg-core",
|
|
||||||
id="doc-1",
|
|
||||||
library_metadata=LibraryMetadata(
|
|
||||||
id="chunk-1",
|
|
||||||
kind="text/plain",
|
|
||||||
title="Chunk 1",
|
|
||||||
parent_id="doc-1",
|
|
||||||
document_type="chunk",
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
encoded = translator.encode(request)
|
|
||||||
decoded = translator.decode(encoded)
|
|
||||||
|
|
||||||
assert decoded.library_metadata.parent_id == "doc-1"
|
|
||||||
assert decoded.library_metadata.document_type == "chunk"
|
|
||||||
|
|
||||||
def test_roundtrip_preserves_library_blob(self, translator):
|
|
||||||
request = KnowledgeRequest(
|
|
||||||
operation="put-kg-core",
|
|
||||||
id="doc-1",
|
|
||||||
library_blob=LibraryBlob(
|
|
||||||
id="doc-1",
|
|
||||||
data=b"SGVsbG8gV29ybGQ=",
|
|
||||||
),
|
|
||||||
)
|
|
||||||
|
|
||||||
encoded = translator.encode(request)
|
|
||||||
assert "library-blob" in encoded
|
|
||||||
assert encoded["library-blob"]["id"] == "doc-1"
|
|
||||||
assert encoded["library-blob"]["data"] == "SGVsbG8gV29ybGQ="
|
|
||||||
|
|
||||||
decoded = translator.decode(encoded)
|
|
||||||
assert decoded.library_blob is not None
|
|
||||||
assert decoded.library_blob.id == "doc-1"
|
|
||||||
assert decoded.library_blob.data == "SGVsbG8gV29ybGQ="
|
|
||||||
|
|
||||||
def test_absent_library_fields_decode_as_none(self, translator):
|
|
||||||
decoded = translator.decode({
|
|
||||||
"operation": "get-kg-core",
|
|
||||||
"id": "doc-1",
|
|
||||||
})
|
|
||||||
assert decoded.library_metadata is None
|
|
||||||
assert decoded.library_blob is None
|
|
||||||
|
|
||||||
|
|
||||||
class TestKnowledgeResponseTranslatorLibrary:
|
|
||||||
|
|
||||||
@pytest.fixture
|
|
||||||
def response_translator(self):
|
|
||||||
return KnowledgeResponseTranslator()
|
|
||||||
|
|
||||||
def test_encode_library_metadata(self, response_translator):
|
|
||||||
response = KnowledgeResponse(
|
|
||||||
ids=None,
|
|
||||||
library_metadata=LibraryMetadata(
|
|
||||||
id="doc-1",
|
|
||||||
kind="application/pdf",
|
|
||||||
title="Test",
|
|
||||||
parent_id="",
|
|
||||||
document_type="source",
|
|
||||||
comments="",
|
|
||||||
tags=[],
|
|
||||||
),
|
|
||||||
)
|
|
||||||
encoded = response_translator.encode(response)
|
|
||||||
assert "library-metadata" in encoded
|
|
||||||
assert encoded["library-metadata"]["id"] == "doc-1"
|
|
||||||
assert encoded["library-metadata"]["kind"] == "application/pdf"
|
|
||||||
assert encoded["library-metadata"]["document-type"] == "source"
|
|
||||||
|
|
||||||
def test_encode_library_blob_bytes_to_string(self, response_translator):
|
|
||||||
response = KnowledgeResponse(
|
|
||||||
ids=None,
|
|
||||||
library_blob=LibraryBlob(
|
|
||||||
id="doc-1",
|
|
||||||
data=b"dGVzdCBkYXRh",
|
|
||||||
),
|
|
||||||
)
|
|
||||||
encoded = response_translator.encode(response)
|
|
||||||
assert "library-blob" in encoded
|
|
||||||
assert encoded["library-blob"]["id"] == "doc-1"
|
|
||||||
assert encoded["library-blob"]["data"] == "dGVzdCBkYXRh"
|
|
||||||
assert isinstance(encoded["library-blob"]["data"], str)
|
|
||||||
|
|
||||||
def test_encode_library_blob_string_passthrough(self, response_translator):
|
|
||||||
response = KnowledgeResponse(
|
|
||||||
ids=None,
|
|
||||||
library_blob=LibraryBlob(
|
|
||||||
id="doc-1",
|
|
||||||
data="already-a-string",
|
|
||||||
),
|
|
||||||
)
|
|
||||||
encoded = response_translator.encode(response)
|
|
||||||
assert encoded["library-blob"]["data"] == "already-a-string"
|
|
||||||
|
|
||||||
def test_library_metadata_is_not_final(self, response_translator):
|
|
||||||
response = KnowledgeResponse(
|
|
||||||
ids=None,
|
|
||||||
library_metadata=LibraryMetadata(id="doc-1"),
|
|
||||||
)
|
|
||||||
_, is_final = response_translator.encode_with_completion(response)
|
|
||||||
assert is_final is False
|
|
||||||
|
|
||||||
def test_library_blob_is_not_final(self, response_translator):
|
|
||||||
response = KnowledgeResponse(
|
|
||||||
ids=None,
|
|
||||||
library_blob=LibraryBlob(id="doc-1", data=b"data"),
|
|
||||||
)
|
|
||||||
_, is_final = response_translator.encode_with_completion(response)
|
|
||||||
assert is_final is False
|
|
||||||
|
|
||||||
def test_eos_is_final(self, response_translator):
|
|
||||||
response = KnowledgeResponse(eos=True)
|
|
||||||
_, is_final = response_translator.encode_with_completion(response)
|
|
||||||
assert is_final is True
|
|
||||||
|
|
|
||||||
|
|
@ -337,7 +337,7 @@ class Api:
|
||||||
from . bulk_client import BulkClient
|
from . bulk_client import BulkClient
|
||||||
# Extract base URL (remove api/v1/ suffix)
|
# Extract base URL (remove api/v1/ suffix)
|
||||||
base_url = self.url.rsplit("api/v1/", 1)[0].rstrip("/")
|
base_url = self.url.rsplit("api/v1/", 1)[0].rstrip("/")
|
||||||
self._bulk_client = BulkClient(base_url, self.timeout, self.token, workspace=self.workspace)
|
self._bulk_client = BulkClient(base_url, self.timeout, self.token)
|
||||||
return self._bulk_client
|
return self._bulk_client
|
||||||
|
|
||||||
def metrics(self):
|
def metrics(self):
|
||||||
|
|
@ -462,7 +462,7 @@ class Api:
|
||||||
from . async_bulk_client import AsyncBulkClient
|
from . async_bulk_client import AsyncBulkClient
|
||||||
# Extract base URL (remove api/v1/ suffix)
|
# Extract base URL (remove api/v1/ suffix)
|
||||||
base_url = self.url.rsplit("api/v1/", 1)[0].rstrip("/")
|
base_url = self.url.rsplit("api/v1/", 1)[0].rstrip("/")
|
||||||
self._async_bulk_client = AsyncBulkClient(base_url, self.timeout, self.token, workspace=self.workspace)
|
self._async_bulk_client = AsyncBulkClient(base_url, self.timeout, self.token)
|
||||||
return self._async_bulk_client
|
return self._async_bulk_client
|
||||||
|
|
||||||
def async_metrics(self):
|
def async_metrics(self):
|
||||||
|
|
|
||||||
|
|
@ -9,11 +9,10 @@ from . types import Triple
|
||||||
class AsyncBulkClient:
|
class AsyncBulkClient:
|
||||||
"""Asynchronous bulk operations client"""
|
"""Asynchronous bulk operations client"""
|
||||||
|
|
||||||
def __init__(self, url: str, timeout: int, token: Optional[str], workspace: str = "default") -> None:
|
def __init__(self, url: str, timeout: int, token: Optional[str]) -> None:
|
||||||
self.url: str = self._convert_to_ws_url(url)
|
self.url: str = self._convert_to_ws_url(url)
|
||||||
self.timeout: int = timeout
|
self.timeout: int = timeout
|
||||||
self.token: Optional[str] = token
|
self.token: Optional[str] = token
|
||||||
self.workspace: str = workspace
|
|
||||||
|
|
||||||
def _convert_to_ws_url(self, url: str) -> str:
|
def _convert_to_ws_url(self, url: str) -> str:
|
||||||
"""Convert HTTP URL to WebSocket URL"""
|
"""Convert HTTP URL to WebSocket URL"""
|
||||||
|
|
@ -26,21 +25,11 @@ class AsyncBulkClient:
|
||||||
else:
|
else:
|
||||||
return f"ws://{url}"
|
return f"ws://{url}"
|
||||||
|
|
||||||
def _build_ws_url(self, path: str) -> str:
|
|
||||||
"""Build a WebSocket URL with token and workspace query params."""
|
|
||||||
ws_url = f"{self.url}{path}"
|
|
||||||
params = []
|
|
||||||
if self.token:
|
|
||||||
params.append(f"token={self.token}")
|
|
||||||
if self.workspace:
|
|
||||||
params.append(f"workspace={self.workspace}")
|
|
||||||
if params:
|
|
||||||
ws_url = f"{ws_url}?{'&'.join(params)}"
|
|
||||||
return ws_url
|
|
||||||
|
|
||||||
async def import_triples(self, flow: str, triples: AsyncIterator[Triple], **kwargs: Any) -> None:
|
async def import_triples(self, flow: str, triples: AsyncIterator[Triple], **kwargs: Any) -> None:
|
||||||
"""Bulk import triples via WebSocket"""
|
"""Bulk import triples via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/triples")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/triples"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for triple in triples:
|
async for triple in triples:
|
||||||
|
|
@ -53,7 +42,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def export_triples(self, flow: str, **kwargs: Any) -> AsyncIterator[Triple]:
|
async def export_triples(self, flow: str, **kwargs: Any) -> AsyncIterator[Triple]:
|
||||||
"""Bulk export triples via WebSocket"""
|
"""Bulk export triples via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/triples")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/triples"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -66,7 +57,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def import_graph_embeddings(self, flow: str, embeddings: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
async def import_graph_embeddings(self, flow: str, embeddings: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
||||||
"""Bulk import graph embeddings via WebSocket"""
|
"""Bulk import graph embeddings via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/graph-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/graph-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for embedding in embeddings:
|
async for embedding in embeddings:
|
||||||
|
|
@ -74,7 +67,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def export_graph_embeddings(self, flow: str, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
|
async def export_graph_embeddings(self, flow: str, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
|
||||||
"""Bulk export graph embeddings via WebSocket"""
|
"""Bulk export graph embeddings via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/graph-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/graph-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -82,7 +77,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def import_document_embeddings(self, flow: str, embeddings: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
async def import_document_embeddings(self, flow: str, embeddings: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
||||||
"""Bulk import document embeddings via WebSocket"""
|
"""Bulk import document embeddings via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/document-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/document-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for embedding in embeddings:
|
async for embedding in embeddings:
|
||||||
|
|
@ -90,7 +87,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def export_document_embeddings(self, flow: str, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
|
async def export_document_embeddings(self, flow: str, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
|
||||||
"""Bulk export document embeddings via WebSocket"""
|
"""Bulk export document embeddings via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/document-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/document-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -98,7 +97,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def import_entity_contexts(self, flow: str, contexts: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
async def import_entity_contexts(self, flow: str, contexts: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
||||||
"""Bulk import entity contexts via WebSocket"""
|
"""Bulk import entity contexts via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/entity-contexts")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/entity-contexts"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for context in contexts:
|
async for context in contexts:
|
||||||
|
|
@ -106,7 +107,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def export_entity_contexts(self, flow: str, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
|
async def export_entity_contexts(self, flow: str, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
|
||||||
"""Bulk export entity contexts via WebSocket"""
|
"""Bulk export entity contexts via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/entity-contexts")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/entity-contexts"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -114,7 +117,9 @@ class AsyncBulkClient:
|
||||||
|
|
||||||
async def import_rows(self, flow: str, rows: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
async def import_rows(self, flow: str, rows: AsyncIterator[Dict[str, Any]], **kwargs: Any) -> None:
|
||||||
"""Bulk import rows via WebSocket"""
|
"""Bulk import rows via WebSocket"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/rows")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/rows"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for row in rows:
|
async for row in rows:
|
||||||
|
|
|
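Note on the `AsyncBulkClient` hunks above: one side of the diff routes every endpoint through `_build_ws_url`, which joins `token` and `workspace` into the query string by plain string formatting. Below is a minimal sketch of the same idea with percent-encoding of the values. It assumes a standalone function, and the name `build_ws_url` and its signature are hypothetical, not part of the TrustGraph API.

```python
from typing import Optional
from urllib.parse import urlencode


def build_ws_url(base: str, path: str, token: Optional[str] = None,
                 workspace: Optional[str] = None) -> str:
    """Hypothetical helper: append token/workspace as encoded query params."""
    params = {}
    if token:
        params["token"] = token
    if workspace:
        params["workspace"] = workspace
    url = f"{base}{path}"
    # urlencode percent-escapes characters such as '&' or '=' inside values,
    # which plain f-string joining would pass through unchanged.
    return f"{url}?{urlencode(params)}" if params else url
```

With both values unset the function returns the bare URL, matching the behaviour of the diffed helper when no parameters apply.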
||||||
|
|
@ -30,7 +30,6 @@ class AsyncSocketClient:
|
||||||
self.timeout = timeout
|
self.timeout = timeout
|
||||||
self.token = token
|
self.token = token
|
||||||
self.workspace = workspace
|
self.workspace = workspace
|
||||||
self._workspace_explicit = workspace != "default"
|
|
||||||
self._request_counter = 0
|
self._request_counter = 0
|
||||||
self._socket = None
|
self._socket = None
|
||||||
self._connect_cm = None
|
self._connect_cm = None
|
||||||
|
|
@ -93,8 +92,7 @@ class AsyncSocketClient:
|
||||||
)
|
)
|
||||||
|
|
||||||
if resp.get("type") == "auth-ok":
|
if resp.get("type") == "auth-ok":
|
||||||
if not self._workspace_explicit:
|
self.workspace = resp.get("workspace", self.workspace)
|
||||||
self.workspace = resp.get("workspace", self.workspace)
|
|
||||||
elif resp.get("type") == "auth-failed":
|
elif resp.get("type") == "auth-failed":
|
||||||
await self._socket.close()
|
await self._socket.close()
|
||||||
raise ProtocolException(
|
raise ProtocolException(
|
||||||
|
|
|
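Note on the `auth-ok` hunk above: the two sides differ in whether a caller-supplied workspace survives the handshake. One side adopts the server's workspace only when the client was constructed with the default value, and the other always adopts it. Below is a sketch of the guarded variant as a pure function. It assumes the response is a dict with an optional `workspace` key, and the name `resolve_workspace` is hypothetical.

```python
from typing import Any, Dict


def resolve_workspace(current: str, resp: Dict[str, Any],
                      default: str = "default") -> str:
    """Hypothetical: keep an explicitly chosen workspace, else take the server's."""
    if resp.get("type") != "auth-ok":
        # Only a successful auth response may change the workspace.
        return current
    if current != default:
        # The caller picked a workspace explicitly; do not overwrite it.
        return current
    return resp.get("workspace", current)
```

Keeping the decision in a side-effect-free function makes it testable without opening a WebSocket.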
||||||
|
|
@ -34,7 +34,7 @@ class BulkClient:
|
||||||
Note: For true async support, use AsyncBulkClient instead.
|
Note: For true async support, use AsyncBulkClient instead.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, url: str, timeout: int, token: Optional[str], workspace: str = "default") -> None:
|
def __init__(self, url: str, timeout: int, token: Optional[str]) -> None:
|
||||||
"""
|
"""
|
||||||
Initialize synchronous bulk client.
|
Initialize synchronous bulk client.
|
||||||
|
|
||||||
|
|
@ -42,12 +42,10 @@ class BulkClient:
|
||||||
url: Base URL for TrustGraph API (HTTP/HTTPS will be converted to WS/WSS)
|
url: Base URL for TrustGraph API (HTTP/HTTPS will be converted to WS/WSS)
|
||||||
timeout: WebSocket timeout in seconds
|
timeout: WebSocket timeout in seconds
|
||||||
token: Optional bearer token for authentication
|
token: Optional bearer token for authentication
|
||||||
workspace: Workspace for data isolation
|
|
||||||
"""
|
"""
|
||||||
self.url: str = self._convert_to_ws_url(url)
|
self.url: str = self._convert_to_ws_url(url)
|
||||||
self.timeout: int = timeout
|
self.timeout: int = timeout
|
||||||
self.token: Optional[str] = token
|
self.token: Optional[str] = token
|
||||||
self.workspace: str = workspace
|
|
||||||
|
|
||||||
def _convert_to_ws_url(self, url: str) -> str:
|
def _convert_to_ws_url(self, url: str) -> str:
|
||||||
"""Convert HTTP URL to WebSocket URL"""
|
"""Convert HTTP URL to WebSocket URL"""
|
||||||
|
|
@ -60,18 +58,6 @@ class BulkClient:
|
||||||
else:
|
else:
|
||||||
return f"ws://{url}"
|
return f"ws://{url}"
|
||||||
|
|
||||||
def _build_ws_url(self, path: str) -> str:
|
|
||||||
"""Build a WebSocket URL with token and workspace query params."""
|
|
||||||
ws_url = f"{self.url}{path}"
|
|
||||||
params = []
|
|
||||||
if self.token:
|
|
||||||
params.append(f"token={self.token}")
|
|
||||||
if self.workspace:
|
|
||||||
params.append(f"workspace={self.workspace}")
|
|
||||||
if params:
|
|
||||||
ws_url = f"{ws_url}?{'&'.join(params)}"
|
|
||||||
return ws_url
|
|
||||||
|
|
||||||
def _run_async(self, coro: Coroutine[Any, Any, Any]) -> Any:
|
def _run_async(self, coro: Coroutine[Any, Any, Any]) -> Any:
|
||||||
"""Run async coroutine synchronously"""
|
"""Run async coroutine synchronously"""
|
||||||
try:
|
try:
|
||||||
|
|
@ -130,7 +116,9 @@ class BulkClient:
|
||||||
metadata: Optional[Dict[str, Any]], batch_size: int
|
metadata: Optional[Dict[str, Any]], batch_size: int
|
||||||
) -> None:
|
) -> None:
|
||||||
"""Async implementation of triple import"""
|
"""Async implementation of triple import"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/triples")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/triples"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
if metadata is None:
|
if metadata is None:
|
||||||
metadata = {"id": "", "metadata": [], "collection": "default"}
|
metadata = {"id": "", "metadata": [], "collection": "default"}
|
||||||
|
|
@ -206,7 +194,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _export_triples_async(self, flow: str) -> Iterator[Triple]:
|
async def _export_triples_async(self, flow: str) -> Iterator[Triple]:
|
||||||
"""Async implementation of triple export"""
|
"""Async implementation of triple export"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/triples")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/triples"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -248,7 +238,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _import_graph_embeddings_async(self, flow: str, embeddings: Iterator[Dict[str, Any]]) -> None:
|
async def _import_graph_embeddings_async(self, flow: str, embeddings: Iterator[Dict[str, Any]]) -> None:
|
||||||
"""Async implementation of graph embeddings import"""
|
"""Async implementation of graph embeddings import"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/graph-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/graph-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
for embedding in embeddings:
|
for embedding in embeddings:
|
||||||
|
|
@ -304,7 +296,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _export_graph_embeddings_async(self, flow: str) -> Iterator[Dict[str, Any]]:
|
async def _export_graph_embeddings_async(self, flow: str) -> Iterator[Dict[str, Any]]:
|
||||||
"""Async implementation of graph embeddings export"""
|
"""Async implementation of graph embeddings export"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/graph-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/graph-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -342,7 +336,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _import_document_embeddings_async(self, flow: str, embeddings: Iterator[Dict[str, Any]]) -> None:
|
async def _import_document_embeddings_async(self, flow: str, embeddings: Iterator[Dict[str, Any]]) -> None:
|
||||||
"""Async implementation of document embeddings import"""
|
"""Async implementation of document embeddings import"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/document-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/document-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
for embedding in embeddings:
|
for embedding in embeddings:
|
||||||
|
|
@ -398,7 +394,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _export_document_embeddings_async(self, flow: str) -> Iterator[Dict[str, Any]]:
|
async def _export_document_embeddings_async(self, flow: str) -> Iterator[Dict[str, Any]]:
|
||||||
"""Async implementation of document embeddings export"""
|
"""Async implementation of document embeddings export"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/document-embeddings")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/document-embeddings"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@ -448,7 +446,9 @@ class BulkClient:
|
||||||
metadata: Optional[Dict[str, Any]], batch_size: int
|
metadata: Optional[Dict[str, Any]], batch_size: int
|
||||||
) -> None:
|
) -> None:
|
||||||
"""Async implementation of entity contexts import"""
|
"""Async implementation of entity contexts import"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/entity-contexts")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/entity-contexts"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
if metadata is None:
|
if metadata is None:
|
||||||
metadata = {"id": "", "metadata": [], "collection": "default"}
|
metadata = {"id": "", "metadata": [], "collection": "default"}
|
||||||
|
|
@ -522,7 +522,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _export_entity_contexts_async(self, flow: str) -> Iterator[Dict[str, Any]]:
|
async def _export_entity_contexts_async(self, flow: str) -> Iterator[Dict[str, Any]]:
|
||||||
"""Async implementation of entity contexts export"""
|
"""Async implementation of entity contexts export"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/export/entity-contexts")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/export/entity-contexts"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
async for raw_message in websocket:
|
async for raw_message in websocket:
|
||||||
|
|
@@ -560,7 +562,9 @@ class BulkClient:
|
||||||
|
|
||||||
async def _import_rows_async(self, flow: str, rows: Iterator[Dict[str, Any]]) -> None:
|
async def _import_rows_async(self, flow: str, rows: Iterator[Dict[str, Any]]) -> None:
|
||||||
"""Async implementation of rows import"""
|
"""Async implementation of rows import"""
|
||||||
ws_url = self._build_ws_url(f"/api/v1/flow/{flow}/import/rows")
|
ws_url = f"{self.url}/api/v1/flow/{flow}/import/rows"
|
||||||
|
if self.token:
|
||||||
|
ws_url = f"{ws_url}?token={self.token}"
|
||||||
|
|
||||||
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
async with websockets.connect(ws_url, ping_interval=20, ping_timeout=self.timeout) as websocket:
|
||||||
for row in rows:
|
for row in rows:
|
||||||
|
|
|
||||||
|
|
@@ -11,7 +11,6 @@ multiplexes requests by ID.
|
||||||
import json
|
import json
|
||||||
import asyncio
|
import asyncio
|
||||||
import websockets
|
import websockets
|
||||||
from websockets.exceptions import ConnectionClosed
|
|
||||||
from typing import Optional, Dict, Any, Iterator, Union, List
|
from typing import Optional, Dict, Any, Iterator, Union, List
|
||||||
from threading import Lock
|
from threading import Lock
|
||||||
|
|
||||||
|
|
@@ -167,8 +166,7 @@ class SocketClient:
|
||||||
)
|
)
|
||||||
|
|
||||||
if resp.get("type") == "auth-ok":
|
if resp.get("type") == "auth-ok":
|
||||||
if self.workspace == "default":
|
self.workspace = resp.get("workspace", self.workspace)
|
||||||
self.workspace = resp.get("workspace", self.workspace)
|
|
||||||
elif resp.get("type") == "auth-failed":
|
elif resp.get("type") == "auth-failed":
|
||||||
await self._socket.close()
|
await self._socket.close()
|
||||||
raise ProtocolException(
|
raise ProtocolException(
|
||||||
|
|
@@ -193,13 +191,13 @@ class SocketClient:
|
||||||
if request_id and request_id in self._pending:
|
if request_id and request_id in self._pending:
|
||||||
await self._pending[request_id].put(response)
|
await self._pending[request_id].put(response)
|
||||||
|
|
||||||
except ConnectionClosed:
|
except websockets.exceptions.ConnectionClosed:
|
||||||
pass
|
pass
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
for queue in self._pending.values():
|
for queue in self._pending.values():
|
||||||
try:
|
try:
|
||||||
await queue.put({"error": str(e)})
|
await queue.put({"error": str(e)})
|
||||||
except Exception:
|
except:
|
||||||
pass
|
pass
|
||||||
finally:
|
finally:
|
||||||
self._connected = False
|
self._connected = False
|
||||||
|
|
@@ -252,7 +250,7 @@ class SocketClient:
|
||||||
finally:
|
finally:
|
||||||
try:
|
try:
|
||||||
loop.run_until_complete(async_gen.aclose())
|
loop.run_until_complete(async_gen.aclose())
|
||||||
except Exception:
|
except:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
def _streaming_generator_raw(
|
def _streaming_generator_raw(
|
||||||
|
|
@@ -275,7 +273,7 @@ class SocketClient:
|
||||||
finally:
|
finally:
|
||||||
try:
|
try:
|
||||||
loop.run_until_complete(async_gen.aclose())
|
loop.run_until_complete(async_gen.aclose())
|
||||||
except Exception:
|
except:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
async def _send_request_async_streaming_raw(
|
async def _send_request_async_streaming_raw(
|
||||||
|
|
@@ -502,7 +500,6 @@ class SocketClient:
|
||||||
|
|
||||||
def put_kg_core(
|
def put_kg_core(
|
||||||
self, id: str, triples=None, graph_embeddings=None,
|
self, id: str, triples=None, graph_embeddings=None,
|
||||||
library_metadata=None, library_blob=None,
|
|
||||||
) -> Dict[str, Any]:
|
) -> Dict[str, Any]:
|
||||||
request = {
|
request = {
|
||||||
"operation": "put-kg-core",
|
"operation": "put-kg-core",
|
||||||
|
|
@@ -513,10 +510,6 @@ class SocketClient:
|
||||||
request["triples"] = triples
|
request["triples"] = triples
|
||||||
if graph_embeddings is not None:
|
if graph_embeddings is not None:
|
||||||
request["graph-embeddings"] = graph_embeddings
|
request["graph-embeddings"] = graph_embeddings
|
||||||
if library_metadata is not None:
|
|
||||||
request["library-metadata"] = library_metadata
|
|
||||||
if library_blob is not None:
|
|
||||||
request["library-blob"] = library_blob
|
|
||||||
return self._send_request_sync("knowledge", None, request)
|
return self._send_request_sync("knowledge", None, request)
|
||||||
|
|
||||||
def get_de_core(self, id: str) -> Iterator[Dict[str, Any]]:
|
def get_de_core(self, id: str) -> Iterator[Dict[str, Any]]:
|
||||||
|
|
@@ -549,7 +542,7 @@ class SocketClient:
|
||||||
if self._loop and not self._loop.is_closed():
|
if self._loop and not self._loop.is_closed():
|
||||||
try:
|
try:
|
||||||
self._loop.run_until_complete(self._close_async())
|
self._loop.run_until_complete(self._close_async())
|
||||||
except Exception:
|
except:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
async def _close_async(self):
|
async def _close_async(self):
|
||||||
|
|
|
||||||
|
|
@@ -103,19 +103,35 @@ def resolve_cassandra_config(
|
||||||
host: Optional[str] = None,
|
host: Optional[str] = None,
|
||||||
username: Optional[str] = None,
|
username: Optional[str] = None,
|
||||||
password: Optional[str] = None,
|
password: Optional[str] = None,
|
||||||
default_keyspace: Optional[str] = None,
|
default_keyspace: Optional[str] = None
|
||||||
replication_factor: Optional[int] = None,
|
|
||||||
) -> Tuple[List[str], Optional[str], Optional[str], Optional[str], int]:
|
) -> Tuple[List[str], Optional[str], Optional[str], Optional[str], int]:
|
||||||
|
"""
|
||||||
|
Resolve Cassandra configuration from various sources.
|
||||||
|
|
||||||
|
Can accept either argparse args object or explicit parameters.
|
||||||
|
Converts host string to list format for Cassandra driver.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
args: Optional argparse namespace with cassandra_host, cassandra_username, cassandra_password, cassandra_keyspace, cassandra_replication_factor
|
||||||
|
host: Optional explicit host parameter (overrides args)
|
||||||
|
username: Optional explicit username parameter (overrides args)
|
||||||
|
password: Optional explicit password parameter (overrides args)
|
||||||
|
default_keyspace: Optional default keyspace if not specified elsewhere
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
tuple: (hosts_list, username, password, keyspace, replication_factor)
|
||||||
|
"""
|
||||||
|
# If args provided, extract values
|
||||||
keyspace = None
|
keyspace = None
|
||||||
|
replication_factor = 1
|
||||||
if args is not None:
|
if args is not None:
|
||||||
host = host or getattr(args, 'cassandra_host', None)
|
host = host or getattr(args, 'cassandra_host', None)
|
||||||
username = username or getattr(args, 'cassandra_username', None)
|
username = username or getattr(args, 'cassandra_username', None)
|
||||||
password = password or getattr(args, 'cassandra_password', None)
|
password = password or getattr(args, 'cassandra_password', None)
|
||||||
keyspace = getattr(args, 'cassandra_keyspace', None)
|
keyspace = getattr(args, 'cassandra_keyspace', None)
|
||||||
replication_factor = replication_factor or getattr(
|
replication_factor = getattr(args, 'cassandra_replication_factor', 1)
|
||||||
args, 'cassandra_replication_factor', None
|
|
||||||
)
|
|
||||||
|
|
||||||
|
# Apply defaults if still None
|
||||||
defaults = get_cassandra_defaults()
|
defaults = get_cassandra_defaults()
|
||||||
host = host or defaults['host']
|
host = host or defaults['host']
|
||||||
username = username or defaults['username']
|
username = username or defaults['username']
|
||||||
|
|
|
||||||
|
|
@@ -300,14 +300,6 @@ class IamClient(RequestResponse):
|
||||||
)
|
)
|
||||||
return resp.workspace
|
return resp.workspace
|
||||||
|
|
||||||
async def list_my_workspaces(self, actor="", timeout=IAM_TIMEOUT):
|
|
||||||
resp = await self._request(
|
|
||||||
operation="list-my-workspaces",
|
|
||||||
actor=actor,
|
|
||||||
timeout=timeout,
|
|
||||||
)
|
|
||||||
return list(resp.workspaces)
|
|
||||||
|
|
||||||
async def list_workspaces(self, actor="", timeout=IAM_TIMEOUT):
|
async def list_workspaces(self, actor="", timeout=IAM_TIMEOUT):
|
||||||
resp = await self._request(
|
resp = await self._request(
|
||||||
operation="list-workspaces",
|
operation="list-workspaces",
|
||||||
|
|
|
||||||
|
|
@@ -11,7 +11,6 @@ Supports dual output to console and Loki for centralized log aggregation.
|
||||||
import contextvars
|
import contextvars
|
||||||
import logging
|
import logging
|
||||||
import logging.handlers
|
import logging.handlers
|
||||||
import uuid
|
|
||||||
from argparse import ArgumentParser
|
from argparse import ArgumentParser
|
||||||
from queue import Queue
|
from queue import Queue
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
@@ -133,12 +132,14 @@ def setup_logging(args: dict[str, Any]) -> None:
|
||||||
try:
|
try:
|
||||||
from logging_loki import LokiHandler
|
from logging_loki import LokiHandler
|
||||||
|
|
||||||
instance_id = str(uuid.uuid4())[:8]
|
# Create Loki handler with optional authentication. The
|
||||||
|
# processor label is NOT baked in here — it's stamped onto
|
||||||
|
# each record by _ProcessorIdFilter reading the task-local
|
||||||
|
# contextvar, and logging_loki's emitter reads record.tags
|
||||||
|
# to build per-record Loki labels.
|
||||||
loki_handler_kwargs = {
|
loki_handler_kwargs = {
|
||||||
'url': loki_url,
|
'url': loki_url,
|
||||||
'version': "1",
|
'version': "1",
|
||||||
'tags': {'instance': instance_id},
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if loki_username and loki_password:
|
if loki_username and loki_password:
|
||||||
|
|
|
||||||
|
|
@@ -1,87 +0,0 @@
|
||||||
|
|
||||||
import os
|
|
||||||
import argparse
|
|
||||||
from typing import Optional, Any, Tuple
|
|
||||||
|
|
||||||
|
|
||||||
def get_qdrant_defaults() -> dict:
|
|
||||||
return {
|
|
||||||
'url': os.getenv('QDRANT_URL', 'http://localhost:6333'),
|
|
||||||
'api_key': os.getenv('QDRANT_API_KEY'),
|
|
||||||
'replication_factor': int(os.getenv('QDRANT_REPLICATION_FACTOR', '1')),
|
|
||||||
'shard_number': int(os.getenv('QDRANT_SHARD_NUMBER', '1')),
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def add_qdrant_args(parser: argparse.ArgumentParser) -> None:
|
|
||||||
defaults = get_qdrant_defaults()
|
|
||||||
|
|
||||||
url_help = f"Qdrant URL (default: {defaults['url']})"
|
|
||||||
if 'QDRANT_URL' in os.environ:
|
|
||||||
url_help += " [from QDRANT_URL]"
|
|
||||||
|
|
||||||
api_key_help = "Qdrant API key"
|
|
||||||
if defaults['api_key']:
|
|
||||||
api_key_help += " (default: <set>)"
|
|
||||||
if 'QDRANT_API_KEY' in os.environ:
|
|
||||||
api_key_help += " [from QDRANT_API_KEY]"
|
|
||||||
|
|
||||||
replication_help = f"Qdrant collection replication factor (default: {defaults['replication_factor']})"
|
|
||||||
if 'QDRANT_REPLICATION_FACTOR' in os.environ:
|
|
||||||
replication_help += " [from QDRANT_REPLICATION_FACTOR]"
|
|
||||||
|
|
||||||
shard_help = f"Qdrant collection shard number (default: {defaults['shard_number']})"
|
|
||||||
if 'QDRANT_SHARD_NUMBER' in os.environ:
|
|
||||||
shard_help += " [from QDRANT_SHARD_NUMBER]"
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
'--store-uri',
|
|
||||||
default=defaults['url'],
|
|
||||||
help=url_help,
|
|
||||||
)
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
'--api-key',
|
|
||||||
default=defaults['api_key'],
|
|
||||||
help=api_key_help,
|
|
||||||
)
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
'--qdrant-replication-factor',
|
|
||||||
type=int,
|
|
||||||
default=defaults['replication_factor'],
|
|
||||||
help=replication_help,
|
|
||||||
)
|
|
||||||
|
|
||||||
parser.add_argument(
|
|
||||||
'--qdrant-shard-number',
|
|
||||||
type=int,
|
|
||||||
default=defaults['shard_number'],
|
|
||||||
help=shard_help,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def resolve_qdrant_config(
|
|
||||||
args: Optional[Any] = None,
|
|
||||||
url: Optional[str] = None,
|
|
||||||
api_key: Optional[str] = None,
|
|
||||||
replication_factor: Optional[int] = None,
|
|
||||||
shard_number: Optional[int] = None,
|
|
||||||
) -> Tuple[str, Optional[str], int, int]:
|
|
||||||
if args is not None:
|
|
||||||
url = url or getattr(args, 'store_uri', None)
|
|
||||||
api_key = api_key or getattr(args, 'api_key', None)
|
|
||||||
replication_factor = replication_factor or getattr(
|
|
||||||
args, 'qdrant_replication_factor', None
|
|
||||||
)
|
|
||||||
shard_number = shard_number or getattr(
|
|
||||||
args, 'qdrant_shard_number', None
|
|
||||||
)
|
|
||||||
|
|
||||||
defaults = get_qdrant_defaults()
|
|
||||||
url = url or defaults['url']
|
|
||||||
api_key = api_key or defaults['api_key']
|
|
||||||
replication_factor = replication_factor or defaults['replication_factor']
|
|
||||||
shard_number = shard_number or defaults['shard_number']
|
|
||||||
|
|
||||||
return url, api_key, replication_factor, shard_number
|
|
||||||
|
|
@@ -2,8 +2,7 @@ from typing import Dict, Any, Tuple, Optional
|
||||||
from ...schema import (
|
from ...schema import (
|
||||||
KnowledgeRequest, KnowledgeResponse, Triples, GraphEmbeddings,
|
KnowledgeRequest, KnowledgeResponse, Triples, GraphEmbeddings,
|
||||||
DocumentEmbeddings, ChunkEmbeddings,
|
DocumentEmbeddings, ChunkEmbeddings,
|
||||||
Metadata, EntityEmbeddings,
|
Metadata, EntityEmbeddings
|
||||||
LibraryMetadata, LibraryBlob,
|
|
||||||
)
|
)
|
||||||
from .base import MessageTranslator
|
from .base import MessageTranslator
|
||||||
from .primitives import ValueTranslator, SubgraphTranslator
|
from .primitives import ValueTranslator, SubgraphTranslator
|
||||||
|
|
@@ -62,27 +61,6 @@ class KnowledgeRequestTranslator(MessageTranslator):
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
library_metadata = None
|
|
||||||
if "library-metadata" in data:
|
|
||||||
lm = data["library-metadata"]
|
|
||||||
library_metadata = LibraryMetadata(
|
|
||||||
id=lm.get("id", ""),
|
|
||||||
kind=lm.get("kind", ""),
|
|
||||||
title=lm.get("title", ""),
|
|
||||||
parent_id=lm.get("parent-id", ""),
|
|
||||||
document_type=lm.get("document-type", ""),
|
|
||||||
comments=lm.get("comments", ""),
|
|
||||||
tags=lm.get("tags", []),
|
|
||||||
)
|
|
||||||
|
|
||||||
library_blob = None
|
|
||||||
if "library-blob" in data:
|
|
||||||
lb = data["library-blob"]
|
|
||||||
library_blob = LibraryBlob(
|
|
||||||
id=lb.get("id", ""),
|
|
||||||
data=lb.get("data", b""),
|
|
||||||
)
|
|
||||||
|
|
||||||
return KnowledgeRequest(
|
return KnowledgeRequest(
|
||||||
operation=data.get("operation"),
|
operation=data.get("operation"),
|
||||||
id=data.get("id"),
|
id=data.get("id"),
|
||||||
|
|
@@ -91,8 +69,6 @@ class KnowledgeRequestTranslator(MessageTranslator):
|
||||||
triples=triples,
|
triples=triples,
|
||||||
graph_embeddings=graph_embeddings,
|
graph_embeddings=graph_embeddings,
|
||||||
document_embeddings=document_embeddings,
|
document_embeddings=document_embeddings,
|
||||||
library_metadata=library_metadata,
|
|
||||||
library_blob=library_blob,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
def encode(self, obj: KnowledgeRequest) -> Dict[str, Any]:
|
def encode(self, obj: KnowledgeRequest) -> Dict[str, Any]:
|
||||||
|
|
@@ -149,26 +125,6 @@ class KnowledgeRequestTranslator(MessageTranslator):
|
||||||
],
|
],
|
||||||
}
|
}
|
||||||
|
|
||||||
if obj.library_metadata:
|
|
||||||
result["library-metadata"] = {
|
|
||||||
"id": obj.library_metadata.id,
|
|
||||||
"kind": obj.library_metadata.kind,
|
|
||||||
"title": obj.library_metadata.title,
|
|
||||||
"parent-id": obj.library_metadata.parent_id,
|
|
||||||
"document-type": obj.library_metadata.document_type,
|
|
||||||
"comments": obj.library_metadata.comments,
|
|
||||||
"tags": obj.library_metadata.tags,
|
|
||||||
}
|
|
||||||
|
|
||||||
if obj.library_blob:
|
|
||||||
data = obj.library_blob.data
|
|
||||||
if isinstance(data, bytes):
|
|
||||||
data = data.decode("utf-8")
|
|
||||||
result["library-blob"] = {
|
|
||||||
"id": obj.library_blob.id,
|
|
||||||
"data": data,
|
|
||||||
}
|
|
||||||
|
|
||||||
return result
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
|
@@ -238,32 +194,6 @@ class KnowledgeResponseTranslator(MessageTranslator):
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
# Streaming library metadata response
|
|
||||||
if obj.library_metadata:
|
|
||||||
return {
|
|
||||||
"library-metadata": {
|
|
||||||
"id": obj.library_metadata.id,
|
|
||||||
"kind": obj.library_metadata.kind,
|
|
||||||
"title": obj.library_metadata.title,
|
|
||||||
"parent-id": obj.library_metadata.parent_id,
|
|
||||||
"document-type": obj.library_metadata.document_type,
|
|
||||||
"comments": obj.library_metadata.comments,
|
|
||||||
"tags": obj.library_metadata.tags,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# Streaming library blob response
|
|
||||||
if obj.library_blob:
|
|
||||||
data = obj.library_blob.data
|
|
||||||
if isinstance(data, bytes):
|
|
||||||
data = data.decode("utf-8")
|
|
||||||
return {
|
|
||||||
"library-blob": {
|
|
||||||
"id": obj.library_blob.id,
|
|
||||||
"data": data,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# End of stream marker
|
# End of stream marker
|
||||||
if obj.eos is True:
|
if obj.eos is True:
|
||||||
return {"eos": True}
|
return {"eos": True}
|
||||||
|
|
@@ -279,9 +209,7 @@ class KnowledgeResponseTranslator(MessageTranslator):
|
||||||
is_final = (
|
is_final = (
|
||||||
obj.ids is not None or # List response
|
obj.ids is not None or # List response
|
||||||
obj.eos is True or # End of stream
|
obj.eos is True or # End of stream
|
||||||
(not obj.triples and not obj.graph_embeddings
|
(not obj.triples and not obj.graph_embeddings and not obj.document_embeddings) # Empty response
|
||||||
and not obj.document_embeddings
|
|
||||||
and not obj.library_metadata and not obj.library_blob) # Empty response
|
|
||||||
)
|
)
|
||||||
|
|
||||||
return response, is_final
|
return response, is_final
|
||||||
Some files were not shown because too many files have changed in this diff.
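
The resolution order used by the `resolve_cassandra_config` and `resolve_qdrant_config` hunks above (explicit parameter first, then the argparse attribute, then an environment-backed default) can be sketched in isolation. This is a minimal illustration and not the project's code: `resolve_setting`, the `example_url` attribute and the `EXAMPLE_URL` variable are hypothetical names. It also assumes `is not None` checks in place of the `or` chaining shown in the diff, so that falsy values such as `0` or `""` are kept when passed explicitly.

```python
import os
from argparse import Namespace
from typing import Any, Optional


def resolve_setting(
    explicit: Optional[Any],
    args: Optional[Namespace],
    attr: str,
    env_var: str,
    fallback: Any,
) -> Any:
    """Return the first value that is set: explicit, args.<attr>, env var, fallback."""
    # 1. An explicitly passed parameter always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise look for the attribute on the argparse namespace, if given.
    if args is not None:
        from_args = getattr(args, attr, None)
        if from_args is not None:
            return from_args
    # 3. Finally fall back to the environment, then to the hard-coded default.
    #    Environment values are strings; callers convert types as needed.
    return os.getenv(env_var, fallback)


# Hypothetical namespace, standing in for parsed command-line arguments.
args = Namespace(example_url="http://from-args:6333")

print(resolve_setting(None, args, "example_url", "EXAMPLE_URL", "http://localhost:6333"))
print(resolve_setting("http://explicit:6333", args, "example_url", "EXAMPLE_URL", "http://localhost:6333"))
```

With `or` chaining, an explicit `replication_factor=0` would fall through to the next source; whether that matters depends on whether zero is ever a meaningful value for the setting.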