Update docs for API/CLI changes in 1.0 (#421)

* Update some API basics for the 0.23/1.0 API change
This commit is contained in:
cybermaggedon 2025-07-03 14:58:32 +01:00 committed by GitHub
parent f907ea7db8
commit 44bdd29f51
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
69 changed files with 19981 additions and 407 deletions

View file

@ -3,8 +3,10 @@
## Overview
If you want to interact with TrustGraph through APIs, there are 3
forms of API which may be of interest to you:
If you want to interact with TrustGraph through APIs, there are 4
forms of API which may be of interest to you. All four mechanisms
invoke the same underlying TrustGraph functionality but are made
available for integration in different ways:
### Pulsar APIs
@ -56,6 +58,31 @@ Cons:
using a basic REST API, particular if you want to cover all of the error
scenarios well
### Python SDK API
The `trustgraph-base` package provides a Python SDK that wraps the underlying
service invocations in a convenient Python API.
Pros:
- Native Python integration with type hints and documentation
- Simplified service invocation without manual message handling
- Built-in error handling and response parsing
- Convenient for Python-based applications and scripts
Cons:
- Python-specific, not available for other programming languages
- Requires Python environment and trustgraph-base package installation
- Less control over low-level message handling
## Flow-hosted APIs
There are two types of APIs: Flow-hosted which need a flow to be running
to operate. Non-flow-hosted which are core to the system, and can
be seen as 'global' - they are not dependent on a flow to be running.
Knowledge, Librarian, Config and Flow APIs fall into the latter
category.
## See also
- [TrustGraph websocket overview](websocket.md)
@ -64,9 +91,19 @@ Cons:
- [Text completion](api-text-completion.md)
- [Prompt completion](api-prompt.md)
- [Graph RAG](api-graph-rag.md)
- [Document RAG](api-document-rag.md)
- [Agent](api-agent.md)
- [Embeddings](api-embeddings.md)
- [Graph embeddings](api-graph-embeddings.md)
- [Document embeddings](api-document-embeddings.md)
- [Entity contexts](api-entity-contexts.md)
- [Triples query](api-triples-query.md)
- [Document load](api-document-load.md)
- [Text load](api-text-load.md)
- [Config](api-config.md)
- [Flow](api-flow.md)
- [Librarian](api-librarian.md)
- [Knowledge](api-knowledge.md)
- [Metrics](api-metrics.md)
- [Core import/export](api-core-import-export.md)

View file

@ -18,7 +18,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `thought`: Optional, a string, provides an interim agent thought
- `observation`: Optional, a string, provides an interim agent thought
- `answer`: Optional, a string, provides the final answer
@ -61,6 +61,7 @@ Request:
{
"id": "blrqotfefnmnh7de-20",
"service": "agent",
"flow": "default",
"request": {
"question": "What does NASA stand for?"
}

261
docs/apis/api-config.md Normal file
View file

@ -0,0 +1,261 @@
# TrustGraph Config API
This API provides centralized configuration management for TrustGraph components.
Configuration data is organized hierarchically by type and key, with support for
persistent storage and push notifications.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (`get`, `list`, `getvalues`, `put`, `delete`, `config`)
- `keys`: Array of ConfigKey objects (for `get`, `delete` operations)
- `type`: Configuration type (for `list`, `getvalues` operations)
- `values`: Array of ConfigValue objects (for `put` operation)
### Response
The response contains the following fields:
- `version`: Version number for tracking changes
- `values`: Array of ConfigValue objects returned by operations
- `directory`: Array of key names returned by `list` operation
- `config`: Full configuration map returned by `config` operation
- `error`: Error information if operation fails
## Operations
### PUT - Store Configuration Values
Request:
```json
{
"operation": "put",
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
}
]
}
```
Response:
```json
{
"version": 123
}
```
### GET - Retrieve Configuration Values
Request:
```json
{
"operation": "get",
"keys": [
{
"type": "test",
"key": "key1"
}
]
}
```
Response:
```json
{
"version": 123,
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
}
]
}
```
### LIST - List Keys by Type
Request:
```json
{
"operation": "list",
"type": "test"
}
```
Response:
```json
{
"version": 123,
"directory": ["key1", "key2", "key3"]
}
```
### GETVALUES - Get All Values by Type
Request:
```json
{
"operation": "getvalues",
"type": "test"
}
```
Response:
```json
{
"version": 123,
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
},
{
"type": "test",
"key": "key2",
"value": "value2"
}
]
}
```
### CONFIG - Get Entire Configuration
Request:
```json
{
"operation": "config"
}
```
Response:
```json
{
"version": 123,
"config": {
"test": {
"key1": "value1",
"key2": "value2"
}
}
}
```
### DELETE - Remove Configuration Values
Request:
```json
{
"operation": "delete",
"keys": [
{
"type": "test",
"key": "key1"
}
]
}
```
Response:
```json
{
"version": 124
}
```
## REST service
The REST service is available at `/api/v1/config` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "config",
"request": {
"operation": "get",
"keys": [
{
"type": "test",
"key": "key1"
}
]
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"version": 123,
"values": [
{
"type": "test",
"key": "key1",
"value": "value1"
}
]
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Config API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/config.py
Default request queue:
`non-persistent://tg/request/config`
Default response queue:
`non-persistent://tg/response/config`
Request schema:
`trustgraph.schema.ConfigRequest`
Response schema:
`trustgraph.schema.ConfigResponse`
## Python SDK
The Python SDK provides convenient access to the Config API:
```python
from trustgraph.api.config import ConfigClient
client = ConfigClient()
# Put a value
await client.put("test", "key1", "value1")
# Get a value
value = await client.get("test", "key1")
# List keys
keys = await client.list("test")
# Get all values for a type
values = await client.get_values("test")
```
## Features
- **Hierarchical Organization**: Configuration organized by type and key
- **Versioning**: Each operation returns a version number for change tracking
- **Persistent Storage**: Data stored in Cassandra for persistence
- **Push Notifications**: Configuration changes pushed to subscribers
- **Multiple Access Methods**: Available via Pulsar, REST, WebSocket, and Python SDK

View file

@ -0,0 +1,324 @@
# TrustGraph Core Import/Export API
This API provides bulk import and export capabilities for TrustGraph knowledge cores.
It handles efficient transfer of both RDF triples and graph embeddings using MessagePack
binary format for high-performance data exchange.
## Overview
The Core Import/Export API enables:
- **Bulk Import**: Import large knowledge cores from binary streams
- **Bulk Export**: Export knowledge cores as binary streams
- **Efficient Format**: Uses MessagePack for compact, fast serialization
- **Dual Data Types**: Handles both RDF triples and graph embeddings
- **Streaming**: Supports streaming for large datasets
## Import Endpoint
**Endpoint:** `POST /api/v1/import-core`
**Query Parameters:**
- `id`: Knowledge core identifier
- `user`: User identifier
**Content-Type:** `application/octet-stream`
**Request Body:** MessagePack-encoded binary stream
### Import Process
1. **Stream Processing**: Reads binary data in 128KB chunks
2. **MessagePack Decoding**: Unpacks binary data into structured messages
3. **Knowledge Storage**: Stores data via Knowledge API
4. **Response**: Returns success/error status
### Import Data Format
The import stream contains MessagePack-encoded tuples with type indicators:
#### Triples Data
```python
("t", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"t": [ # triples array
{
"s": {"value": "subject", "is_uri": true},
"p": {"value": "predicate", "is_uri": true},
"o": {"value": "object", "is_uri": false}
}
]
})
```
#### Graph Embeddings Data
```python
("ge", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"e": [ # entities array
{
"e": {"value": "entity", "is_uri": true},
"v": [[0.1, 0.2, 0.3]] # vectors
}
]
})
```
## Export Endpoint
**Endpoint:** `GET /api/v1/export-core`
**Query Parameters:**
- `id`: Knowledge core identifier
- `user`: User identifier
**Content-Type:** `application/octet-stream`
**Response Body:** MessagePack-encoded binary stream
### Export Process
1. **Knowledge Retrieval**: Fetches data via Knowledge API
2. **MessagePack Encoding**: Encodes data into binary format
3. **Streaming Response**: Sends data as binary stream
4. **Type Identification**: Uses type prefixes for data classification
## Usage Examples
### Import Knowledge Core
```bash
# Import from file
curl -X POST \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/octet-stream" \
--data-binary @knowledge-core.msgpack \
"http://api-gateway:8080/api/v1/import-core?id=core-123&user=alice"
```
### Export Knowledge Core
```bash
# Export to file
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/v1/export-core?id=core-123&user=alice" \
-o knowledge-core.msgpack
```
## Python Integration
### Import Example
```python
import msgpack
import requests
def import_knowledge_core(core_id, user, triples_data, embeddings_data, token):
# Prepare data
messages = []
# Add triples
if triples_data:
messages.append(("t", {
"m": {
"i": core_id,
"m": [],
"u": user,
"c": "default"
},
"t": triples_data
}))
# Add embeddings
if embeddings_data:
messages.append(("ge", {
"m": {
"i": core_id,
"m": [],
"u": user,
"c": "default"
},
"e": embeddings_data
}))
# Pack data
binary_data = b''.join(msgpack.packb(msg) for msg in messages)
# Upload
response = requests.post(
f"http://api-gateway:8080/api/v1/import-core?id={core_id}&user={user}",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/octet-stream"
},
data=binary_data
)
return response.status_code == 200
# Usage
triples = [
{
"s": {"value": "Person1", "is_uri": True},
"p": {"value": "hasName", "is_uri": True},
"o": {"value": "John Doe", "is_uri": False}
}
]
embeddings = [
{
"e": {"value": "Person1", "is_uri": True},
"v": [[0.1, 0.2, 0.3, 0.4]]
}
]
success = import_knowledge_core("core-123", "alice", triples, embeddings, "your-token")
```
### Export Example
```python
import msgpack
import requests
def export_knowledge_core(core_id, user, token):
response = requests.get(
f"http://api-gateway:8080/api/v1/export-core?id={core_id}&user={user}",
headers={"Authorization": f"Bearer {token}"}
)
if response.status_code != 200:
return None
# Decode MessagePack stream
data = response.content
unpacker = msgpack.Unpacker()
unpacker.feed(data)
triples = []
embeddings = []
for unpacked in unpacker:
msg_type, msg_data = unpacked
if msg_type == "t":
triples.extend(msg_data["t"])
elif msg_type == "ge":
embeddings.extend(msg_data["e"])
return {
"triples": triples,
"embeddings": embeddings
}
# Usage
data = export_knowledge_core("core-123", "alice", "your-token")
if data:
print(f"Exported {len(data['triples'])} triples")
print(f"Exported {len(data['embeddings'])} embeddings")
```
## Data Format Specification
### MessagePack Tuples
Each message is a tuple: `(type_indicator, data_object)`
**Type Indicators:**
- `"t"`: RDF triples data
- `"ge"`: Graph embeddings data
### Metadata Structure
```python
{
"i": "core-identifier", # ID
"m": [...], # Metadata triples array
"u": "user-identifier", # User
"c": "collection-name" # Collection
}
```
### Triple Structure
```python
{
"s": {"value": "subject", "is_uri": boolean},
"p": {"value": "predicate", "is_uri": boolean},
"o": {"value": "object", "is_uri": boolean}
}
```
### Entity Embedding Structure
```python
{
"e": {"value": "entity", "is_uri": boolean},
"v": [[float, float, ...]] # Array of vectors
}
```
## Performance Characteristics
### Import Performance
- **Streaming**: Processes data in 128KB chunks
- **Memory Efficient**: Incremental unpacking
- **Concurrent**: Multiple imports can run simultaneously
- **Error Handling**: Robust error recovery
### Export Performance
- **Direct Streaming**: Data streamed directly from knowledge store
- **Efficient Encoding**: MessagePack for minimal overhead
- **Large Dataset Support**: Handles cores of any size
## Error Handling
### Import Errors
- **Format Errors**: Invalid MessagePack data
- **Type Errors**: Unknown type indicators
- **Storage Errors**: Knowledge API failures
- **Authentication**: Invalid user credentials
### Export Errors
- **Not Found**: Core ID doesn't exist
- **Access Denied**: User lacks permissions
- **System Errors**: Knowledge API failures
### Error Responses
- **HTTP 400**: Bad request (invalid parameters)
- **HTTP 401**: Unauthorized access
- **HTTP 404**: Core not found
- **HTTP 500**: Internal server error
## Use Cases
### Data Migration
- **System Upgrades**: Export/import during system migrations
- **Environment Sync**: Copy cores between environments
- **Backup/Restore**: Full knowledge core backup operations
### Batch Processing
- **Bulk Loading**: Load large knowledge datasets efficiently
- **Data Integration**: Merge knowledge from multiple sources
- **ETL Pipelines**: Extract-Transform-Load operations
### Performance Optimization
- **Faster Than REST**: Binary format reduces transfer time
- **Atomic Operations**: Complete import/export as single operation
- **Resource Efficient**: Minimal memory footprint during transfer
## Security Considerations
- **Authentication Required**: Bearer token authentication
- **User Isolation**: Access restricted to user's own cores
- **Data Validation**: Input validation on import
- **Audit Logging**: Operations logged for security auditing

View file

@ -0,0 +1,252 @@
# TrustGraph Document Embeddings API
This API provides import, export, and query capabilities for document embeddings. It handles
document chunks with their vector embeddings and metadata, supporting both real-time WebSocket
operations and request/response patterns.
## Schema Overview
### DocumentEmbeddings Structure
- `metadata`: Document metadata (ID, user, collection, RDF triples)
- `chunks`: Array of document chunks with embeddings
### ChunkEmbeddings Structure
- `chunk`: Text chunk as bytes
- `vectors`: Array of vector embeddings (Array of Array of Double)
### DocumentEmbeddingsRequest Structure
- `vectors`: Query vector embeddings
- `limit`: Maximum number of results
- `user`: User identifier
- `collection`: Collection identifier
### DocumentEmbeddingsResponse Structure
- `error`: Error information if operation fails
- `documents`: Array of matching documents as bytes
## Import/Export Operations
### Import - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/import/document-embeddings`
**Method:** WebSocket connection
**Request Format:**
```json
{
"metadata": {
"id": "doc-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"v": "doc-123", "e": true},
"p": {"v": "dc:title", "e": true},
"o": {"v": "Research Paper", "e": false}
}
]
},
"chunks": [
{
"chunk": "This is the first chunk of the document...",
"vectors": [
[0.1, 0.2, 0.3, 0.4],
[0.5, 0.6, 0.7, 0.8]
]
},
{
"chunk": "This is the second chunk...",
"vectors": [
[0.9, 0.8, 0.7, 0.6],
[0.5, 0.4, 0.3, 0.2]
]
}
]
}
```
**Response:** Import operations are fire-and-forget with no response payload.
### Export - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/export/document-embeddings`
**Method:** WebSocket connection
The export endpoint streams document embeddings data in real-time. Each message contains:
```json
{
"metadata": {
"id": "doc-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"v": "doc-123", "e": true},
"p": {"v": "dc:title", "e": true},
"o": {"v": "Research Paper", "e": false}
}
]
},
"chunks": [
{
"chunk": "Decoded text content of chunk",
"vectors": [[0.1, 0.2, 0.3, 0.4]]
}
]
}
```
## Query Operations
### Query Document Embeddings
**Purpose:** Find documents similar to provided vector embeddings
**Request:**
```json
{
"vectors": [
[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1.0]
],
"limit": 10,
"user": "alice",
"collection": "research"
}
```
**Response:**
```json
{
"documents": [
"base64-encoded-document-1",
"base64-encoded-document-2"
]
}
```
## WebSocket Usage Examples
### Importing Document Embeddings
```javascript
// Connect to import endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/import/document-embeddings');
// Send document embeddings
ws.send(JSON.stringify({
metadata: {
id: "doc-123",
user: "alice",
collection: "research"
},
chunks: [
{
chunk: "Document content chunk 1",
vectors: [[0.1, 0.2, 0.3]]
}
]
}));
```
### Exporting Document Embeddings
```javascript
// Connect to export endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/export/document-embeddings');
// Listen for exported data
ws.onmessage = (event) => {
const documentEmbeddings = JSON.parse(event.data);
console.log('Received document:', documentEmbeddings.metadata.id);
console.log('Chunks:', documentEmbeddings.chunks.length);
};
```
## Data Format Details
### Metadata Format
Each metadata triple contains:
- `s`: Subject (object with `v` for value and `e` for is_entity boolean)
- `p`: Predicate (object with `v` for value and `e` for is_entity boolean)
- `o`: Object (object with `v` for value and `e` for is_entity boolean)
### Vector Format
- Vectors are arrays of floating-point numbers
- Each chunk can have multiple vectors (different embedding models)
- Vectors should be consistently dimensioned within a collection
### Text Encoding
- Chunk text is handled as UTF-8 encoded bytes internally
- WebSocket API accepts/returns plain text strings
- Base64 encoding used for binary data in query responses
## Python SDK
```python
from trustgraph.clients.document_embeddings_client import DocumentEmbeddingsClient
# Create client
client = DocumentEmbeddingsClient()
# Query similar documents
request = {
"vectors": [[0.1, 0.2, 0.3, 0.4]],
"limit": 5,
"user": "alice",
"collection": "research"
}
response = await client.query(request)
documents = response.documents
```
## Integration with TrustGraph
### Storage Integration
- Document embeddings are stored in vector databases
- Metadata is cross-referenced with knowledge graph
- Supports multi-tenant isolation by user and collection
### Processing Pipeline
1. **Document Ingestion**: Text documents loaded via text-load API
2. **Chunking**: Documents split into manageable chunks
3. **Embedding Generation**: Vector embeddings created for each chunk
4. **Storage**: Embeddings stored via import API
5. **Retrieval**: Similar documents found via query API
### Use Cases
- **Semantic Search**: Find documents similar to query embeddings
- **RAG Systems**: Retrieve relevant document chunks for question answering
- **Document Clustering**: Group similar documents using embeddings
- **Content Recommendations**: Suggest related documents to users
- **Knowledge Discovery**: Find connections between document collections
## Error Handling
Common error scenarios:
- Invalid vector dimensions
- Missing required metadata fields
- User/collection access restrictions
- WebSocket connection failures
- Malformed JSON data
Errors are returned in the response `error` field:
```json
{
"error": {
"type": "ValidationError",
"message": "Invalid vector dimensions"
}
}
```
## Performance Considerations
- **Batch Processing**: Import multiple documents in single WebSocket session
- **Vector Dimensions**: Consistent embedding dimensions improve performance
- **Collection Sizing**: Limit collections to reasonable sizes for query performance
- **Real-time vs Batch**: Choose appropriate method based on use case requirements

View file

@ -0,0 +1,96 @@
# TrustGraph Document RAG API
This presents a prompt to the Document RAG service and retrieves the answer.
This makes use of a number of the other APIs behind the scenes:
Embeddings, Document Embeddings, Prompt, TextCompletion, Triples Query.
## Request/response
### Request
The request contains the following fields:
- `query`: The question to answer
### Response
The response contains the following fields:
- `response`: LLM response
## REST service
The REST service accepts a request object containing the `query` field.
The response is a JSON object containing the `response` field.
e.g.
Request:
```
{
"query": "What does NASA stand for?"
}
```
Response:
```
{
"response": "National Aeronautics and Space Administration"
}
```
## Websocket
Requests have a `request` object containing the `query` field.
Responses have a `response` object containing `response` field.
e.g.
Request:
```
{
"id": "blrqotfefnmnh7de-14",
"service": "document-rag",
"flow": "default",
"request": {
"query": "What does NASA stand for?"
}
}
```
Response:
```
{
"id": "blrqotfefnmnh7de-14",
"response": {
"response": "National Aeronautics and Space Administration"
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Document RAG API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/retrieval.py
Default request queue:
`non-persistent://tg/request/document-rag`
Default response queue:
`non-persistent://tg/response/document-rag`
Request schema:
`trustgraph.schema.DocumentRagQuery`
Response schema:
`trustgraph.schema.DocumentRagResponse`
## Pulsar Python client
The client class is
`trustgraph.clients.DocumentRagClient`
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/clients/document_rag_client.py

View file

@ -10,7 +10,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `vectors`: Embeddings response, an array of arrays. An embedding is
an array of floating-point numbers. As multiple embeddings may be
returned, an array of embeddings is returned, hence an array
@ -51,6 +51,7 @@ Request:
{
"id": "qgzw1287vfjc8wsk-2",
"service": "embeddings",
"flow": "default",
"request": {
"text": "What is a cat?"
}

View file

@ -0,0 +1,259 @@
# TrustGraph Entity Contexts API
This API provides import and export capabilities for entity contexts data. Entity contexts
associate entities with their textual context information, commonly used for entity
descriptions, definitions, or explanatory text in knowledge graphs.
## Schema Overview
### EntityContext Structure
- `entity`: Entity identifier (Value object with value, is_uri, type)
- `context`: Textual context or description string
### EntityContexts Structure
- `metadata`: Metadata including ID, user, collection, and RDF triples
- `entities`: Array of EntityContext objects
### Value Structure
- `value`: The entity value as string
- `is_uri`: Boolean indicating if the value is a URI
- `type`: Data type of the value (optional)
## Import/Export Operations
### Import - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/import/entity-contexts`
**Method:** WebSocket connection
**Request Format:**
```json
{
"metadata": {
"id": "context-batch-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"value": "source-doc", "is_uri": true},
"p": {"value": "dc:title", "is_uri": true},
"o": {"value": "Research Paper", "is_uri": false}
}
]
},
"entities": [
{
"entity": {
"v": "https://example.com/Person/JohnDoe",
"e": true
},
"context": "John Doe is a researcher at MIT specializing in artificial intelligence and machine learning."
},
{
"entity": {
"v": "https://example.com/Organization/MIT",
"e": true
},
"context": "Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts."
},
{
"entity": {
"v": "machine learning",
"e": false
},
"context": "Machine learning is a method of data analysis that automates analytical model building using algorithms."
}
]
}
```
**Response:** Import operations are fire-and-forget with no response payload.
### Export - WebSocket Endpoint
**Endpoint:** `/api/v1/flow/{flow}/export/entity-contexts`
**Method:** WebSocket connection
The export endpoint streams entity contexts data in real-time. Each message contains:
```json
{
"metadata": {
"id": "context-batch-123",
"user": "alice",
"collection": "research",
"metadata": [
{
"s": {"value": "source-doc", "is_uri": true},
"p": {"value": "dc:title", "is_uri": true},
"o": {"value": "Research Paper", "is_uri": false}
}
]
},
"entities": [
{
"entity": {
"v": "https://example.com/Person/JohnDoe",
"e": true
},
"context": "John Doe is a researcher at MIT specializing in artificial intelligence."
}
]
}
```
## WebSocket Usage Examples
### Importing Entity Contexts
```javascript
// Connect to import endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/import/entity-contexts');
// Send entity contexts
ws.send(JSON.stringify({
metadata: {
id: "context-batch-1",
user: "alice",
collection: "research"
},
entities: [
{
entity: {
v: "Albert Einstein",
e: false
},
context: "Albert Einstein was a German-born theoretical physicist widely acknowledged to be one of the greatest physicists of all time."
}
]
}));
```
### Exporting Entity Contexts
```javascript
// Connect to export endpoint
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/export/entity-contexts');
// Listen for exported data
ws.onmessage = (event) => {
const entityContexts = JSON.parse(event.data);
console.log('Received contexts for', entityContexts.entities.length, 'entities');
entityContexts.entities.forEach(item => {
console.log('Entity:', item.entity.v);
console.log('Context:', item.context);
});
};
```
## Data Format Details
### Entity Format
The `entity` field uses the Value structure:
- `v`: The entity value (name, URI, identifier)
- `e`: Boolean indicating if it's a URI entity (true) or literal (false)
- `type`: Optional data type specification
### Context Format
- Plain text string providing description or context
- Can include definitions, explanations, or background information
- Supports multi-sentence descriptions and detailed context
### Metadata Format
Each metadata triple contains:
- `s`: Subject (object with `value` and `is_uri` fields)
- `p`: Predicate (object with `value` and `is_uri` fields)
- `o`: Object (object with `value` and `is_uri` fields)
## Integration with TrustGraph
### Storage Integration
- Entity contexts are stored in graph databases
- Links entities to their descriptive text
- Supports multi-tenant isolation by user and collection
### Processing Pipeline
1. **Text Analysis**: Extract entities from documents
2. **Context Extraction**: Identify descriptive text for entities
3. **Entity Linking**: Associate entities with their contexts
4. **Import**: Store entity-context pairs via import API
5. **Knowledge Enhancement**: Use contexts for better entity understanding
### Use Cases
- **Entity Disambiguation**: Provide context to distinguish similar entities
- **Knowledge Base Enhancement**: Add descriptive information to entities
- **Question Answering**: Use entity contexts to provide detailed answers
- **Entity Summarization**: Generate summaries based on collected contexts
- **Knowledge Graph Visualization**: Display rich entity information
## Authentication
Both import and export endpoints support authentication:
- API token authentication via Authorization header
- Flow-based access control
- User and collection isolation
## Error Handling
Common error scenarios:
- Invalid JSON format
- Missing required metadata fields
- User/collection access restrictions
- WebSocket connection failures
- Invalid entity value formats
Errors are typically handled at the WebSocket connection level with connection termination or error messages.
## Performance Considerations
- **Batch Processing**: Import multiple entity contexts in single messages
- **Context Length**: Balance detailed context with performance
- **Flow Capacity**: Ensure target flow can handle entity context volume
- **Real-time vs Batch**: Choose appropriate method based on use case
## Python Integration
While no direct Python SDK is mentioned in the codebase, integration can be achieved through:
```python
import websocket
import json
# Connect to import endpoint
def import_entity_contexts(flow_id, contexts_data):
ws_url = f"ws://api-gateway:8080/api/v1/flow/{flow_id}/import/entity-contexts"
ws = websocket.create_connection(ws_url)
# Send data
ws.send(json.dumps(contexts_data))
ws.close()
# Usage example
contexts = {
"metadata": {
"id": "batch-1",
"user": "alice",
"collection": "research"
},
"entities": [
{
"entity": {"v": "Neural Networks", "e": False},
"context": "Neural networks are computing systems inspired by biological neural networks."
}
]
}
import_entity_contexts("my-flow", contexts)
```
## Features
- **Real-time Streaming**: WebSocket-based import/export for live data flow
- **Batch Operations**: Process multiple entity contexts efficiently
- **Rich Metadata**: Full metadata support with RDF triples
- **Entity Types**: Support for both URI entities and literal values
- **Flow Integration**: Direct integration with TrustGraph processing flows
- **Multi-tenant Support**: User and collection-based data isolation

252
docs/apis/api-flow.md Normal file
View file

@ -0,0 +1,252 @@
# TrustGraph Flow API
This API provides workflow management for TrustGraph components. It manages flow classes
(workflow templates) and flow instances (active running workflows) that orchestrate
complex data processing pipelines.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `class_name`: Flow class name (for class operations and start-flow)
- `class_definition`: Flow class definition JSON (for put-class)
- `description`: Flow description (for start-flow)
- `flow_id`: Flow instance ID (for flow instance operations)
### Response
The response contains the following fields:
- `class_names`: Array of flow class names (returned by list-classes)
- `flow_ids`: Array of active flow IDs (returned by list-flows)
- `class_definition`: Flow class definition JSON (returned by get-class)
- `flow`: Flow instance JSON (returned by get-flow)
- `description`: Flow description (returned by get-flow)
- `error`: Error information if operation fails
## Operations
### Flow Class Operations
#### LIST-CLASSES - List All Flow Classes
Request:
```json
{
"operation": "list-classes"
}
```
Response:
```json
{
"class_names": ["pdf-processor", "text-analyzer", "knowledge-extractor"]
}
```
#### GET-CLASS - Get Flow Class Definition
Request:
```json
{
"operation": "get-class",
"class_name": "pdf-processor"
}
```
Response:
```json
{
"class_definition": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion\", \"response\": \"persistent://tg/response/text-completion\"}}, \"description\": \"PDF processing workflow\"}"
}
```
#### PUT-CLASS - Create/Update Flow Class
Request:
```json
{
"operation": "put-class",
"class_name": "pdf-processor",
"class_definition": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion\", \"response\": \"persistent://tg/response/text-completion\"}}, \"description\": \"PDF processing workflow\"}"
}
```
Response:
```json
{}
```
#### DELETE-CLASS - Remove Flow Class
Request:
```json
{
"operation": "delete-class",
"class_name": "pdf-processor"
}
```
Response:
```json
{}
```
### Flow Instance Operations
#### LIST-FLOWS - List Active Flow Instances
Request:
```json
{
"operation": "list-flows"
}
```
Response:
```json
{
"flow_ids": ["flow-123", "flow-456", "flow-789"]
}
```
#### GET-FLOW - Get Flow Instance
Request:
```json
{
"operation": "get-flow",
"flow_id": "flow-123"
}
```
Response:
```json
{
"flow": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion-flow-123\", \"response\": \"persistent://tg/response/text-completion-flow-123\"}}}",
"description": "PDF processing workflow instance"
}
```
#### START-FLOW - Start Flow Instance
Request:
```json
{
"operation": "start-flow",
"class_name": "pdf-processor",
"flow_id": "flow-123",
"description": "Processing document batch 1"
}
```
Response:
```json
{}
```
#### STOP-FLOW - Stop Flow Instance
Request:
```json
{
"operation": "stop-flow",
"flow_id": "flow-123"
}
```
Response:
```json
{}
```
## REST service
The REST service is available at `/api/v1/flow` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "flow",
"request": {
"operation": "list-classes"
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"class_names": ["pdf-processor", "text-analyzer"]
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Flow API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/flows.py
Default request queue:
`non-persistent://tg/request/flow`
Default response queue:
`non-persistent://tg/response/flow`
Request schema:
`trustgraph.schema.FlowRequest`
Response schema:
`trustgraph.schema.FlowResponse`
## Python SDK
The Python SDK provides convenient access to the Flow API:
```python
from trustgraph.api.flow import FlowClient
client = FlowClient()
# List all flow classes
classes = await client.list_classes()
# Get a flow class definition
definition = await client.get_class("pdf-processor")
# Start a flow instance
await client.start_flow("pdf-processor", "flow-123", "Processing batch 1")
# List active flows
flows = await client.list_flows()
# Stop a flow instance
await client.stop_flow("flow-123")
```
## Features
- **Flow Classes**: Templates that define workflow structure and interfaces
- **Flow Instances**: Active running workflows based on flow classes
- **Dynamic Management**: Flows can be started/stopped dynamically
- **Template Processing**: Uses template replacement for customizing flow instances
- **Integration**: Works with TrustGraph ecosystem for data processing pipelines
- **Persistent Storage**: Flow definitions and instances stored for reliability
## Use Cases
- **Document Processing**: Orchestrating PDF processing through chunking, extraction, and storage
- **Knowledge Extraction**: Managing workflows for relationship and definition extraction
- **Data Pipelines**: Coordinating complex multi-step data processing workflows
- **Resource Management**: Dynamically scaling processing flows based on demand

View file

@ -17,7 +17,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `entities`: An array of graph entities. The entity type is described here:
TrustGraph uses the same schema for knowledge graph elements:
@ -85,6 +85,7 @@ Request:
{
"id": "qgzw1287vfjc8wsk-3",
"service": "graph-embeddings-query",
"flow": "default",
"request": {
"vectors": [
[

View file

@ -14,7 +14,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `response`: LLM response
## REST service
@ -52,6 +52,7 @@ Request:
{
"id": "blrqotfefnmnh7de-14",
"service": "graph-rag",
"flow": "default",
"request": {
"query": "What does NASA stand for?"
}

310
docs/apis/api-knowledge.md Normal file
View file

@ -0,0 +1,310 @@
# TrustGraph Knowledge API
This API provides knowledge graph management for TrustGraph. It handles storage, retrieval,
and flow integration of knowledge cores containing RDF triples and graph embeddings with
multi-tenant support.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `user`: User identifier (for user-specific operations)
- `id`: Knowledge core identifier
- `flow`: Flow identifier (for load operations)
- `collection`: Collection identifier (for load operations)
- `triples`: RDF triples data (for put operations)
- `graph_embeddings`: Graph embeddings data (for put operations)
### Response
The response contains the following fields:
- `error`: Error information if operation fails
- `ids`: Array of knowledge core IDs (returned by list operation)
- `eos`: End of stream indicator for streaming responses
- `triples`: RDF triples data (returned by get operation)
- `graph_embeddings`: Graph embeddings data (returned by get operation)
## Operations
### PUT-KG-CORE - Store Knowledge Core
Request:
```json
{
"operation": "put-kg-core",
"user": "alice",
"id": "core-123",
"triples": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"triples": [
{
"s": {"value": "Person1", "is_uri": true},
"p": {"value": "hasName", "is_uri": true},
"o": {"value": "John Doe", "is_uri": false}
},
{
"s": {"value": "Person1", "is_uri": true},
"p": {"value": "worksAt", "is_uri": true},
"o": {"value": "Company1", "is_uri": true}
}
]
},
"graph_embeddings": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"entities": [
{
"entity": {"value": "Person1", "is_uri": true},
"vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
}
]
}
}
```
Response:
```json
{}
```
### GET-KG-CORE - Retrieve Knowledge Core
Request:
```json
{
"operation": "get-kg-core",
"id": "core-123"
}
```
Response:
```json
{
"triples": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"triples": [
{
"s": {"value": "Person1", "is_uri": true},
"p": {"value": "hasName", "is_uri": true},
"o": {"value": "John Doe", "is_uri": false}
}
]
},
"graph_embeddings": {
"metadata": {
"id": "core-123",
"user": "alice",
"collection": "research"
},
"entities": [
{
"entity": {"value": "Person1", "is_uri": true},
"vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
}
]
}
}
```
### LIST-KG-CORES - List Knowledge Cores
Request:
```json
{
"operation": "list-kg-cores",
"user": "alice"
}
```
Response:
```json
{
"ids": ["core-123", "core-456", "core-789"]
}
```
### DELETE-KG-CORE - Delete Knowledge Core
Request:
```json
{
"operation": "delete-kg-core",
"user": "alice",
"id": "core-123"
}
```
Response:
```json
{}
```
### LOAD-KG-CORE - Load Knowledge Core into Flow
Request:
```json
{
"operation": "load-kg-core",
"id": "core-123",
"flow": "qa-flow",
"collection": "research"
}
```
Response:
```json
{}
```
### UNLOAD-KG-CORE - Unload Knowledge Core from Flow
Request:
```json
{
"operation": "unload-kg-core",
"id": "core-123"
}
```
Response:
```json
{}
```
## Data Structures
### Triple Structure
Each RDF triple contains:
- `s`: Subject (Value object)
- `p`: Predicate (Value object)
- `o`: Object (Value object)
### Value Structure
- `value`: The actual value as string
- `is_uri`: Boolean indicating if value is a URI
- `type`: Data type of the value (optional)
### Triples Structure
- `metadata`: Metadata including ID, user, collection
- `triples`: Array of Triple objects
### Graph Embeddings Structure
- `metadata`: Metadata including ID, user, collection
- `entities`: Array of EntityEmbeddings objects
### Entity Embeddings Structure
- `entity`: The entity being embedded (Value object)
- `vectors`: Array of vector embeddings (Array of Array of Double)
## REST service
The REST service is available at `/api/v1/knowledge` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "knowledge",
"request": {
"operation": "list-kg-cores",
"user": "alice"
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"ids": ["core-123", "core-456"]
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Knowledge API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/knowledge.py
Default request queue:
`non-persistent://tg/request/knowledge`
Default response queue:
`non-persistent://tg/response/knowledge`
Request schema:
`trustgraph.schema.KnowledgeRequest`
Response schema:
`trustgraph.schema.KnowledgeResponse`
## Python SDK
The Python SDK provides convenient access to the Knowledge API:
```python
from trustgraph.api.knowledge import KnowledgeClient
client = KnowledgeClient()
# List knowledge cores
cores = await client.list_kg_cores("alice")
# Get a knowledge core
core = await client.get_kg_core("core-123")
# Store a knowledge core
await client.put_kg_core(
user="alice",
id="core-123",
triples=triples_data,
graph_embeddings=embeddings_data
)
# Load core into flow
await client.load_kg_core("core-123", "qa-flow", "research")
# Delete a knowledge core
await client.delete_kg_core("alice", "core-123")
```
## Features
- **Knowledge Core Management**: Store, retrieve, list, and delete knowledge cores
- **Dual Data Types**: Support for both RDF triples and graph embeddings
- **Flow Integration**: Load knowledge cores into processing flows
- **Multi-tenant Support**: User-specific knowledge cores with isolation
- **Streaming Support**: Efficient transfer of large knowledge cores
- **Collection Organization**: Group knowledge cores by collection
- **Semantic Reasoning**: RDF triples enable symbolic reasoning
- **Vector Similarity**: Graph embeddings enable neural approaches
## Use Cases
- **Knowledge Base Construction**: Build semantic knowledge graphs from documents
- **Question Answering**: Load knowledge cores for graph-based QA systems
- **Semantic Search**: Use embeddings for similarity-based knowledge retrieval
- **Multi-domain Knowledge**: Organize knowledge by user and collection
- **Hybrid Reasoning**: Combine symbolic (triples) and neural (embeddings) approaches
- **Knowledge Transfer**: Export and import knowledge cores between systems

360
docs/apis/api-librarian.md Normal file
View file

@ -0,0 +1,360 @@
# TrustGraph Librarian API
This API provides document library management for TrustGraph. It handles document storage,
metadata management, and processing orchestration using hybrid storage (MinIO for content,
Cassandra for metadata) with multi-user support.
## Request/response
### Request
The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `document_id`: Document identifier (for document operations)
- `document_metadata`: Document metadata object (for add/update operations)
- `content`: Document content as base64-encoded bytes (for add operations)
- `processing_id`: Processing job identifier (for processing operations)
- `processing_metadata`: Processing metadata object (for add-processing)
- `user`: User identifier (required for most operations)
- `collection`: Collection filter (optional for list operations)
- `criteria`: Query criteria array (for filtering operations)
### Response
The response contains the following fields:
- `error`: Error information if operation fails
- `document_metadata`: Single document metadata (for get operations)
- `content`: Document content as base64-encoded bytes (for get-content)
- `document_metadatas`: Array of document metadata (for list operations)
- `processing_metadatas`: Array of processing metadata (for list-processing)
## Document Operations
### ADD-DOCUMENT - Add Document to Library
Request:
```json
{
"operation": "add-document",
"document_metadata": {
"id": "doc-123",
"time": 1640995200000,
"kind": "application/pdf",
"title": "Research Paper",
"comments": "Important research findings",
"user": "alice",
"tags": ["research", "ai", "machine-learning"],
"metadata": [
{
"subject": "doc-123",
"predicate": "dc:creator",
"object": "Dr. Smith"
}
]
},
"content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
}
```
Response:
```json
{}
```
### GET-DOCUMENT-METADATA - Get Document Metadata
Request:
```json
{
"operation": "get-document-metadata",
"document_id": "doc-123",
"user": "alice"
}
```
Response:
```json
{
"document_metadata": {
"id": "doc-123",
"time": 1640995200000,
"kind": "application/pdf",
"title": "Research Paper",
"comments": "Important research findings",
"user": "alice",
"tags": ["research", "ai", "machine-learning"],
"metadata": [
{
"subject": "doc-123",
"predicate": "dc:creator",
"object": "Dr. Smith"
}
]
}
}
```
### GET-DOCUMENT-CONTENT - Get Document Content
Request:
```json
{
"operation": "get-document-content",
"document_id": "doc-123",
"user": "alice"
}
```
Response:
```json
{
"content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
}
```
### LIST-DOCUMENTS - List User's Documents
Request:
```json
{
"operation": "list-documents",
"user": "alice",
"collection": "research"
}
```
Response:
```json
{
"document_metadatas": [
{
"id": "doc-123",
"time": 1640995200000,
"kind": "application/pdf",
"title": "Research Paper",
"comments": "Important research findings",
"user": "alice",
"tags": ["research", "ai"]
},
{
"id": "doc-124",
"time": 1640995300000,
"kind": "text/plain",
"title": "Meeting Notes",
"comments": "Team meeting discussion",
"user": "alice",
"tags": ["meeting", "notes"]
}
]
}
```
### UPDATE-DOCUMENT - Update Document Metadata
Request:
```json
{
"operation": "update-document",
"document_metadata": {
"id": "doc-123",
"title": "Updated Research Paper",
"comments": "Updated findings and conclusions",
"user": "alice",
"tags": ["research", "ai", "machine-learning", "updated"]
}
}
```
Response:
```json
{}
```
### REMOVE-DOCUMENT - Remove Document
Request:
```json
{
"operation": "remove-document",
"document_id": "doc-123",
"user": "alice"
}
```
Response:
```json
{}
```
## Processing Operations
### ADD-PROCESSING - Start Document Processing
Request:
```json
{
"operation": "add-processing",
"processing_metadata": {
"id": "proc-456",
"document_id": "doc-123",
"time": 1640995400000,
"flow": "pdf-extraction",
"user": "alice",
"collection": "research",
"tags": ["extraction", "nlp"]
}
}
```
Response:
```json
{}
```
### LIST-PROCESSING - List Processing Jobs
Request:
```json
{
"operation": "list-processing",
"user": "alice",
"collection": "research"
}
```
Response:
```json
{
"processing_metadatas": [
{
"id": "proc-456",
"document_id": "doc-123",
"time": 1640995400000,
"flow": "pdf-extraction",
"user": "alice",
"collection": "research",
"tags": ["extraction", "nlp"]
}
]
}
```
### REMOVE-PROCESSING - Stop Processing Job
Request:
```json
{
"operation": "remove-processing",
"processing_id": "proc-456",
"user": "alice"
}
```
Response:
```json
{}
```
## REST service
The REST service is available at `/api/v1/librarian` and accepts the above request formats.
## Websocket
Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.
Request:
```json
{
"id": "unique-request-id",
"service": "librarian",
"request": {
"operation": "list-documents",
"user": "alice"
}
}
```
Response:
```json
{
"id": "unique-request-id",
"response": {
"document_metadatas": [...]
},
"complete": true
}
```
## Pulsar
The Pulsar schema for the Librarian API is defined in Python code here:
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/library.py
Default request queue:
`non-persistent://tg/request/librarian`
Default response queue:
`non-persistent://tg/response/librarian`
Request schema:
`trustgraph.schema.LibrarianRequest`
Response schema:
`trustgraph.schema.LibrarianResponse`
## Python SDK
The Python SDK provides convenient access to the Librarian API:
```python
from trustgraph.api.library import LibrarianClient
client = LibrarianClient()
# Add a document
with open("document.pdf", "rb") as f:
content = f.read()
await client.add_document(
doc_id="doc-123",
title="Research Paper",
content=content,
user="alice",
tags=["research", "ai"]
)
# Get document metadata
metadata = await client.get_document_metadata("doc-123", "alice")
# List documents
documents = await client.list_documents("alice", collection="research")
# Start processing
await client.add_processing(
processing_id="proc-456",
document_id="doc-123",
flow="pdf-extraction",
user="alice"
)
```
## Features
- **Hybrid Storage**: MinIO for content, Cassandra for metadata
- **Multi-user Support**: User-based document ownership and access control
- **Rich Metadata**: RDF-style metadata triples and tagging system
- **Processing Integration**: Automatic triggering of document processing workflows
- **Content Types**: Support for multiple document formats (PDF, text, etc.)
- **Collection Management**: Optional document grouping by collection
- **Metadata Search**: Query documents by metadata criteria
## Use Cases
- **Document Management**: Store and organize documents with rich metadata
- **Knowledge Extraction**: Process documents to extract structured knowledge
- **Research Libraries**: Manage collections of research papers and documents
- **Content Processing**: Orchestrate document processing workflows
- **Multi-tenant Systems**: Support multiple users with isolated document libraries

313
docs/apis/api-metrics.md Normal file
View file

@ -0,0 +1,313 @@
# TrustGraph Metrics API
This API provides access to TrustGraph system metrics through a Prometheus proxy endpoint.
It allows authenticated access to monitoring and observability data from the TrustGraph
system components.
## Overview
The Metrics API is implemented as a proxy to a Prometheus metrics server, providing:
- System performance metrics
- Service health information
- Resource utilization data
- Request/response statistics
- Error rates and latency metrics
## Authentication
All metrics endpoints require Bearer token authentication:
```
Authorization: Bearer <your-api-token>
```
Unauthorized requests return HTTP 401.
## Endpoint
**Base Path:** `/api/metrics`
**Method:** GET
**Description:** Proxies requests to the underlying Prometheus API
## Usage Examples
### Query Current Metrics
```bash
# Get all available metrics
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/metrics/query?query=up"
# Get specific metric with time range
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/metrics/query_range?query=cpu_usage&start=1640995200&end=1640998800&step=60"
# Get metric metadata
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/metrics/metadata"
```
### Common Prometheus API Endpoints
The metrics API supports all standard Prometheus API endpoints:
#### Instant Queries
```
GET /api/metrics/query?query=<prometheus_query>
```
#### Range Queries
```
GET /api/metrics/query_range?query=<query>&start=<timestamp>&end=<timestamp>&step=<duration>
```
#### Metadata
```
GET /api/metrics/metadata
GET /api/metrics/metadata?metric=<metric_name>
```
#### Series
```
GET /api/metrics/series?match[]=<series_selector>
```
#### Label Values
```
GET /api/metrics/label/<label_name>/values
```
#### Targets
```
GET /api/metrics/targets
```
## Example Queries
### System Health
```bash
# Check if services are up
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=up"
# Get service uptime
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=time()-process_start_time_seconds"
```
### Performance Metrics
```bash
# CPU usage
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(cpu_seconds_total[5m])"
# Memory usage
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=process_resident_memory_bytes"
# Request rate
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(http_requests_total[5m])"
```
### TrustGraph-Specific Metrics
```bash
# Document processing rate
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(trustgraph_documents_processed_total[5m])"
# Knowledge graph size
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=trustgraph_triples_count"
# Embedding generation rate
curl -H "Authorization: Bearer token" \
"http://api-gateway:8080/api/metrics/query?query=rate(trustgraph_embeddings_generated_total[5m])"
```
## Response Format
Responses follow the standard Prometheus API format:
### Successful Query Response
```json
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "up",
"instance": "api-gateway:8080",
"job": "trustgraph"
},
"value": [1640995200, "1"]
}
]
}
}
```
### Range Query Response
```json
{
"status": "success",
"data": {
"resultType": "matrix",
"result": [
{
"metric": {
"__name__": "cpu_usage",
"instance": "worker-1"
},
"values": [
[1640995200, "0.15"],
[1640995260, "0.18"],
[1640995320, "0.12"]
]
}
]
}
}
```
### Error Response
```json
{
"status": "error",
"errorType": "bad_data",
"error": "invalid query syntax"
}
```
## Available Metrics
### Standard System Metrics
- `up`: Service availability (1 = up, 0 = down)
- `process_resident_memory_bytes`: Memory usage
- `process_cpu_seconds_total`: CPU time
- `http_requests_total`: HTTP request count
- `http_request_duration_seconds`: Request latency
### TrustGraph-Specific Metrics
- `trustgraph_documents_processed_total`: Documents processed count
- `trustgraph_triples_count`: Knowledge graph triple count
- `trustgraph_embeddings_generated_total`: Embeddings generated count
- `trustgraph_flow_executions_total`: Flow execution count
- `trustgraph_pulsar_messages_total`: Pulsar message count
- `trustgraph_errors_total`: Error count by component
## Time Series Queries
### Time Ranges
Use standard Prometheus time range formats:
- `5m`: 5 minutes
- `1h`: 1 hour
- `1d`: 1 day
- `1w`: 1 week
### Rate Calculations
```bash
# 5-minute rate
rate(metric_name[5m])
# Increase over time
increase(metric_name[1h])
```
### Aggregations
```bash
# Sum across instances
sum(metric_name)
# Average by label
avg by (instance) (metric_name)
# Top 5 values
topk(5, metric_name)
```
## Integration Examples
### Python Integration
```python
import requests
def query_metrics(token, query):
headers = {"Authorization": f"Bearer {token}"}
params = {"query": query}
response = requests.get(
"http://api-gateway:8080/api/metrics/query",
headers=headers,
params=params
)
return response.json()
# Get system uptime
uptime = query_metrics("your-token", "time() - process_start_time_seconds")
```
### JavaScript Integration
```javascript
async function queryMetrics(token, query) {
const response = await fetch(
`http://api-gateway:8080/api/metrics/query?query=${encodeURIComponent(query)}`,
{
headers: {
'Authorization': `Bearer ${token}`
}
}
);
return await response.json();
}
// Get request rate
const requestRate = await queryMetrics('your-token', 'rate(http_requests_total[5m])');
```
## Error Handling
### Common HTTP Status Codes
- `200`: Success
- `400`: Bad request (invalid query)
- `401`: Unauthorized (invalid/missing token)
- `422`: Unprocessable entity (query execution error)
- `500`: Internal server error
### Error Types
- `bad_data`: Invalid query syntax
- `timeout`: Query execution timeout
- `canceled`: Query was canceled
- `execution`: Query execution error
## Best Practices
### Query Optimization
- Use appropriate time ranges to limit data volume
- Apply label filters to reduce result sets
- Use recording rules for frequently accessed metrics
### Rate Limiting
- Avoid high-frequency polling
- Cache results when appropriate
- Use appropriate step sizes for range queries
### Security
- Keep API tokens secure
- Use HTTPS in production
- Rotate tokens regularly
## Use Cases
- **System Monitoring**: Track system health and performance
- **Capacity Planning**: Monitor resource utilization trends
- **Alerting**: Set up alerts based on metric thresholds
- **Performance Analysis**: Analyze system performance over time
- **Debugging**: Investigate issues using detailed metrics
- **Business Intelligence**: Track document processing and knowledge extraction metrics

View file

@ -15,7 +15,7 @@ The request contains the following fields:
### Response
The request contains either of these fields:
The response contains either of these fields:
- `text`: A plain text response
- `object`: A structured object, JSON-encoded
@ -60,6 +60,7 @@ Request:
{
"id": "akshfkiehfkseffh-142",
"service": "prompt",
"flow": "default",
"request": {
"id": "extract-definitions",
"variables": {

View file

@ -19,7 +19,7 @@ The request contains the following fields:
### Response
The request contains the following fields:
The response contains the following fields:
- `response`: LLM response
## REST service
@ -59,6 +59,7 @@ Request:
{
"id": "blrqotfefnmnh7de-1",
"service": "text-completion",
"flow": "default",
"request": {
"system": "You are a helpful agent",
"prompt": "What does NASA stand for?"

168
docs/apis/api-text-load.md Normal file
View file

@ -0,0 +1,168 @@
# TrustGraph Text Load API
This API loads text documents into TrustGraph processing pipelines. It's a sender API
that accepts text documents with metadata and queues them for processing through
specified flows.
## Request Format
The text-load API accepts a JSON request with the following fields:
- `id`: Document identifier (typically a URI)
- `metadata`: Array of RDF triples providing document metadata
- `charset`: Character encoding (defaults to "utf-8")
- `text`: Base64-encoded text content
- `user`: User identifier (defaults to "trustgraph")
- `collection`: Collection identifier (defaults to "default")
## Request Example
```json
{
"id": "https://example.com/documents/research-paper-123",
"metadata": [
{
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
"p": {"v": "http://purl.org/dc/terms/title", "e": true},
"o": {"v": "Machine Learning in Healthcare", "e": false}
},
{
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
"p": {"v": "http://purl.org/dc/terms/creator", "e": true},
"o": {"v": "Dr. Jane Smith", "e": false}
},
{
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
"p": {"v": "http://purl.org/dc/terms/subject", "e": true},
"o": {"v": "Healthcare AI", "e": false}
}
],
"charset": "utf-8",
"text": "VGhpcyBpcyBhIHNhbXBsZSByZXNlYXJjaCBwYXBlciBhYm91dCBtYWNoaW5lIGxlYXJuaW5nIGluIGhlYWx0aGNhcmUuLi4=",
"user": "researcher",
"collection": "healthcare-research"
}
```
## Response
The text-load API is a sender API with no response body. Success is indicated by HTTP status code 200.
## REST service
The text-load service is available at:
`POST /api/v1/flow/{flow-id}/service/text-load`
Where `{flow-id}` is the identifier of the flow that will process the document.
Example:
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d @document.json \
http://api-gateway:8080/api/v1/flow/pdf-processing/service/text-load
```
## Metadata Format
Each metadata triple contains:
- `s`: Subject (object with `v` for value and `e` for is_entity boolean)
- `p`: Predicate (object with `v` for value and `e` for is_entity boolean)
- `o`: Object (object with `v` for value and `e` for is_entity boolean)
The `e` field indicates whether the value should be treated as an entity (true) or literal (false).
## Common Metadata Properties
### Document Properties
- `http://purl.org/dc/terms/title`: Document title
- `http://purl.org/dc/terms/creator`: Document author
- `http://purl.org/dc/terms/subject`: Document subject/topic
- `http://purl.org/dc/terms/description`: Document description
- `http://purl.org/dc/terms/date`: Publication date
- `http://purl.org/dc/terms/language`: Document language
### Organizational Properties
- `http://xmlns.com/foaf/0.1/name`: Organization name
- `http://www.w3.org/2006/vcard/ns#hasAddress`: Organization address
- `http://xmlns.com/foaf/0.1/homepage`: Organization website
### Publication Properties
- `http://purl.org/ontology/bibo/doi`: DOI identifier
- `http://purl.org/ontology/bibo/isbn`: ISBN identifier
- `http://purl.org/ontology/bibo/volume`: Publication volume
- `http://purl.org/ontology/bibo/issue`: Publication issue
## Text Encoding
The `text` field must contain base64-encoded content. To encode text:
```bash
# Command line encoding
echo "Your text content here" | base64
# Python encoding
import base64
encoded_text = base64.b64encode("Your text content here".encode('utf-8')).decode('utf-8')
```
## Integration with Processing Flows
Once loaded, text documents are processed through the specified flow, which typically includes:
1. **Text Chunking**: Breaking documents into manageable chunks
2. **Embedding Generation**: Creating vector embeddings for semantic search
3. **Knowledge Extraction**: Extracting entities and relationships
4. **Graph Storage**: Storing extracted knowledge in the knowledge graph
5. **Indexing**: Making content searchable for RAG queries
## Error Handling
Common errors include:
- Invalid base64 encoding in text field
- Missing required fields (id, text)
- Invalid metadata triple format
- Flow not found or inactive
## Python SDK
```python
import base64
from trustgraph.api.text_load import TextLoadClient
client = TextLoadClient()
# Prepare document
document = {
"id": "https://example.com/doc-123",
"metadata": [
{
"s": {"v": "https://example.com/doc-123", "e": True},
"p": {"v": "http://purl.org/dc/terms/title", "e": True},
"o": {"v": "Sample Document", "e": False}
}
],
"charset": "utf-8",
"text": base64.b64encode("Document content here".encode('utf-8')).decode('utf-8'),
"user": "alice",
"collection": "research"
}
# Load document
await client.load_text_document("my-flow", document)
```
## Use Cases
- **Research Paper Ingestion**: Load academic papers with rich metadata
- **Document Processing**: Ingest documents for knowledge extraction
- **Content Management**: Build searchable document repositories
- **RAG System Population**: Load content for question-answering systems
- **Knowledge Base Construction**: Convert documents into structured knowledge
## Features
- **Rich Metadata**: Full RDF metadata support for semantic annotation
- **Flow Integration**: Direct integration with TrustGraph processing flows
- **Multi-tenant**: User and collection-based document organization
- **Encoding Support**: Flexible character encoding support
- **No Response Required**: Fire-and-forget operation for high throughput

View file

@ -21,7 +21,7 @@ Returned triples will match all of `s`, `p` and `o` where provided.
### Response
The request contains the following fields:
The response contains the following fields:
- `response`: A list of triples.
Each triple contains `s`, `p` and `o` fields describing the
@ -33,15 +33,53 @@ Each triple element uses the same schema:
- `is_uri`: A boolean value which is true if this is a graph entity i.e.
`value` is a URI, not a literal value.
## Data Format Details
### Triple Element Format
To reduce the size of JSON messages, triple elements (subject, predicate, object) are encoded using a compact format:
- `v`: The value as a string (maps to `value` in the full schema)
- `e`: Boolean indicating if this is an entity/URI (maps to `is_uri` in the full schema)
Each triple element (`s`, `p`, `o`) contains:
- `v`: The actual value as a string
- `e`: Boolean indicating the value type
- `true`: The value is a URI/entity (e.g., `"http://example.com/Person1"`)
- `false`: The value is a literal (e.g., `"John Doe"`, `"42"`, `"2023-01-01"`)
### Examples
**URI/Entity Element:**
```json
{
"v": "http://trustgraph.ai/e/space-station-modules",
"e": true
}
```
**Literal Element:**
```json
{
"v": "space station modules",
"e": false
}
```
**Numeric Literal:**
```json
{
"v": "42",
"e": false
}
```
## REST service
The REST service accepts a request object containing the `s`, `p`, `o`
and `limit` fields.
The response is a JSON object containing the `response` field.
To reduce the size of the JSON, the graph entities are encoded as an
object with `value` and `is_uri` mapped to `v` and `e` respectively.
e.g.
This example query matches triples with a subject of
@ -58,6 +96,7 @@ Request:
{
"id": "qgzw1287vfjc8wsk-4",
"service": "triples-query",
"flow": "default",
"request": {
"s": {
"v": "http://trustgraph.ai/e/space-station-modules",
@ -97,13 +136,9 @@ Response:
## Websocket
Requests have a `request` object containing the `system` and
`prompt` fields.
Requests have a `request` object containing the query fields (`s`, `p`, `o`, `limit`).
Responses have a `response` object containing `response` field.
To reduce the size of the JSON, the graph entities are encoded as an
object with `value` and `is_uri` mapped to `v` and `e` respectively.
e.g.
Request:
@ -178,10 +213,3 @@ The client class is
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/clients/triples_query_client.py

View file

@ -1,3 +1,230 @@
# TrustGraph Pulsar API
Coming soon
Apache Pulsar is the underlying message queue system used by TrustGraph for inter-component communication. Understanding Pulsar queue names is essential for direct integration with TrustGraph services.
## Overview
TrustGraph uses two types of APIs with different queue naming patterns:
1. **Global Services**: Fixed queue names, not dependent on flows
2. **Flow-Hosted Services**: Dynamic queue names that depend on the specific flow configuration
## Global Services (Fixed Queue Names)
These services run independently and have fixed Pulsar queue names:
### Config API
- **Request Queue**: `non-persistent://tg/request/config`
- **Response Queue**: `non-persistent://tg/response/config`
- **Push Queue**: `persistent://tg/config/config`
### Flow API
- **Request Queue**: `non-persistent://tg/request/flow`
- **Response Queue**: `non-persistent://tg/response/flow`
### Knowledge API
- **Request Queue**: `non-persistent://tg/request/knowledge`
- **Response Queue**: `non-persistent://tg/response/knowledge`
### Librarian API
- **Request Queue**: `non-persistent://tg/request/librarian`
- **Response Queue**: `non-persistent://tg/response/librarian`
## Flow-Hosted Services (Dynamic Queue Names)
These services are hosted within specific flows and have queue names that depend on the flow configuration:
- Agent API
- Document RAG API
- Graph RAG API
- Text Completion API
- Prompt API
- Embeddings API
- Graph Embeddings API
- Triples Query API
- Text Load API
- Document Load API
## Discovering Flow-Hosted Queue Names
To find the queue names for flow-hosted services, you need to query the flow configuration using the Config API.
### Method 1: Using the Config API
Query for the flow configuration:
**Request:**
```json
{
"operation": "get",
"keys": [
{
"type": "flows",
"key": "your-flow-name"
}
]
}
```
**Response:**
The response will contain a flow definition with an "interfaces" object that lists all queue names.
### Method 2: Using the CLI
Use the TrustGraph CLI to dump the configuration:
```bash
tg-show-config
```
## Flow Interface Types
Flow configurations define two types of service interfaces:
### 1. Request/Response Interfaces
Services that accept a request and return a response:
```json
{
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
}
}
```
**Examples**: agent, document-rag, graph-rag, text-completion, prompt, embeddings, graph-embeddings, triples
### 2. Fire-and-Forget Interfaces
Services that accept data but don't return a response:
```json
{
"text-load": "persistent://tg/flow/text-document-load:default"
}
```
**Examples**: text-load, document-load, triples-store, graph-embeddings-store, document-embeddings-store, entity-contexts-load
## Example Flow Configuration
Here's an example of a complete flow configuration showing queue names:
```json
{
"class-name": "document-rag+graph-rag",
"description": "Default processing flow",
"interfaces": {
"agent": {
"request": "non-persistent://tg/request/agent:default",
"response": "non-persistent://tg/response/agent:default"
},
"document-rag": {
"request": "non-persistent://tg/request/document-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/document-rag:document-rag+graph-rag"
},
"graph-rag": {
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
},
"text-completion": {
"request": "non-persistent://tg/request/text-completion:document-rag+graph-rag",
"response": "non-persistent://tg/response/text-completion:document-rag+graph-rag"
},
"embeddings": {
"request": "non-persistent://tg/request/embeddings:document-rag+graph-rag",
"response": "non-persistent://tg/response/embeddings:document-rag+graph-rag"
},
"triples": {
"request": "non-persistent://tg/request/triples:document-rag+graph-rag",
"response": "non-persistent://tg/response/triples:document-rag+graph-rag"
},
"text-load": "persistent://tg/flow/text-document-load:default",
"document-load": "persistent://tg/flow/document-load:default",
"triples-store": "persistent://tg/flow/triples-store:default",
"graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:default"
}
}
```
## Queue Naming Patterns
### Global Services
- **Pattern**: `{persistence}://tg/{namespace}/{service-name}`
- **Example**: `non-persistent://tg/request/config`
### Flow-Hosted Request/Response
- **Pattern**: `{persistence}://tg/{namespace}/{service-name}:{flow-identifier}`
- **Example**: `non-persistent://tg/request/graph-rag:document-rag+graph-rag`
### Flow-Hosted Fire-and-Forget
- **Pattern**: `{persistence}://tg/flow/{service-name}:{flow-identifier}`
- **Example**: `persistent://tg/flow/text-document-load:default`
## Persistence Types
- **non-persistent**: Messages are not persisted to disk, faster but less reliable
- **persistent**: Messages are persisted to disk, slower but more reliable
## Practical Usage
### Python Example
```python
import pulsar
from trustgraph.schema import ConfigRequest, ConfigResponse
# Connect to Pulsar
client = pulsar.Client('pulsar://localhost:6650')
# Create producer for config requests
producer = client.create_producer(
'non-persistent://tg/request/config',
schema=pulsar.schema.AvroSchema(ConfigRequest)
)
# Create consumer for config responses
consumer = client.subscribe(
'non-persistent://tg/response/config',
subscription_name='my-subscription',
schema=pulsar.schema.AvroSchema(ConfigResponse)
)
# Send request
request = ConfigRequest(operation='list-classes')
producer.send(request)
# Receive response
response = consumer.receive()
print(response.value())
```
### Flow Service Example
```python
# First, get the flow configuration to find queue names
config_request = ConfigRequest(
operation='get',
keys=[ConfigKey(type='flows', key='my-flow')]
)
# Use the returned interface information to determine queue names
# Then connect to the appropriate queues for the service you need
```
## Best Practices
1. **Query Flow Configuration**: Always query the current flow configuration to get accurate queue names
2. **Handle Dynamic Names**: Flow-hosted service queue names can change when flows are reconfigured
3. **Choose Appropriate Persistence**: Use persistent queues for critical data, non-persistent for performance
4. **Schema Validation**: Use the appropriate Pulsar schema for each service
5. **Error Handling**: Implement proper error handling for queue connection and message failures
## Security Considerations
- Pulsar access should be restricted in production environments
- Use appropriate authentication and authorization mechanisms
- Monitor queue access and message patterns for security anomalies
- Consider encryption for sensitive data in messages

View file

@ -18,13 +18,16 @@ When hosted using docker compose, you can access the service at
## Request
A request message is a JSON message containing 3 fields:
A request message is a JSON message containing 3/4 fields:
- `id`: A unique ID which is used to correlate requests and responses.
You should make sure it is unique.
- `service`: The name of the service to invoke.
- `request`: The request body which is passed to the service - this is
defined in the API documentation for that service.
- `flow`: Some APIs are supported by processors launched within a flow,
are are dependent on a flow running. For such APIs, the flow identifier
needs to be provided.
e.g.
@ -32,6 +35,7 @@ e.g.
{
"id": "qgzw1287vfjc8wsk-1",
"service": "graph-rag",
"flow": "default",
"request": {
"query": "What does NASA stand for?"
}
@ -86,6 +90,7 @@ Request:
{
"id": "blrqotfefnmnh7de-20",
"service": "agent",
"flow": "default",
"request": {
"question": "What does NASA stand for?"
}