mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-05 05:12:36 +02:00
Update docs for API/CLI changes in 1.0 (#421)
* Update some API basics for the 0.23/1.0 API change
This commit is contained in:
parent
f907ea7db8
commit
44bdd29f51
69 changed files with 19981 additions and 407 deletions
|
|
@ -3,8 +3,10 @@
|
|||
|
||||
## Overview
|
||||
|
||||
If you want to interact with TrustGraph through APIs, there are 3
|
||||
forms of API which may be of interest to you:
|
||||
If you want to interact with TrustGraph through APIs, there are 4
|
||||
forms of API which may be of interest to you. All four mechanisms
|
||||
invoke the same underlying TrustGraph functionality but are made
|
||||
available for integration in different ways:
|
||||
|
||||
### Pulsar APIs
|
||||
|
||||
|
|
@ -56,6 +58,31 @@ Cons:
|
|||
using a basic REST API, particular if you want to cover all of the error
|
||||
scenarios well
|
||||
|
||||
### Python SDK API
|
||||
|
||||
The `trustgraph-base` package provides a Python SDK that wraps the underlying
|
||||
service invocations in a convenient Python API.
|
||||
|
||||
Pros:
|
||||
- Native Python integration with type hints and documentation
|
||||
- Simplified service invocation without manual message handling
|
||||
- Built-in error handling and response parsing
|
||||
- Convenient for Python-based applications and scripts
|
||||
|
||||
Cons:
|
||||
- Python-specific, not available for other programming languages
|
||||
- Requires Python environment and trustgraph-base package installation
|
||||
- Less control over low-level message handling
|
||||
|
||||
## Flow-hosted APIs
|
||||
|
||||
There are two types of APIs: Flow-hosted which need a flow to be running
|
||||
to operate. Non-flow-hosted which are core to the system, and can
|
||||
be seen as 'global' - they are not dependent on a flow to be running.
|
||||
|
||||
Knowledge, Librarian, Config and Flow APIs fall into the latter
|
||||
category.
|
||||
|
||||
## See also
|
||||
|
||||
- [TrustGraph websocket overview](websocket.md)
|
||||
|
|
@ -64,9 +91,19 @@ Cons:
|
|||
- [Text completion](api-text-completion.md)
|
||||
- [Prompt completion](api-prompt.md)
|
||||
- [Graph RAG](api-graph-rag.md)
|
||||
- [Document RAG](api-document-rag.md)
|
||||
- [Agent](api-agent.md)
|
||||
- [Embeddings](api-embeddings.md)
|
||||
- [Graph embeddings](api-graph-embeddings.md)
|
||||
- [Document embeddings](api-document-embeddings.md)
|
||||
- [Entity contexts](api-entity-contexts.md)
|
||||
- [Triples query](api-triples-query.md)
|
||||
- [Document load](api-document-load.md)
|
||||
- [Text load](api-text-load.md)
|
||||
- [Config](api-config.md)
|
||||
- [Flow](api-flow.md)
|
||||
- [Librarian](api-librarian.md)
|
||||
- [Knowledge](api-knowledge.md)
|
||||
- [Metrics](api-metrics.md)
|
||||
- [Core import/export](api-core-import-export.md)
|
||||
|
||||
|
|
|
|||
|
|
@ -18,7 +18,7 @@ The request contains the following fields:
|
|||
|
||||
### Response
|
||||
|
||||
The request contains the following fields:
|
||||
The response contains the following fields:
|
||||
- `thought`: Optional, a string, provides an interim agent thought
|
||||
- `observation`: Optional, a string, provides an interim agent thought
|
||||
- `answer`: Optional, a string, provides the final answer
|
||||
|
|
@ -61,6 +61,7 @@ Request:
|
|||
{
|
||||
"id": "blrqotfefnmnh7de-20",
|
||||
"service": "agent",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"question": "What does NASA stand for?"
|
||||
}
|
||||
|
|
|
|||
261
docs/apis/api-config.md
Normal file
261
docs/apis/api-config.md
Normal file
|
|
@ -0,0 +1,261 @@
|
|||
# TrustGraph Config API
|
||||
|
||||
This API provides centralized configuration management for TrustGraph components.
|
||||
Configuration data is organized hierarchically by type and key, with support for
|
||||
persistent storage and push notifications.
|
||||
|
||||
## Request/response
|
||||
|
||||
### Request
|
||||
|
||||
The request contains the following fields:
|
||||
- `operation`: The operation to perform (`get`, `list`, `getvalues`, `put`, `delete`, `config`)
|
||||
- `keys`: Array of ConfigKey objects (for `get`, `delete` operations)
|
||||
- `type`: Configuration type (for `list`, `getvalues` operations)
|
||||
- `values`: Array of ConfigValue objects (for `put` operation)
|
||||
|
||||
### Response
|
||||
|
||||
The response contains the following fields:
|
||||
- `version`: Version number for tracking changes
|
||||
- `values`: Array of ConfigValue objects returned by operations
|
||||
- `directory`: Array of key names returned by `list` operation
|
||||
- `config`: Full configuration map returned by `config` operation
|
||||
- `error`: Error information if operation fails
|
||||
|
||||
## Operations
|
||||
|
||||
### PUT - Store Configuration Values
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "put",
|
||||
"values": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1",
|
||||
"value": "value1"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"version": 123
|
||||
}
|
||||
```
|
||||
|
||||
### GET - Retrieve Configuration Values
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "get",
|
||||
"keys": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"version": 123,
|
||||
"values": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1",
|
||||
"value": "value1"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### LIST - List Keys by Type
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "list",
|
||||
"type": "test"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"version": 123,
|
||||
"directory": ["key1", "key2", "key3"]
|
||||
}
|
||||
```
|
||||
|
||||
### GETVALUES - Get All Values by Type
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "getvalues",
|
||||
"type": "test"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"version": 123,
|
||||
"values": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1",
|
||||
"value": "value1"
|
||||
},
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key2",
|
||||
"value": "value2"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### CONFIG - Get Entire Configuration
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "config"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"version": 123,
|
||||
"config": {
|
||||
"test": {
|
||||
"key1": "value1",
|
||||
"key2": "value2"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### DELETE - Remove Configuration Values
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "delete",
|
||||
"keys": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"version": 124
|
||||
}
|
||||
```
|
||||
|
||||
## REST service
|
||||
|
||||
The REST service is available at `/api/v1/config` and accepts the above request formats.
|
||||
|
||||
## Websocket
|
||||
|
||||
Requests have a `request` object containing the operation fields.
|
||||
Responses have a `response` object containing the response fields.
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"service": "config",
|
||||
"request": {
|
||||
"operation": "get",
|
||||
"keys": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"response": {
|
||||
"version": 123,
|
||||
"values": [
|
||||
{
|
||||
"type": "test",
|
||||
"key": "key1",
|
||||
"value": "value1"
|
||||
}
|
||||
]
|
||||
},
|
||||
"complete": true
|
||||
}
|
||||
```
|
||||
|
||||
## Pulsar
|
||||
|
||||
The Pulsar schema for the Config API is defined in Python code here:
|
||||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/config.py
|
||||
|
||||
Default request queue:
|
||||
`non-persistent://tg/request/config`
|
||||
|
||||
Default response queue:
|
||||
`non-persistent://tg/response/config`
|
||||
|
||||
Request schema:
|
||||
`trustgraph.schema.ConfigRequest`
|
||||
|
||||
Response schema:
|
||||
`trustgraph.schema.ConfigResponse`
|
||||
|
||||
## Python SDK
|
||||
|
||||
The Python SDK provides convenient access to the Config API:
|
||||
|
||||
```python
|
||||
from trustgraph.api.config import ConfigClient
|
||||
|
||||
client = ConfigClient()
|
||||
|
||||
# Put a value
|
||||
await client.put("test", "key1", "value1")
|
||||
|
||||
# Get a value
|
||||
value = await client.get("test", "key1")
|
||||
|
||||
# List keys
|
||||
keys = await client.list("test")
|
||||
|
||||
# Get all values for a type
|
||||
values = await client.get_values("test")
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
- **Hierarchical Organization**: Configuration organized by type and key
|
||||
- **Versioning**: Each operation returns a version number for change tracking
|
||||
- **Persistent Storage**: Data stored in Cassandra for persistence
|
||||
- **Push Notifications**: Configuration changes pushed to subscribers
|
||||
- **Multiple Access Methods**: Available via Pulsar, REST, WebSocket, and Python SDK
|
||||
324
docs/apis/api-core-import-export.md
Normal file
324
docs/apis/api-core-import-export.md
Normal file
|
|
@ -0,0 +1,324 @@
|
|||
# TrustGraph Core Import/Export API
|
||||
|
||||
This API provides bulk import and export capabilities for TrustGraph knowledge cores.
|
||||
It handles efficient transfer of both RDF triples and graph embeddings using MessagePack
|
||||
binary format for high-performance data exchange.
|
||||
|
||||
## Overview
|
||||
|
||||
The Core Import/Export API enables:
|
||||
- **Bulk Import**: Import large knowledge cores from binary streams
|
||||
- **Bulk Export**: Export knowledge cores as binary streams
|
||||
- **Efficient Format**: Uses MessagePack for compact, fast serialization
|
||||
- **Dual Data Types**: Handles both RDF triples and graph embeddings
|
||||
- **Streaming**: Supports streaming for large datasets
|
||||
|
||||
## Import Endpoint
|
||||
|
||||
**Endpoint:** `POST /api/v1/import-core`
|
||||
|
||||
**Query Parameters:**
|
||||
- `id`: Knowledge core identifier
|
||||
- `user`: User identifier
|
||||
|
||||
**Content-Type:** `application/octet-stream`
|
||||
|
||||
**Request Body:** MessagePack-encoded binary stream
|
||||
|
||||
### Import Process
|
||||
|
||||
1. **Stream Processing**: Reads binary data in 128KB chunks
|
||||
2. **MessagePack Decoding**: Unpacks binary data into structured messages
|
||||
3. **Knowledge Storage**: Stores data via Knowledge API
|
||||
4. **Response**: Returns success/error status
|
||||
|
||||
### Import Data Format
|
||||
|
||||
The import stream contains MessagePack-encoded tuples with type indicators:
|
||||
|
||||
#### Triples Data
|
||||
```python
|
||||
("t", {
|
||||
"m": { # metadata
|
||||
"i": "core-id",
|
||||
"m": [], # metadata triples
|
||||
"u": "user",
|
||||
"c": "collection"
|
||||
},
|
||||
"t": [ # triples array
|
||||
{
|
||||
"s": {"value": "subject", "is_uri": true},
|
||||
"p": {"value": "predicate", "is_uri": true},
|
||||
"o": {"value": "object", "is_uri": false}
|
||||
}
|
||||
]
|
||||
})
|
||||
```
|
||||
|
||||
#### Graph Embeddings Data
|
||||
```python
|
||||
("ge", {
|
||||
"m": { # metadata
|
||||
"i": "core-id",
|
||||
"m": [], # metadata triples
|
||||
"u": "user",
|
||||
"c": "collection"
|
||||
},
|
||||
"e": [ # entities array
|
||||
{
|
||||
"e": {"value": "entity", "is_uri": true},
|
||||
"v": [[0.1, 0.2, 0.3]] # vectors
|
||||
}
|
||||
]
|
||||
})
|
||||
```
|
||||
|
||||
## Export Endpoint
|
||||
|
||||
**Endpoint:** `GET /api/v1/export-core`
|
||||
|
||||
**Query Parameters:**
|
||||
- `id`: Knowledge core identifier
|
||||
- `user`: User identifier
|
||||
|
||||
**Content-Type:** `application/octet-stream`
|
||||
|
||||
**Response Body:** MessagePack-encoded binary stream
|
||||
|
||||
### Export Process
|
||||
|
||||
1. **Knowledge Retrieval**: Fetches data via Knowledge API
|
||||
2. **MessagePack Encoding**: Encodes data into binary format
|
||||
3. **Streaming Response**: Sends data as binary stream
|
||||
4. **Type Identification**: Uses type prefixes for data classification
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Import Knowledge Core
|
||||
|
||||
```bash
|
||||
# Import from file
|
||||
curl -X POST \
|
||||
-H "Authorization: Bearer your-token" \
|
||||
-H "Content-Type: application/octet-stream" \
|
||||
--data-binary @knowledge-core.msgpack \
|
||||
"http://api-gateway:8080/api/v1/import-core?id=core-123&user=alice"
|
||||
```
|
||||
|
||||
### Export Knowledge Core
|
||||
|
||||
```bash
|
||||
# Export to file
|
||||
curl -H "Authorization: Bearer your-token" \
|
||||
"http://api-gateway:8080/api/v1/export-core?id=core-123&user=alice" \
|
||||
-o knowledge-core.msgpack
|
||||
```
|
||||
|
||||
## Python Integration
|
||||
|
||||
### Import Example
|
||||
|
||||
```python
|
||||
import msgpack
|
||||
import requests
|
||||
|
||||
def import_knowledge_core(core_id, user, triples_data, embeddings_data, token):
|
||||
# Prepare data
|
||||
messages = []
|
||||
|
||||
# Add triples
|
||||
if triples_data:
|
||||
messages.append(("t", {
|
||||
"m": {
|
||||
"i": core_id,
|
||||
"m": [],
|
||||
"u": user,
|
||||
"c": "default"
|
||||
},
|
||||
"t": triples_data
|
||||
}))
|
||||
|
||||
# Add embeddings
|
||||
if embeddings_data:
|
||||
messages.append(("ge", {
|
||||
"m": {
|
||||
"i": core_id,
|
||||
"m": [],
|
||||
"u": user,
|
||||
"c": "default"
|
||||
},
|
||||
"e": embeddings_data
|
||||
}))
|
||||
|
||||
# Pack data
|
||||
binary_data = b''.join(msgpack.packb(msg) for msg in messages)
|
||||
|
||||
# Upload
|
||||
response = requests.post(
|
||||
f"http://api-gateway:8080/api/v1/import-core?id={core_id}&user={user}",
|
||||
headers={
|
||||
"Authorization": f"Bearer {token}",
|
||||
"Content-Type": "application/octet-stream"
|
||||
},
|
||||
data=binary_data
|
||||
)
|
||||
|
||||
return response.status_code == 200
|
||||
|
||||
# Usage
|
||||
triples = [
|
||||
{
|
||||
"s": {"value": "Person1", "is_uri": True},
|
||||
"p": {"value": "hasName", "is_uri": True},
|
||||
"o": {"value": "John Doe", "is_uri": False}
|
||||
}
|
||||
]
|
||||
|
||||
embeddings = [
|
||||
{
|
||||
"e": {"value": "Person1", "is_uri": True},
|
||||
"v": [[0.1, 0.2, 0.3, 0.4]]
|
||||
}
|
||||
]
|
||||
|
||||
success = import_knowledge_core("core-123", "alice", triples, embeddings, "your-token")
|
||||
```
|
||||
|
||||
### Export Example
|
||||
|
||||
```python
|
||||
import msgpack
|
||||
import requests
|
||||
|
||||
def export_knowledge_core(core_id, user, token):
|
||||
response = requests.get(
|
||||
f"http://api-gateway:8080/api/v1/export-core?id={core_id}&user={user}",
|
||||
headers={"Authorization": f"Bearer {token}"}
|
||||
)
|
||||
|
||||
if response.status_code != 200:
|
||||
return None
|
||||
|
||||
# Decode MessagePack stream
|
||||
data = response.content
|
||||
unpacker = msgpack.Unpacker()
|
||||
unpacker.feed(data)
|
||||
|
||||
triples = []
|
||||
embeddings = []
|
||||
|
||||
for unpacked in unpacker:
|
||||
msg_type, msg_data = unpacked
|
||||
|
||||
if msg_type == "t":
|
||||
triples.extend(msg_data["t"])
|
||||
elif msg_type == "ge":
|
||||
embeddings.extend(msg_data["e"])
|
||||
|
||||
return {
|
||||
"triples": triples,
|
||||
"embeddings": embeddings
|
||||
}
|
||||
|
||||
# Usage
|
||||
data = export_knowledge_core("core-123", "alice", "your-token")
|
||||
if data:
|
||||
print(f"Exported {len(data['triples'])} triples")
|
||||
print(f"Exported {len(data['embeddings'])} embeddings")
|
||||
```
|
||||
|
||||
## Data Format Specification
|
||||
|
||||
### MessagePack Tuples
|
||||
|
||||
Each message is a tuple: `(type_indicator, data_object)`
|
||||
|
||||
**Type Indicators:**
|
||||
- `"t"`: RDF triples data
|
||||
- `"ge"`: Graph embeddings data
|
||||
|
||||
### Metadata Structure
|
||||
|
||||
```python
|
||||
{
|
||||
"i": "core-identifier", # ID
|
||||
"m": [...], # Metadata triples array
|
||||
"u": "user-identifier", # User
|
||||
"c": "collection-name" # Collection
|
||||
}
|
||||
```
|
||||
|
||||
### Triple Structure
|
||||
|
||||
```python
|
||||
{
|
||||
"s": {"value": "subject", "is_uri": boolean},
|
||||
"p": {"value": "predicate", "is_uri": boolean},
|
||||
"o": {"value": "object", "is_uri": boolean}
|
||||
}
|
||||
```
|
||||
|
||||
### Entity Embedding Structure
|
||||
|
||||
```python
|
||||
{
|
||||
"e": {"value": "entity", "is_uri": boolean},
|
||||
"v": [[float, float, ...]] # Array of vectors
|
||||
}
|
||||
```
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Import Performance
|
||||
- **Streaming**: Processes data in 128KB chunks
|
||||
- **Memory Efficient**: Incremental unpacking
|
||||
- **Concurrent**: Multiple imports can run simultaneously
|
||||
- **Error Handling**: Robust error recovery
|
||||
|
||||
### Export Performance
|
||||
- **Direct Streaming**: Data streamed directly from knowledge store
|
||||
- **Efficient Encoding**: MessagePack for minimal overhead
|
||||
- **Large Dataset Support**: Handles cores of any size
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Import Errors
|
||||
- **Format Errors**: Invalid MessagePack data
|
||||
- **Type Errors**: Unknown type indicators
|
||||
- **Storage Errors**: Knowledge API failures
|
||||
- **Authentication**: Invalid user credentials
|
||||
|
||||
### Export Errors
|
||||
- **Not Found**: Core ID doesn't exist
|
||||
- **Access Denied**: User lacks permissions
|
||||
- **System Errors**: Knowledge API failures
|
||||
|
||||
### Error Responses
|
||||
- **HTTP 400**: Bad request (invalid parameters)
|
||||
- **HTTP 401**: Unauthorized access
|
||||
- **HTTP 404**: Core not found
|
||||
- **HTTP 500**: Internal server error
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Data Migration
|
||||
- **System Upgrades**: Export/import during system migrations
|
||||
- **Environment Sync**: Copy cores between environments
|
||||
- **Backup/Restore**: Full knowledge core backup operations
|
||||
|
||||
### Batch Processing
|
||||
- **Bulk Loading**: Load large knowledge datasets efficiently
|
||||
- **Data Integration**: Merge knowledge from multiple sources
|
||||
- **ETL Pipelines**: Extract-Transform-Load operations
|
||||
|
||||
### Performance Optimization
|
||||
- **Faster Than REST**: Binary format reduces transfer time
|
||||
- **Atomic Operations**: Complete import/export as single operation
|
||||
- **Resource Efficient**: Minimal memory footprint during transfer
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- **Authentication Required**: Bearer token authentication
|
||||
- **User Isolation**: Access restricted to user's own cores
|
||||
- **Data Validation**: Input validation on import
|
||||
- **Audit Logging**: Operations logged for security auditing
|
||||
252
docs/apis/api-document-embeddings.md
Normal file
252
docs/apis/api-document-embeddings.md
Normal file
|
|
@ -0,0 +1,252 @@
|
|||
# TrustGraph Document Embeddings API
|
||||
|
||||
This API provides import, export, and query capabilities for document embeddings. It handles
|
||||
document chunks with their vector embeddings and metadata, supporting both real-time WebSocket
|
||||
operations and request/response patterns.
|
||||
|
||||
## Schema Overview
|
||||
|
||||
### DocumentEmbeddings Structure
|
||||
- `metadata`: Document metadata (ID, user, collection, RDF triples)
|
||||
- `chunks`: Array of document chunks with embeddings
|
||||
|
||||
### ChunkEmbeddings Structure
|
||||
- `chunk`: Text chunk as bytes
|
||||
- `vectors`: Array of vector embeddings (Array of Array of Double)
|
||||
|
||||
### DocumentEmbeddingsRequest Structure
|
||||
- `vectors`: Query vector embeddings
|
||||
- `limit`: Maximum number of results
|
||||
- `user`: User identifier
|
||||
- `collection`: Collection identifier
|
||||
|
||||
### DocumentEmbeddingsResponse Structure
|
||||
- `error`: Error information if operation fails
|
||||
- `documents`: Array of matching documents as bytes
|
||||
|
||||
## Import/Export Operations
|
||||
|
||||
### Import - WebSocket Endpoint
|
||||
|
||||
**Endpoint:** `/api/v1/flow/{flow}/import/document-embeddings`
|
||||
|
||||
**Method:** WebSocket connection
|
||||
|
||||
**Request Format:**
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"id": "doc-123",
|
||||
"user": "alice",
|
||||
"collection": "research",
|
||||
"metadata": [
|
||||
{
|
||||
"s": {"v": "doc-123", "e": true},
|
||||
"p": {"v": "dc:title", "e": true},
|
||||
"o": {"v": "Research Paper", "e": false}
|
||||
}
|
||||
]
|
||||
},
|
||||
"chunks": [
|
||||
{
|
||||
"chunk": "This is the first chunk of the document...",
|
||||
"vectors": [
|
||||
[0.1, 0.2, 0.3, 0.4],
|
||||
[0.5, 0.6, 0.7, 0.8]
|
||||
]
|
||||
},
|
||||
{
|
||||
"chunk": "This is the second chunk...",
|
||||
"vectors": [
|
||||
[0.9, 0.8, 0.7, 0.6],
|
||||
[0.5, 0.4, 0.3, 0.2]
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Response:** Import operations are fire-and-forget with no response payload.
|
||||
|
||||
### Export - WebSocket Endpoint
|
||||
|
||||
**Endpoint:** `/api/v1/flow/{flow}/export/document-embeddings`
|
||||
|
||||
**Method:** WebSocket connection
|
||||
|
||||
The export endpoint streams document embeddings data in real-time. Each message contains:
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"id": "doc-123",
|
||||
"user": "alice",
|
||||
"collection": "research",
|
||||
"metadata": [
|
||||
{
|
||||
"s": {"v": "doc-123", "e": true},
|
||||
"p": {"v": "dc:title", "e": true},
|
||||
"o": {"v": "Research Paper", "e": false}
|
||||
}
|
||||
]
|
||||
},
|
||||
"chunks": [
|
||||
{
|
||||
"chunk": "Decoded text content of chunk",
|
||||
"vectors": [[0.1, 0.2, 0.3, 0.4]]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Query Operations
|
||||
|
||||
### Query Document Embeddings
|
||||
|
||||
**Purpose:** Find documents similar to provided vector embeddings
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"vectors": [
|
||||
[0.1, 0.2, 0.3, 0.4, 0.5],
|
||||
[0.6, 0.7, 0.8, 0.9, 1.0]
|
||||
],
|
||||
"limit": 10,
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"documents": [
|
||||
"base64-encoded-document-1",
|
||||
"base64-encoded-document-2"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## WebSocket Usage Examples
|
||||
|
||||
### Importing Document Embeddings
|
||||
|
||||
```javascript
|
||||
// Connect to import endpoint
|
||||
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/import/document-embeddings');
|
||||
|
||||
// Send document embeddings
|
||||
ws.send(JSON.stringify({
|
||||
metadata: {
|
||||
id: "doc-123",
|
||||
user: "alice",
|
||||
collection: "research"
|
||||
},
|
||||
chunks: [
|
||||
{
|
||||
chunk: "Document content chunk 1",
|
||||
vectors: [[0.1, 0.2, 0.3]]
|
||||
}
|
||||
]
|
||||
}));
|
||||
```
|
||||
|
||||
### Exporting Document Embeddings
|
||||
|
||||
```javascript
|
||||
// Connect to export endpoint
|
||||
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/export/document-embeddings');
|
||||
|
||||
// Listen for exported data
|
||||
ws.onmessage = (event) => {
|
||||
const documentEmbeddings = JSON.parse(event.data);
|
||||
console.log('Received document:', documentEmbeddings.metadata.id);
|
||||
console.log('Chunks:', documentEmbeddings.chunks.length);
|
||||
};
|
||||
```
|
||||
|
||||
## Data Format Details
|
||||
|
||||
### Metadata Format
|
||||
Each metadata triple contains:
|
||||
- `s`: Subject (object with `v` for value and `e` for is_entity boolean)
|
||||
- `p`: Predicate (object with `v` for value and `e` for is_entity boolean)
|
||||
- `o`: Object (object with `v` for value and `e` for is_entity boolean)
|
||||
|
||||
### Vector Format
|
||||
- Vectors are arrays of floating-point numbers
|
||||
- Each chunk can have multiple vectors (different embedding models)
|
||||
- Vectors should be consistently dimensioned within a collection
|
||||
|
||||
### Text Encoding
|
||||
- Chunk text is handled as UTF-8 encoded bytes internally
|
||||
- WebSocket API accepts/returns plain text strings
|
||||
- Base64 encoding used for binary data in query responses
|
||||
|
||||
## Python SDK
|
||||
|
||||
```python
|
||||
from trustgraph.clients.document_embeddings_client import DocumentEmbeddingsClient
|
||||
|
||||
# Create client
|
||||
client = DocumentEmbeddingsClient()
|
||||
|
||||
# Query similar documents
|
||||
request = {
|
||||
"vectors": [[0.1, 0.2, 0.3, 0.4]],
|
||||
"limit": 5,
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
}
|
||||
|
||||
response = await client.query(request)
|
||||
documents = response.documents
|
||||
```
|
||||
|
||||
## Integration with TrustGraph
|
||||
|
||||
### Storage Integration
|
||||
- Document embeddings are stored in vector databases
|
||||
- Metadata is cross-referenced with knowledge graph
|
||||
- Supports multi-tenant isolation by user and collection
|
||||
|
||||
### Processing Pipeline
|
||||
1. **Document Ingestion**: Text documents loaded via text-load API
|
||||
2. **Chunking**: Documents split into manageable chunks
|
||||
3. **Embedding Generation**: Vector embeddings created for each chunk
|
||||
4. **Storage**: Embeddings stored via import API
|
||||
5. **Retrieval**: Similar documents found via query API
|
||||
|
||||
### Use Cases
|
||||
- **Semantic Search**: Find documents similar to query embeddings
|
||||
- **RAG Systems**: Retrieve relevant document chunks for question answering
|
||||
- **Document Clustering**: Group similar documents using embeddings
|
||||
- **Content Recommendations**: Suggest related documents to users
|
||||
- **Knowledge Discovery**: Find connections between document collections
|
||||
|
||||
## Error Handling
|
||||
|
||||
Common error scenarios:
|
||||
- Invalid vector dimensions
|
||||
- Missing required metadata fields
|
||||
- User/collection access restrictions
|
||||
- WebSocket connection failures
|
||||
- Malformed JSON data
|
||||
|
||||
Errors are returned in the response `error` field:
|
||||
```json
|
||||
{
|
||||
"error": {
|
||||
"type": "ValidationError",
|
||||
"message": "Invalid vector dimensions"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
- **Batch Processing**: Import multiple documents in single WebSocket session
|
||||
- **Vector Dimensions**: Consistent embedding dimensions improve performance
|
||||
- **Collection Sizing**: Limit collections to reasonable sizes for query performance
|
||||
- **Real-time vs Batch**: Choose appropriate method based on use case requirements
|
||||
96
docs/apis/api-document-rag.md
Normal file
96
docs/apis/api-document-rag.md
Normal file
|
|
@ -0,0 +1,96 @@
|
|||
# TrustGraph Document RAG API
|
||||
|
||||
This presents a prompt to the Document RAG service and retrieves the answer.
|
||||
This makes use of a number of the other APIs behind the scenes:
|
||||
Embeddings, Document Embeddings, Prompt, TextCompletion, Triples Query.
|
||||
|
||||
## Request/response
|
||||
|
||||
### Request
|
||||
|
||||
The request contains the following fields:
|
||||
- `query`: The question to answer
|
||||
|
||||
### Response
|
||||
|
||||
The response contains the following fields:
|
||||
- `response`: LLM response
|
||||
|
||||
## REST service
|
||||
|
||||
The REST service accepts a request object containing the `query` field.
|
||||
The response is a JSON object containing the `response` field.
|
||||
|
||||
e.g.
|
||||
|
||||
Request:
|
||||
```
|
||||
{
|
||||
"query": "What does NASA stand for?"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```
|
||||
{
|
||||
"response": "National Aeronautics and Space Administration"
|
||||
}
|
||||
```
|
||||
|
||||
## Websocket
|
||||
|
||||
Requests have a `request` object containing the `query` field.
|
||||
Responses have a `response` object containing `response` field.
|
||||
|
||||
e.g.
|
||||
|
||||
Request:
|
||||
|
||||
```
|
||||
{
|
||||
"id": "blrqotfefnmnh7de-14",
|
||||
"service": "document-rag",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"query": "What does NASA stand for?"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```
|
||||
{
|
||||
"id": "blrqotfefnmnh7de-14",
|
||||
"response": {
|
||||
"response": "National Aeronautics and Space Administration"
|
||||
},
|
||||
"complete": true
|
||||
}
|
||||
```
|
||||
|
||||
## Pulsar
|
||||
|
||||
The Pulsar schema for the Document RAG API is defined in Python code here:
|
||||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/retrieval.py
|
||||
|
||||
Default request queue:
|
||||
`non-persistent://tg/request/document-rag`
|
||||
|
||||
Default response queue:
|
||||
`non-persistent://tg/response/document-rag`
|
||||
|
||||
Request schema:
|
||||
`trustgraph.schema.DocumentRagQuery`
|
||||
|
||||
Response schema:
|
||||
`trustgraph.schema.DocumentRagResponse`
|
||||
|
||||
## Pulsar Python client
|
||||
|
||||
The client class is
|
||||
`trustgraph.clients.DocumentRagClient`
|
||||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/clients/document_rag_client.py
|
||||
|
|
@ -10,7 +10,7 @@ The request contains the following fields:
|
|||
|
||||
### Response
|
||||
|
||||
The request contains the following fields:
|
||||
The response contains the following fields:
|
||||
- `vectors`: Embeddings response, an array of arrays. An embedding is
|
||||
an array of floating-point numbers. As multiple embeddings may be
|
||||
returned, an array of embeddings is returned, hence an array
|
||||
|
|
@ -51,6 +51,7 @@ Request:
|
|||
{
|
||||
"id": "qgzw1287vfjc8wsk-2",
|
||||
"service": "embeddings",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"text": "What is a cat?"
|
||||
}
|
||||
|
|
|
|||
259
docs/apis/api-entity-contexts.md
Normal file
259
docs/apis/api-entity-contexts.md
Normal file
|
|
@ -0,0 +1,259 @@
|
|||
# TrustGraph Entity Contexts API
|
||||
|
||||
This API provides import and export capabilities for entity contexts data. Entity contexts
|
||||
associate entities with their textual context information, commonly used for entity
|
||||
descriptions, definitions, or explanatory text in knowledge graphs.
|
||||
|
||||
## Schema Overview
|
||||
|
||||
### EntityContext Structure
|
||||
- `entity`: Entity identifier (Value object with value, is_uri, type)
|
||||
- `context`: Textual context or description string
|
||||
|
||||
### EntityContexts Structure
|
||||
- `metadata`: Metadata including ID, user, collection, and RDF triples
|
||||
- `entities`: Array of EntityContext objects
|
||||
|
||||
### Value Structure
|
||||
- `value`: The entity value as string
|
||||
- `is_uri`: Boolean indicating if the value is a URI
|
||||
- `type`: Data type of the value (optional)
|
||||
|
||||
## Import/Export Operations
|
||||
|
||||
### Import - WebSocket Endpoint
|
||||
|
||||
**Endpoint:** `/api/v1/flow/{flow}/import/entity-contexts`
|
||||
|
||||
**Method:** WebSocket connection
|
||||
|
||||
**Request Format:**
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"id": "context-batch-123",
|
||||
"user": "alice",
|
||||
"collection": "research",
|
||||
"metadata": [
|
||||
{
|
||||
"s": {"value": "source-doc", "is_uri": true},
|
||||
"p": {"value": "dc:title", "is_uri": true},
|
||||
"o": {"value": "Research Paper", "is_uri": false}
|
||||
}
|
||||
]
|
||||
},
|
||||
"entities": [
|
||||
{
|
||||
"entity": {
|
||||
"v": "https://example.com/Person/JohnDoe",
|
||||
"e": true
|
||||
},
|
||||
"context": "John Doe is a researcher at MIT specializing in artificial intelligence and machine learning."
|
||||
},
|
||||
{
|
||||
"entity": {
|
||||
"v": "https://example.com/Organization/MIT",
|
||||
"e": true
|
||||
},
|
||||
"context": "Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts."
|
||||
},
|
||||
{
|
||||
"entity": {
|
||||
"v": "machine learning",
|
||||
"e": false
|
||||
},
|
||||
"context": "Machine learning is a method of data analysis that automates analytical model building using algorithms."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Response:** Import operations are fire-and-forget with no response payload.
|
||||
|
||||
### Export - WebSocket Endpoint
|
||||
|
||||
**Endpoint:** `/api/v1/flow/{flow}/export/entity-contexts`
|
||||
|
||||
**Method:** WebSocket connection
|
||||
|
||||
The export endpoint streams entity contexts data in real-time. Each message contains:
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"id": "context-batch-123",
|
||||
"user": "alice",
|
||||
"collection": "research",
|
||||
"metadata": [
|
||||
{
|
||||
"s": {"value": "source-doc", "is_uri": true},
|
||||
"p": {"value": "dc:title", "is_uri": true},
|
||||
"o": {"value": "Research Paper", "is_uri": false}
|
||||
}
|
||||
]
|
||||
},
|
||||
"entities": [
|
||||
{
|
||||
"entity": {
|
||||
"v": "https://example.com/Person/JohnDoe",
|
||||
"e": true
|
||||
},
|
||||
"context": "John Doe is a researcher at MIT specializing in artificial intelligence."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## WebSocket Usage Examples
|
||||
|
||||
### Importing Entity Contexts
|
||||
|
||||
```javascript
|
||||
// Connect to import endpoint
|
||||
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/import/entity-contexts');
|
||||
|
||||
// Send entity contexts
|
||||
ws.send(JSON.stringify({
|
||||
metadata: {
|
||||
id: "context-batch-1",
|
||||
user: "alice",
|
||||
collection: "research"
|
||||
},
|
||||
entities: [
|
||||
{
|
||||
entity: {
|
||||
v: "Albert Einstein",
|
||||
e: false
|
||||
},
|
||||
context: "Albert Einstein was a German-born theoretical physicist widely acknowledged to be one of the greatest physicists of all time."
|
||||
}
|
||||
]
|
||||
}));
|
||||
```
|
||||
|
||||
### Exporting Entity Contexts
|
||||
|
||||
```javascript
|
||||
// Connect to export endpoint
|
||||
const ws = new WebSocket('ws://api-gateway:8080/api/v1/flow/my-flow/export/entity-contexts');
|
||||
|
||||
// Listen for exported data
|
||||
ws.onmessage = (event) => {
|
||||
const entityContexts = JSON.parse(event.data);
|
||||
console.log('Received contexts for', entityContexts.entities.length, 'entities');
|
||||
|
||||
entityContexts.entities.forEach(item => {
|
||||
console.log('Entity:', item.entity.v);
|
||||
console.log('Context:', item.context);
|
||||
});
|
||||
};
|
||||
```
|
||||
|
||||
## Data Format Details
|
||||
|
||||
### Entity Format
|
||||
The `entity` field uses the Value structure:
|
||||
- `v`: The entity value (name, URI, identifier)
|
||||
- `e`: Boolean indicating if it's a URI entity (true) or literal (false)
|
||||
- `type`: Optional data type specification
|
||||
|
||||
### Context Format
|
||||
- Plain text string providing description or context
|
||||
- Can include definitions, explanations, or background information
|
||||
- Supports multi-sentence descriptions and detailed context
|
||||
|
||||
### Metadata Format
|
||||
Each metadata triple contains:
|
||||
- `s`: Subject (object with `value` and `is_uri` fields)
|
||||
- `p`: Predicate (object with `value` and `is_uri` fields)
|
||||
- `o`: Object (object with `value` and `is_uri` fields)
|
||||
|
||||
## Integration with TrustGraph
|
||||
|
||||
### Storage Integration
|
||||
- Entity contexts are stored in graph databases
|
||||
- Links entities to their descriptive text
|
||||
- Supports multi-tenant isolation by user and collection
|
||||
|
||||
### Processing Pipeline
|
||||
1. **Text Analysis**: Extract entities from documents
|
||||
2. **Context Extraction**: Identify descriptive text for entities
|
||||
3. **Entity Linking**: Associate entities with their contexts
|
||||
4. **Import**: Store entity-context pairs via import API
|
||||
5. **Knowledge Enhancement**: Use contexts for better entity understanding
|
||||
|
||||
### Use Cases
|
||||
- **Entity Disambiguation**: Provide context to distinguish similar entities
|
||||
- **Knowledge Base Enhancement**: Add descriptive information to entities
|
||||
- **Question Answering**: Use entity contexts to provide detailed answers
|
||||
- **Entity Summarization**: Generate summaries based on collected contexts
|
||||
- **Knowledge Graph Visualization**: Display rich entity information
|
||||
|
||||
## Authentication
|
||||
|
||||
Both import and export endpoints support authentication:
|
||||
- API token authentication via Authorization header
|
||||
- Flow-based access control
|
||||
- User and collection isolation
|
||||
|
||||
## Error Handling
|
||||
|
||||
Common error scenarios:
|
||||
- Invalid JSON format
|
||||
- Missing required metadata fields
|
||||
- User/collection access restrictions
|
||||
- WebSocket connection failures
|
||||
- Invalid entity value formats
|
||||
|
||||
Errors are typically handled at the WebSocket connection level with connection termination or error messages.
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
- **Batch Processing**: Import multiple entity contexts in single messages
|
||||
- **Context Length**: Balance detailed context with performance
|
||||
- **Flow Capacity**: Ensure target flow can handle entity context volume
|
||||
- **Real-time vs Batch**: Choose appropriate method based on use case
|
||||
|
||||
## Python Integration
|
||||
|
||||
While no direct Python SDK is mentioned in the codebase, integration can be achieved through:
|
||||
|
||||
```python
|
||||
import websocket
|
||||
import json
|
||||
|
||||
# Connect to import endpoint
|
||||
def import_entity_contexts(flow_id, contexts_data):
|
||||
ws_url = f"ws://api-gateway:8080/api/v1/flow/{flow_id}/import/entity-contexts"
|
||||
ws = websocket.create_connection(ws_url)
|
||||
|
||||
# Send data
|
||||
ws.send(json.dumps(contexts_data))
|
||||
ws.close()
|
||||
|
||||
# Usage example
|
||||
contexts = {
|
||||
"metadata": {
|
||||
"id": "batch-1",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
},
|
||||
"entities": [
|
||||
{
|
||||
"entity": {"v": "Neural Networks", "e": False},
|
||||
"context": "Neural networks are computing systems inspired by biological neural networks."
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
import_entity_contexts("my-flow", contexts)
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
- **Real-time Streaming**: WebSocket-based import/export for live data flow
|
||||
- **Batch Operations**: Process multiple entity contexts efficiently
|
||||
- **Rich Metadata**: Full metadata support with RDF triples
|
||||
- **Entity Types**: Support for both URI entities and literal values
|
||||
- **Flow Integration**: Direct integration with TrustGraph processing flows
|
||||
- **Multi-tenant Support**: User and collection-based data isolation
|
||||
252
docs/apis/api-flow.md
Normal file
252
docs/apis/api-flow.md
Normal file
|
|
@ -0,0 +1,252 @@
|
|||
# TrustGraph Flow API
|
||||
|
||||
This API provides workflow management for TrustGraph components. It manages flow classes
|
||||
(workflow templates) and flow instances (active running workflows) that orchestrate
|
||||
complex data processing pipelines.
|
||||
|
||||
## Request/response
|
||||
|
||||
### Request
|
||||
|
||||
The request contains the following fields:
|
||||
- `operation`: The operation to perform (see operations below)
|
||||
- `class_name`: Flow class name (for class operations and start-flow)
|
||||
- `class_definition`: Flow class definition JSON (for put-class)
|
||||
- `description`: Flow description (for start-flow)
|
||||
- `flow_id`: Flow instance ID (for flow instance operations)
|
||||
|
||||
### Response
|
||||
|
||||
The response contains the following fields:
|
||||
- `class_names`: Array of flow class names (returned by list-classes)
|
||||
- `flow_ids`: Array of active flow IDs (returned by list-flows)
|
||||
- `class_definition`: Flow class definition JSON (returned by get-class)
|
||||
- `flow`: Flow instance JSON (returned by get-flow)
|
||||
- `description`: Flow description (returned by get-flow)
|
||||
- `error`: Error information if operation fails
|
||||
|
||||
## Operations
|
||||
|
||||
### Flow Class Operations
|
||||
|
||||
#### LIST-CLASSES - List All Flow Classes
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "list-classes"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"class_names": ["pdf-processor", "text-analyzer", "knowledge-extractor"]
|
||||
}
|
||||
```
|
||||
|
||||
#### GET-CLASS - Get Flow Class Definition
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "get-class",
|
||||
"class_name": "pdf-processor"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"class_definition": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion\", \"response\": \"persistent://tg/response/text-completion\"}}, \"description\": \"PDF processing workflow\"}"
|
||||
}
|
||||
```
|
||||
|
||||
#### PUT-CLASS - Create/Update Flow Class
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "put-class",
|
||||
"class_name": "pdf-processor",
|
||||
"class_definition": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion\", \"response\": \"persistent://tg/response/text-completion\"}}, \"description\": \"PDF processing workflow\"}"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
#### DELETE-CLASS - Remove Flow Class
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "delete-class",
|
||||
"class_name": "pdf-processor"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### Flow Instance Operations
|
||||
|
||||
#### LIST-FLOWS - List Active Flow Instances
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "list-flows"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"flow_ids": ["flow-123", "flow-456", "flow-789"]
|
||||
}
|
||||
```
|
||||
|
||||
#### GET-FLOW - Get Flow Instance
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "get-flow",
|
||||
"flow_id": "flow-123"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"flow": "{\"interfaces\": {\"text-completion\": {\"request\": \"persistent://tg/request/text-completion-flow-123\", \"response\": \"persistent://tg/response/text-completion-flow-123\"}}}",
|
||||
"description": "PDF processing workflow instance"
|
||||
}
|
||||
```
|
||||
|
||||
#### START-FLOW - Start Flow Instance
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "start-flow",
|
||||
"class_name": "pdf-processor",
|
||||
"flow_id": "flow-123",
|
||||
"description": "Processing document batch 1"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
#### STOP-FLOW - Stop Flow Instance
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "stop-flow",
|
||||
"flow_id": "flow-123"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
## REST service
|
||||
|
||||
The REST service is available at `/api/v1/flow` and accepts the above request formats.
|
||||
|
||||
## Websocket
|
||||
|
||||
Requests have a `request` object containing the operation fields.
|
||||
Responses have a `response` object containing the response fields.
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"service": "flow",
|
||||
"request": {
|
||||
"operation": "list-classes"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"response": {
|
||||
"class_names": ["pdf-processor", "text-analyzer"]
|
||||
},
|
||||
"complete": true
|
||||
}
|
||||
```
|
||||
|
||||
## Pulsar
|
||||
|
||||
The Pulsar schema for the Flow API is defined in Python code here:
|
||||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/flows.py
|
||||
|
||||
Default request queue:
|
||||
`non-persistent://tg/request/flow`
|
||||
|
||||
Default response queue:
|
||||
`non-persistent://tg/response/flow`
|
||||
|
||||
Request schema:
|
||||
`trustgraph.schema.FlowRequest`
|
||||
|
||||
Response schema:
|
||||
`trustgraph.schema.FlowResponse`
|
||||
|
||||
## Python SDK
|
||||
|
||||
The Python SDK provides convenient access to the Flow API:
|
||||
|
||||
```python
|
||||
from trustgraph.api.flow import FlowClient
|
||||
|
||||
client = FlowClient()
|
||||
|
||||
# List all flow classes
|
||||
classes = await client.list_classes()
|
||||
|
||||
# Get a flow class definition
|
||||
definition = await client.get_class("pdf-processor")
|
||||
|
||||
# Start a flow instance
|
||||
await client.start_flow("pdf-processor", "flow-123", "Processing batch 1")
|
||||
|
||||
# List active flows
|
||||
flows = await client.list_flows()
|
||||
|
||||
# Stop a flow instance
|
||||
await client.stop_flow("flow-123")
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
- **Flow Classes**: Templates that define workflow structure and interfaces
|
||||
- **Flow Instances**: Active running workflows based on flow classes
|
||||
- **Dynamic Management**: Flows can be started/stopped dynamically
|
||||
- **Template Processing**: Uses template replacement for customizing flow instances
|
||||
- **Integration**: Works with TrustGraph ecosystem for data processing pipelines
|
||||
- **Persistent Storage**: Flow definitions and instances stored for reliability
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **Document Processing**: Orchestrating PDF processing through chunking, extraction, and storage
|
||||
- **Knowledge Extraction**: Managing workflows for relationship and definition extraction
|
||||
- **Data Pipelines**: Coordinating complex multi-step data processing workflows
|
||||
- **Resource Management**: Dynamically scaling processing flows based on demand
|
||||
|
|
@ -17,7 +17,7 @@ The request contains the following fields:
|
|||
|
||||
### Response
|
||||
|
||||
The request contains the following fields:
|
||||
The response contains the following fields:
|
||||
- `entities`: An array of graph entities. The entity type is described here:
|
||||
|
||||
TrustGraph uses the same schema for knowledge graph elements:
|
||||
|
|
@ -85,6 +85,7 @@ Request:
|
|||
{
|
||||
"id": "qgzw1287vfjc8wsk-3",
|
||||
"service": "graph-embeddings-query",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"vectors": [
|
||||
[
|
||||
|
|
|
|||
|
|
@ -14,7 +14,7 @@ The request contains the following fields:
|
|||
|
||||
### Response
|
||||
|
||||
The request contains the following fields:
|
||||
The response contains the following fields:
|
||||
- `response`: LLM response
|
||||
|
||||
## REST service
|
||||
|
|
@ -52,6 +52,7 @@ Request:
|
|||
{
|
||||
"id": "blrqotfefnmnh7de-14",
|
||||
"service": "graph-rag",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"query": "What does NASA stand for?"
|
||||
}
|
||||
|
|
|
|||
310
docs/apis/api-knowledge.md
Normal file
310
docs/apis/api-knowledge.md
Normal file
|
|
@ -0,0 +1,310 @@
|
|||
# TrustGraph Knowledge API
|
||||
|
||||
This API provides knowledge graph management for TrustGraph. It handles storage, retrieval,
|
||||
and flow integration of knowledge cores containing RDF triples and graph embeddings with
|
||||
multi-tenant support.
|
||||
|
||||
## Request/response
|
||||
|
||||
### Request
|
||||
|
||||
The request contains the following fields:
|
||||
- `operation`: The operation to perform (see operations below)
|
||||
- `user`: User identifier (for user-specific operations)
|
||||
- `id`: Knowledge core identifier
|
||||
- `flow`: Flow identifier (for load operations)
|
||||
- `collection`: Collection identifier (for load operations)
|
||||
- `triples`: RDF triples data (for put operations)
|
||||
- `graph_embeddings`: Graph embeddings data (for put operations)
|
||||
|
||||
### Response
|
||||
|
||||
The response contains the following fields:
|
||||
- `error`: Error information if operation fails
|
||||
- `ids`: Array of knowledge core IDs (returned by list operation)
|
||||
- `eos`: End of stream indicator for streaming responses
|
||||
- `triples`: RDF triples data (returned by get operation)
|
||||
- `graph_embeddings`: Graph embeddings data (returned by get operation)
|
||||
|
||||
## Operations
|
||||
|
||||
### PUT-KG-CORE - Store Knowledge Core
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "put-kg-core",
|
||||
"user": "alice",
|
||||
"id": "core-123",
|
||||
"triples": {
|
||||
"metadata": {
|
||||
"id": "core-123",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
},
|
||||
"triples": [
|
||||
{
|
||||
"s": {"value": "Person1", "is_uri": true},
|
||||
"p": {"value": "hasName", "is_uri": true},
|
||||
"o": {"value": "John Doe", "is_uri": false}
|
||||
},
|
||||
{
|
||||
"s": {"value": "Person1", "is_uri": true},
|
||||
"p": {"value": "worksAt", "is_uri": true},
|
||||
"o": {"value": "Company1", "is_uri": true}
|
||||
}
|
||||
]
|
||||
},
|
||||
"graph_embeddings": {
|
||||
"metadata": {
|
||||
"id": "core-123",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
},
|
||||
"entities": [
|
||||
{
|
||||
"entity": {"value": "Person1", "is_uri": true},
|
||||
"vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### GET-KG-CORE - Retrieve Knowledge Core
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "get-kg-core",
|
||||
"id": "core-123"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"triples": {
|
||||
"metadata": {
|
||||
"id": "core-123",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
},
|
||||
"triples": [
|
||||
{
|
||||
"s": {"value": "Person1", "is_uri": true},
|
||||
"p": {"value": "hasName", "is_uri": true},
|
||||
"o": {"value": "John Doe", "is_uri": false}
|
||||
}
|
||||
]
|
||||
},
|
||||
"graph_embeddings": {
|
||||
"metadata": {
|
||||
"id": "core-123",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
},
|
||||
"entities": [
|
||||
{
|
||||
"entity": {"value": "Person1", "is_uri": true},
|
||||
"vectors": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### LIST-KG-CORES - List Knowledge Cores
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "list-kg-cores",
|
||||
"user": "alice"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"ids": ["core-123", "core-456", "core-789"]
|
||||
}
|
||||
```
|
||||
|
||||
### DELETE-KG-CORE - Delete Knowledge Core
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "delete-kg-core",
|
||||
"user": "alice",
|
||||
"id": "core-123"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### LOAD-KG-CORE - Load Knowledge Core into Flow
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "load-kg-core",
|
||||
"id": "core-123",
|
||||
"flow": "qa-flow",
|
||||
"collection": "research"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### UNLOAD-KG-CORE - Unload Knowledge Core from Flow
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "unload-kg-core",
|
||||
"id": "core-123"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
## Data Structures
|
||||
|
||||
### Triple Structure
|
||||
Each RDF triple contains:
|
||||
- `s`: Subject (Value object)
|
||||
- `p`: Predicate (Value object)
|
||||
- `o`: Object (Value object)
|
||||
|
||||
### Value Structure
|
||||
- `value`: The actual value as string
|
||||
- `is_uri`: Boolean indicating if value is a URI
|
||||
- `type`: Data type of the value (optional)
|
||||
|
||||
### Triples Structure
|
||||
- `metadata`: Metadata including ID, user, collection
|
||||
- `triples`: Array of Triple objects
|
||||
|
||||
### Graph Embeddings Structure
|
||||
- `metadata`: Metadata including ID, user, collection
|
||||
- `entities`: Array of EntityEmbeddings objects
|
||||
|
||||
### Entity Embeddings Structure
|
||||
- `entity`: The entity being embedded (Value object)
|
||||
- `vectors`: Array of vector embeddings (Array of Array of Double)
|
||||
|
||||
## REST service
|
||||
|
||||
The REST service is available at `/api/v1/knowledge` and accepts the above request formats.
|
||||
|
||||
## Websocket
|
||||
|
||||
Requests have a `request` object containing the operation fields.
|
||||
Responses have a `response` object containing the response fields.
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"service": "knowledge",
|
||||
"request": {
|
||||
"operation": "list-kg-cores",
|
||||
"user": "alice"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"response": {
|
||||
"ids": ["core-123", "core-456"]
|
||||
},
|
||||
"complete": true
|
||||
}
|
||||
```
|
||||
|
||||
## Pulsar
|
||||
|
||||
The Pulsar schema for the Knowledge API is defined in Python code here:
|
||||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/knowledge.py
|
||||
|
||||
Default request queue:
|
||||
`non-persistent://tg/request/knowledge`
|
||||
|
||||
Default response queue:
|
||||
`non-persistent://tg/response/knowledge`
|
||||
|
||||
Request schema:
|
||||
`trustgraph.schema.KnowledgeRequest`
|
||||
|
||||
Response schema:
|
||||
`trustgraph.schema.KnowledgeResponse`
|
||||
|
||||
## Python SDK
|
||||
|
||||
The Python SDK provides convenient access to the Knowledge API:
|
||||
|
||||
```python
|
||||
from trustgraph.api.knowledge import KnowledgeClient
|
||||
|
||||
client = KnowledgeClient()
|
||||
|
||||
# List knowledge cores
|
||||
cores = await client.list_kg_cores("alice")
|
||||
|
||||
# Get a knowledge core
|
||||
core = await client.get_kg_core("core-123")
|
||||
|
||||
# Store a knowledge core
|
||||
await client.put_kg_core(
|
||||
user="alice",
|
||||
id="core-123",
|
||||
triples=triples_data,
|
||||
graph_embeddings=embeddings_data
|
||||
)
|
||||
|
||||
# Load core into flow
|
||||
await client.load_kg_core("core-123", "qa-flow", "research")
|
||||
|
||||
# Delete a knowledge core
|
||||
await client.delete_kg_core("alice", "core-123")
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
- **Knowledge Core Management**: Store, retrieve, list, and delete knowledge cores
|
||||
- **Dual Data Types**: Support for both RDF triples and graph embeddings
|
||||
- **Flow Integration**: Load knowledge cores into processing flows
|
||||
- **Multi-tenant Support**: User-specific knowledge cores with isolation
|
||||
- **Streaming Support**: Efficient transfer of large knowledge cores
|
||||
- **Collection Organization**: Group knowledge cores by collection
|
||||
- **Semantic Reasoning**: RDF triples enable symbolic reasoning
|
||||
- **Vector Similarity**: Graph embeddings enable neural approaches
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **Knowledge Base Construction**: Build semantic knowledge graphs from documents
|
||||
- **Question Answering**: Load knowledge cores for graph-based QA systems
|
||||
- **Semantic Search**: Use embeddings for similarity-based knowledge retrieval
|
||||
- **Multi-domain Knowledge**: Organize knowledge by user and collection
|
||||
- **Hybrid Reasoning**: Combine symbolic (triples) and neural (embeddings) approaches
|
||||
- **Knowledge Transfer**: Export and import knowledge cores between systems
|
||||
360
docs/apis/api-librarian.md
Normal file
360
docs/apis/api-librarian.md
Normal file
|
|
@ -0,0 +1,360 @@
|
|||
# TrustGraph Librarian API
|
||||
|
||||
This API provides document library management for TrustGraph. It handles document storage,
|
||||
metadata management, and processing orchestration using hybrid storage (MinIO for content,
|
||||
Cassandra for metadata) with multi-user support.
|
||||
|
||||
## Request/response
|
||||
|
||||
### Request
|
||||
|
||||
The request contains the following fields:
|
||||
- `operation`: The operation to perform (see operations below)
|
||||
- `document_id`: Document identifier (for document operations)
|
||||
- `document_metadata`: Document metadata object (for add/update operations)
|
||||
- `content`: Document content as base64-encoded bytes (for add operations)
|
||||
- `processing_id`: Processing job identifier (for processing operations)
|
||||
- `processing_metadata`: Processing metadata object (for add-processing)
|
||||
- `user`: User identifier (required for most operations)
|
||||
- `collection`: Collection filter (optional for list operations)
|
||||
- `criteria`: Query criteria array (for filtering operations)
|
||||
|
||||
### Response
|
||||
|
||||
The response contains the following fields:
|
||||
- `error`: Error information if operation fails
|
||||
- `document_metadata`: Single document metadata (for get operations)
|
||||
- `content`: Document content as base64-encoded bytes (for get-content)
|
||||
- `document_metadatas`: Array of document metadata (for list operations)
|
||||
- `processing_metadatas`: Array of processing metadata (for list-processing)
|
||||
|
||||
## Document Operations
|
||||
|
||||
### ADD-DOCUMENT - Add Document to Library
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "add-document",
|
||||
"document_metadata": {
|
||||
"id": "doc-123",
|
||||
"time": 1640995200000,
|
||||
"kind": "application/pdf",
|
||||
"title": "Research Paper",
|
||||
"comments": "Important research findings",
|
||||
"user": "alice",
|
||||
"tags": ["research", "ai", "machine-learning"],
|
||||
"metadata": [
|
||||
{
|
||||
"subject": "doc-123",
|
||||
"predicate": "dc:creator",
|
||||
"object": "Dr. Smith"
|
||||
}
|
||||
]
|
||||
},
|
||||
"content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### GET-DOCUMENT-METADATA - Get Document Metadata
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "get-document-metadata",
|
||||
"document_id": "doc-123",
|
||||
"user": "alice"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"document_metadata": {
|
||||
"id": "doc-123",
|
||||
"time": 1640995200000,
|
||||
"kind": "application/pdf",
|
||||
"title": "Research Paper",
|
||||
"comments": "Important research findings",
|
||||
"user": "alice",
|
||||
"tags": ["research", "ai", "machine-learning"],
|
||||
"metadata": [
|
||||
{
|
||||
"subject": "doc-123",
|
||||
"predicate": "dc:creator",
|
||||
"object": "Dr. Smith"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GET-DOCUMENT-CONTENT - Get Document Content
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "get-document-content",
|
||||
"document_id": "doc-123",
|
||||
"user": "alice"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
|
||||
}
|
||||
```
|
||||
|
||||
### LIST-DOCUMENTS - List User's Documents
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "list-documents",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"document_metadatas": [
|
||||
{
|
||||
"id": "doc-123",
|
||||
"time": 1640995200000,
|
||||
"kind": "application/pdf",
|
||||
"title": "Research Paper",
|
||||
"comments": "Important research findings",
|
||||
"user": "alice",
|
||||
"tags": ["research", "ai"]
|
||||
},
|
||||
{
|
||||
"id": "doc-124",
|
||||
"time": 1640995300000,
|
||||
"kind": "text/plain",
|
||||
"title": "Meeting Notes",
|
||||
"comments": "Team meeting discussion",
|
||||
"user": "alice",
|
||||
"tags": ["meeting", "notes"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### UPDATE-DOCUMENT - Update Document Metadata
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "update-document",
|
||||
"document_metadata": {
|
||||
"id": "doc-123",
|
||||
"title": "Updated Research Paper",
|
||||
"comments": "Updated findings and conclusions",
|
||||
"user": "alice",
|
||||
"tags": ["research", "ai", "machine-learning", "updated"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### REMOVE-DOCUMENT - Remove Document
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "remove-document",
|
||||
"document_id": "doc-123",
|
||||
"user": "alice"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
## Processing Operations
|
||||
|
||||
### ADD-PROCESSING - Start Document Processing
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "add-processing",
|
||||
"processing_metadata": {
|
||||
"id": "proc-456",
|
||||
"document_id": "doc-123",
|
||||
"time": 1640995400000,
|
||||
"flow": "pdf-extraction",
|
||||
"user": "alice",
|
||||
"collection": "research",
|
||||
"tags": ["extraction", "nlp"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
### LIST-PROCESSING - List Processing Jobs
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "list-processing",
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"processing_metadatas": [
|
||||
{
|
||||
"id": "proc-456",
|
||||
"document_id": "doc-123",
|
||||
"time": 1640995400000,
|
||||
"flow": "pdf-extraction",
|
||||
"user": "alice",
|
||||
"collection": "research",
|
||||
"tags": ["extraction", "nlp"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### REMOVE-PROCESSING - Stop Processing Job
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"operation": "remove-processing",
|
||||
"processing_id": "proc-456",
|
||||
"user": "alice"
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{}
|
||||
```
|
||||
|
||||
## REST service
|
||||
|
||||
The REST service is available at `/api/v1/librarian` and accepts the above request formats.
|
||||
|
||||
## Websocket
|
||||
|
||||
Requests have a `request` object containing the operation fields.
|
||||
Responses have a `response` object containing the response fields.
|
||||
|
||||
Request:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"service": "librarian",
|
||||
"request": {
|
||||
"operation": "list-documents",
|
||||
"user": "alice"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"id": "unique-request-id",
|
||||
"response": {
|
||||
"document_metadatas": [...]
|
||||
},
|
||||
"complete": true
|
||||
}
|
||||
```
|
||||
|
||||
## Pulsar
|
||||
|
||||
The Pulsar schema for the Librarian API is defined in Python code here:
|
||||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/library.py
|
||||
|
||||
Default request queue:
|
||||
`non-persistent://tg/request/librarian`
|
||||
|
||||
Default response queue:
|
||||
`non-persistent://tg/response/librarian`
|
||||
|
||||
Request schema:
|
||||
`trustgraph.schema.LibrarianRequest`
|
||||
|
||||
Response schema:
|
||||
`trustgraph.schema.LibrarianResponse`
|
||||
|
||||
## Python SDK
|
||||
|
||||
The Python SDK provides convenient access to the Librarian API:
|
||||
|
||||
```python
|
||||
from trustgraph.api.library import LibrarianClient
|
||||
|
||||
client = LibrarianClient()
|
||||
|
||||
# Add a document
|
||||
with open("document.pdf", "rb") as f:
|
||||
content = f.read()
|
||||
|
||||
await client.add_document(
|
||||
doc_id="doc-123",
|
||||
title="Research Paper",
|
||||
content=content,
|
||||
user="alice",
|
||||
tags=["research", "ai"]
|
||||
)
|
||||
|
||||
# Get document metadata
|
||||
metadata = await client.get_document_metadata("doc-123", "alice")
|
||||
|
||||
# List documents
|
||||
documents = await client.list_documents("alice", collection="research")
|
||||
|
||||
# Start processing
|
||||
await client.add_processing(
|
||||
processing_id="proc-456",
|
||||
document_id="doc-123",
|
||||
flow="pdf-extraction",
|
||||
user="alice"
|
||||
)
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
- **Hybrid Storage**: MinIO for content, Cassandra for metadata
|
||||
- **Multi-user Support**: User-based document ownership and access control
|
||||
- **Rich Metadata**: RDF-style metadata triples and tagging system
|
||||
- **Processing Integration**: Automatic triggering of document processing workflows
|
||||
- **Content Types**: Support for multiple document formats (PDF, text, etc.)
|
||||
- **Collection Management**: Optional document grouping by collection
|
||||
- **Metadata Search**: Query documents by metadata criteria
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **Document Management**: Store and organize documents with rich metadata
|
||||
- **Knowledge Extraction**: Process documents to extract structured knowledge
|
||||
- **Research Libraries**: Manage collections of research papers and documents
|
||||
- **Content Processing**: Orchestrate document processing workflows
|
||||
- **Multi-tenant Systems**: Support multiple users with isolated document libraries
|
||||
313
docs/apis/api-metrics.md
Normal file
313
docs/apis/api-metrics.md
Normal file
|
|
@ -0,0 +1,313 @@
|
|||
# TrustGraph Metrics API
|
||||
|
||||
This API provides access to TrustGraph system metrics through a Prometheus proxy endpoint.
|
||||
It allows authenticated access to monitoring and observability data from the TrustGraph
|
||||
system components.
|
||||
|
||||
## Overview
|
||||
|
||||
The Metrics API is implemented as a proxy to a Prometheus metrics server, providing:
|
||||
- System performance metrics
|
||||
- Service health information
|
||||
- Resource utilization data
|
||||
- Request/response statistics
|
||||
- Error rates and latency metrics
|
||||
|
||||
## Authentication
|
||||
|
||||
All metrics endpoints require Bearer token authentication:
|
||||
|
||||
```
|
||||
Authorization: Bearer <your-api-token>
|
||||
```
|
||||
|
||||
Unauthorized requests return HTTP 401.
|
||||
|
||||
## Endpoint
|
||||
|
||||
**Base Path:** `/api/metrics`
|
||||
|
||||
**Method:** GET
|
||||
|
||||
**Description:** Proxies requests to the underlying Prometheus API
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Query Current Metrics
|
||||
|
||||
```bash
|
||||
# Get all available metrics
|
||||
curl -H "Authorization: Bearer your-token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=up"
|
||||
|
||||
# Get specific metric with time range
|
||||
curl -H "Authorization: Bearer your-token" \
|
||||
"http://api-gateway:8080/api/metrics/query_range?query=cpu_usage&start=1640995200&end=1640998800&step=60"
|
||||
|
||||
# Get metric metadata
|
||||
curl -H "Authorization: Bearer your-token" \
|
||||
"http://api-gateway:8080/api/metrics/metadata"
|
||||
```
|
||||
|
||||
### Common Prometheus API Endpoints
|
||||
|
||||
The metrics API supports all standard Prometheus API endpoints:
|
||||
|
||||
#### Instant Queries
|
||||
```
|
||||
GET /api/metrics/query?query=<prometheus_query>
|
||||
```
|
||||
|
||||
#### Range Queries
|
||||
```
|
||||
GET /api/metrics/query_range?query=<query>&start=<timestamp>&end=<timestamp>&step=<duration>
|
||||
```
|
||||
|
||||
#### Metadata
|
||||
```
|
||||
GET /api/metrics/metadata
|
||||
GET /api/metrics/metadata?metric=<metric_name>
|
||||
```
|
||||
|
||||
#### Series
|
||||
```
|
||||
GET /api/metrics/series?match[]=<series_selector>
|
||||
```
|
||||
|
||||
#### Label Values
|
||||
```
|
||||
GET /api/metrics/label/<label_name>/values
|
||||
```
|
||||
|
||||
#### Targets
|
||||
```
|
||||
GET /api/metrics/targets
|
||||
```
|
||||
|
||||
## Example Queries
|
||||
|
||||
### System Health
|
||||
```bash
|
||||
# Check if services are up
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=up"
|
||||
|
||||
# Get service uptime
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=time()-process_start_time_seconds"
|
||||
```
|
||||
|
||||
### Performance Metrics
|
||||
```bash
|
||||
# CPU usage
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=rate(cpu_seconds_total[5m])"
|
||||
|
||||
# Memory usage
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=process_resident_memory_bytes"
|
||||
|
||||
# Request rate
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=rate(http_requests_total[5m])"
|
||||
```
|
||||
|
||||
### TrustGraph-Specific Metrics
|
||||
```bash
|
||||
# Document processing rate
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=rate(trustgraph_documents_processed_total[5m])"
|
||||
|
||||
# Knowledge graph size
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=trustgraph_triples_count"
|
||||
|
||||
# Embedding generation rate
|
||||
curl -H "Authorization: Bearer token" \
|
||||
"http://api-gateway:8080/api/metrics/query?query=rate(trustgraph_embeddings_generated_total[5m])"
|
||||
```
|
||||
|
||||
## Response Format
|
||||
|
||||
Responses follow the standard Prometheus API format:
|
||||
|
||||
### Successful Query Response
|
||||
```json
|
||||
{
|
||||
"status": "success",
|
||||
"data": {
|
||||
"resultType": "vector",
|
||||
"result": [
|
||||
{
|
||||
"metric": {
|
||||
"__name__": "up",
|
||||
"instance": "api-gateway:8080",
|
||||
"job": "trustgraph"
|
||||
},
|
||||
"value": [1640995200, "1"]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Range Query Response
|
||||
```json
|
||||
{
|
||||
"status": "success",
|
||||
"data": {
|
||||
"resultType": "matrix",
|
||||
"result": [
|
||||
{
|
||||
"metric": {
|
||||
"__name__": "cpu_usage",
|
||||
"instance": "worker-1"
|
||||
},
|
||||
"values": [
|
||||
[1640995200, "0.15"],
|
||||
[1640995260, "0.18"],
|
||||
[1640995320, "0.12"]
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Error Response
|
||||
```json
|
||||
{
|
||||
"status": "error",
|
||||
"errorType": "bad_data",
|
||||
"error": "invalid query syntax"
|
||||
}
|
||||
```
|
||||
|
||||
## Available Metrics
|
||||
|
||||
### Standard System Metrics
|
||||
- `up`: Service availability (1 = up, 0 = down)
|
||||
- `process_resident_memory_bytes`: Memory usage
|
||||
- `process_cpu_seconds_total`: CPU time
|
||||
- `http_requests_total`: HTTP request count
|
||||
- `http_request_duration_seconds`: Request latency
|
||||
|
||||
### TrustGraph-Specific Metrics
|
||||
- `trustgraph_documents_processed_total`: Documents processed count
|
||||
- `trustgraph_triples_count`: Knowledge graph triple count
|
||||
- `trustgraph_embeddings_generated_total`: Embeddings generated count
|
||||
- `trustgraph_flow_executions_total`: Flow execution count
|
||||
- `trustgraph_pulsar_messages_total`: Pulsar message count
|
||||
- `trustgraph_errors_total`: Error count by component
|
||||
|
||||
## Time Series Queries
|
||||
|
||||
### Time Ranges
|
||||
Use standard Prometheus time range formats:
|
||||
- `5m`: 5 minutes
|
||||
- `1h`: 1 hour
|
||||
- `1d`: 1 day
|
||||
- `1w`: 1 week
|
||||
|
||||
### Rate Calculations
|
||||
```bash
|
||||
# 5-minute rate
|
||||
rate(metric_name[5m])
|
||||
|
||||
# Increase over time
|
||||
increase(metric_name[1h])
|
||||
```
|
||||
|
||||
### Aggregations
|
||||
```bash
|
||||
# Sum across instances
|
||||
sum(metric_name)
|
||||
|
||||
# Average by label
|
||||
avg by (instance) (metric_name)
|
||||
|
||||
# Top 5 values
|
||||
topk(5, metric_name)
|
||||
```
|
||||
|
||||
## Integration Examples
|
||||
|
||||
### Python Integration
|
||||
```python
|
||||
import requests
|
||||
|
||||
def query_metrics(token, query):
|
||||
headers = {"Authorization": f"Bearer {token}"}
|
||||
params = {"query": query}
|
||||
|
||||
response = requests.get(
|
||||
"http://api-gateway:8080/api/metrics/query",
|
||||
headers=headers,
|
||||
params=params
|
||||
)
|
||||
|
||||
return response.json()
|
||||
|
||||
# Get system uptime
|
||||
uptime = query_metrics("your-token", "time() - process_start_time_seconds")
|
||||
```
|
||||
|
||||
### JavaScript Integration
|
||||
```javascript
|
||||
async function queryMetrics(token, query) {
|
||||
const response = await fetch(
|
||||
`http://api-gateway:8080/api/metrics/query?query=${encodeURIComponent(query)}`,
|
||||
{
|
||||
headers: {
|
||||
'Authorization': `Bearer ${token}`
|
||||
}
|
||||
}
|
||||
);
|
||||
|
||||
return await response.json();
|
||||
}
|
||||
|
||||
// Get request rate
|
||||
const requestRate = await queryMetrics('your-token', 'rate(http_requests_total[5m])');
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Common HTTP Status Codes
|
||||
- `200`: Success
|
||||
- `400`: Bad request (invalid query)
|
||||
- `401`: Unauthorized (invalid/missing token)
|
||||
- `422`: Unprocessable entity (query execution error)
|
||||
- `500`: Internal server error
|
||||
|
||||
### Error Types
|
||||
- `bad_data`: Invalid query syntax
|
||||
- `timeout`: Query execution timeout
|
||||
- `canceled`: Query was canceled
|
||||
- `execution`: Query execution error
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Query Optimization
|
||||
- Use appropriate time ranges to limit data volume
|
||||
- Apply label filters to reduce result sets
|
||||
- Use recording rules for frequently accessed metrics
|
||||
|
||||
### Rate Limiting
|
||||
- Avoid high-frequency polling
|
||||
- Cache results when appropriate
|
||||
- Use appropriate step sizes for range queries
|
||||
|
||||
### Security
|
||||
- Keep API tokens secure
|
||||
- Use HTTPS in production
|
||||
- Rotate tokens regularly
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **System Monitoring**: Track system health and performance
|
||||
- **Capacity Planning**: Monitor resource utilization trends
|
||||
- **Alerting**: Set up alerts based on metric thresholds
|
||||
- **Performance Analysis**: Analyze system performance over time
|
||||
- **Debugging**: Investigate issues using detailed metrics
|
||||
- **Business Intelligence**: Track document processing and knowledge extraction metrics
|
||||
|
|
@ -15,7 +15,7 @@ The request contains the following fields:
|
|||
|
||||
### Response
|
||||
|
||||
The request contains either of these fields:
|
||||
The response contains either of these fields:
|
||||
- `text`: A plain text response
|
||||
- `object`: A structured object, JSON-encoded
|
||||
|
||||
|
|
@ -60,6 +60,7 @@ Request:
|
|||
{
|
||||
"id": "akshfkiehfkseffh-142",
|
||||
"service": "prompt",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"id": "extract-definitions",
|
||||
"variables": {
|
||||
|
|
|
|||
|
|
@ -19,7 +19,7 @@ The request contains the following fields:
|
|||
|
||||
### Response
|
||||
|
||||
The request contains the following fields:
|
||||
The response contains the following fields:
|
||||
- `response`: LLM response
|
||||
|
||||
## REST service
|
||||
|
|
@ -59,6 +59,7 @@ Request:
|
|||
{
|
||||
"id": "blrqotfefnmnh7de-1",
|
||||
"service": "text-completion",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"system": "You are a helpful agent",
|
||||
"prompt": "What does NASA stand for?"
|
||||
|
|
|
|||
168
docs/apis/api-text-load.md
Normal file
168
docs/apis/api-text-load.md
Normal file
|
|
@ -0,0 +1,168 @@
|
|||
# TrustGraph Text Load API
|
||||
|
||||
This API loads text documents into TrustGraph processing pipelines. It's a sender API
|
||||
that accepts text documents with metadata and queues them for processing through
|
||||
specified flows.
|
||||
|
||||
## Request Format
|
||||
|
||||
The text-load API accepts a JSON request with the following fields:
|
||||
- `id`: Document identifier (typically a URI)
|
||||
- `metadata`: Array of RDF triples providing document metadata
|
||||
- `charset`: Character encoding (defaults to "utf-8")
|
||||
- `text`: Base64-encoded text content
|
||||
- `user`: User identifier (defaults to "trustgraph")
|
||||
- `collection`: Collection identifier (defaults to "default")
|
||||
|
||||
## Request Example
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "https://example.com/documents/research-paper-123",
|
||||
"metadata": [
|
||||
{
|
||||
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
|
||||
"p": {"v": "http://purl.org/dc/terms/title", "e": true},
|
||||
"o": {"v": "Machine Learning in Healthcare", "e": false}
|
||||
},
|
||||
{
|
||||
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
|
||||
"p": {"v": "http://purl.org/dc/terms/creator", "e": true},
|
||||
"o": {"v": "Dr. Jane Smith", "e": false}
|
||||
},
|
||||
{
|
||||
"s": {"v": "https://example.com/documents/research-paper-123", "e": true},
|
||||
"p": {"v": "http://purl.org/dc/terms/subject", "e": true},
|
||||
"o": {"v": "Healthcare AI", "e": false}
|
||||
}
|
||||
],
|
||||
"charset": "utf-8",
|
||||
"text": "VGhpcyBpcyBhIHNhbXBsZSByZXNlYXJjaCBwYXBlciBhYm91dCBtYWNoaW5lIGxlYXJuaW5nIGluIGhlYWx0aGNhcmUuLi4=",
|
||||
"user": "researcher",
|
||||
"collection": "healthcare-research"
|
||||
}
|
||||
```
|
||||
|
||||
## Response
|
||||
|
||||
The text-load API is a sender API with no response body. Success is indicated by HTTP status code 200.
|
||||
|
||||
## REST service
|
||||
|
||||
The text-load service is available at:
|
||||
`POST /api/v1/flow/{flow-id}/service/text-load`
|
||||
|
||||
Where `{flow-id}` is the identifier of the flow that will process the document.
|
||||
|
||||
Example:
|
||||
```bash
|
||||
curl -X POST \
|
||||
-H "Content-Type: application/json" \
|
||||
-d @document.json \
|
||||
http://api-gateway:8080/api/v1/flow/pdf-processing/service/text-load
|
||||
```
|
||||
|
||||
## Metadata Format
|
||||
|
||||
Each metadata triple contains:
|
||||
- `s`: Subject (object with `v` for value and `e` for is_entity boolean)
|
||||
- `p`: Predicate (object with `v` for value and `e` for is_entity boolean)
|
||||
- `o`: Object (object with `v` for value and `e` for is_entity boolean)
|
||||
|
||||
The `e` field indicates whether the value should be treated as an entity (true) or literal (false).
|
||||
|
||||
## Common Metadata Properties
|
||||
|
||||
### Document Properties
|
||||
- `http://purl.org/dc/terms/title`: Document title
|
||||
- `http://purl.org/dc/terms/creator`: Document author
|
||||
- `http://purl.org/dc/terms/subject`: Document subject/topic
|
||||
- `http://purl.org/dc/terms/description`: Document description
|
||||
- `http://purl.org/dc/terms/date`: Publication date
|
||||
- `http://purl.org/dc/terms/language`: Document language
|
||||
|
||||
### Organizational Properties
|
||||
- `http://xmlns.com/foaf/0.1/name`: Organization name
|
||||
- `http://www.w3.org/2006/vcard/ns#hasAddress`: Organization address
|
||||
- `http://xmlns.com/foaf/0.1/homepage`: Organization website
|
||||
|
||||
### Publication Properties
|
||||
- `http://purl.org/ontology/bibo/doi`: DOI identifier
|
||||
- `http://purl.org/ontology/bibo/isbn`: ISBN identifier
|
||||
- `http://purl.org/ontology/bibo/volume`: Publication volume
|
||||
- `http://purl.org/ontology/bibo/issue`: Publication issue
|
||||
|
||||
## Text Encoding
|
||||
|
||||
The `text` field must contain base64-encoded content. To encode text:
|
||||
|
||||
```bash
|
||||
# Command line encoding
|
||||
echo "Your text content here" | base64
|
||||
|
||||
# Python encoding
|
||||
import base64
|
||||
encoded_text = base64.b64encode("Your text content here".encode('utf-8')).decode('utf-8')
|
||||
```
|
||||
|
||||
## Integration with Processing Flows
|
||||
|
||||
Once loaded, text documents are processed through the specified flow, which typically includes:
|
||||
|
||||
1. **Text Chunking**: Breaking documents into manageable chunks
|
||||
2. **Embedding Generation**: Creating vector embeddings for semantic search
|
||||
3. **Knowledge Extraction**: Extracting entities and relationships
|
||||
4. **Graph Storage**: Storing extracted knowledge in the knowledge graph
|
||||
5. **Indexing**: Making content searchable for RAG queries
|
||||
|
||||
## Error Handling
|
||||
|
||||
Common errors include:
|
||||
- Invalid base64 encoding in text field
|
||||
- Missing required fields (id, text)
|
||||
- Invalid metadata triple format
|
||||
- Flow not found or inactive
|
||||
|
||||
## Python SDK
|
||||
|
||||
```python
|
||||
import base64
|
||||
from trustgraph.api.text_load import TextLoadClient
|
||||
|
||||
client = TextLoadClient()
|
||||
|
||||
# Prepare document
|
||||
document = {
|
||||
"id": "https://example.com/doc-123",
|
||||
"metadata": [
|
||||
{
|
||||
"s": {"v": "https://example.com/doc-123", "e": True},
|
||||
"p": {"v": "http://purl.org/dc/terms/title", "e": True},
|
||||
"o": {"v": "Sample Document", "e": False}
|
||||
}
|
||||
],
|
||||
"charset": "utf-8",
|
||||
"text": base64.b64encode("Document content here".encode('utf-8')).decode('utf-8'),
|
||||
"user": "alice",
|
||||
"collection": "research"
|
||||
}
|
||||
|
||||
# Load document
|
||||
await client.load_text_document("my-flow", document)
|
||||
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **Research Paper Ingestion**: Load academic papers with rich metadata
|
||||
- **Document Processing**: Ingest documents for knowledge extraction
|
||||
- **Content Management**: Build searchable document repositories
|
||||
- **RAG System Population**: Load content for question-answering systems
|
||||
- **Knowledge Base Construction**: Convert documents into structured knowledge
|
||||
|
||||
## Features
|
||||
|
||||
- **Rich Metadata**: Full RDF metadata support for semantic annotation
|
||||
- **Flow Integration**: Direct integration with TrustGraph processing flows
|
||||
- **Multi-tenant**: User and collection-based document organization
|
||||
- **Encoding Support**: Flexible character encoding support
|
||||
- **No Response Required**: Fire-and-forget operation for high throughput
|
||||
|
|
@ -21,7 +21,7 @@ Returned triples will match all of `s`, `p` and `o` where provided.
|
|||
|
||||
### Response
|
||||
|
||||
The request contains the following fields:
|
||||
The response contains the following fields:
|
||||
- `response`: A list of triples.
|
||||
|
||||
Each triple contains `s`, `p` and `o` fields describing the
|
||||
|
|
@ -33,15 +33,53 @@ Each triple element uses the same schema:
|
|||
- `is_uri`: A boolean value which is true if this is a graph entity i.e.
|
||||
`value` is a URI, not a literal value.
|
||||
|
||||
## Data Format Details
|
||||
|
||||
### Triple Element Format
|
||||
|
||||
To reduce the size of JSON messages, triple elements (subject, predicate, object) are encoded using a compact format:
|
||||
|
||||
- `v`: The value as a string (maps to `value` in the full schema)
|
||||
- `e`: Boolean indicating if this is an entity/URI (maps to `is_uri` in the full schema)
|
||||
|
||||
Each triple element (`s`, `p`, `o`) contains:
|
||||
- `v`: The actual value as a string
|
||||
- `e`: Boolean indicating the value type
|
||||
- `true`: The value is a URI/entity (e.g., `"http://example.com/Person1"`)
|
||||
- `false`: The value is a literal (e.g., `"John Doe"`, `"42"`, `"2023-01-01"`)
|
||||
|
||||
### Examples
|
||||
|
||||
**URI/Entity Element:**
|
||||
```json
|
||||
{
|
||||
"v": "http://trustgraph.ai/e/space-station-modules",
|
||||
"e": true
|
||||
}
|
||||
```
|
||||
|
||||
**Literal Element:**
|
||||
```json
|
||||
{
|
||||
"v": "space station modules",
|
||||
"e": false
|
||||
}
|
||||
```
|
||||
|
||||
**Numeric Literal:**
|
||||
```json
|
||||
{
|
||||
"v": "42",
|
||||
"e": false
|
||||
}
|
||||
```
|
||||
|
||||
## REST service
|
||||
|
||||
The REST service accepts a request object containing the `s`, `p`, `o`
|
||||
and `limit` fields.
|
||||
The response is a JSON object containing the `response` field.
|
||||
|
||||
To reduce the size of the JSON, the graph entities are encoded as an
|
||||
object with `value` and `is_uri` mapped to `v` and `e` respectively.
|
||||
|
||||
e.g.
|
||||
|
||||
This example query matches triples with a subject of
|
||||
|
|
@ -58,6 +96,7 @@ Request:
|
|||
{
|
||||
"id": "qgzw1287vfjc8wsk-4",
|
||||
"service": "triples-query",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"s": {
|
||||
"v": "http://trustgraph.ai/e/space-station-modules",
|
||||
|
|
@ -97,13 +136,9 @@ Response:
|
|||
|
||||
## Websocket
|
||||
|
||||
Requests have a `request` object containing the `system` and
|
||||
`prompt` fields.
|
||||
Requests have a `request` object containing the query fields (`s`, `p`, `o`, `limit`).
|
||||
Responses have a `response` object containing `response` field.
|
||||
|
||||
To reduce the size of the JSON, the graph entities are encoded as an
|
||||
object with `value` and `is_uri` mapped to `v` and `e` respectively.
|
||||
|
||||
e.g.
|
||||
|
||||
Request:
|
||||
|
|
@ -178,10 +213,3 @@ The client class is
|
|||
|
||||
https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/clients/triples_query_client.py
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -1,3 +1,230 @@
|
|||
# TrustGraph Pulsar API
|
||||
|
||||
Coming soon
|
||||
Apache Pulsar is the underlying message queue system used by TrustGraph for inter-component communication. Understanding Pulsar queue names is essential for direct integration with TrustGraph services.
|
||||
|
||||
## Overview
|
||||
|
||||
TrustGraph uses two types of APIs with different queue naming patterns:
|
||||
|
||||
1. **Global Services**: Fixed queue names, not dependent on flows
|
||||
2. **Flow-Hosted Services**: Dynamic queue names that depend on the specific flow configuration
|
||||
|
||||
## Global Services (Fixed Queue Names)
|
||||
|
||||
These services run independently and have fixed Pulsar queue names:
|
||||
|
||||
### Config API
|
||||
- **Request Queue**: `non-persistent://tg/request/config`
|
||||
- **Response Queue**: `non-persistent://tg/response/config`
|
||||
- **Push Queue**: `persistent://tg/config/config`
|
||||
|
||||
### Flow API
|
||||
- **Request Queue**: `non-persistent://tg/request/flow`
|
||||
- **Response Queue**: `non-persistent://tg/response/flow`
|
||||
|
||||
### Knowledge API
|
||||
- **Request Queue**: `non-persistent://tg/request/knowledge`
|
||||
- **Response Queue**: `non-persistent://tg/response/knowledge`
|
||||
|
||||
### Librarian API
|
||||
- **Request Queue**: `non-persistent://tg/request/librarian`
|
||||
- **Response Queue**: `non-persistent://tg/response/librarian`
|
||||
|
||||
## Flow-Hosted Services (Dynamic Queue Names)
|
||||
|
||||
These services are hosted within specific flows and have queue names that depend on the flow configuration:
|
||||
|
||||
- Agent API
|
||||
- Document RAG API
|
||||
- Graph RAG API
|
||||
- Text Completion API
|
||||
- Prompt API
|
||||
- Embeddings API
|
||||
- Graph Embeddings API
|
||||
- Triples Query API
|
||||
- Text Load API
|
||||
- Document Load API
|
||||
|
||||
## Discovering Flow-Hosted Queue Names
|
||||
|
||||
To find the queue names for flow-hosted services, you need to query the flow configuration using the Config API.
|
||||
|
||||
### Method 1: Using the Config API
|
||||
|
||||
Query for the flow configuration:
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"operation": "get",
|
||||
"keys": [
|
||||
{
|
||||
"type": "flows",
|
||||
"key": "your-flow-name"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
The response will contain a flow definition with an "interfaces" object that lists all queue names.
|
||||
|
||||
### Method 2: Using the CLI
|
||||
|
||||
Use the TrustGraph CLI to dump the configuration:
|
||||
|
||||
```bash
|
||||
tg-show-config
|
||||
```
|
||||
|
||||
## Flow Interface Types
|
||||
|
||||
Flow configurations define two types of service interfaces:
|
||||
|
||||
### 1. Request/Response Interfaces
|
||||
|
||||
Services that accept a request and return a response:
|
||||
|
||||
```json
|
||||
{
|
||||
"graph-rag": {
|
||||
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
|
||||
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Examples**: agent, document-rag, graph-rag, text-completion, prompt, embeddings, graph-embeddings, triples
|
||||
|
||||
### 2. Fire-and-Forget Interfaces
|
||||
|
||||
Services that accept data but don't return a response:
|
||||
|
||||
```json
|
||||
{
|
||||
"text-load": "persistent://tg/flow/text-document-load:default"
|
||||
}
|
||||
```
|
||||
|
||||
**Examples**: text-load, document-load, triples-store, graph-embeddings-store, document-embeddings-store, entity-contexts-load
|
||||
|
||||
## Example Flow Configuration
|
||||
|
||||
Here's an example of a complete flow configuration showing queue names:
|
||||
|
||||
```json
|
||||
{
|
||||
"class-name": "document-rag+graph-rag",
|
||||
"description": "Default processing flow",
|
||||
"interfaces": {
|
||||
"agent": {
|
||||
"request": "non-persistent://tg/request/agent:default",
|
||||
"response": "non-persistent://tg/response/agent:default"
|
||||
},
|
||||
"document-rag": {
|
||||
"request": "non-persistent://tg/request/document-rag:document-rag+graph-rag",
|
||||
"response": "non-persistent://tg/response/document-rag:document-rag+graph-rag"
|
||||
},
|
||||
"graph-rag": {
|
||||
"request": "non-persistent://tg/request/graph-rag:document-rag+graph-rag",
|
||||
"response": "non-persistent://tg/response/graph-rag:document-rag+graph-rag"
|
||||
},
|
||||
"text-completion": {
|
||||
"request": "non-persistent://tg/request/text-completion:document-rag+graph-rag",
|
||||
"response": "non-persistent://tg/response/text-completion:document-rag+graph-rag"
|
||||
},
|
||||
"embeddings": {
|
||||
"request": "non-persistent://tg/request/embeddings:document-rag+graph-rag",
|
||||
"response": "non-persistent://tg/response/embeddings:document-rag+graph-rag"
|
||||
},
|
||||
"triples": {
|
||||
"request": "non-persistent://tg/request/triples:document-rag+graph-rag",
|
||||
"response": "non-persistent://tg/response/triples:document-rag+graph-rag"
|
||||
},
|
||||
"text-load": "persistent://tg/flow/text-document-load:default",
|
||||
"document-load": "persistent://tg/flow/document-load:default",
|
||||
"triples-store": "persistent://tg/flow/triples-store:default",
|
||||
"graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:default"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Queue Naming Patterns
|
||||
|
||||
### Global Services
|
||||
- **Pattern**: `{persistence}://tg/{namespace}/{service-name}`
|
||||
- **Example**: `non-persistent://tg/request/config`
|
||||
|
||||
### Flow-Hosted Request/Response
|
||||
- **Pattern**: `{persistence}://tg/{namespace}/{service-name}:{flow-identifier}`
|
||||
- **Example**: `non-persistent://tg/request/graph-rag:document-rag+graph-rag`
|
||||
|
||||
### Flow-Hosted Fire-and-Forget
|
||||
- **Pattern**: `{persistence}://tg/flow/{service-name}:{flow-identifier}`
|
||||
- **Example**: `persistent://tg/flow/text-document-load:default`
|
||||
|
||||
## Persistence Types
|
||||
|
||||
- **non-persistent**: Messages are not persisted to disk, faster but less reliable
|
||||
- **persistent**: Messages are persisted to disk, slower but more reliable
|
||||
|
||||
## Practical Usage
|
||||
|
||||
### Python Example
|
||||
|
||||
```python
|
||||
import pulsar
|
||||
from trustgraph.schema import ConfigRequest, ConfigResponse
|
||||
|
||||
# Connect to Pulsar
|
||||
client = pulsar.Client('pulsar://localhost:6650')
|
||||
|
||||
# Create producer for config requests
|
||||
producer = client.create_producer(
|
||||
'non-persistent://tg/request/config',
|
||||
schema=pulsar.schema.AvroSchema(ConfigRequest)
|
||||
)
|
||||
|
||||
# Create consumer for config responses
|
||||
consumer = client.subscribe(
|
||||
'non-persistent://tg/response/config',
|
||||
subscription_name='my-subscription',
|
||||
schema=pulsar.schema.AvroSchema(ConfigResponse)
|
||||
)
|
||||
|
||||
# Send request
|
||||
request = ConfigRequest(operation='list-classes')
|
||||
producer.send(request)
|
||||
|
||||
# Receive response
|
||||
response = consumer.receive()
|
||||
print(response.value())
|
||||
```
|
||||
|
||||
### Flow Service Example
|
||||
|
||||
```python
|
||||
# First, get the flow configuration to find queue names
|
||||
config_request = ConfigRequest(
|
||||
operation='get',
|
||||
keys=[ConfigKey(type='flows', key='my-flow')]
|
||||
)
|
||||
|
||||
# Use the returned interface information to determine queue names
|
||||
# Then connect to the appropriate queues for the service you need
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Query Flow Configuration**: Always query the current flow configuration to get accurate queue names
|
||||
2. **Handle Dynamic Names**: Flow-hosted service queue names can change when flows are reconfigured
|
||||
3. **Choose Appropriate Persistence**: Use persistent queues for critical data, non-persistent for performance
|
||||
4. **Schema Validation**: Use the appropriate Pulsar schema for each service
|
||||
5. **Error Handling**: Implement proper error handling for queue connection and message failures
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Pulsar access should be restricted in production environments
|
||||
- Use appropriate authentication and authorization mechanisms
|
||||
- Monitor queue access and message patterns for security anomalies
|
||||
- Consider encryption for sensitive data in messages
|
||||
|
|
@ -18,13 +18,16 @@ When hosted using docker compose, you can access the service at
|
|||
|
||||
## Request
|
||||
|
||||
A request message is a JSON message containing 3 fields:
|
||||
A request message is a JSON message containing 3/4 fields:
|
||||
|
||||
- `id`: A unique ID which is used to correlate requests and responses.
|
||||
You should make sure it is unique.
|
||||
- `service`: The name of the service to invoke.
|
||||
- `request`: The request body which is passed to the service - this is
|
||||
defined in the API documentation for that service.
|
||||
- `flow`: Some APIs are supported by processors launched within a flow,
|
||||
are are dependent on a flow running. For such APIs, the flow identifier
|
||||
needs to be provided.
|
||||
|
||||
e.g.
|
||||
|
||||
|
|
@ -32,6 +35,7 @@ e.g.
|
|||
{
|
||||
"id": "qgzw1287vfjc8wsk-1",
|
||||
"service": "graph-rag",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"query": "What does NASA stand for?"
|
||||
}
|
||||
|
|
@ -86,6 +90,7 @@ Request:
|
|||
{
|
||||
"id": "blrqotfefnmnh7de-20",
|
||||
"service": "agent",
|
||||
"flow": "default",
|
||||
"request": {
|
||||
"question": "What does NASA stand for?"
|
||||
}
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue