mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-26 00:46:22 +02:00
Update docs for API/CLI changes in 1.0 (#421)
* Update some API basics for the 0.23/1.0 API change
This commit is contained in:
parent
f907ea7db8
commit
44bdd29f51
69 changed files with 19981 additions and 407 deletions
324
docs/apis/api-core-import-export.md
Normal file
324
docs/apis/api-core-import-export.md
Normal file
|
|
@ -0,0 +1,324 @@
|
|||
# TrustGraph Core Import/Export API
|
||||
|
||||
This API provides bulk import and export capabilities for TrustGraph knowledge cores.
|
||||
It handles efficient transfer of both RDF triples and graph embeddings using MessagePack
|
||||
binary format for high-performance data exchange.
|
||||
|
||||
## Overview
|
||||
|
||||
The Core Import/Export API enables:
|
||||
- **Bulk Import**: Import large knowledge cores from binary streams
|
||||
- **Bulk Export**: Export knowledge cores as binary streams
|
||||
- **Efficient Format**: Uses MessagePack for compact, fast serialization
|
||||
- **Dual Data Types**: Handles both RDF triples and graph embeddings
|
||||
- **Streaming**: Supports streaming for large datasets
|
||||
|
||||
## Import Endpoint
|
||||
|
||||
**Endpoint:** `POST /api/v1/import-core`
|
||||
|
||||
**Query Parameters:**
|
||||
- `id`: Knowledge core identifier
|
||||
- `user`: User identifier
|
||||
|
||||
**Content-Type:** `application/octet-stream`
|
||||
|
||||
**Request Body:** MessagePack-encoded binary stream
|
||||
|
||||
### Import Process
|
||||
|
||||
1. **Stream Processing**: Reads binary data in 128KB chunks
|
||||
2. **MessagePack Decoding**: Unpacks binary data into structured messages
|
||||
3. **Knowledge Storage**: Stores data via Knowledge API
|
||||
4. **Response**: Returns success/error status
|
||||
|
||||
### Import Data Format
|
||||
|
||||
The import stream contains MessagePack-encoded tuples with type indicators:
|
||||
|
||||
#### Triples Data
|
||||
```python
|
||||
("t", {
|
||||
"m": { # metadata
|
||||
"i": "core-id",
|
||||
"m": [], # metadata triples
|
||||
"u": "user",
|
||||
"c": "collection"
|
||||
},
|
||||
"t": [ # triples array
|
||||
{
|
||||
"s": {"value": "subject", "is_uri": true},
|
||||
"p": {"value": "predicate", "is_uri": true},
|
||||
"o": {"value": "object", "is_uri": false}
|
||||
}
|
||||
]
|
||||
})
|
||||
```
|
||||
|
||||
#### Graph Embeddings Data
|
||||
```python
|
||||
("ge", {
|
||||
"m": { # metadata
|
||||
"i": "core-id",
|
||||
"m": [], # metadata triples
|
||||
"u": "user",
|
||||
"c": "collection"
|
||||
},
|
||||
"e": [ # entities array
|
||||
{
|
||||
"e": {"value": "entity", "is_uri": true},
|
||||
"v": [[0.1, 0.2, 0.3]] # vectors
|
||||
}
|
||||
]
|
||||
})
|
||||
```
|
||||
|
||||
## Export Endpoint
|
||||
|
||||
**Endpoint:** `GET /api/v1/export-core`
|
||||
|
||||
**Query Parameters:**
|
||||
- `id`: Knowledge core identifier
|
||||
- `user`: User identifier
|
||||
|
||||
**Content-Type:** `application/octet-stream`
|
||||
|
||||
**Response Body:** MessagePack-encoded binary stream
|
||||
|
||||
### Export Process
|
||||
|
||||
1. **Knowledge Retrieval**: Fetches data via Knowledge API
|
||||
2. **MessagePack Encoding**: Encodes data into binary format
|
||||
3. **Streaming Response**: Sends data as binary stream
|
||||
4. **Type Identification**: Uses type prefixes for data classification
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Import Knowledge Core
|
||||
|
||||
```bash
|
||||
# Import from file
|
||||
curl -X POST \
|
||||
-H "Authorization: Bearer your-token" \
|
||||
-H "Content-Type: application/octet-stream" \
|
||||
--data-binary @knowledge-core.msgpack \
|
||||
"http://api-gateway:8080/api/v1/import-core?id=core-123&user=alice"
|
||||
```
|
||||
|
||||
### Export Knowledge Core
|
||||
|
||||
```bash
|
||||
# Export to file
|
||||
curl -H "Authorization: Bearer your-token" \
|
||||
"http://api-gateway:8080/api/v1/export-core?id=core-123&user=alice" \
|
||||
-o knowledge-core.msgpack
|
||||
```
|
||||
|
||||
## Python Integration
|
||||
|
||||
### Import Example
|
||||
|
||||
```python
|
||||
import msgpack
|
||||
import requests
|
||||
|
||||
def import_knowledge_core(core_id, user, triples_data, embeddings_data, token):
|
||||
# Prepare data
|
||||
messages = []
|
||||
|
||||
# Add triples
|
||||
if triples_data:
|
||||
messages.append(("t", {
|
||||
"m": {
|
||||
"i": core_id,
|
||||
"m": [],
|
||||
"u": user,
|
||||
"c": "default"
|
||||
},
|
||||
"t": triples_data
|
||||
}))
|
||||
|
||||
# Add embeddings
|
||||
if embeddings_data:
|
||||
messages.append(("ge", {
|
||||
"m": {
|
||||
"i": core_id,
|
||||
"m": [],
|
||||
"u": user,
|
||||
"c": "default"
|
||||
},
|
||||
"e": embeddings_data
|
||||
}))
|
||||
|
||||
# Pack data
|
||||
binary_data = b''.join(msgpack.packb(msg) for msg in messages)
|
||||
|
||||
# Upload
|
||||
response = requests.post(
|
||||
f"http://api-gateway:8080/api/v1/import-core?id={core_id}&user={user}",
|
||||
headers={
|
||||
"Authorization": f"Bearer {token}",
|
||||
"Content-Type": "application/octet-stream"
|
||||
},
|
||||
data=binary_data
|
||||
)
|
||||
|
||||
return response.status_code == 200
|
||||
|
||||
# Usage
|
||||
triples = [
|
||||
{
|
||||
"s": {"value": "Person1", "is_uri": True},
|
||||
"p": {"value": "hasName", "is_uri": True},
|
||||
"o": {"value": "John Doe", "is_uri": False}
|
||||
}
|
||||
]
|
||||
|
||||
embeddings = [
|
||||
{
|
||||
"e": {"value": "Person1", "is_uri": True},
|
||||
"v": [[0.1, 0.2, 0.3, 0.4]]
|
||||
}
|
||||
]
|
||||
|
||||
success = import_knowledge_core("core-123", "alice", triples, embeddings, "your-token")
|
||||
```
|
||||
|
||||
### Export Example
|
||||
|
||||
```python
|
||||
import msgpack
|
||||
import requests
|
||||
|
||||
def export_knowledge_core(core_id, user, token):
|
||||
response = requests.get(
|
||||
f"http://api-gateway:8080/api/v1/export-core?id={core_id}&user={user}",
|
||||
headers={"Authorization": f"Bearer {token}"}
|
||||
)
|
||||
|
||||
if response.status_code != 200:
|
||||
return None
|
||||
|
||||
# Decode MessagePack stream
|
||||
data = response.content
|
||||
unpacker = msgpack.Unpacker()
|
||||
unpacker.feed(data)
|
||||
|
||||
triples = []
|
||||
embeddings = []
|
||||
|
||||
for unpacked in unpacker:
|
||||
msg_type, msg_data = unpacked
|
||||
|
||||
if msg_type == "t":
|
||||
triples.extend(msg_data["t"])
|
||||
elif msg_type == "ge":
|
||||
embeddings.extend(msg_data["e"])
|
||||
|
||||
return {
|
||||
"triples": triples,
|
||||
"embeddings": embeddings
|
||||
}
|
||||
|
||||
# Usage
|
||||
data = export_knowledge_core("core-123", "alice", "your-token")
|
||||
if data:
|
||||
print(f"Exported {len(data['triples'])} triples")
|
||||
print(f"Exported {len(data['embeddings'])} embeddings")
|
||||
```
|
||||
|
||||
## Data Format Specification
|
||||
|
||||
### MessagePack Tuples
|
||||
|
||||
Each message is a tuple: `(type_indicator, data_object)`
|
||||
|
||||
**Type Indicators:**
|
||||
- `"t"`: RDF triples data
|
||||
- `"ge"`: Graph embeddings data
|
||||
|
||||
### Metadata Structure
|
||||
|
||||
```python
|
||||
{
|
||||
"i": "core-identifier", # ID
|
||||
"m": [...], # Metadata triples array
|
||||
"u": "user-identifier", # User
|
||||
"c": "collection-name" # Collection
|
||||
}
|
||||
```
|
||||
|
||||
### Triple Structure
|
||||
|
||||
```python
|
||||
{
|
||||
"s": {"value": "subject", "is_uri": boolean},
|
||||
"p": {"value": "predicate", "is_uri": boolean},
|
||||
"o": {"value": "object", "is_uri": boolean}
|
||||
}
|
||||
```
|
||||
|
||||
### Entity Embedding Structure
|
||||
|
||||
```python
|
||||
{
|
||||
"e": {"value": "entity", "is_uri": boolean},
|
||||
"v": [[float, float, ...]] # Array of vectors
|
||||
}
|
||||
```
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Import Performance
|
||||
- **Streaming**: Processes data in 128KB chunks
|
||||
- **Memory Efficient**: Incremental unpacking
|
||||
- **Concurrent**: Multiple imports can run simultaneously
|
||||
- **Error Handling**: Robust error recovery
|
||||
|
||||
### Export Performance
|
||||
- **Direct Streaming**: Data streamed directly from knowledge store
|
||||
- **Efficient Encoding**: MessagePack for minimal overhead
|
||||
- **Large Dataset Support**: Handles cores of any size
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Import Errors
|
||||
- **Format Errors**: Invalid MessagePack data
|
||||
- **Type Errors**: Unknown type indicators
|
||||
- **Storage Errors**: Knowledge API failures
|
||||
- **Authentication**: Invalid user credentials
|
||||
|
||||
### Export Errors
|
||||
- **Not Found**: Core ID doesn't exist
|
||||
- **Access Denied**: User lacks permissions
|
||||
- **System Errors**: Knowledge API failures
|
||||
|
||||
### Error Responses
|
||||
- **HTTP 400**: Bad request (invalid parameters)
|
||||
- **HTTP 401**: Unauthorized access
|
||||
- **HTTP 404**: Core not found
|
||||
- **HTTP 500**: Internal server error
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Data Migration
|
||||
- **System Upgrades**: Export/import during system migrations
|
||||
- **Environment Sync**: Copy cores between environments
|
||||
- **Backup/Restore**: Full knowledge core backup operations
|
||||
|
||||
### Batch Processing
|
||||
- **Bulk Loading**: Load large knowledge datasets efficiently
|
||||
- **Data Integration**: Merge knowledge from multiple sources
|
||||
- **ETL Pipelines**: Extract-Transform-Load operations
|
||||
|
||||
### Performance Optimization
|
||||
- **Faster Than REST**: Binary format reduces transfer time
|
||||
- **Atomic Operations**: Complete import/export as single operation
|
||||
- **Resource Efficient**: Minimal memory footprint during transfer
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- **Authentication Required**: Bearer token authentication
|
||||
- **User Isolation**: Access restricted to user's own cores
|
||||
- **Data Validation**: Input validation on import
|
||||
- **Audit Logging**: Operations logged for security auditing
|
||||
Loading…
Add table
Add a link
Reference in a new issue