Update docs for API/CLI changes in 1.0 (#421)

* Update some API basics for the 0.23/1.0 API change
2026-06-17 02:45:14 +02:00 · 2025-07-03 14:58:32 +01:00 · 2025-07-03 14:58:32 +01:00 · 44bdd29f51
commit 44bdd29f51
parent f907ea7db8
69 changed files with 19981 additions and 407 deletions
--- a/docs/apis/api-librarian.md
+++ b/docs/apis/api-librarian.md
@@ -0,0 +1,360 @@
+# TrustGraph Librarian API
+
+This API provides document library management for TrustGraph. It handles document storage, 
+metadata management, and processing orchestration using hybrid storage (MinIO for content, 
+Cassandra for metadata) with multi-user support.
+
+## Request/response
+
+### Request
+
+The request contains the following fields:
+- `operation`: The operation to perform (see operations below)
+- `document_id`: Document identifier (for document operations)
+- `document_metadata`: Document metadata object (for add/update operations)
+- `content`: Document content as base64-encoded bytes (for add operations)
+- `processing_id`: Processing job identifier (for processing operations)
+- `processing_metadata`: Processing metadata object (for add-processing)
+- `user`: User identifier (required for most operations)
+- `collection`: Collection filter (optional for list operations)
+- `criteria`: Query criteria array (for filtering operations)
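+
+As a rough illustration of how these fields combine, the sketch below assembles an `add-document` request as a plain dictionary and base64-encodes the raw bytes for `content`. The helper name `build_add_document_request` is hypothetical and not part of TrustGraph; the field layout follows the request examples in this document.
+
+```python
+import base64
+import json
+
+
+def build_add_document_request(doc_id, title, kind, raw_bytes, user, time, tags=()):
+    """Build an add-document request body (hypothetical helper)."""
+    return {
+        "operation": "add-document",
+        "document_metadata": {
+            "id": doc_id,
+            "time": time,  # passed through unchanged
+            "kind": kind,
+            "title": title,
+            "user": user,
+            "tags": list(tags),
+        },
+        # JSON cannot carry raw bytes, so the content travels as base64 text
+        "content": base64.b64encode(raw_bytes).decode("ascii"),
+    }
+
+
+request = build_add_document_request(
+    doc_id="doc-123",
+    title="Research Paper",
+    kind="application/pdf",
+    raw_bytes=b"%PDF-1.4\n",
+    user="alice",
+    time=1640995200000,
+    tags=["research", "ai"],
+)
+print(json.dumps(request, indent=4))
+```
+
+Placing `user` inside `document_metadata` mirrors the `add-document` example in this document; other operations carry `user` at the top level of the request.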
+
+### Response
+
+The response contains the following fields:
+- `error`: Error information if the operation fails
+- `document_metadata`: Single document metadata (for get operations)
+- `content`: Document content as base64-encoded bytes (for get-content)
+- `document_metadatas`: Array of document metadata (for list operations)
+- `processing_metadatas`: Array of processing metadata (for list-processing)
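+
+A client would typically check `error` before reading any other field. The following is a minimal sketch of that pattern for a `get-document-content` response. The function name `extract_content` is hypothetical, and because the structure of the `error` object is not described here, the sketch only tests whether it is present.
+
+```python
+import base64
+
+
+def extract_content(response):
+    """Return the decoded document bytes from a response dict."""
+    error = response.get("error")
+    if error:
+        # The error object's shape is an unknown here, so report it as-is
+        raise RuntimeError(f"librarian request failed: {error!r}")
+    # Content arrives as base64 text and is decoded back to raw bytes
+    return base64.b64decode(response["content"])
+
+
+data = extract_content({"content": "JVBERi0xLjQK"})
+print(len(data), "bytes")
+```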
+
+## Document Operations
+
+### ADD-DOCUMENT - Add Document to Library
+
+Request:
+```json
+{
+    "operation": "add-document",
+    "document_metadata": {
+        "id": "doc-123",
+        "time": 1640995200000,
+        "kind": "application/pdf",
+        "title": "Research Paper",
+        "comments": "Important research findings",
+        "user": "alice",
+        "tags": ["research", "ai", "machine-learning"],
+        "metadata": [
+            {
+                "subject": "doc-123",
+                "predicate": "dc:creator",
+                "object": "Dr. Smith"
+            }
+        ]
+    },
+    "content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
+}
+```
+
+Response:
+```json
+{}
+```
+
+### GET-DOCUMENT-METADATA - Get Document Metadata
+
+Request:
+```json
+{
+    "operation": "get-document-metadata",
+    "document_id": "doc-123",
+    "user": "alice"
+}
+```
+
+Response:
+```json
+{
+    "document_metadata": {
+        "id": "doc-123",
+        "time": 1640995200000,
+        "kind": "application/pdf",
+        "title": "Research Paper",
+        "comments": "Important research findings",
+        "user": "alice",
+        "tags": ["research", "ai", "machine-learning"],
+        "metadata": [
+            {
+                "subject": "doc-123",
+                "predicate": "dc:creator",
+                "object": "Dr. Smith"
+            }
+        ]
+    }
+}
+```
+
+### GET-DOCUMENT-CONTENT - Get Document Content
+
+Request:
+```json
+{
+    "operation": "get-document-content",
+    "document_id": "doc-123",
+    "user": "alice"
+}
+```
+
+Response:
+```json
+{
+    "content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
+}
+```
+
+### LIST-DOCUMENTS - List User's Documents
+
+Request:
+```json
+{
+    "operation": "list-documents",
+    "user": "alice",
+    "collection": "research"
+}
+```
+
+Response:
+```json
+{
+    "document_metadatas": [
+        {
+            "id": "doc-123",
+            "time": 1640995200000,
+            "kind": "application/pdf",
+            "title": "Research Paper",
+            "comments": "Important research findings",
+            "user": "alice",
+            "tags": ["research", "ai"]
+        },
+        {
+            "id": "doc-124",
+            "time": 1640995300000,
+            "kind": "text/plain",
+            "title": "Meeting Notes",
+            "comments": "Team meeting discussion",
+            "user": "alice",
+            "tags": ["meeting", "notes"]
+        }
+    ]
+}
+```
+
+### UPDATE-DOCUMENT - Update Document Metadata
+
+Request:
+```json
+{
+    "operation": "update-document",
+    "document_metadata": {
+        "id": "doc-123",
+        "title": "Updated Research Paper",
+        "comments": "Updated findings and conclusions",
+        "user": "alice",
+        "tags": ["research", "ai", "machine-learning", "updated"]
+    }
+}
+```
+
+Response:
+```json
+{}
+```
+
+### REMOVE-DOCUMENT - Remove Document
+
+Request:
+```json
+{
+    "operation": "remove-document",
+    "document_id": "doc-123",
+    "user": "alice"
+}
+```
+
+Response:
+```json
+{}
+```
+
+## Processing Operations
+
+### ADD-PROCESSING - Start Document Processing
+
+Request:
+```json
+{
+    "operation": "add-processing",
+    "processing_metadata": {
+        "id": "proc-456",
+        "document_id": "doc-123",
+        "time": 1640995400000,
+        "flow": "pdf-extraction",
+        "user": "alice",
+        "collection": "research",
+        "tags": ["extraction", "nlp"]
+    }
+}
+```
+
+Response:
+```json
+{}
+```
+
+### LIST-PROCESSING - List Processing Jobs
+
+Request:
+```json
+{
+    "operation": "list-processing",
+    "user": "alice",
+    "collection": "research"
+}
+```
+
+Response:
+```json
+{
+    "processing_metadatas": [
+        {
+            "id": "proc-456",
+            "document_id": "doc-123",
+            "time": 1640995400000,
+            "flow": "pdf-extraction",
+            "user": "alice",
+            "collection": "research",
+            "tags": ["extraction", "nlp"]
+        }
+    ]
+}
+```
+
+### REMOVE-PROCESSING - Stop Processing Job
+
+Request:
+```json
+{
+    "operation": "remove-processing",
+    "processing_id": "proc-456",
+    "user": "alice"
+}
+```
+
+Response:
+```json
+{}
+```
+
+## REST service
+
+The REST service is available at `/api/v1/librarian` and accepts the above request formats.
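+
+One way to call the endpoint from Python, using only the standard library, is sketched below. It assumes the service accepts a JSON body via HTTP POST and that the gateway is reachable at `http://localhost:8088`; both are assumptions to adjust for a given deployment, and the helper names are hypothetical.
+
+```python
+import json
+import urllib.request
+
+# Assumption: the gateway address is deployment-specific
+BASE_URL = "http://localhost:8088"
+
+
+def build_rest_request(payload, base_url=BASE_URL):
+    """Build (but do not send) a JSON POST to the librarian endpoint."""
+    return urllib.request.Request(
+        base_url + "/api/v1/librarian",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def call_librarian(payload, base_url=BASE_URL):
+    """Send the request and return the decoded JSON response."""
+    with urllib.request.urlopen(build_rest_request(payload, base_url)) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+
+
+# Building the request needs no running service
+req = build_rest_request({"operation": "list-documents", "user": "alice"})
+print(req.get_method(), req.full_url)
+```
+
+Calling `call_librarian(...)` with the same payload would perform the actual round trip against a running service.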
+
+## Websocket
+
+Requests have a `request` object containing the operation fields.
+Responses have a `response` object containing the response fields.
+
+Request:
+```json
+{
+    "id": "unique-request-id",
+    "service": "librarian",
+    "request": {
+        "operation": "list-documents",
+        "user": "alice"
+    }
+}
+```
+
+Response:
+```json
+{
+    "id": "unique-request-id",
+    "response": {
+        "document_metadatas": [...]
+    },
+    "complete": true
+}
+```
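+
+The envelope handling can be sketched independently of any particular WebSocket library. The helpers below wrap an operation in the envelope shown above and match an incoming message to its request by `id`. The names `wrap_request` and `unwrap_response` are hypothetical, and generating the `id` with `uuid4` is an assumption; any unique string should serve.
+
+```python
+import json
+import uuid
+
+
+def wrap_request(service, request):
+    """Wrap operation fields in a WebSocket request envelope."""
+    return {
+        "id": str(uuid.uuid4()),  # assumption: any unique string works
+        "service": service,
+        "request": request,
+    }
+
+
+def unwrap_response(message, expected_id):
+    """Return (response, complete) if the message answers expected_id, else None."""
+    envelope = json.loads(message)
+    if envelope.get("id") != expected_id:
+        return None  # belongs to a different in-flight request
+    return envelope.get("response", {}), bool(envelope.get("complete"))
+
+
+envelope = wrap_request("librarian", {"operation": "list-documents", "user": "alice"})
+outgoing = json.dumps(envelope)  # text frame to send over the socket
+```
+
+Matching on `id` lets several requests share one connection, since each response carries the identifier of the request it answers.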
+
+## Pulsar
+
+The Pulsar schema for the Librarian API is defined in Python code here:
+
+https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/library.py
+
+Default request queue:
+`non-persistent://tg/request/librarian`
+
+Default response queue:
+`non-persistent://tg/response/librarian`
+
+Request schema:
+`trustgraph.schema.LibrarianRequest`
+
+Response schema:
+`trustgraph.schema.LibrarianResponse`
+
+## Python SDK
+
+The Python SDK provides convenient access to the Librarian API:
+
+```python
+import asyncio
+
+from trustgraph.api.library import LibrarianClient
+
+
+async def main():
+    client = LibrarianClient()
+
+    # Add a document; the file is read as raw bytes
+    with open("document.pdf", "rb") as f:
+        content = f.read()
+
+    await client.add_document(
+        doc_id="doc-123",
+        title="Research Paper",
+        content=content,
+        user="alice",
+        tags=["research", "ai"]
+    )
+
+    # Get document metadata
+    metadata = await client.get_document_metadata("doc-123", "alice")
+
+    # List documents
+    documents = await client.list_documents("alice", collection="research")
+
+    # Start processing
+    await client.add_processing(
+        processing_id="proc-456",
+        document_id="doc-123",
+        flow="pdf-extraction",
+        user="alice"
+    )
+
+
+# `await` is only valid inside a coroutine, so run everything via asyncio
+asyncio.run(main())
+```
+
+## Features
+
+- **Hybrid Storage**: MinIO for content, Cassandra for metadata
+- **Multi-user Support**: User-based document ownership and access control
+- **Rich Metadata**: RDF-style metadata triples and tagging system
+- **Processing Integration**: Automatic triggering of document processing workflows
+- **Content Types**: Support for multiple document formats (PDF, text, etc.)
+- **Collection Management**: Optional document grouping by collection
+- **Metadata Search**: Query documents by metadata criteria
+
+## Use Cases
+
+- **Document Management**: Store and organize documents with rich metadata
+- **Knowledge Extraction**: Process documents to extract structured knowledge
+- **Research Libraries**: Manage collections of research papers and documents
+- **Content Processing**: Orchestrate document processing workflows
+- **Multi-tenant Systems**: Support multiple users with isolated document libraries