# TrustGraph Librarian API

This API provides document library management for TrustGraph. It handles document storage,
metadata management, and processing orchestration using hybrid storage (S3-compatible object
storage for content, Cassandra for metadata) with multi-user support.

## Request/response

### Request

The request contains the following fields:
- `operation`: The operation to perform (see operations below)
- `document_id`: Document identifier (for document operations)
- `document_metadata`: Document metadata object (for add/update operations)
  - `id`: Document identifier (required)
  - `time`: Unix timestamp in seconds as a float (required for add operations)
  - `kind`: MIME type of document (required, e.g., "text/plain", "application/pdf")
  - `title`: Document title (optional)
  - `comments`: Document comments (optional)
  - `user`: Document owner (required)
  - `tags`: Array of tags (optional)
  - `metadata`: Array of RDF triples (optional) - each triple has:
    - `s`: Subject with `v` (value) and `e` (is_uri boolean)
    - `p`: Predicate with `v` (value) and `e` (is_uri boolean)
    - `o`: Object with `v` (value) and `e` (is_uri boolean)
- `content`: Document content as base64-encoded bytes (for add operations)
- `processing_id`: Processing job identifier (for processing operations)
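- `processing_metadata`: Processing metadata object (for add-processing)
  - `id`: Processing job identifier
  - `document_id`: Identifier of the document to process
  - `time`: Unix timestamp in seconds as a float
  - `flow`: Name of the processing flow to run
  - `user`: Job owner
  - `collection`: Collection the job is associated with
  - `tags`: Array of tags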
- `user`: User identifier (required for most operations)
- `collection`: Collection filter (optional for list operations)
- `criteria`: Query criteria array (for filtering operations)
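Each `metadata` triple encodes its subject, predicate, and object as an object with the term value in `v` and a URI flag in `e`. Here is a minimal sketch of building that structure in Python. `term` and `triple` are hypothetical helpers, not part of the TrustGraph SDK, and they assume the subject and predicate are always URIs:

```python
def term(value: str, is_uri: bool) -> dict:
    """Hypothetical helper: encode one RDF term as {"v": value, "e": is_uri}."""
    return {"v": value, "e": is_uri}


def triple(subject: str, predicate: str, obj: str, obj_is_uri: bool = False) -> dict:
    """Hypothetical helper: build one metadata triple.

    Assumption: subject and predicate are always URIs; only the object
    may be a literal.
    """
    return {
        "s": term(subject, True),
        "p": term(predicate, True),
        "o": term(obj, obj_is_uri),
    }


# The same triple used in the add-document example
metadata = [
    triple(
        "http://example.com/doc-123",
        "http://purl.org/dc/elements/1.1/creator",
        "Dr. Smith",
    )
]
```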

### Response

The response contains the following fields:
- `error`: Error information if operation fails
- `document_metadata`: Single document metadata (for get operations)
- `content`: Document content as base64-encoded bytes (for get-content)
- `document_metadatas`: Array of document metadata (for list operations)
- `processing_metadatas`: Array of processing metadata (for list-processing)

## Document Operations

### ADD-DOCUMENT - Add Document to Library

Request:
```json
{
    "operation": "add-document",
    "document_metadata": {
        "id": "doc-123",
        "time": 1640995200.0,
        "kind": "application/pdf",
        "title": "Research Paper",
        "comments": "Important research findings",
        "user": "alice",
        "tags": ["research", "ai", "machine-learning"],
        "metadata": [
            {
                "s": {
                    "v": "http://example.com/doc-123",
                    "e": true
                },
                "p": {
                    "v": "http://purl.org/dc/elements/1.1/creator",
                    "e": true
                },
                "o": {
                    "v": "Dr. Smith",
                    "e": false
                }
            }
        ]
    },
    "content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
}
```

Response:
```json
{}
```
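Building this request by hand comes down to two steps: base64-encode the raw bytes, and set `time` to the current Unix time. Here is a minimal sketch. `add_document_request` is a hypothetical helper, and it includes optional fields only when a value is supplied:

```python
import base64
import time


def add_document_request(doc_id, user, kind, content, title=None,
                         comments=None, tags=None, metadata=None):
    """Hypothetical helper: build an add-document request dict."""
    document_metadata = {
        "id": doc_id,
        "time": time.time(),      # Unix time in seconds, as a float
        "kind": kind,             # MIME type, e.g. "application/pdf"
        "user": user,
        "tags": tags or [],
        "metadata": metadata or [],
    }
    # Optional text fields are included only when supplied
    if title is not None:
        document_metadata["title"] = title
    if comments is not None:
        document_metadata["comments"] = comments
    return {
        "operation": "add-document",
        "document_metadata": document_metadata,
        # Raw bytes travel as base64 text inside the JSON payload
        "content": base64.b64encode(content).decode("ascii"),
    }


request = add_document_request(
    "doc-123", "alice", "application/pdf", b"%PDF-1.4\n", title="Research Paper"
)
```

The resulting dict can be sent over any of the transports described below.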

### GET-DOCUMENT-METADATA - Get Document Metadata

Request:
```json
{
    "operation": "get-document-metadata",
    "document_id": "doc-123",
    "user": "alice"
}
```

Response:
```json
{
    "document_metadata": {
        "id": "doc-123",
        "time": 1640995200.0,
        "kind": "application/pdf",
        "title": "Research Paper",
        "comments": "Important research findings",
        "user": "alice",
        "tags": ["research", "ai", "machine-learning"],
        "metadata": [
            {
                "s": {
                    "v": "http://example.com/doc-123",
                    "e": true
                },
                "p": {
                    "v": "http://purl.org/dc/elements/1.1/creator",
                    "e": true
                },
                "o": {
                    "v": "Dr. Smith",
                    "e": false
                }
            }
        ]
    }
}
```

### GET-DOCUMENT-CONTENT - Get Document Content

Request:
```json
{
    "operation": "get-document-content",
    "document_id": "doc-123",
    "user": "alice"
}
```

Response:
```json
{
    "content": "JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCg=="
}
```
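The `content` field holds the document bytes as base64 text, so you must decode it before the file is usable. Here is a sketch with hypothetical helper names. Passing `validate=True` makes `base64.b64decode` reject characters outside the base64 alphabet instead of silently discarding them:

```python
import base64
from pathlib import Path


def decode_content(response: dict) -> bytes:
    """Hypothetical helper: decode the base64 `content` field to raw bytes."""
    return base64.b64decode(response["content"], validate=True)


def save_content(response: dict, path: str) -> int:
    """Hypothetical helper: write the decoded document to `path`.

    Returns the number of bytes written.
    """
    return Path(path).write_bytes(decode_content(response))
```

For the response above, `decode_content` returns bytes that begin with `%PDF-1.4`.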

### LIST-DOCUMENTS - List User's Documents

Request:
```json
{
    "operation": "list-documents",
    "user": "alice",
    "collection": "research"
}
```

Response:
```json
{
    "document_metadatas": [
        {
            "id": "doc-123",
            "time": 1640995200.0,
            "kind": "application/pdf",
            "title": "Research Paper",
            "comments": "Important research findings",
            "user": "alice",
            "tags": ["research", "ai"]
        },
        {
            "id": "doc-124",
            "time": 1640995300.0,
            "kind": "text/plain",
            "title": "Meeting Notes",
            "comments": "Team meeting discussion",
            "user": "alice",
            "tags": ["meeting", "notes"]
        }
    ]
}
```
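The request schema has a `criteria` field for server-side filtering, but its format is not documented here. As an alternative, you can filter a listing client-side. Here is a minimal sketch that uses a hypothetical `ids_with_tag` helper against the response shape above:

```python
def ids_with_tag(response: dict, tag: str) -> list:
    """Hypothetical helper: ids of listed documents that carry `tag`.

    Filtering happens client-side, after list-documents has returned.
    """
    return [
        doc["id"]
        for doc in response.get("document_metadatas", [])
        if tag in doc.get("tags", [])
    ]
```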

### UPDATE-DOCUMENT - Update Document Metadata

Request:
```json
{
    "operation": "update-document",
    "document_metadata": {
        "id": "doc-123",
        "time": 1640995500.0,
        "title": "Updated Research Paper",
        "comments": "Updated findings and conclusions",
        "user": "alice",
        "tags": ["research", "ai", "machine-learning", "updated"],
        "metadata": []
    }
}
```

Response:
```json
{}
```
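The update example supplies a complete metadata object. A cautious client therefore reads the current record first and changes only the fields it needs. This sketch assumes that `update-document` stores exactly the object it is given, so any field you omit could be lost. `update_request` is a hypothetical helper that takes a get-document-metadata response:

```python
import copy
import time


def update_request(current: dict, **changes) -> dict:
    """Hypothetical helper: build update-document from existing metadata.

    Assumption: the server replaces the stored metadata with this object,
    so unchanged fields are copied from `current`.
    """
    metadata = copy.deepcopy(current["document_metadata"])  # leave input untouched
    metadata.update(changes)
    metadata["time"] = time.time()  # newer timestamp, mirroring the example above
    return {"operation": "update-document", "document_metadata": metadata}


current = {
    "document_metadata": {
        "id": "doc-123",
        "time": 1640995200.0,
        "kind": "application/pdf",
        "title": "Research Paper",
        "user": "alice",
        "tags": ["research", "ai"],
    }
}
request = update_request(current, title="Updated Research Paper",
                         tags=["research", "ai", "updated"])
```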

### REMOVE-DOCUMENT - Remove Document

Request:
```json
{
    "operation": "remove-document",
    "document_id": "doc-123",
    "user": "alice"
}
```

Response:
```json
{}
```
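`get-document-metadata`, `get-document-content`, and `remove-document` all address a document with the same pair of fields, `document_id` and `user`. The hypothetical builder below covers all three:

```python
# Operations that take exactly `document_id` and `user`
DOCUMENT_OPERATIONS = {
    "get-document-metadata",
    "get-document-content",
    "remove-document",
}


def document_request(operation: str, document_id: str, user: str) -> dict:
    """Hypothetical helper: build a request addressed by document id and user."""
    if operation not in DOCUMENT_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return {"operation": operation, "document_id": document_id, "user": user}
```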

## Processing Operations

### ADD-PROCESSING - Start Document Processing

Request:
```json
{
    "operation": "add-processing",
    "processing_metadata": {
        "id": "proc-456",
        "document_id": "doc-123",
        "time": 1640995400.0,
        "flow": "pdf-extraction",
        "user": "alice",
        "collection": "research",
        "tags": ["extraction", "nlp"]
    }
}
```

Response:
```json
{}
```
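A processing job needs its own identifier. The client chooses it and must keep it, because `remove-processing` needs it to stop the job later. The sketch below generates one with `uuid4`. The `proc-` prefix only mirrors the example and is not a documented convention. `add_processing_request` is a hypothetical helper:

```python
import time
import uuid


def add_processing_request(document_id, flow, user, collection, tags=None):
    """Hypothetical helper: build add-processing with a fresh job id."""
    return {
        "operation": "add-processing",
        "processing_metadata": {
            # Random id; store it, since remove-processing needs it later
            "id": f"proc-{uuid.uuid4()}",
            "document_id": document_id,
            "time": time.time(),
            "flow": flow,
            "user": user,
            "collection": collection,
            "tags": tags or [],
        },
    }


request = add_processing_request(
    "doc-123", "pdf-extraction", "alice", "research", ["extraction", "nlp"]
)
```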

### LIST-PROCESSING - List Processing Jobs

Request:
```json
{
    "operation": "list-processing",
    "user": "alice",
    "collection": "research"
}
```

Response:
```json
{
    "processing_metadatas": [
        {
            "id": "proc-456",
            "document_id": "doc-123",
            "time": 1640995400.0,
            "flow": "pdf-extraction",
            "user": "alice",
            "collection": "research",
            "tags": ["extraction", "nlp"]
        }
    ]
}
```

### REMOVE-PROCESSING - Stop Processing Job

Request:
```json
{
    "operation": "remove-processing",
    "processing_id": "proc-456",
    "user": "alice"
}
```

Response:
```json
{}
```
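`remove-processing` takes a single job id. To stop every job attached to one document, combine it with a `list-processing` response. Here is a sketch with a hypothetical `remove_requests_for` helper that builds one request per matching job:

```python
def remove_requests_for(response: dict, document_id: str, user: str) -> list:
    """Hypothetical helper: remove-processing requests for one document's jobs.

    `response` is a list-processing response.
    """
    return [
        {"operation": "remove-processing", "processing_id": job["id"], "user": user}
        for job in response.get("processing_metadatas", [])
        if job["document_id"] == document_id
    ]
```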

## REST service

The REST service is available at `/api/v1/librarian` and accepts the above request formats.
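Here is a minimal sketch of calling the REST endpoint with the Python standard library. The host and port (`localhost:8088`), the use of POST, and the JSON content type are all assumptions; check them against your deployment. `build_rest_request` and `call_librarian` are hypothetical helpers:

```python
import json
import urllib.request

# Assumption: gateway address; adjust for your deployment
BASE_URL = "http://localhost:8088"


def build_rest_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a JSON POST."""
    return urllib.request.Request(
        f"{base_url}/api/v1/librarian",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: requests are POSTed as JSON bodies
    )


def call_librarian(body: dict, base_url: str = BASE_URL) -> dict:
    """Hypothetical helper: send a request and decode the JSON response."""
    with urllib.request.urlopen(build_rest_request(body, base_url)) as resp:
        return json.load(resp)
```

For example, `call_librarian({"operation": "list-documents", "user": "alice"})` would return a dict containing `document_metadatas`.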

## Websocket

Requests have a `request` object containing the operation fields.
Responses have a `response` object containing the response fields.

Request:
```json
{
    "id": "unique-request-id",
    "service": "librarian",
    "request": {
        "operation": "list-documents",
        "user": "alice"
    }
}
```

Response:
```json
{
    "id": "unique-request-id",
    "response": {
        "document_metadatas": [...]
    },
    "complete": true
}
```
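Several requests can share one socket, so the `id` field ties each response to its request. The sketch below builds the envelope and then collects the responses for one request until `complete` is true. It assumes that responses for other requests may be interleaved on the same socket. `envelope` and `collect` are hypothetical helpers:

```python
import uuid


def envelope(request: dict) -> dict:
    """Hypothetical helper: wrap a librarian request for the websocket."""
    return {"id": str(uuid.uuid4()), "service": "librarian", "request": request}


def collect(messages, request_id: str) -> list:
    """Hypothetical helper: gather `response` objects for one request id.

    Stops at the first matching message marked complete; messages for
    other ids (other in-flight requests) are skipped.
    """
    responses = []
    for msg in messages:
        if msg.get("id") != request_id:
            continue
        if "response" in msg:
            responses.append(msg["response"])
        if msg.get("complete"):
            break
    return responses
```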

## Pulsar

The Pulsar schema for the Librarian API is defined in Python code here:

https://github.com/trustgraph-ai/trustgraph/blob/master/trustgraph-base/trustgraph/schema/library.py

Default request queue:
`non-persistent://tg/request/librarian`

Default response queue:
`non-persistent://tg/response/librarian`

Request schema:
`trustgraph.schema.LibrarianRequest`

Response schema:
`trustgraph.schema.LibrarianResponse`

## Python SDK

The Python SDK provides convenient access to the Librarian API:

```python
import asyncio

from trustgraph.api.library import LibrarianClient


async def main():
    client = LibrarianClient()

    # Add a document: read its raw bytes from disk
    with open("document.pdf", "rb") as f:
        content = f.read()

    await client.add_document(
        doc_id="doc-123",
        title="Research Paper",
        content=content,
        user="alice",
        tags=["research", "ai"],
    )

    # Get document metadata
    metadata = await client.get_document_metadata("doc-123", "alice")

    # List documents in the "research" collection
    documents = await client.list_documents("alice", collection="research")

    # Start processing the document with the "pdf-extraction" flow
    await client.add_processing(
        processing_id="proc-456",
        document_id="doc-123",
        flow="pdf-extraction",
        user="alice",
    )


# The client methods are coroutines, so run them inside an event loop
asyncio.run(main())
```

## Features

- **Hybrid Storage**: Document content in S3-compatible object storage, metadata in Cassandra
- **Multi-user Support**: User-based document ownership and access control
- **Rich Metadata**: RDF-style metadata triples and tagging system
- **Processing Integration**: Automatic triggering of document processing workflows
- **Content Types**: Support for multiple document formats (PDF, text, etc.)
- **Collection Management**: Optional document grouping by collection
- **Metadata Search**: Query documents by metadata criteria
- **Flexible Storage Backend**: Works with any S3-compatible storage (MinIO, Ceph RADOS Gateway, AWS S3, Cloudflare R2, etc.)

## Use Cases

- **Document Management**: Store and organize documents with rich metadata
- **Knowledge Extraction**: Process documents to extract structured knowledge
- **Research Libraries**: Manage collections of research papers and documents
- **Content Processing**: Orchestrate document processing workflows
- **Multi-tenant Systems**: Support multiple users with isolated document libraries