trustgraph/specs/api/components/schemas/librarian/LibrarianRequest.yaml

type: object
description: |
  Librarian service request for document library management.

  Operations: add-document, remove-document, list-documents,
  get-document-metadata, get-document-content (deprecated),
  stream-document, add-child-document, list-children, begin-upload,
  upload-chunk, complete-upload, abort-upload, get-upload-status,
  list-uploads, start-processing, stop-processing, list-processing
required:
  - operation
properties:
  operation:
    type: string
    enum:
      - add-document
      - remove-document
      - list-documents
      - get-document-metadata
      - get-document-content
      - stream-document
      - add-child-document
      - list-children
      - begin-upload
      - upload-chunk
      - complete-upload
      - abort-upload
      - get-upload-status
      - list-uploads
      - start-processing
      - stop-processing
      - list-processing
    description: |
      Library operation:
      - `add-document`: Add document to library
      - `remove-document`: Remove document from library
      - `list-documents`: List documents in library
      - `get-document-metadata`: Get document metadata
      - `get-document-content`: Get full document content in a single response.
        **Deprecated**: use `stream-document` instead. Fails for documents
        exceeding the broker's maximum message size.
      - `stream-document`: Stream document content in chunks. Each response
        includes `chunk_index` and `is_final`. Preferred over `get-document-content`
        for all document sizes.
      - `add-child-document`: Add a child document (e.g. page, chunk)
      - `list-children`: List child documents of a parent
      - `begin-upload`: Start a chunked upload session
      - `upload-chunk`: Upload a chunk of data
      - `complete-upload`: Finalize a chunked upload
      - `abort-upload`: Cancel a chunked upload
      - `get-upload-status`: Check upload progress
      - `list-uploads`: List active upload sessions
      - `start-processing`: Start processing library documents
      - `stop-processing`: Stop library processing
      - `list-processing`: List processing status
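  # Illustrative sketch only (not part of the schema): a `stream-document`
  # request body, shown as YAML. Which fields each operation actually needs
  # is an assumption; this schema only marks `operation` as required.
  #
  #   operation: stream-document
  #   user: alice
  #   collection: default
  #   document-id: doc-123
  #
  # Per the description above, each response includes `chunk_index` and
  # `is_final`, so a client would presumably keep reading responses and
  # appending chunks in `chunk_index` order until `is_final` is true.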
  flow:
    type: string
    description: Flow ID
    example: my-flow
  collection:
    type: string
    description: Collection identifier
    default: default
    example: default
  user:
    type: string
    description: User identifier
    default: trustgraph
    example: alice
  document-id:
    type: string
    description: Document identifier
    example: doc-123
  processing-id:
    type: string
    description: Processing task identifier
    example: proc-456
  document-metadata:
    $ref: '../common/DocumentMetadata.yaml'
  processing-metadata:
    $ref: '../common/ProcessingMetadata.yaml'
  content:
    type: string
    description: Document content (for add-document with inline content)
    example: This is the document content...
  criteria:
    type: array
    description: Search criteria for filtering documents
    items:
      type: object
      required:
        - key
        - value
        - operator
      properties:
        key:
          type: string
          description: Metadata field name
          example: author
        value:
          type: string
          description: Value to match
          example: John Doe
        operator:
          type: string
          enum: [eq, ne, gt, lt, contains]
          description: Comparison operator
          example: eq
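# Illustrative sketch only (not part of the schema): a `list-documents`
# request that filters on a metadata field. That `criteria` is used with
# `list-documents`, and how multiple entries are combined, are assumptions
# not stated in this schema.
#
#   operation: list-documents
#   user: alice
#   collection: default
#   criteria:
#     - key: author
#       value: John Doe
#       operator: eq
#
# Each criteria item must carry all three of `key`, `value`, and `operator`,
# and `operator` must be one of eq, ne, gt, lt, contains.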