trustgraph/specs/api/components/schemas/librarian/LibrarianRequest.yaml
cybermaggedon 24f0190ce7
RabbitMQ pub/sub backend with topic exchange architecture (#752)
Adds a RabbitMQ backend as an alternative to Pulsar, selectable via
PUBSUB_BACKEND=rabbitmq. Both backends implement the same PubSubBackend
protocol — no application code changes needed to switch.

RabbitMQ topology:
- Single topic exchange per topicspace (e.g. 'tg')
- Routing key derived from queue class and topic name
- Shared consumers: named queue bound to exchange (competing, round-robin)
- Exclusive consumers: anonymous auto-delete queue (broadcast, each gets
  every message). Used by Subscriber and config push consumer.
- Thread-local producer connections (pika is not thread-safe)
- Push-based consumption via basic_consume with process_data_events
  for heartbeat processing

Consumer model changes:
- Consumer class creates one backend consumer per concurrent task
  (required for pika thread safety, harmless for Pulsar)
- Consumer class accepts consumer_type parameter
- Subscriber passes consumer_type='exclusive' for broadcast semantics
- Config push consumer uses consumer_type='exclusive' so every
  processor instance receives config updates
- handle_one_from_queue receives consumer as parameter for correct
  per-connection ack/nack
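A minimal sketch of the per-task consumer model with a stub backend (class and method shapes here are illustrative, not the project's API; real backends wrap pika or Pulsar connections):

```python
# Sketch: one backend consumer per concurrent task, with the consumer
# passed into the handler so ack goes to the connection that received
# the message.

class StubBackendConsumer:
    def __init__(self, cid):
        self.cid = cid
        self.acked = []
    def ack(self, msg):
        self.acked.append(msg)

class Consumer:
    def __init__(self, backend_factory, concurrency=2):
        # One backend consumer per task: required for pika thread
        # safety, harmless for Pulsar.
        self.consumers = [backend_factory(i) for i in range(concurrency)]
    def handle_one_from_queue(self, consumer, msg):
        # ... process msg ...
        consumer.ack(msg)  # per-connection ack/nack

c = Consumer(StubBackendConsumer, concurrency=2)
c.handle_one_from_queue(c.consumers[0], "m1")
c.handle_one_from_queue(c.consumers[1], "m2")
```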

LibrarianClient:
- New shared client class replacing duplicated librarian request-response
  code across 6+ services (chunking, decoders, RAG, etc.)
- Uses stream-document instead of get-document-content for fetching
  document content in 1MB chunks (avoids broker message size limits)
- Standalone object (self.librarian = LibrarianClient(...)) not a mixin
- get-document-content marked deprecated in schema and OpenAPI spec
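Client-side reassembly of a streamed document might look like the sketch below. The `chunk_index`/`is_final` fields follow the schema; base64-encoded chunk content and the function shape are assumptions:

```python
# Sketch of stream-document reassembly on the client side.
import base64

def fetch_document(responses):
    """Collect stream-document responses until is_final, then join
    chunks in index order (safe against out-of-order delivery)."""
    chunks = {}
    for r in responses:
        chunks[r["chunk_index"]] = base64.b64decode(r["content"])
        if r["is_final"]:
            break
    return b"".join(chunks[i] for i in sorted(chunks))

doc = b"hello world" * 3
parts = [doc[i:i + 8] for i in range(0, len(doc), 8)]
responses = [
    {"chunk_index": i,
     "content": base64.b64encode(p).decode(),
     "is_final": i == len(parts) - 1}
    for i, p in enumerate(parts)
]
assert fetch_document(responses) == doc
```

Fetching in bounded chunks is what lets the librarian queues stay on the request/response class regardless of document size.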

Serialisation:
- Extracted dataclass_to_dict/dict_to_dataclass to shared
  serialization.py (used by both Pulsar and RabbitMQ backends)
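A sketch of what the shared helpers could look like (signatures are assumptions; the real `serialization.py` may handle nesting and optional fields differently):

```python
# Sketch of shared dataclass <-> dict serialization helpers.
from dataclasses import dataclass, fields, asdict, is_dataclass

def dataclass_to_dict(obj):
    return asdict(obj)

def dict_to_dataclass(cls, data):
    kwargs = {}
    for f in fields(cls):
        v = data.get(f.name)
        if is_dataclass(f.type) and isinstance(v, dict):
            v = dict_to_dataclass(f.type, v)  # recurse into nested dataclasses
        kwargs[f.name] = v
    return cls(**kwargs)

@dataclass
class Meta:
    author: str

@dataclass
class Doc:
    id: str
    meta: Meta

d = Doc("doc-123", Meta("alice"))
payload = dataclass_to_dict(d)
assert payload == {"id": "doc-123", "meta": {"author": "alice"}}
assert dict_to_dataclass(Doc, payload) == d
```

Keeping the helpers backend-neutral means each backend only has to agree on the wire encoding of the resulting dict.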

Librarian queues:
- Changed from flow class (persistent) back to request/response class
  now that stream-document eliminates large single messages
- API upload chunk size reduced from 5MB to 3MB to stay under broker
  limits after base64 encoding
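The 3 MB figure follows from base64's 4/3 expansion; a quick check, treating a 5 MB broker limit as an assumption:

```python
import math

BROKER_LIMIT = 5 * 1024 * 1024  # assumed max message size

def encoded_size(raw_bytes: int) -> int:
    # base64 maps every 3 input bytes to 4 output chars (with padding)
    return 4 * math.ceil(raw_bytes / 3)

# A 5 MB chunk overflows the limit once base64-encoded...
assert encoded_size(5 * 1024 * 1024) > BROKER_LIMIT
# ...while a 3 MB chunk stays under it (4 MB encoded).
assert encoded_size(3 * 1024 * 1024) < BROKER_LIMIT
```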

Factory and CLI:
- get_pubsub() handles 'rabbitmq' backend with RabbitMQ connection params
- add_pubsub_args() includes RabbitMQ options (host, port, credentials)
- add_pubsub_args(standalone=True) defaults to localhost for CLI tools
- init_trustgraph skips Pulsar admin setup for non-Pulsar backends
- tg-dump-queues and tg-monitor-prompts use backend abstraction
- BaseClient and ConfigClient accept generic pubsub config
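The factory selection can be sketched as below. The names `get_pubsub` and `PUBSUB_BACKEND` come from this commit message; the backend constructors and their parameters are assumptions:

```python
# Sketch of backend selection keyed on PUBSUB_BACKEND.
import os

class PulsarBackend:
    def __init__(self, **params):
        self.params = params

class RabbitMQBackend:
    def __init__(self, host="localhost", port=5672, **params):
        self.host, self.port = host, port

def get_pubsub(**params):
    backend = os.environ.get("PUBSUB_BACKEND", "pulsar")
    if backend == "rabbitmq":
        return RabbitMQBackend(**params)
    return PulsarBackend(**params)

os.environ["PUBSUB_BACKEND"] = "rabbitmq"
pubsub = get_pubsub(host="mq.example")
assert isinstance(pubsub, RabbitMQBackend)
```

Because both classes satisfy the same protocol, callers such as BaseClient and ConfigClient never branch on the backend type themselves.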
2026-04-02 12:47:16 +01:00


type: object
description: |
  Librarian service request for document library management.
  Operations: add-document, remove-document, list-documents,
  get-document-metadata, stream-document, add-child-document,
  list-children, begin-upload, upload-chunk, complete-upload,
  abort-upload, get-upload-status, list-uploads,
  start-processing, stop-processing, list-processing
required:
  - operation
properties:
  operation:
    type: string
    enum:
      - add-document
      - remove-document
      - list-documents
      - get-document-metadata
      - get-document-content
      - stream-document
      - add-child-document
      - list-children
      - begin-upload
      - upload-chunk
      - complete-upload
      - abort-upload
      - get-upload-status
      - list-uploads
      - start-processing
      - stop-processing
      - list-processing
    description: |
      Library operation:
      - `add-document`: Add document to library
      - `remove-document`: Remove document from library
      - `list-documents`: List documents in library
      - `get-document-metadata`: Get document metadata
      - `get-document-content`: Get full document content in a single response.
        **Deprecated** — use `stream-document` instead. Fails for documents
        exceeding the broker's max message size.
      - `stream-document`: Stream document content in chunks. Each response
        includes `chunk_index` and `is_final`. Preferred over
        `get-document-content` for all document sizes.
      - `add-child-document`: Add a child document (e.g. page, chunk)
      - `list-children`: List child documents of a parent
      - `begin-upload`: Start a chunked upload session
      - `upload-chunk`: Upload a chunk of data
      - `complete-upload`: Finalize a chunked upload
      - `abort-upload`: Cancel a chunked upload
      - `get-upload-status`: Check upload progress
      - `list-uploads`: List active upload sessions
      - `start-processing`: Start processing library documents
      - `stop-processing`: Stop library processing
      - `list-processing`: List processing status
  flow:
    type: string
    description: Flow ID
    example: my-flow
  collection:
    type: string
    description: Collection identifier
    default: default
    example: default
  user:
    type: string
    description: User identifier
    default: trustgraph
    example: alice
  document-id:
    type: string
    description: Document identifier
    example: doc-123
  processing-id:
    type: string
    description: Processing task identifier
    example: proc-456
  document-metadata:
    $ref: '../common/DocumentMetadata.yaml'
  processing-metadata:
    $ref: '../common/ProcessingMetadata.yaml'
  content:
    type: string
    description: Document content (for add-document with inline content)
    example: This is the document content...
  criteria:
    type: array
    description: Search criteria for filtering documents
    items:
      type: object
      required:
        - key
        - value
        - operator
      properties:
        key:
          type: string
          description: Metadata field name
          example: author
        value:
          type: string
          description: Value to match
          example: John Doe
        operator:
          type: string
          enum: [eq, ne, gt, lt, contains]
          description: Comparison operator
          example: eq
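For illustration, a request body conforming to this schema (field values are hypothetical):

```python
# Hypothetical stream-document request matching the schema above.
OPERATIONS = {
    "add-document", "remove-document", "list-documents",
    "get-document-metadata", "get-document-content", "stream-document",
    "add-child-document", "list-children", "begin-upload", "upload-chunk",
    "complete-upload", "abort-upload", "get-upload-status", "list-uploads",
    "start-processing", "stop-processing", "list-processing",
}

request = {
    "operation": "stream-document",  # the only required property
    "flow": "my-flow",
    "collection": "default",
    "user": "alice",
    "document-id": "doc-123",
}

assert request["operation"] in OPERATIONS
```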