mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 08:26:21 +02:00
Adds a RabbitMQ backend as an alternative to Pulsar, selectable via
PUBSUB_BACKEND=rabbitmq. Both backends implement the same PubSubBackend
protocol, so no application code changes are needed to switch.

RabbitMQ topology:
- Single topic exchange per topicspace (e.g. 'tg')
- Routing key derived from queue class and topic name
- Shared consumers: named queue bound to the exchange (competing, round-robin)
- Exclusive consumers: anonymous auto-delete queue (broadcast; each consumer
  gets every message). Used by Subscriber and the config push consumer.
- Thread-local producer connections (pika is not thread-safe)
- Push-based consumption via basic_consume, with process_data_events for
  heartbeat processing

Consumer model changes:
- Consumer class creates one backend consumer per concurrent task (required
  for pika thread safety, harmless for Pulsar)
- Consumer class accepts a consumer_type parameter
- Subscriber passes consumer_type='exclusive' for broadcast semantics
- Config push consumer uses consumer_type='exclusive' so every processor
  instance receives config updates
- handle_one_from_queue receives the consumer as a parameter for correct
  per-connection ack/nack

LibrarianClient:
- New shared client class replacing duplicated librarian request/response
  code across 6+ services (chunking, decoders, RAG, etc.)
- Uses stream-document instead of get-document-content to fetch document
  content in 1 MB chunks (avoids broker message size limits)
- Standalone object (self.librarian = LibrarianClient(...)), not a mixin
- get-document-content marked deprecated in the schema and OpenAPI spec

Serialisation:
- Extracted dataclass_to_dict/dict_to_dataclass into a shared
  serialization.py (used by both the Pulsar and RabbitMQ backends)

Librarian queues:
- Changed from the flow class (persistent) back to the request/response
  class, now that stream-document eliminates large single messages
- API upload chunk size reduced from 5 MB to 3 MB to stay under broker
  limits after base64 encoding

Factory and CLI:
- get_pubsub() handles the 'rabbitmq' backend with RabbitMQ connection params
- add_pubsub_args() includes RabbitMQ options (host, port, credentials)
- add_pubsub_args(standalone=True) defaults to localhost for CLI tools
- init_trustgraph skips Pulsar admin setup for non-Pulsar backends
- tg-dump-queues and tg-monitor-prompts use the backend abstraction
- BaseClient and ConfigClient accept generic pubsub config
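The swap-in-a-backend pattern above can be sketched as follows. This is a hypothetical illustration, not the actual trustgraph API: the method names on PubSubBackend and the two backend classes are assumptions; only the PUBSUB_BACKEND environment variable and the get_pubsub() factory name come from the commit message.

```python
import os
from typing import Iterator, Protocol


class PubSubBackend(Protocol):
    """Assumed shape of the shared backend protocol (illustrative only)."""

    def publish(self, topic: str, payload: bytes) -> None: ...

    def consume(self, topic: str, consumer_type: str = "shared") -> Iterator[bytes]: ...


class PulsarBackend:
    def publish(self, topic: str, payload: bytes) -> None:
        raise NotImplementedError  # would wrap a Pulsar producer

    def consume(self, topic: str, consumer_type: str = "shared") -> Iterator[bytes]:
        raise NotImplementedError  # would wrap a Pulsar consumer


class RabbitMQBackend:
    def publish(self, topic: str, payload: bytes) -> None:
        raise NotImplementedError  # would use a thread-local pika connection

    def consume(self, topic: str, consumer_type: str = "shared") -> Iterator[bytes]:
        # consumer_type='shared' would bind a named queue (competing consumers);
        # 'exclusive' would declare an anonymous auto-delete queue (broadcast)
        raise NotImplementedError


def get_pubsub() -> PubSubBackend:
    """Pick a backend from the environment; defaults to Pulsar."""
    backend = os.environ.get("PUBSUB_BACKEND", "pulsar")
    return RabbitMQBackend() if backend == "rabbitmq" else PulsarBackend()
```

Because both classes satisfy the same Protocol, application code typed against PubSubBackend works unchanged with either broker.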
108 lines
3.3 KiB
YAML
type: object
description: |
  Librarian service request for document library management.

  Operations: add-document, remove-document, list-documents,
  get-document-metadata, stream-document, add-child-document,
  list-children, begin-upload, upload-chunk, complete-upload,
  abort-upload, get-upload-status, list-uploads,
  start-processing, stop-processing, list-processing
required:
  - operation
properties:
  operation:
    type: string
    enum:
      - add-document
      - remove-document
      - list-documents
      - get-document-metadata
      - get-document-content
      - stream-document
      - add-child-document
      - list-children
      - begin-upload
      - upload-chunk
      - complete-upload
      - abort-upload
      - get-upload-status
      - list-uploads
      - start-processing
      - stop-processing
      - list-processing
    description: |
      Library operation:
      - `add-document`: Add document to library
      - `remove-document`: Remove document from library
      - `list-documents`: List documents in library
      - `get-document-metadata`: Get document metadata
      - `get-document-content`: Get full document content in a single response.
        **Deprecated** — use `stream-document` instead. Fails for documents
        exceeding the broker's max message size.
      - `stream-document`: Stream document content in chunks. Each response
        includes `chunk_index` and `is_final`. Preferred over
        `get-document-content` for all document sizes.
      - `add-child-document`: Add a child document (e.g. page, chunk)
      - `list-children`: List child documents of a parent
      - `begin-upload`: Start a chunked upload session
      - `upload-chunk`: Upload a chunk of data
      - `complete-upload`: Finalize a chunked upload
      - `abort-upload`: Cancel a chunked upload
      - `get-upload-status`: Check upload progress
      - `list-uploads`: List active upload sessions
      - `start-processing`: Start processing library documents
      - `stop-processing`: Stop library processing
      - `list-processing`: List processing status
  flow:
    type: string
    description: Flow ID
    example: my-flow
  collection:
    type: string
    description: Collection identifier
    default: default
    example: default
  user:
    type: string
    description: User identifier
    default: trustgraph
    example: alice
  document-id:
    type: string
    description: Document identifier
    example: doc-123
  processing-id:
    type: string
    description: Processing task identifier
    example: proc-456
  document-metadata:
    $ref: '../common/DocumentMetadata.yaml'
  processing-metadata:
    $ref: '../common/ProcessingMetadata.yaml'
  content:
    type: string
    description: Document content (for add-document with inline content)
    example: This is the document content...
  criteria:
    type: array
    description: Search criteria for filtering documents
    items:
      type: object
      required:
        - key
        - value
        - operator
      properties:
        key:
          type: string
          description: Metadata field name
          example: author
        value:
          type: string
          description: Value to match
          example: John Doe
        operator:
          type: string
          enum: [eq, ne, gt, lt, contains]
          description: Comparison operator
          example: eq