Processor group implementation: dev wrapper (#808)

Processor group implementation: A wrapper to launch multiple
processors in a single process

- trustgraph-base/trustgraph/base/processor_group.py — group runner
  module. run_group(config) is the async body; run() is the
  endpoint. Loads JSON or YAML config, validates that every entry
  has a unique params.id, instantiates each class via importlib,
  shares one TaskGroup, mirrors AsyncProcessor.launch's retry loop
  and Prometheus startup.
- trustgraph-base/pyproject.toml — added [project.scripts] block
  with processor-group = "trustgraph.base.processor_group:run".
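
For reference, the added block is:

```toml
[project.scripts]
processor-group = "trustgraph.base.processor_group:run"
```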

Key behaviours:
- Unique id enforced up front — missing or duplicate params.id fails
  fast with a clear error, preventing the Prometheus Info label
  collision we flagged.
- No registry — dotted class path is the identifier; any
  AsyncProcessor descendant importable at runtime is packable.
- YAML import is lazy — only pulled in if the config file ends in
  .yaml/.yml, so JSON-only users don't need PyYAML installed.
- Single Prometheus server — start_http_server runs once at
  startup, before the retry loop, matching launch()'s pattern.
- Retry loop — same shape as AsyncProcessor.launch: catches
  ExceptionGroup from TaskGroup, logs, sleeps 4s,
  retries. Fail-group semantics (one processor dying tears down the
  group) — simple and surfaces bugs, as discussed.

Example config:

  processors:
    - class: trustgraph.extract.kg.definitions.extract.Processor
      params:
        id: kg-extract-definitions
    - class: trustgraph.chunking.recursive.Processor
      params:
        id: chunker-recursive

Run with processor-group -c group.yaml.
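
Since the loader accepts JSON as well as YAML, the same example as a JSON file (for setups without PyYAML) would be:

```json
{
  "processors": [
    {
      "class": "trustgraph.extract.kg.definitions.extract.Processor",
      "params": {"id": "kg-extract-definitions"}
    },
    {
      "class": "trustgraph.chunking.recursive.Processor",
      "params": {"id": "chunker-recursive"}
    }
  ]
}
```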
cybermaggedon 2026-04-14 15:19:04 +01:00 committed by GitHub
parent 8954fa3ad7
commit f11c0ad0cb
6 changed files with 580 additions and 11 deletions

# proc-group — run TrustGraph as a single process
A dev-focused alternative to the per-container deployment. Instead of 30+
containers each running a single processor, `processor-group` runs all the
processors as asyncio tasks inside one Python process, sharing the event
loop, Prometheus registry, and (importantly) resources on your laptop.
This is **not** for production. Scale deployments should keep using
per-processor containers — one failure bringing down the whole process,
no horizontal scaling, and a single giant log are fine for dev and a
bad idea in prod.
## What this directory contains
- `group.yaml` — the group runner config. One entry per processor, each
with the dotted class path and a params dict. Defaults (pubsub backend,
rabbitmq host, log level) are pulled in per-entry with a YAML anchor.
- `README.md` — this file.
## Prerequisites
Install the TrustGraph packages into a venv:
```
pip install trustgraph-base trustgraph-flow trustgraph-unstructured
```
`trustgraph-base` provides the `processor-group` endpoint. The others
provide the processor classes that `group.yaml` imports at runtime.
`trustgraph-unstructured` is only needed if you want `document-decoder`
(the `universal-decoder` processor).
## Running it
Start the infrastructure (cassandra, qdrant, rabbitmq, garage, observability
stack) with a working compose file. These aren't packable into the group;
they're third-party services, not `AsyncProcessor`s. Alternatively, you may
be able to run them as standalone services.
To get Cassandra to be accessible from the host, you need to
set a couple of environment variables:
```
CASSANDRA_BROADCAST_ADDRESS: 127.0.0.1
CASSANDRA_LISTEN_ADDRESS: 127.0.0.1
```
and also set `network_mode: host`. Then start the services:
```
podman-compose up -d cassandra qdrant rabbitmq
podman-compose up -d garage garage-init
podman-compose up -d loki prometheus grafana
podman-compose up -d init-trustgraph
```
`init-trustgraph` is a one-shot that seeds config and the default flow
into cassandra/rabbitmq. Don't leave too long a delay between starting
`init-trustgraph` and running the processor-group, because `init-trustgraph`
needs to talk to the config service, which runs inside the group.
Run the api-gateway separately; it's an aiohttp HTTP server, not an
`AsyncProcessor`, so the group runner doesn't host it. The command is
given further down.
Raise the file descriptor limit — 30+ processors sharing one process
open far more sockets than the default 1024 allows:
```
ulimit -n 65536
```
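
Equivalently, a process can raise its own soft limit at startup. A sketch using the stdlib `resource` module (Linux/macOS only); this is not something the group runner currently does:

```python
import resource

# Current soft/hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit toward 65536, but never beyond the hard limit.
if hard == resource.RLIM_INFINITY:
    target = 65536
else:
    target = min(65536, hard)
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

print("fd limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```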
Then start the group from a terminal:
```
processor-group -c group.yaml --no-loki-enabled
```
You'll see every processor's startup messages interleaved in one log.
Each processor has a supervisor that restarts it independently on
failure, so a transient crash (or a dependency that isn't ready yet)
only affects that one processor — siblings keep running and the failing
one self-heals on the next retry.
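
That per-processor supervisor behaviour amounts to a small restart loop. An illustrative sketch (the names are hypothetical; the default delay matches the 4-second retry interval mentioned in the troubleshooting section):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def supervise(name, factory, retry_delay=4):
    # Restart this one processor on failure; siblings are unaffected
    # because each supervisor loops independently.
    while True:
        try:
            await factory()
            return  # clean exit: stop supervising
        except Exception as e:
            logger.error("%s failed: %s; retrying in %ss", name, e, retry_delay)
            await asyncio.sleep(retry_delay)
```

A transient failure (say, rabbitmq not yet accepting connections) is logged and retried until the processor comes up cleanly.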
Finally, when everything is running, you can start the API gateway from
its own terminal:
```
api-gateway \
  --pubsub-backend rabbitmq --rabbitmq-host localhost \
  --loki-url http://localhost:3100/loki/api/v1/push \
  --no-metrics
```
## When things go wrong
- **"Too many open files"** — raise `ulimit -n` further. 65536 is
usually plenty but some workflows need more.
- **One processor failing repeatedly** — look for its id in the log. The
supervisor will log each failure before restarting. Fix the cause
(missing env var, unreachable dependency, bad params) and the
processor self-heals on the next 4-second retry without restarting
the whole group.
- **Ctrl-C leaves the process hung** — the pika and cassandra drivers
spawn non-cooperative threads that asyncio can't cancel. Use Ctrl-\
(SIGQUIT) to force-kill. Not a bug in the group runner, just a
limitation of those libraries.
## Environment variables
Processors that talk to external LLMs or APIs read their credentials
from env vars, same as in the per-container deployment:
- `OPENAI_TOKEN`, `OPENAI_BASE_URL` — for `text-completion` /
`text-completion-rag`
Export whatever your particular `group.yaml` needs before running.

# Multi-processor group config, derived from docker-compose.yaml.
#
# Covers every AsyncProcessor-based service from the compose file.
# Out of scope:
# - api-gateway (aiohttp, not AsyncProcessor)
# - init-trustgraph (one-shot init, not a processor)
# - document-decoder (universal-decoder, trustgraph-unstructured package —
# packable but lives in a separate image/package)
# - mcp-server (trustgraph-mcp package, separate image)
# - ddg-mcp-server (third-party image)
# - infrastructure (cassandra, rabbitmq, qdrant, garage, grafana,
# prometheus, loki, workbench-ui)
#
# Run with:
# processor-group -c group.yaml
_defaults: &defaults
  pubsub_backend: rabbitmq
  rabbitmq_host: localhost
  log_level: INFO

processors:
  - class: trustgraph.agent.orchestrator.Processor
    params:
      <<: *defaults
      id: agent-manager
  - class: trustgraph.chunking.recursive.Processor
    params:
      <<: *defaults
      id: chunker
      chunk_size: 2000
      chunk_overlap: 50
  - class: trustgraph.config.service.Processor
    params:
      <<: *defaults
      id: config-svc
      cassandra_host: localhost
  - class: trustgraph.decoding.universal.Processor
    params:
      <<: *defaults
      id: document-decoder
  - class: trustgraph.embeddings.document_embeddings.Processor
    params:
      <<: *defaults
      id: document-embeddings
  - class: trustgraph.retrieval.document_rag.Processor
    params:
      <<: *defaults
      id: document-rag
      doc_limit: 20
  - class: trustgraph.embeddings.fastembed.Processor
    params:
      <<: *defaults
      id: embeddings
      concurrency: 1
  - class: trustgraph.embeddings.graph_embeddings.Processor
    params:
      <<: *defaults
      id: graph-embeddings
  - class: trustgraph.retrieval.graph_rag.Processor
    params:
      <<: *defaults
      id: graph-rag
      concurrency: 1
      entity_limit: 50
      triple_limit: 30
      edge_limit: 30
      edge_score_limit: 10
      max_subgraph_size: 100
      max_path_length: 2
  - class: trustgraph.extract.kg.agent.Processor
    params:
      <<: *defaults
      id: kg-extract-agent
      concurrency: 1
  - class: trustgraph.extract.kg.definitions.Processor
    params:
      <<: *defaults
      id: kg-extract-definitions
      concurrency: 1
  - class: trustgraph.extract.kg.ontology.Processor
    params:
      <<: *defaults
      id: kg-extract-ontology
      concurrency: 1
  - class: trustgraph.extract.kg.relationships.Processor
    params:
      <<: *defaults
      id: kg-extract-relationships
      concurrency: 1
  - class: trustgraph.extract.kg.rows.Processor
    params:
      <<: *defaults
      id: kg-extract-rows
      concurrency: 1
  - class: trustgraph.cores.service.Processor
    params:
      <<: *defaults
      id: knowledge
      cassandra_host: localhost
  - class: trustgraph.storage.knowledge.store.Processor
    params:
      <<: *defaults
      id: kg-store
      cassandra_host: localhost
  - class: trustgraph.librarian.Processor
    params:
      <<: *defaults
      id: librarian
      cassandra_host: localhost
      object_store_endpoint: localhost:3900
      object_store_access_key: GK000000000000000000000001
      object_store_secret_key: b171f00be9be4c32c734f4c05fe64c527a8ab5eb823b376cfa8c2531f70fc427
      object_store_region: garage
  - class: trustgraph.agent.mcp_tool.Service
    params:
      <<: *defaults
      id: mcp-tool
  - class: trustgraph.metering.Processor
    params:
      <<: *defaults
      id: metering
  - class: trustgraph.metering.Processor
    params:
      <<: *defaults
      id: metering-rag
  - class: trustgraph.retrieval.nlp_query.Processor
    params:
      <<: *defaults
      id: nlp-query
  - class: trustgraph.prompt.template.Processor
    params:
      <<: *defaults
      id: prompt
      concurrency: 1
  - class: trustgraph.prompt.template.Processor
    params:
      <<: *defaults
      id: prompt-rag
      concurrency: 1
  - class: trustgraph.query.doc_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: doc-embeddings-query
      store_uri: http://localhost:6333
  - class: trustgraph.query.graph_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: graph-embeddings-query
      store_uri: http://localhost:6333
  - class: trustgraph.query.row_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: row-embeddings-query
      store_uri: http://localhost:6333
  - class: trustgraph.query.rows.cassandra.Processor
    params:
      <<: *defaults
      id: rows-query
      cassandra_host: localhost
  - class: trustgraph.query.triples.cassandra.Processor
    params:
      <<: *defaults
      id: triples-query
      cassandra_host: localhost
  - class: trustgraph.embeddings.row_embeddings.Processor
    params:
      <<: *defaults
      id: row-embeddings
  - class: trustgraph.query.sparql.Processor
    params:
      <<: *defaults
      id: sparql-query
  - class: trustgraph.storage.doc_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: doc-embeddings-write
      store_uri: http://localhost:6333
  - class: trustgraph.storage.graph_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: graph-embeddings-write
      store_uri: http://localhost:6333
  - class: trustgraph.storage.row_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: row-embeddings-write
      store_uri: http://localhost:6333
  - class: trustgraph.storage.rows.cassandra.Processor
    params:
      <<: *defaults
      id: rows-write
      cassandra_host: localhost
  - class: trustgraph.storage.triples.cassandra.Processor
    params:
      <<: *defaults
      id: triples-write
      cassandra_host: localhost
  - class: trustgraph.retrieval.structured_diag.Processor
    params:
      <<: *defaults
      id: structured-diag
  - class: trustgraph.retrieval.structured_query.Processor
    params:
      <<: *defaults
      id: structured-query
  - class: trustgraph.model.text_completion.openai.Processor
    params:
      <<: *defaults
      id: text-completion
      max_output: 8192
      temperature: 0.0
  - class: trustgraph.model.text_completion.openai.Processor
    params:
      <<: *defaults
      id: text-completion-rag
      max_output: 8192
      temperature: 0.0