mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-06-13 16:55:14 +02:00

cybermaggedon 2bf4af294e Better proc group logging and concurrency (#810 ) - Silence pika, cassandra etc. logging at INFO (too much chatter) - Add per processor log tags so that logs can be understood in processor group. - Deal with RabbitMQ lag weirdness - Added more processor group examples		2026-04-15 14:52:01 +01:00
..
groups	Better proc group logging and concurrency (#810 )	2026-04-15 14:52:01 +01:00
group.yaml	Processor group implementation: dev wrapper (#808 )	2026-04-14 15:19:04 +01:00
README.md	Processor group implementation: dev wrapper (#808 )	2026-04-14 15:19:04 +01:00

README.md

proc-group — run TrustGraph as a single process

A dev-focused alternative to the per-container deployment. Instead of 30+ containers each running a single processor, processor-group runs all the processors as asyncio tasks inside one Python process, sharing the event loop, Prometheus registry, and (importantly) resources on your laptop.

This is not for production. Scale deployments should keep using per-processor containers — one failure bringing down the whole process, no horizontal scaling, and a single giant log are fine for dev and a bad idea in prod.

What this directory contains

group.yaml — the group runner config. One entry per processor, each with the dotted class path and a params dict. Defaults (pubsub backend, rabbitmq host, log level) are pulled in per-entry with a YAML anchor.
README.md — this file.

Prerequisites

Install the TrustGraph packages into a venv:

pip install trustgraph-base trustgraph-flow trustgraph-unstructured

trustgraph-base provides the processor-group endpoint. The others provide the processor classes that group.yaml imports at runtime. trustgraph-unstructured is only needed if you want document-decoder (the universal-decoder processor).

Running it

Start infrastructure (cassandra, qdrant, rabbitmq, garage, observability stack) with a working compose file. These aren't packable into the group - they're third-party services. You may be able to run these as standalone services.

To get Cassandra to be accessible from the host, you need to set a couple of environment variables:

      CASSANDRA_BROADCAST_ADDRESS: 127.0.0.1
      CASSANDRA_LISTEN_ADDRESS: 127.0.0.1

and also set network: host. Then start services:

podman-compose up -d cassandra qdrant rabbitmq
podman-compose up -d garage garage-init
podman-compose up -d loki prometheus grafana
podman-compose up -d init-trustgraph

init-trustgraph is a one-shot that seeds config and the default flow into cassandra/rabbitmq. Don't leave too long a delay between starting init-trustgraph and running the processor-group, because it needs to talk to the config service.

Run the api-gateway separately — it's an aiohttp HTTP server, not an AsyncProcessor, so the group runner doesn't host it:

Raise the file descriptor limit — 30+ processors sharing one process open far more sockets than the default 1024 allows:

ulimit -n 65536

Then start the group from a terminal:

processor-group -c group.yaml --no-loki-enabled

You'll see every processor's startup messages interleaved in one log. Each processor has a supervisor that restarts it independently on failure, so a transient crash (or a dependency that isn't ready yet) only affects that one processor — siblings keep running and the failing one self-heals on the next retry.

Finally when everything is running you can start the API gateway from its own terminal:

api-gateway \
    --pubsub-backend rabbitmq --rabbitmq-host localhost \
    --loki-url http://localhost:3100/loki/api/v1/push \
    --no-metrics

When things go wrong

"Too many open files" — raise ulimit -n further. 65536 is usually plenty but some workflows need more.
One processor failing repeatedly — look for its id in the log. The supervisor will log each failure before restarting. Fix the cause (missing env var, unreachable dependency, bad params) and the processor self-heals on the next 4-second retry without restarting the whole group.
Ctrl-C leaves the process hung — the pika and cassandra drivers spawn non-cooperative threads that asyncio can't cancel. Use Ctrl-
(SIGQUIT) to force-kill. Not a bug in the group runner, just a limitation of those libraries.

Environment variables

Processors that talk to external LLMs or APIs read their credentials from env vars, same as in the per-container deployment:

OPENAI_TOKEN, OPENAI_BASE_URL — for text-completion / text-completion-rag

Export whatever your particular group.yaml needs before running.