mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
Processor group implementation: dev wrapper (#808)
Processor group implementation: A wrapper to launch multiple
processors in a single processor
- trustgraph-base/trustgraph/base/processor_group.py — group runner
  module. run_group(config) is the async body; run() is the
  endpoint. Loads JSON or YAML config, validates that every entry
  has a unique params.id, instantiates each class via importlib,
  shares one TaskGroup, mirrors AsyncProcessor.launch's retry loop
  and Prometheus startup.
- trustgraph-base/pyproject.toml — added [project.scripts] block
  with processor-group = "trustgraph.base.processor_group:run".
Key behaviours:
- Unique id enforced up front — missing or duplicate params.id fails
  fast with a clear error, preventing the Prometheus Info label
  collision we flagged.
- No registry — dotted class path is the identifier; any
  AsyncProcessor descendant importable at runtime is packable.
- YAML import is lazy — only pulled in if the config file ends in
  .yaml/.yml, so JSON-only users don't need PyYAML installed.
- Single Prometheus server — start_http_server runs once at
  startup, before the retry loop, matching launch()'s pattern.
- Retry loop — same shape as AsyncProcessor.launch: catches the
  ExceptionGroup a failed TaskGroup raises, logs, sleeps 4s, retries.
  Each processor runs under its own supervisor, so one processor
  dying restarts only that processor rather than tearing down the
  whole group.
Example config:

  processors:
    - class: trustgraph.extract.kg.definitions.extract.Processor
      params:
        id: kg-extract-definitions
    - class: trustgraph.chunking.recursive.Processor
      params:
        id: chunker-recursive

Run with processor-group -c group.yaml.
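The no-registry lookup described above comes down to a few lines. This sketch mirrors the module's `_resolve_class`; a stdlib class stands in for a processor class here, purely for illustration:

```python
import importlib

def resolve_class(dotted):
    # Split "pkg.module.Class" into module path and attribute name.
    module_path, _, class_name = dotted.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# Any class importable at runtime is packable; no registry is consulted.
cls = resolve_class("json.decoder.JSONDecoder")
```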
This commit is contained in:
parent 8954fa3ad7
commit f11c0ad0cb

6 changed files with 580 additions and 11 deletions

117 dev-tools/proc-group/README.md (new file)
# proc-group — run TrustGraph as a single process

A dev-focused alternative to the per-container deployment. Instead of 30+
containers each running a single processor, `processor-group` runs all the
processors as asyncio tasks inside one Python process, sharing the event
loop, Prometheus registry, and (importantly) resources on your laptop.

This is **not** for production. Scale deployments should keep using
per-processor containers — a hard crash taking down every processor at
once, no horizontal scaling, and a single giant log are fine for dev and
a bad idea in prod.

## What this directory contains

- `group.yaml` — the group runner config. One entry per processor, each
  with the dotted class path and a params dict. Defaults (pubsub backend,
  rabbitmq host, log level) are pulled in per-entry with a YAML anchor.
- `README.md` — this file.
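The YAML anchor mentioned above merges the shared defaults into each entry's params. In plain Python terms, each params mapping behaves like the defaults overlaid with the entry's own keys (an illustration of the merge semantics, not TrustGraph code):

```python
# What `<<: *defaults` amounts to for each processor entry: shared keys
# come from the anchor, and keys the entry sets itself win on overlap.
defaults = {
    "pubsub_backend": "rabbitmq",
    "rabbitmq_host": "localhost",
    "log_level": "INFO",
}

entry_params = {"id": "chunker", "chunk_size": 2000}

params = {**defaults, **entry_params}
```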
## Prerequisites

Install the TrustGraph packages into a venv:

```
pip install trustgraph-base trustgraph-flow trustgraph-unstructured
```

`trustgraph-base` provides the `processor-group` endpoint. The others
provide the processor classes that `group.yaml` imports at runtime.
`trustgraph-unstructured` is only needed if you want `document-decoder`
(the `universal-decoder` processor).
## Running it

Start infrastructure (cassandra, qdrant, rabbitmq, garage, observability
stack) with a working compose file. These aren't packable into the group -
they're third-party services. You may be able to run these as standalone
services.

To make Cassandra accessible from the host, set a couple of environment
variables on its service:

```
CASSANDRA_BROADCAST_ADDRESS: 127.0.0.1
CASSANDRA_LISTEN_ADDRESS: 127.0.0.1
```

and also set `network: host`. Then start services:

```
podman-compose up -d cassandra qdrant rabbitmq
podman-compose up -d garage garage-init
podman-compose up -d loki prometheus grafana
podman-compose up -d init-trustgraph
```

`init-trustgraph` is a one-shot that seeds config and the default flow
into cassandra/rabbitmq. Don't leave too long a delay between starting
`init-trustgraph` and running the processor-group, because it needs to
talk to the config service.

Run the api-gateway separately — it's an aiohttp HTTP server, not an
`AsyncProcessor`, so the group runner doesn't host it.

Raise the file descriptor limit — 30+ processors sharing one process
open far more sockets than the default 1024 allows:

```
ulimit -n 65536
```
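To confirm the limit actually took effect in the shell the group will run from, you can query it from Python (stdlib `resource` module, POSIX only; a quick check, not part of the repo):

```python
import resource

# Soft limit is what the process is held to; hard limit is the ceiling
# an unprivileged `ulimit -n` can raise it to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
```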
Then start the group from a terminal:

```
processor-group -c group.yaml --no-loki-enabled
```

You'll see every processor's startup messages interleaved in one log.
Each processor has a supervisor that restarts it independently on
failure, so a transient crash (or a dependency that isn't ready yet)
only affects that one processor — siblings keep running and the failing
one self-heals on the next retry.

Finally, when everything is running, you can start the API gateway from
its own terminal:

```
api-gateway \
    --pubsub-backend rabbitmq --rabbitmq-host localhost \
    --loki-url http://localhost:3100/loki/api/v1/push \
    --no-metrics
```
## When things go wrong

- **"Too many open files"** — raise `ulimit -n` further. 65536 is
  usually plenty but some workflows need more.
- **One processor failing repeatedly** — look for its id in the log. The
  supervisor will log each failure before restarting. Fix the cause
  (missing env var, unreachable dependency, bad params) and the
  processor self-heals on the next 4-second retry without restarting
  the whole group.
- **Ctrl-C leaves the process hung** — the pika and cassandra drivers
  spawn non-cooperative threads that asyncio can't cancel. Use Ctrl-\
  (SIGQUIT) to force-kill. Not a bug in the group runner, just a
  limitation of those libraries.
## Environment variables

Processors that talk to external LLMs or APIs read their credentials
from env vars, same as in the per-container deployment:

- `OPENAI_TOKEN`, `OPENAI_BASE_URL` — for `text-completion` /
  `text-completion-rag`

Export whatever your particular `group.yaml` needs before running.
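A fail-fast check before launching saves a crash-loop later. This helper is hypothetical (not part of TrustGraph); it sketches the idea of verifying required variables up front:

```python
import os

def require_env(names):
    # Hypothetical helper: report every missing variable at once rather
    # than letting one processor crash-loop on the first lookup failure.
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )

os.environ.setdefault("OPENAI_TOKEN", "dummy-token")  # demo value only
require_env(["OPENAI_TOKEN"])  # passes; raises if anything is unset
```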
257 dev-tools/proc-group/group.yaml (new file)
# Multi-processor group config, derived from docker-compose.yaml.
#
# Covers every AsyncProcessor-based service from the compose file.
# Out of scope:
# - api-gateway (aiohttp, not AsyncProcessor)
# - init-trustgraph (one-shot init, not a processor)
# - document-decoder (universal-decoder, trustgraph-unstructured package —
#   packable but lives in a separate image/package)
# - mcp-server (trustgraph-mcp package, separate image)
# - ddg-mcp-server (third-party image)
# - infrastructure (cassandra, rabbitmq, qdrant, garage, grafana,
#   prometheus, loki, workbench-ui)
#
# Run with:
#   processor-group -c group.yaml

_defaults: &defaults
  pubsub_backend: rabbitmq
  rabbitmq_host: localhost
  log_level: INFO

processors:

  - class: trustgraph.agent.orchestrator.Processor
    params:
      <<: *defaults
      id: agent-manager

  - class: trustgraph.chunking.recursive.Processor
    params:
      <<: *defaults
      id: chunker
      chunk_size: 2000
      chunk_overlap: 50

  - class: trustgraph.config.service.Processor
    params:
      <<: *defaults
      id: config-svc
      cassandra_host: localhost

  - class: trustgraph.decoding.universal.Processor
    params:
      <<: *defaults
      id: document-decoder

  - class: trustgraph.embeddings.document_embeddings.Processor
    params:
      <<: *defaults
      id: document-embeddings

  - class: trustgraph.retrieval.document_rag.Processor
    params:
      <<: *defaults
      id: document-rag
      doc_limit: 20

  - class: trustgraph.embeddings.fastembed.Processor
    params:
      <<: *defaults
      id: embeddings
      concurrency: 1

  - class: trustgraph.embeddings.graph_embeddings.Processor
    params:
      <<: *defaults
      id: graph-embeddings

  - class: trustgraph.retrieval.graph_rag.Processor
    params:
      <<: *defaults
      id: graph-rag
      concurrency: 1
      entity_limit: 50
      triple_limit: 30
      edge_limit: 30
      edge_score_limit: 10
      max_subgraph_size: 100
      max_path_length: 2

  - class: trustgraph.extract.kg.agent.Processor
    params:
      <<: *defaults
      id: kg-extract-agent
      concurrency: 1

  - class: trustgraph.extract.kg.definitions.Processor
    params:
      <<: *defaults
      id: kg-extract-definitions
      concurrency: 1

  - class: trustgraph.extract.kg.ontology.Processor
    params:
      <<: *defaults
      id: kg-extract-ontology
      concurrency: 1

  - class: trustgraph.extract.kg.relationships.Processor
    params:
      <<: *defaults
      id: kg-extract-relationships
      concurrency: 1

  - class: trustgraph.extract.kg.rows.Processor
    params:
      <<: *defaults
      id: kg-extract-rows
      concurrency: 1

  - class: trustgraph.cores.service.Processor
    params:
      <<: *defaults
      id: knowledge
      cassandra_host: localhost

  - class: trustgraph.storage.knowledge.store.Processor
    params:
      <<: *defaults
      id: kg-store
      cassandra_host: localhost

  - class: trustgraph.librarian.Processor
    params:
      <<: *defaults
      id: librarian
      cassandra_host: localhost
      object_store_endpoint: localhost:3900
      object_store_access_key: GK000000000000000000000001
      object_store_secret_key: b171f00be9be4c32c734f4c05fe64c527a8ab5eb823b376cfa8c2531f70fc427
      object_store_region: garage

  - class: trustgraph.agent.mcp_tool.Service
    params:
      <<: *defaults
      id: mcp-tool

  - class: trustgraph.metering.Processor
    params:
      <<: *defaults
      id: metering

  - class: trustgraph.metering.Processor
    params:
      <<: *defaults
      id: metering-rag

  - class: trustgraph.retrieval.nlp_query.Processor
    params:
      <<: *defaults
      id: nlp-query

  - class: trustgraph.prompt.template.Processor
    params:
      <<: *defaults
      id: prompt
      concurrency: 1

  - class: trustgraph.prompt.template.Processor
    params:
      <<: *defaults
      id: prompt-rag
      concurrency: 1

  - class: trustgraph.query.doc_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: doc-embeddings-query
      store_uri: http://localhost:6333

  - class: trustgraph.query.graph_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: graph-embeddings-query
      store_uri: http://localhost:6333

  - class: trustgraph.query.row_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: row-embeddings-query
      store_uri: http://localhost:6333

  - class: trustgraph.query.rows.cassandra.Processor
    params:
      <<: *defaults
      id: rows-query
      cassandra_host: localhost

  - class: trustgraph.query.triples.cassandra.Processor
    params:
      <<: *defaults
      id: triples-query
      cassandra_host: localhost

  - class: trustgraph.embeddings.row_embeddings.Processor
    params:
      <<: *defaults
      id: row-embeddings

  - class: trustgraph.query.sparql.Processor
    params:
      <<: *defaults
      id: sparql-query

  - class: trustgraph.storage.doc_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: doc-embeddings-write
      store_uri: http://localhost:6333

  - class: trustgraph.storage.graph_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: graph-embeddings-write
      store_uri: http://localhost:6333

  - class: trustgraph.storage.row_embeddings.qdrant.Processor
    params:
      <<: *defaults
      id: row-embeddings-write
      store_uri: http://localhost:6333

  - class: trustgraph.storage.rows.cassandra.Processor
    params:
      <<: *defaults
      id: rows-write
      cassandra_host: localhost

  - class: trustgraph.storage.triples.cassandra.Processor
    params:
      <<: *defaults
      id: triples-write
      cassandra_host: localhost

  - class: trustgraph.retrieval.structured_diag.Processor
    params:
      <<: *defaults
      id: structured-diag

  - class: trustgraph.retrieval.structured_query.Processor
    params:
      <<: *defaults
      id: structured-query

  - class: trustgraph.model.text_completion.openai.Processor
    params:
      <<: *defaults
      id: text-completion
      max_output: 8192
      temperature: 0.0

  - class: trustgraph.model.text_completion.openai.Processor
    params:
      <<: *defaults
      id: text-completion-rag
      max_output: 8192
      temperature: 0.0
trustgraph-base/pyproject.toml
@ -24,6 +24,9 @@ classifiers = [
 [project.urls]
 Homepage = "https://github.com/trustgraph-ai/trustgraph"
 
+[project.scripts]
+processor-group = "trustgraph.base.processor_group:run"
+
 [tool.setuptools.packages.find]
 include = ["trustgraph*"]
197 trustgraph-base/trustgraph/base/processor_group.py (new file)
# Multi-processor group runner. Runs multiple AsyncProcessor descendants
# as concurrent tasks inside a single process, sharing one event loop,
# one Prometheus HTTP server, and one pub/sub backend pool.
#
# Intended for dev and resource-constrained deployments. Scale deployments
# should continue to use per-processor endpoints.
#
# Group config is a YAML or JSON file with shape:
#
# processors:
#   - class: trustgraph.extract.kg.definitions.extract.Processor
#     params:
#       id: kg-extract-definitions
#       triples_batch_size: 1000
#   - class: trustgraph.chunking.recursive.Processor
#     params:
#       id: chunker-recursive
#
# Each entry's params are passed directly to the class constructor alongside
# the shared taskgroup. Defaults live inside each processor class.

import argparse
import asyncio
import importlib
import json
import logging
import time

from prometheus_client import start_http_server

from . logging import add_logging_args, setup_logging

logger = logging.getLogger(__name__)


def _load_config(path):
    with open(path) as f:
        text = f.read()
    if path.endswith((".yaml", ".yml")):
        import yaml
        return yaml.safe_load(text)
    return json.loads(text)


def _resolve_class(dotted):
    module_path, _, class_name = dotted.rpartition(".")
    if not module_path:
        raise ValueError(
            f"Processor class must be a dotted path, got {dotted!r}"
        )
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


RESTART_DELAY_SECONDS = 4


async def _supervise(entry):
    """Run one processor with its own nested TaskGroup, restarting on any
    failure. Each processor is isolated from its siblings — a crash here
    does not propagate to the outer group."""

    pid = entry["params"]["id"]
    class_path = entry["class"]

    while True:

        try:

            async with asyncio.TaskGroup() as inner_tg:

                cls = _resolve_class(class_path)
                params = dict(entry.get("params", {}))
                params["taskgroup"] = inner_tg

                logger.info(f"Starting {class_path} as {pid}")

                p = cls(**params)
                await p.start()
                inner_tg.create_task(p.run())

            # Clean exit — processor's run() returned without raising.
            # Treat as a transient shutdown and restart, matching the
            # behaviour of per-container `restart: on-failure`.
            logger.warning(
                f"Processor {pid} exited cleanly, will restart"
            )

        except asyncio.CancelledError:
            logger.info(f"Processor {pid} cancelled")
            raise

        except BaseExceptionGroup as eg:
            for e in eg.exceptions:
                logger.error(
                    f"Processor {pid} failure: {type(e).__name__}: {e}",
                    exc_info=e,
                )

        except Exception as e:
            logger.error(
                f"Processor {pid} failure: {type(e).__name__}: {e}",
                exc_info=True,
            )

        logger.info(
            f"Restarting {pid} in {RESTART_DELAY_SECONDS}s..."
        )
        await asyncio.sleep(RESTART_DELAY_SECONDS)


async def run_group(config):

    entries = config.get("processors", [])
    if not entries:
        raise RuntimeError("Group config has no processors")

    seen_ids = set()
    for entry in entries:
        pid = entry.get("params", {}).get("id")
        if pid is None:
            raise RuntimeError(
                f"Entry {entry.get('class')!r} missing params.id — "
                f"required for metrics labelling"
            )
        if pid in seen_ids:
            raise RuntimeError(f"Duplicate processor id {pid!r} in group")
        seen_ids.add(pid)

    async with asyncio.TaskGroup() as outer_tg:
        for entry in entries:
            outer_tg.create_task(_supervise(entry))


def run():

    parser = argparse.ArgumentParser(
        prog="processor-group",
        description="Run multiple processors as tasks in one process",
    )

    parser.add_argument(
        "-c", "--config",
        required=True,
        help="Path to group config file (JSON or YAML)",
    )

    parser.add_argument(
        "--metrics",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Metrics enabled (default: true)",
    )

    parser.add_argument(
        "-P", "--metrics-port",
        type=int,
        default=8000,
        help="Prometheus metrics port (default: 8000)",
    )

    add_logging_args(parser)

    args = vars(parser.parse_args())

    setup_logging(args)

    config = _load_config(args["config"])

    if args["metrics"]:
        start_http_server(args["metrics_port"])

    while True:

        logger.info("Starting group...")

        try:
            asyncio.run(run_group(config))

        except KeyboardInterrupt:
            logger.info("Keyboard interrupt.")
            return

        except ExceptionGroup as e:
            logger.error("Exception group:")
            for se in e.exceptions:
                logger.error(f"  Type: {type(se)}")
                logger.error(f"  Exception: {se}", exc_info=se)

        except Exception as e:
            logger.error(f"Type: {type(e)}")
            logger.error(f"Exception: {e}", exc_info=True)

        logger.warning("Will retry...")
        time.sleep(4)
        logger.info("Retrying...")
@ -9,7 +9,7 @@ from aiohttp import web
 import logging
 import os
 
-from trustgraph.base.logging import setup_logging
+from trustgraph.base.logging import setup_logging, add_logging_args
 from trustgraph.base.pubsub import get_pubsub, add_pubsub_args
 
 from . auth import Authenticator

@ -195,12 +195,7 @@ def run():
         help=f'Secret API token (default: no auth)',
     )
 
-    parser.add_argument(
-        '-l', '--log-level',
-        default='INFO',
-        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'],
-        help=f'Log level (default: INFO)'
-    )
+    add_logging_args(parser)
 
     parser.add_argument(
         '--metrics',
@ -102,10 +102,10 @@ class Processor(FlowProcessor):
         __class__.cost_metric.labels(model=modelname, direction="input").inc(cost_in)
         __class__.cost_metric.labels(model=modelname, direction="output").inc(cost_out)
 
-        logger.info(f"Model: {modelname}")
-        logger.info(f"Input Tokens: {num_in}")
-        logger.info(f"Output Tokens: {num_out}")
-        logger.info(f"Cost for call: ${cost_per_call}")
+        logger.debug(
+            f"Model: {modelname}, in={num_in}, out={num_out}, "
+            f"cost=${cost_per_call}"
+        )
 
     @staticmethod
     def add_args(parser):