trustgraph/docs/tech-specs/active-flow-key-restructure.md
cybermaggedon 9f84891fcc
Flow service lifecycle management (#822)
feat: separate flow service from config service with explicit queue
lifecycle management

The flow service is now an independent service that owns the lifecycle
of flow and blueprint queues. System services own their own queues.
Consumers never create queues.

Flow service separation:
- New service at trustgraph-flow/trustgraph/flow/service/
- Uses async ConfigClient (RequestResponse pattern) to talk to config
  service
- Config service stripped of all flow handling

Queue lifecycle management:
- PubSubBackend protocol gains create_queue, delete_queue,
  queue_exists, ensure_queue — all async
- RabbitMQ: implements via pika with asyncio.to_thread internally
- Pulsar: stubs for future admin REST API implementation
- Consumer _connect() no longer creates queues (passive=True for named
  queues)
- System services call ensure_queue on startup
- Flow service creates queues on flow start, deletes on flow stop
- Flow service ensures queues for pre-existing flows on startup

Two-phase flow stop:
- Phase 1: set flow status to "stopping", delete processor config
  entries
- Phase 2: retry queue deletion, then delete flow record

Config restructure:
- active-flow config replaced with processor:{name} types
- Each processor has its own config type, each flow variant is a key
- Flow start/stop use batch put/delete — single config push per
  operation
- FlowProcessor subscribes to its own type only

Blueprint format:
- Processor entries split into topics and parameters dicts
- Flow interfaces use {"flow": "topic"} instead of bare strings
- Specs (ConsumerSpec, ProducerSpec, etc.) read from
  definition["topics"]

Tests updated
2026-04-16 17:19:39 +01:00


layout: default
title: Active-Flow Key Restructure
parent: Tech Specs

Active-Flow Key Restructure

Problem

Active-flow config uses ('active-flow', processor) as its key, where each processor's value is a JSON blob containing all flow variants assigned to that processor:

('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }

This causes two problems:

  1. Read-modify-write on every change. Starting or stopping a flow requires fetching the processor's current blob, parsing it, adding or removing a variant, serialising it, and writing it back. If two flow operations target the same processor at the same time, both can read the same blob and the later write silently overwrites the earlier one (a lost update).

  2. Noisy config pushes. Config subscribers subscribe to a type, not a specific key. Every active-flow write triggers a config push that causes every processor in the system to fetch the full config and re-evaluate, even though only one processor's config changed. With N processors in a blueprint, a single flow start/stop causes N writes, and each write prompts all N processors to fetch, giving N^2 config fetches across the system.
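The read-modify-write hazard in point 1 can be illustrated with a minimal sketch. The in-memory `store` dict and the `read_blob` / `write_blob` helpers are hypothetical stand-ins for the config service, not its real API; the interleaving is written out sequentially to show what two concurrent flow starts could do.

```python
import json

# Hypothetical in-memory stand-in for the config store (assumption: the
# real store and client API are not shown in this document).
store = {("active-flow", "chunker"): json.dumps({"default": {}})}

def read_blob(processor):
    """Fetch and parse the per-processor blob of flow variants."""
    return json.loads(store.get(("active-flow", processor), "{}"))

def write_blob(processor, blob):
    """Serialise and write back the whole per-processor blob."""
    store[("active-flow", processor)] = json.dumps(blob)

# Two flow-start operations both read before either one writes.
blob_a = read_blob("chunker")
blob_b = read_blob("chunker")

blob_a["flow2"] = {}   # operation A adds variant flow2
blob_b["flow3"] = {}   # operation B adds variant flow3

write_blob("chunker", blob_a)
write_blob("chunker", blob_b)   # overwrites A's blob: flow2 is lost

final_variants = sorted(read_blob("chunker"))
print(final_variants)
```

After both writes the blob contains only `default` and `flow3`; the `flow2` variant added by operation A has disappeared without any error being raised.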

Proposed Change

Restructure the key to ('active-flow', 'processor:variant') where each key holds a single flow variant's configuration:

('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }
('active-flow', 'chunker:flow2')   -> { "topics": {...}, "parameters": {...} }

Starting a flow is a set of clean puts. Stopping a flow is a set of clean deletes. No read-modify-write. No JSON blob merging.

The config push problem (all processors fetching on every change) remains. It is a limitation of the config subscription model, and solving it would require per-key subscriptions. But eliminating the read-modify-write removes the concurrency hazard and simplifies the flow service code.
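As a rough sketch of the proposed layout, starting and stopping a flow reduce to independent puts and deletes on `processor:variant` keys. The `store` dict, the `variant_key`, `start_flow` and `stop_flow` helpers, and the `embedder` processor name are all hypothetical and used only for illustration; the real flow service code is not reproduced here.

```python
# Hypothetical in-memory stand-in for the config store.
store = {}

def variant_key(processor, variant):
    """Build the proposed ('active-flow', 'processor:variant') key."""
    return ("active-flow", f"{processor}:{variant}")

def start_flow(variant, processors):
    """Write one key per processor; nothing is read or merged first."""
    for processor, cfg in processors.items():
        store[variant_key(processor, variant)] = cfg

def stop_flow(variant, processors):
    """Delete one key per processor; other variants are untouched."""
    for processor in processors:
        store.pop(variant_key(processor, variant), None)

# Illustrative blueprint: processor name -> per-variant config.
blueprint = {
    "chunker": {"topics": {}, "parameters": {}},
    "embedder": {"topics": {}, "parameters": {}},
}

start_flow("default", blueprint)
start_flow("flow2", blueprint)
stop_flow("default", blueprint)

remaining = sorted(key for _type, key in store)
print(remaining)
```

Because each operation touches only its own keys, stopping `default` leaves the `flow2` entries in place without either operation needing to know about the other.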

What Changes

  • Flow service (flow.py): handle_start_flow writes individual keys per processor:variant instead of merging into per-processor blobs. handle_stop_flow deletes individual keys instead of read-modify-write.
  • FlowProcessor (flow_processor.py): on_configure_flows currently looks up config["active-flow"][self.id] to find a JSON blob of all its variants. It needs to scan all active-flow keys for entries whose key starts with self.id followed by a colon, and assemble its flow list from those.
  • Config client: May benefit from a prefix-scan or pattern-match query to support the FlowProcessor lookup efficiently.
  • Initial config / bootstrapping: Any code that seeds active-flow entries at deployment time needs to use the new key format.
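The FlowProcessor and bootstrapping items above might look roughly like the following. This is a sketch under stated assumptions: `flows_for_processor` and `split_blobs` are hypothetical helper names, the config is assumed to be a dict of type to key to JSON-encoded string, and processor ids are assumed not to contain a colon.

```python
import json

def flows_for_processor(config, processor_id):
    """Collect this processor's variants by scanning active-flow keys.

    Matching on processor_id + ":" (not the bare id) keeps a processor
    named "chunker" from picking up keys belonging to "chunker-2".
    """
    prefix = processor_id + ":"
    flows = {}
    for key, value in config.get("active-flow", {}).items():
        if key.startswith(prefix):
            flows[key[len(prefix):]] = json.loads(value)
    return flows

def split_blobs(old_entries):
    """Convert old per-processor blobs into per-variant entries."""
    new_entries = {}
    for processor, blob in old_entries.items():
        for variant, cfg in json.loads(blob).items():
            new_entries[f"{processor}:{variant}"] = json.dumps(cfg)
    return new_entries

# Old format: one JSON blob per processor holding all of its variants.
old = {
    "chunker": json.dumps({
        "default": {"topics": {}, "parameters": {}},
        "flow2": {"topics": {}, "parameters": {}},
    }),
    "chunker-2": json.dumps({
        "default": {"topics": {}, "parameters": {}},
    }),
}

config = {"active-flow": split_blobs(old)}
chunker_flows = flows_for_processor(config, "chunker")
print(sorted(chunker_flows))
```

A linear scan like this is the simplest option; a prefix-scan query in the config client, as suggested above, would move the filtering to the server side.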