trustgraph/docs/tech-specs/active-flow-key-restructure.md

---
layout: default
title: "Active-Flow Key Restructure"
parent: "Tech Specs"
---

# Active-Flow Key Restructure

## Problem

Active-flow config uses `('active-flow', processor)` as its key, where
each processor's value is a JSON blob containing all flow variants
assigned to that processor:

```
('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }
```

This causes two problems:

1. **Read-modify-write on every change.** Starting or stopping a flow
   requires fetching the processor's current blob, parsing it, adding
   or removing a variant, serialising it, and writing it back. This is
   a concurrency hazard if two flow operations target the same
   processor simultaneously.

2. **Noisy config pushes.** Config subscribers subscribe to a type,
   not a specific key. Every active-flow write triggers a config push
   that causes every processor in the system to fetch the full config
   and re-evaluate, even though only one processor's config changed.
   With N processors in a blueprint, a single flow start/stop causes
   N writes and N^2 config fetches across the system.

## Proposed Change

Restructure the key to `('active-flow', 'processor:variant')` where
each key holds a single flow variant's configuration:

```
('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }
('active-flow', 'chunker:flow2')   -> { "topics": {...}, "parameters": {...} }
```

Starting a flow is a set of clean puts. Stopping a flow is a set of
clean deletes. No read-modify-write. No JSON blob merging.

The config push problem (all processors fetching on every change)
remains — that's a limitation of the config subscription model and
would require per-key subscriptions to solve. But eliminating the
read-modify-write removes the concurrency hazard and simplifies the
flow service code.

## What Changes

- **Flow service** (`flow.py`): `handle_start_flow` writes individual
  keys per processor:variant instead of merging into per-processor
  blobs. `handle_stop_flow` deletes individual keys instead of
  read-modify-write.
- **FlowProcessor** (`flow_processor.py`): `on_configure_flows`
  currently looks up `config["active-flow"][self.id]` to find a JSON
  blob of all its variants. Needs to scan all active-flow keys for
  entries prefixed with `self.id:` and assemble its flow list from
  those.
- **Config client**: May benefit from a prefix-scan or pattern-match
  query to support the FlowProcessor lookup efficiently.
- **Initial config / bootstrapping**: Any code that seeds active-flow
  entries at deployment time needs to use the new key format.
Flow service lifecycle management (#822) feat: separate flow service from config service with explicit queue lifecycle management The flow service is now an independent service that owns the lifecycle of flow and blueprint queues. System services own their own queues. Consumers never create queues. Flow service separation: - New service at trustgraph-flow/trustgraph/flow/service/ - Uses async ConfigClient (RequestResponse pattern) to talk to config service - Config service stripped of all flow handling Queue lifecycle management: - PubSubBackend protocol gains create_queue, delete_queue, queue_exists, ensure_queue — all async - RabbitMQ: implements via pika with asyncio.to_thread internally - Pulsar: stubs for future admin REST API implementation - Consumer _connect() no longer creates queues (passive=True for named queues) - System services call ensure_queue on startup - Flow service creates queues on flow start, deletes on flow stop - Flow service ensures queues for pre-existing flows on startup Two-phase flow stop: - Phase 1: set flow status to "stopping", delete processor config entries - Phase 2: retry queue deletion, then delete flow record Config restructure: - active-flow config replaced with processor:{name} types - Each processor has its own config type, each flow variant is a key - Flow start/stop use batch put/delete — single config push per operation - FlowProcessor subscribes to its own type only Blueprint format: - Processor entries split into topics and parameters dicts - Flow interfaces use {"flow": "topic"} instead of bare strings - Specs (ConsumerSpec, ProducerSpec, etc.) read from definition["topics"] Tests updated 2026-04-16 17:19:39 +01:00			`---`
			`layout: default`
			`title: "Active-Flow Key Restructure"`
			`parent: "Tech Specs"`
			`---`

			`# Active-Flow Key Restructure`

			`## Problem`

			Active-flow config uses `('active-flow', processor)` as its key, where
			`each processor's value is a JSON blob containing all flow variants`
			`assigned to that processor:`

			```
			`('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }`
			```

			`This causes two problems:`

			`1. Read-modify-write on every change. Starting or stopping a flow`
			`requires fetching the processor's current blob, parsing it, adding`
			`or removing a variant, serialising it, and writing it back. This is`
			`a concurrency hazard if two flow operations target the same`
			`processor simultaneously.`

			`2. Noisy config pushes. Config subscribers subscribe to a type,`
			`not a specific key. Every active-flow write triggers a config push`
			`that causes every processor in the system to fetch the full config`
			`and re-evaluate, even though only one processor's config changed.`
			`With N processors in a blueprint, a single flow start/stop causes`
			`N writes and N^2 config fetches across the system.`

			`## Proposed Change`

			Restructure the key to `('active-flow', 'processor:variant')` where
			`each key holds a single flow variant's configuration:`

			```
			`('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }`
			`('active-flow', 'chunker:flow2') -> { "topics": {...}, "parameters": {...} }`
			```

			`Starting a flow is a set of clean puts. Stopping a flow is a set of`
			`clean deletes. No read-modify-write. No JSON blob merging.`

			`The config push problem (all processors fetching on every change)`
			`remains — that's a limitation of the config subscription model and`
			`would require per-key subscriptions to solve. But eliminating the`
			`read-modify-write removes the concurrency hazard and simplifies the`
			`flow service code.`

			`## What Changes`

			- Flow service (`flow.py`): `handle_start_flow` writes individual
			`keys per processor:variant instead of merging into per-processor`
			blobs. `handle_stop_flow` deletes individual keys instead of
			`read-modify-write.`
			- FlowProcessor (`flow_processor.py`): `on_configure_flows`
			currently looks up `config["active-flow"][self.id]` to find a JSON
			`blob of all its variants. Needs to scan all active-flow keys for`
			entries prefixed with `self.id:` and assemble its flow list from
			`those.`
			`- Config client: May benefit from a prefix-scan or pattern-match`
			`query to support the FlowProcessor lookup efficiently.`
			`- Initial config / bootstrapping: Any code that seeds active-flow`
			`entries at deployment time needs to use the new key format.`