trustgraph/docs/tech-specs/active-flow-key-restructure.md

---
layout: default
title: "Active-Flow Key Restructure"
parent: "Tech Specs"
---

# Active-Flow Key Restructure

## Problem

Active-flow config uses `('active-flow', processor)` as its key, where
each processor's value is a JSON blob containing all flow variants
assigned to that processor:

```
('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }
```

This causes two problems:

1. **Read-modify-write on every change.** Starting or stopping a flow
   requires fetching the processor's current blob, parsing it, adding
   or removing a variant, serialising it, and writing it back. This is
   a concurrency hazard if two flow operations target the same
   processor simultaneously.

2. **Noisy config pushes.** Config subscribers subscribe to a type,
   not a specific key. Every active-flow write triggers a config push
   that causes every processor in the system to fetch the full config
   and re-evaluate, even though only one processor's config changed.
   With N processors in a blueprint, a single flow start/stop causes
   N writes and N^2 config fetches across the system.

## Proposed Change

Restructure the key to `('active-flow', 'processor:variant')` where
each key holds a single flow variant's configuration:

```
('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }
('active-flow', 'chunker:flow2')   -> { "topics": {...}, "parameters": {...} }
```

Starting a flow is a set of clean puts. Stopping a flow is a set of
clean deletes. No read-modify-write. No JSON blob merging.

The config push problem (all processors fetching on every change)
remains — that's a limitation of the config subscription model and
would require per-key subscriptions to solve. But eliminating the
read-modify-write removes the concurrency hazard and simplifies the
flow service code.

## What Changes

- **Flow service** (`flow.py`): `handle_start_flow` writes individual
  keys per processor:variant instead of merging into per-processor
  blobs. `handle_stop_flow` deletes individual keys instead of
  read-modify-write.
- **FlowProcessor** (`flow_processor.py`): `on_configure_flows`
  currently looks up `config["active-flow"][self.id]` to find a JSON
  blob of all its variants. Needs to scan all active-flow keys for
  entries prefixed with `self.id:` and assemble its flow list from
  those.
- **Config client**: May benefit from a prefix-scan or pattern-match
  query to support the FlowProcessor lookup efficiently.
- **Initial config / bootstrapping**: Any code that seeds active-flow
  entries at deployment time needs to use the new key format.
Release/v2.3 -> master 2026-04-17 09:09:22 +01:00			`---`
			`layout: default`
			`title: "Active-Flow Key Restructure"`
			`parent: "Tech Specs"`
			`---`

			`# Active-Flow Key Restructure`

			`## Problem`

			Active-flow config uses `('active-flow', processor)` as its key, where
			`each processor's value is a JSON blob containing all flow variants`
			`assigned to that processor:`

			```
			`('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }`
			```

			`This causes two problems:`

			`1. Read-modify-write on every change. Starting or stopping a flow`
			`requires fetching the processor's current blob, parsing it, adding`
			`or removing a variant, serialising it, and writing it back. This is`
			`a concurrency hazard if two flow operations target the same`
			`processor simultaneously.`

			`2. Noisy config pushes. Config subscribers subscribe to a type,`
			`not a specific key. Every active-flow write triggers a config push`
			`that causes every processor in the system to fetch the full config`
			`and re-evaluate, even though only one processor's config changed.`
			`With N processors in a blueprint, a single flow start/stop causes`
			`N writes and N^2 config fetches across the system.`

			`## Proposed Change`

			Restructure the key to `('active-flow', 'processor:variant')` where
			`each key holds a single flow variant's configuration:`

			```
			`('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }`
			`('active-flow', 'chunker:flow2') -> { "topics": {...}, "parameters": {...} }`
			```

			`Starting a flow is a set of clean puts. Stopping a flow is a set of`
			`clean deletes. No read-modify-write. No JSON blob merging.`

			`The config push problem (all processors fetching on every change)`
			`remains — that's a limitation of the config subscription model and`
			`would require per-key subscriptions to solve. But eliminating the`
			`read-modify-write removes the concurrency hazard and simplifies the`
			`flow service code.`

			`## What Changes`

			- Flow service (`flow.py`): `handle_start_flow` writes individual
			`keys per processor:variant instead of merging into per-processor`
			blobs. `handle_stop_flow` deletes individual keys instead of
			`read-modify-write.`
			- FlowProcessor (`flow_processor.py`): `on_configure_flows`
			currently looks up `config["active-flow"][self.id]` to find a JSON
			`blob of all its variants. Needs to scan all active-flow keys for`
			entries prefixed with `self.id:` and assemble its flow list from
			`those.`
			`- Config client: May benefit from a prefix-scan or pattern-match`
			`query to support the FlowProcessor lookup efficiently.`
			`- Initial config / bootstrapping: Any code that seeds active-flow`
			`entries at deployment time needs to use the new key format.`