trustgraph/docs/tech-specs/active-flow-key-restructure.md
2026-04-17 09:09:22 +01:00


layout: default
title: Active-Flow Key Restructure
parent: Tech Specs

Active-Flow Key Restructure

Problem

Active-flow config uses ('active-flow', processor) as its key, where each processor's value is a JSON blob containing all flow variants assigned to that processor:

('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }

This causes two problems:

  1. Read-modify-write on every change. Starting or stopping a flow requires fetching the processor's current blob, parsing it, adding or removing a variant, serialising it, and writing it back. This is a concurrency hazard: if two flow operations target the same processor simultaneously, the later write can overwrite the earlier one and silently drop a variant.

  2. Noisy config pushes. Config subscribers subscribe to a type, not a specific key. Every active-flow write triggers a config push that causes every processor in the system to fetch the full config and re-evaluate, even though only one processor's config changed. With N processors in a blueprint, a single flow start/stop causes N writes, and each write prompts all N processors to refetch, giving N^2 config fetches across the system.
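The lost-update hazard in the first problem can be illustrated with a minimal sketch. The in-memory `store` dict and the `read_blob` / `write_blob` helpers are hypothetical stand-ins for the config service, not its real API; the interleaving is written out sequentially to make the race explicit.

```python
import json

# Hypothetical in-memory stand-in for the config store: (type, key) -> JSON string.
store = {("active-flow", "chunker"): json.dumps({"default": {}})}

def read_blob(processor):
    """Fetch and parse the per-processor blob (empty if absent)."""
    return json.loads(store.get(("active-flow", processor), "{}"))

def write_blob(processor, blob):
    """Serialise and write back the whole per-processor blob."""
    store[("active-flow", processor)] = json.dumps(blob)

# Two flow operations read the same starting state...
blob_a = read_blob("chunker")
blob_b = read_blob("chunker")

# ...each modifies its own copy...
blob_a["flow2"] = {}
blob_b["flow3"] = {}

# ...and each writes back. The second write discards the first.
write_blob("chunker", blob_a)
write_blob("chunker", blob_b)

final = read_blob("chunker")
print(sorted(final))  # flow2 has been lost
```

Nothing in the sequence fails or raises; the variant added by the first operation simply disappears, which is what makes this class of bug hard to notice.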

Proposed Change

Restructure the key to ('active-flow', 'processor:variant') where each key holds a single flow variant's configuration:

('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }
('active-flow', 'chunker:flow2')   -> { "topics": {...}, "parameters": {...} }

Starting a flow is a set of clean puts. Stopping a flow is a set of clean deletes. No read-modify-write. No JSON blob merging.
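A minimal sketch of what start and stop could look like under the new key format. The `store` dict and the `flow_key`, `start_flow`, and `stop_flow` names are hypothetical and chosen for illustration; the real flow service handlers will differ.

```python
# Hypothetical in-memory stand-in for the config store: (type, key) -> value.
store = {}

def flow_key(processor, variant):
    """Build the proposed ('active-flow', 'processor:variant') key."""
    return ("active-flow", f"{processor}:{variant}")

def start_flow(variant, per_processor_config):
    """One put per processor; no existing value is read."""
    for processor, cfg in per_processor_config.items():
        store[flow_key(processor, variant)] = cfg

def stop_flow(variant, processors):
    """One delete per processor; other variants are untouched."""
    for processor in processors:
        store.pop(flow_key(processor, variant), None)

start_flow("default", {"chunker": {"topics": {}, "parameters": {}}})
start_flow("flow2", {"chunker": {"topics": {}, "parameters": {}}})
stop_flow("default", ["chunker"])

remaining = sorted(key for _, key in store)
print(remaining)
```

Because each operation touches only its own keys, two flow operations on the same processor no longer share a value that either could overwrite.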

The config push problem (all processors fetching on every change) remains. It is a limitation of the config subscription model, and solving it would require per-key subscriptions. But eliminating the read-modify-write removes the concurrency hazard and simplifies the flow service code.
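To make the remaining cost concrete, here is a small counting sketch. It assumes, as described in the Problem section, one active-flow write per processor and one full-config fetch per processor per push; `simulate_flow_start` is a hypothetical name used only for this illustration.

```python
def simulate_flow_start(n_processors):
    """Count writes and config fetches for one flow start
    when subscriptions are per type rather than per key."""
    writes = 0
    fetches = 0
    for _ in range(n_processors):   # one active-flow write per processor
        writes += 1
        fetches += n_processors     # every processor refetches on each push
    return writes, fetches

print(simulate_flow_start(8))
```

The new key format changes what each write contains, not how many pushes it triggers, so these counts are the same before and after the restructure.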

What Changes

  • Flow service (flow.py): handle_start_flow writes individual keys per processor:variant instead of merging into per-processor blobs. handle_stop_flow deletes individual keys instead of performing a read-modify-write.
  • FlowProcessor (flow_processor.py): on_configure_flows currently looks up config["active-flow"][self.id] to find a JSON blob of all its variants. It needs to scan all active-flow keys for entries whose key starts with self.id followed by a colon, and assemble its flow list from those.
  • Config client: May benefit from a prefix-scan or pattern-match query to support the FlowProcessor lookup efficiently.
  • Initial config / bootstrapping: Any code that seeds active-flow entries at deployment time needs to use the new key format.
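The FlowProcessor lookup and the bootstrapping conversion can be sketched as two small functions. Both names (`flows_for`, `split_blobs`) are hypothetical, and the sketch assumes the active-flow section is available as a plain dict whose values are already parsed; the real config client may return something different.

```python
def flows_for(processor_id, active_flow):
    """Collect one processor's variants from a flat
    {'processor:variant': config} mapping."""
    prefix = processor_id + ":"
    return {
        key[len(prefix):]: cfg
        for key, cfg in active_flow.items()
        if key.startswith(prefix)
    }

def split_blobs(old_active_flow):
    """Convert old {'processor': {variant: config}} entries
    into new-format 'processor:variant' keys."""
    return {
        f"{processor}:{variant}": cfg
        for processor, blob in old_active_flow.items()
        for variant, cfg in blob.items()
    }

old = {
    "chunker": {"default": {"a": 1}, "flow2": {"a": 2}},
    "chunker-v2": {"default": {"a": 3}},
}
new = split_blobs(old)
mine = flows_for("chunker", new)
print(sorted(mine))
```

Including the colon in the prefix matters: matching on the bare id would let a processor named `chunker` pick up entries belonging to `chunker-v2`.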