mirror of https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-26 00:46:22 +02:00
Release/v2.3 -> master
parent 59e269185d
commit e8bc96ef7e
31 changed files with 1202 additions and 398 deletions
67 docs/tech-specs/active-flow-key-restructure.md (Normal file)
@@ -0,0 +1,67 @@
---
layout: default
title: "Active-Flow Key Restructure"
parent: "Tech Specs"
---

# Active-Flow Key Restructure

## Problem

Active-flow config uses `('active-flow', processor)` as its key, where
each processor's value is a JSON blob containing all flow variants
assigned to that processor:

```
('active-flow', 'chunker') -> { "default": {...}, "flow2": {...} }
```

This causes two problems:

1. **Read-modify-write on every change.** Starting or stopping a flow
   requires fetching the processor's current blob, parsing it, adding
   or removing a variant, serialising it, and writing it back. This is
   a concurrency hazard if two flow operations target the same
   processor simultaneously.

2. **Noisy config pushes.** Config subscribers subscribe to a type,
   not a specific key. Every active-flow write triggers a config push
   that causes every processor in the system to fetch the full config
   and re-evaluate, even though only one processor's config changed.
   With N processors in a blueprint, a single flow start/stop causes
   N writes and N^2 config fetches across the system.

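
Both problems can be illustrated with a minimal sketch. It assumes a plain
in-memory dict in place of the real config store; `read_blob`, `write_blob`
and `push_cost` are hypothetical helpers invented for this example, not
TrustGraph functions. The interleaving is written out by hand so the lost
update is deterministic.

```python
import json

# Hypothetical in-memory stand-in for the config store (not the real API).
store = {("active-flow", "chunker"): json.dumps({"default": {}})}


def read_blob(processor):
    """Fetch and parse the processor's blob of flow variants."""
    return json.loads(store[("active-flow", processor)])


def write_blob(processor, blob):
    """Serialise the whole blob and write it back."""
    store[("active-flow", processor)] = json.dumps(blob)


# Two flow starts target the same processor, and both read the blob
# before either one writes it back.
blob_a = read_blob("chunker")
blob_b = read_blob("chunker")

blob_a["flow2"] = {}
write_blob("chunker", blob_a)

blob_b["flow3"] = {}
write_blob("chunker", blob_b)  # overwrites the write that added flow2

surviving = sorted(read_blob("chunker"))
print(surviving)  # flow2 is gone: the first update was lost


def push_cost(n_processors):
    """Writes and config fetches for one flow start/stop, as described above."""
    writes = n_processors              # one write per processor
    fetches = writes * n_processors    # every write makes every processor fetch
    return writes, fetches


print(push_cost(5))
```

With five processors in a blueprint, `push_cost(5)` gives 5 writes and 25
fetches for a single flow start or stop.
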
## Proposed Change

Restructure the key to `('active-flow', 'processor:variant')` where
each key holds a single flow variant's configuration:

```
('active-flow', 'chunker:default') -> { "topics": {...}, "parameters": {...} }
('active-flow', 'chunker:flow2') -> { "topics": {...}, "parameters": {...} }
```

Starting a flow is a set of clean puts. Stopping a flow is a set of
clean deletes. No read-modify-write. No JSON blob merging.

The config push problem (all processors fetching on every change)
remains. That is a limitation of the config subscription model and
would require per-key subscriptions to solve. But eliminating the
read-modify-write removes the concurrency hazard and simplifies the
flow service code.

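
A rough sketch of what start and stop could look like under the new key
format follows. It again assumes an in-memory dict in place of the config
store; `flow_key`, `start_flow` and `stop_flow` are hypothetical names for
this illustration and are not the actual `handle_start_flow` /
`handle_stop_flow` implementations.

```python
# Hypothetical in-memory stand-in for the config store (not the real API).
store = {}


def flow_key(processor, variant):
    """Build the proposed ('active-flow', 'processor:variant') key."""
    return ("active-flow", f"{processor}:{variant}")


def start_flow(variant, per_processor_config):
    """Start a flow: one independent put per processor, nothing is read first."""
    for processor, config in per_processor_config.items():
        store[flow_key(processor, variant)] = config


def stop_flow(variant, processors):
    """Stop a flow: one independent delete per processor."""
    for processor in processors:
        store.pop(flow_key(processor, variant), None)


# Example values are placeholders, not real topic or parameter settings.
start_flow("default", {"chunker": {"topics": {}, "parameters": {}}})
start_flow("flow2", {"chunker": {"topics": {}, "parameters": {}}})
stop_flow("default", ["chunker"])

remaining = sorted(store)
print(remaining)
```

Because each variant lives under its own key, two operations on different
variants of the same processor never touch the same entry, so neither can
overwrite the other's change.
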
## What Changes

- **Flow service** (`flow.py`): `handle_start_flow` writes individual
  keys per processor:variant instead of merging into per-processor
  blobs. `handle_stop_flow` deletes individual keys instead of
  performing a read-modify-write.
- **FlowProcessor** (`flow_processor.py`): `on_configure_flows`
  currently looks up `config["active-flow"][self.id]` to find a JSON
  blob of all its variants. It needs to scan all active-flow keys for
  entries prefixed with `self.id:` and assemble its flow list from
  those.
- **Config client**: May benefit from a prefix-scan or pattern-match
  query to support the FlowProcessor lookup efficiently.
- **Initial config / bootstrapping**: Any code that seeds active-flow
  entries at deployment time needs to use the new key format.
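
The FlowProcessor lookup described above might be sketched as follows. This
is a client-side prefix scan under stated assumptions: `flows_for` is a
hypothetical helper, `active_flow` stands for the contents of
`config["active-flow"]` as a dict keyed by `processor:variant` with values
already parsed, and processor ids are assumed not to contain `:`.

```python
def flows_for(processor_id, active_flow):
    """Collect this processor's variants from the flat active-flow keys.

    The colon is part of the prefix, so 'chunker' does not match
    keys belonging to a processor named 'chunker-large'.
    """
    prefix = processor_id + ":"
    return {
        key[len(prefix):]: value
        for key, value in active_flow.items()
        if key.startswith(prefix)
    }


# Example values are placeholders, not real flow configurations.
active_flow = {
    "chunker:default": {"topics": {}, "parameters": {}},
    "chunker:flow2": {"topics": {}, "parameters": {}},
    "chunker-large:default": {"topics": {}, "parameters": {}},
    "embedder:default": {"topics": {}, "parameters": {}},
}

chunker_flows = flows_for("chunker", active_flow)
print(sorted(chunker_flows))
```

The scan is linear in the number of active-flow keys, which is why a
prefix-scan query on the config client could make the lookup cheaper.
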