---
layout: default
title: "Config Push 'Notify' Pattern Technical Specification"
parent: "Tech Specs"
---

# Config Push "Notify" Pattern Technical Specification

## Overview

Replace the current config push mechanism, which broadcasts the full config
blob on a `state` class queue, with a lightweight "notify" message
containing only the version number and the affected config types. Processors
that care about those types fetch the full config via the existing
request/response interface.

This solves the RabbitMQ late-subscriber problem: when a processor restarts,
its fresh queue has no historical messages, so it never receives the current
config state. With the notify pattern, the push queue is only a signal; the
source of truth is the config service's request/response API, which can be
queried at any time.

## Problem

On Pulsar, `state` class queues are persistent topics. A new subscriber
with `InitialPosition.Earliest` reads from the start of the topic and
therefore receives the last config push. On RabbitMQ, each subscriber gets a
fresh per-subscriber queue (named with a new UUID). Messages published before
that queue existed are never delivered to it, so a restarting processor never
gets the current config.
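
The difference between the two brokers can be illustrated with a toy in-memory model. This is a simplification, not TrustGraph or broker code: the hypothetical `PersistentTopic` stands in for a Pulsar persistent topic read from the earliest position, and the hypothetical `FanoutExchange` stands in for RabbitMQ delivering each message into one fresh queue per subscriber.

```python
from collections import deque


class PersistentTopic:
    """Toy model of a persistent topic: every published message is retained."""

    def __init__(self):
        self.log = []

    def publish(self, msg):
        self.log.append(msg)

    def subscribe_earliest(self):
        # A new subscriber starting at the earliest position replays the history.
        return deque(self.log)


class FanoutExchange:
    """Toy model of fanout delivery into one fresh queue per subscriber."""

    def __init__(self):
        self.queues = []

    def publish(self, msg):
        # Only queues that already exist at publish time receive a copy.
        for q in self.queues:
            q.append(msg)

    def subscribe(self):
        q = deque()  # brand-new queue, empty on creation
        self.queues.append(q)
        return q


pulsar_like = PersistentTopic()
rabbit_like = FanoutExchange()
push = {"version": 7, "config": {"prompt": {"system": "..."}}}
pulsar_like.publish(push)
rabbit_like.publish(push)  # no queues exist yet, so nobody receives this copy

# A processor restarts *after* the push was published:
late_pulsar = pulsar_like.subscribe_earliest()
late_rabbit = rabbit_like.subscribe()
print(len(late_pulsar), len(late_rabbit))  # 1 0
```

The late subscriber on the persistent topic still sees the last push; the late subscriber on the fanout model sees nothing, which is the situation the notify pattern is designed to tolerate.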

## Design

### The Notify Message

The `ConfigPush` schema changes from carrying the full config to carrying
just a version number and the list of affected config types:

```python
from dataclasses import dataclass, field


@dataclass
class ConfigPush:
    version: int = 0
    types: list[str] = field(default_factory=list)
```

When the config service handles a `put` or `delete`, it knows which types
were affected (from the request's `values[].type` or `keys[].type`) and
includes them in the notify. On startup, the config service sends a notify
with an empty types list (meaning "everything").

### Subscribe-then-Fetch Startup (No Race Condition)

The critical ordering to avoid missing an update:

1. **Subscribe** to the config push queue. Buffer incoming notify messages.
2. **Fetch** the full config via request/response (`operation: "config"`).
   This returns the config dict and a version number.
3. **Apply** the fetched config to all registered handlers.
4. **Process** buffered notifies. For any notify with `version > fetched_version`,
   re-fetch and re-apply. Discard notifies with `version <= fetched_version`.
5. **Enter steady state**. Process future notifies as they arrive.

This is safe because:

- If an update happens before the subscription, the fetch picks it up.
- If an update happens between subscribe and fetch, its notify is in the
  buffer (and is discarded if the fetch already reflects it).
- If an update happens after the fetch, its notify arrives on the queue normally.
- Version comparison ensures no duplicate processing.
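
The ordering can be checked with a small single-threaded simulation. `ToyConfigService` and `start_processor` are hypothetical stand-ins, not TrustGraph classes: the service keeps a versioned dict and appends `(version, types)` notifies to every existing subscriber queue, and `fetch()` plays the role of the request/response `config` operation. The `before_fetch` and `after_fetch` hooks are likewise hypothetical; they inject updates at the racy points of startup.

```python
from collections import deque


class ToyConfigService:
    """Hypothetical in-memory stand-in for the config service."""

    def __init__(self):
        self.version = 0
        self.config = {}
        self.subscribers = []

    def subscribe(self):
        queue = deque()  # unread notifies wait here, i.e. the buffer
        self.subscribers.append(queue)
        return queue

    def put(self, type_, key, value):
        self.config.setdefault(type_, {})[key] = value
        self.version += 1
        for queue in self.subscribers:  # only existing subscribers get the notify
            queue.append((self.version, [type_]))

    def fetch(self):
        # Stand-in for the request/response "config" operation.
        return {t: dict(kv) for t, kv in self.config.items()}, self.version


def start_processor(service, before_fetch=None, after_fetch=None):
    applied = []                        # versions handed to handlers
    queue = service.subscribe()         # 1. subscribe; notifies start buffering
    if before_fetch:
        before_fetch()                  # update racing between subscribe and fetch
    config, version = service.fetch()   # 2. fetch full config + version
    applied.append(version)             # 3. apply (handlers would run here)
    if after_fetch:
        after_fetch()                   # update arriving after the fetch
    while queue:                        # 4. process buffered notifies
        notify_version, _types = queue.popleft()
        if notify_version > version:    # newer than our snapshot: re-fetch
            config, version = service.fetch()
            applied.append(version)
        # otherwise the snapshot already includes it: discard
    return config, version, applied     # 5. steady state would continue from here


svc = ToyConfigService()
svc.put("prompt", "system", "v1")       # happens before the processor subscribes
config, version, applied = start_processor(
    svc,
    before_fetch=lambda: svc.put("schema", "customers", "s1"),
    after_fetch=lambda: svc.put("prompt", "system", "v2"),
)
print(version, applied)                 # 3 [2, 3]
print(config["prompt"]["system"], config["schema"]["customers"])  # v2 s1
```

All three race cases from the list show up here: the `v1` update predates the subscription and is picked up by the fetch, the `schema` update's notify is buffered and discarded as already applied, and the later `prompt` update triggers exactly one re-fetch.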

### Processor API

The current API requires processors to understand the full config dict
structure, and every handler is invoked on every change. The new API lets
processors declare which config types they care about; their handler is then
only invoked when one of those types changes (or on startup). The handler
signature stays the same and still receives the full config dict.

#### Current API

```python
import json

# In processor __init__:
self.register_config_handler(self.on_configure_flows)

# Handler receives the entire config dict:
async def on_configure_flows(self, config, version):
    if "active-flow" not in config:
        return
    if self.id in config["active-flow"]:
        flow_config = json.loads(config["active-flow"][self.id])
        # ...
```

#### New API

```python
# In processor __init__:
self.register_config_handler(
    handler=self.on_configure_flows,
    types=["active-flow"],
)

# Handler signature is unchanged; only *when* it is called changes:
async def on_configure_flows(self, config, version):
    # config still contains the full dict, but the handler is only called
    # when the "active-flow" type changes (or on startup)
    if "active-flow" not in config:
        return
    # ...
```

The `types` parameter is optional. If omitted, the handler is called for
every config change (backward compatible). If specified, the handler is
only invoked when the notify's `types` list intersects with the handler's
types, or on startup (empty types list = everything).
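
The selection rule reduces to a small predicate. The function name `should_invoke` is hypothetical (the spec inlines this logic in `on_config_notify`); `handler_types` is the stored set, or `None` when the handler was registered without `types`.

```python
from typing import Iterable, Optional


def should_invoke(handler_types: Optional[set[str]], notify_types: Iterable[str]) -> bool:
    """Decide whether a registered config handler runs for a given notify."""
    notify_types = set(notify_types)
    if handler_types is None:
        return True  # registered without `types`: called for every change
    if not notify_types:
        return True  # empty types list means "everything" (e.g. startup)
    return bool(notify_types & handler_types)  # only on an actual overlap


print(should_invoke({"active-flow"}, ["prompt"]))                 # False
print(should_invoke({"active-flow"}, ["prompt", "active-flow"]))  # True
print(should_invoke({"active-flow"}, []))                         # True
print(should_invoke(None, ["prompt"]))                            # True
```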

#### Internal Registration Structure

```python
# In AsyncProcessor:
def register_config_handler(self, handler, types=None):
    self.config_handlers.append({
        "handler": handler,
        "types": set(types) if types else None,  # None = all types
    })
```

#### Notify Processing Logic

```python
async def on_config_notify(self, message, consumer, flow):
    notify_version = message.value().version
    notify_types = set(message.value().types)

    # Skip if we already have this version or newer
    if notify_version <= self.config_version:
        return

    # Fetch full config from config service
    config, version = await self.config_client.config()
    self.config_version = version

    # If the fetch returned a newer version than this notify announced, the
    # types changed by the intervening versions are unknown, and their own
    # notifies will now be skipped; treat this as "everything changed".
    if version > notify_version:
        notify_types = set()

    # Determine which handlers to invoke
    for entry in self.config_handlers:
        handler_types = entry["types"]
        if handler_types is None:
            # Handler cares about everything
            await entry["handler"](config, version)
        elif not notify_types or notify_types & handler_types:
            # Empty notify_types = everything (startup or stale notify),
            # otherwise require an intersection with the handler's types
            await entry["handler"](config, version)
```

### Config Service Changes

#### Push Method

The `push()` method changes to send only version + types:

```python
async def push(self, types=None):
    version = await self.config.get_version()
    resp = ConfigPush(
        version=version,
        types=types or [],
    )
    await self.config_push_producer.send(resp)
```

#### Put/Delete Handlers

Extract the affected types and pass them to `push()`:

```python
async def handle_put(self, v):
    types = list(set(k.type for k in v.values))
    for k in v.values:
        await self.table_store.put_config(k.type, k.key, k.value)
    await self.inc_version()
    await self.push(types=types)

async def handle_delete(self, v):
    types = list(set(k.type for k in v.keys))
    for k in v.keys:
        await self.table_store.delete_key(k.type, k.key)
    await self.inc_version()
    await self.push(types=types)
```
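
To see what subscribers would receive for a sequence of requests, here is a self-contained, in-memory approximation of the methods above. `InMemoryConfigService`, `Value` and `Key` are hypothetical stand-ins for the real service, table store and request entry types; the version store is collapsed into a plain counter, and types are sorted only to make the output deterministic.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ConfigPush:
    version: int = 0
    types: list[str] = field(default_factory=list)


@dataclass
class Value:  # hypothetical stand-in for an entry in a put request's values[]
    type: str
    key: str
    value: str


@dataclass
class Key:  # hypothetical stand-in for an entry in a delete request's keys[]
    type: str
    key: str


class InMemoryConfigService:
    """Hypothetical stand-in: store, version counter and push queue in memory."""

    def __init__(self):
        self.store = {}
        self.version = 0
        self.sent = []  # notifies that would be published on the push queue

    async def push(self, types=None):
        self.sent.append(ConfigPush(version=self.version, types=types or []))

    async def handle_put(self, values):
        types = sorted({v.type for v in values})  # deduplicated affected types
        for v in values:
            self.store[(v.type, v.key)] = v.value
        self.version += 1  # one version bump per request, not per entry
        await self.push(types=types)

    async def handle_delete(self, keys):
        types = sorted({k.type for k in keys})
        for k in keys:
            self.store.pop((k.type, k.key), None)
        self.version += 1
        await self.push(types=types)


async def demo():
    svc = InMemoryConfigService()
    await svc.push(types=[])  # startup notify: empty list = all types
    await svc.handle_put([
        Value("prompt", "system", "..."),
        Value("prompt", "extract", "..."),
        Value("schema", "customers", "..."),
    ])
    await svc.handle_delete([Key("prompt", "extract")])
    return svc.sent


for notify in asyncio.run(demo()):
    print(notify)
# ConfigPush(version=0, types=[])
# ConfigPush(version=1, types=['prompt', 'schema'])
# ConfigPush(version=2, types=['prompt'])
```

A single `put` touching two `prompt` keys yields one notify that lists `prompt` once, so handlers registered for `prompt` run once per request rather than once per key.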

#### Queue Class Change

The config push queue changes from `state` class to `flow` class. The push
is now a transient signal; the source of truth is the config service's
request/response API, not the queue. `flow` class queues are persistent
(they survive broker restarts) but do not rely on last-message retention,
and that reliance was the root cause of the RabbitMQ problem.

```python
config_push_queue = queue('config', cls='flow')  # was cls='state'
```

#### Startup Push

On startup, the config service sends a notify with an empty types list
(signalling "everything changed"):

```python
async def start(self):
    await self.push(types=[])  # Empty = all types
    await self.config_request_consumer.start()
```

### AsyncProcessor Changes

The `AsyncProcessor` needs a config request/response client alongside the
push consumer. The startup sequence becomes:

```python
async def start(self):
    # 1. Start the push consumer (begins buffering notifies)
    await self.config_sub_task.start()

    # 2. Fetch current config via request/response
    config, version = await self.config_client.config()
    self.config_version = version

    # 3. Apply to all handlers (startup = all handlers invoked)
    for entry in self.config_handlers:
        await entry["handler"](config, version)

    # 4. Buffered notifies are now processed by on_config_notify,
    #    which skips versions <= self.config_version
```

The config client needs to be created in `__init__` using the existing
request/response queue infrastructure. The `ConfigClient` from
`trustgraph.clients.config_client` already exists but uses a synchronous
blocking pattern. An async variant or integration with the processor's
pub/sub backend is needed.
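
One possible shape for that async variant is a client that correlates requests and responses by ID using `asyncio` futures. This is a sketch under assumptions, not the existing `ConfigClient` API: `AsyncConfigClient`, the `send` coroutine, the `on_response` callback and the `id`/`config`/`version` response fields are all hypothetical, standing in for whatever the processor's pub/sub backend provides.

```python
import asyncio
import uuid


class AsyncConfigClient:
    """Hypothetical async request/response client for config queries."""

    def __init__(self, send, timeout=10.0):
        self.send = send          # coroutine that publishes a request message
        self.timeout = timeout
        self.pending = {}         # request id -> Future awaiting its response

    async def request(self, body):
        request_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self.pending[request_id] = future
        try:
            await self.send({"id": request_id, **body})
            return await asyncio.wait_for(future, self.timeout)
        finally:
            self.pending.pop(request_id, None)   # clean up on success or timeout

    def on_response(self, response):
        # Called by the response consumer; unknown or late ids are ignored.
        future = self.pending.get(response.get("id"))
        if future is not None and not future.done():
            future.set_result(response)

    async def config(self):
        response = await self.request({"operation": "config"})
        return response["config"], response["version"]


async def main():
    client = None

    async def fake_send(request):
        # Pretend the config service answers on a later event-loop iteration.
        asyncio.get_running_loop().call_soon(client.on_response, {
            "id": request["id"],
            "config": {"prompt": {"system": "..."}},
            "version": 5,
        })

    client = AsyncConfigClient(fake_send)
    return await client.config()


print(asyncio.run(main()))  # ({'prompt': {'system': '...'}}, 5)
```

Matching by ID lets concurrent requests share one response queue, and the timeout makes a processor fail with `TimeoutError` rather than hang if the config service is unavailable at startup.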

### Existing Config Handler Types

For reference, the config types currently used by handlers:

| Handler | Type(s) | Used By |
|---------|---------|---------|
| `on_configure_flows` | `active-flow` | All FlowProcessor subclasses |
| `on_collection_config` | `collection` | Storage services (triples, embeddings, rows) |
| `on_prompt_config` | `prompt` | Prompt template service, agent extract |
| `on_schema_config` | `schema` | Rows storage, row embeddings, NLP query, structured diag |
| `on_cost_config` | `token-costs` | Metering service |
| `on_ontology_config` | `ontology` | Ontology extraction |
| `on_librarian_config` | `librarian` | Librarian service |
| `on_mcp_config` | `mcp-tool` | MCP tool service |
| `on_knowledge_config` | `kg-core` | Cores service |

## Implementation Order

1. **Update ConfigPush schema**: change the `config` field to a `types` field.

2. **Update config service**: modify `push()` to send version + types.
   Modify `handle_put`/`handle_delete` to extract affected types.

3. **Add async config query to AsyncProcessor**: create a
   request/response client for config queries within the processor's
   event loop.

4. **Implement subscribe-then-fetch startup**: reorder
   `AsyncProcessor.start()` to subscribe first, then fetch, then
   process buffered notifies with version comparison.

5. **Update register_config_handler**: add the optional `types` parameter.
   Update `on_config_notify` to filter by type intersection.

6. **Update existing handlers**: add the `types` parameter to all
   `register_config_handler` calls across the codebase.

7. **Backward compatibility**: handlers registered without a `types`
   parameter continue to work (invoked for all changes).

## Risks

- **Thundering herd**: if many processors restart simultaneously, they
  all hit the config service API at once. Mitigated by the config service
  already being designed for request/response load, and the number of
  processors being small (tens, not thousands).

- **Config service availability**: processors now depend on the config
  service being up at startup, not just having received a push. This is
  already the case in practice: without config, processors can't do
  anything useful.