feat: pluggable bootstrap framework with ordered initialisers (#847)

A generic, long-running bootstrap processor that converges a
deployment to its configured initial state and then idles.
Replaces the previous one-shot `tg-init-trustgraph` container model
and provides an extension point for enterprise / third-party
initialisers.

See docs/tech-specs/bootstrap.md for the full design.

Bootstrapper
------------
A single AsyncProcessor (trustgraph.bootstrap.bootstrapper.Processor)
that:

  * Reads a list of initialiser specifications (class, name, flag,
    params) from either a direct `initialisers` parameter
    (processor-group embedding) or a YAML/JSON file (`-c`, CLI).
  * On each wake, runs a cheap service-gate (config-svc +
    flow-svc round-trips), then iterates the initialiser list,
    running each whose configured flag differs from the one stored
    in __system__/init-state/<name>.
  * Stores per-initialiser completion state in the reserved
    __system__ workspace.
  * Adapts cadence: ~5s on gate failure, ~15s while converging,
    ~300s in steady state.
  * Isolates failures — one initialiser's exception does not block
    others in the same cycle; the failed one retries next wake.
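
The cadence selection above can be sketched as a small pure function (a simplified illustration; `choose_cadence` and its arguments are names invented here, not the processor's actual API):

```python
# Cadence tiers as described above.
GATE_BACKOFF, INIT_RETRY, STEADY_INTERVAL = 5, 15, 300

def choose_cadence(gate_ok, results):
    """Pick the next sleep interval from this wake's outcomes.

    results maps initialiser name -> "skip" / "ran" / "failed".
    """
    if not gate_ok:
        return GATE_BACKOFF        # services unreachable; retry soon
    if any(r != "skip" for r in results.values()):
        return INIT_RETRY          # still converging (ran or failed)
    return STEADY_INTERVAL         # everything at target flag; idle
```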

Initialiser contract
--------------------
  * Subclass trustgraph.bootstrap.base.Initialiser.
  * Implement async run(ctx, old_flag, new_flag).
  * Opt out of the service gate with class attr
    wait_for_services=False (only used by PulsarTopology, since
    config-svc cannot come up until Pulsar namespaces exist).
  * ctx carries short-lived config and flow-svc clients plus a
    scoped logger.
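
A minimal sketch of the contract in use. The base class is stubbed inline so the example is self-contained; a real subclass would import `trustgraph.bootstrap.base.Initialiser` instead, and `ExampleSeed` is a hypothetical name:

```python
import asyncio

class Initialiser:                       # stand-in for the real base class
    wait_for_services = True

    def __init__(self, **params):
        pass                             # swallow unconsumed kwargs

    async def run(self, ctx, old_flag, new_flag):
        raise NotImplementedError

class ExampleSeed(Initialiser):
    """Hypothetical initialiser: records the flag transition."""

    def __init__(self, note="demo", **kwargs):
        super().__init__(**kwargs)
        self.note = note
        self.last = None

    async def run(self, ctx, old_flag, new_flag):
        # Idempotent work goes here; raise on failure and the
        # bootstrapper retries on the next wake cycle.
        self.last = (old_flag, new_flag)

inst = ExampleSeed(note="x")
asyncio.run(inst.run(None, None, "v1"))
```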

Core initialisers (trustgraph.bootstrap.initialisers.*)
-------------------------------------------------------
  * PulsarTopology   — creates Pulsar tenant + namespaces
                       (pre-gate, blocking HTTP offloaded to
                        executor).
  * TemplateSeed     — seeds __template__ from an external JSON
                       file; re-run is upsert-missing by default,
                       overwrite-all opt-in.
  * WorkspaceInit    — populates a named workspace from either
                       the full contents of __template__ or a
                       seed file; raises cleanly if the template
                       isn't seeded yet so the bootstrapper retries
                       on the next cycle.
  * DefaultFlowStart — starts a specific flow in a workspace;
                       no-ops if the flow is already running.

Enterprise or third-party initialisers plug in via fully-qualified
dotted class paths in the bootstrapper's configuration — no core
code change required.
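
For illustration, a direct `initialisers` value as it might look when embedded in a processor group (the `acme_tg.initialisers.LicenseSeed` entry is a hypothetical third-party class, not part of this commit):

```python
initialisers = [
    {
        "class": "trustgraph.bootstrap.initialisers.PulsarTopology",
        "name": "pulsar-topology",
        "flag": "v1",
        "params": {"admin_url": "http://pulsar:8080", "tenant": "tg"},
    },
    {
        # Third-party initialiser, resolved by dotted path at startup.
        "class": "acme_tg.initialisers.LicenseSeed",
        "name": "acme-license",
        "flag": "v1",
    },
]
```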

Config service
--------------
  * push(): filter out reserved workspaces (ids starting with "_")
    from the change notifications.  Stored config is preserved; only
    the broadcast is suppressed, so bootstrap / template state lives
    in config-svc without live processors ever reacting to it.
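
The filtering behaviour mirrors the config-svc change in this commit and boils down to (helper names here follow the diff):

```python
def is_reserved_workspace(workspace):
    # Any "_"-prefixed workspace id is reserved (storage-only).
    return workspace.startswith("_")

def filter_push(changes):
    """Drop reserved workspaces from a change broadcast.

    Stored config is untouched; only the notification is suppressed.
    """
    filtered = {}
    for type_name, workspaces in changes.items():
        visible = [w for w in workspaces if not is_reserved_workspace(w)]
        if visible:
            filtered[type_name] = visible
    return filtered
```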

Config client
-------------
  * ConfigClient.get_all(workspace): wraps the existing `config`
    operation to return {type: {key: value}} for a workspace.
    WorkspaceInit uses it to copy __template__ without needing a
    hardcoded types list.
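
A hedged sketch of that copy pattern; `FakeConfig` is an in-memory stand-in for the real ConfigClient (whose wire protocol isn't shown here), and `copy_template` is an illustrative helper, not WorkspaceInit's actual code:

```python
import asyncio

class FakeConfig:
    """In-memory stand-in: {workspace: {type: {key: value}}}."""

    def __init__(self, store):
        self.store = store

    async def get_all(self, workspace):
        return self.store.get(workspace, {})

    async def put_many(self, workspace, values):
        ws = self.store.setdefault(workspace, {})
        for type_name, key, value in values:
            ws.setdefault(type_name, {})[key] = value

async def copy_template(config, target):
    # No hardcoded type list: get_all returns every type in one call.
    tree = await config.get_all("__template__")
    values = [
        (t, k, v)
        for t, entries in tree.items()
        for k, v in entries.items()
    ]
    await config.put_many(target, values)
    return len(values)

store = {"__template__": {"prompt": {"greet": "hi"}}}
n = asyncio.run(copy_template(FakeConfig(store), "default"))
```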

pyproject.toml
--------------
  * Adds a `bootstrap` console script pointing at the new Processor.

* Remove tg-init-trustgraph, superseded by bootstrap processor
cybermaggedon 2026-04-22 18:03:46 +01:00 committed by GitHub
parent 31027e30ae
commit ae9936c9cc
17 changed files with 1312 additions and 273 deletions

@@ -60,6 +60,7 @@ agent-orchestrator = "trustgraph.agent.orchestrator:run"
api-gateway = "trustgraph.gateway:run"
chunker-recursive = "trustgraph.chunking.recursive:run"
chunker-token = "trustgraph.chunking.token:run"
bootstrap = "trustgraph.bootstrap.bootstrapper:run"
config-svc = "trustgraph.config.service:run"
flow-svc = "trustgraph.flow.service:run"
doc-embeddings-query-milvus = "trustgraph.query.doc_embeddings.milvus:run"

@@ -0,0 +1,68 @@
"""
Bootstrap framework: Initialiser base class and per-wake context.
See docs/tech-specs/bootstrap.md for the full design.
"""
import logging
from dataclasses import dataclass
from typing import Any
@dataclass
class InitContext:
"""Shared per-wake context passed to each initialiser.
The bootstrapper constructs one of these on every wake cycle,
tears it down at cycle end, and passes it into each initialiser's
``run()`` method. Fields are short-lived and safe to use during
a single cycle only.
"""
logger: logging.Logger
config: Any # ConfigClient
flow: Any # RequestResponse client for flow-svc
class Initialiser:
"""Base class for bootstrap initialisers.
Subclasses implement :meth:`run`. The bootstrapper manages
completion state, flag comparison, retry and error handling;
subclasses describe only the work to perform.
Class attributes:
* ``wait_for_services`` (bool, default ``True``): when ``True`` the
initialiser only runs after the bootstrapper's service gate has
passed (config-svc and flow-svc reachable). Set ``False`` for
initialisers that bring up infrastructure the gate itself
depends on, principally Pulsar topology, without which
config-svc cannot come online.
"""
wait_for_services: bool = True
def __init__(self, **params):
# Subclasses should consume their own params via keyword
# arguments in their own __init__ signatures. This catch-all
# is here so any kwargs that filter through unnoticed don't
# raise TypeError on construction.
pass
async def run(self, ctx, old_flag, new_flag):
"""Perform initialisation work.
:param ctx: :class:`InitContext` with logger, config client,
flow-svc client.
:param old_flag: Previously-stored flag string, or ``None`` if
this initialiser has never successfully completed in this
deployment.
:param new_flag: Currently-configured flag. A string chosen
by the operator; typically something like ``"v1"``.
:raises: Any exception on failure. The bootstrapper catches,
logs, and re-runs on the next cycle; completion state is
only written on clean return.
"""
raise NotImplementedError

@@ -0,0 +1 @@
from . service import *

@@ -0,0 +1,6 @@
#!/usr/bin/env python3
from . service import run
if __name__ == '__main__':
run()

@@ -0,0 +1,414 @@
"""
Bootstrapper processor.
Runs a pluggable list of initialisers in a reconciliation loop.
Each initialiser's completion state is recorded in the reserved
``__system__`` workspace under the ``init-state`` config type.
See docs/tech-specs/bootstrap.md for the full design.
"""
import asyncio
import importlib
import json
import logging
import uuid
from argparse import ArgumentParser
from dataclasses import dataclass
from trustgraph.base import AsyncProcessor
from trustgraph.base import ProducerMetrics, SubscriberMetrics
from trustgraph.base.config_client import ConfigClient
from trustgraph.base.request_response_spec import RequestResponse
from trustgraph.schema import (
ConfigRequest, ConfigResponse,
config_request_queue, config_response_queue,
)
from trustgraph.schema import (
FlowRequest, FlowResponse,
flow_request_queue, flow_response_queue,
)
from .. base import Initialiser, InitContext
logger = logging.getLogger(__name__)
default_ident = "bootstrap"
# Reserved workspace + config type under which completion state is
# stored. Reserved (`_`-prefix) workspaces are excluded from the
# config push broadcast — live processors never see these keys.
SYSTEM_WORKSPACE = "__system__"
INIT_STATE_TYPE = "init-state"
# Cadence tiers.
GATE_BACKOFF = 5 # Services not responding; retry soon.
INIT_RETRY = 15 # Gate passed but something ran/failed;
# converge quickly.
STEADY_INTERVAL = 300 # Everything at target flag; idle cheaply.
@dataclass
class InitialiserSpec:
"""One entry in the bootstrapper's configured list of initialisers."""
name: str
flag: str
instance: Initialiser
def _resolve_class(dotted):
"""Import and return a class by its dotted path."""
module_path, _, class_name = dotted.rpartition(".")
if not module_path:
raise ValueError(
f"Initialiser class must be a dotted path, got {dotted!r}"
)
module = importlib.import_module(module_path)
return getattr(module, class_name)
def _load_initialisers_file(path):
"""Load the initialisers spec list from a YAML or JSON file.
File shape:
.. code-block:: yaml
initialisers:
- class: trustgraph.bootstrap.initialisers.PulsarTopology
name: pulsar-topology
flag: v1
params:
admin_url: http://pulsar:8080
tenant: tg
- ...
"""
with open(path) as f:
content = f.read()
if path.endswith((".yaml", ".yml")):
import yaml
doc = yaml.safe_load(content)
else:
doc = json.loads(content)
if not isinstance(doc, dict) or "initialisers" not in doc:
raise RuntimeError(
f"{path}: expected a mapping with an 'initialisers' key"
)
return doc["initialisers"]
class Processor(AsyncProcessor):
def __init__(self, **params):
super().__init__(**params)
# Source the initialisers list either from a direct parameter
# (processor-group embedding) or from a file (CLI launch).
inits = params.get("initialisers")
if inits is None:
inits_file = params.get("initialisers_file")
if inits_file is None:
raise RuntimeError(
"Bootstrapper requires either the 'initialisers' "
"parameter or --initialisers-file"
)
inits = _load_initialisers_file(inits_file)
self.specs = []
names = set()
for entry in inits:
if not isinstance(entry, dict):
raise RuntimeError(
f"Initialiser entry must be a mapping, got: {entry!r}"
)
for required in ("class", "name", "flag"):
if required not in entry:
raise RuntimeError(
f"Initialiser entry missing required field "
f"{required!r}: {entry!r}"
)
name = entry["name"]
if name in names:
raise RuntimeError(f"Duplicate initialiser name {name!r}")
names.add(name)
cls = _resolve_class(entry["class"])
try:
instance = cls(**entry.get("params", {}))
except Exception as e:
raise RuntimeError(
f"Failed to instantiate initialiser "
f"{entry['class']!r} as {name!r}: "
f"{type(e).__name__}: {e}"
)
self.specs.append(InitialiserSpec(
name=name,
flag=entry["flag"],
instance=instance,
))
logger.info(
f"Bootstrapper: loaded {len(self.specs)} initialisers"
)
# ------------------------------------------------------------------
# Client construction (short-lived per wake cycle).
# ------------------------------------------------------------------
def _make_config_client(self):
rr_id = str(uuid.uuid4())
return ConfigClient(
backend=self.pubsub_backend,
subscription=f"{self.id}--config--{rr_id}",
consumer_name=self.id,
request_topic=config_request_queue,
request_schema=ConfigRequest,
request_metrics=ProducerMetrics(
processor=self.id, flow=None, name="config-request",
),
response_topic=config_response_queue,
response_schema=ConfigResponse,
response_metrics=SubscriberMetrics(
processor=self.id, flow=None, name="config-response",
),
)
def _make_flow_client(self):
rr_id = str(uuid.uuid4())
return RequestResponse(
backend=self.pubsub_backend,
subscription=f"{self.id}--flow--{rr_id}",
consumer_name=self.id,
request_topic=flow_request_queue,
request_schema=FlowRequest,
request_metrics=ProducerMetrics(
processor=self.id, flow=None, name="flow-request",
),
response_topic=flow_response_queue,
response_schema=FlowResponse,
response_metrics=SubscriberMetrics(
processor=self.id, flow=None, name="flow-response",
),
)
async def _open_clients(self):
config = self._make_config_client()
flow = self._make_flow_client()
await config.start()
try:
await flow.start()
except Exception:
await self._safe_stop(config)
raise
return config, flow
async def _safe_stop(self, client):
try:
await client.stop()
except Exception:
pass
# ------------------------------------------------------------------
# Service gate.
# ------------------------------------------------------------------
async def _gate_ready(self, config, flow):
try:
await config.keys(SYSTEM_WORKSPACE, INIT_STATE_TYPE)
except Exception as e:
logger.info(
f"Gate: config-svc not ready ({type(e).__name__}: {e})"
)
return False
try:
resp = await flow.request(
FlowRequest(
operation="list-blueprints",
workspace=SYSTEM_WORKSPACE,
),
timeout=5,
)
if resp.error:
logger.info(
f"Gate: flow-svc error: "
f"{resp.error.type}: {resp.error.message}"
)
return False
except Exception as e:
logger.info(
f"Gate: flow-svc not ready ({type(e).__name__}: {e})"
)
return False
return True
# ------------------------------------------------------------------
# Completion state.
# ------------------------------------------------------------------
async def _stored_flag(self, config, name):
raw = await config.get(SYSTEM_WORKSPACE, INIT_STATE_TYPE, name)
if raw is None:
return None
try:
return json.loads(raw)
except Exception:
return raw
async def _store_flag(self, config, name, flag):
await config.put(
SYSTEM_WORKSPACE, INIT_STATE_TYPE, name,
json.dumps(flag),
)
# ------------------------------------------------------------------
# Per-spec execution.
# ------------------------------------------------------------------
async def _run_spec(self, spec, config, flow):
"""Run a single initialiser spec.
Returns one of:
- ``"skip"``: stored flag already matches target, nothing to do.
- ``"ran"``: initialiser ran and completion state was updated.
- ``"failed"``: initialiser raised.
- ``"failed-state-write"``: initialiser succeeded but we could
not persist the new flag (transient; will re-run next cycle).
"""
try:
old_flag = await self._stored_flag(config, spec.name)
except Exception as e:
logger.warning(
f"{spec.name}: could not read stored flag "
f"({type(e).__name__}: {e})"
)
return "failed"
if old_flag == spec.flag:
return "skip"
child_logger = logger.getChild(spec.name)
child_ctx = InitContext(
logger=child_logger,
config=config,
flow=flow,
)
child_logger.info(
f"Running (old_flag={old_flag!r} -> new_flag={spec.flag!r})"
)
try:
await spec.instance.run(child_ctx, old_flag, spec.flag)
except Exception as e:
child_logger.error(
f"Failed: {type(e).__name__}: {e}", exc_info=True,
)
return "failed"
try:
await self._store_flag(config, spec.name, spec.flag)
except Exception as e:
child_logger.warning(
f"Completed but could not persist state flag "
f"({type(e).__name__}: {e}); will re-run next cycle"
)
return "failed-state-write"
child_logger.info(f"Completed (flag={spec.flag!r})")
return "ran"
# ------------------------------------------------------------------
# Main loop.
# ------------------------------------------------------------------
async def run(self):
logger.info(
f"Bootstrapper starting with {len(self.specs)} initialisers"
)
while self.running:
sleep_for = STEADY_INTERVAL
try:
config, flow = await self._open_clients()
except Exception as e:
logger.info(
f"Failed to open clients "
f"({type(e).__name__}: {e}); retry in {GATE_BACKOFF}s"
)
await asyncio.sleep(GATE_BACKOFF)
continue
try:
# Phase 1: pre-service initialisers run unconditionally.
pre_specs = [
s for s in self.specs
if not s.instance.wait_for_services
]
pre_results = {}
for spec in pre_specs:
pre_results[spec.name] = await self._run_spec(
spec, config, flow,
)
# Phase 2: gate.
gate_ok = await self._gate_ready(config, flow)
# Phase 3: post-service initialisers, if gate passed.
post_results = {}
if gate_ok:
post_specs = [
s for s in self.specs
if s.instance.wait_for_services
]
for spec in post_specs:
post_results[spec.name] = await self._run_spec(
spec, config, flow,
)
# Cadence selection.
if not gate_ok:
sleep_for = GATE_BACKOFF
else:
all_results = {**pre_results, **post_results}
if any(r != "skip" for r in all_results.values()):
sleep_for = INIT_RETRY
else:
sleep_for = STEADY_INTERVAL
finally:
await self._safe_stop(config)
await self._safe_stop(flow)
await asyncio.sleep(sleep_for)
# ------------------------------------------------------------------
# CLI arg plumbing.
# ------------------------------------------------------------------
@staticmethod
def add_args(parser: ArgumentParser) -> None:
AsyncProcessor.add_args(parser)
parser.add_argument(
'-c', '--initialisers-file',
help='Path to YAML or JSON file describing the '
'initialisers to run. Ignored when the '
"'initialisers' parameter is provided directly "
'(e.g. when running inside a processor group).',
)
def run():
Processor.launch(default_ident, __doc__)

@@ -0,0 +1,20 @@
"""
Core bootstrap initialisers.
These cover the base TrustGraph deployment case. Enterprise or
third-party initialisers live in their own packages and are
referenced in the bootstrapper's config by fully-qualified dotted
path.
"""
from . pulsar_topology import PulsarTopology
from . template_seed import TemplateSeed
from . workspace_init import WorkspaceInit
from . default_flow_start import DefaultFlowStart
__all__ = [
"PulsarTopology",
"TemplateSeed",
"WorkspaceInit",
"DefaultFlowStart",
]

@@ -0,0 +1,101 @@
"""
DefaultFlowStart initialiser starts a named flow in a workspace
using a specified blueprint.
Separated from WorkspaceInit so deployments that want a workspace
without an auto-started flow can simply omit this initialiser.
Parameters
----------
workspace : str (default "default")
Workspace in which to start the flow.
flow_id : str (default "default")
Identifier for the started flow.
blueprint : str (required)
Blueprint name (must already exist in the workspace's config,
typically via TemplateSeed -> WorkspaceInit).
description : str (default "Default")
Human-readable description passed to flow-svc.
parameters : dict (optional)
Optional parameter overrides passed to start-flow.
"""
from trustgraph.schema import FlowRequest
from .. base import Initialiser
class DefaultFlowStart(Initialiser):
def __init__(
self,
workspace="default",
flow_id="default",
blueprint=None,
description="Default",
parameters=None,
**kwargs,
):
super().__init__(**kwargs)
if not blueprint:
raise ValueError(
"DefaultFlowStart requires 'blueprint'"
)
self.workspace = workspace
self.flow_id = flow_id
self.blueprint = blueprint
self.description = description
self.parameters = dict(parameters) if parameters else {}
async def run(self, ctx, old_flag, new_flag):
# Check whether the flow already exists. Belt-and-braces
# beyond the flag gate: if an operator stops and restarts the
# bootstrapper after the flow is already running, we don't
# want to blindly try to start it again.
list_resp = await ctx.flow.request(
FlowRequest(
operation="list-flows",
workspace=self.workspace,
),
timeout=10,
)
if list_resp.error:
raise RuntimeError(
f"list-flows failed: "
f"{list_resp.error.type}: {list_resp.error.message}"
)
if self.flow_id in (list_resp.flow_ids or []):
ctx.logger.info(
f"Flow {self.flow_id!r} already running in workspace "
f"{self.workspace!r}; nothing to do"
)
return
ctx.logger.info(
f"Starting flow {self.flow_id!r} "
f"(blueprint={self.blueprint!r}) "
f"in workspace {self.workspace!r}"
)
resp = await ctx.flow.request(
FlowRequest(
operation="start-flow",
workspace=self.workspace,
flow_id=self.flow_id,
blueprint_name=self.blueprint,
description=self.description,
parameters=self.parameters,
),
timeout=30,
)
if resp.error:
raise RuntimeError(
f"start-flow failed: "
f"{resp.error.type}: {resp.error.message}"
)
ctx.logger.info(
f"Flow {self.flow_id!r} started"
)

@@ -0,0 +1,131 @@
"""
PulsarTopology initialiser creates Pulsar tenant and namespaces
with their retention policies.
Runs pre-gate (``wait_for_services = False``) because config-svc and
flow-svc can't connect to Pulsar until these namespaces exist.
Admin-API calls are idempotent so re-runs on flag change are safe.
"""
import asyncio
import requests
from .. base import Initialiser
# Namespace configs. flow/request take broker defaults. response
# and notify get aggressive retention — those classes carry short-lived
# request/response and notification traffic only.
NAMESPACE_CONFIG = {
"flow": {},
"request": {},
"response": {
"retention_policies": {
"retentionSizeInMB": -1,
"retentionTimeInMinutes": 3,
"subscriptionExpirationTimeMinutes": 30,
},
},
"notify": {
"retention_policies": {
"retentionSizeInMB": -1,
"retentionTimeInMinutes": 3,
"subscriptionExpirationTimeMinutes": 5,
},
},
}
REQUEST_TIMEOUT = 10
class PulsarTopology(Initialiser):
wait_for_services = False
def __init__(
self,
admin_url="http://pulsar:8080",
tenant="tg",
**kwargs,
):
super().__init__(**kwargs)
self.admin_url = admin_url.rstrip("/")
self.tenant = tenant
async def run(self, ctx, old_flag, new_flag):
# requests is blocking; offload to executor so the loop stays
# responsive.
loop = asyncio.get_running_loop()
await loop.run_in_executor(None, self._reconcile_sync, ctx.logger)
# ------------------------------------------------------------------
# Sync admin-API calls.
# ------------------------------------------------------------------
def _get_clusters(self):
resp = requests.get(
f"{self.admin_url}/admin/v2/clusters",
timeout=REQUEST_TIMEOUT,
)
resp.raise_for_status()
return resp.json()
def _tenant_exists(self):
resp = requests.get(
f"{self.admin_url}/admin/v2/tenants/{self.tenant}",
timeout=REQUEST_TIMEOUT,
)
return resp.status_code == 200
def _create_tenant(self, clusters):
resp = requests.put(
f"{self.admin_url}/admin/v2/tenants/{self.tenant}",
json={"adminRoles": [], "allowedClusters": clusters},
timeout=REQUEST_TIMEOUT,
)
if resp.status_code != 204:
raise RuntimeError(
f"Tenant {self.tenant!r} create failed: "
f"{resp.status_code} {resp.text}"
)
def _namespace_exists(self, namespace):
resp = requests.get(
f"{self.admin_url}/admin/v2/namespaces/"
f"{self.tenant}/{namespace}",
timeout=REQUEST_TIMEOUT,
)
return resp.status_code == 200
def _create_namespace(self, namespace, config):
resp = requests.put(
f"{self.admin_url}/admin/v2/namespaces/"
f"{self.tenant}/{namespace}",
json=config,
timeout=REQUEST_TIMEOUT,
)
if resp.status_code != 204:
raise RuntimeError(
f"Namespace {self.tenant}/{namespace} create failed: "
f"{resp.status_code} {resp.text}"
)
def _reconcile_sync(self, logger):
if not self._tenant_exists():
clusters = self._get_clusters()
logger.info(
f"Creating tenant {self.tenant!r} with clusters {clusters}"
)
self._create_tenant(clusters)
else:
logger.debug(f"Tenant {self.tenant!r} already exists")
for namespace, config in NAMESPACE_CONFIG.items():
if self._namespace_exists(namespace):
logger.debug(
f"Namespace {self.tenant}/{namespace} already exists"
)
continue
logger.info(
f"Creating namespace {self.tenant}/{namespace}"
)
self._create_namespace(namespace, config)

@@ -0,0 +1,93 @@
"""
TemplateSeed initialiser populates the reserved ``__template__``
workspace from an external JSON seed file.
Seed file shape:
.. code-block:: json
{
"flow-blueprint": {
"ontology": { ... },
"agent": { ... }
},
"prompt": {
...
},
...
}
Top-level keys are config types; nested keys are config entries.
Values are arbitrary JSON (they'll be ``json.dumps()``'d on write).
Parameters
----------
config_file : str
Path to the seed file on disk.
overwrite : bool (default False)
On re-run (flag change), if True overwrite all keys; if False
upsert-missing-only (preserves any operator customisation of
the template).
"""
import json
from .. base import Initialiser
TEMPLATE_WORKSPACE = "__template__"
class TemplateSeed(Initialiser):
def __init__(self, config_file, overwrite=False, **kwargs):
super().__init__(**kwargs)
if not config_file:
raise ValueError("TemplateSeed requires 'config_file'")
self.config_file = config_file
self.overwrite = overwrite
async def run(self, ctx, old_flag, new_flag):
with open(self.config_file) as f:
seed = json.load(f)
if old_flag is None:
# Clean first run — write every entry.
await self._write_all(ctx, seed)
return
# Re-run after flag change.
if self.overwrite:
await self._write_all(ctx, seed)
else:
await self._upsert_missing(ctx, seed)
async def _write_all(self, ctx, seed):
values = []
for type_name, entries in seed.items():
for key, value in entries.items():
values.append((type_name, key, json.dumps(value)))
if values:
await ctx.config.put_many(TEMPLATE_WORKSPACE, values)
ctx.logger.info(
f"Template seeded with {len(values)} entries"
)
async def _upsert_missing(self, ctx, seed):
written = 0
for type_name, entries in seed.items():
existing = set(
await ctx.config.keys(TEMPLATE_WORKSPACE, type_name)
)
values = []
for key, value in entries.items():
if key not in existing:
values.append(
(type_name, key, json.dumps(value))
)
if values:
await ctx.config.put_many(TEMPLATE_WORKSPACE, values)
written += len(values)
ctx.logger.info(
f"Template upsert-missing: {written} new entries"
)

@@ -0,0 +1,138 @@
"""
WorkspaceInit initialiser creates a workspace and populates it from
either the ``__template__`` workspace or a seed file on disk.
Parameters
----------
workspace : str
Target workspace to create / populate.
source : str
Either ``"template"`` (copy the full contents of the
``__template__`` workspace) or ``"seed-file"`` (read from
``seed_file``).
seed_file : str (required when source=="seed-file")
Path to a JSON seed file with the same shape TemplateSeed consumes.
overwrite : bool (default False)
On re-run (flag change), if True overwrite all keys; if False,
upsert-missing-only (preserves in-workspace customisations).
Raises (in ``run``)
-------------------
When source is ``"template"``, raises ``RuntimeError`` if the
``__template__`` workspace is empty, indicating that TemplateSeed
hasn't run yet. The bootstrapper's retry loop will re-attempt on
the next cycle once the prerequisite is satisfied.
"""
import json
from .. base import Initialiser
TEMPLATE_WORKSPACE = "__template__"
class WorkspaceInit(Initialiser):
def __init__(
self,
workspace="default",
source="template",
seed_file=None,
overwrite=False,
**kwargs,
):
super().__init__(**kwargs)
if source not in ("template", "seed-file"):
raise ValueError(
f"WorkspaceInit: source must be 'template' or "
f"'seed-file', got {source!r}"
)
if source == "seed-file" and not seed_file:
raise ValueError(
"WorkspaceInit: seed_file required when source='seed-file'"
)
self.workspace = workspace
self.source = source
self.seed_file = seed_file
self.overwrite = overwrite
async def run(self, ctx, old_flag, new_flag):
if self.source == "seed-file":
tree = self._load_seed_file()
else:
tree = await self._load_from_template(ctx)
if old_flag is None or self.overwrite:
await self._write_all(ctx, tree)
else:
await self._upsert_missing(ctx, tree)
def _load_seed_file(self):
with open(self.seed_file) as f:
return json.load(f)
async def _load_from_template(self, ctx):
"""Build a seed tree from the entire ``__template__`` workspace.
Raises if the workspace is empty, so the bootstrapper knows
the prerequisite isn't met yet."""
raw_tree = await ctx.config.get_all(TEMPLATE_WORKSPACE)
tree = {}
total = 0
for type_name, entries in raw_tree.items():
parsed = {}
for key, raw in entries.items():
if raw is None:
continue
try:
parsed[key] = json.loads(raw)
except Exception:
parsed[key] = raw
total += 1
if parsed:
tree[type_name] = parsed
if total == 0:
raise RuntimeError(
"Template workspace is empty — has TemplateSeed run yet?"
)
ctx.logger.debug(
f"Loaded {total} template entries across {len(tree)} types"
)
return tree
async def _write_all(self, ctx, tree):
values = []
for type_name, entries in tree.items():
for key, value in entries.items():
values.append((type_name, key, json.dumps(value)))
if values:
await ctx.config.put_many(self.workspace, values)
ctx.logger.info(
f"Workspace {self.workspace!r} populated with "
f"{len(values)} entries"
)
async def _upsert_missing(self, ctx, tree):
written = 0
for type_name, entries in tree.items():
existing = set(
await ctx.config.keys(self.workspace, type_name)
)
values = []
for key, value in entries.items():
if key not in existing:
values.append(
(type_name, key, json.dumps(value))
)
if values:
await ctx.config.put_many(self.workspace, values)
written += len(values)
ctx.logger.info(
f"Workspace {self.workspace!r} upsert-missing: "
f"{written} new entries"
)

@@ -24,6 +24,21 @@ logger = logging.getLogger(__name__)
default_ident = "config-svc"
def is_reserved_workspace(workspace):
"""Reserved workspaces are storage-only.
Any workspace id beginning with ``_`` is reserved for internal use
(e.g. ``__template__`` holding factory-default seed config).
Reads and writes work normally so bootstrap and provisioning code
can use the standard config API, but **change notifications for
reserved workspaces are suppressed**. Services subscribed to the
config push therefore never see reserved-workspace events and
cannot accidentally act on template content as if it were live
state.
"""
return workspace.startswith("_")
default_config_request_queue = config_request_queue
default_config_response_queue = config_response_queue
default_config_push_queue = config_push_queue
@@ -130,6 +145,21 @@ class Processor(AsyncProcessor):
async def push(self, changes=None):
# Suppress notifications from reserved workspaces (ids starting
# with "_", e.g. "__template__"). Stored config is preserved;
# only the broadcast is filtered. Keeps services oblivious to
# template / bootstrap state.
if changes:
filtered = {}
for type_name, workspaces in changes.items():
visible = [
w for w in workspaces
if not is_reserved_workspace(w)
]
if visible:
filtered[type_name] = visible
changes = filtered
version = await self.config.get_version()
resp = ConfigPush(