trustgraph/trustgraph-flow/trustgraph/config/service/service.py
cybermaggedon ae9936c9cc
feat: pluggable bootstrap framework with ordered initialisers (#847)
A generic, long-running bootstrap processor that converges a
deployment to its configured initial state and then idles.
Replaces the previous one-shot `tg-init-trustgraph` container model
and provides an extension point for enterprise / third-party
initialisers.

See docs/tech-specs/bootstrap.md for the full design.

Bootstrapper
------------
A single AsyncProcessor (trustgraph.bootstrap.bootstrapper.Processor)
that:

  * Reads a list of initialiser specifications (class, name, flag,
    params) from either a direct `initialisers` parameter
    (processor-group embedding) or a YAML/JSON file (`-c`, CLI).
  * On each wake, runs a cheap service-gate (config-svc +
    flow-svc round-trips), then iterates the initialiser list,
    running each whose configured flag differs from the one stored
    in __system__/init-state/<name>.
  * Stores per-initialiser completion state in the reserved
    __system__ workspace.
  * Adapts cadence: ~5s on gate failure, ~15s while converging,
    ~300s in steady state.
  * Isolates failures — one initialiser's exception does not block
    others in the same cycle; the failed one retries next wake.
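
The convergence rule above — run each initialiser whose configured flag differs from the stored one, isolate failures, and adapt cadence — can be sketched roughly as follows. All names, the flag store, and the cadence constants here are illustrative stand-ins based on the description above, not the actual trustgraph API:

```python
# Illustrative sketch of one bootstrapper wake cycle. `state` stands in
# for the __system__/init-state/<name> records; each spec carries the
# configured flag and a callable standing in for the initialiser.

GATE_FAIL_SLEEP = 5      # ~5s when the service gate fails
CONVERGING_SLEEP = 15    # ~15s while work remains
STEADY_SLEEP = 300       # ~300s in steady state

def run_cycle(initialisers, state, gate_ok):
    """Run one wake; return seconds to sleep before the next."""
    if not gate_ok:
        return GATE_FAIL_SLEEP

    pending = False
    for spec in initialisers:
        old_flag = state.get(spec["name"])
        if old_flag == spec["flag"]:
            continue                      # already converged
        try:
            spec["run"](old_flag, spec["flag"])
            state[spec["name"]] = spec["flag"]
        except Exception:
            pending = True                # isolated; retry next wake

    return CONVERGING_SLEEP if pending else STEADY_SLEEP
```

Note how one initialiser's exception only marks the cycle as still converging; the others in the same cycle run regardless, and the failed one is retried on the next wake because its stored flag was never updated.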

Initialiser contract
--------------------
  * Subclass trustgraph.bootstrap.base.Initialiser.
  * Implement async run(ctx, old_flag, new_flag).
  * Opt out of the service gate with class attr
    wait_for_services=False (only used by PulsarTopology, since
    config-svc cannot come up until Pulsar namespaces exist).
  * ctx carries short-lived config and flow-svc clients plus a
    scoped logger.
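
A third-party initialiser following this contract might look like the sketch below. The base class here is a minimal stand-in mirroring the documented shape of trustgraph.bootstrap.base.Initialiser (the real one lives in the trustgraph package), and the ctx attributes and seeding behaviour are hypothetical:

```python
# Stand-in base class mirroring the documented contract; illustrative only.
class Initialiser:
    # Most initialisers wait for config-svc / flow-svc to answer; set
    # False (as PulsarTopology does) to run before the service gate.
    wait_for_services = True

    async def run(self, ctx, old_flag, new_flag):
        raise NotImplementedError

class AuditPolicySeed(Initialiser):
    """Hypothetical third-party initialiser: records an audit policy
    entry whenever its configured flag changes."""

    async def run(self, ctx, old_flag, new_flag):
        # old_flag is None on the first ever run; any difference from
        # new_flag means the bootstrapper scheduled us this cycle.
        ctx.log(f"converging audit policy: {old_flag!r} -> {new_flag!r}")
        ctx.seeded.append(("audit-policy", new_flag))
```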

Core initialisers (trustgraph.bootstrap.initialisers.*)
-------------------------------------------------------
  * PulsarTopology   — creates Pulsar tenant + namespaces
                       (pre-gate, blocking HTTP offloaded to
                        executor).
  * TemplateSeed     — seeds __template__ from an external JSON
                       file; re-run is upsert-missing by default,
                       overwrite-all opt-in.
  * WorkspaceInit    — populates a named workspace from either
                       the full contents of __template__ or a
                       seed file; raises cleanly if the template
                       isn't seeded yet so the bootstrapper retries
                       on the next cycle.
  * DefaultFlowStart — starts a specific flow in a workspace;
                       no-ops if the flow is already running.
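
TemplateSeed's two re-run modes can be pinned down with a small merge function; this is a sketch of the semantics described above over a {type: {key: value}} mapping, not the actual implementation:

```python
# Sketch of TemplateSeed re-run semantics: by default only keys absent
# from the stored template are added (upsert-missing); with
# overwrite=True the seed file wins everywhere.

def merge_seed(existing, seed, overwrite=False):
    merged = {t: dict(kv) for t, kv in existing.items()}
    for type_name, entries in seed.items():
        target = merged.setdefault(type_name, {})
        for key, value in entries.items():
            if overwrite or key not in target:
                target[key] = value
    return merged
```

The upsert-missing default means operator edits to the stored template survive a bootstrap re-run; overwrite-all is the explicit opt-in for resetting to factory defaults.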

Enterprise or third-party initialisers plug in via fully-qualified
dotted class paths in the bootstrapper's configuration — no core
code change required.
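
Resolving a fully-qualified dotted class path is the standard importlib pattern; a sketch, using a stdlib class as a stand-in for a real initialiser path (the spec keys shown match the fields listed earlier, but the exact loader in the bootstrapper may differ):

```python
# Sketch: turn an initialiser spec's dotted "class" field into a class
# object, the way a pluggable bootstrapper typically would.
import importlib

def load_class(dotted_path):
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

spec = {
    "class": "collections.OrderedDict",   # stand-in for a real
    "name": "example",                    # initialiser class path
    "flag": "v1",
    "params": {},
}
cls = load_class(spec["class"])
```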

Config service
--------------
  * push(): filter out reserved workspaces (ids starting with "_")
    from the change notifications.  Stored config is preserved; only
    the broadcast is suppressed, so bootstrap / template state lives
    in config-svc without live processors ever reacting to it.

Config client
-------------
  * ConfigClient.get_all(workspace): wraps the existing `config`
    operation to return {type: {key: value}} for a workspace.
    WorkspaceInit uses it to copy __template__ without needing a
    hardcoded types list.

pyproject.toml
--------------
  * Adds a `bootstrap` console script pointing at the new Processor.

* Remove tg-init-trustgraph, superseded by bootstrap processor
2026-04-22 18:03:46 +01:00

231 lines
7.1 KiB
Python

"""
Config service. Manages system global configuration state
"""
import logging
from trustgraph.schema import Error
from trustgraph.schema import ConfigRequest, ConfigResponse, ConfigPush
from trustgraph.schema import config_request_queue, config_response_queue
from trustgraph.schema import config_push_queue
from trustgraph.base import AsyncProcessor, Consumer, Producer
from trustgraph.base.cassandra_config import add_cassandra_args, resolve_cassandra_config
from . config import Configuration
from ... base import ProcessorMetrics, ConsumerMetrics, ProducerMetrics
from ... base import Consumer, Producer
# Module logger
logger = logging.getLogger(__name__)
default_ident = "config-svc"
def is_reserved_workspace(workspace):
    """Reserved workspaces are storage-only.

    Any workspace id beginning with ``_`` is reserved for internal use
    (e.g. ``__template__`` holding factory-default seed config).
    Reads and writes work normally so bootstrap and provisioning code
    can use the standard config API, but **change notifications for
    reserved workspaces are suppressed**. Services subscribed to the
    config push therefore never see reserved-workspace events and
    cannot accidentally act on template content as if it were live
    state.
    """
    return workspace.startswith("_")
default_config_request_queue = config_request_queue
default_config_response_queue = config_response_queue
default_config_push_queue = config_push_queue
default_cassandra_host = "cassandra"
class Processor(AsyncProcessor):

    def __init__(self, **params):

        config_request_queue = params.get(
            "config_request_queue", default_config_request_queue
        )
        config_response_queue = params.get(
            "config_response_queue", default_config_response_queue
        )
        config_push_queue = params.get(
            "config_push_queue", default_config_push_queue
        )

        cassandra_host = params.get("cassandra_host")
        cassandra_username = params.get("cassandra_username")
        cassandra_password = params.get("cassandra_password")

        # Resolve configuration with environment variable fallback
        hosts, username, password, keyspace = resolve_cassandra_config(
            host=cassandra_host,
            username=cassandra_username,
            password=cassandra_password,
            default_keyspace="config"
        )

        # Store resolved configuration
        self.cassandra_host = hosts
        self.cassandra_username = username
        self.cassandra_password = password

        id = params.get("id")

        super(Processor, self).__init__(
            **params | {
                "config_request_schema": ConfigRequest.__name__,
                "config_response_schema": ConfigResponse.__name__,
                "config_push_schema": ConfigPush.__name__,
                "cassandra_host": self.cassandra_host,
                "cassandra_username": self.cassandra_username,
                "cassandra_password": self.cassandra_password,
            }
        )

        config_request_metrics = ConsumerMetrics(
            processor = self.id, flow = None, name = "config-request"
        )
        config_response_metrics = ProducerMetrics(
            processor = self.id, flow = None, name = "config-response"
        )
        config_push_metrics = ProducerMetrics(
            processor = self.id, flow = None, name = "config-push"
        )

        self.config_request_topic = config_request_queue
        self.config_request_subscriber = id

        self.config_request_consumer = Consumer(
            taskgroup = self.taskgroup,
            backend = self.pubsub,
            flow = None,
            topic = config_request_queue,
            subscriber = id,
            schema = ConfigRequest,
            handler = self.on_config_request,
            metrics = config_request_metrics,
        )

        self.config_response_producer = Producer(
            backend = self.pubsub,
            topic = config_response_queue,
            schema = ConfigResponse,
            metrics = config_response_metrics,
        )

        self.config_push_producer = Producer(
            backend = self.pubsub,
            topic = config_push_queue,
            schema = ConfigPush,
            metrics = config_push_metrics,
        )

        self.config = Configuration(
            host = self.cassandra_host,
            username = self.cassandra_username,
            password = self.cassandra_password,
            keyspace = keyspace,
            push = self.push
        )

        logger.info("Config service initialized")
    async def start(self):
        await self.pubsub.ensure_topic(self.config_request_topic)
        await self.push()  # Startup poke: empty types = everything
        await self.config_request_consumer.start()

    async def push(self, changes=None):

        # Suppress notifications from reserved workspaces (ids starting
        # with "_", e.g. "__template__"). Stored config is preserved;
        # only the broadcast is filtered. Keeps services oblivious to
        # template / bootstrap state.
        if changes:
            filtered = {}
            for type_name, workspaces in changes.items():
                visible = [
                    w for w in workspaces
                    if not is_reserved_workspace(w)
                ]
                if visible:
                    filtered[type_name] = visible
            changes = filtered

        version = await self.config.get_version()

        resp = ConfigPush(
            version = version,
            changes = changes or {},
        )

        await self.config_push_producer.send(resp)

        logger.info(
            f"Pushed config poke version {version}, "
            f"changes={resp.changes}"
        )
    async def on_config_request(self, msg, consumer, flow):

        # Sender-produced ID.  Resolved outside the try block, with a
        # default, so the error path below can still reply even if the
        # property is missing.
        id = msg.properties().get("id", "")

        try:
            v = msg.value()
            logger.debug(f"Handling config request {id}...")
            resp = await self.config.handle(v)
            await self.config_response_producer.send(
                resp, properties={"id": id}
            )
        except Exception as e:
            resp = ConfigResponse(
                error=Error(
                    type = "config-error",
                    message = str(e),
                ),
            )
            await self.config_response_producer.send(
                resp, properties={"id": id}
            )
    @staticmethod
    def add_args(parser):

        AsyncProcessor.add_args(parser)

        parser.add_argument(
            '--config-request-queue',
            default=default_config_request_queue,
            help=f'Config request queue (default: {default_config_request_queue})'
        )

        parser.add_argument(
            '--config-response-queue',
            default=default_config_response_queue,
            help=f'Config response queue (default: {default_config_response_queue})',
        )

        # Note: --config-push-queue is already added by AsyncProcessor.add_args()
        add_cassandra_args(parser)
def run():
    Processor.launch(default_ident, __doc__)