Verbatim move (indentation preserved — embedded raw-string fixtures are
content). lib.rs drops from 7,857 to ~4,750 lines; `use super::*` resolves
to the crate root through the #[path] module declaration unchanged. 95
tests green before and after.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
PolicyConfig::from_source + PolicyEngine::load_graph_from_source /
load_server_from_source — the path-based loaders delegate to them. Needed by
callers whose policy bundles don't live on the local filesystem (the cluster
catalog on object storage); kind-alignment validation stays loud through the
new path.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three primitives the cluster's object-storage port (RFC-006) needs, on the
engine's existing adapter rather than a parallel store:
- read_text_versioned: content + an opaque backend version token (S3: the
ETag from GET; local: content sha256 — ETags don't exist on a filesystem).
- write_text_if_match: replace only when the token still matches. S3 maps to
a conditional put (PutMode::Update / If-Match) — verified against RustFS
beta.8 through the real object_store 0.12.5 path, no extra builder config
needed; local compares content then swaps via temp+rename, the same
single-machine semantics callers had before this trait (safe under their
own lock protocol, not a cross-process barrier by itself). CAS-lost is
Ok(None), never silent.
- delete_prefix: recursive + idempotent (local remove_dir_all; S3 list +
delete, with the non-atomicity documented for crash-retry callers).
Gated S3 coverage: s3_adapter_conditional_writes_contract pins the
conditional-write behavior the cluster ledger will depend on (red if a
backend bump regresses it), and s3_schema_apply_migrates_live_graph closes
the previously-untested schema-apply-on-S3 path before the cluster's schema
executor leans on it. Engine gains the sha2 workspace dep.
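A minimal sketch of the local compare-and-swap path described above, under stated assumptions: the function names mirror read_text_versioned / write_text_if_match, but the signatures are hypothetical, and std's DefaultHasher stands in for the sha256 content token so the example needs no external crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Opaque version token. Illustration only: the local backend uses a content
/// sha256; DefaultHasher is a stand-in so the sketch stays std-only.
fn token_for(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Read the content together with its version token.
fn read_text_versioned(path: &Path) -> io::Result<(String, String)> {
    let content = fs::read_to_string(path)?;
    let token = token_for(&content);
    Ok((content, token))
}

/// Replace the file only if its current token still equals `expected`.
/// Ok(Some(new_token)) on success, Ok(None) when the CAS is lost.
/// As on the local path above, this is not a cross-process barrier.
fn write_text_if_match(path: &Path, expected: &str, new: &str) -> io::Result<Option<String>> {
    let (_, current) = read_text_versioned(path)?;
    if current != expected {
        return Ok(None); // another writer got there first; report it, never overwrite
    }
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, new)?;
    fs::rename(&tmp, path)?; // swap the new content in via temp + rename
    Ok(Some(token_for(new)))
}
```

A caller that receives Ok(None) would typically re-read, re-derive its change, and retry with the fresh token.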
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
omnigraph load is now the single data-write command:
- works against remote graphs (POSTs the server's /ingest endpoint with the
same bearer/actor resolution as other remote commands) — previously load
was the only data command forced to open Lance storage directly
- --from <base> opts into fork-if-missing for --branch (the former ingest
semantics); without --from a missing branch is an error, never a fork
- --mode is now required: overwrite is destructive, so there is no implicit
default (the old silent default was overwrite)
- output gains base_branch/branch_created (and table sums on remote loads)
omnigraph ingest stays as a deprecated alias (defaults preserved: --from
main --mode merge) that prints a one-line warning to stderr, matching the
read/change deprecation convention; removal in a later release.
Docs updated in the same change: cli.md, cli-reference.md, policy.md,
audit.md, execution.md (unified load section), AGENTS.md quick-flow,
README.md.
BREAKING CHANGE: scripts running omnigraph load without --mode must now
pass it explicitly (previously defaulted to the destructive overwrite).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Branch creation becomes opt-in by presence of the request's 'from' field.
Previously the handler defaulted from to 'main' and always auto-created a
missing branch — a typo'd branch name silently forked main and landed the
data there, with the client none the wiser. Now a request without 'from'
against a missing branch returns 404 branch-not-found and creates nothing;
with 'from' set, fork-if-missing behaves as before. The BranchCreate
authority is only consulted when a fork will actually happen.
The handler calls the unified load_as directly (the deprecated ingest_as
shim is no longer used in the server). IngestOutput.base_branch becomes
nullable: it echoes the request's 'from' and is null when absent. OpenAPI
regenerated; the CLI's local ingest arm moves to load_file_as + the new
converter shape.
BREAKING CHANGE: clients that relied on implicit fork-from-main with 'from'
omitted must now pass from='main' explicitly. IngestOutput.base_branch is
now nullable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The free helpers needlessly demanded &mut Omnigraph (every load API takes
&self) and read as leftovers. Rather than rewriting their ~200 call sites
across the test suites — which would have to re-derive the active-branch
resolution at each site — keep the one convenience and make it honest:
borrow immutably (&mut callers coerce, no churn) and document it as the
active-branch shorthand over Omnigraph::load.
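The "&mut callers coerce" point can be illustrated with a small sketch. Graph, load_active and legacy_call_site are hypothetical stand-ins, not the real Omnigraph types or helpers; only the borrow shape is the point.

```rust
/// Hypothetical stand-in for the engine handle.
struct Graph {
    active_branch: String,
}

impl Graph {
    /// Load APIs take &self, as noted above. Returns the row count here.
    fn load(&self, branch: &str, rows: &[&str]) -> usize {
        let _ = branch; // a real load would target this branch
        rows.len()
    }
}

/// Free helper: active-branch shorthand that borrows immutably.
fn load_active(db: &Graph, rows: &[&str]) -> usize {
    db.load(&db.active_branch, rows)
}

/// A call site that still holds `&mut Graph` compiles unchanged,
/// because `&mut T` coerces to `&T` at the call.
fn legacy_call_site(db: &mut Graph) -> usize {
    load_active(db, &["a", "b"])
}
```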
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
load_as/load_file_as gain a base: Option<&str> parameter: with Some(base) a
missing target branch is forked from base first (the former ingest
semantics); with None the target branch must exist — staging fails on an
unknown branch, so a typo'd name can never create one. LoadResult gains
branch/base_branch/branch_created metadata (additive).
The ingest family (ingest, ingest_as, ingest_file, ingest_file_as) becomes
#[deprecated] shims over load_as that preserve the historical contract
exactly (from: None still means fork from main; base recorded even when no
fork happened). IngestResult and to_ingest_tables stay for the shims and
the server until the removal release.
The layered policy check is unchanged: Change on the target branch always,
BranchCreate additionally when a fork actually happens (enforced inside
branch_create_from_as with the actor threaded through).
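A sketch of the branch decision described above, assuming a simplified model: the set of existing branches is a plain slice, and BranchPlan, LoadError, plan_target_branch and plan_ingest_branch are hypothetical names, not the engine's API.

```rust
#[derive(Debug, PartialEq)]
enum BranchPlan {
    /// Target already exists; no fork, so no BranchCreate check is needed.
    UseExisting,
    /// Target is missing and a base was given; fork from it first.
    ForkFrom(String),
}

#[derive(Debug, PartialEq)]
enum LoadError {
    BranchNotFound(String),
}

/// Unified load semantics: a fork happens only when `base` is Some.
fn plan_target_branch(
    existing: &[&str],
    target: &str,
    base: Option<&str>,
) -> Result<BranchPlan, LoadError> {
    if existing.iter().any(|b| *b == target) {
        return Ok(BranchPlan::UseExisting);
    }
    match base {
        Some(b) => Ok(BranchPlan::ForkFrom(b.to_string())),
        // A typo'd name is an error and can never create a branch.
        None => Err(LoadError::BranchNotFound(target.to_string())),
    }
}

/// Deprecated-shim shape: the historical ingest contract treats a
/// missing `from` as "fork from main".
fn plan_ingest_branch(
    existing: &[&str],
    target: &str,
    from: Option<&str>,
) -> Result<BranchPlan, LoadError> {
    plan_target_branch(existing, target, Some(from.unwrap_or("main")))
}
```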
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The LoadMode table still described Overwrite as an inline-commit-per-type
residual with a partial-truncation failure window. Since MR-793 Phase 2,
Overwrite goes through the same MutationStaging accumulator as Append/Merge,
staged as a Lance Operation::Overwrite transaction via stage_overwrite
(table_store.rs) and committed with commit_staged + publisher CAS — a
mid-load failure leaves Lance HEAD untouched in all three modes.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
resolve_query_decls hands its file contents to the caller; the per-query
digest/typecheck pass reuses them instead of re-reading (a file with N
queries was read N+1 times), which also closes the window where a file
changing between enumeration and validation produced a confusing
query_key_mismatch for a just-discovered name. Explicit-map declarations
read as before.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Paths in cluster.yaml and command examples are relative to one explicit
config folder (Terraform-shaped) — the ./ prefixes were noise and are gone
across the user docs (109 instances; ../ links and ./scripts executables
untouched). The cluster docs now present directory discovery as the primary
queries form with the list and map forms documented alongside.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cluster.yaml's graphs.<id>.queries previously accepted only an explicit
name->file map, forcing configs to re-enumerate every `query <name>` that
the .gq files already declare (the SPIKE cookbook needed 66 entries for 6
files). The files ARE the declaration now: `queries: queries/` discovers
every declaration in a directory's top-level *.gq (sorted), a list form
takes explicit files, and the map stays for fine-grained control.
Discovery is loud — unreadable/unparseable files and duplicate query names
fail validation (query_parse_error, duplicate_query_name). Downstream is
untouched: each discovered query is still an individually addressed
resource with the containing file's digest.
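A rough sketch of the discovery rule, working on in-memory (file name, content) pairs rather than a real directory. It assumes, for illustration only, that a declaration is a line whose first word is `query` followed by the name; the real .gq grammar and the validator's error types are not reproduced here.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum DiscoveryError {
    DuplicateQueryName { name: String, first: String, second: String },
}

/// Map each discovered query name to the file that declares it.
fn discover_queries(files: &[(&str, &str)]) -> Result<BTreeMap<String, String>, DiscoveryError> {
    // Keep only top-level *.gq entries and visit them in sorted order.
    let mut gq: Vec<(&str, &str)> = files
        .iter()
        .copied()
        .filter(|(name, _)| name.ends_with(".gq") && !name.contains('/'))
        .collect();
    gq.sort_by(|a, b| a.0.cmp(b.0));

    let mut found: BTreeMap<String, String> = BTreeMap::new();
    for (file, content) in gq {
        for line in content.lines() {
            let mut words = line.split_whitespace();
            if words.next() != Some("query") {
                continue;
            }
            let raw = match words.next() {
                Some(r) => r,
                None => continue,
            };
            // Drop anything after the identifier, such as a parameter list.
            let name: String = raw
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            if name.is_empty() {
                continue;
            }
            // Discovery is loud: a second declaration of a name is an error.
            if let Some(first) = found.get(&name) {
                return Err(DiscoveryError::DuplicateQueryName {
                    name,
                    first: first.clone(),
                    second: file.to_string(),
                });
            }
            found.insert(name, file.to_string());
        }
    }
    Ok(found)
}
```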
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The ECS day-2 apply gains its required --config flag (the image ships no
omnigraph.yaml, so the CLI cannot locate the cluster dir without it), and
the docker-exec example uses the <you> placeholder convention instead of a
real-looking actor name.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- resolve_cluster_actor uses load_config directly: load_cli_config also
loads auth.env_file into the process env — a second thing, violating the
documented 'exactly one thing' omnigraph.yaml contract for cluster ops.
- resolve_cli_actor gets its doc comment back (the inserted helper had
absorbed the contiguous /// block).
- The actor-default test imports once as setup and asserts on apply alone,
idempotently, instead of re-importing inside the assertion helper.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The container contract (OMNIGRAPH_CLUSTER + mounted volume + token env),
ECS/Fargate+EFS and Railway-volume walkthroughs, the in-container day-2
loop, and the honest constraints list (volume mandatory, no hot reload,
single-writer apply, shared-volume replicas unvalidated). Operator guide
links the recipes.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
OMNIGRAPH_CLUSTER boots the container from a mounted cluster directory's
applied revision — checked first and exclusive (exit 64 when combined with
OMNIGRAPH_TARGET_URI/CONFIG/TARGET), the entrypoint-level mirror of the
server's mode-inference rule 0. The omnigraph CLI joins the image so the
day-2 loop (cluster apply/approve/status, data loads by explicit URI) runs
in-container via docker/ECS exec or railway shell — no omnigraph.yaml
required, which the cluster-local-config PR pins. entrypoint_test gains the
cluster case plus all three exclusivity refusals.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
local_cli_s3_end_to_end_init_load_read_flow ran `omnigraph init` without a
current_dir, so init's project scaffold landed in crates/omnigraph-cli/ —
poisoning any later test that resolves a graph target from the cwd config
(query_lint_requires_schema_or_resolvable_graph_target fails deterministically
once the file exists). Only manifests when OMNIGRAPH_S3_TEST_BUCKET
is set, which is why local FS runs and CI's scoped rustfs job never caught
it. The init and load calls now run inside the test's tempdir.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The 'Relationship to omnigraph.yaml' section becomes the exact rule set:
cluster commands read the per-operator config for exactly one thing (the
cli.actor default when --as is omitted), a --cluster server reads it for
nothing, and pointing data-plane targets at derived roots is ergonomics,
not coupling. Operator guide and CLI reference updated to match.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A --cluster server process whose cwd contains a MALFORMED omnigraph.yaml
boots and serves — proving mode-inference rule 0 returns before any config
search can run. New spawn_server_with_cluster_in support helper sets the
spawned server's cwd explicitly.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cluster FACTS stay unlayered (cluster.yaml only), but the operator's
identity is a per-operator fact — exactly the per-operator omnigraph.yaml's
permanent job, and the cascade every data-plane write already uses. cluster
apply/approve now resolve: --as flag wins and skips any config read
entirely (containers and CI stay config-free); without it, the standard cwd
search supplies cli.actor, with a malformed config failing loudly and
actionably ('pass --as to skip this lookup') rather than silently dropping
attribution. approve's no-actor error now names both sources.
Tests pin the contract from both sides: cli.actor is the no-flag default
for apply (echoed actor) and approve (approved_by), the flag overrides it,
a malformed omnigraph.yaml in cwd breaks nothing except the no-flag actor
lookup, and a conflicting well-formed one leaks nothing into cluster
outputs.
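The resolution order above, as a minimal sketch. The loader closure is a hypothetical stand-in for the cwd config search (Ok(None) when no config or no cli.actor is found, Err when the file is malformed); passing it lazily is what lets the flag path skip the read entirely.

```rust
/// Resolve the acting identity for cluster apply/approve.
/// Ok(None) means "no actor from either source"; a command that
/// requires one (approve) would turn that into its own error.
fn resolve_cluster_actor<F>(flag: Option<&str>, load_cli_actor: F) -> Result<Option<String>, String>
where
    F: FnOnce() -> Result<Option<String>, String>,
{
    // --as wins and the config is never opened.
    if let Some(actor) = flag {
        return Ok(Some(actor.to_string()));
    }
    match load_cli_actor() {
        Ok(actor) => Ok(actor),
        // Fail loudly and actionably instead of dropping attribution.
        Err(msg) => Err(format!("{}; pass --as to skip this lookup", msg)),
    }
}
```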
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
New docs/user/cluster.md — the practical companion to cluster-config.md's
reference: zero-to-served walkthrough (validate/import/plan/apply, derived
roots, data loading, --cluster serving), the day-2 edit->plan->apply->restart
loop with a per-change-kind table, drift observation and convergence, the
approval gate for destructive changes, crash/lock/lost-ledger recovery, the
boot-refusal table with remedies, deployment patterns (replicas, backup
unit, CI gating), and the explicit not-yet list (hot reload, S3-hosted
cluster dirs, per-query exposure, pipelines). Linked from the user index,
the agent guide's topic map, and cross-linked from the reference.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- The drift-heal verification now asserts `schema show` succeeded and
produced a schema before checking the rogue field's absence (a failed
command previously made the negative assertion vacuously pass).
- cluster_cli documents why it deliberately does not assert exit codes
(blocked applies exit non-zero by contract while emitting the structured
output callers assert on).
- The comprehensive lifecycle e2es honor OMNIGRAPH_SKIP_SYSTEM_E2E=1
(graceful skip-with-message, the S3-gate pattern) for constrained
sandboxes; requirements + suppression documented in testing.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
beta.4+ refuses the rustfsadmin/rustfsadmin test credentials unless
RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true is set — acceptable for the
ephemeral CI container and the local bootstrap script (which already passed
it). The three S3 suites were validated against the beta.8 binary locally
before this bump. The pin stays explicit, never `latest`, so future
upgrades remain deliberate.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two system tests composing the whole Phase 1-5 surface with real binaries:
- local_cluster_full_lifecycle_declare_serve_evolve_delete: declare two
graphs -> one apply creates and converges them -> the --cluster server
serves both stored queries -> schema+query evolve in one apply (migration
previewed in plan) -> restart serves the new shape -> out-of-band schema
drift observed by refresh and converged back by apply (rogue field
soft-dropped) -> approved graph delete -> restart serves the survivor and
404s the tombstoned graph -> final plan empty. Catches composition
regressions where each stage passes its own tests but the lifecycle
breaks (the composite_flow.rs principle at the control-plane level).
- local_cluster_serving_enforces_applied_policy_bindings: applied policy
bundles gate serving per their bindings over HTTP with bearer-resolved
actors — the cluster-bound bundle owns graph_list (admin 200, reader 403,
anonymous 401), the graph-bound bundle owns invoke_query (reader gets
rows; denied invocation is the documented anti-probing 404).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The standing caveat ('applied means recorded in the cluster catalog —
nothing more; the server still boots from omnigraph.yaml') retires: cluster
docs gain the 'Serving from the cluster' section (exclusivity, applied-
revision serving, fail-fast readiness, restart-to-pick-up, expose-all
bridge), server.md gains mode-inference rule 0 and the cluster-booted multi
mode, deployment.md the boot-source choice, and the CLI's apply note plus
the cli-reference cluster row (stale since Stage 3A) now describe the full
convergence surface. RFC-005 flips to Landed with four implementation
deviations recorded.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Phase-5 contract end to end with real binaries: cluster import + apply
via the CLI, seed a row through the graph plane, boot omnigraph-server with
--cluster (no omnigraph.yaml anywhere), and the applied stored query serves
the row over HTTP through the multi-graph routes.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-005 §D1/§D2: omnigraph-server --cluster <dir> is rule 0 of the mode
inference — an exclusive boot source (hard error when combined with a graph
URI, --target, or --config) that never opens omnigraph.yaml, not even the
implicit current-directory search. The cluster branch reads the applied
revision through omnigraph-cluster's serving-snapshot API and feeds the
EXISTING multi-graph pipeline: GraphStartupConfig per recorded graph at its
derived root, stored queries built via QueryRegistry::from_specs from
verified blob content (expose-all — the §D5 bridge until Phase 6
policy-owned exposure), cluster-bound policy bundles as the server-level
Cedar engine and graph-bound bundles per graph, straight from the
content-addressed blob paths. Multiple bundles binding one scope refuse boot
(one-bundle-per-scope is the serving pipeline's shape; stacking is a later
slice). Everything downstream — parallel opens, query type-checking,
registry, routing, auth, OpenAPI — is reused unchanged; cluster mode is a
new source, not a new pipeline.
First server->cluster crate dependency: read-only types + one fn;
omnigraph-cluster stays HTTP-free. open_multi_graph_state goes pub for
integration tests.
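Rule 0 can be pictured as a check that runs before any other mode inference. The sketch below is an assumption-laden illustration: BootArgs, BootSource and infer_boot_source are hypothetical names, and the real server's argument types and error wording are not shown in this log.

```rust
#[derive(Debug, Default)]
struct BootArgs {
    cluster: Option<String>,
    uri: Option<String>,
    target: Option<String>,
    config: Option<String>,
}

#[derive(Debug, PartialEq)]
enum BootSource {
    /// Boot from the cluster directory's applied revision.
    Cluster(String),
    /// Fall through to the later inference rules (not modelled here).
    Other,
}

/// Rule 0: --cluster is exclusive and is decided before any config search.
fn infer_boot_source(args: &BootArgs) -> Result<BootSource, String> {
    if let Some(dir) = &args.cluster {
        let mut conflicts = Vec::new();
        if args.uri.is_some() {
            conflicts.push("a graph URI");
        }
        if args.target.is_some() {
            conflicts.push("--target");
        }
        if args.config.is_some() {
            conflicts.push("--config");
        }
        if !conflicts.is_empty() {
            return Err(format!("--cluster cannot be combined with {}", conflicts.join(", ")));
        }
        // Returning here is what keeps omnigraph.yaml unopened in cluster mode.
        return Ok(BootSource::Cluster(dir.clone()));
    }
    Ok(BootSource::Other)
}
```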
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-005 §D2/§D4: read_serving_snapshot reads the applied revision as
everything a server needs to boot — graphs at derived roots, stored-query
sources read from the content-addressed catalog and re-hashed against the
recorded digests, policy blob paths with their applied applies_to bindings.
All-or-nothing: missing state, pending recovery sidecars, missing/tampered
blobs, pre-5A entries without bindings, and an empty graph set each refuse
the snapshot with a remedy; no partial serving. Lock-free by design — the
state file is replaced atomically, so each read sees a consistent
point-in-time view of the ledger.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Slice 5A of RFC-005: the state ledger becomes serving-sufficient for the
Phase-5 server boot. StateResource gains an optional applies_to (normalized
typed refs: cluster | graph.<id>), written by apply for every applied policy
create/update from the desired config's validated bindings.
The hole this closes: applies_to is not part of the policy file digest, so a
binding-only edit previously produced NO plan change at all (a 4C e2e even
asserted that — the gap, not a contract). Binding changes are now
first-class: a post-diff pass emits an Update with equal before/after
digests and a binding_change marker (visible in plan/apply JSON and human
output as [bindings]), classification/execution treat it as an ordinary
catalog-tier applied change (payload skips naturally — the blob is
unchanged), and convergence requires zero binding divergence, so stale
bindings can never report converged. Pre-5A ledger entries (no bindings
recorded) surface as the same backfill Update; one apply heals them, exactly
the remedy RFC-005's boot-error path names.
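A sketch of the post-diff binding pass, assuming both sides are already normalized to the same ordering. PolicyEntry, Update and diff_policy are hypothetical shapes for illustration, not the cluster crate's types.

```rust
#[derive(Debug, Clone, PartialEq)]
struct PolicyEntry {
    digest: String,
    /// None models a pre-5A ledger entry with no bindings recorded.
    applies_to: Option<Vec<String>>,
}

#[derive(Debug, PartialEq)]
struct Update {
    before_digest: String,
    after_digest: String,
    binding_change: bool,
}

/// Compare an applied policy entry with the desired digest and bindings.
/// A binding-only edit yields an Update with equal before/after digests.
fn diff_policy(applied: &PolicyEntry, desired_digest: &str, desired_bindings: &[String]) -> Option<Update> {
    let content_changed = applied.digest != desired_digest;
    // Missing bindings count as divergence, so pre-5A entries get backfilled.
    let binding_change = applied.applies_to.as_deref() != Some(desired_bindings);
    if !content_changed && !binding_change {
        return None;
    }
    Some(Update {
        before_digest: applied.digest.clone(),
        after_digest: desired_digest.to_string(),
        binding_change,
    })
}
```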
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The axiom-15 mode switch: omnigraph-server --cluster <dir> (mutually
exclusive with uri/--target/--config, zero omnigraph.yaml reads) serves the
APPLIED revision — graph set from state, query/policy content from the
content-addressed catalog at applied digests, cluster-scoped policy bundles
as the server-level Cedar engine. The load-bearing finding: state is not yet
serving-sufficient (policy applies_to bindings live only in cluster.yaml), so
slice 5A records binding metadata into the applied revision at apply time —
without it, boot-from-state silently becomes the merged read axiom 15
forbids. Fail-fast readiness table (missing state, pending sidecars, missing
blobs, unbound policies all refuse boot with remedies), the expose-all
mcp.expose bridge with its Phase 6 sunset, the operator migration path (exit
criterion 7), and 5A/5B/5C sequencing. The existing boot pipeline
(GraphStartupConfig -> registry -> routing/auth) is reused as-is — a new
source, not a new pipeline.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pipelines (scheduler, connectors, mapping, idempotency, run ledger) leave the
cluster control-plane rollout and become their own project with their own
RFC. This rollout guarantees only the socket, all of which already exists and
is enforced: the pipelines: config field is reserved (typed
future_phase_field rejection, test-covered), the pipeline.<name> typed
address and Pipeline resource kind are reserved in the resource model, and
axiom 13 fixes the contract any future implementation must satisfy
(definition reconciled, execution data-plane, fan-out statusful). The ETL
section in the high-level spec stands as the requirements record for that
project; exit criterion 9 defers to its RFC.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Approvals + gated graph deletion in the user docs, the approve command in the
CLI reference, RFC-004 flipped to Landed with its three implementation
deviations recorded (row-8 retire-and-repropose, --as instead of --actor/--by,
consumed artifacts rewritten in place rather than moved).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Crash before the removal: root intact, approval file unconsumed, sidecar
survives, no ack; the next run retires the stale intent (row 8) and the
still-approved delete completes in the same run.
- Crash after the removal, before the state CAS: root gone, ledger
byte-identical, the sidecar carries the approval id; the next run's sweep
rolls the tombstone forward, consumes the approval, audits the recovery,
and converges (row 7b).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Stage 4C execution half (RFC-004 §D5/§D6 + sweep rows 7/7b/8): an approved
graph.<id> delete — and its riding schema/query deletes — classifies Applied
and executes LAST in the run, sidecar-fenced: pre-op manifest pin (best
effort; partial roots still delete), approval_id carried in the sidecar,
recursive root removal (NotFound tolerated), subtree tombstoned out of the
ledger with a tombstone observation, the approval consumed in the same state
CAS (ledger summary) and its artifact file rewritten with consumed_at only
after the CAS lands — a failed run consumes nothing and the approval stays
valid for the retry.
Sweep rows: already-tombstoned intents retire (7); a completed delete with a
stale ledger rolls forward — tombstone + approval consumption + audit entry
(7b, idempotent); a still-present root retires the stale intent with a
graph_delete_incomplete warning and the still-approved delete re-executes in
the same run (8) — prefix removal is idempotent, so retry IS the repair.
The multi-graph mixed e2e gets its conclusion: blocked without approval,
cluster approve graph.engineering --as andrew, converge, tombstone visible
in status. Phase 4's disposition matrix is now fully executable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-004 §D4, gate half: graph deletes (and their subtree) now classify
Blocked/approval_required instead of Deferred; the new cluster approve
command (requires the global --as actor) writes
__cluster/approvals/{ulid}.json bound to the desired config digest and the
change's before/after digests, so config or state drift invalidates the
artifact automatically (approval_stale warning, never authorizes). One gate
per subtree: compute_approvals lists only the graph-level delete, and
ApprovalRequirement gains a satisfied flag surfaced by plan. Consumption and
the delete executor land next — until then approved deletes stay blocked so
a gate-only build can never strip state without removing the root.
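How digest binding invalidates an approval can be sketched as a pure comparison. The Approval fields and check_approval below are hypothetical; the real artifact schema under __cluster/approvals/ is not reproduced here.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Approval {
    config_digest: String,
    before_digest: Option<String>,
    /// None for a delete: there is no "after" resource.
    after_digest: Option<String>,
}

#[derive(Debug, PartialEq)]
enum ApprovalCheck {
    Satisfied,
    /// Config or state drifted since approval; warn, never authorize.
    Stale,
    Missing,
}

/// An approval authorizes a change only if every bound digest still matches.
fn check_approval(
    approval: Option<&Approval>,
    config_digest: &str,
    before: Option<&str>,
    after: Option<&str>,
) -> ApprovalCheck {
    match approval {
        None => ApprovalCheck::Missing,
        Some(a) => {
            let matches = a.config_digest == config_digest
                && a.before_digest.as_deref() == before
                && a.after_digest.as_deref() == after;
            if matches {
                ApprovalCheck::Satisfied
            } else {
                ApprovalCheck::Stale
            }
        }
    }
}
```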
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Crash before the engine call: sidecar (carrying the --as actor) survives,
live schema and ledger untouched, no ack; the next run's sweep retires the
stale intent and the same run applies and converges.
- Crash after the engine call, before the state CAS: the manifest moved with
the post-op pin in the sidecar, state.json byte-identical; the next run's
sweep rolls the ledger forward with a schema_apply audit entry and the run
converges.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Stage 4B (RFC-004 §D1/§D5): schema.<id> Update changes classify Applied and
execute after graph creates, sequentially and sidecar-fenced — read-write
open (the engine's own recovery runs first), pre-op manifest pin recorded,
apply_schema_as with allow_data_loss: false (soft drops only; hard drops wait
for 4C's approval artifacts), post-op pin rewritten into the sidecar, sidecar
retired only after the final state CAS. Queries gated on a same-plan schema
update unblock (the migration lands first in the same run); failures —
unsupported migrations, lock contention, user branches — surface as
schema_apply_failed with the engine's message, demote dependents via the
origin-aware demotion helper, and stop further graph-moving work.
Schema evolution is now fully cluster-driven (the defer -> manual schema
apply -> refresh loop is gone), and out-of-band schema drift is converged
back by apply as an ordinary soft migration (axiom 8: drift correction is
gated like any change; the recoverable tier needs no approval) — both pinned
by reworked e2es. The multi-graph mixed e2e's deferred row is now
delete-shaped, pre-staging the 4C surface.
Actor: cluster apply accepts the CLI's global --as via the new ApplyOptions /
apply_config_dir_with_options (apply_config_dir delegates unchanged); the
actor is echoed in ApplyOutput and recorded in sidecars and audit entries,
and threads to apply_schema_as so Cedar fires wherever a checker is
installed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RecoverySidecarKind::SchemaApply with digest-based sweep classification
(robust to unrelated manifest movement; version pins stay forensic):
ledger-consistent -> sidecar retired (RFC-004 rows 1+2); live digest matches
the intended schema, state stale -> roll forward with composite recompute and
a recovery_records audit entry (row 3); unverifiable or unexpected digests ->
pending, kept, graph-moving work blocked (rows 1-unopenable/6).
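The digest-based classification can be sketched as a three-way decision. This encodes one reading of the rows above, as an assumption: "ledger-consistent" is taken to mean the live schema digest equals the digest recorded in state, and an unopenable graph is modelled as a missing live digest. SweepAction and classify_schema_sidecar are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum SweepAction {
    /// Ledger already agrees with the live schema; drop the sidecar.
    Retire,
    /// The engine call landed but the state CAS did not; update the ledger.
    RollForward,
    /// Cannot be classified safely; keep the sidecar and block graph-moving work.
    Pending,
}

/// Classify a SchemaApply sidecar from digests alone, so unrelated
/// manifest movement does not affect the outcome.
fn classify_schema_sidecar(live: Option<&str>, ledger: &str, intended: &str) -> SweepAction {
    match live {
        None => SweepAction::Pending, // unverifiable: the graph could not be read
        Some(d) if d == ledger => SweepAction::Retire,
        Some(d) if d == intended => SweepAction::RollForward,
        Some(_) => SweepAction::Pending, // unexpected digest
    }
}
```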
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>