* feat(cluster): cluster_root_for_graph_uri detection helper (RFC-010 Slice 3)
Public helper the CLI uses to refuse `init` into a cluster-managed location:
given a graph storage URI of the cluster layout (`<root>/graphs/<id>.omni`),
return the cluster root if `<root>` holds `__cluster/state.json`, else None.
Cheap by construction — a URI that doesn't match the `<root>/graphs/<id>.omni`
shape returns None with zero I/O, so ordinary `init` targets never probe
storage. Works for file:// and s3:// via the storage adapter. Adds two
ClusterStore accessors (`display_root`, `has_state`).
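
A minimal sketch of the shape gate described above, not the shipped helper. The names `cluster_root_candidate` and `cluster_root_for_graph_uri_sketch` are hypothetical, and the `has_state` closure is an assumed stand-in for the storage-adapter probe of `__cluster/state.json`:

```rust
/// Hypothetical sketch: split `<root>/graphs/<id>.omni` into its root using
/// string work only. Any URI without that exact shape returns None, no I/O.
fn cluster_root_candidate(uri: &str) -> Option<&str> {
    let trimmed = uri.trim_end_matches('/');
    // The last segment must be `<id>.omni` with a non-empty id.
    let (parent, file) = trimmed.rsplit_once('/')?;
    let id = file.strip_suffix(".omni")?;
    if id.is_empty() {
        return None;
    }
    // The segment before it must be exactly `graphs`, under a non-empty root.
    let (root, dir) = parent.rsplit_once('/')?;
    if dir != "graphs" || root.is_empty() {
        return None;
    }
    Some(root)
}

/// Only a URI that passes the shape check pays for a storage probe.
/// `has_state` is an assumed stand-in for the adapter lookup.
fn cluster_root_for_graph_uri_sketch<'a>(
    uri: &'a str,
    has_state: impl Fn(&str) -> bool,
) -> Option<&'a str> {
    let root = cluster_root_candidate(uri)?;
    let marker = format!("{root}/__cluster/state.json");
    has_state(&marker).then_some(root)
}
```

In this sketch an ordinary `init` target such as `./graph.omni` never reaches the closure, which is the "cheap by construction" property.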
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3)
Two cluster-graph-aware CLI behaviors, sharing the cluster-resolution path.
Maintenance addressing. `optimize`/`repair`/`cleanup` gain
`--cluster <dir|s3://…> --cluster-graph <id>`, which resolves the graph's
storage URI from the served cluster snapshot (the same truth a `--cluster`
server boots from — `read_serving_snapshot*`) and opens it embedded. The
operator no longer hand-types `<storage>/graphs/<id>.omni`. A distinct flag is
required because the global `--graph` is declared `requires = server` and names a remote
multi-graph id. clap enforces both-or-neither and exclusion with the positional
URI / `--target`; an unserved graph errors loudly, pointing at `cluster apply`.
Init signpost. `init` refuses a cluster-managed positional path (the
`<root>/graphs/<id>.omni` layout where `<root>` holds `__cluster/state.json`,
detected by `cluster_root_for_graph_uri`) and points at `cluster apply` — graphs
in an established cluster are created with ledger/recovery/approvals, not by
hand. The check is gated on the path shape, so ordinary `init` does no extra I/O
and existing pre-apply cluster-graph inits are unaffected.
The planes-guard remediation message now also mentions `--cluster … --cluster-graph …`
(the two Slice-1 guard-string tests track it). Docs updated (cli-reference
Command planes, maintenance.md, cluster.md §7); the stale "no S3-hosted cluster
directories" limitation is dropped (RFC-006 landed it).
Tests (cli_cluster.rs, reusing the apply-a-cluster fixture): resolve by id,
unknown-id error, `--cluster` requires `--cluster-graph`, init refusal +
signpost, and ordinary init still works.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(cli): resolve cluster graphs from the state ledger, not the serving snapshot
Addresses the Greptile review on #221. `read_serving_snapshot*` does
all-or-nothing serving validation — recovery-sidecar checks plus a digest
verify of every catalog payload (query .gq, policy blobs). Using it to resolve
a maintenance target coupled `optimize`/`repair`/`cleanup` to the readiness of
unrelated resources: a single corrupt policy blob, or a pending recovery sweep,
would block the command before it could touch the graph — worst for `repair`,
the tool you reach for *when the cluster is degraded*.
Add `omnigraph_cluster::resolve_graph_storage_uri(cluster, graph_id)`: read the
state ledger, confirm the graph is in the applied revision, return
`graph_root(id)` — the URI is deterministically derivable, no catalog
validation. The CLI's cluster resolver now calls it.
Test: `optimize --cluster … --cluster-graph …` still resolves after the catalog
payloads (`__cluster/resources/`) are removed — the ledger-only path is not
blocked by degraded/unrelated catalog state.
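
A simplified sketch of the ledger-only resolution, assuming the applied revision can be reduced to a set of graph ids. `AppliedRevision`, `ResolveError`, and the function name are hypothetical; the real `resolve_graph_storage_uri` signature takes a cluster handle and is not reproduced here:

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for the applied revision in the state ledger.
struct AppliedRevision {
    graph_ids: BTreeSet<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ResolveError {
    NotInAppliedRevision(String),
}

/// Membership check against the ledger, then a deterministic path join.
/// No catalog payload is read or digest-verified on this path.
fn resolve_graph_storage_uri_sketch(
    cluster_root: &str,
    applied: &AppliedRevision,
    graph_id: &str,
) -> Result<String, ResolveError> {
    if !applied.graph_ids.contains(graph_id) {
        return Err(ResolveError::NotInAppliedRevision(graph_id.to_string()));
    }
    Ok(format!(
        "{}/graphs/{}.omni",
        cluster_root.trim_end_matches('/'),
        graph_id
    ))
}
```

Because nothing here touches `__cluster/resources/`, a corrupt policy blob cannot block the lookup in this model.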
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* test(engine): pin the long-lived-handle heal contract for sidecar-covered drift
A Phase B -> Phase C failure (commit_staged advanced Lance HEAD, manifest
publish did not land, recovery sidecar persists) currently wedges every
subsequent staged write on the same engine handle: the commit-time drift
guard rejects with 'run omnigraph repair', but repair itself refuses
while a recovery sidecar is pending, so a long-lived server can only
recover by restart. The documented contract (writes.md 'Long-running
servers', invariants.md invariant 5) says refresh-time roll-forward
closes this residual without restart -- but no write path runs it.
Two red tests pin the intended contract at the write entry points:
a follow-up load (the POST /ingest shape: shared handle, no reopen)
and a follow-up mutation must heal roll-forward-eligible sidecars
in-process and then succeed.
Currently failing with:
    table 'node:Company' has Lance HEAD version 2 ahead of manifest
    version 1; run `omnigraph repair` before writing
The fix lands in the next commit.
* fix(engine): heal pending recovery sidecars at the staged-write entry points
Close the long-lived-process gap in the recovery protocol: a Phase B ->
Phase C residual (per-table commit_staged landed, manifest publish did
not, sidecar persists) previously recovered only at the next ReadWrite
open or via an explicit refresh() that no production write path called,
so a long-lived server wedged every subsequent write on the commit-time
drift guard until restart.
New recovery::heal_pending_sidecars_roll_forward:
- one list_dir of __recovery/ at write entry (empty -> immediate
return, the steady state), so the per-write cost is one storage list;
- per sidecar, acquires the same per-(table_key, table_branch) write
queues every sidecar writer holds from before write_sidecar until
after delete_sidecar, then re-checks sidecar existence -- this
serializes the heal against live writers instead of rolling an
in-flight sidecar forward from under its writer (which would fail
that writer's publish CAS spuriously). The lock order, queues ->
coordinator, matches every writer's commit->publish path. This is the
queue-acquisition design recovery.rs and write_queue.rs already
documented for in-process recovery;
- processes in RollForwardOnly mode: the common residual rolls forward
in-process; rollback-eligible sidecars still defer to the next
ReadWrite open (Dataset::restore is unsafe under concurrency).
Wire it into load_as and mutate_as (before the inline delete path can
advance any HEAD), and rebase Omnigraph::refresh onto the same helper
so refresh stops racing live writers' sidecars.
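
The list, acquire, re-check sequence can be illustrated with an in-memory model. This is a sketch under assumptions, not the engine code: `EngineSketch` and its fields are hypothetical, std mutexes stand in for the write queues, and the roll-forward itself is elided:

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// (table_key, table_branch): the per-table write-queue key named above.
type QueueKey = (String, String);

/// Hypothetical in-memory stand-in for the state the heal touches.
#[derive(Default)]
struct EngineSketch {
    queues: Mutex<BTreeMap<QueueKey, Arc<Mutex<()>>>>,
    /// sidecar id -> queue keys it pins; stands in for the `__recovery/` listing.
    sidecars: Mutex<BTreeMap<String, Vec<QueueKey>>>,
}

impl EngineSketch {
    fn queue(&self, key: &QueueKey) -> Arc<Mutex<()>> {
        let mut queues = self.queues.lock().unwrap();
        queues.entry(key.clone()).or_default().clone()
    }

    /// Returns how many sidecars this pass consumed.
    fn heal_pending_sidecars(&self) -> usize {
        // One listing per write entry; an empty listing is the steady state.
        let pending: Vec<(String, Vec<QueueKey>)> = self
            .sidecars
            .lock()
            .unwrap()
            .iter()
            .map(|(id, keys)| (id.clone(), keys.clone()))
            .collect();
        let mut healed = 0;
        for (id, mut keys) in pending {
            // Sorted, de-duplicated acquisition keeps the lock order stable.
            keys.sort();
            keys.dedup();
            let queues: Vec<Arc<Mutex<()>>> = keys.iter().map(|k| self.queue(k)).collect();
            let _guards: Vec<_> = queues.iter().map(|q| q.lock().unwrap()).collect();

            // Re-check under the guards: a live writer that finished while we
            // waited has already deleted its sidecar, so there is nothing to heal.
            let mut sidecars = self.sidecars.lock().unwrap();
            if sidecars.remove(&id).is_some() {
                // The manifest roll-forward would run here (elided in this sketch).
                healed += 1;
            }
        }
        healed
    }
}
```

The point of the model is the ordering: the sidecar is only acted on after every queue its writer would hold has been acquired and the sidecar is confirmed to still exist.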
The maintenance entry points (apply_schema_as, branch_merge_as,
ensure_indices) intentionally keep their strict fail-loud preconditions
for now; wiring the same heal there is a follow-up with its own tests.
Turns the previous commit's two red tests green.
* fix(engine): name the right recovery path in the commit-time drift guard
The drift guard's 'run omnigraph repair before writing' advice is a
dead end when the drift is covered by a pending recovery sidecar:
repair refuses while a sidecar is pending. With the write-entry heal in
place, reaching this guard with sidecar-covered drift means the heal
deferred it (rollback-eligible), and the actual recovery path is a
read-write reopen. Distinguish the two classes on the error path only
(one sidecar list, after the conflict is already certain); a listing
failure falls back to the uncovered-drift wording rather than masking
the conflict.
Pinned by extending refresh_defers_rollback_eligible_sidecar_to_next_open
with a write attempt against the deferred sidecar.
* docs: write-entry in-process sidecar heal — contract and coverage
Update the recovery contract docs to match the previous two commits:
invariant 5 now states that the staged-write entry points and refresh
run in-process roll-forward recovery (long-lived processes converge on
the next write, not at restart); writes.md 'Long-running servers'
describes the heal's queue-acquisition concurrency contract, the
improved drift-guard error, and the entry points that intentionally do
not heal yet; testing.md indexes the new failpoint tests; AGENTS.md
capability matrix drops the claim that in-process recovery is entirely
future work (only the rollback path remains with the background
reconciler).
* test(engine): pin the entry heal contract for schema apply and branch merge
Without the write-entry heal, the two maintenance writers do worse than
wedge on sidecar-covered drift -- they proceed and decide its fate
implicitly:
- schema apply re-plans table rewrites from the manifest pin, orphaning
the drifted Phase-B commit (its rows silently vanish from the
rewritten table) while the stale sidecar lingers to misclassify
against the post-apply pins;
- branch merge publishes over the drift, making the failed writer's
commit visible as an unattributed side effect (no recovery audit
row), and leaves the stale sidecar behind.
Two red tests pin the intended contract: both entry points heal the
sidecar first (attributed roll-forward), then run on the converged
state. Currently failing on the stale-sidecar / dropped-rows
assertions; the fix lands in the next commit.
* fix(engine): heal pending recovery sidecars at the schema-apply and branch-merge entries
Extend the write-entry heal to the remaining two write entry points.
Unlike load/mutate (which wedge on the drift guard), these proceeded
over sidecar-covered drift and decided its fate implicitly:
- schema apply re-planned table rewrites from the manifest pin,
orphaning the drifted Phase-B commit -- its rows silently vanished
from the rewritten table -- while the stale sidecar lingered to
misclassify against the post-apply pins;
- branch merge published over the drift, making the failed writer's
commit visible without a recovery audit row, and left the stale
sidecar behind.
Both now run the same queue-serialized roll-forward heal at entry,
before their own sidecar exists, so recovery is attributed (audit row)
and deterministic. ensure_indices stays heal-free: it runs inside the
load / schema-apply flows after their entry heal.
Turns the previous commit's two red tests green. Docs updated in the
same change (invariant 5, writes.md, testing.md, AGENTS.md).
* test(engine): pin Phase A sidecar-write failure semantics
Storage fault-injection matrix, row 1: a sidecar PUT failure (S3
PutObject / fs write) in Phase A. New failpoint recovery.sidecar_write
at the top of write_sidecar -- the single choke point all five sidecar
writers go through -- models the storage error backend-generically.
Also adds the other three storage-fault failpoints used by the
following commits (recovery.sidecar_delete, recovery.sidecar_list,
recovery.record_audit); each is a no-op without the failpoints feature.
Pinned contract: every writer writes its sidecar BEFORE its first
HEAD-advancing commit, so a put failure aborts with zero drift (no
sidecar, Lance HEAD == manifest pin, no rows) and a transient fault
never wedges the graph -- the same handle writes/merges normally once
it clears. Covered for load (the staging writer) and branch_merge (the
multi-table writer, forced onto the RewriteMerged path by diverging
both sides).
* test(engine): pin Phase D delete, list, and audit-append storage-fault semantics
Storage fault-injection matrix, rows 2/3/5, plus the real-backend run:
- recovery.sidecar_delete: a Phase D delete failure (S3 DeleteObject)
must NOT fail the user's write -- the manifest publish already
landed, so the caller's data is durable. The swallowed failure
leaves a stale sidecar; the next write's entry heal consumes it via
the stale-sidecar audit-recovery path (RolledForward, attributed).
- recovery.sidecar_list: a __recovery/ list failure (S3 ListObjectsV2)
is loud at every consumer -- the write-entry heal fails the write
and the open-time sweep fails the open. Silently skipping recovery
over a pending sidecar would be consumer tolerance of drift. Once
the fault clears, open recovers the pending sidecar normally.
- recovery.record_audit: an audit write failure after the
roll-forward's manifest publish aborts that recovery attempt and
keeps the sidecar; re-entry detects the already-published manifest,
records exactly ONE RolledForward audit row, and converges -- the
retry tolerance documented on record_audit, exercised end-to-end.
- s3_load_recovers_after_publisher_failure_without_reopen: the
same-handle heal scenario on a real bucket (gated on
OMNIGRAPH_S3_TEST_BUCKET, skips locally), exercising sidecar
put/list/delete through S3StorageAdapter instead of the local-FS
adapter. CI wiring lands in a follow-up commit.
* test(engine): refuse corrupt recovery sidecars loudly
Storage fault-injection matrix, row 4 (no failpoint needed -- the
corrupt file is written by hand, sibling to the unknown-schema-version
refusal test): a truncated/garbage __recovery/{ulid}.json must be
refused loudly by both the write-entry heal (the write fails naming
the parse error) and the open-time sweep (ReadWrite open fails naming
the file), with the file left on disk for operator inspection.
Read-only opens still work -- the sweep is skipped there.
* test(engine): run the S3 sidecar-lifecycle coverage in CI + document the fault matrix
- ci.yml rustfs_integration: new step running the bucket-gated
failpoints tests (name filter s3_) against the RustFS container, so
sidecar put/list/delete are exercised through S3StorageAdapter on
every storage-affecting PR.
- writes.md: sidecar I/O failure semantics -- Phase A put failure
aborts with zero drift; Phase D delete failure is swallowed (write
already durable) and healed by the next write; list failures are
loud at heal and open; corrupt sidecars are refused with the file
kept for inspection; audit-append failures are retried to exactly
one audit row.
- testing.md: index the storage-fault matrix in the failpoints.rs row
and the new RustFS CI line.
* test(engine): pin read-visibility of acknowledged local if-absent writes
The cluster lib test import_missing_state_creates_state_with_graph_observation
flakes at ~50% under full-workspace load ('EOF while
parsing a value' reading back the state.json its own import just
acknowledged). Root cause is in the engine's local storage adapter:
write_text_if_absent writes through a buffered tokio::fs::File and
returns when write_all resolves -- which, per tokio's documented File
semantics, means the bytes reached tokio's internal buffer, not the
file. The actual write completes in a background blocking task after
drop, so a caller that acknowledges success and reads the object back
can see an empty or partial file. Under load the window widens; the
red run fails at iteration 0 with 0 of 8192 bytes on disk.
The regression test pins the contract at the adapter boundary: when
write_text_if_absent resolves, the full contents are visible to any
reader; a losing second claim leaves the winner's object untouched.
The fix lands in the next commit.
* fix(engine): publish local storage writes with atomic visibility
Close the class, not the instance. The local adapter admitted three
ways for a reader to observe a write that was acknowledged or visible
before its bytes were complete:
1. write_text_if_absent acknowledged success when the buffered
tokio::fs::File write_all resolved -- i.e. when the bytes reached
tokio's internal buffer, not the file. A caller reading back its own
acknowledged write could see an empty object (the ~50% cluster
import flake under full-workspace load; the regression test failed
at iteration 0 with 0 of 8192 bytes visible).
2. The same call published its CLAIM (create_new) before its CONTENT,
so concurrent readers saw an empty claimed file in the window.
3. write_text (plain tokio::fs::write) exposed truncated content
mid-replace -- silently falsifying write_sidecar's 'readers either
see the complete sidecar or none' contract on local FS (true on S3,
where PutObject is atomic).
A flush in write_text_if_absent would have fixed only (1). Instead,
both local write paths now publish complete temp files atomically:
rename for replace (write_text -- the idiom write_text_if_match
already used) and hard_link for no-replace (write_text_if_absent --
link fails AlreadyExists, so exactly one of N concurrent claimants
wins and the winner's object is fully readable at the instant it
becomes visible). The local adapter now honors the same object-level
atomic-visibility contract as the S3 adapter, which is what every
caller (recovery sidecar protocol, cluster state CAS) was written
against. Crash-orphaned *.tmp.* files are inert: the sidecar sweep
filters to .json, and cluster state reads address state.json by name.
fsync/durability policy is unchanged (no fsync before, none now);
this fix is about visibility ordering, not power-loss durability.
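
A minimal std-only sketch of the two publish idioms, assuming synchronous `std::fs` rather than the adapter's async code. The function names and the `.tmp.<tag>` naming scheme are hypothetical; like the fix, it performs no fsync:

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical temp-name scheme: `<name>.tmp.<tag>` next to the target.
fn temp_path_for(target: &Path, tag: &str) -> PathBuf {
    let mut name = target
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(format!(".tmp.{tag}"));
    target.with_file_name(name)
}

/// Write the complete contents to a temp file before anything is published.
/// std::fs::File is unbuffered, so write_all hands the bytes to the OS.
fn write_complete_temp(target: &Path, tag: &str, contents: &str) -> io::Result<PathBuf> {
    let tmp = temp_path_for(target, tag);
    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents.as_bytes())?;
    Ok(tmp)
}

/// Replace: rename swaps the complete temp file over the target.
fn publish_replace(target: &Path, tag: &str, contents: &str) -> io::Result<()> {
    let tmp = write_complete_temp(target, tag, contents)?;
    fs::rename(&tmp, target)
}

/// No-replace: hard_link fails with AlreadyExists when the target is present,
/// so one claimant wins and its object is complete when it becomes visible.
fn publish_if_absent(target: &Path, tag: &str, contents: &str) -> io::Result<bool> {
    let tmp = write_complete_temp(target, tag, contents)?;
    let linked = fs::hard_link(&tmp, target);
    // The temp name is no longer needed whether or not the link won.
    let _ = fs::remove_file(&tmp);
    match linked {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

Each caller should pass a distinct `tag`; concurrent claimants sharing a temp name would reintroduce the race this idiom avoids.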
Pre-existing on main (landed with the multi-graph server mode change,
PR #119); surfaced by this branch's heal work only because one extra
list_dir per write shifted test timing. Cluster lib suite: 12/25
failures before, 0/25 after. Turns the previous commit's red test
green.
* refactor(engine): one storage implementation over object_store for every backend
Collapse LocalStorageAdapter (hand-rolled tokio::fs) and
S3StorageAdapter into a single ObjectStorageAdapter backed by
Arc<dyn object_store::ObjectStore> -- LocalFileSystem for local URIs,
the existing AmazonS3 build for s3://, plus a pub in_memory()
constructor (full contract including TRUE conditional updates; the
in-memory test backend testing.md asked for at the adapter level).
Why: the acknowledged-before-visible bug showed the two-impl shape has
no referee -- one prose contract, two independent answers. Upstream
LocalFileSystem::put_opts is byte-for-byte the staged-temp+rename/
hard_link idiom that fix converged on, and Lance's own commit protocol
is built on the same primitives (put-if-not-exists / rename-if-not-
exists), so the substrate-aligned move is to stop hand-rolling it.
The per-backend residue shrinks to a UriCodec (URI <-> object path)
and one capability flag.
Semantics preserved by construction, with three deliberate deltas:
- exists() is now object-store-semantics everywhere (head + non-empty
prefix fallback): an EMPTY local directory no longer 'exists'. The
only dir-shaped caller (_graph_commits.lance probes) self-heals via
ensure_commit_graph_initialized where it previously wedged loudly.
- A directory at an object path reads as NotFound, not as an IO error
('only objects exist'). The cluster unreadable-payload test used a
same-named directory as a portable non-NotFound trigger; it now uses
chmod 000, which still models genuine transient IO.
- write_text_if_match keeps content-token semantics on local
(PutMode::Update is NotImplemented upstream for LocalFileSystem in
0.12.5 and 0.13.2); the capability flag gates the token SOURCE in
read_text_versioned too -- an ETag token with content-compare writes
would lose every CAS.
delete_prefix keeps a local remove_dir_all branch: directories are a
local-FS concept, and list+delete would leave empty skeletons that
cluster graph_root_exists (raw Path::exists) reports as still present.
LocalStorageAdapter remains as a delegating shim so the pinned
contract tests gate this swap textually unchanged; the shim and the
test parameterization over local + in-memory land next. Cargo gains
the explicit 'fs' feature (already transitively enabled by lance).
* test(engine): one executable storage contract, run against every backend
Remove the LocalStorageAdapter delegation shim and migrate its
construction sites to ObjectStorageAdapter::local(). Replace the
per-backend duplicated tests with a single contract_suite asserting
the trait's promises (atomic replace, exists incl. the dataset-root
prefix probe, one-winner if_absent, versioned CAS with loud CAS-lost,
rename, list round-trip with no sibling-prefix bleed, idempotent
delete/delete_prefix), run against the local backend and the new
in-memory backend -- which implements true conditional updates, so the
strong-CAS path is exercised without a bucket. The bucket-gated S3
variant already exists (s3_adapter_conditional_writes_contract).
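
To illustrate the "one suite, every backend" shape: a sketch with a hypothetical synchronous `TextStore` trait and `MemStore` backend (the real adapter trait is async and is not reproduced here). Only a few of the listed promises are asserted:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, synchronous slice of a storage trait for illustration.
trait TextStore {
    fn write_text(&self, key: &str, contents: &str);
    fn write_text_if_absent(&self, key: &str, contents: &str) -> bool;
    fn read_text(&self, key: &str) -> Option<String>;
    fn delete(&self, key: &str);
}

/// Hypothetical in-memory backend.
#[derive(Default)]
struct MemStore {
    objects: Mutex<HashMap<String, String>>,
}

impl TextStore for MemStore {
    fn write_text(&self, key: &str, contents: &str) {
        self.objects.lock().unwrap().insert(key.to_string(), contents.to_string());
    }
    fn write_text_if_absent(&self, key: &str, contents: &str) -> bool {
        let mut objects = self.objects.lock().unwrap();
        if objects.contains_key(key) {
            return false;
        }
        objects.insert(key.to_string(), contents.to_string());
        true
    }
    fn read_text(&self, key: &str) -> Option<String> {
        self.objects.lock().unwrap().get(key).cloned()
    }
    fn delete(&self, key: &str) {
        self.objects.lock().unwrap().remove(key);
    }
}

/// The contract is written once against the trait; each backend only
/// supplies a constructor and is handed to this function.
fn contract_suite(store: &dyn TextStore) {
    // One-winner if_absent: the loser leaves the winner's object untouched.
    assert!(store.write_text_if_absent("a", "winner"));
    assert!(!store.write_text_if_absent("a", "loser"));
    assert_eq!(store.read_text("a").as_deref(), Some("winner"));
    // Replace makes the new contents fully visible.
    store.write_text("a", "replaced");
    assert_eq!(store.read_text("a").as_deref(), Some("replaced"));
    // Delete is idempotent.
    store.delete("a");
    store.delete("a");
    assert_eq!(store.read_text("a"), None);
}
```

Running `contract_suite(&MemStore::default())` exercises the in-memory backend; a second backend would be passed to the same function unchanged.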
New local-specific pins for the deliberate semantic edges of the
collapse: empty directories are not objects (exists=false; the Lance
dataset-root probe shape is the non-empty case), file://-anchored and
spaces-in-path list output round-trips byte-identically into
read_text, dot-segment paths are lexically absolutized (the CLI's
./graph.omni shape), and upstream rename creates missing destination
parents. The acknowledged-write visibility regression test stays, now
documenting that the cross-API std::fs read-back is the point.
* refactor(cluster): drop put_json's per-backend atomicity branch
The local temp+rename dance predates the storage adapter guaranteeing
atomic visibility; now that write_text publishes via a staged temp +
rename on the filesystem (and a single atomic PUT on object stores) by
contract, the branch duplicated upstream behavior. One call, both
backends.
* docs: storage adapter collapse — contract, in-memory backend, local CAS gap
- testing.md: the 'no MemStorage backend' note is half-closed —
ObjectStorageAdapter::in_memory() covers the text-object layer with
the full contract (true conditional updates); Lance datasets bypass
the adapter, so the engine substrate ask stays open.
- invariants.md: truth-matrix Tests row updated; new Known Gap for
local write_text_if_match (upstream PutMode::Update is unimplemented
for LocalFileSystem; content-token emulation is safe only under the
cluster lock protocol — close before admitting a lock-free caller).
- writes.md: backend notes for the unified adapter (name#N staging
residue invisible to the sweep, backend-wrapped error text with
exists()-probing for missing-vs-error, loud permission failures).
* docs: finish renaming the storage adapters in user docs and test comments
storage.md's URI-scheme table and the S3 failpoint test's doc comment
still named the deleted LocalStorageAdapter/S3StorageAdapter; both now
describe the unified ObjectStorageAdapter over object_store, including
the relative-path absolutization note for local URIs.
* test(engine): pin branch-awareness of the drift guard's recovery advice
A pending sidecar on ANOTHER branch does not cover this branch's
drift: with a deferred feature-branch sidecar on disk and genuinely
uncovered drift on main, the main write's error must still point at
omnigraph repair -- a read-write reopen recovers the sidecar but
cannot repair main's uncovered drift. Currently red: the guard
matches sidecar pins by table_key only, so the feature sidecar flips
main's advice to the reopen path. Fix in the next commit.
Surfaced by external review of the drift-guard change.
* fix(engine): branch-aware sidecar matching in the drift guard's advice
The commit-time drift guard's sidecar-covered check matched pins by
table_key alone, so a pending sidecar on another branch flipped this
branch's uncovered-drift advice from 'run omnigraph repair' to the
reopen path -- and a reopen recovers that sidecar but cannot repair
this branch's drift. Compare the pin's table_branch too. Turns the
previous commit's red test green.
Surfaced by external review of the drift-guard change.
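
The corrected predicate amounts to matching on both fields. A sketch with a hypothetical `SidecarPin` struct (the real pin type carries more than these two fields):

```rust
/// Hypothetical shape of one pin recorded in a recovery sidecar.
struct SidecarPin {
    table_key: String,
    table_branch: String,
}

/// Drift on (table_key, branch) is covered only by a pin that matches both.
/// Matching table_key alone lets another branch's sidecar claim this drift.
fn drift_is_covered(pins: &[SidecarPin], table_key: &str, branch: &str) -> bool {
    pins.iter()
        .any(|pin| pin.table_key == table_key && pin.table_branch == branch)
}
```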
* test(engine): pin heal non-interference with a live schema apply
The write-entry heal's schema-staging reconcile runs before any queue
acquisition, so a load on the same handle, overlapping a schema apply
parked between its staging write and manifest commit, promotes the
apply's staging files (new catalog live against the old manifest),
classifies the LIVE apply's sidecar, and publishes its registrations
out from under it. The resumed apply then collides with its own stolen
commit. Currently red with:
    Lance("Concurrent modification: table version 3 already exists for
    node:Tag")
The fix (per-sidecar reconcile under the sidecar's write-queue guards,
plus a serialization key the schema-apply writer and the heal both
acquire) lands in the next commit.
Surfaced by external review of the write-entry heal.
* fix(engine): serialize the heal's schema-staging reconcile with live schema applies
The write-entry heal ran recover_schema_state_files up front, before
acquiring any queue guards. Overlapping a live schema apply parked
between its staging write and manifest commit, the heal promoted the
apply's staging files (new catalog live against the old manifest),
classified the LIVE apply's sidecar, and published its registrations —
the resumed apply then collided with its own stolen commit.
Correct by construction:
- New schema-apply serialization queue key, acquired by the schema-
apply writer (alongside its per-table keys) from before write_sidecar
until after delete_sidecar. Per-table keys alone don't cover a
registration-only migration, which pins no existing tables but has a
sidecar and staging files on disk.
- The heal reconciles schema staging lazily, PER SchemaApply sidecar,
after acquiring that sidecar's guards (including the serialization
key) and re-confirming the sidecar exists — a sidecar that survives
the queue wait belongs to a dead writer, so the reconcile can no
longer race a live apply. Recomputing per sidecar also removes the
staleness of one up-front result across a multi-sidecar pass.
- Omnigraph::refresh drops its up-front reconcile-and-pass-through
(same race, and a pre-promoted result would make the heal's guarded
reconcile see clean staging and wrongly defer the sidecar): it now
reconciles standalone only when NO sidecar exists — which cannot
race a live apply, whose sidecar always precedes its staging files —
and otherwise defers entirely to the heal.
The open-time sweep keeps its precomputed reconcile: open has no
concurrent writers. Turns the previous commit's red test green.
Surfaced by external review of the write-entry heal.
Self-audit addendum folded in: refresh's no-sidecar gate had a TOCTOU
(a live apply could write its sidecar + staging between the empty
check and the reconcile) — the standalone reconcile now holds the
serialization key across the list-then-reconcile pair. The remaining
residual is cross-process only (in-process queues cannot serialize
against a writer in another process; the open-time sweep has the same
pre-existing exposure) and is now an explicit Known Gap in
invariants.md rather than an implicit one.
* test(engine): pin catalog reload after the heal recovers a schema apply
When the write-entry heal rolls a crashed apply's SchemaApply sidecar
forward on the same handle, disk and manifest move to the new schema
(staging promoted, registrations published) but the handle's in-memory
schema_source/catalog do not. Subsequent writes then validate against
the stale catalog and reject rows of types the graph already has.
Currently red with:
    record 1: unknown node type 'Tag'
refresh() reloads after its heal; the write entry points must too.
Fix in the next commit.
Surfaced by external review of the write-entry heal.
* fix(engine): reload the in-memory catalog after the heal recovers a schema apply
heal_pending_recovery_sidecars refreshed the coordinator and
invalidated the runtime cache after processing sidecars, but never
reloaded schema_source/catalog — so a write whose entry heal rolled a
crashed SchemaApply sidecar forward proceeded to validate against the
OLD schema while disk and manifest were already on the new one.
reload_schema_if_source_changed is the same post-heal step refresh()
already runs; it no-ops on the (overwhelmingly common) non-schema heal
because the on-disk source is unchanged. Turns the previous commit's
red test green.
Surfaced by external review of the write-entry heal.
* test(engine): pin that a deleted-branch sidecar cannot wedge the graph
A rollback-eligible sidecar pinned to a branch is deferred by every
roll-forward-only pass; if the branch is then deleted, the sidecar
survives, referencing a branch with no manifest tree. The heal (every
write entry) and the open-time sweep (every ReadWrite open) both fail
opening the dead branch, and repair refuses while a sidecar is pending
-- a terminal read-only state with manual sidecar surgery as the only
exit. Currently red with:
    Lance("Not found: .../__manifest/tree/feature/_versions")
The branch's tree and forks are already reclaimed, so the pinned drift
is unreachable and the sidecar is provably moot; the fix classifies it
as an orphaned-branch terminal state (audit + discard) in both passes.
Surfaced by review (P1, verified by repro).
* fix(engine): classify deleted-branch sidecars as orphaned instead of wedging
A deferred (rollback-eligible) sidecar pinned to a branch survives
branch_delete; both the write-entry heal and the open-time sweep then
failed unconditionally opening the dead branch -- every write and
every ReadWrite open errored, and repair refuses while a sidecar
pends. Terminal state, manual sidecar surgery the only exit.
The branch's tree and per-table forks are already reclaimed at delete,
so the drift the sidecar pins is unreachable and the sidecar is
provably moot. Both passes now check the sidecar's branch against the
manifest's branch list (the authority -- deliberately NOT inferred
from a Not-found on open, which could be a transient storage error
masking real recovery intent) and discard orphans with an
OrphanedBranchDiscarded audit row, commit appended on main since the
sidecar's own branch no longer has a commit graph.
The open-time half is pre-existing; the write-entry heal made it hot.
Turns the previous commit's red test green.
Surfaced by review (P1, verified by repro).
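
A sketch of the classification rule, assuming the manifest's branch list is available as a slice of names. `SidecarFate` and `classify_sidecar_branch` are hypothetical names:

```rust
/// Hypothetical outcome of the branch check for one sidecar.
#[derive(Debug, PartialEq)]
enum SidecarFate {
    /// The branch is live: run the normal recovery classification.
    Process,
    /// The branch is gone from the manifest: audit and discard.
    DiscardOrphan,
}

/// The manifest's branch list is the authority. A Not-found while opening
/// the branch is deliberately not consulted, since it could be a transient
/// storage error on a branch that still exists.
fn classify_sidecar_branch(sidecar_branch: &str, manifest_branches: &[&str]) -> SidecarFate {
    if manifest_branches.contains(&sidecar_branch) {
        SidecarFate::Process
    } else {
        SidecarFate::DiscardOrphan
    }
}
```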
* chore: harden review nits — vacuous CI filter, root-runner skip, liveness note
- ci.yml: the RustFS sidecar-lifecycle step now fails loudly if the
's3_' name filter matches zero tests (cargo passes vacuously on an
empty filter; the step exists specifically to prove S3 sidecar I/O
coverage). The pre-existing CLI smoke step has the same shape and is
left for a follow-up.
- cluster unreadable-payload test: cfg(unix) + a skip-with-log when
running as root (mode 000 is still readable to root, common in
container dev runners), so the test degrades instead of failing.
- refresh: document the one-pass-late convergence for legacy staging
residue while non-SchemaApply sidecars pend, so nobody 'fixes' it by
re-running the reconcile unserialized — the exact race the
serialization key closes.
* test(engine): pin orphan-discard idempotency across a delete fault
discard_orphaned_branch_sidecar writes its audit row and main commit
before deleting the sidecar; a Phase D delete fault leaves the sidecar
on disk with the audit already durable, and the retry repeated the
whole path -- a second OrphanedBranchDiscarded audit row (and commit)
for the same operation. Currently red: 2 rows after one fault + retry.
The retry must only finish the delete. Fix next.
Also promotes the recovery-audit kinds reader into the shared test
helpers (it was recovery.rs-local).
Surfaced by external review of the orphan-discard fix.
* fix(engine): orphan-discard idempotency + heal reports acted-vs-deferred
Two review findings on the recovery surface:
- discard_orphaned_branch_sidecar now checks the audit table for an
existing (operation_id, OrphanedBranchDiscarded) row before appending
the commit + audit pair, so a Phase D delete fault retries ONLY the
delete instead of duplicating audit rows and commit-graph entries.
Cold path: the list scan runs only when an orphaned sidecar exists.
Turns the previous commit's red test green (exactly one audit row
across fault + retry).
- process_sidecar returns whether durable state changed; the heal sets
processed_any only for sidecars that were actually rolled forward /
rolled back / audit-recovered (orphan discards count). Deferred
sidecars (rollback-eligible, invariant-violating, unpromoted
SchemaApply) no longer trigger a per-write schema reload + full
runtime-cache invalidation while they pend -- the cache is
snapshot-keyed so this was waste, not corruption, but it was paid on
every write until reopen. Acted-paths' processed=true remains pinned
by load_after_schema_apply_phase_b_failure_uses_recovered_catalog
(the reload depends on it).
Surfaced by external review.
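
The idempotency key can be modelled in memory. A sketch under assumptions: `StoreSketch`, its fields, and the injected-fault flag are hypothetical, and the commit and audit appends are treated as infallible here:

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
struct AuditRow {
    operation_id: String,
    kind: &'static str,
}

/// Hypothetical in-memory stand-in for the audit table, commit graph,
/// and `__recovery/` directory.
#[derive(Default)]
struct StoreSketch {
    audit: Vec<AuditRow>,
    commits: Vec<String>,
    sidecars: BTreeSet<String>,
    /// Test hook: fail the next sidecar delete once.
    fail_next_delete: bool,
}

impl StoreSketch {
    fn discard_orphaned_sidecar(&mut self, operation_id: &str) -> Result<(), String> {
        // Key idempotency on the audit row: if it exists, a previous attempt
        // already recorded the discard and only the delete is outstanding.
        let already_audited = self
            .audit
            .iter()
            .any(|row| row.operation_id == operation_id && row.kind == "OrphanedBranchDiscarded");
        if !already_audited {
            self.commits.push(format!("recovery:{operation_id}"));
            self.audit.push(AuditRow {
                operation_id: operation_id.to_string(),
                kind: "OrphanedBranchDiscarded",
            });
        }
        if self.fail_next_delete {
            self.fail_next_delete = false;
            return Err("injected sidecar delete fault".to_string());
        }
        self.sidecars.remove(operation_id);
        Ok(())
    }
}
```

One fault followed by one retry leaves a single audit row and a single commit in this model, with the sidecar consumed on the retry.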
* test(engine): pin the orphan-discard audit-append fault leg as documented tolerance
The orphan discard's commit append and audit append are two writes; a
failure between them leaves a recovery commit with no audit row, and
the retry (keyed on the audit row, the operator-facing record) appends
a second commit before the audit lands. This is the same
not-atomic-pair-write tolerance record_audit documents and the
manifest->commit-graph Known Gap covers for every publish: bounded
commit-graph noise, audit row exactly-once under clean failures.
Keying idempotency on commit rows instead would need an operation_id
column on _graph_commits, and audit-before-commit would dangle the
graph_commit_id join -- both worse than the documented residual.
Make the tolerance explicit instead of implicit: docstring names the
window, a failpoint sits inside it, and the new test pins convergence
across the fault (sidecar consumed, exactly one audit row), completing
the orphan-discard fault matrix alongside the delete-fault leg.
Surfaced by external review of the orphan-discard idempotency.
* test(engine): pin honest drift-guard advice when sidecar listing fails
The guard's unwrap_or(false) conflated 'classified as uncovered' with
'could not classify': a transient list fault on the guard's second
list (the entry heal's first list having succeeded) confidently routed
the operator to omnigraph repair even when the heal had just deferred
a rollback-eligible sidecar -- and repair refuses while a sidecar is
pending. Currently red: the error says 'run omnigraph repair' with no
mention of the reopen path. The fix names both paths plus the failure
cause when classification is impossible.
Surfaced by external review of the drift-guard fallback.
* fix(engine): admit ambiguity in the drift guard when sidecar listing fails
Replace the unwrap_or(false) fallback with a tri-state: covered ->
reopen advice; uncovered -> repair advice; listing FAILED -> say the
drift could not be classified, name the cause, and give both paths in
order ('run repair, or reopen read-write if repair reports a pending
sidecar'). The old fallback confidently routed a transient list fault
to repair, which refuses while a sidecar is pending -- a self-
correcting but pointless detour. The conflict itself is still always
raised; only the advice degrades honestly. Turns the previous commit's
red test green.
Surfaced by external review of the drift-guard fallback.
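
A rough sketch of the tri-state, assuming a hypothetical `SidecarCoverage`
enum and `drift_advice` function. The advice strings are illustrative
paraphrases of this message, not the engine's exact error text.

```rust
// Hypothetical names; the real guard's types and wording may differ.
enum SidecarCoverage {
    Covered,
    Uncovered,
    ListingFailed(String), // cause of the list fault
}

fn drift_advice(coverage: &SidecarCoverage) -> String {
    match coverage {
        SidecarCoverage::Covered => "reopen read-write so recovery can finish".to_string(),
        SidecarCoverage::Uncovered => "run omnigraph repair".to_string(),
        // Classification was impossible: say so, name the cause, give both paths.
        SidecarCoverage::ListingFailed(cause) => format!(
            "drift could not be classified ({cause}): run repair, \
             or reopen read-write if repair reports a pending sidecar"
        ),
    }
}
```

Modeling the failure as its own variant keeps a list fault from collapsing
into `Uncovered`, which is what unwrap_or(false) did.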
Updates all six crate manifests and their path-dependency constraints,
Cargo.lock, the regenerated openapi.json version metadata, AGENTS.md's
surveyed version, and the v0.7.0 release notes (object-storage clusters,
config-free --cluster serving, the operator config surface, keyed
credentials, operator targeting/aliases, and the omnigraph.yaml
deprecation stages).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
s3_cluster.rs runs the full control-plane lifecycle against a real
bucket (CI: containerized RustFS; locally the RustFS binary): import →
lock released (pins the drop-time release regression caught on the first
live smoke) → apply (graph roots + catalog on the bucket, nothing local)
→ serving snapshots from both the config dir and the bare URI → schema
evolution → approved delete (prefix removal) → empty-cluster refusal.
The server suite gains the config-free boot test: --cluster s3://… with
zero local files serves a stored query over HTTP.
CI: the rustfs job runs both suites; the classify filter covers the
cluster store/serve modules and the new test files. The server smoke
drops its name filter — every test in the s3 target is bucket-gated, and
a filter matching nothing passes vacuously (which silently ran zero
tests for a while).
Docs: deployment.md gains the Bucket-no-volume shape as the preferred
cloud deployment; cluster.md/server.md document --cluster <uri>;
testing.md maps the new suite.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two serving changes that complete RFC-006's read side:
ServingPolicy carries the policy bundle CONTENT (digest-verified at
snapshot read) instead of a blob path — the catalog may live on object
storage, and the server must not re-read mutable state after the
snapshot. The server grows a PolicySource enum: File for omnigraph.yaml
deployments (unchanged), Inline for cluster boots, wired through
PolicyEngine::load_{graph,server}_from_source.
read_serving_snapshot_from_storage(uri) reads the applied revision
straight from a storage root, and --cluster accepts a scheme-qualified
URI (s3://bucket/prefix): config-free serving — a serving box needs only
the URI and credentials; the ledger and catalog on the bucket ARE the
deployment artifact. Bare paths keep the config-directory behavior.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Caught by the first live s3 smoke: StateLockGuard's spawned async delete
dies with the runtime when a short-lived CLI process exits right after the
command — import's lock survived into the next command as state_lock_held.
On the multi-thread runtime (the CLI, and the gated s3 tests)
block_in_place waits for the delete to complete; current-thread runtimes
keep the spawn fallback with force-unlock as the documented recovery, same
as a crash.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cluster.yaml gains an optional storage: URI deciding where everything the
cluster STORES lives: the state ledger, lock, content-addressed catalog,
recovery sidecars, approval artifacts, and the derived graph roots
(<storage>/graphs/<id>.omni). Absent, it defaults to the config directory
itself — the original layout, byte-compatible, so pre-existing clusters and
the whole test suite are untouched. Declared configuration always stays in
the working tree (Terraform's config-local/state-remote split); credentials
are env-only, never in cluster.yaml.
Every command resolves its store from the declared root (a bad root is a
loud invalid_storage_root). Graph-root derivation, the delete executor
(prefix delete via the adapter), the sweep's existence probes, the catalog
payload write/verify/read paths, and the serving snapshot all flow through
ClusterStore — the last raw-fs holdouts for stored state are gone, and the
deny-list gains the rule that keeps it that way.
Tests: default-layout byte-compat, a file:// root relocating the entire
cluster (ledger+catalog+graphs under the new root, nothing under the config
dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9
failpoints + full workspace gate green. The s3:// flavor lands with PR 3's
gated RustFS e2e.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
LocalStateBackend becomes ClusterStore: every stored byte — state ledger,
lock, recovery sidecars, approval artifacts — now flows through the
engine's StorageAdapter, making file:// and s3:// one code path. Behavior
on the file backend is byte-compatible (layout, CAS semantics, diagnostics,
lock release timing) and the entire pre-existing suite passes unchanged.
Mechanics: the ledger CAS keeps its public sha256 vocabulary while the
physical swap is token-conditioned (ETag If-Match on S3 via PR #186's
primitives; content-token + temp/rename locally — the pre-port semantics);
the lock is a create-only put (genuinely cross-machine on object stores)
with deterministic drop-release locally and best-effort spawned release on
S3; sidecars/approvals address by URI (SweepOutcome and the executors carry
strings); sweep row-1 retirement joins the uniform deferred post-CAS
cleanup. ClusterStore also gains the catalog-payload and graph-root
methods that commit 2 wires in.
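
For the local flavor of the lock, a create-only put with drop-time release
can be sketched with the standard library alone. `LockGuard` and `acquire`
are hypothetical names, not the crate's StateLockGuard, and the S3 path
(conditional put, spawned release) is not modeled.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical guard that owns the lock file for its lifetime.
struct LockGuard {
    path: PathBuf,
}

fn acquire(path: &Path) -> io::Result<LockGuard> {
    // create_new fails with AlreadyExists when the file is present,
    // which is the local analogue of a create-only put.
    OpenOptions::new().write(true).create_new(true).open(path)?;
    Ok(LockGuard { path: path.to_path_buf() })
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Deterministic release: the lock file goes away when the guard drops.
        let _ = fs::remove_file(&self.path);
    }
}
```

A second `acquire` on the same path fails while the first guard is alive and
succeeds again once it has been dropped.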
Async ripple: status/force-unlock/serving-snapshot and the server's
settings loader chain go async (CLI dispatch and ~20 test hosts follow,
mechanically). tokio joins the cluster crate's runtime deps for the lock
guard's handle.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verbatim move of the public output/diagnostic types and the internal
state/sidecar/approval models; previously-private types and their fields
get pub(crate) (they were crate-visible by position before). lib.rs is now
the command pipeline + public API. 95 tests green; full workspace gate
green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verbatim move of cluster.yaml parsing, query discovery, source digesting,
header/id validation, path resolution, and live-graph observation. Two
helpers that the cut swept along were relocated to their right homes
(state-status helpers back to lib.rs, lock-file helpers to store.rs). 95
tests green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verbatim move of the Serving* types, read_serving_snapshot, and
read_verified_payload; public re-exports preserved (the server's imports
are unchanged). 95 tests green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verbatim move of LocalStateBackend, StateSnapshot, StateLockGuard and their
impls — the single home for stored-state I/O (state ledger, lock, recovery
sidecars, approval artifacts), where the RFC-006 object-storage port lands
next as a focused diff. Visibility bumps (pub(crate)) only; 95 tests green
before and after.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verbatim move (indentation preserved — embedded raw-string fixtures are
content). lib.rs drops from 7,857 to ~4,750 lines; `use super::*` resolves
to the crate root through the #[path] module declaration unchanged. 95
tests green before and after.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
resolve_query_decls hands its file contents to the caller; the per-query
digest/typecheck pass reuses them instead of re-reading (a file with N
queries was read N+1 times), which also closes the window where a file
changing between enumeration and validation produced a confusing
query_key_mismatch for a just-discovered name. Explicit-map declarations
read as before.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cluster.yaml's graphs.<id>.queries previously accepted only an explicit
name->file map, forcing configs to re-enumerate every `query <name>` that
the .gq files already declare (the SPIKE cookbook needed 66 entries for 6
files). The files ARE the declaration now: `queries: queries/` discovers
every declaration in a directory's top-level *.gq (sorted), a list form
takes explicit files, and the map stays for fine-grained control.
Discovery is loud — unreadable/unparseable files and duplicate query names
fail validation (query_parse_error, duplicate_query_name). Downstream is
untouched: each discovered query is still an individually addressed
resource with the containing file's digest.
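
A toy sketch of directory discovery, under loud assumptions: it treats any
line whose first word is `query` as a declaration and takes the next token
(up to an optional `(`) as the name, which is far simpler than the real .gq
grammar. `discover_queries` and its in-memory `(file, contents)` input are
hypothetical.

```rust
use std::collections::BTreeMap;

// Maps each discovered query name to the file that declares it.
// Hypothetical line-based scan; the real parser is not modeled here.
fn discover_queries(files: &[(&str, &str)]) -> Result<BTreeMap<String, String>, String> {
    let mut sorted = files.to_vec();
    sorted.sort(); // deterministic order, by file name first
    let mut found = BTreeMap::new();
    for (file, contents) in sorted {
        for line in contents.lines() {
            let mut words = line.split_whitespace();
            if words.next() != Some("query") {
                continue;
            }
            let Some(raw) = words.next() else { continue };
            let name = raw.split('(').next().unwrap_or(raw).to_string();
            // Discovery is loud: a repeated name fails instead of overwriting.
            if let Some(prev) = found.insert(name.clone(), file.to_string()) {
                return Err(format!("duplicate_query_name: {name} in {prev} and {file}"));
            }
        }
    }
    Ok(found)
}
```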
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-005 §D2/§D4: read_serving_snapshot reads the applied revision as
everything a server needs to boot — graphs at derived roots, stored-query
sources read from the content-addressed catalog and re-hashed against the
recorded digests, policy blob paths with their applied applies_to bindings.
All-or-nothing: missing state, pending recovery sidecars, missing/tampered
blobs, pre-5A entries without bindings, and an empty graph set each refuse
the snapshot with a remedy; no partial serving. Lock-free by design — the
state file is replaced atomically, so the read is a consistent
point-in-time ledger.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Slice 5A of RFC-005: the state ledger becomes serving-sufficient for the
Phase-5 server boot. StateResource gains an optional applies_to (normalized
typed refs: cluster | graph.<id>), written by apply for every applied policy
create/update from the desired config's validated bindings.
The hole this closes: applies_to is not part of the policy file digest, so a
binding-only edit previously produced NO plan change at all (a 4C e2e even
asserted that — the gap, not a contract). Binding changes are now
first-class: a post-diff pass emits an Update with equal before/after
digests and a binding_change marker (visible in plan/apply JSON and human
output as [bindings]), classification/execution treat it as an ordinary
catalog-tier applied change (payload skips naturally — the blob is
unchanged), and convergence requires zero binding divergence, so stale
bindings can never report converged. Pre-5A ledger entries (no bindings
recorded) surface as the same backfill Update; one apply heals them, exactly
the remedy RFC-005's boot-error path names.
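
The post-diff pass might look roughly like this. `PolicyEntry`, `Update`,
and `binding_update` are hypothetical shapes that mirror the prose, with
`applies_to: None` standing in for a pre-5A ledger entry.

```rust
// Hypothetical shapes; field names follow the description, not the crate.
#[derive(Debug, PartialEq)]
struct Update {
    address: String,
    before_digest: String,
    after_digest: String,
    binding_change: bool,
}

struct PolicyEntry {
    digest: String,
    applies_to: Option<Vec<String>>, // None models a pre-5A entry
}

fn binding_update(address: &str, applied: &PolicyEntry, desired: &PolicyEntry) -> Option<Update> {
    // A digest change is already an ordinary Update; equal bindings need nothing.
    if applied.digest != desired.digest || applied.applies_to == desired.applies_to {
        return None;
    }
    Some(Update {
        address: address.to_string(),
        before_digest: applied.digest.clone(),
        after_digest: desired.digest.clone(),
        binding_change: true,
    })
}
```

Because a missing binding set compares unequal to any recorded one, the
backfill case falls out of the same comparison.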
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Crash before the removal: root intact, approval file unconsumed, sidecar
survives, no ack; the next run retires the stale intent (row 8) and the
still-approved delete completes in the same run.
- Crash after the removal, before the state CAS: root gone, ledger
byte-identical, the sidecar carries the approval id; the next run's sweep
rolls the tombstone forward, consumes the approval, audits the recovery,
and converges (row 7b).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Stage 4C execution half (RFC-004 §D5/§D6 + sweep rows 7/7b/8): an approved
graph.<id> delete — and its riding schema/query deletes — classifies Applied
and executes LAST in the run, sidecar-fenced: pre-op manifest pin (best
effort; partial roots still delete), approval_id carried in the sidecar,
recursive root removal (NotFound tolerated), subtree tombstoned out of the
ledger with a tombstone observation, the approval consumed in the same state
CAS (ledger summary) and its artifact file rewritten with consumed_at only
after the CAS lands — a failed run consumes nothing and the approval stays
valid for the retry.
Sweep rows: already-tombstoned intents retire (7); a completed delete with a
stale ledger rolls forward — tombstone + approval consumption + audit entry
(7b, idempotent); a still-present root retires the stale intent with a
graph_delete_incomplete warning and the still-approved delete re-executes in
the same run (8) — prefix removal is idempotent, so retry IS the repair.
The multi-graph mixed e2e gets its conclusion: blocked without approval,
cluster approve graph.engineering --as andrew, converge, tombstone visible
in status. Phase 4's disposition matrix is now fully executable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-004 §D4, gate half: graph deletes (and their subtree) now classify
Blocked/approval_required instead of Deferred; the new cluster approve
command (requires the global --as actor) writes
__cluster/approvals/{ulid}.json bound to the desired config digest and the
change's before/after digests, so config or state drift invalidates the
artifact automatically (approval_stale warning, never authorizes). One gate
per subtree: compute_approvals lists only the graph-level delete, and
ApprovalRequirement gains a satisfied flag surfaced by plan. Consumption and
the delete executor land next — until then approved deletes stay blocked so
a gate-only build can never strip state without removing the root.
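
One way to picture the digest binding, assuming a hypothetical
`DigestBinding` record stored in the approval artifact and recomputed at
plan time. The real artifact carries more fields than this.

```rust
// Hypothetical record: the three digests an approval is bound to.
#[derive(Debug, Clone, PartialEq)]
struct DigestBinding {
    config: String,
    before: String,
    after: String,
}

// An approval authorizes only while every bound digest still matches.
fn approval_authorizes(recorded: &DigestBinding, current: &DigestBinding) -> Result<(), &'static str> {
    if recorded == current {
        Ok(())
    } else {
        Err("approval_stale")
    }
}
```

Any config or state drift changes at least one digest, so the stale case
needs no separate invalidation step.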
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Crash before the engine call: sidecar (carrying the --as actor) survives,
live schema and ledger untouched, no ack; the next run's sweep retires the
stale intent and the same run applies and converges.
- Crash after the engine call, before the state CAS: the manifest moved with
the post-op pin in the sidecar, state.json byte-identical; the next run's
sweep rolls the ledger forward with a schema_apply audit entry and the run
converges.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Stage 4B (RFC-004 §D1/§D5): schema.<id> Update changes classify Applied and
execute after graph creates, sequentially and sidecar-fenced — read-write
open (the engine's own recovery runs first), pre-op manifest pin recorded,
apply_schema_as with allow_data_loss: false (soft drops only; hard drops wait
for 4C's approval artifacts), post-op pin rewritten into the sidecar, sidecar
retired only after the final state CAS. Queries gated on a same-plan schema
update unblock (the migration lands first in the same run); failures —
unsupported migrations, lock contention, user branches — surface as
schema_apply_failed with the engine's message, demote dependents via the
origin-aware demotion helper, and stop further graph-moving work.
Schema evolution is now fully cluster-driven (the defer -> manual schema
apply -> refresh loop is gone), and out-of-band schema drift is converged
back by apply as an ordinary soft migration (axiom 8: drift correction is
gated like any change; the recoverable tier needs no approval) — both pinned
by reworked e2es. The multi-graph mixed e2e's deferred row is now
delete-shaped, pre-staging the 4C surface.
Actor: cluster apply accepts the CLI's global --as via the new ApplyOptions /
apply_config_dir_with_options (apply_config_dir delegates unchanged); the
actor is echoed in ApplyOutput and recorded in sidecars and audit entries,
and threads to apply_schema_as so Cedar fires wherever a checker is
installed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RecoverySidecarKind::SchemaApply with digest-based sweep classification
(robust to unrelated manifest movement; version pins stay forensic):
ledger-consistent -> sidecar retired (RFC-004 rows 1+2); live digest matches
the intended schema, state stale -> roll forward with composite recompute and
a recovery_records audit entry (row 3); unverifiable or unexpected digests ->
pending, kept, graph-moving work blocked (rows 1-unopenable/6).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-004 §D7's data-aware preview: for every schema update, plan opens the
live graph read-only and embeds the engine's migration plan (supported flag
+ typed steps) in the change record; the human renderer prints the steps.
Preview failures (unreachable graph, planner error) degrade to the digest
diff with a schema_preview_unavailable warning — planning never blocks.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mechanical conversion ahead of Stage 4B (plan will preview schema migrations
against live graphs): signature, CLI dispatch, and test callers. Zero
behavior change.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Crash before the init (row 1): sidecar survives, nothing moved, no ack;
the next run's sweep removes the intent and the same run creates and
converges.
- Crash after the init, before the state CAS (row 4): the graph exists with
the post-init manifest pin in the sidecar, state.json byte-identical; the
next run's sweep rolls the ledger forward with a recovery_records audit
entry and the run converges.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Stage 4A (RFC-004 §D1/§D5): graph.<id> Create — and its paired schema Create,
which the init carries — classify Applied and execute first in the run,
sequentially and sidecar-fenced: sidecar written before Omnigraph::init at
the derived root, rewritten with the post-init manifest pin, deleted only
after the final state CAS lands. Dependent queries and policies no longer
block on a graph create in the same plan — creates run first, so they apply
in the same run; a create failure demotes them to blocked
(dependency_not_applied) and stops further graph-moving work (loud partials),
with the sidecar left for the sweep to classify. Graphs with a kept recovery
sidecar (rows 5/6) classify Blocked/cluster_recovery_pending, and the sweep's
Drifted/Error statuses are never clobbered by a generic Blocked.
Schema source is re-read and digest-verified under the lock before the init
(the write_resource_payload TOCTOU posture). Plan previews the same
dispositions. e2e fallout updated: a fresh multi-graph config now converges
in one apply; a destroyed root is re-created as an EMPTY graph by the next
apply (declarative convergence — visible in plan, called out in docs); the
new cluster_e2e_declared_graph_created_by_apply pins the no-manual-init flow.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC-004 §D2/§D3 for the graph_create kind. RecoverySidecar records intent
under __cluster/recoveries/{ulid}.json; the roll-forward-only sweep runs at
the start of apply/refresh/import under the state lock and classifies each
survivor by observation: root absent -> intent removed (row 1); outcome
already recorded -> retired (row 2); create completed but state stale ->
ledger rolled forward with a recovery_records audit entry (row 4); partial
root -> Error/graph_create_incomplete, kept, never auto-deleted (row 5);
unexpected schema -> Drifted/actual_applied_state_pending, kept (row 6).
Sweep mutations ride the command's existing CAS write; completed sidecars
are deleted only after that write lands. Read-only status/plan warn
(cluster_recovery_pending) without acting. The apply payload gate now counts
only payload-phase errors so kept-sidecar diagnostics don't abort the run
before their statuses persist.
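
The row table above, sketched as a classifier. `Observed`, `SweepAction`,
and the order in which the checks run are assumptions made for
illustration; only the row outcomes come from this message.

```rust
// Hypothetical outcome per sweep row for a graph_create sidecar.
#[derive(Debug, PartialEq)]
enum SweepAction {
    RemoveIntent, // row 1: root absent
    Retire,       // row 2: outcome already recorded
    RollForward,  // row 4: create completed, state stale
    KeepError,    // row 5: graph_create_incomplete, never auto-deleted
    KeepDrifted,  // row 6: actual_applied_state_pending
}

// Hypothetical observation of the derived root and the ledger.
#[derive(Clone, Copy)]
struct Observed {
    root_exists: bool,
    root_complete: bool,
    schema_matches_intent: bool,
    outcome_in_ledger: bool,
}

fn classify_graph_create(o: &Observed) -> SweepAction {
    if !o.root_exists {
        SweepAction::RemoveIntent
    } else if o.outcome_in_ledger {
        SweepAction::Retire
    } else if !o.root_complete {
        SweepAction::KeepError
    } else if !o.schema_matches_intent {
        SweepAction::KeepDrifted
    } else {
        SweepAction::RollForward
    }
}
```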
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mechanical conversion ahead of Stage 4A graph create (which calls the async
Omnigraph::init from inside apply): the fn signature, the CLI dispatch arm,
and every test caller (#[test] -> #[tokio::test]). Zero behavior change; all
60 lib tests and 3 failpoint tests green before and after.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ScopedFailPoint::with_callback gives cfg_callback the same Drop-based cleanup
as cfg actions; a panic while the point is active no longer leaks the callback
into the process-global registry where it would fire under later tests
(greptile review, PR #167).
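
The Drop-based cleanup can be illustrated without the fail crate. The
registry, `ScopedCallback`, and `is_registered` below are hypothetical
stand-ins, not the crate's ScopedFailPoint or the fail registry.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

type Callback = Box<dyn Fn() + Send>;

// Hypothetical process-global registry of named callbacks.
fn registry() -> &'static Mutex<HashMap<String, Callback>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, Callback>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

struct ScopedCallback {
    name: String,
}

impl ScopedCallback {
    fn with_callback(name: &str, f: impl Fn() + Send + 'static) -> Self {
        registry().lock().unwrap().insert(name.to_string(), Box::new(f));
        Self { name: name.to_string() }
    }
}

impl Drop for ScopedCallback {
    fn drop(&mut self) {
        // Drop also runs during unwinding, so a panic cannot leak the entry.
        if let Ok(mut map) = registry().lock() {
            map.remove(&self.name);
        }
    }
}

fn is_registered(name: &str) -> bool {
    registry().lock().unwrap().contains_key(name)
}
```

Wrapping a panicking body in `std::panic::catch_unwind` and checking
`is_registered` afterwards is enough to exercise the unwind path.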
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The apply-side coverage the implementation spec's hard gate requires before
Phase 4 graph-moving apply:
- crash after the payload phase: state.json byte-identical, blobs inert on
disk, lock released, no phantom statuses, nothing acknowledged; a plain
re-run repairs via skip-if-exists blob reuse.
- CAS race: a cfg_callback rewrites state.json at the exact read->write
window (the state.lock:false concurrent-writer scenario); apply surfaces
state_cas_mismatch, acknowledges nothing, reports the persisted status
snapshot, leaves the concurrent writer's state on disk; a re-run converges.
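
The CAS race can be pictured with an in-memory ledger. `Ledger` and `token`
are hypothetical, and DefaultHasher stands in for the sha256 content token
only to keep the sketch std-only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in content token; the real ledger vocabulary is sha256.
fn token(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

// Hypothetical in-memory ledger standing in for state.json.
struct Ledger {
    bytes: Vec<u8>,
}

impl Ledger {
    // Swap only if the ledger still holds the bytes the caller read.
    fn compare_and_swap(&mut self, expected: u64, next: Vec<u8>) -> Result<(), &'static str> {
        if token(&self.bytes) != expected {
            return Err("state_cas_mismatch");
        }
        self.bytes = next;
        Ok(())
    }
}
```

A write that lands between the read and the swap makes the swap fail and
leaves the concurrent writer's bytes in place; re-reading the token and
retrying then succeeds.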
CI's failpoints step now runs both the engine and cluster suites.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Optional failpoints feature (dep:fail + fail/failpoints, deliberately NOT
enabling omnigraph/failpoints), a maybe_fail/ScopedFailPoint module returning
Diagnostic-typed injected errors, and two call sites in apply_config_dir:
cluster_apply.after_payload_phase (the crash point: blobs on disk, state
untouched) and cluster_apply.before_state_write (routes through the
persisted-statuses revert contract; a cfg_callback here can mutate state.json
to make the CAS check fail organically). Feature off compiles to Ok(()) —
zero behavior change. Tests live in a separate integration binary because the
fail registry is process-global. Also refresh the crate description (stale
'read-only' since Stage 3A).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Closes the Stage 3A product gap where a deleted or corrupted blob under
__cluster/resources/ went unnoticed forever (status reported converged and
apply could not repair it because the digests matched).
verify_catalog_payloads checks every query/policy digest in state against its
content-addressed blob (existence + full sha256 re-hash; graph/schema/unknown
addresses have no payloads and are skipped). status reports findings read-only
(warnings catalog_payload_missing/_mismatch; error catalog_payload_read_error
— an unverifiable catalog must not report healthy). refresh closes the
self-heal loop: missing/mismatched blobs mark the resource drifted and remove
its digest from state so the next plan proposes a create and the next apply
republishes; unreadable blobs keep the digest (no spurious republish), mark
error, and exit non-zero. Verification runs before graph observation so the
recomputed graph composite already excludes removed query digests.
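
A compact sketch of the verify-then-refresh mapping. `Finding`,
`RefreshAction`, and both functions are hypothetical; the `hash` parameter
stands in for the full sha256 re-hash, and `read` models the blob read with
`Ok(None)` meaning the blob does not exist.

```rust
// Hypothetical findings; the comments echo the diagnostic codes above.
#[derive(Debug, PartialEq)]
enum Finding {
    Healthy,
    Missing,   // catalog_payload_missing
    Mismatch,  // catalog_payload_mismatch
    ReadError, // catalog_payload_read_error
}

#[derive(Debug, PartialEq)]
enum RefreshAction {
    Keep,
    MarkDriftedDropDigest, // next plan proposes a create
    MarkErrorKeepDigest,   // no spurious republish
}

fn verify_payload(
    recorded_digest: &str,
    read: Result<Option<Vec<u8>>, String>,
    hash: impl Fn(&[u8]) -> String,
) -> Finding {
    match read {
        Err(_) => Finding::ReadError,
        Ok(None) => Finding::Missing,
        Ok(Some(bytes)) if hash(bytes.as_slice()) == recorded_digest => Finding::Healthy,
        Ok(Some(_)) => Finding::Mismatch,
    }
}

fn refresh_action(finding: &Finding) -> RefreshAction {
    match finding {
        Finding::Healthy => RefreshAction::Keep,
        Finding::Missing | Finding::Mismatch => RefreshAction::MarkDriftedDropDigest,
        Finding::ReadError => RefreshAction::MarkErrorKeepDigest,
    }
}
```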
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two review findings (greptile, PR #165):
- ApplyOutput.resource_statuses on a failed state write now carries the
pre-apply on-disk snapshot instead of the in-memory mutations that were
never persisted, so automation reading the field independently of `ok`
cannot see phantom applied/blocked statuses. Regression test forces the
state write to fail via a read-only __cluster dir (unix-only, skips when
permissions are not enforced).
- Human-mode `cluster apply` prints the classified changes list on failure
too, so an operator debugging a partial apply without --json sees what was
attempted.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
apply_config_dir executes the query/policy subset of the plan: payloads are
written content-addressed under __cluster/resources/{query,policy}/... before
the state CAS (state is the publish point; orphaned blobs from a failed CAS
are inert and re-apply is the repair), then state.json is CAS-updated with
applied digests, Applied/Blocked statuses, and a revision bump. Graph/schema
changes are never executed here: schema content and graph lifecycle defer to
a later phase with loud warnings, while graph.<id> composite-digest updates
whose schema component is unchanged converge automatically via recomputation
from state's own components (without which apply could never converge).
Idempotent re-apply leaves state bytes and revision untouched.
PlanChange gains optional disposition/reason fields, populated by the same
classifier in cluster plan, so plan is an honest preview of what apply will
execute, derive, defer, or block.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>