omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-15 01:55:13 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	6144bb18d6	feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3) (#221) * feat(cluster): cluster_root_for_graph_uri detection helper (RFC-010 Slice 3) Public helper the CLI uses to refuse `init` into a cluster-managed location: given a graph storage URI of the cluster layout (`<root>/graphs/<id>.omni`), return the cluster root if `<root>` holds `__cluster/state.json`, else None. Cheap by construction — a URI that doesn't match the `<root>/graphs/<id>.omni` shape returns None with zero I/O, so ordinary `init` targets never probe storage. Works for file:// and s3:// via the storage adapter. Adds two ClusterStore accessors (`display_root`, `has_state`). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3) Two cluster-graph-aware CLI behaviors, sharing the cluster-resolution path. Maintenance addressing. `optimize`/`repair`/`cleanup` gain `--cluster <dir|s3://…> --cluster-graph <id>`, which resolves the graph's storage URI from the served cluster snapshot (the same truth a `--cluster` server boots from — `read_serving_snapshot`) and opens it embedded. The operator no longer hand-types `<storage>/graphs/<id>.omni`. A distinct flag is required because the global `--graph` is `requires = server` and means a remote multi-graph id. clap enforces both-or-neither and exclusion with the positional URI / `--target`; an unserved graph errors loudly, pointing at `cluster apply`. init signpost. `init` refuses a cluster-managed positional path (the `<root>/graphs/<id>.omni` layout where `<root>` holds `__cluster/state.json`, detected by `cluster_root_for_graph_uri`) and points at `cluster apply` — graphs in an established cluster are created with ledger/recovery/approvals, not by hand. The check is gated on the path shape, so ordinary `init` does no extra I/O and existing pre-apply cluster-graph inits are unaffected. 
planes guard remediation now also mentions `--cluster … --cluster-graph …` (the two Slice-1 guard-string tests track it). Docs updated (cli-reference Command planes, maintenance.md, cluster.md §7); the stale "no S3-hosted cluster directories" limitation is dropped (RFC-006 landed it). Tests (cli_cluster.rs, reusing the apply-a-cluster fixture): resolve by id, unknown-id error, `--cluster` requires `--cluster-graph`, init refusal + signpost, and ordinary init still works. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> fix(cli): resolve cluster graphs from the state ledger, not the serving snapshot Addresses the Greptile review on #221. `read_serving_snapshot` does all-or-nothing serving validation — recovery-sidecar checks plus a digest verify of every catalog payload (query .gq, policy blobs). Using it to resolve a maintenance target coupled `optimize`/`repair`/`cleanup` to the readiness of unrelated resources: a single corrupt policy blob, or a pending recovery sweep, would block the command before it could touch the graph — worst for `repair`, the tool you reach for when the cluster is degraded. Add `omnigraph_cluster::resolve_graph_storage_uri(cluster, graph_id)`: read the state ledger, confirm the graph is in the applied revision, return `graph_root(id)` — the URI is deterministically derivable, no catalog validation. The CLI's cluster resolver now calls it. Test: `optimize --cluster … --cluster-graph …` still resolves after the catalog payloads (`__cluster/resources/`) are removed — the ledger-only path is not blocked by degraded/unrelated catalog state. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 02:52:21 +03:00
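The shape-gated detection described in the #221 message above can be sketched as follows. This is a minimal illustration, not the repository's code: `split_cluster_graph_uri`, `cluster_root_for_graph_uri_sketch`, and the `has_state` callback are hypothetical names standing in for the real helper and the `ClusterStore::has_state` accessor, whose actual signatures are not shown in the log.

```rust
/// Hypothetical helper: split a URI of the cluster layout
/// `<root>/graphs/<id>.omni` into `(root, id)`. Pure string work, no I/O.
fn split_cluster_graph_uri(uri: &str) -> Option<(&str, &str)> {
    let trimmed = uri.trim_end_matches('/');
    // Last path segment must be `<id>.omni` with a non-empty id.
    let (parent, leaf) = trimmed.rsplit_once('/')?;
    let id = leaf.strip_suffix(".omni")?;
    if id.is_empty() {
        return None;
    }
    // The segment before it must be exactly `graphs`, under a non-empty root.
    let root = parent.strip_suffix("/graphs")?;
    if root.is_empty() || root.ends_with('/') {
        return None;
    }
    Some((root, id))
}

/// Sketch of the detection: storage is probed (via the assumed `has_state`
/// callback) only when the URI already has the cluster-layout shape.
fn cluster_root_for_graph_uri_sketch(
    uri: &str,
    has_state: impl Fn(&str) -> bool,
) -> Option<String> {
    let (root, _id) = split_cluster_graph_uri(uri)?; // no match: zero I/O
    if has_state(root) {
        Some(root.to_string())
    } else {
        None
    }
}
```

Under these assumptions an ordinary target such as `./graph.omni` returns `None` before the callback is ever invoked, which is the "cheap by construction" property the message describes.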
Ragnor Comerford	446b46d548	Recovery liveness, storage fault-injection matrix, and one storage implementation over object_store (#203) * test(engine): pin the long-lived-handle heal contract for sidecar-covered drift A Phase B -> Phase C failure (commit_staged advanced Lance HEAD, manifest publish did not land, recovery sidecar persists) currently wedges every subsequent staged write on the same engine handle: the commit-time drift guard rejects with 'run omnigraph repair', but repair itself refuses while a recovery sidecar is pending, so a long-lived server can only recover by restart. The documented contract (writes.md 'Long-running servers', invariants.md invariant 5) says refresh-time roll-forward closes this residual without restart -- but no write path runs it. Two red tests pin the intended contract at the write entry points: a follow-up load (the POST /ingest shape: shared handle, no reopen) and a follow-up mutation must heal roll-forward-eligible sidecars in-process and then succeed. Currently failing with: table 'node:Company' has Lance HEAD version 2 ahead of manifest version 1; run `omnigraph repair` before writing The fix lands in the next commit. * fix(engine): heal pending recovery sidecars at the staged-write entry points Close the long-lived-process gap in the recovery protocol: a Phase B -> Phase C residual (per-table commit_staged landed, manifest publish did not, sidecar persists) previously recovered only at the next ReadWrite open or via an explicit refresh() that no production write path called, so a long-lived server wedged every subsequent write on the commit-time drift guard until restart. 
New recovery::heal_pending_sidecars_roll_forward: - one list_dir of __recovery/ at write entry (empty -> immediate return, the steady state), so the per-write cost is one storage list; - per sidecar, acquires the same per-(table_key, table_branch) write queues every sidecar writer holds from before write_sidecar until after delete_sidecar, then re-checks sidecar existence -- this serializes the heal against live writers instead of rolling an in-flight sidecar forward from under its writer (which would fail that writer's publish CAS spuriously). Lock order queues -> coordinator matches every writer's commit->publish path. This is the queue-acquisition design recovery.rs and write_queue.rs already documented for in-process recovery; - processes in RollForwardOnly mode: the common residual rolls forward in-process; rollback-eligible sidecars still defer to the next ReadWrite open (Dataset::restore is unsafe under concurrency). Wire it into load_as and mutate_as (before the inline delete path can advance any HEAD), and rebase Omnigraph::refresh onto the same helper so refresh stops racing live writers' sidecars. The maintenance entry points (apply_schema_as, branch_merge_as, ensure_indices) intentionally keep their strict fail-loud preconditions for now; wiring the same heal there is a follow-up with its own tests. Turns the previous commit's two red tests green. * fix(engine): name the right recovery path in the commit-time drift guard The drift guard's 'run omnigraph repair before writing' advice is a dead end when the drift is covered by a pending recovery sidecar: repair refuses while a sidecar is pending. With the write-entry heal in place, reaching this guard with sidecar-covered drift means the heal deferred it (rollback-eligible), and the actual recovery path is a read-write reopen. 
Distinguish the two classes on the error path only (one sidecar list, after the conflict is already certain); a listing failure falls back to the uncovered-drift wording rather than masking the conflict. Pinned by extending refresh_defers_rollback_eligible_sidecar_to_next_open with a write attempt against the deferred sidecar. * docs: write-entry in-process sidecar heal — contract and coverage Update the recovery contract docs to match the previous two commits: invariant 5 now states that the staged-write entry points and refresh run in-process roll-forward recovery (long-lived processes converge on the next write, not at restart); writes.md 'Long-running servers' describes the heal's queue-acquisition concurrency contract, the improved drift-guard error, and the entry points that intentionally do not heal yet; testing.md indexes the new failpoint tests; AGENTS.md capability matrix drops the claim that in-process recovery is entirely future work (only the rollback path remains with the background reconciler). * test(engine): pin the entry heal contract for schema apply and branch merge Without the write-entry heal, the two maintenance writers do worse than wedge on sidecar-covered drift -- they proceed and decide its fate implicitly: - schema apply re-plans table rewrites from the manifest pin, orphaning the drifted Phase-B commit (its rows silently vanish from the rewritten table) while the stale sidecar lingers to misclassify against the post-apply pins; - branch merge publishes over the drift, making the failed writer's commit visible as an unattributed side effect (no recovery audit row), and leaves the stale sidecar behind. Two red tests pin the intended contract: both entry points heal the sidecar first (attributed roll-forward), then run on the converged state. Currently failing on the stale-sidecar / dropped-rows assertions; the fix lands in the next commit. 
* fix(engine): heal pending recovery sidecars at the schema-apply and branch-merge entries Extend the write-entry heal to the remaining two write entry points. Unlike load/mutate (which wedge on the drift guard), these proceeded over sidecar-covered drift and decided its fate implicitly: - schema apply re-planned table rewrites from the manifest pin, orphaning the drifted Phase-B commit -- its rows silently vanished from the rewritten table -- while the stale sidecar lingered to misclassify against the post-apply pins; - branch merge published over the drift, making the failed writer's commit visible without a recovery audit row, and left the stale sidecar behind. Both now run the same queue-serialized roll-forward heal at entry, before their own sidecar exists, so recovery is attributed (audit row) and deterministic. ensure_indices stays heal-free: it runs inside the load / schema-apply flows after their entry heal. Turns the previous commit's two red tests green. Docs updated in the same change (invariant 5, writes.md, testing.md, AGENTS.md). * test(engine): pin Phase A sidecar-write failure semantics Storage fault-injection matrix, row 1: a sidecar PUT failure (S3 PutObject / fs write) in Phase A. New failpoint recovery.sidecar_write at the top of write_sidecar -- the single choke point all five sidecar writers go through -- models the storage error backend-generically. Also adds the other three storage-fault failpoints used by the following commits (recovery.sidecar_delete, recovery.sidecar_list, recovery.record_audit); each is a no-op without the failpoints feature. Pinned contract: every writer writes its sidecar BEFORE its first HEAD-advancing commit, so a put failure aborts with zero drift (no sidecar, Lance HEAD == manifest pin, no rows) and a transient fault never wedges the graph -- the same handle writes/merges normally once it clears. 
Covered for load (the staging writer) and branch_merge (the multi-table writer, forced onto the RewriteMerged path by diverging both sides). * test(engine): pin Phase D delete, list, and audit-append storage-fault semantics Storage fault-injection matrix, rows 2/3/5, plus the real-backend run: - recovery.sidecar_delete: a Phase D delete failure (S3 DeleteObject) must NOT fail the user's write -- the manifest publish already landed, so the caller's data is durable. The swallowed failure leaves a stale sidecar; the next write's entry heal consumes it via the stale-sidecar audit-recovery path (RolledForward, attributed). - recovery.sidecar_list: a __recovery/ list failure (S3 ListObjectsV2) is loud at every consumer -- the write-entry heal fails the write and the open-time sweep fails the open. Silently skipping recovery over a pending sidecar would be consumer tolerance of drift. Once the fault clears, open recovers the pending sidecar normally. - recovery.record_audit: an audit write failure after the roll-forward's manifest publish aborts that recovery attempt and keeps the sidecar; re-entry detects the already-published manifest, records exactly ONE RolledForward audit row, and converges -- the retry tolerance documented on record_audit, exercised end-to-end. - s3_load_recovers_after_publisher_failure_without_reopen: the same-handle heal scenario on a real bucket (gated on OMNIGRAPH_S3_TEST_BUCKET, skips locally), exercising sidecar put/list/delete through S3StorageAdapter instead of the local-FS adapter. CI wiring lands in a follow-up commit. 
* test(engine): refuse corrupt recovery sidecars loudly Storage fault-injection matrix, row 4 (no failpoint needed -- the corrupt file is written by hand, sibling to the unknown-schema-version refusal test): a truncated/garbage __recovery/{ulid}.json must be refused loudly by both the write-entry heal (the write fails naming the parse error) and the open-time sweep (ReadWrite open fails naming the file), with the file left on disk for operator inspection. Read-only opens still work -- the sweep is skipped there. * test(engine): run the S3 sidecar-lifecycle coverage in CI + document the fault matrix - ci.yml rustfs_integration: new step running the bucket-gated failpoints tests (name filter s3_) against the RustFS container, so sidecar put/list/delete are exercised through S3StorageAdapter on every storage-affecting PR. - writes.md: sidecar I/O failure semantics -- Phase A put failure aborts with zero drift; Phase D delete failure is swallowed (write already durable) and healed by the next write; list failures are loud at heal and open; corrupt sidecars are refused with the file kept for inspection; audit-append failures are retried to exactly one audit row. - testing.md: index the storage-fault matrix in the failpoints.rs row and the new RustFS CI line. * test(engine): pin read-visibility of acknowledged local if-absent writes The cluster lib test import_missing_state_creates_state_with_graph_observation flakes at ~50% under full-workspace load ('EOF while parsing a value' reading back the state.json its own import just acknowledged). Root cause is in the engine's local storage adapter: write_text_if_absent writes through a buffered tokio::fs::File and returns when write_all resolves -- which, per tokio's documented File semantics, means the bytes reached tokio's internal buffer, not the file. The actual write completes in a background blocking task after drop, so a caller that acknowledges success and reads the object back can see an empty or partial file. 
Under load the window widens; the red run fails at iteration 0 with 0 of 8192 bytes on disk. The regression test pins the contract at the adapter boundary: when write_text_if_absent resolves, the full contents are visible to any reader; a losing second claim leaves the winner's object untouched. The fix lands in the next commit. * fix(engine): publish local storage writes with atomic visibility Close the class, not the instance. The local adapter admitted three ways for a reader to observe a write that was acknowledged or visible before its bytes were complete: 1. write_text_if_absent acknowledged success when the buffered tokio::fs::File write_all resolved -- i.e. when the bytes reached tokio's internal buffer, not the file. A caller reading back its own acknowledged write could see an empty object (the ~50% cluster import flake under full-workspace load; the regression test failed at iteration 0 with 0 of 8192 bytes visible). 2. The same call published its CLAIM (create_new) before its CONTENT, so concurrent readers saw an empty claimed file in the window. 3. write_text (plain tokio::fs::write) exposed truncated content mid-replace -- silently falsifying write_sidecar's 'readers either see the complete sidecar or none' contract on local FS (true on S3, where PutObject is atomic). A flush in write_text_if_absent would have fixed only (1). Instead, both local write paths now publish complete temp files atomically: rename for replace (write_text -- the idiom write_text_if_match already used) and hard_link for no-replace (write_text_if_absent -- link fails AlreadyExists, so exactly one of N concurrent claimants wins and the winner's object is fully readable at the instant it becomes visible). The local adapter now honors the same object-level atomic-visibility contract as the S3 adapter, which is what every caller (recovery sidecar protocol, cluster state CAS) was written against. Crash-orphaned .tmp. 
files are inert: the sidecar sweep filters to .json, and cluster state reads address state.json by name. fsync/durability policy is unchanged (no fsync before, none now); this fix is about visibility ordering, not power-loss durability. Pre-existing on main (landed with the multi-graph server mode change, PR #119); surfaced by this branch's heal work only because one extra list_dir per write shifted test timing. Cluster lib suite: 12/25 failures before, 0/25 after. Turns the previous commit's red test green. * refactor(engine): one storage implementation over object_store for every backend Collapse LocalStorageAdapter (hand-rolled tokio::fs) and S3StorageAdapter into a single ObjectStorageAdapter backed by Arc<dyn object_store::ObjectStore> -- LocalFileSystem for local URIs, the existing AmazonS3 build for s3://, plus a pub in_memory() constructor (full contract including TRUE conditional updates; the in-memory test backend testing.md asked for at the adapter level). Why: the acknowledged-before-visible bug showed the two-impl shape has no referee -- one prose contract, two independent answers. Upstream LocalFileSystem::put_opts is byte-for-byte the staged-temp+rename/hard_link idiom that fix converged on, and Lance's own commit protocol is built on the same primitives (put-if-not-exists / rename-if-not-exists), so the substrate-aligned move is to stop hand-rolling it. The per-backend residue shrinks to a UriCodec (URI <-> object path) and one capability flag. Semantics preserved by construction, with three deliberate deltas: - exists() is now object-store-semantics everywhere (head + non-empty prefix fallback): an EMPTY local directory no longer 'exists'. The only dir-shaped caller (_graph_commits.lance probes) self-heals via ensure_commit_graph_initialized where it previously wedged loudly. - A directory at an object path reads as NotFound, not as an IO error ('only objects exist'). 
The cluster unreadable-payload test used a same-named directory as a portable non-NotFound trigger; it now uses chmod 000, which still models genuine transient IO. - write_text_if_match keeps content-token semantics on local (PutMode::Update is NotImplemented upstream for LocalFileSystem in 0.12.5 and 0.13.2); the capability flag gates the token SOURCE in read_text_versioned too -- an ETag token with content-compare writes would lose every CAS. delete_prefix keeps a local remove_dir_all branch: directories are a local-FS concept, and list+delete would leave empty skeletons that cluster graph_root_exists (raw Path::exists) reports as still present. LocalStorageAdapter remains as a delegating shim so the pinned contract tests gate this swap textually unchanged; the shim and the test parameterization over local + in-memory land next. Cargo gains the explicit 'fs' feature (already transitively enabled by lance). * test(engine): one executable storage contract, run against every backend Remove the LocalStorageAdapter delegation shim and migrate its construction sites to ObjectStorageAdapter::local(). Replace the per-backend duplicated tests with a single contract_suite asserting the trait's promises (atomic replace, exists incl. the dataset-root prefix probe, one-winner if_absent, versioned CAS with loud CAS-lost, rename, list round-trip with no sibling-prefix bleed, idempotent delete/delete_prefix), run against the local backend and the new in-memory backend -- which implements true conditional updates, so the strong-CAS path is exercised without a bucket. The bucket-gated S3 variant already exists (s3_adapter_conditional_writes_contract). 
New local-specific pins for the deliberate semantic edges of the collapse: empty directories are not objects (exists=false; the Lance dataset-root probe shape is the non-empty case), file://-anchored and spaces-in-path list output round-trips byte-identically into read_text, dot-segment paths are lexically absolutized (the CLI's ./graph.omni shape), and upstream rename creating missing destination parents. The acknowledged-write visibility regression test stays, now documenting that the cross-API std::fs read-back is the point. * refactor(cluster): drop put_json's per-backend atomicity branch The local temp+rename dance predates the storage adapter guaranteeing atomic visibility; now that write_text publishes via a staged temp + rename on the filesystem (and a single atomic PUT on object stores) by contract, the branch duplicated upstream behavior. One call, both backends. * docs: storage adapter collapse — contract, in-memory backend, local CAS gap - testing.md: the 'no MemStorage backend' note is half-closed — ObjectStorageAdapter::in_memory() covers the text-object layer with the full contract (true conditional updates); Lance datasets bypass the adapter, so the engine substrate ask stays open. - invariants.md: truth-matrix Tests row updated; new Known Gap for local write_text_if_match (upstream PutMode::Update is unimplemented for LocalFileSystem; content-token emulation is safe only under the cluster lock protocol — close before admitting a lock-free caller). - writes.md: backend notes for the unified adapter (name#N staging residue invisible to the sweep, backend-wrapped error text with exists()-probing for missing-vs-error, loud permission failures). 
* docs: finish renaming the storage adapters in user docs and test comments storage.md's URI-scheme table and the S3 failpoint test's doc comment still named the deleted LocalStorageAdapter/S3StorageAdapter; both now describe the unified ObjectStorageAdapter over object_store, including the relative-path absolutization note for local URIs. * test(engine): pin branch-awareness of the drift guard's recovery advice A pending sidecar on ANOTHER branch does not cover this branch's drift: with a deferred feature-branch sidecar on disk and genuinely uncovered drift on main, the main write's error must still point at omnigraph repair -- a read-write reopen recovers the sidecar but cannot repair main's uncovered drift. Currently red: the guard matches sidecar pins by table_key only, so the feature sidecar flips main's advice to the reopen path. Fix in the next commit. Surfaced by external review of the drift-guard change. * fix(engine): branch-aware sidecar matching in the drift guard's advice The commit-time drift guard's sidecar-covered check matched pins by table_key alone, so a pending sidecar on another branch flipped this branch's uncovered-drift advice from 'run omnigraph repair' to the reopen path -- and a reopen recovers that sidecar but cannot repair this branch's drift. Compare the pin's table_branch too. Turns the previous commit's red test green. Surfaced by external review of the drift-guard change. * test(engine): pin heal non-interference with a live schema apply The write-entry heal's schema-staging reconcile runs before any queue acquisition, so a load on the same handle, overlapping a schema apply parked between its staging write and manifest commit, promotes the apply's staging files (new catalog live against the old manifest), classifies the LIVE apply's sidecar, and publishes its registrations out from under it. The resumed apply then collides with its own stolen commit. 
Currently red with: Lance("Concurrent modification: table version 3 already exists for node:Tag") The fix (per-sidecar reconcile under the sidecar's write-queue guards, plus a serialization key the schema-apply writer and the heal both acquire) lands in the next commit. Surfaced by external review of the write-entry heal. * fix(engine): serialize the heal's schema-staging reconcile with live schema applies The write-entry heal ran recover_schema_state_files up front, before acquiring any queue guards. Overlapping a live schema apply parked between its staging write and manifest commit, the heal promoted the apply's staging files (new catalog live against the old manifest), classified the LIVE apply's sidecar, and published its registrations — the resumed apply then collided with its own stolen commit. Correct by construction: - New schema-apply serialization queue key, acquired by the schema-apply writer (alongside its per-table keys) from before write_sidecar until after delete_sidecar. Per-table keys alone don't cover a registration-only migration, which pins no existing tables but has a sidecar and staging files on disk. - The heal reconciles schema staging lazily, PER SchemaApply sidecar, after acquiring that sidecar's guards (including the serialization key) and re-confirming the sidecar exists — a sidecar that survives the queue wait belongs to a dead writer, so the reconcile can no longer race a live apply. Recomputing per sidecar also removes the staleness of one up-front result across a multi-sidecar pass. - Omnigraph::refresh drops its up-front reconcile-and-pass-through (same race, and a pre-promoted result would make the heal's guarded reconcile see clean staging and wrongly defer the sidecar): it now reconciles standalone only when NO sidecar exists — which cannot race a live apply, whose sidecar always precedes its staging files — and otherwise defers entirely to the heal. 
The open-time sweep keeps its precomputed reconcile: open has no concurrent writers. Turns the previous commit's red test green. Surfaced by external review of the write-entry heal. Self-audit addendum folded in: refresh's no-sidecar gate had a TOCTOU (a live apply could write its sidecar + staging between the empty check and the reconcile) — the standalone reconcile now holds the serialization key across the list-then-reconcile pair. The remaining residual is cross-process only (in-process queues cannot serialize against a writer in another process; the open-time sweep has the same pre-existing exposure) and is now an explicit Known Gap in invariants.md rather than an implicit one. * test(engine): pin catalog reload after the heal recovers a schema apply When the write-entry heal rolls a crashed apply's SchemaApply sidecar forward on the same handle, disk and manifest move to the new schema (staging promoted, registrations published) but the handle's in-memory schema_source/catalog do not. Subsequent writes then validate against the stale catalog and reject rows of types the graph already has. Currently red with: record 1: unknown node type 'Tag' refresh() reloads after its heal; the write entry points must too. Fix in the next commit. Surfaced by external review of the write-entry heal. * fix(engine): reload the in-memory catalog after the heal recovers a schema apply heal_pending_recovery_sidecars refreshed the coordinator and invalidated the runtime cache after processing sidecars, but never reloaded schema_source/catalog — so a write whose entry heal rolled a crashed SchemaApply sidecar forward proceeded to validate against the OLD schema while disk and manifest were already on the new one. reload_schema_if_source_changed is the same post-heal step refresh() already runs; it no-ops on the (overwhelmingly common) non-schema heal because the on-disk source is unchanged. Turns the previous commit's red test green. 
Surfaced by external review of the write-entry heal. * test(engine): pin that a deleted-branch sidecar cannot wedge the graph A rollback-eligible sidecar pinned to a branch is deferred by every roll-forward-only pass; if the branch is then deleted, the sidecar survives, referencing a branch with no manifest tree. The heal (every write entry) and the open-time sweep (every ReadWrite open) both fail opening the dead branch, and repair refuses while a sidecar is pending -- a terminal read-only state with manual sidecar surgery as the only exit. Currently red with: Lance("Not found: .../__manifest/tree/feature/_versions") The branch's tree and forks are already reclaimed, so the pinned drift is unreachable and the sidecar is provably moot; the fix classifies it as an orphaned-branch terminal state (audit + discard) in both passes. Surfaced by review (P1, verified by repro). * fix(engine): classify deleted-branch sidecars as orphaned instead of wedging A deferred (rollback-eligible) sidecar pinned to a branch survives branch_delete; both the write-entry heal and the open-time sweep then failed unconditionally opening the dead branch -- every write and every ReadWrite open errored, and repair refuses while a sidecar pends. Terminal state, manual sidecar surgery the only exit. The branch's tree and per-table forks are already reclaimed at delete, so the drift the sidecar pins is unreachable and the sidecar is provably moot. Both passes now check the sidecar's branch against the manifest's branch list (the authority -- deliberately NOT inferred from a Not-found on open, which could be a transient storage error masking real recovery intent) and discard orphans with an OrphanedBranchDiscarded audit row, commit appended on main since the sidecar's own branch no longer has a commit graph. The open-time half is pre-existing; the write-entry heal made it hot. Turns the previous commit's red test green. Surfaced by review (P1, verified by repro). 
* chore: harden review nits — vacuous CI filter, root-runner skip, liveness note - ci.yml: the RustFS sidecar-lifecycle step now fails loudly if the 's3_' name filter matches zero tests (cargo passes vacuously on an empty filter; the step exists specifically to prove S3 sidecar I/O coverage). The pre-existing CLI smoke step has the same shape and is left for a follow-up. - cluster unreadable-payload test: cfg(unix) + a skip-with-log when running as root (mode 000 is still readable to root, common in container dev runners), so the test degrades instead of failing. - refresh: document the one-pass-late convergence for legacy staging residue while non-SchemaApply sidecars pend, so nobody 'fixes' it by re-running the reconcile unserialized — the exact race the serialization key closes. * test(engine): pin orphan-discard idempotency across a delete fault discard_orphaned_branch_sidecar writes its audit row and main commit before deleting the sidecar; a Phase D delete fault leaves the sidecar on disk with the audit already durable, and the retry repeated the whole path -- a second OrphanedBranchDiscarded audit row (and commit) for the same operation. Currently red: 2 rows after one fault + retry. The retry must only finish the delete. Fix next. Also promotes the recovery-audit kinds reader into the shared test helpers (it was recovery.rs-local). Surfaced by external review of the orphan-discard fix. * fix(engine): orphan-discard idempotency + heal reports acted-vs-deferred Two review findings on the recovery surface: - discard_orphaned_branch_sidecar now checks the audit table for an existing (operation_id, OrphanedBranchDiscarded) row before appending the commit + audit pair, so a Phase D delete fault retries ONLY the delete instead of duplicating audit rows and commit-graph entries. Cold path: the list scan runs only when an orphaned sidecar exists. Turns the previous commit's red test green (exactly one audit row across fault + retry). 
- process_sidecar returns whether durable state changed; the heal sets processed_any only for sidecars that were actually rolled forward / rolled back / audit-recovered (orphan discards count). Deferred sidecars (rollback-eligible, invariant-violating, unpromoted SchemaApply) no longer trigger a per-write schema reload + full runtime-cache invalidation while they pend -- the cache is snapshot-keyed so this was waste, not corruption, but it was paid on every write until reopen. Acted-paths' processed=true remains pinned by load_after_schema_apply_phase_b_failure_uses_recovered_catalog (the reload depends on it). Surfaced by external review. * test(engine): pin the orphan-discard audit-append fault leg as documented tolerance The orphan discard's commit append and audit append are two writes; a failure between them leaves a recovery commit with no audit row, and the retry (keyed on the audit row, the operator-facing record) appends a second commit before the audit lands. This is the same not-atomic-pair-write tolerance record_audit documents and the manifest->commit-graph Known Gap covers for every publish: bounded commit-graph noise, audit row exactly-once under clean failures. Keying idempotency on commit rows instead would need an operation_id column on _graph_commits, and audit-before-commit would dangle the graph_commit_id join -- both worse than the documented residual. Make the tolerance explicit instead of implicit: docstring names the window, a failpoint sits inside it, and the new test pins convergence across the fault (sidecar consumed, exactly one audit row), completing the orphan-discard fault matrix alongside the delete-fault leg. Surfaced by external review of the orphan-discard idempotency. 
* test(engine): pin honest drift-guard advice when sidecar listing fails The guard's unwrap_or(false) conflated 'classified as uncovered' with 'could not classify': a transient list fault on the guard's second list (the entry heal's first list having succeeded) confidently routed the operator to omnigraph repair even when the heal had just deferred a rollback-eligible sidecar -- and repair refuses while a sidecar is pending. Currently red: the error says 'run omnigraph repair' with no mention of the reopen path. The fix names both paths plus the failure cause when classification is impossible. Surfaced by external review of the drift-guard fallback. * fix(engine): admit ambiguity in the drift guard when sidecar listing fails Replace the unwrap_or(false) fallback with a tri-state: covered -> reopen advice; uncovered -> repair advice; listing FAILED -> say the drift could not be classified, name the cause, and give both paths in order ('run repair, or reopen read-write if repair reports a pending sidecar'). The old fallback confidently routed a transient list fault to repair, which refuses while a sidecar is pending -- a self-correcting but pointless detour. The conflict itself is still always raised; only the advice degrades honestly. Turns the previous commit's red test green. Surfaced by external review of the drift-guard fallback.	2026-06-13 11:20:08 +02:00
aaltshuler	dedd647cde	release: bump workspace to 0.7.0 All six crate manifests + their path-dependency constraints, Cargo.lock, the regenerated openapi.json version metadata, AGENTS.md's surveyed version, and the v0.7.0 release notes (object-storage clusters, config-free --cluster serving, the operator config surface, keyed credentials, operator targeting/aliases, and the omnigraph.yaml deprecation stages). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 14:12:33 +03:00
aaltshuler	8d7aed065f	test(cluster,server): gated object-storage cluster e2e + CI wiring + docs s3_cluster.rs runs the full control-plane lifecycle against a real bucket (CI: containerized RustFS; locally the RustFS binary): import → lock released (pins the drop-time release regression caught on the first live smoke) → apply (graph roots + catalog on the bucket, nothing local) → serving snapshots from both the config dir and the bare URI → schema evolution → approved delete (prefix removal) → empty-cluster refusal. The server suite gains the config-free boot test: --cluster s3://… with zero local files serves a stored query over HTTP. CI: the rustfs job runs both suites; the classify filter covers the cluster store/serve modules and the new test files. The server smoke drops its name filter — every test in the s3 target is bucket-gated, and a filter matching nothing passes vacuously (which silently ran zero tests for a while). Docs: deployment.md gains the Bucket-no-volume shape as the preferred cloud deployment; cluster.md/server.md document --cluster <uri>; testing.md maps the new suite. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:56:40 +03:00
aaltshuler	58855c0a7c	feat(cluster,server): inline policy content + config-free --cluster URI boot Two serving changes that complete RFC-006's read side: ServingPolicy carries the policy bundle CONTENT (digest-verified at snapshot read) instead of a blob path — the catalog may live on object storage, and the server must not re-read mutable state after the snapshot. The server grows a PolicySource enum: File for omnigraph.yaml deployments (unchanged), Inline for cluster boots, wired through PolicyEngine::load_{graph,server}_from_source. read_serving_snapshot_from_storage(uri) reads the applied revision straight from a storage root, and --cluster accepts a scheme-qualified URI (s3://bucket/prefix): config-free serving — a serving box needs only the URI and credentials; the ledger and catalog on the bucket ARE the deployment artifact. Bare paths keep the config-directory behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:56:22 +03:00
aaltshuler	f6ae3e4fa3	fix(cluster): lock release must complete before a CLI process exits Caught by the first live s3 smoke: StateLockGuard's spawned async delete dies with the runtime when a short-lived CLI process exits right after the command — import's lock survived into the next command as state_lock_held. On the multi-thread runtime (the CLI, and the gated s3 tests) block_in_place waits for the delete to complete; current-thread runtimes keep the spawn fallback with force-unlock as the documented recovery, same as a crash. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 14:33:26 +03:00
aaltshuler	8dc2f15255	feat(cluster): the storage: root — state, catalog, and graph roots relocatable cluster.yaml gains an optional storage: URI deciding where everything the cluster STORES lives: the state ledger, lock, content-addressed catalog, recovery sidecars, approval artifacts, and the derived graph roots (<storage>/graphs/<id>.omni). Absent, it defaults to the config directory itself — the original layout, byte-compatible, so pre-existing clusters and the whole test suite are untouched. Declared configuration always stays in the working tree (Terraform's config-local/state-remote split); credentials are env-only, never in cluster.yaml. Every command resolves its store from the declared root (a bad root is a loud invalid_storage_root). Graph-root derivation, the delete executor (prefix delete via the adapter), the sweep's existence probes, the catalog payload write/verify/read paths, and the serving snapshot all flow through ClusterStore — the last raw-fs holdouts for stored state are gone, and the deny-list gains the rule that keeps it that way. Tests: default-layout byte-compat, a file:// root relocating the entire cluster (ledger+catalog+graphs under the new root, nothing under the config dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9 failpoints + full workspace gate green. The s3:// flavor lands with PR 3's gated RustFS e2e. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 14:28:04 +03:00
aaltshuler	fd002abaa5	feat(cluster): port the storage backend to the engine StorageAdapter LocalStateBackend becomes ClusterStore: every stored byte — state ledger, lock, recovery sidecars, approval artifacts — now flows through the engine's StorageAdapter, making file:// and s3:// one code path. Behavior on the file backend is byte-compatible (layout, CAS semantics, diagnostics, lock release timing) and the entire pre-existing suite passes unchanged. Mechanics: the ledger CAS keeps its public sha256 vocabulary while the physical swap is token-conditioned (ETag If-Match on S3 via PR #186's primitives; content-token + temp/rename locally — the pre-port semantics); the lock is a create-only put (genuinely cross-machine on object stores) with deterministic drop-release locally and best-effort spawned release on S3; sidecars/approvals address by URI (SweepOutcome and the executors carry strings); sweep row-1 retirement joins the uniform deferred post-CAS cleanup. ClusterStore also gains the catalog-payload and graph-root methods that commit 2 wires in. Async ripple: status/force-unlock/serving-snapshot and the server's settings loader chain go async (CLI dispatch and ~20 test hosts follow, mechanically). tokio joins the cluster crate's runtime deps for the lock guard's handle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 14:11:14 +03:00
aaltshuler	db6fe03be1	refactor(cluster): move type definitions to types.rs Verbatim move of the public output/diagnostic types and the internal state/sidecar/approval models; previously-private types and their fields get pub(crate) (they were crate-visible by position before). lib.rs is now the command pipeline + public API. 95 tests green; full workspace gate green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:42:02 +03:00
aaltshuler	dc0a1fc5a5	refactor(cluster): move declared-config loading to config.rs Verbatim move of cluster.yaml parsing, query discovery, source digesting, header/id validation, path resolution, and live-graph observation. Two helpers that the cut swept along were relocated to their right homes (state-status helpers back to lib.rs, lock-file helpers to store.rs). 95 tests green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:37:20 +03:00
aaltshuler	dd17c0c50f	refactor(cluster): move diffing and classification to diff.rs Verbatim move of diff_resources, binding-change diffing, blast radius, approval gating, ResourceKind, classify_changes, and demotion. 95 tests green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:33:13 +03:00
aaltshuler	9c3e09e838	refactor(cluster): move the recovery sweep to sweep.rs Verbatim move of the sidecar classification (all RFC-004 D3 rows), tombstoning, and approval-consumption helpers. 95 tests green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:30:55 +03:00
aaltshuler	00fc5cf537	refactor(cluster): move the serving snapshot to serve.rs Verbatim move of the Serving* types, read_serving_snapshot, and read_verified_payload; public re-exports preserved (the server's imports are unchanged). 95 tests green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:29:44 +03:00
aaltshuler	5a8047e5d0	refactor(cluster): move the storage backend to store.rs Verbatim move of LocalStateBackend, StateSnapshot, StateLockGuard and their impls — the single home for stored-state I/O (state ledger, lock, recovery sidecars, approval artifacts), where the RFC-006 object-storage port lands next as a focused diff. Visibility bumps (pub(crate)) only; 95 tests green before and after. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:28:04 +03:00
aaltshuler	fbb86dee0e	refactor(cluster): move the in-source test suite to tests.rs Verbatim move (indentation preserved — embedded raw-string fixtures are content). lib.rs drops from 7,857 to ~4,750 lines; `use super::*` resolves to the crate root through the #[path] module declaration unchanged. 95 tests green before and after. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 05:25:53 +03:00
aaltshuler	4558454bc7	fix(cluster): address review — discovery reads each file exactly once resolve_query_decls hands its file contents to the caller; the per-query digest/typecheck pass reuses them instead of re-reading (a file with N queries was read N+1 times), which also closes the window where a file changing between enumeration and validation produced a confusing query_key_mismatch for a just-discovered name. Explicit-map declarations read as before. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 01:35:47 +03:00
aaltshuler	677320ceec	feat(cluster): Terraform-shaped query declaration — discover from files cluster.yaml's graphs.<id>.queries previously accepted only an explicit name->file map, forcing configs to re-enumerate every `query <name>` that the .gq files already declare (the SPIKE cookbook needed 66 entries for 6 files). The files ARE the declaration now: `queries: queries/` discovers every declaration in a directory's top-level *.gq (sorted), a list form takes explicit files, and the map stays for fine-grained control. Discovery is loud — unreadable/unparseable files and duplicate query names fail validation (query_parse_error, duplicate_query_name). Downstream is untouched: each discovered query is still an individually addressed resource with the containing file's digest. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 00:46:21 +03:00
aaltshuler	f5b43164b8	feat(cluster): pub read-only serving-snapshot API RFC-005 §D2/§D4: read_serving_snapshot reads the applied revision as everything a server needs to boot — graphs at derived roots, stored-query sources read from the content-addressed catalog and re-hashed against the recorded digests, policy blob paths with their applied applies_to bindings. All-or-nothing: missing state, pending recovery sidecars, missing/tampered blobs, pre-5A entries without bindings, and an empty graph set each refuse the snapshot with a remedy; no partial serving. Lock-free by design — the state file is replaced atomically, so the read is a consistent point-in-time ledger. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:39:26 +03:00
aaltshuler	0b84b1adc3	feat(cluster): record policy applies_to bindings in the applied revision Slice 5A of RFC-005: the state ledger becomes serving-sufficient for the Phase-5 server boot. StateResource gains an optional applies_to (normalized typed refs: cluster | graph.<id>), written by apply for every applied policy create/update from the desired config's validated bindings. The hole this closes: applies_to is not part of the policy file digest, so a binding-only edit previously produced NO plan change at all (a 4C e2e even asserted that — the gap, not a contract). Binding changes are now first-class: a post-diff pass emits an Update with equal before/after digests and a binding_change marker (visible in plan/apply JSON and human output as [bindings]), classification/execution treat it as an ordinary catalog-tier applied change (payload skips naturally — the blob is unchanged), and convergence requires zero binding divergence, so stale bindings can never report converged. Pre-5A ledger entries (no bindings recorded) surface as the same backfill Update; one apply heals them, exactly the remedy RFC-005's boot-error path names. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 15:30:33 +03:00
aaltshuler	87691fe9c7	test(cluster): failpoint coverage for delete crash windows - Crash before the removal: root intact, approval file unconsumed, sidecar survives, no ack; the next run retires the stale intent (row 8) and the still-approved delete completes in the same run. - Crash after the removal, before the state CAS: root gone, ledger byte-identical, the sidecar carries the approval id; the next run's sweep rolls the tombstone forward, consumes the approval, audits the recovery, and converges (row 7b). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 14:34:54 +03:00
aaltshuler	d1d04217ab	feat(cluster): execute approved graph deletes in cluster apply Stage 4C execution half (RFC-004 §D5/§D6 + sweep rows 7/7b/8): an approved graph.<id> delete — and its riding schema/query deletes — classifies Applied and executes LAST in the run, sidecar-fenced: pre-op manifest pin (best effort; partial roots still delete), approval_id carried in the sidecar, recursive root removal (NotFound tolerated), subtree tombstoned out of the ledger with a tombstone observation, the approval consumed in the same state CAS (ledger summary) and its artifact file rewritten with consumed_at only after the CAS lands — a failed run consumes nothing and the approval stays valid for the retry. Sweep rows: already-tombstoned intents retire (7); a completed delete with a stale ledger rolls forward — tombstone + approval consumption + audit entry (7b, idempotent); a still-present root retires the stale intent with a graph_delete_incomplete warning and the still-approved delete re-executes in the same run (8) — prefix removal is idempotent, so retry IS the repair. The multi-graph mixed e2e gets its conclusion: blocked without approval, cluster approve graph.engineering --as andrew, converge, tombstone visible in status. Phase 4's disposition matrix is now fully executable. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 14:34:02 +03:00
aaltshuler	f4e9105272	feat(cluster): cluster approve — digest-bound approval artifacts RFC-004 §D4, gate half: graph deletes (and their subtree) now classify Blocked/approval_required instead of Deferred; the new cluster approve command (requires the global --as actor) writes __cluster/approvals/{ulid}.json bound to the desired config digest and the change's before/after digests, so config or state drift invalidates the artifact automatically (approval_stale warning, never authorizes). One gate per subtree: compute_approvals lists only the graph-level delete, and ApprovalRequirement gains a satisfied flag surfaced by plan. Consumption and the delete executor land next — until then approved deletes stay blocked so a gate-only build can never strip state without removing the root. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 14:30:05 +03:00
aaltshuler	80cae4e8e1	test(cluster): failpoint coverage for schema-apply crash windows - Crash before the engine call: sidecar (carrying the --as actor) survives, live schema and ledger untouched, no ack; the next run's sweep retires the stale intent and the same run applies and converges. - Crash after the engine call, before the state CAS: the manifest moved with the post-op pin in the sidecar, state.json byte-identical; the next run's sweep rolls the ledger forward with a schema_apply audit entry and the run converges. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:13:15 +03:00
aaltshuler	a1ba4dc413	feat(cluster): execute schema applies in cluster apply Stage 4B (RFC-004 §D1/§D5): schema.<id> Update changes classify Applied and execute after graph creates, sequentially and sidecar-fenced — read-write open (the engine's own recovery runs first), pre-op manifest pin recorded, apply_schema_as with allow_data_loss: false (soft drops only; hard drops wait for 4C's approval artifacts), post-op pin rewritten into the sidecar, sidecar retired only after the final state CAS. Queries gated on a same-plan schema update unblock (the migration lands first in the same run); failures — unsupported migrations, lock contention, user branches — surface as schema_apply_failed with the engine's message, demote dependents via the origin-aware demotion helper, and stop further graph-moving work. Schema evolution is now fully cluster-driven (the defer -> manual schema apply -> refresh loop is gone), and out-of-band schema drift is converged back by apply as an ordinary soft migration (axiom 8: drift correction is gated like any change; the recoverable tier needs no approval) — both pinned by reworked e2es. The multi-graph mixed e2e's deferred row is now delete-shaped, pre-staging the 4C surface. Actor: cluster apply accepts the CLI's global --as via the new ApplyOptions / apply_config_dir_with_options (apply_config_dir delegates unchanged); the actor is echoed in ApplyOutput and recorded in sidecars and audit entries, and threads to apply_schema_as so Cedar fires wherever a checker is installed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:12:15 +03:00
aaltshuler	0571c05ebb	feat(cluster): schema-apply recovery sidecar kind and sweep RecoverySidecarKind::SchemaApply with digest-based sweep classification (robust to unrelated manifest movement; version pins stay forensic): ledger-consistent -> sidecar retired (RFC-004 rows 1+2); live digest matches the intended schema, state stale -> roll forward with composite recompute and a recovery_records audit entry (row 3); unverifiable or unexpected digests -> pending, kept, graph-moving work blocked (rows 1-unopenable/6). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:05:42 +03:00
aaltshuler	ca63a9340b	feat(cluster): embed schema migration previews in cluster plan RFC-004 §D7's data-aware preview: for every schema update, plan opens the live graph read-only and embeds the engine's migration plan (supported flag + typed steps) in the change record; the human renderer prints the steps. Preview failures (unreachable graph, planner error) degrade to the digest diff with a schema_preview_unavailable warning — planning never blocks. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:04:19 +03:00
aaltshuler	b313075476	refactor(cluster): make plan_config_dir async Mechanical conversion ahead of Stage 4B (plan will preview schema migrations against live graphs): signature, CLI dispatch, and test callers. Zero behavior change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:02:12 +03:00
aaltshuler	83d77bcb16	test(cluster): failpoint coverage for graph-create crash windows - Crash before the init (row 1): sidecar survives, nothing moved, no ack; the next run's sweep removes the intent and the same run creates and converges. - Crash after the init, before the state CAS (row 4): the graph exists with the post-init manifest pin in the sidecar, state.json byte-identical; the next run's sweep rolls the ledger forward with a recovery_records audit entry and the run converges. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 04:59:48 +03:00
aaltshuler	c3007369cd	feat(cluster): execute graph creates in cluster apply Stage 4A (RFC-004 §D1/§D5): graph.<id> Create — and its paired schema Create, which the init carries — classify Applied and execute first in the run, sequentially and sidecar-fenced: sidecar written before Omnigraph::init at the derived root, rewritten with the post-init manifest pin, deleted only after the final state CAS lands. Dependent queries and policies no longer block on a graph create in the same plan — creates run first, so they apply in the same run; a create failure demotes them to blocked (dependency_not_applied) and stops further graph-moving work (loud partials), with the sidecar left for the sweep to classify. Graphs with a kept recovery sidecar (rows 5/6) classify Blocked/cluster_recovery_pending, and the sweep's Drifted/Error statuses are never clobbered by a generic Blocked. Schema source is re-read and digest-verified under the lock before the init (the write_resource_payload TOCTOU posture). Plan previews the same dispositions. e2e fallout updated: a fresh multi-graph config now converges in one apply; a destroyed root is re-created as an EMPTY graph by the next apply (declarative convergence — visible in plan, called out in docs); the new cluster_e2e_declared_graph_created_by_apply pins the no-manual-init flow. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 04:58:56 +03:00
aaltshuler	bf8cc7a753	feat(cluster): graph-create recovery sidecars and sweep RFC-004 §D2/§D3 for the graph_create kind. RecoverySidecar records intent under __cluster/recoveries/{ulid}.json; the roll-forward-only sweep runs at the start of apply/refresh/import under the state lock and classifies each survivor by observation: root absent -> intent removed (row 1); outcome already recorded -> retired (row 2); create completed but state stale -> ledger rolled forward with a recovery_records audit entry (row 4); partial root -> Error/graph_create_incomplete, kept, never auto-deleted (row 5); unexpected schema -> Drifted/actual_applied_state_pending, kept (row 6). Sweep mutations ride the command's existing CAS write; completed sidecars are deleted only after that write lands. Read-only status/plan warn (cluster_recovery_pending) without acting. The apply payload gate now counts only payload-phase errors so kept-sidecar diagnostics don't abort the run before their statuses persist. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 04:50:42 +03:00
aaltshuler	6fbf09d5c9	refactor(cluster): make apply_config_dir async Mechanical conversion ahead of Stage 4A graph create (which calls the async Omnigraph::init from inside apply): the fn signature, the CLI dispatch arm, and every test caller (#[test] -> #[tokio::test]). Zero behavior change; all 60 lib tests and 3 failpoint tests green before and after. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 04:43:38 +03:00
aaltshuler	16759b28b9	fix(cluster): RAII-guard the callback failpoint ScopedFailPoint::with_callback gives cfg_callback the same Drop-based cleanup as cfg actions; a panic while the point is active no longer leaks the callback into the process-global registry where it would fire under later tests (greptile review, PR #167). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 02:36:24 +03:00
aaltshuler	211b37e6de	test(cluster): failpoint tests for crash-mid-apply and state CAS race The apply-side coverage the implementation spec's hard gate requires before Phase 4 graph-moving apply: - crash after the payload phase: state.json byte-identical, blobs inert on disk, lock released, no phantom statuses, nothing acknowledged; a plain re-run repairs via skip-if-exists blob reuse. - CAS race: a cfg_callback rewrites state.json at the exact read->write window (the state.lock:false concurrent-writer scenario); apply surfaces state_cas_mismatch, acknowledges nothing, reports the persisted status snapshot, leaves the concurrent writer's state on disk; a re-run converges. CI's failpoints step now runs both the engine and cluster suites. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 02:14:06 +03:00
aaltshuler	21b531605f	feat(cluster): failpoint infrastructure mirroring the engine Optional failpoints feature (dep:fail + fail/failpoints, deliberately NOT enabling omnigraph/failpoints), a maybe_fail/ScopedFailPoint module returning Diagnostic-typed injected errors, and two call sites in apply_config_dir: cluster_apply.after_payload_phase (the crash point: blobs on disk, state untouched) and cluster_apply.before_state_write (routes through the persisted-statuses revert contract; a cfg_callback here can mutate state.json to make the CAS check fail organically). Feature off compiles to Ok(()) — zero behavior change. Tests live in a separate integration binary because the fail registry is process-global. Also refresh the crate description (stale 'read-only' since Stage 3A). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 02:12:59 +03:00
aaltshuler	15868972ff	feat(cluster): verify catalog payload blobs in status and refresh Closes the Stage 3A product gap where a deleted or corrupted blob under __cluster/resources/ went unnoticed forever (status reported converged and apply could not repair it because the digests matched). verify_catalog_payloads checks every query/policy digest in state against its content-addressed blob (existence + full sha256 re-hash; graph/schema/unknown addresses have no payloads and are skipped). status reports findings read-only (warnings catalog_payload_missing/_mismatch; error catalog_payload_read_error — an unverifiable catalog must not report healthy). refresh closes the self-heal loop: missing/mismatched blobs mark the resource drifted and remove its digest from state so the next plan proposes a create and the next apply republishes; unreadable blobs keep the digest (no spurious republish), mark error, and exit non-zero. Verification runs before graph observation so the recomputed graph composite already excludes removed query digests. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 02:07:08 +03:00
aaltshuler	5e1dede08f	fix(cluster,cli): apply failure output — persisted statuses only, changes list printed Two review findings (greptile, PR #165): - ApplyOutput.resource_statuses on a failed state write now carries the pre-apply on-disk snapshot instead of the in-memory mutations that were never persisted, so automation reading the field independently of `ok` cannot see phantom applied/blocked statuses. Regression test forces the state write to fail via a read-only __cluster dir (unix-only, skips when permissions are not enforced). - Human-mode `cluster apply` prints the classified changes list on failure too, so an operator debugging a partial apply without --json sees what was attempted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 00:35:03 +03:00
aaltshuler	1f8e5945cf	feat(cluster): config-only apply with content-addressed catalog publish apply_config_dir executes the query/policy subset of the plan: payloads are written content-addressed under __cluster/resources/{query,policy}/... before the state CAS (state is the publish point; orphaned blobs from a failed CAS are inert and re-apply is the repair), then state.json is CAS-updated with applied digests, Applied/Blocked statuses, and a revision bump. Graph/schema changes are never executed here: schema content and graph lifecycle defer to a later phase with loud warnings, while graph.<id> composite-digest updates whose schema component is unchanged converge automatically via recomputation from state's own components (without which apply could never converge). Idempotent re-apply leaves state bytes and revision untouched. PlanChange gains optional disposition/reason fields, populated by the same classifier in cluster plan, so plan is an honest preview of what apply will execute, derive, defer, or block. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 23:32:13 +03:00
aaltshuler	89b876c797	Add cluster state lock recovery	2026-06-09 22:31:46 +03:00
aaltshuler	d00d42274e	Implement cluster refresh and import	2026-06-09 21:17:23 +03:00
aaltshuler	2f19656c0e	fix(cluster): tighten state lock observations	2026-06-09 18:30:33 +03:00
aaltshuler	b046515e1c	Merge origin/main into cluster-config-docs	2026-06-09 18:11:12 +03:00
aaltshuler	a7956ea5a9	Add cluster JSON state ledger status	2026-06-08 21:09:23 +03:00
aaltshuler	043b02e617	feat(cluster): add read-only validate and plan	2026-06-08 20:07:39 +03:00

43 commits