omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Ragnor Comerford	05a8bd5de1	server: gate /ingest /branches/* /schema/apply on per-actor admission Closes the gap that admission control only fired on /change. A heavy actor sending bulk-ingest traffic could exhaust shared engine capacity (Lance I/O threads, manifest churn) without hitting the per-actor cap. Wires `state.workload.try_admit(&actor_arc, est_bytes)` into the five remaining mutating handlers AFTER Cedar authorization (so denied requests don't consume admission slots) and BEFORE the engine call. Byte estimates per handler: - /ingest: request.data.len() (NDJSON body) - /schema/apply: request.schema_source.len() - /branches/create, /branches/delete, /branches/merge: 256 (small JSON; the heavy work is bounded per-(table, branch) by the engine's writer queue rather than by request size) The admission guard is held in `let _admission = ...` so it stays alive until handler return, releasing the count permit + decrementing the byte budget on drop. Pinned by `ingest_per_actor_admission_cap_returns_429` (previous commit). The test still fails on the Retry-After header assertion; the next commit emits the header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:57:53 +02:00
Ragnor Comerford	0976cbebc5	tests: pin /ingest admission gate + 429 Retry-After (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix. Currently fails on `f925ad1` with 8/8 statuses returning 200 because /ingest does not call WorkloadController::try_admit. The test pins: - /ingest is gated on per-actor admission control (returns 429 when the cap is exceeded). - 429 responses carry the structured `code: too_many_requests` error body so clients can distinguish them from generic conflicts. - 429 responses include a `Retry-After` header so clients can implement bounded backoff. The doc claim at api.rs:343 and lib.rs:344 was that this header exists; the IntoResponse impl currently emits no headers. Two follow-up commits will turn this green: 1. Wire WorkloadController::try_admit on /ingest and the four other mutating handlers (Block 2.1). 2. Emit the Retry-After header on 429/503 responses (Block 2.2). The test uses #[serial] + EnvGuard to override OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1 without racing parallel tests, then spawns 8 concurrent /ingest tasks aligned at a tokio::sync::Barrier so multiple tasks reach try_admit close in time. With cap=1, at least one must be rejected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:57:01 +02:00
Ragnor Comerford	c263732b1a	tests: extend same-key insert test with /snapshot row-count assertion The existing change_concurrent_inserts_same_key_serialize_without_409 test claimed in its comment "asserts the final row count equals N" but only checked HTTP status codes. cubic flagged the gap; this commit adds the actual /snapshot read after the concurrent inserts to verify all N batches landed (no silent overwrite) by comparing the post-test node:Person row_count against SEED + N. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:49:38 +02:00
Ragnor Comerford	4ffbf6ec61	engine: drop swap-restore in branch_create_from; operate on local coord Closes the swap-restore race in `branch_create_from_impl` by simply not touching `self.coordinator` at all. Open the source-branch coordinator locally, call `branch_create` on it, drop it. The new branch is durable on disk via the manifest write that `GraphCoordinator::branch_create` issues on its own commit graph; subsequent reads of any coord will see it after their normal manifest refresh. Pre-fix: `branch_create_from_impl` ran swap → operate → restore as three separate `coordinator.write().await` acquisitions. Under `&self` concurrency, two callers with distinct source branches could interleave their swaps, leaving each caller's "operate" step running against the other's swapped coordinator and forking the new branch off the wrong HEAD. Pinned by `concurrent_branch_create_from_distinct_parents_does_not_corrupt_coordinator` (previous commit) which deterministically reproduced the race with 8/8 forks landing on the wrong parent. Why correct by design (AGENTS.md rule 9): closing the bug class "non-atomic three-step coordinator manipulation under &self callers" by removing the manipulation entirely. There's no scratch-space race to lose because there's no scratch space. Note: `branch_merge_impl` at `crates/omnigraph/src/exec/merge.rs:1085-1100` keeps the same swap-restore pattern. Its inner `branch_merge_on_current_target` calls `self.snapshot()` and `self.ensure_commit_graph_initialized()` which acquire the coord lock independently, so the simple "operate on local coord" refactor doesn't compose without a deeper interface change. The per-(table, branch) writer queue inside the merge body (`crates/omnigraph/src/exec/merge.rs:1224`) bounds the damage in practice; a deterministic regression for concurrent merges is tracked under Block 3.1 of the plan. `swap_coordinator_for_branch` and `restore_coordinator` remain crate-internal for now (still used by `branch_merge_impl`); a follow-up can remove them if the merge path is similarly refactored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:48:17 +02:00
Ragnor Comerford	3b33e9ac56	tests: pin branch_create_from swap-restore race (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix so the red → green pair is visible in git log. The test demonstrates that two concurrent `POST /branches` calls with distinct `from` parents corrupt coordinator state: A's "operate" step runs against B's swapped coordinator instead of its own, forking the new branch off the wrong parent's HEAD. Currently fails on `f925ad1` with all 8 gamma branches (declared parent: alpha, 5 rows) reporting 4 rows — beta's row count. The operate step ran against beta's coord because B's swap interleaved between A's swap and A's operate. Fix lands in the next commit: hold a single `coordinator.write().await` guard across the entire swap-operate-restore sequence in `branch_create_from_impl` so the three steps are atomic relative to other callers. Closes the bug class "non-atomic three-step coordinator manipulation under &self callers" rather than guarding the specific call site — the right architectural seam (single critical section per swap-restore sequence) eliminates the interleave window for branch_create_from and any future swap-restore caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:44:50 +02:00
Ragnor Comerford	4ca527cc53	staging: op-kind-aware drift check at commit_all entry Closes the bug class "Lance internal conflict surfaces as 500 instead of 409" for in-process concurrent strict-op writers on the same row. Pre-fix: in `MutationStaging::commit_all`, after queue acquisition, the staged Lance transaction (built against V0) was handed straight to `commit_staged`. When Lance HEAD has advanced past V0 (because the queue's prior winner already published), Lance's transaction conflict resolver fires `RetryableCommitConflict` for Update vs Update on the same row, which wraps as `OmniError::Lance(<string>)` and the API maps it to HTTP 500. Users see "internal server error" instead of a clean retryable conflict. Fix: track the strictest `MutationOpKind` per touched table on `MutationStaging` and propagate through `StagedMutation`. In `commit_all`'s recapture loop, before each `commit_staged`, fail-fast with `OmniError::manifest_expected_version_mismatch` (→ HTTP 409 ExpectedVersionMismatch) for tables whose tracked op_kind has `strict_pre_stage_version_check() == true` (Update/Delete/SchemaRewrite) when the staged dataset's version doesn't match the fresh manifest pin under the queue. Insert/Merge tables skip the check — concurrent inserts on disjoint keys legitimately coexist via Lance's auto-rebase, so the check would over-reject the existing same-key insert path. Threading: `ensure_path` now takes `op_kind` and stores it in a new `op_kinds: HashMap<String, MutationOpKind>` on `MutationStaging`, with strictness-upgrade semantics so mixed insert+update on the same table still fires the strict check at commit time. `StagedMutation` carries `op_kinds` through to `commit_all`. Pinned by `change_concurrent_updates_same_key_serialize_via_publisher_cas` in `crates/omnigraph-server/tests/server.rs` (added in the previous commit). All Phase 2 sentinels still pass: change_concurrent_inserts_same_key_serialize_without_409, change_conflict_returns_manifest_conflict_409, branch_merge_conflict_response_includes_structured_conflicts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:42:14 +02:00
Ragnor Comerford	ebf5a5769d	tests: pin UPDATE RYW under in-process concurrency (red) Per AGENTS.md rule 8, this commit lands the failing regression test ahead of the fix so the red → green pair is visible in git log. The test asserts the RYW invariant for in-process concurrent UPDATEs on the same row: exactly one writer commits and N-1 receive 409 manifest_conflict. Currently fails on `f925ad1` with 1 x 200 + 7 x 500: > "storage: Retryable commit conflict for version 6: This Update > transaction was preempted by concurrent transaction Update at > version 6. Please retry." Lance's transaction conflict resolver correctly detects the Update vs Update race, but the error wraps as `OmniError::Lance(<string>)` and the API surfaces it as 500 internal rather than 409 retryable conflict. Users see "internal server error" for what is documented as a retryable conflict path. The fix lands in the next commit: an op-kind-aware drift check at the commit_all entry that returns 409 ExpectedVersionMismatch for tables whose first touch was Update / Delete / SchemaRewrite when the staged dataset version drifts from the manifest pin under the queue. Closes the bug class "Lance internal conflict surfaces as 500 instead of 409" rather than mapping the specific Lance error variant — the right architectural layer (engine boundary, under the queue) catches the drift before commit_staged ever runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:33:53 +02:00
Ragnor Comerford	56a479ea2f	tests: failpoints schema_source().as_str() (CI fix) PR 2 made Omnigraph::schema_source() return Arc<String> via ArcSwap, but the failpoints test still compared against &'static str constants. Three E0308 type mismatches were blocking the Test Workspace CI job; this fix restores compilation. - failpoints.rs:125,160,195 now call schema_source().as_str() to align with the &str constants. - Drops 11 unused let mut db = ... bindings on the same path (engine write APIs are &self post PR 2 Step C). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:26:23 +02:00
Ragnor Comerford	f925ad1739	mr-686: Phase 2 — op-kind-aware version check + coord Mutex → RwLock Fix A: op-kind-aware ensure_expected_version. Insert/Merge skip the strict pre-stage check; Update/Delete/SchemaRewrite keep it. New MutationOpKind enum threaded through open_for_mutation_on_branch / open_owned_dataset_for_branch_write / reopen_for_mutation and all callers (execute_insert/update/delete_node/delete_edge, branch_merge::publish_rewritten_merge_table, schema_apply, ensure_indices_for_branch, loader Append/Merge/Overwrite). Closes the 77% rejection rate on same-key concurrent inserts. Fix B: coordinator Mutex -> RwLock. Reads parallelize via .read(); writes serialize via .write(). Atomic-commit invariant preserved by the single .write() covering commit_manifest_updates + record_graph_commit. Bench-as-test change_concurrent_inserts_same_key_serialize_without_409 (server.rs:2180) spawns 12 concurrent /change inserts on a single (table, branch); asserts every request returns 200. Was failing pre-Phase-2; passes post-Phase-2. change_conflict_returns_manifest_conflict_409 (cross-process drift sentinel) and branch_merge_conflict_response_includes_structured_conflicts both still pass. Bench (after-pr2-phase2): - single-actor 1x1: 14.9 ops/s, p50 68ms (baseline 12.3, +22%) - disjoint 8x8: 7.04 ops/s, p50 1023ms (baseline 6.24, +13%) - same-key 8x1: 2.62 ops/s, 0 errors (after-pr2: 77% errors) Disjoint stayed at +13% — Fix B's RwLock helped read paths but the publisher's .write() critical section still serializes graph-wide. Splitting GraphCoordinator into per-concern primitives (manifest in ArcSwap, commit_graph in RwLock, atomic-commit serializer) is the deferred next step. 102 lib + 30 branching + 24 runs + 16 staged_writes + 63 end_to_end + 40 server tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:42:26 +02:00
Ragnor Comerford	b93a130b40	staging: re-capture per-branch snapshot under queue (fixes cross-branch fail) The `bdd6440` commit re-captured expected_versions from `db.snapshot()` (bound-branch view). That broke any mutation on a non-bound branch: when the engine handle is bound to main but the mutation targets feature, the bound-branch snapshot returns main's pin for each table, not feature's. The publisher commits to feature, reads feature's manifest entry, sees a different version → 409 even though no concurrent writer existed. Reproduced by `branch_merge_conflict_response_includes_structured_conflicts` which mutates main then mutates feature on the same Omnigraph handle — the second mutation failed with "expected V6, current V5". Switch the re-capture to `db.snapshot_for_branch(branch).await` so the per-branch entries are resolved correctly. This is one fresh manifest read per mutation (the same I/O PR 1b had pre-Step-D), but it is now required for cross-branch correctness — Step D's "in-memory under single-coordinator invariant" rationale was only sound for single-branch workloads. Single-table same-branch mutations could still skip this read (queue exclusivity makes the publisher CAS a no-op), but the conditional adds complexity for marginal gain. Left as a follow-up perf optimization tracked in `.context/bench-results/comparison.md`. Bench numbers updated: - single-actor 1x1: 15.2 ops/s vs baseline 12.3 (+24%) - disjoint 8x8: 7.12 ops/s vs baseline 6.24 (+14%) - same-key 8x1: 77% errors via the strict ensure_expected_version check upstream of commit_all; same-key concurrent-write fix is a separate follow-up. All 102 lib + 39 server + 24 runs + 30 branching + 20 traversal + 9 validators tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 19:44:14 +02:00
Ragnor Comerford	bdd6440c83	staging: re-capture expected_versions under queue (PR 2 Step D fix) The Step D commit (`1b0a2c9`) skipped revalidation for single-table mutations, betting that the publisher's CAS would be a no-op under the per-(table, branch) queue. The bench falsified this: expected_versions was captured during stage_all (BEFORE acquire_many), so by the time the queue acquired and the publisher ran, those captured pins were stale w.r.t. any in-process concurrent writer that had published in between. Same-key 8x1 produced ~99% manifest_conflict 409 rejections because every actor after the first carried stale expected_versions. Fix: always re-read the in-memory snapshot under the queue and overwrite expected_versions with the current per-table values. Single-coordinator invariant (one Arc<Omnigraph> per process) makes this safe with zero I/O — publishes update the shared coordinator BEFORE releasing queue guards, so a contending tenant's read sees a fresh view by the time it acquires its keys. The publisher's CAS becomes a correct no-op for queued tables; cross-process drift (coord stale because coord doesn't see external publishes) still rejects via the publisher CAS as ExpectedVersionMismatch -> 409, preserving the change_conflict_returns_manifest_conflict_409 regression sentinel. Trade-off documented in the comment: SERIALIZABLE-opt-in writes (§VI.36 aspirational) will need an additional revalidation step here; the bench's append/upsert pattern is fine because Lance's natural rebase handles concurrent writes onto the same dataset. Bench results captured at .context/bench-results/after-pr2/ + .context/bench-results/comparison.md: - single-actor 1x1: 15.0 ops/s vs baseline 12.3 (+22%) - disjoint 8x8: 7.03 ops/s vs baseline 6.24 (+13%) - same-key 8x1: still rejected (76% errors) by the ensure_expected_version strict check upstream of commit_all; follow-up to address. Disjoint's 13% is below the master plan's ≥8× target. Bench shows the coordinator Mutex is now the dominant serializer; relaxing to RwLock for snapshot/version reads is the next perf step, tracked as a follow-up in comparison.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 19:28:36 +02:00
Ragnor Comerford	c15962e6b0	server: flip AppState to Arc<Omnigraph>, wire admission on /change (PR 2 Step F) The substantive PR 2 change. Removes the global server `RwLock<Omnigraph>` that has serialized every mutating request across all actors. Disjoint `(table, branch)` writes from different actors now run concurrently, guarded only by the engine's per-(table, branch) write queue (PR 1b) and per-actor admission control (PR 2 Step E). AppState changes: - `db: Arc<RwLock<Omnigraph>>` -> `engine: Arc<Omnigraph>` - New field: `workload: Arc<workload::WorkloadController>` initialized from env (`OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16`, `OMNIGRAPH_PER_ACTOR_BYTES_MAX=4GiB`, `OMNIGRAPH_GLOBAL_REWRITE_MAX=4`). - `tokio::sync::RwLock` import dropped. Handler updates (16 sites): - All `Arc::clone(&state.db).read_owned().await` and `write_owned()` calls replaced with `let db = &state.engine`. Engine APIs are now `&self` (Step C) so this works directly. - `/export` clones `Arc<Omnigraph>` once and moves into the spawned task instead of acquiring a long-held read lock. - `/change` handler additionally wires `state.workload.try_admit(&actor_arc, est_bytes)`. Cedar runs FIRST so denied requests don't consume admission slots; admission runs SECOND before the engine call. `est_bytes` uses the request body size as a coarse proxy. API surface additions (`api::ErrorCode`): - `TooManyRequests` -> HTTP 429 (per-actor cap exceeded; respect `Retry-After`) - `ServiceUnavailable` -> HTTP 503 (global rewrite pool exhausted) `ApiError` constructors `too_many_requests` / `service_unavailable` and `from_workload_reject` (maps `RejectReason` variants to HTTP status). Other mutating handlers (`/ingest`, `/branches/*`, `/branches/merge`, `/schema/apply`) currently flow through the Arc<Omnigraph> path without admission gates; wiring those is mechanical and lands as a follow-up. The /change hot path covers the bulk of MR-686's load profile. OpenAPI regenerated to include the new ErrorCode variants. 102 lib + 39 server tests + 5 workload tests pass. The regression sentinel `change_conflict_returns_manifest_conflict_409` continues to pass (revalidation perf opt + per-table queue + publisher CAS preserve manifest_conflict semantics under concurrent writers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 17:08:26 +02:00
Ragnor Comerford	17a1665002	server: add WorkloadController for per-actor admission (PR 2 Step E) PR 2 removes the global server `RwLock<Omnigraph>` (Step F). Without admission control, one heavy actor would exhaust shared capacity (Lance I/O threads, manifest churn, network) and starve other actors. The WorkloadController bounds per-actor in-flight count + bytes and provides a global rewrite-pool semaphore for compaction / index builds. New file: `crates/omnigraph-server/src/workload.rs` (~250 LOC + 5 tests). API: - `WorkloadController::new(inflight_cap, byte_cap, rewrite_cap)` / `from_env()` / `with_defaults()`. - `try_admit(actor_id, est_bytes) -> Result<AdmissionGuard, RejectReason>` acquires both an in-flight count permit and adds est_bytes to the per-actor counter atomically; returns RejectReason on either gate. - `try_admit_rewrite() -> Result<RewriteGuard, RejectReason>` for the global rewrite pool (Step F maps RewriteGuard exhaustion to HTTP 503). - `RejectReason::{InFlightCountExceeded, ByteBudgetExceeded, GlobalRewriteExhausted}`. Race-free admission via `tokio::sync::Semaphore::try_acquire_owned()` for the count gate (master plan Finding 6: independent atomic load+check+add lets two callers both pass a cap-N check; the Semaphore gate is atomic). Bytes use `fetch_add` + decrement-on-rejection so the cap is never exceeded even on rollback. Defaults (override via env): - OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16 - OMNIGRAPH_PER_ACTOR_BYTES_MAX=4_294_967_296 (4 GiB) - OMNIGRAPH_GLOBAL_REWRITE_MAX=4 Tests cover under-cap admission, byte-budget rollback, per-actor isolation, global rewrite cap, and the load-bearing 32-concurrent-vs- cap-16 race test (forces real contention via a broadcast release channel so guards can't recycle permits task-by-task; pins the master plan's race-free invariant). Adds workspace dep `dashmap = "6"` for per-actor state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:59:45 +02:00
Ragnor Comerford	1b0a2c9310	staging: skip revalidation single-table; in-memory snapshot multi-table (PR 2 Step D) Closes the PR 1b regression (-17% disjoint, -30% same-key) by eliminating the fresh `db.snapshot_for_branch(branch).await` that PR 1b's commit_all issued per mutation. Single-table mutations (`staged.len() + inline_committed.len() == 1`): skip revalidation entirely. The per-(table, branch) Mutex queue holds exclusive while we commit; the publisher's CAS catches any drift that slipped between expected_versions capture and queue acquisition. Conflict cost: 1 orphan Lance HEAD advance, recovered via the existing sidecar protocol on the next ReadWrite open. This is the same trade-off the master plan §"Revalidation perf optimization" prescribes. Multi-table mutations: replace `db.snapshot_for_branch(branch)` (fresh manifest read) with `db.snapshot()` (in-memory). Correct under MR-686's single-process scope because all in-process tenants share one `Arc<Omnigraph>` -> one coordinator; publishes update the shared coordinator BEFORE releasing queue guards, so a contending tenant reads a fresh in-memory view by the time it acquires its queue keys. The within-mutation race (A captures expected_versions[T2]=V0, B publishes T2->V1 during A's stage I/O, A then acquires T2's queue) is caught via the in-memory check. Multi-coordinator deployments (§VI.27 aspirational) would need force-refresh under the queue — documented in §VI's "Explicit non-commitments". Adds a SAFETY comment naming the two load-bearing premises: (1) per-table queue uses exclusive Mutex (not RwLock), and (2) single-coordinator invariant (one Omnigraph engine per process). Migrating either breaks this skip. Regression sentinel `change_conflict_returns_manifest_conflict_409` passes. 102 lib + 24 runs + 16 staged_writes pass with the new path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:53:51 +02:00
Ragnor Comerford	d08c42c369	engine: convert write APIs from &mut self to &self (PR 2 Step C) The interior-mutability primitives from Step B (catalog ArcSwap, schema_source ArcSwap, coordinator Mutex, and RuntimeCache's existing internal locking) make every Omnigraph engine write API safe to expose under &self. This commit flips the public surface so the HTTP server can hold Arc<Omnigraph> in PR 2 Step F instead of Arc<RwLock<Omnigraph>>. Public API conversions: - mutate, mutate_as - ingest, ingest_as, ingest_file, ingest_file_as - load, load_as, load_file - branch_merge, branch_merge_as - apply_schema - ensure_indices, ensure_indices_on - optimize Inner functions converted in lockstep (their signatures must match the new caller shape): - mutate_with_current_actor, ingest_with_current_actor, load_direct_on_branch - execute_named_mutation, execute_insert, execute_update, execute_delete, execute_delete_node, execute_delete_edge - branch_merge_impl, branch_merge_on_current_target - load_jsonl_reader - schema_apply::{apply_schema, apply_schema_with_lock, acquire_schema_apply_lock, release_schema_apply_lock, ensure_schema_apply_idle} - table_ops::{ensure_indices, ensure_indices_on, ensure_indices_for_branch, commit_prepared_updates, commit_prepared_updates_with_expected, commit_prepared_updates_on_branch, commit_prepared_updates_on_branch_with_expected, commit_manifest_updates, record_merge_commit, ensure_commit_graph_initialized, commit_updates_on_branch_with_expected} - optimize::optimize_all_tables - Omnigraph::commit_manifest_updates, record_merge_commit, commit_updates_on_branch_with_expected, ensure_commit_graph_initialized The conversion is mechanical: callers that previously took `db: &mut Omnigraph` now take `db: &Omnigraph`; every interior mutation goes through the existing locks (coordinator.lock().await, store_catalog, runtime_cache.invalidate_all). No new locks acquired, no new lock-order hazards introduced. 102 lib tests + 24 runs + 30 branching + 63 end_to_end + 39 server tests pass. Workspace compiles clean (1 warning on a now-redundant `mut` binding in CLI; cleaned up in a follow-up). The remaining work in PR 2 is the AppState flip (Arc<RwLock<Omnigraph>> -> Arc<Omnigraph> + WorkloadController), the revalidation perf optimization in commit_all, and the WorkloadController itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:52:02 +02:00
Ragnor Comerford	011f9b9610	engine: wrap coordinator in tokio Mutex (PR 2 Step B continued) Wraps the GraphCoordinator field in `Arc<tokio::sync::Mutex<...>>` so engine APIs can move from `&mut self` to `&self` without giving up the coordinator's mutating refresh path. Lock acquisition order: always before runtime_cache (when both are needed in one scope). Critical sections stay short — load+clone for snapshot/version/current_branch, single-method delegations elsewhere. Public API changes: - `Omnigraph::version()` and `Omnigraph::snapshot()` (pub(crate)) become async; callers add `.await`. - `Omnigraph::active_branch()` returns `Option<String>` (cloned) instead of `Option<&str>` borrowed from the coordinator. Callers either `.await` the result + use `.as_deref()`, or hoist into a binding. `&self`-converted methods this round (tied to the coordinator wrap, not the Step C surface refactor): - `swap_coordinator_for_branch` - `restore_coordinator` (now async; was sync) - `sync_branch` - `refresh` - `refresh_coordinator_only` - `reload_schema_if_source_changed` - `branch_create`, `branch_create_from`, `branch_delete`, `branch_list` - `delete_branch_storage_only` - `ensure_branch_delete_safe` - `ensure_schema_apply_idle` - `ensure_schema_apply_idle` helper in schema_apply.rs (matches signature) Caller updates: branch_create_from_impl threads `restore_coordinator`'s new async signature; schema_apply, table_ops, exec/merge wrap every direct `db.coordinator.X()` in `db.coordinator.lock().await.X()`; exec/merge hoists `active_branch_for_keys` once outside the per-table closure that builds queue keys + sidecar pins. All 102 lib tests + 30 branching + 24 runs + 10 lifecycle + 16 staged_writes + 63 end_to_end pass workspace-wide. Zero test regressions; the only behavior change is on the `Omnigraph` API surface (sync -> async on the three accessors above). Step C (engine API conversion: apply_schema, mutate_as, ingest_as, branch_merge_as &mut self -> &self) follows in a subsequent commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:38:48 +02:00
Ragnor Comerford	fcb47620d3	mr-686: bundle PR 0/1a/1b foundation + PR 2 catalog/schema_source ArcSwap Bundles the working-tree state from the prior session (PR 0 bench harness, PR 1a audit_actor_id removal, PR 1b WriteQueueManager + writer integration) together with the first half of PR 2's interior-mutability foundation (catalog and schema_source wrapped in Arc<ArcSwap<...>>). The two streams intermix in 7 of the same files, so splitting via git add -p was impractical. Subsequent PR 2 steps land as separate atomic commits. PR 0 — server-level concurrent /change bench harness - crates/omnigraph-server/examples/bench_concurrent_http.rs (new) - .context/bench-results/{baseline-main,after-pr1}/ (gitignored) PR 1a — drop the audit_actor_id field, thread per-call - removed Omnigraph::audit_actor_id and the swap-restore patterns in mutation.rs, merge.rs, loader/mod.rs - actor_id: Option<&str> threaded through MutationStaging::finalize, mutate_with_current_actor, ingest_with_current_actor, branch_merge_impl, branch_merge_on_current_target, commit_prepared_updates*, record_merge_commit, commit_updates_on_branch_with_expected - apply_schema and ensure_indices_for_branch pass None (system-attributed) PR 1b — per-(table_key, branch) write queue + revalidation + sidecar - new crates/omnigraph/src/db/write_queue.rs with WriteQueueManager, acquire/acquire_many, sorted+deduped acquisition; 6 unit tests - Arc<WriteQueueManager> field on Omnigraph + db.write_queue() accessor - MutationStaging::finalize split into stage_all (Phase A, no queue) and StagedMutation::commit_all (Phase B, acquire_many + revalidate pins + sidecar + commit_staged); guards held across publisher - delete-only mutations now emit recovery sidecars; revalidation extended to inline_committed tables - branch_merge_on_current_target, apply_schema_with_lock, and ensure_indices_for_branch acquire per-table queues for their touched tables PR 2 Step B (partial) — catalog and schema_source via ArcSwap - catalog: Catalog -> Arc<ArcSwap<Catalog>> - schema_source: String -> Arc<ArcSwap<String>> - public accessors return Arc<Catalog> / Arc<String>; readers bind locally where the borrow has to outlive an expression - new pub(crate) store_catalog / store_schema_source helpers replace the field assignments in apply_schema and reload_schema_if_source_changed - 117 tests across lifecycle/end_to_end/branching/runs pass; engine lib + workspace compile clean Coordinator wrap (Mutex) and the &mut self -> &self engine API conversion follow in subsequent commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:22:38 +02:00
Ragnor Comerford	cd780e2d37	deps: add arc-swap to workspace for PR 2 catalog/schema_source wrapping PR 2 wraps the Omnigraph engine's catalog and schema_source fields in ArcSwap so reads stay zero-cost while apply_schema can swap atomically without &mut self. arc-swap lands as an unused workspace dep here so the follow-up commits that wrap fields can land in isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 15:25:22 +02:00
Ragnor Comerford	c12f6adb0c	docs/invariants: add §VI.35-37 + non-commitments for MR-686 Three new §VI invariants name what OmniGraph commits to as an agent-native system of record: branches as the cross-query coordination primitive, per-query isolation as a per-query opt-in (Serializable up, eventual down), and type-aware agent-resolvable merges. Plus an explicit non-commitments subsection so reviewers see what is intentionally out of scope (Strict Serializable across queries, cross-process linearizable single-object writes, auto-resolution of ambiguous merge conflicts). §VII and §VIII renumber by +3 to make room (35-43 -> 38-46, 44-47 -> 47-50); deny-list and review-checklist references in §IX/§X follow. testing.md's pre-existing stale §VII.33/34/36 references resolve to their actual §VIII.47/48/50 targets in the same pass. staged_writes.rs:866's docstring gains an MR-686 forward reference so the load-bearing concurrency-hazard test points readers at the queue work that closes the gap. §VI.34 is preserved alongside the broader §VI.36 to keep its MR-425 pointer addressable; the overlap is documented in §VI.36's status line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 14:45:54 +02:00
Ragnor Comerford	a30666bc38	docs/tests: reserve Phase A/B/C/D for the per-writer recovery flow Three terminologies were calling themselves Phase A/B in PR #72: 1. Per-writer recovery (canonical, four phases A/B/C/D — sidecar / commit_staged loop / manifest publish / sidecar delete in `docs/runs.md:157`). 2. Per-table staged-write contract from MR-793 (two phases — `stage_*` then `commit_staged`). 3. Test-narrative scaffolding (Phase A = setup the failure, Phase B = verify recovery — used as section dividers in failpoints.rs). Same letters, three meanings; three reviewers including the bots have already misread the code in the resulting fog. This change keeps "Phase A/B/C/D" exclusively for #1 and rewrites the other two: - `ensure_indices_phase_a_btree_failure_leaves_existing_tables_writable` → `ensure_indices_stage_btree_failure_leaves_existing_tables_writable` (matches the `stage_create_btree_index` API verb). - Comment at `table_ops.rs:610` and the test docstring at `failpoints.rs:807` rewrite "a Phase A failure in the staged-index path" → "a stage-step failure in the staged-index path". - Twelve `// Phase A:` / `// Phase B:` test scaffolding comment headers in `failpoints.rs` (across six test fns) become `// Setup:` / `// Recovery:`. - A "Phase letter convention" note added near `docs/runs.md:165` spells the rule out for future readers. Also bundled: rename `composite_flow_init_load_branch_merge_time_travel_optimize_cleanup` → `composite_flow_canonical_lifecycle` so it pairs as a story name with `composite_flow_multi_branch_sequential_merges` (the previously- deferred symmetry rename). No behaviour change. Both renamed tests pass; full failpoints (18) + composite_flow (2) suites pass; workspace baseline + clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:46:03 +02:00
Ragnor Comerford	fb0f024652	recovery: register added tables + tombstones in SchemaApply roll-forward Cursor flagged that SchemaApply sidecars only captured `Update` pins (via `snapshot.entry()?` in schema_apply.rs:166), so recovery's `roll_forward_all` only published `ManifestChange::Update` for the rewritten/indexed tables. Added types (`added_tables`) and tombstones (`renamed_tables` sources) were silently dropped during recovery. Reproducer: in `schema_apply_phase_b_failure_recovered_on_next_open`, the v2 schema added a `Tag` node type. Pre-fix, `node:Tag` ended up as an orphan dataset on disk while the manifest never received a `RegisterTable` entry — the live `_schema.pg` declared a type the manifest didn't know about, and `count_rows(node:Tag)` panicked with `no manifest entry for node:Tag`. The existing test passed only because it never queried Tag. Fix: 1. Extend `RecoverySidecar` with `additional_registrations` and `tombstones` fields (optional, serde-default for backward compat with existing on-disk sidecars). Both are SchemaApply-only. 2. Populate them in `apply_schema_with_lock` from the migration plan's upfront diff (`added_tables` + `renamed_tables` keys for registrations; `renamed_tables` values for tombstones, version- pinned at `source_entry.table_version + 1`). 3. Update `roll_forward_all` to: - emit `RegisterTable` + `Update` for each `additional_registrations` entry (read the dataset's current Lance HEAD for the version metadata + row_count) - emit `Tombstone` for each `tombstones` entry - filter against `snapshot` so previously-published registrations / tombstones are skipped (handles the post-Phase-C-success-but- sidecar-not-yet-deleted case — without filtering, the publisher's CAS pre-check would error with `expected=0, actual=N` on the redundant Register) 4. Extend the audit-row outcomes to include published registrations. Test changes: - `schema_apply_phase_b_failure_recovered_on_next_open` now asserts `count_rows(node:Tag) == 0` (no panic), proving the new manifest entry exists. - `schema_apply_recovers_pre_commit_crash` renamed to `schema_apply_pre_commit_crash_rolls_forward_via_sidecar` and rewritten — pre-fix it expected pre-commit crashes to roll BACK (delete staging, keep V1, leave Company as orphan); the sidecar protocol's "complete the writer's intent" semantic now rolls FORWARD (rename staging -> final, register Company atomically). The new assertions verify schema = V2 and `node:Company` is queryable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:15:50 +02:00
Ragnor Comerford	3ea7a1fd50	recovery: record RolledForward audit on stale-after-success sidecar Cursor flagged that if `roll_forward_all` succeeds (manifest pin advances) but `record_audit` then fails, the sidecar persists. On the next open, every table classifies as NoMovement (lance_head == manifest_pinned, both already reflect the prior roll-forward) → `decide` returns RollBack → `roll_back_sidecar` records a RolledBack audit row with empty per-table outcomes. Operators reading `_graph_commit_recoveries.lance` see "RolledBack" for an operation whose actual outcome was a successful roll-forward. `process_sidecar`'s RollBack arm now distinguishes "stale-after- success" from a legitimate rollback: when every classification is NoMovement AND any pin's `manifest_pinned > expected_version` (the manifest already advanced past the writer's CAS target), recovery dispatches to `record_audit_recovery_rollforward` which writes a RolledForward audit row with reconstructed outcomes (`from_version = expected_version`, `to_version = manifest_pinned`) and deletes the sidecar. No Lance writes — the substrate is already in the post-roll-forward state. Safe in `RollForwardOnly` mode (refresh-time recovery) because no `Dataset::restore` is involved; the legitimate-rollback path stays deferred to the next ReadWrite open as before. Added `recovery_records_rolled_forward_for_stale_sidecar_after_successful_roll_forward` integration test that synthesizes the state by writing a sidecar whose `expected_version < manifest_pin` and asserts: - audit row records `RolledForward` (not `RolledBack`) - per-table outcome reports the correct `from_version` / `to_version` pair - sidecar is deleted Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 20:12:43 +02:00
Ragnor Comerford	11a9b3c8b9	tests: assert actual total_people count, not just row count Cubic flagged that the time-travel `total_people` assertions only checked `result.num_rows() == 1`, which would still pass if the historical query returned the wrong count (e.g., 10 instead of 6 because of a planner regression resolving against current state instead of the captured snapshot). Added `assert_total` helper that extracts the Int64 `total` column and verifies the actual value. Replaces three weak `num_rows() == 1` assertions in `composite_flow_multi_branch_sequential_merges`: - post-both-merges: total = 10 - time-travel to pre-merge-a: total = 6 - post-reopen: total = 10 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 20:12:27 +02:00
Ragnor Comerford	0a6f3d796a	tests: extend multi-branch flow with .gq query checkpoints The first cut of `composite_flow_multi_branch_sequential_merges` used dataset-direct `count_rows` for read-side assertions, which proves data is on disk but skips the query path entirely — planner, BTree index lookup, edge traversal, aggregation, and snapshot resolution all stay untested. Replaced with strategic `.gq` query checkpoints: - branch isolation via `get_person` after Eve insert (Eve visible on feat-a; absent on main) - 1-hop traversal via `friends_of(Grace)` after the Knows-edge insert (validates the topology index against branch-local edges) - post-merge query-engine readback after merge feat-a → main (Eve findable through BTree, Grace's edge traversable through the rebuilt Knows index) - aggregation via `total_people` after merge feat-b → main (count over a multi-fragment table whose shape is the result of two sequential merges) - time-travel via `ReadTarget::Snapshot(captured_id)` for both `total_people` and `friends_of` / `get_person` at the two pre-merge points (catches planner regressions where historical queries accidentally resolve current indices) - post-reopen query-engine readback (catches reopen-time index/ catalog binding regressions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 19:42:17 +02:00
Ragnor Comerford	9fc6526ec0	tests: multi-branch sequential merges compositional flow Adds `composite_flow_multi_branch_sequential_merges` covering the agent-workflow pattern that single-merge tests in `branching.rs` cannot reach: two feature branches diverging from main with main writes interleaved between every diverge point, sequential merges into main, time-travel through the resulting merge DAG, and reopen consistency over a multi-merge history. The script (18 numbered steps with assertions per step): init+load → mutate main → branch feat-a → mutate main → mutate feat-a → branch feat-b → mutate feat-b → mutate feat-a (with edge) → merge feat-a → mutate main → merge feat-b → time-travel to pre-merge-a + pre-merge-b → reopen + verify. Catches eight compositional gap categories that only surface with ≥2 merges and main mutations between them: base/LCA recomputation across two merges, manifest-pin propagation through merge commits, time-travel through merge DAG without state bleed-through, branch- DAG consistency, sibling-branch isolation under writes elsewhere, post-merge main-write integration, multi-merge reopen replay, and clean-flow recovery-sidecar absence. `composite_flow.rs` was added to `docs/testing.md` so the before- every-task checklist points agents at the file before duplicating coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 19:34:04 +02:00
Ragnor Comerford	58a3ff0e48	recovery: align merge sidecar branch with active_branch + record rollback drift Two small PR #72 review findings addressed: - merge.rs sidecar pin recorded `entry.table_branch` (where the table currently lives in the target manifest) instead of the merge target branch where commits actually land via `publish_rewritten_merge_table` → `open_for_mutation` → `fork_dataset_from_entry_state`. Recovery's `open_lance_head` would then check the wrong ref. Aligned with the pattern already used in `ensure_indices_for_branch` (table_ops.rs:115). Added `branch_merge_sidecar_pins_table_branch_to_active_branch` contract test that reads the sidecar JSON and asserts every per-pin `table_branch` equals the active (target) branch — catches the regression even when the values happen to coincide in the test setup. - Rollback audit `from_version` previously equalled `to_version` (both `manifest_pinned`), telling operators nothing about the actual Lance HEAD drift before restore. Captured `lance_head` in `ClassifiedTable` and used it as `from_version` so audit rows now show "rolled back from v7 to v5" instead of "v5 → v5". Added `assert_rollback_outcomes_record_drift` invariant in the test helper, invoked automatically by every `RecoveryExpectation::RolledBack` test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 19:33:32 +02:00
Ragnor Comerford	815ff743f5	recovery: refresh-time roll-forward closes the in-process residual + invariants helper Bundle of three correctness fixes plus a shared invariants helper that existing tests now use. 1. SchemaApply atomicity: close the residual gap where a sidecar exists but staging files don't (e.g., Phase B failure BEFORE `_schema.pg.staging` write). `recover_schema_state_files` now returns a `SchemaStateRecovery` discriminator (`Noop` / `CleanedStaging` / `CompletedStagingRename { schema_apply_sidecar }`); the token threads through `recover_manifest_drift` → `process_sidecar`. SchemaApply sidecars are eligible for roll-forward ONLY when the staging rename completed in the same recovery pass. Full mode rolls back; RollForwardOnly defers. Without this, recovery would publish the manifest pin against new-schema data while `_schema.pg` stayed old (real corruption). New failpoint `schema_apply.before_staging_write` + new test `schema_apply_without_schema_staging_rolls_back_on_next_open` pin the gating. 2. Rollback target correction. Rollback now restores Lance HEAD to the current manifest pin (`state.manifest_pinned`) instead of the sidecar's `expected_version`. For UnexpectedAtP1/UnexpectedMultistep classifications these can differ; the old code could regress Lance HEAD past the manifest pin, re-introducing drift in the OTHER direction. The new behavior establishes `Lance HEAD == manifest pin` post-rollback — the canonical drift-free invariant. Param renamed from `expected_version` → `target_version` to match. Audit `to_version` records the actual restore target. This is a latent-behavior change. Any external consumer that compared `audit.to_version` against `sidecar.expected_version` for non-trivial classifications now sees the manifest pin instead. 3. Audit commit-graph unification. `record_audit` now opens the per-branch commit graph for ANY sidecar with `sidecar.branch.is_some()` — not just BranchMerge. Plain Mutation/Load/EnsureIndices commits on a feature branch now correctly land on that branch's commit graph, instead of main's. Closes the class of bug analogous to D2 but for non-merge writers. Pre-existing repos with non-main commits already on main's commit graph stay where they are; future recoveries write to the per-branch ref. Mixed-version compatibility is asymmetric but safe (old binaries ignore per-branch refs they don't know about; new binaries read both). 4. Recovery invariants helper + branch-axis cells. New `tests/helpers/recovery.rs` (~505 LOC) exports `assert_post_recovery_invariants(repo, op_id, RecoveryExpectation)` plus a `TableExpectation` builder. Six existing recovery tests refactored to call it; per-test bespoke assertions replaced. Two new branch-axis cells added in `tests/failpoints.rs`: - `recovery_rolls_forward_load_on_feature_branch` - `recovery_rolls_forward_ensure_indices_on_feature_branch` The loader gains a `mutation.post_finalize_pre_publisher` failpoint hook (gated on the `failpoints` feature; zero-cost in release) so the load test can pin the same Phase B → Phase C boundary the mutation path uses. Misc: - `Omnigraph::refresh` extracts `reload_schema_if_source_changed`: early-return when schema source unchanged (saves IR parse + catalog rebuild on the steady-state refresh path). - New test injection point `failpoint_publish_table_head_without_index_rebuild_for_test` under `#[cfg(feature = "failpoints")]`. Tests: 31 recovery + failpoint integration tests pass (14 + 17, up from 14 + 16). Full workspace sweep with `--features failpoints` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 16:04:48 +02:00
Ragnor Comerford	44c0d0bc4b	recovery: refresh reloads schema after staging recovery; non-main merge test pins parent_commit_id E1. After D3 added recover_schema_state_files to refresh(), the in-memory `self.schema_source` and `self.catalog` were left stale: a SchemaApply sidecar processed via refresh would rename the staging files (`_schema.pg`, IR contract) into place but the handle continued operating against the old catalog. Subsequent operations would surface schema mismatches against post-migration data on disk. Fix: after recover_manifest_drift completes, refresh() now mirrors open_with_storage_and_mode's schema-load sequence — re-reads `_schema.pg`, parses IR via load_or_bootstrap_schema_contract, rebuilds the catalog with fixup_blob_schemas, and assigns into self.schema_source / self.catalog. Steady-state cost: one read + one parse per refresh; only mutates handle state when the on-disk schema actually changed. E2. The non-main branch_merge recovery test (`branch_merge_phase_b_failure_recovered_on_non_main_target`) asserted only `merged_parent_commit_id` was non-null — but `merged_parent_commit_id` is independently populated from sidecar.merge_source_commit_id (the SOURCE branch's tip), so the assertion would pass even if D2's per-branch CommitGraph fix regressed (the bug was about `parent_commit_id`, the TARGET branch's tip). Fix: capture target_branch's commit-graph head BEFORE the failed merge by scanning target_branch's Lance ref on _graph_commits.lance and picking the latest commit by created_at. After recovery, find the recovery merge commit (the one with non-null merged_parent_commit_id) and assert its `parent_commit_id` == captured pre-failure head. Without D2, recovery would record the GLOBAL head (the source_branch's insert-Carol commit on this test) instead, and the assertion fails. Also fixes the column-type cast: created_at is stored as TimestampMicrosecondArray, not Int64Array. All workspace tests pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:06:17 +02:00
Ragnor Comerford	2ce4efc450	recovery: four review-round-4 fixes + branch-axis test matrix D1. roll_forward_all returns per-table actual published versions; the audit row's `to_version` records that, not pin.post_commit_pin (the latter is a lower bound for loose-match writers SchemaApply / EnsureIndices / BranchMerge — pin.post_commit_pin = expected + 1 while actual published HEAD can be expected + N). D2. Branch-merge recovery audit uses CommitGraph::open_at_branch when sidecar.branch is Some, so the merge parent is the TARGET BRANCH's tip (not the global head). Without this, recovered branch_merge on a non-main target records the wrong merged_parent_commit_id and future merges between the same pair lose already-up-to-date detection / merge-base correctness. D3. Omnigraph::refresh now mirrors open's recovery composition: runs recover_schema_state_files BEFORE recover_manifest_drift. Without this, a SchemaApply sidecar processed via refresh would publish the manifest + delete the sidecar without renaming the staging schema files, leaving the repo with new-schema data and old `_schema.pg` (real corruption). Refresh's docstring now enumerates each open-time recovery step it maintains, so the next maintainer's diff between open() and refresh() is trivial. D4. ensure_indices sidecar pin records `active_branch` (where commits actually land), not `entry.table_branch` (where the table currently lives). On first fork-on-write, the processing loop's open_owned_dataset_for_branch_write forks to active_branch and the commit lands there — recovery's open_lance_head must check the same branch. Without this, recovery checks the wrong ref and misses Phase B drift entirely. D5. Two new branch-axis tests: * recovery_rolls_back_feature_branch_sidecar_against_feature_branch — feature-branch rollback variant; asserts post-recovery audit kind == RolledBack and the actual restore commit landed on the feature ref. * branch_merge_phase_b_failure_recovered_on_non_main_target — non-main merge target variant; reads the target branch's commit graph (Lance ref) and asserts the recovery commit has a non-null merged_parent_commit_id (pins D2). Bug pattern: all four are at composition seams between concepts that were each tested individually (writer-precision × actual-Lance-HEAD; branch-context × commit-graph-API; recovery-path × writer-kind; pin- time-branch × commit-time-branch). The branch-axis matrix is the cheapest mechanical prevention for D2/D4-class regressions. All workspace tests pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 11:34:18 +02:00
Ragnor Comerford	aaa031e834	recovery: refresh-time roll-forward closes the in-process residual Adds RecoveryMode { Full, RollForwardOnly } and wires Omnigraph::refresh to invoke roll-forward-only recovery. This closes the documented "long-running server between Phase B failure and process restart" residual without requiring a restart, for the common case (mutation / load finalize → publisher failure). Why roll-forward only and not full sweep: * Roll-forward is safe under concurrency (publisher uses row-level CAS). * Roll-back uses Dataset::restore, which "wins" against concurrent Append/Update/Delete/CreateIndex/Merge per check_restore_txn — silently orphaning the concurrent writer's commit (pinned by tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning). Sidecars that classify as RollBack-eligible are LEFT ON DISK for the next ReadWrite open, where no concurrent writers exist and full restore is safe. Implementation: * recovery.rs: RecoveryMode enum; recover_manifest_drift takes mode; process_sidecar branches on mode for Abort and RollBack — both defer to next ReadWrite open under RollForwardOnly. RollForward behavior unchanged. * omnigraph.rs: Omnigraph::refresh promoted to pub; calls recover_manifest_drift in RollForwardOnly mode after coordinator refresh. Steady-state cost: one list_dir of __recovery (early return on empty). Adds refresh_coordinator_only — pub(crate) — for engine-internal callers that hold an in-flight sidecar (the schema_apply lease-check + lock-release paths). Without this split, refresh would race the in-flight sidecar. * schema_apply.rs: switch all 6 internal db.refresh() call sites to refresh_coordinator_only(). Tests: * refresh_runs_roll_forward_recovery_in_process — trigger mutation.post_finalize_pre_publisher; without restart, call db.refresh(); assert sidecar deleted, drifted row visible, subsequent mutation succeeds. * refresh_defers_rollback_eligible_sidecar_to_next_open — synthesize a Mutation sidecar with bogus expected (UnexpectedAtP1 → RollBack); refresh leaves it on disk and Lance HEAD unchanged; drop and reopen runs the full sweep which advances HEAD via restore. Docs: * docs/runs.md "Long-running servers" caveat updated to describe the refresh-time roll-forward path and the rollback-defer behavior. * docs/invariants.md §VI.23 status line updated to reflect in-process closure of the common case. Workspace tests pass with --features failpoints; no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 00:15:42 +02:00
Ragnor Comerford	8c6506f5cd	recovery: close four correctness gaps (schema-apply, branch-aware, restore short-circuit, merge parent) B1. Schema-apply atomicity. Before this commit, a failure between `_schema.pg.staging` write and the manifest publish left the repo corrupt: Lance HEADs advanced under the new schema, manifest stayed at old pins, and on reopen schema-state recovery deleted the staging files (manifest's table set still matched the live schema), then manifest-drift recovery rolled the table versions forward — leaving new-schema data on disk with the old `_schema.pg` live. Fix: a SchemaApply sidecar is the marker that Phase B completed but Phase C didn't. New helper `has_schema_apply_sidecar` is consulted by `recover_schema_state_files` BEFORE its disambiguation logic; when present, it completes the staging→final rename so the subsequent manifest-drift roll-forward sees the new catalog. B2. Branch-aware recovery. Sidecars from feature-branch writers were being classified against main's snapshot and main's Lance HEAD, silently no-op'ing or rolling back the wrong table version (the classifier saw NoMovement; the writer's drift on the feature branch persisted; subsequent feature writers surfaced ExpectedVersionMismatch). Fix: SidecarTablePin gets an optional `table_branch` field; `recover_manifest_drift` opens a per-branch coordinator (`GraphCoordinator::open_branch`) per sidecar; `open_lance_head`, `restore_table_to_version`, and `roll_forward_all` honor the pin's branch via `Dataset::checkout_branch`. B3. Remove fragment-id short-circuit in `restore_table_to_version`. Equal fragment IDs do NOT imply equal content: Lance index commits and deletion-vector updates change the manifest without touching fragment IDs. Skipping restore in those cases would leave Lance HEAD ahead of the manifest with no recovery artifact left. Restore is now unconditional; pile-up under repeated mid-rollback crashes bounded and reclaimed by `omnigraph cleanup`. B4. Recovered branch_merge records merge parent. `record_audit` always called `append_commit`, dropping `merged_parent_commit_id`. Future `branch_merge feature -> main` between the same pair lost already-up-to-date detection. RecoverySidecar gets an optional `merge_source_commit_id`; `branch_merge_on_current_target` populates it from `source_head_commit_id`; `record_audit` dispatches to `append_merge_commit` when present. New tests: feature-branch sidecar classification (B2); B1 deepens the existing schema_apply test with live-`_schema.pg` and new-type assertions; B4 deepens the existing branch_merge test by reading `_graph_commits.lance` and asserting a non-null `merged_parent_commit_id`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 23:39:41 +02:00
Ragnor Comerford	35c4b16e91	recovery: address five outstanding review findings A1. tests/recovery.rs: rewrite recovery_multi_sidecar_requires_fresh_snapshot_for_correctness to use real `append_batch` instead of fragment-preserving `delete_where("1 = 2")`. The previous setup made restore_table_to_version's fragment-set short-circuit no-op the bug path, so the load-bearing `HEAD == v3` assertion passed in both bug and fix paths. Real appends produce different fragment-id sets across v1, v2, v3 so a real restore actually runs in the bug path (HEAD becomes v4). Added a person_batch helper matching the post-init Lance schema (id, age, name). A2. exec/merge.rs: filter recovery sidecar pins to `RewriteMerged` candidates only. `AdoptSourceState`'s pure-pointer-switch and fork subcases don't advance Lance HEAD; pinning them would force NoMovement on recovery and trigger an all-or-nothing rollback that destroys legit RewriteMerged work. Documented residual: AdoptSourceState subcases that internally call publish_rewritten_merge_table aren't covered by the sidecar; closing that requires pre-computing source deltas during candidate classification (a structural change to CandidateTableState) — left as follow-up. A3. db/omnigraph/table_ops.rs: add the same branch filter (`active_branch.is_some() && entry.table_branch.is_none() => continue`) to the ensure_indices sidecar pin loop that the processing loop already has. Without this, main-branch tables that need index work get pinned but never committed when ensure_indices runs on a feature branch → NoMovement → all-or-nothing rollback destroys feature-branch work. A4. tests/failpoints.rs: deepen schema_apply_phase_b_failure and branch_merge_phase_b_failure tests with post-recovery manifest-pin advance assertions. branch_merge test setup also mutates main so the merge produces at least one RewriteMerged candidate (required after A2's pin filter — a no-op merge with all-AdoptSourceState would write no sidecar). Fixed stale "BranchMerge is strict-classified" comment to reflect current loose classification. A5. tests/composite_flow.rs: remove duplicate back-to-back `total_people` query in step 12. Full workspace test sweep with --features failpoints passes: no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 15:09:58 +02:00
Ragnor Comerford	05e52f2ee0	recovery: rename composite test, strip ticket references, address review Three bundled changes: 1. Rename `tests/agent_lifecycle.rs` -> `tests/composite_flow.rs` (and the test function). OmniGraph is consumed by both humans and agents - naming the test after one audience misframes the library. 2. Strip Linear ticket IDs, PR numbers, bot reviewer names, and review-round labels from source, tests, and docs added by this branch. Internal traceability belongs in commit messages and PR descriptions, not in checked-in artifacts. Upstream lance-format/lance issue refs and pre-existing MR-XXX refs in docs not touched by this branch are left alone. 3. Two outstanding review findings addressed: - `needs_index_work_node` / `needs_index_work_edge`: propagate `count_rows` errors instead of `unwrap_or(0)`. Silently treating transient I/O failures as "0 rows" risked skipping a table from the recovery sidecar pin set that was actually about to be modified. - `recovery_multi_sidecar_requires_fresh_snapshot_for_correctness`: strengthen the assertion to fail when sidecar B classifies under a stale snapshot. The new assertion checks post-recovery Lance HEAD == v3 (no `Dataset::restore` ran). The previous "sidecar deleted + audit rows present" pair passed in both the bug and fix paths because both delete the sidecar and write an audit row; the differentiator is the post-recovery HEAD. Strengthening the assertion exposed an additional nuance: in this overlapping- sidecar scenario sidecar B's audit kind is RolledBack (no-op) rather than RolledForward, since sidecar A's roll-forward publishes Lance HEAD as the new manifest pin (absorbing B's work). The docstring now explains why this is correct given current `roll_forward_all` semantics. All workspace tests pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:56:36 +02:00
Ragnor Comerford	78cc548846	tests: composite agent-lifecycle integration test (MR-858) Implements MR-858 ahead of the rest of the MR-857 epic: the deterministic narrative test counterpart to MR-783's randomized harness. `tests/agent_lifecycle.rs::agent_lifecycle_init_load_branch_merge_time_travel_optimize_cleanup` walks the canonical agent flow end to end: 1. init repo with TEST_SCHEMA 2. load_jsonl seed data (4 Person + 2 Company nodes; Knows + WorksAt edges) 3. branch_create feature off main 4. mutate on feature: single-statement insert (Eve) + multi-statement insert+edge (Frank knows Eve) 5. query on feature: total_people / friends_of (traversal) / unemployed (anti-join) / friend_counts (aggregation) 6. mutate on main (set Bob's age) — sets up non-conflicting merge 7. branch_merge feature → main; verify version advance 8. query post-merge: confirm Eve visible on main (from feature) + Bob visible (from main mutation, carried through merge) 9. snapshot_at_version(pre_merge_version): time-travel still sees pre-merge state (4 Persons, no Eve) 10. optimize the post-merge graph; verify reads still work + counts unchanged 11. cleanup with --keep 10 --older-than 3600s (no-op for this short test, but exercises the call path) 12. drop + reopen; verify all counts + branch list consistent; confirm read path works post-cleanup-reopen Known limitation surfaced: post-optimize mutation path in step 11 hit `ExpectedVersionMismatch` because `optimize_all_tables` advances per-table Lance HEAD without updating the `__manifest` pin (`db/omnigraph/optimize.rs:77`), and something between optimize and re-open writes a higher version row to `__manifest`. Test documents this and defers full coverage to MR-859 (`omnigraph optimize` + `cleanup` integration coverage), keeping the read-path-after-cleanup assertion which is the headline operator concern. Test runs in <1s. ~672 workspace tests pass with --features failpoints; no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:10:28 +02:00
Ragnor Comerford	26b4c61d44	recovery: address PR #72 round-2 review findings Bot reviewers (cubic + cursor) flagged 5 follow-on issues after the first fix push. Three are real bugs in the Phase 6-8 ensure_indices sidecar wiring; two are AI-slop flags on shallow tests. One cursor finding is a false positive on intentional node/edge index asymmetry. Real bugs fixed: - needs_index_work_node and needs_index_work_edge now skip empty tables (count_rows == 0). The ensure_indices_for_branch loop has `if row_count > 0 { build_indices(...) }`, so empty tables produce zero commit_staged calls. Pinning them in the sidecar would force NoMovement classification on recovery and trigger the all-or-nothing rollback of any sibling table's legitimate index work (cubic #1). - needs_index_work_node and needs_index_work_edge now respect the table_branch parameter from the snapshot entry, instead of always passing None (== main). For branch writes, opening the wrong HEAD could miss recoverable Phase B commits (cubic #2). - needs_index_work_edge documented as intentionally BTree-only (mirrors the build_indices_on_dataset_for_catalog edge branch which only builds id/src/dst BTrees). Cursor flagged FTS/vector omission as inconsistency with the node helper; confirmed intentional via inline comment so future readers know the asymmetry is on purpose (cursor finding, false positive marked). Test improvements: - recovery_multi_sidecar_requires_fresh_snapshot_for_correctness — new integration test that uses TWO sidecars on the SAME table where sidecar B's expected_version equals sidecar A's post_commit_pin. Sidecar B's classification only succeeds if the recovery sweep refreshes the snapshot between iterations to see A's manifest update. Without the refresh fix from the prior commit, B would be classified against stale pins (cubic #4 follow-up). - recovery_ensure_indices_handles_empty_tables — new integration test that runs ensure_indices on an all-empty repo. With the round-2 fix, both initial and steady-state runs leave no sidecar (zero pins ⇒ zero sidecar I/O). Without the empty-table fix, the sidecar would pin Company (zero rows but missing indices) and force a NoMovement rollback (cubic #1 verification). - ensure_indices_phase_b_failure_does_not_leak_sidecar_when_no_work_needed — renamed/rewrote the prior `_recovered_on_next_open` test to assert the post-fix invariant: when load_jsonl auto-built every catalog index via prepare_updates_for_commit, ensure_indices's needs_work helpers correctly report zero pins and produce no sidecar. The old assertion ("exactly one sidecar must persist") was wrong for the scoped behavior. Test surface (post-round-2): - 25 unit tests in db::manifest::recovery (BranchMerge classifier, sort order, primitives — unchanged). - 12 integration tests in tests/recovery.rs (+2 from this commit). - 11 failpoint tests including the four per-writer Phase B → recovery tests (one renamed to reflect the scoped behavior). - ~672 workspace tests pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:50:33 +02:00
Ragnor Comerford	164bafbbe7	recovery: address PR #72 review findings Bot reviewers (cubic, cursor, chatgpt-codex) caught 4 merge-blocking bugs + 3 strongly-recommended fixes + 3 doc errors in the initial PR. Each fix has a paired test demonstrating the bug before the fix. Merge-blocking fixes: - BranchMerge moved to loose-match classifier arm. publish_rewritten_ merge_table runs multiple commit_staged calls per table (merge_insert + delete_where + index rebuilds). Strict classification rolled back valid completed Phase B work as UnexpectedMultistep. Three new unit tests pin the loose-match behavior for BranchMerge. - branch_merge sidecar uses self.active_branch() (the resolved target branch) instead of inferring from the first sorted table key. The previous heuristic could record None (== main) when the merge target was a non-main branch, causing recovery to publish to the wrong manifest namespace. - Best-effort sidecar delete in all 5 writer sites (mutation, loader, schema_apply, branch_merge, ensure_indices). Previously, a sidecar cleanup failure after a successful manifest publish would error out the user's call for a write that already landed. Now: log a warning and ignore — the next open's recovery sweep tidies the stale sidecar via NoMovement classification. - ensure_indices sidecar scoped to tables that need work via new helpers needs_index_work_node / needs_index_work_edge. Previously the sidecar pinned every catalog table; if only one needed indexing, the others classified as NoMovement and the all-or-nothing decision rolled back legitimate index work. Strongly-recommended fixes: - recover_manifest_drift now takes &mut GraphCoordinator and refreshes between sidecars. Sidecar B's classification needs to see sidecar A's manifest changes, otherwise B can be classified against stale pins and incorrectly roll back work that just landed. - list_sidecars sorts URIs before reading. Sidecar filenames are ULIDs (chronologically sortable), so this gives deterministic, time-ordered processing. Filesystem-order was nondeterministic. - ReadOnly opens skip recover_schema_state_files too (was: only the MR-847 sweep was gated). Read-only consumers may run with read-only credentials; silent open-time mutations violate the contract. Doc cleanups: - Removed stale "Phase 4 placeholder" comment from recover_manifest_drift. - docs/runs.md decision-tree wording now correctly surfaces the InvariantViolation abort path. - docs/branches-commits.md clarifies actor_id is in _graph_commit_actors.lance (joined by graph_commit_id), not on _graph_commits.lance itself. Test surface (post-fixes): - 25 unit tests in db::manifest::recovery (+4 from this commit). - 10 integration tests in tests/recovery.rs (+3 from this commit). - ~672 tests across ~25 binaries pass with --features failpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:21:40 +02:00
Ragnor Comerford	72d3da66de	recovery: per-writer Phase B failure → recovery integration tests (Phase 9) Add the three paired per-writer tests required by MR-847's acceptance criteria — "All four migrated writers ... have paired Phase B → recovery integration tests." Production additions (~10 LOC): - New failpoint `branch_merge.post_phase_b_pre_manifest_commit` in `exec/merge.rs::branch_merge_on_current_target` between the per-table publish loop and `commit_manifest_updates`. - New failpoint `ensure_indices.post_phase_b_pre_manifest_commit` in `db/omnigraph/table_ops.rs::ensure_indices_for_branch` between the per-table loops and `commit_prepared_updates_on_branch`. - For schema_apply, the existing `schema_apply.after_staging_write` failpoint already fires in the right window (after the per-table rewrites + index builds, before the manifest publish). Sidecar tweak: - `schema_apply` sidecar's `branch` is now `None` (was `Some("__schema_apply_lock__")`). The lock branch is purely a serialization sentinel; `coordinator.commit_changes_with_actor` publishes against the coordinator's pre-lock branch (main). After the failpoint fires, `release_schema_apply_lock` removes the lock branch — if the sidecar referenced it, the recovery sweep would try to publish to a branch that no longer exists and fail. Fix: record the actual publish target. Tests added in `tests/failpoints.rs` (~280 LOC): - `schema_apply_phase_b_failure_recovered_on_next_open` — seeds a row, opens, attempts a schema apply that adds a new node type + a new property (the new type ensures the table set differs so `recover_schema_state_files` doesn't trip on property-only ambiguity), failpoint fires, drops engine, reopens, asserts sidecar deleted + audit row recorded. - `branch_merge_phase_b_failure_recovered_on_next_open` — seeds main, branches off, mutates the branch, attempts merge with the `branch_merge.post_phase_b_pre_manifest_commit` failpoint active. Same recovery shape. - `ensure_indices_phase_b_failure_recovered_on_next_open` — seeds rows, attempts ensure_indices with the `ensure_indices.post_phase_b_pre_manifest_commit` failpoint active. After this commit, all four migrated writers have paired Phase B → recovery tests: - mutate_as / load: `recovery_rolls_forward_after_finalize_publisher_failure` (Phase 5) - schema_apply: `schema_apply_phase_b_failure_recovered_on_next_open` - branch_merge: `branch_merge_phase_b_failure_recovered_on_next_open` - ensure_indices: `ensure_indices_phase_b_failure_recovered_on_next_open` 11 failpoint tests pass; full workspace lib + integration tests pass (350+ tests across 20 binaries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:24 +02:00
Ragnor Comerford	c6827919ca	recovery: wire sidecar into schema_apply, branch_merge, ensure_indices (Phases 6-8) Three writers each follow the same shape established in Phase 5: build SidecarTablePin list before the per-table commit_staged loop, write the sidecar via recovery::write_sidecar, do the existing work, delete the sidecar after the manifest publish succeeds. Loose-match classifier (recovery.rs): The classifier now distinguishes strict vs. loose match per SidecarKind. Strict (Mutation, Load, BranchMerge): exactly one commit_staged per table; lance_head == manifest_pinned + 1 AND post_commit_pin == lance_head required. Loose (SchemaApply, EnsureIndices): the writer may run N >= 1 commit_staged calls per table — index builds + rewrites compound, and the exact N is hard to compute at sidecar-write time. Loose accepts any lance_head > manifest_pinned (with expected_version still matching the manifest pin) as RolledPastExpected. The risk it admits — an external agent advancing HEAD between sidecar write and recovery — is out of scope for the single-coordinator model (MR-668 territory). roll_forward_all now reads the CURRENT Lance HEAD per table (not the sidecar's post_commit_pin) so the manifest publish reflects whatever HEAD landed, even if the loose-match writer committed multiple times per table. Per-writer wiring: - schema_apply::apply_schema_with_lock: sidecar covers rewritten_tables ∪ indexed_tables (the tables that go through stage_overwrite/stage_create_index commit_staged). Skips added_tables (fresh datasets, no Phase B residual class) and renamed_tables (handled by the existing schema-state staging recovery in recover_schema_state_files). - branch_merge::branch_merge_on_current_target: sidecar covers every table in candidates (publish_adopted_source_state + publish_rewritten_merge_table do the per-table commit_staged work). Sidecar writes after validate_merge_candidates and deletes after commit_manifest_updates. - ensure_indices_for_branch: sidecar covers every node + edge type in the catalog with a manifest entry (build_indices_on_dataset is per-table-per-index commit_staged). Skips when the catalog has nothing — steady-state calls incur no sidecar I/O when the manifest already pins all expected types. Allow recovery_audit.rs in forbidden_apis.rs: The new db/recovery_audit.rs uses Dataset::write to bootstrap the _graph_commit_recoveries.lance dataset (same pattern as commit_graph.rs which is already allow-listed). Add it to the ALLOW_LIST_FILES list in tests/forbidden_apis.rs. 8 new unit tests in db::manifest::recovery cover the loose-match classifier branches (SchemaApply + EnsureIndices accept multi-commit drift, NoMovement and InvariantViolation behave the same as strict). All 20 test binaries pass (350+ tests across the workspace). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:24 +02:00
Ragnor Comerford	49ca7e5068	recovery: wire sidecar into MutationStaging::finalize + flip headline test (Phase 5) Production wiring (~120 LOC): - `MutationStaging::finalize` now takes a `SidecarKind` parameter and returns an additional `Option<RecoverySidecarHandle>`. Builds a Vec<SidecarTablePin> from `pending` BEFORE the per-table commit_staged loop and writes the sidecar via `recovery::write_sidecar`. Skips the sidecar when `pending` is empty (delete-only mutation; D₂ keeps these out of the staged-write path so the option is just a clean signal, not a code path users hit). - `exec/mutation.rs::execute_mutation_as` (around line 740): destructure the new third element, pass `SidecarKind::Mutation`, delete the sidecar after `commit_updates_on_branch_with_expected` succeeds. - `loader/mod.rs::ingest_loaded` (around line 540): same shape, with `SidecarKind::Load`. The Overwrite path stays inline-commit (legacy residual; out of MR-847 scope per docs/runs.md). - New engine accessors `Omnigraph::storage_adapter()` and `Omnigraph::root_uri()` for the sidecar I/O. The pre-existing `db.storage` field stays private; no other engine code reaches around the accessor. - Re-exports from `db::manifest`: `new_sidecar`, `write_sidecar`, `delete_sidecar`, plus the `RecoverySidecar*` types and `SidecarKind`, so consumers in `exec/` can use them via `crate::db::manifest::...`. Bugfix folded in (~5 LOC): make `coordinator` mutable in `Omnigraph::open_with_storage_and_mode` and call `coordinator.refresh()` after the recovery sweep returns. Roll-forward advances the manifest pin on disk; without the refresh the returned engine carried a stale in-memory snapshot. The Phase 4 tests passed only because they opened Lance datasets directly rather than going through `db.snapshot()`. Storage adapter (~15 LOC): `LocalStorageAdapter::write_text` now ensures the parent directory exists via `tokio::fs::create_dir_all`. Required because the sidecar protocol writes into `__recovery/` which doesn't pre-exist after `Omnigraph::init`. S3 has no equivalent; PutObject is path-agnostic. Headline test flip (~150 LOC): - `tests/failpoints.rs::finalize_publisher_residual_drifts_lance_head_until_next_writer_recovers` is replaced by `recovery_rolls_forward_after_finalize_publisher_failure`. Same setup (failpoint at `mutation.post_finalize_pre_publisher`) but after the synthetic failure the test: 1. Asserts the sidecar persists in `__recovery/` for the recovery sweep to find. 2. Drops the engine handle. 3. Reopens via `Omnigraph::open` — recovery sweep classifies RolledPastExpected, decides RollForward, publishes the manifest update, records the audit row, deletes the sidecar. 4. Asserts the sidecar is gone. 5. Asserts the originally-attempted Eve insert is now visible (Person count = 1). 6. Asserts a subsequent insert succeeds without ExpectedVersionMismatch (Person count = 2). 7. Asserts the audit dataset `_graph_commit_recoveries.lance` exists. This is the headline contract the MR-847 acceptance criteria require. All other failpoint and runs tests continue to pass (8 + 24 unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:24 +02:00
Ragnor Comerford	ca21e73d43	recovery: roll-forward execution + audit row (Phase 4) Implement the remaining half of the open-time recovery sweep. Roll-forward execution (db/manifest/recovery.rs::roll_forward_all): constructs a GraphNamespacePublisher directly (recovery runs inside Omnigraph::open before the engine struct exists, so we can't go through Omnigraph::commit_updates_on_branch_with_expected). Builds a ManifestChange::Update per sidecar table reading row_count and TableVersionMetadata from the dataset at post_commit_pin (cheap; manifest-level reads, not a row scan), then calls publisher.publish with expected_table_versions = sidecar.expected_version per table. Single __manifest CAS extends every pin atomically — all-or-nothing at the substrate. Persistent CAS contention surfaces as the typed ExpectedVersionMismatch error and leaves the sidecar in place for the next open's retry. Audit model (new crates/omnigraph/src/db/recovery_audit.rs + record_audit() in recovery.rs): each successful recovery sweep records a graph-commit row tagged with actor_id="omnigraph:recovery" plus a row in a new sibling table _graph_commit_recoveries.lance carrying recovery_kind (RolledForward \| RolledBack), recovery_for_actor (the sidecar's original actor_id), operation_id (sidecar ULID), sidecar_writer_kind, per_table_outcomes (JSON-serialized for schema flexibility), and created_at. Operators investigating "did my mutation land?" can find the answer via `omnigraph commit list --filter actor=omnigraph:recovery` joined to the recoveries table by graph_commit_id. The sibling-table choice avoids bumping INTERNAL_MANIFEST_SCHEMA_VERSION or migrating _graph_commits.lance. Same not-atomic-pair-write shape as the existing _graph_commits + _graph_commit_actors split — a crash between the two sequential writes leaves an orphan commit row with no recovery row. Recovery sweep tolerates this: re-entry classifies already-restored / already-published tables as NoMovement, the action is a no-op, and the audit append is retried. Note on classifier: process_sidecar's RollBack arm now restores RolledPastExpected, UnexpectedAtP1, AND UnexpectedMultistep (any drift class). Earlier Phase 3 logic restricted to RolledPastExpected only, which left UnexpectedAtP1/UnexpectedMultistep tables drifted; the all-or-nothing decision rule per docs/invariants.md §VI.23 demands all drifted tables be restored. 3 new integration tests in tests/recovery.rs (7 total now): - recovery_rolls_forward_after_phase_b_completes — happy-path roll-forward; audit row recorded; idempotent on second open. - recovery_rolls_back_records_audit_row_with_recovery_actor — roll-back path also records an audit row with the original actor. - recovery_rolls_forward_with_null_actor — sidecar without actor_id still records the audit row (recovery_for_actor = None). 3 new unit tests in db::recovery_audit pin the round-trip + persistence + recovery_kind string parsing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:23 +02:00
Ragnor Comerford	c2fc3e7c40	recovery: wire open-time sweep + OpenMode (Phase 3) Add `OpenMode::{ReadWrite, ReadOnly}` and route `Omnigraph::open` through `open_with_storage_and_mode`. Recovery sweep runs only under `OpenMode::ReadWrite` — read-only consumers (NDJSON export, commit list, schema show) skip it via `Omnigraph::open_read_only`. Rationale: the sweep performs Lance writes (Dataset::restore, manifest publish); a read-only consumer with read-only object-store credentials shouldn't trigger writes, and reads always resolve through the manifest pin regardless of any drift on the per-table side. `recover_manifest_drift` lands in db/manifest/recovery.rs and is wired into Omnigraph::open AFTER recover_schema_state_files — schema-state recovery operates on staging files; manifest-drift recovery operates on Lance HEADs that may depend on schema-state being settled. Roll-back path is fully implemented: classify each table per the sidecar's intent, dispatch the all-or-nothing decision, and call restore_table_to_version for any table with drift (RolledPastExpected, UnexpectedAtP1, or UnexpectedMultistep). NoMovement tables are already at expected_version — no action. Sidecar deleted as the final step. Roll-forward path errors with a Phase-4 placeholder so it surfaces loudly if reached without the audit + manifest-publish wiring landing first. Concurrency: today (pre-MR-686) recovery is naturally serialized by the single-coordinator model. Open runs at server startup BEFORE Arc<RwLock<Omnigraph>> wraps the engine (lib.rs:194), so no request handlers can race. CLI is sequential by caller orchestration. Under MR-686's per-(table_key, branch) queues + MR-856 (background recovery reconciler), the queue acquisition will need to extend to recovery sweeps — handoff documented on MR-686 ticket and in MR-856. 4 integration tests in tests/recovery.rs pin the Phase 3 contract: - recovery_does_not_run_on_clean_open — no sidecars; sweep is a no-op. - recovery_refuses_unknown_schema_version_on_open — sidecar v=99 surfaces SidecarSchemaError and is left on disk for operator review. - read_only_open_skips_recovery_sweep — even a sidecar with bogus table_path doesn't get classified under OpenMode::ReadOnly. - recovery_rolls_back_synthetic_drift_on_open — sidecar with mismatched post_commit_pin classifies as UnexpectedAtP1, decision is RollBack, restore is invoked, sidecar is deleted, idempotent on second open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:23 +02:00
Ragnor Comerford	376d91d538	recovery: scaffold sidecar protocol + classifier + decision tree (Phase 2) Add db/manifest/recovery.rs with the primitives the open-time recovery sweep will invoke. No integration into Omnigraph::open or any writer path yet — those land in Phase 3+. Sidecar protocol: - RecoverySidecar JSON shape (schema_version=1; SidecarSchemaError refuses unknown versions — old binaries don't guess at newer shapes). - SidecarKind {Mutation, Load, SchemaApply, BranchMerge, EnsureIndices} for audit attribution. - SidecarTablePin {table_key, table_path, expected_version, post_commit_pin}. - write_sidecar / delete_sidecar / list_sidecars / parse_sidecar. Classifier + decision dispatcher (all-or-nothing per sidecar): - TableClassification {NoMovement, RolledPastExpected, UnexpectedAtP1, UnexpectedMultistep, InvariantViolation}. - classify_table(pin, lance_head, manifest_pinned). - decide(&[TableClassification]) -> SidecarDecision {RollForward, RollBack, Abort}. Mid-Phase-B crash with mixed states rolls BACK (not forward) — atomicity per docs/invariants.md §VI.23. Restore primitive: - restore_table_to_version(table_path, expected_version): open, checkout(expected_version), restore. Includes a fragment-set equality short-circuit so repeated mid-rollback crashes don't pile up Lance versions (Lance fragments are immutable; equal fragment-ids ⇒ equal content). StorageAdapter trait extension: - Added list_dir(dir_uri) -> Vec<String> for sidecar enumeration. LocalStorageAdapter uses tokio::fs::read_dir; S3StorageAdapter uses object_store::list with a prefix-collision guard (filters to require the directory '/' boundary so listing __recovery doesn't accidentally match __recovery_log/...). RecordingStorageAdapter (test wrapper) delegates to inner. 17 unit tests covering: classifier branches, decision branches (including mid-Phase-B mix → RollBack and empty slice → RollForward), JSON round-trip, schema-version refusal, restore HEAD+1, fragment-set short-circuit no-op, list_sidecars empty/round-trip/non-JSON-skip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:23 +02:00
Ragnor Comerford	1e70028293	MR-847: pin Lance restore semantics empirically (Phase 1) Two new tests in tests/staged_writes.rs that the recovery sweep design depends on: - lance_restore_appends_one_commit_with_checked_out_content — verifies Dataset::restore() (no-args; restores currently-checked-out version) produces HEAD+1, not HEAD+2 as the v1 design assumed. Source confirmed at lance-4.0.0/src/dataset.rs:1106; this test prevents a future lance bump from silently breaking the recovery rollback math. - lance_restore_loses_to_concurrent_append_via_orphaning — pins the concurrency hazard motivating MR-847's open-time-only invocation strategy: check_restore_txn (lance-4.0.0/src/io/commit/conflict_ resolver.rs:986) returns Ok against Append/Update/Delete/CreateIndex/ Merge/etc., so a Restore commits successfully even when a concurrent legitimate writer just landed an Append — silently orphaning the Append's data from the active timeline. MR-847 sidesteps via running recovery only at Omnigraph::open (before any other writers race); MR-856 (continuous-recovery reconciler) must guard via per-(table, branch) queue acquisition once MR-686 lands. These two tests together pin the foundation for MR-847's correctness claims and document the load-bearing constraint MR-856 will inherit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 00:46:23 +02:00
Ragnor Comerford	8726ffe0a3	release: bump version to 0.4.1	2026-05-02 23:20:50 +02:00
Ragnor Comerford	bf7d716d1b	forbidden_apis: add `.restore(` and document why `.append(` / `.delete(` are excluded Cubic flagged that the guard misses `ds.append() / ds.delete() / ds.restore()`. `.restore(` is added — Lance-specific (no false positives in the workspace). `.append(` and `.delete(` stay excluded with a documenting comment: * `.append(` over-matches `Vec::append`, `String::append`, every `arrow_array::xxxArrayBuilder::append` (30+ legit uses across `exec/mutation.rs`, `loader/jsonl.rs`, `exec/projection.rs`). * `.delete(` over-matches `ObjectStore::delete` (used in `storage.rs`, `db/schema_state.rs`, `db/omnigraph.rs:1277` for staging-file cleanup) and would require many `// forbidden-api-allow:` sentinels for legitimate uses. The remaining bypass route — engine code that imports `lance::Dataset` and calls `ds.append(reader, params)` — is bounded by: 1. The trait surface itself (sealed, only-callable-via-trait once Phase 1b call-site conversion completes). 2. The PR-review process catching new `lance::Dataset` imports in non-storage files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 19:25:52 +02:00
Ragnor Comerford	9b0920b5da	address PR #70 bot review (Cubic + Cursor): 7 inline + failpoint test + invariants notes Cubic findings: * `tests/forbidden_apis.rs`: expand `FORBIDDEN_PATTERNS` with `Dataset::write` / `Dataset::append` / `Dataset::delete` / `Dataset::merge_insert` / `Dataset::add_columns` / `update_columns` / `drop_columns` / `truncate_table` / `restore` and the bare `.merge_insert(` / `.add_columns(` / `.update_columns(` / `.drop_columns(` / `.truncate_table(` method patterns. Deliberately avoid `.append(` / `.delete(` / `.write(` (over-match `Vec::append`, `.delete_branch(`, arrow-array `.append(`, etc.). Allow-list `commit_graph.rs` and `graph_coordinator.rs` — they're manifest-layer infra that legitimately uses `Dataset::write` for system tables. * `schema_apply.rs:253`: pass `entry.table_branch.as_deref()` (not `None`) to `open_dataset_head_for_write` for consistency with the sibling `indexed_tables` block. Schema apply rejects non-main branches at the lock-acquire step today, so behavior is unchanged; this is a defensive consistency fix that survives a future relaxation of the lock check. * `storage_layer.rs:131` doc: was `Vec<&StagedWrite>` with lifetime claim; actually returns `Vec<StagedWrite>` (cloned). Fixed. * `AGENTS.md:201` capability matrix row + `storage_layer.rs:1` module doc: softened the "stage_* + commit_staged are the only paths" / "trait funnels every write" overclaim. Inline-commit residuals (`delete_where`, `create_vector_index`) remain on the trait pending upstream Lance work (#6658, #6666); legacy `append_batch` etc. remain pending Phase 1b / Phase 9. Module doc now describes the current transitional state honestly. Cursor Bugbot findings: * `storage_layer.rs:360`: trait `delete_where` consumed `SnapshotHandle` but returned only `DeleteState`, dropping the post-delete dataset. Future callers migrating from the inherent `&mut Dataset` API would lose the post-delete dataset state needed for indexing / `table_state` queries. Fixed: returns `(SnapshotHandle, DeleteState)` matching `append_batch` / `overwrite_batch` shape. * `storage_layer.rs:824`: removed dead `_scanner_type_marker` fn and the unused `Scanner` import (the marker existed only to suppress an unused-import warning — fixing the import is the cleaner answer). Engine-level Phase A failpoint test (closes the partial-criterion flagged in Cubic's acceptance-criteria checklist): * `db/omnigraph/table_ops.rs::stage_and_commit_btree`: instrumented with `crate::failpoints::maybe_fail("ensure_indices.post_stage_pre_commit_btree")` between `stage_create_btree_index` and `commit_staged`. * `tests/failpoints.rs::ensure_indices_phase_a_btree_failure_leaves_existing_tables_writable`: triggers the failpoint via a schema-apply that adds a new node type; proves that existing tables are unaffected (Person mutation succeeds after the failed apply) — i.e. Phase A failure leaves no Lance-HEAD drift on tables outside the failed `added_tables` iteration. `docs/invariants.md` transitional notes: * §VI.23 (atomicity per query): annotated as upheld at the writer-trait surface for inserts / updates / scalar-index builds / merge_insert / overwrite after MR-793 PR #70. Per-table commit_staged → manifest publish window remains; closing requires MR-847's recovery-on-open reconciler. `delete_where` and `create_vector_index` remain inline pending lance#6658 / #6666. * §VII.35 (reconciler pattern): annotated as partial — staged primitives are the building blocks; the reconciler task itself is MR-848. * §VIII.45 (reference impl per trait): `TableStorage` has its primary impl on `TableStore` with opaque-handle signatures; no test impl yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 18:47:07 +02:00
Ragnor Comerford	3135ff5d19	MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers Hoists Lance's stage+commit two-phase write pattern from "discipline at each writer" to a sealed trait surface (`TableStorage`). New engine code that needs to advance Lance HEAD MUST go through `stage_` + `commit_staged`; the trait's opaque `SnapshotHandle` / `StagedHandle` types keep `lance::Dataset` and `lance::Transaction` out of trait signatures. Phases landed (see .context/mr-793-design.md for the full plan): 1a: `crates/omnigraph/src/storage_layer.rs` — `TableStorage` trait, sealed (only in-tree types can impl), single impl on `TableStore` delegating to existing inherent methods; `Omnigraph::storage()` accessor returns `&dyn TableStorage`. * 2: three new staged primitives — `stage_overwrite`, `stage_create_btree_index`, `stage_create_inverted_index` — implementing the simple branch of Lance's `CreateIndexBuilder::execute` (scalar indices only; vector indices stay inline because `build_index_metadata_from_segments` is `pub(crate)` in lance-4.0.0). Six new tests in `tests/staged_writes.rs` pin both the new primitives and the inline residuals (`delete_where`, `create_vector_index`). * 3: `tests/forbidden_apis.rs` — defense-in-depth integration test walks engine source, fails on direct lance::* inline-commit API use outside `table_store.rs` / `db/manifest/`. Skips comment lines and honors `// forbidden-api-allow:` sentinels. * 4: `ensure_indices` migration — scalar index builds now route through `stage_create__index` + `commit_staged` instead of `create__index(&mut Dataset)`. Vector indices stay inline (residual, named honestly at the call site). * 5: `branch_merge::publish_rewritten_merge_table` migration — the merge_insert phase now uses `stage_merge_insert` + `commit_staged`; delete phase stays inline (Lance #6658 residual, named honestly). * 6: `schema_apply` rewritten_tables migration — non-empty rewrites use `stage_overwrite` + `commit_staged`; empty-batch rewrites stay inline because `InsertBuilder::execute_uncommitted` rejects empty data. The narrow inline window is bounded by `__schema_apply_lock__`. Verified-green test surface: * `cargo test -p omnigraph-engine` — 68 lib + ~120 integration tests (incl. 6 new staged_writes tests + the new forbidden_apis test). * `cargo test -p omnigraph-engine --features failpoints --test failpoints` — 5 tests, all green. * `cargo test --workspace` — green. Deferred to follow-up sessions (see design doc §17 split): * Phase 1b — convert remaining engine call sites to `&dyn TableStorage` (mostly READS that don't touch the staged-write invariant). * Phase 7 — recovery-on-open reconciler (closes Phase B → Phase C residual across process restarts; new subsystem). * Phase 8 — index-coverage reconciler (full §VII.35 compliance — removes synchronous index work from the publish path). * Phase 9 — demote unused `TableStore` inherent methods to `pub(crate)` (depends on Phase 1b). Lance upstream blockers documented: * lance-format/lance#6658 — two-phase delete API (open, no PRs). * Companion: `build_index_metadata_from_segments` should be `pub` so vector-index builds can be staged outside the lance crate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:03:15 +02:00
Ragnor Comerford	044ed46019	chore: scrub Linear ticket numbers and review-bot mentions from code comments OmniGraph is OSS; internal Linear ticket references and code-review-bot mentions in source-code comments don't help external readers and leak internal tooling. Replace ticket numbers (MR-XXX) with descriptive prose, drop linear.app URLs, and remove inline mentions of Cursor/Bugbot/Cubic/Codex review threads. Scope is limited to source-code comments (`crates/`). Docs under `docs/` keep their MR-XXX references — those are part of the established change-history narrative for in-repo docs and don't require a Linear account to find context for. No behavior changes; no public API changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 22:45:38 +02:00
Ragnor Comerford	ea16c74329	MR-794 step 2: address Cursor Bugbot follow-ups on commits `3223b51` + `052b6e6` Four code/doc fixes from the latest Cursor Bugbot pass: * Misplaced doc comment in table_store.rs (Medium): the doc block intended for `scan_pending_batches` was, after my earlier edit, attached to `collect_string_column_values` because the new helper was inserted between the original docblock and `scan_pending_batches`. Move the docblock back onto its function and add a note about the shared SQL-dialect contract with the Lance scanner (the predicate goes to both, which is fine for `predicate_to_sql`'s plain comparison shapes today; future Lance-specific scanner extensions in the filter would need translation). * Missing null check on committed `id` column (Low): the committed-side loop in `collect_node_ids_with_pending` (and the parallel non-pending `collect_node_ids`) read `id_col.value(i)` without `is_valid(i)` first. `id` is the @key column on every node type and non-nullable by schema, so this is unreachable today, but the inconsistency with the pending-side `is_valid` guard is worth closing for symmetry / defense. * Misleading comment in count_pending_src_with_dedupe (Low): the comment claimed "fall back to naive counting" but the code did `continue`. Fix: it's unreachable in practice (the pending-side schema always contains the key when the caller passes one), so failing loudly with a typed error if it ever does fire is correct — silently skipping the batch would let `@card` violations slip past validation. * PendingTable.schema mismatch surfaces too late (Medium): PendingTable captures the schema from the first batch and never updates it. On a blob-bearing table, `insert` produces a full-schema batch and `update` (without assigning every blob) produces a subset-schema batch. Pre-fix the mismatch surfaced inside finalize/MemTable construction — distant from the offending op. Post-fix `MutationStaging::append_batch` validates the new batch's schema against the existing accumulator's schema and returns a typed error directing the caller to split the mutation. Error fires at the offending op, not at end-of-query. New helper `schemas_compatible` compares field name + data_type pairs; nullability and field metadata differences stay tolerated (downstream concat already permits those). Cubic Cursor Bugbot finding #5 (cascade delete edge re-open) self-resolved in the bot's own analysis ("logic appears sound on re-examination") — no action. New test on tests/runs.rs: * append_batch_rejects_mismatched_schema_in_blob_table_at_offending_op — pins the early-error path. Builds a blob-bearing schema, runs an `insert + update` query where the update doesn't assign the blob, asserts the error fires at the second op with the "Split the mutation" message and the manifest is unchanged. Local: tests/runs.rs 24/24 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 21:50:13 +02:00
Ragnor Comerford	052b6e680f	MR-794 step 2: address PR #68 follow-up review (Cubic) — pending dedupe + projection guard + CI Three new findings from Cubic on commit `3223b51`: * Pending edge cardinality counted within-input duplicates (P2): count_src_per_edge's pending walk added every row to the count, including duplicate rows that finalize will collapse via dedupe_merge_batches_by_id. A LoadMode::Merge with the same edge id twice would over-count → spurious @card violation. Fix: when dedupe_key_column is Some, walk pending in reverse, track seen keys via HashSet, count only the kept (last-occurrence) rows. Mirrors finalize-time dedupe so cardinality counts what stage_merge_insert actually publishes. * scan_with_pending silently disabled merge-shadow when projection omitted key_column (P2): if a caller passed Some("id") as key_column but their projection didn't include "id", the filter_out_rows_where_string_in helper passed batches through unchanged — silently degrading to union semantics. Fix: validate up front that projection contains key_column when both are Some; return a typed Lance error otherwise. Tightened the helper too: missing column is now an internal error (was a silent passthrough). * Cascade-vs-explicit delete test was too weak (P2): asserted only that edge count decreased after delete. The cascade alone could satisfy that even if the explicit second-delete silently no-op'd. Strengthened: assert post_knows == 0, which only holds when both ops landed (Bob→Diana would survive if op-2 no-op'd). CI gap: also added test_failpoints_feature job to .github/workflows/ci.yml. The workspace test runs without --features failpoints (the feature is behind a Cargo flag), so the failpoints test suite was never exercised by CI before now. The new job builds + runs `cargo test -p omnigraph-engine --features failpoints --test failpoints` on every full CI run, mirroring the test_aws_feature pattern. New tests on tests/runs.rs: * load_merge_mode_dedupes_within_pending_for_cardinality_count (Cubic P2 #2 — pending-vs-pending dedup, distinct from the load_merge_mode_dedupes_edge_for_cardinality_count test which covers committed-vs-pending dedup). * scan_with_pending_rejects_key_column_missing_from_projection (Cubic P2 #3 — verifies the up-front validation rejects bad callers and that the happy path still works correctly). Local test results: * tests/runs.rs: 23/23 passed * tests/failpoints.rs --features failpoints: 7/7 passed (includes the two new finalize→publisher residual tests landed in `3223b51`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:47:45 +02:00

1 2 3

123 commits