Commit graph

403 commits

Author SHA1 Message Date
Andrew Altshuler
61da7bf406
docs(cluster): descope ETL pipelines to a separate project; keep the socket (#172)
Pipelines (scheduler, connectors, mapping, idempotency, run ledger) leave the
cluster control-plane rollout and become their own project with their own
RFC. This rollout guarantees only the socket, all of which already exists and
is enforced: the pipelines: config field is reserved (typed
future_phase_field rejection, test-covered), the pipeline.<name> typed
address and Pipeline resource kind are reserved in the resource model, and
axiom 13 fixes the contract any future implementation must satisfy
(definition reconciled, execution data-plane, fan-out statusful). The ETL
section in the high-level spec stands as the requirements record for that
project; exit criterion 9 defers to its RFC.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:53:16 +03:00
Andrew Altshuler
14b85a59de
Merge pull request #173 from ModernRelay/feat/cluster-graph-delete-4c
feat(cluster): Stage 4C — gated graph delete; Phase 4 complete
2026-06-10 14:53:11 +03:00
aaltshuler
c949a2b717 docs(cluster): document Stage 4C — Phase 4 complete
Approvals + gated graph deletion in the user docs, the approve command in the
CLI reference, RFC-004 flipped to Landed with its three implementation
deviations recorded (row-8 retire-and-repropose, --as instead of --actor/--by,
consumed artifacts rewritten in place rather than moved).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:44:12 +03:00
aaltshuler
87691fe9c7 test(cluster): failpoint coverage for delete crash windows
- Crash before the removal: root intact, approval file unconsumed, sidecar
  survives, no ack; the next run retires the stale intent (row 8) and the
  still-approved delete completes in the same run.
- Crash after the removal, before the state CAS: root gone, ledger
  byte-identical, the sidecar carries the approval id; the next run's sweep
  rolls the tombstone forward, consumes the approval, audits the recovery,
  and converges (row 7b).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:34:54 +03:00
aaltshuler
d1d04217ab feat(cluster): execute approved graph deletes in cluster apply
Stage 4C execution half (RFC-004 §D5/§D6 + sweep rows 7/7b/8): an approved
graph.<id> delete — and its riding schema/query deletes — classifies Applied
and executes LAST in the run, sidecar-fenced: pre-op manifest pin (best
effort; partial roots still delete), approval_id carried in the sidecar,
recursive root removal (NotFound tolerated), subtree tombstoned out of the
ledger with a tombstone observation, the approval consumed in the same state
CAS (ledger summary) and its artifact file rewritten with consumed_at only
after the CAS lands — a failed run consumes nothing and the approval stays
valid for the retry.

Sweep rows: already-tombstoned intents retire (7); a completed delete with a
stale ledger rolls forward — tombstone + approval consumption + audit entry
(7b, idempotent); a still-present root retires the stale intent with a
graph_delete_incomplete warning and the still-approved delete re-executes in
the same run (8) — prefix removal is idempotent, so retry IS the repair.

The multi-graph mixed e2e gets its conclusion: blocked without approval,
cluster approve graph.engineering --as andrew, converge, tombstone visible
in status. Phase 4's disposition matrix is now fully executable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:34:02 +03:00
aaltshuler
f4e9105272 feat(cluster): cluster approve — digest-bound approval artifacts
RFC-004 §D4, gate half: graph deletes (and their subtree) now classify
Blocked/approval_required instead of Deferred; the new cluster approve
command (requires the global --as actor) writes
__cluster/approvals/{ulid}.json bound to the desired config digest and the
change's before/after digests, so config or state drift invalidates the
artifact automatically (approval_stale warning, never authorizes). One gate
per subtree: compute_approvals lists only the graph-level delete, and
ApprovalRequirement gains a satisfied flag surfaced by plan. Consumption and
the delete executor land next — until then approved deletes stay blocked so
a gate-only build can never strip state without removing the root.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:30:05 +03:00
Andrew Altshuler
f799d4578c
Merge pull request #171 from ModernRelay/feat/cluster-schema-apply-4b
feat(cluster): Stage 4B — cluster-driven schema apply
2026-06-10 14:03:31 +03:00
aaltshuler
f217352c93 docs(cluster): document Stage 4B schema apply
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:14:20 +03:00
aaltshuler
80cae4e8e1 test(cluster): failpoint coverage for schema-apply crash windows
- Crash before the engine call: sidecar (carrying the --as actor) survives,
  live schema and ledger untouched, no ack; the next run's sweep retires the
  stale intent and the same run applies and converges.
- Crash after the engine call, before the state CAS: the manifest moved with
  the post-op pin in the sidecar, state.json byte-identical; the next run's
  sweep rolls the ledger forward with a schema_apply audit entry and the run
  converges.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:13:15 +03:00
aaltshuler
a1ba4dc413 feat(cluster): execute schema applies in cluster apply
Stage 4B (RFC-004 §D1/§D5): schema.<id> Update changes classify Applied and
execute after graph creates, sequentially and sidecar-fenced — read-write
open (the engine's own recovery runs first), pre-op manifest pin recorded,
apply_schema_as with allow_data_loss: false (soft drops only; hard drops wait
for 4C's approval artifacts), post-op pin rewritten into the sidecar, sidecar
retired only after the final state CAS. Queries gated on a same-plan schema
update unblock (the migration lands first in the same run); failures —
unsupported migrations, lock contention, user branches — surface as
schema_apply_failed with the engine's message, demote dependents via the
origin-aware demotion helper, and stop further graph-moving work.

Schema evolution is now fully cluster-driven (the defer -> manual schema
apply -> refresh loop is gone), and out-of-band schema drift is converged
back by apply as an ordinary soft migration (axiom 8: drift correction is
gated like any change; the recoverable tier needs no approval) — both pinned
by reworked e2es. The multi-graph mixed e2e's deferred row is now
delete-shaped, pre-staging the 4C surface.

Actor: cluster apply accepts the CLI's global --as via the new ApplyOptions /
apply_config_dir_with_options (apply_config_dir delegates unchanged); the
actor is echoed in ApplyOutput and recorded in sidecars and audit entries,
and threads to apply_schema_as so Cedar fires wherever a checker is
installed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:12:15 +03:00
aaltshuler
0571c05ebb feat(cluster): schema-apply recovery sidecar kind and sweep
RecoverySidecarKind::SchemaApply with digest-based sweep classification
(robust to unrelated manifest movement; version pins stay forensic):
ledger-consistent -> sidecar retired (RFC-004 rows 1+2); live digest matches
the intended schema, state stale -> roll forward with composite recompute and
a recovery_records audit entry (row 3); unverifiable or unexpected digests ->
pending, kept, graph-moving work blocked (rows 1-unopenable/6).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:05:42 +03:00
aaltshuler
ca63a9340b feat(cluster): embed schema migration previews in cluster plan
RFC-004 §D7's data-aware preview: for every schema update, plan opens the
live graph read-only and embeds the engine's migration plan (supported flag
+ typed steps) in the change record; the human renderer prints the steps.
Preview failures (unreachable graph, planner error) degrade to the digest
diff with a schema_preview_unavailable warning — planning never blocks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:04:19 +03:00
aaltshuler
b313075476 refactor(cluster): make plan_config_dir async
Mechanical conversion ahead of Stage 4B (plan will preview schema migrations
against live graphs): signature, CLI dispatch, and test callers. Zero
behavior change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:02:12 +03:00
Andrew Altshuler
e6921157cc
Merge pull request #170 from ModernRelay/feat/cluster-graph-create-4a
feat(cluster): Stage 4A — graph create in cluster apply
2026-06-10 05:19:17 +03:00
aaltshuler
cb6c67f196 docs(cluster): document Stage 4A graph create
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 05:00:42 +03:00
aaltshuler
83d77bcb16 test(cluster): failpoint coverage for graph-create crash windows
- Crash before the init (row 1): sidecar survives, nothing moved, no ack;
  the next run's sweep removes the intent and the same run creates and
  converges.
- Crash after the init, before the state CAS (row 4): the graph exists with
  the post-init manifest pin in the sidecar, state.json byte-identical; the
  next run's sweep rolls the ledger forward with a recovery_records audit
  entry and the run converges.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:59:48 +03:00
aaltshuler
c3007369cd feat(cluster): execute graph creates in cluster apply
Stage 4A (RFC-004 §D1/§D5): graph.<id> Create — and its paired schema Create,
which the init carries — classify Applied and execute first in the run,
sequentially and sidecar-fenced: sidecar written before Omnigraph::init at
the derived root, rewritten with the post-init manifest pin, deleted only
after the final state CAS lands. Dependent queries and policies no longer
block on a graph create in the same plan — creates run first, so they apply
in the same run; a create failure demotes them to blocked
(dependency_not_applied) and stops further graph-moving work (loud partials),
with the sidecar left for the sweep to classify. Graphs with a kept recovery
sidecar (rows 5/6) classify Blocked/cluster_recovery_pending, and the sweep's
Drifted/Error statuses are never clobbered by a generic Blocked.

Schema source is re-read and digest-verified under the lock before the init
(the write_resource_payload TOCTOU posture). Plan previews the same
dispositions. e2e fallout updated: a fresh multi-graph config now converges
in one apply; a destroyed root is re-created as an EMPTY graph by the next
apply (declarative convergence — visible in plan, called out in docs); the
new cluster_e2e_declared_graph_created_by_apply pins the no-manual-init flow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:58:56 +03:00
aaltshuler
bf8cc7a753 feat(cluster): graph-create recovery sidecars and sweep
RFC-004 §D2/§D3 for the graph_create kind. RecoverySidecar records intent
under __cluster/recoveries/{ulid}.json; the roll-forward-only sweep runs at
the start of apply/refresh/import under the state lock and classifies each
survivor by observation: root absent -> intent removed (row 1); outcome
already recorded -> retired (row 2); create completed but state stale ->
ledger rolled forward with a recovery_records audit entry (row 4); partial
root -> Error/graph_create_incomplete, kept, never auto-deleted (row 5);
unexpected schema -> Drifted/actual_applied_state_pending, kept (row 6).
Sweep mutations ride the command's existing CAS write; completed sidecars
are deleted only after that write lands. Read-only status/plan warn
(cluster_recovery_pending) without acting. The apply payload gate now counts
only payload-phase errors so kept-sidecar diagnostics don't abort the run
before their statuses persist.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:50:42 +03:00
aaltshuler
6fbf09d5c9 refactor(cluster): make apply_config_dir async
Mechanical conversion ahead of Stage 4A graph create (which calls the async
Omnigraph::init from inside apply): the fn signature, the CLI dispatch arm,
and every test caller (#[test] -> #[tokio::test]). Zero behavior change; all
60 lib tests and 3 failpoint tests green before and after.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:43:38 +03:00
Andrew Altshuler
26b26999fd
ci(codeowners): aaltshuler owns all paths; remove ragnorc (#169)
Engineering and docs roles both resolve to @aaltshuler; every path
(catch-all, crates/**, docs/**, repo-level docs) now requires their review.
CODEOWNERS and the doc tables regenerated from codeowners-roles.yml via
render-codeowners.py.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:34:17 +03:00
Andrew Altshuler
58c66a54a2
docs(cluster): RFC-004 — graph & schema apply design (Phase 4) (#168)
* docs(cluster): RFC-004 — graph & schema apply design (Phase 4)

The design the implementation spec's exit criteria require before
graph-moving cluster apply ships. Core positions:

- Cluster recovery is roll-forward-only: the engine's own sidecars make every
  graph-level operation atomic within the graph, so the cluster never rolls a
  graph back — its sidecars (__cluster/recoveries/{ulid}.json) classify and
  record, converging the ledger to observable reality (axiom 5) or surfacing
  a loud pending-repair condition. Eight-row decision matrix, every row
  testable with the Stage 3B failpoint harness.
- Irreversible operations (graph delete, allow_data_loss schema apply)
  consume digest-bound approval artifacts written by a new cluster approve
  command and retired into state.approval_records (axiom 11). A stale
  approval can never authorize a different change.
- cluster apply gains an actor, threaded to apply_schema_as so engine Cedar
  enforcement and commit attribution work unchanged; the cluster adds no
  policy engine of its own.
- Deterministic ordering (creates -> schema applies -> catalog -> deletes),
  per-resource apply groups, cross-graph atomicity explicitly not promised.
- Staged 4A graph create / 4B schema apply / 4C graph delete, each gated on
  per-matrix-row failpoint tests.

Answers exit criteria 2 and 4 fully, 1/5/6 partially; 3/7/8/9 deferred to
their phases (coverage table in the RFC). Linked from the dev index and the
implementation spec's Phase 4 section.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs(cluster): RFC-004 review fixes — graph_delete sweep rows, state_cas_base contract

Two greptile findings: (1) D3 row 2 could not be evaluated for graph_delete
(no manifest to version-check after prefix removal) and 'root absent, state
already tombstoned' fell into the stale row — split into rows 7 (delete's
analog of row 2) and 7b (the roll-forward), with expected_manifest_version
documented as always null for the delete kind. (2) state_cas_base is now
explicitly audit/diagnostics-only — the sweep never consults it; independent
state mutations are handled by the ordinary CAS like any concurrent write.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 04:34:14 +03:00
Andrew Altshuler
effb9cc068
Merge pull request #167 from ModernRelay/feat/cluster-stage3b
feat(cluster): Stage 3B — catalog payload verification + failpoint coverage
2026-06-10 03:17:11 +03:00
aaltshuler
16759b28b9 fix(cluster): RAII-guard the callback failpoint
ScopedFailPoint::with_callback gives cfg_callback the same Drop-based cleanup
as cfg actions; a panic while the point is active no longer leaks the callback
into the process-global registry where it would fire under later tests
(greptile review, PR #167).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:36:24 +03:00
aaltshuler
08ea659c9b build: commit Cargo.lock for omnigraph-cluster's optional fail dependency
The failpoints feature added fail = { workspace = true, optional = true } to
the crate manifest; the lockfile edge belongs with it (--locked CI gate).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:21:10 +03:00
aaltshuler
50543a8ce0 docs(cluster): record Stage 3B failpoint + verification coverage
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:15:13 +03:00
aaltshuler
211b37e6de test(cluster): failpoint tests for crash-mid-apply and state CAS race
The apply-side coverage the implementation spec's hard gate requires before
Phase 4 graph-moving apply:

- crash after the payload phase: state.json byte-identical, blobs inert on
  disk, lock released, no phantom statuses, nothing acknowledged; a plain
  re-run repairs via skip-if-exists blob reuse.
- CAS race: a cfg_callback rewrites state.json at the exact read->write
  window (the state.lock:false concurrent-writer scenario); apply surfaces
  state_cas_mismatch, acknowledges nothing, reports the persisted status
  snapshot, leaves the concurrent writer's state on disk; a re-run converges.

CI's failpoints step now runs both the engine and cluster suites.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:14:06 +03:00
aaltshuler
21b531605f feat(cluster): failpoint infrastructure mirroring the engine
Optional failpoints feature (dep:fail + fail/failpoints, deliberately NOT
enabling omnigraph/failpoints), a maybe_fail/ScopedFailPoint module returning
Diagnostic-typed injected errors, and two call sites in apply_config_dir:
cluster_apply.after_payload_phase (the crash point: blobs on disk, state
untouched) and cluster_apply.before_state_write (routes through the
persisted-statuses revert contract; a cfg_callback here can mutate state.json
to make the CAS check fail organically). Feature off compiles to Ok(()) —
zero behavior change. Tests live in a separate integration binary because the
fail registry is process-global. Also refresh the crate description (stale
'read-only' since Stage 3A).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:12:59 +03:00
aaltshuler
acb3f1cc14 test(cli): e2e for catalog payload drift self-heal loop
status warns read-only -> refresh persists drift and drops the digest ->
apply republishes the blob -> status clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:08:14 +03:00
aaltshuler
15868972ff feat(cluster): verify catalog payload blobs in status and refresh
Closes the Stage 3A product gap where a deleted or corrupted blob under
__cluster/resources/ went unnoticed forever (status reported converged and
apply could not repair it because the digests matched).

verify_catalog_payloads checks every query/policy digest in state against its
content-addressed blob (existence + full sha256 re-hash; graph/schema/unknown
addresses have no payloads and are skipped). status reports findings read-only
(warnings catalog_payload_missing/_mismatch; error catalog_payload_read_error
— an unverifiable catalog must not report healthy). refresh closes the
self-heal loop: missing/mismatched blobs mark the resource drifted and remove
its digest from state so the next plan proposes a create and the next apply
republishes; unreadable blobs keep the digest (no spurious republish), mark
error, and exit non-zero. Verification runs before graph observation so the
recomputed graph composite already excludes removed query digests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 02:07:08 +03:00
Andrew Altshuler
b6d228ff54
test(cli): cluster e2e hardening — lost-state recovery, out-of-band drift, root destruction, multi-graph convergence (#166)
Four lifecycle compositions over the spawned binary that pin spec claims no
single-command test proves:

- Lost ledger: delete state.json -> re-import from the live graph -> re-apply
  converges onto the same content-addressed blobs (axiom 5's reconstructable-
  state resilience edge, end to end).
- Out-of-band schema apply (the Sarah/Bob violation): refresh marks
  graph/schema Drifted with schema_mismatch, status and plan surface it, and
  cluster apply refuses to silently correct it — state keeps the LIVE schema
  digest (drift correction is gated, axiom 8).
- Destroyed graph root: refresh records graph_missing drift and drops
  graph/schema digests while preserving query/policy; plan proposes deferred
  creates only; apply moves nothing and the catalog stays intact.
- Two graphs (one live, one not yet created) + a graph-spanning policy + a
  cluster-scoped policy: a single apply yields all four dispositions at once
  (applied/derived/deferred/blocked, deterministically ordered), then the
  second graph appears, refresh observes it, and apply converges.

Helpers: init_named_cluster_graph generalizes init_cluster_derived_graph;
write_multi_graph_cluster_fixture builds the two-graph config.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:59:20 +03:00
Andrew Altshuler
2d1c25d3fa
Merge pull request #165 from ModernRelay/feat/cluster-apply-stage3a
feat(cluster): Stage 3A — config-only cluster apply
2026-06-10 00:52:54 +03:00
aaltshuler
7f3ecf282a Merge origin/main (#164 axiom-15 docs, #86 TableStorage migration) into feat/cluster-apply-stage3a
Clean auto-merge; also fix the stale 'Stage 2C accepts' line in
cluster-config.md to Stage 3A.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:45:44 +03:00
aaltshuler
69b63c33ac Merge remote-tracking branch 'origin/main' into feat/cluster-apply-stage3a 2026-06-10 00:45:03 +03:00
Andrew Altshuler
cec65b8ef8
docs(cluster): axiom 15 — single ownership, mode-switch migration, per-operator layer (#164)
Encode the omnigraph.yaml ↔ cluster.yaml coexistence rules that were implicit
across the specs:

- cluster-axioms.md: new axiom 15 — every fact has exactly one owner at a time;
  coexistence is a mode switch, never a merge; omnigraph.yaml's job description
  shrinks to the permanent per-operator layer. Added review-tension bullet.
- cluster-config-specs.md: "Migration model" subsection (three coexistence
  windows: no-conflict, Phase-5 mode switch, bridges-with-sunsets) and a
  "per-operator layer" completeness table (connection, credential reference,
  active context, ergonomics, personal aliases) with its global-config-dir
  destination per the RFC-002 direction.
- cluster-config-implementation-spec.md: Compatibility Stance #7–#9 (single
  ownership, shrinking role, bridges carry sunsets); Phase 5 boot is an
  exclusive XOR mode switch; fixed the duplicated recoveries/recovery dirs in
  the Phase-1 storage layout.
- docs/user/cluster-config.md: "Relationship to omnigraph.yaml" section in
  current-reality terms (cluster catalog is inspectable, not live).

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:44:51 +03:00
aaltshuler
5e1dede08f fix(cluster,cli): apply failure output — persisted statuses only, changes list printed
Two review findings (greptile, PR #165):

- ApplyOutput.resource_statuses on a failed state write now carries the
  pre-apply on-disk snapshot instead of the in-memory mutations that were
  never persisted, so automation reading the field independently of `ok`
  cannot see phantom applied/blocked statuses. Regression test forces the
  state write to fail via a read-only __cluster dir (unix-only, skips when
  permissions are not enforced).
- Human-mode `cluster apply` prints the classified changes list on failure
  too, so an operator debugging a partial apply without --json sees what was
  attempted.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:35:03 +03:00
devin-ai-integration[bot]
2c578a60b2
(feat) convert engine call sites to &dyn TableStorage; demote legacy TableStore methods to pub(crate) (#86)
* MR-854: convert engine call sites to &dyn TableStorage; demote legacy methods

Phase 1b: every db.table_store.X(...) call site converts to
db.storage().X(...), reaching the storage layer through the sealed
TableStorage trait (returns &dyn TableStorage). Opaque SnapshotHandle
and StagedHandle replace bare lance::Dataset and Transaction in the
threaded values.

Phase 9: the inherent inline-commit methods on TableStore
(append_batch, merge_insert_batch{,es}, overwrite_batch,
create_btree_index, create_inverted_index) demote from pub to
pub(crate). Their only remaining direct users are table_store.rs
itself and the bulk loader's LoadMode::{Append, Overwrite, Merge}
concurrent fast-paths in loader::write_batch_to_dataset (no
two-phase shape in Lance 4.0.0 — closes after lance#6658 and #6666).

Docs:
- invariants.md \u00a7VI.23: drop "at the writer-trait surface"
  qualifier; staged primitives are now the only engine surface.
- runs.md: residual matrix shrinks to delete_where and
  create_vector_index (the two upstream-blocked residuals).
- forbidden_apis.rs: replace transitional language with the
  current allow-list shape (table_store.rs + loader concurrent
  fast-path only).

Files touched:
- changes/mod.rs, db/omnigraph.rs (+export/optimize/schema_apply/
  table_ops.rs), exec/{merge,mod,mutation,staging}.rs,
  loader/mod.rs, storage_layer.rs, table_store.rs,
  tests/forbidden_apis.rs, docs/{invariants,runs}.md.

Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>

* MR-854: replace test-only inline-commit append callers with local Lance helpers

After demoting TableStore::append_batch from pub to pub(crate), the
integration tests in tests/recovery.rs and tests/staged_writes.rs
that previously called store.append_batch(...) directly to simulate
HEAD-ahead-of-manifest drift can no longer access the inherent
method. Replace those calls with small in-test helpers that do a raw
Dataset::append (the same body the inherent method runs).

- tests/helpers/mod.rs gains lance_append_inline (shared helper).
- tests/staged_writes.rs gets a file-local lance_append_inline_local
  (staged_writes.rs does not import helpers::).
- tests/recovery.rs drops the unused TableStore import in the one
  function whose store binding became unused after the conversion.

Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>

* MR-854: retrigger CI for flaky Test Workspace job

Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>

* MR-854: convert remaining table_store call sites in export.rs / read_blob

Two leftover `self.table_store.X` / `db.table_store.X` call sites were
missed in the initial sweep — flagged by Devin Review on PR #86. Both
now go through the trait surface:

- `entity_from_snapshot` (db/omnigraph/export.rs): switch from
  `db.table_store.open_snapshot_table` + `db.table_store.scan` to
  `db.storage().open_snapshot_at_table` + `db.storage().scan`.

- `read_blob` (db/omnigraph.rs): replace
  `snapshot.open(table_key)` + `self.table_store.first_row_id_for_filter`
  with `self.storage().open_snapshot_at_table` +
  `self.storage().first_row_id_for_filter`. The follow-up
  `take_blobs` call still needs an `Arc<Dataset>` (it's a Lance blob
  accessor not surfaced through the trait), so we hand off via
  `SnapshotHandle::into_arc()` with a comment.

After this commit, no engine code outside `table_store.rs` reaches the
inherent `TableStore` API — the docs/runs.md and docs/invariants.md
claim is now uniformly true.

Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>

* MR-854: post-rebase doc fixes (Lance 6.0.1, MR-A framing, into_dataset note)

Reviewer feedback on the rebased PR:

* docs/dev/writes.md residuals matrix: drop demoted methods from the trait-surface table (now `pub(crate)`); keep only the two genuine trait-surface residuals (`delete_where`, `create_vector_index`); reframe under MR-A (Lance v7.x bump) per docs/dev/lance.md.

* tests/forbidden_apis.rs: update transitional allow-list header to (a) drop the truncate_table mislabel (truncate_table is a Lance Dataset method, not a TableStore method — overwrite_batch's internal call), (b) reframe trait-surface residuals under MR-A / Lance #6666.

* crates/omnigraph/src/storage_layer.rs::SnapshotHandle::{into_arc, into_dataset}: add single-ref invariant doc — both consume Arc via try_unwrap-or-clone; sibling SnapshotHandle clones across an await point force a deep Dataset clone.

* Replace lance-4.0.0 version refs with lance-6.0.1 in active source/test/dev-doc comments (storage_layer.rs, table_store.rs, table_ops.rs, schema_apply.rs, merge.rs, recovery.rs, staged_writes.rs, consistency.rs, docs/dev/execution.md, docs/user/query-language.md). Historical refs in docs/releases/v0.4.1.md and the canonical "Lance 4.0.0 → 6.0.1 migration" line in docs/dev/lance.md left intact.

No engine code changes.

* MR-854: update docs/dev/invariants.md Storage trait row + gap entry

Reviewer feedback: the docs reorg landed; the invariant row now lives in
docs/dev/invariants.md with stable headings (no more numbered §VI.23).

Update two pieces to reflect MR-854 completion:

* Status table 'Storage trait' row: was 'full call-site migration ... incomplete';
  now 'engine call sites all route through db.storage() (MR-854); inline-commit
  inherent methods are pub(crate)-demoted; capability/stat surfaces are roadmap'.

* 'Known Gaps' 'Storage abstraction' entry: was 'older inherent TableStore call
  sites and inline residuals remain'; now names the closed scope (MR-854 — call
  sites migrated, methods demoted, loader fast-paths) and the remaining
  trait-surface residuals under MR-A (Lance v7.x bump) and Lance #6666.

Cross-links to docs/dev/lance.md and docs/dev/writes.md so the framing stays
co-located with the canonical Lance surface tracking.

* MR-854: remove dead inline-commit methods from the storage surface

The loader concurrent fast-path (write_batch_to_dataset) is only reached
for LoadMode::Overwrite — Append/Merge route through MutationStaging — so
its Append/Merge arms were unreachable. Collapse it to overwrite-only and
drop the now-unused mode params, which removes the only callers of:

- TableStorage::append_batch + TableStorage::merge_insert_batches (trait)
- TableStore::merge_insert_batch + merge_insert_batches (inherent)

create_btree_index / create_inverted_index had zero callers anywhere
(scalar index builds use the stage_* primitives). Remove both from the
trait and the inherent impl.

Inherent append_batch stays pub(crate): overwrite_batch and recovery
tests use it. Migrate the one trait-append_batch test caller
(seed_person_row) to stage_append + commit_staged. The merge_insert
FirstSeen-workaround rationale moves from the deleted merge_insert_batch
into stage_merge_insert (now the sole merge path). No behavior change.

Also corrects the inaccurate loader residual comment (the prior text
blamed Lance #6658/#6666, which are the delete and vector-index issues,
for keeping overwrite inline; a stage_overwrite primitive already exists
and schema_apply uses it).

* MR-854: seal db.storage() to staged-only; move residuals to InlineCommitResidual

Split the three remaining inline-commit writes (overwrite_batch,
delete_where, create_vector_index) off the TableStorage trait onto a new
sealed InlineCommitResidual trait, reachable only via the explicit
Omnigraph::storage_inline_residual() accessor. db.storage() now exposes
only staged primitives + reads, so engine code cannot couple a write
with a Lance HEAD advance through the default surface — MR-793 acceptance
§1 ("no public method commits as a side effect of writing") now holds by
construction, not by review + naming.

Call sites moved to storage_inline_residual(): loader overwrite
fast-path, the three mutation delete_where paths, the branch-merge
delete, and the vector-index build. Impl bodies are unchanged (same
delegation to the pub(crate) inherent methods); this is a pure surface
reshape with no behavior change.

The residual trait holds two genuinely upstream-blocked methods
(delete_where -> Lance #6658/v7.x, create_vector_index -> Lance #6666)
plus overwrite_batch, kept for the loader's cross-table bulk-overwrite
concurrency until its staged migration lands (tracked follow-up).

* MR-854 docs: describe the staged-only seal; fix stale Lance index URLs

- writes.md / invariants.md / AGENTS.md: the inline-commit residuals now
  live on InlineCommitResidual behind db.storage_inline_residual(), so
  acceptance §1 holds by construction rather than 'option (b)' per-method
  enumeration. Drop the inaccurate 'until Lance exposes
  Operation::Overwrite { fragments }' claim (that op exists; stage_overwrite
  already builds it) and reframe overwrite_batch as a removable legacy
  residual gated on the loader's bulk-overwrite concurrency.
- forbidden_apis.rs: rewrite the allow-list doc for the split surface.
- lance.md: the index spec pages moved from /format/table/index/ to
  /format/index/ in Lance 6.x (the old paths 404). Fix all 13 URLs.

* MR-854: fix stale lance-4.0.0 comment refs flagged in review

Addresses greptile (exec/merge.rs) and aaltshuler's stale-version blocker:
update lance-4.0.0 -> 6.0.1 in the comment/doc refs within this PR's
footprint (exec/merge.rs, exec/mutation.rs, docs/dev/writes.md). Also
corrects exec/merge.rs to cite lance#6666 (not #6658) for
build_index_metadata_from_segments — that is the vector-index segment-commit
API; #6658 is the two-phase delete. (Pre-existing 4.0.0 refs in untouched
files like architecture.md/storage.md are main's incomplete migration
cleanup, left out of scope.)

* fix(storage): stage loader overwrites

* fix(storage): stage empty schema rewrites

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com>
Co-authored-by: Ragnor Comerford <hello@ragnor.co>
2026-06-09 23:03:08 +02:00
aaltshuler
d870eaaf3f test(cli): cluster lifecycle e2e — real-graph import/apply/refresh, schema-change loop, force-unlock retry
Three composition tests over the spawned binary against a real derived graph:

- import -> plan (dispositions) -> apply -> status -> refresh -> plan-empty,
  then a query edit round-trip. Pins that refresh and apply recompute the
  graph composite digest identically — divergence would silently re-open
  the plan forever and no single-command test would catch it.
- The Stage 3A operator workflow across the control/data-plane boundary:
  cluster apply defers a schema change, omnigraph schema apply executes it,
  cluster refresh observes it, the next cluster apply re-converges.
- Held lock refuses apply, force-unlock clears it, retried apply converges.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 23:44:49 +03:00
aaltshuler
40a21e4e77 docs(cluster): document Stage 3A config-only cluster apply
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 23:36:33 +03:00
aaltshuler
bcef8444dd feat(cli): omnigraph cluster apply
Terraform-style: apply executes directly (cluster plan is the preview, now
annotated with apply dispositions). Human output prints per-change
dispositions, convergence, and the catalog-only caveat; --json emits the full
ApplyOutput. Exit is non-zero only on errors — deferred/blocked changes are
warnings with converged: false as the automation signal.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 23:34:48 +03:00
aaltshuler
1f8e5945cf feat(cluster): config-only apply with content-addressed catalog publish
apply_config_dir executes the query/policy subset of the plan: payloads are
written content-addressed under __cluster/resources/{query,policy}/... before
the state CAS (state is the publish point; orphaned blobs from a failed CAS
are inert and re-apply is the repair), then state.json is CAS-updated with
applied digests, Applied/Blocked statuses, and a revision bump. Graph/schema
changes are never executed here: schema content and graph lifecycle defer to
a later phase with loud warnings, while graph.<id> composite-digest updates
whose schema component is unchanged converge automatically via recomputation
from state's own components (without which apply could never converge).
Idempotent re-apply leaves state bytes and revision untouched.

PlanChange gains optional disposition/reason fields, populated by the same
classifier in cluster plan, so plan is an honest preview of what apply will
execute, derive, defer, or block.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 23:32:13 +03:00
Andrew Altshuler
171a8c5d13
docs(releases): attribute the __run__ sweep (MR-770) to v0.6.2, not v0.6.1 (#161)
The v0.6.2 notes omitted the MR-770 `__run__` cleanup entirely, and the v0.6.1
notes wrongly claimed it shipped in v0.6.1. The code (the `migrate_v2_to_v3`
`__manifest` sweep + `is_internal_run_branch`/`run_registry.rs` removal) first
appears at the v0.6.2 tag via #132 and is absent at v0.6.1.

- v0.6.2: add the MR-770 highlight, correct the manifest-stamp note to describe
  the v2→v3 auto-migration on first read-write open (with the read-only caveat),
  and mention the cleanup in the intro.
- v0.6.1: replace the two over-claiming `__run__` lines with corrections that
  point to v0.6.2.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 22:31:50 +03:00
aaltshuler
89b876c797 Add cluster state lock recovery 2026-06-09 22:31:46 +03:00
Andrew Altshuler
737a0f6e45
Merge pull request #159 from ModernRelay/codex/cluster-config-stage2-refresh-import
[codex] add cluster refresh and import
2026-06-09 21:25:08 +03:00
aaltshuler
d00d42274e Implement cluster refresh and import 2026-06-09 21:17:23 +03:00
Ragnor Comerford
e0d88d1295
fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types (#160)
* fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types

Hardening on top of #133. That PR introduced a shared
`loader::composite_unique_key(parts)` joining per-column scalars with U+001F
and routed both intake and branch-merge through it, closing the original
'|' vs U+001F separator drift. This takes the shared keying the rest of the
way to correct-by-design:

- Collision-free by construction: the key is now the tuple of per-column
  scalar strings (Vec<String>) keyed directly, no separator, so no data value
  (not even a literal U+001F) can forge a collision.
- One scalar converter across both paths: intake used an explicit type-match,
  merge used Arrow's array_value_to_string. Both now derive the key through
  composite_unique_key(group_columns, row), so they can't drift on conversion.
- Loud on un-keyable types: the scalar converter returned None for any Arrow
  type it didn't recognize, and the caller treated None as null-exempt, so a
  @unique on a column type it couldn't reduce (list, blob) was silently
  un-enforced. It now returns Err, surfacing the constraint it can't enforce
  instead of weakening it in silence.

Tests:
- consistency::composite_unique_key_is_consistent_across_intake_and_merge pins
  that intake and merge key the tuple identically (load-on-branch then merge
  of values containing '|').
- loader unit tests pin tuple keying + null exemption and the loud error on an
  un-keyable (binary) column.

Docs: invariants truth-matrix updated; stale loader/mod.rs line pointers fixed.
Scope unchanged: intra-batch / merge-candidate-set only; cross-version
uniqueness against committed rows stays a documented gap.

* fix(unique): cover all string encodings; make format_tuple private (PR #160 review)

Addresses two Greptile P2 comments on PR #160:

- unique_key_scalar handled only StringArray (Utf8). The loud-on-unknown-type
  behavior turned any legal string column that read back as LargeUtf8 or
  Utf8View into a hard write failure (the old code silently returned None). Add
  LargeStringArray and StringViewArray arms so a legal string column is keyable
  in every physical Arrow encoding; the Err path now fires only for a genuinely
  un-keyable logical type (list/blob/vector), never a legal value in an
  unenumerated encoding.
- format_tuple was pub(crate) but only used within loader/mod.rs; make it a
  private fn (matches the old format_unique_columns it replaced, minimal
  exposed surface).

New unit test unique_key_scalar_handles_all_string_encodings pins that Utf8 /
LargeUtf8 / Utf8View all render rather than error.
2026-06-09 19:28:21 +02:00
Ragnor Comerford
dbfdddc952
feat(engine): indexed graph traversal (#149)
* perf(engine): route Expand node hydration through the id BTREE via structured filter

hydrate_nodes built an `id IN (...)` SQL string applied via Scanner::filter,
which DataFusion evaluates with InListEval (O(N×M)) rather than using the id
BTREE scalar index — measured at 72× the indexed cost on a 100k-node hop
(MR-376). Build the id IN-list as a structured DataFusion Expr, AND it with
the pushable destination filters, and apply via Scanner::filter_expr (the same
path execute_node_scan already uses); Lance then compiles it to
scalar-index-search -> take.

Destination-filter pushability is now decided by ir_filter_to_expr (structured)
instead of ir_filter_to_sql, so list-contains (array_has) pushes down too.
Removes the now-dead string-filter helpers build_lance_filter, ir_filter_to_sql,
and ir_expr_to_sql; literal_to_sql stays (still used by the mutation delete path).

* feat(engine): add TableStore::scan_edges_by_endpoint for indexed neighbor lookup

Static helper returning edge rows that match a set of endpoint keys on src/dst,
projected to [key_col, opposite_col], via a structured `key_col IN (keys)`
filter_expr. Lance routes it through the persisted BTREE on the endpoint column
(index-search -> take), so cost scales with the frontier size rather than |E|.

Unused until execute_expand's indexed mode lands; isolated in its own commit so
the storage-layer primitive is reviewable on its own.

* feat(engine): add BTREE-indexed Expand traversal path

Split execute_expand into a dispatcher over execute_expand_csr (the existing
in-memory CSR BFS, unchanged) and a new execute_expand_indexed that serves each
hop by batching the frontier into one scan_edges_by_endpoint call against the
persisted src/dst BTREE (index-search -> take), then fans out per source row.
Both share expand_hydrate_and_align — the destination hydration + alignment +
hconcat + in-memory non-pushable filters — which now aligns by string id (a
HashMap) instead of a dense row-id vec, so one tail serves both modes.

Mode selection is OMNIGRAPH_TRAVERSAL_MODE for now (default csr); the
frontier-size auto policy and lazy CSR build follow. AntiJoin stays on CSR.

tests/traversal_indexed.rs (its own #[serial] binary, so env writes never race a
reader) asserts the indexed path matches CSR for one-hop, multi-hop, cross-type,
and no-match cases, and that a freshly-appended unindexed edge is still found
(partial index coverage — fast_search=false unindexed-fragment scan).

* feat(engine): frontier-size Expand dispatcher + lazy CSR build

Replace the env-only mode switch with an auto policy: Expand uses the
BTREE-indexed path when the source frontier is small and the hop count bounded
(OMNIGRAPH_EXPAND_INDEXED_MAX_FRONTIER=1024, OMNIGRAPH_EXPAND_INDEXED_MAX_HOPS=6),
else the in-memory CSR. OMNIGRAPH_TRAVERSAL_MODE=indexed|csr still forces a mode.

Make the CSR index lazy: thread a GraphIndexHandle (memoizing OnceCell over a
Cached/Direct/None builder) through execute_query/execute_pipeline/
execute_rrf_query/execute_anti_join instead of a pre-built Option<&GraphIndex>.
A query served entirely by the indexed path with no AntiJoin never pays the
O(|E|) CSR build — the perf win of Tier 3. AntiJoin still realizes the index
(its negation uses CSR has_neighbors).

Net effect: selective traversals (the common case) skip the whole-graph CSR
build and resolve neighbors from the persisted, incrementally-maintained
src/dst BTREE. Existing traversal/aggregation/end_to_end/search suites now run
the indexed path by default and stay green.

Docs: constants.md (new env knobs), query-language.md (Expand dual path),
indexes.md (graph index is lazy + the indexed alternative).

* test(engine): bench indexed vs CSR selective traversal

Add a selective single-source knows{1,2} comparison to bench_expand: per growing
|E|, time the cold query in csr vs indexed mode (fresh db each, so CSR pays its
O(|E|) build) and assert both modes return identical rows — a guard against the
scalar-index physical_rows silent fallback dropping unindexed-fragment rows. The
existing dense hop1/2/3 latency bench is unchanged.

* feat(engine): surface silent scalar-index fallback in indexed traversal (C6)

Add TableStore::key_column_index_coverage — a metadata-only check (no IO) of
whether a `key_col IN (...)` scan will be served by the persisted BTREE or
silently fall back to a full filtered scan, mirroring Lance's own decision:
no BTREE on the column, or any fragment missing physical_rows (which disables
scalar indices for the whole scan, lance dataset/scanner.rs create_filter_plan).
execute_expand_indexed calls it once per traversal and tracing::warn!s on
Degraded, so the perf cliff is observable instead of hidden behind a bench oracle.

Detection-only: results are correct either way (the scan returns all rows). Closes
the "no silent failures" gap the traversal best-practice audit flagged as the top
deviation, and adds an IndexCoverage value a future cost-based planner can consume.

* perf(engine): dense-id BFS on the indexed traversal path (C3)

execute_expand_indexed ran its per-source BFS in string space
(Vec<HashSet<String>>, HashMap<String,Vec<String>>, ~4 String clones per neighbor
occurrence). Intern node ids to u32 once via a per-traversal TypeIndex (no
GraphIndex/CSR build — laziness preserved) and run visited/seen/frontier/
neighbor-map in dense u32 space, mirroring the CSR path; de-intern only for the
per-hop IN-list and the emitted dst ids handed to the hydrate+align tail.

Behavior-preserving — the traversal_indexed CSR-vs-indexed equivalence tests are
the guard (results are identical, the key type just changes String -> u32).

* refactor(engine): thread the opened edge dataset into indexed Expand

Hoist the edge-dataset open and the C6 index-coverage warning out of
execute_expand_indexed into execute_expand, threading the opened dataset in
as a parameter so it is opened exactly once. Extract the endpoint-column
mapping (endpoint_columns) and the coverage warning (warn_on_degraded_coverage)
as helpers.

Behavior-preserving: same dataset, same warning, same dispatch decision. This
only relocates the open so the upcoming cost-based chooser can consult index
coverage before dispatch without opening the dataset twice.

* feat(engine): cost-based Expand dispatch chooser (C5)

Replace the fixed frontier<=1024 && hops<=6 dispatch threshold with a pure,
IO-free cost model. choose_expand_mode compares the indexed path's
frontier-relative work (hops * frontier * fanout, or hops * |E| when BTREE
coverage is degraded) against the cost of building the whole-graph CSR
(BUILD_FACTOR * |E|), from cheap manifest row counts. Under good coverage this
reduces to a selectivity ratio independent of |E|, preserving the flat-in-|E|
indexed win for selective traversals while routing dense / deep / high-fanout
or degraded-and-expensive traversals to CSR.

execute_expand decides cardinality-first and only opens the edge dataset to
confirm coverage when it leans indexed (no open on a clearly-CSR traversal).
The two env knobs become hard ceilings layered on the model; the
OMNIGRAPH_TRAVERSAL_MODE override still forces a path; the chosen mode is
traced. Results are unchanged across modes — only the path differs.

Adds inline crossover unit tests and extends the traversal_indexed both_modes
harness with an auto pass asserting the chooser is result-preserving across
every traversal shape. Documents the new flag semantics in
docs/user/{constants,query-language}.md.

* test(engine): pin Lance scalar-index coverage + system-column/deletion-metadata surface

Add three Lance surface guards de-risking a future persisted-adjacency cache:
- a compile-only guard pinning the fragment physical_rows + index-detail
  surface that key_column_index_coverage mirrors (the C6 fallback);
- a runtime probe confirming a scalar BTREE on the system column
  _row_last_updated_at_version is not buildable via the normal create-index
  path (the column is not in the user schema), so a version-column range delta
  is not viable as drafted;
- a runtime probe confirming per-fragment deletion metadata
  (deletion_file.num_deleted_rows) is available as cheap O(fragments) metadata,
  the primitive a fragment-coverage delete model would rely on.

The probes turn the two largest substrate assumptions into green/red CI facts
before any cache work begins.

* test(engine): regression for cross-type id-collision in indexed traversal

A node id is unique only within a type, so a Person and a Company can share an
id string. A variable-length traversal over a cross-type edge (WorksAt) must
structurally stop after one hop. This test builds a graph where 'shared' is both
a Person and a Company id and asserts worksAt{1,2} returns only the one-hop
company. It fails today: the indexed path's single string interner de-interns
the hop-1 Company id back to the colliding Person id and runs a hop-2 scan that
matches that Person's edges, emitting a spurious second-hop company (indexed
["other","shared"] vs csr ["shared"]).

* fix(engine): structurally cap cross-type Expand at one hop

A cross-type edge cannot chain (e.g. a Company is not a WorksAt source), so a
variable-length traversal over one is structurally single-hop. Both traversal
paths now enforce this by capping max hops at 1 when from_type != to_type,
instead of relying on the hop-2 scan returning empty.

That reliance was a correctness hole on the indexed path: it interns every
endpoint string into one dense id space, so a cross-type id-string collision (a
Person and a Company sharing an id) let hop 2 de-intern a destination id back to
the colliding source-type id and match its edges, emitting rows the CSR path
never produces. With the cap the cross-type second-hop scan never runs, so the
shared interner can no longer alias across types. Turns the regression test
green (indexed == csr == ["shared"]).

* perf(engine): set-oriented filtered anti-join, remove per-row dispatch

execute_anti_join's filtered slow path sliced the outer batch to one row at a
time and re-ran the inner pipeline per row, so each 1-row inner Expand dispatched
to the indexed path — one Lance scan per outer row, while the CSR realized up
front sat unused.

Replace it with a set-oriented anti-semi-join: tag each outer row with a
synthetic index column, run the inner pipeline once over the whole frontier (the
tag survives Expand's hconcat and Filter's row-drop), then exclude outer rows
whose tag survived. The inner Expand now runs as a single set-at-a-time traversal
over the full frontier; config is read once per operator, not per row (the env
nit is mooted). A produced-but-untagged inner batch fails loudly rather than
silently keeping every row. Results are unchanged (the predicated-negation tests
exercise the path over a multi-row outer with dst-filters).

* test(engine): drop flaky wall-clock budget from the merge truth table

The 30s wall-clock assertion in merge_pair_truth_table flakes under parallel
test load: it tripped at ~31s in the full --test-threads=4 gate while passing at
~20s in isolation. A fixed time budget in a correctness test depends on machine
and parallelism, not correctness; elapsed is still logged for visibility, and a
real merge-perf regression belongs in a bench. The cell-count correctness
assertions (81 / 36 / 45) are unchanged.

* fix(engine): total deterministic ORDER via entity-key tie-break + NULL contract

apply_ordering used an unstable lexsort with no tie-break, so rows with equal
user-sort keys came out in a run-dependent order (the input order depends on
scan parallelism / upstream hashing) — making ORDER ... LIMIT non-deterministic,
a latent deny-list violation (no nondeterministic result ordering).

Append the bound entities' key columns (<var>.id, unique per row) in canonical
name-sorted order as ascending tie-breaks, giving a total, reproducible order
(and a deterministic top-N when ties straddle the LIMIT cutoff). NULL placement
(nulls_first = !descending) is unchanged and now documented as the contract.

New tests/ordering.rs locks descending, multi-key precedence, the deterministic
key tie-break (data loaded in a different order than the expected output, so it
proves the tie sorts by key not by load order), and NULL placement under ASC/DESC.
docs/user/query-language.md documents the total-order + NULL contract.

* test(engine): property-based query-correctness invariants over generated graphs

Adds a proptest harness (new dev-dep) that generates small graphs whose Person
and Company keys are drawn from a shared 5-key alphabet, so cross-type id
collisions, cycles, and self-loops arise by search rather than from one
hand-built fixture. Three invariants:

- prop_expand_indexed_eq_csr: csr == indexed == auto over knows{1,3} (same-type,
  cycles) and worksAt{1,2} (cross-type, collision-prone) from every start.
- prop_results_subset_of_existing_nodes: no phantom rows (catches over-emission
  even if both modes are wrong identically).
- prop_antijoin_partitions_persons: not{worksAt} and its complement are disjoint
  and cover all persons.

Verified the guard bites: neutering the cross-type hop cap makes
prop_expand_indexed_eq_csr fail and proptest shrinks it to persons["c","e"] /
companies["b","c"] — the cross-type collision class the hand-built fixture
only sampled once. Tests are sync + #[serial] (per-case runtime; the mode test
writes OMNIGRAPH_TRAVERSAL_MODE).

* test(engine): cover cycle/self-loop termination + nested anti-join (C5 edge cases)

- variable_hops_terminate_and_dedup_on_cycle: a 3-cycle a->b->c->a traversed with
  knows{1,5} (ceiling above the cycle length) terminates and emits each node once
  (the c->a back-edge hits the seeded source); both_modes confirms indexed == csr.
  Uses a bounded range deliberately — unbounded {1,} is a typecheck error, not a
  runtime path.
- variable_hops_handle_self_loop: a->a self-loop does not loop forever and does
  not re-emit the seeded source.
- nested_anti_join_double_negation: not { worksAt; not { name = Acme } } recurses
  through execute_pipeline, yielding [Alice,Charlie,Diana] (people with no non-Acme
  employer) — distinct from plain unemployed [Charlie,Diana].

* test(engine): execution goldens for typed-literal filters (C4 gap #4)

New literal_filters.rs covers filtering by F64/F32/Bool/Date/DateTime LITERALS
across both arms: standalone comparisons ($m.score > 1.5, $m.ratio <= 0.25,
$m.active = true, $m.born >= date(...), $m.seen < datetime(...)) exercise the
in-memory comparison path, and inline bindings (Metric { active: true },
Metric { score: 3.0 }) exercise Lance filter_expr pushdown. Seeds partition each
predicate so a dropped/miscast filter returns all rows. (Param-bound scalars and
list-column contains are covered elsewhere.)

* test(engine): full rank-order goldens for nearest + bm25 (gap #2)

Existing search tests stopped at top-1 (nearest) or non-empty (bm25), so a
regression corrupting ranks 2..k or reversing the sort direction passed CI
silently. Pin the FULL ordered slug list: nearest([0.1,0.2,0.3,0.4]) ->
[ml-intro, nlp-guide, rl-intro] (ml-intro exact at dist 0, rest by ascending
L2); bm25(Learning) -> [rl-intro, ml-intro, dl-basics] (descending score).
nearest/bm25 skip apply_ordering (is_search_ordered) and return Lance native
order, so result_slugs row order == rank order; values resolved by running and
confirmed stable across runs.

* test(engine): search fuzzy/match_text characterization + RRF non-default pairings

- match_text_matches_exact_set_excludes_unrelated: match_text(body,'neural') ==
  [dl-basics] exactly (not just contains).
- fuzzy_does_not_match_under_default_tokenizer: characterizes that fuzzy() is
  inert with the default tokenizer here (search/match_text work, fuzzy returns
  nothing); turns red — to be promoted to a real golden — if fuzzy starts matching.
- rrf_fuses_two_fts_fields / rrf_fuses_two_vector_queries: RRF fuses arms other
  than the default nearest+bm25 (bm25 title+body; two vector queries), proving
  primary_var resolves and fusion runs. New fixtures/search.gq queries +
  two_vector_params helper. Orders resolved by running, confirmed stable.

* test(engine): anti-join fast-vs-slow path equivalence harness

anti_join_fast_and_slow_paths_agree: the CSR has_neighbors fast path
(not { $p worksAt $_ }) and the set-oriented inner-pipeline replay (same
negation forced slow by an always-true $c.name != "" dst filter) must produce
the same result ([Charlie, Diana]). Closes the second real engine fork explicitly.

* test(engine): regression for nested slow-path anti-join tag collision

A nested not { ... not { ... } } where both levels hit the set-oriented slow
path collides on the fixed __antijoin_outer_row correlation column: the inner
call appends a duplicate, and column_by_name reads the OUTER tag. Fan-out (p1
works at two companies) makes inner row indices diverge from outer tags, so the
bug returns the wrong person set. Fails on current code (left ["p2","p4"] vs
right ["p3","p4"]).

* fix(engine): collision-free anti-join correlation tag for nested negation

The set-oriented anti-join tagged the outer batch with a fixed column name and
read it back by name. Under a nested slow-path anti-join the enclosing tag rides
through the inner pipeline, so the inner call produced a duplicate field; Arrow
permits duplicate names and column_by_name returns the first, so the inner
negation mis-correlated against the outer row indices.

Choose a tag name not already present in the batch (suffix-incremented), so each
nesting level reads its own correlation column. Turns the fan-out regression
green; the existing nested/fast-vs-slow/proptest anti-join invariants still pass.

* fix(engine): cap cross-type hops in the Expand cost model

gather_cost_inputs fed the requested max_hops into choose_expand_mode even though
execute_expand_indexed runs at most one hop for a cross-type edge. So a cross-type
variable-length expand (e.g. worksAt{1,5}) had its indexed cost scaled by 5 while
only one hop runs, skewing the chooser toward CSR (an unnecessary whole-graph
build) near the crossover. Results were unaffected (modes are equivalent); this
is a plan-accuracy fix.

Add cost_effective_hops(requested, same_type) — caps to 1 for cross-type — and
apply it in gather_cost_inputs so the estimate matches what executes. Unit test
covers the cap and the crossover consequence (capped 1 hop stays indexed where
the requested 5 would have flipped to CSR).

* perf(engine): realize anti-join CSR lazily + reuse a warm CSR in the chooser

Two CSR build/reuse fixes flagged on the set-oriented anti-join work (results
unchanged — plan/perf accuracy):

- execute_anti_join called graph_index.get() (the O(|E|) whole-graph CSR build)
  unconditionally, but only the bulk fast path consumes it; a filtered/nested
  slow-path anti-join's inner Expand picks its own access path. Gate the build on
  a pure shape predicate (bulk_anti_join_applies) so a selective anti-join over a
  large graph no longer pays a build it won't use.
- gather_cost_inputs hardcoded csr_cached=false, so once an earlier op realized
  the CSR, later Expands still cost it as a cold build and could pick per-hop
  indexed scans over reusing the warm in-memory CSR. Add GraphIndexHandle::
  is_built() and thread it through so the chooser reuses a materialized CSR.

Anti-join, cross-type, proptest-equivalence, and chooser unit tests stay green.

* test(engine): RAII traversal-mode guard in proptest equivalence

prop_expand_indexed_eq_csr set/cleared OMNIGRAPH_TRAVERSAL_MODE manually; a panic
between set and clear (e.g. a query unwrap on a generated case) would leak the
forced mode into proptest's shrink/subsequent cases and mask the divergence under
test. Replace with a ModeGuard that clears on drop (including on unwind), scoping
the forced mode to a single query.

* test(engine): regression for multi-hop anti-join hop bounds

The bulk anti-join fast path answers via has_neighbors (one-hop existence), so
not { $p knows{2,2} $x } wrongly drops a node with a 1-hop neighbor but no
2-hop path. On a->b (sink) and c->d->e, only c has a 2-hop path; the query should
keep [a,b,d,e]. Fails on current code (left ["b","e"] — only the sinks).

* fix(engine): restrict anti-join bulk fast path to one-hop expands

bulk_anti_join_applies accepted any single Expand, but try_bulk_anti_join_mask
decides via the CSR has_neighbors one-hop existence check — wrong for multi-hop
negations. Require min_hops==1 && max_hops==1 in the predicate; anything else
falls to the slow path, whose inner Expand runs the real bounded traversal.
Turns the multi-hop regression green; one-hop anti-joins unchanged.

* fix(engine): IndexCoverage reports Degraded for uncovered fragments

key_column_index_coverage checked BTREE-exists + physical_rows but not that the
index actually covers the current fragments. Since edge-index creation is skipped
once a BTREE exists, fragments appended later stay unindexed while coverage still
reported Indexed — so the cost chooser priced a partly-full scan as fully indexed.

Compare the BTREE's fragment_bitmap (public on lance_table IndexMetadata) against
the dataset's current fragment ids; report Degraded when any are uncovered. A None
bitmap means Lance can't report coverage — don't over-degrade. Results are
unaffected (the scan returns unindexed-fragment rows either way); this corrects
the cost signal.

Test: a freshly-loaded edge BTREE is Indexed; after appending an edge the new
fragment is uncovered → Degraded. Surface guard pins IndexMetadata.fragment_bitmap.

* docs: clarify the Expand frontier ceiling bounds the initial dispatch frontier

The cap is applied at dispatch on the initial frontier; per-hop fan-out
(union_dense) is not hard-capped. Correct the constants.md and query-language.md
claims: the ceilings bound the initial-dispatch frontier/hops, the cost model
estimates total indexed work as ~hops*frontier*fanout (pricing dense fan-out
toward CSR), and per-hop work is not a hard bound. Drops the overstated 'hard
caps bound indexed work' / 'cost ∝ frontier' wording.
2026-06-09 18:09:13 +02:00
Andrew Altshuler
59b64ea097
Merge pull request #158 from ModernRelay/cluster-config-docs
[codex] integrate cluster config read-only state stack
2026-06-09 18:54:12 +03:00
aaltshuler
2f19656c0e fix(cluster): tighten state lock observations 2026-06-09 18:30:33 +03:00
aaltshuler
b046515e1c Merge origin/main into cluster-config-docs 2026-06-09 18:11:12 +03:00
Andrew Altshuler
4d7c1ef42f
Merge pull request #153 from ModernRelay/codex/cluster-config-stage2-state-ledger
[codex] Add cluster JSON state ledger status
2026-06-09 17:46:14 +03:00