Commit graph

14 commits

Author SHA1 Message Date
Ragnor Comerford
67baf615d9
build(deps): bump Lance 6.0.1 → 7.0.0 (correct-by-design substrate alignment) (#229)
* build(deps): bump Lance 6.0.1 → 7.0.0 (object_store 0.13.2, roaring 0.11.4)

Arrow stays 58 and DataFusion stays 53 (no change). The only transitive bump
is object_store 0.12.5 → 0.13.2. 141 upstream commits reviewed; no fixes lost
(the 6.0.x release-branch backports are all forward-ported into 7.0.0).

- object_store 0.13 moved get/put/head/rename/delete behind a new ObjectStoreExt
  trait (list/list_with_delimiter/put_opts stay on the core trait). Add
  `use object_store::ObjectStoreExt` in storage.rs and db/manifest/namespace.rs;
  no call-site changes. Mirrors Lance's own migration in PR #6672.
- roaring pinned to 0.11.4 (cargo update -p roaring --precise 0.11.4). Lance
  7.0.0's UpdatedFragmentOffsets newtype (lance#6650) derives Eq over
  HashMap<u64, RoaringBitmap>, which needs RoaringBitmap: Eq, added in roaring
  0.11.4; the loose `roaring = "0.11"` constraint otherwise resolves 0.11.3 and
  lance itself fails to compile.
- lance#6774: merge-insert INSERT rows now stamp _row_created_at_version with the
  commit version (was a fallback of 1). Flip the lance_version_columns assertion
  to `== v2` and correct the changes/mod.rs rationale comment. Production
  change-detection keys on _row_last_updated_at_version + ID membership, so its
  logic is unaffected.

Refs lance#6650, lance#6774, lance#6672.

* fix(storage): pin WriteParams::auto_cleanup = None (lance#6755 default flip)

lance#6755 flipped the WriteParams::auto_cleanup default from on (a full cleanup
pass every 20th commit) to None. On 6.0.1 the on-by-default hook could silently
GC versions that __manifest pins for snapshots/time-travel. OmniGraph owns
cleanup explicitly (optimize.rs::cleanup_all_tables) and never set auto_cleanup,
so it was relying on a default that is both wrong for our snapshot model and now
changed upstream.

Pin auto_cleanup: None explicitly at all 11 production WriteParams sites
(table_store ×6, commit_graph ×2, recovery_audit ×1, manifest/graph ×2 — the
__manifest + sub-table Create paths). Removes the dependency on a default-flag
value and locks in the snapshot-safe behavior regardless of future upstream
re-flips.

Refs lance#6755.

* test(lance): pin BTREE range-boundary correctness (lance#6796)

lance#6796 (issue #6792) fixed a BTREE scalar-index range-query bound
inclusiveness bug: `x <= hi AND x > lo` returned the wrong boundary row.
Add lance_surface_guards.rs::btree_range_query_boundary_is_correct, which
reproduces the exact #6792 shape (5 rows + an explicit BTREE drives the index
path even on tiny data) and pins the corrected inclusive-<= / exclusive->
semantics. It turns red if a future Lance regression reintroduces the bug.

OmniGraph today builds BTREE only on string @key columns and queries them by
equality/IN, so its current patterns do not hit this; the guard protects any
future BTREE-range path (BTREE-on-properties, range-on-key).

Refs lance#6796.

* docs(dev): align Lance docs + invariants to 7.0.0

- docs/dev/lance.md: new 2026-06-14 alignment stanza for the 6.0.1 → 7.0.0 bump
  (object_store ObjectStoreExt move, roaring 0.11.4, #6774/#6796/#6755 behavior,
  #6658 shipped → MR-A unblocked but separate, #6666 + blob compaction still
  open); prior 6.0.1 stanza demoted to historical.
- AGENTS.md: storage substrate 6.x → 7.x (line + architecture diagram).
- docs/dev/invariants.md: deletes/vector known gap updated — the staged
  two-phase delete API (lance#6658) now exists and MR-A is unblocked, but
  delete_where stays inline and D2 stays in place until the migration lands;
  create_vector_index still gated on lance#6666.

* fix(storage): skip Lance auto-cleanup on commit paths for legacy datasets

Addresses PR #229 review (Codex P1). `WriteParams::auto_cleanup` is create-time
config with no effect on existing datasets (Lance write.rs docs), so the previous
`auto_cleanup: None` change alone did NOT protect graphs created before the v7
bump: 6.0.1 defaulted auto_cleanup ON, leaving `lance.auto_cleanup.*` config on
those datasets, and Lance's per-commit hook (io/commit.rs: `if
!commit_config.skip_auto_cleanup`) fires off that stored config — so omnigraph's
own writes would GC versions the __manifest pins for snapshots/time-travel.

Skip the hook on every commit path, covering new and legacy datasets alike:
- commit_staged: CommitBuilder::with_skip_auto_cleanup(true) — the staged data path.
- __manifest publisher: MergeInsertBuilder::skip_auto_cleanup(true).
- all 11 WriteParams: skip_auto_cleanup: true (direct Dataset::write/append paths;
  auto_cleanup: None retained so new datasets store no cleanup config at all).

Tests:
- lance_surface_guards::skip_auto_cleanup_suppresses_version_gc — substrate:
  negative control (config GCs v1 without skip) + with-skip survival.
- staged_writes::commit_staged_skips_auto_cleanup_so_pinned_versions_survive —
  omnigraph usage: commit_staged on a legacy-config dataset preserves the pinned
  create version.

Refs lance#6755.

* test(lance): assert created_at-preserved + updated_at-bumped on merge_insert UPDATE

Addresses PR #229 review follow-up. `lance_merge_insert_update_preserves_created_at_version`
documented (in a comment) that a merge_insert UPDATE preserves created_at and
bumps updated_at, but only asserted the value change — leaving the change-feed
invariant unguarded. Add the two missing assertions:

- bob created_at == v1 (preserved across UPDATE; what the test name promises;
  lance#6774 only changed INSERT-row stamping).
- bob updated_at == v2 (bumped to the commit version) — the invariant
  OmniGraph's insert/update classification relies on (changes/mod.rs keys on
  _row_last_updated_at_version). A regression here would silently drop updates
  from the diff/change feed.
2026-06-14 20:42:24 +02:00
Andrew Altshuler
d6cf5b298c
feat(cli): plane-grouped --help + clap 4.6.1 (RFC-010 Slice 2) (#220)
* chore(deps): bump clap to 4.6.1

Workspace constraint "4" → "4.6" so the resolver picks up the 4.6 line
(a plain `cargo update` stayed on 4.5.x). clap 4.5.58 → 4.6.1
(clap_builder 4.6.0, clap_derive 4.6.1). Minor bump, no API breakage; the
workspace builds and all CLI suites pass unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(cli): group --help by plane (RFC-010 Slice 2)

Slice 1 declared the planes (the command_plane table + the wrong-plane
guard); this makes them visible in `--help`. clap can't print labeled
heading rows between subcommand groups (verified against the source —
help_heading is args-only, {subcommands} is one flat block), so per the
chosen approach: cluster + legend.

- Reorder the `Command` enum into plane bands (clap lists subcommands in
  declaration order): data (query, mutate, load, branch, snapshot, export,
  commit, schema, graphs) → storage/local-graph ops (init, optimize,
  repair, cleanup, lint, queries) → control (cluster) → session (policy,
  embed, login, logout, config, version). No magic display_order numbers —
  the source order IS the help order, with band comments for readers. The
  band placement matches `command_plane` (lint/queries are storage-plane:
  they reject --server), so the help grouping and the guard agree.
- Add an `after_help` legend on `Cli` naming the planes. Written to
  describe the planes (not enumerate every command) so it doesn't drift.

Help-polish (post-review): hide the deprecated `ingest` from the list
(still a valid command); trim the long `login` and `--as` descriptions to
one line each so the columns don't blow up.

The behavioral source of truth for planes stays `planes::command_plane`;
this ordering is its cosmetic counterpart.

Test: `help_groups_commands_by_plane` pins the legend phrase + the cluster
ordering (query < optimize < cluster). Doc: a line under cli-reference's
*Command planes* section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(cli): qualify mixed-plane commands in the --help legend

Addresses the Greptile P2 on #220: the legend placed `schema` entirely in
Data and `queries` entirely in Storage, but per `command_plane` the
subcommands differ — `schema plan` is storage-plane (rejects --server) and
`queries list` is session (no graph). A user reading the legend then running
`schema plan --server` would hit a rejection contradicting it. The Commands
list is one entry per top-level command (necessarily coarse), so the legend
carries the nuance: `schema [plan: storage]` and `queries [list: session]`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 01:49:40 +03:00
aaltshuler
4821e7208f refactor(api): extract omnigraph-api-types crate (RFC-009 Phase 2)
The HTTP wire DTOs and their engine-result -> DTO mappings move from
omnigraph-server's api module into a new omnigraph-api-types crate that
both server and CLI can depend on (engine must not — DAG: api-types ->
engine, never the reverse). The crate holds plain serde/utoipa types only;
the transport-coupled error->status mapping stays in the server (lib.rs/
handlers). The one server-runtime coupling (query_catalog_entry, which
maps a StoredQuery — not a wire type) stays behind in api.rs, now calling
the crate's pub param_descriptor.

api.rs becomes a thin `pub use omnigraph_api_types::*` re-export, so every
omnigraph_server::api::Foo path (handlers, the OpenApi schema list, CLI
imports) resolves unchanged. openapi.json regenerates BYTE-IDENTICAL (the
Phase-2 referee: 77 openapi tests green, zero diff).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 17:03:20 +03:00
Ragnor Comerford
446b46d548
Recovery liveness, storage fault-injection matrix, and one storage implementation over object_store (#203)
* test(engine): pin the long-lived-handle heal contract for sidecar-covered drift

A Phase B -> Phase C failure (commit_staged advanced Lance HEAD, manifest
publish did not land, recovery sidecar persists) currently wedges every
subsequent staged write on the same engine handle: the commit-time drift
guard rejects with 'run omnigraph repair', but repair itself refuses
while a recovery sidecar is pending, so a long-lived server can only
recover by restart. The documented contract (writes.md 'Long-running
servers', invariants.md invariant 5) says refresh-time roll-forward
closes this residual without restart -- but no write path runs it.

Two red tests pin the intended contract at the write entry points:
a follow-up load (the POST /ingest shape: shared handle, no reopen)
and a follow-up mutation must heal roll-forward-eligible sidecars
in-process and then succeed.

Currently failing with:
  table 'node:Company' has Lance HEAD version 2 ahead of manifest
  version 1; run `omnigraph repair` before writing

The fix lands in the next commit.

* fix(engine): heal pending recovery sidecars at the staged-write entry points

Close the long-lived-process gap in the recovery protocol: a Phase B ->
Phase C residual (per-table commit_staged landed, manifest publish did
not, sidecar persists) previously recovered only at the next ReadWrite
open or via an explicit refresh() that no production write path called,
so a long-lived server wedged every subsequent write on the commit-time
drift guard until restart.

New recovery::heal_pending_sidecars_roll_forward:
- one list_dir of __recovery/ at write entry (empty -> immediate
  return, the steady state), so the per-write cost is one storage list;
- per sidecar, acquires the same per-(table_key, table_branch) write
  queues every sidecar writer holds from before write_sidecar until
  after delete_sidecar, then re-checks sidecar existence -- this
  serializes the heal against live writers instead of rolling an
  in-flight sidecar forward from under its writer (which would fail
  that writer's publish CAS spuriously). Lock order queues ->
  coordinator matches every writer's commit->publish path. This is the
  queue-acquisition design recovery.rs and write_queue.rs already
  documented for in-process recovery;
- processes in RollForwardOnly mode: the common residual rolls forward
  in-process; rollback-eligible sidecars still defer to the next
  ReadWrite open (Dataset::restore is unsafe under concurrency).

Wire it into load_as and mutate_as (before the inline delete path can
advance any HEAD), and rebase Omnigraph::refresh onto the same helper
so refresh stops racing live writers' sidecars.

The maintenance entry points (apply_schema_as, branch_merge_as,
ensure_indices) intentionally keep their strict fail-loud preconditions
for now; wiring the same heal there is a follow-up with its own tests.

Turns the previous commit's two red tests green.

* fix(engine): name the right recovery path in the commit-time drift guard

The drift guard's 'run omnigraph repair before writing' advice is a
dead end when the drift is covered by a pending recovery sidecar:
repair refuses while a sidecar is pending. With the write-entry heal in
place, reaching this guard with sidecar-covered drift means the heal
deferred it (rollback-eligible), and the actual recovery path is a
read-write reopen. Distinguish the two classes on the error path only
(one sidecar list, after the conflict is already certain); a listing
failure falls back to the uncovered-drift wording rather than masking
the conflict.

Pinned by extending refresh_defers_rollback_eligible_sidecar_to_next_open
with a write attempt against the deferred sidecar.

* docs: write-entry in-process sidecar heal — contract and coverage

Update the recovery contract docs to match the previous two commits:
invariant 5 now states that the staged-write entry points and refresh
run in-process roll-forward recovery (long-lived processes converge on
the next write, not at restart); writes.md 'Long-running servers'
describes the heal's queue-acquisition concurrency contract, the
improved drift-guard error, and the entry points that intentionally do
not heal yet; testing.md indexes the new failpoint tests; AGENTS.md
capability matrix drops the claim that in-process recovery is entirely
future work (only the rollback path remains with the background
reconciler).

* test(engine): pin the entry heal contract for schema apply and branch merge

Without the write-entry heal, the two maintenance writers do worse than
wedge on sidecar-covered drift -- they proceed and decide its fate
implicitly:

- schema apply re-plans table rewrites from the manifest pin, orphaning
  the drifted Phase-B commit (its rows silently vanish from the
  rewritten table) while the stale sidecar lingers to misclassify
  against the post-apply pins;
- branch merge publishes over the drift, making the failed writer's
  commit visible as an unattributed side effect (no recovery audit
  row), and leaves the stale sidecar behind.

Two red tests pin the intended contract: both entry points heal the
sidecar first (attributed roll-forward), then run on the converged
state. Currently failing on the stale-sidecar / dropped-rows
assertions; the fix lands in the next commit.

* fix(engine): heal pending recovery sidecars at the schema-apply and branch-merge entries

Extend the write-entry heal to the remaining two write entry points.
Unlike load/mutate (which wedge on the drift guard), these proceeded
over sidecar-covered drift and decided its fate implicitly:

- schema apply re-planned table rewrites from the manifest pin,
  orphaning the drifted Phase-B commit -- its rows silently vanished
  from the rewritten table -- while the stale sidecar lingered to
  misclassify against the post-apply pins;
- branch merge published over the drift, making the failed writer's
  commit visible without a recovery audit row, and left the stale
  sidecar behind.

Both now run the same queue-serialized roll-forward heal at entry,
before their own sidecar exists, so recovery is attributed (audit row)
and deterministic. ensure_indices stays heal-free: it runs inside the
load / schema-apply flows after their entry heal.

Turns the previous commit's two red tests green. Docs updated in the
same change (invariant 5, writes.md, testing.md, AGENTS.md).

* test(engine): pin Phase A sidecar-write failure semantics

Storage fault-injection matrix, row 1: a sidecar PUT failure (S3
PutObject / fs write) in Phase A. New failpoint recovery.sidecar_write
at the top of write_sidecar -- the single choke point all five sidecar
writers go through -- models the storage error backend-generically.

Also adds the other three storage-fault failpoints used by the
following commits (recovery.sidecar_delete, recovery.sidecar_list,
recovery.record_audit); each is a no-op without the failpoints feature.

Pinned contract: every writer writes its sidecar BEFORE its first
HEAD-advancing commit, so a put failure aborts with zero drift (no
sidecar, Lance HEAD == manifest pin, no rows) and a transient fault
never wedges the graph -- the same handle writes/merges normally once
it clears. Covered for load (the staging writer) and branch_merge (the
multi-table writer, forced onto the RewriteMerged path by diverging
both sides).

* test(engine): pin Phase D delete, list, and audit-append storage-fault semantics

Storage fault-injection matrix, rows 2/3/5, plus the real-backend run:

- recovery.sidecar_delete: a Phase D delete failure (S3 DeleteObject)
  must NOT fail the user's write -- the manifest publish already
  landed, so the caller's data is durable. The swallowed failure
  leaves a stale sidecar; the next write's entry heal consumes it via
  the stale-sidecar audit-recovery path (RolledForward, attributed).

- recovery.sidecar_list: a __recovery/ list failure (S3 ListObjectsV2)
  is loud at every consumer -- the write-entry heal fails the write
  and the open-time sweep fails the open. Silently skipping recovery
  over a pending sidecar would be consumer tolerance of drift. Once
  the fault clears, open recovers the pending sidecar normally.

- recovery.record_audit: an audit write failure after the
  roll-forward's manifest publish aborts that recovery attempt and
  keeps the sidecar; re-entry detects the already-published manifest,
  records exactly ONE RolledForward audit row, and converges -- the
  retry tolerance documented on record_audit, exercised end-to-end.

- s3_load_recovers_after_publisher_failure_without_reopen: the
  same-handle heal scenario on a real bucket (gated on
  OMNIGRAPH_S3_TEST_BUCKET, skips locally), exercising sidecar
  put/list/delete through S3StorageAdapter instead of the local-FS
  adapter. CI wiring lands in a follow-up commit.

* test(engine): refuse corrupt recovery sidecars loudly

Storage fault-injection matrix, row 4 (no failpoint needed -- the
corrupt file is written by hand, sibling to the unknown-schema-version
refusal test): a truncated/garbage __recovery/{ulid}.json must be
refused loudly by both the write-entry heal (the write fails naming
the parse error) and the open-time sweep (ReadWrite open fails naming
the file), with the file left on disk for operator inspection.
Read-only opens still work -- the sweep is skipped there.

* test(engine): run the S3 sidecar-lifecycle coverage in CI + document the fault matrix

- ci.yml rustfs_integration: new step running the bucket-gated
  failpoints tests (name filter s3_) against the RustFS container, so
  sidecar put/list/delete are exercised through S3StorageAdapter on
  every storage-affecting PR.
- writes.md: sidecar I/O failure semantics -- Phase A put failure
  aborts with zero drift; Phase D delete failure is swallowed (write
  already durable) and healed by the next write; list failures are
  loud at heal and open; corrupt sidecars are refused with the file
  kept for inspection; audit-append failures are retried to exactly
  one audit row.
- testing.md: index the storage-fault matrix in the failpoints.rs row
  and the new RustFS CI line.

* test(engine): pin read-visibility of acknowledged local if-absent writes

The cluster lib test import_missing_state_creates_state_with_graph_-
observation flakes at ~50% under full-workspace load ('EOF while
parsing a value' reading back the state.json its own import just
acknowledged). Root cause is in the engine's local storage adapter:
write_text_if_absent writes through a buffered tokio::fs::File and
returns when write_all resolves -- which, per tokio's documented File
semantics, means the bytes reached tokio's internal buffer, not the
file. The actual write completes in a background blocking task after
drop, so a caller that acknowledges success and reads the object back
can see an empty or partial file. Under load the window widens; the
red run fails at iteration 0 with 0 of 8192 bytes on disk.

The regression test pins the contract at the adapter boundary: when
write_text_if_absent resolves, the full contents are visible to any
reader; a losing second claim leaves the winner's object untouched.

The fix lands in the next commit.

* fix(engine): publish local storage writes with atomic visibility

Close the class, not the instance. The local adapter admitted three
ways for a reader to observe a write that was acknowledged or visible
before its bytes were complete:

1. write_text_if_absent acknowledged success when the buffered
   tokio::fs::File write_all resolved -- i.e. when the bytes reached
   tokio's internal buffer, not the file. A caller reading back its own
   acknowledged write could see an empty object (the ~50% cluster
   import flake under full-workspace load; the regression test failed
   at iteration 0 with 0 of 8192 bytes visible).
2. The same call published its CLAIM (create_new) before its CONTENT,
   so concurrent readers saw an empty claimed file in the window.
3. write_text (plain tokio::fs::write) exposed truncated content
   mid-replace -- silently falsifying write_sidecar's 'readers either
   see the complete sidecar or none' contract on local FS (true on S3,
   where PutObject is atomic).

A flush in write_text_if_absent would have fixed only (1). Instead,
both local write paths now publish complete temp files atomically:
rename for replace (write_text -- the idiom write_text_if_match
already used) and hard_link for no-replace (write_text_if_absent --
link fails AlreadyExists, so exactly one of N concurrent claimants
wins and the winner's object is fully readable at the instant it
becomes visible). The local adapter now honors the same object-level
atomic-visibility contract as the S3 adapter, which is what every
caller (recovery sidecar protocol, cluster state CAS) was written
against. Crash-orphaned *.tmp.* files are inert: the sidecar sweep
filters to .json, and cluster state reads address state.json by name.

fsync/durability policy is unchanged (no fsync before, none now);
this fix is about visibility ordering, not power-loss durability.

Pre-existing on main (landed with the multi-graph server mode change,
PR #119); surfaced by this branch's heal work only because one extra
list_dir per write shifted test timing. Cluster lib suite: 12/25
failures before, 0/25 after. Turns the previous commit's red test
green.

* refactor(engine): one storage implementation over object_store for every backend

Collapse LocalStorageAdapter (hand-rolled tokio::fs) and
S3StorageAdapter into a single ObjectStorageAdapter backed by
Arc<dyn object_store::ObjectStore> -- LocalFileSystem for local URIs,
the existing AmazonS3 build for s3://, plus a pub in_memory()
constructor (full contract including TRUE conditional updates; the
in-memory test backend testing.md asked for at the adapter level).

Why: the acknowledged-before-visible bug showed the two-impl shape has
no referee -- one prose contract, two independent answers. Upstream
LocalFileSystem::put_opts is byte-for-byte the staged-temp+rename/
hard_link idiom that fix converged on, and Lance's own commit protocol
is built on the same primitives (put-if-not-exists / rename-if-not-
exists), so the substrate-aligned move is to stop hand-rolling it.
The per-backend residue shrinks to a UriCodec (URI <-> object path)
and one capability flag.

Semantics preserved by construction, with three deliberate deltas:
- exists() is now object-store-semantics everywhere (head + non-empty
  prefix fallback): an EMPTY local directory no longer 'exists'. The
  only dir-shaped caller (_graph_commits.lance probes) self-heals via
  ensure_commit_graph_initialized where it previously wedged loudly.
- A directory at an object path reads as NotFound, not as an IO error
  ('only objects exist'). The cluster unreadable-payload test used a
  same-named directory as a portable non-NotFound trigger; it now uses
  chmod 000, which still models genuine transient IO.
- write_text_if_match keeps content-token semantics on local
  (PutMode::Update is NotImplemented upstream for LocalFileSystem in
  0.12.5 and 0.13.2); the capability flag gates the token SOURCE in
  read_text_versioned too -- an ETag token with content-compare writes
  would lose every CAS.

delete_prefix keeps a local remove_dir_all branch: directories are a
local-FS concept, and list+delete would leave empty skeletons that
cluster graph_root_exists (raw Path::exists) reports as still present.

LocalStorageAdapter remains as a delegating shim so the pinned
contract tests gate this swap textually unchanged; the shim and the
test parameterization over local + in-memory land next. Cargo gains
the explicit 'fs' feature (already transitively enabled by lance).

* test(engine): one executable storage contract, run against every backend

Remove the LocalStorageAdapter delegation shim and migrate its
construction sites to ObjectStorageAdapter::local(). Replace the
per-backend duplicated tests with a single contract_suite asserting
the trait's promises (atomic replace, exists incl. the dataset-root
prefix probe, one-winner if_absent, versioned CAS with loud CAS-lost,
rename, list round-trip with no sibling-prefix bleed, idempotent
delete/delete_prefix), run against the local backend and the new
in-memory backend -- which implements true conditional updates, so the
strong-CAS path is exercised without a bucket. The bucket-gated S3
variant already exists (s3_adapter_conditional_writes_contract).

New local-specific pins for the deliberate semantic edges of the
collapse: empty directories are not objects (exists=false; the Lance
dataset-root probe shape is the non-empty case), file://-anchored and
spaces-in-path list output round-trips byte-identically into
read_text, dot-segment paths are lexically absolutized (the CLI's
./graph.omni shape), and upstream rename creating missing destination
parents. The acknowledged-write visibility regression test stays, now
documenting that the cross-API std::fs read-back is the point.

* refactor(cluster): drop put_json's per-backend atomicity branch

The local temp+rename dance predates the storage adapter guaranteeing
atomic visibility; now that write_text publishes via a staged temp +
rename on the filesystem (and a single atomic PUT on object stores) by
contract, the branch duplicated upstream behavior. One call, both
backends.

* docs: storage adapter collapse — contract, in-memory backend, local CAS gap

- testing.md: the 'no MemStorage backend' note is half-closed —
  ObjectStorageAdapter::in_memory() covers the text-object layer with
  the full contract (true conditional updates); Lance datasets bypass
  the adapter, so the engine substrate ask stays open.
- invariants.md: truth-matrix Tests row updated; new Known Gap for
  local write_text_if_match (upstream PutMode::Update is unimplemented
  for LocalFileSystem; content-token emulation is safe only under the
  cluster lock protocol — close before admitting a lock-free caller).
- writes.md: backend notes for the unified adapter (name#N staging
  residue invisible to the sweep, backend-wrapped error text with
  exists()-probing for missing-vs-error, loud permission failures).

* docs: finish renaming the storage adapters in user docs and test comments

storage.md's URI-scheme table and the S3 failpoint test's doc comment
still named the deleted LocalStorageAdapter/S3StorageAdapter; both now
describe the unified ObjectStorageAdapter over object_store, including
the relative-path absolutization note for local URIs.

* test(engine): pin branch-awareness of the drift guard's recovery advice

A pending sidecar on ANOTHER branch does not cover this branch's
drift: with a deferred feature-branch sidecar on disk and genuinely
uncovered drift on main, the main write's error must still point at
omnigraph repair -- a read-write reopen recovers the sidecar but
cannot repair main's uncovered drift. Currently red: the guard
matches sidecar pins by table_key only, so the feature sidecar flips
main's advice to the reopen path. Fix in the next commit.

Surfaced by external review of the drift-guard change.

* fix(engine): branch-aware sidecar matching in the drift guard's advice

The commit-time drift guard's sidecar-covered check matched pins by
table_key alone, so a pending sidecar on another branch flipped this
branch's uncovered-drift advice from 'run omnigraph repair' to the
reopen path -- and a reopen recovers that sidecar but cannot repair
this branch's drift. Compare the pin's table_branch too. Turns the
previous commit's red test green.

Surfaced by external review of the drift-guard change.

* test(engine): pin heal non-interference with a live schema apply

The write-entry heal's schema-staging reconcile runs before any queue
acquisition, so a load on the same handle, overlapping a schema apply
parked between its staging write and manifest commit, promotes the
apply's staging files (new catalog live against the old manifest),
classifies the LIVE apply's sidecar, and publishes its registrations
out from under it. The resumed apply then collides with its own stolen
commit. Currently red with:

  Lance("Concurrent modification: table version 3 already exists for
  node:Tag")

The fix (per-sidecar reconcile under the sidecar's write-queue guards,
plus a serialization key the schema-apply writer and the heal both
acquire) lands in the next commit.

Surfaced by external review of the write-entry heal.

* fix(engine): serialize the heal's schema-staging reconcile with live schema applies

The write-entry heal ran recover_schema_state_files up front, before
acquiring any queue guards. Overlapping a live schema apply parked
between its staging write and manifest commit, the heal promoted the
apply's staging files (new catalog live against the old manifest),
classified the LIVE apply's sidecar, and published its registrations —
the resumed apply then collided with its own stolen commit.

Correct by construction:

- New schema-apply serialization queue key, acquired by the schema-
  apply writer (alongside its per-table keys) from before write_sidecar
  until after delete_sidecar. Per-table keys alone don't cover a
  registration-only migration, which pins no existing tables but has a
  sidecar and staging files on disk.
- The heal reconciles schema staging lazily, PER SchemaApply sidecar,
  after acquiring that sidecar's guards (including the serialization
  key) and re-confirming the sidecar exists — a sidecar that survives
  the queue wait belongs to a dead writer, so the reconcile can no
  longer race a live apply. Recomputing per sidecar also removes the
  staleness of one up-front result across a multi-sidecar pass.
- Omnigraph::refresh drops its up-front reconcile-and-pass-through
  (same race, and a pre-promoted result would make the heal's guarded
  reconcile see clean staging and wrongly defer the sidecar): it now
  reconciles standalone only when NO sidecar exists — which cannot
  race a live apply, whose sidecar always precedes its staging files —
  and otherwise defers entirely to the heal.

The open-time sweep keeps its precomputed reconcile: open has no
concurrent writers. Turns the previous commit's red test green.

Surfaced by external review of the write-entry heal.

Self-audit addendum folded in: refresh's no-sidecar gate had a TOCTOU
(a live apply could write its sidecar + staging between the empty
check and the reconcile) — the standalone reconcile now holds the
serialization key across the list-then-reconcile pair. The remaining
residual is cross-process only (in-process queues cannot serialize
against a writer in another process; the open-time sweep has the same
pre-existing exposure) and is now an explicit Known Gap in
invariants.md rather than an implicit one.

* test(engine): pin catalog reload after the heal recovers a schema apply

When the write-entry heal rolls a crashed apply's SchemaApply sidecar
forward on the same handle, disk and manifest move to the new schema
(staging promoted, registrations published) but the handle's in-memory
schema_source/catalog do not. Subsequent writes then validate against
the stale catalog and reject rows of types the graph already has.
Currently red with:

  record 1: unknown node type 'Tag'

refresh() reloads after its heal; the write entry points must too.
Fix in the next commit.

Surfaced by external review of the write-entry heal.

* fix(engine): reload the in-memory catalog after the heal recovers a schema apply

heal_pending_recovery_sidecars refreshed the coordinator and
invalidated the runtime cache after processing sidecars, but never
reloaded schema_source/catalog — so a write whose entry heal rolled a
crashed SchemaApply sidecar forward proceeded to validate against the
OLD schema while disk and manifest were already on the new one.
reload_schema_if_source_changed is the same post-heal step refresh()
already runs; it no-ops on the (overwhelmingly common) non-schema heal
because the on-disk source is unchanged. Turns the previous commit's
red test green.

Surfaced by external review of the write-entry heal.

* test(engine): pin that a deleted-branch sidecar cannot wedge the graph

A rollback-eligible sidecar pinned to a branch is deferred by every
roll-forward-only pass; if the branch is then deleted, the sidecar
survives, referencing a branch with no manifest tree. The heal (every
write entry) and the open-time sweep (every ReadWrite open) both fail
opening the dead branch, and repair refuses while a sidecar is pending
-- a terminal read-only state with manual sidecar surgery as the only
exit. Currently red with:

  Lance("Not found: .../__manifest/tree/feature/_versions")

The branch's tree and forks are already reclaimed, so the pinned drift
is unreachable and the sidecar is provably moot; the fix classifies it
as an orphaned-branch terminal state (audit + discard) in both passes.

Surfaced by review (P1, verified by repro).

* fix(engine): classify deleted-branch sidecars as orphaned instead of wedging

A deferred (rollback-eligible) sidecar pinned to a branch survives
branch_delete; both the write-entry heal and the open-time sweep then
failed unconditionally opening the dead branch -- every write and
every ReadWrite open errored, and repair refuses while a sidecar
pends. Terminal state, manual sidecar surgery the only exit.

The branch's tree and per-table forks are already reclaimed at delete,
so the drift the sidecar pins is unreachable and the sidecar is
provably moot. Both passes now check the sidecar's branch against the
manifest's branch list (the authority -- deliberately NOT inferred
from a Not-found on open, which could be a transient storage error
masking real recovery intent) and discard orphans with an
OrphanedBranchDiscarded audit row, commit appended on main since the
sidecar's own branch no longer has a commit graph.

The open-time half is pre-existing; the write-entry heal made it hot.
Turns the previous commit's red test green.

Surfaced by review (P1, verified by repro).

* chore: harden review nits — vacuous CI filter, root-runner skip, liveness note

- ci.yml: the RustFS sidecar-lifecycle step now fails loudly if the
  's3_' name filter matches zero tests (cargo passes vacuously on an
  empty filter; the step exists specifically to prove S3 sidecar I/O
  coverage). The pre-existing CLI smoke step has the same shape and is
  left for a follow-up.
- cluster unreadable-payload test: cfg(unix) + a skip-with-log when
  running as root (mode 000 is still readable to root, common in
  container dev runners), so the test degrades instead of failing.
- refresh: document the one-pass-late convergence for legacy staging
  residue while non-SchemaApply sidecars pend, so nobody 'fixes' it by
  re-running the reconcile unserialized — the exact race the
  serialization key closes.

* test(engine): pin orphan-discard idempotency across a delete fault

discard_orphaned_branch_sidecar writes its audit row and main commit
before deleting the sidecar; a Phase D delete fault leaves the sidecar
on disk with the audit already durable, and the retry repeated the
whole path -- a second OrphanedBranchDiscarded audit row (and commit)
for the same operation. Currently red: 2 rows after one fault + retry.
The retry must only finish the delete. Fix next.

Also promotes the recovery-audit kinds reader into the shared test
helpers (it was recovery.rs-local).

Surfaced by external review of the orphan-discard fix.

* fix(engine): orphan-discard idempotency + heal reports acted-vs-deferred

Two review findings on the recovery surface:

- discard_orphaned_branch_sidecar now checks the audit table for an
  existing (operation_id, OrphanedBranchDiscarded) row before appending
  the commit + audit pair, so a Phase D delete fault retries ONLY the
  delete instead of duplicating audit rows and commit-graph entries.
  Cold path: the list scan runs only when an orphaned sidecar exists.
  Turns the previous commit's red test green (exactly one audit row
  across fault + retry).

- process_sidecar returns whether durable state changed; the heal sets
  processed_any only for sidecars that were actually rolled forward /
  rolled back / audit-recovered (orphan discards count). Deferred
  sidecars (rollback-eligible, invariant-violating, unpromoted
  SchemaApply) no longer trigger a per-write schema reload + full
  runtime-cache invalidation while they pend -- the cache is
  snapshot-keyed so this was waste, not corruption, but it was paid on
  every write until reopen. Acted-paths' processed=true remains pinned
  by load_after_schema_apply_phase_b_failure_uses_recovered_catalog
  (the reload depends on it).

Surfaced by external review.

* test(engine): pin the orphan-discard audit-append fault leg as documented tolerance

The orphan discard's commit append and audit append are two writes; a
failure between them leaves a recovery commit with no audit row, and
the retry (keyed on the audit row, the operator-facing record) appends
a second commit before the audit lands. This is the same
not-atomic-pair-write tolerance record_audit documents and the
manifest->commit-graph Known Gap covers for every publish: bounded
commit-graph noise, audit row exactly-once under clean failures.
Keying idempotency on commit rows instead would need an operation_id
column on _graph_commits, and audit-before-commit would dangle the
graph_commit_id join -- both worse than the documented residual.

Make the tolerance explicit instead of implicit: docstring names the
window, a failpoint sits inside it, and the new test pins convergence
across the fault (sidecar consumed, exactly one audit row), completing
the orphan-discard fault matrix alongside the delete-fault leg.

Surfaced by external review of the orphan-discard idempotency.

* test(engine): pin honest drift-guard advice when sidecar listing fails

The guard's unwrap_or(false) conflated 'classified as uncovered' with
'could not classify': a transient list fault on the guard's second
list (the entry heal's first list having succeeded) confidently routed
the operator to omnigraph repair even when the heal had just deferred
a rollback-eligible sidecar -- and repair refuses while a sidecar is
pending. Currently red: the error says 'run omnigraph repair' with no
mention of the reopen path. The fix names both paths plus the failure
cause when classification is impossible.

Surfaced by external review of the drift-guard fallback.

* fix(engine): admit ambiguity in the drift guard when sidecar listing fails

Replace the unwrap_or(false) fallback with a tri-state: covered ->
reopen advice; uncovered -> repair advice; listing FAILED -> say the
drift could not be classified, name the cause, and give both paths in
order ('run repair, or reopen read-write if repair reports a pending
sidecar'). The old fallback confidently routed a transient list fault
to repair, which refuses while a sidecar is pending -- a self-
correcting but pointless detour. The conflict itself is still always
raised; only the advice degrades honestly. Turns the previous commit's
red test green.

Surfaced by external review of the drift-guard fallback.
2026-06-13 11:20:08 +02:00
aaltshuler
043b02e617 feat(cluster): add read-only validate and plan 2026-06-08 20:07:39 +03:00
Andrew Altshuler
cb80fa40f1
exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113)
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)

Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).

## What this enables

1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
   `None` for list-contains (the comment said *"Can't pushdown list
   contains"*) because string SQL can't easily express it. With Expr,
   it lowers to DataFusion's `array_has(col, value)` builtin via the
   `nested_expressions` feature, and pushes down to Lance's scan layer
   the same way Eq/Lt/etc. do. Pinned by the new regression test
   `end_to_end::ir_filter_with_list_contains_pushes_down`.

2. **DataFusion 53's optimizer rules now reach our predicates.** Once
   the Expr lands at the Lance scanner, DF's planner runs:
   - `IN`-list vectorized eq kernel (DF #20528)
   - `PhysicalExprSimplifier` (DF #20111)
   - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
   - Push limit into hash join (DF #20228)
   None of these were applicable before because the string SQL path
   short-circuited the optimizer.

## Scope

This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:

- The Expand pushdown still serializes through `hydrate_nodes`'s
  `extra_filter_sql: Option<&str>` parameter. Migrating it changes the
  `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
  `Option<Expr>`) and the cascading call sites — out of scope for this
  MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
  in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
  Lance v7 bump per issue #112) will migrate that path to
  `DeleteBuilder::execute_uncommitted` taking an Expr.

The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.

## Cargo

Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).

## Tests

- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
  was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoints (recovery canary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)

The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:

  HTTP error: error sending request

The "Dump RustFS logs on failure" step revealed the container was
dying at startup:

  [FATAL] Server encountered an error and is shutting down:
  Default root credentials are not allowed on non-loopback listeners;
  set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
  bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
  for local development only

`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.

This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.

Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
  - Rotate the CI creds to less-default values (less coupling to
    RustFS's "default" set definition)
  - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
    error message
  - Use a workflow service container with controlled lifecycle

Deferred — pinning is the minimal restore. Also incidentally
documents *which* version we tested against, which `:latest` never
did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:47:33 +01:00
Andrew Altshuler
3551e0d40e
chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111)
* tests: add lance_surface_guards pre-flight pins for the v6 bump

Land 8 named guards in a new test file that pin Lance API surfaces
OmniGraph relies on. Each guard turns a silent-break risk (variant
rename, struct restructure, async-flip) into a red CI bar instead of
runtime drift.

Guards (mapped to the silent-break inventory from the v6 migration plan):

  Runtime (#[tokio::test]):
  1. lance_error_too_much_write_contention_variant_exists — pins the
     variant referenced by db/manifest/publisher.rs::map_lance_publish_error.
  2. manifest_location_field_shape — pins .path/.size/.e_tag/.naming_scheme
     types and ManifestLocation accessor returning &Self (the access
     pattern at db/manifest/metadata.rs:84-88).
  6. write_params_default_does_not_set_storage_version — confirms our
     explicit V2_2 pin remains load-bearing (blob v2 requirement).

  Compile-only async fns (#[allow(...)] + unimplemented!() placeholders;
  never run, but cargo build --tests enforces the API shape):
  3. checkout_version + restore chain — pins the recovery rollback hammer
     at db/manifest/recovery.rs:505-522.
  4. DatasetBuilder::from_namespace().with_branch().with_version().load()
     — pins the namespace builder chain at db/manifest/namespace.rs:162-174.
  5. MergeInsertBuilder fluent chain — pins the manifest CAS at
     db/manifest/publisher.rs:370-391, including the return shape
     (Arc<Dataset>, MergeStats).
  7. compact_files(&mut ds, CompactionOptions, None) — pins
     db/omnigraph/optimize.rs:107.
  8. DeleteResult { new_dataset, num_deleted_rows } — pins the inline
     delete result shape (MR-A will repurpose this guard to the staged
     two-phase variant once Lance #6658 migration lands).

This is commit 1 of the chore/lance-6.0.1 migration. Cargo bump
follows in commit 2 (will trigger the guards under v6 if any surface
drifted).

Per the migration plan at ~/.claude/plans/shimmering-percolating-duckling.md
(written this session). Two guards from the plan deferred to follow-up:
  - manifest_cas_returns_row_level_contention_variant (full publisher
    race integration test — needs harness scaffolding)
  - table_version_metadata_byte_compatible_with_v4 (TableVersionMetadata
    is pub(crate); requires test reach extension).

Verified on v4: cargo test -p omnigraph-engine --test lance_surface_guards
passes 3/3 runtime tests; cargo build -p omnigraph-engine --tests
compiles all 5 compile-only guards clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58

The Cargo bump itself. Source is intentionally untouched — this commit
will not compile. The compile errors are the work-list for subsequent
commits on this branch.

Lance updates: lance + 7 sub-crates 4.0.0 → 6.0.1. Transitive churn:
  + lance-tokenizer v6.0.1 (vendored tokenizer per Lance PR #6512)
  + object_store 0.13.x (Lance 6 brings it transitively; our explicit
    pin stays at 0.12.5 for now — revisit in stages if diamond bites)
  - tantivy* crates (replaced by lance-tokenizer)

Compile error landscape on this commit (11 errors):
  • 1× E0432: `lance_index::DatasetIndexExt` import (Lance PR #6280
    moved it to lance::index). Sites: table_store.rs:20,
    db/manifest.rs:37 (the second site was missed by the pre-flight
    inventory).
  • 8× E0599: `create_index_builder` / `load_indices` missing on
    `lance::Dataset` — all downstream of the DatasetIndexExt move.
    Once the import is corrected on table_store.rs and db/manifest.rs,
    these resolve automatically.
  • 2× E0063: missing field `is_only_declared` in `DescribeTableResponse`
    initializer at db/manifest/namespace.rs:221, 364. New Lance
    namespace field per the v5 namespace restructure (PR #6186).

Surface guards (lance_surface_guards.rs, commit d571fa8) all still
compile + the 3 runtime ones pass on v6 — none of the silent-break
surfaces drifted. That's the load-bearing observation: the publisher
CAS chain, ManifestLocation field shape, checkout_version/restore,
DatasetBuilder fluent chain, MergeInsertBuilder return shape,
WriteParams::default, compact_files signature, and DeleteResult
fields are all v6-stable.

Next commits address the 11 errors per the migration plan stages
3-8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* imports: move DatasetIndexExt to lance::index (Lance PR #6280)

Lance 5.0 (PR #6280) moved `DatasetIndexExt` out of `lance-index` into
`lance::index`. `is_system_index` and `IndexType` stayed in `lance-index`.

Mechanical update of 6 import sites:
  crates/omnigraph/src/table_store.rs:20 — split into two `use` lines
  crates/omnigraph-server/tests/server.rs:10 — was traits::DatasetIndexExt
  crates/omnigraph/tests/search.rs:6
  crates/omnigraph/tests/branching.rs:7
  crates/omnigraph/tests/failpoints.rs:467
  crates/omnigraph-cli/tests/cli.rs:3 — was traits::DatasetIndexExt

All 9 E0599 cascading errors on .create_index_builder / .load_indices
resolve once the trait is back in scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* namespace: add is_only_declared field to DescribeTableResponse

Lance namespace 6.0.0 added `is_only_declared: Option<bool>` to
`DescribeTableResponse` (lance-namespace-reqwest-client 0.7+ via the
v5.0 namespace API restructure, Lance PR #6186). Set to `Some(false)`
because every table BranchManifestNamespace returns from describe_table
is materialized — the manifest snapshot only includes entries for
tables we've already opened via Dataset::open.

Two sites in db/manifest/namespace.rs (BranchManifestNamespace +
StagedTableNamespace impls of LanceNamespace::describe_table).

Closes the last two compile errors from the v6 bump in the engine lib.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cargo: add lance to omnigraph-cli + omnigraph-server dev-deps

Stage 3 moved DatasetIndexExt imports from `lance-index` to `lance::index`
in the cli and server test crates. Both crates only had `lance-index`
in their dev-dependencies; add `lance` alongside so the new path
resolves.

This is the last compile-error fix from the v6 bump — `cargo build
--workspace --tests` is now green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: refresh Lance alignment audit for v6.0.1; bump surveyed version

Per CLAUDE.md maintenance rule 2 (same-PR docs):

- docs/dev/lance.md: replace the v4.0.1 alignment audit stanza with
  the v6.0.1 audit. Captures every v5/v6 finding from this PR (the
  DatasetIndexExt move, DescribeTableResponse.is_only_declared,
  MergeInsertBuilder return shape, ManifestLocation field shape,
  LanceFileVersion::default flip, file-reader async, tokenizer
  vendor, Lance #6658/#6666/#6877 status). Cross-references each
  guard in tests/lance_surface_guards.rs.

- AGENTS.md: bump "Storage substrate: Lance 4.x" → "Lance 6.x".
  Note: surveyed crate version stays at 0.4.2 — substrate version
  bumps are independent of OmniGraph's release version.

- crates/omnigraph/src/storage_layer.rs: update the trait module-level
  doc-comment to reflect that Lance #6658 closed 2026-05-14 and
  delete_where two-phase migration is MR-A (the next follow-up).
  #6666 stays open; create_vector_index inline residual stays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* tests: silence clippy::diverging_sub_expression on compile-only guards

The five `_compile_*` async fns in lance_surface_guards.rs use
`let ds: Dataset = unimplemented!()` as a placeholder so type inference
can chase the method chain we want to pin, without ever running the
function. Clippy's `diverging_sub_expression` lint flags this pattern
because the RHS diverges; that's the entire point. Added to the
per-fn `#[allow(...)]` list, alongside dead_code / unreachable_code /
unused_variables / unused_mut already there.

No behavior change. cargo test -p omnigraph-engine --test
lance_surface_guards still 3/3 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: correct #6658 status — closed but API ships in Lance v7.x, not v6.0.1

The audit stanza in docs/dev/lance.md and the storage_layer.rs trait
doc-comment both implied the public DeleteBuilder::execute_uncommitted
API shipped with Lance 6.0.1. It did not. Issue #6658 closed
2026-05-14, but binary search across the release stream confirms:

  v6.0.1             no pub async fn execute_uncommitted on DeleteBuilder
  v6.1.0-rc.1       
  v7.0.0-beta.5     
  v7.0.0-beta.10     first appearance
  v7.0.0-rc.1       

So MR-A (delete two-phase migration) is gated on the Lance v7.x bump,
not on this PR. v7.0.0-rc.1 dropped 2026-05-21; GA likely within a
week.

No behavior change. Doc-only correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(lib): bump recursion_limit to 256 — Lance 6 trait depth on Linux

Lance 6's heavier trait surface around futures/streams in storage_layer.rs's
staged-write API pushes the rustc trait-resolution recursion limit past
the default 128 on Linux builds. CI on PR #111 surfaced this in both
`Test Workspace` and `Test omnigraph-server --features aws`:

  error: queries overflow the depth limit!
    = help: consider increasing the recursion limit by adding a
      `#![recursion_limit = "256"]` attribute to your crate (`omnigraph`)
    = note: query depth increased by 130 when computing layout of
      `{async block@crates/omnigraph/src/storage_layer.rs:697:5: 697:10}`

(The async block is `stage_create_btree_index`'s body — its return type
is several layers of `impl Future<Output=Result<StagedHandle>>` deep on
top of Lance's own builder return types.)

Local macOS builds happened to short-circuit before tripping the limit,
which is why this didn't surface during the v6 bump sequence. The fix
rustc itself suggests is one line at the crate root.

No behavior change. Revisit if a future Lance bump stops needing it.

Verified: `cargo build --locked -p omnigraph-server --features aws`
compiles clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 00:42:29 +01:00
Andrew Altshuler
9973683261
policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102)
PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101).
The structural fix that moves Cedar enforcement from HTTP-only to
engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans
the enforce() call out to the remaining six (mutate_as, load,
ingest_as, branch_create_from, branch_delete, branch_merge).

## What lands

### New crate: omnigraph-policy

The 844-line policy.rs moves from `omnigraph-server` into a new
`omnigraph-policy` workspace crate so both engine and server can
depend on it. Cedar dependency moves with it. The server's policy.rs
becomes a re-export shim (`pub use omnigraph_policy::*`) so existing
`omnigraph_server::PolicyAction` etc. paths keep working — CLI and
test consumers don't have to migrate in one go.

### New trait: PolicyChecker

```rust
pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope,
             actor: &str) -> Result<(), PolicyError>;
}
```

`PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()`
takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without
spinning up Cedar. MR-725 will extend the trait with `predicate_for()`
for query-layer pushdown — additive, no call-site changes.

### New enum: ResourceScope

Four variants — Graph, Branch, TargetBranch, BranchTransition —
mapping cleanly to today's `(branch, target_branch)` shape on
PolicyRequest via `to_branch_pair()`. Each engine writer picks the
variant that matches the existing HTTP-layer convention so engine
and HTTP evaluate the same Cedar decision.

**Invariant**: ResourceScope stays at branch granularity. Per-type
and per-row scope are MR-725's territory, not engine-layer's.
Adding Type/Row variants here creates two places per-type policy
can be evaluated, which can drift. See chassis design refinements
comment on MR-722 (2026-05-17).

### Omnigraph::with_policy() + enforce()

* New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph,
  None by default (preserves embedded/dev no-enforcement mode).
* `with_policy(self, checker)` setter — builder-style, consumes self.
* `enforce(action, scope, actor)` — the gate. When policy is None,
  no-op. When policy is Some AND actor is None, hard error — silent
  bypass via "I forgot the actor" is exactly the footgun this gate
  is here to prevent.

### apply_schema_as: first writer wired

* New public method `apply_schema_as(source, options, actor)` that
  calls `enforce(SchemaApply, TargetBranch("main"), actor)` before
  acquiring the schema-apply lock or doing any other work.
* Existing `apply_schema(source)` and `apply_schema_with_options(...)`
  delegate to it with actor=None (no-actor variants).
* HTTP handler `server_schema_apply` updated to call apply_schema_as
  with the resolved actor. AppState construction injects the
  PolicyEngine into Omnigraph via `with_policy`. HTTP-layer
  authorize_request still fires first; the engine gate is the
  redundant-but-correct backstop and the only path that protects SDK
  / embedded callers. PR #3 removes the HTTP redundancy.

### OmniError::Policy

New error variant for engine-layer policy denial / evaluation
failure. ApiError::from_omni maps it to 403.

### MR-724 Admin action — Option A reservation

PolicyAction::Admin kept in the enum with a load-bearing doc
comment naming its future consumers (hot reload, audit log query,
approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...)
call site exists yet — the variant is reserved so the action
vocabulary is complete from chassis day one. MR-724 closes when
the first consumer surface ships.

### New SDK-side integration test

`crates/omnigraph/tests/policy_engine_chassis.rs` — four tests
covering:
* Policy denies for unauthorized actor → OmniError::Policy
* Policy permits for authorized actor → apply succeeds
* Policy installed + no actor → hard error (forget-the-actor footgun)
* No policy → no-op (embedded/dev default still works)

These exercise the engine path directly — no HTTP layer involved.

## Test results

- cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed
  * 45 server tests (existing) pass
  * 14 schema_apply tests (existing) pass
  * 4 new chassis tests pass
  * 60 OpenAPI tests pass (no HTTP API surface changes)
  * No regressions across the workspace

## Architectural decisions baked in

Per MR-722 chassis design refinements comment (2026-05-17):

1. PolicyChecker is a trait, not just a concrete. Engine and server
   consume the trait. MR-725 adds predicate_for() additively.
2. ResourceScope stays at branch granularity. No Type/Row variants.
3. Coarse-vs-fine framing pinned: engine-layer is action gate;
   query-layer (MR-725) is predicate gate. Both backed by same Cedar
   engine; non-overlapping responsibilities.
4. Admin action reserved for policy-management surfaces (MR-724
   Option A).

## Pending follow-ups (PR #3+)

- Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from,
  branch_delete, branch_merge (PR #3).
- Remove HTTP-layer authorize_request redundancy once engine gate
  covers all writers (PR #3).
- CLI policy injection into Omnigraph for non-`policy validate|test|explain`
  subcommands (PR #3 or follow-up).
- MR-723 default-deny 3-state matrix (PR #4).
- MR-736 severity warn/deny (PR #5).
- AGENTS.md scope-of-enforcement rewrite once chassis fully lands.
- Coarse-vs-fine framing in docs/user/policy.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:36:36 +03:00
Ragnor Comerford
cd780e2d37
deps: add arc-swap to workspace for PR 2 catalog/schema_source wrapping
PR 2 wraps the Omnigraph engine's catalog and schema_source fields in
ArcSwap so reads stay zero-cost while apply_schema can swap atomically
without &mut self. arc-swap lands as an unused workspace dep here so the
follow-up commits that wrap fields can land in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:25:22 +02:00
Ragnor Comerford
cdfbccbfdc
MR-794 step 2: scaffold MutationStaging accumulator + scan_with_pending
Add the scaffolding for the in-memory staged-write rewire — no behavior
change yet:

* New crates/omnigraph/src/exec/staging.rs with MutationStaging,
  PendingTable, PendingMode, StagedTablePath, plus the end-of-query
  finalize() that issues one stage_* + commit_staged per pending
  table (Merge mode dedupes by id, last-write-wins).
* TableStore::scan_with_pending and count_rows_with_pending helpers —
  Lance scan committed + DataFusion MemTable scan pending, concat.
  Sidesteps the Scanner::with_fragments filter-pushdown limitation
  documented on scan_with_staged.
* Add datafusion = "52" to workspace + omnigraph-engine deps for
  MemTable (transitively pulled by Lance already).

Engine code still uses the legacy MutationStaging shape; the rewire
lands in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:42:21 +02:00
Andrew Altshuler
74eb5a5380
Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46)
* Parallel per-type load writes + omnigraph optimize/cleanup CLI

## MR-677.3 — parallel per-type load writes

The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).

## MR-676 — `omnigraph optimize` and `omnigraph cleanup`

New CLI subcommands that walk every node + edge table in the repo:

- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
  table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
  runs Lance `cleanup_old_versions` to prune historical manifests +
  unique fragments. Requires `--confirm` because it's destructive.
  Supports both count-based and time-based retention (or both AND'd
  together). Time uses chrono `DateTime<Utc>` (added as a workspace
  dep, default-features off).

Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.

Public API on `Omnigraph`:

  pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
  pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
      -> Result<Vec<TableCleanupStats>>

All 10 existing loader tests still pass.

Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-25 14:22:14 +03:00
andrew
c338e80180 Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id
Fixes two live authz bugs in omnigraph-server:

- Bearer-token lookup previously used HashMap::get, which compares keys with
  Eq and short-circuits on the first differing byte — a network-observable
  timing oracle for brute-forcing tokens. Tokens are now stored as SHA-256
  digests and compared with subtle::ConstantTimeEq, iterating every entry
  unconditionally so total work is independent of which slot matches. Raw
  token bytes no longer live in server memory after startup.

- authorize_request now overwrites PolicyRequest.actor_id from the
  authenticated session instead of trusting the handler-supplied field,
  which previously defaulted to "" via unwrap_or_default(). The empty
  string can no longer reach Cedar as a policy subject even if a future
  refactor drops the None check.

External API of AppState constructors is unchanged — tokens still enter as
Vec<(String, String)> and are hashed on the way in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 01:41:02 +03:00
Claude
859ec9faa8
Add OpenAPI spec generation via utoipa with /openapi.json endpoint
Integrate utoipa 5 to auto-generate an OpenAPI 3.1 spec from the existing
Axum handlers and serde types. All 16 endpoints are annotated with path
metadata, request/response schemas, security requirements, and tags. A
public /openapi.json endpoint serves the spec without requiring auth.

Includes 59 tests covering path completeness, HTTP methods, schema fields,
enum variants, security scheme, path/query parameters, request bodies,
response references, and endpoint integration.

https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY
2026-04-12 11:03:23 +00:00
andrew
338289656a Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00