perf(engine): remove the per-query metadata re-derivation tax on warm reads (#268)

* test(engine): add read-path IO instrumentation seam for warm-read cost tests Prerequisite seam for the query-latency fixes. Adds crates/omnigraph/src/instrumentation.rs: - CountingStorageAdapter: a StorageAdapter decorator counting per-method reads (read_text/exists/read_text_versioned/list_dir), for the schema-contract reads on the query path. - A per-query task-local (QueryIoProbes) carrying Lance WrappingObjectStore wrappers per open category plus a probe counter, delivered via with_query_io_probes. open_dataset_tracked attaches the wrapper so the open itself is counted (ObjectStoreParams.object_store_wrapper). Wires the wrappers into the manifest open (open_manifest_dataset) and the commit-graph opens (CommitGraph::open/open_at_branch). Production leaves the task-local unset, so nothing attaches. Makes Omnigraph::open_with_storage public so tests can inject the counting adapter. lance-io is a dev-dependency (IOTracker named only in tests). No runtime behavior change. * test(engine): warm same-branch read should reuse the coordinator (red) Cost-budget test using Lance IOTracker at the object-store boundary (the LanceDB IO-counted-test pattern). On a 20-commit-deep graph, a warm same-branch query re-opens a fresh coordinator, which opens both the commit graph and __manifest. Asserts the read opens the commit graph zero times and performs exactly one cheap version probe; today it does neither (it scans the commit graph on re-open and never probes). The freshness guard already passes. Adds the commit_many helper for history-depth fixtures. Red half of the Fix 1 red->green pair; turns green with the next commit. * perf(engine): same-branch reads reuse the warm coordinator (Fix 1) query()/resolved_target re-opened a fresh GraphCoordinator from storage on every read (full __manifest scan + two commit-graph scans), so a warm read's cost grew with commit history (invariant 15) though the data was unchanged. resolved_target now serves same-branch reads from the warm in-memory coordinator, gated by a cheap version probe (latest_version_id, one object-store op) instead of a full re-open: - fresh (probe == cached version): return the in-memory snapshot under the read lock, with a synthetic (branch, version) id and no commit-graph access (reads pin the snapshot by manifest version, not the commit DAG; invariant 2). - stale: take the write lock, re-probe (double-checked; tokio RwLock has no read->write upgrade), then refresh_manifest_only (no commit-graph scan), preserving strong consistency for external writers (invariant 6). Cross-branch and snapshot targets keep the existing cold-resolve path. Adds ManifestCoordinator/GraphCoordinator::probe_latest_version and GraphCoordinator::refresh_manifest_only. Nothing on the read path needs a real commit ULID (only RuntimeCache keys on the id, where synthetic is consistent), per a caller audit. A warm same-branch read on a 20-commit graph now does zero commit-graph opens and exactly one probe (down from a deep commit-graph scan) and still observes external commits. The residual per-table __manifest scans are removed later by Fix 2. * test(engine): warm query should validate the schema contract once (red) ensure_schema_state_valid runs twice per query (query()/run_query_at AND resolved_target/snapshot_at_version), each reading 3 contract files + 2 existence probes. A warm query thus does 6 read_text + 4 exists where one validation (3 + 2) suffices, measured via CountingStorageAdapter. Adds a drift guard (schema_source_drift_is_caught_on_read) that already passes. Red half of the finding-A red->green pair. * perf(engine): validate the schema contract once per query (finding A) ensure_schema_state_valid ran on every query AND again inside resolved_target / snapshot_at_version, so each query validated the schema contract twice (~10 storage ops). Removes the redundant query()/ run_query_at() calls; the validation inside resolved_target / snapshot_at_version still runs, so drift is detected exactly as before. A source-only fast path was rejected: a long-lived handle must detect external drift of the schema source, IR, OR state on its next operation (lifecycle::long_lived_handle_rejects_schema_*), which a source-only compare would miss. So the only safe latency win is not validating twice. A warm query now does one validation (3 read_text + 2 exists) instead of two (6 + 4). * test(engine): warm + multi-table reads should do zero manifest scans (red) After Fix 1 a warm same-branch read still scans __manifest ~44 times at 20-commit depth: not from resolution (Fix 1 removed that) but from the per-table open path, which routes through the Lance namespace and full-scans __manifest twice per touched table (describe_table + describe_table_version). Tightens the warm test to assert manifest read_iops == 0 and adds a multi-table (traversal) test asserting the same, pinning the "2 tables = 2x" tax. Red half of the Fix 2 red->green pair. * perf(engine): open touched tables by location+version, not via the namespace (Fix 2) SubTableEntry::open routed every read-path table open through DatasetBuilder::from_namespace(BranchManifestNamespace), whose describe_table full-scans __manifest and, with managed_versioning, makes Lance scan again (describe_table_version) -- two full __manifest scans per touched table. That was the residual that made warm-read manifest IO grow with history and the '2 tables = 2x' multi-table tax. The resolved Snapshot already holds each table's path/version/branch, so open directly: from_uri(table_uri_for_path(root, path, branch)).with_version(v). The branch-qualified location is the dataset that physically holds the version (main: {path}; branch: {path}/tree/{branch}, Lance native-branch storage), and with_version resolves it within THAT dataset's _versions. 0 namespace calls + 1 HEAD via the native ConditionalPutCommitHandler. The read namespace (BranchManifestNamespace) is now unused in production (writes use StagedTableNamespace), so it, its constructor, and the helpers only it used (to_namespace_version, publish_requests, their imports) are gated #[cfg(test)] -- retained to validate the namespace contract in unit tests. Removes the dead open_table_at_version_from_manifest. Warm same-branch + multi-table reads now scan __manifest zero times; branch + time-travel reads stay correct (branching.rs, point_in_time.rs, 2 lib regression tests); production-lib warnings unchanged (baseline). * test(engine): cost-budget coverage for branch-warm and stale-refresh reads (matrix) Extends the read-path cost-budget tests across more of the morphological matrix: - warm_branch_read_does_no_manifest_scans: a warm read on a non-main branch (handle synced to it) scans __manifest zero times, exercising Fix 2's branch-owned-table open (tree/{branch} + with_version) on Fix 1's warm path -- the cell that regressed when the open used with_branch against the base. - stale_read_refreshes_manifest_only: an external commit makes the next read take the stale path, which re-reads the manifest (read_iops > 0) but never scans the commit graph (refresh_manifest_only), pinning Fix 1's manifest-only refresh. Cold paths (cross-branch, time-travel) stay behavior-covered (branching.rs, point_in_time.rs) and are cold by design (Fix 1 warm-paths only same-branch), so there is no manifest==0 contract to assert there. * test(engine): same-branch write after external commit must not fork the commit DAG (red) * fix(engine): refresh commit-graph head before append to prevent same-branch DAG fork A same-branch write that follows an external commit committed a fresh manifest version (commit_all rebases the pin from a fresh coordinator) but appended off the coordinator's stale in-memory commit-graph head, forking the commit DAG (the new commit and the external commit shared a parent). Pre-existing for non-strict inserts; widened to strict ops by Fix 1's refresh_manifest_only freshening the read-time pin. record_graph_commit now refreshes the commit-graph head from storage before append_commit, so the parent is the true current head. record_merge_commit is unaffected (it passes explicit parents). * perf(engine): hold open Dataset handles + share one Session per graph (Fix 3) A warm same-branch read still re-opened every touched table per query (the "never warms up" residual after Fix 1+2). A per-graph held-handle cache keyed by (table_path, branch, version) now serves repeat reads with zero table opens, and one shared lance::Session per graph warms metadata/index caches across opens. Validated against LanceDB upstream (rust/lancedb/src/table/dataset.rs DatasetConsistencyWrapper): hold an Arc<Dataset> and reuse it for 0-IO warm reads; one Session per connection threaded into opens; writers never serve from the read cache; time-travel bypasses. One adaptation: omnigraph keys by version (snapshot-pins-version model) where LanceDB keys per-table+HEAD, reusing the in-repo GraphIndexCache LRU template. - ReadCaches (session + TableHandleCache) injected onto live-Branch-read snapshots in resolved_target; Snapshot::open serves from the cache or opens once with the session on a miss (via the instrumented open_table_dataset). - Writes (resolved_branch_target -> open HEAD) and time-travel / Snapshot-id reads bypass the cache. Version-in-key makes a write a new key (old handle ages out via LRU); invalidate_all at branch-switch/refresh is hygiene only. - Cost tests: a 2nd identical warm read does 0 table opens; a write re-opens only the changed table at its new version. Full engine suite green. * test(engine): forbid raw data opens in the read/exec layer (P2 guard) Extend the forbidden-API guard with Dataset::open / DatasetBuilder::from_uri / from_namespace so the read/exec layer (exec/, loader/, changes/, db/omnigraph/) cannot bypass Snapshot::open and the held-handle cache (Fix 3). The instrumented opener (instrumentation.rs) is allow-listed; two legitimate non-read opens (a test editing __manifest, Hard-drop version GC) carry sentinels. The storage/manifest layers stay allow-listed. Lean P2 scope, per LanceDB-upstream + minimize-liability: the data-read boundary already exists (SubTableEntry::open); this guard pins it so a future read cannot open around the cache. Centralizing all internal opens behind one opener is deferred. * docs(dev): invariant 15 (one source of truth, cheaply derived) + cost-budget testing Records the principle behind the query-latency work: Lance and the manifest are the source of truth, everything else a derived view held warm and refreshed by a cheap probe; the two failure modes (a drifting parallel copy, and cold re-derivation whose cost grows with history) are deny-listed. Adds the cost-budget testing discipline (assert a warm read's open/IO count is flat at commit-history depth, the LanceDB IO-counted pattern) and the warm_read_cost.rs row. Updates the read-path-re-derivation known gap to reflect what Fix 1/2/3 + finding A close, and adds the commit-graph-parent-under-concurrency gap. * fix(engine): branch-incarnation identity + unified invalidation + shared LruMap (PR #268 review) Phase 6 A-D, correct-by-design responses to the Codex/Greptile P2 review comments. A: warm-read freshness and the table-handle cache key use the manifest incarnation (e_tag, manifest-timestamp fallback, then version), so a deleted+recreated non-main branch reusing a version number cannot be served stale; main stays version-cheap, non-main loads latest_manifest; a detected stale refresh also invalidates read caches; two regression tests force the version collision. B: unify the two cache invalidations into Omnigraph::invalidate_read_caches() at the four sites. C: assert the stale path's probe count. D: shared LruMap behind both caches with unconditional eviction, plus a unit test. Full engine suite green; multi-process lineage fork and O(history) write refresh remain known gaps for Phase 6E/7.
2026-06-18 02:24:27 +02:00 · 2026-06-17 13:25:20 +02:00 · 2026-06-17 13:25:20 +02:00 · 5243c048aa
commit 5243c048aa
parent d0e06a6ff6
26 changed files with 1918 additions and 107 deletions
--- a/docs/dev/testing.md
+++ b/docs/dev/testing.md
@ -23,8 +23,9 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 | `merge_truth_table.rs` | Merge-pair truth table (MR-786): all 9×9 `(left_op, right_op)` cells from `{noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel}`. Adding a new op to `OpVariant` forces a compile error in `build_case` until the new row + column are dispositioned. 36 executable cells run through real `branch_merge` with a structured oracle (`MergeOutcome` / `MergeConflictKind` + graph-state assert); 45 cells involving `dropProperty`/`addLabel`/`removeLabel` are recorded as `Unsupported` until the mutation grammar grows. |
 | `writes.rs` | Direct-publish writes: cancellation, non-strict insert/merge rebase under the per-table queue, strict stale-write conflicts, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) |
 | `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead |
-| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
+| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (`Dataset::open` / `DatasetBuilder::from_uri`/`from_namespace`) — reads route through `Snapshot::open` and the held-handle cache; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
 | `lance_surface_guards.rs` | Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see [lance.md](lance.md)) — the first smoke check on any Lance version bump; e.g. `compact_files_still_fails_on_blob_columns` turns red when the upstream blob-compaction fix lands |
+| `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below |
 | `lifecycle.rs` | Graph lifecycle, schema state |
 | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) |
 | `changes.rs` | `diff_between` / `diff_commits` |
@ -125,5 +126,14 @@ When you pick up any change, walk through this:
 6. **For substrate-touching changes** (Lance behavior), reach for `failpoints` or fixture-driven scenarios, not stubbed-out mocks.
 7. **For server / API changes**, confirm the OpenAPI regeneration happens in `openapi.rs` and that the diff lands in `openapi.json`.
 8. **Verify your change makes an existing test fail before it makes the new one pass.** If you can break the code without breaking a test, your coverage gap is the problem to fix first.
+9. **Bound hot-path cost at history depth.** If the change touches a read or open path, add or extend a test that asserts a *bounded* cost (e.g. a warm same-branch read performs zero `Dataset::open`, or a fixed object-op count) against a fixture with realistic *commit-history depth*, not just realistic row counts. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below.
+
+## Cost-budget tests: bound hot-path cost at history depth
+
+Correctness bugs fail loudly in tests; cost-scaling bugs pass every test and degrade silently in production. The engine read path historically had no cost assertion, and fixtures carry shallow commit history, so an O(commits)-per-query cost stayed green in CI and only surfaced on a long-lived graph (read snapshot resolution re-scanned the internal manifest and commit-graph tables on every query, and those tables were never compacted). Guard against the class:
+
+- **Assert a cost budget, not just a result.** For a read/open path, assert the number of `Dataset::open` calls (or object-store ops) a warm query performs, and that it does not grow with commit count. The reference is LanceDB's IO-counted tests, which assert a cached read costs 0-1 IO and carry a named regression test against "a list call on every subsequent query."
+- **Test at history depth.** Build a fixture with many *commits* (not many rows) and assert warm-read cost is flat across depths. A shallow fixture cannot catch an O(commits) cost.
+- This is the testing companion to invariant 15 in [docs/dev/invariants.md](invariants.md) (hot-path cost is bounded by work, not history).

 When in doubt, re-read [docs/dev/invariants.md](invariants.md) — quality gates apply to every change.