perf(engine): scope CSR topology index to traversed edges, reuse it cross-branch (#312)

* perf(engine): scope the CSR topology index to traversed edges, reuse it cross-branch

The in-memory CSR graph index was built over every edge type in the catalog and
cache-keyed by the resolved snapshot id, so a single-edge join
(`$x identifiesPerson $p`) full-scanned every edge table in the graph (the
40-60s / 428s-first-traversal hang), and a lazy-fork branch cold-rebuilt main's
index. Two cuts close that:

- Scope (A2): build only the edge types the query traverses
  (`referenced_edge_types` over Expand/AntiJoin, exhaustive match), not the whole
  catalog. Threaded through GraphIndexHandle -> RuntimeCache; cache-keyed on the
  scoped set.
- Cross-branch reuse (A1): key RuntimeCache by each edge table's physical identity
  (table_key, version, table_branch, e_tag) instead of the snapshot id, so a
  lazy-fork branch whose edge tables physically are main's reuses main's built
  index. On local FS (where e_tag is None) the fallback is refresh-invalidation:
  a same-branch manifest refresh clears the cache.
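
A minimal sketch of the scoping cut (A2), assuming a simplified stand-in for the IR. `Op`, `collect_edge_names`, and `scoped_edge_types` are hypothetical names for illustration; the engine's own `IROp` carries more fields, and the catalog here is reduced to a plain map from edge name to `(from_type, to_type)`.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified stand-in for the engine's IR operators.
enum Op {
    NodeScan,
    Filter,
    Expand { edge_type: String },
    AntiJoin { inner: Vec<Op> },
}

/// Collect the name of every edge type the pipeline traverses, recursing
/// into anti-join inner pipelines. The set dedups repeated edges.
fn collect_edge_names(pipeline: &[Op], out: &mut BTreeSet<String>) {
    for op in pipeline {
        match op {
            Op::Expand { edge_type } => {
                out.insert(edge_type.clone());
            }
            Op::AntiJoin { inner } => collect_edge_names(inner, out),
            // No `_` arm on purpose: adding a new variant to `Op` makes this
            // match fail to compile until the variant is classified here.
            Op::NodeScan | Op::Filter => {}
        }
    }
}

/// Resolve the traversed edge names to their endpoints. `catalog` is an
/// assumed shape: edge name -> (from_type, to_type).
fn scoped_edge_types(
    pipeline: &[Op],
    catalog: &HashMap<String, (String, String)>,
) -> HashMap<String, (String, String)> {
    let mut names = BTreeSet::new();
    collect_edge_names(pipeline, &mut names);
    names
        .into_iter()
        .filter_map(|name| catalog.get(&name).cloned().map(|ends| (name, ends)))
        .collect()
}
```

With a three-edge catalog and a pipeline that expands one edge directly and another inside an anti-join, the scoped map holds only those two edges, so an index build driven by it would never touch the third edge table.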

Adds graph_build_count/graph_edges_built probes for the cost tests.

* test(engine): cost tests for scoped + cross-branch-reused topology index

fresh_branch_traversal_reuses_main_graph_index (A1: a lazy-fork branch reuses
main's cached CSR index, 0 rebuilds) and single_edge_query_builds_only_referenced_edge
(A2: a one-edge query builds only that edge, not the whole catalog), via the
graph_build_count/graph_edges_built probes. Forced CSR mode, #[serial]. Updates the
recreated-branch incarnation test comment for the physical-identity key.

* docs(engine): topology-index scoping + physical-identity cache key

Document the scoped CSR build and the physical-identity (e_tag) graph-index cache
key with its local-FS refresh-invalidation fallback across invariants, testing,
execution, and architecture docs.

* fix(test): move CSR-forced topology cost tests to the all-serial binary

The two topology-build cost tests force OMNIGRAPH_TRAVERSAL_MODE via process-
global env mutation, which query.rs reads. In warm_read_cost.rs (a mixed
serial/non-serial binary) a concurrent non-serial traversal test could race the
env write (UB under Rust 2024's unsafe set_var contract) and be forced onto CSR.
Move them to traversal_indexed.rs — the dedicated all-serial binary with no
non-serial env reader (its documented-safe home) — and add a ModeGuard RAII
helper so a panic mid-test cannot leak the override. Addresses a PR review (P2).

* fix(engine): include edge endpoints in the graph-index cache key

The A1 physical-identity key omitted the edge's (from_type, to_type). GraphIndex
keys its TypeIndexes by those endpoint names and execute_expand_csr looks them up
by the current catalog's names, so a schema repoint of an edge type that leaves
the edge table's physical identity unchanged would reuse a stale index built with
the old endpoint namespace and fail with "no type index for <new type>". The old
snapshot_id (carrying the manifest version) masked this; dropping it exposed it.
Adding the endpoints to the key rebuilds on a repoint while preserving lazy-fork
cross-branch reuse (same endpoints -> same key). Addresses a PR review (P1).
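
The key behavior described above can be sketched as follows. This is an illustration under assumptions, not the engine's code: `TableState`, `CacheKey`, `IndexCache`, and `knows_table` are hypothetical names, and the cached value is a plain count standing in for a built index.

```rust
use std::collections::HashMap;

/// Physical identity of one edge table plus the endpoints it was built with.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TableState {
    table_key: String,
    version: u64,
    table_branch: Option<String>,
    e_tag: Option<String>,
    endpoints: (String, String),
}

/// Cache key: the sorted set of edge-table states. No snapshot id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CacheKey {
    edge_tables: Vec<TableState>,
}

fn cache_key(mut edge_tables: Vec<TableState>) -> CacheKey {
    // Sort so the key does not depend on map iteration order.
    edge_tables.sort_by(|a, b| a.table_key.cmp(&b.table_key));
    CacheKey { edge_tables }
}

/// Toy cache that counts how often it had to "build" an entry.
#[derive(Default)]
struct IndexCache {
    entries: HashMap<CacheKey, usize>,
    builds: u64,
}

impl IndexCache {
    fn get_or_build(&mut self, key: CacheKey) -> usize {
        if let Some(built) = self.entries.get(&key) {
            return *built;
        }
        self.builds += 1;
        let built = key.edge_tables.len(); // stand-in for the real index
        self.entries.insert(key, built);
        built
    }
}

/// Hypothetical helper: a `Knows` edge table at a fixed physical identity.
fn knows_table(from: &str, to: &str) -> TableState {
    TableState {
        table_key: "edge:Knows".to_string(),
        version: 7,
        table_branch: None,
        e_tag: Some("etag-1".to_string()),
        endpoints: (from.to_string(), to.to_string()),
    }
}
```

Two lookups with the same physical identity and endpoints (the lazy-fork case) hit one entry, while a repoint from `(Person, Person)` to `(Person, Account)` at the same identity produces a different key and forces a rebuild.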

* test(engine): scoped with_traversal_mode seam + e_tag graph-index coverage

Replace the process-global OMNIGRAPH_TRAVERSAL_MODE env-mutation test hack (which
forced #[serial] + dedicated all-serial binaries and was triplicated as ModeGuard
+ set_mode/clear_mode) with one general abstraction: a task-local
`with_traversal_mode` seam mirroring `with_query_io_probes`. It is scope-bound
(leak-free even on panic) and process-safe (never touches shared state), so a
forced-mode test cannot affect a concurrent test in the same binary.
`traversal_indexed_override` consults the seam first, then the env var (which
stays the documented ops escape hatch).
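
A std-only sketch of the same idea, under stated assumptions: it uses `std::thread_local!` and a closure rather than tokio's `task_local!` and a future, and the env value is passed in as a parameter so the example never reads or writes the process environment. `with_mode`, `resolve_mode`, and `Restore` are hypothetical names.

```rust
use std::cell::Cell;

thread_local! {
    // Scoped override; `None` means "no override installed".
    static MODE_OVERRIDE: Cell<Option<&'static str>> = Cell::new(None);
}

/// Restores the previous override when dropped, including during unwinding.
struct Restore(Option<&'static str>);

impl Drop for Restore {
    fn drop(&mut self) {
        MODE_OVERRIDE.with(|m| m.set(self.0));
    }
}

/// Run `f` with the traversal mode forced to `mode` on this thread only.
fn with_mode<R>(mode: &'static str, f: impl FnOnce() -> R) -> R {
    let previous = MODE_OVERRIDE.with(|m| m.replace(Some(mode)));
    let _restore = Restore(previous);
    f()
}

/// Consult the scoped override first, then the env value supplied by the
/// caller. `Some(true)` = indexed, `Some(false)` = CSR, `None` = auto.
fn resolve_mode(env_value: Option<&str>) -> Option<bool> {
    let scoped: Option<&str> = MODE_OVERRIDE.with(|m| m.get());
    match scoped.or(env_value) {
        Some("indexed") => Some(true),
        Some("csr") => Some(false),
        _ => None,
    }
}
```

Inside `with_mode("csr", ...)` the resolver returns the CSR choice even when the env value says `indexed`; once the closure returns, the override is gone and the env value applies again. Because the state is per-thread and restored by a `Drop` guard, one test's forced mode cannot be observed by another thread.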

- Migrate traversal_indexed.rs, proptest_equivalence.rs, and the two topology cost
  tests (moved back to warm_read_cost.rs) to the seam; drop all ModeGuard /
  set_mode / clear_mode / #[serial] / per-file column0 helpers.
- Consolidate the duplicated first-column extractors into one shared
  `helpers::first_column_sorted`.
- Add `s3_storage.rs::s3_fresh_branch_traversal_reuses_main_graph_index_with_etags`:
  the CSR cache-key cross-branch reuse path on a REAL per-table e_tag (None on
  local FS, so local tests can't reach it). Confirmed empirically that RustFS — the
  CI S3 backend — surfaces ETags into version_metadata.e_tag(). CI path filter now
  triggers the rustfs job on runtime_cache/graph_index changes.
Ragnor Comerford 2026-06-28 20:03:06 +02:00 committed by GitHub
parent 20e5fada8a
commit e7e057e26d
17 changed files with 639 additions and 175 deletions


@ -90,6 +90,7 @@ jobs:
crates/omnigraph/src/db/manifest.rs|crates/omnigraph/src/db/manifest/*) run_rustfs_ci=true ;;
crates/omnigraph/tests/s3_storage.rs|crates/omnigraph/tests/write_cost_s3.rs|crates/omnigraph/tests/helpers/*) run_rustfs_ci=true ;;
crates/omnigraph/src/table_store.rs|crates/omnigraph/src/instrumentation.rs) run_rustfs_ci=true ;;
crates/omnigraph/src/runtime_cache.rs|crates/omnigraph/src/graph_index/*) run_rustfs_ci=true ;;
crates/omnigraph-cluster/src/store.rs|crates/omnigraph-cluster/src/serve.rs) run_rustfs_ci=true ;;
crates/omnigraph-cluster/tests/s3_cluster.rs) run_rustfs_ci=true ;;
crates/omnigraph-server/tests/s3.rs|crates/omnigraph-server/tests/support/*) run_rustfs_ci=true ;;


@ -1296,8 +1296,9 @@ impl Omnigraph {
pub(crate) async fn graph_index_for_resolved(
&self,
resolved: &ResolvedTarget,
edge_types: &std::collections::HashMap<String, (String, String)>,
) -> Result<Arc<crate::graph_index::GraphIndex>> {
table_ops::graph_index_for_resolved(self, resolved).await
table_ops::graph_index_for_resolved(self, resolved, edge_types).await
}
/// Ensure BTree scalar indices exist on key columns.


@ -10,15 +10,22 @@ pub(super) async fn graph_index(db: &Omnigraph) -> Result<Arc<crate::graph_index
.await?;
drop(coord);
let catalog = db.catalog();
db.runtime_cache.graph_index(&resolved, &catalog).await
// Whole-graph entry point: cover every edge type. Query execution scopes to
// the edges it actually traverses (see `referenced_edge_types`).
let edge_types: std::collections::HashMap<String, (String, String)> = catalog
.edge_types
.iter()
.map(|(name, et)| (name.clone(), (et.from_type.clone(), et.to_type.clone())))
.collect();
db.runtime_cache.graph_index(&resolved, &edge_types).await
}
pub(super) async fn graph_index_for_resolved(
db: &Omnigraph,
resolved: &ResolvedTarget,
edge_types: &std::collections::HashMap<String, (String, String)>,
) -> Result<Arc<crate::graph_index::GraphIndex>> {
let catalog = db.catalog();
db.runtime_cache.graph_index(resolved, &catalog).await
db.runtime_cache.graph_index(resolved, edge_types).await
}
pub(super) async fn ensure_indices(db: &Omnigraph) -> Result<Vec<PendingIndex>> {


@ -50,7 +50,11 @@ impl Omnigraph {
.any(|op| matches!(op, IROp::Expand { .. } | IROp::AntiJoin { .. }));
// Lazy: an index-served query with no AntiJoin never builds the CSR.
let graph_index = if needs_graph {
GraphIndexHandle::cached(self, &resolved)
GraphIndexHandle::cached(
self,
&resolved,
referenced_edge_types(&ir.pipeline, &catalog),
)
} else {
GraphIndexHandle::none()
};
@ -95,14 +99,9 @@ impl Omnigraph {
.any(|op| matches!(op, IROp::Expand { .. } | IROp::AntiJoin { .. }));
// Lazy build against this historical snapshot (not the RuntimeCache,
// which is keyed to live branch targets); only a CSR-path Expand or an
// AntiJoin triggers it.
// AntiJoin triggers it. Scoped to the edges this query traverses.
let graph_index = if needs_graph {
let edge_types = catalog
.edge_types
.iter()
.map(|(name, et)| (name.clone(), (et.from_type.clone(), et.to_type.clone())))
.collect();
GraphIndexHandle::direct(&snapshot, edge_types)
GraphIndexHandle::direct(&snapshot, referenced_edge_types(&ir.pipeline, &catalog))
} else {
GraphIndexHandle::none()
};
@ -762,6 +761,51 @@ fn execute_pipeline<'a>(
})
}
/// The edge types a query's pipeline actually traverses, mapped to their
/// `(from_type, to_type)` endpoints. Recurses through `AntiJoin` inner pipelines
/// (whose bulk fast path consumes the CSR for the inner `Expand`'s edge). The
/// CSR build is scoped to exactly this set instead of every edge type in the
/// catalog — otherwise a single-edge join (`$x identifiesPerson $p`) that lands
/// on the CSR path would scan the whole graph's edge data (every message,
/// relationship, … table), the cause of the cross-edge-join hang. Empty when the
/// only traversal is an `AntiJoin` with no inner `Expand` — that shape never asks
/// the handle for an index, so an empty build is never realized.
fn referenced_edge_types(
pipeline: &[IROp],
catalog: &Catalog,
) -> HashMap<String, (String, String)> {
let mut names = std::collections::BTreeSet::new();
collect_referenced_edge_names(pipeline, &mut names);
names
.into_iter()
.filter_map(|name| {
catalog
.edge_types
.get(&name)
.map(|et| (name, (et.from_type.clone(), et.to_type.clone())))
})
.collect()
}
fn collect_referenced_edge_names(
pipeline: &[IROp],
out: &mut std::collections::BTreeSet<String>,
) {
for op in pipeline {
match op {
IROp::Expand { edge_type, .. } => {
out.insert(edge_type.clone());
}
IROp::AntiJoin { inner, .. } => collect_referenced_edge_names(inner, out),
// Exhaustive on purpose (no `_` arm): a new edge-referencing IROp must
// force a compile error here rather than silently under-scope the CSR
// build — an omitted edge would fail at runtime with "no adjacency
// index for edge". The non-traversal ops reference no edges.
IROp::NodeScan { .. } | IROp::Filter(_) => {}
}
}
}
/// Lazily provides the in-memory CSR graph index, building it on first use and
/// memoizing for the rest of the query. Indexed-mode Expand never asks for it,
/// so a query that is entirely index-served and has no AntiJoin never pays the
@ -776,7 +820,11 @@ pub struct GraphIndexHandle<'a> {
enum GraphIndexBuilder<'a> {
None,
Cached(&'a Omnigraph, &'a crate::db::ResolvedTarget),
Cached(
&'a Omnigraph,
&'a crate::db::ResolvedTarget,
HashMap<String, (String, String)>,
),
Direct(&'a Snapshot, HashMap<String, (String, String)>),
}
@ -788,10 +836,14 @@ impl<'a> GraphIndexHandle<'a> {
}
}
fn cached(db: &'a Omnigraph, resolved: &'a crate::db::ResolvedTarget) -> Self {
fn cached(
db: &'a Omnigraph,
resolved: &'a crate::db::ResolvedTarget,
edge_types: HashMap<String, (String, String)>,
) -> Self {
Self {
cell: tokio::sync::OnceCell::new(),
builder: GraphIndexBuilder::Cached(db, resolved),
builder: GraphIndexBuilder::Cached(db, resolved, edge_types),
}
}
@ -810,8 +862,8 @@ impl<'a> GraphIndexHandle<'a> {
.get_or_try_init(|| async {
match &self.builder {
GraphIndexBuilder::None => Ok::<Option<Arc<GraphIndex>>, OmniError>(None),
GraphIndexBuilder::Cached(db, resolved) => {
Ok(Some(db.graph_index_for_resolved(resolved).await?))
GraphIndexBuilder::Cached(db, resolved, edge_types) => {
Ok(Some(db.graph_index_for_resolved(resolved, edge_types).await?))
}
GraphIndexBuilder::Direct(snapshot, edge_types) => {
Ok(Some(Arc::new(GraphIndex::build(snapshot, edge_types).await?)))
@ -834,7 +886,12 @@ impl<'a> GraphIndexHandle<'a> {
/// forces the path (ops escape hatch + test hook). Both modes are semantically
/// identical, so the override only changes which path runs, never the result.
fn traversal_indexed_override() -> Option<bool> {
match std::env::var("OMNIGRAPH_TRAVERSAL_MODE").ok().as_deref() {
// The scoped test seam (`with_traversal_mode`) takes precedence over the
// process-global `OMNIGRAPH_TRAVERSAL_MODE` ops escape hatch.
let mode = crate::instrumentation::traversal_mode_override()
.map(str::to_string)
.or_else(|| std::env::var("OMNIGRAPH_TRAVERSAL_MODE").ok());
match mode.as_deref() {
Some("indexed") => Some(true),
Some("csr") => Some(false),
_ => None,
@ -2460,6 +2517,107 @@ mod expand_chooser_tests {
}
}
#[cfg(test)]
mod referenced_edge_types_tests {
use super::*;
fn node_scan(var: &str, ty: &str) -> IROp {
IROp::NodeScan {
variable: var.to_string(),
type_name: ty.to_string(),
filters: Vec::new(),
}
}
fn expand(edge: &str) -> IROp {
IROp::Expand {
src_var: "a".into(),
dst_var: "b".into(),
edge_type: edge.to_string(),
direction: Direction::Out,
dst_type: "X".into(),
min_hops: 1,
max_hops: Some(1),
dst_filters: Vec::new(),
}
}
fn names(pipeline: &[IROp]) -> Vec<String> {
let mut set = std::collections::BTreeSet::new();
collect_referenced_edge_names(pipeline, &mut set);
set.into_iter().collect()
}
#[test]
fn collects_a_single_expand_edge() {
assert_eq!(
names(&[node_scan("x", "ExternalID"), expand("identifiesPerson")]),
vec!["identifiesPerson".to_string()]
);
}
#[test]
fn ignores_non_traversal_ops_and_dedups() {
// A pipeline that touches one edge twice references exactly that one edge —
// never the whole catalog (the cross-edge-join hang this scoping fixes).
let pipeline = vec![
node_scan("x", "ExternalID"),
expand("identifiesPerson"),
IROp::Filter(IRFilter {
left: IRExpr::PropAccess {
variable: "p".into(),
property: "name".into(),
},
op: omnigraph_compiler::query::ast::CompOp::Eq,
right: IRExpr::Literal(Literal::String("a".into())),
}),
expand("identifiesPerson"),
];
assert_eq!(names(&pipeline), vec!["identifiesPerson".to_string()]);
}
#[test]
fn recurses_through_anti_join_inner_pipeline() {
// The bulk anti-join fast path consumes the CSR for the inner Expand's
// edge, so its edge type must be in scope even though it is nested.
let pipeline = vec![
node_scan("p", "Person"),
expand("knows"),
IROp::AntiJoin {
outer_var: "p".into(),
inner: vec![expand("worksAt")],
},
];
assert_eq!(
names(&pipeline),
vec!["knows".to_string(), "worksAt".to_string()]
);
}
#[test]
fn recurses_through_nested_anti_joins() {
let pipeline = vec![IROp::AntiJoin {
outer_var: "p".into(),
inner: vec![IROp::AntiJoin {
outer_var: "c".into(),
inner: vec![expand("deepEdge")],
}],
}];
assert_eq!(names(&pipeline), vec!["deepEdge".to_string()]);
}
#[test]
fn anti_join_with_no_inner_expand_references_no_edges() {
// A predicate-only anti-join never asks the handle for an index, so the
// empty set is correct — no whole-graph build is realized.
let pipeline = vec![IROp::AntiJoin {
outer_var: "p".into(),
inner: vec![node_scan("c", "Company")],
}];
assert!(names(&pipeline).is_empty());
}
}
#[cfg(test)]
mod literal_lowering_tests {
use super::*;


@ -113,6 +113,12 @@ impl GraphIndex {
snapshot: &Snapshot,
edge_types: &HashMap<String, (String, String)>, // edge_name → (from_type, to_type)
) -> Result<Self> {
// INVARIANT (A1 graph-index cache key): the topology is a pure function of
// the edge tables' `src`/`dst` columns and nothing else. `RuntimeCache`
// keys `GraphIndexCacheKey` on each edge table's physical identity
// `(table_key, version, table_branch, e_tag)` so a lazy-fork branch reuses
// main's built index. If you read node tables, schema, or other state here,
// add it to that key or the cache will serve a stale index.
let mut type_indices: HashMap<String, TypeIndex> = HashMap::new();
let mut csr = HashMap::new();
let mut csc = HashMap::new();


@ -59,6 +59,14 @@ pub struct QueryIoProbes {
/// Internal/system-table (`__manifest`) open CALLS — the complement of
/// `data_open_count`, kept for symmetry and debugging.
pub internal_open_count: Arc<AtomicU64>,
/// Counts topology-index builds (the `RuntimeCache::graph_index` cache-miss
/// path). A cost test asserts a fresh branch whose edge tables are unchanged
/// from main reuses main's cached index (0 builds) rather than rebuilding it.
pub graph_build_count: Arc<AtomicU64>,
/// Edge tables included in topology builds this query (summed over build
/// invocations). A cost test asserts a query referencing one edge builds only
/// that edge, not every catalog edge (the cold-build shrink A2 ships).
pub graph_edges_built: Arc<AtomicU64>,
}
tokio::task_local! {
@ -78,6 +86,32 @@ fn current<R>(f: impl FnOnce(&QueryIoProbes) -> R) -> Option<R> {
QUERY_IO_PROBES.try_with(f).ok()
}
tokio::task_local! {
static TRAVERSAL_MODE_OVERRIDE: Option<&'static str>;
}
/// Force the Expand execution mode (`"indexed"` | `"csr"`) for the scope of `fut`
/// WITHOUT mutating the process-global `OMNIGRAPH_TRAVERSAL_MODE` env var. This is
/// the general traversal-mode test seam: scope-bound (so it cannot leak — the
/// override is gone when `fut` resolves or unwinds) and process-safe (it never
/// touches shared state, so a forced-mode test never affects a concurrent test in
/// the same binary, removing the need for `#[serial]` + a dedicated all-serial
/// binary). Mirrors [`with_query_io_probes`]. The env var stays the production/ops
/// escape hatch; this scoped override takes precedence over it
/// (`exec::query::traversal_indexed_override`).
pub async fn with_traversal_mode<F>(mode: &'static str, fut: F) -> F::Output
where
F: std::future::Future,
{
TRAVERSAL_MODE_OVERRIDE.scope(Some(mode), fut).await
}
/// The scoped traversal-mode override active for this task, if any. `None` in
/// production (no scope installed), so the env var is consulted instead.
pub(crate) fn traversal_mode_override() -> Option<&'static str> {
TRAVERSAL_MODE_OVERRIDE.try_with(|m| *m).ok().flatten()
}
pub(crate) fn manifest_wrapper() -> Option<Arc<dyn WrappingObjectStore>> {
current(|p| p.manifest_wrapper.clone()).flatten()
}
@ -119,6 +153,16 @@ pub(crate) fn record_open(uri: &str) {
});
}
/// Record one topology-index build over `edges` edge tables (the
/// `RuntimeCache::graph_index` cache-miss path). No-op when no probes are
/// installed (production).
pub(crate) fn record_graph_build(edges: usize) {
let _ = current(|p| {
p.graph_build_count.fetch_add(1, Ordering::Relaxed);
p.graph_edges_built.fetch_add(edges as u64, Ordering::Relaxed);
});
}
/// Per-operation staged-write counts, installed for a task via
/// [`with_merge_write_probes`]. Lets a cost-budget test assert WHICH staged-write
/// primitive an operation invokes — e.g. that an append-only fast-forward merge


@ -4,16 +4,20 @@ use std::sync::Arc;
use lance::Dataset;
use lance::session::Session;
use omnigraph_compiler::catalog::Catalog;
use tokio::sync::Mutex;
use crate::db::ResolvedTarget;
use crate::error::Result;
use crate::graph_index::GraphIndex;
/// Cache key for a built `GraphIndex`. Keyed (A1) by the physical identity of the
/// edge tables the topology is derived from, NOT by the resolved snapshot id. The
/// topology is a pure function of the edge tables' `src`/`dst`, so two snapshots
/// (e.g. main and a lazy-fork branch whose edge tables physically *are* main's)
/// with identical edge tables share one built index: a fresh branch reuses main's
/// instead of rebuilding it from a cold scan.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct GraphIndexCacheKey {
snapshot_id: String,
edge_tables: Vec<GraphIndexTableState>,
}
@ -22,6 +26,20 @@ struct GraphIndexTableState {
table_key: String,
table_version: u64,
table_branch: Option<String>,
/// Lance manifest incarnation token for this edge table version. Preserves the
/// incarnation distinction the dropped synthetic snapshot id used to carry: a
/// branch deleted and recreated at the same version number gets a new e_tag, so
/// the cache rebuilds instead of serving stale topology. `None` only on stores
/// without e_tags (local FS); there a same-branch manifest refresh clears the
/// cache as the fallback (the read-path gap in docs/dev/invariants.md).
e_tag: Option<String>,
/// The edge's `(from_type, to_type)` endpoint names at build time. `GraphIndex`
/// keys its `TypeIndex`es by these, and `execute_expand_csr` looks them up by
/// the *current* catalog's endpoint names — so a schema change that repoints an
/// edge type while leaving the edge table's physical identity unchanged must
/// invalidate the entry (else the reused index has the old type-index namespace
/// and the new traversal fails with "no type index for <new type>").
endpoints: (String, String),
}
#[derive(Debug, Default)]
@ -40,12 +58,18 @@ impl RuntimeCache {
cache.entries.invalidate_all();
}
/// Build (or fetch) the CSR/CSC graph index scoped to exactly `edge_types` —
/// the edge types the query actually traverses, not every edge type in the
/// catalog. Scoping is what keeps a single-edge join (`$x identifiesPerson
/// $p`) from scanning the whole graph's edge data; the cache key carries the
/// scoped set, so a `{Knows}` index and a `{Knows, WorksAt}` index are
/// distinct entries and never serve each other.
pub async fn graph_index(
&self,
resolved: &ResolvedTarget,
catalog: &Catalog,
edge_types: &HashMap<String, (String, String)>,
) -> Result<Arc<GraphIndex>> {
let key = graph_index_cache_key(resolved, catalog);
let key = graph_index_cache_key(resolved, edge_types);
{
let mut cache = self.graph_indices.lock().await;
if let Some(index) = cache.entries.get(&key).cloned() {
@ -53,13 +77,8 @@ impl RuntimeCache {
}
}
let edge_types = catalog
.edge_types
.iter()
.map(|(name, et)| (name.clone(), (et.from_type.clone(), et.to_type.clone())))
.collect();
let index = Arc::new(GraphIndex::build(&resolved.snapshot, &edge_types).await?);
crate::instrumentation::record_graph_build(edge_types.len());
let index = Arc::new(GraphIndex::build(&resolved.snapshot, edge_types).await?);
let mut cache = self.graph_indices.lock().await;
if let Some(existing) = cache.entries.get(&key).cloned() {
return Ok(existing);
@ -151,11 +170,13 @@ impl Default for GraphIndexCache {
}
}
fn graph_index_cache_key(resolved: &ResolvedTarget, catalog: &Catalog) -> GraphIndexCacheKey {
let mut edge_tables: Vec<GraphIndexTableState> = catalog
.edge_types
.keys()
.filter_map(|edge_name| {
fn graph_index_cache_key(
resolved: &ResolvedTarget,
edge_types: &HashMap<String, (String, String)>,
) -> GraphIndexCacheKey {
let mut edge_tables: Vec<GraphIndexTableState> = edge_types
.iter()
.filter_map(|(edge_name, endpoints)| {
let table_key = format!("edge:{}", edge_name);
resolved
.snapshot
@ -164,15 +185,14 @@ fn graph_index_cache_key(resolved: &ResolvedTarget, catalog: &Catalog) -> GraphI
table_key,
table_version: entry.table_version,
table_branch: entry.table_branch.clone(),
e_tag: entry.version_metadata.e_tag().map(str::to_string),
endpoints: endpoints.clone(),
})
})
.collect();
edge_tables.sort_by(|a, b| a.table_key.cmp(&b.table_key));
GraphIndexCacheKey {
snapshot_id: resolved.snapshot_id.as_str().to_string(),
edge_tables,
}
GraphIndexCacheKey { edge_tables }
}
/// Max held `Dataset` handles. A handle holds only Arcs (object store + manifest),
@ -290,9 +310,16 @@ mod tests {
use super::*;
fn key(id: usize) -> GraphIndexCacheKey {
// Distinct keys via a distinct edge table per id (the key no longer carries
// a snapshot id — it is the physical edge-table identity set, A1).
GraphIndexCacheKey {
snapshot_id: format!("snap-{id}"),
edge_tables: Vec::new(),
edge_tables: vec![GraphIndexTableState {
table_key: format!("edge:t{id}"),
table_version: 1,
table_branch: None,
e_tag: None,
endpoints: ("A".to_string(), "B".to_string()),
}],
}
}
@ -300,6 +327,34 @@ mod tests {
Arc::new(GraphIndex::empty_for_test())
}
/// An edge table at the same physical identity but a different `(from_type,
/// to_type)` endpoint mapping (a schema repoint) must NOT share a cache entry
/// — the built index's `TypeIndex` namespace is keyed by those endpoints.
#[test]
fn endpoint_remap_at_same_physical_identity_splits_cache_key() {
let base = GraphIndexTableState {
table_key: "edge:Knows".to_string(),
table_version: 7,
table_branch: None,
e_tag: Some("etag".to_string()),
endpoints: ("Person".to_string(), "Person".to_string()),
};
let repointed = GraphIndexTableState {
endpoints: ("Person".to_string(), "Account".to_string()),
..base.clone()
};
let k_old = GraphIndexCacheKey {
edge_tables: vec![base],
};
let k_new = GraphIndexCacheKey {
edge_tables: vec![repointed],
};
assert_ne!(
k_old, k_new,
"a schema endpoint remap must produce a distinct graph-index cache key"
);
}
#[test]
fn graph_index_cache_evicts_oldest_entry() {
let mut cache = GraphIndexCache::default();


@ -309,6 +309,8 @@ impl OpProbes {
probe_count: Arc::clone(&h.probe_count),
data_open_count: Arc::clone(&h.data_open_count),
internal_open_count: Arc::clone(&h.internal_open_count),
// graph_build_count / graph_edges_built unused by this harness.
..Default::default()
};
(probes, h)
}


@ -110,6 +110,26 @@ pub async fn count_rows_branch(db: &Omnigraph, branch: &str, table_key: &str) ->
ds.count_rows(None).await.unwrap()
}
/// First result column as sorted strings — the shared shape the traversal /
/// cost tests use to compare a query's returned names. Empty for a 0-row result.
pub fn first_column_sorted(result: &QueryResult) -> Vec<String> {
if result.num_rows() == 0 {
return Vec::new();
}
let batch = result.concat_batches().unwrap();
let col = batch
.column(0)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let mut v: Vec<String> = (0..col.len())
.filter(|&i| !col.is_null(i))
.map(|i| col.value(i).to_string())
.collect();
v.sort();
v
}
/// Collect all string values from a named column across batches.
pub fn collect_column_strings(batches: &[RecordBatch], col: &str) -> Vec<String> {
let mut out = Vec::new();


@ -9,10 +9,10 @@
//! third ExpandMode, the anti-join fast/slow fork) fail loudly instead of
//! silently.
//!
//! Each test is a sync `#[test]` + `#[serial]`: it builds its own runtime and
//! `block_on`s per generated case (proptest closures are sync), and the
//! mode-equivalence test writes `OMNIGRAPH_TRAVERSAL_MODE`, so serial execution
//! keeps env writes from racing other tests in this binary.
//! Each test is a sync `#[test]`: it builds its own runtime and `block_on`s per
//! generated case (proptest closures are sync). The mode-equivalence test forces
//! the Expand mode via the scoped `with_traversal_mode` seam — no env mutation, so
//! no `#[serial]` and no leak across shrink/cases.
mod helpers;
@ -21,9 +21,9 @@ use std::collections::HashSet;
use arrow_array::{Array, StringArray};
use proptest::prelude::*;
use proptest::test_runner::{Config, TestRunner};
use serial_test::serial;
use omnigraph::db::{Omnigraph, ReadTarget};
use omnigraph::instrumentation::with_traversal_mode;
use omnigraph::loader::{LoadMode, load_jsonl};
use omnigraph_compiler::ir::ParamMap;
use omnigraph_compiler::query::ast::Literal;
@ -138,27 +138,6 @@ fn config() -> Config {
}
}
fn clear_mode() {
unsafe { std::env::remove_var("OMNIGRAPH_TRAVERSAL_MODE") };
}
/// RAII guard that sets `OMNIGRAPH_TRAVERSAL_MODE` and clears it on drop — so a
/// panic mid-case (e.g. a query `unwrap`) cannot leak the forced mode into
/// proptest's subsequent shrink/cases and mask the divergence under test. SAFE:
/// every test in this binary is `#[serial]`, so no thread reads the env during
/// the write.
struct ModeGuard;
impl ModeGuard {
fn set(mode: &str) -> Self {
unsafe { std::env::set_var("OMNIGRAPH_TRAVERSAL_MODE", mode) };
ModeGuard
}
}
impl Drop for ModeGuard {
fn drop(&mut self) {
unsafe { std::env::remove_var("OMNIGRAPH_TRAVERSAL_MODE") };
}
}
async fn load_graph(graph: &GenGraph) -> (tempfile::TempDir, Omnigraph) {
let dir = tempfile::tempdir().unwrap();
@ -203,7 +182,6 @@ async fn col0_set(db: &mut Omnigraph, name: &str, params: &ParamMap) -> HashSet<
// one (worksAt{1,2}, collision-prone). This is the search-over-the-class version
// of the hand-built cross-type-collision fixture.
#[test]
#[serial]
fn prop_expand_indexed_eq_csr() {
let rt = tokio::runtime::Runtime::new().unwrap();
let mut runner = TestRunner::new(config());
@ -214,17 +192,12 @@ fn prop_expand_indexed_eq_csr() {
for start in graph.persons.clone() {
let p = one_param(&start);
for q in ["friends", "employers"] {
// Each guard clears the mode on drop (end of the block,
// or on panic), so a forced mode never leaks across runs.
let csr = {
let _g = ModeGuard::set("csr");
col0_sorted(&mut db, q, &p).await
};
let indexed = {
let _g = ModeGuard::set("indexed");
col0_sorted(&mut db, q, &p).await
};
// No guard → env unset → auto (cost-based) path.
// The seam is scope-bound: the forced mode is gone when the
// wrapped future resolves, so it never leaks across runs.
let csr = with_traversal_mode("csr", col0_sorted(&mut db, q, &p)).await;
let indexed =
with_traversal_mode("indexed", col0_sorted(&mut db, q, &p)).await;
// No override → auto (cost-based) path.
let auto = col0_sorted(&mut db, q, &p).await;
if csr != indexed || csr != auto {
return Some((start, q, csr, indexed, auto));
@ -247,9 +220,7 @@ fn prop_expand_indexed_eq_csr() {
// destination type's loaded key set — independent of the two-mode comparison, so
// it catches over-emission even if both modes are wrong identically.
#[test]
#[serial]
fn prop_results_subset_of_existing_nodes() {
clear_mode();
let rt = tokio::runtime::Runtime::new().unwrap();
let mut runner = TestRunner::new(config());
runner
@ -282,9 +253,7 @@ fn prop_results_subset_of_existing_nodes() {
// INVARIANT 3: anti-join complement. `not { $p worksAt $_ }` and its complement
// (persons WITH a worksAt) must be disjoint and together cover all persons.
#[test]
#[serial]
fn prop_antijoin_partitions_persons() {
clear_mode();
let rt = tokio::runtime::Runtime::new().unwrap();
let mut runner = TestRunner::new(config());
runner


@ -1,7 +1,11 @@
mod helpers;
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use omnigraph::db::MergeOutcome;
use omnigraph::db::Omnigraph;
use omnigraph::db::{Omnigraph, ReadTarget};
use omnigraph::instrumentation::{QueryIoProbes, with_query_io_probes, with_traversal_mode};
use omnigraph::loader::{LoadMode, load_jsonl};
use helpers::*;
@ -244,3 +248,82 @@ async fn s3_schema_apply_migrates_live_graph() {
"live S3 schema must carry the migration"
);
}
/// Graph-index (CSR topology) cross-branch reuse on a real object store, where the
/// cache key's per-table `e_tag` is a genuine non-`None` token (Lance e_tag is
/// `None` on local FS, so the local twin in `warm_read_cost.rs` keys on `None` —
/// this exercises the e_tag-present path production runs). With e_tags present, a
/// fresh lazy-fork branch reuses main's cached index (`graph_build_count == 0`).
/// Forces CSR via the scoped `with_traversal_mode` seam (no env mutation, so no
/// interference with the other tests in this binary).
#[tokio::test(flavor = "multi_thread")]
async fn s3_fresh_branch_traversal_reuses_main_graph_index_with_etags() {
let Some(uri) = s3_test_graph_uri("graph-index-etag") else {
eprintln!("skipping s3 graph-index test: OMNIGRAPH_S3_TEST_BUCKET is not set");
return;
};
let mut writer = Omnigraph::init(&uri, TEST_SCHEMA).await.unwrap();
// TEST_DATA seeds Alice->Bob and Alice->Charlie Knows edges.
load_jsonl(&mut writer, TEST_DATA, LoadMode::Overwrite)
.await
.unwrap();
// Separate reader: it never creates the branch, so branch_create below does
// not invalidate the reader's warm cache.
let reader = Omnigraph::open(&uri).await.unwrap();
// Warm main on the CSR path: builds + caches the topology index keyed by the
// edge table's physical identity incl. its real e_tag.
let warm = with_traversal_mode(
"csr",
reader.query(
ReadTarget::branch("main"),
TEST_QUERIES,
"friends_of",
&params(&[("$name", "Alice")]),
),
)
.await
.unwrap();
assert_eq!(
first_column_sorted(&warm),
vec!["Bob", "Charlie"],
"test setup: Alice knows Bob and Charlie"
);
// Lazy fork: feature's edge tables are physically main's (same version +
// e_tag, table_branch = None).
writer.branch_create("feature").await.unwrap();
let graph_build = Arc::new(AtomicU64::new(0));
let probes = QueryIoProbes {
graph_build_count: Arc::clone(&graph_build),
..Default::default()
};
let on_branch = with_traversal_mode(
"csr",
with_query_io_probes(
probes,
reader.query(
ReadTarget::branch("feature"),
TEST_QUERIES,
"friends_of",
&params(&[("$name", "Alice")]),
),
),
)
.await
.unwrap();
assert_eq!(
first_column_sorted(&on_branch),
vec!["Bob", "Charlie"],
"fresh branch sees main's edges (lazy fork) and the reused index is correct"
);
assert_eq!(
graph_build.load(Ordering::Relaxed),
0,
"with real e_tags, a fresh lazy-fork branch must reuse main's cached CSR index, not rebuild"
);
}

@ -1,63 +1,43 @@
//! BTREE-indexed Expand path (`execute_expand_indexed`) coverage.
//!
//! These tests force the Expand execution mode via `OMNIGRAPH_TRAVERSAL_MODE`
//! and assert the indexed path matches the CSR path (both are semantically
//! identical — the indexed path just serves neighbor lookups from the persisted
//! src/dst BTREE instead of an in-memory CSR). They live in their own test
//! binary and are all `#[serial]`, so the env writes never race a concurrent
//! reader: within this process serial execution serializes every env read, and
//! other test binaries (e.g. `traversal.rs`) are separate processes whose env
//! stays unset (→ CSR), validating the shared hydrate/align tail on the CSR path.
//! These tests force the Expand execution mode via the scoped `with_traversal_mode`
//! test seam — NOT the process-global `OMNIGRAPH_TRAVERSAL_MODE` env var — and
//! assert the indexed path matches the CSR path (both are semantically identical:
//! the indexed path serves neighbor lookups from the persisted src/dst BTREE
//! instead of an in-memory CSR). The seam is scope-bound and process-safe, so
//! these tests need no `#[serial]` and no dedicated binary.
mod helpers;
use arrow_array::{Array, StringArray};
use omnigraph::db::Omnigraph;
use omnigraph::instrumentation::with_traversal_mode;
use omnigraph::loader::{LoadMode, load_jsonl};
use omnigraph::table_store::{IndexCoverage, TableStore};
use omnigraph_compiler::ir::ParamMap;
use serial_test::serial;
use helpers::*;
fn set_mode(mode: &str) {
// SAFE: every test here is #[serial] and this binary has no non-serial
// env reader, so no thread reads the environment during this write.
unsafe { std::env::set_var("OMNIGRAPH_TRAVERSAL_MODE", mode) };
}
fn clear_mode() {
unsafe { std::env::remove_var("OMNIGRAPH_TRAVERSAL_MODE") };
}
/// Run a name-returning query and return its first column, sorted.
/// Run `name` on main under the cost-chooser (auto) Expand mode; first column sorted.
async fn sorted_names(db: &mut Omnigraph, queries: &str, name: &str, params: &ParamMap) -> Vec<String> {
let result = query_main(db, queries, name, params).await.unwrap();
if result.num_rows() == 0 {
return Vec::new();
}
let batch = result.concat_batches().unwrap();
let col = batch
.column(0)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let mut v: Vec<String> = (0..col.len()).map(|i| col.value(i).to_string()).collect();
v.sort();
v
first_column_sorted(&query_main(db, queries, name, params).await.unwrap())
}
/// Run the same query under CSR, indexed, and auto (cost-chooser) modes; assert
/// all three produce identical results and return them. The auto pass exercises
/// `choose_expand_mode` end to end: whichever path it selects, the rows must
/// match the forced paths (the chooser changes which path runs, never the result).
/// all three produce identical results and return them. The forced modes use the
/// scoped `with_traversal_mode` seam; the auto pass exercises `choose_expand_mode`
/// end to end (whichever path it selects, the rows must match the forced paths —
/// the chooser changes which path runs, never the result).
async fn both_modes(db: &mut Omnigraph, queries: &str, name: &str, params: &ParamMap) -> Vec<String> {
set_mode("csr");
let csr = sorted_names(db, queries, name, params).await;
set_mode("indexed");
let indexed = sorted_names(db, queries, name, params).await;
clear_mode();
let csr = first_column_sorted(
&with_traversal_mode("csr", query_main(db, queries, name, params))
.await
.unwrap(),
);
let indexed = first_column_sorted(
&with_traversal_mode("indexed", query_main(db, queries, name, params))
.await
.unwrap(),
);
let auto = sorted_names(db, queries, name, params).await;
assert_eq!(
indexed, csr,
@ -72,7 +52,6 @@ async fn both_modes(db: &mut Omnigraph, queries: &str, name: &str, params: &Para
// The C6 index-coverage guard: `key_column_index_coverage` must report whether
// a `key_col IN (...)` scan will use the persisted BTREE or silently full-scan.
// Not #[serial] — it calls the helper directly and reads no env.
#[tokio::test]
async fn key_column_index_coverage_detects_btree_presence() {
let dir = tempfile::tempdir().unwrap();
@ -136,7 +115,6 @@ async fn coverage_degrades_for_appended_unindexed_fragment() {
}
#[tokio::test]
#[serial]
async fn indexed_matches_csr_one_hop_same_type() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
@ -146,7 +124,6 @@ async fn indexed_matches_csr_one_hop_same_type() {
}
#[tokio::test]
#[serial]
async fn indexed_matches_csr_multi_hop_same_type() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
@ -165,7 +142,6 @@ query reach($name: String) {
}
#[tokio::test]
#[serial]
async fn indexed_matches_csr_cross_type() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
@ -183,7 +159,6 @@ query employer($name: String) {
}
#[tokio::test]
#[serial]
async fn indexed_matches_csr_no_match() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
@ -193,7 +168,6 @@ async fn indexed_matches_csr_no_match() {
}
#[tokio::test]
#[serial]
async fn indexed_finds_unindexed_appended_edge() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
@ -212,9 +186,14 @@ async fn indexed_finds_unindexed_appended_edge() {
.await
.unwrap();
set_mode("indexed");
let got = sorted_names(&mut db, TEST_QUERIES, "friends_of", &params(&[("$name", "Alice")])).await;
clear_mode();
let got = first_column_sorted(
&with_traversal_mode(
"indexed",
query_main(&mut db, TEST_QUERIES, "friends_of", &params(&[("$name", "Alice")])),
)
.await
.unwrap(),
);
assert_eq!(
got,
@ -234,7 +213,6 @@ async fn indexed_finds_unindexed_appended_edge() {
// CSR path never produces. `both_modes` (csr == indexed == auto) plus the
// golden assert catch both the divergence and an over-emitting shared bug.
#[tokio::test]
#[serial]
async fn cross_type_id_collision_does_not_bleed_into_second_hop() {
const SCHEMA: &str = r#"
node Person { name: String @key }
@ -288,7 +266,6 @@ query reach($name: String) {
// bounded range deliberately: an unbounded `{1,}` is a typecheck error, not a
// runtime path. `both_modes` also confirms indexed == csr on the cycle.
#[tokio::test]
#[serial]
async fn variable_hops_terminate_and_dedup_on_cycle() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
@ -310,7 +287,6 @@ async fn variable_hops_terminate_and_dedup_on_cycle() {
// A self-loop a->a plus a->b. Variable-length traversal must not loop forever and
// must not re-emit the seeded source.
#[tokio::test]
#[serial]
async fn variable_hops_handle_self_loop() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();

@ -7,34 +7,18 @@
mod helpers;
use arrow_array::{Array, StringArray};
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use omnigraph::db::{Omnigraph, ReadTarget};
use omnigraph_compiler::result::QueryResult;
use omnigraph::instrumentation::{QueryIoProbes, with_query_io_probes, with_traversal_mode};
use helpers::cost::{cost_harness, measure};
use helpers::{
MUTATION_QUERIES, TEST_QUERIES, commit_many, count_rows, init_and_load, mixed_params,
mutate_branch, mutate_main, params,
MUTATION_QUERIES, TEST_QUERIES, commit_many, count_rows, first_column_sorted, init_and_load,
mixed_params, mutate_branch, mutate_main, params,
};
fn first_column_strings(result: &QueryResult) -> Vec<String> {
if result.num_rows() == 0 {
return Vec::new();
}
let batch = result.concat_batches().unwrap();
let values = batch
.column(0)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let mut out = (0..values.len())
.filter(|&row| !values.is_null(row))
.map(|row| values.value(row).to_string())
.collect::<Vec<_>>();
out.sort();
out
}
/// A warm same-branch read must do ZERO `__manifest` object-store reads and must
/// not open the commit graph, even at commit-history depth. Wrapped in
/// `cost_harness`, so `manifest_reads` is ground truth: the warm-coordinator
@ -458,10 +442,15 @@ async fn recreated_branch_owned_table_handle_uses_table_etag() {
);
}
/// The graph-index cache is keyed by synthetic snapshot id plus edge-table
/// state. A recreated branch can reuse the same edge table `(branch, version)`,
/// so the synthetic snapshot id must carry the manifest incarnation or traversal
/// can reuse stale topology.
/// A recreated branch can reuse the same edge table `(branch, version)`. The
/// graph-index cache is keyed (A1) by each edge table's physical identity
/// `(table_key, version, table_branch, e_tag)`; on local FS the e_tag is `None`,
/// so a recreated branch at the same version has the same key — the stale topology
/// is instead evicted by the same-branch manifest refresh (`invalidate_all` on the
/// `version_probes == 2` stale path), the documented e_tag-less fallback. This
/// traversal takes the indexed path (single-source frontier), so it also exercises
/// the table-handle cache incarnation; the assertion is that recreated-branch
/// topology is never stale regardless of path.
#[tokio::test]
async fn recreated_branch_traversal_uses_graph_index_incarnation() {
let dir = tempfile::tempdir().unwrap();
@ -493,7 +482,7 @@ async fn recreated_branch_traversal_uses_graph_index_incarnation() {
)
.await
.unwrap();
assert_eq!(first_column_strings(&old_friends), vec!["Alice"]);
assert_eq!(first_column_sorted(&old_friends), vec!["Alice"]);
let old_edge_entry = reader
.snapshot_of(ReadTarget::branch("feature"))
.await
@ -540,7 +529,7 @@ async fn recreated_branch_traversal_uses_graph_index_incarnation() {
.await;
let new_friends = new_friends.unwrap();
assert_eq!(
first_column_strings(&new_friends),
first_column_sorted(&new_friends),
vec!["Bob"],
"traversal must use the recreated branch's topology, not stale cached graph index"
);
@ -563,7 +552,7 @@ async fn recreated_branch_traversal_uses_graph_index_incarnation() {
.await
.unwrap();
assert_eq!(
first_column_strings(&stale_old_friends),
first_column_sorted(&stale_old_friends),
Vec::<String>::new(),
"old branch topology must not leak after branch recreation"
);
@ -728,3 +717,134 @@ async fn write_invalidates_table_cache_for_changed_table() {
"the post-write read observes the new row (no stale handle served)"
);
}
// ─── Topology-index build cost (A1 cross-branch reuse + A2 scoped build) ─────
//
// These force the CSR build path (the indexed path builds no topology) via the
// scoped `with_traversal_mode` seam — no process-global env, so they are safe in
// this mixed serial/non-serial binary and need no `#[serial]`. They read the
// `graph_build_count` / `graph_edges_built` probes off a directly-constructed
// `QueryIoProbes`.
/// A1: a fresh (unwritten) branch reuses main's cached CSR topology index
/// (`graph_build_count == 0`), and the reused index returns correct results for
/// the branch. Before A1 the branch-keyed snapshot id forced a rebuild (count 1).
#[tokio::test]
async fn fresh_branch_traversal_reuses_main_graph_index() {
let dir = tempfile::tempdir().unwrap();
let mut writer = init_and_load(&dir).await;
let uri = dir.path().to_str().unwrap();
// A Knows edge on main so there is topology to build and then reuse.
mutate_main(
&mut writer,
MUTATION_QUERIES,
"insert_person_and_friend",
&mixed_params(&[("$name", "Walker"), ("$friend", "Alice")], &[("$age", 41)]),
)
.await
.unwrap();
// Separate reader handle. As in production, the reader never creates the
// branch, so creating it does not invalidate the reader's warm cache.
let reader = Omnigraph::open(uri).await.unwrap();
// Reader warms main on the CSR path: builds and caches the topology index.
let warm = with_traversal_mode(
"csr",
reader.query(
ReadTarget::branch("main"),
TEST_QUERIES,
"friends_of",
&params(&[("$name", "Walker")]),
),
)
.await
.unwrap();
assert_eq!(
first_column_sorted(&warm),
vec!["Alice"],
"test setup: main has the Knows edge"
);
// A separate writer creates the branch (lazy fork: feature's edge tables are
// physically main's — same version + e_tag, table_branch=None).
writer.branch_create("feature").await.unwrap();
let graph_build = Arc::new(AtomicU64::new(0));
let probes = QueryIoProbes {
graph_build_count: Arc::clone(&graph_build),
..Default::default()
};
let on_branch = with_traversal_mode(
"csr",
with_query_io_probes(
probes,
reader.query(
ReadTarget::branch("feature"),
TEST_QUERIES,
"friends_of",
&params(&[("$name", "Walker")]),
),
),
)
.await
.unwrap();
assert_eq!(
first_column_sorted(&on_branch),
vec!["Alice"],
"fresh branch sees main's edges (lazy fork) and the reused index is correct"
);
assert_eq!(
graph_build.load(Ordering::Relaxed),
0,
"a fresh branch with unchanged edges must reuse main's cached CSR index, not rebuild it"
);
}
/// A2: a query referencing one edge type builds the topology for only that edge,
/// not every edge in the catalog. Forces CSR (the build path) and counts edge
/// tables built. Before A2 the build materialized all catalog edges (the fixture
/// defines Knows + WorksAt, so a build-all touches >= 2) — the cold-build cost.
#[tokio::test]
async fn single_edge_query_builds_only_referenced_edge() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
// A Knows edge so the referenced build has topology; the fixture also defines
// WorksAt, so a build-all would touch more than one edge.
mutate_main(
&mut db,
MUTATION_QUERIES,
"insert_person_and_friend",
&mixed_params(&[("$name", "Walker"), ("$friend", "Alice")], &[("$age", 41)]),
)
.await
.unwrap();
let graph_edges = Arc::new(AtomicU64::new(0));
let probes = QueryIoProbes {
graph_edges_built: Arc::clone(&graph_edges),
..Default::default()
};
let result = with_traversal_mode(
"csr",
with_query_io_probes(
probes,
db.query(
ReadTarget::branch("main"),
TEST_QUERIES,
"friends_of",
&params(&[("$name", "Walker")]),
),
),
)
.await
.unwrap();
assert_eq!(first_column_sorted(&result), vec!["Alice"]);
assert_eq!(
graph_edges.load(Ordering::Relaxed),
1,
"a query referencing only `knows` must build only that edge, not all catalog edges"
);
}

@ -138,8 +138,8 @@ flowchart TB
end
subgraph idx[graph index]
gi[GraphIndex<br/>CSR/CSC built per query]:::l2
rc[RuntimeCache LRU=8]:::l2
gi[GraphIndex<br/>CSR/CSC built per query<br/>scoped to traversed edges]:::l2
rc[RuntimeCache LRU=8<br/>keyed by edge-table identity]:::l2
end
subgraph io[Lance I/O]

@ -6,7 +6,7 @@ Pipeline:
1. Parse + typecheck via `omnigraph-compiler`.
2. Lower to IR.
3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex`.
3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex` **scoped to the edge types the query actually traverses** (`referenced_edge_types`, recursing through `AntiJoin` inners) — not every edge type in the catalog. The CSR build full-scans each covered edge dataset, so scoping is what keeps a single-edge join (`$x identifiesPerson $p`) from scanning the whole graph's edge data. The `RuntimeCache` key is each covered edge table's **physical identity** `(table_key, version, table_branch, e_tag)` (not the resolved snapshot id), so a `{Knows}` index and a `{Knows, WorksAt}` index are distinct entries AND a lazy-fork branch whose edge tables physically *are* main's reuses main's built index instead of cold-scanning it.
4. Run `execute_query` against the snapshot.
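
A minimal sketch of step 3, assuming a simplified IR and cache. The `Op` enum, the signature given to `referenced_edge_types`, the `EdgeTableIdentity` field types, `GraphIndexKey`, `cache_key`, and `identity` are all hypothetical stand-ins for illustration, not the engine's actual definitions. It shows the two properties described above: the scope covers only traversed edge types (recursing through `AntiJoin` inners), and a key built from physical table identity lets a lazy-fork branch hit the entry main already built.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified IR: only the operators that matter for scoping.
enum Op {
    Scan,
    Expand { edge_type: String },
    AntiJoin { inner: Vec<Op> },
}

/// Collect the edge types a plan traverses, recursing through AntiJoin inners.
/// (Name taken from the text above; this signature is an assumption.)
fn referenced_edge_types(ops: &[Op], out: &mut BTreeSet<String>) {
    for op in ops {
        match op {
            Op::Scan => {}
            Op::Expand { edge_type } => {
                out.insert(edge_type.clone());
            }
            Op::AntiJoin { inner } => referenced_edge_types(inner, out),
        }
    }
}

/// Physical identity of one edge table. Field names follow the text; types are assumed.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct EdgeTableIdentity {
    table_key: String,
    version: u64,
    table_branch: Option<String>,
    e_tag: Option<String>,
}

/// Assumed key shape: identities of the covered edge tables, ordered by edge type.
type GraphIndexKey = Vec<EdgeTableIdentity>;

/// Build the key for a scoped edge set against one branch's catalog.
/// Edge types missing from the catalog are skipped in this sketch.
fn cache_key(
    scope: &BTreeSet<String>,
    catalog: &HashMap<String, EdgeTableIdentity>,
) -> GraphIndexKey {
    scope.iter().filter_map(|edge| catalog.get(edge).cloned()).collect()
}

/// Hypothetical fixture helper: an edge table at version 7 with a real e_tag.
fn identity(table_key: &str) -> EdgeTableIdentity {
    EdgeTableIdentity {
        table_key: table_key.to_string(),
        version: 7,
        table_branch: None,
        e_tag: Some("etag-1".to_string()),
    }
}

fn main() {
    let mut main_catalog = HashMap::new();
    main_catalog.insert("Knows".to_string(), identity("edge:Knows"));
    main_catalog.insert("WorksAt".to_string(), identity("edge:WorksAt"));
    // Lazy fork: the branch's edge tables are physically main's.
    let feature_catalog = main_catalog.clone();

    let plan = vec![
        Op::Scan,
        Op::Expand { edge_type: "Knows".to_string() },
        Op::AntiJoin { inner: vec![Op::Expand { edge_type: "Knows".to_string() }] },
    ];
    let mut scope = BTreeSet::new();
    referenced_edge_types(&plan, &mut scope);
    assert_eq!(scope.len(), 1); // only Knows, not the whole catalog

    let mut cache: HashMap<GraphIndexKey, &str> = HashMap::new();
    let mut builds = 0;
    for catalog in [&main_catalog, &feature_catalog] {
        cache.entry(cache_key(&scope, catalog)).or_insert_with(|| {
            builds += 1;
            "csr index"
        });
    }
    assert_eq!(builds, 1); // the branch read reuses main's entry

    // A wider scope is a distinct cache entry.
    let mut wider = scope.clone();
    wider.insert("WorksAt".to_string());
    assert_ne!(cache_key(&scope, &main_catalog), cache_key(&wider, &main_catalog));
    println!("builds = {builds}");
}
```

In this sketch a snapshot-id key would differ between `main` and `feature` and force a second build. The identity-based key differs only when a covered table's version, branch, or e_tag differs.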
### Read flow — sequence

@ -336,7 +336,29 @@ them explicit.
deferred — it needs the Q8 cleanup-resurrection watermark first). The commit
graph IS now reconcilable from the manifest (RFC-013 Phase 7 — it is a pure
projection of the `graph_commit`/`graph_head` rows); the traversal id-map is
still rebuilt.
still rebuilt. The CSR/CSC topology index is now **scoped and cross-branch
reused** (the two cuts that closed the cross-edge-join hang): the build covers
only the edge types a query traverses (`referenced_edge_types` over
`Expand`/`AntiJoin`, not every catalog edge — a single-edge join no longer
scans the whole graph's edge data), and the `RuntimeCache` cache key is each
edge table's physical identity `(table_key, version, table_branch, e_tag)`
plus the edge's `(from_type, to_type)` endpoint mapping — rather than the
resolved snapshot id — so a lazy-fork branch reuses main's built index instead
of cold-scanning it, while a schema repoint of an edge type (which changes the
built `TypeIndex` namespace) still rebuilds even if the edge table's physical
identity is unchanged. Residual: on stores without per-table e_tags (local FS)
a branch deleted and recreated at the same version with the same endpoints has
the same key, so the incarnation distinction falls back to the same-branch
manifest refresh clearing read caches (`invalidate_all`); production object
stores carry real e_tags, so the key alone distinguishes incarnations there
(the e_tag-present cross-branch-reuse path is exercised in CI by
`s3_storage.rs::s3_fresh_branch_traversal_reuses_main_graph_index_with_etags`
against RustFS, which surfaces real ETags — local-FS tests cannot reach it).
Known narrow gap (local FS only): a cold *cross-branch* resolve of a
recreated branch (a long-lived reader bound to another branch) does not trigger
that same-branch refresh, so an e_tag-less recreated branch can still reuse a
stale entry until a same-branch read refreshes — acceptable because local FS is
a dev/test substrate and production carries e_tags.
- **Commit-graph parent under concurrency — CLOSED (RFC-013 Phase 7):** the graph
commit is now recorded in the manifest publish CAS, and the publisher resolves
the new commit's parent INSIDE its retry loop, per attempt, from the just-loaded

@ -25,7 +25,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
| `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead |
| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (`Dataset::open` / `DatasetBuilder::from_uri`/`from_namespace`) — reads route through `Snapshot::open` and the held-handle cache; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
| `lance_surface_guards.rs` | Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see [lance.md](lance.md)) — the first smoke check on any Lance version bump; e.g. `compact_files_still_fails_on_blob_columns` turns red when the upstream blob-compaction fix lands |
| `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below |
| `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by per-edge-table e_tag in the graph-index cache key or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). Also the CSR topology-build cost guards: `fresh_branch_traversal_reuses_main_graph_index` (A1 — a lazy-fork branch reuses main's cached CSR index, 0 rebuilds via `graph_build_count`) and `single_edge_query_builds_only_referenced_edge` (A2 — a one-edge query builds only that edge via `graph_edges_built`); both force CSR via the scoped `with_traversal_mode` seam, so they need no `#[serial]`. See "Cost-budget tests" below. |
| `write_cost.rs` | Cost-budget tests for the WRITE path (RFC-013), the latency twin of `warm_read_cost.rs` on the **shared `helpers::cost` harness** (`measure`/`IoCounts`/`assert_flat`/`local_graph`). Runs on **local FS**; gates the **internal-table** term (`__manifest` scans flat in commit-history depth, lineage rows included — `internal_table_scans_are_flat_in_history`, now **green every-PR** since RFC-013 step 2 brought the internal tables into `optimize`; the test compacts at each depth before measuring) plus green every-PR guards (single-insert `data_writes` bounded, a per-write read-op ceiling that fails the moment a round-trip is added, and a `measure_with_staged` fitness assert that a keyed insert routes through `stage_merge_insert` once with no `stage_append`/vector-index build). The **data-table opener** term is S3-only — see `write_cost_s3.rs` and the backend-split note in "Cost-budget tests" below |
| `helpers/cost.rs` | The shared cost-budget harness (not a test): `IoCounts`/`StagedCounts` (counts by table class), `measure`/`measure_with_staged` (the one place the `with_query_io_probes` + `MergeWriteProbes` task-local + `IOTracker` wiring lives; reads per-op deltas via lance's `incremental_stats()`, the upstream per-request idiom from `rust/lance/src/dataset/tests/dataset_io.rs`), `cost_harness`/`GraphIoMeter` (installs ONE `__manifest` `IOTracker` for a whole test body so the graph opens **under** it and `manifest_reads` is **ground truth** — every read regardless of handle age, the warm-coordinator freshness probe included — closing the blind spot where a per-op tracker installed at measure time cannot see a long-lived handle's reads; outside `cost_harness`, `measure` falls back to fresh per-op tracking, so `write_cost_s3.rs` is unaffected), `last_manifest_reads()` (the manifest read log for `assert_io_eq!`-style failure diagnostics), `assert_flat(curve, select, slack, what)`, and store-agnostic `local_graph`/`s3_graph` fixtures. `warm_read_cost.rs`, `write_cost.rs`, and `write_cost_s3.rs` all consume it so a cost test body is written once and reads in one vocabulary |
| `lifecycle.rs` | Graph lifecycle, schema state |
@ -35,13 +35,13 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
| `schema_apply.rs` | Migration plan + apply, schema-apply lock; index materialization deferred to the reconciler (iss-848): `apply_schema_defers_vector_index_on_empty_table` (an empty-table Vector `@index` never aborts the apply) and `index_only_constraint_apply_touches_no_table_data` (adding an `@index` is metadata-only — no table-version bump) |
| `search.rs` | FTS / vector / hybrid (`bm25`, `nearest`, `rrf`) |
| `traversal.rs` | `Expand`, variable-length hops, anti-join (CSR path — `OMNIGRAPH_TRAVERSAL_MODE` unset) |
| `traversal_indexed.rs` | BTREE-indexed Expand (`execute_expand_indexed`) forced via `OMNIGRAPH_TRAVERSAL_MODE`, asserted semantically equal to the CSR path; own binary, all `#[serial]` so env writes never race |
| `traversal_indexed.rs` | BTREE-indexed Expand (`execute_expand_indexed`) forced via the scoped `with_traversal_mode` seam (not the env var), asserted semantically equal to the CSR path. No `#[serial]` needed — the seam is scope-bound and process-safe. (The CSR topology-build cost guards — `fresh_branch_traversal_reuses_main_graph_index` (A1, `graph_build_count`) and `single_edge_query_builds_only_referenced_edge` (A2, `graph_edges_built`) — live in `warm_read_cost.rs`.) |
| `proptest_equivalence.rs` | Property-based query-correctness invariants over generated graphs (shared key alphabet forces cross-type id collisions, cycles, self-loops) — pins Expand-mode equivalence so a future fork divergence fails loudly instead of silently; `#[serial]` |
| `ordering.rs` | ORDER BY contract: descending, multi-key precedence, deterministic key-column tie-break (total order, so `ORDER … LIMIT` is deterministic), NULL placement (`nulls_first = !descending`) |
| `literal_filters.rs` | Execution goldens for non-string/non-integer scalar literal filters (F64/F32/Bool/Date/DateTime) across both the in-memory comparison arm and the Lance-pushdown arm |
| `aggregation.rs` | `count`, `sum`, `avg`, `min`, `max` |
| `export.rs` | NDJSON streaming export filters |
| `s3_storage.rs` | S3-backed graph (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set) |
| `s3_storage.rs` | S3-backed graph (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set). Includes `s3_fresh_branch_traversal_reuses_main_graph_index_with_etags` — the CSR topology cache-key test on a **real** per-table e_tag (`None` on local FS, so `warm_read_cost.rs` can't reach this path); forces CSR via the scoped `with_traversal_mode` seam |
| `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior |
| `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths |
| `policy_engine_chassis.rs` | Engine-layer Cedar enforcement (MR-722): allow + deny through every `_as` writer via the SDK directly — no HTTP — proving embedded and CLI callers hit the same gate as the server, with action × scope shapes matching `authorize_request` |
@ -75,7 +75,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
CI runs these S3-backed tests against a containerized RustFS server (`.github/workflows/ci.yml`, `rustfs_integration` job):
- `cargo test -p omnigraph-engine --test s3_storage`
- `cargo test -p omnigraph-engine --test s3_storage` (lifecycle/branching + the e_tag-present CSR topology cache-key reuse test — the path local FS can't reach since its e_tag is `None`)
- `cargo test -p omnigraph-engine --test write_cost_s3` (RFC-013 step 3a's data-table opener cost gate — flat across commit depth on S3; the term local FS can't reproduce)
- `cargo test -p omnigraph-server --test s3` (single-graph serving + config-free `--cluster s3://` boot)
- `cargo test -p omnigraph-cluster --test s3_cluster` (full control-plane lifecycle on the bucket)