omnigraph/crates/omnigraph/tests/end_to_end.rs
Andrew Altshuler cb80fa40f1
exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113)
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)

Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).

## What this enables

1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
   `None` for list-contains (the comment said *"Can't pushdown list
   contains"*) because string SQL can't easily express it. With Expr,
   it lowers to DataFusion's `array_has(col, value)` builtin via the
   `nested_expressions` feature, and pushes down to Lance's scan layer
   the same way Eq/Lt/etc. do. Pinned by the new regression test
   `end_to_end::ir_filter_with_list_contains_pushes_down`.

2. **DataFusion 53's optimizer rules now reach our predicates.** Once
   the Expr lands at the Lance scanner, DF's planner runs:
   - `IN`-list vectorized eq kernel (DF #20528)
   - `PhysicalExprSimplifier` (DF #20111)
   - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
   - Push limit into hash join (DF #20228)
   None of these were applicable before because the string SQL path
   short-circuited the optimizer.
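The sketch below is a minimal, self-contained illustration of why the structured path can carry `Contains` while the string path gave up on it. Everything in it is hypothetical and simplified:

- `IrFilter`, `CompOp`, `Literal` and the toy `Expr` stand in for the real IR and for DataFusion's `Expr`.
- `toy_filter_to_sql` / `toy_filter_to_expr` only mimic the shape of `ir_filter_to_sql` / `build_lance_filter_expr`.

In the real code the tree would presumably be built with DataFusion's `col`/`lit` helpers plus the `array_has` re-export, then handed to `scanner.filter_expr(...)`.

```rust
/// Hypothetical literal type; the real IR literals are richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i64),
    Str(String),
}

/// Hypothetical subset of comparison operators.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompOp {
    Eq,
    Lt,
    Contains,
}

/// One IR predicate of the form `column <op> literal`.
#[derive(Debug, Clone)]
pub struct IrFilter {
    pub column: String,
    pub op: CompOp,
    pub value: Literal,
}

/// Toy expression tree standing in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(Literal),
    Eq(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    /// Scalar function call, e.g. `array_has(tags, 'rust')`.
    Call { name: String, args: Vec<Expr> },
}

fn toy_literal_sql(lit: &Literal) -> String {
    match lit {
        Literal::Int(i) => i.to_string(),
        // SQL string literals escape a single quote by doubling it.
        Literal::Str(s) => format!("'{}'", s.replace('\'', "''")),
    }
}

/// Old style: flatten to a SQL string. List-contains has no lowering,
/// mirroring the `None` the real `ir_filter_to_sql` returned.
pub fn toy_filter_to_sql(f: &IrFilter) -> Option<String> {
    let value = toy_literal_sql(&f.value);
    match f.op {
        CompOp::Eq => Some(format!("{} = {}", f.column, value)),
        CompOp::Lt => Some(format!("{} < {}", f.column, value)),
        CompOp::Contains => None,
    }
}

/// New style: build a structured tree. `Contains` becomes an
/// `array_has(column, value)` call node like any other expression.
pub fn toy_filter_to_expr(f: &IrFilter) -> Expr {
    let column = Expr::Column(f.column.clone());
    let value = Expr::Literal(f.value.clone());
    match f.op {
        CompOp::Eq => Expr::Eq(Box::new(column), Box::new(value)),
        CompOp::Lt => Expr::Lt(Box::new(column), Box::new(value)),
        CompOp::Contains => Expr::Call {
            name: "array_has".to_string(),
            args: vec![column, value],
        },
    }
}
```

Once the predicate is a tree, `Contains` is just another node. It is no longer a case the serializer has to give up on.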

## Scope

This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:

- The Expand pushdown still serializes through `hydrate_nodes`'s
  `extra_filter_sql: Option<&str>` parameter. Migrating it changes the
  `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
  `Option<Expr>`) and the cascading call sites — out of scope for this
  MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
  in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
  Lance v7 bump per issue #112) will migrate that path to
  `DeleteBuilder::execute_uncommitted` taking an Expr.

The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.

## Cargo

Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).

## Tests

- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
  was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoints (recovery canary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)

The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:

  HTTP error: error sending request

The "Dump RustFS logs on failure" step revealed the container was
dying at startup:

  [FATAL] Server encountered an error and is shutting down:
  Default root credentials are not allowed on non-loopback listeners;
  set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
  bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
  for local development only

`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.
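The rule that killed the container can be restated as a small predicate. This is a hypothetical sketch reconstructed from the FATAL message above, not RustFS source. In particular, treating credentials as "default" only when both keys equal `rustfsadmin` is an assumption about how beta.4 defines its default set. A container listening on `0.0.0.0` is not loopback, so it lands in the `Fatal` branch.

```rust
use std::net::IpAddr;

// Assumed default pair, taken from the CI creds described above.
const DEFAULT_ACCESS_KEY: &str = "rustfsadmin";
const DEFAULT_SECRET_KEY: &str = "rustfsadmin";

/// Outcome of the (hypothetical) startup credential check.
#[derive(Debug, PartialEq)]
pub enum Startup {
    Ok,
    Fatal(&'static str),
}

/// Sketch of the beta.4 policy as described by its error message:
/// default root credentials are refused on non-loopback listeners unless
/// the insecure-defaults escape hatch is set.
pub fn check_root_credentials(
    access_key: &str,
    secret_key: &str,
    listen_addr: IpAddr,
    allow_insecure_defaults: bool,
) -> Startup {
    // Assumption: "default" means both keys match the built-in pair.
    let uses_defaults = access_key == DEFAULT_ACCESS_KEY && secret_key == DEFAULT_SECRET_KEY;
    if uses_defaults && !listen_addr.is_loopback() && !allow_insecure_defaults {
        Startup::Fatal("Default root credentials are not allowed on non-loopback listeners")
    } else {
        Startup::Ok
    }
}
```

Under this reading, two of the long-term options listed below each take the `Ok` branch: rotating the CI keys, or setting `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true`.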

This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.

Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
  - Rotate the CI creds to less-default values (less coupling to
    RustFS's "default" set definition)
  - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
    error message
  - Use a workflow service container with controlled lifecycle

Deferred — pinning is the minimal restore. Also incidentally
documents *which* version we tested against, which `:latest` never
did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:47:33 +01:00


mod helpers;
use arrow_array::{Array, Int32Array, RecordBatch, StringArray};
use futures::TryStreamExt;
use omnigraph::db::{Omnigraph, ReadTarget};
use omnigraph::loader::{LoadMode, load_jsonl, load_jsonl_file};
use omnigraph_compiler::ir::ParamMap;
use helpers::*;

// ─── Init + Load ────────────────────────────────────────────────────────────
#[tokio::test]
async fn init_creates_schema_file_and_manifest() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
    assert!(dir.path().join("_schema.pg").exists());
    assert!(dir.path().join("__manifest").exists());
    assert_eq!(db.catalog().node_types.len(), 2);
    assert_eq!(db.catalog().edge_types.len(), 2);
}

#[tokio::test]
async fn open_restores_full_state() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let original = init_and_load(&dir).await;
    let v = version_main(&original).await.unwrap();
    drop(original);
    let reopened = Omnigraph::open(uri).await.unwrap();
    assert_eq!(reopened.catalog().node_types.len(), 2);
    assert_eq!(reopened.catalog().edge_types.len(), 2);
    // Version should be at least what we left it at
    // (the manifest was committed during load)
    assert!(version_main(&reopened).await.unwrap() >= v);
}

#[tokio::test]
async fn load_populates_all_types() {
    let dir = tempfile::tempdir().unwrap();
    let db = init_and_load(&dir).await;
    let snap = snapshot_main(&db).await.unwrap();
    // 4 persons
    let person_ds = snap.open("node:Person").await.unwrap();
    assert_eq!(person_ds.count_rows(None).await.unwrap(), 4);
    // 2 companies
    let company_ds = snap.open("node:Company").await.unwrap();
    assert_eq!(company_ds.count_rows(None).await.unwrap(), 2);
    // 3 Knows edges
    let knows_ds = snap.open("edge:Knows").await.unwrap();
    assert_eq!(knows_ds.count_rows(None).await.unwrap(), 3);
    // 2 WorksAt edges
    let works_at_ds = snap.open("edge:WorksAt").await.unwrap();
    assert_eq!(works_at_ds.count_rows(None).await.unwrap(), 2);
}

// ─── Read consistency ───────────────────────────────────────────────────────
#[tokio::test]
async fn node_ids_are_key_values() {
    let dir = tempfile::tempdir().unwrap();
    let db = init_and_load(&dir).await;
    let batches = read_table(&db, "node:Person").await;
    let mut ids = collect_column_strings(&batches, "id");
    ids.sort();
    assert_eq!(ids, vec!["Alice", "Bob", "Charlie", "Diana"]);
}

#[tokio::test]
async fn node_properties_are_correct() {
    let dir = tempfile::tempdir().unwrap();
    let db = init_and_load(&dir).await;
    let batches = read_table(&db, "node:Person").await;
    let batch = &batches[0];
    let ids = batch
        .column_by_name("id")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let ages = batch
        .column_by_name("age")
        .unwrap()
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();
    // Find Alice's row and check age
    let alice_idx = (0..ids.len()).find(|&i| ids.value(i) == "Alice").unwrap();
    assert_eq!(ages.value(alice_idx), 30);
}

#[tokio::test]
async fn entity_at_returns_typed_json_values() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let schema = r#"
node Flagged {
slug: String @key
active: Bool
rating: I32?
}
"#;
    let data = r#"{"type":"Flagged","data":{"slug":"alpha","active":true,"rating":42}}"#;
    let mut db = Omnigraph::init(uri, schema).await.unwrap();
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let entity = db
        .entity_at_target(ReadTarget::branch("main"), "node:Flagged", "alpha")
        .await
        .unwrap()
        .unwrap();
    assert_eq!(entity["id"], serde_json::json!("alpha"));
    assert_eq!(entity["active"], serde_json::json!(true));
    assert_eq!(entity["rating"], serde_json::json!(42));
}

#[tokio::test]
async fn nullable_vectors_round_trip_as_null() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let schema = r#"
node Doc {
slug: String @key
embedding: Vector(2)?
}
"#;
    let data = r#"{"type":"Doc","data":{"slug":"a"}}
{"type":"Doc","data":{"slug":"b","embedding":[1.0,2.0]}}"#;
    let mut db = Omnigraph::init(uri, schema).await.unwrap();
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let missing = db
        .entity_at_target(ReadTarget::branch("main"), "node:Doc", "a")
        .await
        .unwrap()
        .unwrap();
    let present = db
        .entity_at_target(ReadTarget::branch("main"), "node:Doc", "b")
        .await
        .unwrap()
        .unwrap();
    assert!(missing["embedding"].is_null());
    assert_eq!(present["embedding"], serde_json::json!([1.0, 2.0]));
}

#[tokio::test]
async fn edge_src_dst_reference_node_ids() {
    let dir = tempfile::tempdir().unwrap();
    let db = init_and_load(&dir).await;
    let batches = read_table(&db, "edge:Knows").await;
    let batch = &batches[0];
    let srcs = batch
        .column_by_name("src")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let dsts = batch
        .column_by_name("dst")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    // Collect all (src, dst) pairs
    let mut edges: Vec<(&str, &str)> = (0..batch.num_rows())
        .map(|i| (srcs.value(i), dsts.value(i)))
        .collect();
    edges.sort();
    assert_eq!(
        edges,
        vec![("Alice", "Bob"), ("Alice", "Charlie"), ("Bob", "Diana")]
    );
}

#[tokio::test]
async fn edge_ids_are_unique_strings() {
    let dir = tempfile::tempdir().unwrap();
    let db = init_and_load(&dir).await;
    let batches = read_table(&db, "edge:Knows").await;
    let batch = &batches[0];
    let ids = batch
        .column_by_name("id")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let id_values: Vec<&str> = (0..ids.len()).map(|i| ids.value(i)).collect();
    // All unique
    let mut deduped = id_values.clone();
    deduped.sort();
    deduped.dedup();
    assert_eq!(id_values.len(), deduped.len());
    // All non-empty
    assert!(id_values.iter().all(|id| !id.is_empty()));
}

// ─── Load modes ─────────────────────────────────────────────────────────────
#[tokio::test]
async fn overwrite_replaces_data() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
    // Load full data
    load_jsonl(&mut db, TEST_DATA, LoadMode::Overwrite)
        .await
        .unwrap();
    // Overwrite with just one person
    let small = r#"{"type": "Person", "data": {"name": "Zara", "age": 40}}"#;
    load_jsonl(&mut db, small, LoadMode::Overwrite)
        .await
        .unwrap();
    let batches = read_table(&db, "node:Person").await;
    let batch = &batches[0];
    assert_eq!(batch.num_rows(), 1);
    let ids = batch
        .column_by_name("id")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(ids.value(0), "Zara");
}

#[tokio::test]
async fn append_adds_rows() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
    let batch1 = r#"{"type": "Person", "data": {"name": "Alice", "age": 30}}"#;
    let batch2 = r#"{"type": "Person", "data": {"name": "Bob", "age": 25}}"#;
    load_jsonl(&mut db, batch1, LoadMode::Overwrite)
        .await
        .unwrap();
    load_jsonl(&mut db, batch2, LoadMode::Append).await.unwrap();
    let snap = snapshot_main(&db).await.unwrap();
    let ds = snap.open("node:Person").await.unwrap();
    assert_eq!(ds.count_rows(None).await.unwrap(), 2);
}

// ─── Load from fixture file ─────────────────────────────────────────────────
#[tokio::test]
async fn load_from_file_works() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
    let fixture_path = concat!(env!("CARGO_MANIFEST_DIR"), "/tests/fixtures/test.jsonl");
    load_jsonl_file(&mut db, fixture_path, LoadMode::Overwrite)
        .await
        .unwrap();
    let snap = snapshot_main(&db).await.unwrap();
    let ds = snap.open("node:Person").await.unwrap();
    assert_eq!(ds.count_rows(None).await.unwrap(), 4);
}

// ─── Signals fixture (complex @key schema) ──────────────────────────────────
#[tokio::test]
async fn signals_fixture_loads_correctly() {
    let schema = include_str!("fixtures/signals.pg");
    let data = include_str!("fixtures/signals.jsonl");
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, schema).await.unwrap();
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let snap = snapshot_main(&db).await.unwrap();
    // Verify some types have data
    let company_ds = snap.open("node:Company").await.unwrap();
    assert!(company_ds.count_rows(None).await.unwrap() > 0);
    // Verify node IDs are @key values (slug)
    let batches: Vec<arrow_array::RecordBatch> = company_ds
        .scan()
        .try_into_stream()
        .await
        .unwrap()
        .try_collect()
        .await
        .unwrap();
    let ids = collect_column_strings(&batches, "id");
    // Should contain slug values like "aws", "openai", etc.
    assert!(ids.contains(&"aws".to_string()));
    assert!(ids.contains(&"openai".to_string()));
}

// ─── Query execution ────────────────────────────────────────────────────────
#[tokio::test]
async fn query_get_person_by_name() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(result.num_rows(), 1);
    let batch = &result.batches()[0];
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Alice");
    let ages = batch
        .column(1)
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();
    assert_eq!(ages.value(0), 30);
}

#[tokio::test]
async fn query_get_person_not_found() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Nobody")]),
    )
    .await
    .unwrap();
    assert_eq!(result.num_rows(), 0);
}

#[tokio::test]
async fn query_adults_filtered_and_ordered() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(&mut db, TEST_QUERIES, "adults", &ParamMap::new())
        .await
        .unwrap();
    // Only Charlie (35) matches age > 30, ordered desc
    assert_eq!(result.num_rows(), 1);
    let batch = &result.batches()[0];
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Charlie");
}

#[tokio::test]
async fn query_top_by_age_with_limit() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(&mut db, TEST_QUERIES, "top_by_age", &ParamMap::new())
        .await
        .unwrap();
    // Top 2 by age desc: Charlie (35), Alice (30)
    assert_eq!(result.num_rows(), 2);
    let batch = &result.batches()[0];
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Charlie");
    assert_eq!(names.value(1), "Alice");
    let ages = batch
        .column(1)
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();
    assert_eq!(ages.value(0), 35);
    assert_eq!(ages.value(1), 30);
}

// ─── Graph traversal ─────────────────────────────────────────────────────
#[tokio::test]
async fn query_friends_of() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    // Alice knows Bob and Charlie
    let batch = result.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let mut friend_names: Vec<&str> = (0..names.len()).map(|i| names.value(i)).collect();
    friend_names.sort();
    assert_eq!(friend_names, vec!["Bob", "Charlie"]);
}

#[tokio::test]
async fn query_employees_of() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(
        &mut db,
        TEST_QUERIES,
        "employees_of",
        &params(&[("$company", "Acme")]),
    )
    .await
    .unwrap();
    // Alice works at Acme (reverse traversal)
    let batch = result.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.len(), 1);
    assert_eq!(names.value(0), "Alice");
}

#[tokio::test]
async fn query_friends_of_friends() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of_friends",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    // Alice→Bob→Diana (Alice→Charlie→nobody)
    let batch = result.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let mut fof_names: Vec<&str> = (0..names.len()).map(|i| names.value(i)).collect();
    fof_names.sort();
    assert_eq!(fof_names, vec!["Diana"]);
}

#[tokio::test]
async fn query_unemployed() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = query_main(&mut db, TEST_QUERIES, "unemployed", &ParamMap::new())
        .await
        .unwrap();
    // Charlie and Diana have no WorksAt edges
    let batch = result.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let mut unemployed: Vec<&str> = (0..names.len()).map(|i| names.value(i)).collect();
    unemployed.sort();
    assert_eq!(unemployed, vec!["Charlie", "Diana"]);
}

#[tokio::test]
async fn query_anti_join_all_have_edges() {
    let schema = r#"
node Person { name: String @key }
node Company { name: String @key }
edge WorksAt: Person -> Company
"#;
    let data = r#"{"type": "Person", "data": {"name": "Alice"}}
{"type": "Person", "data": {"name": "Bob"}}
{"type": "Company", "data": {"name": "Acme"}}
{"edge": "WorksAt", "from": "Alice", "to": "Acme"}
{"edge": "WorksAt", "from": "Bob", "to": "Acme"}
"#;
    let queries = r#"
query unemployed() {
match {
$p: Person
not { $p worksAt $_ }
}
return { $p.name }
}
"#;
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, schema).await.unwrap();
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let result = query_main(&mut db, queries, "unemployed", &ParamMap::new())
        .await
        .unwrap();
    // Everyone has a WorksAt edge → empty result
    assert_eq!(result.num_rows(), 0);
}

// ─── Mutations ───────────────────────────────────────────────────────────────
#[tokio::test]
async fn mutation_insert_node() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    assert_eq!(result.affected_edges, 0);
    // Query it back
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Eve")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    let batch = &qr.batches()[0];
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Eve");
}

#[tokio::test]
async fn mutation_insert_edge() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    // Insert Eve
    mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
    )
    .await
    .unwrap();
    // Add edge Eve → Alice
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "add_friend",
        &params(&[("$from", "Eve"), ("$to", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 0);
    assert_eq!(result.affected_edges, 1);
    // Verify traversal
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Eve")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    let batch = qr.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Alice");
}

#[tokio::test]
async fn mutation_multi_insert_node_and_edge() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    // In one atomic mutation: insert Eve + edge Eve→Alice
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person_and_friend",
        &mixed_params(&[("$name", "Eve"), ("$friend", "Alice")], &[("$age", 22)]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    assert_eq!(result.affected_edges, 1);
    // Verify traversal: Eve → Alice
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Eve")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    let batch = qr.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Alice");
}

#[tokio::test]
async fn mutation_update_node() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "set_age",
        &mixed_params(&[("$name", "Alice")], &[("$age", 31)]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    assert_eq!(result.affected_edges, 0);
    // Verify the update
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    let batch = &qr.batches()[0];
    let ages = batch
        .column(1)
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();
    assert_eq!(ages.value(0), 31);
}

#[tokio::test]
async fn mutation_delete_node_cascades_edges() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    // Alice has: 2 outgoing Knows (Alice→Bob, Alice→Charlie) + 1 WorksAt (Alice→Acme) = 3 edges
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "remove_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    assert!(
        result.affected_edges >= 3,
        "expected at least 3 cascaded edges, got {}",
        result.affected_edges
    );
    // Alice should be gone
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 0);
    // Verify no edges reference Alice
    let snap = snapshot_main(&db).await.unwrap();
    for edge_key in &["edge:Knows", "edge:WorksAt"] {
        let ds = snap.open(edge_key).await.unwrap();
        let batches: Vec<arrow_array::RecordBatch> = ds
            .scan()
            .try_into_stream()
            .await
            .unwrap()
            .try_collect()
            .await
            .unwrap();
        for batch in &batches {
            let srcs = batch
                .column_by_name("src")
                .unwrap()
                .as_any()
                .downcast_ref::<StringArray>()
                .unwrap();
            let dsts = batch
                .column_by_name("dst")
                .unwrap()
                .as_any()
                .downcast_ref::<StringArray>()
                .unwrap();
            for i in 0..batch.num_rows() {
                assert_ne!(
                    srcs.value(i),
                    "Alice",
                    "found edge src=Alice in {}",
                    edge_key
                );
                assert_ne!(
                    dsts.value(i),
                    "Alice",
                    "found edge dst=Alice in {}",
                    edge_key
                );
            }
        }
    }
}

#[tokio::test]
async fn mutation_delete_edge() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    // Delete all Knows edges from Alice (Alice→Bob, Alice→Charlie)
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "remove_friendship",
        &params(&[("$from", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 0);
    assert_eq!(result.affected_edges, 2);
    // Alice should still exist
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    // But has no friends
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 0);
}

#[tokio::test]
async fn mutation_insert_duplicate_key_upserts() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    // Alice already exists with age=30. Insert again with age=99.
    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Alice")], &[("$age", 99)]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    // Should still be exactly 1 Alice (upsert, not duplicate)
    let qr = query_main(
        &mut db,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    // Age should be updated to 99
    let batch = &qr.batches()[0];
    let ages = batch
        .column(1)
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();
    assert_eq!(ages.value(0), 99);
}

#[tokio::test]
async fn mutation_update_key_property_rejected() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;
    let queries = r#"
query rename_person($old_name: String, $new_name: String) {
update Person set { name: $new_name } where name = $old_name
}
"#;
    let result = mutate_main(
        &mut db,
        queries,
        "rename_person",
        &params(&[("$old_name", "Alice"), ("$new_name", "Bob")]),
    )
    .await;
    assert!(result.is_err());
    let err = result.unwrap_err().to_string();
    assert!(err.contains("@key"), "error should mention @key: {}", err);
}
// ─── Blob support ────────────────────────────────────────────────────────────
const BLOB_SCHEMA: &str = r#"
node Document {
title: String @key
content: Blob?
}
"#;
const BLOB_QUERIES: &str = r#"
query all_docs() {
match { $d: Document }
return { $d.title, $d.content }
}
query get_doc($title: String) {
match { $d: Document { title: $title } }
return { $d.title, $d.content }
}
"#;
const BLOB_MUTATIONS: &str = r#"
query insert_doc($title: String, $content: Blob) {
insert Document { title: $title, content: $content }
}
query update_doc_content($title: String, $content: Blob) {
update Document set { content: $content } where title = $title
}
"#;

#[tokio::test]
async fn blob_schema_parses_and_init_succeeds() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    assert!(
        db.catalog().node_types["Document"]
            .blob_properties
            .contains("content")
    );
    assert_eq!(db.catalog().node_types["Document"].properties.len(), 2);
}

#[tokio::test]
async fn blob_load_base64_inline() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    // "Hello World" = "SGVsbG8gV29ybGQ="
    let data = r#"{"type": "Document", "data": {"title": "readme", "content": "base64:SGVsbG8gV29ybGQ="}}
{"type": "Document", "data": {"title": "empty"}}
"#;
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let snap = snapshot_main(&db).await.unwrap();
    let ds = snap.open("node:Document").await.unwrap();
    assert_eq!(ds.count_rows(None).await.unwrap(), 2);
}

#[tokio::test]
async fn blob_query_returns_metadata() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    let data = r#"{"type": "Document", "data": {"title": "readme", "content": "base64:SGVsbG8gV29ybGQ="}}"#;
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let result = query_main(
        &mut db,
        BLOB_QUERIES,
        "get_doc",
        &params(&[("$title", "readme")]),
    )
    .await
    .unwrap();
    assert_eq!(result.num_rows(), 1);
    let json = result.to_sdk_json();
    let row = json.as_array().unwrap().first().unwrap();
    assert_eq!(row["d.title"], "readme");
    // Blob columns return null in query projections — data is accessed via take_blobs API.
    // (Lance bug: BlobsDescriptions + filter triggers assertion, so blobs are excluded from scan)
    assert!(
        row["d.content"].is_null(),
        "blob column should return null in query projection"
    );
}

#[tokio::test]
async fn blob_null_returns_null_in_query() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    let data = r#"{"type": "Document", "data": {"title": "empty"}}"#;
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let result = query_main(
        &mut db,
        BLOB_QUERIES,
        "get_doc",
        &params(&[("$title", "empty")]),
    )
    .await
    .unwrap();
    assert_eq!(result.num_rows(), 1);
    let json = result.to_sdk_json();
    let row = json.as_array().unwrap().first().unwrap();
    assert_eq!(row["d.title"], "empty");
    // Nullable blob with no value should return null
    assert!(
        row["d.content"].is_null(),
        "null blob should return null, got: {}",
        row["d.content"]
    );
}

#[tokio::test]
async fn blob_insert_mutation() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    let result = mutate_main(
        &mut db,
        BLOB_MUTATIONS,
        "insert_doc",
        &params(&[("$title", "new-doc"), ("$content", "base64:AQID")]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    // Query it back
    let qr = query_main(
        &mut db,
        BLOB_QUERIES,
        "get_doc",
        &params(&[("$title", "new-doc")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1);
    let json = qr.to_sdk_json();
    let row = json.as_array().unwrap().first().unwrap();
    assert_eq!(row["d.title"], "new-doc");
    // Blob column present but null in query projection (data accessed via take_blobs)
    assert!(
        row.get("d.content").is_some(),
        "content column should be present"
    );
}

#[tokio::test]
async fn blob_update_mutation() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    // First insert a doc with blob
    mutate_main(
        &mut db,
        BLOB_MUTATIONS,
        "insert_doc",
        &params(&[("$title", "updatable"), ("$content", "base64:AQID")]),
    )
    .await
    .unwrap();
    // Update the blob
    let result = mutate_main(
        &mut db,
        BLOB_MUTATIONS,
        "update_doc_content",
        &params(&[("$title", "updatable"), ("$content", "base64:BAUG")]),
    )
    .await
    .unwrap();
    assert_eq!(result.affected_nodes, 1);
    let blob = db
        .read_blob("Document", "updatable", "content")
        .await
        .unwrap();
    let bytes = blob.read().await.unwrap();
    assert_eq!(&bytes[..], &[4, 5, 6]);
}

// ─── Blob read API ───────────────────────────────────────────────────────
#[tokio::test]
async fn blob_read_returns_bytes() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    // "Hello World" = base64 "SGVsbG8gV29ybGQ="
    let data = r#"{"type": "Document", "data": {"title": "readme", "content": "base64:SGVsbG8gV29ybGQ="}}"#;
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    let blob = db.read_blob("Document", "readme", "content").await.unwrap();
    assert_eq!(blob.size(), 11); // "Hello World" = 11 bytes
    let bytes = blob.read().await.unwrap();
    assert_eq!(&bytes[..], b"Hello World");
}

#[tokio::test]
async fn blob_read_not_found_errors() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    let data = r#"{"type": "Document", "data": {"title": "readme", "content": "base64:SGVsbG8="}}"#;
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();
    // Non-existent ID
    let err = db.read_blob("Document", "nonexistent", "content").await;
    assert!(err.is_err());
    // Non-blob property
    let err = db.read_blob("Document", "readme", "title").await;
    assert!(err.is_err());
}

#[tokio::test]
async fn blob_read_after_mutation_insert() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
    // Insert via mutation (base64 for bytes [1, 2, 3])
    mutate_main(
        &mut db,
        BLOB_MUTATIONS,
        "insert_doc",
        &params(&[("$title", "inserted"), ("$content", "base64:AQID")]),
    )
    .await
    .unwrap();
    let blob = db
        .read_blob("Document", "inserted", "content")
        .await
        .unwrap();
    let bytes = blob.read().await.unwrap();
    assert_eq!(&bytes[..], &[1, 2, 3]);
}
// ─── Blob low-level: probe BlobHandling::BlobsDescriptions ───────────────
#[tokio::test]
async fn blob_scan_with_descriptions_on_nonempty_dataset() {
use lance::datatypes::BlobHandling;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
let data = r#"{"type": "Document", "data": {"title": "readme", "content": "base64:SGVsbG8gV29ybGQ="}}"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
// Open the Lance dataset directly and scan it with BlobHandling::BlobsDescriptions
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Document").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap(), 1);
// BlobsDescriptions works without filter
let mut scanner = ds.scan();
scanner.blob_handling(BlobHandling::BlobsDescriptions);
let stream = scanner.try_into_stream().await.unwrap();
let batches: Vec<RecordBatch> = stream.try_collect().await.unwrap();
assert_eq!(batches.len(), 1);
assert_eq!(batches[0].num_rows(), 1);
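// With BlobsDescriptions the blob column comes back as a descriptor struct
// (kind, position, size, blob_id, blob_uri) instead of raw bytes; only the
// Struct type is asserted here.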
let content_col = batches[0].column_by_name("content").unwrap();
assert!(
matches!(content_col.data_type(), arrow_schema::DataType::Struct(_)),
"blob column should be Struct, got {:?}",
content_col.data_type()
);
}
// ─── Constraint enforcement ──────────────────────────────────────────────────
#[tokio::test]
async fn range_constraint_rejects_out_of_bounds() {
let schema = r#"
node Person {
name: String @key
age: I32?
@range(age, 0..200)
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
// age = 300 exceeds max of 200
let data = r#"{"type": "Person", "data": {"name": "Old", "age": 300}}"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "expected range violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@range violation"), "error: {}", err);
}
#[tokio::test]
async fn range_constraint_allows_within_bounds() {
let schema = r#"
node Person {
name: String @key
age: I32?
@range(age, 0..200)
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let data = r#"{"type": "Person", "data": {"name": "Alice", "age": 30}}"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Person").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap(), 1);
}
#[tokio::test]
async fn range_constraint_float_rejects_out_of_bounds() {
let schema = r#"
node Measurement {
name: String @key
temperature: F64?
@range(temperature, 0.0..100.0)
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let data = r#"{"type": "Measurement", "data": {"name": "hot", "temperature": 150.5}}"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "expected range violation for float");
let err = result.unwrap_err().to_string();
assert!(err.contains("@range violation"), "error: {}", err);
}
#[tokio::test]
async fn range_constraint_float_allows_within_bounds() {
let schema = r#"
node Measurement {
name: String @key
temperature: F64?
@range(temperature, 0.0..100.0)
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let data = r#"{"type": "Measurement", "data": {"name": "warm", "temperature": 37.5}}"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Measurement").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap(), 1);
}
#[tokio::test]
async fn range_constraint_negative_float_bounds() {
let schema = r#"
node Measurement {
name: String @key
temperature: F64?
@range(temperature, -40.0..60.0)
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
// Within bounds — should succeed
let data = r#"{"type": "Measurement", "data": {"name": "cold", "temperature": -20.0}}"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
// Below minimum — should fail
let data = r#"{"type": "Measurement", "data": {"name": "arctic", "temperature": -50.0}}"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "expected range violation for -50.0");
let err = result.unwrap_err().to_string();
assert!(err.contains("@range violation"), "error: {}", err);
}
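// Hypothetical sketch of the `@range(field, lo..hi)` check exercised above
// (not Omnigraph's validator). It makes two assumptions: the upper bound is
// inclusive, and a null value passes (the fields are optional). None of the
// fixture values above sit on a boundary, so these tests pin neither choice.
mod range_constraint_sketch {
    use std::fmt::Display;
    /// Hypothetical helper: `Ok` for null or for `lo <= value <= hi`. Otherwise
    /// it returns an error containing "@range violation", the substring the
    /// tests check. NaN compares false against both bounds, so it is rejected.
    pub fn check_range<T: PartialOrd + Display>(
        field: &str,
        value: Option<T>,
        lo: T,
        hi: T,
    ) -> Result<(), String> {
        match value {
            None => Ok(()),
            Some(v) if lo <= v && v <= hi => Ok(()),
            Some(v) => Err(format!(
                "@range violation: {field} = {v} is outside {lo}..{hi}"
            )),
        }
    }
    #[test]
    fn fixture_values() {
        assert!(check_range("age", Some(300), 0, 200).is_err());
        assert!(check_range("age", Some(30), 0, 200).is_ok());
        assert!(check_range("temperature", Some(150.5), 0.0, 100.0).is_err());
        assert!(check_range("temperature", Some(37.5), 0.0, 100.0).is_ok());
        assert!(check_range("temperature", Some(-20.0), -40.0, 60.0).is_ok());
        assert!(check_range("temperature", Some(-50.0), -40.0, 60.0).is_err());
        assert!(check_range::<i32>("age", None, 0, 200).is_ok());
    }
}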
#[tokio::test]
async fn check_constraint_rejects_bad_pattern() {
let schema = r#"
node Order {
code: String @key
@check(code, "^[A-Z]{3}-[0-9]+$")
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let data = r#"{"type": "Order", "data": {"code": "invalid"}}"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "expected check violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@check violation"), "error: {}", err);
}
#[tokio::test]
async fn check_constraint_allows_matching_pattern() {
let schema = r#"
node Order {
code: String @key
@check(code, "^[A-Z]{3}-[0-9]+$")
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let data = r#"{"type": "Order", "data": {"code": "ABC-123"}}"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Order").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap(), 1);
}
#[tokio::test]
async fn mutation_insert_rejects_range_violation() {
let schema = r#"
node Person {
name: String @key
age: I32?
@range(age, 0..200)
}
"#;
let queries = r#"
query insert_person($name: String, $age: I32) {
insert Person { name: $name, age: $age }
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let result = mutate_main(&mut db, queries, "insert_person", &{
let mut p = omnigraph_compiler::ir::ParamMap::new();
p.insert(
"name".to_string(),
omnigraph_compiler::query::ast::Literal::String("Old".to_string()),
);
p.insert(
"age".to_string(),
omnigraph_compiler::query::ast::Literal::Integer(300),
);
p
})
.await;
assert!(result.is_err(), "expected range violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@range violation"), "error: {}", err);
}
#[tokio::test]
async fn mutation_update_rejects_range_violation() {
let schema = r#"
node Person {
name: String @key
age: I32?
@range(age, 0..200)
}
"#;
let queries = r#"
query set_age($name: String, $age: I32) {
update Person set { age: $age } where name = $name
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
load_jsonl(
&mut db,
r#"{"type": "Person", "data": {"name": "Alice", "age": 30}}"#,
LoadMode::Overwrite,
)
.await
.unwrap();
let result = mutate_main(&mut db, queries, "set_age", &{
let mut p = omnigraph_compiler::ir::ParamMap::new();
p.insert(
"name".to_string(),
omnigraph_compiler::query::ast::Literal::String("Alice".to_string()),
);
p.insert(
"age".to_string(),
omnigraph_compiler::query::ast::Literal::Integer(300),
);
p
})
.await;
assert!(result.is_err(), "expected range violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@range violation"), "error: {}", err);
}
#[tokio::test]
async fn mutation_insert_rejects_check_violation() {
let schema = r#"
node Order {
code: String @key
@check(code, "^[A-Z]{3}-[0-9]+$")
}
"#;
let queries = r#"
query insert_order($code: String) {
insert Order { code: $code }
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let result = mutate_main(&mut db, queries, "insert_order", &{
let mut p = omnigraph_compiler::ir::ParamMap::new();
p.insert(
"code".to_string(),
omnigraph_compiler::query::ast::Literal::String("invalid".to_string()),
);
p
})
.await;
assert!(result.is_err(), "expected check violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@check violation"), "error: {}", err);
}
#[tokio::test]
async fn mutation_update_rejects_check_violation() {
let schema = r#"
node Order {
code: String @key
label: String?
@check(label, "^[A-Z]+$")
}
"#;
let queries = r#"
query set_label($code: String, $label: String) {
update Order set { label: $label } where code = $code
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
load_jsonl(
&mut db,
r#"{"type": "Order", "data": {"code": "ABC-123", "label": "VALID"}}"#,
LoadMode::Overwrite,
)
.await
.unwrap();
let result = mutate_main(&mut db, queries, "set_label", &{
let mut p = omnigraph_compiler::ir::ParamMap::new();
p.insert(
"code".to_string(),
omnigraph_compiler::query::ast::Literal::String("ABC-123".to_string()),
);
p.insert(
"label".to_string(),
omnigraph_compiler::query::ast::Literal::String("invalid".to_string()),
);
p
})
.await;
assert!(result.is_err(), "expected check violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@check violation"), "error: {}", err);
}
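// The two `@check` patterns used in this section are simple enough to spell
// out by hand. This is a hypothetical sketch, not Omnigraph's implementation.
// The functions below accept exactly the strings that `^[A-Z]{3}-[0-9]+$` and
// `^[A-Z]+$` accept (ASCII ranges). They illustrate why "ABC-123" and "VALID"
// pass while "invalid" is rejected by both.
mod check_pattern_sketch {
    /// Hypothetical equivalent of `^[A-Z]{3}-[0-9]+$`: three ASCII uppercase
    /// letters, a hyphen, then at least one ASCII digit, and nothing else.
    pub fn is_order_code(s: &str) -> bool {
        let b = s.as_bytes();
        b.len() >= 5
            && b[..3].iter().all(u8::is_ascii_uppercase)
            && b[3] == b'-'
            && b[4..].iter().all(u8::is_ascii_digit)
    }
    /// Hypothetical equivalent of `^[A-Z]+$`: one or more ASCII uppercase letters.
    pub fn is_upper_label(s: &str) -> bool {
        !s.is_empty() && s.bytes().all(|c| c.is_ascii_uppercase())
    }
    #[test]
    fn fixture_values() {
        assert!(is_order_code("ABC-123"));
        assert!(!is_order_code("invalid"));
        assert!(!is_order_code("ABC-"));
        assert!(is_upper_label("VALID"));
        assert!(!is_upper_label("invalid"));
    }
}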
#[tokio::test]
async fn edge_cardinality_max_enforced() {
let schema = r#"
node Person { name: String @key }
node Company { name: String @key }
edge WorksAt: Person -> Company @card(0..1)
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
// Alice works at two companies — violates @card(0..1)
let data = r#"{"type": "Person", "data": {"name": "Alice"}}
{"type": "Company", "data": {"name": "Acme"}}
{"type": "Company", "data": {"name": "Globex"}}
{"edge": "WorksAt", "from": "Alice", "to": "Acme"}
{"edge": "WorksAt", "from": "Alice", "to": "Globex"}
"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "expected cardinality violation");
let err = result.unwrap_err().to_string();
assert!(err.contains("@card violation"), "error: {}", err);
}
#[tokio::test]
async fn edge_cardinality_allows_within_bounds() {
let schema = r#"
node Person { name: String @key }
node Company { name: String @key }
edge WorksAt: Person -> Company @card(0..1)
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
let data = r#"{"type": "Person", "data": {"name": "Alice"}}
{"type": "Company", "data": {"name": "Acme"}}
{"edge": "WorksAt", "from": "Alice", "to": "Acme"}
"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("edge:WorksAt").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap(), 1);
}
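// Hypothetical sketch of the out-degree check that the two tests above
// exercise (not Omnigraph's validator). Per the fixtures, `@card(0..1)` on
// `WorksAt: Person -> Company` limits how many WorksAt edges one Person may
// have. Two edges from Alice are rejected and one is accepted, so the upper
// bound behaves inclusively. Counting per source key, and checking `min`
// against every source node, are assumptions of this sketch.
mod edge_cardinality_sketch {
    use std::collections::HashMap;
    /// Hypothetical helper: count edges per source key and report the first
    /// source (in `sources` order) whose count falls outside `min..=max`.
    pub fn check_out_degree(
        sources: &[&str],
        edges: &[(&str, &str)],
        min: usize,
        max: usize,
    ) -> Result<(), String> {
        // Start every source at zero so nodes without edges are checked too.
        let mut counts: HashMap<&str, usize> = sources.iter().map(|&s| (s, 0)).collect();
        for &(from, _to) in edges {
            *counts.entry(from).or_insert(0) += 1;
        }
        for &s in sources {
            let n = counts[s];
            if n < min || n > max {
                return Err(format!(
                    "@card violation: {s} has {n} edges, allowed {min}..{max}"
                ));
            }
        }
        Ok(())
    }
    #[test]
    fn fixture_values() {
        let two = [("Alice", "Acme"), ("Alice", "Globex")];
        assert!(check_out_degree(&["Alice"], &two, 0, 1).is_err());
        assert!(check_out_degree(&["Alice"], &two[..1], 0, 1).is_ok());
    }
}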
// ─── Regression: apply_assignments with blob mid-schema ──────────────────────
#[tokio::test]
async fn update_with_blob_mid_schema_does_not_panic() {
// The blob column sits in the MIDDLE of the schema, not last. This previously
// caused a column-index mismatch in apply_assignments: batch.column(idx) used
// the schema position, but the batch had blob columns excluded from its
// projection, so every column after the blob was looked up at the wrong index.
let schema = r#"
node Article {
slug: String @key
attachment: Blob?
summary: String?
rating: I32?
}
"#;
let mutations = r#"
query insert_article($slug: String, $summary: String, $rating: I32) {
insert Article { slug: $slug, summary: $summary, rating: $rating }
}
query update_summary($slug: String, $summary: String) {
update Article set { summary: $summary } where slug = $slug
}
query get_article($slug: String) {
match { $a: Article { slug: $slug } }
return { $a.slug, $a.summary, $a.rating }
}
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
mutate_main(
&mut db,
mutations,
"insert_article",
&mixed_params(
&[("$slug", "a1"), ("$summary", "hello")],
&[("$rating", 42)],
),
)
.await
.unwrap();
// This would panic with the old batch.column(idx) code
let result = mutate_main(
&mut db,
mutations,
"update_summary",
&params(&[("$slug", "a1"), ("$summary", "updated")]),
)
.await
.unwrap();
assert_eq!(result.affected_nodes, 1);
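// Verify the update applied: the row is still readable and carries the new summary
let qr = query_main(
&mut db,
mutations,
"get_article",
&params(&[("$slug", "a1")]),
)
.await
.unwrap();
assert_eq!(qr.num_rows(), 1);
// Columns follow the `return` order: slug, summary, rating
let batch = qr.concat_batches().unwrap();
let summary = batch
.column(1)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
assert_eq!(summary.value(0), "updated");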
}
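// An illustration of the bug class the test above guards against. The helper
// names are hypothetical; this is not Omnigraph's `apply_assignments`. When
// blob columns are left out of the scan projection, a column's position in the
// full schema no longer matches its position in the scanned batch. Here
// `summary` is at schema position 2, but position 2 of the projected batch is
// `rating`. Resolving columns by name against the projected list avoids this.
mod projection_index_sketch {
    /// Column names kept by a scan that skips blob columns (`true` = blob).
    pub fn project_without_blobs<'a>(schema: &[(&'a str, bool)]) -> Vec<&'a str> {
        schema
            .iter()
            .filter(|&&(_, is_blob)| !is_blob)
            .map(|&(name, _)| name)
            .collect()
    }
    /// Position of `column` within the projected batch, looked up by name.
    pub fn batch_index(projected: &[&str], column: &str) -> Option<usize> {
        projected.iter().position(|&c| c == column)
    }
    #[test]
    fn blob_in_the_middle_shifts_positions() {
        let schema = [
            ("slug", false),
            ("attachment", true),
            ("summary", false),
            ("rating", false),
        ];
        let projected = project_without_blobs(&schema);
        // Using the schema position (2) would pick the wrong column.
        assert_eq!(projected[2], "rating");
        assert_eq!(batch_index(&projected, "summary"), Some(1));
    }
}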
// ─── Regression: blob update null → non-null ─────────────────────────────────
#[tokio::test]
async fn blob_update_null_to_non_null() {
// Regression: updating a blob column that was previously all-null panicked
// with assertion `left: 0, right: 1` in lance-table stream.rs because the
// two-phase blob update sent a blob-only batch to merge_insert on a dataset
// with zero blob fragments.
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
// Load a row with blob = null (no blob data in dataset)
let data = r#"{"type": "Document", "data": {"title": "kid-a"}}"#;
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
// Update null → non-null blob: the case that previously hit the panic above.
let result = mutate_main(
&mut db,
BLOB_MUTATIONS,
"update_doc_content",
&params(&[("$title", "kid-a"), ("$content", "base64:AQID")]),
)
.await
.unwrap();
assert_eq!(result.affected_nodes, 1);
let blob = db.read_blob("Document", "kid-a", "content").await.unwrap();
let bytes = blob.read().await.unwrap();
assert_eq!(&bytes[..], &[1, 2, 3]);
}
// ─── Regression: blob load with external file URI ────────────────────────────
#[tokio::test]
async fn blob_load_external_file_uri() {
// Regression: loading blobs with external file:// URIs was rejected with
// "External blob URI '...' is outside registered external bases" because
// allow_external_blob_outside_bases was not set on data table write paths.
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
// Create a temp file to reference
let blob_dir = tempfile::tempdir().unwrap();
let blob_path = blob_dir.path().join("test.txt");
std::fs::write(&blob_path, b"Hello from file").unwrap();
let file_uri = format!("file://{}", blob_path.display());
let mut db = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap();
let data = format!(
r#"{{"type": "Document", "data": {{"title": "from-file", "content": "{}"}}}}"#,
file_uri
);
// Load with external URI
load_jsonl(&mut db, &data, LoadMode::Overwrite)
.await
.unwrap();
// The stored blob should reference the external file by URI
let blob = db
.read_blob("Document", "from-file", "content")
.await
.unwrap();
assert!(blob.uri().is_some(), "external blob should have a URI");
}
// ─── Regression: execute_update on edge type ─────────────────────────────────
#[tokio::test]
async fn update_edge_type_returns_error_not_panic() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
// The typechecker should reject this, but even if bypassed,
// execute_update must not panic with HashMap key-not-found.
let mutations = r#"
query update_edge($from: String) {
update Knows set { since: "2025-01-01" } where from = $from
}
"#;
let result = mutate_main(
&mut db,
mutations,
"update_edge",
&params(&[("$from", "Alice")]),
)
.await;
assert!(result.is_err(), "should return error, not panic");
}
// ─── Regression: Date/DateTime SQL literal escaping ──────────────────────────
#[tokio::test]
async fn date_literal_with_quote_is_escaped() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
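// A parameter value containing a single quote must not be able to rewrite the
// filter (the classic `' OR '1'='1` injection). The value is date-shaped, but
// `$d` is declared `String` and compared against `name`. This therefore
// exercises string-literal handling on the filter path, not Date/DateTime
// literals specifically.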
let queries = r#"
query filter_date($d: String) {
match { $p: Person { name: $d } }
return { $p.name }
}
"#;
// Pass a value with a single-quote — should not error or return all rows
let result = query_main(
&mut db,
queries,
"filter_date",
&params(&[("$d", "2025-01-01' OR '1'='1")]),
)
.await
.unwrap();
assert_eq!(result.num_rows(), 0);
}
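// Hypothetical sketch of SQL string-literal quoting (not Omnigraph's
// `literal_to_sql`). Wrap the value in single quotes and double every embedded
// quote, as standard SQL does. The payload above then becomes a single literal
// that no `name` equals, so the filter matches zero rows rather than every row.
// A DataFusion `Expr` filter carries the value as a typed literal instead, so
// no quoting is involved on that path.
mod sql_literal_sketch {
    /// Hypothetical helper: render `value` as a SQL string literal.
    pub fn quote_sql_string(value: &str) -> String {
        format!("'{}'", value.replace('\'', "''"))
    }
    #[test]
    fn quotes_are_doubled() {
        assert_eq!(quote_sql_string("O'Brien"), "'O''Brien'");
        assert_eq!(
            quote_sql_string("2025-01-01' OR '1'='1"),
            "'2025-01-01'' OR ''1''=''1'"
        );
    }
}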
// ─── Regression: manifest row_count tracks total, not batch size ─────────────
#[tokio::test]
async fn append_mode_manifest_row_count_is_total() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await; // Overwrite: 4 persons
let extra = r#"{"type": "Person", "data": {"name": "Eve", "age": 22}}"#;
load_jsonl(&mut db, extra, LoadMode::Append).await.unwrap();
let snap = snapshot_main(&db).await.unwrap();
let entry = snap.entry("node:Person").unwrap();
// Must be total rows (4 + 1 = 5), not just the appended batch size (1)
assert_eq!(entry.row_count, 5);
// Verify actual dataset count matches manifest
let ds = snap.open("node:Person").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap() as u64, entry.row_count);
}
// ─── Regression: cardinality violation must not commit manifest ───────────────
#[tokio::test]
async fn cardinality_violation_does_not_commit_manifest() {
let schema = r#"
node Person { name: String @key }
node Company { name: String @key }
edge WorksAt: Person -> Company @card(0..1)
"#;
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, schema).await.unwrap();
// Alice works at two companies — violates @card(0..1) (at most 1)
let data = r#"
{"type": "Person", "data": {"name": "Alice"}}
{"type": "Company", "data": {"name": "Acme"}}
{"type": "Company", "data": {"name": "Beta"}}
{"edge": "WorksAt", "from": "Alice", "to": "Acme"}
{"edge": "WorksAt", "from": "Alice", "to": "Beta"}
"#;
let v_before = version_main(&db).await.unwrap();
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "cardinality violation should be rejected");
assert!(
result.unwrap_err().to_string().contains("@card violation"),
"error should mention @card"
);
// Manifest must NOT have advanced — invalid data was not committed
assert_eq!(version_main(&db).await.unwrap(), v_before);
}
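// Hypothetical sketch of the invariant the test above pins. The types are
// illustrative, not Omnigraph's manifest code. Build the candidate state off to
// the side and run every constraint check on it. Publish it, together with a
// version bump, only when all checks pass. A failed load therefore returns an
// error while the visible version and data stay exactly as they were.
mod validate_then_commit_sketch {
    pub struct Store {
        pub version: u64,
        pub rows: Vec<String>,
    }
    impl Store {
        /// Overwrite-style load: validate the staged rows, then commit.
        pub fn load<F>(&mut self, staged: Vec<String>, validate: F) -> Result<u64, String>
        where
            F: Fn(&[String]) -> Result<(), String>,
        {
            // `?` returns before anything in `self` is modified.
            validate(staged.as_slice())?;
            self.rows = staged;
            self.version += 1;
            Ok(self.version)
        }
    }
    #[test]
    fn rejected_load_keeps_version() {
        let mut store = Store { version: 1, rows: vec!["old".to_string()] };
        let rejected = store.load(vec!["bad".to_string()], |_| Err("@card violation".into()));
        assert!(rejected.is_err());
        assert_eq!(store.version, 1);
        assert_eq!(store.rows, vec!["old".to_string()]);
        assert_eq!(store.load(vec!["good".to_string()], |_| Ok(())), Ok(2));
    }
}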
// ─── Regression: dangling edge references are rejected ───────────────────────
#[tokio::test]
async fn dangling_edge_dst_rejected_on_load() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
let data = r#"
{"type": "Person", "data": {"name": "Alice", "age": 30}}
{"type": "Company", "data": {"name": "Acme"}}
{"edge": "Knows", "from": "Alice", "to": "NonExistent"}
"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "dangling edge dst should be rejected");
let err = result.unwrap_err().to_string();
assert!(
err.contains("not found"),
"error should mention 'not found': {}",
err
);
}
#[tokio::test]
async fn dangling_edge_src_rejected_on_load() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
let data = r#"
{"type": "Person", "data": {"name": "Alice", "age": 30}}
{"type": "Company", "data": {"name": "Acme"}}
{"edge": "WorksAt", "from": "Ghost", "to": "Acme"}
"#;
let result = load_jsonl(&mut db, data, LoadMode::Overwrite).await;
assert!(result.is_err(), "dangling edge src should be rejected");
let err = result.unwrap_err().to_string();
assert!(
err.contains("not found"),
"error should mention 'not found': {}",
err
);
}
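// Hypothetical sketch of the endpoint check the two tests above rely on (not
// Omnigraph's loader). Every edge's `from` must be a key of the source node
// type, and every `to` must be a key of the destination node type. The error
// wording is illustrative; the tests only look for "not found".
mod dangling_edge_sketch {
    use std::collections::HashSet;
    /// Hypothetical helper: reject the first edge whose endpoint key is unknown.
    pub fn check_endpoints(
        src_keys: &[&str],
        dst_keys: &[&str],
        edges: &[(&str, &str)],
    ) -> Result<(), String> {
        let src: HashSet<&str> = src_keys.iter().copied().collect();
        let dst: HashSet<&str> = dst_keys.iter().copied().collect();
        for &(from, to) in edges {
            if !src.contains(from) {
                return Err(format!("edge source '{from}' not found"));
            }
            if !dst.contains(to) {
                return Err(format!("edge destination '{to}' not found"));
            }
        }
        Ok(())
    }
    #[test]
    fn fixture_values() {
        let people = ["Alice"];
        let companies = ["Acme"];
        let err = check_endpoints(&people, &people, &[("Alice", "NonExistent")]).unwrap_err();
        assert!(err.contains("not found"));
        assert!(check_endpoints(&people, &companies, &[("Ghost", "Acme")]).is_err());
        assert!(check_endpoints(&people, &companies, &[("Alice", "Acme")]).is_ok());
    }
}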
// ─── Regression: ensure_indices is idempotent ────────────────────────────────
#[tokio::test]
async fn ensure_indices_does_not_error_on_repeated_call() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
let version_after_load = version_main(&db).await.unwrap();
// Load commits already enforce the required indices, so neither of the
// repeated ensure_indices calls below should advance the manifest version.
db.ensure_indices().await.unwrap();
let version_after_first = version_main(&db).await.unwrap();
db.ensure_indices().await.unwrap();
let version_after_second = version_main(&db).await.unwrap();
assert_eq!(version_after_first, version_after_load);
assert_eq!(version_after_second, version_after_load);
// Data should still be queryable after index operations
let snap = snapshot_main(&db).await.unwrap();
let ds = snap.open("node:Person").await.unwrap();
assert_eq!(ds.count_rows(None).await.unwrap(), 4);
}
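// Hypothetical sketch of the idempotence the test above expects. The types are
// illustrative, not Omnigraph's `ensure_indices`. Create only the indices that
// are missing, and advance the version only when something was created. A call
// on an already-indexed dataset then leaves the version unchanged.
mod idempotent_index_sketch {
    use std::collections::BTreeSet;
    pub struct Manifest {
        pub version: u64,
        pub indexed: BTreeSet<String>,
    }
    impl Manifest {
        /// Hypothetical helper: returns the version after the call.
        pub fn ensure_indices(&mut self, required: &[&str]) -> u64 {
            let mut created = false;
            for &column in required {
                // `insert` returns false when the index already exists.
                created |= self.indexed.insert(column.to_string());
            }
            if created {
                self.version += 1;
            }
            self.version
        }
    }
    #[test]
    fn repeated_calls_are_no_ops() {
        let mut m = Manifest { version: 1, indexed: BTreeSet::new() };
        assert_eq!(m.ensure_indices(&["name"]), 2);
        assert_eq!(m.ensure_indices(&["name"]), 2);
        assert_eq!(m.ensure_indices(&["name"]), 2);
    }
}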
// ─── DataFusion-Expr filter pushdown (Tier-1 follow-up to the Lance v6 bump) ──
/// Regression for `CompOp::Contains` pushdown via `array_has` in
/// `ir_filter_to_expr`. Before the Expr-pushdown refactor, the
/// `ir_filter_to_sql` family returned `None` for list-contains (the
/// comment said *"Can't pushdown list contains"*), so the predicate was
/// applied in memory after the scan. With `Scanner::filter_expr(Expr)` and
/// DataFusion's `array_has` builtin, the contains predicate now pushes down
/// to Lance.
///
/// This test pins result correctness only. If the pushdown regressed, the
/// scan would return every row and the post-scan filter would still produce
/// the same answer, so the test would keep passing. `lance_surface_guards.rs`
/// is the structural pin for the pushdown surface itself.
#[tokio::test]
async fn ir_filter_with_list_contains_pushes_down() {
let schema = r#"
node Doc {
slug: String @key
tags: [String]
}
"#;
let data = r#"{"type":"Doc","data":{"slug":"alpha","tags":["red","blue"]}}
{"type":"Doc","data":{"slug":"bravo","tags":["green"]}}
{"type":"Doc","data":{"slug":"charlie","tags":["red","green"]}}
{"type":"Doc","data":{"slug":"delta","tags":[]}}"#;
let dir = tempfile::tempdir().unwrap();
let mut db = Omnigraph::init(dir.path().to_str().unwrap(), schema)
.await
.unwrap();
load_jsonl(&mut db, data, LoadMode::Overwrite)
.await
.unwrap();
let queries = r#"
query docs_with_tag($tag: String) {
match {
$d: Doc
$d.tags contains $tag
}
return { $d.slug }
}
"#;
let result = query_main(&mut db, queries, "docs_with_tag", &params(&[("$tag", "red")]))
.await
.unwrap();
let batch = result.concat_batches().unwrap();
let slugs = batch
.column(0)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let mut got: Vec<&str> = (0..slugs.len()).map(|i| slugs.value(i)).collect();
got.sort();
assert_eq!(
got,
vec!["alpha", "charlie"],
"contains-pushdown should return exactly the rows whose tags list contains 'red'"
);
}
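// Hypothetical, std-only model of what the pushed-down predicate computes (not
// DataFusion's `array_has` kernel). A row passes when its list column contains
// the needle. Over the four fixture rows above, only `alpha` and `charlie`
// contain "red", and the empty list in `delta` never matches. A null list is
// modelled as "no match", because a NULL predicate result drops the row from a
// filter.
mod list_contains_sketch {
    /// Hypothetical helper: keys of rows whose list contains `needle`.
    pub fn rows_containing<'a>(
        rows: &[(&'a str, Option<&[&str]>)],
        needle: &str,
    ) -> Vec<&'a str> {
        rows.iter()
            .filter(|&&(_, tags)| tags.is_some_and(|t| t.iter().any(|&tag| tag == needle)))
            .map(|&(key, _)| key)
            .collect()
    }
    #[test]
    fn fixture_values() {
        let red_blue = ["red", "blue"];
        let green = ["green"];
        let red_green = ["red", "green"];
        let empty: [&str; 0] = [];
        let rows = [
            ("alpha", Some(&red_blue[..])),
            ("bravo", Some(&green[..])),
            ("charlie", Some(&red_green[..])),
            ("delta", Some(&empty[..])),
        ];
        assert_eq!(rows_containing(&rows, "red"), vec!["alpha", "charlie"]);
    }
}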