mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
* engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957)

Lance 4.0.x's MergeInsertBuilder rejects sequential merge_insert /
update against rows previously rewritten by merge_insert with a
spurious "Ambiguous merge inserts: multiple source rows match the
same target row on (id = ...)" error. The engine passes exactly 1
source row; Lance's `processed_row_ids: Mutex<HashSet<u64>>`
(lance-4.0.0 src/dataset/write/merge_insert.rs:2099) double-processes
the same source/target match against datasets previously rewritten
by merge_insert and errors under the default
SourceDedupeBehavior::Fail.

Two surfaces hit it:
- Load: `omnigraph load --mode merge` twice against the same @key set.
- Mutate: sequential `update T set {f:v} where x=y` on the same row.

Fix: opt both MergeInsertBuilder call sites (merge_insert_batch,
stage_merge_insert) into SourceDedupeBehavior::FirstSeen. Lance
silently skips a duplicate match instead of erroring.

Correctness-preserving for OmniGraph because source-side duplicates
are already rejected upstream of these call sites:
- Loader: enforce_unique_constraints_intra_batch (loader/mod.rs:1453)
  rejects intra-batch dup @key values across all three LoadModes,
  pinned by the new loader_rejects_intra_batch_duplicate_keys test.
- Mutate: MutationStaging::finalize pre-dedupes by id.

So FirstSeen only suppresses the spurious Lance behavior, never user
data.
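
To make the Fail / FirstSeen distinction concrete, here is a toy model of
the two behaviors. It is only a sketch and not Lance's code: the
`DedupeBehavior` enum, the `apply_matches` function, and the
(source index, target row id) representation are hypothetical names
invented for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for Lance's `SourceDedupeBehavior` (illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DedupeBehavior {
    Fail,
    FirstSeen,
}

/// Toy model: walk (source_index, target_row_id) matches in order and record
/// which match is applied to each target row. A target row id seen a second
/// time is either an error (Fail) or skipped (FirstSeen).
fn apply_matches(
    matches: &[(usize, u64)],
    behavior: DedupeBehavior,
) -> Result<Vec<(usize, u64)>, String> {
    let mut processed: HashSet<u64> = HashSet::new();
    let mut applied = Vec::new();
    for &(src, row_id) in matches {
        // `insert` returns true only the first time a row id is seen.
        if processed.insert(row_id) {
            applied.push((src, row_id));
            continue;
        }
        match behavior {
            DedupeBehavior::Fail => {
                return Err(format!(
                    "ambiguous merge: target row {row_id} matched more than once"
                ));
            }
            // Keep the first match, silently drop the repeat.
            DedupeBehavior::FirstSeen => {}
        }
    }
    Ok(applied)
}
```

In this model, reporting the same (source 0, target 7) match twice fails
under `Fail` and yields a single applied match under `FirstSeen`. It also
shows why the upstream uniqueness checks matter: a genuinely different
second source row for target 7 would be dropped just as silently.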

Regression coverage:
- consistency::load_merge_repeated_against_overlapping_keys_succeeds:
  load surface (was the basis of the original PR #98 report).
- runs::second_sequential_update_on_same_row_succeeds: update
  surface (MR-920).
- consistency::loader_rejects_intra_batch_duplicate_keys: pins
  FirstSeen's safety argument.
- consistency::load_merge_window_2_documents_upstream_lance_gap:
  canary for the residual upstream Lance gap (after MR-848 removes
  the eager BTREE-on-id, re-establishing the index via
  ensure_indices re-triggers the bug class). Drop the FirstSeen
  setter only when this canary stays green without it.

Cross-validation on the prior PR #98 branch: both use_index(false)
(PR #98's hypothesis) and FirstSeen (MR-920's hypothesis) cover both
surfaces individually. FirstSeen chosen because it has no perf cost
(use_index(false) would force full-table scans on every merge_insert).

Supersedes PR #98 and andrew/merge-insert-firstseen.
Tracked at MR-957; upstream: lance-format/lance#6877.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* engine: add dedup-by-keys precondition on merge_insert primitives

Addresses Codex P1 on PR #109: `SourceDedupeBehavior::FirstSeen`
silently collapses duplicate source rows, and the branch-merge rewrite
path (`exec/merge.rs::publish_rewritten_merge_table`) feeds a
concatenated batch directly into `stage_merge_insert` without going
through `MutationStaging::finalize`'s pre-dedupe. By construction the
merge algorithm (`compute_source_delta` / `compute_three_way_delta`
walk via `OrderedTableCursor` and push each id at most once) produces
1-row-per-id, but the invariant was implicit: a future refactor
could violate it and FirstSeen would mask the bug as silent data
loss.

Add `check_batch_unique_by_keys` as a release-mode precondition at the
top of `merge_insert_batch` and `stage_merge_insert`. Errors with an
explicit "duplicate source row" message before the builder runs, so
real source dups continue to fail-fast regardless of caller.

Cost: one extra O(N) pass over the key column on every merge_insert.
String HashSet over typical batch sizes is microseconds, negligible
next to the merge_insert itself.
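
A minimal sketch of what such a single-pass precondition could look like.
This is an assumption-laden simplification, not the helper from
`table_store.rs`: the name `check_unique_keys` is hypothetical, and the
key column is modeled as a plain slice of strings rather than an Arrow
array.

```rust
use std::collections::HashSet;

/// Hypothetical simplification of a dedup-by-key precondition. The key
/// column is modeled as `&[&str]` (an assumption made for this sketch).
fn check_unique_keys(keys: &[&str]) -> Result<(), String> {
    // One O(N) pass: every key is hashed and inserted exactly once.
    let mut seen: HashSet<&str> = HashSet::with_capacity(keys.len());
    for &key in keys {
        // `insert` returns false when the key was already present.
        if !seen.insert(key) {
            return Err(format!("duplicate source row for key {key:?}"));
        }
    }
    Ok(())
}
```

Calling `check_unique_keys(&["a", "b", "a"])` returns an error naming the
repeated key before any merge would run, while a batch with distinct keys
(or an empty batch) passes through unchanged.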

The inline comment in `table_store.rs` now enumerates all three
pre-dedup paths (load / mutate / branch-merge) and names the
precondition as the structural pin instead of relying on
by-construction invariants from three separate callers.

Three new unit tests in `table_store::tests` pin the helper itself;
the existing `loader_rejects_intra_batch_duplicate_keys` integration
test continues to pin the loader's intake-time check as the first
defense layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

782 lines
25 KiB
Rust
mod helpers;

use arrow_array::{Array, Date32Array, Int32Array, StringArray};
use futures::TryStreamExt;

use omnigraph::db::Omnigraph;
use omnigraph::loader::{LoadMode, load_jsonl};
use omnigraph_compiler::ir::ParamMap;
use omnigraph_compiler::query::ast::Literal;

use helpers::*;

// ─── Snapshot data-level isolation ──────────────────────────────────────────

#[tokio::test]
async fn snapshot_returns_stale_data_after_write() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;

    // Snapshot BEFORE mutation
    let snap_before = snapshot_main(&db).await.unwrap();

    // Insert a new person
    mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
    )
    .await
    .unwrap();

    // Snapshot AFTER mutation
    let snap_after = snapshot_main(&db).await.unwrap();

    // Old snapshot should still see 4 persons
    let ds_before = snap_before.open("node:Person").await.unwrap();
    assert_eq!(ds_before.count_rows(None).await.unwrap(), 4);

    // New snapshot should see 5 persons
    let ds_after = snap_after.open("node:Person").await.unwrap();
    assert_eq!(ds_after.count_rows(None).await.unwrap(), 5);

    // Verify Eve is NOT in old snapshot's data
    let batches_before: Vec<arrow_array::RecordBatch> = ds_before
        .scan()
        .try_into_stream()
        .await
        .unwrap()
        .try_collect()
        .await
        .unwrap();
    let ids_before = collect_column_strings(&batches_before, "id");
    assert!(!ids_before.contains(&"Eve".to_string()));

    // Verify Eve IS in new snapshot's data
    let batches_after: Vec<arrow_array::RecordBatch> = ds_after
        .scan()
        .try_into_stream()
        .await
        .unwrap()
        .try_collect()
        .await
        .unwrap();
    let ids_after = collect_column_strings(&batches_after, "id");
    assert!(ids_after.contains(&"Eve".to_string()));
}

// ─── LoadMode::Merge ────────────────────────────────────────────────────────

#[tokio::test]
async fn load_merge_upserts_existing_and_inserts_new() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();

    // Load Alice(30) and Bob(25) via Overwrite
    let initial = r#"{"type": "Person", "data": {"name": "Alice", "age": 30}}
{"type": "Person", "data": {"name": "Bob", "age": 25}}"#;
    load_jsonl(&mut db, initial, LoadMode::Overwrite)
        .await
        .unwrap();

    assert_eq!(count_rows(&db, "node:Person").await, 2);

    // Merge: Alice updated to age=31, Charlie is new
    let merge_data = r#"{"type": "Person", "data": {"name": "Alice", "age": 31}}
{"type": "Person", "data": {"name": "Charlie", "age": 35}}"#;
    load_jsonl(&mut db, merge_data, LoadMode::Merge)
        .await
        .unwrap();

    // Should have 3 persons total (not 4)
    assert_eq!(count_rows(&db, "node:Person").await, 3);

    // Verify individual values
    let batches = read_table(&db, "node:Person").await;
    let batch = &batches[0];
    let ids = batch
        .column_by_name("id")
        .unwrap()
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let ages = batch
        .column_by_name("age")
        .unwrap()
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();

    for i in 0..batch.num_rows() {
        match ids.value(i) {
            "Alice" => assert_eq!(ages.value(i), 31, "Alice should be updated to 31"),
            "Bob" => assert_eq!(ages.value(i), 25, "Bob should be unchanged"),
            "Charlie" => assert_eq!(ages.value(i), 35, "Charlie should be inserted"),
            other => panic!("unexpected person: {}", other),
        }
    }
}

/// Regression: two sequential `LoadMode::Merge` invocations against the
/// same set of keys must both succeed. Pre-fix, the second one failed
/// with `Ambiguous merge inserts are prohibited: multiple source rows
/// match the same target row on (id = "TEST-1")` even though every
/// source batch had one row per key.
///
/// Triggered by Lance's `processed_row_ids: Mutex<HashSet<u64>>`
/// (lance-4.0.0 `src/dataset/write/merge_insert.rs:2099`)
/// double-processing the same source/target match against datasets
/// previously rewritten by merge_insert. Worked around by opting
/// `MergeInsertBuilder` into `SourceDedupeBehavior::FirstSeen` in
/// `crates/omnigraph/src/table_store.rs`; see that file for the full
/// rationale and the safety pin (`loader_rejects_intra_batch_duplicate_keys`).
/// Tracked at MR-957; upstream: lance-format/lance#6877.
#[tokio::test]
async fn load_merge_repeated_against_overlapping_keys_succeeds() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let schema = r#"
node Thing {
    key: String @key
    required_val: String
    optional_val: String?
}
"#;
    let mut db = Omnigraph::init(uri, schema).await.unwrap();

    // Seed with 50 fully-populated rows (id + required + optional).
    let mut seed = String::new();
    for i in 1..=50 {
        seed.push_str(&format!(
            r#"{{"type":"Thing","data":{{"key":"TEST-{i}","required_val":"required {i}","optional_val":"optional {i}"}}}}
"#,
        ));
    }
    load_jsonl(&mut db, &seed, LoadMode::Overwrite)
        .await
        .unwrap();

    // Partial-schema delta (mirrors the bug report exactly): omits
    // `optional_val`. 25 existing keys + 5 new keys, one row per key.
    let mut delta = String::new();
    for i in (1..=25).chain(51..=55) {
        delta.push_str(&format!(
            r#"{{"type":"Thing","data":{{"key":"TEST-{i}","required_val":"required {i} UPDATED"}}}}
"#,
        ));
    }

    load_jsonl(&mut db, &delta, LoadMode::Merge)
        .await
        .expect("first merge must succeed");
    assert_eq!(count_rows(&db, "node:Thing").await, 55);

    load_jsonl(&mut db, &delta, LoadMode::Merge)
        .await
        .expect("second merge against same keys must succeed");
    assert_eq!(count_rows(&db, "node:Thing").await, 55);
}

/// Safety pin for the `SourceDedupeBehavior::FirstSeen` workaround in
/// `crates/omnigraph/src/table_store.rs`. FirstSeen tells Lance to
/// silently skip a duplicate source row instead of erroring. Our use of
/// it depends on user-provided duplicates being rejected *before* the
/// batch reaches Lance; otherwise FirstSeen could silently drop user
/// data.
///
/// Defense in depth:
/// 1. The loader's `enforce_unique_constraints_intra_batch`
///    (`loader/mod.rs:1453`), invoked unconditionally on any node type
///    with a `@key`, errors on intra-batch duplicate `@key` values at
///    intake. Pinned by this test across every `LoadMode`.
/// 2. The `check_batch_unique_by_keys` precondition at the top of
///    `merge_insert_batch` and `stage_merge_insert` is the final
///    fail-fast guard: even if a future caller bypasses the loader path
///    (e.g. branch-merge's `publish_rewritten_merge_table` builds its
///    own source batch directly), a real duplicate id surfaces as an
///    `OmniError::Manifest` before the batch reaches Lance, so it is
///    never silently dropped via FirstSeen. Pinned by the unit tests in
///    `table_store::tests`.
#[tokio::test]
async fn loader_rejects_intra_batch_duplicate_keys() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let schema = r#"
node Thing {
    key: String @key
    value: String
}
"#;
    let mut db = Omnigraph::init(uri, schema).await.unwrap();

    let dupes = r#"{"type":"Thing","data":{"key":"DUP","value":"first"}}
{"type":"Thing","data":{"key":"DUP","value":"second"}}
"#;

    for mode in [LoadMode::Overwrite, LoadMode::Append, LoadMode::Merge] {
        let err = load_jsonl(&mut db, dupes, mode).await.unwrap_err();
        let msg = err.to_string();
        assert!(
            msg.contains("@unique violation") && msg.contains("DUP"),
            "load mode {mode:?} must reject intra-batch duplicate @key (got: {msg})"
        );
        assert_eq!(
            count_rows(&db, "node:Thing").await,
            0,
            "load mode {mode:?} must not persist any rows when the batch is rejected"
        );
    }
}

/// Canary for the upstream Lance gap that the `FirstSeen` workaround
/// in `table_store.rs` masks. The bug class is "Window 2": load →
/// indices built explicitly → merge → merge. Even with the engine
/// fully aligned to the "indexes are derived state" invariant
/// (MR-848), as long as an `id` index has been built between the
/// first and second merge_insert, the Lance internal that triggers
/// the bug remains reachable.
///
/// This test runs the Window-2 sequence under the FirstSeen workaround.
/// It is expected to pass today. If a future Lance upgrade or local
/// change makes it START failing, the workaround has lost effectiveness
/// (upstream Lance changed something, or the FirstSeen setter was
/// dropped from `table_store.rs`). If a future Lance upgrade fixes the
/// bug class, this test continues to pass and the FirstSeen setter can
/// be retired.
///
/// Tracked at MR-957; upstream: lance-format/lance#6877.
#[tokio::test]
async fn load_merge_window_2_documents_upstream_lance_gap() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let schema = r#"
node Thing {
    key: String @key
    required_val: String
    optional_val: String?
}
"#;
    let mut db = Omnigraph::init(uri, schema).await.unwrap();

    let mut seed = String::new();
    for i in 1..=50 {
        seed.push_str(&format!(
            r#"{{"type":"Thing","data":{{"key":"TEST-{i}","required_val":"required {i}","optional_val":"optional {i}"}}}}
"#,
        ));
    }
    load_jsonl(&mut db, &seed, LoadMode::Overwrite)
        .await
        .unwrap();

    // Explicit ensure_indices between seed and the merges: the Window
    // 2 trigger. The eager-build behavior (MR-583) means the BTREE on
    // `id` is already present here, but calling explicitly pins the
    // invariant for the post-MR-848 future where the eager build is
    // gone.
    db.ensure_indices().await.unwrap();

    let mut delta = String::new();
    for i in (1..=25).chain(51..=55) {
        delta.push_str(&format!(
            r#"{{"type":"Thing","data":{{"key":"TEST-{i}","required_val":"required {i} UPDATED"}}}}
"#,
        ));
    }

    // Both merges must succeed under the FirstSeen workaround.
    // `processed_row_ids` re-processes the same target row_id under
    // the default `SourceDedupeBehavior::Fail`; FirstSeen tolerates it.
    load_jsonl(&mut db, &delta, LoadMode::Merge)
        .await
        .expect("first merge after ensure_indices must succeed");
    db.ensure_indices().await.unwrap();
    load_jsonl(&mut db, &delta, LoadMode::Merge)
        .await
        .expect(
            "second merge after ensure_indices must succeed \
             (Window 2 canary: drop the FirstSeen setter in table_store.rs \
             only when this stays green WITHOUT it)",
        );
    assert_eq!(count_rows(&db, "node:Thing").await, 55);
}

#[tokio::test]
async fn cross_type_traversal_deduplicates_duplicate_edges() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let schema = r#"
node Person { name: String @key }
node Company { name: String @key }
edge WorksAt: Person -> Company
"#;
    let data = r#"{"type":"Person","data":{"name":"Alice"}}
{"type":"Company","data":{"name":"Acme"}}
{"edge":"WorksAt","from":"Alice","to":"Acme"}
{"edge":"WorksAt","from":"Alice","to":"Acme"}"#;
    let query = r#"
query company($name: String) {
    match {
        $p: Person { name: $name }
        $p worksAt $c
    }
    return { $c.name }
}
"#;

    let mut db = Omnigraph::init(uri, schema).await.unwrap();
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();

    let result = query_main(&mut db, query, "company", &params(&[("$name", "Alice")]))
        .await
        .unwrap();
    assert_eq!(result.num_rows(), 1);
}

// ─── Multi-writer refresh ───────────────────────────────────────────────────

#[tokio::test]
async fn explicit_target_query_sees_other_writer_commits_without_refresh() {
    let dir = tempfile::tempdir().unwrap();
    let _db = init_and_load(&dir).await;
    drop(_db);

    let uri = dir.path().to_str().unwrap();

    // Two independent handles to the same repo
    let mut db1 = Omnigraph::open(uri).await.unwrap();
    let mut db2 = Omnigraph::open(uri).await.unwrap();

    // Writer 1 inserts Eve
    mutate_main(
        &mut db1,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
    )
    .await
    .unwrap();

    // Explicit-target reads resolve the latest branch head and should see Eve
    let qr = query_main(
        &mut db2,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Eve")]),
    )
    .await
    .unwrap();
    assert_eq!(qr.num_rows(), 1, "explicit target reads should see Eve");
}

#[tokio::test]
async fn explicit_target_query_rebuilds_graph_index_after_external_edge_write() {
    let dir = tempfile::tempdir().unwrap();
    let _db = init_and_load(&dir).await;
    drop(_db);

    let uri = dir.path().to_str().unwrap();
    let mut db1 = Omnigraph::open(uri).await.unwrap();
    let mut db2 = Omnigraph::open(uri).await.unwrap();

    let warm = query_main(
        &mut db2,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(warm.num_rows(), 2);

    mutate_main(
        &mut db1,
        MUTATION_QUERIES,
        "add_friend",
        &params(&[("$from", "Alice"), ("$to", "Diana")]),
    )
    .await
    .unwrap();

    let refreshed = query_main(
        &mut db2,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(
        refreshed.num_rows(),
        3,
        "explicit target reads should rebuild topology after edge change"
    );

    let batch = refreshed.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let values: Vec<&str> = (0..names.len()).map(|i| names.value(i)).collect();
    assert!(values.contains(&"Bob"));
    assert!(values.contains(&"Diana"));
}

// ─── Null handling ──────────────────────────────────────────────────────────

#[tokio::test]
async fn null_values_in_filter_and_projection() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();
    let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();

    // Load data: Alice has age, Bob has null age, Charlie has age
    let data = r#"{"type": "Person", "data": {"name": "Alice", "age": 30}}
{"type": "Person", "data": {"name": "Bob"}}
{"type": "Person", "data": {"name": "Charlie", "age": 35}}"#;
    load_jsonl(&mut db, data, LoadMode::Overwrite)
        .await
        .unwrap();

    // Filter: age > 30 should exclude Bob (null) and Alice (30), keep Charlie (35)
    let queries = r#"
query older_than_30() {
    match {
        $p: Person
        $p.age > 30
    }
    return { $p.name, $p.age }
    order { $p.age desc }
}

query all_persons() {
    match { $p: Person }
    return { $p.name, $p.age }
    order { $p.age desc }
}
"#;

    let result = query_main(&mut db, queries, "older_than_30", &ParamMap::new())
        .await
        .unwrap();
    assert_eq!(result.num_rows(), 1);
    let batch = &result.batches()[0];
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Charlie");

    // Projection: Bob's age should be null
    let all = query_main(&mut db, queries, "all_persons", &ParamMap::new())
        .await
        .unwrap();
    let batch = &all.batches()[0];
    let ids = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    let ages = batch
        .column(1)
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();

    for i in 0..batch.num_rows() {
        if ids.value(i) == "Bob" {
            assert!(ages.is_null(i), "Bob's age should be null");
        }
    }
}

// ─── Graph index after node+edge insert ─────────────────────────────────────

#[tokio::test]
async fn traversal_works_after_node_then_edge_insert() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;

    // Warm up the graph index cache by running a traversal
    let _ = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();

    // Insert a new node (does NOT invalidate graph index)
    mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Frank")], &[("$age", 40)]),
    )
    .await
    .unwrap();

    // Insert an edge from Frank → Alice (DOES invalidate graph index)
    mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "add_friend",
        &params(&[("$from", "Frank"), ("$to", "Alice")]),
    )
    .await
    .unwrap();

    // Traversal should work: Frank → Alice
    let result = query_main(
        &mut db,
        TEST_QUERIES,
        "friends_of",
        &params(&[("$name", "Frank")]),
    )
    .await
    .unwrap();
    assert_eq!(result.num_rows(), 1);
    let batch = result.concat_batches().unwrap();
    let names = batch
        .column(0)
        .as_any()
        .downcast_ref::<StringArray>()
        .unwrap();
    assert_eq!(names.value(0), "Alice");
}

// ─── Edge property insert ───────────────────────────────────────────────────

#[tokio::test]
async fn insert_edge_with_property() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;

    // Knows has `since: Date?` property
    let queries = r#"
query add_friend_since($from: String, $to: String, $since: Date) {
    insert Knows { from: $from, to: $to, since: $since }
}
"#;
    let mut p = params(&[("$from", "Diana"), ("$to", "Bob")]);
    p.insert("since".to_string(), Literal::Date("2024-06-15".to_string()));

    let result = mutate_main(&mut db, queries, "add_friend_since", &p)
        .await
        .unwrap();
    assert_eq!(result.affected_edges, 1);

    // Verify the edge property was stored
    let batches = read_table(&db, "edge:Knows").await;
    let mut found = false;
    for batch in &batches {
        let srcs = batch
            .column_by_name("src")
            .unwrap()
            .as_any()
            .downcast_ref::<StringArray>()
            .unwrap();
        let dsts = batch
            .column_by_name("dst")
            .unwrap()
            .as_any()
            .downcast_ref::<StringArray>()
            .unwrap();
        let since = batch
            .column_by_name("since")
            .unwrap()
            .as_any()
            .downcast_ref::<Date32Array>()
            .unwrap();
        for i in 0..batch.num_rows() {
            if srcs.value(i) == "Diana" && dsts.value(i) == "Bob" {
                assert!(!since.is_null(i), "since should not be null");
                found = true;
            }
        }
    }
    assert!(found, "should find Diana→Bob edge");
}

// ─── Update / delete no-match ───────────────────────────────────────────────

#[tokio::test]
async fn update_nonexistent_returns_zero_affected() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;

    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "set_age",
        &mixed_params(&[("$name", "Nobody")], &[("$age", 99)]),
    )
    .await
    .unwrap();

    assert_eq!(result.affected_nodes, 0);
}

#[tokio::test]
async fn delete_nonexistent_returns_zero_affected() {
    let dir = tempfile::tempdir().unwrap();
    let mut db = init_and_load(&dir).await;

    let result = mutate_main(
        &mut db,
        MUTATION_QUERIES,
        "remove_person",
        &params(&[("$name", "Nobody")]),
    )
    .await
    .unwrap();

    assert_eq!(result.affected_nodes, 0);
    assert_eq!(result.affected_edges, 0);

    // All 4 persons still intact
    assert_eq!(count_rows(&db, "node:Person").await, 4);
}

// ─── Large batch load ───────────────────────────────────────────────────────

#[tokio::test]
async fn large_batch_load_and_query() {
    let dir = tempfile::tempdir().unwrap();
    let uri = dir.path().to_str().unwrap();

    let schema = r#"
node Item {
    name: String @key
    value: I32
}
"#;
    let mut db = Omnigraph::init(uri, schema).await.unwrap();

    // Generate 500 items
    let mut lines = Vec::with_capacity(500);
    for i in 0..500 {
        lines.push(format!(
            r#"{{"type": "Item", "data": {{"name": "item_{:04}", "value": {}}}}}"#,
            i, i
        ));
    }
    let data = lines.join("\n");
    load_jsonl(&mut db, &data, LoadMode::Overwrite)
        .await
        .unwrap();

    assert_eq!(count_rows(&db, "node:Item").await, 500);

    // Query with filter: value > 490
    let queries = r#"
query high_value() {
    match {
        $i: Item
        $i.value > 490
    }
    return { $i.name, $i.value }
    order { $i.value asc }
}
"#;
    let result = query_main(&mut db, queries, "high_value", &ParamMap::new())
        .await
        .unwrap();

    // Items 491..499 = 9 items
    assert_eq!(result.num_rows(), 9);
    let batch = &result.batches()[0];
    let values = batch
        .column(1)
        .as_any()
        .downcast_ref::<Int32Array>()
        .unwrap();
    assert_eq!(values.value(0), 491);
    assert_eq!(values.value(8), 499);
}

// ─── Stale handle must refresh-and-retry (no silent rebase) ──────────────

#[tokio::test]
async fn stale_handle_public_mutation_must_refresh_then_retry() {
    // With the Run state machine removed, the engine no longer
    // auto-rebases stale-handle mutations onto the latest target head.
    // The publisher's `expected_table_versions` CAS makes the contract
    // explicit: a stale writer fails loudly with
    // `ExpectedVersionMismatch` and the client decides whether to
    // refresh-and-retry.
    let dir = tempfile::tempdir().unwrap();
    let _db = init_and_load(&dir).await;
    drop(_db);

    let uri = dir.path().to_str().unwrap();
    let mut db1 = Omnigraph::open(uri).await.unwrap();
    let mut db2 = Omnigraph::open(uri).await.unwrap();

    // Writer 1 inserts Eve, which advances the Person sub-table.
    mutate_main(
        &mut db1,
        MUTATION_QUERIES,
        "insert_person",
        &mixed_params(&[("$name", "Eve")], &[("$age", 22)]),
    )
    .await
    .unwrap();

    // Writer 2 is now stale. Its first attempt must fail with
    // ExpectedVersionMismatch (no silent rebase).
    let stale_err = mutate_main(
        &mut db2,
        MUTATION_QUERIES,
        "set_age",
        &mixed_params(&[("$name", "Alice")], &[("$age", 99)]),
    )
    .await
    .expect_err("stale writer must hit ExpectedVersionMismatch");
    let omnigraph::error::OmniError::Manifest(manifest_err) = stale_err else {
        panic!("expected Manifest error");
    };
    assert!(matches!(
        manifest_err.details,
        Some(omnigraph::error::ManifestConflictDetails::ExpectedVersionMismatch { .. })
    ));

    // Refresh and retry: the canonical client recovery path.
    db2.sync_branch("main").await.unwrap();
    mutate_main(
        &mut db2,
        MUTATION_QUERIES,
        "set_age",
        &mixed_params(&[("$name", "Alice")], &[("$age", 99)]),
    )
    .await
    .unwrap();

    // Both Writer 1's insert and Writer 2's update are visible.
    let result = query_main(
        &mut db2,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Alice")]),
    )
    .await
    .unwrap();
    assert_eq!(result.num_rows(), 1);
    assert_eq!(result.to_rust_json()[0]["p.age"], serde_json::json!(99));

    let eve = query_main(
        &mut db2,
        TEST_QUERIES,
        "get_person",
        &params(&[("$name", "Eve")]),
    )
    .await
    .unwrap();
    assert_eq!(eve.num_rows(), 1, "concurrent insert should be preserved");
}