omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-21 02:28:07 +02:00

Author	SHA1	Message	Date
Ragnor Comerford	cdfbccbfdc	MR-794 step 2: scaffold MutationStaging accumulator + scan_with_pending Add the scaffolding for the in-memory staged-write rewire — no behavior change yet: * New crates/omnigraph/src/exec/staging.rs with MutationStaging, PendingTable, PendingMode, StagedTablePath, plus the end-of-query finalize() that issues one stage_* + commit_staged per pending table (Merge mode dedupes by id, last-write-wins). * TableStore::scan_with_pending and count_rows_with_pending helpers — Lance scan committed + DataFusion MemTable scan pending, concat. Sidesteps the Scanner::with_fragments filter-pushdown limitation documented on scan_with_staged. * Add datafusion = "52" to workspace + omnigraph-engine deps for MemTable (transitively pulled by Lance already). Engine code still uses the legacy MutationStaging shape; the rewire lands in subsequent commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:42:21 +02:00
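The Merge-mode rule in `finalize()` (dedupe by id, last-write-wins) is easy to state but easy to get backwards, so a minimal sketch of just that step follows. It is not the code in `staging.rs`. `PendingRow` and `dedupe_last_write_wins` are hypothetical names, and the real pending data lives in Arrow `RecordBatch`es rather than a `Vec` of structs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one pending row: an `id` key plus a payload.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingRow {
    pub id: String,
    pub payload: String,
}

/// Merge-mode dedupe: keep only the last write per `id`. Each id stays at the
/// position of its first appearance, so output order is deterministic.
pub fn dedupe_last_write_wins(rows: Vec<PendingRow>) -> Vec<PendingRow> {
    let mut slot_of: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<PendingRow> = Vec::new();
    for row in rows {
        let existing = slot_of.get(&row.id).copied();
        match existing {
            // A later write to the same id replaces the earlier one.
            Some(i) => out[i] = row,
            None => {
                slot_of.insert(row.id.clone(), out.len());
                out.push(row);
            }
        }
    }
    out
}
```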
Ragnor Comerford	7c09220210	MR-794 step 1: fix u32 cast + pin scan_with_staged filter limitation Two CI failures, both addressed: (1) u32/u64 type mismatch in stage_append (compile error): ds.manifest.max_fragment_id is Option<u32>, but Lance's Fragment::id and the commit-time renumbering counter in Transaction::fragments_with_ids operate on u64. Cast max_fragment_id to u64 before the arithmetic. (2) scan_with_staged_pushes_filter_through_committed_and_staged failed because Lance's stats-based fragment pruning drops uncommitted staged fragments from filtered scans — they lack the per-column statistics that committed fragments carry. With filter `age >= 30` and a staged dave (age=35), dave is silently absent from the result. scanner.use_stats(false) does not bypass this in lance 4.0.0 (verified locally). Rather than chase Lance internals further, document the limitation: - stage_merge_insert / scan_with_staged docstring updated to flag the filter contract as incomplete on staged fragments. - Test renamed to scan_with_staged_with_filter_silently_drops_staged_rows and flipped to assert the actual behavior, with a clear note pointing at the design pivot (.context/mr-794-step2-design.md §1.1) and instructions for whoever sees the assertion fail in the future. - Test also asserts that unfiltered scan_with_staged returns all rows — confirms the issue is specifically filter pushdown, not fragment scanning per se. The engine's MR-794 step 2+ design (in-memory pending-batch accumulation + DataFusion MemTable for read-your-writes) sidesteps this entirely; production code is unaffected. scan_with_staged stays on the public surface for primitive-level testing and for callers that don't need filter pushdown. All 8 staged_writes tests + 10 runs + 63 end_to_end + consistency green locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:03:27 +02:00
Ragnor Comerford	85cfffeaf8	MR-794 step 1: assign real fragment IDs to staged appends CI exposed the actual root cause behind the three staged_writes test failures: Lance's InsertBuilder::execute_uncommitted produces fragments with id=0 as a "Temporary ID" (lance-4.0.0 dataset/write.rs:1044, with the assertion at line 1712). Real IDs get assigned at commit time by Transaction::fragments_with_ids (transaction.rs:1456). Because we expose pre-commit fragments to scan_with_staged via Scanner::with_fragments, two fragments collide on id=0 in the combined list — the staged fragment with the seed fragment, or two staged fragments with each other. Lance's scanner mishandles the collision. Symptoms observed in the three failing tests: - chained_stage_appends: only 1 distinct _rowid (other fragments silently dropped) - count_rows_with_staged_matches_scan: range overflow ("Invalid read of range 0..2 for fragment 0 with 1 addressable rows") - scan_with_staged_pushes_filter: duplicate carol + missing dave (one fragment read twice, the other not at all) Fix: assign real fragment IDs in stage_append, mirroring Lance's commit-time logic. Use ds.manifest.max_fragment_id + 1 as the base, incremented by the prior_stages fragment count so chained stage_appends produce distinct IDs. The row_id_meta assignment stays — both are needed for the scanner to correctly map row IDs through the combined fragment list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:18:47 +02:00
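A minimal sketch of the ID arithmetic described above. It also includes the `u64` widening from the later cast fix (7c09220210). The function name and parameters are hypothetical. In particular, starting at 0 when `max_fragment_id` is `None` is an assumption of this sketch, not something the commit states.

```rust
/// Hypothetical sketch: pick fragment IDs for a new stage_append so they never
/// collide with committed fragments or with earlier staged appends.
/// `max_fragment_id` mirrors the manifest's `Option<u32>`; IDs are `u64` like
/// Lance's `Fragment::id`. Treating `None` as "start at 0" is an assumption.
pub fn assign_staged_fragment_ids(
    max_fragment_id: Option<u32>,
    prior_stage_fragment_counts: &[usize],
    new_fragment_count: usize,
) -> Vec<u64> {
    // Widen to u64 *before* adding, so u32::MAX + 1 cannot overflow.
    let committed_next = max_fragment_id.map(|id| u64::from(id) + 1).unwrap_or(0);
    // Skip past IDs already handed out to earlier stage_append calls.
    let already_staged: u64 = prior_stage_fragment_counts.iter().map(|&n| n as u64).sum();
    let base = committed_next + already_staged;
    (0..new_fragment_count as u64).map(|i| base + i).collect()
}
```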
Ragnor Comerford	61b3f5090b	MR-794 step 1: thread row-ID offset + add commit_staged + filter tests Three follow-ups to the staged-writes primitives, all caught by the "are we missing tests?" review: (1) Path A row-ID threading (Gap 1, real bug): stage_append now takes prior_stages: &[StagedWrite] and offsets the assigned row IDs by the sum of prior stages' physical_rows. Without this, two stage_appends against the same dataset both started at ds.manifest.next_row_id, producing fragments with overlapping _rowid ranges. This would have fired in Step 2+ on any multi-statement mutation like `insert Knows ...; insert Knows ...` (multiple appends to the same edge table — allowed under D₂′). The slice mirrors scan_with_staged's API shape; the same slice is passed to both stage and scan. Documented contract: only stage_append results in prior_stages (D₂′ guarantees this upstream). (2) commit_staged round-trip tests (Gap 2): Two tests covering stage_append + commit_staged and stage_merge_insert + commit_staged. Validate that Lance's commit-time row-ID assignment works correctly even after our pre-commit row_id_meta assignment in the append path — the two assignments diverge but neither is observed across the boundary. (3) Filter pushdown test (Gap 3): scan_with_staged with a SQL filter applies it across both committed and staged fragments. Validates the MR-794 ticket's claim that Lance's with_fragments preserves filter/vector/FTS pushdown (Lance tests test_scalar_index_respects_fragment_list etc.). Also adds chained_stage_appends_have_distinct_row_ids which directly demonstrates the Gap 1 fix by projecting _rowid and asserting no duplicates across 1 committed + 2 staged rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:59:59 +02:00
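The row-ID offset can be sketched as a pure function. `StagedAppend` is a hypothetical stand-in that keeps only `physical_rows` from `StagedWrite`. Ranges are half-open, and the test numbers loosely follow the "1 committed + 2 staged rows" scenario described above.

```rust
/// Hypothetical stand-in for the one `StagedWrite` field this sketch needs.
pub struct StagedAppend {
    pub physical_rows: u64,
}

/// First row ID for a new stage_append: the committed `next_row_id`,
/// shifted past every row already staged by earlier appends.
pub fn first_row_id(next_row_id: u64, prior_stages: &[StagedAppend]) -> u64 {
    next_row_id + prior_stages.iter().map(|s| s.physical_rows).sum::<u64>()
}

/// Half-open row-ID range [start, start + rows) for the new fragment.
pub fn row_id_range(
    next_row_id: u64,
    prior_stages: &[StagedAppend],
    rows: u64,
) -> std::ops::Range<u64> {
    let start = first_row_id(next_row_id, prior_stages);
    start..start + rows
}
```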
Ragnor Comerford	714f1f0c0a	MR-794 step 1: assign row_id_meta to stage_append fragments CI exposed a real Step 1 bug surfaced by the new staged_writes tests: stage_append → scan_with_staged fails on stable_row_id datasets with "Missing row id meta" (lance-4.0.0/src/dataset/rowids.rs:22). Root cause: InsertBuilder::execute_uncommitted produces fragments with row_id_meta = None. Lance's commit phase normally populates it via Transaction::assign_row_ids, but scan_with_staged reads the staged fragments BEFORE commit. MergeInsertBuilder::execute_uncommitted dodges this by populating row_id_meta inline (transaction.rs:1618) — that's why the two merge-side tests in tests/staged_writes.rs passed and the two append-side tests failed. The bug was always present in the primitive — PR #66 shipped it the same way. PR #66 had no tests calling stage_append, so neither CI nor the bot reviewers caught it. Step 2+ would have hit it on the first mutation that did "insert + insert with FK validation," but the failure would have looked like a MutationStaging wiring bug; localizing it here saves the next session the chase. Fix: assign row_id_meta on the cloned fragments returned in StagedWrite.new_fragments. Mirrors the relevant arm of Lance's Transaction::assign_row_ids (transaction.rs:2682) for the row_id_meta = None case. The transaction's internal fragment copy stays untouched — Lance assigns its own IDs at commit time, and the two ID assignments don't have to agree because no caller threads _rowid from the staged scan into the commit path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:49:14 +02:00
Ragnor Comerford	2fe2669017	MR-794 step 1: import arrow_array::Array in staged_writes test CI failed compiling tests/staged_writes.rs — `.len()` is on the Array trait, not on the concrete StringArray/Int32Array types. Add the trait import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:19:05 +02:00
Ragnor Comerford	4c5fa3d8b8	MR-794 step 1: address PR #67 Codex P1 — document chained-merge contract Codex flagged that combine_committed_with_staged can return duplicates on chained stage_merge_inserts: each call's MergeInsertBuilder runs against the committed view (it does not see prior staged fragments), so two staged merges whose source rows share keys both produce Operation::Update transactions whose new_fragments contain the shared row. The combined scan returns it twice. The bug is intrinsic to Lance's API: there is no public way to make MergeInsertBuilder see uncommitted fragments. Fixing the primitive itself requires either a Lance API extension or in-memory pre-merge logic, neither in scope for v1. The v1 fix is a parse-time companion (D₂′) added with the engine rewire in MR-794 step 2+: per touched table, ops must be all stage_append OR exactly one stage_merge_insert. Multi-table queries and append-chains remain safe; only chained merges on a single table are rejected. This commit: - Documents the contract on stage_merge_insert and combine_committed_with_staged so callers know the invariant the primitive relies on. - Adds tests/staged_writes.rs with four primitive-level tests: - stage_append + scan_with_staged shows committed + staged - stage_merge_insert dedupes superseded committed fragments (regression for the removed_fragment_ids fix that PR #66's `730631c` added) - count_rows_with_staged matches scan - chained stage_merge_insert with shared key documents the duplicate-row behavior; assertion pins it so a future change either preserves the contract or consciously fixes it (and updates the test) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:10:19 +02:00
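The D₂′ rule is a small per-table check. The sketch below is hypothetical: `StageOp` and `check_stage_plan` are illustrative names, and the real check runs at parse time in step 2+. It reads "all stage_append OR exactly one stage_merge_insert" strictly, so an append mixed with a merge on the same table is also rejected.

```rust
use std::collections::BTreeMap;

/// Kind of staged write against one table (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StageOp {
    Append,
    MergeInsert,
}

/// Per touched table: either every op is an append, or there is exactly one
/// merge_insert and nothing else. Reports the first offending table by name.
pub fn check_stage_plan(ops: &[(&str, StageOp)]) -> Result<(), String> {
    let mut per_table: BTreeMap<&str, Vec<StageOp>> = BTreeMap::new();
    for &(table, op) in ops {
        per_table.entry(table).or_default().push(op);
    }
    for (table, table_ops) in &per_table {
        let merges = table_ops.iter().filter(|&&op| op == StageOp::MergeInsert).count();
        let all_appends = merges == 0;
        let single_merge = merges == 1 && table_ops.len() == 1;
        if !(all_appends || single_merge) {
            return Err(format!(
                "table {table}: {} ops including {merges} merge_insert(s); chained or mixed merges are rejected",
                table_ops.len()
            ));
        }
    }
    Ok(())
}
```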
Ragnor Comerford	6dc4167291	MR-794 step 1: address PR #66 review — track removed_fragment_ids Three independent automated reviews (Cubic P1, Cursor High, Codex P1) flagged a real correctness bug in stage_merge_insert: Operation::Update returns three fields — removed_fragment_ids, updated_fragments, new_fragments — and we were collecting only the latter two into StagedWrite.new_fragments while discarding removed_fragment_ids. That breaks read-your-writes for any merge_insert that rewrites an existing fragment: scan_with_staged combines the dataset's full committed manifest with the staged new_fragments, so the original committed fragment (which the rewrite supersedes) and its rewritten version both end up in the Scanner's fragment list. Result: duplicate rows. Fix: - StagedWrite gains `removed_fragment_ids: Vec<u64>` populated from Operation::Update; empty for Operation::Append (which never supersedes existing fragments). - scan_with_staged / count_rows_with_staged take `&[StagedWrite]` instead of `&[Fragment]` so they have access to both fields. - A new `combine_committed_with_staged` helper composes the visible fragment list as `committed - removed + new`, deduping by fragment ID. Also addresses cubic's P3 doc-fab note: the StagedWrite doc comment claimed the type was "used by MutationStaging and the loader" but those callers don't exist in this PR (they're MR-794 step 2+). Reword to "defined here for later integration" so the doc doesn't lie about the current state. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 19:19:39 +02:00
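A minimal sketch of the `committed - removed + new` composition. It uses hypothetical `Frag` and `StagedWriteSketch` types in place of Lance's `Fragment` and `StagedWrite`. Two choices here are assumptions:

- When a staged fragment shares an ID with a committed one, the staged copy wins, as an in-place rewrite from `updated_fragments` would need.
- The result is ordered by fragment ID.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical fragment: an ID plus a label so tests can tell copies apart.
#[derive(Debug, Clone, PartialEq)]
pub struct Frag {
    pub id: u64,
    pub label: &'static str,
}

/// Hypothetical stand-in for the two StagedWrite fields used here.
pub struct StagedWriteSketch {
    pub removed_fragment_ids: Vec<u64>,
    pub new_fragments: Vec<Frag>,
}

pub fn combine_committed_with_staged_sketch(
    committed: &[Frag],
    staged: &[StagedWriteSketch],
) -> Vec<Frag> {
    let removed: HashSet<u64> = staged
        .iter()
        .flat_map(|s| s.removed_fragment_ids.iter().copied())
        .collect();
    let mut by_id: BTreeMap<u64, Frag> = BTreeMap::new();
    // committed - removed
    for f in committed.iter().filter(|f| !removed.contains(&f.id)) {
        by_id.insert(f.id, f.clone());
    }
    // + new: a staged fragment replaces any entry with the same ID (dedupe by ID)
    for f in staged.iter().flat_map(|s| s.new_fragments.iter()) {
        by_id.insert(f.id, f.clone());
    }
    by_id.into_values().collect()
}
```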
Ragnor Comerford	3601002440	MR-794 step 1: add staged-write primitives to TableStore Lance's distributed-write API splits "write fragment files" from "advance HEAD": write_fragments returns a Transaction with FragmentMetadata; a later CommitBuilder::execute(transaction) commits via the manifest CAS. The same shape exists for merge_insert via MergeInsertBuilder::execute_uncommitted. Scanner::with_fragments(staged) lets in-flight reads see uncommitted staged data. Adds wrappers for these primitives: - StagedWrite carries the uncommitted Transaction plus the new Fragments (extracted for read-your-writes via Scanner::with_fragments). - TableStore::stage_append wraps InsertBuilder::execute_uncommitted. - TableStore::stage_merge_insert wraps MergeInsertBuilder::execute_uncommitted. - TableStore::commit_staged wraps CommitBuilder::execute. - TableStore::scan_with_staged / count_rows_with_staged thread the staged fragments into a Scanner alongside the dataset's committed fragments. The MutationStaging integration that uses these primitives is the next step in MR-794 — it requires a coordinated rewrite of execute_insert / execute_update / execute_delete plus the load_jsonl_reader path, plus end-of-query commit logic. Doc comment on MutationStaging is updated to reference MR-794 and these primitives so the followup is well-anchored. The current MR-771 limitation in docs/runs.md ("mid-query partial failure leaves Lance HEAD ahead of __manifest") still applies until the follow-up lands; the primitives are the building blocks but not yet the fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 19:19:38 +02:00
Ragnor Comerford	1a906403cb	MR-771: address cubic comment — drop vacuous __run__ check in cancel test cubic correctly flagged that the assertion `!branches_after.iter().any(|b| b.starts_with("__run__"))` is vacuous because `branch_list()` already filters `__run__*` via `is_internal_system_branch`. The real structural property (no `__run__` branches can ever be created) is enforced by MR-771's deletion of `begin_run` etc. — that's a build-time invariant, not a runtime one. Drop the vacuous assertion; document why. The remaining checks (public branch list unchanged + `_graph_runs.lance` never reappears) cover the actual cancel-safety properties. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:31:49 +02:00
Ragnor Comerford	a7109d5fba	MR-771: address PR review feedback Three fixes from automated PR review on #65: 1. Internal-branch guard in mutation/load (Cursor Bugbot, Medium). Pre-MR-771 the begin_run path called ensure_public_branch_ref; the direct-publish replacements only normalized the name. A caller passing __run__* or __schema_apply_lock__ verbatim could write directly to a system branch. Re-add the explicit guard at the public write boundary in mutate_with_current_actor and load. 2. Panic-safe coordinator restoration (Cursor Bugbot, High). The previous swap-and-restore pattern would skip restore_coordinator if execute_named_mutation panicked, leaving the handle pinned to the wrong branch indefinitely. Replace with a CoordinatorRestoreGuard RAII type that captures the previous coordinator on swap and restores it in Drop. 3. Flaky cancel-safety test (cubic, P2). tests/runs.rs::cancelled_mutation_future_leaves_no_state asserted manifest version equality after handle.abort(), but abort races the spawned task. Re-frame around what actually defines cancel safety: no __run__* branches, no _graph_runs.lance, no synthesized public branches. The fourth comment (Codex P1: branch_delete losing its in-flight write barrier) is bigger in scope — fits in the MR-794 storage-trait staging story rather than a hotfix here. Tracked there. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:17:00 +02:00
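The RAII pattern behind `CoordinatorRestoreGuard` can be sketched with the standard library alone. `Handle` and its `String` coordinator are hypothetical simplifications. The point is that `Drop` runs both on normal scope exit and while unwinding from a panic, so the restore can no longer be skipped.

```rust
/// Hypothetical stand-in for the handle; the coordinator is modeled as the
/// name of the branch it is pinned to.
pub struct Handle {
    pub coordinator: String,
}

/// Swaps a coordinator in on construction and restores the previous one in
/// `Drop`, which also runs when the guarded code panics and unwinds.
pub struct CoordinatorRestoreGuard<'a> {
    handle: &'a mut Handle,
    previous: Option<String>,
}

impl<'a> CoordinatorRestoreGuard<'a> {
    pub fn swap(handle: &'a mut Handle, next: String) -> Self {
        let previous = std::mem::replace(&mut handle.coordinator, next);
        CoordinatorRestoreGuard { handle, previous: Some(previous) }
    }

    /// Access the handle while the swapped-in coordinator is active.
    pub fn handle(&mut self) -> &mut Handle {
        &mut *self.handle
    }
}

impl Drop for CoordinatorRestoreGuard<'_> {
    fn drop(&mut self) {
        if let Some(previous) = self.previous.take() {
            self.handle.coordinator = previous;
        }
    }
}
```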
Ragnor Comerford	35be20cb05	MR-771: demote Run to direct-publish via expected_table_versions CAS mutate_as and load now write directly to target tables and call the publisher once at the end with per-table expected versions; the Run state machine, _graph_runs.lance writers, __run__ staging branches, and server /runs/* endpoints are removed. Multi-statement mutations remain atomic at the manifest level via an in-memory MutationStaging accumulator that gives read-your-writes within a query and a single publish at the end. Concurrent-writer conflicts surface as ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the old DivergentUpdate merge shape. Documents one known limitation in docs/runs.md: a multi-statement mid-query failure where op-N writes a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the manifest until a follow-up introduces per-table Lance branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 08:52:50 +02:00
Claude	243c0c3464	Add internal-schema versioning + auto-migration for __manifest The on-disk shape of `__manifest` is reconciled with the binary via a single stamp + dispatcher in `db/manifest/migrations.rs`: - `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes. - The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest dataset's schema-level metadata (Lance `update_schema_metadata`). - `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from the on-disk stamp until it matches the binary, then returns. Idempotent. - `init_manifest_repo` stamps the current version at creation; the publisher's open-for-write path runs pending migrations before reading state. Reads stay side-effect-free. - Forward-version protection: a stamp higher than the binary's known version triggers a clear "upgrade omnigraph first" error so an old binary cannot clobber a newer schema. Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step: the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id` that engages Lance's row-level CAS at commit time. New repos created via `init` are stamped at v2 immediately and don't need migration. Adding a future on-disk shape change is one constant bump, one match arm in `migrate_internal_schema`, and one test — no new branches in unrelated code paths. Code outside the migration module never inspects the stamp. New tests in `manifest/tests.rs`: - `test_init_stamps_internal_schema_version` - `test_publish_migrates_pre_stamp_manifest_to_current_version` - `test_publish_rejects_manifest_stamped_at_future_version` Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated per the AGENTS.md maintenance contract.	2026-04-29 11:44:14 +00:00
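A minimal sketch of the stamp-and-dispatch loop, with a hypothetical in-memory `ManifestSketch` standing in for the Lance dataset and its schema metadata. Treating a missing stamp as v1 is an assumption, chosen to match the "self-heals pre-MR-766 deployments" description. The real code persists the stamp via Lance's `update_schema_metadata`.

```rust
/// Shape this (hypothetical) binary writes, matching the constant named above.
pub const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 2;

/// In-memory stand-in for `__manifest`: the schema-metadata stamp plus one
/// flag recording whether the v1 -> v2 annotation has been applied.
pub struct ManifestSketch {
    pub stamp: Option<u32>,
    pub object_id_is_primary_key: bool,
}

pub fn migrate_internal_schema(ds: &mut ManifestSketch) -> Result<(), String> {
    // Assumption: no stamp means a pre-stamp (v1) deployment.
    let mut version = ds.stamp.unwrap_or(1);
    // Forward-version protection: never let an old binary touch a newer shape.
    if version > INTERNAL_MANIFEST_SCHEMA_VERSION {
        return Err(format!(
            "manifest internal schema v{} is newer than this binary (v{}); upgrade omnigraph first",
            version, INTERNAL_MANIFEST_SCHEMA_VERSION
        ));
    }
    // Walk forward one step at a time until the stamp matches the binary.
    while version < INTERNAL_MANIFEST_SCHEMA_VERSION {
        match version {
            1 => ds.object_id_is_primary_key = true, // v1 -> v2: unenforced-PK annotation
            other => return Err(format!("no migration step from v{}", other)),
        }
        version += 1;
        ds.stamp = Some(version); // record progress after each step
    }
    Ok(())
}
```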
Claude	df0e158190	Add per-table expected-version OCC to ManifestBatchPublisher (MR-766) Layered approach selected by the CAS-granularity investigation (.context/merge-insert-cas-granularity.md): - Annotate __manifest.object_id with `lance-schema:unenforced-primary-key`, enabling Lance row-level CAS via the bloom-filter conflict resolver. Closes a latent silent-duplicate bug where two concurrent publishes of the same `version:T@v=N+1` row could both land in disjoint fragments. - Extend `ManifestBatchPublisher::publish` with `expected_table_versions: &HashMap<String, u64>`. Empty map preserves today's behavior; populated map asserts the manifest's latest non-tombstoned version per table matches the caller's view. Mismatches surface as a typed `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }` so callers can match without parsing strings. - Set `merge_builder.conflict_retries(0)` so Lance's transparent rebase cannot silently break the OCC contract; retries are owned by the publisher loop, where each attempt re-runs `load_publish_state` and the expected-version pre-check. - Surface `ManifestCoordinator::commit_with_expected` for the callers that need strict OCC (the run-demotion ticket); existing `commit` and `commit_changes` paths are unaffected. New tests in `manifest/tests.rs` cover: matching expected versions, stale expected with typed details, drift on an untouched expected table, unknown expected table (actual=0), and the headline case of two concurrent publishes with overlapping expected versions where exactly one succeeds.	2026-04-29 00:34:20 +00:00
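The expected-version pre-check reduces to a map comparison. In this sketch the names are hypothetical, and the table keys used in the test (such as `node:Person`) are illustrative, not the manifest's real key format. `latest` plays the role of the state returned by `load_publish_state`, and an unknown table reports `actual = 0` as described above.

```rust
use std::collections::HashMap;

/// Typed conflict, modeled on the `ExpectedVersionMismatch` variant named above.
#[derive(Debug, PartialEq)]
pub enum ManifestConflictSketch {
    ExpectedVersionMismatch { table_key: String, expected: u64, actual: u64 },
}

/// `latest`: the manifest's latest non-tombstoned version per table.
/// `expected`: the caller's view. An empty `expected` map always passes.
pub fn check_expected_versions(
    latest: &HashMap<String, u64>,
    expected: &HashMap<String, u64>,
) -> Result<(), ManifestConflictSketch> {
    let mut keys: Vec<&String> = expected.keys().collect();
    keys.sort(); // deterministic: report the first mismatch by table key
    for table_key in keys {
        let want = expected[table_key];
        // A table the manifest has never seen reports actual = 0.
        let actual = latest.get(table_key).copied().unwrap_or(0);
        if actual != want {
            return Err(ManifestConflictSketch::ExpectedVersionMismatch {
                table_key: table_key.clone(),
                expected: want,
                actual,
            });
        }
    }
    Ok(())
}
```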
Andrew Altshuler	56b6319197	Enforce schema validators on every write path (#59) Several validators were defined but only called from a subset of write paths, so writes that violated @unique, @range, @check, enum, or @cardinality constraints could silently succeed and corrupt data. Adds two new helpers in loader/mod.rs: - validate_enum_constraints — batch-level enum check, scans Arrow string columns (and list-of-string columns) for values outside the declared set - enforce_unique_constraints_intra_batch — single-batch duplicate detection over named columns; partial enforcement (does not check against committed rows yet — cross-batch enforcement is a separate effort) Wires the validators into: - load_jsonl_reader nodes (alongside the existing validate_value_constraints call) and edges (which had no enum or unique check at all) - exec/mutation.rs node insert, edge insert, and update paths - mutation edge insert now also calls validate_edge_cardinality after the row lands but before the manifest commit, matching the loader's Phase 3 behavior. A new tests/validators.rs suite asserts rejection on every entry path for invalid enum values, @range violations, intra-batch @unique duplicates, and edge @card excesses. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 04:51:10 +03:00
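The two batch-level checks can be sketched over plain string columns. These are hypothetical helpers, not the Arrow-based `validate_enum_constraints` or `enforce_unique_constraints_intra_batch`. Skipping nulls in the enum check is an assumption. Like the real helper, the uniqueness check sees only the current batch, not committed rows.

```rust
use std::collections::HashSet;

/// First value in `column` that appears more than once within this batch.
pub fn first_intra_batch_duplicate<'a>(column: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    column.iter().copied().find(|v| !seen.insert(*v))
}

/// First non-null value in `column` outside the declared enum set `allowed`.
pub fn first_invalid_enum<'a>(column: &[Option<&'a str>], allowed: &[&str]) -> Option<&'a str> {
    column.iter().flatten().copied().find(|v| !allowed.contains(v))
}
```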
Andrew Altshuler	f75b941a9e	Make schema apply atomic across crashes (#57) Schema apply previously committed the manifest before writing the schema source and IR contract files. A crash in that window left the manifest pointing at the new schema while _schema.pg, _schema.ir.json, and __schema_state.json still reflected the old one — a silent inconsistency that subsequent reads hit as type errors. Reorders the apply: write to staging filenames first, commit the manifest, then atomically rename staging → final. On open, a recovery sweep reconciles any leftover staging files against the manifest's table set: pre-commit crashes get the staging files deleted, post-commit crashes get the renames completed (idempotent — handles partial renames). Property-only migrations where both schemas imply the same table set return an operator-actionable error rather than guessing. Adds rename_text + delete to StorageAdapter (atomic on local FS via tokio::fs::rename; copy + delete on S3 — recovery is tolerant of the non-atomic case). Failpoints test coverage at both crash boundaries plus a partial-rename scenario. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:21:00 +03:00
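The recovery decision can be sketched as a pure function over three table sets. It assumes the sets have already been derived from the manifest and from the current and staged schema files. `plan_recovery` and `RecoveryAction` are hypothetical, and the file operations themselves (deleting staging files, or idempotently completing partial renames) are left out.

```rust
use std::collections::BTreeSet;

/// What the open-time sweep should do with leftover staging files.
#[derive(Debug, PartialEq)]
pub enum RecoveryAction {
    /// Crash before the manifest commit: discard the staging files.
    DeleteStaging,
    /// Crash after the manifest commit: finish the staging -> final renames.
    CompleteRenames,
}

pub fn plan_recovery(
    manifest_tables: &BTreeSet<String>,
    current_schema_tables: &BTreeSet<String>,
    staged_schema_tables: &BTreeSet<String>,
) -> Result<RecoveryAction, String> {
    // Property-only migration: the table set cannot reveal which side of the
    // commit the crash hit, so refuse to guess.
    if current_schema_tables == staged_schema_tables {
        return Err("staged and current schemas imply the same table set; operator must resolve".to_string());
    }
    if manifest_tables == staged_schema_tables {
        Ok(RecoveryAction::CompleteRenames)
    } else if manifest_tables == current_schema_tables {
        Ok(RecoveryAction::DeleteStaging)
    } else {
        Err("manifest table set matches neither schema".to_string())
    }
}
```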
Ragnor Comerford	189caf893c	Merge pull request #47 from ModernRelay/perf/expand-dense-ids perf(expand): dense u32 ids end-to-end (follow-up to #45)	2026-04-25 23:54:03 +02:00
Andrew Altshuler	74eb5a5380	Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46) * Parallel per-type load writes + omnigraph optimize/cleanup CLI ## MR-677.3 — parallel per-type load writes The load path already groups records into one RecordBatch per type and makes one Lance commit per table (loader::mod.rs:249-..), but those commits ran sequentially. Wrap node and edge write loops in `futures::stream::buffered(N)` against a new helper `write_batches_concurrently`. Concurrency tunable via `OMNIGRAPH_LOAD_CONCURRENCY` (default 8). ## MR-676 — `omnigraph optimize` and `omnigraph cleanup` New CLI subcommands that walk every node + edge table in the repo: - `omnigraph optimize <uri>` — runs Lance `compact_files` on each table to merge small fragments into fewer larger ones. - `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` — runs Lance `cleanup_old_versions` to prune historical manifests + unique fragments. Requires `--confirm` because it's destructive. Supports both count-based and time-based retention (or both AND'd together). Time uses chrono `DateTime<Utc>` (added as a workspace dep, default-features off). Both commands run their per-table loops in parallel (8-way bounded, `OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested against the 114-table prod graph: optimize went 7m15s sequential → 1m28s parallel. cleanup --keep 1 removed 137 historical versions across 114 tables in 1m57s without disrupting `/healthz` or query responses. Public API on `Omnigraph`: `pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>` and `pub async fn cleanup(&mut self, opts: CleanupPolicyOptions) -> Result<Vec<TableCleanupStats>>`. All 10 existing loader tests still pass. Closes MR-676. Partially addresses MR-677 (the .3 — parallel by type — piece; MR-677.1 is for the `omnigraph embed` path, not load, since load doesn't call Gemini directly. .2 was already in place). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate openapi.json --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2026-04-25 14:22:14 +03:00
Ragnor Comerford	53d7f47909	Pass dense u32 ids through expand instead of round-tripping via String. BFS now emits Vec<u32> dense ids directly with HashSet<u32> per-source dedup. Only the deduped set is stringified for Lance's IN-list. The post-hydrate alignment uses a dense-indexed Vec<Option<u32>> instead of HashMap<&str, usize>, giving O(1) lookup without repeated string hashing. End-to-end on the bench_expand harness (release, M-series), per query as baseline → after (speedup): 1k hop3: 460.2 ms → 23.7 ms (19x); 10k hop2: 4.21 s → 139.9 ms (30x); 10k hop3: 40.59 s → 898.5 ms (45x); 30k hop2: 11.71 s → 490.2 ms (24x); 30k hop3: 197.38 s → 3.22 s (61x). The cost lived in stringifying every (src,dst) pair and re-hashing the strings during alignment; once dense ids stay dense, the BFS inner loop and the final fan-out both collapse to integer ops. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:07:25 +02:00
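A sketch of the dedupe-then-fan-out flow from #45, combined with the dense alignment described here. `expand_align` is hypothetical. `hydrate` is a closure standing in for the Lance scan, and it may return rows in any order. A `String` stands in for a hydrated row, and `num_dense_ids` must exceed every dense id.

```rust
use std::collections::HashSet;

/// `pairs` are (src_row, dst_dense_id) pairs from BFS. `hydrate` receives the
/// deduplicated ids and returns (dense_id, row) in arbitrary order.
pub fn expand_align(
    pairs: &[(usize, u32)],
    num_dense_ids: usize,
    hydrate: impl Fn(&[u32]) -> Vec<(u32, String)>,
) -> Vec<(usize, String)> {
    // 1. Dedupe destination ids so the IN-list has one entry per node.
    let mut seen = HashSet::new();
    let unique: Vec<u32> = pairs.iter().map(|&(_, d)| d).filter(|d| seen.insert(*d)).collect();
    // 2. One scan over only the unique ids.
    let rows = hydrate(unique.as_slice());
    // 3. Dense-indexed lookup: slot[dense_id] = position in `rows` (no string hashing).
    let mut slot: Vec<Option<u32>> = vec![None; num_dense_ids];
    for (pos, (id, _)) in rows.iter().enumerate() {
        slot[*id as usize] = Some(pos as u32);
    }
    // 4. Fan results back out to every original (src, dst) pair, in input order.
    pairs
        .iter()
        .filter_map(|&(src, d)| slot[d as usize].map(|pos| (src, rows[pos as usize].1.clone())))
        .collect()
}
```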
andrew	628bc2e607	Clean up bench_expand example Remove vestigial code left from removed hasher variants: unused BuildHasherDefault import, PhantomData suppression line, orphan planning comments for Variant C/E. Also drop an unused `mut` on the PRNG closure binding. No behavior change; compiles warning-free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 00:59:21 +03:00
Ragnor Comerford	d8e0bfeb22	Dedupe dst ids before hydrating nodes in execute_expand (#45) The BFS in execute_expand emits one (src_idx, dst_id) pair per edge, so dst_id_list contains heavy duplication when multi-hop traversals revisit the same destination nodes. hydrate_nodes then built an "id IN ('a', 'b', ...)" filter from the full list, passing it verbatim to Lance. On a 30k-node Person graph, a 3-hop query produced a 15.4M-entry IN-list against a 30k-row target — 512x more entries than unique ids. Deduplicate before the Lance scan; the post-hydrate alignment HashMap already fans results back out to the original (src, dst) pairs, so output is bit-identical. Bench numbers (crates/omnigraph/examples/bench_expand.rs, min of 2-3 runs, release build), per query as before → after (speedup): 1k hop3: 460 ms → 28 ms (16x); 10k hop2: 4.21 s → 188 ms (22x); 10k hop3: 40.59 s → 1.30 s (31x); 30k hop2: 11.71 s → 678 ms (17x); 30k hop3: 197.38 s → 4.86 s (41x). All existing omnigraph-engine tests pass (72/72, 0 failures). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:56:18 +03:00
Andrew Altshuler	8649b2084f	Prepare v0.3.0 release (#44) * Prepare v0.3.0 release Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate openapi.json * ci: retrigger CI on latest openapi.json --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2026-04-21 19:11:34 +03:00
andrew	2df578eab8	Delete __run__ branches on every terminal state (MR-674) Run branches are transactional scaffolding — the durable audit lives on RunRecord. Invariant: every terminal state (Published, Aborted, Failed) deletes the __run__ branch. - Add `terminate_run` helper: appends terminal RunRecord, then deletes the run branch. Delete errors are swallowed — the record is authoritative; `cleanup_terminal_run_branches_for_target` retries on later `branch_delete` of the target. - Wire into `publish_run_as`, `abort_run`, `fail_run`. - Include `Failed` in the cleanup filter (was `Published | Aborted` only) for legacy-repo GC during branch_delete. - Cleanup now checks `coordinator.all_branches()` first to skip branches already deleted by a concurrent handle — avoids Lance NotFound when two handles publish/clean up independently. - Drop `Failed` from `ensure_branch_delete_safe` — post-fix, Failed means the branch is already gone, so there's no reason to block target deletion (MR-674 "Downstream effects"). Tests: - New regression: `run_branches_do_not_accumulate_across_repeated_loads` — 10 loads + 1 abort → `branch_list() == ["main"]`. - New `failed_load_deletes_run_branch` asserts Failed path cleans up. - Rename `abort_run_keeps_target_unchanged_and_preserves_hidden_branch_for_inspection` → `abort_run_leaves_target_unchanged_and_deletes_run_branch`, invert the hidden-branch assertion. - Rewrite `public_{load,mutation}_preserves_staged_edge_ids_on_publish` to capture staged IDs before publish instead of inspecting the run branch after (branch is gone now). - Update MR-670 regression test to assert the run branch is absent after publish. Deferred to follow-up: `--keep-run-branch` debug flag, `omnigraph run gc`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 14:15:39 +03:00
andrew	f05ea2c7c3	Extract public-API tests from omnigraph.rs to integration tests The inline `mod tests` in crates/omnigraph/src/db/omnigraph.rs had grown to ~620 lines, mixing tests that need crate-private access with tests that only exercise the public API. Splits the latter out. - tests/lifecycle.rs: 10 init/open/snapshot/drift tests - tests/schema_apply.rs: 5 plan/apply tests - omnigraph.rs: 10 tests remain inline because they use db.coordinator, db.table_store(), ManifestCoordinator, SCHEMA_APPLY_LOCK_BRANCH, or is_internal_run_branch — all crate-private and intentionally kept so. No behavior change. Zero semantic edits to the tests themselves beyond replacing db.snapshot() (pub(crate)) with snapshot_main helper at integration-test boundaries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:09:34 +03:00
andrew	26012d156e	Filter internal run branches in schema_apply (MR-670) Published `__run__` branches are intentionally retained after publish for post-publish inspection (runs.rs tests verify edge IDs match between run branch and main). `apply_schema` was counting them as "non-main" branches and refusing to run — permanently blocking schema evolution after any load or change, with no CLI recovery path (`branch_delete` rejects internal refs, `run abort` rejects Published runs). Fix: `apply_schema` filters `is_internal_system_branch` (covers both `__run__*` and the schema-apply lock) rather than just the lock. Run branches remain available for inspection. Regression: test_apply_schema_succeeds_after_load_creates_published_run_branch pins that schema apply succeeds after a load even while the run branch is still present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:32:20 +03:00
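The filter described above amounts to a name predicate. The constant values are assumptions: `__run__` comes from the branch prefix used throughout this log, and `__schema_apply_lock__` from the guard mentioned in a7109d5fba. `non_main_public_branches` is a hypothetical helper.

```rust
/// Assumed values; the real constants live in the crate.
pub const RUN_BRANCH_PREFIX: &str = "__run__";
pub const SCHEMA_APPLY_LOCK_BRANCH: &str = "__schema_apply_lock__";

/// Covers both internal kinds: run branches and the schema-apply lock.
pub fn is_internal_system_branch(name: &str) -> bool {
    name.starts_with(RUN_BRANCH_PREFIX) || name == SCHEMA_APPLY_LOCK_BRANCH
}

/// Branches that should block schema apply: neither main nor internal.
pub fn non_main_public_branches<'a>(branches: &[&'a str]) -> Vec<&'a str> {
    branches
        .iter()
        .copied()
        .filter(|b| *b != "main" && !is_internal_system_branch(b))
        .collect()
}
```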
andrew	33bdab1fcb	Prepare v0.2.2 release	2026-04-14 20:13:00 +03:00
andrew	3d74cbfc20	Prepare v0.2.1 release	2026-04-14 19:19:00 +03:00
Ragnor Comerford	063be3ddc7	Merge pull request #16 from ModernRelay/tin-epoch Fix join alignment for traversal-introduced bindings	2026-04-13 16:54:52 +02:00
Ragnor Comerford	6e43ceac08	Add comprehensive tests from morphological matrix analysis Unit tests covering gaps identified by systematic matrix of: topology (fan-out, fan-in, cycle) × deferral × filter type × direction. New unit tests: - fan-out: one root fans to two deferred destinations via different edges - fan-in: two sources converge on one destination via reverse expand - cycle: deferred binding + genuine cycle-closing on return edge - multiple filters on single deferred binding (name + age) - param filter on deferred binding (IRExpr::Param in dst_filters) - negation with inner binding (documents current NodeScan+cycle-close behavior) New integration tests: - fan-out projection (friend × company cross-product per source) - deferred filter matching nothing (empty result propagation) - negation with inner destination binding filter Also: guard anti-join fast path against non-empty dst_filters. The bulk CSR existence check only tests neighbor existence, not destination properties — it must fall back to the slow path when dst_filters are present to avoid false negatives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:31:08 +02:00
Ragnor Comerford	853691c70e	Fix join alignment for traversal-introduced bindings with Lance filter pushdown The IR lowering previously emitted independent NodeScans for every binding in a match clause, even when bindings were connected by traversals. This created O(N×M) cross-joins followed by cycle-closing filters — correct but extremely slow for large datasets. Two changes fix this by design: 1. Deferred bindings — When multiple bindings are connected by traversals, only the first-declared binding gets a NodeScan. The rest are introduced by Expand operations, eliminating cross-joins entirely. 2. Filter fusion into Expand — Deferred binding filters are attached directly to IROp::Expand (new `dst_filters` field) and pushed into Lance SQL during hydrate_nodes(), so the storage layer skips non-matching rows. Non-pushable filters (list-contains, FTS) fall back to in-memory application after hconcat. For a query like: match { $p: Person $p worksAt $c $c: Company { name: "Acme" } } Old plan: NodeScan($p) → NodeScan($c) → cross-join → Expand(__temp) → cycle-close New plan: NodeScan($p) → Expand($p→$c, Lance SQL: id IN (...) AND name='Acme') Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:10:50 +02:00
Claude	37b7a94eb7	Fix nullable query parameters: accept omission and null for `?` params Parameters declared with `?` (e.g. `$changelogUrl: String?`) now correctly accept omission or explicit null in JSON input instead of requiring empty strings as a workaround. Adds `Literal::Null` variant and threads it through parameter parsing, type-checking, and Arrow array conversion. https://claude.ai/code/session_014oGFKL7EVg1b2cyPgt9Gne	2026-04-13 08:43:48 +00:00
Ragnor Comerford	c5a88cacb5	Merge pull request #6 from ModernRelay/claude/omnigraph-aggregates-a53rG Implement aggregate functions with GROUP BY support	2026-04-13 10:26:07 +02:00
Claude	351610d18c	Implement aggregate execution with wide-batch model Add runtime support for aggregate functions (count, sum, avg, min, max) with GROUP BY semantics, built on a single wide RecordBatch that eliminates correlation tracking by construction. Execution engine (exec/query.rs): - Replace HashMap<String, RecordBatch> with Option<RecordBatch> where columns are prefixed as <variable>.<property> - NodeScan prefixes columns and cross-joins with existing batch - Expand collects (src_row, dst_id) pairs, takes wide batch rows, appends prefixed destination columns via hconcat - Filter applies single mask to entire wide batch - AntiJoin: fast-path returns BooleanArray mask; slow-path slices one row for inner pipeline execution Projection engine (exec/projection.rs): - aggregate_return groups rows by non-aggregate key columns using length-prefixed string encoding, computes per-group aggregates - SUM accumulates into f64 to avoid integer overflow - MIN/MAX support both numeric and string types - Empty input returns count=0, others=null Compiler (typecheck.rs): - T8: split MIN/MAX from SUM/AVG — allow string arguments - T9: non-aggregate expressions in aggregate queries must be property accesses or variables - SUM type inference returns Float64 (matching runtime) Tests: 8 new integration tests covering grouped count, global count, sum/avg/min/max per company, aggregate+order+limit, string min/max, multi-hop aggregates, and edge cases. https://claude.ai/code/session_019o5NRyYomgETFyd7hpiLey	2026-04-12 20:59:13 +00:00
andrew	95e89a343e	Genericize bootstrap context fixture	2026-04-12 22:02:25 +03:00
andrew	5daeae7571	Prepare v0.2.0 release	2026-04-12 20:35:34 +03:00
andrew	e9a511e38f	Refactor query execution modules	2026-04-12 18:18:57 +03:00
andrew	3741900611	Refactor omnigraph db module layout	2026-04-12 17:07:24 +03:00
andrew	6655cd65d5	Harden schema apply against write races	2026-04-12 15:19:48 +03:00
Andrew Altshuler	af9a44e879	Merge pull request #4 from ModernRelay/claude/omnigraph-multi-statement-mutations-DxWSA Support multi-statement mutations (insert + edge in one query)	2026-04-12 13:58:26 +03:00
andrew	92fa3189f7	Add schema apply command and policy support	2026-04-12 04:01:14 +03:00
Claude	d10f78530f	Support multi-statement mutations (insert + edge in one query) Allow mutation queries to contain multiple sequential statements that execute atomically within a single transactional run. This enables patterns like inserting a node and its edges in one query: query add_and_link($name: String, $age: I32, $friend: String) { insert Person { name: $name, age: $age } insert Knows { from: $name, to: $friend } } Changes span the full compiler-to-execution pipeline: - Grammar: mutation_body = { mutation_stmt+ } - AST: QueryDecl.mutations: Vec<Mutation> - IR: MutationIR.ops: Vec<MutationOpIR> - Execution: loop over ops, accumulate affected counts Cross-statement visibility works because each statement's commit_updates advances the manifest state, so subsequent statements see prior writes. Atomicity comes from the existing run mechanism (begin_run/publish_run). https://claude.ai/code/session_01E4VG2WXrZW8aeXFiqr8NwF	2026-04-11 20:27:51 +00:00
andrew	4b058b9813	Fix CLI ergonomics and stream export output	2026-04-11 19:01:48 +03:00
andrew	40ed575e7e	Set public release version to 0.1.0	2026-04-11 05:33:04 +03:00
andrew	338289656a	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00

44 commits