omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Ragnor Comerford	61b3f5090b	MR-794 step 1: thread row-ID offset + add commit_staged + filter tests Three follow-ups to the staged-writes primitives, all caught by the "are we missing tests?" review: (1) Path A row-ID threading (Gap 1, real bug): stage_append now takes prior_stages: &[StagedWrite] and offsets the assigned row IDs by the sum of prior stages' physical_rows. Without this, two stage_appends against the same dataset both started at ds.manifest.next_row_id, producing fragments with overlapping _rowid ranges. This would have fired in Step 2+ on any multi-statement mutation like `insert Knows ...; insert Knows ...` (multiple appends to the same edge table — allowed under D₂′). The slice mirrors scan_with_staged's API shape; the same slice is passed to both stage and scan. Documented contract: only stage_append results in prior_stages (D₂′ guarantees this upstream). (2) commit_staged round-trip tests (Gap 2): Two tests covering stage_append + commit_staged and stage_merge_insert + commit_staged. Validate that Lance's commit-time row-ID assignment works correctly even after our pre-commit row_id_meta assignment in the append path — the two assignments diverge but neither is observed across the boundary. (3) Filter pushdown test (Gap 3): scan_with_staged with a SQL filter applies it across both committed and staged fragments. Validates the MR-794 ticket's claim that Lance's with_fragments preserves filter/vector/FTS pushdown (Lance tests test_scalar_index_respects_fragment_list etc.). Also adds chained_stage_appends_have_distinct_row_ids which directly demonstrates the Gap 1 fix by projecting _rowid and asserting no duplicates across 1 committed + 2 staged rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:59:59 +02:00
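The row-ID offset in fix (1) can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes a stripped-down `StagedWrite` that records only `physical_rows`; the real type also carries the Lance transaction and fragments. The helper names `first_row_id` and `stage_append_row_ids` are hypothetical.

```rust
use std::ops::Range;

/// Simplified stand-in for StagedWrite: only the row count matters here.
#[derive(Debug, Clone)]
pub struct StagedWrite {
    pub physical_rows: u64,
}

/// First row ID a new stage_append may assign: the dataset's committed
/// next_row_id, shifted past every row already claimed by prior staged appends.
pub fn first_row_id(next_row_id: u64, prior_stages: &[StagedWrite]) -> u64 {
    let claimed: u64 = prior_stages.iter().map(|s| s.physical_rows).sum();
    next_row_id + claimed
}

/// Half-open row-ID range for an append of `rows` rows on top of `prior_stages`.
pub fn stage_append_row_ids(next_row_id: u64, prior_stages: &[StagedWrite], rows: u64) -> Range<u64> {
    let start = first_row_id(next_row_id, prior_stages);
    start..start + rows
}
```

With one committed row (`next_row_id = 1`), the first staged append gets `1..2` and the second gets `2..3`. Without the offset, both calls would return `1..2`, which is the overlapping `_rowid` case the commit describes.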
Ragnor Comerford	714f1f0c0a	MR-794 step 1: assign row_id_meta to stage_append fragments CI exposed a real Step 1 bug surfaced by the new staged_writes tests: stage_append → scan_with_staged fails on stable_row_id datasets with "Missing row id meta" (lance-4.0.0/src/dataset/rowids.rs:22). Root cause: InsertBuilder::execute_uncommitted produces fragments with row_id_meta = None. Lance's commit phase normally populates it via Transaction::assign_row_ids, but scan_with_staged reads the staged fragments BEFORE commit. MergeInsertBuilder::execute_uncommitted dodges this by populating row_id_meta inline (transaction.rs:1618) — that's why the two merge-side tests in tests/staged_writes.rs passed and the two append-side tests failed. The bug was always present in the primitive — PR #66 shipped it the same way. PR #66 had no tests calling stage_append, so neither CI nor the bot reviewers caught it. Step 2+ would have hit it on the first mutation that did "insert + insert with FK validation," but the failure would have looked like a MutationStaging wiring bug; localizing it here saves the next session the chase. Fix: assign row_id_meta on the cloned fragments returned in StagedWrite.new_fragments. Mirrors the relevant arm of Lance's Transaction::assign_row_ids (transaction.rs:2682) for the row_id_meta = None case. The transaction's internal fragment copy stays untouched — Lance assigns its own IDs at commit time, and the two ID assignments don't have to agree because no caller threads _rowid from the staged scan into the commit path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:49:14 +02:00
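Below is a simplified sketch of the fix. It assumes a toy `Fragment` whose `row_id_meta` is a plain `Range<u64>`. This is not Lance's `Transaction::assign_row_ids`, and Lance's real `RowIdMeta` is an encoded row-ID sequence rather than a range. The helper `assign_missing_row_ids` is hypothetical. It fills only fragments whose metadata is `None`, which corresponds to the single arm the commit says it mirrors. It is applied to a clone, so the transaction's own copy is left alone.

```rust
use std::ops::Range;

/// Toy fragment: Lance's real Fragment stores an encoded RowIdMeta, not a range.
#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub id: u64,
    pub physical_rows: u64,
    pub row_id_meta: Option<Range<u64>>,
}

/// Give every fragment that lacks row IDs a contiguous range starting at
/// `next_row_id`; fragments that already have row_id_meta are left untouched.
/// Returns the next unused row ID.
pub fn assign_missing_row_ids(fragments: &mut [Fragment], mut next_row_id: u64) -> u64 {
    for frag in fragments.iter_mut() {
        if frag.row_id_meta.is_none() {
            let end = next_row_id + frag.physical_rows;
            frag.row_id_meta = Some(next_row_id..end);
            next_row_id = end;
        }
    }
    next_row_id
}
```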
Ragnor Comerford	2fe2669017	MR-794 step 1: import arrow_array::Array in staged_writes test CI failed compiling tests/staged_writes.rs — `.len()` is on the Array trait, not on the concrete StringArray/Int32Array types. Add the trait import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:19:05 +02:00
Ragnor Comerford	4c5fa3d8b8	MR-794 step 1: address PR #67 Codex P1 — document chained-merge contract Codex flagged that combine_committed_with_staged can return duplicates on chained stage_merge_inserts: each call's MergeInsertBuilder runs against the committed view (it does not see prior staged fragments), so two staged merges whose source rows share keys both produce Operation::Update transactions whose new_fragments contain the shared row. The combined scan returns it twice. The bug is intrinsic to Lance's API: there is no public way to make MergeInsertBuilder see uncommitted fragments. Fixing the primitive itself requires either a Lance API extension or in-memory pre-merge logic, neither in scope for v1. The v1 fix is a parse-time companion (D₂′) added with the engine rewire in MR-794 step 2+: per touched table, ops must be all stage_append OR exactly one stage_merge_insert. Multi-table queries and append-chains remain safe; only chained merges on a single table are rejected. This commit: - Documents the contract on stage_merge_insert and combine_committed_with_staged so callers know the invariant the primitive relies on. - Adds tests/staged_writes.rs with four primitive-level tests: - stage_append + scan_with_staged shows committed + staged - stage_merge_insert dedupes superseded committed fragments (regression for the removed_fragment_ids fix that PR #66's `730631c` added) - count_rows_with_staged matches scan - chained stage_merge_insert with shared key documents the duplicate-row behavior; assertion pins it so a future change either preserves the contract or consciously fixes it (and updates the test) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 23:10:19 +02:00
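The D₂′ rule can be expressed as a small per-table check. This is a hypothetical sketch: `StagedOp`, `check_d2_prime`, and the `(table, op)` input shape are illustrative names, since the commit places the real check in MR-794 step 2+. It reads the rule literally, so a merge mixed with an append on the same table is rejected along with chained merges.

```rust
use std::collections::HashMap;

/// Kind of staged operation a statement produces against one table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StagedOp {
    Append,
    MergeInsert,
}

/// Per touched table, ops must be all appends, or exactly one merge_insert.
pub fn check_d2_prime(ops: &[(&str, StagedOp)]) -> Result<(), String> {
    let mut per_table: HashMap<&str, Vec<StagedOp>> = HashMap::new();
    for (table, op) in ops {
        per_table.entry(*table).or_default().push(*op);
    }
    for (table, table_ops) in &per_table {
        let all_appends = table_ops.iter().all(|op| *op == StagedOp::Append);
        let single_merge = table_ops.len() == 1 && table_ops[0] == StagedOp::MergeInsert;
        if !(all_appends || single_merge) {
            return Err(format!("table {table}: more than one op alongside a merge_insert is rejected"));
        }
    }
    Ok(())
}
```

Append chains and multi-table queries pass. Two merges on one table fail before any staging happens, so the duplicate-row case pinned by the chained-merge test is never reached.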
Ragnor Comerford	6dc4167291	MR-794 step 1: address PR #66 review — track removed_fragment_ids Three independent automated reviews (Cubic P1, Cursor High, Codex P1) flagged a real correctness bug in stage_merge_insert: Operation::Update returns three fields — removed_fragment_ids, updated_fragments, new_fragments — and we were collecting only the latter two into StagedWrite.new_fragments while discarding removed_fragment_ids. That breaks read-your-writes for any merge_insert that rewrites an existing fragment: scan_with_staged combines the dataset's full committed manifest with the staged new_fragments, so the original committed fragment (which the rewrite supersedes) and its rewritten version both end up in the Scanner's fragment list. Result: duplicate rows. Fix: - StagedWrite gains `removed_fragment_ids: Vec<u64>` populated from Operation::Update; empty for Operation::Append (which never supersedes existing fragments). - scan_with_staged / count_rows_with_staged take `&[StagedWrite]` instead of `&[Fragment]` so they have access to both fields. - A new `combine_committed_with_staged` helper composes the visible fragment list as `committed - removed + new`, deduping by fragment ID. Also addresses cubic's P3 doc-fab note: the StagedWrite doc comment claimed the type was "used by MutationStaging and the loader" but those callers don't exist in this PR (they're MR-794 step 2+). Reword to "defined here for later integration" so the doc doesn't lie about the current state. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 19:19:39 +02:00
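The `committed - removed + new` composition can be sketched with toy types. `Fragment` with a `label` field is a hypothetical stand-in for Lance's fragment metadata. The dedup policy, in which a staged fragment replaces an earlier fragment with the same ID, is an assumption about how "deduping by fragment ID" resolves ties.

```rust
use std::collections::HashSet;

/// Toy fragment: `label` stands in for the fragment's data files.
#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub id: u64,
    pub label: String,
}

/// Toy StagedWrite carrying both fields the fix threads through.
pub struct StagedWrite {
    pub new_fragments: Vec<Fragment>,
    pub removed_fragment_ids: Vec<u64>,
}

/// Visible fragments = committed - removed + new, deduped by fragment ID
/// (a staged fragment replaces an earlier one with the same ID).
pub fn combine_committed_with_staged(committed: &[Fragment], staged: &[StagedWrite]) -> Vec<Fragment> {
    let removed: HashSet<u64> = staged
        .iter()
        .flat_map(|s| s.removed_fragment_ids.iter().copied())
        .collect();
    let mut visible: Vec<Fragment> = committed
        .iter()
        .filter(|f| !removed.contains(&f.id))
        .cloned()
        .collect();
    for frag in staged.iter().flat_map(|s| s.new_fragments.iter()) {
        if let Some(pos) = visible.iter().position(|v| v.id == frag.id) {
            visible[pos] = frag.clone();
        } else {
            visible.push(frag.clone());
        }
    }
    visible
}
```

If the `removed` filter is dropped from this sketch, a merge that rewrites fragment 1 into fragment 3 yields fragments 1, 2, and 3. Both the superseded copy and the rewritten copy are then visible, which is the duplicate-row bug the reviews flagged.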
Ragnor Comerford	3601002440	MR-794 step 1: add staged-write primitives to TableStore Lance's distributed-write API splits "write fragment files" from "advance HEAD": write_fragments returns a Transaction with FragmentMetadata; a later CommitBuilder::execute(transaction) commits via the manifest CAS. The same shape exists for merge_insert via MergeInsertBuilder::execute_uncommitted. Scanner::with_fragments(staged) lets in-flight reads see uncommitted staged data. Adds wrappers for these primitives: - StagedWrite carries the uncommitted Transaction plus the new Fragments (extracted for read-your-writes via Scanner::with_fragments). - TableStore::stage_append wraps InsertBuilder::execute_uncommitted. - TableStore::stage_merge_insert wraps MergeInsertBuilder::execute_uncommitted. - TableStore::commit_staged wraps CommitBuilder::execute. - TableStore::scan_with_staged / count_rows_with_staged thread the staged fragments into a Scanner alongside the dataset's committed fragments. The MutationStaging integration that uses these primitives is the next step in MR-794 — it requires a coordinated rewrite of execute_insert / execute_update / execute_delete plus the load_jsonl_reader path, plus end-of-query commit logic. Doc comment on MutationStaging is updated to reference MR-794 and these primitives so the followup is well-anchored. The current MR-771 limitation in docs/runs.md ("mid-query partial failure leaves Lance HEAD ahead of __manifest") still applies until the follow-up lands; the primitives are the building blocks but not yet the fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 19:19:38 +02:00
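The split between "write data" and "advance HEAD" is easier to see in a toy in-memory model. This is not Lance. `ToyTable`, its `version` counter standing in for HEAD, and the row-based `StagedWrite` are all hypothetical simplifications. Lance stages fragment files and commits a `Transaction` through `CommitBuilder`, and it includes conflict handling that this sketch omits.

```rust
/// Toy table: `version` plays the role of Lance HEAD.
#[derive(Debug, Default)]
pub struct ToyTable {
    pub version: u64,
    pub committed: Vec<String>,
}

/// Toy StagedWrite: rows that have been "written" but not committed.
#[derive(Debug)]
pub struct StagedWrite {
    pub rows: Vec<String>,
}

impl ToyTable {
    /// Write data without touching HEAD (cf. execute_uncommitted).
    pub fn stage_append(&self, rows: &[&str]) -> StagedWrite {
        StagedWrite { rows: rows.iter().map(|r| r.to_string()).collect() }
    }

    /// Read-your-writes: committed rows followed by staged rows
    /// (cf. Scanner::with_fragments).
    pub fn scan_with_staged(&self, staged: &[StagedWrite]) -> Vec<String> {
        let mut out = self.committed.clone();
        for s in staged {
            out.extend(s.rows.iter().cloned());
        }
        out
    }

    /// Advance HEAD by one version (cf. CommitBuilder::execute).
    /// No concurrent-writer handling is modeled here.
    pub fn commit_staged(&mut self, staged: StagedWrite) -> u64 {
        self.committed.extend(staged.rows);
        self.version += 1;
        self.version
    }
}
```

Between `stage_append` and `commit_staged`, the staged row is visible to `scan_with_staged` while `committed` and `version` are unchanged. That in-flight window is what the later MutationStaging integration relies on.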
Ragnor Comerford	b73813e525	Merge pull request #65 from ModernRelay/ragnorc/demote-run MR-771: demote Run to direct-publish via expected_table_versions CAS	2026-04-30 19:08:17 +02:00
Ragnor Comerford	1a906403cb	MR-771: address cubic comment; drop vacuous __run__ check in cancel test cubic correctly flagged that the assertion `!branches_after.iter().any(|b| b.starts_with("__run__"))` is vacuous because `branch_list()` already filters `__run__*` via `is_internal_system_branch`. The real structural property (no `__run__` branches can ever be created) is enforced by MR-771's deletion of `begin_run` etc.; that's a build-time invariant, not a runtime one. Drop the vacuous assertion; document why. The remaining checks (public branch list unchanged + `_graph_runs.lance` never reappears) cover the actual cancel-safety properties. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:31:49 +02:00
Ragnor Comerford	a7109d5fba	MR-771: address PR review feedback Three fixes from automated PR review on #65: 1. Internal-branch guard in mutation/load (Cursor Bugbot, Medium). Pre-MR-771 the begin_run path called ensure_public_branch_ref; the direct-publish replacements only normalized the name. A caller passing __run__* or __schema_apply_lock__ verbatim could write directly to a system branch. Re-add the explicit guard at the public write boundary in mutate_with_current_actor and load. 2. Panic-safe coordinator restoration (Cursor Bugbot, High). The previous swap-and-restore pattern would skip restore_coordinator if execute_named_mutation panicked, leaving the handle pinned to the wrong branch indefinitely. Replace with a CoordinatorRestoreGuard RAII type that captures the previous coordinator on swap and restores it in Drop. 3. Flaky cancel-safety test (cubic, P2). tests/runs.rs::cancelled_mutation_future_leaves_no_state asserted manifest version equality after handle.abort(), but abort races the spawned task. Re-frame around what actually defines cancel safety: no __run__* branches, no _graph_runs.lance, no synthesized public branches. The fourth comment (Codex P1: branch_delete losing its in-flight write barrier) is bigger in scope — fits in the MR-794 storage-trait staging story rather than a hotfix here. Tracked there. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:17:00 +02:00
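The RAII pattern behind fix 2 can be sketched as follows. The coordinator is modeled as a branch name in an `Rc<RefCell<String>>`. That representation and the `swap` constructor are assumptions, not the engine's actual types. Because restoration happens in `Drop`, it also runs while a panic unwinds through the guard's scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the handle's active coordinator (a branch name).
type CoordinatorSlot = Rc<RefCell<String>>;

/// Swaps a coordinator into `slot` and puts the previous one back on Drop.
struct CoordinatorRestoreGuard {
    slot: CoordinatorSlot,
    previous: Option<String>,
}

impl CoordinatorRestoreGuard {
    fn swap(slot: &CoordinatorSlot, next: &str) -> Self {
        let previous = slot.replace(next.to_string());
        CoordinatorRestoreGuard { slot: Rc::clone(slot), previous: Some(previous) }
    }
}

impl Drop for CoordinatorRestoreGuard {
    fn drop(&mut self) {
        // Runs on normal scope exit and during panic unwinding alike.
        if let Some(prev) = self.previous.take() {
            *self.slot.borrow_mut() = prev;
        }
    }
}
```

The guard must be bound to a named variable such as `let _guard = ...`, not `let _ = ...`. Otherwise it is dropped immediately and the swap is undone before the mutation runs.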
Ragnor Comerford	35be20cb05	MR-771: demote Run to direct-publish via expected_table_versions CAS mutate_as and load now write directly to target tables and call the publisher once at the end with per-table expected versions; the Run state machine, _graph_runs.lance writers, __run__ staging branches, and server /runs/* endpoints are removed. Multi-statement mutations remain atomic at the manifest level via an in-memory MutationStaging accumulator that gives read-your-writes within a query and a single publish at the end. Concurrent-writer conflicts surface as ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the old DivergentUpdate merge shape. Documents one known limitation in docs/runs.md: a multi-statement mid-query failure where op-N writes a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the manifest until a follow-up introduces per-table Lance branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 08:52:50 +02:00
Ragnor Comerford	034c49f82b	Merge pull request #64 from ModernRelay/ragnorc/architecture-diagrams docs: add Mermaid architecture diagrams (system / storage / read + write flows)	2026-04-29 19:31:08 +02:00
Ragnor Comerford	97c6490c1d	docs: post-merge code-review fixes (Cursor + cubic on PR #60 era) Two factual mismatches caught during code-grounded re-review: - docs/architecture.md: "13 cmd families" was stale; the CLI has 17 Command variants (Version, Embed, Init, Load, Ingest, Branch, Schema, Query, Snapshot, Export, Run, Commit, Read, Change, Policy, Optimize, Cleanup). Replaced the count with "command families" so the diagram doesn't drift again. - docs/execution.md: the mutation prose said "every mutation runs on a fresh __run__<id> branch", which over-claims. mutation.rs:555 short-circuits when the target is already a __run__ branch; the assumption there is that the caller is managing the surrounding run lifecycle. Added a one-paragraph caveat noting the exception with the file:line citation. Both diagrams unchanged; only annotations / counts adjusted.	2026-04-29 17:14:48 +02:00
Ragnor Comerford	411d86b743	docs/execution: fix mutation atomicity narrative — run-branch + publish_run The mutation flow diagram and prose previously claimed multi-statement mutations publish through a single ManifestRepo::commit. That's wrong: each statement commits independently to a __run__<id> branch via commit_updates; atomicity comes from the publish_run step that promotes the run-branch into the target (fast path) or three-way merges it (merge path). Diagram now shows: - begin_run forks __run__<id> from target head - per-statement commit_updates on the run-branch (loop) - OCC pre-check on target head - publish_run with fast-path / merge-path branches - terminate_run(Published) Prose now points at runs.md for the full run lifecycle and cites the correct entry points: mutate_with_current_actor (mutation.rs:539), publish_run (omnigraph.rs:858). Addresses Codex review comment on PR #64.	2026-04-29 17:09:29 +02:00
Ragnor Comerford	64b9d56476	docs: add Mermaid architecture diagrams across architecture / storage / execution Replace the single ASCII stack in docs/architecture.md with a hierarchy of Mermaid diagrams that show the system from external context down to the component level. Add an on-disk layout diagram in docs/storage.md and two sequence diagrams (read query, mutation) in docs/execution.md so readers can navigate from "what is OmniGraph" to "how does a query run" without opening source. Static structure (docs/architecture.md): - System context: agents/clients, embedding providers, Cedar, object store. - Layer view: eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling via classDef, replacing the pre-existing ASCII art. - Component zoom-ins: compiler, engine, storage trait, index lifecycle, server/CLI. Each zoom-in cites file:line entry points. Aspirational shapes (storage trait, full reconciler) are visually marked and pointed at the relevant invariants.md section so readers see the intended seam without thinking it's already implemented. On-disk layout (docs/storage.md): - Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance, _graph_runs.lance, _refs/branches/ down into Lance's per-dataset internals (_versions/, data/, _indices/, _refs/, _transactions/). - Annotated with the actual filenames so readers can `ls` the same paths. - Slots in below the existing __manifest CAS / OCC / migration prose; does not move or rewrite that content. Runtime flows (docs/execution.md): - Read flow sequence: client → Omnigraph::query → typecheck → lower → execute_query → table_store → Lance scanner → RecordBatch stream. - Mutation flow sequence: Omnigraph::mutate → resolve literals → Lance write op (Append / merge_insert) → ManifestRepo::commit → __manifest upsert. - Both diagrams are followed by a "Code paths" block with verified file:line citations so readers can navigate from diagram element to source in one step. Conventions established (this is the first Mermaid in the repo): - L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed. - Diagram size cap ~9 elements; more detail goes in a sub-diagram. - Diagrams paired with prose; code-path citations follow each diagram. - Consistent vocabulary across diagrams: frontend / compiler / engine / storage trait / Lance / object store. No accidental synonyms. Subsequent PRs will add flow diagrams for schema apply, branch + merge, run isolation, index reconcile, and the embedding pipeline in the same conventions.	2026-04-29 16:58:56 +02:00
Ragnor Comerford	4e5374a85e	Merge pull request #63 from ModernRelay/claude/extend-manifest-batch-publisher-GQzB6	2026-04-29 14:45:33 +02:00
Claude	63babede4a	AGENTS.md: reframe 'minimize ongoing liability' as a general decision lens The previous bullets read like a migration pattern (centralized dispatcher, one match arm, no shape forks). That's one application, not the principle. Reframe it as a bidirectional decision lens: ask "which option has lower ongoing cost over time?" and let the answer be more code, less code, DRYing, duplication, removal, addition, a new abstraction, or flattening one — whichever shape converges over the expected change horizon. Add explicit examples of cases where the lower-liability option is more code (dispatcher, migration framework, typed error variants) and where it's less (premature abstractions, "just in case" paths, helpers that wedge independently-evolving callers together) so readers don't collapse the principle into "minimize code".	2026-04-29 12:32:30 +00:00
Claude	b6a596670a	AGENTS.md: add 'minimize ongoing liability' as a first principle The always-on rules are concretizations of a broader engineering posture that wasn't stated explicitly. Add a short section that frames the spirit behind those rules: - One centralized detection point, not many heal hooks. - One dispatcher, not branch-on-shape in every consumer. - One canonical shape after migration, not forks on "old vs new". - Three similar lines beats a premature abstraction. - Delete dead paths when their last caller leaves. Plus a forward-looking review prompt ("what do these paths look like after 5 more changes like this?") so the principle bites at design time, not just at review time. The internal-schema-version mechanism we just shipped is a concrete application: one stamp + one dispatcher + one match arm per change, no heal hooks scattered across the engine. Codify the pattern so future work doesn't drift back to ad-hoc.	2026-04-29 11:48:47 +00:00
Claude	243c0c3464	Add internal-schema versioning + auto-migration for __manifest The on-disk shape of `__manifest` is reconciled with the binary via a single stamp + dispatcher in `db/manifest/migrations.rs`: - `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes. - The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest dataset's schema-level metadata (Lance `update_schema_metadata`). - `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from the on-disk stamp until it matches the binary, then returns. Idempotent. - `init_manifest_repo` stamps the current version at creation; the publisher's open-for-write path runs pending migrations before reading state. Reads stay side-effect-free. - Forward-version protection: a stamp higher than the binary's known version triggers a clear "upgrade omnigraph first" error so an old binary cannot clobber a newer schema. Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step: the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id` that engages Lance's row-level CAS at commit time. New repos created via `init` are stamped at v2 immediately and don't need migration. Adding a future on-disk shape change is one constant bump, one match arm in `migrate_internal_schema`, and one test — no new branches in unrelated code paths. Code outside the migration module never inspects the stamp. New tests in `manifest/tests.rs`: - `test_init_stamps_internal_schema_version` - `test_publish_migrates_pre_stamp_manifest_to_current_version` - `test_publish_rejects_manifest_stamped_at_future_version` Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated per the AGENTS.md maintenance contract.	2026-04-29 11:44:14 +00:00
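Below is a toy model of the stamp-plus-dispatcher shape, under stated assumptions. `ManifestDataset` is a hypothetical in-memory stand-in for the Lance dataset; the real code reads and writes the stamp through schema-level metadata. A missing stamp is treated as v1, which matches the pre-MR-766 self-heal described in the commit but is still an assumption. The v1→v2 step is reduced to flipping a boolean instead of adding the `lance-schema:unenforced-primary-key` annotation.

```rust
pub const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 2;

/// Toy __manifest: the internal_schema_version stamp plus the one property
/// the v1 -> v2 step changes.
#[derive(Debug, Default)]
pub struct ManifestDataset {
    /// None models a pre-stamp manifest, treated as v1.
    pub stamp: Option<u32>,
    pub object_id_is_primary_key: bool,
}

/// Walk match-arm steps forward from the on-disk stamp to the binary's version.
/// Idempotent: an up-to-date manifest falls straight through.
pub fn migrate_internal_schema(ds: &mut ManifestDataset) -> Result<u32, String> {
    let mut current = ds.stamp.unwrap_or(1);
    if current > INTERNAL_MANIFEST_SCHEMA_VERSION {
        // Forward-version protection: never let an old binary touch a newer shape.
        return Err(format!(
            "manifest is at internal schema v{}, this binary knows v{}: upgrade omnigraph first",
            current, INTERNAL_MANIFEST_SCHEMA_VERSION
        ));
    }
    while current < INTERNAL_MANIFEST_SCHEMA_VERSION {
        match current {
            // v1 -> v2: mark object_id as the (unenforced) primary key.
            1 => ds.object_id_is_primary_key = true,
            other => return Err(format!("no migration step from v{other}")),
        }
        current += 1;
        ds.stamp = Some(current);
    }
    Ok(current)
}
```

A future shape change would then add one constant bump and one new match arm, which is the "one stamp + one dispatcher" property the commit describes.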
Claude	5eb47b8c13	docs: surface MR-766 publisher OCC in storage / errors / constants - storage.md: document the row-level CAS annotation on `__manifest.object_id` and the `expected_table_versions` OCC contract on `ManifestBatchPublisher::publish`. - errors.md: list `ManifestConflictDetails` and its variants alongside `ManifestError`. - constants.md: add `PUBLISHER_RETRY_BUDGET = 5`. Per AGENTS.md "Maintenance contract": new schema construct, new constant, and new typed error shape all need to ship with the source change.	2026-04-29 07:56:18 +00:00
Claude	df0e158190	Add per-table expected-version OCC to ManifestBatchPublisher (MR-766) Layered approach selected by the CAS-granularity investigation (.context/merge-insert-cas-granularity.md): - Annotate __manifest.object_id with `lance-schema:unenforced-primary-key`, enabling Lance row-level CAS via the bloom-filter conflict resolver. Closes a latent silent-duplicate bug where two concurrent publishes of the same `version:T@v=N+1` row could both land in disjoint fragments. - Extend `ManifestBatchPublisher::publish` with `expected_table_versions: &HashMap<String, u64>`. Empty map preserves today's behavior; populated map asserts the manifest's latest non-tombstoned version per table matches the caller's view. Mismatches surface as a typed `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }` so callers can match without parsing strings. - Set `merge_builder.conflict_retries(0)` so Lance's transparent rebase cannot silently break the OCC contract; retries are owned by the publisher loop, where each attempt re-runs `load_publish_state` and the expected-version pre-check. - Surface `ManifestCoordinator::commit_with_expected` for the callers that need strict OCC (the run-demotion ticket); existing `commit` and `commit_changes` paths are unaffected. New tests in `manifest/tests.rs` cover: matching expected versions, stale expected with typed details, drift on an untouched expected table, unknown expected table (actual=0), and the headline case of two concurrent publishes with overlapping expected versions where exactly one succeeds.	2026-04-29 00:34:20 +00:00
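The expected-version pre-check can be sketched as a pure function. The error mirrors the `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }` shape named in the commit. `check_expected_versions` and the flat `latest` map are hypothetical, and the sketch does not model how the publisher derives the latest non-tombstoned version per table. Table keys like `"node:Person"` are illustrative only.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ManifestConflictDetails {
    ExpectedVersionMismatch { table_key: String, expected: u64, actual: u64 },
}

/// Every table the caller names must currently be at the version the caller
/// expects. Tables absent from the manifest report actual = 0. An empty
/// expectation map passes trivially.
pub fn check_expected_versions(
    latest: &HashMap<String, u64>,
    expected_table_versions: &HashMap<String, u64>,
) -> Result<(), ManifestConflictDetails> {
    for (table_key, &expected) in expected_table_versions {
        let actual = latest.get(table_key).copied().unwrap_or(0);
        if actual != expected {
            return Err(ManifestConflictDetails::ExpectedVersionMismatch {
                table_key: table_key.clone(),
                expected,
                actual,
            });
        }
    }
    Ok(())
}
```

In the design the commit describes, a check like this is re-run on every publisher retry after reloading state, because Lance's own `conflict_retries` is set to 0.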
Claude	bb95fdceda	Investigate Lance MergeInsertBuilder CAS granularity (MR-766 prereq) Confirms Lance v4.0.0 has row-level CAS for merge_insert only when the join-key column carries lance-schema:unenforced-primary-key=true. Our __manifest schema does not, so the publisher silently allows duplicate object_id rows under concurrent writers. Note + reproducible scratch crate select the layered (pre-check + row-level CAS) approach for the publisher API ticket.	2026-04-28 23:30:17 +00:00
Ragnor Comerford	58dba6210e	Merge pull request #61 from ModernRelay/ragnorc/query-execution-deep-dive docs/invariants: drop commercial section; separate patterns from invariants	2026-04-29 00:44:26 +02:00
Ragnor Comerford	56b30c5c5a	Restructure invariants doc: drop commercial, separate patterns from invariants - Removed §IX (OSS / Cloud kernel-product split) — business strategy belongs in MR-738, not the technical invariants doc. - Filled the §IV (Additivity / migration) placeholder with five evolution invariants. - Reframed §I to be substrate-agnostic: invariants are about respecting any substrate; Lance / DataFusion are noted as the current chosen substrate rather than as the invariant itself. - Added §VI Database guarantees (12 invariants): atomicity, schema integrity, isolation, durability, causal consistency, determinism, idempotency, no silent loss, bounded operations, failure scope, crash recovery, consistency model. - Added §II.8 wire-protocol agnosticism (kernel transport-agnostic, Flight/HTTP at the server boundary). - Reframed §VII as "Current architectural patterns" — explicitly distinct from invariants. Each pattern entry now names the underlying invariant it realizes (reconciler / Union / mutations-wrap-reads / SIP / factorize / stable row IDs / rank columns / policy predicates / Source). - Pulled specific config defaults out of §VI (timeouts, memory caps); invariant is that bounds exist, values live in docs/constants.md. - Split §IX deny-list into "invariant violations" (high bar) and "pattern violations" (overridable with justification). - Added status legend: decided / open — see MR-X / aspirational. Annotated invariants and patterns that are not yet upheld in current code. - Updated review checklist (§X) to cover database-guarantee dimensions and the wire-protocol / Source / patterns sections. - Updated Living Document policy (§XI) to spell out how to revise patterns, resolve open invariants, and lift aspirational annotations. Source tickets: MR-737, MR-744, MR-765, MR-694 family, MR-722/MR-725.	2026-04-29 00:39:11 +02:00
Ragnor Comerford	a9430978fb	Merge pull request #60 from ModernRelay/ragnorc/omnigraph-spec Add AGENTS.md (map) + docs/ knowledge base + CI link check	2026-04-29 00:15:19 +02:00
Ragnor Comerford	6f25c4f9f8	Address reviewer feedback (Cursor + cubic) on PR #60 All eight comments verified against source and applied: - AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of the markdown blockquote. Claude Code's @-import parser expects @ at column 0; the leading "> " of a blockquote silently broke recognition, so the claimed auto-include did nothing. (Cursor, Medium severity.) - docs/cli-reference.md: command-family count 13 → 17. The current enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level variants. (cubic P2.) - docs/ci.md: Homebrew tap update is a regular `git push`, not a force-push (release.yml:117 is `git push origin HEAD:main`). (cubic P2.) - docs/errors.md: add the Storage variant to the NanoError list — it exists at error.rs:88-89 but the doc enumerated only 10 of 11. (cubic P2.) - docs/storage.md: clarify tombstone semantics. There is no tombstone_version column; state.rs:180 reads the tombstone version from the table_version column on rows where object_type = table_tombstone. (cubic P2.) - docs/branches-commits.md: split the GraphCommit pseudo-struct from the underlying storage. actor_id is joined in-memory from _graph_commit_actors.lance, not a column on _graph_commits.lance. (cubic P2.) - docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to match the actual constant name in catalog/schema_ir.rs:11. (cubic P3.) - docs/testing.md: engine integration test count 16 → 15 (matches `ls crates/omnigraph/tests/*.rs`). (cubic P3.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 00:09:06 +02:00
Ragnor Comerford	ada58ccd7b	Make "check existing coverage first" a top-level testing principle The original docs/testing.md mentioned finding existing tests as step 1 of the checklist but never explicitly said "if existing coverage already addresses your case, extend it; don't duplicate." Adds a prominent "First principle" section that names extend-vs-new as the preferred outcome and lists three duplicated init_and_load blocks as the most common form of test rot. Adds an extra checklist item: verify your change makes an existing test fail before it makes a new one pass — if you can break the code without breaking a test, that coverage gap is the bug to fix first. Strengthens the AGENTS.md callout so the principle ("always check what already covers it") is in scope from the top of every session. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 00:03:50 +02:00
Ragnor Comerford	8be0e6a067	Add docs/testing.md as required-read every session Maps the test surface (engine integration tests by area, CLI/server tests, helpers harness, fixtures, failpoints feature, RustFS S3 integration, OpenAPI drift) and gives a before-every-task checklist: find existing tests for the area, run them as a clean baseline, plan the new test up front, reuse helpers, mind the layer boundary per invariants §VII.33. Notes that there's no coverage tooling today — coverage knowledge comes from reading and running the relevant integration tests, not a tarpaulin/codecov report. Threaded into AGENTS.md as the third required-reading file alongside invariants.md and lance.md, with a Claude-Code @-import so agents load it on every turn. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:55:21 +02:00
Ragnor Comerford	a06d8bcf82	Promote docs/lance.md to required-read every session Adds @docs/lance.md alongside @docs/invariants.md so the Lance index loads on every turn (Claude Code @-import; explicit-open instruction for other agents). Reframes the directive from "when you hit a Lance-shaped problem" to "consult before every task to identify which upstream pages are relevant." The Lance docs are the authoritative source for substrate behavior, so reasoning about them should start every change rather than be triggered conditionally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:50:42 +02:00
Ragnor Comerford	b6440d6b17	Add docs/lance.md — task-organized index of Lance upstream docs Curates the Lance documentation site (lance.org) into a problem-domain index so agents fetch the right page when working on Lance-touching code instead of guessing or grepping our codebase. Organized by topic: storage format & file layout, branching/tags/time travel, indexes (scalar + system + vector), reads/writes, schema evolution, object store, data types, performance, compaction, DataFusion integration, SDK reference, plus quick-starts and the upstream AGENTS.md. Skips ~200 irrelevant URLs from the upstream sitemap (Namespace REST API model surface, Spark/Trino/Databricks/etc. integrations, Python/Ray/HuggingFace docs, community pages) since omnigraph is Rust-only and doesn't run a Lance Namespace catalog. AGENTS.md surfaces it in the topic index and adds a directive: "when you hit a Lance-shaped problem, consult docs/lance.md and fetch the upstream URL before guessing." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:48:28 +02:00
Ragnor Comerford	43724b9f18	Make docs/invariants.md required reading every session Adds a top-of-file directive plus a Claude-Code @-import so the full invariants document is loaded into context on every turn, not only when an agent follows a pointer. Other agents are instructed to open it explicitly at session start. The §IX deny-list and §X review checklist apply to every change, so they should be in scope by default rather than gated on the agent remembering to look. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:39:09 +02:00
Ragnor Comerford	1e7334275a	Trim always-on rules to architectural-level invariants Drops four rules whose phrasing leaned on implementation specifics (nearest LIMIT, __run__<id> branches, __schema_apply_lock__, branch_list filter convention) — those are real constraints, but they live at the implementation layer and would go stale if internals are renamed or refactored. The architectural intent is captured by the remaining six rules and by the per-area docs. Reframes the kept rules at the survives-a-rename level: "multi-dataset publish is atomic across the whole graph" rather than naming the manifest table or the publisher type, etc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:38:24 +02:00
Ragnor Comerford	c924e121d2	Add architectural invariants & deny-list as docs/invariants.md A standing reference for invariants that hold across storage, engine, server, schema, indexing, observability, and the OSS/Cloud split. Used to check RFCs and PRs against the substrate boundaries (don't rebuild what Lance gives us), layering rules (one trait boundary per layer), distributability constraints (Send+Sync, location-neutral IR), honesty expectations (estimate-vs-actual, bounded failure modes), unified patterns (reconciler, Union polymorphism, SIP, factorize), the §IX deny-list, and the §X review checklist. §IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split) are referenced but not yet drafted — flagged as placeholders pending upstream fill-in. AGENTS.md surfaces it from the topic index, the always-on rules section, and the maintenance contract; the deny-list is also inlined there as a fast-pass review filter so it stays in scope every turn. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:34:44 +02:00
Ragnor Comerford	a335d98854	Refactor AGENTS.md from encyclopedia to map; move spec into docs/ Splits the 990-line AGENTS.md into a 184-line map (architecture, where-to-find index, always-on invariants, capability matrix, maintenance contract) plus 18 new docs/*.md files holding the deep content per topic (storage, schema and query languages, indexes, embeddings, branches/commits, runs, merge, changes, execution, policy, server, CLI reference, audit, errors, CI, constants, v0.3.1 notes). Adds scripts/check-agents-md.sh and a check_agents_md CI job that verifies every docs/ link in AGENTS.md resolves and every doc in the canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:31:08 +02:00
Ragnor Comerford	cfea41e942	Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it Captures the v0.3.1 feature spec (storage, schema/query languages, IR, indexes, embeddings, branches/commits/runs, merge, server, CLI, policy, deployment) and adds a §26 maintenance contract instructing agents to keep this file current alongside any user-visible change. CLAUDE.md is a symlink so there's one source of truth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:10:09 +02:00
Andrew Altshuler	56b6319197	Enforce schema validators on every write path (#59) Several validators were defined but only called from a subset of write paths, so writes that violated @unique, @range, @check, enum, or @cardinality constraints could silently succeed and corrupt data. Adds two new helpers in loader/mod.rs: - validate_enum_constraints: batch-level enum check; scans Arrow string columns (and list-of-string columns) for values outside the declared set - enforce_unique_constraints_intra_batch: single-batch duplicate detection over named columns; partial enforcement (does not check against committed rows yet; cross-batch enforcement is a separate effort) Wires the validators into: - load_jsonl_reader nodes (alongside the existing validate_value_constraints call) and edges (which had no enum or unique check at all) - exec/mutation.rs node insert, edge insert, and update paths - mutation edge insert now also calls validate_edge_cardinality after the row lands but before the manifest commit, matching the loader's Phase 3 behavior. A new tests/validators.rs suite asserts rejection on every entry path for invalid enum values, @range violations, intra-batch @unique duplicates, and edge @card excesses. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 04:51:10 +03:00
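The two loader helpers described in 56b6319197 can be illustrated with a minimal std-only sketch. It is not the loader/mod.rs code: it assumes plain `&[Option<&str>]` columns instead of Arrow arrays, the names `first_enum_violation` and `first_duplicate_key` are hypothetical, and it assumes nulls are skipped by both checks (treating nullability as a separate rule).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a batch-level enum check: returns the first
/// (row, value) whose value is outside the declared set. Nulls are skipped
/// (assumption: nullability is enforced by a separate rule).
fn first_enum_violation<'a>(
    column: &'a [Option<&'a str>],
    allowed: &HashSet<&str>,
) -> Option<(usize, &'a str)> {
    column.iter().enumerate().find_map(|(row, value)| match value {
        Some(v) if !allowed.contains(v) => Some((row, *v)),
        _ => None,
    })
}

/// Hypothetical stand-in for intra-batch @unique enforcement over one or more
/// key columns (all assumed to have equal length). Returns
/// (first_row, duplicate_row) for the first repeated key. Rows with a null in
/// any key column are skipped (assumption, SQL-style). Like the helper in the
/// commit, it never consults already-committed rows.
fn first_duplicate_key(key_columns: &[&[Option<&str>]]) -> Option<(usize, usize)> {
    let num_rows = key_columns.first().map_or(0, |col| col.len());
    let mut seen: HashMap<Vec<&str>, usize> = HashMap::new();
    for row in 0..num_rows {
        // Collecting Options into Option<Vec<_>> yields None if any key is null.
        let key: Option<Vec<&str>> = key_columns.iter().map(|col| col[row]).collect();
        let key = match key {
            Some(k) => k,
            None => continue,
        };
        if let Some(&first) = seen.get(&key) {
            return Some((first, row));
        }
        seen.insert(key, row);
    }
    None
}
```

Because the key is the tuple of all named columns, a composite @unique over two columns is just two entries in `key_columns`.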
Andrew Altshuler	c8047b6620	Sharpen README tagline; add incident-response and compliance use cases (#58) * docs: sharpen README tagline; add incident-response and compliance use cases Lead with "lakehouse-native graph engine with git-style workflows" and a supporting line that names the action ("branch, commit, and merge typed graph data like source code") rather than restating capabilities. Adds incident-response and compliance graphs to the use-case list and fixes "multi-agentic" to "multi-agent". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: lift Capabilities above Quick Install; rename from "Omnigraph CORE" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 03:46:21 +03:00
Andrew Altshuler	f75b941a9e	Make schema apply atomic across crashes (#57) Schema apply previously committed the manifest before writing the schema source and IR contract files. A crash in that window left the manifest pointing at the new schema while _schema.pg, _schema.ir.json, and __schema_state.json still reflected the old one: a silent inconsistency that subsequent reads hit as type errors. Reorders the apply: write to staging filenames first, commit the manifest, then atomically rename staging → final. On open, a recovery sweep reconciles any leftover staging files against the manifest's table set: pre-commit crashes get the staging files deleted, post-commit crashes get the renames completed (idempotent, so partial renames are handled). Property-only migrations where both schemas imply the same table set return an operator-actionable error rather than guessing. Adds rename_text + delete to StorageAdapter (atomic on local FS via tokio::fs::rename; copy + delete on S3, and recovery is tolerant of the non-atomic case). Adds failpoint test coverage at both crash boundaries plus a partial-rename scenario. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:21:00 +03:00
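The recovery sweep in f75b941a9e boils down to a decision over table sets. The sketch below captures only that decision table, under stated assumptions: the enum, `table_set`, and `plan_recovery` are hypothetical names, the inputs are assumed to be already-derived table sets, and treating "manifest matches neither schema" as ambiguous is an assumption the commit does not state. File I/O and the idempotent partial-rename handling are out of scope.

```rust
use std::collections::BTreeSet;

/// Hypothetical outcome of the open-time recovery sweep.
#[derive(Debug, PartialEq)]
enum Recovery {
    /// No staging files were left behind.
    NothingToDo,
    /// Crash before the manifest commit: discard the staged schema files.
    DeleteStaging,
    /// Crash after the manifest commit: finish the staging -> final renames.
    CompleteRenames,
    /// Table sets cannot tell the cases apart: surface an operator-facing error.
    Ambiguous,
}

fn table_set(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|n| n.to_string()).collect()
}

fn plan_recovery(
    staging_present: bool,
    manifest_tables: &BTreeSet<String>,
    old_schema_tables: &BTreeSet<String>,
    staged_schema_tables: &BTreeSet<String>,
) -> Recovery {
    if !staging_present {
        return Recovery::NothingToDo;
    }
    // Property-only migration: both schemas imply the same tables, so the
    // manifest cannot reveal which side of the commit the crash hit.
    if old_schema_tables == staged_schema_tables {
        return Recovery::Ambiguous;
    }
    if manifest_tables == staged_schema_tables {
        Recovery::CompleteRenames
    } else if manifest_tables == old_schema_tables {
        Recovery::DeleteStaging
    } else {
        // Assumption: a manifest matching neither schema is also escalated.
        Recovery::Ambiguous
    }
}
```

Checking the property-only case first mirrors the commit's choice to return an error rather than guess.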
Andrew Altshuler	372f793ad6	Drop macOS x86_64 build target (#55) Stop producing the omnigraph-macos-x86_64 archive in both the stable and edge release workflows. The macos-15-intel runner build was the slowest of the matrix, and Apple Silicon is now the default Mac developer target. - release.yml + release-edge.yml: drop the macos-15-intel matrix entry - install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear "no prebuilt binary" error instead of attempting an absent download - update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit an arm64-only Homebrew formula. The on_macos block now declares `depends_on arch: :arm64` so Intel `brew install` fails fast with a clear architecture message instead of installing an arm64 binary that errors at exec time. The Linux x86_64 build is unaffected. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 18:19:26 +03:00
andrew	0469b6883e	Ignore local-only working files Keep machine-local state (.claude/, .worktrees/, local omnigraph.yaml, CLAUDE.md, and schema design notes) from showing up as untracked in git status. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 16:41:15 +03:00
Andrew Altshuler	7310f69928	Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54) This reverts commit `b352fca13c`, reversing changes made to `748ad334a9`.	2026-04-26 15:56:29 +03:00
Ragnor Comerford	b352fca13c	Merge pull request #49 from ModernRelay/ragnorc/x-request-id Add X-Request-Id middleware	2026-04-26 12:33:33 +02:00
Ragnor Comerford	e14b203208	Reuse X_REQUEST_ID constant for inbound header lookup Both Cursor Bugbot and Cubic flagged that the inbound `headers().get(...)` call constructed `HeaderName::from_static("x-request-id")` inline instead of reusing the `X_REQUEST_ID` constant defined at the top of the file. The two were already kept in sync by both being `from_static("x-request-id")`, but a future rename would have to touch both sites or risk silent drift between read and write. Also drops the now-unused `header` module import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 12:05:19 +02:00
Ragnor Comerford	748ad334a9	Merge pull request #48 from ModernRelay/ragnorc/api-sdk-research Polish OpenAPI spec for SDK generation	2026-04-26 11:52:46 +02:00
Ragnor Comerford	189caf893c	Merge pull request #47 from ModernRelay/perf/expand-dense-ids perf(expand): dense u32 ids end-to-end (follow-up to #45)	2026-04-25 23:54:03 +02:00
Ragnor Comerford	284c9377c2	Add X-Request-Id middleware Per-request ULID minted at the edge, exposed in request extensions and on the response header. Caller-supplied X-Request-Id is echoed when well-formed (1..=128 ASCII printable characters); otherwise rejected and replaced with a fresh ULID so the value is always safe to log. Companion to the TypeScript SDK redesign — clients now correlate logs across the wire by reading X-Request-Id from response headers (and the SDK already surfaces it on every OmnigraphError as `requestId`). No spec change required; the header is a transport-layer concern. Tests: - mint a ULID when no header is provided - echo a valid caller-supplied id - reject overlong header (200 chars), mint a fresh ULID Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 22:56:17 +02:00
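The accept-or-replace rule in 284c9377c2 (echo a caller-supplied X-Request-Id of 1..=128 ASCII printable characters, otherwise mint a fresh one) can be sketched without any HTTP framework. This is a hedged illustration, not the middleware itself: `is_acceptable_request_id` and `resolve_request_id` are hypothetical names, `mint` is a closure standing in for ULID generation, and reading "printable" as `u8::is_ascii_graphic` (which rejects spaces) is an assumption.

```rust
/// Upper bound from the commit message: 1..=128 characters.
const MAX_REQUEST_ID_LEN: usize = 128;

/// Accepts a caller-supplied id only if it is 1..=128 bytes of visible ASCII.
/// Assumption: "printable" is read as `u8::is_ascii_graphic` (0x21..=0x7E),
/// which rejects spaces; the real middleware may draw the line differently.
fn is_acceptable_request_id(value: &str) -> bool {
    (1..=MAX_REQUEST_ID_LEN).contains(&value.len())
        && value.bytes().all(|b| b.is_ascii_graphic())
}

/// Echo a well-formed inbound id, otherwise mint a fresh one. `mint` stands
/// in for ULID generation so the sketch needs no external crate.
fn resolve_request_id(inbound: Option<&str>, mint: impl FnOnce() -> String) -> String {
    match inbound {
        Some(id) if is_acceptable_request_id(id) => id.to_owned(),
        _ => mint(),
    }
}
```

Since every accepted byte is ASCII, the byte-length check equals the character-length check, and anything that reaches the logs is guaranteed to be short and free of control characters.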
Ragnor Comerford	7809bf607e	Polish OpenAPI spec for SDK generation Add operation descriptions and examples to utoipa annotations so the generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs and any /openapi.json docs UI benefit from the same effort. - Doc comments on all 18 handlers (utoipa picks up summary/description) - #[schema(example = ...)] on free-text fields (query_source, schema_source, NDJSON data) and i64 timestamps - Destructive/irreversible warnings on change, applySchema, ingest, mergeBranches, deleteBranch, publishRun, abortRun Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:36:51 +02:00
Ragnor Comerford	7ea868485e	Update README.md	2026-04-25 16:16:24 +02:00
Ragnor Comerford	7101565929	Update README.md	2026-04-25 16:16:07 +02:00
Andrew Altshuler	74eb5a5380	Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46) * Parallel per-type load writes + omnigraph optimize/cleanup CLI ## MR-677.3: parallel per-type load writes The load path already groups records into one RecordBatch per type and makes one Lance commit per table (loader::mod.rs:249-..), but those commits ran sequentially. Wraps the node and edge write loops in `futures::stream::StreamExt::buffered(N)` via a new helper, `write_batches_concurrently`. Concurrency is tunable via `OMNIGRAPH_LOAD_CONCURRENCY` (default 8). ## MR-676: `omnigraph optimize` and `omnigraph cleanup` New CLI subcommands that walk every node + edge table in the repo: - `omnigraph optimize <uri>`: runs Lance `compact_files` on each table to merge small fragments into fewer, larger ones. - `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm`: runs Lance `cleanup_old_versions` to prune historical manifests + unique fragments. Requires `--confirm` because it's destructive. Supports both count-based and time-based retention (or both AND'd together). Time uses chrono `DateTime<Utc>` (added as a workspace dep, default-features off). Both commands run their per-table loops in parallel (8-way bounded, `OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested against the 114-table prod graph: optimize went 7m15s sequential → 1m28s parallel. cleanup --keep 1 removed 137 historical versions across 114 tables in 1m57s without disrupting `/healthz` or query responses. Public API on `Omnigraph`: pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>> pub async fn cleanup(&mut self, opts: CleanupPolicyOptions) -> Result<Vec<TableCleanupStats>> All 10 existing loader tests still pass. Closes MR-676. Partially addresses MR-677 (the .3 parallel-by-type piece; MR-677.1 is for the `omnigraph embed` path, not load, since load doesn't call Gemini directly; .2 was already in place).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate openapi.json --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2026-04-25 14:22:14 +03:00
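The bounded per-table parallelism in 74eb5a5380 is built on async `StreamExt::buffered(N)`. As a dependency-free sketch of the same idea, the block below swaps in std scoped threads (Rust 1.63+): at most `limit` workers pull table indices from a shared counter, and results come back in input order, as `buffered` also preserves order. `concurrency_from` and `run_bounded` are hypothetical names, and treating `0` or an unparsable value as "use the default" is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Parse a concurrency override, falling back to `default` when the value is
/// missing, unparsable, or zero (the zero rule is an assumption).
fn concurrency_from(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Run `work` over every item with at most `limit` items in flight,
/// returning results in input order.
fn run_bounded<T, R, F>(items: &[T], limit: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<R>>> = Mutex::new((0..items.len()).map(|_| None).collect());
    let workers = limit.max(1).min(items.len().max(1));
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Each worker claims the next unprocessed index.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let r = work(&items[i]);
                results.lock().unwrap()[i] = Some(r);
            });
        }
    });
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every index is processed exactly once"))
        .collect()
}
```

A caller would read the override with something like `concurrency_from(std::env::var("OMNIGRAPH_MAINTENANCE_CONCURRENCY").ok().as_deref(), 8)`, mirroring the documented default of 8.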
Ragnor Comerford	53d7f47909	Pass dense u32 ids through expand instead of round-tripping via String BFS now emits Vec<u32> dense ids directly with HashSet<u32> per-source dedup. Only the deduped set is stringified for Lance's IN-list. The post-hydrate alignment uses a dense-indexed Vec<Option<u32>> instead of HashMap<&str, usize>, giving O(1) lookup without repeated string hashing. End-to-end on the bench_expand harness (release, M-series):
query      baseline   after      speedup
1k hop3    460.2 ms   23.7 ms    19x
10k hop2   4.21 s     139.9 ms   30x
10k hop3   40.59 s    898.5 ms   45x
30k hop2   11.71 s    490.2 ms   24x
30k hop3   197.38 s   3.22 s     61x
The cost lived in stringifying every (src, dst) pair and re-hashing the strings during alignment; once dense ids stay dense, the BFS inner loop and the final fan-out both collapse to integer ops. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:07:25 +02:00
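The dense-id path in 53d7f47909 can be pictured with a small std-only sketch. It assumes a hypothetical in-memory adjacency list (`adj[src]` holds the out-neighbours of dense id `src`) and treats expansion as "everything reachable within `hops` steps"; the real operator works against Lance tables and its exact hop semantics may differ. `expand` dedups with one `HashSet<u32>` per source, and `align` builds the dense-indexed `Vec<Option<u32>>` position map that takes the place of a `HashMap<&str, usize>`.

```rust
use std::collections::HashSet;

/// Dense ids reachable from `source` within 1..=hops steps, deduplicated
/// per source with a HashSet<u32>. Output is in BFS discovery order.
fn expand(adj: &[Vec<u32>], source: u32, hops: usize) -> Vec<u32> {
    let mut seen: HashSet<u32> = HashSet::new();
    let mut frontier = vec![source];
    let mut reached = Vec::new();
    for _ in 0..hops {
        let mut next = Vec::new();
        for &u in &frontier {
            for &v in &adj[u as usize] {
                // insert() returns false for ids this source already reached.
                if seen.insert(v) {
                    reached.push(v);
                    next.push(v);
                }
            }
        }
        if next.is_empty() {
            break;
        }
        frontier = next;
    }
    reached
}

/// Maps dense id -> row index in the hydrated batch, so the final fan-out is
/// an O(1) Vec index instead of a string-keyed hash lookup.
fn align(num_nodes: usize, hydrated_ids: &[u32]) -> Vec<Option<u32>> {
    let mut position = vec![None; num_nodes];
    for (row, &id) in hydrated_ids.iter().enumerate() {
        position[id as usize] = Some(row as u32);
    }
    position
}
```

In this picture, only the ids in `reached` would be turned into strings for the IN-list filter; everything before and after that step stays in `u32`.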


156 commits