omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Ragnor Comerford	5afde54d69	agents: stop overclaiming atomic multi-table publish — describe the three layers honestly External reviewer flagged that the capability matrix's "Atomic multi-dataset publish" cell implied Lance gives us a single primitive for cross-table atomicity. It doesn't. The real contract is three layers stacked: (1) per-table Lance `commit_staged` for the data write (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering (3) recovery-on-open reconciler for the residual gap between (1) and (2) — NOT YET SHIPPED, tracked in MR-847. Until MR-847 lands, a failure between per-table `commit_staged` and the manifest publish leaves drift on the partially-committed tables (the "Phase B → Phase C residual" documented in `docs/runs.md`). Also enumerate the legacy inline-commit residuals (`append_batch`, `merge_insert_batches`, `overwrite_batch`, `create_*_index`) alongside `delete_where` and `create_vector_index` — they remain on the trait pending Phase 1b call-site conversion + Phase 9 demotion. End the row with an explicit DO NOT: future agents reading the capability matrix should not describe atomicity as "fully upheld" until MR-847 ships. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 19:35:34 +02:00
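The three-layer contract in the commit above can be modeled in a few lines to show the residual gap that layer (3) is meant to close. This is a minimal in-memory sketch, not omnigraph's code. `Graph`, `commit_staged`, `publish`, and `drifted_tables` are hypothetical names, and versions are plain counters. The sketch shows only the observable symptom of a failure between a per-table commit and the manifest publish: a table whose HEAD is ahead of its manifest version. The commit does not say how the MR-847 reconciler will repair that drift, so the sketch stops at detection.

```rust
use std::collections::BTreeMap;

/// Hypothetical model: each table's Lance HEAD version and the version
/// the graph manifest currently points at for that table.
#[derive(Debug, Default)]
pub struct Graph {
    lance_head: BTreeMap<String, u64>,
    manifest: BTreeMap<String, u64>,
}

impl Graph {
    /// Layer (1): a per-table commit advances that table's HEAD only.
    pub fn commit_staged(&mut self, table: &str) -> u64 {
        let head = self.lance_head.entry(table.to_string()).or_insert(0);
        *head += 1;
        *head
    }

    /// Layer (2): the manifest publish records new versions for every
    /// touched table (the CAS check is omitted in this sketch).
    pub fn publish(&mut self, versions: &[(&str, u64)]) {
        for &(table, version) in versions {
            self.manifest.insert(table.to_string(), version);
        }
    }

    /// What a recovery pass (layer 3) would have to find first: tables whose
    /// HEAD moved past the version the manifest publishes.
    pub fn drifted_tables(&self) -> Vec<String> {
        self.lance_head
            .iter()
            .filter(|&(table, &head)| self.manifest.get(table).copied().unwrap_or(0) < head)
            .map(|(table, _)| table.clone())
            .collect()
    }
}
```

A crash after the second `commit_staged("Person")` but before `publish` leaves `Person` reported as drifted. All other tables stay consistent.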
Ragnor Comerford	9b0920b5da	address PR #70 bot review (Cubic + Cursor): 7 inline + failpoint test + invariants notes Cubic findings: * `tests/forbidden_apis.rs`: expand `FORBIDDEN_PATTERNS` with `Dataset::write` / `Dataset::append` / `Dataset::delete` / `Dataset::merge_insert` / `Dataset::add_columns` / `update_columns` / `drop_columns` / `truncate_table` / `restore` and the bare `.merge_insert(` / `.add_columns(` / `.update_columns(` / `.drop_columns(` / `.truncate_table(` method patterns. Deliberately avoid `.append(` / `.delete(` / `.write(` (over-match `Vec::append`, `.delete_branch(`, arrow-array `.append(`, etc.). Allow-list `commit_graph.rs` and `graph_coordinator.rs` — they're manifest-layer infra that legitimately uses `Dataset::write` for system tables. * `schema_apply.rs:253`: pass `entry.table_branch.as_deref()` (not `None`) to `open_dataset_head_for_write` for consistency with the sibling `indexed_tables` block. Schema apply rejects non-main branches at the lock-acquire step today, so behavior is unchanged; this is a defensive consistency fix that survives a future relaxation of the lock check. * `storage_layer.rs:131` doc: was `Vec<&StagedWrite>` with lifetime claim; actually returns `Vec<StagedWrite>` (cloned). Fixed. * `AGENTS.md:201` capability matrix row + `storage_layer.rs:1` module doc: softened the "stage_* + commit_staged are the only paths" / "trait funnels every write" overclaim. Inline-commit residuals (`delete_where`, `create_vector_index`) remain on the trait pending upstream Lance work (#6658, #6666); legacy `append_batch` etc. remain pending Phase 1b / Phase 9. Module doc now describes the current transitional state honestly. Cursor Bugbot findings: * `storage_layer.rs:360`: trait `delete_where` consumed `SnapshotHandle` but returned only `DeleteState`, dropping the post-delete dataset. Future callers migrating from the inherent `&mut Dataset` API would lose the post-delete dataset state needed for indexing / `table_state` queries. 
Fixed: returns `(SnapshotHandle, DeleteState)` matching `append_batch` / `overwrite_batch` shape. * `storage_layer.rs:824`: removed dead `_scanner_type_marker` fn and the unused `Scanner` import (the marker existed only to suppress an unused-import warning — fixing the import is the cleaner answer). Engine-level Phase A failpoint test (closes the partial-criterion flagged in Cubic's acceptance-criteria checklist): * `db/omnigraph/table_ops.rs::stage_and_commit_btree`: instrumented with `crate::failpoints::maybe_fail("ensure_indices.post_stage_pre_commit_btree")` between `stage_create_btree_index` and `commit_staged`. * `tests/failpoints.rs::ensure_indices_phase_a_btree_failure_leaves_existing_tables_writable`: triggers the failpoint via a schema-apply that adds a new node type; proves that existing tables are unaffected (Person mutation succeeds after the failed apply) — i.e. Phase A failure leaves no Lance-HEAD drift on tables outside the failed `added_tables` iteration. `docs/invariants.md` transitional notes: * §VI.23 (atomicity per query): annotated as upheld at the writer-trait surface for inserts / updates / scalar-index builds / merge_insert / overwrite after MR-793 PR #70. Per-table commit_staged → manifest publish window remains; closing requires MR-847's recovery-on-open reconciler. `delete_where` and `create_vector_index` remain inline pending lance#6658 / #6666. * §VII.35 (reconciler pattern): annotated as partial — staged primitives are the building blocks; the reconciler task itself is MR-848. * §VIII.45 (reference impl per trait): `TableStorage` has its primary impl on `TableStore` with opaque-handle signatures; no test impl yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 18:47:07 +02:00
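The commit above describes the kind of source scan `tests/forbidden_apis.rs` performs. What follows is a minimal sketch of such a scan, built on several assumptions:

- The pattern list is a small subset of the patterns named in the commit.
- The allow-list matches bare file names.
- The `// forbidden-api-allow:` sentinel sits on the same line as the call.

The real test's matching rules may differ.

```rust
/// Hypothetical subset of the patterns listed in the commit message.
pub const FORBIDDEN_PATTERNS: &[&str] = &[
    "Dataset::write",
    "Dataset::merge_insert",
    ".merge_insert(",
    ".add_columns(",
    ".truncate_table(",
];

/// Files assumed exempt (manifest-layer infra writing system tables).
pub const ALLOWED_FILES: &[&str] = &["commit_graph.rs", "graph_coordinator.rs"];

/// Returns (1-based line number, matched pattern) for each violation in `src`.
pub fn scan(file_name: &str, src: &str) -> Vec<(usize, &'static str)> {
    if ALLOWED_FILES.contains(&file_name) {
        return Vec::new();
    }
    let mut hits = Vec::new();
    for (i, line) in src.lines().enumerate() {
        // Skip lines that are entirely comments.
        if line.trim_start().starts_with("//") {
            continue;
        }
        // Assumption: the allow sentinel is on the same line as the call.
        if line.contains("// forbidden-api-allow:") {
            continue;
        }
        for pat in FORBIDDEN_PATTERNS {
            if line.contains(*pat) {
                hits.push((i + 1, *pat));
            }
        }
    }
    hits
}
```

Plain substring matching is also why the commit leaves out patterns like `.append(`. Such a pattern would flag `Vec::append` and other unrelated calls.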
Ragnor Comerford	b87be5e9f0	agents: read every Lance page even slightly relevant, not just the obvious match Behavior is interlocked across Lance pages — transactions reference index lifecycle, index lifecycle references compaction, compaction references row-id lineage. Skipping a "slightly relevant" page is how alignment misses happen. The index alone is not a substitute for reading the pages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 17:44:41 +02:00
Ragnor Comerford	17bf978d0e	MR-793 follow-up: lance docs alignment audit + mandate full-page fetch via mdrip * AGENTS.md / docs/lance.md: agents must use `npx mdrip` (not summarizing WebFetch) when consulting Lance docs. WebFetch routinely drops load-bearing details — `pub(crate)` blockers, sub-specs behind nav hubs, default flags. Lesson learned during the MR-793 alignment audit. * docs/lance.md: add "Last alignment audit: 2026-05-02" stanza documenting MemWAL gap, lance#6666 companion ticket, stable-row-ID status (experimental, may unblock MR-848), FRI as documented compaction-friendly alternative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 17:41:32 +02:00
Ragnor Comerford	3135ff5d19	MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers Hoists Lance's stage+commit two-phase write pattern from "discipline at each writer" to a sealed trait surface (`TableStorage`). New engine code that needs to advance Lance HEAD MUST go through `stage_*` + `commit_staged`; the trait's opaque `SnapshotHandle` / `StagedHandle` types keep `lance::Dataset` and `lance::Transaction` out of trait signatures. Phases landed (see .context/mr-793-design.md for the full plan): * 1a: `crates/omnigraph/src/storage_layer.rs`: `TableStorage` trait, sealed (only in-tree types can implement it), with a single impl on `TableStore` delegating to existing inherent methods; the `Omnigraph::storage()` accessor returns `&dyn TableStorage`. * 2: three new staged primitives (`stage_overwrite`, `stage_create_btree_index`, `stage_create_inverted_index`) implementing the simple branch of Lance's `CreateIndexBuilder::execute` (scalar indices only; vector indices stay inline because `build_index_metadata_from_segments` is `pub(crate)` in lance-4.0.0). Six new tests in `tests/staged_writes.rs` pin both the new primitives and the inline residuals (`delete_where`, `create_vector_index`). * 3: `tests/forbidden_apis.rs`: a defense-in-depth integration test that walks engine source and fails on direct lance::* inline-commit API use outside `table_store.rs` / `db/manifest/`. It skips comment lines and honors `// forbidden-api-allow:` sentinels. * 4: `ensure_indices` migration: scalar index builds now route through `stage_create_*_index` + `commit_staged` instead of `create_*_index(&mut Dataset)`. Vector indices stay inline (residual, named honestly at the call site). * 5: `branch_merge::publish_rewritten_merge_table` migration: the merge_insert phase now uses `stage_merge_insert` + `commit_staged`; the delete phase stays inline (Lance #6658 residual, named honestly). 
* 6: `schema_apply` rewritten_tables migration — non-empty rewrites use `stage_overwrite` + `commit_staged`; empty-batch rewrites stay inline because `InsertBuilder::execute_uncommitted` rejects empty data. The narrow inline window is bounded by `__schema_apply_lock__`. Verified-green test surface: * `cargo test -p omnigraph-engine` — 68 lib + ~120 integration tests (incl. 6 new staged_writes tests + the new forbidden_apis test). * `cargo test -p omnigraph-engine --features failpoints --test failpoints` — 5 tests, all green. * `cargo test --workspace` — green. Deferred to follow-up sessions (see design doc §17 split): * Phase 1b — convert remaining engine call sites to `&dyn TableStorage` (mostly READS that don't touch the staged-write invariant). * Phase 7 — recovery-on-open reconciler (closes Phase B → Phase C residual across process restarts; new subsystem). * Phase 8 — index-coverage reconciler (full §VII.35 compliance — removes synchronous index work from the publish path). * Phase 9 — demote unused `TableStore` inherent methods to `pub(crate)` (depends on Phase 1b). Lance upstream blockers documented: * lance-format/lance#6658 — two-phase delete API (open, no PRs). * Companion: `build_index_metadata_from_segments` should be `pub` so vector-index builds can be staged outside the lance crate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:03:15 +02:00
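The sealed-trait-plus-opaque-handle shape described above can be sketched as follows. Everything here is a hypothetical stand-in: `TableStore` is an in-memory toy, rows are `i64` values, and no Lance types appear. The sketch illustrates three points from the commit:

- Crates outside this one cannot implement a trait whose supertrait lives in a private module.
- Callers only ever hold opaque handles.
- Staging does not move HEAD until `commit_staged` runs.

The empty-batch rejection loosely mirrors the `schema_apply` residual, where staging refuses empty data.

```rust
mod private {
    // Public trait in a private module: code outside this crate cannot name it,
    // so it cannot implement any trait that lists it as a supertrait.
    pub trait Sealed {}
}

/// Opaque read handle (hypothetical); the version field is not public.
pub struct SnapshotHandle {
    version: u64,
}

impl SnapshotHandle {
    pub fn version(&self) -> u64 {
        self.version
    }
}

/// Opaque staged write (hypothetical); not visible at HEAD until committed.
pub struct StagedHandle {
    base_version: u64,
    rows: Vec<i64>,
}

pub trait TableStorage: private::Sealed {
    fn snapshot(&self) -> SnapshotHandle;
    /// Phase 1: prepare an overwrite without advancing HEAD.
    fn stage_overwrite(&self, snap: &SnapshotHandle, rows: Vec<i64>) -> Result<StagedHandle, String>;
    /// Phase 2: make the staged write visible by advancing HEAD.
    fn commit_staged(&mut self, staged: StagedHandle) -> Result<SnapshotHandle, String>;
}

/// In-memory toy standing in for the real `TableStore`.
pub struct TableStore {
    head: u64,
    rows: Vec<i64>,
}

impl TableStore {
    pub fn new() -> Self {
        TableStore { head: 0, rows: Vec::new() }
    }

    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}

impl private::Sealed for TableStore {}

impl TableStorage for TableStore {
    fn snapshot(&self) -> SnapshotHandle {
        SnapshotHandle { version: self.head }
    }

    fn stage_overwrite(&self, snap: &SnapshotHandle, rows: Vec<i64>) -> Result<StagedHandle, String> {
        // Like the schema_apply residual: empty data cannot be staged.
        if rows.is_empty() {
            return Err("empty batch cannot be staged".to_string());
        }
        Ok(StagedHandle { base_version: snap.version, rows })
    }

    fn commit_staged(&mut self, staged: StagedHandle) -> Result<SnapshotHandle, String> {
        // Toy invariant only; real conflict handling is not modeled here.
        debug_assert!(staged.base_version <= self.head);
        self.rows = staged.rows;
        self.head += 1;
        Ok(self.snapshot())
    }
}
```

Because no method is generic, the trait stays object-safe. That is what allows an accessor to return `&dyn TableStorage`.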
Ragnor Comerford	a61e82f47a	MR-794 step 2: docs (runs/invariants/architecture/execution) + cleanup Refresh user-facing and agent-facing docs for the staged-write rewire and clean up stale Run-state-machine references that survived MR-771. MR-794-specific updates: * docs/runs.md: remove the "Known limitation: mid-query partial failure" section; document the in-memory accumulator, the D₂ rule, and the LoadMode::Overwrite residual. * docs/invariants.md §VI.25: flip from aspirational/open to upheld for inserts/updates. Within-query read-your-writes is now load-bearing for the publisher CAS contract. * docs/architecture.md: add a "Mutation atomicity — in-memory accumulator (MR-794)" subsection with the per-op flow; refresh the engine + state diagrams to drop RunRegistry and add MutationStaging. * docs/execution.md: rewrite the mutation flow sequence diagram for the staged-write path; update the LoadMode table to call out per-mode commit semantics; rewrite load vs ingest. * docs/query-language.md: document the D₂ parse-time rule. * docs/errors.md: add the D₂ BadRequest rejection path. * docs/testing.md: extend the runs.rs row to cover the new MR-794 contract tests; add the staged_writes.rs row. * docs/releases/v0.4.1.md (new): release note covering the rewire, test additions, residuals, and files changed. * AGENTS.md (CLAUDE.md symlink): update the atomic-per-query description and the L2 capability matrix row. Stale-reference cleanup (MR-771 leftovers): * docs/storage.md: drop live _graph_runs.lance / _graph_run_actors.lance from the layout diagram and prose; mark them legacy. * docs/branches-commits.md: move __run__<id> to a legacy note; remove publish_run from the publish-trigger list. * docs/audit.md: refresh the *_as API list (drop begin_run_as / publish_run_as); move legacy RunRecord.actor_id to a historical note. * docs/constants.md: mark run registry / branch-prefix rows as legacy. 
* docs/cli.md — replace the legacy omnigraph run * quickstart block with omnigraph commit list/show. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:43:19 +02:00
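The in-memory accumulator idea gives read-your-writes inside one query and a single publish at the end. It can be sketched roughly as follows. `MutationStaging` is the name used in these commits, but the fields and methods here are hypothetical. The real accumulator's shape and its D₂ rule are not modeled.

```rust
use std::collections::BTreeMap;

/// table name -> (row id -> value); a hypothetical stand-in for table data.
pub type Tables = BTreeMap<String, BTreeMap<u64, String>>;

/// Hypothetical accumulator for one multi-statement mutation.
#[derive(Default)]
pub struct MutationStaging {
    pending: Tables,
}

impl MutationStaging {
    /// Record a write; nothing outside this query can see it yet.
    pub fn insert(&mut self, table: &str, id: u64, value: &str) {
        self.pending
            .entry(table.to_string())
            .or_default()
            .insert(id, value.to_string());
    }

    /// Read-your-writes: pending rows shadow the committed snapshot.
    pub fn read<'a>(&'a self, committed: &'a Tables, table: &str, id: u64) -> Option<&'a str> {
        self.pending
            .get(table)
            .and_then(|rows| rows.get(&id))
            .or_else(|| committed.get(table).and_then(|rows| rows.get(&id)))
            .map(String::as_str)
    }

    /// Single publish at the end of the query: apply every pending write.
    pub fn publish(self, committed: &mut Tables) {
        for (table, rows) in self.pending {
            committed.entry(table).or_default().extend(rows);
        }
    }
}
```

If a later statement fails, dropping the staging value without calling `publish` leaves the committed state untouched.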
Ragnor Comerford	35be20cb05	MR-771: demote Run to direct-publish via expected_table_versions CAS mutate_as and load now write directly to target tables and call the publisher once at the end with per-table expected versions; the Run state machine, _graph_runs.lance writers, __run__ staging branches, and server /runs/* endpoints are removed. Multi-statement mutations remain atomic at the manifest level via an in-memory MutationStaging accumulator that gives read-your-writes within a query and a single publish at the end. Concurrent-writer conflicts surface as ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the old DivergentUpdate merge shape. Documents one known limitation in docs/runs.md: a multi-statement mid-query failure where op-N writes a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the manifest until a follow-up introduces per-table Lance branches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 08:52:50 +02:00
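The commit above describes publishing with per-table expected versions. Here is a minimal sketch, assuming a single in-process manifest guarded by `&mut self`. `Manifest`, `publish`, and the `PublishError` field layout are hypothetical; only the `ExpectedVersionMismatch` name comes from the commit. All expected versions are checked before any new version is written, so a conflicting writer changes nothing. In the real system the check-and-apply is a row-level CAS on `__manifest`, not a Rust borrow.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum PublishError {
    /// Name taken from the commit; the field layout is hypothetical.
    ExpectedVersionMismatch { table: String, expected: u64, actual: u64 },
}

/// Hypothetical manifest: table name -> published version.
#[derive(Default)]
pub struct Manifest {
    versions: BTreeMap<String, u64>,
}

impl Manifest {
    pub fn version(&self, table: &str) -> u64 {
        self.versions.get(table).copied().unwrap_or(0)
    }

    /// Compare-and-swap over several tables: verify every expected version
    /// first, and only if all match, record all new versions.
    pub fn publish(
        &mut self,
        expected: &BTreeMap<String, u64>,
        new_versions: &BTreeMap<String, u64>,
    ) -> Result<(), PublishError> {
        for (table, &expected_version) in expected {
            let actual = self.version(table);
            if actual != expected_version {
                return Err(PublishError::ExpectedVersionMismatch {
                    table: table.clone(),
                    expected: expected_version,
                    actual,
                });
            }
        }
        for (table, &v) in new_versions {
            self.versions.insert(table.clone(), v);
        }
        Ok(())
    }
}
```

Consider two writers that both read `Person` at version 0. The first publish succeeds. The second is rejected with `ExpectedVersionMismatch { expected: 0, actual: 1 }`, so the conflict surfaces instead of being merged.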
Claude	63babede4a	AGENTS.md: reframe 'minimize ongoing liability' as a general decision lens The previous bullets read like a migration pattern (centralized dispatcher, one match arm, no shape forks). That's one application, not the principle. Reframe it as a bidirectional decision lens: ask "which option has lower ongoing cost over time?" and let the answer be more code, less code, DRYing, duplication, removal, addition, a new abstraction, or flattening one — whichever shape converges over the expected change horizon. Add explicit examples of cases where the lower-liability option is more code (dispatcher, migration framework, typed error variants) and where it's less (premature abstractions, "just in case" paths, helpers that wedge independently-evolving callers together) so readers don't collapse the principle into "minimize code".	2026-04-29 12:32:30 +00:00
Claude	b6a596670a	AGENTS.md: add 'minimize ongoing liability' as a first principle The always-on rules are concretizations of a broader engineering posture that wasn't stated explicitly. Add a short section that frames the spirit behind those rules: - One centralized detection point, not many heal hooks. - One dispatcher, not branch-on-shape in every consumer. - One canonical shape after migration, not forks on "old vs new". - Three similar lines beats a premature abstraction. - Delete dead paths when their last caller leaves. Plus a forward-looking review prompt ("what do these paths look like after 5 more changes like this?") so the principle bites at design time, not just at review time. The internal-schema-version mechanism we just shipped is a concrete application: one stamp + one dispatcher + one match arm per change, no heal hooks scattered across the engine. Codify the pattern so future work doesn't drift back to ad-hoc.	2026-04-29 11:48:47 +00:00
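The commit above summarizes the pattern as "one stamp + one dispatcher + one match arm per change". A minimal sketch of that pattern follows. The version numbers, `CURRENT_VERSION`, and `migrate` are hypothetical; this log does not describe where the real stamp is stored or what each step does.

```rust
/// Hypothetical internal schema version that this binary writes.
pub const CURRENT_VERSION: u32 = 3;

/// One dispatcher: each arm upgrades exactly one version step, and the loop
/// walks an old stamp forward until it reaches CURRENT_VERSION.
pub fn migrate(mut stamp: u32, applied: &mut Vec<String>) -> Result<u32, String> {
    if stamp > CURRENT_VERSION {
        return Err(format!(
            "stamp {} is newer than supported version {}",
            stamp, CURRENT_VERSION
        ));
    }
    while stamp < CURRENT_VERSION {
        stamp = match stamp {
            1 => {
                applied.push("v1 -> v2".to_string());
                2
            }
            2 => {
                applied.push("v2 -> v3".to_string());
                3
            }
            other => return Err(format!("no migration step from version {}", other)),
        };
    }
    Ok(stamp)
}
```

Adding a change means bumping `CURRENT_VERSION` and adding one arm. No heal hooks need to be scattered across the engine.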
Ragnor Comerford	6f25c4f9f8	Address reviewer feedback (Cursor + cubic) on PR #60 All eight comments verified against source and applied: - AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of the markdown blockquote. Claude Code's @-import parser expects @ at column 0; the leading "> " of a blockquote silently broke recognition, so the claimed auto-include did nothing. (Cursor, Medium severity.) - docs/cli-reference.md: command-family count 13 → 17. The current enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level variants. (cubic P2.) - docs/ci.md: Homebrew tap update is a regular `git push`, not a force-push (release.yml:117 is `git push origin HEAD:main`). (cubic P2.) - docs/errors.md: add the Storage variant to the NanoError list — it exists at error.rs:88-89 but the doc enumerated only 10 of 11. (cubic P2.) - docs/storage.md: clarify tombstone semantics. There is no tombstone_version column; state.rs:180 reads the tombstone version from the table_version column on rows where object_type = table_tombstone. (cubic P2.) - docs/branches-commits.md: split the GraphCommit pseudo-struct from the underlying storage. actor_id is joined in-memory from _graph_commit_actors.lance, not a column on _graph_commits.lance. (cubic P2.) - docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to match the actual constant name in catalog/schema_ir.rs:11. (cubic P3.) - docs/testing.md: engine integration test count 16 → 15 (matches `ls crates/omnigraph/tests/*.rs`). (cubic P3.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 00:09:06 +02:00
Ragnor Comerford	ada58ccd7b	Make "check existing coverage first" a top-level testing principle The original docs/testing.md mentioned finding existing tests as step 1 of the checklist but never explicitly said "if existing coverage already addresses your case, extend it; don't duplicate." Adds a prominent "First principle" section that names extend-vs-new as the preferred outcome and lists three duplicated init_and_load blocks as the most common form of test rot. Adds an extra checklist item: verify your change makes an existing test fail before it makes a new one pass — if you can break the code without breaking a test, that coverage gap is the bug to fix first. Strengthens the AGENTS.md callout so the principle ("always check what already covers it") is in scope from the top of every session. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 00:03:50 +02:00
Ragnor Comerford	8be0e6a067	Add docs/testing.md as required-read every session Maps the test surface (engine integration tests by area, CLI/server tests, helpers harness, fixtures, failpoints feature, RustFS S3 integration, OpenAPI drift) and gives a before-every-task checklist: find existing tests for the area, run them as a clean baseline, plan the new test up front, reuse helpers, mind the layer boundary per invariants §VII.33. Notes that there's no coverage tooling today — coverage knowledge comes from reading and running the relevant integration tests, not a tarpaulin/codecov report. Threaded into AGENTS.md as the third required-reading file alongside invariants.md and lance.md, with a Claude-Code @-import so agents load it on every turn. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:55:21 +02:00
Ragnor Comerford	a06d8bcf82	Promote docs/lance.md to required-read every session Adds @docs/lance.md alongside @docs/invariants.md so the Lance index loads on every turn (Claude Code @-import; explicit-open instruction for other agents). Reframes the directive from "when you hit a Lance-shaped problem" to "consult before every task to identify which upstream pages are relevant." The Lance docs are the authoritative source for substrate behavior, so reasoning about them should start every change rather than be triggered conditionally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:50:42 +02:00
Ragnor Comerford	b6440d6b17	Add docs/lance.md — task-organized index of Lance upstream docs Curates the Lance documentation site (lance.org) into a problem-domain index so agents fetch the right page when working on Lance-touching code instead of guessing or grepping our codebase. Organized by topic: storage format & file layout, branching/tags/time travel, indexes (scalar + system + vector), reads/writes, schema evolution, object store, data types, performance, compaction, DataFusion integration, SDK reference, plus quick-starts and the upstream AGENTS.md. Skips ~200 irrelevant URLs from the upstream sitemap (Namespace REST API model surface, Spark/Trino/Databricks/etc. integrations, Python/Ray/HuggingFace docs, community pages) since omnigraph is Rust-only and doesn't run a Lance Namespace catalog. AGENTS.md surfaces it in the topic index and adds a directive: "when you hit a Lance-shaped problem, consult docs/lance.md and fetch the upstream URL before guessing." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:48:28 +02:00
Ragnor Comerford	43724b9f18	Make docs/invariants.md required reading every session Adds a top-of-file directive plus a Claude-Code @-import so the full invariants document is loaded into context on every turn, not only when an agent follows a pointer. Other agents are instructed to open it explicitly at session start. The §IX deny-list and §X review checklist apply to every change, so they should be in scope by default rather than gated on the agent remembering to look. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:39:09 +02:00
Ragnor Comerford	1e7334275a	Trim always-on rules to architectural-level invariants Drops four rules whose phrasing leaned on implementation specifics (nearest LIMIT, __run__<id> branches, __schema_apply_lock__, branch_list filter convention) — those are real constraints, but they live at the implementation layer and would go stale if internals are renamed or refactored. The architectural intent is captured by the remaining six rules and by the per-area docs. Reframes the kept rules at the survives-a-rename level: "multi-dataset publish is atomic across the whole graph" rather than naming the manifest table or the publisher type, etc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:38:24 +02:00
Ragnor Comerford	c924e121d2	Add architectural invariants & deny-list as docs/invariants.md A standing reference for invariants that hold across storage, engine, server, schema, indexing, observability, and the OSS/Cloud split. Used to check RFCs and PRs against the substrate boundaries (don't rebuild what Lance gives us), layering rules (one trait boundary per layer), distributability constraints (Send+Sync, location-neutral IR), honesty expectations (estimate-vs-actual, bounded failure modes), unified patterns (reconciler, Union polymorphism, SIP, factorize), the §IX deny-list, and the §X review checklist. §IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split) are referenced but not yet drafted — flagged as placeholders pending upstream fill-in. AGENTS.md surfaces it from the topic index, the always-on rules section, and the maintenance contract; the deny-list is also inlined there as a fast-pass review filter so it stays in scope every turn. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:34:44 +02:00
Ragnor Comerford	a335d98854	Refactor AGENTS.md from encyclopedia to map; move spec into docs/ Splits the 990-line AGENTS.md into a 184-line map (architecture, where-to-find index, always-on invariants, capability matrix, maintenance contract) plus 18 new docs/*.md files holding the deep content per topic (storage, schema and query languages, indexes, embeddings, branches/commits, runs, merge, changes, execution, policy, server, CLI reference, audit, errors, CI, constants, v0.3.1 notes). Adds scripts/check-agents-md.sh and a check_agents_md CI job that verifies every docs/ link in AGENTS.md resolves and every doc in the canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:31:08 +02:00
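The check described for `scripts/check-agents-md.sh` reduces to two set differences:

- links that do not resolve to an existing file;
- canonical docs that are never linked.

This sketch uses Rust rather than shell, for consistency with the other examples. It assumes links use inline `[text](docs/name.md)` syntax and takes the existing-file set and the canonical set as inputs. The real script would obtain both sets itself.

```rust
use std::collections::BTreeSet;

/// Collect `docs/...` targets from inline markdown links like `[x](docs/a.md)`.
pub fn doc_links(md: &str) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    let mut rest = md;
    while let Some(start) = rest.find("](docs/") {
        // Skip the "](" so the captured target starts at "docs/".
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                out.insert(after[..end].to_string());
                rest = &after[end..];
            }
            None => break,
        }
    }
    out
}

/// Returns (links that do not resolve, canonical docs that are never linked).
pub fn check(
    md: &str,
    existing: &BTreeSet<String>,
    canonical: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>) {
    let links = doc_links(md);
    let broken = links.difference(existing).cloned().collect();
    let unlinked = canonical.difference(&links).cloned().collect();
    (broken, unlinked)
}
```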
Ragnor Comerford	cfea41e942	Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it Captures the v0.3.1 feature spec (storage, schema/query languages, IR, indexes, embeddings, branches/commits/runs, merge, server, CLI, policy, deployment) and adds a §26 maintenance contract instructing agents to keep this file current alongside any user-visible change. CLAUDE.md is a symlink so there's one source of truth. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:10:09 +02:00

19 commits