omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
aaltshuler	df30aa6935	feat(lint): add OG-MF-105 / OG-MF-107 enum-migration codes Mint two schema-lint codes for the enum-migration planner work: - OG-MF-105 "narrow enum value set" (Validated): removing allowed enum variants — apply scans existing rows and fails loudly on a row holding a now-disallowed value. - OG-MF-107 "constrain String to enum" (Validated): tightening a free String to an enum — same validated-scan semantics. Both are appended to ALL_CODES and EMITTED_IN_V0. The OG-MF-106 doc is narrowed to mean a genuine scalar-type change only, now that enum value-set deltas are split out to 105/107. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 23:44:28 +01:00
aaltshuler	550ab8b3d1	test: baseline schema-apply coverage for enum-add and metadata-only apply Pins two foundations for enum-migration work: - Planner + integration coverage that adding a nullable enum property is supported (AddProperty), backfills existing rows as NULL, and that the enum value-set is enforced on subsequent writes. Closes the gap where this path was exercised only implicitly via generic nullable-property logic. - A regression test for the metadata-only apply path (UpdateTypeMetadata from an added @description): the manifest version does not advance, yet the schema contract is persisted, `applied` is true, and a reopen re-plans the same source as a no-op. Enum widen will ride exactly this path, so the contract is now nailed down before building on it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 23:43:38 +01:00
Ragnor Comerford	d6d92a32a0	Update README.md (#120 )	2026-05-25 22:08:22 +02:00
Ragnor Comerford	cc2412dc65	Rename repo terminology to graph (#118 ) Some checks failed CI / Classify Changes (push) Has been cancelled Details CI / Check AGENTS.md Links (push) Has been cancelled Details Release Edge / Prepare edge release (push) Has been cancelled Details CI / Test Workspace (push) Has been cancelled Details CI / Test omnigraph-server --features aws (push) Has been cancelled Details CI / RustFS S3 Integration (push) Has been cancelled Details Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled Details Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled Details	2026-05-24 16:46:00 +01:00
Andrew Altshuler	587fbeabd8	ci(publish-crates): set User-Agent + treat "already exists" as success (#117 ) Two related fixes uncovered while recovering the v0.5.0 publish. 1. crates.io API requires a User-Agent header. The `publish_if_new` skip check was doing a bare `curl -fsSL https://crates.io/api/...` which crates.io rejects with HTTP 403. With `-f` curl exits non-zero, the pipeline returns empty, the script doesn't recognize already-published crates, and we fall through to a real publish attempt. On a re-run that means cargo publish errors with "already exists on crates.io index" for crates that DID publish successfully on the previous run. Fix: send a `User-Agent: ModernRelay-omnigraph-ci (URL)` header. 2. Defense in depth: even with the UA, the API could hiccup. If the skip check misses an existing version and cargo publish errors with "already exists on crates.io index", treat as success instead of failing the whole run. This makes the workflow re-runnable after any partial publish without needing manual intervention. Both fixes are required to recover from the v0.5.0 partial publish where omnigraph-compiler@0.5.0 made it through but the run failed before omnigraph-policy / engine / server / cli — re-triggering the workflow now succeeds end-to-end. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 14:19:17 +01:00
Andrew Altshuler	1a9f8b1f7f	ci(publish-crates): include omnigraph-policy in the publish list (#116 ) omnigraph-policy is a new crate this release cycle (Cedar policy engine, MR-722). It wasn't added to the publish list when it was created, so v0.5.0's tag-triggered publish run succeeded for omnigraph-compiler but failed at omnigraph-engine: failed to prepare local package for uploading Caused by: no matching package named `omnigraph-policy` found location searched: crates.io index required by package `omnigraph-engine v0.5.0` omnigraph-policy has no internal omnigraph-* deps so it can publish after omnigraph-compiler (either could go first). omnigraph-engine depends on both; server on the three; cli on everything. publish_if_new is idempotent — re-running with the v0.5.0 tag after this lands will skip omnigraph-compiler (already published), then publish policy + engine + server + cli. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 14:09:58 +01:00
Andrew Altshuler	bb1fe57640	release: v0.5.0 (#115 ) * gitignore: exclude docs/internal/ from publication Mirrors the existing "Local-only working files (not for the public repo)" pattern. Working notes filed under docs/internal/ stay on the contributor's machine instead of cluttering the published doc tree or tripping the AGENTS.md / docs-index cross-link check (scripts/check-agents-md.sh enumerates every docs/.md and requires each one to be linked from an audience index — internal notes don't have an audience index by definition). Incidental to the v0.5.0 release; lands separately from the version bump commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> ci: skip docs/internal/ in agents-md cross-link check Matches the .gitignore exclusion. Mirrors the existing 'docs/releases/' exclusion pattern: notes under docs/internal/ aren't part of the published doc tree and don't need to be linked from an audience index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * release: v0.5.0 — Lance 6 substrate, Cedar policy engine, schema-lint v1 Bumps the workspace from 0.4.2 to 0.5.0. Release notes at docs/releases/v0.5.0.md. Three user-visible pillars motivate the minor bump: 1. Lance 6.0.1 substrate (DataFusion 52→53, Arrow 57→58) 2. Engine-wide Cedar policy enforcement on every _as writer; server defaults to deny-all; signed-token-claim-only actor identity 3. Schema-lint v1 chassis: OG-XXX-NNN codes, soft drops, and `--allow-data-loss` (Hard mode) for destructive migrations Plus structured DataFusion Expr filter pushdown (unblocks CompOp::Contains via array_has), HTTP allow_data_loss parity, inline .gq sources on CLI/HTTP, optional CORS layer, and bug fixes (merge-insert dup-rowid, branch-merge coordinator restore on error, blob columns in branch merge). Sites bumped: - 5 crate [package].version lines (omnigraph, omnigraph-cli, omnigraph-compiler, omnigraph-policy, omnigraph-server) - 10 internal path-dep `version = "..."` constraints across the four manifests that depend on sister crates (engine, server, cli, plus engine's dev-dep on the compiler) - Cargo.lock (regenerated via cargo update --workspace) - AGENTS.md "Version surveyed:" - openapi.json `info.version` (regenerated via OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi) Verification: - cargo test --workspace --locked: 907/907 green - cargo test -p omnigraph-engine --test failpoints --features failpoints: 19/19 green - cargo test -p omnigraph-engine --test lance_surface_guards: 3/3 - scripts/check-agents-md.sh: clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 13:59:42 +01:00
Andrew Altshuler	cb80fa40f1	exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113 ) * exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr) Switches `execute_node_scan` from string-flattened Lance SQL pushdown (`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`). ## What this enables 1. `CompOp::Contains` now pushes down. `ir_filter_to_sql` returned `None` for list-contains (the comment said "Can't pushdown list contains") because string SQL can't easily express it. With Expr, it lowers to DataFusion's `array_has(col, value)` builtin via the `nested_expressions` feature, and pushes down to Lance's scan layer the same way Eq/Lt/etc. do. Pinned by the new regression test `end_to_end::ir_filter_with_list_contains_pushes_down`. 2. DataFusion 53's optimizer rules now reach our predicates. Once the Expr lands at the Lance scanner, DF's planner runs: - `IN`-list vectorized eq kernel (DF #20528) - `PhysicalExprSimplifier` (DF #20111) - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097) - Push limit into hash join (DF #20228) None of these were applicable before because the string SQL path short-circuited the optimizer. ## Scope This is one of three string-flattened pushdown sites; the other two (`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL string path for now: - The Expand pushdown still serializes through `hydrate_nodes`'s `extra_filter_sql: Option<&str>` parameter. Migrating it changes the `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` → `Option<Expr>`) and the cascading call sites — out of scope for this MR. - The mutation delete predicate still goes through `Dataset::delete(&str)` in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the Lance v7 bump per issue #112) will migrate that path to `DeleteBuilder::execute_uncommitted` taking an Expr. The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql` helpers stay in place to serve the remaining string-SQL consumers (mutation predicates). They get retired when the other call sites migrate. ## Cargo Enables the `nested_expressions` feature on the `datafusion` workspace dep. Lance already pulls in `datafusion-functions-nested` transitively (it's listed in their feature set), so this just exposes the `datafusion::functions_nested::expr_fn::array_has` re-export. No transitive dep change (Cargo.lock unchanged). ## Tests - New: `ir_filter_with_list_contains_pushes_down` — pins the case that was previously impossible (`ir_filter_to_sql` returning `None`). - 906/906 workspace tests still pass. - 417/417 engine integration tests pass (was 416 + the new one). - 19/19 failpoints (recovery canary). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break) The RustFS S3 Integration job started failing 2026-05-23 with all 3 tests panicking on the first PUT: HTTP error: error sending request The "Dump RustFS logs on failure" step revealed the container was dying at startup: [FATAL] Server encountered an error and is shutting down: Default root credentials are not allowed on non-loopback listeners; set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values, bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true for local development only `rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as "default" values. PR #111 passed yesterday because it ran against beta.3; today's runs against beta.4 fail at container startup. This is unrelated to PR #113's Expr-pushdown refactor — the bump just happened to hit the same week. Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The right long-term fix is one of: - Rotate the CI creds to less-default values (less coupling to RustFS's "default" set definition) - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the error message - Use a workflow service container with controlled lifecycle Deferred — pinning is the minimal restore. Also incidentally documents which version we tested against, which `:latest` never did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 12:47:33 +01:00
Andrew Altshuler	3551e0d40e	chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111 ) * tests: add lance_surface_guards pre-flight pins for the v6 bump Land 8 named guards in a new test file that pin Lance API surfaces OmniGraph relies on. Each guard turns a silent-break risk (variant rename, struct restructure, async-flip) into a red CI bar instead of runtime drift. Guards (mapped to the silent-break inventory from the v6 migration plan): Runtime (#[tokio::test]): 1. lance_error_too_much_write_contention_variant_exists — pins the variant referenced by db/manifest/publisher.rs::map_lance_publish_error. 2. manifest_location_field_shape — pins .path/.size/.e_tag/.naming_scheme types and ManifestLocation accessor returning &Self (the access pattern at db/manifest/metadata.rs:84-88). 6. write_params_default_does_not_set_storage_version — confirms our explicit V2_2 pin remains load-bearing (blob v2 requirement). Compile-only async fns (#[allow(...)] + unimplemented!() placeholders; never run, but cargo build --tests enforces the API shape): 3. checkout_version + restore chain — pins the recovery rollback hammer at db/manifest/recovery.rs:505-522. 4. DatasetBuilder::from_namespace().with_branch().with_version().load() — pins the namespace builder chain at db/manifest/namespace.rs:162-174. 5. MergeInsertBuilder fluent chain — pins the manifest CAS at db/manifest/publisher.rs:370-391, including the return shape (Arc<Dataset>, MergeStats). 7. compact_files(&mut ds, CompactionOptions, None) — pins db/omnigraph/optimize.rs:107. 8. DeleteResult { new_dataset, num_deleted_rows } — pins the inline delete result shape (MR-A will repurpose this guard to the staged two-phase variant once Lance #6658 migration lands). This is commit 1 of the chore/lance-6.0.1 migration. Cargo bump follows in commit 2 (will trigger the guards under v6 if any surface drifted). Per the migration plan at ~/.claude/plans/shimmering-percolating-duckling.md (written this session). Two guards from the plan deferred to follow-up: - manifest_cas_returns_row_level_contention_variant (full publisher race integration test — needs harness scaffolding) - table_version_metadata_byte_compatible_with_v4 (TableVersionMetadata is pub(crate); requires test reach extension). Verified on v4: cargo test -p omnigraph-engine --test lance_surface_guards passes 3/3 runtime tests; cargo build -p omnigraph-engine --tests compiles all 5 compile-only guards clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58 The Cargo bump itself. Source is intentionally untouched — this commit will not compile. The compile errors are the work-list for subsequent commits on this branch. Lance updates: lance + 7 sub-crates 4.0.0 → 6.0.1. Transitive churn: + lance-tokenizer v6.0.1 (vendored tokenizer per Lance PR #6512) + object_store 0.13.x (Lance 6 brings it transitively; our explicit pin stays at 0.12.5 for now — revisit in stages if diamond bites) - tantivy* crates (replaced by lance-tokenizer) Compile error landscape on this commit (11 errors): • 1× E0432: `lance_index::DatasetIndexExt` import (Lance PR #6280 moved it to lance::index). Sites: table_store.rs:20, db/manifest.rs:37 (the second site was missed by the pre-flight inventory). • 8× E0599: `create_index_builder` / `load_indices` missing on `lance::Dataset` — all downstream of the DatasetIndexExt move. Once the import is corrected on table_store.rs and db/manifest.rs, these resolve automatically. • 2× E0063: missing field `is_only_declared` in `DescribeTableResponse` initializer at db/manifest/namespace.rs:221, 364. New Lance namespace field per the v5 namespace restructure (PR #6186). Surface guards (lance_surface_guards.rs, commit `d571fa8`) all still compile + the 3 runtime ones pass on v6 — none of the silent-break surfaces drifted. That's the load-bearing observation: the publisher CAS chain, ManifestLocation field shape, checkout_version/restore, DatasetBuilder fluent chain, MergeInsertBuilder return shape, WriteParams::default, compact_files signature, and DeleteResult fields are all v6-stable. Next commits address the 11 errors per the migration plan stages 3-8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * imports: move DatasetIndexExt to lance::index (Lance PR #6280) Lance 5.0 (PR #6280) moved `DatasetIndexExt` out of `lance-index` into `lance::index`. `is_system_index` and `IndexType` stayed in `lance-index`. Mechanical update of 6 import sites: crates/omnigraph/src/table_store.rs:20 — split into two `use` lines crates/omnigraph-server/tests/server.rs:10 — was traits::DatasetIndexExt crates/omnigraph/tests/search.rs:6 crates/omnigraph/tests/branching.rs:7 crates/omnigraph/tests/failpoints.rs:467 crates/omnigraph-cli/tests/cli.rs:3 — was traits::DatasetIndexExt All 9 E0599 cascading errors on .create_index_builder / .load_indices resolve once the trait is back in scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * namespace: add is_only_declared field to DescribeTableResponse Lance namespace 6.0.0 added `is_only_declared: Option<bool>` to `DescribeTableResponse` (lance-namespace-reqwest-client 0.7+ via the v5.0 namespace API restructure, Lance PR #6186). Set to `Some(false)` because every table BranchManifestNamespace returns from describe_table is materialized — the manifest snapshot only includes entries for tables we've already opened via Dataset::open. Two sites in db/manifest/namespace.rs (BranchManifestNamespace + StagedTableNamespace impls of LanceNamespace::describe_table). Closes the last two compile errors from the v6 bump in the engine lib. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * cargo: add lance to omnigraph-cli + omnigraph-server dev-deps Stage 3 moved DatasetIndexExt imports from `lance-index` to `lance::index` in the cli and server test crates. Both crates only had `lance-index` in their dev-dependencies; add `lance` alongside so the new path resolves. This is the last compile-error fix from the v6 bump — `cargo build --workspace --tests` is now green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: refresh Lance alignment audit for v6.0.1; bump surveyed version Per CLAUDE.md maintenance rule 2 (same-PR docs): - docs/dev/lance.md: replace the v4.0.1 alignment audit stanza with the v6.0.1 audit. Captures every v5/v6 finding from this PR (the DatasetIndexExt move, DescribeTableResponse.is_only_declared, MergeInsertBuilder return shape, ManifestLocation field shape, LanceFileVersion::default flip, file-reader async, tokenizer vendor, Lance #6658/#6666/#6877 status). Cross-references each guard in tests/lance_surface_guards.rs. - AGENTS.md: bump "Storage substrate: Lance 4.x" → "Lance 6.x". Note: surveyed crate version stays at 0.4.2 — substrate version bumps are independent of OmniGraph's release version. - crates/omnigraph/src/storage_layer.rs: update the trait module-level doc-comment to reflect that Lance #6658 closed 2026-05-14 and delete_where two-phase migration is MR-A (the next follow-up). #6666 stays open; create_vector_index inline residual stays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tests: silence clippy::diverging_sub_expression on compile-only guards The five `_compile_` async fns in lance_surface_guards.rs use `let ds: Dataset = unimplemented!()` as a placeholder so type inference can chase the method chain we want to pin, without ever running the function. Clippy's `diverging_sub_expression` lint flags this pattern because the RHS diverges; that's the entire point. Added to the per-fn `#[allow(...)]` list, alongside dead_code / unreachable_code / unused_variables / unused_mut already there. No behavior change. cargo test -p omnigraph-engine --test lance_surface_guards still 3/3 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> docs: correct #6658 status — closed but API ships in Lance v7.x, not v6.0.1 The audit stanza in docs/dev/lance.md and the storage_layer.rs trait doc-comment both implied the public DeleteBuilder::execute_uncommitted API shipped with Lance 6.0.1. It did not. Issue #6658 closed 2026-05-14, but binary search across the release stream confirms: v6.0.1 ❌ no pub async fn execute_uncommitted on DeleteBuilder v6.1.0-rc.1 ❌ v7.0.0-beta.5 ❌ v7.0.0-beta.10 ✅ first appearance v7.0.0-rc.1 ✅ So MR-A (delete two-phase migration) is gated on the Lance v7.x bump, not on this PR. v7.0.0-rc.1 dropped 2026-05-21; GA likely within a week. No behavior change. Doc-only correction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(lib): bump recursion_limit to 256 — Lance 6 trait depth on Linux Lance 6's heavier trait surface around futures/streams in storage_layer.rs's staged-write API pushes the rustc trait-resolution recursion limit past the default 128 on Linux builds. CI on PR #111 surfaced this in both `Test Workspace` and `Test omnigraph-server --features aws`: error: queries overflow the depth limit! = help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`omnigraph`) = note: query depth increased by 130 when computing layout of `{async block@crates/omnigraph/src/storage_layer.rs:697:5: 697:10}` (The async block is `stage_create_btree_index`'s body — its return type is several layers of `impl Future<Output=Result<StagedHandle>>` deep on top of Lance's own builder return types.) Local macOS builds happened to short-circuit before tripping the limit, which is why this didn't surface during the v6 bump sequence. The fix rustc itself suggests is one line at the crate root. No behavior change. Revisit if a future Lance bump stops needing it. Verified: `cargo build --locked -p omnigraph-server --features aws` compiles clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:42:29 +01:00
Ragnor Comerford	e1b40aee0b	engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957) (#109 ) * engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957) Lance 4.0.x's MergeInsertBuilder rejects sequential merge_insert / update against rows previously rewritten by merge_insert with a spurious "Ambiguous merge inserts: multiple source rows match the same target row on (id = ...)" error. The engine passes exactly 1 source row; Lance's `processed_row_ids: Mutex<HashSet<u64>>` (lance-4.0.0 src/dataset/write/merge_insert.rs:2099) double-processes the same source/target match against datasets previously rewritten by merge_insert and errors under the default SourceDedupeBehavior::Fail. Two surfaces hit it: - Load: `omnigraph load --mode merge` twice against the same @key set. - Mutate: sequential `update T set {f:v} where x=y` on the same row. Fix: opt both MergeInsertBuilder call sites (merge_insert_batch, stage_merge_insert) into SourceDedupeBehavior::FirstSeen. Lance silently skips a duplicate match instead of erroring. Correctness-preserving for OmniGraph because source-side duplicates are already rejected upstream of these call sites: - Loader: enforce_unique_constraints_intra_batch (loader/mod.rs:1453) rejects intra-batch dup @key values across all three LoadModes, pinned by the new loader_rejects_intra_batch_duplicate_keys test. - Mutate: MutationStaging::finalize pre-dedupes by id. So FirstSeen only suppresses the spurious Lance behavior, never user data. Regression coverage: - consistency::load_merge_repeated_against_overlapping_keys_succeeds — load surface (was the basis of the original PR #98 report). - runs::second_sequential_update_on_same_row_succeeds — update surface (MR-920). - consistency::loader_rejects_intra_batch_duplicate_keys — pins FirstSeen's safety argument. - consistency::load_merge_window_2_documents_upstream_lance_gap — canary for the residual upstream Lance gap (after MR-848 removes the eager BTREE-on-id, re-establishing the index via ensure_indices re-triggers the bug class). Drop the FirstSeen setter only when this canary stays green without it. Cross-validation on the prior PR #98 branch: both use_index(false) (PR #98's hypothesis) and FirstSeen (MR-920's hypothesis) cover both surfaces individually. FirstSeen chosen because it has no perf cost (use_index(false) would force full-table scans on every merge_insert). Supersedes PR #98 and andrew/merge-insert-firstseen. Tracked at MR-957; upstream: lance-format/lance#6877. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * engine: add dedup-by-keys precondition on merge_insert primitives Addresses Codex P1 on PR #109: `SourceDedupeBehavior::FirstSeen` silently collapses duplicate source rows, and the branch-merge rewrite path (`exec/merge.rs::publish_rewritten_merge_table`) feeds a concatenated batch directly into `stage_merge_insert` without going through `MutationStaging::finalize`'s pre-dedupe. By construction the merge algorithm (`compute_source_delta` / `compute_three_way_delta` walk via `OrderedTableCursor` and push each id at most once) produces 1-row-per-id, but the invariant was implicit — a future refactor could violate it and FirstSeen would mask the bug as silent data loss. Add `check_batch_unique_by_keys` as a release-mode precondition at the top of `merge_insert_batch` and `stage_merge_insert`. Errors with an explicit "duplicate source row" message before the builder runs, so real source dups continue to fail-fast regardless of caller. Cost: one extra O(N) pass over the key column on every merge_insert. String HashSet over typical batch sizes is microseconds — negligible next to the merge_insert itself. The inline comment in `table_store.rs` now enumerates all three pre-dedup paths (load / mutate / branch-merge) and names the precondition as the structural pin instead of relying on by-construction invariants from three separate callers. Three new unit tests in `table_store::tests` pin the helper itself; the existing `loader_rejects_intra_batch_duplicate_keys` integration test continues to pin the loader's intake-time check as the first defense layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 18:19:54 +01:00
Andrew Altshuler	aadfa11ecb	schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107 ) Some checks failed CI / Classify Changes (push) Has been cancelled Details CI / Check AGENTS.md Links (push) Has been cancelled Details Release Edge / Prepare edge release (push) Has been cancelled Details CI / Test Workspace (push) Has been cancelled Details CI / Test omnigraph-server --features aws (push) Has been cancelled Details CI / RustFS S3 Integration (push) Has been cancelled Details Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled Details Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled Details The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode drops were CLI-only. This commit closes that feature gap and adds e2e test coverage for drop modes across HTTP + CLI, plus data preservation on additive apply, plus a CLI↔SDK plan-parity assertion. Feature gap closed: - `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool` (default false via `#[serde(default)]`) to `SchemaApplyRequest`. Added `Default` derive so test usages can use `..Default::default()`. - `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }` and threads through to `apply_schema_as`. - `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path used to bail with "--allow-data-loss not yet supported on remote"; now forwards the flag into the JSON payload so the CLI behaves identically against local and remote URIs. - `openapi.json` — regenerated; only diff is the new field on `SchemaApplyRequest`. Tests added (8 new): * `crates/omnigraph-server/tests/server.rs` (+5): - `schema_apply_route_soft_drops_property_via_http` — POST schema removing nullable property, verify catalog reflects the drop AND `snapshot_at_version(pre)` still has `age` in the field list (time-travel reachability is the Soft contract). - `schema_apply_route_soft_drops_node_type_via_http` — POST schema removing `Company` node + cascading `WorksAt` edge. - `schema_apply_route_hard_drops_property_with_allow_data_loss` — POST with `allow_data_loss: true`, verify plan step reports `mode: hard`. - `schema_apply_route_keeps_drops_soft_without_flag` — same schema without flag, verify `mode: soft`. Pins default semantics against accidental Hard promotion. - `schema_apply_route_additive_property_preserves_existing_rows` — load fixture, POST adding nullable property, verify row count preserved (SDK suite covers data preservation on drops + renames; additive AddProperty wasn't pinned). Plus helpers `schema_without_age` and `schema_without_company`. * `crates/omnigraph-cli/tests/cli.rs` (+3): - `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI `omnigraph schema apply --allow-data-loss --schema X.pg --json`, verify plan step has `mode: hard`. - `schema_apply_without_allow_data_loss_keeps_soft_drops` — without flag, verify Soft. - `schema_plan_parity_cli_and_sdk` — same `.pg` source through `Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json` (CLI), assert the steps array is byte-identical post-JSON. HTTP has no `/schema/plan` endpoint; apply-side parity is implicitly covered by the HTTP drop tests + CLI drop tests using identical fixtures. Docs: - `docs/user/schema-language.md` — new "Destructive drops" section documenting Soft vs Hard semantics and that `allow_data_loss` is now honored uniformly across CLI / HTTP / SDK. Verification: every new test passes; full `cargo test --workspace --locked` green; `scripts/check-agents-md.sh` passes. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 01:56:46 +03:00
Andrew Altshuler	e8fec2fa0f	tests: policy chassis e2e gap-fills (MR-722 follow-up) (#106 ) Some checks failed CI / Classify Changes (push) Has been cancelled Details CI / Check AGENTS.md Links (push) Has been cancelled Details Release Edge / Prepare edge release (push) Has been cancelled Details CI / Test Workspace (push) Has been cancelled Details CI / Test omnigraph-server --features aws (push) Has been cancelled Details CI / RustFS S3 Integration (push) Has been cancelled Details Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled Details Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled Details * tests: policy chassis e2e gap-fills (MR-722 follow-up) Audit after PRs #101-105 surfaced real e2e gaps in the policy chassis that could let regressions ride through silently. Coverage was strong at the SDK level (18 chassis tests) and reasonable at HTTP (12+ policy tests), but the CLI×writer matrix was asymmetric (only `change` tested end-to-end), the `cli.actor` config-only precedence path was untested, the `OMNIGRAPH_UNAUTHENTICATED` env-var read path was unexercised, `serve()`'s startup-refusal propagation was structural-review only, and engine↔HTTP decision parity was a structural property without a test pinning it. This commit closes those gaps. Added (15 new tests, all test-only): * `policy_engine_chassis.rs` (+2): `load_file_as` allow + deny pair — PR #104 added the actor-aware mirror of `load_file` but it was only exercised via CLI integration; this is direct-SDK coverage. * `omnigraph-server/src/lib.rs` mod tests (+2): - `unauthenticated_env_var_classification` — consolidated single test (process-global env var; running parallel would race) that pins truthy values, falsy values, unset, and CLI-flag-overrides- env behavior of the `OMNIGRAPH_UNAUTHENTICATED` read path inside `load_server_settings`. - `serve_refuses_to_start_in_state_1_without_unauthenticated` — `#[serial]` integration test. Clears all bearer-token env vars, builds a `ServerConfig` with no policy file and no flag, calls `serve(config).await`, asserts Err before any side-effecting work (Lance dataset open, TcpListener::bind). Guards the classifier→serve propagation path so a future refactor that drops the call turns red. * `omnigraph-server/tests/server.rs` (+4): `policy_decision_parity_` — four cases (Change×allowed+denied, BranchMerge×allowed+denied). Each case runs the same Cedar decision via both SDK (`Omnigraph::with_policy().mutate_as` / `branch_merge_as`) and HTTP (`POST /change` / `POST /branches/merge`) and asserts both either Allow or Deny. The structural property (both paths call `PolicyChecker::check`) is now test-asserted. `omnigraph-cli/tests/system_local.rs` (+8): the CLI×writer matrix fan-out: - `local_cli_load_enforces_engine_layer_policy` - `local_cli_ingest_enforces_engine_layer_policy` - `local_cli_schema_apply_enforces_engine_layer_policy` - `local_cli_branch_create_enforces_engine_layer_policy` - `local_cli_branch_delete_enforces_engine_layer_policy` - `local_cli_branch_merge_enforces_engine_layer_policy` Each: one denied case (`--as act-bruno` against protected main) + one allowed case (`--as act-ragnor` via existing/extended admins-* rules). Plus: - `local_cli_actor_from_config_used_when_no_flag` — proves the config-only precedence path works. - `local_cli_actor_flag_overrides_config_actor` — proves the `--as` flag wins over `cli.actor` in the config. Adds `local_policy_config_with_actor` helper. Extends `POLICY_E2E_YAML` with `admins-branch-ops` (BranchCreate + BranchDelete) and `admins-schema-apply` rules so the CLI×writer matrix has positive-case rule coverage. Verification: all new tests pass; full `cargo test --workspace --locked` is green; `scripts/check-agents-md.sh` passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * tests: serialize env-touching server lib tests to fix CI flake CI flake on PR #106's Test Workspace job: two of the new tests (`serve_refuses_to_start_in_state_1_without_unauthenticated` and `unauthenticated_env_var_classification`) raced against `server_bearer_tokens_from_env_reads_legacy_token_and_token_file`, which sets `OMNIGRAPH_SERVER_BEARER_TOKEN` via `EnvGuard`. While `serve_refuses` was mid-execution with its EnvGuard cleared, the bearer-token test's EnvGuard had `OMNIGRAPH_SERVER_BEARER_TOKEN` set; `resolve_token_source()` saw it and classified the runtime state as `DefaultDeny` rather than refusing — so the test panicked with "Dataset at path X not found" instead of the expected refusal message. The unauthenticated test had the symmetric failure: its `OMNIGRAPH_UNAUTHENTICATED="anything"` got overwritten by a peer `EnvGuard` drop. Fix: mark every test that uses `EnvGuard` with `#[serial]` so they serialize against each other (default key). Already on `serve_refuses_to_start_in_state_1_without_unauthenticated`; added to `unauthenticated_env_var_classification` and `server_bearer_tokens_from_env_reads_legacy_token_and_token_file`. The `parse_bearer_tokens_json_*` tests don't touch env vars and stay parallel. Locally green (36 tests pass on my workstation); the parallelism issue is CI-runner-specific (more aggressive thread interleaving) but the fix is universal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 22:25:04 +03:00
Andrew Altshuler	f3f2a051ba	policy: server 3-state default-deny matrix (MR-723) (#105 ) Closes the "tokens but no policy" trap. Pre-MR-723, an operator who configured bearer tokens and forgot to set policy.file got a server that required auth and then permitted every action — the illusion of protection. After MR-723, that configuration is default-deny: only `read` actions succeed; every other action returns HTTP 403. Three startup states, classified deterministically: - Open — no tokens, no policy. Requires explicit `--unauthenticated` flag or `OMNIGRAPH_UNAUTHENTICATED=1`; otherwise `serve()` refuses to start. Forces the operator to opt in to "fully open dev mode" so it can't happen accidentally. - DefaultDeny — tokens configured, no policy. `authorize_request` rejects every action except `Read` with 403. The warn-log on startup names the misconfiguration explicitly. - PolicyEnabled — policy file configured. Cedar evaluates every request, unchanged from pre-MR-723. What landed: - `ServerConfig.allow_unauthenticated: bool` + `--unauthenticated` flag on the `omnigraph-server` bin + `OMNIGRAPH_UNAUTHENTICATED` env var (`load_server_settings` honors both). - New `classify_server_runtime_state(has_tokens, has_policy, allow_unauthenticated) -> Result<ServerRuntimeState>` pure function. `serve()` calls it before opening the engine and bails with a clear error when the operator hits the no-tokens-no-policy-no-flag cell. - `authorize_request` state-2 branch: when `policy_engine()` is None but the bearer-auth middleware delivered an authenticated actor, any action other than `Read` returns 403 with a message that names the misconfiguration. - `AppState::with_policy_engine(self, engine)` builder method so integration tests that need a custom workload (`new_with_workload`) can still install a permit-all policy without a new constructor. - `app_for_loaded_repo_with_auth(token)` and `app_for_loaded_repo_with_auth_tokens(tokens)` test helpers now install a permit-all policy alongside tokens — they previously represented the "tokens but no policy" state that MR-723 makes default-deny, and tests that don't care about policy were inadvertently coupled to the loophole. Tests: - `classify_` unit tests (3) — every cell of the matrix. - `default_deny_mode_allows_read_for_authenticated_actor` — GET /snapshot succeeds with bearer token + no policy. - `default_deny_mode_rejects_change_with_forbidden` — POST /change rejected with 403 + "default-deny" message. - `default_deny_mode_rejects_schema_apply_with_forbidden` — POST /schema/apply rejected with 403 + "default-deny" message. - New `app_for_repo_with_auth_tokens_only(schema, tokens)` helper builds the State-2 fixture without policy. The pre-MR-723 helpers `app_for_loaded_repo_with_auth` shift semantics to "tokens + permit-all" so existing tests retain their original intent. docs/user/policy.md: new "Server runtime states (MR-723)" section documents the matrix and the explicit `--unauthenticated` opt-in. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 17:02:26 +03:00
Andrew Altshuler	a275306a15	policy: CLI policy injection — local writes go through engine enforce (MR-722) (#104 ) Closes the CLI side of the policy chassis fan-out. Before this commit, CLI direct-engine writes bypassed Cedar entirely because the CLI never called `Omnigraph::with_policy(...)` for non-`policy validate\|test\|explain` subcommands. After this commit, every CLI direct-engine writer (change, load, ingest, branch create/delete/merge, schema apply) opens the engine via a new `open_local_db_with_policy(uri, &config)` helper that installs the configured `PolicyEngine` when `policy.file` is set, and threads the resolved actor through to the `_as` writer methods. Actor identity resolution: - New top-level `--as <ACTOR>` global flag on the CLI overrides config. - New `cli.actor` field in `omnigraph.yaml` provides a default actor. - Precedence: `--as` > `cli.actor` > None. - When policy is configured and neither is set, the engine-layer footgun guard fires and the write is denied — silent bypass via "I forgot the actor" is exactly what the guard prevents. - Remote HTTP writes ignore both — bearer-token-resolved server-side. Helpers added in main.rs: - `open_local_db_with_policy(uri, &config) -> Result<Omnigraph>` — opens the DB and installs the PolicyEngine when configured. Without policy this is identical to a bare `Omnigraph::open`. - `resolve_cli_actor(cli_as, &config) -> Option<&str>` — implements the flag > config > None precedence. Engine: added `load_file_as` to the loader as the actor-aware mirror of `load_file`, so CLI file-path loads flow through the same enforce gate as in-memory `load_as` calls. Test rewrite: `local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced` was the explicit assertion of the pre-chassis hole. Renamed and split: - `local_cli_policy_tooling_is_end_to_end` — sanity for the read-only policy CLI surfaces (validate/test/explain), unchanged behavior. - `local_cli_change_enforces_engine_layer_policy` — the new assertion: policy installed + no actor → footgun-guard denial; `--as act-bruno` on protected main → Cedar denial; `--as act-ragnor` (admins-write rule) on main → permit, write committed. POLICY_E2E_YAML gains an `admins-write` rule so the permit case has a non-trivial actor to exercise. docs/user/policy.md updated with `cli.actor` + `--as <ACTOR>` usage. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 04:06:21 +03:00
Andrew Altshuler	da42beec41	policy: chassis fan-out — _as variants on the remaining 6 writers (MR-722) (#103 ) PR #102 wired apply_schema_as. This PR completes the chassis-side coverage so every public mutating engine entry point hits the same Omnigraph::enforce(action, scope, actor) gate regardless of transport: - mutate_as → enforce(Change, Branch(branch), actor) - load_as → enforce(Change, Branch(branch), actor) - ingest_as → enforce(Change, Branch(branch), actor); also threads actor through the implicit branch_create_from_as so fresh-branch ingest correctly hits BranchCreate too - branch_create_as → enforce(BranchCreate, TargetBranch(name), actor) - branch_create_from_as → enforce(BranchCreate, BranchTransition { source, target }, actor) - branch_delete_as → enforce(BranchDelete, TargetBranch(name), actor) - branch_merge_as → enforce(BranchMerge, BranchTransition { source, target }, actor) Three new _as variants for branch ops (create, create_from, delete) that had no actor surface before; existing actor-less variants delegate with actor=None so the no-policy path is a strict no-op. HTTP handlers updated to thread the resolved actor into the new _as variants for branch_create and branch_delete (was previously dropped). 14 new SDK chassis tests (one allow + one deny pair per wired writer); the existing 4 apply_schema_as tests stay. All 18 pass. docs/user/policy.md updated to describe engine-wide enforcement and the coarse-vs-fine layer split (engine = action gate, query layer per-row = MR-725 future). AGENTS.md capability matrix updated to match. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 03:38:18 +03:00
Andrew Altshuler	9973683261	policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102 ) PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101). The structural fix that moves Cedar enforcement from HTTP-only to engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans the enforce() call out to the remaining six (mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge). ## What lands ### New crate: omnigraph-policy The 844-line policy.rs moves from `omnigraph-server` into a new `omnigraph-policy` workspace crate so both engine and server can depend on it. Cedar dependency moves with it. The server's policy.rs becomes a re-export shim (`pub use omnigraph_policy::`) so existing `omnigraph_server::PolicyAction` etc. paths keep working — CLI and test consumers don't have to migrate in one go. ### New trait: PolicyChecker ```rust pub trait PolicyChecker: Send + Sync { fn check(&self, action: PolicyAction, scope: &ResourceScope, actor: &str) -> Result<(), PolicyError>; } ``` `PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()` takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without spinning up Cedar. MR-725 will extend the trait with `predicate_for()` for query-layer pushdown — additive, no call-site changes. ### New enum: ResourceScope Four variants — Graph, Branch, TargetBranch, BranchTransition — mapping cleanly to today's `(branch, target_branch)` shape on PolicyRequest via `to_branch_pair()`. Each engine writer picks the variant that matches the existing HTTP-layer convention so engine and HTTP evaluate the same Cedar decision. Invariant: ResourceScope stays at branch granularity. Per-type and per-row scope are MR-725's territory, not engine-layer's. Adding Type/Row variants here creates two places per-type policy can be evaluated, which can drift. See chassis design refinements comment on MR-722 (2026-05-17). ### Omnigraph::with_policy() + enforce() New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph, None by default (preserves embedded/dev no-enforcement mode). * `with_policy(self, checker)` setter — builder-style, consumes self. * `enforce(action, scope, actor)` — the gate. When policy is None, no-op. When policy is Some AND actor is None, hard error — silent bypass via "I forgot the actor" is exactly the footgun this gate is here to prevent. ### apply_schema_as: first writer wired * New public method `apply_schema_as(source, options, actor)` that calls `enforce(SchemaApply, TargetBranch("main"), actor)` before acquiring the schema-apply lock or doing any other work. * Existing `apply_schema(source)` and `apply_schema_with_options(...)` delegate to it with actor=None (no-actor variants). * HTTP handler `server_schema_apply` updated to call apply_schema_as with the resolved actor. AppState construction injects the PolicyEngine into Omnigraph via `with_policy`. HTTP-layer authorize_request still fires first; the engine gate is the redundant-but-correct backstop and the only path that protects SDK / embedded callers. PR #3 removes the HTTP redundancy. ### OmniError::Policy New error variant for engine-layer policy denial / evaluation failure. ApiError::from_omni maps it to 403. ### MR-724 Admin action — Option A reservation PolicyAction::Admin kept in the enum with a load-bearing doc comment naming its future consumers (hot reload, audit log query, approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...) call site exists yet — the variant is reserved so the action vocabulary is complete from chassis day one. MR-724 closes when the first consumer surface ships. ### New SDK-side integration test `crates/omnigraph/tests/policy_engine_chassis.rs` — four tests covering: * Policy denies for unauthorized actor → OmniError::Policy * Policy permits for authorized actor → apply succeeds * Policy installed + no actor → hard error (forget-the-actor footgun) * No policy → no-op (embedded/dev default still works) These exercise the engine path directly — no HTTP layer involved. ## Test results - cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed * 45 server tests (existing) pass * 14 schema_apply tests (existing) pass * 4 new chassis tests pass * 60 OpenAPI tests pass (no HTTP API surface changes) * No regressions across the workspace ## Architectural decisions baked in Per MR-722 chassis design refinements comment (2026-05-17): 1. PolicyChecker is a trait, not just a concrete. Engine and server consume the trait. MR-725 adds predicate_for() additively. 2. ResourceScope stays at branch granularity. No Type/Row variants. 3. Coarse-vs-fine framing pinned: engine-layer is action gate; query-layer (MR-725) is predicate gate. Both backed by same Cedar engine; non-overlapping responsibilities. 4. Admin action reserved for policy-management surfaces (MR-724 Option A). ## Pending follow-ups (PR #3+) - Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge (PR #3). - Remove HTTP-layer authorize_request redundancy once engine gate covers all writers (PR #3). - CLI policy injection into Omnigraph for non-`policy validate\|test\|explain` subcommands (PR #3 or follow-up). - MR-723 default-deny 3-state matrix (PR #4). - MR-736 severity warn/deny (PR #5). - AGENTS.md scope-of-enforcement rewrite once chassis fully lands. - Coarse-vs-fine framing in docs/user/policy.md. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 00:36:36 +03:00
Andrew Altshuler	7a86f654d4	policy: codify signed-token-claim-only actor identity (MR-731) (#101 ) Warm-up commit for the policy chassis epic (MR-722). PR #1 of the chassis series — same role as schema-lint v1's commit #1 baseline. Zero behavioral change; establishes the regression test, the load-bearing doc comment, and the user-doc paragraph for an invariant already true in code. Server auth already resolves `actor_id` from the matched bearer token at `omnigraph-server/src/lib.rs:692-694`, overwriting whatever the handler put in the PolicyRequest. The principle is named in docs/dev/invariants.md Hard Invariant 11 ("clients cannot set actor identity directly"). What was missing: a regression test, a load-bearing doc comment at the resolution site, and a user-facing documentation paragraph. This commit adds all three. Why first. The actor-identity invariant is the foundation every other policy decision stands on. If `actor_id` can be spoofed, every chassis primitive (per-row scope, audit log, two-person rule) becomes ungated. Pinning the invariant first means PR #2 (the chassis core) doesn't have to re-prove this assertion. Changes: * crates/omnigraph-server/tests/server.rs — new regression test actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers with three sub-assertions: - spoof-up: bearer for denied actor + X-Actor-Id naming allowed actor → 403 (header doesn't promote) - spoof-down: bearer for allowed actor + X-Actor-Id naming denied actor → 200 (header doesn't demote) - empty-string spoof: empty X-Actor-Id doesn't clear resolved actor Cross-link to MR-777 (auth boundary cases — actor-id collision + malformed bearer) noted in the test docstring. * crates/omnigraph-server/src/lib.rs — expanded doc comment at the actor-resolution site explaining the SECURITY INVARIANT, citing Hard Invariant 11, the Supabase RLS history footgun, and the regression test that pins the contract. Reader thinking "I should let clients override actor_id for impersonation" hits this comment first. * docs/user/policy.md — new "Actor identity (signed-claim-only)" section near the existing Server enforcement section. Closes the user-facing doc gap MR-731's "Done when" requires. Architectural decisions for PR #2+ pinned this session (not implemented here, recorded so future implementers don't re-litigate): - PolicyEngine moves to new `omnigraph-policy` workspace crate so both engine and server can depend on it (Q2). - `enforce(action, scope, actor)` will take a new `ResourceScope` enum, leaving room for MR-725's per-type and per-row variants (Q3). - `PolicyAction::Admin` is kept and wired (Option A) — meta-action for policy-management surfaces (hot reload, audit log query, approvals list) as those consumer features land (Q4). Test results: - cargo test -p omnigraph-server --test server: 45 pass (44 existing + 1 new); no regressions - scripts/check-agents-md.sh: passes (34 links / 33 docs OK) Out of scope (PR #2+): - Omnigraph::with_policy() + enforce() method - omnigraph-policy crate creation - ResourceScope enum - CLI policy injection into Omnigraph - HTTP-layer redundant-check removal - MR-724 Admin action wiring (PR #2) - MR-723 default-deny 3-state (PR #4) - MR-736 severity warn/deny (PR #5) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:51:34 +03:00
Andrew Altshuler	a6e037547f	schema-lint chassis v1.2: --allow-data-loss flag + Hard mode (MR-694) — completes v1 (#100 ) * schema-lint v1 commit 5: --allow-data-loss flag + Hard mode Final v1 commit. Wires up the --allow-data-loss CLI flag and Hard mode for both DropProperty and DropType. Per docs/dev/schema-lint-v1-plan.md, commit #5 of the schema-lint chassis v1 series (MR-694). CLI (omnigraph-cli/src/main.rs): - New --allow-data-loss flag on both `omnigraph schema plan` and `omnigraph schema apply` subcommands. Off by default (Soft). - HTTP remote schema apply explicitly rejects the flag for now (CLI-only; HTTP parity is a separate small follow-up that adds the field to SchemaApplyRequest + the server handler). Engine (omnigraph.rs + schema_apply.rs): - New SchemaApplyOptions { allow_data_loss: bool } public struct (Default = all false), re-exported via omnigraph::db::SchemaApplyOptions. - New public methods: plan_schema_with_options and apply_schema_with_options. Existing plan_schema/apply_schema are now thin wrappers that pass Default::default(). - promote_drops_to_hard: post-plan walk that promotes every DropMode::Soft step to DropMode::Hard when the flag is set. Keeps the compiler's plan_schema_migration signature unchanged (no breaking change for tests / callers). - Apply path: both Drop arms accept Hard mode; behavior is identical to Soft inside the apply loop. The DIFFERENCE is the new hard_cleanup_targets: Vec<(String, String)> accumulator, populated for every Hard variant with (table_key, full_dataset_uri). - Post-publish cleanup: a new loop after the manifest commit iterates hard_cleanup_targets and calls cleanup_old_versions (before_timestamp = now) on each dataset URI. Best-effort — the apply is already durable; cleanup failure is logged via tracing::warn rather than failing the apply. - New cleanup_dataset_old_versions helper inlines the Lance cleanup_old_versions call against a dataset URI. Behavioral details: - DropProperty Hard: stage_overwrite produced a new dataset version without the column. cleanup_old_versions removes the prior version (and reclaims unique fragments). After Hard apply, snapshot_at_version(pre_drop).open(table_key) FAILS because the prior dataset version was reclaimed. - DropType Hard: no per-table write happens (the change is the manifest tombstone). cleanup_old_versions on the orphan dataset is a no-op in the immediate term (no prior versions to clean since the dataset wasn't modified by this apply). The dataset directory persists. Full orphan-cleanup is a documented follow-up — the user-facing contract is "data is unreachable via omnigraph" (manifest entry tombstoned), which is satisfied. Tests (tests/schema_apply.rs): - apply_schema_with_allow_data_loss_promotes_drops_to_hard: default plan emits Soft; with options.allow_data_loss=true, plan emits Hard; apply succeeds. - apply_schema_hard_drops_property_makes_prior_version_unreachable: Hard drop succeeds, current snapshot lacks the column, and snapshot_at_version(pre_drop).open("node:Person") FAILS (Lance prior version reclaimed by cleanup). - apply_schema_hard_drops_node_and_edge_with_flag_succeeds: both Node and Edge DropType variants are promoted to Hard with the flag; apply succeeds; current manifest entries gone. (Orphan dataset directory cleanup deferred.) Test results: - cargo test -p omnigraph-compiler --lib: 239 passed - cargo test -p omnigraph-engine --test schema_apply: 14 passed (3 new Hard tests + 11 existing soft/regression tests) - cargo test -p omnigraph-server --test openapi: 60 passed (no HTTP API surface changes in this commit; OpenAPI parity follow-up noted) v1 status: complete for CLI/embedded use. MR-694 chassis epic + MR-700 DropType/DropProperty ticket can close after this lands. Known follow-ups (separate small PRs): - HTTP parity: extend SchemaApplyRequest with allow_data_loss field, thread through server handler, regenerate openapi.json. - Orphan-dataset directory deletion for DropType Hard (currently the dataset directory persists; cleanup_old_versions doesn't remove it because the dataset wasn't modified). - MR-948 substrate alignment: swap DropProperty Soft from stage_overwrite to Dataset::drop_columns (catalog_only vs full_rewrite cost class). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fixup: use bail! from color_eyre::eyre instead of anyhow The remote-rejection branch in SchemaCommand::Apply used anyhow::anyhow! which isn't in scope; the CLI's Result type is color_eyre::eyre::Result and bail! is already imported. Caught by CI Test Workspace job on PR #100. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 22:12:46 +03:00
Andrew Altshuler	58cee158d8	schema-lint v1 commit 4: emit + apply DropType { Soft } (#99 ) Wire the second half of the dormant Drop* family. Per docs/dev/schema-lint-v1-plan.md, commit #4 of the schema-lint chassis v1 series (MR-694). Builds on commit #3 (PR #90, DropProperty Soft). Planner (schema_plan.rs): - plan_nodes leftover loop: emit DropType { Node, name, Soft } instead of UnsupportedChange (OG-DS-102) for node-type removals. - plan_edges leftover loop: emit DropType { Edge, name, Soft } instead of UnsupportedChange (OG-DS-103) for edge-type removals. Apply (schema_apply.rs): - New dropped_tables: BTreeSet<String> accumulator alongside added_tables / renamed_tables / rewritten_tables. - DropType arm in the metadata loop populates dropped_tables for Soft mode. Hard mode errors (lands in commit #5 with --allow-data-loss). - New tombstone-emission loop after the rename sidecar build: for each dropped table, push to sidecar_tombstones AND populate table_tombstones with table_version + 1. The existing manifest publish path converts table_tombstones into ManifestChange::Tombstone operations — no new manifest plumbing needed. - Soft DropType has no Phase B per-table write; the tombstone is the entire change. Lance dataset files are retained — prior __manifest versions still reference them, so time travel + branch-from-snapshot can read the dropped table until cleanup_old_versions runs. - Rides on SidecarKind::SchemaApply per MR-847 (already established by commit #3). Tests: - Planner unit test plan_emits_soft_drop_for_removed_node_and_edge_types asserts both Node and Edge DropType { Soft } emission for the Company + WorksAt combined drop, plus no UnsupportedChange. - Integration test apply_schema_drops_node_and_referencing_edge_softly (replaces apply_schema_rejects_dropping_a_node_type): asserts plan emission, apply success, current manifest entries absent, pre-drop manifest entries present (time-travel reversibility), reopen consistency. - Integration test apply_schema_drops_an_edge_type_softly (replaces apply_schema_rejects_dropping_an_edge_type): single edge drop, asserts other tables untouched, time-travel reversibility. Test results: - cargo test -p omnigraph-compiler --lib: 239 passed (1 new + 238) - cargo test -p omnigraph-engine --test schema_apply: 11 passed (2 converted + 9 unchanged) Pending for v1 completion: - Commit #5: --allow-data-loss CLI flag + Hard mode promotion in planner + immediate compact_files + cleanup_old_versions for both DropProperty and DropType. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:25:42 +03:00
Andrew Altshuler	e98347eb7b	schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90 ) * schema-lint chassis v1 (WIP): tier surfacing + plan doc First commit of the chassis v1 branch. Lands a small, foundational slice without behavior change, plus a planning doc that lays out the remaining 7 commits in sequence so the PR can be reviewed incrementally. This commit: - Adds SchemaMigrationStep::diagnostic() returning the full &'static DiagnosticCode (family + tier + severity) for UnsupportedChange steps with codes. Renderers can now reach the tier without re-implementing the code → tier lookup. - CLI `omnigraph schema plan` output now displays tier alongside code: unsupported change on node:Person.age [OG-DS-104, destructive]: removing property 'Person.age' is not supported in schema migration v1 Operators see at-a-glance the kind of risk each rejection represents — not just the rule identifier. - No behavior change. All 11 existing schema_apply tests still pass. Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining commits to bring v1 to feature-complete: 1. (this commit) Tier surfacing in plan output. 2. Soft / Hard mode enum on drop steps. 3. Tombstone fields on catalog IR. 4. Planner emits DropProperty { Soft } by default. 5. Apply path implements Soft mode. 6. Convert PR #62 destructive-rejection tests. 7. --allow-data-loss flag + Hard mode. 8. (optional) Tombstone unhide / restore command. Delete the planning doc when v1 lands. Intentionally checked in to the WIP branch so the scope is reviewable; not intended as a permanent doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * schema-lint v1 commit 2: DropMode + dormant Drop* variants Second commit of the chassis v1 branch. Lands the type-level shape of soft/hard drops without wiring them up. Variants are reachable from emitters but the planner doesn't produce them yet; the apply path returns an explicit not-yet-implemented error if one shows up via deserialization. Added: - `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier classifies the rule's risk class; mode is the operator's intent for data treatment. - `Soft` → catalog tombstone, data retained. Tier: safe. - `Hard` → Lance-level removal. Tier: destructive; will require --allow-data-loss to apply (commit 7). - `SchemaMigrationStep::DropType { type_kind, name, mode }` and `SchemaMigrationStep::DropProperty { type_kind, type_name, property_name, mode }` variants. - Re-export `DropMode` from `omnigraph_compiler::DropMode` so downstream crates don't reach into the catalog submodule. - CLI `render_schema_plan_step` arms for both variants, surfacing the mode in plan output: `drop property 'Person.age' of node 'Person' (soft mode)`. - `apply_schema_with_lock` exhaustive match arm for the two new variants that returns `manifest_internal` with a clear not-yet-implemented message. If a SchemaIR JSON containing Drop{Type,Property} arrives (e.g. from a future tool or hand- written), the apply path fails explicitly rather than silently misclassifying. - Two new in-source tests: - `drop_steps_round_trip_through_serde` — pins the wire shape for all four (variant × mode) combinations. - `drop_mode_serde_uses_snake_case` — pins external-tool- friendly serialization (`"soft"` / `"hard"`). Build: clean, only pre-existing warnings. Tests: - omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new). - omnigraph-engine schema_apply: 11/11 (unchanged — planner still emits UnsupportedChange for removal paths). Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the `tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the catalog representation of soft-mode tombstones. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * plan doc: reframe v1 around Lance native drop_columns After a substrate audit of the Lance data-evolution guide on 2026-05-13, the v1 plan was simplified. Two key findings: 1. Lance's `drop_columns()` is already metadata-only and reversible via time travel until cleanup. No need for a parallel `tombstoned: bool` field in our catalog IR — Lance's version graph IS the tombstone. 2. The full schema_apply substrate migration (add_columns, drop_columns, alter_columns vs. stage_overwrite across all step types) is consolidated in MR-948 as a sibling issue. v1 only uses the relevant slice (drop_columns for OG-DS-1XX). Net plan changes: - Commit 3 (original): tombstone fields on catalog IR → dropped. No catalog IR change needed. The Lance drop_columns commit IS the tombstone. - Commit 5 (original): apply path writes tombstoned: true → replaced with: apply path calls Dataset::drop_columns([name]). - Commit 7 Hard mode: stage_overwrite removing the column → replaced with: drop_columns + compact_files + cleanup_old_versions. Same APIs omnigraph cleanup already uses. - Commit 8 (original): omnigraph schema unhide → dropped. Time travel is the undo (omnigraph snapshot --at <commit>). Net result: 8 commits → 5 commits. ~250 LoC less surface. More substrate-aligned. The chassis types from commit 2 (DropMode enum, DropType / DropProperty variants) are kept exactly as designed; only the implementation strategy changed. The Lance docs quote is included in the doc so future readers see the substrate behavior cited verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * schema-lint v1 commit 3: emit + apply DropProperty { Soft } Wire the dormant DropProperty variant end-to-end for the Soft case. Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis v1 series (MR-694). Planner (schema_plan.rs): - plan_properties: emit DropProperty { type_kind, type_name, property_name, mode: Soft } instead of UnsupportedChange when a property exists in accepted but not in desired. Plan is now supported = true for drop-only changes. Apply (schema_apply.rs): - Route DropProperty { Soft } through rewritten_tables. The existing batch_for_schema_apply_rewrite path already iterates the target schema fields, so a property absent from desired_catalog is naturally projected away. The prior Lance version retains the dropped column for time-travel reversibility (until cleanup runs). - DropType still errors (lands in commit #4 with different mechanics: __manifest entry removal instead of column projection). - DropProperty { Hard } still errors (lands in commit #5 with --allow-data-loss CLI flag + immediate compact_files + cleanup_old_versions). Tests: - Planner unit test plan_emits_soft_drop_for_removed_nullable_property asserts the variant emission + supported = true + no UnsupportedChange. - Integration test apply_schema_drops_a_nullable_property_softly_ preserves_prior_version (replaces the former apply_schema_rejects_dropping_a_property_with_data) asserts: (a) plan contains DropProperty { Soft } (b) apply succeeds + manifest advances + row count unchanged (c) current dataset schema lacks the dropped column (d) snapshot_at_version(pre_drop) still has the dropped column (e) reopen consistency — drop preserved across engine restart Recovery: rides on SidecarKind::SchemaApply per MR-847. No new sidecar kind needed; the entire apply path is already sidecar-wrapped. Substrate alignment: this commit uses the stage_overwrite full-rewrite path (full_rewrite cost class) rather than Lance native drop_columns (catalog_only cost class). MR-948 is the follow-up substrate-alignment refactor that introduces a LanceColumnOp surface and switches the metadata-only case onto drop_columns. Functional outcome is identical; cost-class improvement deferred. Test results: - cargo test -p omnigraph-compiler --lib: 238 passed - cargo test -p omnigraph-engine --test schema_apply: 11 passed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs: move schema-lint-v1-plan into docs/dev/ + add to index Post-rebase fixup for the docs split (#93). The plan doc was added to docs/ at the top level before main reorganized to docs/{user,dev}/. This moves it into docs/dev/ and adds an entry to docs/dev/index.md under a new "Active Implementation Plans" section so the check-agents-md.sh link check passes. Per the original commit message (`617a77d`), the plan doc is intentionally temporary — it will be deleted when v1 lands. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 16:30:03 +03:00
Ragnor Comerford	5c889f8e42	Update README.md	2026-05-15 18:06:25 -07:00
Ragnor Comerford	e046c193bc	Update README.md	2026-05-15 18:03:40 -07:00
Ragnor Comerford	6abe59bbaa	Update use cases in README with second brains link	2026-05-15 15:40:30 -07:00
Andrew Altshuler	0de5f69d86	docs: drop npx mdrip; use curl \| pandoc for full-page fetches (#97 ) The previous "fetch the full page" recommendation in AGENTS.md and docs/dev/lance.md pointed at an unknown-author npm CLI that, on consent, wrote agent-targeted content into AGENTS.md and modified .gitignore / tsconfig.json. Source audit was clean of malicious code but the self-perpetuating prompt-injection pattern combined with a single maintainer and ~21 downloads/day made it not worth the risk. Switched to the curl + pandoc command already documented as the no-tool option. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:06:24 +03:00
Andrew Altshuler	60eee78465	docs: split user and developer docs (#93 )	2026-05-15 03:45:22 +03:00
Andrew Altshuler	e8d49559c4	branch-protection: allow admin bypass on main (#94 ) Flip enforce_admins from true to false. Repo admins can now merge their own PRs without waiting for code-owner review, by clicking "Merge without waiting for requirements to be met" once CI is green. The action is recorded in the audit log. Non-admins still see full enforcement: code-owner review required, 1 approving review, required status checks must pass. Rationale: as the solo owner of most CODEOWNERS scopes, the author cannot satisfy GitHub's "non-self approver" rule on their own PRs, which made every PR block on a second human. Admin bypass restores the practical workflow while keeping the protection rules as the default for everyone else.	2026-05-15 03:32:12 +03:00
Andrew Altshuler	6bad829ed0	branch-protection: declarative policy + apply script (#89 ) Branch protection on main, declared as code rather than as opaque GitHub UI state. Pairs with the CODEOWNERS chassis (#88): once this PR lands and an admin runs the apply script, every PR to main must satisfy code-owner review and the listed required checks. Components: - .github/branch-protection.json — the policy. Edit this to change required checks, review counts, etc. Includes a _comment field for human readers; the apply script strips it before PUT. - scripts/apply-branch-protection.sh — idempotent apply via `gh api`. Reads back current state for verification. Supports DRY_RUN=1. - docs/branch-protection.md — explains the policy, how to apply, how to change, why declared as code. - AGENTS.md topic-index row. Policy summary: - Required status checks (strict): Classify Changes, Check AGENTS.md Links, Test Workspace, Test omnigraph-server --features aws, CODEOWNERS / drift, CODEOWNERS / noedit. - Required approving reviews: 1, must be a code owner. - Dismiss stale reviews on new commits. - Required linear history (squash or rebase merges only). - No force pushes, no deletions, no admin bypasses. - Required conversation resolution. What's NOT in this PR: - Required signed commits — not yet; maintainers must enroll GPG/SSH signing first or merges will block. - Tag protection for v* tags — separate PR. - Additional required checks (cargo deny, audit, fmt, clippy, CodeQL, schema-lint MR-946) — separate PRs as each lands. - The script is NOT run by CI. Branch-protection changes are admin actions; CI-driven auto-apply would defeat the purpose. Manual invocation is the audit point. How to apply after merge: ./scripts/apply-branch-protection.sh Requires gh-CLI auth with repo-admin permissions. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:38:20 +03:00
Andrew Altshuler	730712b73f	codeowners: yml source of truth + generator + drift CI (#88 ) * codeowners: generator + drift CI + initial roles Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS is generated and CI-enforced. Every role change is a reviewable PR with a permanent in-repo audit trail. No GitHub UI clicks, no shadow state. Initial roles: engineering @aaltshuler owns crates/** + default (.github/, scripts/, Cargo., openapi.json, everything else not docs) docs @aaltshuler @ragnorc owns docs/, README.md, AGENTS.md, CLAUDE.md, SECURITY.md Per GitHub semantics, multiple owners on a CODEOWNERS line means "any one satisfies the review" — for docs, either named member can approve. Strict "N distinct approvers" would need a CI workaround (not wired today; tracked for future hardening). Components: - .github/codeowners-roles.yml — source of truth. Edit this. - .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC). - .github/CODEOWNERS — generated. CI rejects hand-edits. - .github/workflows/codeowners.yml — two checks: drift: re-render and assert CODEOWNERS matches. * noedit: reject PRs that edit CODEOWNERS without editing the yml. - docs/codeowners.md — explains the source-of-truth pattern, how to change roles, how to add new roles. - AGENTS.md topic-index row. What's NOT in this PR: - Branch protection on main (separate PR; needs `gh api` call against the org). - Required-reviewer enforcement (depends on branch protection landing). - Required CI status checks (depends on branch protection landing). - Scheduled rotation (the schedule: block in the yml + a weekly workflow). Today's roles are stable; rotation isn't needed yet. - Linear-as-source-of-truth integration (Approach 4 from the design discussion; deferred). Verified: - Generator output is deterministic (idempotent re-runs). - scripts/check-agents-md.sh OK (28 links, 28 docs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * codeowners: fix catch-all ordering (Devin review #88) Devin caught a real bug: GitHub CODEOWNERS uses "last match wins" semantics, but the generator emitted the catch-all `` AFTER specific patterns. Net effect: `` won for every file, silently nullifying the docs role and never routing reviews to @ragnorc. Fix is one-line — emit the default `` line before iterating the specific paths. Also: - Added a regression assertion in the generator: after rendering, the first non-comment line must start with `` if a default is configured. Generator exits non-zero otherwise. Catches the same class of mistake in any future refactor. - Rewrote the yml header comment, which incorrectly stated "keep more-specific paths after broader patterns" (correct for GitHub semantics but the generator was doing the opposite — so the comment read as a description of behavior when it was actually a contradicted intention). Verified by re-rendering: `` is now line 12, `crates/` is line 14, `docs/` is line 15, etc. README.md matches both `` and `README.md`; `README.md` is later → wins → @aaltshuler + @ragnorc both assigned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:26:06 +03:00
Andrew Altshuler	c142dafdf3	schema-lint chassis v0: code-tagged diagnostics (MR-694) (#87 ) First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN` codes to schema-migration rejections so operators can suppress, look up, and filter on identifiers rather than free-text prose. Atlas-style chassis adapted to omnigraph's typed-IR substrate (no SQL injection vector, no per-engine locks, native edge/vector/embedding types). What's in v0: - New `omnigraph-compiler/src/lint/` module with: - `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF are populated in this PR. - `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105, OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to real emission sites; the other three are reserved. - Unit tests for catalog invariants: codes unique, prefix matches family, suffixes are 3-digit, destructive defaults to error, lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES. - `SchemaMigrationStep::UnsupportedChange` gains an optional `code: Option<String>` field. New `unsupported_error_message()` helper prefixes the message with `[code]` when present. - 5 of 17 existing rejection paths now carry codes: - `removing node type` → OG-DS-102 - `removing edge type` → OG-DS-103 - `removing property` → OG-DS-104 - `adding required property without backfill` → OG-MF-103 - `changing property type` → OG-MF-106 Remaining 12 paths carry `code: None` and are tagged as future work. - `schema_apply` surfaces the formatted error (with `[code]` prefix); CLI `omnigraph schema plan` renders the code on the `unsupported change on <entity>` line. - PR #62 destructive-rejection tests in `tests/schema_apply.rs` now assert on the stable code (`msg.contains("OG-DS-104")`) instead of the error-message substring. 11/11 tests pass. - New `docs/schema-lint.md` documents the v0 catalog + the 10 families + Atlas prior art. AGENTS.md index updated. What's explicitly NOT in v0 (subsequent PRs): - No severity config in `omnigraph.yaml` (MR-694 §2). - No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3). - No `--allow-data-loss` flag or destructive-tier enforcement. - No new `SchemaMigrationStep` variants (soft/hard drops, default, widen/narrow). MR-700, MR-697 land those. - No pre-migration checks (MR-941). - No CD / VE / LK / NM family rules (MR-942..945). - No CI integration (MR-946). Tests: 235 compiler tests, 11 schema_apply integration tests, 14 lint module tests, 55 CLI tests — all green. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:08:18 +03:00
Ragnor Comerford	f28f644bf2	Merge pull request #83 from ModernRelay/devin/1778623807-remove-orphan-loader-files Remove orphaned loader/{constraints,embeddings,jsonl}.rs files	2026-05-12 21:03:36 -07:00
Ragnor Comerford	53d41a30b4	Merge pull request #85 from ModernRelay/ragnorc/survey-state engine: pin stable-row-id preservation through stage_overwrite	2026-05-12 17:24:55 -07:00
Ragnor Comerford	3cc5c6a9a2	chore: gitignore the mdrip/ markdown snapshot cache npx mdrip writes fetched-page snapshots under mdrip/. The cache is a local-only working artifact (docs/lance.md is the curated index of upstream Lance pages we fetch on demand). Keep the cache out of the tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:02:14 -07:00
Ragnor Comerford	a30d1cc0dc	engine: stage_overwrite sets enable_stable_row_ids explicitly Defensive — Lance 4.0.0 preserves the source dataset's flag through Operation::Overwrite even when WriteParams omits it (pinned by the prior commit's test), but setting it explicitly matches the public overwrite_dataset path at line 454 and documents the dependency at the call site so a future refactor doesn't accidentally drop it. Setting it on a dataset created without stable row IDs is a no-op per Lance's row-id-lineage spec, so this stays correct for legacy datasets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:57:05 -07:00
Ragnor Comerford	549060f297	tests: pin stable-row-id preservation across stage_overwrite stage_overwrite is used by schema_apply to rewrite tables when an additive migration touches data. If Lance Operation::Overwrite ever stopped preserving the source dataset's enable_stable_row_ids flag, every schema_apply that triggers a rewrite would silently disable stable row IDs on the affected tables and downstream readers that depend on _rowid stability (change-feed validators, index reconcilers) would observe silent corruption. Empirically Lance 4.0.0 does preserve the flag through Overwrite even when WriteParams omits it — but the preservation isn't documented at the Lance spec level, so pin it here. Any future behaviour change surfaces as a test failure rather than silent corruption. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:56:58 -07:00
Ragnor Comerford	2121d9f6c3	docs: storage stable-row-ids reflects every dataset The L1 capability list claimed the flag was enabled "for the commit-graph and run-registry datasets" — stale. Every Lance dataset OmniGraph creates has enable_stable_row_ids: true; the run-registry datasets are gone since MR-771. Replace with a single paragraph capturing the invariant, the consequences (row-version columns available, CreateIndex × Rewrite not retryable, Lance reader version required), the legacy-dataset constraint (one-way at create, dump-and-reload to migrate), and a pointer to the regression test in staged_writes.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:56:51 -07:00
Ragnor Comerford	8427e705dd	Merge pull request #84 from ModernRelay/ragnorc/read-docs docs: lead AGENTS.md first principle with integrated-over-time framing	2026-05-12 16:38:52 -07:00
Ragnor Comerford	24c0558180	docs: lead AGENTS.md first principle with integrated-over-time framing Reframes the first-principle section to lead with Winters' "engineering is programming integrated over time" as the lens, keeping "minimize ongoing liability" as the operative directive and folding in "complexity should be earned." Adds a new Tiebreakers subsection with two rules that the prior section lacked clean appeals for: - correctness > simplicity > performance (lexicographic) - reversibility shapes evidence demand (reversible → prod metrics over napkin math over RFCs; irreversible → RFC up-front) Adds a Hyrum's-Law deny-list entry in both AGENTS.md and docs/invariants.md §IX: shipping observable behavior is shipping a contract, even when undocumented. Net always-on context cost: ~7 lines. No renumbering of §I–VIII invariants; Hyrum's Law lands in the deny-list to avoid breaking back-references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:27:24 -07:00
Devin AI	327eb821b5	Remove orphaned loader/{constraints,embeddings,jsonl}.rs files These three files in crates/omnigraph/src/loader/ have no `mod` declaration anywhere in the workspace and no `#[path = "…"]` reference. They are not compiled — `touch`-ing them does not trigger `cargo check` to recompile anything. Their imports (`crate::catalog::schema_ir`, `crate::error::NanoError`, `crate::store::manifest::hash_string`, `crate::types::ScalarType`, `super::super::graph::DatasetAccumulator`) reference modules that no longer exist in the engine crate, so they could not even be wired in without further work. They are vestigial code from an earlier monolithic crate layout. The live functionality is independently implemented inside crates/omnigraph/src/loader/mod.rs. These files have been orphaned since the initial public commit. `cargo check --workspace --all-targets` and `cargo test --workspace --no-run` both pass with no new warnings. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-12 22:57:20 +00:00
Claude	a9c4423b82	Strengthen cleanup-then-optimize sequencing test with postconditions Reviewer feedback on PR #62: the original `cleanup_then_optimize_succeed_in_sequence` only unwrapped both calls and asserted nothing, so it didn't validate the claimed sequencing behavior. The concern that motivates the test is that cleanup destroys version history and optimize on a freshly-cleaned table could trip on dropped fragment refs or stale manifests. Rename to `cleanup_then_optimize_preserves_rows_and_table_remains_writable` and add three concrete postconditions: row counts in both Person and Company tables survive the sequence; the head remains readable; and a subsequent merge load still succeeds.	2026-05-12 23:36:01 +03:00
Claude	57a62756c5	Exercise actual type rename in schema-apply rename test The previous version of `apply_schema_renames_node_type_via_rename_from_and_preserves_rows` kept the node name as `Person` (`@rename_from("Person")`) and only renamed a property. The planner only emits a `RenameType` step when the new name differs from the accepted one, so the test name overstated what it covered: a regression in `RenameType` step emission or in the coordinator's table-key remap during type rename could pass while the test still went green. Rename the desired node from `Person` to `Human` (with `@rename_from("Person")`), update the dependent edge endpoints to point at `Human`, and assert both the `RenameType` step and that the manifest table key has moved from `node:Person` to `node:Human`.	2026-05-12 23:36:01 +03:00
Claude	e22d468e27	Add maintenance + destructive-migration test coverage The audit of test coverage flagged three holes: - `omnigraph optimize` and `omnigraph cleanup` had no integration tests (no `maintenance.rs`). Add one covering empty/idempotent edges, the policy-validation contract on `cleanup`, and head preservation under aggressive policies. - `apply_schema` only covered I32 -> I64 type-change rejection. Add the symmetric narrowing case plus rejections for the other destructive shapes (drop property with data, drop node type, drop edge type, add required property without backfill) and assert the manifest version doesn't advance. Add a positive `@rename_from` case to pin the stable-type-id contract preserves rows through a rename. - `docs/testing.md` was missing `validators.rs` and the new `maintenance.rs` from its file table; bump the count and add rows.	2026-05-12 23:36:01 +03:00
devin-ai-integration[bot]	6914e0256e	MR-786: merge-pair truth table with exhaustive op-variant matrix (#81 ) * MR-786: merge-pair truth table with exhaustive op-variant matrix Add crates/omnigraph/tests/merge_truth_table.rs that enumerates every (left_op, right_op) cell from the operation vocabulary named in the ticket — {noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel} — and asserts the deterministic outcome of Omnigraph::branch_merge against a structured oracle. The matrix is built in a 9x9 match in build_case, so adding a new OpVariant is a compile-time, fail-on-omission task. Today's mutation grammar only exposes insert \| update set \| delete (see docs/query-language.md), so the 36 cells over the first six ops are executable and the 45 cells involving dropProperty/addLabel/removeLabel are recorded as Expected::Unsupported with a note. Each executable cell spins up a fresh tempdir, applies one mutation per branch, calls branch_merge, and asserts either: * MergeOutcome (AlreadyUpToDate / FastForward / Merged) plus a GraphAssert on the affected entities, or * an OmniError::MergeConflicts whose entries match the expected table_key + MergeConflictKind (row_id is optional because edge ULIDs are generated at runtime). branch_merge is directional, so the (L, R) and (R, L) cells live in separate entries in the matrix and are run independently — the op-pair symmetry encoded in build_case serves as the commutativity oracle without doubling the runtime. End-to-end the suite runs in ~10s on a fresh build, well under the 30s budget asserted at the bottom of the test. Also adds a row to docs/testing.md so the test-coverage map points future agents at this file. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Use one Omnigraph handle for both branches Self-review caught that the runner was opening two Omnigraph handles on the same temp dataset (one for main, a second via Omnigraph::open for feature). tests/branching.rs uses one handle and passes the branch name to mutate_branch — same pattern works here and avoids any cache-coherency surprises between the two handles. Also drops the post-merge reopen, which only existed to give the second handle a fresh snapshot. Runtime drops ~10s -> ~9s. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Assert exact conflict count, not subset inclusion cubic and Devin Review both flagged that check_outcome's Expected::Conflicts arm only enforces want ⊆ got, so a regression that produces a spurious extra conflict (e.g. emitting both OrphanEdge and a stray DivergentInsert) would silently pass the truth-table cell. For a deterministic oracle that's the wrong direction — the cell pins the exact conflict-artifact set, not a lower bound. Add an assert_eq!(got.len(), want.len()) before the existence loop. All 36 executable cells still pass; runtime unchanged. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Subsume 4 conflict tests in branching.rs into truth table The four `branch_merge_reports__conflict` tests (DivergentUpdate / DivergentInsert / DeleteVsUpdate / OrphanEdge) were redundant with the deterministic-oracle cells in the new `merge_truth_table.rs` and only added drift risk. To preserve the post-conflict invariant that lived in `branch_merge_reports_divergent_update_conflict` (target unchanged after a failed merge), the truth-table runner now generalizes it: on every `Conflicts` cell, main's state is asserted against `state_after_apply_only(right_op)`. That gives strictly more coverage than the deleted tests carried, since the invariant now applies to all* seven conflict cells, not just one. The `UniqueViolation` and `CardinalityViolation` cases stay in `branching.rs` — they're combinatorial (require >1 op per side with a non-default schema) and out of scope for the pair-wise truth table. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> * Fix misleading 'Total edges: 0' comment in (AddEdge, RemoveEdge) cell Devin Review flagged that the comment said 'Total edges: 0' while the parenthetical math evaluates to 1 (matching `GraphAssert::base()`). The assertion is correct; only the leading number in the comment was wrong. Reworded to 'Net edges: … = 1 (matches base)' so the prose agrees with both the math and the assertion. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com> --------- Co-authored-by: Ragnor <ragnor@modernrelay.com> Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-12 22:36:01 +03:00
Ragnor Comerford	3bd072c917	docs: add docs/transactions.md — branch-as-transaction explainer (#69 ) The architectural rule "no cross-query BEGIN/COMMIT; branches fill that role" lives in docs/invariants.md §VI.23 but is not surfaced anywhere user-facing. New users coming from Postgres/MySQL hit the gap when they realize multiple queries on main are independently atomic, not jointly atomic. This page explains the model with worked examples: * Single-query multi-statement (atomic by default) * Two separate queries on main (NOT atomic — common surprise) * Many queries via a branch (atomic at merge) * Coordinating multiple agents via branch-per-agent Plus a comparison table to BEGIN/COMMIT, failure-mode rundown, and "when to use what" decision matrix. Linked from AGENTS.md "Where to find each topic" between branches-commits.md and runs.md. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:35:57 +03:00
Ragnor Comerford	c9c7c0672e	Update README.md	2026-05-12 08:17:31 -07:00
Ragnor Comerford	c2e3a9e5c3	Add use cases for unified company brain and context graphs	2026-05-12 08:08:08 -07:00
Ragnor Comerford	676c9eab05	Merge pull request #78 from ModernRelay/devin/1778363660-mr-901-blob-branch-merge Fix branch merge with blob columns	2026-05-12 07:31:04 -07:00
Ragnor Comerford	d6d2763609	Merge pull request #80 from ModernRelay/devin/1778524905-mr-923-merge-restore-refresh Fix MR-923: refresh restored coordinator on merge Err path	2026-05-11 15:55:43 -07:00
Devin AI	725d41205e	Drop redundant server-level regression test The matrix cell d:merge×change:into-target already exercises this race: pre-fix it flakes ~20% on shared-CPU hardware (sentinel 409s); post-fix it passes 100% regardless of which side of the racing pair returns first. That flake-to-stable transition is the regression signal. The replacement test (concurrent_merge_clean_409_does_not_poison_next_ change_on_target) tried to sharpen this by looping until the clean- 409 path fired and then strictly requiring it. On fast CI hardware the race window never opens in 50 iterations, which made the strict variant fail in CI despite passing 10/10 locally. The bug genuinely needs a real concurrent writer to advance on-disk manifest during the swap window — a deterministic failpoint can't substitute because forcing the merge body to Err without a real concurrent writer leaves no cache-vs-disk drift to validate. Reverting to the matrix cell as the sole regression coverage. Updated the comment in merge.rs accordingly. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:57:47 +00:00
Devin AI	a6c7e5fab5	Use if-let shape for refresh outcome handling Switch from match-on-Result to if-let-Err so the refresh outcome and merge_result outcome are checked independently, making the intent clearer: 'attempt refresh; on Ok-merge-with-refresh-error propagate; on Err-merge-with-refresh-error log and surface the original merge error'. No semantic change — both shapes were valid (wildcard patterns don't move the scrutinee) — but the if-let form sidesteps a needs-second-reading question raised in code review. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:50:26 +00:00
Devin AI	7d1a40102c	Address review feedback merge.rs: best-effort refresh on the Err path so a refresh-time storage error doesn't replace the merge body's structured error (typically the manifest_conflict that the HTTP layer maps to a 409 with a structured payload) with a less informative one. Ok-path behavior is unchanged — there a refresh failure is propagated so the caller knows the coord's cache is unsynced. server.rs: bump MAX_ITERATIONS to 50 and assert at the end that the named clean-409 path actually fired at least once. With ~20% per-iter rate on shared-CPU CI (per the original MR-923 repro), P(no hit in 50) is < 0.002%. Without this assertion the test silently degraded to exercising only the 200-merge path — covered already by the matrix cell. Both changes per Devin Review + cubic comments on PR #80. Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>	2026-05-11 21:35:18 +00:00

1 2 3 4 5 ...

317 commits