* tests: add lance_surface_guards pre-flight pins for the v6 bump
Land 8 named guards in a new test file that pin Lance API surfaces
OmniGraph relies on. Each guard turns a silent-break risk (variant
rename, struct restructure, async-flip) into a red CI bar instead of
runtime drift.
Guards (mapped to the silent-break inventory from the v6 migration plan):
Runtime (#[tokio::test]):
1. lance_error_too_much_write_contention_variant_exists — pins the
variant referenced by db/manifest/publisher.rs::map_lance_publish_error.
2. manifest_location_field_shape — pins .path/.size/.e_tag/.naming_scheme
types and ManifestLocation accessor returning &Self (the access
pattern at db/manifest/metadata.rs:84-88).
6. write_params_default_does_not_set_storage_version — confirms our
explicit V2_2 pin remains load-bearing (blob v2 requirement).
Compile-only async fns (#[allow(...)] + unimplemented!() placeholders;
never run, but cargo build --tests enforces the API shape):
3. checkout_version + restore chain — pins the recovery rollback hammer
at db/manifest/recovery.rs:505-522.
4. DatasetBuilder::from_namespace().with_branch().with_version().load()
— pins the namespace builder chain at db/manifest/namespace.rs:162-174.
5. MergeInsertBuilder fluent chain — pins the manifest CAS at
db/manifest/publisher.rs:370-391, including the return shape
(Arc<Dataset>, MergeStats).
7. compact_files(&mut ds, CompactionOptions, None) — pins
db/omnigraph/optimize.rs:107.
8. DeleteResult { new_dataset, num_deleted_rows } — pins the inline
delete result shape (MR-A will repurpose this guard to the staged
two-phase variant once Lance #6658 migration lands).
This is commit 1 of the chore/lance-6.0.1 migration. Cargo bump
follows in commit 2 (will trigger the guards under v6 if any surface
drifted).
Per the migration plan at ~/.claude/plans/shimmering-percolating-duckling.md
(written this session). Two guards from the plan deferred to follow-up:
- manifest_cas_returns_row_level_contention_variant (full publisher
race integration test — needs harness scaffolding)
- table_version_metadata_byte_compatible_with_v4 (TableVersionMetadata
is pub(crate); requires test reach extension).
Verified on v4: cargo test -p omnigraph-engine --test lance_surface_guards
passes 3/3 runtime tests; cargo build -p omnigraph-engine --tests
compiles all 5 compile-only guards clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(deps): bump Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58
The Cargo bump itself. Source is intentionally untouched — this commit
will not compile. The compile errors are the work-list for subsequent
commits on this branch.
Lance updates: lance + 7 sub-crates 4.0.0 → 6.0.1. Transitive churn:
+ lance-tokenizer v6.0.1 (vendored tokenizer per Lance PR #6512)
+ object_store 0.13.x (Lance 6 brings it transitively; our explicit
pin stays at 0.12.5 for now — revisit in stages if diamond bites)
- tantivy* crates (replaced by lance-tokenizer)
Compile error landscape on this commit (11 errors):
• 1× E0432: `lance_index::DatasetIndexExt` import (Lance PR #6280
moved it to lance::index). Sites: table_store.rs:20,
db/manifest.rs:37 (the second site was missed by the pre-flight
inventory).
• 8× E0599: `create_index_builder` / `load_indices` missing on
`lance::Dataset` — all downstream of the DatasetIndexExt move.
Once the import is corrected on table_store.rs and db/manifest.rs,
these resolve automatically.
• 2× E0063: missing field `is_only_declared` in `DescribeTableResponse`
initializer at db/manifest/namespace.rs:221, 364. New Lance
namespace field per the v5 namespace restructure (PR #6186).
Surface guards (lance_surface_guards.rs, commit d571fa8) all still
compile + the 3 runtime ones pass on v6 — none of the silent-break
surfaces drifted. That's the load-bearing observation: the publisher
CAS chain, ManifestLocation field shape, checkout_version/restore,
DatasetBuilder fluent chain, MergeInsertBuilder return shape,
WriteParams::default, compact_files signature, and DeleteResult
fields are all v6-stable.
Next commits address the 11 errors per the migration plan stages
3-8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* imports: move DatasetIndexExt to lance::index (Lance PR #6280)
Lance 5.0 (PR #6280) moved `DatasetIndexExt` out of `lance-index` into
`lance::index`. `is_system_index` and `IndexType` stayed in `lance-index`.
Mechanical update of 6 import sites:
crates/omnigraph/src/table_store.rs:20 — split into two `use` lines
crates/omnigraph-server/tests/server.rs:10 — was traits::DatasetIndexExt
crates/omnigraph/tests/search.rs:6
crates/omnigraph/tests/branching.rs:7
crates/omnigraph/tests/failpoints.rs:467
crates/omnigraph-cli/tests/cli.rs:3 — was traits::DatasetIndexExt
All 9 E0599 cascading errors on .create_index_builder / .load_indices
resolve once the trait is back in scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* namespace: add is_only_declared field to DescribeTableResponse
Lance namespace 6.0.0 added `is_only_declared: Option<bool>` to
`DescribeTableResponse` (lance-namespace-reqwest-client 0.7+ via the
v5.0 namespace API restructure, Lance PR #6186). Set to `Some(false)`
because every table BranchManifestNamespace returns from describe_table
is materialized — the manifest snapshot only includes entries for
tables we've already opened via Dataset::open.
Two sites in db/manifest/namespace.rs (BranchManifestNamespace +
StagedTableNamespace impls of LanceNamespace::describe_table).
Closes the last two compile errors from the v6 bump in the engine lib.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* cargo: add lance to omnigraph-cli + omnigraph-server dev-deps
Stage 3 moved DatasetIndexExt imports from `lance-index` to `lance::index`
in the cli and server test crates. Both crates only had `lance-index`
in their dev-dependencies; add `lance` alongside so the new path
resolves.
This is the last compile-error fix from the v6 bump — `cargo build
--workspace --tests` is now green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: refresh Lance alignment audit for v6.0.1; bump surveyed version
Per CLAUDE.md maintenance rule 2 (same-PR docs):
- docs/dev/lance.md: replace the v4.0.1 alignment audit stanza with
the v6.0.1 audit. Captures every v5/v6 finding from this PR (the
DatasetIndexExt move, DescribeTableResponse.is_only_declared,
MergeInsertBuilder return shape, ManifestLocation field shape,
LanceFileVersion::default flip, file-reader async, tokenizer
vendor, Lance #6658/#6666/#6877 status). Cross-references each
guard in tests/lance_surface_guards.rs.
- AGENTS.md: bump "Storage substrate: Lance 4.x" → "Lance 6.x".
Note: surveyed crate version stays at 0.4.2 — substrate version
bumps are independent of OmniGraph's release version.
- crates/omnigraph/src/storage_layer.rs: update the trait module-level
doc-comment to reflect that Lance #6658 closed 2026-05-14 and
delete_where two-phase migration is MR-A (the next follow-up).
#6666 stays open; create_vector_index inline residual stays.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* tests: silence clippy::diverging_sub_expression on compile-only guards
The five `_compile_*` async fns in lance_surface_guards.rs use
`let ds: Dataset = unimplemented!()` as a placeholder so type inference
can chase the method chain we want to pin, without ever running the
function. Clippy's `diverging_sub_expression` lint flags this pattern
because the RHS diverges; that's the entire point. Added to the
per-fn `#[allow(...)]` list, alongside dead_code / unreachable_code /
unused_variables / unused_mut already there.
No behavior change. cargo test -p omnigraph-engine --test
lance_surface_guards still 3/3 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: correct #6658 status — closed but API ships in Lance v7.x, not v6.0.1
The audit stanza in docs/dev/lance.md and the storage_layer.rs trait
doc-comment both implied the public DeleteBuilder::execute_uncommitted
API shipped with Lance 6.0.1. It did not. Issue #6658 closed
2026-05-14, but binary search across the release stream confirms:
v6.0.1 ❌ no pub async fn execute_uncommitted on DeleteBuilder
v6.1.0-rc.1 ❌
v7.0.0-beta.5 ❌
v7.0.0-beta.10 ✅ first appearance
v7.0.0-rc.1 ✅
So MR-A (delete two-phase migration) is gated on the Lance v7.x bump,
not on this PR. v7.0.0-rc.1 dropped 2026-05-21; GA likely within a
week.
No behavior change. Doc-only correction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci(lib): bump recursion_limit to 256 — Lance 6 trait depth on Linux
Lance 6's heavier trait surface around futures/streams in storage_layer.rs's
staged-write API pushes the rustc trait-resolution recursion limit past
the default 128 on Linux builds. CI on PR #111 surfaced this in both
`Test Workspace` and `Test omnigraph-server --features aws`:
error: queries overflow the depth limit!
= help: consider increasing the recursion limit by adding a
`#![recursion_limit = "256"]` attribute to your crate (`omnigraph`)
= note: query depth increased by 130 when computing layout of
`{async block@crates/omnigraph/src/storage_layer.rs:697:5: 697:10}`
(The async block is `stage_create_btree_index`'s body — its return type
is several layers of `impl Future<Output=Result<StagedHandle>>` deep on
top of Lance's own builder return types.)
Local macOS builds happened to short-circuit before tripping the limit,
which is why this didn't surface during the v6 bump sequence. The fix
rustc itself suggests is one line at the crate root.
No behavior change. Revisit if a future Lance bump stops needing it.
Verified: `cargo build --locked -p omnigraph-server --features aws`
compiles clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on
the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode
drops were CLI-only. This commit closes that feature gap and adds e2e
test coverage for drop modes across HTTP + CLI, plus data preservation
on additive apply, plus a CLI↔SDK plan-parity assertion.
Feature gap closed:
- `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool`
(default false via `#[serde(default)]`) to `SchemaApplyRequest`.
Added `Default` derive so test usages can use `..Default::default()`.
- `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now
constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }`
and threads through to `apply_schema_as`.
- `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path
used to bail with "--allow-data-loss not yet supported on remote";
now forwards the flag into the JSON payload so the CLI behaves
identically against local and remote URIs.
- `openapi.json` — regenerated; only diff is the new field on
`SchemaApplyRequest`.
Tests added (8 new):
* `crates/omnigraph-server/tests/server.rs` (+5):
- `schema_apply_route_soft_drops_property_via_http` — POST schema
removing nullable property, verify catalog reflects the drop AND
`snapshot_at_version(pre)` still has `age` in the field list
(time-travel reachability is the Soft contract).
- `schema_apply_route_soft_drops_node_type_via_http` — POST schema
removing `Company` node + cascading `WorksAt` edge.
- `schema_apply_route_hard_drops_property_with_allow_data_loss` —
POST with `allow_data_loss: true`, verify plan step reports
`mode: hard`.
- `schema_apply_route_keeps_drops_soft_without_flag` — same schema
without flag, verify `mode: soft`. Pins default semantics against
accidental Hard promotion.
- `schema_apply_route_additive_property_preserves_existing_rows` —
load fixture, POST adding nullable property, verify row count
preserved (SDK suite covers data preservation on drops + renames;
additive AddProperty wasn't pinned).
Plus helpers `schema_without_age` and `schema_without_company`.
* `crates/omnigraph-cli/tests/cli.rs` (+3):
- `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI
`omnigraph schema apply --allow-data-loss --schema X.pg --json`,
verify plan step has `mode: hard`.
- `schema_apply_without_allow_data_loss_keeps_soft_drops` — without
flag, verify Soft.
- `schema_plan_parity_cli_and_sdk` — same `.pg` source through
`Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json`
(CLI), assert the steps array is byte-identical post-JSON. HTTP
has no `/schema/plan` endpoint; apply-side parity is implicitly
covered by the HTTP drop tests + CLI drop tests using identical
fixtures.
Docs:
- `docs/user/schema-language.md` — new "Destructive drops" section
documenting Soft vs Hard semantics and that `allow_data_loss` is
now honored uniformly across CLI / HTTP / SDK.
Verification: every new test passes; full `cargo test --workspace --locked`
green; `scripts/check-agents-md.sh` passes.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* tests: policy chassis e2e gap-fills (MR-722 follow-up)
Audit after PRs #101-105 surfaced real e2e gaps in the policy chassis
that could let regressions ride through silently. Coverage was strong
at the SDK level (18 chassis tests) and reasonable at HTTP (12+ policy
tests), but the CLI×writer matrix was asymmetric (only `change` tested
end-to-end), the `cli.actor` config-only precedence path was untested,
the `OMNIGRAPH_UNAUTHENTICATED` env-var read path was unexercised,
`serve()`'s startup-refusal propagation was structural-review only,
and engine↔HTTP decision parity was a structural property without a
test pinning it. This commit closes those gaps.
Added (15 new tests, all test-only):
* `policy_engine_chassis.rs` (+2): `load_file_as` allow + deny pair —
PR #104 added the actor-aware mirror of `load_file` but it was only
exercised via CLI integration; this is direct-SDK coverage.
* `omnigraph-server/src/lib.rs` mod tests (+2):
- `unauthenticated_env_var_classification` — consolidated single
test (process-global env var; running parallel would race) that
pins truthy values, falsy values, unset, and CLI-flag-overrides-
env behavior of the `OMNIGRAPH_UNAUTHENTICATED` read path inside
`load_server_settings`.
- `serve_refuses_to_start_in_state_1_without_unauthenticated` —
`#[serial]` integration test. Clears all bearer-token env vars,
builds a `ServerConfig` with no policy file and no flag, calls
`serve(config).await`, asserts Err before any side-effecting
work (Lance dataset open, TcpListener::bind). Guards the
classifier→serve propagation path so a future refactor that
drops the call turns red.
* `omnigraph-server/tests/server.rs` (+4): `policy_decision_parity_*`
— four cases (Change×allowed+denied, BranchMerge×allowed+denied).
Each case runs the same Cedar decision via both SDK
(`Omnigraph::with_policy().mutate_as` / `branch_merge_as`) and HTTP
(`POST /change` / `POST /branches/merge`) and asserts both either
Allow or Deny. The structural property (both paths call
`PolicyChecker::check`) is now test-asserted.
* `omnigraph-cli/tests/system_local.rs` (+8): the CLI×writer matrix
fan-out:
- `local_cli_load_enforces_engine_layer_policy`
- `local_cli_ingest_enforces_engine_layer_policy`
- `local_cli_schema_apply_enforces_engine_layer_policy`
- `local_cli_branch_create_enforces_engine_layer_policy`
- `local_cli_branch_delete_enforces_engine_layer_policy`
- `local_cli_branch_merge_enforces_engine_layer_policy`
Each: one denied case (`--as act-bruno` against protected main) +
one allowed case (`--as act-ragnor` via existing/extended admins-*
rules).
Plus:
- `local_cli_actor_from_config_used_when_no_flag` — proves the
config-only precedence path works.
- `local_cli_actor_flag_overrides_config_actor` — proves the
`--as` flag wins over `cli.actor` in the config.
Adds `local_policy_config_with_actor` helper. Extends
`POLICY_E2E_YAML` with `admins-branch-ops` (BranchCreate +
BranchDelete) and `admins-schema-apply` rules so the CLI×writer
matrix has positive-case rule coverage.
Verification: all new tests pass; full `cargo test --workspace
--locked` is green; `scripts/check-agents-md.sh` passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* tests: serialize env-touching server lib tests to fix CI flake
CI flake on PR #106's Test Workspace job: two of the new tests
(`serve_refuses_to_start_in_state_1_without_unauthenticated` and
`unauthenticated_env_var_classification`) raced against
`server_bearer_tokens_from_env_reads_legacy_token_and_token_file`,
which sets `OMNIGRAPH_SERVER_BEARER_TOKEN` via `EnvGuard`.
While `serve_refuses` was mid-execution with its EnvGuard cleared,
the bearer-token test's EnvGuard had `OMNIGRAPH_SERVER_BEARER_TOKEN`
set; `resolve_token_source()` saw it and classified the runtime
state as `DefaultDeny` rather than refusing — so the test panicked
with "Dataset at path X not found" instead of the expected refusal
message. The unauthenticated test had the symmetric failure: its
`OMNIGRAPH_UNAUTHENTICATED="anything"` got overwritten by a peer
`EnvGuard` drop.
Fix: mark every test that uses `EnvGuard` with `#[serial]` so they
serialize against each other (default key). Already on
`serve_refuses_to_start_in_state_1_without_unauthenticated`; added
to `unauthenticated_env_var_classification` and
`server_bearer_tokens_from_env_reads_legacy_token_and_token_file`.
The `parse_bearer_tokens_json_*` tests don't touch env vars and
stay parallel.
Locally green (36 tests pass on my workstation); the parallelism
issue is CI-runner-specific (more aggressive thread interleaving)
but the fix is universal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Closes the CLI side of the policy chassis fan-out. Before this commit,
CLI direct-engine writes bypassed Cedar entirely because the CLI never
called `Omnigraph::with_policy(...)` for non-`policy validate|test|explain`
subcommands. After this commit, every CLI direct-engine writer
(change, load, ingest, branch create/delete/merge, schema apply) opens
the engine via a new `open_local_db_with_policy(uri, &config)` helper
that installs the configured `PolicyEngine` when `policy.file` is set,
and threads the resolved actor through to the `_as` writer methods.
Actor identity resolution:
- New top-level `--as <ACTOR>` global flag on the CLI overrides config.
- New `cli.actor` field in `omnigraph.yaml` provides a default actor.
- Precedence: `--as` > `cli.actor` > None.
- When policy is configured and neither is set, the engine-layer
footgun guard fires and the write is denied — silent bypass via
"I forgot the actor" is exactly what the guard prevents.
- Remote HTTP writes ignore both — bearer-token-resolved server-side.
Helpers added in main.rs:
- `open_local_db_with_policy(uri, &config) -> Result<Omnigraph>` —
opens the DB and installs the PolicyEngine when configured. Without
policy this is identical to a bare `Omnigraph::open`.
- `resolve_cli_actor(cli_as, &config) -> Option<&str>` — implements
the flag > config > None precedence.
Engine: added `load_file_as` to the loader as the actor-aware mirror of
`load_file`, so CLI file-path loads flow through the same enforce gate
as in-memory `load_as` calls.
Test rewrite: `local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced`
was the explicit assertion of the pre-chassis hole. Renamed and split:
- `local_cli_policy_tooling_is_end_to_end` — sanity for the read-only
policy CLI surfaces (validate/test/explain), unchanged behavior.
- `local_cli_change_enforces_engine_layer_policy` — the new assertion:
policy installed + no actor → footgun-guard denial; `--as act-bruno`
on protected main → Cedar denial; `--as act-ragnor` (admins-write
rule) on main → permit, write committed.
POLICY_E2E_YAML gains an `admins-write` rule so the permit case has
a non-trivial actor to exercise.
docs/user/policy.md updated with `cli.actor` + `--as <ACTOR>` usage.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* schema-lint v1 commit 5: --allow-data-loss flag + Hard mode
Final v1 commit. Wires up the --allow-data-loss CLI flag and Hard
mode for both DropProperty and DropType. Per
docs/dev/schema-lint-v1-plan.md, commit #5 of the schema-lint
chassis v1 series (MR-694).
CLI (omnigraph-cli/src/main.rs):
- New --allow-data-loss flag on both `omnigraph schema plan` and
`omnigraph schema apply` subcommands. Off by default (Soft).
- HTTP remote schema apply explicitly rejects the flag for now
(CLI-only; HTTP parity is a separate small follow-up that adds
the field to SchemaApplyRequest + the server handler).
Engine (omnigraph.rs + schema_apply.rs):
- New SchemaApplyOptions { allow_data_loss: bool } public struct
(Default = all false), re-exported via omnigraph::db::SchemaApplyOptions.
- New public methods: plan_schema_with_options and
apply_schema_with_options. Existing plan_schema/apply_schema are
now thin wrappers that pass Default::default().
- promote_drops_to_hard: post-plan walk that promotes every
DropMode::Soft step to DropMode::Hard when the flag is set.
Keeps the compiler's plan_schema_migration signature unchanged
(no breaking change for tests / callers).
- Apply path: both Drop arms accept Hard mode; behavior is
identical to Soft inside the apply loop. The DIFFERENCE is the
new hard_cleanup_targets: Vec<(String, String)> accumulator,
populated for every Hard variant with (table_key, full_dataset_uri).
- Post-publish cleanup: a new loop after the manifest commit
iterates hard_cleanup_targets and calls cleanup_old_versions
(before_timestamp = now) on each dataset URI. Best-effort —
the apply is already durable; cleanup failure is logged via
tracing::warn rather than failing the apply.
- New cleanup_dataset_old_versions helper inlines the Lance
cleanup_old_versions call against a dataset URI.
Behavioral details:
- DropProperty Hard: stage_overwrite produced a new dataset version
without the column. cleanup_old_versions removes the prior version
(and reclaims unique fragments). After Hard apply,
snapshot_at_version(pre_drop).open(table_key) FAILS because the
prior dataset version was reclaimed.
- DropType Hard: no per-table write happens (the change is the
manifest tombstone). cleanup_old_versions on the orphan dataset
is a no-op in the immediate term (no prior versions to clean
since the dataset wasn't modified by this apply). The dataset
directory persists. Full orphan-cleanup is a documented
follow-up — the user-facing contract is "data is unreachable
via omnigraph" (manifest entry tombstoned), which is satisfied.
Tests (tests/schema_apply.rs):
- apply_schema_with_allow_data_loss_promotes_drops_to_hard:
default plan emits Soft; with options.allow_data_loss=true,
plan emits Hard; apply succeeds.
- apply_schema_hard_drops_property_makes_prior_version_unreachable:
Hard drop succeeds, current snapshot lacks the column, and
snapshot_at_version(pre_drop).open("node:Person") FAILS (Lance
prior version reclaimed by cleanup).
- apply_schema_hard_drops_node_and_edge_with_flag_succeeds: both
Node and Edge DropType variants are promoted to Hard with the
flag; apply succeeds; current manifest entries gone. (Orphan
dataset directory cleanup deferred.)
Test results:
- cargo test -p omnigraph-compiler --lib: 239 passed
- cargo test -p omnigraph-engine --test schema_apply: 14 passed
(3 new Hard tests + 11 existing soft/regression tests)
- cargo test -p omnigraph-server --test openapi: 60 passed (no
HTTP API surface changes in this commit; OpenAPI parity follow-up
noted)
v1 status: complete for CLI/embedded use. MR-694 chassis epic +
MR-700 DropType/DropProperty ticket can close after this lands.
Known follow-ups (separate small PRs):
- HTTP parity: extend SchemaApplyRequest with allow_data_loss field,
thread through server handler, regenerate openapi.json.
- Orphan-dataset directory deletion for DropType Hard (currently
the dataset directory persists; cleanup_old_versions doesn't
remove it because the dataset wasn't modified).
- MR-948 substrate alignment: swap DropProperty Soft from
stage_overwrite to Dataset::drop_columns (catalog_only vs
full_rewrite cost class).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fixup: use bail! from color_eyre::eyre instead of anyhow
The remote-rejection branch in SchemaCommand::Apply used
anyhow::anyhow! which isn't in scope; the CLI's Result type is
color_eyre::eyre::Result and bail! is already imported.
Caught by CI Test Workspace job on PR #100.
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* schema-lint chassis v1 (WIP): tier surfacing + plan doc
First commit of the chassis v1 branch. Lands a small, foundational
slice without behavior change, plus a planning doc that lays out the
remaining 7 commits in sequence so the PR can be reviewed
incrementally.
This commit:
- Adds SchemaMigrationStep::diagnostic() returning the full
&'static DiagnosticCode (family + tier + severity) for
UnsupportedChange steps with codes. Renderers can now reach the
tier without re-implementing the code → tier lookup.
- CLI `omnigraph schema plan` output now displays tier alongside
code:
unsupported change on node:Person.age [OG-DS-104, destructive]:
removing property 'Person.age' is not supported in schema
migration v1
Operators see at-a-glance the kind of risk each rejection
represents — not just the rule identifier.
- No behavior change. All 11 existing schema_apply tests still pass.
Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining
commits to bring v1 to feature-complete:
1. (this commit) Tier surfacing in plan output.
2. Soft / Hard mode enum on drop steps.
3. Tombstone fields on catalog IR.
4. Planner emits DropProperty { Soft } by default.
5. Apply path implements Soft mode.
6. Convert PR #62 destructive-rejection tests.
7. --allow-data-loss flag + Hard mode.
8. (optional) Tombstone unhide / restore command.
Delete the planning doc when v1 lands. Intentionally checked in to
the WIP branch so the scope is reviewable; not intended as a
permanent doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* schema-lint v1 commit 2: DropMode + dormant Drop* variants
Second commit of the chassis v1 branch. Lands the type-level shape
of soft/hard drops without wiring them up. Variants are reachable
from emitters but the planner doesn't produce them yet; the apply
path returns an explicit not-yet-implemented error if one shows up
via deserialization.
Added:
- `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier
classifies the rule's risk class; mode is the operator's intent
for data treatment.
- `Soft` → catalog tombstone, data retained. Tier: safe.
- `Hard` → Lance-level removal. Tier: destructive; will require
--allow-data-loss to apply (commit 7).
- `SchemaMigrationStep::DropType { type_kind, name, mode }` and
`SchemaMigrationStep::DropProperty { type_kind, type_name,
property_name, mode }` variants.
- Re-export `DropMode` from `omnigraph_compiler::DropMode` so
downstream crates don't reach into the catalog submodule.
- CLI `render_schema_plan_step` arms for both variants, surfacing
the mode in plan output: `drop property 'Person.age' of node
'Person' (soft mode)`.
- `apply_schema_with_lock` exhaustive match arm for the two new
variants that returns `manifest_internal` with a clear
not-yet-implemented message. If a SchemaIR JSON containing
Drop{Type,Property} arrives (e.g. from a future tool or hand-
written), the apply path fails explicitly rather than silently
misclassifying.
- Two new in-source tests:
- `drop_steps_round_trip_through_serde` — pins the wire shape
for all four (variant × mode) combinations.
- `drop_mode_serde_uses_snake_case` — pins external-tool-
friendly serialization (`"soft"` / `"hard"`).
Build: clean, only pre-existing warnings.
Tests:
- omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new).
- omnigraph-engine schema_apply: 11/11 (unchanged — planner still
emits UnsupportedChange for removal paths).
Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the
`tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the
catalog representation of soft-mode tombstones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* plan doc: reframe v1 around Lance native drop_columns
After a substrate audit of the Lance data-evolution guide on
2026-05-13, the v1 plan was simplified. Two key findings:
1. Lance's `drop_columns()` is already metadata-only and reversible
via time travel until cleanup. No need for a parallel
`tombstoned: bool` field in our catalog IR — Lance's version
graph IS the tombstone.
2. The full schema_apply substrate migration (add_columns,
drop_columns, alter_columns vs. stage_overwrite across all step
types) is consolidated in MR-948 as a sibling issue. v1 only
uses the relevant slice (drop_columns for OG-DS-1XX).
Net plan changes:
- Commit 3 (original): tombstone fields on catalog IR → dropped.
No catalog IR change needed. The Lance drop_columns commit IS the
tombstone.
- Commit 5 (original): apply path writes tombstoned: true → replaced
with: apply path calls Dataset::drop_columns([name]).
- Commit 7 Hard mode: stage_overwrite removing the column → replaced
with: drop_columns + compact_files + cleanup_old_versions. Same
APIs omnigraph cleanup already uses.
- Commit 8 (original): omnigraph schema unhide → dropped. Time
travel is the undo (omnigraph snapshot --at <commit>).
Net result: 8 commits → 5 commits. ~250 LoC less surface. More
substrate-aligned.
The chassis types from commit 2 (DropMode enum, DropType /
DropProperty variants) are kept exactly as designed; only the
implementation strategy changed.
The Lance docs quote is included in the doc so future readers see
the substrate behavior cited verbatim.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* schema-lint v1 commit 3: emit + apply DropProperty { Soft }
Wire the dormant DropProperty variant end-to-end for the Soft case.
Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis
v1 series (MR-694).
Planner (schema_plan.rs):
- plan_properties: emit DropProperty { type_kind, type_name,
property_name, mode: Soft } instead of UnsupportedChange when a
property exists in accepted but not in desired. Plan is now
supported = true for drop-only changes.
Apply (schema_apply.rs):
- Route DropProperty { Soft } through rewritten_tables. The existing
batch_for_schema_apply_rewrite path already iterates the *target*
schema fields, so a property absent from desired_catalog is
naturally projected away. The prior Lance version retains the
dropped column for time-travel reversibility (until cleanup runs).
- DropType still errors (lands in commit #4 with different mechanics:
__manifest entry removal instead of column projection).
- DropProperty { Hard } still errors (lands in commit #5 with
--allow-data-loss CLI flag + immediate compact_files +
cleanup_old_versions).
Tests:
- Planner unit test plan_emits_soft_drop_for_removed_nullable_property
asserts the variant emission + supported = true + no UnsupportedChange.
- Integration test apply_schema_drops_a_nullable_property_softly_
preserves_prior_version (replaces the former
apply_schema_rejects_dropping_a_property_with_data) asserts:
(a) plan contains DropProperty { Soft }
(b) apply succeeds + manifest advances + row count unchanged
(c) current dataset schema lacks the dropped column
(d) snapshot_at_version(pre_drop) still has the dropped column
(e) reopen consistency — drop preserved across engine restart
Recovery: rides on SidecarKind::SchemaApply per MR-847. No new
sidecar kind needed; the entire apply path is already sidecar-wrapped.
Substrate alignment: this commit uses the stage_overwrite full-rewrite
path (full_rewrite cost class) rather than Lance native drop_columns
(catalog_only cost class). MR-948 is the follow-up substrate-alignment
refactor that introduces a LanceColumnOp surface and switches the
metadata-only case onto drop_columns. Functional outcome is identical;
cost-class improvement deferred.
Test results:
- cargo test -p omnigraph-compiler --lib: 238 passed
- cargo test -p omnigraph-engine --test schema_apply: 11 passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: move schema-lint-v1-plan into docs/dev/ + add to index
Post-rebase fixup for the docs split (#93). The plan doc was added
to docs/ at the top level before main reorganized to docs/{user,dev}/.
This moves it into docs/dev/ and adds an entry to docs/dev/index.md
under a new "Active Implementation Plans" section so the
check-agents-md.sh link check passes.
Per the original commit message (617a77d), the plan doc is intentionally
temporary — it will be deleted when v1 lands.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN`
codes to schema-migration rejections so operators can suppress, look
up, and filter on identifiers rather than free-text prose. Atlas-style
chassis adapted to omnigraph's typed-IR substrate (no SQL injection
vector, no per-engine locks, native edge/vector/embedding types).
What's in v0:
- New `omnigraph-compiler/src/lint/` module with:
- `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten
families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF
are populated in this PR.
- `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105,
OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to
real emission sites; the other three are reserved.
- Unit tests for catalog invariants: codes unique, prefix matches
family, suffixes are 3-digit, destructive defaults to error,
lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES.
- `SchemaMigrationStep::UnsupportedChange` gains an optional
`code: Option<String>` field. New `unsupported_error_message()`
helper prefixes the message with `[code]` when present.
- 5 of 17 existing rejection paths now carry codes:
- `removing node type` → OG-DS-102
- `removing edge type` → OG-DS-103
- `removing property` → OG-DS-104
- `adding required property without backfill` → OG-MF-103
- `changing property type` → OG-MF-106
Remaining 12 paths carry `code: None` and are tagged as future work.
- `schema_apply` surfaces the formatted error (with `[code]` prefix);
CLI `omnigraph schema plan` renders the code on the
`unsupported change on <entity>` line.
- PR #62 destructive-rejection tests in `tests/schema_apply.rs` now
assert on the stable code (`msg.contains("OG-DS-104")`) instead of
the error-message substring. 11/11 tests pass.
- New `docs/schema-lint.md` documents the v0 catalog + the 10 families
+ Atlas prior art. AGENTS.md index updated.
What's explicitly NOT in v0 (subsequent PRs):
- No severity config in `omnigraph.yaml` (MR-694 §2).
- No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3).
- No `--allow-data-loss` flag or destructive-tier enforcement.
- No new `SchemaMigrationStep` variants (soft/hard drops, default,
widen/narrow). MR-700, MR-697 land those.
- No pre-migration checks (MR-941).
- No CD / VE / LK / NM family rules (MR-942..945).
- No CI integration (MR-946).
Tests: 235 compiler tests, 11 schema_apply integration tests, 14
lint module tests, 55 CLI tests — all green.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles the working-tree state from the prior session (PR 0 bench harness,
PR 1a audit_actor_id removal, PR 1b WriteQueueManager + writer integration)
together with the first half of PR 2's interior-mutability foundation
(catalog and schema_source wrapped in Arc<ArcSwap<...>>). The two streams
intermix in 7 of the same files, so splitting via git add -p was
impractical. Subsequent PR 2 steps land as separate atomic commits.
PR 0 — server-level concurrent /change bench harness
- crates/omnigraph-server/examples/bench_concurrent_http.rs (new)
- .context/bench-results/{baseline-main,after-pr1}/ (gitignored)
PR 1a — drop the audit_actor_id field, thread per-call
- removed Omnigraph::audit_actor_id and the swap-restore patterns in
mutation.rs, merge.rs, loader/mod.rs
- actor_id: Option<&str> threaded through MutationStaging::finalize,
mutate_with_current_actor, ingest_with_current_actor,
branch_merge_impl, branch_merge_on_current_target,
commit_prepared_updates*, record_merge_commit,
commit_updates_on_branch_with_expected
- apply_schema and ensure_indices_for_branch pass None (system-attributed)
PR 1b — per-(table_key, branch) write queue + revalidation + sidecar
- new crates/omnigraph/src/db/write_queue.rs with WriteQueueManager,
acquire/acquire_many, sorted+deduped acquisition; 6 unit tests
- Arc<WriteQueueManager> field on Omnigraph + db.write_queue() accessor
- MutationStaging::finalize split into stage_all (Phase A, no queue)
and StagedMutation::commit_all (Phase B, acquire_many + revalidate
pins + sidecar + commit_staged); guards held across publisher
- delete-only mutations now emit recovery sidecars; revalidation
extended to inline_committed tables
- branch_merge_on_current_target, apply_schema_with_lock, and
ensure_indices_for_branch acquire per-table queues for their
touched tables
PR 2 Step B (partial) — catalog and schema_source via ArcSwap
- catalog: Catalog -> Arc<ArcSwap<Catalog>>
- schema_source: String -> Arc<ArcSwap<String>>
- public accessors return Arc<Catalog> / Arc<String>; readers bind
locally where the borrow has to outlive an expression
- new pub(crate) store_catalog / store_schema_source helpers replace
the field assignments in apply_schema and reload_schema_if_source_changed
- 117 tests across lifecycle/end_to_end/branching/runs pass; engine
lib + workspace compile clean
Coordinator wrap (Mutex) and the &mut self -> &self engine API
conversion follow in subsequent commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OmniGraph is OSS; internal Linear ticket references and code-review-bot
mentions in source-code comments don't help external readers and leak
internal tooling. Replace ticket numbers (MR-XXX) with descriptive
prose, drop linear.app URLs, and remove inline mentions of
Cursor/Bugbot/Cubic/Codex review threads.
Scope is limited to source-code comments (`crates/`). Docs under
`docs/` keep their MR-XXX references — those are part of the
established change-history narrative for in-repo docs and don't
require a Linear account to find context for.
No behavior changes; no public API changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Parallel per-type load writes + omnigraph optimize/cleanup CLI
## MR-677.3 — parallel per-type load writes
The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).
## MR-676 — `omnigraph optimize` and `omnigraph cleanup`
New CLI subcommands that walk every node + edge table in the repo:
- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
runs Lance `cleanup_old_versions` to prune historical manifests +
unique fragments. Requires `--confirm` because it's destructive.
Supports both count-based and time-based retention (or both AND'd
together). Time uses chrono `DateTime<Utc>` (added as a workspace
dep, default-features off).
Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.
Public API on `Omnigraph`:
pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
-> Result<Vec<TableCleanupStats>>
All 10 existing loader tests still pass.
Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate openapi.json
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Review feedback on #23, applied on top of the original commit:
- Rename the CLI subcommand from `schema get` to `schema show` to match
the existing `run show` / `commit show` convention. A `#[command(alias
= "get")]` preserves muscle memory for anyone who already typed `get`.
- Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` →
`schema_source`, so the get response and the apply request use the
same field name for the same concept.
- Use `println!` instead of `print!` in the CLI so the shell prompt
doesn't land on the last line of schema output.
- Add three integration tests on `/schema`: happy path (no auth),
401 when bearer is required but missing, 403 when the policy grants
the actor branch_create but not read.
Follow-ups left for a separate PR: include `schema_ir_hash` and
`schema_identity_version` in the response payload so clients can do
drift detection and the server can set an ETag; and a fast-path local
read that skips `Omnigraph::open()` when only the schema source is
needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exposes the existing schema_source() method via a new `omnigraph schema get`
CLI subcommand and a `GET /schema` API endpoint, allowing users to retrieve
the current accepted schema from any graph repository.
https://claude.ai/code/session_01UYybeBQks3fz3RJrTHtwQw