Commit graph

21 commits

Author SHA1 Message Date
Ragnor Comerford
4e2f18a95e
mr-668: split PolicyEngine::load into kind-typed loaders
Pre-fix, every caller of `PolicyEngine::load(path, graph_id)`
passed *some* `graph_id` argument — even when the policy was
server-scoped and Cedar's resolution would never touch a Graph
entity. The server-level loader at lib.rs passed the meaningless
sentinel `"server"`. A graph policy file containing a `graph_list`
rule compiled fine; a server policy file containing a `read` rule
compiled fine. Both silently no-op'd at request time because the
engine kind and the rule's resource kind disagreed.

Correct-by-design fix: replace `load` with two kind-typed loaders.

* `PolicyEngine::load_graph(path, graph_id)` — for per-graph
  policy files. Rejects any rule whose action `resource_kind()`
  is `Server`.
* `PolicyEngine::load_server(path)` — for server-level policy
  files. Takes no `graph_id`: server-scoped actions resolve against
  the singleton `Omnigraph::Server::"root"` entity, never a Graph.
  Rejects any rule whose action `resource_kind()` is `Graph`.

The old `load` is hard-deleted in the same commit because every
in-tree consumer migrates here (no semver promise on the workspace
crate, no external pinners). New `PolicyEngineKind` enum types
the loader's intent; `validate_kind_alignment` is the load-time
check that closes the "wrong action, wrong file, silent no-op"
class — operators get a load-time error instead of confused-and-
silent behavior at request time.

Callsites migrated:
* server lib.rs:374 (single-mode per-graph)   → load_graph
* server lib.rs:1065 (multi-mode server)      → load_server
* server lib.rs:1103 (multi-mode per-graph)   → load_graph
* CLI main.rs:732 (resolve_policy_engine)     → load_graph
* tests/server.rs ×5 (4 graph, 1 server)      → load_graph/load_server
* policy_engine_chassis.rs                    → load_graph

Four new in-source tests pin the contract: both rejection paths
and both positive paths.

Closes the "operator puts an action in the wrong file and the
rule silently never matches" class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:35:22 +02:00
Ragnor Comerford
67a46528ef
mr-668: close init re-init footgun via InitOptions preflight (green)
`Omnigraph::init` is "create a new graph"; existing graphs need
an explicit overwrite. Today's behavior — silently overwrite
schema files, then on inner failure delete them via best-effort
cleanup — is destructive against an existing graph regardless of
which branch fires.

Correct-by-design fix:

* New `InitOptions { force: bool }` struct (default `force: false`).
* New `Omnigraph::init_with_options(uri, schema, options)`. The
  old `Omnigraph::init(uri, schema)` is a thin shortcut that
  passes `InitOptions::default()`.
* `init_with_storage` runs a `storage.exists()` preflight on the
  three schema URIs BEFORE any parse, write, or coordinator call.
  Any hit → typed `OmniError::AlreadyInitialized { uri }`. The
  destructive code paths (the `write_text` overwrite and the
  best-effort cleanup) are now unreachable in strict mode against
  an existing graph.
* `force: true` skips the preflight; existing operators who
  actually mean to overwrite opt in explicitly.
* CLI: `omnigraph init --force` maps to `InitOptions { force: true }`.
* HTTP: `OmniError::AlreadyInitialized` maps to 409 via
  `ApiError::from_omni`. Not currently HTTP-reachable (POST /graphs
  was pulled), but the wiring lands here so a future runtime
  create endpoint has one canonical translation.

Closes the "init is destructive against existing state" class.
The regression test added in the previous commit
(`init_on_existing_graph_uri_does_not_destroy_existing_schema`)
turns green: the original schema files now survive a second
init attempt byte-for-byte, and the call errors cleanly with
`AlreadyInitialized`. The four existing
`init_failpoint_after_*_cleans_up_*` tests stay green — strict
mode's preflight passes on a fresh tempdir, and cleanup still
runs as before when a failpoint fires mid-write.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 13:24:49 +02:00
Ragnor Comerford
76ee061cac
mr-668: drop actor_id from PolicyRequest; pass actor as separate arg
The MR-731 "server-authoritative actor identity" invariant was enforced
by an in-function chokepoint (`request.actor_id = actor.actor_id...`
overwrite inside `authorize_request`). That worked but relied on every
caller passing in a `PolicyRequest` and trusting the overwrite — a
comment-enforced invariant.

Move the invariant into the type system:

* `PolicyRequest` no longer carries `actor_id`. The struct now models
  what a caller wants to do, not who they are.
* `PolicyEngine::authorize(actor_id: &str, request: &PolicyRequest)`
  and `validate_request(actor_id, request)` take identity as a
  separate argument. The same shape `PolicyChecker::check` already had
  for the engine layer.
* `authorize_request` in the HTTP layer extracts `actor_id` from the
  bearer-resolved `ResolvedActor` and passes it positionally — no
  overwrite step that could be skipped.
* CLI `omnigraph policy explain` updated (the only other consumer
  that built a `PolicyRequest`).

Public API break for the `omnigraph-policy` crate. Worth it: handlers
can no longer accidentally populate `actor_id` from a request body
field, and external consumers are forced by the compiler to source
actor identity from a trusted path.

The MR-731 chokepoint test
`actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers`
still passes — the bearer-resolved actor is what reaches the engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:00:52 +02:00
Ragnor Comerford
937fd6382d
mr-668: remove POST /graphs and CLI graphs create (defer runtime graph mgmt)
The POST /graphs runtime-create endpoint shipped in PR 7/10 has three
unresolved high-severity bugs:

  - flock-on-renamed-inode race: the YAML flock is taken on
    omnigraph.yaml itself, then a temp file is renamed over it.
    Cross-process writers end up locking different inodes — both
    believing they hold exclusive access.
  - duplicate-check outside the file lock: precheck runs against
    the in-memory registry only; the locked closure does
    config.graphs.insert(...) unconditionally. Concurrent same-id
    POSTs can persist the loser in YAML while the in-memory registry
    keeps the winner — they disagree after restart.
  - best_effort_cleanup_init_artifacts deletes _schema.pg /
    _schema.ir.json / __schema_state.json on any init failure. An
    accidental re-init against an existing graph's URI destroys its
    schema; subsequent open() fails at read_text(_schema.pg).

The correct fix is a Lance-style cluster catalog (reserve → init →
publish with recovery sidecars), parallel to the engine's existing
__manifest discipline. That work is out of scope for v0.7.0.

For now, disable runtime add/remove from the network and CLI surface.
Operators add graphs by editing omnigraph.yaml and restarting. The
GET /graphs read-only enumeration stays.

Removed:
- POST /graphs handler + router fragment + utoipa registration
- 13 post_graphs_* server tests + 3 composite POST tests +
  multi_mode_app_with_real_config / post_graph helpers
- CLI omnigraph graphs create subcommand + its handler + cli.rs tests
- system_remote.rs combined list+create test trimmed to list-only
- YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError,
  staging_path, hash_config_file, AppState::config_hash field +
  threading through new_multi and open_multi_graph_state
- fs2 dependency (verified absent from cargo tree)
- sha2/fs2 imports in config.rs (only the rewrite path used them)
- Cedar PolicyAction::GraphCreate variant + "graph_create" match arms
  + action def in Cedar schema + graph_create_action_authorizes_against_server_resource test
- GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec /
  GraphPolicySpec API types (only the POST handler / CLI imported them)

Kept:
- GET /graphs (read-only enumeration) and graph_list Cedar action
- omnigraph graphs list CLI subcommand
- All multi-graph startup, mode inference, cluster routes,
  per-graph + server-level Cedar policies
- server_settings_drive_multi_graph_startup_end_to_end (the test
  that covers operator-authored YAML + restart — the path that
  survives)
- best_effort_cleanup_init_artifacts and the three init failpoints
  (still reachable from CLI `omnigraph init`; preflight fix deferred
  as a follow-up)
- GraphRegistry::insert and its concurrency tests — production
  callers gone, but the method is the natural seam for the future
  cluster-catalog work

Also fixed (transcript issue 4):
- ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI
  advertises the management route correctly (was previously rewritten
  to /graphs/{graph_id}/graphs)
- multi_mode_openapi_keeps_healthz_flat → renamed to
  multi_mode_openapi_keeps_management_paths_flat, asserts both
  /healthz and /graphs stay flat
- multi_mode_openapi_prefixes_operation_ids_with_cluster skips
  /graphs in addition to /healthz

Doc fixes:
- docs/user/cli.md: graphs list example was --target http://...,
  but --target is a config-graph-name lookup; corrected to --uri.
  Removed the graphs create example.
- docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml
  ownership", and "POST /graphs body shape" sections. Added a
  paragraph stating runtime add/remove is not exposed in v0.7.0.
- docs/user/policy.md: dropped graph_create action; reworded the
  "Configuration" line to clarify that server-scoped rules (graph_list)
  take neither branch_scope nor target_branch_scope.
- docs/releases/v0.7.0.md: rewrote release narrative — multi-graph
  mode ships; runtime add/remove deferred.
- AGENTS.md: HTTP server bullet and capability matrix row updated to
  reflect read-only GET /graphs and the operator-edit workflow.
- openapi.json regenerated; /graphs has only .get, no .post.

Diff: 17 files, +123 −1525 LOC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 17:49:38 +02:00
Ragnor Comerford
75514b6cfd
mr-668: CLI omnigraph graphs list/create (PR 8/10)
PR 8 of the MR-668 multi-graph server work. CLI parity for the
v0.7.0 management surface: operators can now manage graphs from
the command line against a running multi-graph server.

  omnigraph graphs list --target dev --json
  omnigraph graphs create \
    --target dev \
    --graph-id beta \
    --graph-uri /data/beta.omni \
    --schema schema.pg

DELETE is intentionally absent — server-side DELETE was deferred from
v0.7.0 scope, and shipping a client subcommand for a server endpoint
that doesn't exist would be dead vocabulary. The help output, the
subcommand enum, and the test that pins it (`graphs_subcommand_help_
lists_list_and_create`) all agree.

CLI architecture (modeled on `BranchCommand`):
  - New `Command::Graphs { command: GraphsCommand }` top-level variant.
  - `GraphsCommand { List, Create }` enum.
  - List: GET `<base>/graphs`. Stdout is `<graph_id>\t<uri>` per line,
    or JSON via `--json`.
  - Create: reads `--schema <path>` from local disk, inlines as
    `schema: { source: <file> }` in the POST body (nested per
    MR-668 decision 7). Optional `--policy-file <path>` becomes
    `policy: { file: <path> }`. Returns 201 → "created graph X at Y"
    or JSON via `--json`.
  - Both subcommands reject local URI targets with a clear
    "remote multi-graph server URL" error.

New API type imports in the CLI: `GraphCreateRequest`,
`GraphCreateResponse`, `GraphListResponse`, `GraphSchemaSpec`,
`GraphPolicySpec` — all from `omnigraph-server::api`.

Tests:
  - cli.rs (4 new, non-network):
      * `graphs_subcommand_help_lists_list_and_create` — pins the
        deferral of `delete` (catches scope creep).
      * `graphs_list_against_local_uri_errors_with_remote_only_message`
      * `graphs_create_against_local_uri_errors_with_remote_only_message`
      * `graphs_create_with_missing_schema_file_errors` — pins the
        IO context in the schema-read error path.
  - system_remote.rs (1 new, `#[ignore]` like its peers):
      * `graphs_list_and_create_against_multi_graph_server` — spawns a
        multi-mode server, calls `graphs list` (sees `alpha`),
        `graphs create` (adds `beta`), `graphs list` again (sees both),
        and confirms the new graph is reachable via its cluster route.

CLI suite: 62 tests green (58 existing + 4 new). The new ignored
end-to-end test runs locally with `cargo test --ignored`.

LOC: +159 main.rs (enum + handlers), +88 cli.rs (unit tests),
+131 system_remote.rs (integration test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:54:21 +02:00
Ragnor Comerford
cc2412dc65
Rename repo terminology to graph (#118)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
2026-05-24 16:46:00 +01:00
Andrew Altshuler
aadfa11ecb
schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on
the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode
drops were CLI-only. This commit closes that feature gap and adds e2e
test coverage for drop modes across HTTP + CLI, plus data preservation
on additive apply, plus a CLI↔SDK plan-parity assertion.

Feature gap closed:

- `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool`
  (default false via `#[serde(default)]`) to `SchemaApplyRequest`.
  Added `Default` derive so test usages can use `..Default::default()`.
- `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now
  constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }`
  and threads through to `apply_schema_as`.
- `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path
  used to bail with "--allow-data-loss not yet supported on remote";
  now forwards the flag into the JSON payload so the CLI behaves
  identically against local and remote URIs.
- `openapi.json` — regenerated; only diff is the new field on
  `SchemaApplyRequest`.

Tests added (8 new):

* `crates/omnigraph-server/tests/server.rs` (+5):
  - `schema_apply_route_soft_drops_property_via_http` — POST schema
    removing nullable property, verify catalog reflects the drop AND
    `snapshot_at_version(pre)` still has `age` in the field list
    (time-travel reachability is the Soft contract).
  - `schema_apply_route_soft_drops_node_type_via_http` — POST schema
    removing `Company` node + cascading `WorksAt` edge.
  - `schema_apply_route_hard_drops_property_with_allow_data_loss` —
    POST with `allow_data_loss: true`, verify plan step reports
    `mode: hard`.
  - `schema_apply_route_keeps_drops_soft_without_flag` — same schema
    without flag, verify `mode: soft`. Pins default semantics against
    accidental Hard promotion.
  - `schema_apply_route_additive_property_preserves_existing_rows` —
    load fixture, POST adding nullable property, verify row count
    preserved (SDK suite covers data preservation on drops + renames;
    additive AddProperty wasn't pinned).
  Plus helpers `schema_without_age` and `schema_without_company`.

* `crates/omnigraph-cli/tests/cli.rs` (+3):
  - `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI
    `omnigraph schema apply --allow-data-loss --schema X.pg --json`,
    verify plan step has `mode: hard`.
  - `schema_apply_without_allow_data_loss_keeps_soft_drops` — without
    flag, verify Soft.
  - `schema_plan_parity_cli_and_sdk` — same `.pg` source through
    `Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json`
    (CLI), assert the steps array is byte-identical post-JSON. HTTP
    has no `/schema/plan` endpoint; apply-side parity is implicitly
    covered by the HTTP drop tests + CLI drop tests using identical
    fixtures.

Docs:

- `docs/user/schema-language.md` — new "Destructive drops" section
  documenting Soft vs Hard semantics and that `allow_data_loss` is
  now honored uniformly across CLI / HTTP / SDK.

Verification: every new test passes; full `cargo test --workspace --locked`
green; `scripts/check-agents-md.sh` passes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 01:56:46 +03:00
Andrew Altshuler
a275306a15
policy: CLI policy injection — local writes go through engine enforce (MR-722) (#104)
Closes the CLI side of the policy chassis fan-out. Before this commit,
CLI direct-engine writes bypassed Cedar entirely because the CLI never
called `Omnigraph::with_policy(...)` for non-`policy validate|test|explain`
subcommands. After this commit, every CLI direct-engine writer
(change, load, ingest, branch create/delete/merge, schema apply) opens
the engine via a new `open_local_db_with_policy(uri, &config)` helper
that installs the configured `PolicyEngine` when `policy.file` is set,
and threads the resolved actor through to the `_as` writer methods.

Actor identity resolution:

- New top-level `--as <ACTOR>` global flag on the CLI overrides config.
- New `cli.actor` field in `omnigraph.yaml` provides a default actor.
- Precedence: `--as` > `cli.actor` > None.
- When policy is configured and neither is set, the engine-layer
  footgun guard fires and the write is denied — silent bypass via
  "I forgot the actor" is exactly what the guard prevents.
- Remote HTTP writes ignore both — bearer-token-resolved server-side.

Helpers added in main.rs:

- `open_local_db_with_policy(uri, &config) -> Result<Omnigraph>` —
  opens the DB and installs the PolicyEngine when configured. Without
  policy this is identical to a bare `Omnigraph::open`.
- `resolve_cli_actor(cli_as, &config) -> Option<&str>` — implements
  the flag > config > None precedence.

Engine: added `load_file_as` to the loader as the actor-aware mirror of
`load_file`, so CLI file-path loads flow through the same enforce gate
as in-memory `load_as` calls.

Test rewrite: `local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced`
was the explicit assertion of the pre-chassis hole. Renamed and split:

- `local_cli_policy_tooling_is_end_to_end` — sanity for the read-only
  policy CLI surfaces (validate/test/explain), unchanged behavior.
- `local_cli_change_enforces_engine_layer_policy` — the new assertion:
  policy installed + no actor → footgun-guard denial; `--as act-bruno`
  on protected main → Cedar denial; `--as act-ragnor` (admins-write
  rule) on main → permit, write committed.

POLICY_E2E_YAML gains an `admins-write` rule so the permit case has
a non-trivial actor to exercise.

docs/user/policy.md updated with `cli.actor` + `--as <ACTOR>` usage.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 04:06:21 +03:00
Andrew Altshuler
a6e037547f
schema-lint chassis v1.2: --allow-data-loss flag + Hard mode (MR-694) — completes v1 (#100)
* schema-lint v1 commit 5: --allow-data-loss flag + Hard mode

Final v1 commit. Wires up the --allow-data-loss CLI flag and Hard
mode for both DropProperty and DropType. Per
docs/dev/schema-lint-v1-plan.md, commit #5 of the schema-lint
chassis v1 series (MR-694).

CLI (omnigraph-cli/src/main.rs):
- New --allow-data-loss flag on both `omnigraph schema plan` and
  `omnigraph schema apply` subcommands. Off by default (Soft).
- HTTP remote schema apply explicitly rejects the flag for now
  (CLI-only; HTTP parity is a separate small follow-up that adds
  the field to SchemaApplyRequest + the server handler).

Engine (omnigraph.rs + schema_apply.rs):
- New SchemaApplyOptions { allow_data_loss: bool } public struct
  (Default = all false), re-exported via omnigraph::db::SchemaApplyOptions.
- New public methods: plan_schema_with_options and
  apply_schema_with_options. Existing plan_schema/apply_schema are
  now thin wrappers that pass Default::default().
- promote_drops_to_hard: post-plan walk that promotes every
  DropMode::Soft step to DropMode::Hard when the flag is set.
  Keeps the compiler's plan_schema_migration signature unchanged
  (no breaking change for tests / callers).
- Apply path: both Drop arms accept Hard mode; behavior is
  identical to Soft inside the apply loop. The DIFFERENCE is the
  new hard_cleanup_targets: Vec<(String, String)> accumulator,
  populated for every Hard variant with (table_key, full_dataset_uri).
- Post-publish cleanup: a new loop after the manifest commit
  iterates hard_cleanup_targets and calls cleanup_old_versions
  (before_timestamp = now) on each dataset URI. Best-effort —
  the apply is already durable; cleanup failure is logged via
  tracing::warn rather than failing the apply.
- New cleanup_dataset_old_versions helper inlines the Lance
  cleanup_old_versions call against a dataset URI.

Behavioral details:
- DropProperty Hard: stage_overwrite produced a new dataset version
  without the column. cleanup_old_versions removes the prior version
  (and reclaims unique fragments). After Hard apply,
  snapshot_at_version(pre_drop).open(table_key) FAILS because the
  prior dataset version was reclaimed.
- DropType Hard: no per-table write happens (the change is the
  manifest tombstone). cleanup_old_versions on the orphan dataset
  is a no-op in the immediate term (no prior versions to clean
  since the dataset wasn't modified by this apply). The dataset
  directory persists. Full orphan-cleanup is a documented
  follow-up — the user-facing contract is "data is unreachable
  via omnigraph" (manifest entry tombstoned), which is satisfied.

Tests (tests/schema_apply.rs):
- apply_schema_with_allow_data_loss_promotes_drops_to_hard:
  default plan emits Soft; with options.allow_data_loss=true,
  plan emits Hard; apply succeeds.
- apply_schema_hard_drops_property_makes_prior_version_unreachable:
  Hard drop succeeds, current snapshot lacks the column, and
  snapshot_at_version(pre_drop).open("node:Person") FAILS (Lance
  prior version reclaimed by cleanup).
- apply_schema_hard_drops_node_and_edge_with_flag_succeeds: both
  Node and Edge DropType variants are promoted to Hard with the
  flag; apply succeeds; current manifest entries gone. (Orphan
  dataset directory cleanup deferred.)

Test results:
- cargo test -p omnigraph-compiler --lib: 239 passed
- cargo test -p omnigraph-engine --test schema_apply: 14 passed
  (3 new Hard tests + 11 existing soft/regression tests)
- cargo test -p omnigraph-server --test openapi: 60 passed (no
  HTTP API surface changes in this commit; OpenAPI parity follow-up
  noted)

v1 status: complete for CLI/embedded use. MR-694 chassis epic +
MR-700 DropType/DropProperty ticket can close after this lands.

Known follow-ups (separate small PRs):
- HTTP parity: extend SchemaApplyRequest with allow_data_loss field,
  thread through server handler, regenerate openapi.json.
- Orphan-dataset directory deletion for DropType Hard (currently
  the dataset directory persists; cleanup_old_versions doesn't
  remove it because the dataset wasn't modified).
- MR-948 substrate alignment: swap DropProperty Soft from
  stage_overwrite to Dataset::drop_columns (catalog_only vs
  full_rewrite cost class).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fixup: use bail! from color_eyre::eyre instead of anyhow

The remote-rejection branch in SchemaCommand::Apply used
anyhow::anyhow! which isn't in scope; the CLI's Result type is
color_eyre::eyre::Result and bail! is already imported.

Caught by CI Test Workspace job on PR #100.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:12:46 +03:00
Andrew Altshuler
e98347eb7b
schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90)
* schema-lint chassis v1 (WIP): tier surfacing + plan doc

First commit of the chassis v1 branch. Lands a small, foundational
slice without behavior change, plus a planning doc that lays out the
remaining 7 commits in sequence so the PR can be reviewed
incrementally.

This commit:

- Adds SchemaMigrationStep::diagnostic() returning the full
  &'static DiagnosticCode (family + tier + severity) for
  UnsupportedChange steps with codes. Renderers can now reach the
  tier without re-implementing the code → tier lookup.

- CLI `omnigraph schema plan` output now displays tier alongside
  code:

    unsupported change on node:Person.age [OG-DS-104, destructive]:
        removing property 'Person.age' is not supported in schema
        migration v1

  Operators see at-a-glance the kind of risk each rejection
  represents — not just the rule identifier.

- No behavior change. All 11 existing schema_apply tests still pass.

Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining
commits to bring v1 to feature-complete:

  1. (this commit) Tier surfacing in plan output.
  2. Soft / Hard mode enum on drop steps.
  3. Tombstone fields on catalog IR.
  4. Planner emits DropProperty { Soft } by default.
  5. Apply path implements Soft mode.
  6. Convert PR #62 destructive-rejection tests.
  7. --allow-data-loss flag + Hard mode.
  8. (optional) Tombstone unhide / restore command.

Delete the planning doc when v1 lands. Intentionally checked in to
the WIP branch so the scope is reviewable; not intended as a
permanent doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* schema-lint v1 commit 2: DropMode + dormant Drop* variants

Second commit of the chassis v1 branch. Lands the type-level shape
of soft/hard drops without wiring them up. Variants are reachable
from emitters but the planner doesn't produce them yet; the apply
path returns an explicit not-yet-implemented error if one shows up
via deserialization.

Added:

- `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier
  classifies the rule's risk class; mode is the operator's intent
  for data treatment.
    - `Soft` → catalog tombstone, data retained. Tier: safe.
    - `Hard` → Lance-level removal. Tier: destructive; will require
      --allow-data-loss to apply (commit 7).

- `SchemaMigrationStep::DropType { type_kind, name, mode }` and
  `SchemaMigrationStep::DropProperty { type_kind, type_name,
  property_name, mode }` variants.

- Re-export `DropMode` from `omnigraph_compiler::DropMode` so
  downstream crates don't reach into the catalog submodule.

- CLI `render_schema_plan_step` arms for both variants, surfacing
  the mode in plan output: `drop property 'Person.age' of node
  'Person' (soft mode)`.

- `apply_schema_with_lock` exhaustive match arm for the two new
  variants that returns `manifest_internal` with a clear
  not-yet-implemented message. If a SchemaIR JSON containing
  Drop{Type,Property} arrives (e.g. from a future tool or hand-
  written), the apply path fails explicitly rather than silently
  misclassifying.

- Two new in-source tests:
    - `drop_steps_round_trip_through_serde` — pins the wire shape
      for all four (variant × mode) combinations.
    - `drop_mode_serde_uses_snake_case` — pins external-tool-
      friendly serialization (`"soft"` / `"hard"`).

Build: clean, only pre-existing warnings.
Tests:
- omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new).
- omnigraph-engine schema_apply: 11/11 (unchanged — planner still
  emits UnsupportedChange for removal paths).

Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the
`tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the
catalog representation of soft-mode tombstones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plan doc: reframe v1 around Lance native drop_columns

After a substrate audit of the Lance data-evolution guide on
2026-05-13, the v1 plan was simplified. Two key findings:

1. Lance's `drop_columns()` is already metadata-only and reversible
   via time travel until cleanup. No need for a parallel
   `tombstoned: bool` field in our catalog IR — Lance's version
   graph IS the tombstone.

2. The full schema_apply substrate migration (add_columns,
   drop_columns, alter_columns vs. stage_overwrite across all step
   types) is consolidated in MR-948 as a sibling issue. v1 only
   uses the relevant slice (drop_columns for OG-DS-1XX).

Net plan changes:

- Commit 3 (original): tombstone fields on catalog IR → dropped.
  No catalog IR change needed. The Lance drop_columns commit IS the
  tombstone.

- Commit 5 (original): apply path writes tombstoned: true → replaced
  with: apply path calls Dataset::drop_columns([name]).

- Commit 7 Hard mode: stage_overwrite removing the column → replaced
  with: drop_columns + compact_files + cleanup_old_versions. Same
  APIs omnigraph cleanup already uses.

- Commit 8 (original): omnigraph schema unhide → dropped. Time
  travel is the undo (omnigraph snapshot --at <commit>).

Net result: 8 commits → 5 commits. ~250 LoC less surface. More
substrate-aligned.

The chassis types from commit 2 (DropMode enum, DropType /
DropProperty variants) are kept exactly as designed; only the
implementation strategy changed.

The Lance docs quote is included in the doc so future readers see
the substrate behavior cited verbatim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* schema-lint v1 commit 3: emit + apply DropProperty { Soft }

Wire the dormant DropProperty variant end-to-end for the Soft case.
Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis
v1 series (MR-694).

Planner (schema_plan.rs):
- plan_properties: emit DropProperty { type_kind, type_name,
  property_name, mode: Soft } instead of UnsupportedChange when a
  property exists in accepted but not in desired. Plan is now
  supported = true for drop-only changes.

Apply (schema_apply.rs):
- Route DropProperty { Soft } through rewritten_tables. The existing
  batch_for_schema_apply_rewrite path already iterates the *target*
  schema fields, so a property absent from desired_catalog is
  naturally projected away. The prior Lance version retains the
  dropped column for time-travel reversibility (until cleanup runs).
- DropType still errors (lands in commit #4 with different mechanics:
  __manifest entry removal instead of column projection).
- DropProperty { Hard } still errors (lands in commit #5 with
  --allow-data-loss CLI flag + immediate compact_files +
  cleanup_old_versions).

Tests:
- Planner unit test plan_emits_soft_drop_for_removed_nullable_property
  asserts the variant emission + supported = true + no UnsupportedChange.
- Integration test apply_schema_drops_a_nullable_property_softly_
  preserves_prior_version (replaces the former
  apply_schema_rejects_dropping_a_property_with_data) asserts:
  (a) plan contains DropProperty { Soft }
  (b) apply succeeds + manifest advances + row count unchanged
  (c) current dataset schema lacks the dropped column
  (d) snapshot_at_version(pre_drop) still has the dropped column
  (e) reopen consistency — drop preserved across engine restart

Recovery: rides on SidecarKind::SchemaApply per MR-847. No new
sidecar kind needed; the entire apply path is already sidecar-wrapped.

Substrate alignment: this commit uses the stage_overwrite full-rewrite
path (full_rewrite cost class) rather than Lance native drop_columns
(catalog_only cost class). MR-948 is the follow-up substrate-alignment
refactor that introduces a LanceColumnOp surface and switches the
metadata-only case onto drop_columns. Functional outcome is identical;
cost-class improvement deferred.

Test results:
- cargo test -p omnigraph-compiler --lib: 238 passed
- cargo test -p omnigraph-engine --test schema_apply: 11 passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: move schema-lint-v1-plan into docs/dev/ + add to index

Post-rebase fixup for the docs split (#93). The plan doc was added
to docs/ at the top level before main reorganized to docs/{user,dev}/.
This moves it into docs/dev/ and adds an entry to docs/dev/index.md
under a new "Active Implementation Plans" section so the
check-agents-md.sh link check passes.

Per the original commit message (617a77d), the plan doc is intentionally
temporary — it will be deleted when v1 lands.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:30:03 +03:00
Andrew Altshuler
c142dafdf3
schema-lint chassis v0: code-tagged diagnostics (MR-694) (#87)
First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN`
codes to schema-migration rejections so operators can suppress, look
up, and filter on identifiers rather than free-text prose. Atlas-style
chassis adapted to omnigraph's typed-IR substrate (no SQL injection
vector, no per-engine locks, native edge/vector/embedding types).

What's in v0:

- New `omnigraph-compiler/src/lint/` module with:
  - `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten
    families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF
    are populated in this PR.
  - `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105,
    OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to
    real emission sites; the other three are reserved.
  - Unit tests for catalog invariants: codes unique, prefix matches
    family, suffixes are 3-digit, destructive defaults to error,
    lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES.

- `SchemaMigrationStep::UnsupportedChange` gains an optional
  `code: Option<String>` field. New `unsupported_error_message()`
  helper prefixes the message with `[code]` when present.

- 5 of 17 existing rejection paths now carry codes:
  - `removing node type` → OG-DS-102
  - `removing edge type` → OG-DS-103
  - `removing property` → OG-DS-104
  - `adding required property without backfill` → OG-MF-103
  - `changing property type` → OG-MF-106
  Remaining 12 paths carry `code: None` and are tagged as future work.

- `schema_apply` surfaces the formatted error (with `[code]` prefix);
  CLI `omnigraph schema plan` renders the code on the
  `unsupported change on <entity>` line.

- PR #62 destructive-rejection tests in `tests/schema_apply.rs` now
  assert on the stable code (`msg.contains("OG-DS-104")`) instead of
  the error-message substring. 11/11 tests pass.

- New `docs/schema-lint.md` documents the v0 catalog + the 10 families
  + Atlas prior art. AGENTS.md index updated.

What's explicitly NOT in v0 (subsequent PRs):

- No severity config in `omnigraph.yaml` (MR-694 §2).
- No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3).
- No `--allow-data-loss` flag or destructive-tier enforcement.
- No new `SchemaMigrationStep` variants (soft/hard drops, default,
  widen/narrow). MR-700, MR-697 land those.
- No pre-migration checks (MR-941).
- No CD / VE / LK / NM family rules (MR-942..945).
- No CI integration (MR-946).

Tests: 235 compiler tests, 11 schema_apply integration tests, 14
lint module tests, 55 CLI tests — all green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:08:18 +03:00
Ragnor Comerford
fcb47620d3
mr-686: bundle PR 0/1a/1b foundation + PR 2 catalog/schema_source ArcSwap
Bundles the working-tree state from the prior session (PR 0 bench harness,
PR 1a audit_actor_id removal, PR 1b WriteQueueManager + writer integration)
together with the first half of PR 2's interior-mutability foundation
(catalog and schema_source wrapped in Arc<ArcSwap<...>>). The two streams
intermix in 7 of the same files, so splitting via git add -p was
impractical. Subsequent PR 2 steps land as separate atomic commits.

PR 0 — server-level concurrent /change bench harness
  - crates/omnigraph-server/examples/bench_concurrent_http.rs (new)
  - .context/bench-results/{baseline-main,after-pr1}/ (gitignored)

PR 1a — drop the audit_actor_id field, thread per-call
  - removed Omnigraph::audit_actor_id and the swap-restore patterns in
    mutation.rs, merge.rs, loader/mod.rs
  - actor_id: Option<&str> threaded through MutationStaging::finalize,
    mutate_with_current_actor, ingest_with_current_actor,
    branch_merge_impl, branch_merge_on_current_target,
    commit_prepared_updates*, record_merge_commit,
    commit_updates_on_branch_with_expected
  - apply_schema and ensure_indices_for_branch pass None (system-attributed)

PR 1b — per-(table_key, branch) write queue + revalidation + sidecar
  - new crates/omnigraph/src/db/write_queue.rs with WriteQueueManager,
    acquire/acquire_many, sorted+deduped acquisition; 6 unit tests
  - Arc<WriteQueueManager> field on Omnigraph + db.write_queue() accessor
  - MutationStaging::finalize split into stage_all (Phase A, no queue)
    and StagedMutation::commit_all (Phase B, acquire_many + revalidate
    pins + sidecar + commit_staged); guards held across publisher
  - delete-only mutations now emit recovery sidecars; revalidation
    extended to inline_committed tables
  - branch_merge_on_current_target, apply_schema_with_lock, and
    ensure_indices_for_branch acquire per-table queues for their
    touched tables

PR 2 Step B (partial) — catalog and schema_source via ArcSwap
  - catalog: Catalog -> Arc<ArcSwap<Catalog>>
  - schema_source: String -> Arc<ArcSwap<String>>
  - public accessors return Arc<Catalog> / Arc<String>; readers bind
    locally where the borrow has to outlive an expression
  - new pub(crate) store_catalog / store_schema_source helpers replace
    the field assignments in apply_schema and reload_schema_if_source_changed
  - 117 tests across lifecycle/end_to_end/branching/runs pass; engine
    lib + workspace compile clean

Coordinator wrap (Mutex) and the &mut self -> &self engine API
conversion follow in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:22:38 +02:00
Ragnor Comerford
35be20cb05
MR-771: demote Run to direct-publish via expected_table_versions CAS
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 08:52:50 +02:00
Andrew Altshuler
74eb5a5380
Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46)
* Parallel per-type load writes + omnigraph optimize/cleanup CLI

## MR-677.3 — parallel per-type load writes

The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).

## MR-676 — `omnigraph optimize` and `omnigraph cleanup`

New CLI subcommands that walk every node + edge table in the repo:

- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
  table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
  runs Lance `cleanup_old_versions` to prune historical manifests +
  unique fragments. Requires `--confirm` because it's destructive.
  Supports both count-based and time-based retention (or both AND'd
  together). Time uses chrono `DateTime<Utc>` (added as a workspace
  dep, default-features off).

Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.

Public API on `Omnigraph`:

  pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
  pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
      -> Result<Vec<TableCleanupStats>>

All 10 existing loader tests still pass.

Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-25 14:22:14 +03:00
andrew
be520f31f4 Polish schema endpoint: rename show, align field name, add tests
Review feedback on #23, applied on top of the original commit:

- Rename the CLI subcommand from `schema get` to `schema show` to match
  the existing `run show` / `commit show` convention. A `#[command(alias
  = "get")]` preserves muscle memory for anyone who already typed `get`.
- Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` →
  `schema_source`, so the get response and the apply request use the
  same field name for the same concept.
- Use `println!` instead of `print!` in the CLI so the shell prompt
  doesn't land on the last line of schema output.
- Add three integration tests on `/schema`: happy path (no auth),
  401 when bearer is required but missing, 403 when the policy grants
  the actor branch_create but not read.

Follow-ups left for a separate PR: include `schema_ir_hash` and
`schema_identity_version` in the response payload so clients can do
drift detection and the server can set an ETag; and a fast-path local
read that skips `Omnigraph::open()` when only the schema source is
needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:30:46 +03:00
Claude
0c4df674fa
Add schema get command to CLI and HTTP API
Exposes the existing schema_source() method via a new `omnigraph schema get`
CLI subcommand and a `GET /schema` API endpoint, allowing users to retrieve
the current accepted schema from any graph repository.

https://claude.ai/code/session_01UYybeBQks3fz3RJrTHtwQw
2026-04-16 21:15:17 +00:00
andrew
1a26e2e654 Rename config targets to graphs 2026-04-14 04:12:14 +03:00
andrew
1bf55fa52d Add query lint and check commands 2026-04-13 00:37:44 +03:00
andrew
92fa3189f7 Add schema apply command and policy support 2026-04-12 04:01:14 +03:00
andrew
4b058b9813 Fix CLI ergonomics and stream export output 2026-04-11 19:01:48 +03:00
andrew
338289656a Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00