mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-12 01:45:14 +02:00
11 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
eab99e6f48
|
fix: close validated init and multi-graph gaps | ||
|
|
1aafbd9dc6
|
mr-668: fold multi-graph work into v0.6.0 (no separate v0.7.0 release)
The branch had bumped workspace versions to 0.7.0 and added a dedicated `docs/releases/v0.7.0.md` for the multi-graph work. Per scope decision: ship the graph-rename and the multi-graph mode in one v0.6.0 release. Changes: * Workspace versions bumped 0.7.0 → 0.6.0 in every crate manifest (`omnigraph`, `omnigraph-compiler`, `omnigraph-policy`, `omnigraph-server`, `omnigraph-cli`) and their internal `path = ..., version = "..."` dependency constraints. * `docs/releases/v0.7.0.md` content merged into `docs/releases/v0.6.0.md`, retargeted to a single coherent v0.6.0 release note covering both the graph terminology rename and the multi-graph server mode. The original v0.7.0.md is deleted. * All `v0.7.0` / `0.7.0` doc and comment references throughout `crates/`, `docs/`, `AGENTS.md`, and `openapi.json` retargeted to `v0.6.0` / `0.6.0`. `Cargo.lock` regenerated to match. * OpenAPI spec regenerated via `OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi openapi_spec_is_up_to_date` — `"version": "0.6.0"` now. Verification: * `cargo build --workspace` — clean (6 pre-existing engine warnings only). * `cargo test --workspace --locked` — zero failures across all 39 test result groups. * `bash scripts/check-agents-md.sh` — passes (34 links / 33 docs). * `grep -rn "0\.7\.0\|v0\.7\.0" --include='*.rs' --include='*.md' --include='*.json' --include='*.toml' .` returns no workspace hits. The three remaining `0.7.0` strings in `Cargo.lock` belong to unrelated 3rd-party crates (`pem-rfc7468`, `radium`, `rand_xoshiro`). The git tag and crates.io publish happen later — this commit just consolidates the surface so the eventual release is one coherent v0.6.0 covering all the work since v0.5.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
937fd6382d
|
mr-668: remove POST /graphs and CLI graphs create (defer runtime graph mgmt)
The POST /graphs runtime-create endpoint shipped in PR 7/10 has three
unresolved high-severity bugs:
- flock-on-renamed-inode race: the YAML flock is taken on
omnigraph.yaml itself, then a temp file is renamed over it.
Cross-process writers end up locking different inodes — both
believing they hold exclusive access.
- duplicate-check outside the file lock: precheck runs against
the in-memory registry only; the locked closure does
config.graphs.insert(...) unconditionally. Concurrent same-id
POSTs can persist the loser in YAML while the in-memory registry
keeps the winner — they disagree after restart.
- best_effort_cleanup_init_artifacts deletes _schema.pg /
_schema.ir.json / __schema_state.json on any init failure. An
accidental re-init against an existing graph's URI destroys its
schema; subsequent open() fails at read_text(_schema.pg).
The correct fix is a Lance-style cluster catalog (reserve → init →
publish with recovery sidecars), parallel to the engine's existing
__manifest discipline. That work is out of scope for v0.7.0.
For now, disable runtime add/remove from the network and CLI surface.
Operators add graphs by editing omnigraph.yaml and restarting. The
GET /graphs read-only enumeration stays.
Removed:
- POST /graphs handler + router fragment + utoipa registration
- 13 post_graphs_* server tests + 3 composite POST tests +
multi_mode_app_with_real_config / post_graph helpers
- CLI omnigraph graphs create subcommand + its handler + cli.rs tests
- system_remote.rs combined list+create test trimmed to list-only
- YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError,
staging_path, hash_config_file, AppState::config_hash field +
threading through new_multi and open_multi_graph_state
- fs2 dependency (verified absent from cargo tree)
- sha2/fs2 imports in config.rs (only the rewrite path used them)
- Cedar PolicyAction::GraphCreate variant + "graph_create" match arms
+ action def in Cedar schema + graph_create_action_authorizes_against_server_resource test
- GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec /
GraphPolicySpec API types (only the POST handler / CLI imported them)
Kept:
- GET /graphs (read-only enumeration) and graph_list Cedar action
- omnigraph graphs list CLI subcommand
- All multi-graph startup, mode inference, cluster routes,
per-graph + server-level Cedar policies
- server_settings_drive_multi_graph_startup_end_to_end (the test
that covers operator-authored YAML + restart — the path that
survives)
- best_effort_cleanup_init_artifacts and the three init failpoints
(still reachable from CLI `omnigraph init`; preflight fix deferred
as a follow-up)
- GraphRegistry::insert and its concurrency tests — production
callers gone, but the method is the natural seam for the future
cluster-catalog work
Also fixed (transcript issue 4):
- ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI
advertises the management route correctly (was previously rewritten
to /graphs/{graph_id}/graphs)
- multi_mode_openapi_keeps_healthz_flat → renamed to
multi_mode_openapi_keeps_management_paths_flat, asserts both
/healthz and /graphs stay flat
- multi_mode_openapi_prefixes_operation_ids_with_cluster skips
/graphs in addition to /healthz
Doc fixes:
- docs/user/cli.md: graphs list example was --target http://...,
but --target is a config-graph-name lookup; corrected to --uri.
Removed the graphs create example.
- docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml
ownership", and "POST /graphs body shape" sections. Added a
paragraph stating runtime add/remove is not exposed in v0.7.0.
- docs/user/policy.md: dropped graph_create action; reworded the
"Configuration" line to clarify that server-scoped rules (graph_list)
take neither branch_scope nor target_branch_scope.
- docs/releases/v0.7.0.md: rewrote release narrative — multi-graph
mode ships; runtime add/remove deferred.
- AGENTS.md: HTTP server bullet and capability matrix row updated to
reflect read-only GET /graphs and the operator-edit workflow.
- openapi.json regenerated; /graphs has only .get, no .post.
Diff: 17 files, +123 −1525 LOC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d11c18fb27
|
mr-668: composite e2e tests, race fix, v0.7.0 release (PR 9/10)
PR 9 — the final integration PR for MR-668 multi-graph server work.
Closes the v0.7.0 release.
Composite lifecycle tests (closes gaps flagged in PR 7's coverage
review):
- `multi_graph_lifecycle_post_query_restart_persistence` — POST a
graph, query it via cluster route, reload the config from disk
and confirm `load_server_settings` sees the rewritten YAML.
Validates the "restart resolves orphans" failure-mode story.
- `per_graph_policy_enforced_on_post_created_graph` — POST a graph
with a per-graph policy attached, then send authenticated read
and change requests. Per-graph Cedar enforcement fires correctly
on a POST-created graph (engine-layer policy reinstalled via
`Omnigraph::with_policy` inside the create flow).
- `concurrent_post_graphs_distinct_ids_all_succeed` — 4 concurrent
POSTs with distinct graph_ids all return 201. Caught a real
race in `rewrite_atomic` (see below).
Race fix — `rewrite_atomic_with_modify`:
The first composite test surfaced a real bug. The old
`rewrite_atomic(path, new_config, expected_hash)` captured the
baseline hash OUTSIDE the flock, then called rewrite_atomic which
re-acquired it inside. Under concurrent writers:
- POST A: captures baseline H0, calls rewrite_atomic.
- POST B: captures baseline H0 too (before A's update lands).
- A: acquires flock, on-disk == H0, writes H1, releases.
- A: updates baseline H0 → H1.
- B: tries to acquire flock — waits.
- B: acquires flock. On-disk is now H1. Expected (captured
before A finished) is H0. MISMATCH → spurious Drift error.
Worse: even if the timing happens to align, B's `updated` config
was constructed from BYTES read before the flock. B writes a config
that doesn't include A's new graph — silent data loss.
The fix: new `config::rewrite_atomic_with_modify(path, baseline,
modify)` takes a closure. Inside the flock + baseline mutex:
1. Read on-disk bytes, hash, compare to baseline.
2. Parse on-disk YAML.
3. Call `modify(parsed)` to produce the new config — receives
fresh on-disk state, returns the modification.
4. Serialize + write + fsync + rename + update baseline.
Everything is read-modify-write under the same critical section.
Concurrent writers serialize cleanly. Test confirmed this is no
longer a race.
The old `rewrite_atomic(path, new_config, expected_hash)` API stays
for tests that don't need the read-modify-write shape; the POST
handler switches to the new shape.
Version bump v0.6.0 → v0.7.0:
- All 5 `crates/*/Cargo.toml` (compiler, engine, policy, cli, server)
plus their inter-crate `path` dep version constraints.
- `Cargo.lock` regenerated by `cargo build --workspace`.
- `AGENTS.md` "Version surveyed" line, capability matrix HTTP-server
row updated to mention multi-graph + cluster routes + atomic YAML
rewrite.
- `openapi.json` regenerated.
Docs:
- `docs/releases/v0.7.0.md` (new) — release notes with breaking
changes, new features, deferred items (DELETE, `delete_prefix`,
actor forwarding), and the single→multi migration recipe.
- `docs/user/server.md` — substantial section additions for the
two modes, mode inference, cluster endpoint table, management
endpoints, `omnigraph.yaml` ownership contract, `POST /graphs`
body shape + status codes.
- `docs/user/cli.md` — `omnigraph graphs list/create` section,
deferred-DELETE note.
- `docs/user/policy.md` — server-scoped Cedar actions
(`graph_create`, `graph_list`), per-graph vs server-level policy
composition, example server-level policy.
Workspace test pass: 573 tests green across all crates. Zero
failures. MR-731 spoof regression still pinned and passing across
the entire 10-PR series.
This commit closes MR-668. v0.7.0 is ready for tagging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cc2412dc65
|
Rename repo terminology to graph (#118)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
|
||
|
|
aadfa11ecb
|
schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode drops were CLI-only. This commit closes that feature gap and adds e2e test coverage for drop modes across HTTP + CLI, plus data preservation on additive apply, plus a CLI↔SDK plan-parity assertion. Feature gap closed: - `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool` (default false via `#[serde(default)]`) to `SchemaApplyRequest`. Added `Default` derive so test usages can use `..Default::default()`. - `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }` and threads through to `apply_schema_as`. - `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path used to bail with "--allow-data-loss not yet supported on remote"; now forwards the flag into the JSON payload so the CLI behaves identically against local and remote URIs. - `openapi.json` — regenerated; only diff is the new field on `SchemaApplyRequest`. Tests added (8 new): * `crates/omnigraph-server/tests/server.rs` (+5): - `schema_apply_route_soft_drops_property_via_http` — POST schema removing nullable property, verify catalog reflects the drop AND `snapshot_at_version(pre)` still has `age` in the field list (time-travel reachability is the Soft contract). - `schema_apply_route_soft_drops_node_type_via_http` — POST schema removing `Company` node + cascading `WorksAt` edge. - `schema_apply_route_hard_drops_property_with_allow_data_loss` — POST with `allow_data_loss: true`, verify plan step reports `mode: hard`. - `schema_apply_route_keeps_drops_soft_without_flag` — same schema without flag, verify `mode: soft`. Pins default semantics against accidental Hard promotion. - `schema_apply_route_additive_property_preserves_existing_rows` — load fixture, POST adding nullable property, verify row count preserved (SDK suite covers data preservation on drops + renames; additive AddProperty wasn't pinned). Plus helpers `schema_without_age` and `schema_without_company`. * `crates/omnigraph-cli/tests/cli.rs` (+3): - `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI `omnigraph schema apply --allow-data-loss --schema X.pg --json`, verify plan step has `mode: hard`. - `schema_apply_without_allow_data_loss_keeps_soft_drops` — without flag, verify Soft. - `schema_plan_parity_cli_and_sdk` — same `.pg` source through `Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json` (CLI), assert the steps array is byte-identical post-JSON. HTTP has no `/schema/plan` endpoint; apply-side parity is implicitly covered by the HTTP drop tests + CLI drop tests using identical fixtures. Docs: - `docs/user/schema-language.md` — new "Destructive drops" section documenting Soft vs Hard semantics and that `allow_data_loss` is now honored uniformly across CLI / HTTP / SDK. Verification: every new test passes; full `cargo test --workspace --locked` green; `scripts/check-agents-md.sh` passes. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
f3f2a051ba
|
policy: server 3-state default-deny matrix (MR-723) (#105)
Closes the "tokens but no policy" trap. Pre-MR-723, an operator who configured bearer tokens and forgot to set policy.file got a server that required auth and then permitted every action — the illusion of protection. After MR-723, that configuration is default-deny: only `read` actions succeed; every other action returns HTTP 403. Three startup states, classified deterministically: - **Open** — no tokens, no policy. Requires explicit `--unauthenticated` flag or `OMNIGRAPH_UNAUTHENTICATED=1`; otherwise `serve()` refuses to start. Forces the operator to opt in to "fully open dev mode" so it can't happen accidentally. - **DefaultDeny** — tokens configured, no policy. `authorize_request` rejects every action except `Read` with 403. The warn-log on startup names the misconfiguration explicitly. - **PolicyEnabled** — policy file configured. Cedar evaluates every request, unchanged from pre-MR-723. What landed: - `ServerConfig.allow_unauthenticated: bool` + `--unauthenticated` flag on the `omnigraph-server` bin + `OMNIGRAPH_UNAUTHENTICATED` env var (`load_server_settings` honors both). - New `classify_server_runtime_state(has_tokens, has_policy, allow_unauthenticated) -> Result<ServerRuntimeState>` pure function. `serve()` calls it before opening the engine and bails with a clear error when the operator hits the no-tokens-no-policy-no-flag cell. - `authorize_request` state-2 branch: when `policy_engine()` is None but the bearer-auth middleware delivered an authenticated actor, any action other than `Read` returns 403 with a message that names the misconfiguration. - `AppState::with_policy_engine(self, engine)` builder method so integration tests that need a custom workload (`new_with_workload`) can still install a permit-all policy without a new constructor. - `app_for_loaded_repo_with_auth(token)` and `app_for_loaded_repo_with_auth_tokens(tokens)` test helpers now install a permit-all policy alongside tokens — they previously represented the "tokens but no policy" state that MR-723 makes default-deny, and tests that don't care about policy were inadvertently coupled to the loophole. Tests: - `classify_*` unit tests (3) — every cell of the matrix. - `default_deny_mode_allows_read_for_authenticated_actor` — GET /snapshot succeeds with bearer token + no policy. - `default_deny_mode_rejects_change_with_forbidden` — POST /change rejected with 403 + "default-deny" message. - `default_deny_mode_rejects_schema_apply_with_forbidden` — POST /schema/apply rejected with 403 + "default-deny" message. - New `app_for_repo_with_auth_tokens_only(schema, tokens)` helper builds the State-2 fixture without policy. The pre-MR-723 helpers `app_for_loaded_repo_with_auth*` shift semantics to "tokens + permit-all" so existing tests retain their original intent. docs/user/policy.md: new "Server runtime states (MR-723)" section documents the matrix and the explicit `--unauthenticated` opt-in. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
a275306a15
|
policy: CLI policy injection — local writes go through engine enforce (MR-722) (#104)
Closes the CLI side of the policy chassis fan-out. Before this commit, CLI direct-engine writes bypassed Cedar entirely because the CLI never called `Omnigraph::with_policy(...)` for non-`policy validate|test|explain` subcommands. After this commit, every CLI direct-engine writer (change, load, ingest, branch create/delete/merge, schema apply) opens the engine via a new `open_local_db_with_policy(uri, &config)` helper that installs the configured `PolicyEngine` when `policy.file` is set, and threads the resolved actor through to the `_as` writer methods. Actor identity resolution: - New top-level `--as <ACTOR>` global flag on the CLI overrides config. - New `cli.actor` field in `omnigraph.yaml` provides a default actor. - Precedence: `--as` > `cli.actor` > None. - When policy is configured and neither is set, the engine-layer footgun guard fires and the write is denied — silent bypass via "I forgot the actor" is exactly what the guard prevents. - Remote HTTP writes ignore both — bearer-token-resolved server-side. Helpers added in main.rs: - `open_local_db_with_policy(uri, &config) -> Result<Omnigraph>` — opens the DB and installs the PolicyEngine when configured. Without policy this is identical to a bare `Omnigraph::open`. - `resolve_cli_actor(cli_as, &config) -> Option<&str>` — implements the flag > config > None precedence. Engine: added `load_file_as` to the loader as the actor-aware mirror of `load_file`, so CLI file-path loads flow through the same enforce gate as in-memory `load_as` calls. Test rewrite: `local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced` was the explicit assertion of the pre-chassis hole. Renamed and split: - `local_cli_policy_tooling_is_end_to_end` — sanity for the read-only policy CLI surfaces (validate/test/explain), unchanged behavior. - `local_cli_change_enforces_engine_layer_policy` — the new assertion: policy installed + no actor → footgun-guard denial; `--as act-bruno` on protected main → Cedar denial; `--as act-ragnor` (admins-write rule) on main → permit, write committed. POLICY_E2E_YAML gains an `admins-write` rule so the permit case has a non-trivial actor to exercise. docs/user/policy.md updated with `cli.actor` + `--as <ACTOR>` usage. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
da42beec41
|
policy: chassis fan-out — _as variants on the remaining 6 writers (MR-722) (#103)
PR #102 wired apply_schema_as. This PR completes the chassis-side coverage so every public mutating engine entry point hits the same Omnigraph::enforce(action, scope, actor) gate regardless of transport: - mutate_as → enforce(Change, Branch(branch), actor) - load_as → enforce(Change, Branch(branch), actor) - ingest_as → enforce(Change, Branch(branch), actor); also threads actor through the implicit branch_create_from_as so fresh-branch ingest correctly hits BranchCreate too - branch_create_as → enforce(BranchCreate, TargetBranch(name), actor) - branch_create_from_as → enforce(BranchCreate, BranchTransition { source, target }, actor) - branch_delete_as → enforce(BranchDelete, TargetBranch(name), actor) - branch_merge_as → enforce(BranchMerge, BranchTransition { source, target }, actor) Three new _as variants for branch ops (create, create_from, delete) that had no actor surface before; existing actor-less variants delegate with actor=None so the no-policy path is a strict no-op. HTTP handlers updated to thread the resolved actor into the new _as variants for branch_create and branch_delete (was previously dropped). 14 new SDK chassis tests (one allow + one deny pair per wired writer); the existing 4 apply_schema_as tests stay. All 18 pass. docs/user/policy.md updated to describe engine-wide enforcement and the coarse-vs-fine layer split (engine = action gate, query layer per-row = MR-725 future). AGENTS.md capability matrix updated to match. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
7a86f654d4
|
policy: codify signed-token-claim-only actor identity (MR-731) (#101)
Warm-up commit for the policy chassis epic (MR-722). PR #1 of the chassis series — same role as schema-lint v1's commit #1 baseline. Zero behavioral change; establishes the regression test, the load-bearing doc comment, and the user-doc paragraph for an invariant already true in code. Server auth already resolves `actor_id` from the matched bearer token at `omnigraph-server/src/lib.rs:692-694`, overwriting whatever the handler put in the PolicyRequest. The principle is named in docs/dev/invariants.md Hard Invariant 11 ("clients cannot set actor identity directly"). What was missing: a regression test, a load-bearing doc comment at the resolution site, and a user-facing documentation paragraph. This commit adds all three. Why first. The actor-identity invariant is the foundation every other policy decision stands on. If `actor_id` can be spoofed, every chassis primitive (per-row scope, audit log, two-person rule) becomes ungated. Pinning the invariant first means PR #2 (the chassis core) doesn't have to re-prove this assertion. Changes: * crates/omnigraph-server/tests/server.rs — new regression test actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers with three sub-assertions: - spoof-up: bearer for denied actor + X-Actor-Id naming allowed actor → 403 (header doesn't promote) - spoof-down: bearer for allowed actor + X-Actor-Id naming denied actor → 200 (header doesn't demote) - empty-string spoof: empty X-Actor-Id doesn't clear resolved actor Cross-link to MR-777 (auth boundary cases — actor-id collision + malformed bearer) noted in the test docstring. * crates/omnigraph-server/src/lib.rs — expanded doc comment at the actor-resolution site explaining the SECURITY INVARIANT, citing Hard Invariant 11, the Supabase RLS history footgun, and the regression test that pins the contract. Reader thinking "I should let clients override actor_id for impersonation" hits this comment first. * docs/user/policy.md — new "Actor identity (signed-claim-only)" section near the existing Server enforcement section. Closes the user-facing doc gap MR-731's "Done when" requires. Architectural decisions for PR #2+ pinned this session (not implemented here, recorded so future implementers don't re-litigate): - PolicyEngine moves to new `omnigraph-policy` workspace crate so both engine and server can depend on it (Q2). - `enforce(action, scope, actor)` will take a new `ResourceScope` enum, leaving room for MR-725's per-type and per-row variants (Q3). - `PolicyAction::Admin` is kept and wired (Option A) — meta-action for policy-management surfaces (hot reload, audit log query, approvals list) as those consumer features land (Q4). Test results: - cargo test -p omnigraph-server --test server: 45 pass (44 existing + 1 new); no regressions - scripts/check-agents-md.sh: passes (34 links / 33 docs OK) Out of scope (PR #2+): - Omnigraph::with_policy() + enforce() method - omnigraph-policy crate creation - ResourceScope enum - CLI policy injection into Omnigraph - HTTP-layer redundant-check removal - MR-724 Admin action wiring (PR #2) - MR-723 default-deny 3-state (PR #4) - MR-736 severity warn/deny (PR #5) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
60eee78465
|
docs: split user and developer docs (#93) |