Commit graph

5 commits

Author SHA1 Message Date
Ragnor Comerford
52f28cebe8
mr-668: comment cleanup and policy format style
Strip "PR Na/Nb" sub-PR references throughout MR-668 surfaces — they
were useful during the 10-PR delivery sequence but rot now that the
work is in the tree. Keep the MR-668 umbrella references.

Also:
- Add explicit `when = when` and `resource_literal = resource_literal`
  named args in `compile_policy_source`'s outer `format!` to match the
  surrounding crate style (already explicit for `group` and `action`).
- Rename the best-effort cleanup tracing target from
  "omnigraph::init" to "omnigraph::init::cleanup" so operators can
  filter init-failure cleanup events separately from init's other
  log lines.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:57:04 +02:00
Ragnor Comerford
937fd6382d
mr-668: remove POST /graphs and CLI graphs create (defer runtime graph mgmt)
The POST /graphs runtime-create endpoint shipped in PR 7/10 has three
unresolved high-severity bugs:

  - flock-on-renamed-inode race: the YAML flock is taken on
    omnigraph.yaml itself, then a temp file is renamed over it.
    Cross-process writers end up locking different inodes — both
    believing they hold exclusive access.
  - duplicate-check outside the file lock: precheck runs against
    the in-memory registry only; the locked closure does
    config.graphs.insert(...) unconditionally. Concurrent same-id
    POSTs can persist the loser in YAML while the in-memory registry
    keeps the winner — they disagree after restart.
  - best_effort_cleanup_init_artifacts deletes _schema.pg /
    _schema.ir.json / __schema_state.json on any init failure. An
    accidental re-init against an existing graph's URI destroys its
    schema; subsequent open() fails at read_text(_schema.pg).

The correct fix is a Lance-style cluster catalog (reserve → init →
publish with recovery sidecars), parallel to the engine's existing
__manifest discipline. That work is out of scope for v0.7.0.

For now, disable runtime add/remove from the network and CLI surface.
Operators add graphs by editing omnigraph.yaml and restarting. The
GET /graphs read-only enumeration stays.

Removed:
- POST /graphs handler + router fragment + utoipa registration
- 13 post_graphs_* server tests + 3 composite POST tests +
  multi_mode_app_with_real_config / post_graph helpers
- CLI omnigraph graphs create subcommand + its handler + cli.rs tests
- system_remote.rs combined list+create test trimmed to list-only
- YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError,
  staging_path, hash_config_file, AppState::config_hash field +
  threading through new_multi and open_multi_graph_state
- fs2 dependency (verified absent from cargo tree)
- sha2/fs2 imports in config.rs (only the rewrite path used them)
- Cedar PolicyAction::GraphCreate variant + "graph_create" match arms
  + action def in Cedar schema + graph_create_action_authorizes_against_server_resource test
- GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec /
  GraphPolicySpec API types (only the POST handler / CLI imported them)

Kept:
- GET /graphs (read-only enumeration) and graph_list Cedar action
- omnigraph graphs list CLI subcommand
- All multi-graph startup, mode inference, cluster routes,
  per-graph + server-level Cedar policies
- server_settings_drive_multi_graph_startup_end_to_end (the test
  that covers operator-authored YAML + restart — the path that
  survives)
- best_effort_cleanup_init_artifacts and the three init failpoints
  (still reachable from CLI `omnigraph init`; preflight fix deferred
  as a follow-up)
- GraphRegistry::insert and its concurrency tests — production
  callers gone, but the method is the natural seam for the future
  cluster-catalog work

Also fixed (transcript issue 4):
- ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI
  advertises the management route correctly (was previously rewritten
  to /graphs/{graph_id}/graphs)
- multi_mode_openapi_keeps_healthz_flat → renamed to
  multi_mode_openapi_keeps_management_paths_flat, asserts both
  /healthz and /graphs stay flat
- multi_mode_openapi_prefixes_operation_ids_with_cluster skips
  /graphs in addition to /healthz

Doc fixes:
- docs/user/cli.md: graphs list example was --target http://...,
  but --target is a config-graph-name lookup; corrected to --uri.
  Removed the graphs create example.
- docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml
  ownership", and "POST /graphs body shape" sections. Added a
  paragraph stating runtime add/remove is not exposed in v0.7.0.
- docs/user/policy.md: dropped graph_create action; reworded the
  "Configuration" line to clarify that server-scoped rules (graph_list)
  take neither branch_scope nor target_branch_scope.
- docs/releases/v0.7.0.md: rewrote release narrative — multi-graph
  mode ships; runtime add/remove deferred.
- AGENTS.md: HTTP server bullet and capability matrix row updated to
  reflect read-only GET /graphs and the operator-edit workflow.
- openapi.json regenerated; /graphs has only .get, no .post.

Diff: 17 files, +123 −1525 LOC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 17:49:38 +02:00
Ragnor Comerford
0e5aa036f4
mr-668: Cedar resource-model refactor (PR 6a/10)
PR 6a of the MR-668 multi-graph server work. Policy-crate-only refactor —
no HTTP handler changes, no operator-supplied policy.yaml changes. Sets
up the chassis that PR 6b's `GET /graphs` consumes.

Two new `PolicyAction` variants:
  - `GraphCreate` — gates `POST /graphs` (deferred behavioral PR).
  - `GraphList`   — gates `GET /graphs` (lands in PR 6b).

Note: `GraphDelete` is intentionally NOT added in this PR. `DELETE
/graphs/{id}` is deferred from MR-668's v0.7.0 scope to bound complexity
(no `delete_prefix`, no tombstone, no `RegistryLookup::Tombstoned`).
Adding the Cedar action without a consumer would be the same kind of
"dead vocabulary" trap the `Admin` variant already documents.

New `PolicyResourceKind { Graph, Server }` enum, plus a
`PolicyAction::resource_kind()` method that classifies every action.
Per-graph actions (Read, Change, BranchCreate, …) bind to
`Omnigraph::Graph::"<graph_label>"`; server-scoped actions
(GraphCreate, GraphList) bind to the singleton
`Omnigraph::Server::"root"`. `Admin` stays classified as per-graph for
now — MR-724 will pick the final shape when the first consumer surface
ships.

Cedar schema string additions:
  - `entity Server;`
  - `action "graph_create" appliesTo { principal: Actor, resource: Server, ... }`
  - `action "graph_list"   appliesTo { principal: Actor, resource: Server, ... }`

Compiler updates:
  - `compile_policy_source` picks the resource literal based on the
    action's `resource_kind`. Existing graph-only policies generate
    the same Cedar source as before — pinned by
    `per_graph_rules_continue_to_work_alongside_server_rules`.
  - `compile_entities` includes the `Server::"root"` entity only when
    a rule references a server-scoped action. Keeps test assertions
    for graph-only policies tight.
  - `PolicyEngine::authorize` builds the right resource UID at
    request time based on `request.action.resource_kind()`.

Validation rules added to `PolicyConfig::validate`:
  - A rule may not mix server-scoped and per-graph actions (different
    resource kinds need different `permit` clauses).
  - Server-scoped actions cannot have `branch_scope` or
    `target_branch_scope` — there's no branch context at the server
    level.

Operator impact: zero. The Cedar schema `Omnigraph::Server` entity is
internally referenced by `compile_policy_source`; operator policy.yaml
files only declare actions in `rules[].allow.actions` and never
reference the resource entity directly. Decision 6's "internal rename
only; operator policies unaffected" contract is preserved and pinned
by `per_graph_rules_continue_to_work_alongside_server_rules`.

Tests: 5 new (11 policy tests total, up from 6):
  - `graph_list_action_authorizes_against_server_resource`
  - `graph_create_action_authorizes_against_server_resource`
  - `server_scoped_rule_cannot_use_branch_scope`
  - `rule_mixing_server_and_per_graph_actions_is_rejected`
  - `per_graph_rules_continue_to_work_alongside_server_rules`

No regression: 145 server tests (74 lib + 71 integration) still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:20:35 +02:00
Ragnor Comerford
cc2412dc65
Rename repo terminology to graph (#118)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
2026-05-24 16:46:00 +01:00
Andrew Altshuler
9973683261
policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102)
PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101).
The structural fix that moves Cedar enforcement from HTTP-only to
engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans
the enforce() call out to the remaining six (mutate_as, load,
ingest_as, branch_create_from, branch_delete, branch_merge).

## What lands

### New crate: omnigraph-policy

The 844-line policy.rs moves from `omnigraph-server` into a new
`omnigraph-policy` workspace crate so both engine and server can
depend on it. Cedar dependency moves with it. The server's policy.rs
becomes a re-export shim (`pub use omnigraph_policy::*`) so existing
`omnigraph_server::PolicyAction` etc. paths keep working — CLI and
test consumers don't have to migrate in one go.

### New trait: PolicyChecker

```rust
pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope,
             actor: &str) -> Result<(), PolicyError>;
}
```

`PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()`
takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without
spinning up Cedar. MR-725 will extend the trait with `predicate_for()`
for query-layer pushdown — additive, no call-site changes.

### New enum: ResourceScope

Four variants — Graph, Branch, TargetBranch, BranchTransition —
mapping cleanly to today's `(branch, target_branch)` shape on
PolicyRequest via `to_branch_pair()`. Each engine writer picks the
variant that matches the existing HTTP-layer convention so engine
and HTTP evaluate the same Cedar decision.

**Invariant**: ResourceScope stays at branch granularity. Per-type
and per-row scope are MR-725's territory, not engine-layer's.
Adding Type/Row variants here creates two places per-type policy
can be evaluated, which can drift. See chassis design refinements
comment on MR-722 (2026-05-17).

### Omnigraph::with_policy() + enforce()

* New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph,
  None by default (preserves embedded/dev no-enforcement mode).
* `with_policy(self, checker)` setter — builder-style, consumes self.
* `enforce(action, scope, actor)` — the gate. When policy is None,
  no-op. When policy is Some AND actor is None, hard error — silent
  bypass via "I forgot the actor" is exactly the footgun this gate
  is here to prevent.

### apply_schema_as: first writer wired

* New public method `apply_schema_as(source, options, actor)` that
  calls `enforce(SchemaApply, TargetBranch("main"), actor)` before
  acquiring the schema-apply lock or doing any other work.
* Existing `apply_schema(source)` and `apply_schema_with_options(...)`
  delegate to it with actor=None (no-actor variants).
* HTTP handler `server_schema_apply` updated to call apply_schema_as
  with the resolved actor. AppState construction injects the
  PolicyEngine into Omnigraph via `with_policy`. HTTP-layer
  authorize_request still fires first; the engine gate is the
  redundant-but-correct backstop and the only path that protects SDK
  / embedded callers. PR #3 removes the HTTP redundancy.

### OmniError::Policy

New error variant for engine-layer policy denial / evaluation
failure. ApiError::from_omni maps it to 403.

### MR-724 Admin action — Option A reservation

PolicyAction::Admin kept in the enum with a load-bearing doc
comment naming its future consumers (hot reload, audit log query,
approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...)
call site exists yet — the variant is reserved so the action
vocabulary is complete from chassis day one. MR-724 closes when
the first consumer surface ships.

### New SDK-side integration test

`crates/omnigraph/tests/policy_engine_chassis.rs` — four tests
covering:
* Policy denies for unauthorized actor → OmniError::Policy
* Policy permits for authorized actor → apply succeeds
* Policy installed + no actor → hard error (forget-the-actor footgun)
* No policy → no-op (embedded/dev default still works)

These exercise the engine path directly — no HTTP layer involved.

## Test results

- cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed
  * 45 server tests (existing) pass
  * 14 schema_apply tests (existing) pass
  * 4 new chassis tests pass
  * 60 OpenAPI tests pass (no HTTP API surface changes)
  * No regressions across the workspace

## Architectural decisions baked in

Per MR-722 chassis design refinements comment (2026-05-17):

1. PolicyChecker is a trait, not just a concrete. Engine and server
   consume the trait. MR-725 adds predicate_for() additively.
2. ResourceScope stays at branch granularity. No Type/Row variants.
3. Coarse-vs-fine framing pinned: engine-layer is action gate;
   query-layer (MR-725) is predicate gate. Both backed by same Cedar
   engine; non-overlapping responsibilities.
4. Admin action reserved for policy-management surfaces (MR-724
   Option A).

## Pending follow-ups (PR #3+)

- Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from,
  branch_delete, branch_merge (PR #3).
- Remove HTTP-layer authorize_request redundancy once engine gate
  covers all writers (PR #3).
- CLI policy injection into Omnigraph for non-`policy validate|test|explain`
  subcommands (PR #3 or follow-up).
- MR-723 default-deny 3-state matrix (PR #4).
- MR-736 severity warn/deny (PR #5).
- AGENTS.md scope-of-enforcement rewrite once chassis fully lands.
- Coarse-vs-fine framing in docs/user/policy.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:36:36 +03:00