Commit graph

7 commits

Author SHA1 Message Date
Andrew Altshuler
9973683261
policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102)
PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101).
The structural fix that moves Cedar enforcement from HTTP-only to
engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans
the enforce() call out to the remaining six (mutate_as, load,
ingest_as, branch_create_from, branch_delete, branch_merge).

## What lands

### New crate: omnigraph-policy

The 844-line policy.rs moves from `omnigraph-server` into a new
`omnigraph-policy` workspace crate so both engine and server can
depend on it. Cedar dependency moves with it. The server's policy.rs
becomes a re-export shim (`pub use omnigraph_policy::*`) so existing
`omnigraph_server::PolicyAction` etc. paths keep working — CLI and
test consumers don't have to migrate in one go.

### New trait: PolicyChecker

```rust
pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope,
             actor: &str) -> Result<(), PolicyError>;
}
```

`PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()`
takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without
spinning up Cedar. MR-725 will extend the trait with `predicate_for()`
for query-layer pushdown — additive, no call-site changes.

### New enum: ResourceScope

Four variants — Graph, Branch, TargetBranch, BranchTransition —
mapping cleanly to today's `(branch, target_branch)` shape on
PolicyRequest via `to_branch_pair()`. Each engine writer picks the
variant that matches the existing HTTP-layer convention so engine
and HTTP evaluate the same Cedar decision.

**Invariant**: ResourceScope stays at branch granularity. Per-type
and per-row scope are MR-725's territory, not engine-layer's.
Adding Type/Row variants here creates two places per-type policy
can be evaluated, which can drift. See chassis design refinements
comment on MR-722 (2026-05-17).
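
As a rough illustration of the branch-granular shape described above, the four variants could collapse to a `(branch, target_branch)` pair along these lines. The variant payloads and the exact mapping are assumptions made for this sketch; the commit only names the variants and `to_branch_pair()`.

```rust
/// Illustrative stand-in for the branch-granular scope.
/// Payload types and field names are assumptions, not the crate's definitions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResourceScope {
    Graph,
    Branch(String),
    TargetBranch(String),
    BranchTransition { from: String, to: String },
}

impl ResourceScope {
    /// Collapse the scope into a `(branch, target_branch)` pair.
    /// The mapping chosen here is hypothetical.
    pub fn to_branch_pair(&self) -> (Option<&str>, Option<&str>) {
        match self {
            ResourceScope::Graph => (None, None),
            ResourceScope::Branch(b) => (Some(b.as_str()), None),
            ResourceScope::TargetBranch(t) => (None, Some(t.as_str())),
            ResourceScope::BranchTransition { from, to } => {
                (Some(from.as_str()), Some(to.as_str()))
            }
        }
    }
}
```

Note that no Type or Row variant appears, which is the invariant stated above: finer-grained scope would belong to the query layer.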

### Omnigraph::with_policy() + enforce()

* New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph,
  None by default (preserves embedded/dev no-enforcement mode).
* `with_policy(self, checker)` setter — builder-style, consumes self.
* `enforce(action, scope, actor)` — the gate. When policy is None,
  no-op. When policy is Some AND actor is None, hard error — silent
  bypass via "I forgot the actor" is exactly the footgun this gate
  is here to prevent.
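
The three cases of the gate can be sketched as follows. This is a minimal model, not the engine's code: `Engine` stands in for `Omnigraph`, `AllowOnly` is a hypothetical test double, and `PolicyError::MissingActor` is an invented name for the hard error (the commit surfaces denials as `OmniError::Policy`).

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyAction {
    SchemaApply,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResourceScope {
    TargetBranch(String),
}

// Hypothetical error shape for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    Denied(String),
    MissingActor,
}

pub trait PolicyChecker: Send + Sync {
    fn check(&self, action: PolicyAction, scope: &ResourceScope,
             actor: &str) -> Result<(), PolicyError>;
}

/// Test double that permits exactly one actor, with no Cedar involved.
pub struct AllowOnly(pub &'static str);

impl PolicyChecker for AllowOnly {
    fn check(&self, _action: PolicyAction, _scope: &ResourceScope,
             actor: &str) -> Result<(), PolicyError> {
        if actor == self.0 {
            Ok(())
        } else {
            Err(PolicyError::Denied(actor.to_string()))
        }
    }
}

/// Stand-in for `Omnigraph`; policy is `None` by default.
#[derive(Default)]
pub struct Engine {
    policy: Option<Arc<dyn PolicyChecker>>,
}

impl Engine {
    /// Builder-style setter that consumes `self`.
    pub fn with_policy(mut self, checker: Arc<dyn PolicyChecker>) -> Self {
        self.policy = Some(checker);
        self
    }

    pub fn enforce(&self, action: PolicyAction, scope: &ResourceScope,
                   actor: Option<&str>) -> Result<(), PolicyError> {
        match (&self.policy, actor) {
            // No policy installed: embedded/dev mode, nothing to check.
            (None, _) => Ok(()),
            // Policy installed but no actor: refuse instead of bypassing.
            (Some(_), None) => Err(PolicyError::MissingActor),
            // Policy installed and actor known: defer to the checker.
            (Some(checker), Some(actor)) => checker.check(action, scope, actor),
        }
    }
}
```

Matching on the `(policy, actor)` pair keeps the forgotten-actor case explicit, so it cannot fall through to the no-op branch.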

### apply_schema_as: first writer wired

* New public method `apply_schema_as(source, options, actor)` that
  calls `enforce(SchemaApply, TargetBranch("main"), actor)` before
  acquiring the schema-apply lock or doing any other work.
* Existing `apply_schema(source)` and `apply_schema_with_options(...)`
  delegate to it with actor=None (no-actor variants).
* HTTP handler `server_schema_apply` updated to call apply_schema_as
  with the resolved actor. AppState construction injects the
  PolicyEngine into Omnigraph via `with_policy`. HTTP-layer
  authorize_request still fires first; the engine gate is the
  redundant-but-correct backstop and the only path that protects SDK
  / embedded callers. PR #3 removes the HTTP redundancy.

### OmniError::Policy

New error variant for engine-layer policy denial / evaluation
failure. ApiError::from_omni maps it to 403.

### MR-724 Admin action — Option A reservation

PolicyAction::Admin kept in the enum with a load-bearing doc
comment naming its future consumers (hot reload, audit log query,
approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...)
call site exists yet — the variant is reserved so the action
vocabulary is complete from chassis day one. MR-724 closes when
the first consumer surface ships.

### New SDK-side integration test

`crates/omnigraph/tests/policy_engine_chassis.rs` — four tests
covering:
* Policy denies for unauthorized actor → OmniError::Policy
* Policy permits for authorized actor → apply succeeds
* Policy installed + no actor → hard error (forget-the-actor footgun)
* No policy → no-op (embedded/dev default still works)

These exercise the engine path directly — no HTTP layer involved.

## Test results

- cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed
  * 45 server tests (existing) pass
  * 14 schema_apply tests (existing) pass
  * 4 new chassis tests pass
  * 60 OpenAPI tests pass (no HTTP API surface changes)
  * No regressions across the workspace

## Architectural decisions baked in

Per MR-722 chassis design refinements comment (2026-05-17):

1. PolicyChecker is a trait, not just a concrete type. Engine and server
   consume the trait. MR-725 adds predicate_for() additively.
2. ResourceScope stays at branch granularity. No Type/Row variants.
3. Coarse-vs-fine framing pinned: engine-layer is action gate;
   query-layer (MR-725) is predicate gate. Both backed by same Cedar
   engine; non-overlapping responsibilities.
4. Admin action reserved for policy-management surfaces (MR-724
   Option A).

## Pending follow-ups (PR #3+)

- Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from,
  branch_delete, branch_merge (PR #3).
- Remove HTTP-layer authorize_request redundancy once engine gate
  covers all writers (PR #3).
- CLI policy injection into Omnigraph for non-`policy validate|test|explain`
  subcommands (PR #3 or follow-up).
- MR-723 default-deny 3-state matrix (PR #4).
- MR-736 severity warn/deny (PR #5).
- AGENTS.md scope-of-enforcement rewrite once chassis fully lands.
- Coarse-vs-fine framing in docs/user/policy.md.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:36:36 +03:00
Ragnor Comerford
cd780e2d37
deps: add arc-swap to workspace for PR 2 catalog/schema_source wrapping
PR 2 wraps the Omnigraph engine's catalog and schema_source fields in
ArcSwap so reads stay zero-cost while apply_schema can swap atomically
without &mut self. arc-swap lands as an unused workspace dep here so the
follow-up commits that wrap fields can land in isolation.
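
To make the swap-without-`&mut self` idea concrete, here is a std-only sketch that uses `RwLock<Arc<T>>` as a stand-in. It is not arc-swap: reads here take a brief read lock, which is exactly the cost ArcSwap is meant to avoid. `Engine` and `Catalog` are hypothetical placeholder types.

```rust
use std::sync::{Arc, RwLock};

/// Placeholder for the real catalog; the field is illustrative.
pub struct Catalog {
    pub version: u32,
}

/// Stand-in for the engine. RwLock<Arc<_>> substitutes for ArcSwap here.
pub struct Engine {
    catalog: RwLock<Arc<Catalog>>,
}

impl Engine {
    pub fn new(initial: Catalog) -> Self {
        Self { catalog: RwLock::new(Arc::new(initial)) }
    }

    /// Readers get a cheap snapshot that stays valid after later swaps.
    pub fn catalog(&self) -> Arc<Catalog> {
        let guard = self.catalog.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Takes `&self`, not `&mut self`: the whole catalog is replaced at once.
    pub fn apply_schema(&self, next: Catalog) {
        *self.catalog.write().unwrap() = Arc::new(next);
    }
}
```

A reader holding an older snapshot keeps seeing the old catalog until it asks for a new one, which is the property the field wrapping relies on.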

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:25:22 +02:00
Ragnor Comerford
cdfbccbfdc
MR-794 step 2: scaffold MutationStaging accumulator + scan_with_pending
Add the scaffolding for the in-memory staged-write rewire — no behavior
change yet:

* New crates/omnigraph/src/exec/staging.rs with MutationStaging,
  PendingTable, PendingMode, StagedTablePath, plus the end-of-query
  finalize() that issues one stage_* + commit_staged per pending
  table (Merge mode dedupes by id, last-write-wins).
* TableStore::scan_with_pending and count_rows_with_pending helpers —
  Lance scan committed + DataFusion MemTable scan pending, concat.
  Sidesteps the Scanner::with_fragments filter-pushdown limitation
  documented on scan_with_staged.
* Add datafusion = "52" to workspace + omnigraph-engine deps for
  MemTable (transitively pulled by Lance already).
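
The Merge-mode rule (dedupe by id, last write wins) can be illustrated on plain rows. `Row` and `dedupe_last_write_wins` are hypothetical names, and keeping ids in first-seen order is an assumption of this sketch; the real accumulator works on Arrow batches.

```rust
use std::collections::HashMap;

/// Simplified pending row; the real staging holds record batches.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub id: String,
    pub value: i64,
}

/// Keep only the last pending row per id.
/// Ids stay in first-seen order, which is an assumption of this sketch.
pub fn dedupe_last_write_wins(pending: Vec<Row>) -> Vec<Row> {
    let mut slot_by_id: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Row> = Vec::new();
    for row in pending {
        let existing = slot_by_id.get(&row.id).copied();
        match existing {
            // A later write to the same id replaces the earlier one.
            Some(slot) => out[slot] = row,
            None => {
                slot_by_id.insert(row.id.clone(), out.len());
                out.push(row);
            }
        }
    }
    out
}
```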

Engine code still uses the legacy MutationStaging shape; the rewire
lands in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:42:21 +02:00
Andrew Altshuler
74eb5a5380
Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46)
* Parallel per-type load writes + omnigraph optimize/cleanup CLI

## MR-677.3 — parallel per-type load writes

The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).
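
One plausible way to read the tunable is sketched below. Only the variable name and the default of 8 come from this commit; the helper names and the fallback on invalid or zero values are assumptions.

```rust
use std::env;

pub const DEFAULT_LOAD_CONCURRENCY: usize = 8;

/// Parse a raw setting. Missing, unparsable, or zero values fall back to
/// the default (this fallback behaviour is an assumption of the sketch).
pub fn parse_concurrency(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Read `OMNIGRAPH_LOAD_CONCURRENCY` from the process environment.
pub fn load_concurrency() -> usize {
    let raw = env::var("OMNIGRAPH_LOAD_CONCURRENCY").ok();
    parse_concurrency(raw.as_deref(), DEFAULT_LOAD_CONCURRENCY)
}
```

The resulting number would be the `N` handed to the buffered stream, so it bounds how many per-type commits are in flight at once.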

## MR-676 — `omnigraph optimize` and `omnigraph cleanup`

New CLI subcommands that walk every node + edge table in the repo:

- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
  table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
  runs Lance `cleanup_old_versions` to prune historical manifests +
  unique fragments. Requires `--confirm` because it's destructive.
  Supports both count-based and time-based retention (or both AND'd
  together). Time uses chrono `DateTime<Utc>` (added as a workspace
  dep, default-features off).
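
How the two retention rules might combine can be sketched as a single predicate. This is one interpretation of "AND'd together", not the Lance `cleanup_old_versions` logic; `Retention` and `is_prunable` are hypothetical, and `std::time::Duration` replaces the chrono types to keep the example std-only.

```rust
use std::time::Duration;

/// Hypothetical retention settings mirroring `--keep N` and `--older-than`.
pub struct Retention {
    pub keep: Option<usize>,
    pub older_than: Option<Duration>,
}

/// Decide whether a version may be pruned.
/// `rank_from_newest` is 0 for the latest version, 1 for the one before, etc.
pub fn is_prunable(r: &Retention, rank_from_newest: usize, age: Duration) -> bool {
    // With no rule given, prune nothing.
    if r.keep.is_none() && r.older_than.is_none() {
        return false;
    }
    // An absent rule does not block pruning; a present rule must be met.
    let beyond_keep = r.keep.map_or(true, |k| rank_from_newest >= k);
    let old_enough = r.older_than.map_or(true, |d| age > d);
    beyond_keep && old_enough
}
```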

Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went from 7m15s sequential
to 1m28s parallel, and `cleanup --keep 1` removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.

Public API on `Omnigraph`:

  pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
  pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
      -> Result<Vec<TableCleanupStats>>

All 10 existing loader tests still pass.

Closes MR-676.
Partially addresses MR-677: this is the .3 piece (parallel by type).
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly, and .2 was already in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-25 14:22:14 +03:00
andrew
c338e80180
Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id
Fixes two live authz bugs in omnigraph-server:

- Bearer-token lookup previously used HashMap::get, which compares keys with
  Eq and short-circuits on the first differing byte — a network-observable
  timing oracle for brute-forcing tokens. Tokens are now stored as SHA-256
  digests and compared with subtle::ConstantTimeEq, iterating every entry
  unconditionally so total work is independent of which slot matches. Raw
  token bytes no longer live in server memory after startup.

- authorize_request now overwrites PolicyRequest.actor_id from the
  authenticated session instead of trusting the handler-supplied field,
  which previously defaulted to "" via unwrap_or_default(). The empty
  string can no longer reach Cedar as a policy subject even if a future
  refactor drops the None check.
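
The full-scan lookup from the first fix can be sketched without external crates. The hand-rolled `ct_eq` below only illustrates the idea and stands in for `subtle::ConstantTimeEq`, which is what the commit uses. Digests are assumed to be precomputed SHA-256 values, and `find_actor` is a hypothetical name.

```rust
/// Compare two 32-byte digests without exiting early on a mismatch.
/// Illustrative only: the commit relies on subtle::ConstantTimeEq, which
/// also guards against compiler optimizations this loop does not.
pub fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Scan every stored digest, never stopping at the first hit, so the
/// number of comparisons does not depend on which slot matches.
pub fn find_actor<'a>(
    entries: &'a [([u8; 32], String)],
    presented: &[u8; 32],
) -> Option<&'a str> {
    let mut found: Option<&'a str> = None;
    for (digest, actor) in entries {
        if ct_eq(digest, presented) {
            found = Some(actor.as_str());
        }
    }
    found
}
```

For the second fix, the same lookup result would be the only source of `actor_id` handed to the policy layer, so a handler-supplied value never reaches Cedar.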

External API of AppState constructors is unchanged — tokens still enter as
Vec<(String, String)> and are hashed on the way in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 01:41:02 +03:00
Claude
859ec9faa8
Add OpenAPI spec generation via utoipa with /openapi.json endpoint
Integrate utoipa 5 to auto-generate an OpenAPI 3.1 spec from the existing
Axum handlers and serde types. All 16 endpoints are annotated with path
metadata, request/response schemas, security requirements, and tags. A
public /openapi.json endpoint serves the spec without requiring auth.

Includes 59 tests covering path completeness, HTTP methods, schema fields,
enum variants, security scheme, path/query parameters, request bodies,
response references, and endpoint integration.

https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY
2026-04-12 11:03:23 +00:00
andrew
338289656a
Initial public Omnigraph repository
2026-04-10 20:49:41 +03:00