mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-24 02:38:06 +02:00
Merge branch 'main' into ragnorc/omnigraph-mcp-crate
Folds in v0.7.1 (release #290 + optimize/write-path/internal-table-compaction fixes #288/#291/#297) under the MCP branch.

Conflict resolutions (5 files):
- crates/omnigraph-server/Cargo.toml: take main's 0.7.1 path-dep constraints; keep our omnigraph-mcp dep (bumped to 0.7.1) + http dep.
- crates/omnigraph-server/src/handlers.rs: keep our server_list_queries doc-comment (exposed @mcp(expose) subset, invoke_query-gated); it supersedes main's pre-@mcp(expose) text, since this branch adds the per-query expose flag.
- docs/user/operations/server.md: keep our GET /queries description (invoke_query gate + @mcp(expose) exposure) over main's read-gated/list-all text.
- docs/dev/index.md: keep both in-flight RFC rows; renumber this branch's tenancy RFC 013 -> 014 (rfc-014-tenancy-cells.md) since main now owns RFC-013 (rfc-013-write-path-latency.md). Title + index link updated; link-check green.
- openapi.json: regenerated from merged source (OMNIGRAPH_UPDATE_OPENAPI=1); now info.version 0.7.1 with our invoke_query/@mcp schema.

Coherence: omnigraph-mcp bumped 0.7.0 -> 0.7.1 to match the workspace; Cargo.lock updated. cargo build --workspace green; server/mcp/api-types/compiler suites green (schema_routes.rs reopen-after-apply flakes under parallel IO on a near-full disk and passes single-threaded; it is a pre-existing main test, unchanged by the merge).
commit
adc36adf32
44 changed files with 3595 additions and 528 deletions
|
|
@ -28,7 +28,7 @@ Top-level command families and subcommands. Graph-targeting commands accept a po
|
|||
| `policy validate \| test \| explain` | Cedar tooling against a cluster's applied policies (`--cluster <dir>`; `--graph <id>` picks a graph's bundle when several apply). `test` takes `--tests <file>`; `explain` takes `--actor`/`--action`/`--branch`/`--target-branch` |
|
||||
| `queries list \| validate` | inspect a cluster's applied stored-query registry (`--cluster <dir\|uri>`; `--graph <id>` to scope one graph). `list` prints each query's kind (read/mutation), name, typed params, and `[mcp: …]` exposure; a query's `@description`/`@instruction` are shown as indented `description:` / `instruction:` lines when declared (omitted otherwise). `--json` emits `{name, mcp_expose, tool_name, mutation, params}` plus `description`/`instruction` **only when present** — matching the HTTP `GET /queries` catalog ([server.md](../operations/server.md)). `validate` type-checks the registry and exits non-zero on a broken query |
|
||||
| `profile list \| show [<name>]` | read-only inspection of `~/.omnigraph/config.yaml` profiles. `list` shows each profile's binding (server/cluster/store) + default graph and marks the `$OMNIGRAPH_PROFILE`-active one; JSON keeps `binding` and adds `scope_kind`, `target`, `valid`, and `error`; `show` resolves one profile's scope (endpoint + default graph), defaulting to the active profile, else the flat operator defaults |
|
||||
| `version` / `-v` | print `omnigraph 0.3.x` |
|
||||
| `version` / `-v` | print `omnigraph 0.7.x` |
|
||||
|
||||
## Command capabilities
|
||||
|
||||
|
|
@ -189,22 +189,26 @@ omnigraph cluster import --config company-brain --json
|
|||
omnigraph cluster force-unlock <LOCK_ID> --config company-brain --json
|
||||
```
|
||||
|
||||
`--config` is a directory containing `cluster.yaml`; it defaults to `.`.
|
||||
Stage 3A accepts graphs, schemas, stored queries, and policy bundle file
|
||||
`--config` is a directory containing `cluster.yaml`; it defaults to `.`. The
|
||||
config declares graphs, schemas, stored queries, and policy bundle file
|
||||
references. `cluster plan` reads local JSON state from
|
||||
`<config-dir>/__cluster/state.json`; a missing file means empty state. Plan,
|
||||
apply, refresh, and import acquire `__cluster/lock.json` by default and release
|
||||
it before returning. `cluster apply` executes only stored-query/policy catalog
|
||||
writes (content-addressed under `__cluster/resources/`) and requires an
|
||||
existing `state.json`; graph/schema changes are deferred with warnings, and
|
||||
applied resources do not serve traffic until an `omnigraph-server --cluster
|
||||
<dir>` restart picks them up. `cluster status` reads state only and reports any existing
|
||||
lock metadata. `force-unlock` removes a lock only when the supplied id exactly
|
||||
matches the lock file. `refresh` requires an existing `state.json`; `import`
|
||||
creates one only when it is missing. Both observe declared graphs read-only at
|
||||
`<config-dir>/graphs/<graph-id>.omni`. External state backends, graph/schema
|
||||
apply, automatic stale-lock breaking, `plan --refresh`, pipelines, UI specs,
|
||||
embeddings, aliases, and bindings are reserved for later stages. See
|
||||
it before returning. `cluster apply` converges the cluster to its config in one
|
||||
ordered run: it creates declared graphs, applies schema updates (soft drops
|
||||
only — see [schema](../schema/index.md)), writes stored-query/policy catalog
|
||||
resources (content-addressed under `__cluster/resources/`), and executes
|
||||
approved graph deletes; it requires an existing `state.json` (run `import`
|
||||
first). Applied state does not serve traffic until an `omnigraph-server
|
||||
--cluster <dir>` restart picks up the new revision. Standalone schema deletes
|
||||
remain unsupported and are reported as `deferred` with a warning. `cluster
|
||||
status` reads state only and reports any existing lock metadata. `force-unlock`
|
||||
removes a lock only when the supplied id exactly matches the lock file.
|
||||
`refresh` requires an existing `state.json`; `import` creates one only when it
|
||||
is missing. Both observe declared graphs read-only at
|
||||
`<config-dir>/graphs/<graph-id>.omni`. External state backends, automatic
|
||||
stale-lock breaking, `plan --refresh`, pipelines, UI specs, embeddings,
|
||||
aliases, and bindings are not yet supported. See
|
||||
[cluster-config.md](../clusters/config.md).
|
||||
|
||||
## Output formats (`query` command, alias: `read`)
|
||||
|
|
@ -221,9 +225,12 @@ Precedence (high to low): explicit `--params` / `--params-file`, alias positiona
|
|||
|
||||
## Bearer token resolution (CLI)
|
||||
|
||||
1. `graphs.<name>.bearer_token_env`
|
||||
2. `OMNIGRAPH_BEARER_TOKEN` global env
|
||||
3. `auth.env_file` referenced `.env`
|
||||
See **Credentials keyed by server name** above: a remote command resolves its
|
||||
token via `OMNIGRAPH_TOKEN_<NAME>` env → the `[<name>]` section in
|
||||
`~/.omnigraph/credentials` → the default `OMNIGRAPH_BEARER_TOKEN` env, and a
|
||||
keyed token is only ever sent to the server it is keyed to. Plaintext tokens are
|
||||
never stored in operator config; the removed `omnigraph.yaml` keys
|
||||
(`graphs.<name>.bearer_token_env`, `auth.env_file`) no longer exist.
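The lookup order described above can be sketched as follows. This is an illustration only, not the CLI's implementation: the way `<NAME>` is derived from the server name (upper-cased, non-alphanumerics mapped to `_`), the INI layout of the credentials file, and the `token` key inside the `[<name>]` section are all assumptions made for the example.

```python
import configparser
from typing import Mapping, Optional


def env_key_for(server_name: str) -> str:
    # ASSUMPTION: <NAME> is the server name upper-cased, with every
    # non-alphanumeric character mapped to "_". The real rule may differ.
    suffix = "".join(c if c.isalnum() else "_" for c in server_name).upper()
    return f"OMNIGRAPH_TOKEN_{suffix}"


def resolve_token(
    server_name: str,
    env: Mapping[str, str],
    credentials_text: str = "",
) -> Optional[str]:
    """Return the bearer token for one server, or None if nothing matches."""
    # 1. Keyed environment variable for this server only.
    keyed = env.get(env_key_for(server_name))
    if keyed:
        return keyed

    # 2. The [<name>] section of the credentials file.
    # ASSUMPTION: INI syntax with a hypothetical "token" key.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if parser.has_section(server_name):
        token = parser.get(server_name, "token", fallback=None)
        if token:
            return token

    # 3. The default, un-keyed environment variable.
    return env.get("OMNIGRAPH_BEARER_TOKEN") or None
```

Because steps 1 and 2 only ever look up the entry for `server_name`, a token keyed to one server is never returned for another, which mirrors the "only ever sent to the server it is keyed to" rule.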
|
||||
|
||||
## Duration parsing (cleanup)
|
||||
|
||||
|
|
|
|||
|
|
@ -212,7 +212,7 @@ resource is planned as a create. If present, the file must use this shape:
|
|||
```
|
||||
|
||||
`state_revision`, `resource_statuses`, `approval_records`, `recovery_records`,
|
||||
and `observations` are optional so older Stage 1 state fixtures keep working.
|
||||
and `observations` are optional so earlier state fixtures keep working.
|
||||
Missing `state_revision` is treated as `0`. Resource status values are
|
||||
`pending`, `planned`, `applying`, `applied`, `drifted`, `blocked`, or `error`.
|
||||
|
||||
|
|
@ -238,9 +238,10 @@ profile in the ledger; pre-profile ledgers are backfilled by an Update with
|
|||
catalog changes and count toward convergence.
|
||||
|
||||
Each plan change carries a `disposition` field — an honest preview of what
|
||||
`cluster apply` will do with it in this stage: `applied` (executes), `derived`
|
||||
(a `graph.<id>` composite-digest update that converges automatically once its
|
||||
query digests land), `deferred` (graph/schema change, later phase), or
|
||||
`cluster apply` will do with it: `applied` (executes — graph creates, schema
|
||||
updates, catalog writes, approved deletes), `derived` (a `graph.<id>`
|
||||
composite-digest update that converges automatically once its query digests
|
||||
land), `deferred` (an unsupported change, e.g. a standalone schema delete), or
|
||||
`blocked` (query/policy gated by an unapplied or missing dependency, with the
|
||||
condition in `reason`).
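As a rough illustration of how a script might consume these dispositions, the sketch below groups plan changes and lists the ones that will not execute. The `disposition` and `reason` fields come from the text above; the `resource` key and the overall shape of a plan change are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

DISPOSITIONS = ("applied", "derived", "deferred", "blocked")

Change = Dict[str, Any]


def group_by_disposition(changes: List[Change]) -> Dict[str, List[Change]]:
    """Bucket plan changes by their `disposition` value."""
    groups: Dict[str, List[Change]] = defaultdict(list)
    for change in changes:
        disposition = change["disposition"]
        if disposition not in DISPOSITIONS:
            raise ValueError(f"unknown disposition: {disposition!r}")
        groups[disposition].append(change)
    return groups


def needs_attention(changes: List[Change]) -> List[Tuple[str, str]]:
    """List (resource, reason) for changes that apply will not execute."""
    groups = group_by_disposition(changes)
    pending = groups["deferred"] + groups["blocked"]
    # ASSUMPTION: "resource" is a hypothetical identifier field.
    return [(c["resource"], c.get("reason", "")) for c in pending]
```

An empty result from `needs_attention` would mean every change is either `applied` or `derived`, so one `cluster apply` run should be enough under these assumptions.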
|
||||
|
||||
|
|
@ -496,5 +497,5 @@ matches the argument. A wrong id, missing lock, invalid lock JSON, or unsupporte
|
|||
lock version exits non-zero and leaves the file untouched.
|
||||
|
||||
This is manual recovery for abandoned local locks. OmniGraph does not perform
|
||||
PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock in
|
||||
Stage 2C.
|
||||
PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock
|
||||
today.
|
||||
|
|
|
|||
|
|
@ -6,6 +6,8 @@
|
|||
|
||||
- Compacts every node + edge table on `main`, then reindexes them, then **publishes the resulting version to the `__manifest`** so the manifest's recorded version tracks the compacted-and-reindexed state. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the version precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed.
|
||||
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older versions until `cleanup` runs.
|
||||
- **Also compacts the internal system tables** `__manifest`, `_graph_commits`, and `_graph_commit_actors` (RFC-013 step 2), which accumulate one fragment per commit (the actor table only on the authenticated write path, where every commit carries an actor) and otherwise make every write's metadata scan grow with history. These take a simpler path than data tables: they are not `__manifest`-tracked (readers open them at their latest version), so compaction just advances their version in place — **no manifest publish and no recovery sidecar**. (The sidecar-free property is not because it is one commit — `compact_files` can emit a `ReserveFragments` commit before the `Rewrite`, and the auto-cleanup strip below is a further commit — but because every one of those commits is content-preserving and the table is read at its latest version, so a crash at any point leaves it readable and content-identical and the next `optimize` re-plans.) They appear in the returned stats under `table_key` `"__manifest"` / `"_graph_commits"` / `"_graph_commit_actors"` (the latter two only when present). They are **not yet covered by `cleanup`**, so their version chain still grows until the cleanup half lands (it requires a cleanup-resurrection safeguard first); run `optimize` on a cadence to keep per-write metadata scans flat.
|
||||
- **`optimize` is non-destructive by construction — it never garbage-collects versions, on any table (data or internal).** Compaction rewrites fragments and advances the version; old versions stay reachable until you run `cleanup`. This holds even for a graph created by an older binary that stored an on-by-default Lance `auto_cleanup` hook: `compact_files` / `optimize_indices` commit with the hook enabled and expose no skip override, so before compacting **any** table `optimize` strips its stale `lance.auto_cleanup.*` config first, so Lance's commit-time GC hook cannot fire and silently prune `__manifest`-pinned versions. (Graphs created by current binaries store no such config; the strip is the upgrade-path safety net.) The internal-table path additionally tolerates a concurrent live writer: it runs a **bounded** rebase-and-retry, so transient contention does not fail the operator's `optimize` or the live write — but sustained contention past the retry budget surfaces a loud conflict error rather than looping forever (bounded and observable, not a silent give-up). The data-table path holds the per-table write queue while it compacts, so it does not contend with mutations on that table in the first place.
|
||||
- **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `load --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and compaction itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-loaded rows do not linger in the unindexed flat-scan tail.
|
||||
- **Create declared-but-missing indexes (the index reconciler).** `@index`/`@key` declares intent; `schema apply` records it but builds nothing, and `load`/`mutate` defer a column that cannot be built yet (a `Vector` column with no trainable vectors). `optimize` materializes any such declared-but-unbuilt index over the compacted layout — so it is the convergence path for an `@index` added after data exists, or a vector index whose embeddings arrived via a later `embed`. A column still not buildable (no vectors yet) is reported on the table's stat as `pending_indexes` (visible in `--json`), not treated as a failure; the next `optimize` retries. So `optimize` is the single operator-facing index reconciler: it compacts, restores coverage, **and** builds declared-but-missing indexes.
|
||||
- Each table's compact→reindex→publish serializes with concurrent mutations on the same table. A crash mid-operation is recovered automatically on the next open (both compaction and reindex are content-preserving, so roll-forward is always safe).
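The bounded rebase-and-retry behaviour described for the internal-table path follows a common pattern, sketched generically below. This is not OmniGraph's code: `try_commit`, `rebase`, both exception types, and the default budget of 3 are hypothetical names and values chosen for the example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitConflict(Exception):
    """Hypothetical: a concurrent writer committed first."""


class RetryBudgetExhausted(Exception):
    """Hypothetical: contention outlasted the retry budget."""


def commit_with_bounded_retry(
    try_commit: Callable[[], T],
    rebase: Callable[[], None],
    max_retries: int = 3,
) -> T:
    """Commit, rebasing and retrying on conflict at most `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return try_commit()
        except CommitConflict as err:
            if attempt == max_retries:
                # Surface a loud error instead of looping forever.
                raise RetryBudgetExhausted(
                    f"still conflicting after {max_retries} retries"
                ) from err
            rebase()  # pick up the other writer's version, then try again
    raise AssertionError("unreachable")
```

Transient contention is absorbed by the loop, while sustained contention ends in an explicit error the operator can see.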
|
||||
|
|
|
|||
|
|
@ -40,7 +40,7 @@ storage root, with no local config directory. `--bind`,
|
|||
|
||||
### Stored-query validation at startup
|
||||
|
||||
If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the exposed queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).
|
||||
If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the stored queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).
|
||||
|
||||
## Endpoint inventory
|
||||
|
||||
|
|
@ -77,6 +77,11 @@ Server-level management endpoints:
|
|||
|---|---|---|---|
|
||||
| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list ready/served graphs |
|
||||
|
||||
> The per-graph subsections below name routes in shorthand (`GET /queries`,
|
||||
> `POST /query`, `POST /mutate`, `POST /queries/{name}`); every one is served
|
||||
> under the `/graphs/{id}/…` prefix shown in the table — only `/graphs` and
|
||||
> `/healthz` are flat.
|
||||
|
||||
### Stored-query catalog (`GET /queries`)
|
||||
|
||||
List the graph's **exposed** (`@mcp(expose: true)`) stored queries as a typed tool catalog — enough for a client to register each as a tool without fetching `.gq` source. (The server also projects these queries as live MCP tools at `POST /graphs/{id}/mcp` — see [mcp.md](mcp.md); this catalog endpoint is the REST view of the same registry.) Each entry: `{ name, tool_name, description, instruction, mutation, params }`, where each param is `{ name, kind, item_kind?, vector_dim?, nullable, description? }`. `kind` is one of `string | bool | int | bigint | float | date | datetime | blob | vector | list` (decomposed so a consumer maps it with a closed `switch`, never re-parsing GQ type spelling). `bigint` (I64/U64), `date`, `datetime`, and `blob` are carried as JSON **strings** — a 64-bit integer loses precision as a JSON number, dates are ISO strings, and a blob is a URI string.
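A client could map each catalog `kind` to a JSON encoding with a closed dispatch along these lines. This is a sketch under assumptions: the string encodings for `bigint`, `date`, `datetime`, and `blob` follow the paragraph above, but encoding `vector` as an array of floats and `list` as an array of per-item encodings is a guess, and `encode_param` is a hypothetical helper name.

```python
from datetime import date, datetime
from typing import Any, Mapping, Optional


def _encode(kind: Optional[str], item_kind: Optional[str], value: Any) -> Any:
    if kind == "string":
        return str(value)
    if kind == "bool":
        return bool(value)
    if kind == "int":
        return int(value)
    if kind == "float":
        return float(value)
    if kind == "bigint":
        return str(int(value))  # string keeps full 64-bit precision
    if kind in ("date", "datetime"):
        # ISO strings; accept ready-made strings as well as date objects.
        return value.isoformat() if isinstance(value, (date, datetime)) else str(value)
    if kind == "blob":
        return str(value)  # a URI string
    if kind == "vector":
        return [float(x) for x in value]  # ASSUMPTION: array of numbers
    if kind == "list":
        return [_encode(item_kind, None, v) for v in value]  # ASSUMPTION
    raise ValueError(f"unknown kind: {kind!r}")


def encode_param(param: Mapping[str, Any], value: Any) -> Any:
    """Encode one argument according to its catalog entry."""
    if value is None:
        if param.get("nullable"):
            return None
        raise ValueError(f"param {param.get('name')!r} is not nullable")
    return _encode(param["kind"], param.get("item_kind"), value)
```

Because the `kind` set is closed, an unknown value raises immediately rather than being passed through, which makes a catalog change visible to the client.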
|
||||
|
|
@ -179,8 +184,8 @@ Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` wi
|
|||
caller's pre-write view of one table's manifest version was stale.
|
||||
`ManifestConflictOutput { table_key, expected, actual }` tells the client
|
||||
which table to refresh and retry. This is the conflict shape produced by
|
||||
concurrent `/mutate` (or its `/change` alias) or `/ingest` calls landing
|
||||
the same `(table, branch)` race.
|
||||
concurrent `/mutate` (or its `/change` alias) or `/load` (or its deprecated
|
||||
`/ingest` alias) calls landing the same `(table, branch)` race.
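A client-side "refresh and retry" loop for this conflict shape might look like the sketch below. The `manifest_conflict` and `table_key` fields come from the error shape above; `send`, `refresh_table`, and the attempt limit are hypothetical stand-ins for whatever HTTP call and cache refresh a real client uses.

```python
from typing import Any, Callable, Dict


def send_with_refresh(
    send: Callable[[], Dict[str, Any]],
    refresh_table: Callable[[str], None],
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Re-send a write after refreshing the table named in a manifest conflict."""
    body: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        body = send()  # hypothetical: performs the request, returns the JSON body
        conflict = body.get("manifest_conflict")
        if not conflict or attempt == max_attempts - 1:
            return body  # success, a different error, or out of attempts
        refresh_table(conflict["table_key"])  # refresh only the stale table
    return body
```

Capping the attempts keeps a persistently contended table from turning into an unbounded retry loop; the last conflict body is returned so the caller can report it.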
|
||||
|
||||
HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500.
|
||||
|
||||
|
|
@ -207,7 +212,8 @@ Cedar policy authorization runs **before** admission accounting so
|
|||
denied requests don't consume admission slots.
|
||||
|
||||
Today admission gates every mutating handler: `/mutate` (and its
|
||||
deprecated alias `/change`), `/ingest`, `/branches/{create,delete,merge}`,
|
||||
deprecated alias `/change`), `/load` (and its deprecated alias `/ingest`),
|
||||
`/branches/{create,delete,merge}`,
|
||||
and `/schema/apply`. Read-only endpoints (`/snapshot`, `/query`, `/read`,
|
||||
`/export`, `/branches` GET, `/commits`, `/schema` GET) are not
|
||||
admission-gated.
|
||||
|
|
@ -215,7 +221,7 @@ admission-gated.
|
|||
## Body limits
|
||||
|
||||
- Default: 1 MB
|
||||
- `/ingest`: 32 MB
|
||||
- `/load` (and its deprecated `/ingest` alias): 32 MB
|
||||
|
||||
## Auth model (`bearer + SHA-256`)
|
||||
|
||||
|
|
@ -243,7 +249,7 @@ See [deployment.md](../deployment.md) for token-source operational details.
|
|||
|
||||
- CORS — not configured; add `tower_http::cors` if needed.
|
||||
- Rate limiting — per-actor admission control gates `/mutate` (alias
|
||||
`/change`), `/ingest`, `/branches/{create,delete,merge}`,
|
||||
`/change`), `/load` (alias `/ingest`), `/branches/{create,delete,merge}`,
|
||||
`/schema/apply` (see "Per-actor
|
||||
admission control" above). No global rate limiter is configured;
|
||||
add `tower_http::limit` if a graph-wide cap is needed.
|
||||
|
|
|
|||
|
|
@ -18,7 +18,7 @@
|
|||
| Expand CSR-build cost factor | `CSR_BUILD_FACTOR = 1.5` | traversal |
|
||||
| Expand mode override | `OMNIGRAPH_TRAVERSAL_MODE` (`indexed`\|`csr`; unset = cost-based auto) | traversal |
|
||||
| Default body limit | `1 MB` | HTTP server |
|
||||
| Ingest body limit | `32 MB` | HTTP server |
|
||||
| Load (bulk-write) body limit | `32 MB` | HTTP server (`/load`; shared by the deprecated `/ingest` alias) |
|
||||
| Default embed provider/model | `openai-compatible` / `openai/text-embedding-3-large` | engine embedding |
|
||||
| OpenAI-direct embed model | `text-embedding-3-large` | engine embedding |
|
||||
| Gemini-direct embed model | `gemini-embedding-2` | engine embedding |
|
||||
|
|
|
|||
|
|
@ -72,6 +72,8 @@ Applying a plan reports whether it was supported, the steps applied, and the res
|
|||
|
||||
`DropProperty` and `DropType` steps default to `Soft` mode: the catalog tombstones the entry but the prior column / dataset remains time-travel-reachable via `snapshot_at_version(prev)` until `omnigraph cleanup` runs. Soft drops are reversible.
|
||||
|
||||
Pass `--allow-data-loss` (CLI) or `allow_data_loss: true` (HTTP `POST /schema/apply` body, SDK `SchemaApplyOptions`) to promote every drop in the plan to `Hard` mode. Hard drops run `cleanup_old_versions` on the affected dataset immediately after the manifest publish, making the prior column / dataset unreachable. **Irreversible.**
|
||||
Pass `--allow-data-loss` (CLI `schema apply`) or `allow_data_loss: true` (SDK `SchemaApplyOptions`) to promote every drop in the plan to `Hard` mode. Hard drops run `cleanup_old_versions` on the affected dataset immediately after the manifest publish, making the prior column / dataset unreachable. **Irreversible.**
|
||||
|
||||
The flag is honored uniformly across transports — `omnigraph schema apply --allow-data-loss`, `POST /schema/apply { schema_source, allow_data_loss: true }`, and `apply_schema_with_options(.., SchemaApplyOptions { allow_data_loss: true })` produce identical plans and identical effects.
|
||||
This is the **direct/embedded** schema-apply path — `omnigraph schema apply --store …` and the embedded SDK `apply_schema_with_options(.., SchemaApplyOptions { allow_data_loss: true })` produce identical plans and identical effects.
|
||||
|
||||
**Cluster-managed graphs are different.** A graph served from a cluster evolves only through `omnigraph cluster apply`, which performs **soft drops only** (no `allow_data_loss` path), and the HTTP `POST /schema/apply` route is **disabled (returns 409) for cluster-backed serving** — see [server](../operations/server.md) and [cluster-config](../clusters/config.md). Direct `schema apply` against a cluster-managed storage path is likewise refused.
|
||||
|
|
|
|||
|
|
@ -22,7 +22,7 @@ list/`Blob` columns → none.
|
|||
|
||||
> **Coverage and cost.** Each indexed column adds index files and build time, and
|
||||
> an index only covers the fragments it was built over. Rows appended after the
|
||||
> index was built (e.g. by `ingest --mode merge`) are scanned unindexed until a
|
||||
> index was built (e.g. by `load --mode merge`) are scanned unindexed until a
|
||||
> reindex extends coverage; see [maintenance](../operations/maintenance.md) → `optimize`.
|
||||
|
||||
## L2 — OmniGraph orchestration
|
||||
|