Merge branch 'main' into ragnorc/omnigraph-mcp-crate

Folds in v0.7.1 (release #290 + optimize/write-path/internal-table-compaction
fixes #288/#291/#297) under the MCP branch.

Conflict resolutions (5 files):
- crates/omnigraph-server/Cargo.toml: take main's 0.7.1 path-dep constraints;
  keep our omnigraph-mcp dep (bumped to 0.7.1) + http dep.
- crates/omnigraph-server/src/handlers.rs: keep our server_list_queries
  doc-comment (exposed @mcp(expose) subset, invoke_query-gated) — it supersedes
  main's pre-@mcp(expose) text, since this branch adds the per-query expose flag.
- docs/user/operations/server.md: keep our GET /queries description
  (invoke_query gate + @mcp(expose) exposure) over main's read-gated/list-all text.
- docs/dev/index.md: keep both in-flight RFC rows; renumber this branch's tenancy
  RFC 013 -> 014 (rfc-014-tenancy-cells.md) since main now owns RFC-013
  (rfc-013-write-path-latency.md). Title + index link updated; link-check green.
- openapi.json: regenerated from merged source (OMNIGRAPH_UPDATE_OPENAPI=1) — now
  info.version 0.7.1 with our invoke_query/@mcp schema.

Coherence: omnigraph-mcp bumped 0.7.0 -> 0.7.1 to match the workspace; Cargo.lock
updated. cargo build --workspace green; server/mcp/api-types/compiler suites green
(schema_routes.rs reopen-after-apply flakes under parallel IO on a near-full disk,
passes single-threaded — a pre-existing main test, unchanged by the merge).
Ragnor Comerford 2026-06-23 18:26:45 +02:00
commit adc36adf32
44 changed files with 3595 additions and 528 deletions

@@ -6,6 +6,8 @@
- Compacts every node + edge table on `main`, then reindexes them, then **publishes the resulting version to the `__manifest`** so the manifest's recorded version tracks the compacted-and-reindexed state. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the version precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older versions until `cleanup` runs.
- **Also compacts the internal system tables** `__manifest`, `_graph_commits`, and `_graph_commit_actors` (RFC-013 step 2), which accumulate one fragment per commit (the actor table only on the authenticated write path, where every commit carries an actor) and otherwise make every write's metadata scan grow with history. These take a simpler path than data tables: they are not `__manifest`-tracked (readers open them at their latest version), so compaction just advances their version in place — **no manifest publish and no recovery sidecar**. (The sidecar-free property is not because it is one commit — `compact_files` can emit a `ReserveFragments` commit before the `Rewrite`, and the auto-cleanup strip below is a further commit — but because every one of those commits is content-preserving and the table is read at its latest version, so a crash at any point leaves it readable and content-identical and the next `optimize` re-plans.) They appear in the returned stats under `table_key` `"__manifest"` / `"_graph_commits"` / `"_graph_commit_actors"` (the latter two only when present). They are **not yet covered by `cleanup`**, so their version chain still grows until the cleanup half lands (it requires a cleanup-resurrection safeguard first); run `optimize` on a cadence to keep per-write metadata scans flat.
- **`optimize` is non-destructive by construction — it never garbage-collects versions, on any table (data or internal).** Compaction rewrites fragments and advances the version; old versions stay reachable until you run `cleanup`. This holds even for a graph created by an older binary that stored an on-by-default Lance `auto_cleanup` hook: `compact_files` / `optimize_indices` commit with the hook enabled and expose no skip override, so before compacting **any** table `optimize` strips its stale `lance.auto_cleanup.*` config first, so Lance's commit-time GC hook cannot fire and silently prune `__manifest`-pinned versions. (Graphs created by current binaries store no such config; the strip is the upgrade-path safety net.) The internal-table path additionally tolerates a concurrent live writer: it runs a **bounded** rebase-and-retry, so transient contention does not fail the operator's `optimize` or the live write — but sustained contention past the retry budget surfaces a loud conflict error rather than looping forever (bounded and observable, not a silent give-up). The data-table path holds the per-table write queue while it compacts, so it does not contend with mutations on that table in the first place.
- **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `load --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and compaction itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-loaded rows do not linger in the unindexed flat-scan tail.
- **Create declared-but-missing indexes (the index reconciler).** `@index`/`@key` declares intent; `schema apply` records it but builds nothing, and `load`/`mutate` defer a column that cannot be built yet (a `Vector` column with no trainable vectors). `optimize` materializes any such declared-but-unbuilt index over the compacted layout — so it is the convergence path for an `@index` added after data exists, or a vector index whose embeddings arrived via a later `embed`. A column still not buildable (no vectors yet) is reported on the table's stat as `pending_indexes` (visible in `--json`), not treated as a failure; the next `optimize` retries. So `optimize` is the single operator-facing index reconciler: it compacts, restores coverage, **and** builds declared-but-missing indexes.
- Each table's compact→reindex→publish serializes with concurrent mutations on the same table. A crash mid-operation is recovered automatically on the next open (both compaction and reindex are content-preserving, so roll-forward is always safe).
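The bounded rebase-and-retry described above for the internal-table path can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not omnigraph's implementation: `retry_bounded`, `AttemptError`, and `RetryError` are hypothetical names, and the actual rebase onto the latest table version is left to the caller's closure.

```rust
/// Outcome of a single failed attempt (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum AttemptError {
    /// A concurrent writer committed first; rebasing and retrying may succeed.
    Conflict,
    /// Any other failure; retrying would not help.
    Fatal(String),
}

/// Final error surfaced to the operator (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RetryError {
    /// Contention outlasted the retry budget: a loud, observable failure.
    BudgetExhausted { attempts: u32 },
    /// A non-retryable failure, passed through unchanged.
    Fatal(String),
}

/// Runs `attempt` up to `max_attempts` times. The closure receives the
/// 1-based attempt number and is expected to rebase before each retry.
pub fn retry_bounded<T>(
    max_attempts: u32,
    mut attempt: impl FnMut(u32) -> Result<T, AttemptError>,
) -> Result<T, RetryError> {
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(value) => return Ok(value),
            // Transient contention: go around again while budget remains.
            Err(AttemptError::Conflict) => continue,
            // Anything else stops immediately.
            Err(AttemptError::Fatal(msg)) => return Err(RetryError::Fatal(msg)),
        }
    }
    // Never loop forever: report how many attempts were spent.
    Err(RetryError::BudgetExhausted { attempts: max_attempts })
}
```

The point of the shape is that the loop has exactly two exits besides success: a fatal error, or an explicit budget-exhausted error, which matches the "bounded and observable" behaviour the bullet describes.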

@@ -40,7 +40,7 @@ storage root, with no local config directory. `--bind`,
### Stored-query validation at startup
-If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the exposed queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).
+If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the stored queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).
## Endpoint inventory
@@ -77,6 +77,11 @@ Server-level management endpoints:
|---|---|---|---|
| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list ready/served graphs |
> The per-graph subsections below name routes in shorthand (`GET /queries`,
> `POST /query`, `POST /mutate`, `POST /queries/{name}`); every one is served
> under the `/graphs/{id}/…` prefix shown in the table — only `/graphs` and
> `/healthz` are flat.
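The shorthand-to-full-path rule in the note above can be illustrated with a small helper. This is a sketch only: `full_path` is a hypothetical function, not part of the server or any client, and it assumes the graph id needs no URL escaping.

```rust
/// Expands a shorthand route from the docs into the path a client would call.
/// `/graphs` and `/healthz` are flat; every other route is served under
/// `/graphs/{id}`. (Hypothetical helper; no URL escaping is applied.)
pub fn full_path(graph_id: &str, shorthand: &str) -> String {
    match shorthand {
        // The two flat, server-level routes are returned unchanged.
        "/graphs" | "/healthz" => shorthand.to_string(),
        // Everything else gets the per-graph prefix.
        per_graph => format!("/graphs/{graph_id}{per_graph}"),
    }
}
```

For example, under these assumptions `full_path("g1", "/queries")` yields `/graphs/g1/queries`, while `full_path("g1", "/healthz")` stays `/healthz`.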
### Stored-query catalog (`GET /queries`)
List the graph's **exposed** (`@mcp(expose: true)`) stored queries as a typed tool catalog — enough for a client to register each as a tool without fetching `.gq` source. (The server also projects these queries as live MCP tools at `POST /graphs/{id}/mcp` — see [mcp.md](mcp.md); this catalog endpoint is the REST view of the same registry.) Each entry: `{ name, tool_name, description, instruction, mutation, params }`, where each param is `{ name, kind, item_kind?, vector_dim?, nullable, description? }`. `kind` is one of `string | bool | int | bigint | float | date | datetime | blob | vector | list` (decomposed so a consumer maps it with a closed `switch`, never re-parsing GQ type spelling). `bigint` (I64/U64), `date`, `datetime`, and `blob` are carried as JSON **strings** — a 64-bit integer loses precision as a JSON number, dates are ISO strings, and a blob is a URI string.
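A consumer of this catalog might model `kind` as a closed enum, so that mapping parameters to a tool schema is an exhaustive `match` rather than string parsing. The sketch below is an assumption about how a client could do this; `ParamKind`, `parse`, and `encoded_as_string` are hypothetical names, and only the ten kind spellings and the string-carried set come from the description above.

```rust
/// The ten parameter kinds listed in the catalog description
/// (hypothetical client-side type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParamKind {
    Str,
    Bool,
    Int,
    BigInt,
    Float,
    Date,
    DateTime,
    Blob,
    Vector,
    List,
}

impl ParamKind {
    /// Maps the wire spelling to a kind; unknown spellings yield `None`.
    pub fn parse(s: &str) -> Option<Self> {
        Some(match s {
            "string" => Self::Str,
            "bool" => Self::Bool,
            "int" => Self::Int,
            "bigint" => Self::BigInt,
            "float" => Self::Float,
            "date" => Self::Date,
            "datetime" => Self::DateTime,
            "blob" => Self::Blob,
            "vector" => Self::Vector,
            "list" => Self::List,
            _ => return None,
        })
    }

    /// True for the non-string kinds that are nevertheless carried as JSON
    /// strings: 64-bit integers, dates, datetimes, and blob URIs.
    pub fn encoded_as_string(self) -> bool {
        matches!(self, Self::BigInt | Self::Date | Self::DateTime | Self::Blob)
    }
}
```

Because the enum is closed, adding a kind on the server side would show up in such a client as an unparsed `None` rather than a silently mis-typed parameter.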
@@ -179,8 +184,8 @@ Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` wi
caller's pre-write view of one table's manifest version was stale.
`ManifestConflictOutput { table_key, expected, actual }` tells the client
which table to refresh and retry. This is the conflict shape produced by
-concurrent `/mutate` (or its `/change` alias) or `/ingest` calls landing
-the same `(table, branch)` race.
+concurrent `/mutate` (or its `/change` alias), `/load` (or its deprecated
+`/ingest` alias) calls landing the same `(table, branch)` race.
HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500.
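On the client side, the "refresh and retry" guidance for a manifest conflict could be handled along these lines. This is a sketch under assumptions: the `u64` field types, the `NextStep` enum, and `next_step` are hypothetical, and only the three field names of `ManifestConflictOutput` come from the text above.

```rust
/// Client-side mirror of the conflict payload. The field names follow the
/// docs; the `u64` version type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestConflictOutput {
    pub table_key: String,
    pub expected: u64,
    pub actual: u64,
}

/// What the caller should do after a failed write (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// Refresh the named table's view, then resubmit the write.
    RefreshAndRetry { table_key: String },
    /// No manifest conflict attached: surface the error to the user.
    Surface,
}

/// Decides the next step from the optional `manifest_conflict` field of an
/// error response.
pub fn next_step(conflict: Option<&ManifestConflictOutput>) -> NextStep {
    match conflict {
        Some(c) => NextStep::RefreshAndRetry {
            table_key: c.table_key.clone(),
        },
        None => NextStep::Surface,
    }
}
```

A real client would likely also bound the number of retries instead of resubmitting indefinitely.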
@@ -207,7 +212,8 @@ Cedar policy authorization runs **before** admission accounting so
denied requests don't consume admission slots.
Today admission gates every mutating handler: `/mutate` (and its
-deprecated alias `/change`), `/ingest`, `/branches/{create,delete,merge}`,
+deprecated alias `/change`), `/load` (and its deprecated alias `/ingest`),
+`/branches/{create,delete,merge}`,
and `/schema/apply`. Read-only endpoints (`/snapshot`, `/query`, `/read`,
`/export`, `/branches` GET, `/commits`, `/schema` GET) are not
admission-gated.
@@ -215,7 +221,7 @@ admission-gated.
## Body limits
- Default: 1 MB
-- `/ingest`: 32 MB
+- `/load` (and its deprecated `/ingest` alias): 32 MB
## Auth model (`bearer + SHA-256`)
@@ -243,7 +249,7 @@ See [deployment.md](../deployment.md) for token-source operational details.
- CORS — not configured; add `tower_http::cors` if needed.
- Rate limiting — per-actor admission control gates `/mutate` (alias
-`/change`), `/ingest`, `/branches/{create,delete,merge}`,
+`/change`), `/load` (alias `/ingest`), `/branches/{create,delete,merge}`,
`/schema/apply` (see "Per-actor
admission control" above). No global rate limiter is configured;
add `tower_http::limit` if a graph-wide cap is needed.