mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-21 02:28:07 +02:00
Merge branch 'main' into ragnorc/index-best-practices-audit
This commit is contained in:
commit
5ca5c40df7
38 changed files with 5302 additions and 202 deletions
@@ -2,7 +2,7 @@

A reference for the `omnigraph` binary's command surface and `omnigraph.yaml` schema. For a quick-start guide, see [cli.md](cli.md).

17 top-level command families, 40+ subcommands. All commands accept either a positional `URI`, `--uri`, or a `--target <name>` resolved against `omnigraph.yaml`.
Top-level command families and subcommands. Graph-targeting commands accept either a positional `URI`, `--uri`, or a `--target <name>` resolved against `omnigraph.yaml`; `cluster` commands use `--config <dir>`.

## Top-level commands

@@ -17,11 +17,12 @@ A reference for the `omnigraph` binary's command surface and `omnigraph.yaml` sc
| `export` | dump to JSONL on stdout (`--type T`, `--table K` filters) |
| `branch create \| list \| delete \| merge` | branching ops |
| `commit list \| show` | inspect commit graph |
| `run list \| show \| publish \| abort` | transactional run ops |
| `schema plan \| apply \| show (alias: get)` | migrations |
| `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` |
| `queries validate \| list` | operate on the server-side stored-query registry (the `queries:` block). `validate` type-checks every stored query against the live schema offline (opens the selected graph; exits non-zero on any breakage), catching schema drift without restarting the server; `list` prints the selected registry's query names, MCP exposure, and typed params. For per-graph registries, pass `--target <graph>` or set `cli.graph`; with no graph selection, `list` shows only top-level `queries:`. Distinct from `lint`, which validates a single `.gq` file |
| `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns; `--json` reports a `skipped` field) |
| `cluster validate \| plan \| status` | read-only cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json` while briefly holding `__cluster/lock.json`; `status` reads the state ledger. No apply, graph open, live drift scan, server change, or `state.json` mutation occurs in Stage 2A |
| `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns or uncovered drift; `--json` reports `skipped`) |
| `repair [--confirm] [--force]` | preview or explicitly publish uncovered manifest/head drift. `--confirm` heals verified maintenance drift and exits non-zero if suspicious/unverifiable drift is refused; `--force --confirm` publishes suspicious/unverifiable drift after operator review |
| `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
| `embed` | offline JSONL embedding pipeline |
| `policy validate \| test \| explain` | Cedar tooling. Selects `cli.graph`, else `server.graph`, else top-level `policy.file` |

@@ -73,6 +74,23 @@ policy:
  file: ./policy.yaml
```

## Cluster config preview

```bash
omnigraph cluster validate --config ./company-brain
omnigraph cluster plan --config ./company-brain --json
omnigraph cluster status --config ./company-brain --json
```

`--config` is a directory containing `cluster.yaml`; it defaults to `.`.
Stage 2A accepts graphs, schemas, stored queries, and policy bundle file
references. `cluster plan` reads local JSON state from
`<config-dir>/__cluster/state.json`; a missing file means empty state. Plan
acquires `__cluster/lock.json` by default and releases it before returning.
`cluster status` reads state only and reports any existing lock. External state
backends, apply, refresh/import, pipelines, UI specs, embeddings, aliases, and
bindings are reserved for later stages. See [cluster-config.md](cluster-config.md).

## Output formats (`query` command, alias: `read`)

- `json` — pretty-printed object with metadata + rows

126
docs/user/cluster-config.md
Normal file
@@ -0,0 +1,126 @@
# Cluster Config

**Status:** Stage 2A read-only preview.

Cluster config is the future control-plane configuration surface for a whole
OmniGraph deployment. In this stage, OmniGraph can validate a local
`cluster.yaml` folder, produce a deterministic read-only plan, and inspect the
local JSON state ledger. It does not apply changes, open graph roots, scan live
cluster state, start servers, or write graph resources.

## Commands

```bash
omnigraph cluster validate --config ./company-brain
omnigraph cluster plan --config ./company-brain --json
omnigraph cluster status --config ./company-brain --json
```

`--config` points at a directory, not a file. The directory must contain
`cluster.yaml`. When omitted, it defaults to the current directory.

## Supported `cluster.yaml`

Stage 2A accepts only the read-only resource subset:

```yaml
version: 1
metadata:
  name: company-brain

state:
  backend: cluster
  lock: true

graphs:
  knowledge:
    schema: ./knowledge.pg
    queries:
      find_experts:
        file: ./knowledge.gq

policies:
  base:
    file: ./base.policy.yaml
    applies_to: [knowledge]
```

`metadata.name` is a display label. `state.backend` may be omitted or set to
`cluster`; external state backends are reserved for a later stage. `state.lock`
defaults to `true`. When enabled, `cluster plan` briefly acquires
`<config-dir>/__cluster/lock.json` while it reads state, then removes it before
returning. `cluster status` never acquires the lock; it only reports whether one
is present.

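The acquire-then-release behaviour of `cluster plan` can be pictured with an exclusive file create. This is a minimal sketch, not OmniGraph's implementation: the `PlanLock` guard, the function names, the JSON payload written into the lock file, and the decision to create `__cluster/` on demand are all assumptions made for illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard: removes the lock file when it goes out of scope.
pub struct PlanLock {
    path: PathBuf,
}

impl Drop for PlanLock {
    fn drop(&mut self) {
        // Best-effort release; a failed removal leaves a stale lock behind.
        let _ = fs::remove_file(&self.path);
    }
}

/// Try to take `<config_dir>/__cluster/lock.json`.
/// Returns `Ok(None)` when a lock file already exists.
pub fn try_acquire(config_dir: &Path, lock_id: &str) -> io::Result<Option<PlanLock>> {
    let dir = config_dir.join("__cluster");
    fs::create_dir_all(&dir)?; // assumption: the directory is created on demand
    let path = dir.join("lock.json");
    // `create_new` fails with AlreadyExists if the file is present,
    // so only one caller can win a race for the lock.
    match OpenOptions::new().write(true).create_new(true).open(&path) {
        Ok(mut file) => {
            // The payload shape is illustrative, not the real lock format.
            writeln!(file, "{{\"lock_id\": \"{}\"}}", lock_id)?;
            Ok(Some(PlanLock { path }))
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
        Err(e) => Err(e),
    }
}

/// What the text says `cluster status` does: observe the lock, never take it.
pub fn lock_present(config_dir: &Path) -> bool {
    config_dir.join("__cluster").join("lock.json").exists()
}
```

Holding the returned guard for the duration of the state read and dropping it afterwards gives the "briefly acquires, then removes" shape described above.
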
## Validation

`cluster validate` checks:

- `cluster.yaml` syntax and supported fields
- duplicate YAML keys
- schema, query, and policy file existence
- schema parsing and catalog construction
- stored-query parsing and query-name matching
- stored-query type-checking against the desired schema
- policy `applies_to` graph references

Fields reserved for later phases, such as `pipelines`, `embeddings`, `ui`,
`aliases`, and `bindings`, fail with a typed diagnostic instead of being
silently ignored.

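One way to read "fail with a typed diagnostic instead of being silently ignored" is sketched below. The `KeyDiagnostic` type and `check_top_level_keys` function are hypothetical; the supported-key list is taken from the example `cluster.yaml`, the reserved list from the sentence above, and treating any other key as an error is an assumption.

```rust
/// Hypothetical diagnostic for one top-level `cluster.yaml` key.
#[derive(Debug, PartialEq)]
pub enum KeyDiagnostic {
    /// A field named in the docs as reserved for a later stage.
    ReservedForLaterStage(String),
    /// A field this sketch does not recognise at all.
    Unknown(String),
}

// Top-level keys that appear in the Stage 2A example config.
const SUPPORTED: [&str; 5] = ["version", "metadata", "state", "graphs", "policies"];
// Keys the docs list as reserved for later phases.
const RESERVED: [&str; 5] = ["pipelines", "embeddings", "ui", "aliases", "bindings"];

/// Collect a diagnostic for every key that is not supported yet.
pub fn check_top_level_keys(keys: &[&str]) -> Result<(), Vec<KeyDiagnostic>> {
    let mut diagnostics = Vec::new();
    for key in keys {
        if SUPPORTED.contains(key) {
            continue;
        }
        if RESERVED.contains(key) {
            diagnostics.push(KeyDiagnostic::ReservedForLaterStage(key.to_string()));
        } else {
            diagnostics.push(KeyDiagnostic::Unknown(key.to_string()));
        }
    }
    if diagnostics.is_empty() {
        Ok(())
    } else {
        Err(diagnostics)
    }
}
```

Separating "reserved" from "unknown" lets a caller print a more helpful message for fields that are planned but not yet accepted.
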
## Planning

`cluster plan` first performs validation, then reads local JSON state from:

```text
<config-dir>/__cluster/state.json
```

If the file is missing, the state is treated as empty and every desired
resource is planned as a create. If present, the file must use this shape:

```json
{
  "version": 1,
  "state_revision": 0,
  "applied_revision": {
    "config_digest": "...",
    "resources": {
      "graph.knowledge": { "digest": "..." },
      "schema.knowledge": { "digest": "..." },
      "query.knowledge.find_experts": { "digest": "..." },
      "policy.base": { "digest": "..." }
    }
  },
  "resource_statuses": {
    "graph.knowledge": {
      "status": "applied",
      "conditions": [],
      "message": "optional status detail"
    }
  },
  "approval_records": {},
  "recovery_records": {},
  "observations": {}
}
```

`state_revision`, `resource_statuses`, `approval_records`, `recovery_records`,
and `observations` are optional so older Stage 1 state fixtures keep working.
Missing `state_revision` is treated as `0`. Resource status values are
`pending`, `planned`, `applying`, `applied`, `drifted`, `blocked`, or `error`.

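The closed set of status values and the `state_revision` default can be modelled as below. This is a sketch under assumptions: the `ResourceStatus` enum and `effective_state_revision` helper are illustrative names, and rejecting any other status string is assumed rather than documented.

```rust
use std::str::FromStr;

/// The seven resource status values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceStatus {
    Pending,
    Planned,
    Applying,
    Applied,
    Drifted,
    Blocked,
    Error,
}

impl FromStr for ResourceStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pending" => Ok(Self::Pending),
            "planned" => Ok(Self::Planned),
            "applying" => Ok(Self::Applying),
            "applied" => Ok(Self::Applied),
            "drifted" => Ok(Self::Drifted),
            "blocked" => Ok(Self::Blocked),
            "error" => Ok(Self::Error),
            // Assumption: anything else is rejected rather than ignored.
            other => Err(format!("unsupported resource status: {}", other)),
        }
    }
}

/// A missing `state_revision` is treated as 0, so older fixtures still load.
pub fn effective_state_revision(raw: Option<u64>) -> u64 {
    raw.unwrap_or(0)
}
```
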
Plan output compares desired resource digests against state resource digests
and reports `create`, `update`, and `delete` changes. It also reports the state
CAS (`sha256:<digest>`) and state revision. `state_observations.locked` means an
existing lock file was observed; a successful `plan` instead reports
`lock_acquired: true` and an `acquired_lock_id`, then releases the lock before
returning. The command never writes `state.json`; apply, refresh, import, and
live drift scans are later-stage work.

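The digest comparison behind `create`, `update`, and `delete` can be sketched as a diff of two maps keyed by resource id (for example `graph.knowledge`). The `Change` enum, the `diff_digests` function, and the output ordering are assumptions for illustration; the real plan output format is not shown here.

```rust
use std::collections::BTreeMap;

/// The three change kinds a plan reports.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Create,
    Update,
    Delete,
}

/// Compare desired digests with state digests, both keyed by resource id.
/// BTreeMap iteration is sorted, so the result is deterministic.
pub fn diff_digests(
    desired: &BTreeMap<String, String>,
    state: &BTreeMap<String, String>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();
    for (id, digest) in desired {
        match state.get(id) {
            // Not in state yet: plan a create.
            None => changes.push((id.clone(), Change::Create)),
            // In state with a different digest: plan an update.
            Some(old) if old != digest => changes.push((id.clone(), Change::Update)),
            // Same digest: nothing to do.
            Some(_) => {}
        }
    }
    for id in state.keys() {
        // In state but no longer desired: plan a delete.
        if !desired.contains_key(id) {
            changes.push((id.clone(), Change::Delete));
        }
    }
    changes
}
```

With an empty state map every desired resource comes back as `Create`, which matches the missing-`state.json` behaviour described in this section.
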
## Status

`cluster status` reads the same local JSON state ledger and prints what the
ledger says is deployed. It does not validate referenced schema/query/policy
files and does not inspect live graphs. Missing `state.json` succeeds with a
warning; invalid state JSON or an unsupported state version fails.

@@ -13,6 +13,7 @@ of MRs, internal recovery mechanics, or contributor-only invariants.
| Install OmniGraph | [install.md](install.md) |
| Run the CLI locally | [cli.md](cli.md) |
| Look up every CLI flag and config field | [cli-reference.md](cli-reference.md) |
| Validate and plan cluster config | [cluster-config.md](cluster-config.md) |
| Write schemas | [schema-language.md](schema-language.md) |
| Read schema-lint diagnostic codes | [schema-lint.md](schema-lint.md) |
| Write queries and mutations | [query-language.md](query-language.md) |

@@ -1,17 +1,26 @@
# Maintenance: Optimize & Cleanup
# Maintenance: Optimize, Repair & Cleanup

`db/omnigraph/optimize.rs`.
`db/omnigraph/optimize.rs` and `db/omnigraph/repair.rs`.

## `optimize_all_tables(db)` — non-destructive

- Lance `compact_files()` on every node + edge table on `main`, then **publishes the compacted version to the `__manifest`** so the manifest's `table_version` tracks the compacted Lance HEAD. Reads pin the manifest version, so without this publish compaction would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually compacted.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests until `cleanup` runs.
- Each table's compact→publish runs under its per-`(table, main)` write queue (serializing with concurrent mutations — compaction is a Lance `Rewrite` op that retryable-conflicts with a concurrent merge/update/delete on overlapping fragments). The Lance-HEAD-before-manifest-publish gap is covered by a `SidecarKind::Optimize` recovery sidecar (loose-match): a crash in that window rolls the compacted version forward on the next `Omnigraph::open` (compaction is content-preserving, so roll-forward is always safe).
- **Requires a recovered graph.** `optimize` refuses (errors) when an unresolved recovery sidecar is present under `__recovery` — operating on an unrecovered graph could publish a partial write the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run `optimize`. (Recovery roll-back now publishes its restored version, so a recovered graph always satisfies `manifest == Lance HEAD` going in; there is no leftover drift for `optimize` to interpret.)
- **Requires a recovered graph.** `optimize` refuses (errors) when an unresolved recovery sidecar is present under `__recovery` — operating on an unrecovered graph could publish a partial write the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run `optimize`.
- **Uncovered drift is skipped, not interpreted.** If a table's Lance HEAD is ahead of the version recorded in `__manifest` and no recovery sidecar covers that movement, `optimize` reports `skipped: Some(DriftNeedsRepair)` with the manifest/head versions and leaves the table untouched. Run `omnigraph repair` to classify and explicitly publish that drift.
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed, skipped }]`.
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed, skipped, manifest_version, lance_head_version }]`.
- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: Some(BlobColumnsUnsupportedByLance)` (and logged via `tracing::warn`) instead of compacted, and the rest of the sweep proceeds normally. The current Lance `compact_files` mis-decodes blob-v2 columns under its forced `BlobHandling::AllBinary` read; **reads and writes are unaffected** — only compaction is. This is gated by `LANCE_SUPPORTS_BLOB_COMPACTION` (`db/omnigraph/optimize.rs`) and removed when the upstream Lance fix lands (see [docs/dev/lance.md](../dev/lance.md)). Consequence: fragment count and deleted-row space on blob tables are not reclaimed until then; query results are never affected.

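The two skip conditions in the list above can be summarised as a small per-table decision. The variant names `BlobColumnsUnsupportedByLance` and `DriftNeedsRepair` come from the text; the `TableView` struct, the `skip_reason` function, and the order in which the checks run are assumptions. The sketch also assumes the sweep has already refused to start if any recovery sidecar is pending, so any HEAD-ahead-of-manifest gap it sees is uncovered drift.

```rust
/// Skip reasons named in the docs above.
#[derive(Debug, PartialEq, Eq)]
pub enum SkipReason {
    BlobColumnsUnsupportedByLance,
    DriftNeedsRepair,
}

/// Hypothetical per-table inputs to the decision.
pub struct TableView {
    pub has_blob_column: bool,
    pub manifest_version: u64,
    pub lance_head_version: u64,
}

/// Decide whether a table is skipped, and why.
/// `None` means the table is eligible for compaction.
pub fn skip_reason(table: &TableView) -> Option<SkipReason> {
    // Assumption: the blob check runs first; the docs do not state an order.
    if table.has_blob_column {
        return Some(SkipReason::BlobColumnsUnsupportedByLance);
    }
    // Lance HEAD ahead of the manifest pin with no sidecar pending:
    // leave the table untouched and defer to `omnigraph repair`.
    if table.lance_head_version > table.manifest_version {
        return Some(SkipReason::DriftNeedsRepair);
    }
    None
}
```
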
## `repair_all_tables(db, options)` — explicit

- Handles **uncovered manifest/head drift**: a table's Lance HEAD is ahead of the manifest pin and no recovery sidecar records the writer intent.
- Preview by default. `omnigraph repair --json <uri>` reports each table's `classification`, `action`, manifest/head versions, Lance operation names, and any classification error. `--confirm` publishes only verified maintenance drift; if any suspicious or unverifiable table is refused, the CLI prints the per-table output and exits non-zero. `--force --confirm` also publishes suspicious or unverifiable drift after operator review.
- Classifies drift by reading Lance transactions from `manifest_version + 1` through `lance_head_version`. Only `ReserveFragments` and `Rewrite` are verified maintenance. Semantic operations such as `Append`, `Delete`, `Update`, `Merge`, or missing transaction history are not auto-healed.
- Publishes repair by advancing `__manifest` to the existing Lance HEAD; it does **not** rewrite Lance data. If the publish succeeds, normal reads and strict writes use the repaired version. If it fails, no new data-side partial state was created.
- Requires a clean recovery state. Pending `__recovery` sidecars still belong to automatic sidecar recovery, not manual repair.

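The classification and flag rules above can be sketched as two pure functions. Everything named here is hypothetical: the `Classification` and `Action` enums, the mapping of semantic operations to `Suspicious` and of missing history to `Unverifiable`, and the choice that `--force` without `--confirm` stays a preview are assumptions layered on the documented behaviour.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Classification {
    NoDrift,
    VerifiedMaintenance,
    Suspicious,
    Unverifiable,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    NoOp,
    Preview,
    Publish,
    Refuse,
}

/// `ops[i]` is the Lance operation name for version `manifest_version + 1 + i`,
/// or `None` when that version's transaction history could not be read.
pub fn classify(ops: &[Option<&str>]) -> Classification {
    if ops.is_empty() {
        return Classification::NoDrift;
    }
    if ops.iter().any(|op| op.is_none()) {
        return Classification::Unverifiable;
    }
    // Only these two operations count as verified maintenance.
    if ops
        .iter()
        .all(|op| matches!(*op, Some("ReserveFragments") | Some("Rewrite")))
    {
        Classification::VerifiedMaintenance
    } else {
        Classification::Suspicious
    }
}

/// Map a classification plus the CLI flags to what repair does for a table.
pub fn action(class: &Classification, confirm: bool, force: bool) -> Action {
    match class {
        Classification::NoDrift => Action::NoOp,
        // Without --confirm nothing is published (preview by default).
        _ if !confirm => Action::Preview,
        Classification::VerifiedMaintenance => Action::Publish,
        // Suspicious or unverifiable drift needs --force as well.
        _ if force => Action::Publish,
        _ => Action::Refuse,
    }
}
```

Under this reading, a run with `--confirm` exits non-zero whenever any table ends up with `Action::Refuse`.
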
## `cleanup_all_tables(db, options)` — destructive

- Lance `cleanup_old_versions()` per table.