Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-12 01:45:14 +02:00
Add cluster state lock recovery
Parent: cb1e7bb5ea
Commit: 4fffddc6b7
6 changed files with 596 additions and 52 deletions
````diff
@@ -8,7 +8,7 @@ This file is the always-on map of the test surface. **Consult it before every ta
 |---|---|---|
 | `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (21 files), fixture-driven, share `tests/helpers/mod.rs` |
 | `omnigraph-cli` | `crates/omnigraph-cli/tests/` | `cli.rs` (unit-ish), `system_local.rs`, `system_remote.rs`, share `tests/support/mod.rs` |
-| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests` | Cluster config parser, local JSON state diff, state CAS/lock handling, read-only validate/plan/status plus explicit refresh/import graph observations |
+| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests` | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations |
 | `omnigraph-server` | `crates/omnigraph-server/tests/` | `server.rs` (HTTP-level), `openapi.rs` (OpenAPI drift / regeneration) |
 | `omnigraph-compiler` | mostly in-source `#[cfg(test)] mod tests` | Parser, type-checker, IR lowering, lint |
````
````diff
@@ -21,7 +21,7 @@ A reference for the `omnigraph` binary's command surface and `omnigraph.yaml` sc
 | `schema plan \| apply \| show (alias: get)` | migrations |
 | `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` |
 | `queries validate \| list` | operate on the server-side stored-query registry (the `queries:` block). `validate` type-checks every stored query against the live schema offline (opens the selected graph; exits non-zero on any breakage), catching schema drift without restarting the server; `list` prints the selected registry's query names, MCP exposure, and typed params. For per-graph registries, pass `--target <graph>` or set `cli.graph`; with no graph selection, `list` shows only top-level `queries:`. Distinct from `lint`, which validates a single `.gq` file |
-| `cluster validate \| plan \| status \| refresh \| import` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json`; `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations. No apply, graph-resource mutation, server change, or `plan --refresh` occurs in Stage 2B |
+| `cluster validate \| plan \| status \| refresh \| import \| force-unlock` | cluster-control preview. `validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json`; `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock <LOCK_ID>` manually removes a held local state lock by exact id. No apply, graph-resource mutation, server change, automatic stale-lock breaking, or `plan --refresh` occurs in Stage 2C |
 | `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns; `--json` reports a `skipped` field) |
 | `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
 | `embed` | offline JSONL embedding pipeline |
````
````diff
@@ -82,19 +82,22 @@ omnigraph cluster plan --config ./company-brain --json
 omnigraph cluster status --config ./company-brain --json
 omnigraph cluster refresh --config ./company-brain --json
 omnigraph cluster import --config ./company-brain --json
+omnigraph cluster force-unlock <LOCK_ID> --config ./company-brain --json
 ```

 `--config` is a directory containing `cluster.yaml`; it defaults to `.`.
-Stage 2B accepts graphs, schemas, stored queries, and policy bundle file
+Stage 2C accepts graphs, schemas, stored queries, and policy bundle file
 references. `cluster plan` reads local JSON state from
 `<config-dir>/__cluster/state.json`; a missing file means empty state. Plan,
 refresh, and import acquire `__cluster/lock.json` by default and release it
 before returning. `cluster status` reads state only and reports any existing
-lock. `refresh` requires an existing `state.json`; `import` creates one only
-when it is missing. Both observe declared graphs read-only at
+lock metadata. `force-unlock` removes a lock only when the supplied id exactly
+matches the lock file. `refresh` requires an existing `state.json`; `import`
+creates one only when it is missing. Both observe declared graphs read-only at
 `<config-dir>/graphs/<graph-id>.omni`. External state backends, apply,
-`plan --refresh`, pipelines, UI specs, embeddings, aliases, and bindings are
-reserved for later stages. See [cluster-config.md](cluster-config.md).
+automatic stale-lock breaking, `plan --refresh`, pipelines, UI specs,
+embeddings, aliases, and bindings are reserved for later stages. See
+[cluster-config.md](cluster-config.md).

 ## Output formats (`query` command, alias: `read`)

````
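The CLI reference hunk fixes the on-disk layout: state lives at `<config-dir>/__cluster/state.json`, a missing file means empty state for `cluster plan`, and graphs are observed at `<config-dir>/graphs/<graph-id>.omni`. The following is a minimal std-only Rust sketch of that path resolution and the missing-means-empty read. The names `state_path`, `graph_path`, and `read_state_raw` are hypothetical, not OmniGraph's API, and JSON parsing of the ledger is deliberately left out.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

/// Hypothetical helper: local JSON state ledger at `<config-dir>/__cluster/state.json`.
fn state_path(config_dir: &Path) -> PathBuf {
    config_dir.join("__cluster").join("state.json")
}

/// Hypothetical helper: read-only observation target for a declared graph,
/// `<config-dir>/graphs/<graph-id>.omni`.
fn graph_path(config_dir: &Path, graph_id: &str) -> PathBuf {
    config_dir.join("graphs").join(format!("{graph_id}.omni"))
}

/// Returns the raw state JSON, or `Ok(None)` when the file does not exist,
/// which `cluster plan` treats as empty state. Any other I/O error (for
/// example a permission problem) is propagated rather than masked.
fn read_state_raw(config_dir: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(state_path(config_dir)) {
        Ok(text) => Ok(Some(text)),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

Only `NotFound` collapses to empty state. Treating every read error as "no state" would let an unreadable ledger silently plan as if nothing were deployed.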
````diff
@@ -1,13 +1,13 @@
 # Cluster Config

-**Status:** Stage 2B state-observation preview.
+**Status:** Stage 2C state-lock recovery preview.

 Cluster config is the future control-plane configuration surface for a whole
 OmniGraph deployment. In this stage, OmniGraph can validate a local
 `cluster.yaml` folder, produce a deterministic read-only plan, inspect the
 local JSON state ledger, and explicitly refresh/import graph observations into
-that ledger. It does not apply desired changes, start servers, or write graph
-resources.
+that ledger. It can also manually remove a held local state lock by exact lock
+id. It does not apply desired changes, start servers, or write graph resources.

 ## Commands

````
````diff
@@ -17,6 +17,7 @@ omnigraph cluster plan --config ./company-brain --json
 omnigraph cluster status --config ./company-brain --json
 omnigraph cluster refresh --config ./company-brain --json
 omnigraph cluster import --config ./company-brain --json
+omnigraph cluster force-unlock <LOCK_ID> --config ./company-brain --json
 ```

 `--config` points at a directory, not a file. The directory must contain
````
````diff
@@ -24,7 +25,7 @@ omnigraph cluster import --config ./company-brain --json

 ## Supported `cluster.yaml`

-Stage 2B accepts only the read-only resource subset:
+Stage 2C accepts only the read-only resource subset:

 ```yaml
 version: 1
````
````diff
@@ -53,7 +54,9 @@ policies:
 defaults to `true`. When enabled, `cluster plan`, `cluster refresh`, and
 `cluster import` briefly acquire `<config-dir>/__cluster/lock.json`, then remove
 it before returning. `cluster status` never acquires the lock; it only reports
-whether one is present.
+whether one is present. `cluster force-unlock` is the only lock-removal command;
+it requires the exact lock id and should be run only after confirming no cluster
+operation is active.

 ## Validation

````
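The lock behaviour in this hunk is simple: plan, refresh, and import acquire `<config-dir>/__cluster/lock.json`, and the lock is removed before the command returns. That pattern maps naturally onto an exclusive file create. The sketch below is an illustration under stated assumptions, not the `omnigraph-cluster` implementation. It uses `OpenOptions::create_new(true)`, which fails with `ErrorKind::AlreadyExists` when the file is already present, and it releases the lock in `Drop`. `StateLock`, the lock-id scheme, and the JSON fields other than `version` and `lock_id` (`operation`, `created_at`, `pid`) are hypothetical. `operation` is written unescaped, so it must be a plain identifier such as `plan`.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical guard for the local state lock; dropping it releases the lock.
struct StateLock {
    path: PathBuf,
    lock_id: String,
}

impl StateLock {
    /// Atomically creates `<config-dir>/__cluster/lock.json`. `create_new(true)`
    /// fails with `ErrorKind::AlreadyExists` while another operation holds it.
    fn acquire(config_dir: &Path, operation: &str) -> io::Result<StateLock> {
        let dir = config_dir.join("__cluster");
        fs::create_dir_all(&dir)?;
        let path = dir.join("lock.json");

        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
        let pid = std::process::id();
        let created_at = now.as_secs();
        // Hypothetical id scheme: pid plus a nanosecond timestamp.
        let lock_id = format!("{pid}-{}", now.as_nanos());

        let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
        // Assumed field names apart from `version` and `lock_id`.
        let body = format!(
            "{{\"version\":1,\"lock_id\":\"{lock_id}\",\"operation\":\"{operation}\",\"created_at\":{created_at},\"pid\":{pid}}}"
        );
        if let Err(e) = file.write_all(body.as_bytes()) {
            // Close the handle, then avoid leaving a half-written lock behind.
            drop(file);
            let _ = fs::remove_file(&path);
            return Err(e);
        }
        Ok(StateLock { path, lock_id })
    }
}

impl Drop for StateLock {
    fn drop(&mut self) {
        // Best-effort release before the command returns.
        let _ = fs::remove_file(&self.path);
    }
}
```

Because release happens in `Drop`, an early return or `?` inside the locked section still removes the file. A killed process does not run `Drop`, and that abandoned-lock case is the one a manual, exact-id `force-unlock` is meant to recover.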
````diff
@@ -115,18 +118,19 @@ Missing `state_revision` is treated as `0`. Resource status values are

 Plan output compares desired resource digests against state resource digests
 and reports `create`, `update`, and `delete` changes. It also reports the state
-CAS (`sha256:<digest>`), state revision, and lock id used for the read. The
-command never writes `state.json` and does not scan live graphs. Use explicit
-`cluster refresh` / `cluster import` when the state ledger should be updated
-from live observations. Apply and live drift scans during plan are later-stage
-work.
+CAS (`sha256:<digest>`), state revision, and lock metadata used for the read.
+The command never writes `state.json` and does not scan live graphs. Use
+explicit `cluster refresh` / `cluster import` when the state ledger should be
+updated from live observations. Apply and live drift scans during plan are
+later-stage work.

 ## Status

 `cluster status` reads the same local JSON state ledger and prints what the
 ledger says is deployed. It does not validate referenced schema/query/policy
 files and does not inspect live graphs. Missing `state.json` succeeds with a
-warning; invalid state JSON or an unsupported state version fails.
+warning; invalid state JSON or an unsupported state version fails. If a lock is
+present, status reports its id, operation, creation time, pid, and age.

 ## Refresh And Import

````
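Plan output compares desired resource digests against state resource digests and reports `create`, `update`, and `delete`. One way to express that comparison is sketched below. It assumes both sides have already been flattened into maps from a resource address to a digest string; the address format (`graph:a` and so on) and the `Change` type are hypothetical. Walking `BTreeMap`s visits keys in sorted order, which keeps the plan deterministic for identical inputs.

```rust
use std::collections::BTreeMap;

/// Hypothetical plan entry, keyed by resource address.
#[derive(Debug, PartialEq, Eq)]
enum Change {
    Create(String),
    Update(String),
    Delete(String),
}

/// Diffs desired digests against state digests. Resources present on both
/// sides with equal digests produce no entry.
fn plan_changes(
    desired: &BTreeMap<String, String>,
    state: &BTreeMap<String, String>,
) -> Vec<Change> {
    let mut changes = Vec::new();
    // Pass 1: desired resources are either new or possibly changed.
    for (addr, digest) in desired {
        match state.get(addr) {
            None => changes.push(Change::Create(addr.clone())),
            Some(old) if old != digest => changes.push(Change::Update(addr.clone())),
            Some(_) => {} // unchanged
        }
    }
    // Pass 2: anything recorded in state but no longer desired is a delete.
    for addr in state.keys() {
        if !desired.contains_key(addr) {
            changes.push(Change::Delete(addr.clone()));
        }
    }
    changes
}
```

This sketch only reads its inputs, which matches the documented behaviour that plan never writes `state.json` and never scans live graphs.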
````diff
@@ -148,3 +152,14 @@ initial state.

 Refresh/import do not observe query or policy resources yet. Existing query and
 policy state digests are preserved on refresh and are not invented on import.
+
+## Force Unlock
+
+`cluster force-unlock <LOCK_ID>` removes `<config-dir>/__cluster/lock.json` only
+when the file exists, is valid version-1 lock JSON, and its `lock_id` exactly
+matches the argument. A wrong id, missing lock, invalid lock JSON, or unsupported
+lock version exits non-zero and leaves the file untouched.
+
+This is manual recovery for abandoned local locks. OmniGraph does not perform
+PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock in
+Stage 2C.
````
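The force-unlock rules are strict. The lock file must exist, parse as version-1 lock JSON, and carry a `lock_id` equal to the argument; otherwise nothing is removed. The following std-only Rust sketch shows that decision order. JSON parsing is passed in as a `parse` callback so the example needs no `serde`. `LockRecord`, `UnlockError`, and `force_unlock` are hypothetical names, not OmniGraph's API. A CLI wrapper would presumably map any `Err` to a non-zero exit.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Hypothetical parsed view of `lock.json`; only the fields the rules need.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LockRecord {
    version: u64,
    lock_id: String,
}

#[derive(Debug, PartialEq, Eq)]
enum UnlockError {
    Missing,
    Invalid(String),
    UnsupportedVersion(u64),
    WrongId { held: String },
    Io(String),
}

/// Removes `<config-dir>/__cluster/lock.json` only if it exists, parses,
/// is version 1, and its `lock_id` equals `requested` exactly. Every error
/// path returns before `remove_file`, so the file is left untouched.
fn force_unlock<F>(config_dir: &Path, requested: &str, parse: F) -> Result<(), UnlockError>
where
    F: Fn(&str) -> Result<LockRecord, String>,
{
    let path = config_dir.join("__cluster").join("lock.json");
    let text = match fs::read_to_string(&path) {
        Ok(t) => t,
        Err(e) if e.kind() == ErrorKind::NotFound => return Err(UnlockError::Missing),
        Err(e) => return Err(UnlockError::Io(e.to_string())),
    };
    let record = parse(&text).map_err(UnlockError::Invalid)?;
    if record.version != 1 {
        return Err(UnlockError::UnsupportedVersion(record.version));
    }
    // Exact, case-sensitive comparison: no prefix or fuzzy matching.
    if record.lock_id != requested {
        return Err(UnlockError::WrongId { held: record.lock_id });
    }
    fs::remove_file(&path).map_err(|e| UnlockError::Io(e.to_string()))
}
```

The sketch reads the file and then removes it without re-checking in between. It therefore relies on the documented precondition that no cluster operation is active when force-unlock runs.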