Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-09 01:35:18 +02:00
* feat(engine): sweep legacy `__run__` branches via v2→v3 manifest migration

  Pre-v0.4.0 graphs can carry stale `__run__<id>` staging branches on the `__manifest` dataset, left by the Run state machine removed in MR-771. Lance's `list_branches` still enumerates them, so they leak into `branch_list()` and count as blocking branches at schema-apply time.

  Add a one-time `migrate_v2_to_v3` arm to the internal-schema dispatcher: on the first read-write open it enumerates `__manifest` branches, deletes every `__run__*` ref, and bumps the stamp to 3. Idempotent under retry (re-enumerates fresh each run). The `"__run__"` prefix is inlined so the migration does not depend on the run_registry guard that MR-770 removes next.

  This is the prerequisite sweep; the guard removal follows in the next commit.

* refactor(engine): remove the legacy `__run__` branch guard (MR-770)

  With the v2→v3 migration sweeping stale `__run__*` branches off `__manifest` on first read-write open, the defense-in-depth `is_internal_run_branch` guard is no longer needed.

  - delete `db/run_registry.rs`; drop the module + re-export from `db/mod.rs`
  - collapse `is_internal_system_branch` to the schema-apply-lock check only
  - `ensure_public_branch_ref`: drop the run-ref rejection; `__run__*` is now an ordinary branch name
  - `branch_merge`: reject `is_internal_system_branch` (was run-only) so the schema-apply lock is rejected consistently with create/delete (a small, deliberate tightening)
  - update the inline schema-apply test + the writes integration tests (`public_branch_apis_reject_internal_run_refs` → `public_branch_apis_reject_internal_system_refs`, which also asserts `__run__*` now creates successfully)
  - docs: flip the "pending production sweep / defense-in-depth" notes to "auto-swept by the v2→v3 migration"; document the read-only-open limitation

  Known residual: the inert `_graph_runs.lance` / `_graph_run_actors.lance` bytes remain until a `StorageAdapter::delete_prefix` primitive lands.
* fix(engine): run `__run__` sweep at `Omnigraph::open`, not only on publish

  Review (PR #132) caught a regression: removing `__run__` from `is_internal_system_branch` exposed legacy `__run__*` branches to the schema-apply blocking-branch checks (schema_apply.rs:104 and :778) and to `branch_list()`, but the v2→v3 sweep ran only inside the publisher's `load_publish_state`. On a pre-v0.4.0 graph whose first write is a schema apply, the blocking-branch check fires before any publish, so apply failed with `"found non-main branches: __run__…"`.

  The same lazy timing also created a reverse hazard: a user-created `__run__*` branch on a still-v2 graph could be deleted by the first publish's sweep.

  Fix: run the internal-schema migration in `Omnigraph::open(ReadWrite)` (new `manifest::migrate_on_open`), before the coordinator reads branch state. The sweep now lands before any branch-observing code, and a graph is stamped v3 at open, so the one-time sweep can never catch a legitimately-created branch. Both checks and `branch_list` see the swept graph; correct by construction for every write path.

  Accepted residual: a read-only open of an unmigrated legacy graph still lists `__run__*` (read-only opens must not write, so they can't sweep). Documented.

  Regression test `legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply` confirmed RED before the fix (panicked on the branch_list leak assertion) and GREEN after. Also updates the stale schema_apply.rs comment, the writes.md "Migration code" section, and adds the v3 row to storage.md's migration table.

* test(engine): sweep multiple legacy `__run__` branches; doc nit

  Strengthen the v2→v3 migration test to synthesize three `__run__*` branches (a real legacy graph accumulates one per run) so the migration's delete loop is exercised on a single reused dataset handle, not just a single branch. Confirms multi-branch deletion is safe.

  Also drop a stale "active runs" reference from the branch_delete doc line.
* fix(engine): force-delete in `__run__` sweep for concurrency safety

  `migrate_v2_to_v3` ran `Dataset::delete_branch` (= `branches().delete(.., false)`), which errors "BranchContents not found" if the branch is already gone. Since the sweep now runs in `Omnigraph::open(ReadWrite)`, two processes opening the same legacy v2 graph concurrently would race: one wins each delete, the other's open fails. The migration only claimed idempotency under *sequential* retry.

  Switch to `Dataset::force_delete_branch` (= `delete(.., true)`), Lance's documented path for cleaning up zombie branches, which tolerates an already-absent branch. The sweep is now idempotent under concurrent runners and robust to partial/zombie state.

  Found in self-review; no behavior change for the common single-open path.

* docs(release): note MR-770 `__run__` cleanup in v0.6.1

* docs(branches): reconcile branch cleanup semantics
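The sweep these commits describe can be sketched as follows. This is a minimal Rust model, not the engine's code. The `ManifestState` type is a hypothetical in-memory stand-in for the `__manifest` dataset, not a Lance type. `OpenMode`, `open`, `delete_branch`, and `force_delete_branch` here are likewise illustrative stand-ins, not Lance's API. The sketch assumes earlier internal-schema steps have already run. It shows three properties the commits rely on:

- The sweep runs only on read-write opens.
- It fires only while the stamp is 2, so `__run__*` names created later survive.
- A forced delete tolerates a branch that another opener already removed.

```rust
use std::collections::BTreeSet;

/// Legacy staging-branch prefix of the retired Run state machine,
/// inlined so the sweep does not depend on any removed guard module.
const LEGACY_RUN_PREFIX: &str = "__run__";

/// Hypothetical in-memory stand-in for the `__manifest` dataset: the
/// internal-schema stamp plus the set of branch names (not a Lance type).
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestState {
    pub stamp: u32,
    pub branches: BTreeSet<String>,
}

/// Hypothetical open mode, mirroring the read-only / read-write split.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpenMode {
    ReadOnly,
    ReadWrite,
}

impl ManifestState {
    /// Strict delete: errors if the branch is already gone, like a
    /// non-forced delete that reports "BranchContents not found".
    pub fn delete_branch(&mut self, name: &str) -> Result<(), String> {
        if self.branches.remove(name) {
            Ok(())
        } else {
            Err(format!("BranchContents not found: {name}"))
        }
    }

    /// Force delete: an already-absent branch is not an error, so a
    /// concurrent opener that lost the race still succeeds.
    pub fn force_delete_branch(&mut self, name: &str) {
        self.branches.remove(name);
    }
}

/// One-time v2 -> v3 sweep: enumerate branches fresh on every run,
/// force-delete each `__run__*` ref, then bump the stamp to 3.
pub fn migrate_v2_to_v3(state: &mut ManifestState) {
    if state.stamp != 2 {
        // Not at v2 (e.g. already v3): leave every branch untouched.
        return;
    }
    let stale: Vec<String> = state
        .branches
        .iter()
        .filter(|name| name.starts_with(LEGACY_RUN_PREFIX))
        .cloned()
        .collect();
    for name in &stale {
        state.force_delete_branch(name);
    }
    state.stamp = 3;
}

/// Runs the migration at open time, before anything reads branch state.
/// Read-only opens must not write, so they skip the sweep entirely.
/// Returns the branch names a subsequent `branch_list()` would report.
pub fn open(state: &mut ManifestState, mode: OpenMode) -> Vec<String> {
    if mode == OpenMode::ReadWrite {
        migrate_v2_to_v3(state);
    }
    state.branches.iter().cloned().collect()
}
```

Calling `open` again in `ReadWrite` mode on an already-migrated state is a no-op because the stamp is no longer 2. This models the retry idempotency the commits describe.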
Omnigraph v0.6.1
v0.6.1 focuses on operational polish after v0.6.0: stored-query registries, safer branch cleanup, more complete release artifacts, and a Lance blob-compaction workaround.
Highlights
- Stored-query registries. `omnigraph.yaml` can declare curated `queries:` blocks per graph. Servers load and type-check them at startup, `omnigraph queries validate` checks them offline, `omnigraph queries list` shows exposed queries and typed params, `GET /queries` exposes a typed catalog, and `POST /queries/{name}` invokes a stored query without accepting ad hoc `.gq` source from the client.
- Stored-query policy gate. New Cedar action `invoke_query` gates the stored-query invocation surface. Stored mutations are double-gated: `invoke_query` to reach the stored query and `change` for the actual write.
- Safer branch deletion. `branch_delete` now treats the manifest as the authority, flips branch visibility atomically, and reclaims per-table/commit-graph forks as derived state. If best-effort reclaim is interrupted, `cleanup` reconciles orphaned forks; reusing a branch name before cleanup reports an actionable error.
- Legacy `__run__` cleanup (MR-770). Removed the last functional remnant of the Run state machine (retired in v0.4.0): the `__run__` branch-name guard. A new v2→v3 `__manifest` internal-schema migration sweeps any stale `__run__*` staging branches on the first read-write open, so `__run__*` is no longer a reserved branch name. This closes the "unpromoted `__run__` branches block reads" condition behind the zombie-run cascade incident; the inert `_graph_runs.lance` row cleanup is tracked separately (it needs a `delete_prefix` primitive).
- Blob-safe optimize. `omnigraph optimize` skips tables with `Blob` properties instead of failing the whole sweep on Lance's blob-v2 compaction decode bug. Skips are visible in human output, in `--json` as `skipped`, in `TableOptimizeStats.skipped`, and in logs; non-blob tables still compact normally.
- Deployment improvements. The container entrypoint now composes `OMNIGRAPH_TARGET_URI` with `OMNIGRAPH_CONFIG`, so operators can keep the graph URI in env while loading policy/query config from a mounted file. The local RustFS bootstrap pins RustFS beta.3 and allows the current insecure local-dev default credentials.
- Windows release support. Tagged and edge releases now publish Windows x86_64 archives containing `omnigraph.exe` and `omnigraph-server.exe`, with a PowerShell installer and Windows install docs.
- Release tooling. Homebrew formula generation was tightened to produce audit-clean formulas.
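The double gate on stored queries can be sketched roughly as follows. This is a minimal Rust illustration, not Omnigraph's authorizer. `Action`, `Policy`, `Denied`, and `authorize_stored_query` are hypothetical names, and a flat allow-set stands in for real Cedar evaluation, which would also consider the principal, graph, and branch. The outer `invoke_query` check only decides whether the stored query can be reached. The inner check is `change` for stored mutations and `read` for stored reads.

```rust
use std::collections::HashSet;

/// Hypothetical action names mirroring the Cedar actions in these notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    InvokeQuery, // reach a stored query (graph-scoped)
    Read,        // inner gate for stored reads (branch/snapshot access)
    Change,      // inner gate for the actual write of a stored mutation
}

/// Hypothetical decision source: a flat set of permitted actions.
pub struct Policy {
    allowed: HashSet<Action>,
}

impl Policy {
    pub fn allowing(actions: &[Action]) -> Self {
        Policy { allowed: actions.iter().copied().collect() }
    }

    fn permits(&self, action: Action) -> bool {
        self.allowed.contains(&action)
    }
}

/// Which gate rejected the request.
#[derive(Debug, PartialEq)]
pub enum Denied {
    InvokeQuery,
    Inner(Action),
}

/// Two-gate check for invoking a stored query: first `invoke_query`,
/// then `change` for mutations or `read` for reads.
pub fn authorize_stored_query(policy: &Policy, is_mutation: bool) -> Result<(), Denied> {
    if !policy.permits(Action::InvokeQuery) {
        return Err(Denied::InvokeQuery);
    }
    let inner = if is_mutation { Action::Change } else { Action::Read };
    if !policy.permits(inner) {
        return Err(Denied::Inner(inner));
    }
    Ok(())
}
```

Keeping the gates separate means that granting `invoke_query` never implies write access. A principal still needs `change` for any stored mutation to take effect.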
Compatibility Notes
- A graph selected by name (`--target` or `server.graph`) now uses `graphs.<name>.policy` and `graphs.<name>.queries`. Top-level `policy`/`queries` blocks are only for anonymous bare-URI single-graph mode; using them with a named graph now fails loudly with migration guidance.
- `mcp.expose` defaults to `true` for stored-query registry entries. Set `mcp: { expose: false }` for service-only queries that should not appear in the catalog.
- `invoke_query` is graph-scoped, not branch-scoped. Branch/snapshot access remains enforced by the inner `read`/`change` gate.
- Legacy `__run__` migration. Graphs created before v0.4.0 are migrated automatically on the first read-write open by a v0.6.1 binary (a one-time `__manifest` stamp v2→v3 sweep of stale `__run__*` branches). No action is required. Two caveats: (1) a graph opened read-only still lists any stale `__run__*` branch until its first read-write open, because the migration is write-path-only like all manifest migrations, so long-lived read-only deployments should be opened read-write once after upgrading; (2) the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset bytes are left in place until a future `delete_prefix` primitive lands (they are invisible to graph-level state).
- Blob tables are not compacted until the upstream Lance fix lands, so fragment count and deleted-row space on blob tables are not reclaimed by `optimize`. Reads, writes, and query results are unaffected; no on-disk migration is required.
- `TableOptimizeStats` is now `#[non_exhaustive]` and gains a `skipped: Option<SkipReason>` field; the new `SkipReason` enum is also `#[non_exhaustive]`. This is a source-level change only for downstream code that built this returned result struct by literal. That is rare, since the struct is produced by `optimize` and consumed by reading its fields. Field access is unaffected, and `#[non_exhaustive]` keeps future additions non-breaking.
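The notes say the blob skip surfaces in `TableOptimizeStats.skipped`. A rough Rust sketch of how that might look follows. Only `TableOptimizeStats`, `skipped`, and `SkipReason` come from these notes. The `BlobColumns` variant, the `table` and `compacted` fields, the `Table` type, and both functions are hypothetical, and the actual compaction is elided. Outside the defining crate, `#[non_exhaustive]` forbids building the struct by literal and requires a wildcard arm when matching `SkipReason`. For that reason the consumer below only reads fields.

```rust
/// Hypothetical skip reason; the real variants are not listed in these notes.
#[derive(Debug, Clone, PartialEq)]
#[non_exhaustive]
pub enum SkipReason {
    BlobColumns,
}

/// Per-table result reduced to the fields this sketch needs; the real
/// struct has other fields, and `table`/`compacted` are assumptions.
#[derive(Debug, Clone, PartialEq)]
#[non_exhaustive]
pub struct TableOptimizeStats {
    pub table: String,
    pub compacted: bool,
    pub skipped: Option<SkipReason>,
}

/// Hypothetical table descriptor: name plus whether any property is `Blob`.
pub struct Table {
    pub name: String,
    pub has_blob_property: bool,
}

/// Optimize sweep that records a skip for blob tables instead of failing
/// the whole run; non-blob tables are reported as compacted.
pub fn optimize_all(tables: &[Table]) -> Vec<TableOptimizeStats> {
    tables
        .iter()
        .map(|t| {
            if t.has_blob_property {
                TableOptimizeStats {
                    table: t.name.clone(),
                    compacted: false,
                    skipped: Some(SkipReason::BlobColumns),
                }
            } else {
                // Real compaction would happen here; it is elided.
                TableOptimizeStats { table: t.name.clone(), compacted: true, skipped: None }
            }
        })
        .collect()
}

/// Downstream consumer: reads fields only, which stays source-compatible
/// with `#[non_exhaustive]` across crates.
pub fn skipped_tables(stats: &[TableOptimizeStats]) -> Vec<&str> {
    stats
        .iter()
        .filter(|s| s.skipped.is_some())
        .map(|s| s.table.as_str())
        .collect()
}
```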
Docs And Cleanup
- Public docs were updated for stored queries, policy, server routes, deployment, Windows installation, branch deletion, maintenance, and the `runs` docs rename to `writes`.
- README copy and release documentation were refreshed; older release notes had small typo/wording fixes.