mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
* feat(engine): sweep legacy __run__ branches via v2→v3 manifest migration Pre-v0.4.0 graphs can carry stale `__run__<id>` staging branches on the `__manifest` dataset, left by the Run state machine removed in MR-771. Lance's `list_branches` still enumerates them, so they leak into `branch_list()` and count as blocking branches at schema-apply time. Add a one-time `migrate_v2_to_v3` arm to the internal-schema dispatcher: on the first read-write open it enumerates `__manifest` branches, deletes every `__run__*` ref, and bumps the stamp to 3. Idempotent under retry (re-enumerates fresh each run). The `"__run__"` prefix is inlined so the migration does not depend on the run_registry guard that MR-770 removes next. This is the prerequisite sweep; the guard removal follows in the next commit. * refactor(engine): remove the legacy __run__ branch guard (MR-770) With the v2→v3 migration sweeping stale `__run__*` branches off `__manifest` on first read-write open, the defense-in-depth `is_internal_run_branch` guard is no longer needed. - delete `db/run_registry.rs`; drop the module + re-export from `db/mod.rs` - collapse `is_internal_system_branch` to the schema-apply-lock check only - `ensure_public_branch_ref`: drop the run-ref rejection; `__run__*` is now an ordinary branch name - `branch_merge`: reject `is_internal_system_branch` (was run-only) so the schema-apply lock is rejected consistently with create/delete — a small, deliberate tightening - update the inline schema-apply test + the writes integration tests (`public_branch_apis_reject_internal_run_refs` → `public_branch_apis_reject_internal_system_refs`, which also asserts `__run__*` now creates successfully) - docs: flip the "pending production sweep / defense-in-depth" notes to "auto-swept by the v2→v3 migration"; document the read-only-open limitation Known residual: the inert `_graph_runs.lance` / `_graph_run_actors.lance` bytes remain until a `StorageAdapter::delete_prefix` primitive lands. * fix(engine): run __run__ sweep at Omnigraph::open, not only on publish Review (PR #132) caught a regression: removing __run__ from `is_internal_system_branch` exposed legacy `__run__*` branches to the schema-apply blocking-branch checks (schema_apply.rs:104 and :778) and to `branch_list()`, but the v2→v3 sweep ran only inside the publisher's `load_publish_state`. On a pre-v0.4.0 graph whose first write is a schema apply, the blocking-branch check fires before any publish, so apply failed with "found non-main branches: __run__…". The same lazy timing also created a reverse hazard: a user-created `__run__*` branch on a still-v2 graph could be deleted by the first publish's sweep. Fix: run the internal-schema migration in `Omnigraph::open(ReadWrite)` (new `manifest::migrate_on_open`), before the coordinator reads branch state. The sweep now lands before any branch-observing code, and a graph is stamped v3 at open — so the one-time sweep can never catch a legitimately-created branch. Both checks and `branch_list` see the swept graph; correct by construction for every write path. Accepted residual: a read-only open of an unmigrated legacy graph still lists `__run__*` (read-only opens must not write, so they can't sweep). Documented. Regression test `legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply` confirmed RED before the fix (panicked on the branch_list leak assertion) and GREEN after. Also updates the stale schema_apply.rs comment, the writes.md "Migration code" section, and adds the v3 row to storage.md's migration table. * test(engine): sweep multiple legacy __run__ branches; doc nit Strengthen the v2→v3 migration test to synthesize three `__run__*` branches (a real legacy graph accumulates one per run) so the migration's delete loop is exercised on a single reused dataset handle, not just a single branch. Confirms multi-branch deletion is safe. Also drop a stale "active runs" reference from the branch_delete doc line. * fix(engine): force-delete in __run__ sweep for concurrency safety `migrate_v2_to_v3` ran `Dataset::delete_branch` (= `branches().delete(.., false)`), which errors "BranchContents not found" if the branch is already gone. Since the sweep now runs in `Omnigraph::open(ReadWrite)`, two processes opening the same legacy v2 graph concurrently would race: one wins each delete, the other's open fails. The migration only claimed idempotency under *sequential* retry. Switch to `Dataset::force_delete_branch` (= `delete(.., true)`), Lance's documented path for cleaning up zombie branches, which tolerates an already-absent branch. The sweep is now idempotent under concurrent runners and robust to partial/zombie state. Found in self-review; no behavior change for the common single-open path. * docs(release): note MR-770 __run__ cleanup in v0.6.1 * docs(branches): reconcile branch cleanup semantics
1.5 KiB
1.5 KiB
Constants & Tunables (cheat sheet)
| Name | Value | Where |
|---|---|---|
MANIFEST_DIR |
__manifest |
db/manifest/layout.rs |
| Commit graph dir | _graph_commits.lance |
db/commit_graph.rs |
| Run registry dir (legacy, removed MR-771) | _graph_runs.lance |
inert post-v0.4.0; bytes remain until a delete_prefix primitive lands |
| Run branch prefix (legacy, removed MR-771/MR-770) | __run__ |
swept off __manifest by the v2→v3 migration; no longer a reserved name |
| Schema apply lock | __schema_apply_lock__ |
db/mod.rs |
| Manifest publisher retry budget | PUBLISHER_RETRY_BUDGET = 5 |
db/manifest/publisher.rs |
| Internal manifest schema version | INTERNAL_MANIFEST_SCHEMA_VERSION = 3 |
db/manifest/migrations.rs |
| Merge stage batch | MERGE_STAGE_BATCH_ROWS = 8192 |
exec/merge.rs |
| Maintenance concurrency | OMNIGRAPH_MAINTENANCE_CONCURRENCY=8 |
db/omnigraph/optimize.rs |
| Lance blob compaction support | LANCE_SUPPORTS_BLOB_COMPACTION = false |
db/omnigraph/optimize.rs |
| Graph index cache size | 8 (LRU) |
runtime_cache.rs |
| Default body limit | 1 MB |
omnigraph-server/lib.rs |
| Ingest body limit | 32 MB |
omnigraph-server/lib.rs |
| Engine embed model | gemini-embedding-2-preview |
omnigraph/embedding.rs |
| Compiler embed model | text-embedding-3-small |
omnigraph-compiler/embedding.rs |
| Embed timeout | 30 000 ms |
both clients |
| Embed retries | 4 |
both clients |
| Embed retry backoff | 200 ms |
both clients |
| LANCE memory pool default | 1 GB (raised in v0.3.0) |
runtime |