mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
* feat(engine): sweep legacy __run__ branches via v2→v3 manifest migration Pre-v0.4.0 graphs can carry stale `__run__<id>` staging branches on the `__manifest` dataset, left by the Run state machine removed in MR-771. Lance's `list_branches` still enumerates them, so they leak into `branch_list()` and count as blocking branches at schema-apply time. Add a one-time `migrate_v2_to_v3` arm to the internal-schema dispatcher: on the first read-write open it enumerates `__manifest` branches, deletes every `__run__*` ref, and bumps the stamp to 3. Idempotent under retry (re-enumerates fresh each run). The `"__run__"` prefix is inlined so the migration does not depend on the run_registry guard that MR-770 removes next. This is the prerequisite sweep; the guard removal follows in the next commit. * refactor(engine): remove the legacy __run__ branch guard (MR-770) With the v2→v3 migration sweeping stale `__run__*` branches off `__manifest` on first read-write open, the defense-in-depth `is_internal_run_branch` guard is no longer needed. - delete `db/run_registry.rs`; drop the module + re-export from `db/mod.rs` - collapse `is_internal_system_branch` to the schema-apply-lock check only - `ensure_public_branch_ref`: drop the run-ref rejection; `__run__*` is now an ordinary branch name - `branch_merge`: reject `is_internal_system_branch` (was run-only) so the schema-apply lock is rejected consistently with create/delete — a small, deliberate tightening - update the inline schema-apply test + the writes integration tests (`public_branch_apis_reject_internal_run_refs` → `public_branch_apis_reject_internal_system_refs`, which also asserts `__run__*` now creates successfully) - docs: flip the "pending production sweep / defense-in-depth" notes to "auto-swept by the v2→v3 migration"; document the read-only-open limitation Known residual: the inert `_graph_runs.lance` / `_graph_run_actors.lance` bytes remain until a `StorageAdapter::delete_prefix` primitive lands. * fix(engine): run __run__ sweep at Omnigraph::open, not only on publish Review (PR #132) caught a regression: removing __run__ from `is_internal_system_branch` exposed legacy `__run__*` branches to the schema-apply blocking-branch checks (schema_apply.rs:104 and :778) and to `branch_list()`, but the v2→v3 sweep ran only inside the publisher's `load_publish_state`. On a pre-v0.4.0 graph whose first write is a schema apply, the blocking-branch check fires before any publish, so apply failed with "found non-main branches: __run__…". The same lazy timing also created a reverse hazard: a user-created `__run__*` branch on a still-v2 graph could be deleted by the first publish's sweep. Fix: run the internal-schema migration in `Omnigraph::open(ReadWrite)` (new `manifest::migrate_on_open`), before the coordinator reads branch state. The sweep now lands before any branch-observing code, and a graph is stamped v3 at open — so the one-time sweep can never catch a legitimately-created branch. Both checks and `branch_list` see the swept graph; correct by construction for every write path. Accepted residual: a read-only open of an unmigrated legacy graph still lists `__run__*` (read-only opens must not write, so they can't sweep). Documented. Regression test `legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply` confirmed RED before the fix (panicked on the branch_list leak assertion) and GREEN after. Also updates the stale schema_apply.rs comment, the writes.md "Migration code" section, and adds the v3 row to storage.md's migration table. * test(engine): sweep multiple legacy __run__ branches; doc nit Strengthen the v2→v3 migration test to synthesize three `__run__*` branches (a real legacy graph accumulates one per run) so the migration's delete loop is exercised on a single reused dataset handle, not just a single branch. Confirms multi-branch deletion is safe. Also drop a stale "active runs" reference from the branch_delete doc line. * fix(engine): force-delete in __run__ sweep for concurrency safety `migrate_v2_to_v3` ran `Dataset::delete_branch` (= `branches().delete(.., false)`), which errors "BranchContents not found" if the branch is already gone. Since the sweep now runs in `Omnigraph::open(ReadWrite)`, two processes opening the same legacy v2 graph concurrently would race: one wins each delete, the other's open fails. The migration only claimed idempotency under *sequential* retry. Switch to `Dataset::force_delete_branch` (= `delete(.., true)`), Lance's documented path for cleaning up zombie branches, which tolerates an already-absent branch. The sweep is now idempotent under concurrent runners and robust to partial/zombie state. Found in self-review; no behavior change for the common single-open path. * docs(release): note MR-770 __run__ cleanup in v0.6.1 * docs(branches): reconcile branch cleanup semantics
116 lines
11 KiB
Markdown
116 lines
11 KiB
Markdown
# Storage
|
||
|
||
## L1 — Lance dataset (per node/edge type)
|
||
|
||
Every node type and every edge type is its own Lance dataset:
|
||
|
||
- **Columnar Arrow storage**: each property is a column; nullable per Arrow schema.
|
||
- **Fragments**: data is partitioned into fragments; new writes create new fragments.
|
||
- **Manifest versioning**: every commit produces a new dataset version; old versions remain readable.
|
||
- **Stable row IDs**: `enable_stable_row_ids: true` is set on every Lance dataset OmniGraph creates — node and edge data tables, `__manifest`, `_graph_commits.lance`, `_graph_commit_recoveries.lance`, and any future system tables. This is an architectural invariant: the flag is one-way at dataset create per Lance's row-id-lineage spec, so a future change that introduces a Lance dataset must preserve it. Consequences: `_row_created_at_version` and `_row_last_updated_at_version` are available on every dataset (load-bearing for change-feed validators); `CreateIndex × Rewrite` is not a retryable conflict, so indices survive `omnigraph optimize` without needing the Fragment Reuse Index; readers must use a Lance build that recognises the flag (our pinned 4.0.0 is fine). Pre-0.4.x graphs created before this code path settled may have datasets without the flag and cannot be retrofitted in place — the supported path is dump-and-reload. The `stage_overwrite` rewrite path (used by `schema_apply`) preserves the flag through `Operation::Overwrite`; pinned by `stage_overwrite_preserves_stable_row_ids` in `crates/omnigraph/tests/staged_writes.rs`.
|
||
- **Append / delete / `merge_insert`**: native Lance write modes.
|
||
- **Per-dataset branches** (Lance native): copy-on-write at the dataset level.
|
||
- **Object-store agnostic**: file://, s3://, gs://, az://, http (read-only via Lance) — OmniGraph wires file:// and s3:// (`storage.rs`).
|
||
|
||
## L2 — Multi-dataset coordination via `__manifest`
|
||
|
||
OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordinated through one append-only manifest table.
|
||
|
||
- **Manifest table**: `__manifest/` Lance dataset.
|
||
- **Layout** (`db/manifest/layout.rs`, `db/manifest/state.rs`):
|
||
- `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
|
||
- `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
|
||
- `__manifest/` — the catalog of all sub-tables and their published versions
|
||
- `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
|
||
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed in MR-771. The v2→v3 manifest migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a `delete_prefix` storage primitive lands)
|
||
- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
|
||
- `object_type` ∈ `table | table_version | table_tombstone`
|
||
- `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
|
||
- `table_branch` is `null` for the main lineage and the branch name otherwise
|
||
- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones — rows where `object_type = table_tombstone`, whose own `table_version` (acting as the tombstone version) is `>= the entry's table_version`.
|
||
- **Atomic publish**: multi-dataset commits publish via a `ManifestBatchPublisher` so a single write to `__manifest` flips all the new sub-table versions visible at once.
|
||
- **Row-level CAS on the merge-insert join key**: `object_id` carries `lance-schema:unenforced-primary-key=true` so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates of `version:T@v=N` from racing publishers (see `.context/merge-insert-cas-granularity.md`).
|
||
- **Optimistic concurrency control on publish**: `ManifestBatchPublisher::publish` accepts a `expected_table_versions: HashMap<table_key, u64>` map. Each entry asserts the manifest's current latest non-tombstoned version for that table is exactly what the caller observed; mismatches surface as `OmniError::Manifest` with `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }`. Empty map preserves the legacy "best-effort publish" semantics. The publisher uses `conflict_retries(0)` against Lance and owns retry itself (`PUBLISHER_RETRY_BUDGET = 5`), re-running the pre-check on each iteration so concurrent advances surface as `ExpectedVersionMismatch` rather than being silently rebased through.
|
||
|
||
### Internal schema versioning (`db/manifest/migrations.rs`)
|
||
|
||
The on-disk shape of `__manifest` is reconciled with the binary via a single stamp + dispatcher. `INTERNAL_MANIFEST_SCHEMA_VERSION` declares the shape this binary writes; the on-disk stamp `omnigraph:internal_schema_version` lives in the manifest dataset's schema-level metadata (Lance `update_schema_metadata`).
|
||
|
||
- **`init_manifest_graph`** stamps the current version at creation, so newly initialized graphs never need migration.
|
||
- **Publisher open-for-write path** (`load_publish_state`) calls `migrate_internal_schema(&mut dataset)` before reading state. When the on-disk stamp matches the binary, this is a single metadata read with no writes; otherwise the dispatcher walks `match`-arm steps forward (1→2, 2→3, …) until the stamp matches, then proceeds with the publish. Reads stay side-effect-free.
|
||
- **Forward-version protection**: a stamp *higher* than the binary's known version triggers a clear "upgrade omnigraph first" error. An old binary cannot clobber a newer schema by silently treating "unknown stamp" as "missing stamp".
|
||
- **Idempotency**: each migration step is safe to re-run. A crash between two metadata updates inside a single step leaves the partial state; the next open re-runs the step and the second update lands. The dispatcher itself is a cheap stamp-read on the steady-state path.
|
||
|
||
Adding a new on-disk shape change is one constant bump (`INTERNAL_MANIFEST_SCHEMA_VERSION`), one match arm in `migrate_internal_schema`, and one test. No code outside this module branches on the stamp.
|
||
|
||
| Stamp | Shape change |
|
||
|---|---|
|
||
| v1 (implicit, pre-stamp) | `__manifest.object_id` had no PK annotation; publisher had no row-level CAS protection. |
|
||
| v2 | `__manifest.object_id` carries `lance-schema:unenforced-primary-key=true`; row-level CAS engaged. Stamped as `omnigraph:internal_schema_version=2`. |
|
||
| v3 | One-time sweep of legacy `__run__*` staging branches (pre-v0.4.0 Run state machine, removed MR-771) off `__manifest`. Runs at `Omnigraph::open(ReadWrite)` and on publish. Stamped as `omnigraph:internal_schema_version=3`. |
|
||
|
||
## On-disk layout
|
||
|
||
A graph on disk is a directory tree of Lance datasets. Each dataset follows the standard Lance layout (`_versions/`, `data/`, `_indices/`, `_refs/`); OmniGraph adds the multi-dataset coordination by keeping `__manifest/` alongside the per-type datasets.
|
||
|
||
```mermaid
|
||
flowchart TB
|
||
classDef l1 fill:#fef3e8,stroke:#c46900,color:#000
|
||
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
|
||
|
||
graph["graph URI<br/>file:// or s3://bucket/prefix"]:::l2
|
||
|
||
manifest["__manifest/<br/>L2 catalog of sub-tables"]:::l2
|
||
nodes["nodes/{fnv1a64-hex}/<br/>one dataset per node type"]:::l2
|
||
edges["edges/{fnv1a64-hex}/<br/>one dataset per edge type"]:::l2
|
||
cgraph["_graph_commits.lance/<br/>_graph_commit_actors.lance/<br/>_graph_commit_recoveries.lance/"]:::l2
|
||
recovery["__recovery/{ulid}.json<br/>recovery sidecars (transient)"]:::l2
|
||
refs["_refs/branches/{name}.json<br/>graph-level branches"]:::l2
|
||
|
||
graph --> manifest
|
||
graph --> nodes
|
||
graph --> edges
|
||
graph --> cgraph
|
||
graph --> recovery
|
||
graph --> refs
|
||
|
||
subgraph dataset[Inside each Lance dataset — L1]
|
||
ds_v["_versions/{n}.manifest<br/>per-dataset versions"]:::l1
|
||
ds_data["data/<br/>fragment files (Arrow IPC)"]:::l1
|
||
ds_idx["_indices/{uuid}/<br/>BTREE · Inverted FTS · IVF/HNSW"]:::l1
|
||
ds_refs["_refs/<br/>per-dataset Lance branches/tags"]:::l1
|
||
ds_tx["_transactions/<br/>commit transaction logs"]:::l1
|
||
end
|
||
|
||
nodes -.-> dataset
|
||
edges -.-> dataset
|
||
manifest -.-> dataset
|
||
```
|
||
|
||
**What's where:**
|
||
|
||
- **Graph root** is one directory (or S3 prefix). Everything below is part of one OmniGraph graph.
|
||
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
|
||
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
|
||
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the v2→v3 migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once `delete_prefix` lands.)
|
||
- **`_graph_commit_recoveries.lance`** — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
|
||
- **`__recovery/{ulid}.json`** — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
|
||
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
|
||
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
|
||
|
||
The split — L2 owns the cross-dataset catalog; L1 owns the per-dataset internals — means that schema work (which adds or removes datasets) updates `__manifest`, while data work (which adds fragments) updates `_versions/` inside the affected dataset and then bumps `__manifest`.
|
||
|
||
## URI scheme support (`storage.rs`)
|
||
|
||
| Scheme | Backend | Notes |
|
||
|---|---|---|
|
||
| local path / `file://` | `LocalStorageAdapter` (tokio) | Normalized to absolute paths |
|
||
| `s3://bucket/prefix` | `S3StorageAdapter` (object_store) | Honors `AWS_ENDPOINT_URL_S3`, `AWS_ALLOW_HTTP`, `AWS_S3_FORCE_PATH_STYLE` |
|
||
| `http(s)://host:port` | HTTP client to `omnigraph-server` | Used by CLI as a target, not a storage backend |
|
||
|
||
## Object-store env vars (S3-compatible)
|
||
|
||
- `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`
|
||
- `AWS_ENDPOINT_URL`, `AWS_ENDPOINT_URL_S3` — for MinIO / RustFS / GCS-via-XML
|
||
- `AWS_S3_FORCE_PATH_STYLE=true` — path-style URLs
|
||
- `AWS_ALLOW_HTTP=true` — allow plain HTTP (local dev)
|