mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-27 02:39:38 +02:00
Merge branch 'main' into ragnorc/omnigraph-mcp-crate
Folds in v0.7.2 (release #301) + RFC-013 Phase 7 (graph lineage in __manifest, internal schema v3→v4 migration #299; WriteTxn #298; recovery convergence #296) under the MCP branch. Conflict resolutions (2 files): - crates/omnigraph-server/Cargo.toml: take main's 0.7.2 path-dep constraints; keep our omnigraph-mcp dep (bumped to 0.7.2). - docs/releases/v0.8.0.md (add/add): both branches drafted v0.8.0 notes for the same next minor — combined them. v0.8.0 now documents BOTH the MCP surface (ours) and main's __manifest lineage fold + the breaking internal-schema-v4 upgrade-order requirement (kept prominent under Upgrade notes). Corrected our 'no breaking changes / on-disk format unchanged' line, which the v4 migration makes false. Coherence: omnigraph-mcp [package] + Cargo.lock bumped 0.7.1→0.7.2; openapi.json auto-merged to info.version 0.7.2 (no API-surface drift from the incoming engine-internal commits). Verification deferred to CI (no local rebuild).
This commit is contained in:
commit
4d4c2164de
62 changed files with 5898 additions and 1053 deletions
|
|
@ -20,13 +20,14 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin
|
|||
- **Layout**:
|
||||
- `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
|
||||
- `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
|
||||
- `__manifest/` — the catalog of all sub-tables and their published versions
|
||||
- `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
|
||||
- `__manifest/` — the catalog of all sub-tables and their published versions, **and** the graph commit lineage (RFC-013 Phase 7)
|
||||
- `_graph_commits.lance` / `_graph_commit_actors.lance` — legacy / branch-ref carriers. Since RFC-013 Phase 7 the graph lineage lives in `__manifest` (`graph_commit` / `graph_head` rows, written in the publish CAS); `_graph_commits.lance` no longer receives commit rows, but is retained to carry the Lance branch refs that `create_branch` / `list_branches` / the `cleanup` orphan reconciler operate on. A graph created before Phase 7 (internal schema v3) keeps its lineage here until its first read-write open, which migrates it into `__manifest` via `migrate_v3_to_v4`.
|
||||
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed. The internal schema migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a prefix-delete storage primitive lands)
|
||||
- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
|
||||
- `object_type` ∈ `table | table_version | table_tombstone`
|
||||
- `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
|
||||
- `object_type` ∈ `table | table_version | table_tombstone | graph_commit | graph_head`
|
||||
- `table_key` ∈ `node:<TypeName> | edge:<EdgeName>` (empty for `graph_commit` / `graph_head` lineage rows)
|
||||
- `table_branch` is `null` for the main lineage and the branch name otherwise
|
||||
- **Graph lineage rows** (RFC-013 Phase 7): one immutable `graph_commit` row per commit (`object_id` = the commit ULID; `metadata` JSON carries parent / merged-parent / actor / timestamp) plus one mutable `graph_head:<branch>` pointer per branch (`graph_head:main` for main). The in-memory commit DAG is a projection of these rows.
|
||||
- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones — rows where `object_type = table_tombstone`, whose own `table_version` (acting as the tombstone version) is `>= the entry's table_version`.
|
||||
- **Atomic publish**: multi-dataset commits publish so that a single write to `__manifest` flips all the new sub-table versions visible at once.
|
||||
- **Row-level CAS on the merge-insert join key**: `object_id` carries an unenforced-primary-key annotation so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates from racing publishers.
|
||||
|
|
@ -90,8 +91,8 @@ flowchart TB
|
|||
- **Graph root** is one directory (or S3 prefix). Everything below is part of one OmniGraph graph.
|
||||
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
|
||||
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
|
||||
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
|
||||
- **`_graph_commit_recoveries.lance`** — one row per crash-recovery action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join.
|
||||
- **`_graph_commits.lance`** is an L2 dataset retained only as a branch-ref carrier (and, on a pre-Phase-7 graph, the migration source). Since RFC-013 Phase 7 the graph commit DAG lives in `__manifest` as `graph_commit` / `graph_head` rows written in the publish CAS — `_graph_commits.lance` and its paired `_graph_commit_actors.lance` no longer receive commit rows. A graph created before Phase 7 (internal schema v3) backfills its lineage into `__manifest` on its first read-write open (`migrate_v3_to_v4`). (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
|
||||
- **`_graph_commit_recoveries.lance`** — one row per crash-recovery action. Joined by `graph_commit_id` to the graph commit lineage (the `graph_commit` rows in `__manifest` since RFC-013 Phase 7); the linked commit carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join.
|
||||
- **`__recovery/{ulid}.json`** — transient sidecar files written by a writer before it advances the underlying dataset, deleted once the matching manifest publish succeeds. A sidecar persisting after process exit means the writer crashed mid-commit; the next read-write open processes it. Steady-state directory is empty.
|
||||
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
|
||||
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
|
||||
|
|
|
|||
|
|
@ -3,12 +3,12 @@
|
|||
| Name | Value | Area |
|
||||
|---|---|---|
|
||||
| `MANIFEST_DIR` | `__manifest` | manifest layout |
|
||||
| Commit graph dir | `_graph_commits.lance` | commit graph |
|
||||
| Commit graph dir | `_graph_commits.lance` | branch-ref carrier + pre-v4 lineage source (lineage lives in `__manifest` since RFC-013 Phase 7) |
|
||||
| Run registry dir (legacy, removed) | `_graph_runs.lance` | inert post-v0.4.0; bytes remain until a prefix-delete primitive lands |
|
||||
| Run branch prefix (legacy, removed) | `__run__` | swept off `__manifest` by the internal schema migration; no longer a reserved name |
|
||||
| Schema apply lock | `__schema_apply_lock__` | schema apply |
|
||||
| Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | manifest publish |
|
||||
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 3` | manifest migrations |
|
||||
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 4` | manifest migrations (v4 = graph lineage in `__manifest`, RFC-013 Phase 7) |
|
||||
| Merge stage batch | `MERGE_STAGE_BATCH_ROWS = 8192` | merge execution |
|
||||
| Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | optimize/cleanup |
|
||||
| Lance blob compaction support | `LANCE_SUPPORTS_BLOB_COMPACTION = false` | optimize |
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue