From 2121d9f6c37b995ac9d256f0b5fcdd3bb57024be Mon Sep 17 00:00:00 2001
From: Ragnor Comerford
Date: Tue, 12 May 2026 16:56:51 -0700
Subject: [PATCH] docs: storage stable-row-ids reflects every dataset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The L1 capability list claimed the flag was enabled "for the
commit-graph and run-registry datasets" — stale. Every Lance dataset
OmniGraph creates has enable_stable_row_ids: true; the run-registry
datasets are gone since MR-771.

Replace with a single paragraph capturing the invariant, the
consequences (row-version columns available, CreateIndex × Rewrite not
retryable, Lance reader version required), the legacy-dataset
constraint (one-way at create, dump-and-reload to migrate), and a
pointer to the regression test in staged_writes.rs.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/storage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/storage.md b/docs/storage.md
index 825fbbe..b284bc2 100644
--- a/docs/storage.md
+++ b/docs/storage.md
@@ -7,7 +7,7 @@ Every node type and every edge type is its own Lance dataset:
 - **Columnar Arrow storage**: each property is a column; nullable per Arrow schema.
 - **Fragments**: data is partitioned into fragments; new writes create new fragments.
 - **Manifest versioning**: every commit produces a new dataset version; old versions remain readable.
-- **Stable row IDs**: enabled by OmniGraph for the commit-graph and run-registry datasets so durable references survive compaction.
+- **Stable row IDs**: `enable_stable_row_ids: true` is set on every Lance dataset OmniGraph creates — node and edge data tables, `__manifest`, `_graph_commits.lance`, `_graph_commit_recoveries.lance`, and any future system tables. This is an architectural invariant: the flag is one-way at dataset create per Lance's row-id-lineage spec, so a future change that introduces a Lance dataset must preserve it. Consequences: `_row_created_at_version` and `_row_last_updated_at_version` are available on every dataset (load-bearing for change-feed validators); `CreateIndex × Rewrite` is not a retryable conflict, so indices survive `omnigraph optimize` without needing the Fragment Reuse Index; readers must use a Lance build that recognises the flag (our pinned 4.0.0 is fine). Pre-0.4.x repos created before this code path settled may have datasets without the flag and cannot be retrofitted in place — the supported path is dump-and-reload. The `stage_overwrite` rewrite path (used by `schema_apply`) preserves the flag through `Operation::Overwrite`; pinned by `stage_overwrite_preserves_stable_row_ids` in `crates/omnigraph/tests/staged_writes.rs`.
 - **Append / delete / `merge_insert`**: native Lance write modes.
 - **Per-dataset branches** (Lance native): copy-on-write at the dataset level.
 - **Object-store agnostic**: file://, s3://, gs://, az://, http (read-only via Lance) — OmniGraph wires file:// and s3:// (`storage.rs`).