mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-24 02:38:06 +02:00
MR-925: re-align §-references in writeups after full re-read of MR-737
On full re-read of MR-737, the section numbering used in several of the
original MR-925 cross-references no longer lines up:
- §5.10 in current MR-737 is 'First-class scores and rank fusion', NOT
'custom index types' / 'connector SPI'. The custom-index-type / plugin
surface is §5.4 ('Persisted CSR adjacency as Lance index plugin') with
the capability shape in §5.6.
- §5.11 is 'Substrate choice — DataFusion vs. custom executor (A)'
(resolved 2026-05-11), NOT 'per-table txn branches'. Per-table Lance
native txn branches live in §5.12 ('Mutation IR, write planner, and
external sources') per Ragnor 2026-04-29.
- §5.5 is 'Stable row IDs as graph IDs', NOT 'reconciler pattern'. The
reconciler is §5.16.
- §5.8 is 'Tiering via Lance base paths', NOT SIP-related. SIP is §5.3.
- The MR-925 §1.6 cross-reference to 'Open Q5' was to a pre-2026-05-11
numbering; Q5 in current §10 is 'extension rate under filters'.
Each writeup now has a §-numbering note at the top mapping its findings
to the current MR-737 numbering. The findings themselves are unchanged —
this is a numbering-only edit.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
This commit is contained in:
parent
88b338b56b
commit
c0d7ae6ba4
10 changed files with 155 additions and 57 deletions
|
|
@ -1,6 +1,15 @@
|
|||
# Experiment 1.5 — Extending DataFusion dynamic-filter-pushdown to bitmap shape (code-dive)
|
||||
|
||||
**Ticket:** MR-925 §1.5 (validates MR-737 §5.6, §5.7 / Open Q3).
|
||||
**Ticket:** MR-925 §1.5 (validates MR-737 §5.3 "Sideways Information Passing
|
||||
(SIP) — extends DataFusion's dynamic filter pushdown" + §5.6 "Capability-bearing
|
||||
storage trait", with implications for §5.7 cost-model).
|
||||
|
||||
**§-numbering note:** the original MR-925 cross-reference said "§5.3 / §5.10".
|
||||
On a full re-read of MR-737: §5.3 is the SIP design and is the correct primary
|
||||
§ for this experiment; §5.10 in current MR-737 is "First-class scores and rank
|
||||
fusion" (unrelated). The capability seam that the bitmap pushdown traverses
|
||||
on the storage side is §5.6 (`scan_by_key_set` capability), so this writeup
|
||||
also produces deltas for §5.6.
|
||||
**Type:** Code-dive only (no prototype crate).
|
||||
**Substrate pin:** DataFusion 52.5.
|
||||
**Date:** 2026-05-12.
|
||||
|
|
|
|||
|
|
@ -1,6 +1,14 @@
|
|||
# Experiment 1.2 — Custom Lance index plugin from outside the lance crate
|
||||
|
||||
**Ticket:** MR-925 §1.2 (validates MR-737 §5.4, §5.5).
|
||||
**Ticket:** MR-925 §1.2 (validates MR-737 §5.4 "Persisted CSR adjacency as
|
||||
Lance index plugin" + §5.16 "Index rebuild orchestration — stateless reconciler").
|
||||
|
||||
**§-numbering note:** the original MR-925 cross-reference said "§5.4, §5.5".
|
||||
On a full re-read of MR-737: §5.4 is correct (this is the plugin section);
|
||||
§5.5 is "Stable row IDs as graph IDs" — *not* the reconciler. The reconciler
|
||||
is §5.16 ("Index rebuild orchestration"). Section references in the body have
|
||||
been corrected; §5.5 is *indirectly* in scope only because the experiment
|
||||
leans on stable row IDs.
|
||||
**Prototype:** `validation-prototypes/custom-lance-index/` (long-lived branch).
|
||||
**Substrate pin:** Lance 4.0.1 (matched by cargo to 4.0.0 spec). Lance 4.0.1 internally pulls roaring 0.11 and prost-types 0.14; the workspace deps were lifted to match.
|
||||
**Date:** 2026-05-12.
|
||||
|
|
@ -128,8 +136,8 @@ is closed to us). Any custom-index reconciler we ship has to:
|
|||
- emit `Operation::CreateIndex { new_indices: vec![updated], removed_indices: vec![old] }`
|
||||
to re-publish the index with the updated bitmap.
|
||||
|
||||
This is *consistent with* the §5.5 reconciler pattern in MR-737, so it's
|
||||
not a blocker — but the writeup of §5.5 should explicitly say "the
|
||||
This is *consistent with* the §5.16 reconciler pattern in MR-737, so it's
|
||||
not a blocker — but the writeup of §5.16 should explicitly say "the
|
||||
reconciler also owns fragment coverage diffs, not just file content".
|
||||
|
||||
### F5. Compaction does not move our index. ⚠️
|
||||
|
|
@ -203,7 +211,7 @@ paths forward:
|
|||
both. The current MR-737 wording implies path (1) is available; this
|
||||
experiment proves it is not.
|
||||
|
||||
§5.5 (reconciler pattern) is unaffected by this finding — but it must
|
||||
§5.16 (reconciler pattern) is unaffected by this finding — but it must
|
||||
expand to explicitly own `fragment_bitmap` recomputation across all
|
||||
mutating operations, since with path (2) or path (3) we are the only
|
||||
party that knows the index's row coverage.
|
||||
|
|
|
|||
|
|
@ -1,6 +1,16 @@
|
|||
# Experiment 1.3 — Custom UserDefinedLogicalNode + ExecutionPlan e2e
|
||||
|
||||
**Ticket:** MR-925 §1.3 (validates MR-737 §5.3, §5.10).
|
||||
**Ticket:** MR-925 §1.3 (validates MR-737 §5.1 "Unified IR", §5.2 "Factorized IR",
|
||||
§5.11 "Substrate choice — DataFusion vs. custom executor (A)", and §5.12
|
||||
"Mutation IR" — all of which rely on `UserDefinedLogicalNode` + `ExecutionPlan`
|
||||
surviving the optimizer end-to-end).
|
||||
|
||||
**§-numbering note:** the original §1.3 cross-reference in MR-925 cited "§5.3,
|
||||
§5.10". On a full re-read of MR-737: §5.10 is "First-class scores and rank
|
||||
fusion (open-world over modalities)" (not "operators survive the optimizer"),
|
||||
and the custom-operator e2e contract is actually shared across §5.1, §5.2,
|
||||
§5.11, and §5.12. §5.3 (SIP) is the *first* operator that consumes the contract
|
||||
— valid — but the contract itself is broader than §5.3 alone.
|
||||
**Prototype:** `validation-prototypes/custom-operator/`.
|
||||
**Substrate pin:** DataFusion 52.5 (matched to omnigraph workspace).
|
||||
**Date:** 2026-05-12.
|
||||
|
|
@ -168,7 +178,7 @@ These are not blockers but should be noted for the §11 RFC-body delta:
|
|||
the RFC that graph operators must explicitly choose their output
|
||||
partitioning rather than inheriting.
|
||||
|
||||
## Decision impact on MR-737 §5.3 and §5.10
|
||||
## Decision impact on MR-737 §5.1, §5.2, §5.3, §5.11, §5.12
|
||||
|
||||
**§5.3 is achievable on DataFusion 52.5 as written.** The
|
||||
`UserDefinedLogicalNode`/`ExecutionPlan` surface is fully sufficient
|
||||
|
|
@ -183,9 +193,9 @@ NeighborSetIntersect, etc.). The only edits needed in §5.3:
|
|||
operators implementing it accurately, not punting to
|
||||
`Statistics::new_unknown`.
|
||||
|
||||
**§5.10 ("operators survive the optimizer + execute correctly")**:
|
||||
The composition test (E4) plus the metrics test (E5) cover this. No
|
||||
deltas needed.
|
||||
**§5.1 / §5.2 / §5.11 / §5.12 ("custom operators survive the optimizer +
|
||||
execute correctly")**: The composition test (E4) plus the metrics test (E5)
|
||||
cover this. No deltas needed.
|
||||
|
||||
## Caveats
|
||||
|
||||
|
|
|
|||
|
|
@ -1,6 +1,15 @@
|
|||
# Experiment 1.4 — Roaring bitmap variant for u64 row IDs (SIP wire format)
|
||||
|
||||
**Ticket:** MR-925 §1.4 (validates MR-737 §5.6, §5.8 / Open Q4).
|
||||
**Ticket:** MR-925 §1.4 (validates MR-737 §5.3 "Sideways Information Passing
|
||||
(SIP)" wire format + §10 Open Q3 "SIP wire format").
|
||||
|
||||
**§-numbering note:** the original MR-925 cross-reference said "§5.6, §5.8 /
|
||||
Open Q4". On a full re-read of MR-737: §5.3 is SIP (the wire format is named
|
||||
in Open Q3); §5.6 is the storage trait that *consumes* SIP via `sip_mask` /
|
||||
`key_set` on `ScanRequest`; §5.8 is "Tiering via Lance base paths" (unrelated
|
||||
to SIP). Open Q4 is "CSR index format inside Lance" (also unrelated). The
|
||||
correct primary mapping is **§5.3 + §10 Open Q3**, with downstream
|
||||
implications for §5.6 (the capability advertisement).
|
||||
**Prototype:** `validation-prototypes/sip-format-bench/`.
|
||||
**Substrate pin:** `roaring = "0.11"` (matched to lance-table dependency).
|
||||
**Date:** 2026-05-12.
|
||||
|
|
@ -10,7 +19,7 @@
|
|||
## Hypothesis
|
||||
|
||||
For propagating row-ID side-information predicates (SIPs) between operators —
|
||||
the §5.6 dynamic-filter-pushdown wire format — Roaring bitmaps over u64
|
||||
the §5.3 dynamic-filter-pushdown wire format — Roaring bitmaps over u64
|
||||
(`RoaringTreemap`) are the right encoding when row IDs cluster by Lance
|
||||
fragment (which they do). For random u64s, Roaring is *not* the right
|
||||
choice.
|
||||
|
|
@ -148,7 +157,7 @@ At `n=1M` dense, encoding takes **37ms**, decoding takes **0.02ms**.
|
|||
For "build once, read many" wire-format use, this is fine. But if the
|
||||
SIP is built mid-pipeline (e.g. from a `FilterExec`'s output IDs) and
|
||||
intersected immediately with another payload, the build cost dominates.
|
||||
The §5.6 RFC should clarify: SIPs are produced at *probe-build time* on
|
||||
The §5.3 RFC should clarify: SIPs are produced at *probe-build time* on
|
||||
the hash-join build side, where 37ms is amortized across the entire
|
||||
probe phase.
|
||||
|
||||
|
|
@ -176,9 +185,9 @@ construction.
|
|||
|
||||
Default fallback (for non-row-ID u64s): **varint-delta**.
|
||||
|
||||
## Decision impact on MR-737 §5.6 and §5.8
|
||||
## Decision impact on MR-737 §5.3 + §10 Open Q3 (and downstream §5.6)
|
||||
|
||||
**§5.6 (SIP wire format) — concrete choice:**
|
||||
**§5.3 (SIP wire format) — concrete choice:**
|
||||
|
||||
> ROW_ID_SIP wire format := length-prefixed roaring `serialize_into` bytes
|
||||
> with a 1-byte format-tag prefix. Tag values: `0x01` = Roaring (u64
|
||||
|
|
@ -189,7 +198,7 @@ Default fallback (for non-row-ID u64s): **varint-delta**.
|
|||
This makes the wire format extensible while picking a default that
|
||||
matches the dominant workload.
|
||||
|
||||
**§5.8 / Open Q4 — answered:**
|
||||
**Open Q3 — answered:**
|
||||
|
||||
The RFC's Q4 ("can we share the SIP filter between operator stages by
|
||||
serializing roaring bytes?") is **yes for row-ID payloads**.
|
||||
|
|
|
|||
|
|
@ -1,6 +1,17 @@
|
|||
# Experiment 1.7 — Stable-row-id-aware indices survive compaction (code-dive + small repro plan)
|
||||
|
||||
**Ticket:** MR-925 §1.7 (validates MR-737 §5.4, §5.10 / Open Q6).
|
||||
**Ticket:** MR-925 §1.7 (validates MR-737 §5.4 "Persisted CSR adjacency as Lance
|
||||
index plugin" + §5.5 "Stable row IDs as graph IDs").
|
||||
|
||||
**§-numbering note (added on re-read of MR-737):** MR-925's original §1.7 cross-
|
||||
reference cited "§5.8 / Open Q7" / "§5.10". On a full read of MR-737, §5.10 is
|
||||
"First-class scores and rank fusion" (NOT custom index types), §5.4 is "Persisted
|
||||
CSR adjacency as Lance index plugin" (which contains the custom-index-type seam),
|
||||
and §5.5 is "Stable row IDs as graph IDs" (which flags the experimental status of
|
||||
"Stable Row ID for Index" in lance-4.0.x). The corrected mapping for §1.7 is
|
||||
**§5.4 + §5.5** and the MR-737 §5.5 substrate caveat that "`Stable Row ID for
|
||||
Index` is documented as experimental in lance-4.0.x" is the immediate caveat for
|
||||
this experiment.
|
||||
**Type:** Code-dive plus a planned small repro (not yet built; specified for Phase 0 entry).
|
||||
**Substrate pin:** Lance 4.0.1, lance-index 4.0.1.
|
||||
**Date:** 2026-05-12.
|
||||
|
|
@ -9,10 +20,13 @@
|
|||
|
||||
## Question
|
||||
|
||||
MR-737 §5.4 (graph topology index) and §5.10 (custom index types for graph
|
||||
adjacency) both depend on the assumption that a custom index — i.e. our
|
||||
own CSR/CSC adjacency lists keyed by source-table row IDs — **continues
|
||||
to point at the right rows after the source table is compacted.**
|
||||
MR-737 §5.4 ("Persisted CSR adjacency as Lance index plugin") and §5.5 ("Stable
|
||||
row IDs as graph IDs") both depend on the assumption that a custom CSR/CSC
|
||||
adjacency index keyed by source-table row IDs **continues to point at the right
|
||||
rows after the source table is compacted.** §5.5 explicitly flags the substrate
|
||||
caveat: "Stable Row ID for Index" is **experimental** in lance-4.0.x; confirming
|
||||
whether our created indices opt into stable-row-id mode is a follow-up worth
|
||||
doing before MR-848 (index reconciler) lands.
|
||||
|
||||
Lance's compaction (`compact_files`) consolidates fragments, which on the
|
||||
non-stable row-ID scheme renumbers row addresses. The question:
|
||||
|
|
@ -253,19 +267,22 @@ demonstrates end-to-end survival. Specification:
|
|||
Estimated effort: 1–2 days. **Defer to Phase 0**; the code-dive
|
||||
already justifies §5.4 as feasible without the repro.
|
||||
|
||||
## Decision impact on MR-737 §5.4 and §5.10
|
||||
## Decision impact on MR-737 §5.4 and §5.5
|
||||
|
||||
**§5.4 (graph topology index) is feasible on Lance 4.0.1 with stable
|
||||
row IDs (Path B):**
|
||||
**§5.4 (persisted CSR adjacency as Lance index plugin) is feasible on Lance
|
||||
4.0.1 with stable row IDs (Path B):**
|
||||
|
||||
- No Lance plugin-registry dependency; we drive remapping ourselves.
|
||||
- The custom topology dataset stores stable row IDs end-to-end; the
|
||||
bulk of compaction-induced changes don't require remap.
|
||||
- Path A (Lance-managed remapping) is a follow-up improvement
|
||||
contingent on the §1.2 contribution.
|
||||
contingent on the §1.2 plugin-registry contribution.
|
||||
|
||||
**§5.10 (custom index types):** No new findings beyond §1.2. The
|
||||
`ScalarIndex::remap` contract is sufficient and stable.
|
||||
**§5.5 (stable row IDs as graph IDs):** The MR-737 substrate caveat
|
||||
("`Stable Row ID for Index` is experimental in lance-4.0.x") still
|
||||
stands. The small repro in §5 above is the way to confirm opt-in;
|
||||
until it runs, treat §5.5 as substrate-positive but not yet validated
|
||||
for index-side stable IDs.
|
||||
|
||||
**Open Q6 ("survive compaction"):** Answered yes. The recommendation is
|
||||
**Path B for v1, Path A for v2**. RFC §5.4 should specify the
|
||||
|
|
|
|||
|
|
@ -1,6 +1,25 @@
|
|||
# Experiment 1.6 — Lance native per-table txn branches (code-dive, cost model)
|
||||
|
||||
**Ticket:** MR-925 §1.6 (validates MR-737 §5.11, §5.12 / Open Q5).
|
||||
**Ticket:** MR-925 §1.6 (validates MR-737 §5.12 "Mutation IR, write planner,
|
||||
and external sources").
|
||||
|
||||
**§-numbering note (added on re-read of MR-737):** MR-925's original §1.6 cross-
|
||||
reference cited "§5.11, §5.12 / Open Q5". On a full read of MR-737:
|
||||
|
||||
- **§5.11** is "Substrate choice — DataFusion vs. custom executor — RECOMMENDED
|
||||
(A)" — resolved 2026-05-11, not about per-table branches.
|
||||
- **§5.12** "Mutation IR, write planner, and external sources" is where the
|
||||
*implementation note* says "writes go to per-table Lance native branches
|
||||
(storage-layer staging invisible above the TableProvider/Dataset boundary),
|
||||
success → fast-forward, failure → drop branches. NOT `restore()`-based rollback"
|
||||
(also called out in §0.5 item 9). This is what §1.6 actually validates.
|
||||
- **Open Q5** in the current MR-737 §10 is "What 'extension rate' means under
|
||||
filters" — unrelated. The MR-925 cross-reference is to a pre-2026-05-11
|
||||
numbering where Q5 covered per-table branches; that question is now folded
|
||||
into §5.12 and resolved per Ragnor 2026-04-29.
|
||||
|
||||
Corrected mapping for §1.6: **§5.12 (mutation-IR per-table branches)** — cost
|
||||
model + recommendation.
|
||||
**Type:** Code-dive only — no prototype crate.
|
||||
**Substrate pin:** Lance 4.0.1.
|
||||
**Date:** 2026-05-12.
|
||||
|
|
@ -204,11 +223,12 @@ cheaper.
|
|||
*branches*, not just zombie *commits*. This is more state to track,
|
||||
not less.
|
||||
|
||||
## Decision impact on MR-737 §5.11 and §5.12
|
||||
## Decision impact on MR-737 §5.12
|
||||
|
||||
**§5.11 (per-table txn isolation):** Lance-native branches **can**
|
||||
implement this, but the steady-state cost is essentially the same as
|
||||
the current lazy-graph-branch model, and the abort-path cost is *higher*.
|
||||
**§5.12 (mutation IR with per-table branches):** Lance-native branches **can**
|
||||
implement the per-table staging shape that §5.12 prescribes, but the steady-state
|
||||
cost is essentially the same as the current lazy-graph-branch model, and the
|
||||
abort-path cost is *higher*.
|
||||
The conceptual clarity argument is real but not load-bearing — both
|
||||
models provide the same isolation guarantee.
|
||||
|
||||
|
|
@ -236,7 +256,7 @@ Lance-native per-table branches only if:
|
|||
"lazy" so most branches don't fork most tables).
|
||||
|
||||
For Phase 0, the deliverable is a **clear specification** of which model
|
||||
MR-737 §5.11/§5.12 prescribes. The cost analysis above suggests
|
||||
MR-737 §5.12 prescribes. The cost analysis above suggests
|
||||
specifying the current model.
|
||||
|
||||
## Caveats and follow-ups
|
||||
|
|
@ -255,4 +275,4 @@ specifying the current model.
|
|||
- **Forbidden APIs file** (`crates/omnigraph/tests/forbidden_apis.rs:57`)
|
||||
excludes `.delete_branch(` from the over-match check — there's
|
||||
intent in the codebase to allow these calls. Worth a follow-up read
|
||||
to see if MR-737 §5.11 has already opened the door.
|
||||
to see if MR-737 §5.12 has already opened the door.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue