mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
MR-793 follow-up: lance docs alignment audit + mandate full-page fetch via mdrip
* AGENTS.md / docs/lance.md: agents must use `npx mdrip` (not summarizing WebFetch) when consulting Lance docs. WebFetch routinely drops load-bearing details — `pub(crate)` blockers, sub-specs behind nav hubs, default flags. Lesson learned during the MR-793 alignment audit. * docs/lance.md: add "Last alignment audit: 2026-05-02" stanza documenting MemWAL gap, lance#6666 companion ticket, stable-row-ID status (experimental, may unblock MR-848), FRI as documented compaction-friendly alternative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
3135ff5d19
commit
17bf978d0e
2 changed files with 13 additions and 2 deletions
|
|
@ -5,7 +5,7 @@ This file is the always-on map for AI coding agents (Claude Code, Codex, Cursor,
|
|||
**Required reading every session, every change:**
|
||||
|
||||
1. **[docs/invariants.md](docs/invariants.md)** — the architectural invariants and §IX deny-list. Apply to every PR, not only architecture work.
|
||||
2. **[docs/lance.md](docs/lance.md)** — the curated index of upstream Lance docs. **Consult it before every task** to identify which Lance pages are relevant to what you're about to do, then fetch those upstream URLs before grepping our code or guessing. Lance is the substrate; behavior is documented there, not here.
|
||||
2. **[docs/lance.md](docs/lance.md)** — the curated index of upstream Lance docs. **Consult it before every task** to identify which Lance pages are relevant to what you're about to do, then fetch those upstream URLs before grepping our code or guessing. Lance is the substrate; behavior is documented there, not here. **Always fetch the FULL page content, not summaries** — use `npx mdrip <url>` (or `npx mdrip --max-chars 200000 <url>` for very long pages). Tools that summarize pages (like Claude's `WebFetch`) drop load-bearing details — we have caught alignment misses (default flags, `pub(crate)` blockers, three-page sub-specs hidden behind navigation hubs) only after dumping the full markdown. If `npx mdrip` is unavailable, fall back to `curl <url> | pandoc -f html -t markdown` or paste the rendered page text manually; never act on a summarized fetch alone.
|
||||
3. **[docs/testing.md](docs/testing.md)** — the test-coverage map. **Always check what already covers your change before writing a new test.** Extending an existing test (an assertion, a fixture row, a parameterization) is preferred over a duplicated `init_and_load()` block. Walk the before-every-task checklist to identify existing coverage, run those tests as a clean baseline, and only add a new test fn or file when no existing one owns the area.
|
||||
|
||||
Tools that support `@`-imports (Claude Code) auto-include all three files via the imports below — note these must sit at column 0 (not inside a blockquote) for the parser to recognize them. Other agents (Codex, Cursor, Cline, …) must open them explicitly at the start of each session.
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ OmniGraph sits on top of Lance. Many problems — index lifecycle, branching, tr
|
|||
|
||||
This file is the curated entry point. **When you hit a Lance-shaped problem, find the matching topic below and fetch the listed URL(s) before guessing.** Don't grep our codebase for behavior that is documented authoritatively in Lance.
|
||||
|
||||
Base URL: `https://lance.org`. Use `WebFetch` (or your tool's equivalent) on the full URLs. Keep this index curated to relevant material — the upstream sitemap has hundreds of URLs (notably the Namespace REST API model surface, Spark/Trino/Databricks integrations) that we don't use.
|
||||
Base URL: `https://lance.org`. **Fetch the FULL page content, not summaries** — use `npx mdrip <url>` (or `npx mdrip --max-chars 200000 <url>` for very long pages). Tools that summarize pages (like Claude's `WebFetch`) routinely drop load-bearing details — defaults, `pub(crate)` blockers, sub-specs hidden behind navigation hubs. If `npx mdrip` is unavailable, fall back to `curl <url> | pandoc -f html -t markdown` or paste the rendered page text manually; **never act on a summarized fetch alone**. Keep this index curated to relevant material — the upstream sitemap has hundreds of URLs (notably the Namespace REST API model surface, Spark/Trino/Databricks integrations) that we don't use.
|
||||
|
||||
> **Substrate boundary check.** Before fetching, recall [docs/invariants.md §I](invariants.md): if Lance already does the thing, we don't reimplement it. The most common reason to read these docs is to confirm a substrate behavior, not to learn what to clone.
|
||||
|
||||
|
|
@ -155,3 +155,14 @@ If a future need pulls one of these into scope, add a row to the matching domain
|
|||
## Maintenance
|
||||
|
||||
When Lance ships a major release that changes any of the above (file format bump, new index type, transaction semantics change, new branching primitive), refresh this index in the same change as the omnigraph upgrade. Stale Lance pointers are worse than no pointers.
|
||||
|
||||
### Last alignment audit: 2026-05-02 (Lance 4.0.1 upstream; omnigraph pinned at 4.0.0)
|
||||
|
||||
A full read-through of every index page above was performed in the MR-793 cycle. Findings (no code changes required for PR #70):
|
||||
|
||||
- The MemWAL system index has three deeper sub-pages that this index does not yet list — they're load-bearing for understanding crash-recovery semantics and are needed before MR-847 (recovery reconciler) implementation. Add when located: `MemWAL Index Overview`, `MemWAL Index Details`, `MemWAL Implementation` (linked from the parent MemWAL page but at sub-URLs not currently in `lance.md`).
|
||||
- The distributed-indexing guide names Python APIs (`commit_existing_index_segments`, `merge_existing_index_segments`); the Rust analogues exist via `CreateIndexBuilder::execute_uncommitted` for scalar indices but **`build_index_metadata_from_segments` is `pub(crate)`** and blocks vector-index two-phase commits from outside the lance crate. Filed [lance-format/lance#6666](https://github.com/lance-format/lance/issues/6666) as a companion to [#6658](https://github.com/lance-format/lance/issues/6658).
|
||||
- "Stable Row ID for Index" is documented as **experimental** in lance-4.0.x. Our datasets enable stable row IDs at the dataset level (`WriteParams::enable_stable_row_ids = true`); confirming whether our created indices opt into stable-row-id mode is a follow-up worth doing before MR-848 (index reconciler) lands.
|
||||
- Fragment Reuse Index (FRI) is documented as one of three compaction strategies. omnigraph currently uses option 2 (immediate index rewrite at compaction time, via `omnigraph optimize`'s post-compaction rebuild). Adopting FRI is the explicit option for compaction-friendly index updates; relevant to MR-848.
|
||||
|
||||
Bump this date stanza on the next alignment pass.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue