Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
# Storage
## L1 — Lance dataset (per node/edge type)
Every node type and every edge type is its own Lance dataset:
- **Columnar Arrow storage**: each property is a column; nullable per Arrow schema.
- **Fragments**: data is partitioned into fragments; new writes create new fragments.
- **Manifest versioning**: every commit produces a new dataset version; old versions remain readable.
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **Stable row IDs**: stable row IDs are enabled on every Lance dataset OmniGraph creates — node and edge data tables, `__manifest` , the commit-graph datasets, and any future system tables. This is an architectural invariant: the flag is one-way at dataset create, so a future change that introduces a Lance dataset must preserve it. Consequences: `_row_created_at_version` and `_row_last_updated_at_version` are available on every dataset (load-bearing for change-feed validators); indices survive `omnigraph optimize` . Pre-0.4.x graphs created before this code path settled may have datasets without the flag and cannot be retrofitted in place — the supported path is dump-and-reload. The rewrite path used by `schema_apply` preserves the flag.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
- **Append / delete / `merge_insert` **: native Lance write modes.
- **Per-dataset branches** (Lance native): copy-on-write at the dataset level.
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **Object-store agnostic**: file://, s3://, gs://, az://, http (read-only via Lance) — OmniGraph wires file:// and s3://.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
## L2 — Multi-dataset coordination via `__manifest`
OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordinated through one append-only manifest table.
- **Manifest table**: `__manifest/` Lance dataset.
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **Layout**:
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
- `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
- `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
- `__manifest/` — the catalog of all sub-tables and their published versions
- `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed. The internal schema migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a prefix-delete storage primitive lands)
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count` ):
- `object_type` ∈ `table | table_version | table_tombstone`
- `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
- `table_branch` is `null` for the main lineage and the branch name otherwise
2026-04-29 00:09:06 +02:00
- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones — rows where `object_type = table_tombstone` , whose own `table_version` (acting as the tombstone version) is `>= the entry's table_version` .
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **Atomic publish**: multi-dataset commits publish so that a single write to `__manifest` flips all the new sub-table versions visible at once.
- **Row-level CAS on the merge-insert join key**: `object_id` carries an unenforced-primary-key annotation so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates from racing publishers.
- **Optimistic concurrency control on publish**: a publish asserts the manifest's current latest non-tombstoned version for each touched table is exactly what the caller observed; mismatches surface as an `ExpectedVersionMismatch` manifest conflict naming the table and the expected/actual versions. Concurrent advances surface as a conflict rather than being silently rebased through.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
### Internal schema versioning
2026-04-29 11:44:14 +00:00
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
The on-disk shape of `__manifest` is reconciled with the binary via a single version stamp held in the manifest dataset's schema-level metadata.
2026-04-29 11:44:14 +00:00
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **Graph creation** stamps the current version, so newly initialized graphs never need migration.
- **The open-for-write path** migrates the on-disk stamp before reading state. When the stamp matches the binary, this is a single metadata read with no writes; otherwise the migration walks steps forward (1→2, 2→3, …) until the stamp matches, then proceeds with the publish. Reads stay side-effect-free.
2026-04-29 11:44:14 +00:00
- **Forward-version protection**: a stamp *higher* than the binary's known version triggers a clear "upgrade omnigraph first" error. An old binary cannot clobber a newer schema by silently treating "unknown stamp" as "missing stamp".
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **Idempotency**: each migration step is safe to re-run. A crash between two metadata updates inside a single step leaves the partial state; the next open re-runs the step and the second update lands.
2026-04-29 11:44:14 +00:00
| Stamp | Shape change |
|---|---|
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
| v1 (implicit, pre-stamp) | `__manifest.object_id` had no PK annotation; no row-level CAS protection. |
| v2 | `__manifest.object_id` carries an unenforced-primary-key annotation; row-level CAS engaged. |
| v3 | One-time sweep of legacy `__run__*` staging branches (pre-v0.4.0 Run state machine, removed) off `__manifest` . Runs at read-write open and on publish. |
2026-04-29 11:44:14 +00:00
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
## On-disk layout
2026-05-24 16:46:00 +01:00
A graph on disk is a directory tree of Lance datasets. Each dataset follows the standard Lance layout (`_versions/` , `data/` , `_indices/` , `_refs/` ); OmniGraph adds the multi-dataset coordination by keeping `__manifest/` alongside the per-type datasets.
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
```mermaid
flowchart TB
classDef l1 fill:#fef3e8 ,stroke:#c46900 ,color:#000
classDef l2 fill:#e8f4fd ,stroke:#1e6aa8 ,color:#000
2026-05-24 16:46:00 +01:00
graph["graph URI< br / > file:// or s3://bucket/prefix"]:::l2
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
manifest["__manifest/< br / > L2 catalog of sub-tables"]:::l2
nodes["nodes/{fnv1a64-hex}/< br / > one dataset per node type"]:::l2
edges["edges/{fnv1a64-hex}/< br / > one dataset per edge type"]:::l2
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
cgraph["_graph_commits.lance/< br / > _graph_commit_actors.lance/< br / > _graph_commit_recoveries.lance/"]:::l2
2026-05-03 13:56:36 +02:00
recovery["__recovery/{ulid}.json< br / > recovery sidecars (transient)"]:::l2
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
refs["_refs/branches/{name}.json< br / > graph-level branches"]:::l2
2026-05-24 16:46:00 +01:00
graph --> manifest
graph --> nodes
graph --> edges
graph --> cgraph
graph --> recovery
graph --> refs
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
subgraph dataset[Inside each Lance dataset — L1]
ds_v["_versions/{n}.manifest< br / > per-dataset versions"]:::l1
ds_data["data/< br / > fragment files (Arrow IPC)"]:::l1
ds_idx["_indices/{uuid}/< br / > BTREE · Inverted FTS · IVF/HNSW"]:::l1
ds_refs["_refs/< br / > per-dataset Lance branches/tags"]:::l1
ds_tx["_transactions/< br / > commit transaction logs"]:::l1
end
nodes -.-> dataset
edges -.-> dataset
manifest -.-> dataset
```
**What's where:**
2026-05-24 16:46:00 +01:00
- **Graph root** is one directory (or S3 prefix). Everything below is part of one OmniGraph graph.
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
- **`__manifest/` ** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
- **`nodes/` ** and ** `edges/` ** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
- **`_graph_commits.lance` ** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
- **`_graph_commit_recoveries.lance` ** — one row per crash-recovery action. Joined to `_graph_commits.lance` by `graph_commit_id` ; the linked commit row carries `actor_id=omnigraph:recovery` . Operators correlate recoveries with the original mutations they rolled forward / back via this join.
- **`__recovery/{ulid}.json` ** — transient sidecar files written by a writer before it advances the underlying dataset, deleted once the matching manifest publish succeeds. A sidecar persisting after process exit means the writer crashed mid-commit; the next read-write open processes it. Steady-state directory is empty.
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
- **`_refs/branches/{name}.json` ** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
The split — L2 owns the cross-dataset catalog; L1 owns the per-dataset internals — means that schema work (which adds or removes datasets) updates `__manifest` , while data work (which adds fragments) updates `_versions/` inside the affected dataset and then bumps `__manifest` .
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
## URI scheme support
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
| Scheme | Backend | Notes |
|---|---|---|
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator
docs, while preserving every user-facing behavior, command, flag, endpoint,
constant, and env var. No behavior changes.
Removed across 18 files:
- internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N");
- source-code paths (crates/**/*.rs, *.pest) and internal struct/function
dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types,
internal fn names like fork_branch_from_state, optimize_all_tables);
- Lance-internal blocker prose (upstream issue numbers, blob-decode cause,
sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g.
"optimize skips Blob-column tables; reads/writes unaffected");
- pre-v0.4.0 Run-state-machine archaeology.
Internal IR/lowering/recovery-internals sections were either trimmed to a
brief user-facing note (e.g. "Traversal execution", "interrupted writes
recover automatically; recovery commits are recorded under actor
omnigraph:recovery") or removed.
Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints,
error taxonomy, every constant and env var (verified none dropped from the
constants cheat-sheet), and the operator-facing explanations of on-disk
artifacts. Residual "legacy" mentions are all user-facing (the deprecated
omnigraph.yaml, the legacy token chain, old command names).
Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across
docs/user; zero broken links; check-agents-md.sh green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:39:25 +03:00
| local path / `file://` | local filesystem | Normalized to absolute paths; relative and dot-segment paths are lexically absolutized |
| `s3://bucket/prefix` | S3 object store | Honors `AWS_ENDPOINT_URL_S3` , `AWS_ALLOW_HTTP` , `AWS_S3_FORCE_PATH_STYLE` |
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
| `http(s)://host:port` | HTTP client to `omnigraph-server` | Used by CLI as a target, not a storage backend |
## Object-store env vars (S3-compatible)
- `AWS_REGION` , `AWS_ACCESS_KEY_ID` , `AWS_SECRET_ACCESS_KEY` , `AWS_SESSION_TOKEN`
- `AWS_ENDPOINT_URL` , `AWS_ENDPOINT_URL_S3` — for MinIO / RustFS / GCS-via-XML
- `AWS_S3_FORCE_PATH_STYLE=true` — path-style URLs
- `AWS_ALLOW_HTTP=true` — allow plain HTTP (local dev)