Merge branch 'main' into devin/1778632299-mr-854-phase-1b-phase-9

This commit is contained in:
Ragnor Comerford 2026-06-07 22:58:42 +02:00 committed by GitHub
commit 8e2f03771a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
30 changed files with 700 additions and 101 deletions

4
.github/CODEOWNERS vendored
View file

@ -8,9 +8,9 @@
# CI fails if this file drifts from its source, and rejects PRs that
# edit this file directly without also editing the yml.
* @ragnorc
* @ragnorc @aaltshuler
crates/** @ragnorc
crates/** @ragnorc @aaltshuler
docs/** @ragnorc
README.md @ragnorc
AGENTS.md @ragnorc

34
.github/DISCUSSION_TEMPLATE/rfc.yml vendored Normal file
View file

@ -0,0 +1,34 @@
labels: ["rfc"]
body:
- type: markdown
attributes:
value: |
Use this to **incubate an RFC** — socialize a design and reach rough
consensus before writing the formal document. When it's ready, graduate
it into a pull request that adds `docs/rfcs/NNNN-title.md`
(see [docs/rfcs/README.md](../blob/main/docs/rfcs/README.md)); a
maintainer merging that PR is acceptance.
For a plain feature request or open-ended idea, use the **Ideas**
category instead. For bugs, open an [Issue](../../issues/new/choose).
- type: textarea
id: problem
attributes:
label: Problem / motivation
description: What needs solving, and why is it worth the long-run cost?
validations:
required: true
- type: textarea
id: sketch
attributes:
label: Proposed direction (sketch)
description: A rough shape of the design. Detail comes later in the RFC document.
validations:
required: true
- type: textarea
id: invariants
attributes:
label: Invariants touched
description: Which items in docs/dev/invariants.md does this affect or risk? Any deny-list brush?
validations:
required: false

55
.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file
View file

@ -0,0 +1,55 @@
name: Bug report
description: Report a reproducible problem or wrong behavior in OmniGraph.
title: "bug: <short summary>"
labels: ["bug", "needs-triage"]
body:
- type: markdown
attributes:
value: |
Issues are for **reporting problems** — concrete, reproducible bugs.
For ideas, feature requests, or questions, please use
[Discussions](../../discussions) instead.
For a security vulnerability, follow [SECURITY.md](../../blob/main/SECURITY.md) — do **not** file it here.
A maintainer will triage this; once labelled **`accepted`** it's open for a pull request
(see [GOVERNANCE.md](../../blob/main/GOVERNANCE.md)).
- type: textarea
id: what-happened
attributes:
label: What happened
description: What went wrong, and what you expected instead.
validations:
required: true
- type: textarea
id: repro
attributes:
label: Steps to reproduce
description: Minimal steps, commands, schema/query, or a failing snippet.
placeholder: |
1. omnigraph init ...
2. omnigraph ...
3. observed: ... / expected: ...
validations:
required: true
- type: input
id: version
attributes:
label: Version
description: Output of `omnigraph --version` (or the engine/crate version) and how you installed it.
validations:
required: true
- type: input
id: environment
attributes:
label: Environment
description: OS, architecture, and storage backend (local FS / S3 / RustFS / MinIO).
validations:
required: false
- type: textarea
id: logs
attributes:
label: Logs / output
description: Relevant error text or logs. Will be rendered as code.
render: shell
validations:
required: false

13
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View file

@ -0,0 +1,13 @@
# Issues are for problem reports only. Disable blank issues so everything is
# routed: bugs through the form, everything else to Discussions / SECURITY.md.
blank_issues_enabled: false
contact_links:
- name: 💡 Idea, feature request, or RFC
url: https://github.com/ModernRelay/omnigraph/discussions
about: Propose features and designs in Discussions. RFCs graduate from there into a docs/rfcs/ pull request.
- name: ❓ Question or help
url: https://github.com/ModernRelay/omnigraph/discussions
about: Ask in Discussions — questions are not tracked as Issues.
- name: 🔒 Security vulnerability
url: https://github.com/ModernRelay/omnigraph/blob/main/SECURITY.md
about: Report security issues privately per SECURITY.md — never as a public Issue.

29
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View file

@ -0,0 +1,29 @@
<!--
Thanks for contributing! See CONTRIBUTING.md and GOVERNANCE.md.
A substantive PR needs a backing accepted issue or accepted RFC.
Maintainers: your internal process applies; the link requirement below
is for external contributions.
-->
## What & why
<!-- One or two sentences: what this changes and why. -->
## Backing issue / RFC
<!-- Pick one. A substantive change needs (1) or (2). -->
- [ ] Fixes an **accepted** issue: Closes #
- [ ] Implements / is an **accepted** RFC: <link to docs/rfcs/NNNN-*.md>
- [ ] **Trivial fast-lane** (typo / docs / dependency bump / comment / one-line CI) — no issue/RFC required
## Checklist
- [ ] Change is focused (one logical change)
- [ ] Tests added/updated for behavior changes (or N/A)
- [ ] Public docs updated if user-facing surface changed (or N/A)
- [ ] Reviewed against [docs/dev/invariants.md](../blob/main/docs/dev/invariants.md) — no Hard Invariant weakened, no deny-list item hit (or justified)
## Notes for reviewers
<!-- Anything that helps review: tradeoffs, follow-ups, areas of risk. -->

View file

@ -22,6 +22,7 @@ roles:
compiler.
members:
- ragnorc
- aaltshuler
docs:
description: >

View file

@ -1,10 +1,29 @@
# Contributing
Small bug fixes and documentation improvements are welcome directly through pull
requests.
Thanks for your interest in OmniGraph. This page is the practical how-to; the
rules and decision authority behind it live in [GOVERNANCE.md](GOVERNANCE.md).
For larger changes, please open an issue or design discussion first so the
proposed direction is clear before implementation starts.
## Start in the right place
| I want to… | Go to | Notes |
|---|---|---|
| **Report a bug** or wrong behavior | **[Open an Issue](../../issues/new/choose)** | Concrete and reproducible. A maintainer triages it; once labelled **`accepted`** it's open for a PR. |
| **Suggest a feature / share an idea / ask** | **[Start a Discussion](../../discussions)** | Ideas and questions live here, not in Issues. |
| **Propose a design / RFC** | **An RFC pull request** | Anyone can author one — see [docs/rfcs/README.md](docs/rfcs/README.md). A maintainer merging it is acceptance. |
| **Fix something / implement a change** | **A pull request** | Must link an `accepted` issue or an accepted RFC — unless it's trivial (below). |
| **Report a security vulnerability** | **[SECURITY.md](SECURITY.md)** | Do **not** open a public Issue. |
### When can I just open a PR?
The **trivial fast-lane** — open directly, no prior issue/RFC needed: typo and
wording fixes, doc corrections, dependency bumps, comment fixes, obvious
one-line CI tweaks. Anything more substantial needs a backing `accepted` issue
or accepted RFC first, so the *why* is agreed before the *how* is reviewed. A PR
that turns out to be non-trivial will be redirected — that's about process, not
the merit of the change.
> **Maintainers (ModernRelay team)** follow a separate internal process and are
> not bound by the intake rules above. Everyone is bound by review, CODEOWNERS,
> branch protection, and CI.
## Development
@ -49,6 +68,11 @@ CI runs both.
## Pull Requests
- keep changes focused
- include tests for behavior changes when practical
- update public docs when the user-facing surface changes
- **Link the backing issue or RFC** (`Closes #123`, or reference the RFC) — or
mark the PR as trivial per the fast-lane.
- Keep changes focused; one logical change per PR.
- Include tests for behavior changes when practical.
- Update public docs when the user-facing surface changes.
New to the codebase? Read [AGENTS.md](AGENTS.md) — the architecture map and the
always-on invariants every change is reviewed against.

106
GOVERNANCE.md Normal file
View file

@ -0,0 +1,106 @@
# Governance
This document describes how **external contributions** to OmniGraph are
proposed, accepted, and merged. It exists so an outside contributor can answer,
without asking: *where does my report/idea/change go, who decides, and what has
to happen before code lands?*
> **Scope.** This governs the public contribution surface — Issues,
> Discussions, RFCs, and pull requests from people outside the ModernRelay
> team. **Maintainers operate under a separate internal process** and are not
> bound by the intake gates below. Everyone, maintainer or not, is still bound
> by the universal gates: branch protection on `main` and CODEOWNERS review
> (see [docs/dev/branch-protection.md](docs/dev/branch-protection.md) and
> [docs/dev/codeowners.md](docs/dev/codeowners.md)).
## Roles
| Role | Who | Authority |
|---|---|---|
| **Maintainer** | The code owners in [`.github/CODEOWNERS`](.github/CODEOWNERS) (generated from [`.github/codeowners-roles.yml`](.github/codeowners-roles.yml)) | Validate issues, accept/reject RFCs, review and merge PRs, set direction. Final decision authority. |
| **Contributor** | Anyone else | Report problems (Issues), propose ideas (Discussions), author RFCs, and open pull requests. |
Decision authority rests with the maintainers. CODEOWNERS is the single source
of truth for who that is; this document does not duplicate the list.
## The three channels
Each channel has one job. Using the right one is the first thing we ask of a
contribution.
| Channel | Purpose | Not for |
|---|---|---|
| **[Issues](../../issues)** | **Report a problem** — a bug, a regression, a documented behavior that's wrong. Something concrete and reproducible. | Feature requests, ideas, questions, or design proposals (→ Discussions). |
| **[Discussions](../../discussions)** | **Propose and explore** — new ideas, feature requests, questions, and the incubation of RFCs. | Bug reports (→ Issues). |
| **Pull requests** | **Land a sanctioned change** — a fix for a *validated* issue, an *accepted* RFC, or a trivial change (see fast-lane). | Substantive change with no backing issue/RFC — it will be redirected. |
## How a change becomes mergeable
```
┌─────────── bug ───────────┐ ┌──────── idea / feature ────────┐
▼ │ ▼ │
Issue (problem report) │ Discussion (idea / RFC incubation) │
│ │ │ │
maintainer triage │ rough consensus │
│ │ │ graduate │
▼ │ ▼ │
label: accepted ──────────┐ │ RFC PR (docs/rfcs/NNNN-*.md) │
│ │ │ │ │
│ │ │ maintainer review │
▼ ▼ │ ▼ │
Pull request ◀──────────┴──────────│── merged == accepted │
(links the issue or the accepted RFC) ◀───────┘ (implementation PRs reference it) │
review + CODEOWNERS + branch protection
merged
```
### Issues → validated
A new issue starts unlabeled. A maintainer triages it and, if it's a real,
in-scope problem, applies the **`accepted`** label. **Only `accepted` issues are
open for a contributor PR.** This prevents the "I fixed an issue you hadn't
agreed was a problem" rejection. Want to fix something? Get the issue accepted
first, or pick one already labelled `accepted` / `help wanted`.
### Discussions → RFCs → accepted
Ideas and feature requests start in **Discussions**. Anyone — including external
contributors — may then **author an RFC** by opening a pull request that adds
`docs/rfcs/NNNN-title.md` (see [docs/rfcs/README.md](docs/rfcs/README.md)). The
RFC is reviewed as code; **a maintainer merging it is the act of acceptance**
(it becomes the durable decision record). Implementation PRs then reference the
accepted RFC.
Authoring an RFC is open to everyone; **accepting one is a maintainer
decision.** Maintainers may also decline an RFC, with rationale, by closing it.
### Pull requests → sanctioned
A contributor PR must do one of:
1. link a maintainer-**`accepted`** issue it fixes, or
2. be (or reference) an **accepted RFC**, or
3. qualify for the **trivial fast-lane**.
**Trivial fast-lane** — these may be opened directly, no prior issue/RFC:
typo and wording fixes, documentation corrections, dependency bumps, comment
fixes, and obviously-correct one-line CI tweaks. When in doubt, open an Issue or
Discussion first; a PR that turns out to be non-trivial will be asked to.
A substantive PR with no backing issue/RFC will be closed with a pointer to the
right channel — not as a judgment of the idea, but to keep design discussion
where it's reviewable.
## What maintainers do *not* gate
Maintainers' own changes do not pass through the intake gates above — the team
runs a separate internal process. The universal gates (review, CODEOWNERS,
branch protection, CI) apply to everyone. Enforcement of the intake rules is, to
start, **by convention and review** (PR template + labels); an automated check
keyed to author association may be added later if volume warrants.
## Code of conduct & security
- Conduct: [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md).
- Security issues are **not** public Issues — see [SECURITY.md](SECURITY.md).
## Changing this document
Governance changes the same way code does: a pull request, reviewed by
maintainers. This file describes the external surface; the internal maintainer
process is intentionally out of scope here.

View file

@ -48,6 +48,22 @@ const OBJECT_TYPE_TABLE_VERSION: &str = "table_version";
const OBJECT_TYPE_TABLE_TOMBSTONE: &str = "table_tombstone";
const TABLE_VERSION_MANAGEMENT_KEY: &str = "table_version_management";
/// Apply pending internal-schema migrations against `__manifest` on the
/// open-for-write path, independent of a publish.
///
/// `Omnigraph::open(ReadWrite)` calls this before the coordinator reads branch
/// state, so branch-observing code (`branch_list`, the schema-apply
/// blocking-branch checks) sees the post-migration graph. In particular the
/// v2→v3 step sweeps legacy `__run__*` staging branches off `__manifest`
/// (MR-770); running it here closes the window where those branches would
/// otherwise block schema apply before the first publish runs the migration.
///
/// Idempotent: a no-op stamp read when the on-disk version already matches.
pub(crate) async fn migrate_on_open(root_uri: &str) -> Result<()> {
let mut dataset = open_manifest_dataset(root_uri, None).await?;
migrations::migrate_internal_schema(&mut dataset).await
}
/// Immutable point-in-time view of the database.
///
/// Cheap to create (no storage I/O). All reads within a query go through one

View file

@ -46,7 +46,11 @@ use crate::error::{OmniError, Result};
/// - v2 — `__manifest.object_id` carries the unenforced-PK annotation,
/// engaging Lance's bloom-filter conflict resolver at commit time. Added
/// alongside `expected_table_versions` OCC on `ManifestBatchPublisher::publish`.
pub(super) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 2;
/// - v3 — one-time sweep of legacy `__run__<id>` staging branches left on the
/// `__manifest` dataset by the pre-v0.4.0 Run state machine (removed in
/// MR-771). Once swept, the `is_internal_run_branch` defense-in-depth guard
/// is no longer needed (MR-770).
pub(super) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 3;
const INTERNAL_SCHEMA_VERSION_KEY: &str = "omnigraph:internal_schema_version";
const OBJECT_ID_PK_KEY: &str = "lance-schema:unenforced-primary-key";
@ -89,6 +93,10 @@ pub(super) async fn migrate_internal_schema(dataset: &mut Dataset) -> Result<()>
migrate_v1_to_v2(dataset).await?;
current = 2;
}
2 => {
migrate_v2_to_v3(dataset).await?;
current = 3;
}
other => {
return Err(OmniError::manifest_internal(format!(
"no internal-schema migration registered for v{} → v{}",
@ -122,6 +130,51 @@ async fn migrate_v1_to_v2(dataset: &mut Dataset) -> Result<()> {
set_stamp(dataset, 2).await
}
/// v2 → v3: sweep legacy `__run__<id>` staging branches off the `__manifest`
/// dataset, then bump the stamp.
///
/// The pre-v0.4.0 Run state machine (removed in MR-771) created graph-level
/// staging branches named `__run__<ulid>` on `__manifest`. MR-771 stopped
/// creating them but left any pre-existing ones in place; Lance's
/// `list_branches` still enumerates them, so they leak into `branch_list()`
/// and count as blocking branches at schema-apply time. This one-time sweep
/// removes them so the `is_internal_run_branch` guard can retire (MR-770).
///
/// The `"__run__"` prefix is inlined here on purpose: this migration must keep
/// working after the `run_registry` module (the guard) is deleted, so it does
/// not depend on it.
///
/// Idempotent under both sequential retry and concurrent runners: each run
/// re-enumerates `list_branches` fresh, and `force_delete_branch` tolerates a
/// branch that is already gone — so a crash before the stamp bump, or a second
/// process opening the same legacy graph at the same time, never errors out.
async fn migrate_v2_to_v3(dataset: &mut Dataset) -> Result<()> {
const LEGACY_RUN_BRANCH_PREFIX: &str = "__run__";
let branches = dataset
.list_branches()
.await
.map_err(|e| OmniError::Lance(e.to_string()))?;
let run_branches: Vec<String> = branches
.into_keys()
.filter(|name| {
name.trim_start_matches('/')
.starts_with(LEGACY_RUN_BRANCH_PREFIX)
})
.collect();
for name in run_branches {
// `force_delete_branch` deletes even when the `BranchContents` is
// already gone. Plain `delete_branch` errors "BranchContents not
// found", which would fail a second concurrent open (or a retry that
// raced another runner) after the first one swept the branch. Force is
// exactly Lance's documented path for cleaning up zombie branches.
dataset
.force_delete_branch(&name)
.await
.map_err(|e| OmniError::Lance(e.to_string()))?;
}
set_stamp(dataset, 3).await
}
async fn set_stamp(dataset: &mut Dataset, version: u32) -> Result<()> {
dataset
.update_schema_metadata([(INTERNAL_SCHEMA_VERSION_KEY.to_string(), version.to_string())])

View file

@ -1461,6 +1461,80 @@ async fn test_publish_migrates_pre_stamp_manifest_to_current_version() {
assert!(reopened.snapshot().entry("node:Person").is_some());
}
#[tokio::test]
async fn test_v2_to_v3_sweeps_legacy_run_branches_on_write_open() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let catalog = build_test_catalog();
let mut mc = ManifestCoordinator::init(uri, &catalog).await.unwrap();
// Synthesize a pre-MR-770 graph: several stale `__run__` staging branches
// left on `__manifest` (a real legacy graph accumulates one per run), plus
// a real user branch that must survive the sweep. Multiple run branches
// exercise the migration's delete loop on a single reused dataset handle.
mc.create_branch("__run__01J9LEGACY").await.unwrap();
mc.create_branch("__run__01J9SECOND").await.unwrap();
mc.create_branch("__run__01J9THIRD").await.unwrap();
mc.create_branch("feature").await.unwrap();
let before = mc.list_branches().await.unwrap();
assert_eq!(
before.iter().filter(|b| b.starts_with("__run__")).count(),
3,
"precondition: three legacy run branches exist on __manifest; got {before:?}",
);
// Rewind the internal-schema stamp to v2 so the next write-open runs the
// v2 → v3 sweep arm (init stamps at the current version, which is past it).
{
let mut ds = open_manifest_dataset(uri, None).await.unwrap();
ds.update_schema_metadata([(
"omnigraph:internal_schema_version".to_string(),
Some("2".to_string()),
)])
.await
.unwrap();
let post = open_manifest_dataset(uri, None).await.unwrap();
assert_eq!(super::migrations::read_stamp(&post), 2, "stamp rewound to v2");
}
// A no-op publish forces the open-for-write path, which runs the migration.
let mut expected = HashMap::new();
expected.insert("node:Person".to_string(), 1);
GraphNamespacePublisher::new(uri, None)
.publish(&[], &expected)
.await
.unwrap();
// Stamp advanced to current; the legacy run branch is physically gone from
// `__manifest` (checked via the raw, unfiltered manifest list — not the
// guard-filtered `branch_list`), and the real branch + `main` survive.
let post = open_manifest_dataset(uri, None).await.unwrap();
assert_eq!(
super::migrations::read_stamp(&post),
super::migrations::INTERNAL_MANIFEST_SCHEMA_VERSION,
);
let reopened = ManifestCoordinator::open(uri).await.unwrap();
let after = reopened.list_branches().await.unwrap();
assert!(
!after.iter().any(|b| b.starts_with("__run__")),
"legacy run branch must be swept; got {after:?}",
);
assert!(after.iter().any(|b| b == "feature"), "user branch must survive");
assert!(after.iter().any(|b| b == "main"), "main must survive");
// Idempotent: a second write-open finds the stamp at current and does not
// re-run the sweep or error.
GraphNamespacePublisher::new(uri, None)
.publish(&[], &expected)
.await
.unwrap();
let final_ds = open_manifest_dataset(uri, None).await.unwrap();
assert_eq!(
super::migrations::read_stamp(&final_ds),
super::migrations::INTERNAL_MANIFEST_SCHEMA_VERSION,
);
}
#[tokio::test]
async fn test_publish_rejects_manifest_stamped_at_future_version() {
let dir = tempfile::tempdir().unwrap();

View file

@ -3,7 +3,6 @@ pub mod graph_coordinator;
pub mod manifest;
mod omnigraph;
mod recovery_audit;
mod run_registry;
mod schema_state;
pub(crate) mod write_queue;
@ -15,7 +14,6 @@ pub use omnigraph::{
CleanupPolicyOptions, InitOptions, MergeOutcome, Omnigraph, OpenMode, SchemaApplyOptions,
SchemaApplyResult, SkipReason, TableCleanupStats, TableOptimizeStats,
};
pub(crate) use run_registry::is_internal_run_branch;
pub(crate) const SCHEMA_APPLY_LOCK_BRANCH: &str = "__schema_apply_lock__";
@ -69,5 +67,8 @@ pub(crate) fn is_schema_apply_lock_branch(name: &str) -> bool {
}
pub(crate) fn is_internal_system_branch(name: &str) -> bool {
is_internal_run_branch(name) || is_schema_apply_lock_branch(name)
// Legacy `__run__*` staging branches (Run state machine, removed MR-771)
// are swept off `__manifest` by the v2→v3 internal-schema migration, so the
// only internal branch the engine still creates is the schema-apply lock.
is_schema_apply_lock_branch(name)
}

View file

@ -347,6 +347,16 @@ impl Omnigraph {
mode: OpenMode,
) -> Result<Self> {
let root = normalize_root_uri(uri)?;
// Apply pending internal-schema migrations before the coordinator reads
// branch state, so `branch_list` and the schema-apply blocking-branch
// checks observe the post-migration graph — notably the v2→v3 sweep of
// legacy `__run__*` staging branches (MR-770). ReadWrite only: a
// read-only open must not trigger object-store writes, so a read-only
// open of an unmigrated legacy graph still lists `__run__*` until its
// first read-write open (an accepted, documented limitation).
if matches!(mode, OpenMode::ReadWrite) {
crate::db::manifest::migrate_on_open(&root).await?;
}
// Open the coordinator first so the schema-staging recovery sweep can
// compare its snapshot against any leftover staging files.
let mut coordinator = GraphCoordinator::open(&root, Arc::clone(&storage)).await?;
@ -1494,12 +1504,6 @@ pub(crate) fn normalize_branch_name(branch: &str) -> Result<Option<String>> {
}
pub(crate) fn ensure_public_branch_ref(branch: &str, operation: &str) -> Result<()> {
if super::is_internal_run_branch(branch) {
return Err(OmniError::manifest(format!(
"{} does not allow internal run ref '{}'",
operation, branch
)));
}
if is_internal_system_branch(branch) {
return Err(OmniError::manifest(format!(
"{} does not allow internal system ref '{}'",
@ -1903,7 +1907,6 @@ fn json_value_from_array(array: &dyn Array, row: usize) -> Result<serde_json::Va
#[cfg(test)]
mod tests {
use super::*;
use crate::db::is_internal_run_branch;
use crate::db::manifest::ManifestCoordinator;
use async_trait::async_trait;
use serde_json::Value;
@ -2245,11 +2248,11 @@ edge WorksAt: Person -> Company
#[tokio::test]
async fn test_apply_schema_succeeds_after_load() {
// Historical: schema apply used to be blocked by leftover
// `__run__` branches. A defense-in-depth filter now skips
// internal system branches, and run branches were made
// ephemeral on every terminal state — so in practice no
// `__run__` branch survives publish. The filter still guards
// the invariant.
// `__run__` branches. The Run state machine was removed in
// MR-771, so a fresh graph never creates a `__run__` branch;
// legacy ones are swept by the v2→v3 manifest migration. This
// asserts the invariant a current graph upholds: publish leaves
// no `__run__` branch behind, so schema apply proceeds.
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
@ -2264,8 +2267,8 @@ edge WorksAt: Person -> Company
let all_branches = db.coordinator.read().await.all_branches().await.unwrap();
assert!(
!all_branches.iter().any(|b| is_internal_run_branch(b)),
"run branch should be deleted after publish, got: {:?}",
!all_branches.iter().any(|b| b.starts_with("__run__")),
"no __run__ branch should exist after publish, got: {:?}",
all_branches
);
@ -2277,6 +2280,56 @@ edge WorksAt: Person -> Company
assert!(result.applied, "schema apply should have applied");
}
/// Regression (MR-770): a pre-v0.4.0 graph that still carries a stale
/// `__run__*` branch on `__manifest` must not block schema apply. The
/// v2→v3 sweep runs in `Omnigraph::open(ReadWrite)` — before the
/// schema-apply blocking-branch check — so apply succeeds with no
/// intervening publish.
///
/// Confirmed to fail before the open-time migration landed: the reopened
/// graph still listed `__run__legacy`, and `apply_schema` returned
/// "found non-main branches: __run__legacy".
#[tokio::test]
async fn legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply() {
let dir = tempfile::tempdir().unwrap();
let uri = dir.path().to_str().unwrap();
let mut db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap();
// Synthesize a legacy graph: a stale `__run__` branch on `__manifest`
// plus the manifest stamp rewound to v2 (pre-sweep).
db.branch_create("__run__legacy").await.unwrap();
drop(db);
{
let mut ds = lance::Dataset::open(&format!("{}/__manifest", uri))
.await
.unwrap();
ds.update_schema_metadata([(
"omnigraph:internal_schema_version".to_string(),
Some("2".to_string()),
)])
.await
.unwrap();
}
// Reopen (ReadWrite): the open-time migration must sweep `__run__legacy`
// before any branch-observing code runs.
let db = Omnigraph::open(uri).await.unwrap();
let branches = db.branch_list().await.unwrap();
assert!(
!branches.iter().any(|b| b.starts_with("__run__")),
"open-time migration must sweep legacy __run__ branches; got {branches:?}",
);
// Schema apply must proceed with no intervening publish — the
// blocking-branch check no longer sees `__run__legacy`.
let desired = TEST_SCHEMA.replace(
" age: I32?\n}",
" age: I32?\n nickname: String?\n}",
);
let result = db.apply_schema(&desired).await.unwrap();
assert!(result.applied, "schema apply should have applied");
}
#[tokio::test]
async fn test_apply_schema_adds_index_for_existing_property() {
let dir = tempfile::tempdir().unwrap();

View file

@ -61,11 +61,11 @@ async fn plan_schema_for_apply(
) -> Result<PlannedSchemaApply> {
db.ensure_schema_state_valid().await?;
let branches = db.coordinator.read().await.all_branches().await?;
// Skip `main` and internal system branches. The schema-apply lock branch
// is excluded because it is the cluster-wide schema-apply serializer.
// `__run__*` branches are no longer created; the filter remains as
// defense-in-depth for legacy graphs with leftover staging branches.
// A future production sweep will let this guard go.
// Skip `main` and internal system branches (the schema-apply lock branch,
// the cluster-wide schema-apply serializer). Legacy `__run__*` staging
// branches were swept off `__manifest` by the v2→v3 migration that runs in
// `Omnigraph::open(ReadWrite)` before this check (MR-770), so they no
// longer appear here.
let blocking_branches = branches
.into_iter()
.filter(|branch| branch != "main" && !is_internal_system_branch(branch))

View file

@ -1,16 +0,0 @@
// The Run state machine has been removed. Mutations now write directly
// to target tables and use the publisher's `expected_table_versions`
// CAS for cross-table OCC; `__run__<id>` staging branches and the
// `_graph_runs.lance` state machine no longer exist.
//
// What remains is the branch-name predicate, kept as a defense-in-depth
// guard against users naming a public branch `__run__*`. A future
// production sweep of legacy `_graph_runs.lance` rows and stale
// `__run__*` branches will let this predicate (and this file) go too.
pub(crate) const INTERNAL_RUN_BRANCH_PREFIX: &str = "__run__";
pub(crate) fn is_internal_run_branch(name: &str) -> bool {
name.trim_start_matches('/')
.starts_with(INTERNAL_RUN_BRANCH_PREFIX)
}

View file

@ -1092,9 +1092,9 @@ impl Omnigraph {
target: &str,
actor_id: Option<&str>,
) -> Result<MergeOutcome> {
if is_internal_run_branch(source) || is_internal_run_branch(target) {
if is_internal_system_branch(source) || is_internal_system_branch(target) {
return Err(OmniError::manifest(format!(
"branch_merge does not allow internal run refs ('{}' -> '{}')",
"branch_merge does not allow internal system refs ('{}' -> '{}')",
source, target
)));
}

View file

@ -35,7 +35,7 @@ use time::format_description::well_known::Rfc3339;
use crate::db::commit_graph::CommitGraph;
use crate::db::manifest::ManifestCoordinator;
use crate::db::{MergeOutcome, Omnigraph, is_internal_run_branch};
use crate::db::{MergeOutcome, Omnigraph, is_internal_system_branch};
use crate::db::{ReadTarget, Snapshot};
use crate::embedding::EmbeddingClient;
use crate::error::{MergeConflict, MergeConflictKind, OmniError, Result};

View file

@ -288,21 +288,24 @@ async fn load_jsonl_reader<R: BufRead>(
let mut node_rows: HashMap<String, Vec<JsonValue>> = HashMap::new();
let mut edge_rows: HashMap<String, Vec<(String, String, JsonValue)>> = HashMap::new();
for (line_num, line) in reader.lines().enumerate() {
let line = line?;
let line = line.trim();
if line.is_empty() {
continue;
}
let value: JsonValue = serde_json::from_str(line).map_err(|e| {
OmniError::manifest(format!("invalid JSON on line {}: {}", line_num + 1, e))
// Parse a stream of JSON values. Accepts both compact JSONL (one object
// per line) and pretty-printed JSON where a single object spans multiple
// lines — serde's streaming deserializer treats any whitespace (including
// newlines) between top-level values as a separator.
for (idx, parsed) in serde_json::Deserializer::from_reader(reader)
.into_iter::<JsonValue>()
.enumerate()
{
let record_num = idx + 1;
let value: JsonValue = parsed.map_err(|e| {
OmniError::manifest(format!("invalid JSON at record {}: {}", record_num, e))
})?;
if let Some(type_name) = value.get("type").and_then(|v| v.as_str()) {
if !catalog.node_types.contains_key(type_name) {
return Err(OmniError::manifest(format!(
"line {}: unknown node type '{}'",
line_num + 1,
"record {}: unknown node type '{}'",
record_num,
type_name
)));
}
@ -317,8 +320,8 @@ async fn load_jsonl_reader<R: BufRead>(
} else if let Some(edge_name) = value.get("edge").and_then(|v| v.as_str()) {
if catalog.lookup_edge_by_name(edge_name).is_none() {
return Err(OmniError::manifest(format!(
"line {}: unknown edge type '{}'",
line_num + 1,
"record {}: unknown edge type '{}'",
record_num,
edge_name
)));
}
@ -326,14 +329,14 @@ async fn load_jsonl_reader<R: BufRead>(
.get("from")
.and_then(|v| v.as_str())
.ok_or_else(|| {
OmniError::manifest(format!("line {}: edge missing 'from'", line_num + 1))
OmniError::manifest(format!("record {}: edge missing 'from'", record_num))
})?
.to_string();
let to = value
.get("to")
.and_then(|v| v.as_str())
.ok_or_else(|| {
OmniError::manifest(format!("line {}: edge missing 'to'", line_num + 1))
OmniError::manifest(format!("record {}: edge missing 'to'", record_num))
})?
.to_string();
let data = value
@ -347,8 +350,8 @@ async fn load_jsonl_reader<R: BufRead>(
.push((from, to, data));
} else {
return Err(OmniError::manifest(format!(
"line {}: expected 'type' or 'edge' field",
line_num + 1
"record {}: expected 'type' or 'edge' field",
record_num
)));
}
}

View file

@ -371,11 +371,10 @@ async fn cancelled_mutation_future_leaves_no_state() {
// Cancel-safety property: no graph-level run/staging state remains.
//
// Note: `branch_list()` already filters `__run__*` via
// `is_internal_system_branch`, so a runtime "no `__run__` branches" check
// would be vacuous. The structural property that no `__run__` branches
// can ever be created is enforced by deletion of `begin_run` etc. in
// (verified by the build itself — those symbols no longer exist).
// No `__run__` branches can ever be created: the Run state machine
// (`begin_run` etc.) was deleted in MR-771 — verified by the build itself,
// those symbols no longer exist. Any legacy `__run__*` branch on an
// upgraded graph is swept by the v2→v3 manifest migration.
//
// (1) The branch list is unchanged: cancellation/completion cannot
// synthesize new public branches.
@ -442,34 +441,40 @@ async fn repeated_loads_do_not_accumulate_branches() {
assert_eq!(db.branch_list().await.unwrap(), vec!["main".to_string()]);
}
/// User code must not be able to write to internal `__run__*` names.
/// The branch-name guard predicate is kept as defense-in-depth; it
/// will be removed once a future production sweep retires the legacy
/// branches.
/// After MR-770, `__run__*` is an ordinary branch name — the Run state machine
/// and its `is_internal_run_branch` guard are gone. The surviving internal-ref
/// guard still rejects the active `__schema_apply_lock__` branch on the public
/// create/merge APIs.
#[tokio::test]
async fn public_branch_apis_reject_internal_run_refs() {
async fn public_branch_apis_reject_internal_system_refs() {
let dir = tempfile::tempdir().unwrap();
let mut db = init_and_load(&dir).await;
let create_err = db.branch_create("__run__synthetic").await.unwrap_err();
// `__run__*` is no longer reserved — creating it now succeeds.
db.branch_create("__run__formerly_reserved")
.await
.expect("__run__ prefix is a normal branch name post-MR-770");
// The schema-apply lock branch is still rejected on public branch APIs.
let create_err = db.branch_create("__schema_apply_lock__").await.unwrap_err();
let OmniError::Manifest(err) = create_err else {
panic!("expected Manifest error");
};
assert!(
err.message.contains("internal run ref"),
err.message.contains("internal system ref"),
"unexpected error: {}",
err.message
);
let merge_err = db
.branch_merge("__run__synthetic", "main")
.branch_merge("__schema_apply_lock__", "main")
.await
.unwrap_err();
let OmniError::Manifest(err) = merge_err else {
panic!("expected Manifest error");
};
assert!(
err.message.contains("internal run refs"),
err.message.contains("internal system refs"),
"unexpected error: {}",
err.message
);

View file

@ -14,8 +14,8 @@ The tables below are **generated** from `.github/codeowners-roles.yml` by `.gith
| Path | Owners | Role(s) |
|---|---|---|
| `*` | @ragnorc | engineering |
| `crates/**` | @ragnorc | engineering |
| `*` | @ragnorc @aaltshuler | engineering |
| `crates/**` | @ragnorc @aaltshuler | engineering |
| `docs/**` | @ragnorc | docs |
| `README.md` | @ragnorc | docs |
| `AGENTS.md` | @ragnorc | docs |
@ -26,7 +26,7 @@ The tables below are **generated** from `.github/codeowners-roles.yml` by `.gith
| Role | Members | Description |
|---|---|---|
| `engineering` | @ragnorc | All production code under crates/**. Engine, CLI, server, compiler. |
| `engineering` | @ragnorc @aaltshuler | All production code under crates/**. Engine, CLI, server, compiler. |
| `docs` | @ragnorc | Documentation under docs/**, plus repo-level docs (README.md, AGENTS.md, CLAUDE.md symlink, SECURITY.md). |
<!-- END GENERATED OWNERSHIP -->

View file

@ -51,6 +51,18 @@ constraints. User-facing behavior should still be documented through
| Install and deployment packaging | [install.md](../user/install.md), [deployment.md](../user/deployment.md) |
| Release history | [releases/](../releases/) |
## Contribution & Governance
| Area | Read |
|---|---|
| How to contribute (external) | [CONTRIBUTING.md](../../CONTRIBUTING.md) |
| Governance model, roles, decision authority | [GOVERNANCE.md](../../GOVERNANCE.md) |
| Public contribution RFC track | [rfcs/](../rfcs/) |
The `docs/rfcs/` track is the **public, externally-authorable** RFC process. The
maintainer/internal RFCs below (`rfc-00N-*.md`) are a separate, team-owned
track; don't conflate the two.
## Active Implementation Plans
Working documents for in-flight feature work. Removed when the work lands.

View file

@ -14,8 +14,11 @@ publisher's row-level CAS on `__manifest` is the single fence.
- No `RunRecord`, no `_graph_runs.lance`, no `_graph_run_actors.lance`.
- No `omnigraph run *` CLI subcommands and no `/runs/*` HTTP endpoints.
- No `__run__<id>` staging branches. (Legacy on-disk artifacts from
pre-MR-771 repos are inert; MR-770 sweeps them in production.)
- No `__run__<id>` staging branches; `__run__*` is no longer a reserved
name. The branch-name guard was removed in MR-770, and any stale
`__run__*` branch on an upgraded graph is swept off `__manifest` by the
v2→v3 internal-schema migration on first read-write open. (The inert
`_graph_runs.lance` bytes remain until a `delete_prefix` primitive lands.)
- Cancelled mutation futures leave **no graph-level state** — only orphaned
Lance fragments, which the existing `omnigraph cleanup` pipe reclaims.
@ -241,9 +244,14 @@ list`.
## Migration code
`db/manifest/migrations.rs` does not change. Active deletion of
`_graph_runs.lance` belongs in MR-770 (the production sweep) — this PR
stops *creating* run state but does not destroy legacy bytes on disk.
`db/manifest/migrations.rs` carries the v2→v3 internal-schema step (MR-770):
a one-time sweep that deletes legacy `__run__*` staging branches off
`__manifest`. It runs in `Omnigraph::open(ReadWrite)` (via
`manifest::migrate_on_open`, before the coordinator reads branch state) and
again on the publisher's write path; both are idempotent once the stamp is at
v3. Deleting the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset
*bytes* is still deferred — it needs a `StorageAdapter::delete_prefix`
primitive — but those bytes are invisible to graph-level state.
## Mid-query partial failure: closed by MR-794

View file

@ -7,6 +7,7 @@ v0.6.1 focuses on operational polish after v0.6.0: stored-query registries, safe
- **Stored-query registries.** `omnigraph.yaml` can declare curated `queries:` blocks per graph. Servers load and type-check them at startup, `omnigraph queries validate` checks them offline, `omnigraph queries list` shows exposed queries and typed params, `GET /queries` exposes a typed catalog, and `POST /queries/{name}` invokes a stored query without accepting ad hoc `.gq` source from the client.
- **Stored-query policy gate.** New Cedar action `invoke_query` gates the stored-query invocation surface. Stored mutations are double-gated: `invoke_query` to reach the stored query and `change` for the actual write.
- **Safer branch deletion.** `branch_delete` now treats the manifest as the authority, flips branch visibility atomically, and reclaims per-table/commit-graph forks as derived state. If best-effort reclaim is interrupted, `cleanup` reconciles orphaned forks; reusing a branch name before cleanup reports an actionable error.
- **Legacy `__run__` cleanup (MR-770).** Removed the last functional remnant of the Run state machine (retired in v0.4.0): the `__run__` branch-name guard. A new v2→v3 `__manifest` internal-schema migration sweeps any stale `__run__*` staging branches on the first read-write open, so `__run__*` is no longer a reserved branch name. This closes the "unpromoted `__run__` branches block reads" condition behind the zombie-run cascade incident; the inert `_graph_runs.lance` row cleanup is tracked separately (it needs a `delete_prefix` primitive).
- **Blob-safe optimize.** `omnigraph optimize` skips tables with `Blob` properties instead of failing the whole sweep on Lance's blob-v2 compaction decode bug. Skips are visible in human output, `--json` as `skipped`, `TableOptimizeStats.skipped`, and logs; non-blob tables still compact normally.
- **Deployment improvements.** The container entrypoint now composes `OMNIGRAPH_TARGET_URI` with `OMNIGRAPH_CONFIG`, so operators can keep the graph URI in env while loading policy/query config from a mounted file. The local RustFS bootstrap pins RustFS beta.3 and allows the current insecure local-dev default credentials.
- **Windows release support.** Tagged and edge releases now publish Windows x86_64 archives containing `omnigraph.exe` and `omnigraph-server.exe`, with a PowerShell installer and Windows install docs.
@ -17,6 +18,7 @@ v0.6.1 focuses on operational polish after v0.6.0: stored-query registries, safe
- A graph selected by name (`--target` or `server.graph`) now uses `graphs.<name>.policy` and `graphs.<name>.queries`. Top-level `policy` / `queries` blocks are only for anonymous bare-URI single-graph mode; using them with a named graph now fails loudly with migration guidance.
- `mcp.expose` defaults to `true` for stored-query registry entries. Set `mcp: { expose: false }` for service-only queries that should not appear in the catalog.
- `invoke_query` is graph-scoped, not branch-scoped. Branch/snapshot access remains enforced by the inner `read` / `change` gate.
- **Legacy `__run__` migration.** Graphs created before v0.4.0 are migrated automatically on the first **read-write** open by a v0.6.1 binary (one-time `__manifest` stamp v2→v3 sweep of stale `__run__*` branches). No action required. Two caveats: (1) a graph opened **read-only** still lists any stale `__run__*` branch until its first read-write open, since the migration is write-path-only like all manifest migrations — long-lived read-only deployments should be opened read-write once after upgrading; (2) the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset bytes are left in place until a future `delete_prefix` primitive (they are invisible to graph-level state).
- Blob tables are not compacted until the upstream Lance fix lands, so fragment count and deleted-row space on blob tables are not reclaimed by `optimize`. Reads, writes, and query results are unaffected; no on-disk migration is required.
- `TableOptimizeStats` is now `#[non_exhaustive]` and gains a `skipped: Option<SkipReason>` field (so does the new `SkipReason` enum). This is a source-level change only for downstream code that built this returned result struct by literal — rare, since it is produced by `optimize` and consumed by reading its fields; field access is unaffected, and `#[non_exhaustive]` keeps future additions non-breaking.

View file

@ -0,0 +1,54 @@
# RFC NNNN: <title>
| | |
|---|---|
| **Status** | Proposed |
| **Author(s)** | <your name / handle> |
| **Discussion** | <link to the originating Discussion, if any> |
| **Implementation** | <issue/PR links, filled in as work lands> |
> Status is maintained by maintainers: `Proposed` while the PR is open,
> `Accepted` on merge, `Declined` on close, `Superseded by NNNN` later.
## Summary
One paragraph: what this changes, in plain terms.
## Motivation
What problem does this solve, and why is it worth the ongoing cost? Tie it to a
concrete need (a Discussion, a recurring issue, a user request). Per the
project's first principle, argue the *long-run liability*, not just the
short-term convenience.
## Guide-level explanation
Explain the change as you'd teach it to a user or contributor: new commands,
syntax, API shapes, behavior. Examples first.
## Reference-level design
The precise design: data structures, IR/AST/planner changes, storage/format
impact, migration path, error behavior. Enough that a reviewer can find the
holes.
## Invariants & deny-list check
Which Hard Invariants in [../dev/invariants.md](../dev/invariants.md) does this
touch? Does it brush against any deny-list item — and if so, why is this the
justified exception? State explicitly that no invariant is weakened, or which
Known Gap moves.
## Drawbacks & alternatives
What does this cost, what did you reject, and why. "Do nothing" is a valid
alternative to weigh.
## Reversibility
Is this reversible? On-disk/wire/format and substrate choices are near-permanent
and demand more evidence; a CLI flag or doc is cheap to undo. Say which this is.
## Unresolved questions
What's deliberately left open for review to settle.

66
docs/rfcs/README.md Normal file
View file

@ -0,0 +1,66 @@
# RFCs
Substantial changes to OmniGraph — new user-facing surface, format or protocol
changes, anything irreversible or cross-cutting — go through a lightweight RFC
so the design is agreed *as reviewable code* before implementation starts. This
is the public RFC track, open to **anyone, including external contributors**.
This complements the always-on review bar in
[../dev/invariants.md](../dev/invariants.md): the invariants say *what every
change must respect*; an RFC says *why this particular change is worth making and
how*.
> **Two tracks, don't conflate them.** This `docs/rfcs/` directory is the
> **public contribution** track (anyone authors; maintainers accept). The
> maintainer-internal RFCs under `docs/dev/rfc-00N-*.md` are a separate,
> team-owned track for in-flight internal work. If you're an outside
> contributor, you're in the right place here.
## When you need one
- **RFC required:** new query/schema/CLI/HTTP surface; on-disk or wire-format
changes; a new substrate dependency; anything the deny-list in
[../dev/invariants.md](../dev/invariants.md) flags; anything irreversible
("reversibility shapes evidence demand").
- **RFC not required:** bug fixes for an `accepted` issue, and the trivial
fast-lane (typos, docs, deps) — see [../../CONTRIBUTING.md](../../CONTRIBUTING.md).
If you're unsure, start a [Discussion](../../../discussions); a maintainer will
tell you whether it needs an RFC.
## Lifecycle
```
Discussion (incubate, get rough consensus)
│ graduate
RFC pull request → adds docs/rfcs/NNNN-title.md (Status: Proposed)
maintainer review ──▶ changes requested / declined (PR closed, with rationale)
merged == Accepted (the merged file is the durable decision record)
Implementation PR(s) reference the accepted RFC
```
- **Author:** anyone. **Acceptance:** a maintainer decision, performed by
merging the RFC PR. Declining is closing it with rationale.
- The merged RFC *is* the accepted record — there is no separate sign-off step.
- Later reversals don't edit history: supersede with a new RFC that links back
and flip the old one's `Status` to `Superseded`.
## Numbering & naming
- File: `docs/rfcs/NNNN-kebab-title.md`, where `NNNN` is the next free
zero-padded integer (`0001`, `0002`, …). `0000-template.md` is reserved.
- Pick the number when you open the PR; if it collides with another in-flight
RFC, the second to merge bumps theirs.
## Status values
`Proposed` (open PR) · `Accepted` (merged) · `Declined` (closed) ·
`Superseded by NNNN` · `Implemented` (set once the work lands, optional).
Copy [0000-template.md](0000-template.md) to start.

View file

@ -4,4 +4,4 @@
- `_as` variants of every write API let callers override the actor: `mutate_as`, `ingest_as`, `branch_merge_as`, `apply_schema_as`, etc.
- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map).
- HTTP server uses the bearer-token actor automatically; CLI uses the local user / explicit env (no implicit actor).
- Pre-v0.4.0 graphs also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0 and reclaimed by MR-770's production sweep.
- Pre-v0.4.0 graphs also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0. The v2→v3 manifest migration sweeps any stale `__run__*` branches on first write-open (MR-770); the inert dataset bytes remain until a `delete_prefix` primitive lands.

View file

@ -9,8 +9,8 @@ Lance supports branching at the dataset level: a branch is a named lineage of ve
OmniGraph builds *graph branches* on top by branching every sub-table coherently:
- `branch_create(name)` / `branch_create_from(target, name)` — disallowed name `main`; fails if branch exists; ensures the schema-apply lock is idle. Atomic and authority-first like `branch_delete`: it flips the `__manifest` branch (authority), then creates the derived commit-graph branch, force-dropping any orphaned commit-graph ref left by an incomplete prior delete (the manifest branch is fresh, so a same-named commit-graph branch is provably a zombie). If commit-graph creation fails, the manifest branch is rolled back so the name never half-exists.
- `branch_list()` — returns public branches, **filters internal** `__run__…` and `__schema_apply_lock__` prefixes.
- `branch_delete(name)` — refuses if there are descendants or active runs on the branch. The manifest is the single authority for branch existence: deletion flips the `__manifest` branch ref first (one atomic op), after which the branch is gone from every snapshot. The owned per-table forks and the commit-graph branch are derived state, reclaimed best-effort with `force_delete_branch` after the flip. A failure during that reclaim (transient object-store error) does not fail the call or block the authority flip; the leftover forks are unreachable orphans that the [`cleanup`](maintenance.md) reconciler converges. One consequence: if a delete's best-effort reclaim fails, reusing that branch name before the next `cleanup` surfaces a clear error pointing at `cleanup` (the stale fork would otherwise collide on first write).
- `branch_list()` — returns public branches, **filters the internal** `__schema_apply_lock__` branch.
- `branch_delete(name)` — refuses if there are descendants on the branch, or if it is the current branch. The manifest is the single authority for branch existence: deletion flips the `__manifest` branch ref first (one atomic op), after which the branch is gone from every snapshot. The owned per-table forks and the commit-graph branch are derived state, reclaimed best-effort with `force_delete_branch` after the flip. A failure during that reclaim (transient object-store error) does not fail the call or block the authority flip; the leftover forks are unreachable orphans that the [`cleanup`](maintenance.md) reconciler converges. One consequence: if a delete's best-effort reclaim fails, reusing that branch name before the next `cleanup` surfaces a clear error pointing at `cleanup` (the stale fork would otherwise collide on first write).
- **Lazy forking**: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source. A fork collision is classified by the manifest authority, not by Lance branch versions: if the live manifest already records the fork on the active branch, a concurrent first-write won and the caller gets a retryable "refresh and retry"; if the manifest does not, a physical branch there is an orphan and the caller is pointed at `cleanup`.
- `sync_branch(branch)` — re-binds the in-memory handle to the latest head of the branch.
@ -51,10 +51,10 @@ Notes:
## L2 — Internal system branches
Filtered from `branch_list()` but visible to internals:
Internal or legacy branch refs:
- `__schema_apply_lock__` — serializes schema migrations.
- `__run__<run-id>` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicate `is_internal_run_branch` is kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770).
- `__schema_apply_lock__` — serializes schema migrations; filtered from `branch_list()` but visible to internals.
- `__run__<run-id>` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). These are swept off `__manifest` on the first read-write open by the v2→v3 internal-schema migration (MR-770), and `__run__*` is no longer a reserved name. Known limitation: a pre-v0.4.0 graph opened **read-only** still surfaces any stale `__run__*` branch in `branch_list()` until its first read-write open (the migration is write-path-only, like all manifest migrations).
## L2 — Recovery audit trail

View file

@ -4,11 +4,11 @@
|---|---|---|
| `MANIFEST_DIR` | `__manifest` | `db/manifest/layout.rs` |
| Commit graph dir | `_graph_commits.lance` | `db/commit_graph.rs` |
| Run registry dir (legacy, removed MR-771) | `_graph_runs.lance` | inert post-v0.4.0; reclaimed by MR-770 |
| Run branch prefix (legacy, removed MR-771) | `__run__` | filtered by `is_internal_run_branch` defense-in-depth |
| Run registry dir (legacy, removed MR-771) | `_graph_runs.lance` | inert post-v0.4.0; bytes remain until a `delete_prefix` primitive lands |
| Run branch prefix (legacy, removed MR-771/MR-770) | `__run__` | swept off `__manifest` by the v2→v3 migration; no longer a reserved name |
| Schema apply lock | `__schema_apply_lock__` | `db/mod.rs` |
| Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | `db/manifest/publisher.rs` |
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` | `db/manifest/migrations.rs` |
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 3` | `db/manifest/migrations.rs` |
| Merge stage batch | `MERGE_STAGE_BATCH_ROWS = 8192` | `exec/merge.rs` |
| Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | `db/omnigraph/optimize.rs` |
| Lance blob compaction support | `LANCE_SUPPORTS_BLOB_COMPACTION = false` | `db/omnigraph/optimize.rs` |

View file

@ -22,7 +22,7 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin
- `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
- `__manifest/` — the catalog of all sub-tables and their published versions
- `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed in MR-771 and these files are cleaned up via MR-770's production sweep)
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed in MR-771. The v2→v3 manifest migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a `delete_prefix` storage primitive lands)
- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
- `object_type``table | table_version | table_tombstone`
- `table_key``node:<TypeName> | edge:<EdgeName>`
@ -47,6 +47,7 @@ Adding a new on-disk shape change is one constant bump (`INTERNAL_MANIFEST_SCHEM
|---|---|
| v1 (implicit, pre-stamp) | `__manifest.object_id` had no PK annotation; publisher had no row-level CAS protection. |
| v2 | `__manifest.object_id` carries `lance-schema:unenforced-primary-key=true`; row-level CAS engaged. Stamped as `omnigraph:internal_schema_version=2`. |
| v3 | One-time sweep of legacy `__run__*` staging branches (pre-v0.4.0 Run state machine, removed MR-771) off `__manifest`. Runs at `Omnigraph::open(ReadWrite)` and on publish. Stamped as `omnigraph:internal_schema_version=3`. |
## On-disk layout
@ -91,7 +92,7 @@ flowchart TB
- **Graph root** is one directory (or S3 prefix). Everything below is part of one OmniGraph graph.
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.)
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the v2→v3 migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once `delete_prefix` lands.)
- **`_graph_commit_recoveries.lance`** — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
- **`__recovery/{ulid}.json`** — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.

View file

@ -34,10 +34,15 @@ PY
canonical=()
while IFS= read -r line; do
canonical+=("$line")
done < <(find docs -type f -name '*.md' ! -path 'docs/releases/*' ! -path 'docs/internal/*' | sort)
done < <(find docs -type f -name '*.md' ! -path 'docs/releases/*' ! -path 'docs/internal/*' ! -path 'docs/rfcs/*' | sort)
if [[ -d docs/releases ]]; then
canonical+=("docs/releases/")
fi
# RFCs are a growing collection (like releases): represent the directory, not
# every per-RFC file. The dir must be linked from an audience index.
if [[ -d docs/rfcs ]]; then
canonical+=("docs/rfcs/")
fi
linked=()
for index_file in "${index_files[@]}"; do