omnigraph/docs/branches-commits.md
Ragnor Comerford 6f25c4f9f8
Address reviewer feedback (Cursor + cubic) on PR #60
All eight comments verified against source and applied:

- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
  the markdown blockquote. Claude Code's @-import parser expects @ at
  column 0; the leading "> " of a blockquote silently broke
  recognition, so the claimed auto-include did nothing. (Cursor,
  Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
  enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
  variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
  force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
  P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
  exists at error.rs:88-89 but the doc enumerated only 10 of 11.
  (cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
  tombstone_version column; state.rs:180 reads the tombstone version
  from the table_version column on rows where object_type =
  table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
  the underlying storage. actor_id is joined in-memory from
  _graph_commit_actors.lance, not a column on _graph_commits.lance.
  (cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
  match the actual constant name in catalog/schema_ir.rs:11.
  (cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
  `ls crates/omnigraph/tests/*.rs`). (cubic P3.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:09:06 +02:00

2.8 KiB

Branches, Commits, Snapshots

L1 — Lance per-dataset branches

Lance supports branching at the dataset level: a branch is a named lineage of versions, and fork_branch_from_state(source_branch, target_branch, source_version) creates a copy-on-write fork.

L2 — Graph-level branches

OmniGraph builds graph branches on top by branching every sub-table coherently:

  • branch_create(name) / branch_create_from(target, name) — disallowed name main; fails if branch exists; ensures the schema-apply lock is idle.
  • branch_list() — returns public branches, filters internal __run__… and __schema_apply_lock__ prefixes.
  • branch_delete(name) — refuses if there are descendants or active runs on the branch; cleans up owned per-branch fragments.
  • Lazy forking: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source.
  • sync_branch(branch) — re-binds the in-memory handle to the latest head of the branch.

L2 — Commit graph (db/commit_graph.rs)

In-memory shape of a graph commit:

GraphCommit {
  graph_commit_id: ULID,
  manifest_branch: Option<String>,
  manifest_version: u64,
  parent_commit_id: Option<String>,
  merged_parent_commit_id: Option<String>,   // populated for merge commits
  actor_id: Option<String>,                  // joined in-memory from _graph_commit_actors.lance, NOT a column on _graph_commits.lance
  created_at: i64 (microseconds since epoch),
}

Storage is split across two Lance datasets (both with stable row IDs):

  • _graph_commits.lance — every column above except actor_id.
  • _graph_commit_actors.lance — optional separate (graph_commit_id, actor_id) map, created on demand. The actor_id field above is populated by joining this dataset in-memory at load time.

Notes:

  • Every successful publish (load / change / merge / schema_apply / publish_run) appends one commit.
  • Merge commits have two parents; linear commits have one.
  • API: list_commits(branch), get_commit(id), head_commit_id_for_branch(branch).

L2 — Snapshots & time travel

  • snapshot() — current snapshot for the bound branch; cached.
  • snapshot_of(target) — snapshot at a ReadTarget (branch | snapshot id).
  • snapshot_at_version(v: u64) — historical snapshot from any manifest version.
  • entity_at(table_key, id, version) — single-entity time travel without building a full snapshot.
  • A Snapshot is a (version, HashMap<table_key, SubTableEntry>) — cheap to build, snapshot-isolated cross-table reads.

L2 — Internal system branches

Filtered from branch_list() but visible to internals:

  • __run__<run-id> — ephemeral isolation branch for a transactional run.
  • __schema_apply_lock__ — serializes schema migrations.