mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-21 02:28:07 +02:00
docs(user): de-dev polish — strip internal scaffolding from user docs (Phase 3a) (#226)
Remove developer-only scaffolding that leaked into the public user/operator docs, while preserving every user-facing behavior, command, flag, endpoint, constant, and env var. No behavior changes. Removed across 18 files: - internal ticket / sequencing refs (MR-NNN, RFC-NNN, "Phase N"); - source-code paths (crates/**/*.rs, *.pest) and internal struct/function dumps (e.g. the QueryIR / GraphCommit / SchemaMigrationPlan Rust types, internal fn names like fork_branch_from_state, optimize_all_tables); - Lance-internal blocker prose (upstream issue numbers, blob-decode cause, sidecar Phase-B/C mechanics) — keeping the user-visible behavior (e.g. "optimize skips Blob-column tables; reads/writes unaffected"); - pre-v0.4.0 Run-state-machine archaeology. Internal IR/lowering/recovery-internals sections were either trimmed to a brief user-facing note (e.g. "Traversal execution", "interrupted writes recover automatically; recovery commits are recorded under actor omnigraph:recovery") or removed. Kept: all language syntax, lint codes, Cedar actions/scopes, endpoints, error taxonomy, every constant and env var (verified none dropped from the constants cheat-sheet), and the operator-facing explanations of on-disk artifacts. Residual "legacy" mentions are all user-facing (the deprecated omnigraph.yaml, the legacy token chain, old command names). Verified: zero internal-scaffolding leaks (MR/RFC/Phase/.rs/.pest = 0) across docs/user; zero broken links; check-agents-md.sh green. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
612741b387
commit
77dffdae92
18 changed files with 192 additions and 266 deletions
|
|
@ -1,6 +1,6 @@
|
|||
# Change Detection / Diff
|
||||
|
||||
`changes/mod.rs`. Three-level algorithm:
|
||||
Diffing two read targets uses a three-level algorithm:
|
||||
|
||||
1. **Manifest diff**: skip sub-tables whose `(table_version, table_branch)` is unchanged.
|
||||
2. **Lineage check**:
|
||||
|
|
|
|||
|
|
@ -2,44 +2,24 @@
|
|||
|
||||
## L1 — Lance per-dataset branches
|
||||
|
||||
Lance supports branching at the dataset level: a branch is a named lineage of versions, and `fork_branch_from_state(source_branch, target_branch, source_version)` creates a copy-on-write fork.
|
||||
Lance supports branching at the dataset level: a branch is a named lineage of versions, and a copy-on-write fork creates a new branch from a source branch at a given version.
|
||||
|
||||
## L2 — Graph-level branches
|
||||
|
||||
OmniGraph builds *graph branches* on top by branching every sub-table coherently:
|
||||
|
||||
- `branch_create(name)` / `branch_create_from(target, name)` — disallowed name `main`; fails if branch exists; ensures the schema-apply lock is idle. Atomic and authority-first like `branch_delete`: it flips the `__manifest` branch (authority), then creates the derived commit-graph branch, force-dropping any orphaned commit-graph ref left by an incomplete prior delete (the manifest branch is fresh, so a same-named commit-graph branch is provably a zombie). If commit-graph creation fails, the manifest branch is rolled back so the name never half-exists.
|
||||
- `branch_list()` — returns public branches, **filters the internal** `__schema_apply_lock__` branch.
|
||||
- `branch_delete(name)` — refuses if there are descendants on the branch, or if it is the current branch. The manifest is the single authority for branch existence: deletion flips the `__manifest` branch ref first (one atomic op), after which the branch is gone from every snapshot. The owned per-table forks and the commit-graph branch are derived state, reclaimed best-effort with `force_delete_branch` after the flip. A failure during that reclaim (transient object-store error) does not fail the call or block the authority flip; the leftover forks are unreachable orphans that the [`cleanup`](../operations/maintenance.md) reconciler converges. One consequence: if a delete's best-effort reclaim fails, reusing that branch name before the next `cleanup` surfaces a clear error pointing at `cleanup` (the stale fork would otherwise collide on first write).
|
||||
- **Lazy forking**: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source. A fork collision is classified by the manifest authority, not by Lance branch versions: if the live manifest already records the fork on the active branch, a concurrent first-write won and the caller gets a retryable "refresh and retry"; if the manifest does not, a physical branch there is an orphan and the caller is pointed at `cleanup`.
|
||||
- `sync_branch(branch)` — re-binds the in-memory handle to the latest head of the branch.
|
||||
- **Create** (`branch create` / `branch create --from <target>`) — the name `main` is disallowed; fails if the branch exists. Atomic: the new branch becomes visible all-or-nothing, so a name never half-exists.
|
||||
- **List** (`branch list`) — returns public branches, **filtering the internal** `__schema_apply_lock__` branch.
|
||||
- **Delete** (`branch delete`) — refuses if there are descendants on the branch, or if it is the current branch. Once deleted, the branch is gone from every snapshot. The owned per-table forks are reclaimed best-effort; if that reclaim hits a transient object-store error, the leftover storage is reclaimed later by the [`cleanup`](../operations/maintenance.md) command. One consequence: if a delete's reclaim fails, reusing that branch name before the next `cleanup` surfaces a clear error pointing at `cleanup`.
|
||||
- **Lazy forking**: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share storage with their source. If two writers race to first-write the same branch, the loser gets a retryable "refresh and retry".
|
||||
|
||||
## L2 — Commit graph (`db/commit_graph.rs`)
|
||||
## L2 — Commit graph
|
||||
|
||||
In-memory shape of a graph commit:
|
||||
Each graph commit carries a ULID id, the manifest branch and version it published, its parent commit (two parents for a merge commit, one for a linear commit), the actor who made it, and a creation timestamp.
|
||||
|
||||
```
|
||||
GraphCommit {
|
||||
graph_commit_id: ULID,
|
||||
manifest_branch: Option<String>,
|
||||
manifest_version: u64,
|
||||
parent_commit_id: Option<String>,
|
||||
merged_parent_commit_id: Option<String>, // populated for merge commits
|
||||
actor_id: Option<String>, // joined in-memory from _graph_commit_actors.lance, NOT a column on _graph_commits.lance
|
||||
created_at: i64 (microseconds since epoch),
|
||||
}
|
||||
```
|
||||
|
||||
Storage is split across two Lance datasets (both with stable row IDs):
|
||||
|
||||
- `_graph_commits.lance` — every column above *except* `actor_id`.
|
||||
- `_graph_commit_actors.lance` — optional separate `(graph_commit_id, actor_id)` map, created on demand. The `actor_id` field above is populated by joining this dataset in-memory at load time.
|
||||
|
||||
Notes:
|
||||
|
||||
- Every successful publish (load / change / merge / schema_apply) appends one commit.
|
||||
- Every successful publish (load / change / merge / schema apply) appends one commit.
|
||||
- Merge commits have two parents; linear commits have one.
|
||||
- API: `list_commits(branch)`, `get_commit(id)`, `head_commit_id_for_branch(branch)`.
|
||||
- Inspect history with `commit list` and `commit show`.
|
||||
|
||||
## L2 — Snapshots & time travel
|
||||
|
||||
|
|
@ -49,13 +29,12 @@ conflict kinds are on the [merge](merge.md) page.
|
|||
|
||||
## L2 — Internal system branches
|
||||
|
||||
Internal or legacy branch refs:
|
||||
|
||||
- `__schema_apply_lock__` — serializes schema migrations; filtered from `branch_list()` but visible to internals.
|
||||
- `__run__<run-id>` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). These are swept off `__manifest` on the first read-write open by the v2→v3 internal-schema migration (MR-770), and `__run__*` is no longer a reserved name. Known limitation: a pre-v0.4.0 graph opened **read-only** still surfaces any stale `__run__*` branch in `branch_list()` until its first read-write open (the migration is write-path-only, like all manifest migrations).
|
||||
- `__schema_apply_lock__` — serializes schema migrations; filtered from `branch list` but used internally.
|
||||
|
||||
## L2 — Recovery audit trail
|
||||
|
||||
The five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) protect their multi-table commits with a sidecar at `__recovery/{ulid}.json` written before Phase B and deleted after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`: classify per-table state, decide all-or-nothing per sidecar, roll forward / back, record an audit row.
|
||||
Interrupted multi-table writes are recovered automatically the next time the graph is opened read-write. Recovery commits are recorded in the audit trail under the actor `omnigraph:recovery`, so you can find them with:
|
||||
|
||||
Audit rows live in `_graph_commit_recoveries.lance` (sibling to `_graph_commits.lance`) and reference the commit graph by `graph_commit_id`. The linked recovery commit is identified by that same `graph_commit_id`, and `actor_id="omnigraph:recovery"` is stored in `_graph_commit_actors.lance` (joined by `graph_commit_id`) — `_graph_commits.lance` itself does not carry the `actor_id` column. To find recoveries for a specific original actor: `omnigraph commit list --filter actor=omnigraph:recovery`, then join to `_graph_commit_recoveries.lance` by `graph_commit_id` to read `recovery_for_actor`. Schema: see `crates/omnigraph/src/db/recovery_audit.rs`.
|
||||
```bash
|
||||
omnigraph commit list --filter actor=omnigraph:recovery
|
||||
```
|
||||
|
|
|
|||
|
|
@ -107,7 +107,7 @@ Properties:
|
|||
- Each query on the branch is its own publisher commit — so they're individually atomic. Per-query CAS works on branches just like on main.
|
||||
- The branch lives on disk. Process crash mid-workflow? Re-open and resume.
|
||||
- Multiple agents can work on different branches in parallel without blocking each other.
|
||||
- The merge is a three-way merge at the row level. Conflicts surface as `OmniError::MergeConflicts(Vec<MergeConflict>)`, with structured kinds (`DivergentInsert`, `DivergentUpdate`, `DeleteVsUpdate`, …) so callers can handle them programmatically.
|
||||
- The merge is a three-way merge at the row level. Conflicts surface as structured merge-conflict kinds (`DivergentInsert`, `DivergentUpdate`, `DeleteVsUpdate`, …) so callers can handle them programmatically.
|
||||
|
||||
### 4. Coordinating multiple agents
|
||||
|
||||
|
|
@ -129,14 +129,14 @@ omnigraph branch merge agent-b/work --into main graph.omni
|
|||
|
||||
Each agent sees a consistent snapshot of `main` at the time it forked. The first merge to `main` lands as a fast-forward (or a no-op if no concurrent change). The second merge runs three-way: rows touched by both branches surface as `MergeConflict`s for the caller to resolve.
|
||||
|
||||
This is the workflow MR-797 / agentic loops are designed around: **branches are the unit of "an agent's working set."**
|
||||
This is the workflow agentic loops are designed around: **branches are the unit of "an agent's working set."**
|
||||
|
||||
## Failure modes
|
||||
|
||||
| Scenario | What happens | Caller action |
|
||||
|---|---|---|
|
||||
| Single query fails mid-flight | Publisher never publishes; target unchanged | Read the error, decide whether to retry |
|
||||
| Concurrent writers race the same `(table, branch)` | Publisher CAS rejects the loser with `ManifestConflictDetails::ExpectedVersionMismatch` | Refresh handle, retry the query |
|
||||
| Concurrent writers race the same `(table, branch)` | Publisher CAS rejects the loser with a version-mismatch conflict | Refresh handle, retry the query |
|
||||
| Branch with N successful mutations, then merge fails (three-way conflict) | Each individual mutation already committed on the branch; merge surfaces `MergeConflicts` | Inspect, decide whether to keep working on the branch, abandon it (`branch_delete`), or resolve and re-merge |
|
||||
| Process crashes mid-branch-workflow | Each completed mutation on the branch is durable | Re-open the graph, continue where you left off |
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue