Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-15 01:55:13 +02:00
Merge origin/main into cluster-config-docs
commit b046515e1c
42 changed files with 2244 additions and 358 deletions
@@ -139,6 +139,20 @@ them explicit.
 Remove the skip when the upstream Lance fix lands — the
 `lance_surface_guards.rs::compact_files_still_fails_on_blob_columns` guard
 turns red on that bump to force it.
+- **Manifest→commit-graph publish atomicity:** a graph commit advances
+`__manifest` (the visibility authority) and then appends `_graph_commits` as
+two separate writes (`commit_updates_with_actor_with_expected`, failpoint
+`graph_publish.before_commit_append`). A crash between them leaves the manifest
+at version N with no commit-graph row for N. Live reads and durability are
+unaffected — the live version resolves via the manifest
+(`GraphCoordinator::version()`), not the commit-graph head — and the open-time
+recovery sweep does NOT repair it (`lance_head == manifest_pinned` classifies as
+`NoMovement`; a recovery sidecar would not change this). Impact is bounded to
+commit history: `commit list` misses N, time-travel by commit id to N fails,
+and merge-base loses a node (a likely-benign off-by-one re-merge). This affects
+every publish, not a specific maintenance command. Eventual fix: make the
+commit graph reconcilable from the manifest (or the two writes atomic) — not a
+recovery-sidecar concern.
 - **Planner capability/stat surfaces:** cost-aware planning, complete
 capability advertisement, and explain-with-cost are roadmap. Do not describe
 them as implemented.
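The publish gap described in the first added bullet can be modeled in a few lines. This is a hypothetical in-memory sketch, not omnigraph code. `PublishModel`, `publish`, and `reconcile` are illustrative names. The `crash_before_append` flag stands in for the `graph_publish.before_commit_append` failpoint. `reconcile` only illustrates the "reconcilable from the manifest" direction named as the eventual fix.

```rust
/// Hypothetical model of a two-write graph publish: the manifest advances
/// first, then the commit-graph row is appended as a separate write.
#[derive(Default)]
pub struct PublishModel {
    /// Stand-in for `__manifest`, the visibility authority.
    pub manifest_version: u64,
    /// Stand-in for `_graph_commits`: versions that have a commit row.
    pub commit_rows: Vec<u64>,
}

impl PublishModel {
    /// `crash_before_append` simulates dying between the two writes.
    pub fn publish(&mut self, crash_before_append: bool) -> Result<u64, &'static str> {
        let next = self.manifest_version + 1;
        // Write 1: the new version is visible as soon as the manifest moves.
        self.manifest_version = next;
        if crash_before_append {
            return Err("crashed after manifest advance, before commit append");
        }
        // Write 2: record the commit-graph row.
        self.commit_rows.push(next);
        Ok(next)
    }

    /// Live reads resolve through the manifest, so they still see the version.
    pub fn live_version(&self) -> u64 {
        self.manifest_version
    }

    /// History (think `commit list`) only knows versions that have a row.
    pub fn history(&self) -> Vec<u64> {
        self.commit_rows.clone()
    }

    /// Hypothetical repair: backfill a row for every manifest version that
    /// lacks one, and return the versions that were added.
    pub fn reconcile(&mut self) -> Vec<u64> {
        let missing: Vec<u64> = (1..=self.manifest_version)
            .filter(|v| !self.commit_rows.contains(v))
            .collect();
        self.commit_rows.extend(missing.iter().copied());
        self.commit_rows.sort_unstable();
        missing
    }
}
```

A crashed publish still advances `live_version()` while `history()` lacks that version, which matches the history-only impact described in the bullet. A real backfill would also need parent and actor metadata that this model does not carry.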
@@ -21,7 +21,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 | `end_to_end.rs` | Full init → load → query/mutate flow |
 | `branching.rs` | Branch create / list / delete, lazy fork |
 | `merge_truth_table.rs` | Merge-pair truth table (MR-786): all 9×9 `(left_op, right_op)` cells from `{noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel}`. Adding a new op to `OpVariant` forces a compile error in `build_case` until the new row + column are dispositioned. 36 executable cells run through real `branch_merge` with a structured oracle (`MergeOutcome` / `MergeConflictKind` + graph-state assert); 45 cells involving `dropProperty`/`addLabel`/`removeLabel` are recorded as `Unsupported` until the mutation grammar grows. |
-| `writes.rs` | Direct-publish writes: cancellation, concurrent-writer CAS, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) |
+| `writes.rs` | Direct-publish writes: cancellation, non-strict insert/merge rebase under the per-table queue, strict stale-write conflicts, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) |
 | `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead |
 | `lifecycle.rs` | Graph lifecycle, schema state |
 | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) |
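The `merge_truth_table.rs` row relies on Rust's exhaustive `match` to force a decision for every new op. The following is a minimal sketch of that idea. Only the nine op names come from the table. `is_executable`, `classify`, `Cell`, and `count_cells` are hypothetical helpers, and the real `build_case` is not shown in this commit.

```rust
/// The nine op kinds of the merge-pair grid (names from the test description).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OpVariant {
    Noop,
    AddNode,
    RemoveNode,
    AddEdge,
    RemoveEdge,
    SetProperty,
    DropProperty,
    AddLabel,
    RemoveLabel,
}

pub const ALL_OPS: [OpVariant; 9] = [
    OpVariant::Noop,
    OpVariant::AddNode,
    OpVariant::RemoveNode,
    OpVariant::AddEdge,
    OpVariant::RemoveEdge,
    OpVariant::SetProperty,
    OpVariant::DropProperty,
    OpVariant::AddLabel,
    OpVariant::RemoveLabel,
];

/// Disposition of one `(left_op, right_op)` cell.
#[derive(Debug, PartialEq, Eq)]
pub enum Cell {
    Executable,
    Unsupported,
}

/// No wildcard arm: adding a variant to `OpVariant` is a compile error here
/// until someone decides whether the mutation grammar supports it.
pub fn is_executable(op: OpVariant) -> bool {
    match op {
        OpVariant::Noop
        | OpVariant::AddNode
        | OpVariant::RemoveNode
        | OpVariant::AddEdge
        | OpVariant::RemoveEdge
        | OpVariant::SetProperty => true,
        OpVariant::DropProperty | OpVariant::AddLabel | OpVariant::RemoveLabel => false,
    }
}

/// A cell is executable only if both sides are executable ops.
pub fn classify(left: OpVariant, right: OpVariant) -> Cell {
    if is_executable(left) && is_executable(right) {
        Cell::Executable
    } else {
        Cell::Unsupported
    }
}

/// Walk all 9 x 9 cells and count (executable, unsupported).
pub fn count_cells() -> (usize, usize) {
    let mut counts = (0, 0);
    for &left in ALL_OPS.iter() {
        for &right in ALL_OPS.iter() {
            match classify(left, right) {
                Cell::Executable => counts.0 += 1,
                Cell::Unsupported => counts.1 += 1,
            }
        }
    }
    counts
}
```

The grid gives 6 × 6 = 36 executable cells and 81 − 36 = 45 cells that touch at least one of the three unsupported ops. These are the counts the row reports.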
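The revised `writes.rs` row separates two cases. Strict writes fail on a stale expected version. Non-strict insert/merge writes rebase onto the current head. The following is a hedged sketch of that contract. `Tables`, `Mode`, and `WriteError` are hypothetical, and the `&mut self` borrow stands in for the per-table queue's serialization.

```rust
use std::collections::HashMap;

/// Hypothetical write mode: strict writes pin an expected version.
#[derive(Clone, Copy, Debug)]
pub enum Mode {
    Strict,
    NonStrict,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    StaleWrite { expected: u64, actual: u64 },
}

/// Table name -> current version. Taking `&mut self` serializes writers,
/// standing in for the per-table queue.
#[derive(Default)]
pub struct Tables {
    versions: HashMap<String, u64>,
}

impl Tables {
    /// Apply one write planned against `expected`; return the new version.
    pub fn write(&mut self, table: &str, expected: u64, mode: Mode) -> Result<u64, WriteError> {
        let current = self.versions.entry(table.to_string()).or_insert(0);
        if *current != expected {
            if let Mode::Strict = mode {
                // Strict: the caller's view is stale, so surface a conflict.
                return Err(WriteError::StaleWrite { expected, actual: *current });
            }
            // Non-strict insert/merge: rebase onto the current head instead.
        }
        *current += 1;
        Ok(*current)
    }
}
```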
@@ -35,10 +35,10 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 | `s3_storage.rs` | S3-backed graph (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set) |
 | `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior |
 | `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths |
-| `maintenance.rs` | `optimize` (compaction) + `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation |
-| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the four per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`). |
+| `maintenance.rs` | `optimize` (compaction), `repair` (explicit uncovered-drift publish), and `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes its own compaction (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), skips pre-existing uncovered drift (`optimize_skips_preexisting_manifest_head_drift`), and refuses to run while a `__recovery` sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`); `repair` previews/heals verified maintenance drift, refuses raw semantic drift without `--force`, and forced repair publishes only by explicit operator choice |
+| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`). |
 | `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
-| `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories). |
+| `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes). |
 
 ## Fixtures
 
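The `maintenance.rs` row says `optimize` refuses to run while a `__recovery` sidecar is pending. Below is a sketch of such a guard on a local filesystem. It assumes sidecars are `*.json` files directly under `<root>/__recovery/`, following the `__recovery/{ulid}.json` path used by the recovery protocol. `pending_sidecars`, `optimize_guarded`, and `OptimizeOutcome` are hypothetical names, and the real check may go through omnigraph's storage layer rather than `std::fs`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Names of `*.json` sidecars under `<root>/__recovery/`, sorted.
/// A missing directory means nothing is pending.
pub fn pending_sidecars(root: &Path) -> io::Result<Vec<String>> {
    let entries = match fs::read_dir(root.join("__recovery")) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut names = Vec::new();
    for entry in entries {
        let name = entry?.file_name().to_string_lossy().into_owned();
        if name.ends_with(".json") {
            names.push(name);
        }
    }
    names.sort();
    Ok(names)
}

#[derive(Debug, PartialEq, Eq)]
pub enum OptimizeOutcome {
    /// Recovery work is outstanding; compaction was not attempted.
    Deferred { pending: usize },
    Ran,
}

/// Run `compact` only when no recovery sidecar is pending.
pub fn optimize_guarded(root: &Path, compact: impl FnOnce()) -> io::Result<OptimizeOutcome> {
    let pending = pending_sidecars(root)?;
    if !pending.is_empty() {
        return Ok(OptimizeOutcome::Deferred { pending: pending.len() });
    }
    compact();
    Ok(OptimizeOutcome::Ran)
}
```

Deferring, rather than compacting around a pending sidecar, keeps compaction from advancing a table HEAD that the next open-time sweep still has to classify.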
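The `recovery.rs` row lists the classifier outcomes and an all-or-nothing decision. The sketch below is hypothetical beyond those names. This commit only states the `lance_head == manifest_pinned` → `NoMovement` rule. The other comparison rules, and treating `InvariantViolation` as a refusal, are assumptions made for illustration.

```rust
/// Classifier outcomes named in the `recovery.rs` row.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Classification {
    NoMovement,
    RolledPastExpected,
    UnexpectedAtP1,
    UnexpectedMultistep,
    InvariantViolation,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    NothingToDo,
    RollForward,
    RollBack,
    /// Assumption: an invariant violation is surfaced, not auto-repaired.
    Refuse,
}

/// Hypothetical per-table rule comparing the live Lance HEAD with the
/// manifest pin and the sidecar's `post_commit_pin`. Only the equality
/// case producing `NoMovement` is taken from the text.
pub fn classify(lance_head: u64, manifest_pinned: u64, post_commit_pin: u64) -> Classification {
    if lance_head < manifest_pinned {
        Classification::InvariantViolation
    } else if lance_head == manifest_pinned {
        Classification::NoMovement
    } else if lance_head == post_commit_pin {
        Classification::RolledPastExpected
    } else if lance_head == manifest_pinned + 1 {
        Classification::UnexpectedAtP1
    } else {
        Classification::UnexpectedMultistep
    }
}

/// All-or-nothing: roll forward only when every table either stayed put or
/// landed exactly on its sidecar pin; otherwise roll the whole batch back.
pub fn decide(per_table: &[Classification]) -> Decision {
    if per_table.contains(&Classification::InvariantViolation) {
        Decision::Refuse
    } else if per_table.iter().all(|c| *c == Classification::NoMovement) {
        Decision::NothingToDo
    } else if per_table
        .iter()
        .all(|c| matches!(c, Classification::NoMovement | Classification::RolledPastExpected))
    {
        Decision::RollForward
    } else {
        Decision::RollBack
    }
}
```

Under this all-or-nothing rule, one unexpected table sends the whole batch to roll back. No table is left rolled forward while another is restored.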
@@ -157,10 +157,14 @@ are left at `Lance HEAD = manifest_pinned + 1`.
 
 **Recovery protocol** (lifecycle of every staged-write writer —
 `MutationStaging::finalize`, `schema_apply::apply_schema_with_lock`,
-`branch_merge_on_current_target`, `ensure_indices_for_branch`):
+`branch_merge_on_current_target`, `ensure_indices_for_branch`,
+`optimize_all_tables`):
 
 1. **Phase A**: writer writes a sidecar JSON to
-`__recovery/{ulid}.json` BEFORE its first `commit_staged`. The
+`__recovery/{ulid}.json` BEFORE its first HEAD-advancing commit
+(`commit_staged`, or `compact_files` for `optimize_all_tables`,
+which advances the Lance HEAD via a reserve-fragments + rewrite
+commit rather than a staged write). The
 sidecar names every `(table_key, table_path, expected_version,
 post_commit_pin)` it intends to commit + the writer kind +
 actor_id.
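Phase A's ordering can be sketched with the standard library alone: persist the sidecar first, then make the first HEAD-advancing commit. Field names follow the tuple in the protocol text. Several details are assumptions: the JSON layout, the `writer_kind` string encoding, `run_with_sidecar`, and the caller-supplied `id` (a stand-in for a ULID). The sketch also targets a local filesystem rather than object storage.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// One table the writer intends to commit (fields as named in the protocol).
pub struct SidecarTable {
    pub table_key: String,
    pub table_path: String,
    pub expected_version: u64,
    pub post_commit_pin: u64,
}

/// Sidecar contents: writer kind, actor, and every planned table commit.
pub struct Sidecar {
    pub writer_kind: String,
    pub actor_id: String,
    pub tables: Vec<SidecarTable>,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Sidecar {
    pub fn to_json(&self) -> String {
        let tables: Vec<String> = self
            .tables
            .iter()
            .map(|t| {
                format!(
                    "{{\"table_key\":{},\"table_path\":{},\"expected_version\":{},\"post_commit_pin\":{}}}",
                    json_str(&t.table_key),
                    json_str(&t.table_path),
                    t.expected_version,
                    t.post_commit_pin
                )
            })
            .collect();
        format!(
            "{{\"writer_kind\":{},\"actor_id\":{},\"tables\":[{}]}}",
            json_str(&self.writer_kind),
            json_str(&self.actor_id),
            tables.join(",")
        )
    }
}

/// Phase A ordering: write and sync `__recovery/{id}.json`, and only then
/// run the first HEAD-advancing commit supplied by the caller.
pub fn run_with_sidecar<T>(
    root: &Path,
    id: &str,
    sidecar: &Sidecar,
    commit: impl FnOnce() -> io::Result<T>,
) -> io::Result<(PathBuf, T)> {
    let dir = root.join("__recovery");
    fs::create_dir_all(&dir)?;
    let path = dir.join(format!("{id}.json"));
    let mut file = File::create(&path)?;
    file.write_all(sidecar.to_json().as_bytes())?;
    file.sync_all()?; // sidecar reaches disk before any table HEAD moves
    let out = commit()?;
    Ok((path, out))
}
```

If the process dies inside `commit`, the sidecar is already on disk. An open-time sweep can then read it and compare each table's HEAD with `expected_version` and `post_commit_pin`. Later phases, including removing the sidecar, are outside this hunk and are not modeled.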
@@ -195,8 +199,13 @@ recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`:
 otherwise full open-time recovery rolls them back and refresh-time
 recovery leaves them for the next read-write open.
 - Otherwise **roll back**: per-table `Dataset::restore` to the
-manifest-pinned table version for that branch. Rollback records the
-actual restore target in the audit row's `to_version`.
+manifest-pinned table version, then a single `ManifestBatchPublisher::publish`
+of the restored HEAD — symmetric with roll-forward, so `manifest == HEAD`
+after recovery (no residual drift). This convergence is what lets a
+failed-then-retried schema apply succeed instead of failing one version higher
+each iteration. The audit row's `to_version` records the logical
+rolled-back-to version (`manifest_pinned`); the manifest is published at the
+restore commit (`manifest_pinned + 1`, same content).
 - After a successful roll-forward or roll-back, an audit row is
 recorded — `_graph_commits.lance` carries
 a commit tagged `actor_id = "omnigraph:recovery"`, and a sibling
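The revised roll-back bullet restores every table first and then publishes the restored HEADs in one batch, while the audit row keeps the logical target. The following hypothetical model shows that ordering. `Table`, `Manifest`, `AuditRow`, and `roll_back` are illustrative names. `Table::restore` only mimics a restore that appends a new version carrying the target's content; it does not call Lance's `Dataset::restore`.

```rust
/// Hypothetical table history: `versions[i]` is the content id at version i + 1.
pub struct Table {
    pub versions: Vec<u32>,
}

impl Table {
    pub fn head(&self) -> u64 {
        self.versions.len() as u64
    }

    pub fn content_at(&self, version: u64) -> u32 {
        self.versions[(version - 1) as usize]
    }

    /// Restore-style commit: append a new version carrying `target`'s content
    /// and return the new HEAD. Existing history is never rewritten.
    pub fn restore(&mut self, target: u64) -> u64 {
        let content = self.content_at(target);
        self.versions.push(content);
        self.head()
    }
}

/// One manifest pin per table, in the same order as the tables.
pub struct Manifest {
    pub pins: Vec<u64>,
}

pub struct AuditRow {
    /// Logical rolled-back-to version (the old manifest pin).
    pub to_version: u64,
}

/// Roll back: restore every table to its pinned version, then publish all
/// restored HEADs in one batch so the manifest matches HEAD afterwards.
pub fn roll_back(tables: &mut [Table], manifest: &mut Manifest) -> Vec<AuditRow> {
    let mut restored_heads = Vec::with_capacity(tables.len());
    let mut audit = Vec::with_capacity(tables.len());
    for (table, &pinned) in tables.iter_mut().zip(manifest.pins.iter()) {
        restored_heads.push(table.restore(pinned));
        audit.push(AuditRow { to_version: pinned });
    }
    // Single batch publish (stand-in for one `ManifestBatchPublisher::publish`).
    manifest.pins = restored_heads;
    audit
}
```

After `roll_back`, each manifest pin equals its table's HEAD, so a retried writer starts from a manifest with no residual drift. `to_version` still reports the pre-failure pin.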