mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
docs(optimize): document the pre-existing-drift reconcile
- maintenance.md: add the reconcile bullet (metadata-only manifest catch-up for a table with HEAD>manifest and an empty plan), and correct the 'requires a recovered graph' note — that guard is what makes the reconcile safe (no sidecar-covered drift in flight), not a claim that no drift exists. - AGENTS.md: restore the reconcile mention in the Compaction capability row. - testing.md: the maintenance.rs row lists all three optimize tests (publishes / reconciles / defers); fix the stale failpoints ensure_indices test name (recovery_rolls_forward_ensure_indices_on_feature_branch).
This commit is contained in:
parent
4bcfdee891
commit
2193a24641
3 changed files with 6 additions and 5 deletions
|
|
@ -34,8 +34,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
|
|||
| `s3_storage.rs` | S3-backed graph (skipped unless `OMNIGRAPH_S3_TEST_BUCKET` is set) |
|
||||
| `lance_version_columns.rs` | Per-row `_row_last_updated_at_version` behavior |
|
||||
| `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths |
|
||||
| `maintenance.rs` | `optimize` (compaction) + `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes the compacted version so the manifest tracks the Lance HEAD and a subsequent schema apply succeeds (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), and reconciles a pre-existing manifest-behind-HEAD drift forged via raw Lance compaction (`optimize_reconciles_preexisting_manifest_head_drift`) |
|
||||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`). |
|
||||
| `maintenance.rs` | `optimize` (compaction) + `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes the compacted version so the manifest tracks the Lance HEAD and a subsequent schema apply succeeds (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), reconciles a pre-existing manifest-behind-HEAD drift forged via raw Lance compaction (`optimize_reconciles_preexisting_manifest_head_drift`), and refuses while an unresolved recovery sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`) |
|
||||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `recovery_rolls_forward_ensure_indices_on_feature_branch`, `optimize_phase_b_failure_recovered_on_next_open`). |
|
||||
| `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
|
||||
| `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes). |
|
||||
|
||||
|
|
|
|||
|
|
@ -4,10 +4,11 @@
|
|||
|
||||
## `optimize_all_tables(db)` — non-destructive
|
||||
|
||||
- Lance `compact_files()` on every node + edge table on `main`, then **publishes the compacted version to the `__manifest`** so the manifest's `table_version` tracks the compacted Lance HEAD. Reads pin the manifest version, so without this publish compaction would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually compacted.
|
||||
- Lance `compact_files()` on every node + edge table on `main`, then **publishes the compacted version to the `__manifest`** so the manifest's `table_version` tracks the compacted Lance HEAD. Reads pin the manifest version, so without this publish compaction would be invisible to readers *and* would break the HEAD-vs-manifest precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) for tables that actually compacted.
|
||||
- **Reconciles pre-existing benign drift.** A table whose Lance HEAD already sits ahead of its manifest pin with nothing left to compact — left by a pre-fix `optimize` that never published, or by an external raw `compact_files` — is caught up by publishing `manifest = HEAD` (a metadata-only commit; `committed: true` with `fragments_added == 0`, no fragment changes). This is the producer-side heal for *uncovered* drift, and it is safe because the "requires a recovered graph" guard below ensures no recovery sidecar is in flight, so any `HEAD > manifest` it sees is necessarily content-preserving — never a partial write. Strict writes / schema apply on such a table 409 with "stale view" until an `optimize` (or any insert/merge, which heals it as a side effect) reconciles it.
|
||||
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests until `cleanup` runs.
|
||||
- Each table's compact→publish runs under its per-`(table, main)` write queue (serializing with concurrent mutations — compaction is a Lance `Rewrite` op that retryable-conflicts with a concurrent merge/update/delete on overlapping fragments). The Lance-HEAD-before-manifest-publish gap is covered by a `SidecarKind::Optimize` recovery sidecar (loose-match): a crash in that window rolls the compacted version forward on the next `Omnigraph::open` (compaction is content-preserving, so roll-forward is always safe).
|
||||
- **Requires a recovered graph.** `optimize` refuses (errors) when an unresolved recovery sidecar is present under `__recovery` — operating on an unrecovered graph could publish a partial write the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run `optimize`. (Recovery roll-back now publishes its restored version, so a recovered graph always satisfies `manifest == Lance HEAD` going in; there is no leftover drift for `optimize` to interpret.)
|
||||
- **Requires a recovered graph.** `optimize` refuses (errors) when an unresolved recovery sidecar is present under `__recovery` — operating on an unrecovered graph could publish a partial write the open-time recovery sweep would roll back. Reopen the graph to run the recovery sweep, then re-run `optimize`. (This is exactly what makes the reconcile above safe: a recovered graph has no *sidecar-covered* drift going in, so any `HEAD > manifest` `optimize` then sees is uncovered, content-preserving drift — never a partial write.)
|
||||
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
|
||||
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed, skipped }]`.
|
||||
- **Blob tables are skipped.** A table that declares any `Blob` property is not compacted: it is reported with `skipped: Some(BlobColumnsUnsupportedByLance)` (and logged via `tracing::warn`) instead of compacted, and the rest of the sweep proceeds normally. The current Lance `compact_files` mis-decodes blob-v2 columns under its forced `BlobHandling::AllBinary` read; **reads and writes are unaffected** — only compaction is. This is gated by `LANCE_SUPPORTS_BLOB_COMPACTION` (`db/omnigraph/optimize.rs`) and removed when the upstream Lance fix lands (see [docs/dev/lance.md](../dev/lance.md)). Consequence: fragment count and deleted-row space on blob tables are not reclaimed until then; query results are never affected.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue