omnigraph/docs/user
Ragnor Comerford e62d9166fb
fix: optimize publishes compaction; recovery roll-back converges manifest (#141)
* test(optimize): cover manifest publish + HEAD-drift reconcile

These tests fail (red) against the pre-fix optimize, which ran
compact_files without publishing the compacted version to __manifest:

- maintenance: optimize must publish so the manifest table_version
  tracks the compacted Lance HEAD and a later schema apply succeeds;
  and must reconcile a pre-existing manifest-behind-HEAD drift (forged
  via raw Lance compaction) so strict writes commit again.
- end_to_end + composite_flow: post-optimize query / strict update /
  reopen in the full lifecycle (the canonical flow previously omitted
  post-optimize writes as a documented "known limitation").
- failpoints: a crash between compaction and the manifest publish rolls
  forward on next open.

* fix(optimize): publish compaction to manifest and reconcile HEAD drift

optimize ran Lance compact_files without publishing the new version to
__manifest, so the manifest table_version lagged the Lance HEAD: reads
stayed pinned to the pre-compaction version, and the next schema apply or
strict update/delete failed its HEAD-vs-manifest precondition with
"stale view ... refresh and retry" (open-time recovery rollback inflated
the gap on retry).

optimize now publishes each compacted table's version under the
per-(table, main) write queue, guarded by a manifest CAS and a
SidecarKind::Optimize recovery sidecar (loose-match; roll-forward is safe
because compaction is content-preserving). When a table has nothing left
to compact but its Lance HEAD is already ahead of the manifest pin
(pre-fix drift, or a recovery restore commit), optimize reconciles the
manifest forward to HEAD (metadata-only, no sidecar). Caches and the
CSR/CSC graph index are invalidated after a publish.
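
The publish step described above can be sketched as a compare-and-swap on the
manifest pin. This is a minimal in-memory model, not the engine code: the
`Manifest`, `PublishError`, `publish_cas`, and `optimize_table` names are
hypothetical, and incrementing `head` stands in for a real Lance compaction
commit. The write queue, the recovery sidecar, and cache invalidation are
omitted.

```rust
use std::collections::HashMap;

/// Hypothetical error type: the manifest pin moved since the caller read it.
#[derive(Debug, PartialEq)]
pub enum PublishError {
    StaleView { expected: u64, found: u64 },
}

/// Hypothetical stand-in for the manifest: table name -> pinned table_version.
#[derive(Default)]
pub struct Manifest {
    pins: HashMap<String, u64>,
}

impl Manifest {
    /// Current pin for a table (0 if the table has never been published).
    pub fn pin(&self, table: &str) -> u64 {
        self.pins.get(table).copied().unwrap_or(0)
    }

    /// Compare-and-swap: advance the pin only if it still equals `expected`.
    pub fn publish_cas(
        &mut self,
        table: &str,
        expected: u64,
        new_version: u64,
    ) -> Result<(), PublishError> {
        let found = self.pin(table);
        if found != expected {
            return Err(PublishError::StaleView { expected, found });
        }
        self.pins.insert(table.to_string(), new_version);
        Ok(())
    }
}

/// Model of one optimize step: compaction advances the table HEAD, then the
/// new version is published so the manifest pin equals HEAD again.
pub fn optimize_table(
    manifest: &mut Manifest,
    table: &str,
    head: &mut u64,
) -> Result<(), PublishError> {
    let expected = manifest.pin(table);
    *head += 1; // assumption: stands in for a compaction commit
    manifest.publish_cas(table, expected, *head)
}
```

In this model the pre-fix behaviour corresponds to skipping the final
`publish_cas` call, which leaves the pin one version behind `head` after
every compaction.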

Docs updated (maintenance, storage, branches-commits, writes, testing).

* test(recovery): rollback convergence + optimize-defer regressions

These tests fail (red) against the current code and land before the fix:
- recovery: after the open-time sweep rolls a sidecar back, the manifest
  must track Lance HEAD (no residual drift) so a follow-up schema apply
  succeeds — the original "+1 per retry" loop. Today roll-back restores
  without publishing, so the manifest lags HEAD and the apply fails its
  HEAD-vs-manifest precondition.
- maintenance: optimize must refuse while a recovery sidecar is pending —
  operating on an unrecovered graph could publish a partial write the
  sweep would roll back.

Also removes optimize_reconciles_preexisting_manifest_head_drift: the
ad-hoc drift reconcile it covered is replaced by recovery-side convergence.

* fix(recovery): converge manifest on roll-back; optimize defers on pending recovery

Root of PR #141's review findings and the original "+1 per retry" loop:
a Lance HEAD ahead of the manifest was ambiguous (benign content-preserving
drift vs. a partial write a sidecar will roll back), and optimize's reconcile
guessed it benign. Close the class instead of guessing:

- Recovery roll-back now PUBLISHES the restored version (via a
  push_table_update_at_head helper shared with roll-forward), so the manifest
  tracks the Lance HEAD after recovery — symmetric with roll-forward. This
  fixes the +1 loop (after one roll-back the retry's HEAD-vs-manifest
  precondition passes) and removes the only remaining source of orphaned
  drift. The audit still records the logical rolled-back-to version; the
  manifest is published at the restore commit (identical content).
- optimize drops the ad-hoc drift reconcile and instead REFUSES when a
  __recovery sidecar is pending, so it only ever operates on a recovered
  graph (manifest == HEAD); its compaction publish can no longer commit a
  partial write. With the reconcile gone, the blob-skip-vs-reconcile gap is
  moot.
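
The two rules above can be illustrated with a small state model. It is a
sketch under stated assumptions, not the engine's implementation: `Table`,
`OptimizeError`, `roll_back`, and `optimize` are hypothetical names, version
numbers are plain counters, and the restore and compaction commits are
modelled as increments of `head`.

```rust
/// Hypothetical error: optimize refuses to run on an unrecovered graph.
#[derive(Debug, PartialEq)]
pub enum OptimizeError {
    RecoveryPending,
}

/// Hypothetical per-table state.
pub struct Table {
    /// Latest version in the underlying dataset (the Lance HEAD).
    pub head: u64,
    /// Version pinned in the manifest.
    pub manifest_pin: u64,
    /// True while a recovery sidecar for this table is still pending.
    pub pending_sidecar: bool,
}

/// Roll-back: the restore is itself a new commit, so HEAD advances; the
/// restored version is then published so the manifest pin equals HEAD.
pub fn roll_back(t: &mut Table) {
    t.head += 1; // assumption: stands in for the restore commit
    t.manifest_pin = t.head; // publish at the restore commit
    t.pending_sidecar = false;
}

/// Optimize: refuse while recovery is pending, otherwise compact and publish.
pub fn optimize(t: &mut Table) -> Result<(), OptimizeError> {
    if t.pending_sidecar {
        return Err(OptimizeError::RecoveryPending);
    }
    t.head += 1; // assumption: stands in for the compaction commit
    t.manifest_pin = t.head;
    Ok(())
}
```

Starting from a partial write (`head` one ahead of `manifest_pin`, sidecar
pending), `optimize` returns `RecoveryPending` and changes nothing; after
`roll_back` the pin equals `head`, and a later `optimize` succeeds.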

Updates the rollback recovery-test helper (manifest == HEAD after roll-back),
the failpoints assertions, and the user/dev docs.

* test(recovery): fix rollback assertion for manifest convergence

The roll-back-publishes change makes the manifest version advance after a
SchemaApply roll-back (to the old-schema content), so the
schema_apply_without_schema_staging_rolls_back_on_next_open assertion must
be `version > pre`, not `version == pre`. This update was dropped during
the commit churn and surfaced as a CI Test Workspace failure; the
old-schema-preserved intent stays covered by count_rows + _schema.pg + the
RolledBack convergence invariant.
2026-06-08 02:50:12 +03:00
| File | Last commit | Date |
| --- | --- | --- |
| audit.md | feat(engine): sweep & remove legacy __run__ branch guard (MR-770) (#132) | 2026-06-07 18:33:14 +03:00 |
| branches-commits.md | fix: optimize publishes compaction; recovery roll-back converges manifest (#141) | 2026-06-08 02:50:12 +03:00 |
| changes.md | docs: split user and developer docs (#93) | 2026-05-15 03:45:22 +03:00 |
| cli-reference.md | fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) | 2026-06-02 17:12:00 +02:00 |
| cli.md | feat: inline query strings in CLI and HTTP server (#110) | 2026-05-29 13:41:54 +02:00 |
| constants.md | feat(engine): sweep & remove legacy __run__ branch guard (MR-770) (#132) | 2026-06-07 18:33:14 +03:00 |
| deployment.md | feat(server): compose OMNIGRAPH_TARGET_URI with OMNIGRAPH_CONFIG in entrypoint (#129) | 2026-05-30 20:17:55 +01:00 |
| embeddings.md | Rename repo terminology to graph (#118) | 2026-05-24 16:46:00 +01:00 |
| errors.md | docs: rename runs.md/runs.rs → writes and repoint all references (#131) | 2026-05-30 23:20:56 +02:00 |
| index.md | Rename repo terminology to graph (#118) | 2026-05-24 16:46:00 +01:00 |
| indexes.md | docs: split user and developer docs (#93) | 2026-05-15 03:45:22 +03:00 |
| install.md | Add Windows release binaries (#127) | 2026-05-30 14:23:40 +02:00 |
| maintenance.md | fix: optimize publishes compaction; recovery roll-back converges manifest (#141) | 2026-06-08 02:50:12 +03:00 |
| policy.md | Stored-query registry foundation + config/CLI RFC-002 (#128) | 2026-06-01 22:50:31 +02:00 |
| query-language.md | docs: rename runs.md/runs.rs → writes and repoint all references (#131) | 2026-05-30 23:20:56 +02:00 |
| schema-language.md | schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107) | 2026-05-19 01:56:46 +03:00 |
| schema-lint.md | docs: split user and developer docs (#93) | 2026-05-15 03:45:22 +03:00 |
| server.md | Stored-query registry foundation + config/CLI RFC-002 (#128) | 2026-06-01 22:50:31 +02:00 |
| storage.md | fix: optimize publishes compaction; recovery roll-back converges manifest (#141) | 2026-06-08 02:50:12 +03:00 |
| transactions.md | docs: rename runs.md/runs.rs → writes and repoint all references (#131) | 2026-05-30 23:20:56 +02:00 |