Commit graph

4 commits

Author SHA1 Message Date
Ragnor Comerford
05e52f2ee0
recovery: rename composite test, strip ticket references, address review
Three bundled changes:

1. Rename `tests/agent_lifecycle.rs` -> `tests/composite_flow.rs` (and
   the test function). OmniGraph is consumed by both humans and agents
   - naming the test after one audience misframes the library.

2. Strip Linear ticket IDs, PR numbers, bot reviewer names, and
   review-round labels from source, tests, and docs added by this
   branch. Internal traceability belongs in commit messages and PR
   descriptions, not in checked-in artifacts. Upstream
   lance-format/lance issue refs and pre-existing MR-XXX refs in docs
   not touched by this branch are left alone.

3. Two outstanding review findings addressed:
   - `needs_index_work_node` / `needs_index_work_edge`: propagate
     `count_rows` errors instead of `unwrap_or(0)`. Silently treating
     transient I/O failures as "0 rows" risked skipping a table from
     the recovery sidecar pin set that was actually about to be
     modified.
   - `recovery_multi_sidecar_requires_fresh_snapshot_for_correctness`:
     strengthen the assertion to fail when sidecar B classifies under
     a stale snapshot. The new assertion checks post-recovery Lance
     HEAD == v3 (no `Dataset::restore` ran). The previous "sidecar
     deleted + audit rows present" pair passed in both the bug and
     fix paths because both delete the sidecar and write an audit
     row; the differentiator is the post-recovery HEAD. Strengthening
     the assertion exposed an additional nuance: in this overlapping-
     sidecar scenario sidecar B's audit kind is RolledBack (no-op)
     rather than RolledForward, since sidecar A's roll-forward
     publishes Lance HEAD as the new manifest pin (absorbing B's
     work). The docstring now explains why this is correct given
     current `roll_forward_all` semantics.

All workspace tests pass with --features failpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:56:36 +02:00
Ragnor Comerford
932334ba01
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.

- docs/invariants.md §VI.23: change from "upheld at the writer-trait
  surface for inserts/updates/etc., per-table commit_staged → manifest
  publish window remains" to "upheld at the writer-trait surface AND
  across process boundaries". The MR-847 sweep closes the residual on
  the next Omnigraph::open. The "continuous in-process" property
  (no ExpectedVersionMismatch surfacing to subsequent writers between
  Phase B failure and process restart) is honest follow-up at MR-856.

- docs/runs.md: replace "Finalize → publisher residual" section with
  "Open-time recovery sweep (MR-847)" — describes the sidecar protocol
  lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
  the audit trail, and the operator-facing query
  (omnigraph commit list --filter actor=omnigraph:recovery).

- AGENTS.md capability matrix "Atomic single-dataset commits" row:
  drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
  describe the three layers as all shipping; reference MR-856 for the
  background-reconciler follow-up.

- docs/storage.md: add _graph_commit_recoveries.lance and
  __recovery/{ulid}.json to the on-disk layout (mermaid + prose).

- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
  subsection describing the join from
  _graph_commits.lance:actor_id="omnigraph:recovery" to
  _graph_commit_recoveries.lance:graph_commit_id for operator
  post-mortem.

- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
  --keep < 3 may garbage-collect Lance versions the recovery sweep
  needs as a rollback target. Default --keep 10 is safe.

- docs/testing.md: add tests/recovery.rs to the engine integration-test
  table; expand the failpoints.rs row to mention the four MR-847
  per-writer Phase B → recovery integration tests.

- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
  every commit hash + scope across phases 1-10.

AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:24 +02:00
Claude
243c0c3464
Add internal-schema versioning + auto-migration for __manifest
The on-disk shape of `__manifest` is reconciled with the binary via a single
stamp + dispatcher in `db/manifest/migrations.rs`:

- `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes.
- The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest
  dataset's schema-level metadata (Lance `update_schema_metadata`).
- `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from
  the on-disk stamp until it matches the binary, then returns. Idempotent.
- `init_manifest_repo` stamps the current version at creation; the publisher's
  open-for-write path runs pending migrations before reading state. Reads
  stay side-effect-free.
- Forward-version protection: a stamp higher than the binary's known version
  triggers a clear "upgrade omnigraph first" error so an old binary cannot
  clobber a newer schema.

Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step:
the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id`
that engages Lance's row-level CAS at commit time. New repos created via
`init` are stamped at v2 immediately and don't need migration.

Adding a future on-disk shape change is one constant bump, one match arm in
`migrate_internal_schema`, and one test — no new branches in unrelated code
paths. Code outside the migration module never inspects the stamp.

New tests in `manifest/tests.rs`:
- `test_init_stamps_internal_schema_version`
- `test_publish_migrates_pre_stamp_manifest_to_current_version`
- `test_publish_rejects_manifest_stamped_at_future_version`

Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated
per the AGENTS.md maintenance contract.
2026-04-29 11:44:14 +00:00
Ragnor Comerford
a335d98854
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00