omnigraph/AGENTS.md

# OmniGraph — Agent Guide

This file is the always-on map for AI coding agents (Claude Code, Codex, Cursor, Cline) working in this codebase. It is loaded into context on every turn, so it stays as a **map plus the rules and invariants that need to be in scope at all times** — the encyclopedia content lives under [`docs/`](docs/). When you need depth, follow a pointer.

**Required reading every session, every change:**

1. **[docs/dev/invariants.md](docs/dev/invariants.md)** — the architectural invariants and deny-list. Apply to every PR, not only architecture work.
2. **[docs/dev/lance.md](docs/dev/lance.md)** — the curated index of upstream Lance docs. **Consult it before every task** to identify which Lance pages are relevant. **Then fetch every page in the matching domain section, plus every page that is even slightly relevant** — not just the page whose title most obviously matches the task. Behavior is interlocked across pages (transactions reference index lifecycle; index lifecycle references compaction; compaction references row-id lineage), and skipping a "slightly relevant" page is how alignment misses happen. The index itself is not a substitute for reading the pages — never act on the index alone. **Always fetch the FULL page content, not summaries** — use `curl -sL <url> | pandoc -f html -t markdown` or paste the rendered page text manually. Tools that summarize pages (like Claude's `WebFetch`) drop load-bearing details — we have caught alignment misses (default flags, `pub(crate)` blockers, three-page sub-specs hidden behind navigation hubs) only after dumping the full markdown.
3. **[docs/dev/testing.md](docs/dev/testing.md)** — the test-coverage map. **Always check what already covers your change before writing a new test.** Extending an existing test (an assertion, a fixture row, a parameterization) is preferred over a duplicated `init_and_load()` block. Walk the before-every-task checklist to identify existing coverage, run those tests as a clean baseline, and only add a new test fn or file when no existing one owns the area.

Tools that support `@`-imports (Claude Code) auto-include all three files via the imports below — note these must sit at column 0 (not inside a blockquote) for the parser to recognize them. Other agents (Codex, Cursor, Cline, …) must open them explicitly at the start of each session.

@docs/dev/invariants.md
@docs/dev/lance.md
@docs/dev/testing.md

`CLAUDE.md` is a symlink to this file — there is exactly one source of truth. Edit `AGENTS.md`.

**Version surveyed:** 0.7.0
**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-policy`, `omnigraph-api-types` (shared HTTP wire DTOs), `omnigraph-cluster`, `omnigraph-cli`, `omnigraph-server`
**Storage substrate:** Lance 7.x (columnar, versioned, branchable)
**License:** MIT
**Toolchain:** Rust stable, edition 2024

---

## Start here — what is this?

OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets. Highlights:

- **Storage**: per node/edge type a separate Lance dataset; multi-dataset commits coordinated atomically through one `__manifest` table.
- **Languages**: a `.pg` schema language and a `.gq` query language, both Pest-based, with a typed IR.
- **Multi-modal querying**: vector ANN (`nearest`), full-text (`search`/`fuzzy`/`match_text`/`bm25`), Reciprocal Rank Fusion (`rrf`), and graph traversal (`Expand`, anti-join `not { … }`) in one runtime.
- **Branches and commits across the whole graph**: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level.
- **Atomic per-query writes**: `mutate_as` and `load` accumulate insert/update batches into an in-memory `MutationStaging.pending` per touched table; one `stage_*` + `commit_staged` per table runs at end-of-query, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. A mid-query failure leaves Lance HEAD untouched on staged tables — no drift, no run state machine, no staging branches. Deletes still inline-commit; D₂ at parse time prevents inserts/updates and deletes from coexisting in one query.
- **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager). Cedar policy enforcement is engine-wide — every `_as` writer calls `Omnigraph::enforce(action, scope, actor)`, so HTTP, CLI, and embedded SDK consumers all hit the same gate. **Cluster-only boot** (RFC-011): the server always boots from a cluster directory (`--cluster <dir | s3://…>`, RFC-005) and serves N graphs (N ≥ 1) under multi-graph routes (`/graphs/{graph_id}/...` + read-only `GET /graphs` enumeration); there are no single-graph flat routes and no positional-URI boot. Per-graph + server-level Cedar policies. Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`) is not exposed — operators run `cluster apply` and restart.
- **CLI** with two-surface config (RFC-007/008): the team-owned cluster directory (`cluster.yaml`) plus the per-operator `~/.omnigraph/config.yaml` (servers, clusters, credentials, actor, profiles, aliases, defaults). Graphs are addressed via `--store`/`--server`/`--cluster`/`--profile`/operator defaults (RFC-011). Multi-format output (json/jsonl/csv/kv/table).

Throughout the docs, capabilities are split into **L1 — Inherited from Lance** vs **L2 — Added by OmniGraph**.

---

## Architecture at a glance

```
CLI (omnigraph)        HTTP Server (omnigraph-server, Axum)
        │                            │
        └─────────────┬──────────────┘
                      ▼
           omnigraph-compiler  ── Pest grammars, catalog, IR, lowering, lint, migration plan
                      │
                      ▼
           omnigraph (engine)  ── ManifestCoordinator, CommitGraph, RunRegistry, GraphIndex (CSR/CSC), exec
                      │
                      ▼
              Lance 7.x         ── columnar Arrow, fragments, per-dataset versions/branches, indexes
                      │
                      ▼
        Object store (file / s3 / RustFS / MinIO / S3-compat)
```

Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architecture.md).

---

## Where to find each topic

| Area | Read |
|---|---|
| **User docs entry point (public CLI/API/operator docs)** | **[docs/user/index.md](docs/user/index.md)** |
| **Developer docs entry point (architecture, invariants, testing, internals)** | **[docs/dev/index.md](docs/dev/index.md)** |
| **Architectural invariants & deny-list (read before any non-trivial proposal or review)** | **[docs/dev/invariants.md](docs/dev/invariants.md)** |
| **Lance docs index — fetch upstream Lance docs by problem domain** | **[docs/dev/lance.md](docs/dev/lance.md)** |
| **Test coverage map — what's covered, what helpers to reuse, before-every-task checklist** | **[docs/dev/testing.md](docs/dev/testing.md)** |
| Architecture, L1/L2 framing, concurrency model | [docs/dev/architecture.md](docs/dev/architecture.md) |
| Storage layout, `__manifest` schema, URI schemes, S3 env vars | [docs/user/concepts/storage.md](docs/user/concepts/storage.md) |
| `.pg` schema language, types, constraints, annotations, migration planning | [docs/user/schema/index.md](docs/user/schema/index.md) |
| Schema-lint codes (`OG-XXX-NNN`), families, severity, suppression | [docs/user/schema/lint.md](docs/user/schema/lint.md) |
| `.gq` query language, MATCH/RETURN/ORDER, IR ops, lint codes | [docs/user/queries/index.md](docs/user/queries/index.md) |
| Mutations — insert/update/delete, D2, atomicity | [docs/user/mutations/index.md](docs/user/mutations/index.md) |
| Search funcs (`nearest`/`bm25`/`rrf`), hybrid ranking | [docs/user/search/index.md](docs/user/search/index.md) |
| Indexes (BTREE / inverted / vector / graph topology) | [docs/user/search/indexes.md](docs/user/search/indexes.md) |
| Embeddings (engine client, env vars, `@embed`) | [docs/user/search/embeddings.md](docs/user/search/embeddings.md) |
| Concepts — what OmniGraph is, L1/L2 framing | [docs/user/concepts/index.md](docs/user/concepts/index.md) |
| Quickstart — init → load → query → branch | [docs/user/quickstart.md](docs/user/quickstart.md) |
| Branches, commit graph, system branches | [docs/user/branching/index.md](docs/user/branching/index.md) |
| Snapshots & time travel | [docs/user/branching/time-travel.md](docs/user/branching/time-travel.md) |
| Three-way merge and conflict kinds (user-facing) | [docs/user/branching/merge.md](docs/user/branching/merge.md) |
| Transactions and atomicity (per-query atomic; branches as multi-query transactions) | [docs/user/branching/transactions.md](docs/user/branching/transactions.md) |
| Direct-publish write path (staging, D2, recovery sidecars; the former Run state machine) | [docs/dev/writes.md](docs/dev/writes.md) |
| Three-way merge and conflict kinds | [docs/dev/merge.md](docs/dev/merge.md) |
| Diff / change feed (`diff_between`, `diff_commits`) | [docs/user/branching/changes.md](docs/user/branching/changes.md) |
| Query execution, mutation execution, bulk loader, `load` vs `ingest` | [docs/dev/execution.md](docs/dev/execution.md) |
| `optimize` (compaction) and `cleanup` (version GC) | [docs/user/operations/maintenance.md](docs/user/operations/maintenance.md) |
| Cluster operator guide (deploy/manage clusters, approvals, recovery, serving) | [docs/user/clusters/index.md](docs/user/clusters/index.md) |
| Cedar policy actions, scopes, CLI | [docs/user/operations/policy.md](docs/user/operations/policy.md) |
| HTTP server endpoints, auth, error model, body limits | [docs/user/operations/server.md](docs/user/operations/server.md) |
| CLI quick-start | [docs/user/cli/index.md](docs/user/cli/index.md) |
| CLI command surface and config schema (`~/.omnigraph/config.yaml`) | [docs/user/cli/reference.md](docs/user/cli/reference.md) |
| Audit / actor tracking | [docs/user/operations/audit.md](docs/user/operations/audit.md) |
| Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) |
| Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) |
| Deployment (binary / container / S3-local testing / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
| CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) |
| Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) |
| Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) |
| Constants & tunables cheat sheet | [docs/user/reference/constants.md](docs/user/reference/constants.md) |
| Per-version release notes | [docs/releases/](docs/releases/) |

---

## First principle: engineering is programming integrated over time

Software engineering is **programming integrated over time** (Winters, *Software Engineering at Google*). A line of code costs you at every future read, refactor, migration, and dependent change — not just at write-time. So the operative question for any change is: **which option has the lower ongoing liability?** Not "shorter now," not "fastest to ship," but which leaves the codebase narrower in the long run. **Complexity should be earned** — by demonstrated correctness, performance, or future-shape cost; never by speculation.

This is a decision lens, not a code-size rule. It cuts both ways. Sometimes the lower-liability option is:

- **More code.** A centralized dispatcher costs more lines than an ad-hoc heal hook, but each future change adds a match arm instead of a new hook scattered through the engine.
- **Less code.** Three similar lines that may diverge later cost less to maintain than a premature abstraction that has to be retrofitted every time a caller deviates.
- **DRYing.** Two copies of business logic that must stay in sync are a perpetual drift risk.
- **Duplication.** Two callers that look similar today but have independent evolution pressure shouldn't be wedged through a shared helper just because the lines match.
- **Removal.** A "just in case" code path with no caller is pure surface area: tests for it, docs that mention it, future changes that have to consider it.
- **Addition.** A migration framework, a typed error variant, a feature flag — each adds code now and lowers the cost of every future change in its surface.
- **A new abstraction**, when the absence forces every consumer to re-derive the same logic. Or **flattening one**, when the abstraction has accumulated more special-cases than the code it replaced.

When evaluating a design, ask: *"what does this look like after 5 more changes like it?"* If the answer is "this converges to one shape", cost is bounded. If it's "this forks every time", the option is mortgaging the future for present convenience — pick differently.

### Tiebreakers when liability alone is silent

- **Correctness > simplicity > performance.** Lexicographic — give up performance for simpler code; give up simplicity for correct code; never give up correctness. The deny-list ("no silent failures," "no acks before durable persistence," "no reads of partial commits") is this rule's hard floor.
- **Reversibility shapes evidence demand.** Reversible changes wait for evidence: prefer prod metrics over napkin math over RFCs. Irreversible changes (substrate choice, on-disk format, database guarantees) earn an RFC, because by the time prod tells you they were wrong, you've shipped years of dependent code. Reviewers should spot both failure modes — RFC-ing a one-line config, and measuring-your-way into a substrate decision.

The always-on rules below and the deny-list in [docs/dev/invariants.md](docs/dev/invariants.md) are specific applications of this principle; when the rules are silent, fall back to it.

---

## Always-on rules (load these into your working memory)

These are architectural rules that need to be in scope on every change. They're framed at the level that survives renames and refactors — the deeper implementation specifics (function names, lock names, branch-prefix conventions, enforcement points) live in the per-area docs and may evolve. The full architectural invariants and deny-list are in [docs/dev/invariants.md](docs/dev/invariants.md); the deny-list is the fastest first-pass when reviewing any change.

1. **Multi-dataset publish is atomic across the whole graph.** A graph commit flips every relevant sub-table version visible together, in one manifest write. Don't introduce code paths that publish per sub-table outside the unified publish path — that loses cross-table snapshot isolation.
2. **Snapshot isolation per query.** A query holds one snapshot for its lifetime. Don't re-read the current head mid-query.
3. **Mutations are atomic at the commit boundary.** Multi-statement change queries publish one commit. Don't commit per-statement.
4. **Bearer-token plaintext never persists in process memory.** Tokens are hashed at startup; auth uses constant-time comparison; the actor id is server-resolved from the hash match and must not be settable by the client.
5. **Reads always see the current index state for the branch they're reading.** Indexes track the branch head, not historical snapshots. If you change index lifecycle, preserve this guarantee.
6. **Stable type IDs survive renames.** Schema migration relies on identity that's stable across rename — don't mint new IDs on rename.
7. **Logical contract over physical state.** Physical state (index coverage, fragment layout, compaction versions, staged writes) is derived and rebuildable; it must never fail a logical operation. Check preconditions against logical state and let reconciliation converge the physical state idempotently — genuine logical conflicts still fail loudly. This is the rule rules 1–6 instantiate; full statement and applications in [docs/dev/invariants.md](docs/dev/invariants.md).

### Deny-list (fast-pass review filter — full reasoning in [docs/dev/invariants.md](docs/dev/invariants.md))

If a proposal fits one of these, the burden is on the proposer to justify why this case is the exception:

- Synchronous-inline index updates for indexes expensive to build (vector ANN, FTS) — use the reconciler pattern.
- Custom WAL / transaction manager / buffer pool — Lance owns these.
- Job queue for state derivable from manifest — reconciler pattern instead.
- Per-feature lowering for shapes that share a structure (interfaces, wildcards, alternation) — use one mechanism.
- Eager materialization of cross-products in multi-hop — factorize; flatten only when needed.
- Ad-hoc IN-list filtering when SIP fits.
- String-flattened SQL filter generation when structured pushdown is available.
- In-process-only `Dataset` impls — `Send + Sync`, remote descriptors.
- Cost-blind plan choice — lowering-order execution is not a planner.
- Hidden statistics — if a metric matters for plan choice, it must be exposed through the trait surface.
- Side-channels for query semantics — search modes, mutations, polymorphism are first-class IR concepts.
- Discarding rank in retrieval — score and rank propagate as columns.
- State that drifts from the manifest — derive from observable state.
- Cloud-only correctness fixes — correctness is always OSS.
- Forking the codebase for Cloud — trait-extension only.
- Hand-rolling something Lance already does — check the spec first.
- Mutating in place state that should be immutable (Lance fragments, index segments) — new segments instead.
- Silent failures — OOM, timeout, partial result must all be surfaced and bounded.
- Shipping observable behavior as if it weren't part of the contract — output ordering, error-message text, timestamp precision, default-flag values, latency profile. Per Hyrum's Law, every observable behavior gets depended on once shipped; don't expose what you don't want to commit to.

---

## Build, test, lint

Rust stable workspace (edition 2024). `protoc` is a build dependency (`brew install protobuf` / `apt-get install protobuf-compiler libprotobuf-dev`). **Crate dir ≠ package name** for the engine: the directory is `crates/omnigraph` but its Cargo package is `omnigraph-engine` (use that in `-p`). The CLI binary built from `omnigraph-cli` is named `omnigraph`.

```bash
cargo build --workspace --locked              # build everything
cargo test  --workspace --locked              # the canonical CI gate (matches CI exactly)
cargo run -p omnigraph-cli -- <args>          # run the `omnigraph` CLI from source
cargo run -p omnigraph-server -- --cluster <dir|s3://...> --bind 0.0.0.0:8080   # run the server from source

# Run one crate / one test file / one test fn
cargo test -p omnigraph-engine --test traversal           # one integration-test file (see docs/dev/testing.md)
cargo test -p omnigraph-engine --test writes concurrent   # one test fn by name substring
cargo test -p omnigraph-engine some_inline_test -- --nocapture   # show stdout

# Feature-gated suites (each is its own job in CI, not part of the default run)
cargo test -p omnigraph-engine --features failpoints --test failpoints   # fault injection
cargo build -p omnigraph-server --features aws   # AWS Secrets Manager bearer-token source
```

S3-backed tests (`s3_storage`, and the S3 paths in server/CLI system tests) **skip** unless `OMNIGRAPH_S3_TEST_BUCKET` + `AWS_*` (incl. `AWS_ENDPOINT_URL_S3` for non-AWS) are set; CI runs them against containerized RustFS. To run RustFS/MinIO yourself, see [docs/user/deployment.md](docs/user/deployment.md) → *Testing against S3 locally*.

CI does **not** run `clippy` or `rustfmt` as gates — but `cargo test --workspace --locked` is the exact gate, so run it before pushing. Two non-test CI checks: `scripts/check-agents-md.sh` (doc cross-link integrity — run it after moving/renaming docs) and OpenAPI drift (`crates/omnigraph-server/tests/openapi.rs` regenerates `openapi.json`; set `OMNIGRAPH_UPDATE_OPENAPI=1` to update the checked-in copy when a server/API change is intentional).

---

## Quick-reference flows

```bash
# Initialize an S3-backed graph
omnigraph init --schema ./schema.pg s3://my-bucket/graph.omni

# Bulk load
omnigraph load --data ./seed.jsonl --mode overwrite s3://my-bucket/graph.omni

# Load a review batch onto its own branch (--from forks it if missing)
omnigraph load --branch review/2026-04-25 --from main --mode merge --data ./batch.jsonl s3://my-bucket/graph.omni

# Run a hybrid (vector + BM25) query — ad-hoc .gq against a store (positional = query name)
omnigraph query --query ./queries.gq find_similar \
  --params '{"q":"trends in AI safety"}' --format table --store s3://my-bucket/graph.omni

# Plan + apply schema migration
omnigraph schema plan  --schema ./next.pg s3://my-bucket/graph.omni
omnigraph schema apply --schema ./next.pg s3://my-bucket/graph.omni --json

# Merge review branch back
omnigraph branch merge review/2026-04-25 --into main s3://my-bucket/graph.omni

# Compact, preview any uncovered drift, then repair/GC after review
omnigraph optimize s3://my-bucket/graph.omni
omnigraph repair s3://my-bucket/graph.omni
omnigraph repair --confirm s3://my-bucket/graph.omni
# For suspicious/unverifiable drift only after deliberate review:
# omnigraph repair --force --confirm s3://my-bucket/graph.omni
omnigraph cleanup  --keep 10 --older-than 7d s3://my-bucket/graph.omni
omnigraph cleanup  --keep 10 --older-than 7d --confirm s3://my-bucket/graph.omni

# Stand up the HTTP server (token from env)
OMNIGRAPH_SERVER_BEARER_TOKEN=xxxx \
  omnigraph-server --cluster s3://my-bucket/cluster --bind 0.0.0.0:8080

# Cedar policy explain
omnigraph policy explain --cluster ./company-brain --graph knowledge --actor act-alice --action change --branch main
```

---

## Capability matrix — "Lens by default vs. added by OmniGraph"

| Capability | L1 (Lance default) | L2 (OmniGraph adds) |
|---|---|---|
| Columnar storage on object store | ✅ Arrow/Lance | URI normalization, S3 env-var plumbing |
| Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables |
| Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering |
| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore` followed by a manifest publish of the restored version (so both directions converge to `manifest == HEAD` — no residual drift), and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). The write entry points (`load_as`, `mutate_as`, `apply_schema_as`, `branch_merge_as`) and `refresh` additionally run an in-process roll-forward-only heal (serialized against live writers via the per-table write queues), so a long-lived server converges on its next write without restart; only rollback-eligible sidecars still defer to the next read-write open (a future background reconciler's goal). Engine writes route through a sealed `TableStorage` trait (`db.storage()`) exposing only `stage_*` + `commit_staged` + reads; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so the default surface cannot couple a write with a HEAD advance — §1 holds by construction. `delete_where` and `create_vector_index` stay inline until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)); `LoadMode::Overwrite` uses Lance `Overwrite` staged transactions. |
| Compaction (`compact_files`) + reindex (`optimize_indices`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; per table runs `compact_files` **then Lance `optimize_indices`** (folds appended/rewritten fragments back into existing indexes — incremental merge, not retrain) and **publishes the resulting version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe the work and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage spanning both ops; **commits even with no compaction work if index coverage is stale**; **refuses on an unrecovered graph**; **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair`; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent; reindex is skipped for them too today), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) |
| Repair uncovered drift | — | `omnigraph repair` explicitly classifies uncovered table `HEAD > manifest` drift: verified maintenance drift (`ReserveFragments`/`Rewrite`) can be published with `--confirm`; suspicious or unverifiable drift requires `--force --confirm`. Sidecar-covered crash residuals still recover automatically on open. |
| Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy |
| BTREE / inverted (FTS) / vector indexes | ✅ | `@index`/`@key` declares intent; the physical index is derived state that never fails a logical op. Built per column through one chokepoint (`build_indices_on_dataset_for_catalog`, type-dispatched by `node_prop_index_kind`: enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector); idempotent; lazy across branches. **Schema apply builds nothing** (records intent only); `load`/`mutate` build inline but **defer an untrainable Vector column** (no trainable vectors yet) as *pending* rather than aborting. `ensure_indices`/`optimize` is the reconciler that materializes declared-but-missing indexes and restores coverage of appended/rewritten fragments (`optimize_indices`), reporting still-pending columns (see Compaction row). |
| `merge_insert` upsert | ✅ | `LoadMode::Merge`, mutation `update`/`insert`/`delete` lowering |
| Vector search | ✅ | `nearest()` query op; embedding pipeline (Gemini / OpenAI clients); `@embed` in schema |
| Full-text search | ✅ | `search/fuzzy/match_text/bm25` query ops |
| Hybrid ranking | — | `rrf(...)` Reciprocal Rank Fusion in one runtime |
| Graph traversal | — | CSR/CSC topology index, `Expand` IR op, variable-length hops, `not { }` anti-join |
| Schema language | — | `.pg` + Pest grammar + catalog + interfaces + constraints + annotations |
| Query language | — | `.gq` + Pest grammar + IR + lowering + linter |
| Schema migration planning | — | `plan_schema_migration` + `apply_schema` step types + `__schema_apply_lock__` |
| Commit graph (DAG) across whole graph | — | `_graph_commits.lance` with linear + merge parents, ULID ids, actor map |
| Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
| Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
| Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |
| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/operations/policy.md](docs/user/operations/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as` — the deprecated `ingest_as` shims route through it — `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. |
| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **cluster-only boot (RFC-011): always `--cluster <dir | s3://…>`, serving N graphs (N ≥ 1) under multi-graph routes + read-only `GET /graphs` enumeration + per-graph + server-level Cedar policies. Add/remove graphs via `cluster apply` and restart.** |
| CLI with config | — | two-surface config (team `cluster.yaml` dir + per-operator `~/.omnigraph/config.yaml`), scope addressing (`--store`/`--server`/`--cluster`/`--profile`/defaults, RFC-011), aliases, multi-format output (json/jsonl/csv/kv/table) |
| Audit / actor tracking | — | `_as` write APIs + actor map in commit graph |
| Local S3 testing | — | run RustFS/MinIO + the `AWS_*` env; see [docs/user/deployment.md](docs/user/deployment.md) → *Testing against S3 locally* |
| Agent skill | — | `skills/omnigraph` — operational playbook for driving Omnigraph; install with `npx skills add ModernRelay/omnigraph@omnigraph` |

---

## Maintenance contract for agents

When you change something user-visible, **update the relevant `docs/user/<area>.md` in the same change**. Use [docs/user/index.md](docs/user/index.md) for public behavior and [docs/dev/index.md](docs/dev/index.md) for contributor/internal mechanics. Pointers from this file to those docs must keep working — CI enforces cross-link integrity via `scripts/check-agents-md.sh`.

When proposing or reviewing a non-trivial change, walk [docs/dev/invariants.md](docs/dev/invariants.md) — at minimum the deny-list and review checklist. Add to the deny-list when a new anti-pattern surfaces; relaxing an invariant requires the same review process as code.

Rules:

1. **Update in the same PR.** New endpoint, query function, CLI flag, env var, constant, schema construct, or invariant: update both the source code and the doc in the same change. Never split documentation drift into a follow-up.
2. **Bump version on release.** When a release boundary crosses (e.g. v0.3.1 → v0.3.2), update the version line at the top of this file and add a `docs/releases/<version>.md` describing the user-visible delta. Update [docs/dev/architecture.md](docs/dev/architecture.md) only if the architecture itself changed.
3. **Write OSS-facing release notes.** Release docs are public project history. Describe capabilities, behavior changes, breaking changes, upgrade notes, and user impact; do not reference private ticket systems, internal codenames, or planning shorthand that an outside contributor cannot inspect.
4. **Keep versioning coherent.** A release bump must update every published crate manifest, local path dependency constraint, `Cargo.lock`, generated API metadata such as `openapi.json`, and this file's surveyed version. Do not leave mixed package versions unless the release plan explicitly calls for them.
5. **Keep docs audience-neutral.** Prefer stable public identifiers (versions, PR numbers, public issue links, crate names, endpoint names) over organization-specific labels. If internal context is useful for maintainers, translate it into a durable public rationale before committing it.
6. **Don't lie.** If a section becomes wrong but you can't rewrite it fully right now, replace the wrong line with `*(stale — needs update after <change>)*` rather than leaving silently incorrect text. Then fix it ASAP.
7. **Re-verify before recommending.** If you cite a flag, env var, endpoint, or constant to the user or in code, grep for it in source first. Memory and docs go stale; the code is authoritative.
8. **Keep AGENTS.md short.** This file is always loaded into agent context, so every added line has a recurring context-window cost. Prefer pointers and terse invariants here; put detail in `docs/`.
9. **Keep AGENTS.md a map, not an encyclopedia.** New deep content goes into `docs/`. Add an entry to "Where to find each topic" instead of pasting prose into this file. The "Always-on rules" section is the exception — it's for invariants that should always be in scope.
10. **Re-read on schema/query/IR changes.** Edits to `schema.pest`, `query.pest`, `ir/lower.rs`, `query/typecheck.rs`, or `query/lint.rs` should trigger a re-read of [docs/user/schema/index.md](docs/user/schema/index.md), [docs/user/queries/index.md](docs/user/queries/index.md), and [docs/dev/execution.md](docs/dev/execution.md) to confirm they still describe reality.
11. **Always make smaller commits.** Each commit does one thing, compiles, and passes tests; mechanical refactors land separately from the behavior changes they enable.
12. **Test-first for bug fixes.** When fixing an identified bug, write a regression test that reproduces the failure first. Confirm it fails against the current code with the predicted symptom (not an unrelated error). Then land the fix in a separate commit and confirm the test turns green. The test commit lands just before the fix commit so the red → green pair is visible in `git log` and a reviewer can check out the test commit alone and reproduce the failure.
13. **Correct by design over symptomatic patches.** When a bug surfaces, identify the root cause and make the fix correct by construction. Don't patch the symptom. If the design admits the bug class, the fix is to close the class, not to add a guard around the latest instance. A symptomatic patch is acceptable only as a stop-gap, with an explicit note in the commit message and a follow-up issue tracking the design fix.

CI check: `scripts/check-agents-md.sh` verifies that docs links in this file and the audience indexes resolve, and that every canonical doc is linked from either [docs/user/index.md](docs/user/index.md) or [docs/dev/index.md](docs/dev/index.md). Run it locally before opening a PR if you've moved or renamed docs.
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								# OmniGraph — Agent Guide
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								This file is the always-on map for AI coding agents (Claude Code, Codex, Cursor, Cline) working in this codebase. It is loaded into context on every turn, so it stays as a **map plus the rules and invariants that need to be in scope at all times** — the encyclopedia content lives under [`docs/`](docs/). When you need depth, follow a pointer.
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Address reviewer feedback (Cursor + cubic) on PR #60

All eight comments verified against source and applied:

- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
  the markdown blockquote. Claude Code's @-import parser expects @ at
  column 0; the leading "> " of a blockquote silently broke
  recognition, so the claimed auto-include did nothing. (Cursor,
  Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
  enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
  variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
  force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
  P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
  exists at error.rs:88-89 but the doc enumerated only 10 of 11.
  (cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
  tombstone_version column; state.rs:180 reads the tombstone version
  from the table_version column on rows where object_type =
  table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
  the underlying storage. actor_id is joined in-memory from
  _graph_commit_actors.lance, not a column on _graph_commits.lance.
  (cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
  match the actual constant name in catalog/schema_ir.rs:11.
  (cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
  `ls crates/omnigraph/tests/*.rs`). (cubic P3.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-29 00:09:06 +02:00
+								**Required reading every session, every change:**
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+. **[docs/dev/invariants.md](docs/dev/invariants.md)** — the architectural invariants and deny-list. Apply to every PR, not only architecture work.
-												docs: drop npx mdrip; use curl | pandoc for full-page fetches (#97)

The previous "fetch the full page" recommendation in AGENTS.md and
docs/dev/lance.md pointed at an unknown-author npm CLI that, on consent,
wrote agent-targeted content into AGENTS.md and modified .gitignore /
tsconfig.json. Source audit was clean of malicious code but the
self-perpetuating prompt-injection pattern combined with a single
maintainer and ~21 downloads/day made it not worth the risk. Switched
to the curl + pandoc command already documented as the no-tool option.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
											
										
										
											2026-05-15 16:06:24 +03:00
+. **[docs/dev/lance.md](docs/dev/lance.md)** — the curated index of upstream Lance docs. **Consult it before every task** to identify which Lance pages are relevant. **Then fetch every page in the matching domain section, plus every page that is even slightly relevant** — not just the page whose title most obviously matches the task. Behavior is interlocked across pages (transactions reference index lifecycle; index lifecycle references compaction; compaction references row-id lineage), and skipping a "slightly relevant" page is how alignment misses happen. The index itself is not a substitute for reading the pages — never act on the index alone. **Always fetch the FULL page content, not summaries** — use `curl -sL <url> | pandoc -f html -t markdown` or paste the rendered page text manually. Tools that summarize pages (like Claude's `WebFetch`) drop load-bearing details — we have caught alignment misses (default flags, `pub(crate)` blockers, three-page sub-specs hidden behind navigation hubs) only after dumping the full markdown.
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+. **[docs/dev/testing.md](docs/dev/testing.md)** — the test-coverage map. **Always check what already covers your change before writing a new test.** Extending an existing test (an assertion, a fixture row, a parameterization) is preferred over a duplicated `init_and_load()` block. Walk the before-every-task checklist to identify existing coverage, run those tests as a clean baseline, and only add a new test fn or file when no existing one owns the area.
-												Address reviewer feedback (Cursor + cubic) on PR #60

All eight comments verified against source and applied:

- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
  the markdown blockquote. Claude Code's @-import parser expects @ at
  column 0; the leading "> " of a blockquote silently broke
  recognition, so the claimed auto-include did nothing. (Cursor,
  Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
  enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
  variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
  force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
  P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
  exists at error.rs:88-89 but the doc enumerated only 10 of 11.
  (cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
  tombstone_version column; state.rs:180 reads the tombstone version
  from the table_version column on rows where object_type =
  table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
  the underlying storage. actor_id is joined in-memory from
  _graph_commit_actors.lance, not a column on _graph_commits.lance.
  (cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
  match the actual constant name in catalog/schema_ir.rs:11.
  (cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
  `ls crates/omnigraph/tests/*.rs`). (cubic P3.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-29 00:09:06 +02:00
 								Tools that support `@`-imports (Claude Code) auto-include all three files via the imports below — note these must sit at column 0 (not inside a blockquote) for the parser to recognize them. Other agents (Codex, Cursor, Cline, …) must open them explicitly at the start of each session.
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								@docs/dev/invariants.md
 								@docs/dev/lance.md
 								@docs/dev/testing.md
-												Make docs/invariants.md required reading every session

Adds a top-of-file directive plus a Claude-Code @-import so the full
invariants document is loaded into context on every turn, not only
when an agent follows a pointer. Other agents are instructed to open
it explicitly at session start. The §IX deny-list and §X review
checklist apply to every change, so they should be in scope by
default rather than gated on the agent remembering to look.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:39:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								`CLAUDE.md` is a symlink to this file — there is exactly one source of truth. Edit `AGENTS.md`.
-												release: bump workspace to 0.7.0

All six crate manifests + their path-dependency constraints, Cargo.lock,
the regenerated openapi.json version metadata, AGENTS.md's surveyed
version, and the v0.7.0 release notes (object-storage clusters,
config-free --cluster serving, the operator config surface, keyed
credentials, operator targeting/aliases, and the omnigraph.yaml
deprecation stages).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

											
										
										
											2026-06-12 13:21:03 +03:00
+								**Version surveyed:** 0.7.0
-												docs: omnigraph-api-types in the crate list; RFC-009 Phase 2 outcome

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

											
										
										
											2026-06-13 17:10:00 +03:00
+								**Workspace crates:** `omnigraph-compiler`, `omnigraph` (engine), `omnigraph-policy`, `omnigraph-api-types` (shared HTTP wire DTOs), `omnigraph-cluster`, `omnigraph-cli`, `omnigraph-server`
-												build(deps): bump Lance 6.0.1 → 7.0.0 (correct-by-design substrate alignment) (#229)

* build(deps): bump Lance 6.0.1 → 7.0.0 (object_store 0.13.2, roaring 0.11.4)

Arrow stays 58 and DataFusion stays 53 (no change). The only transitive bump
is object_store 0.12.5 → 0.13.2. 141 upstream commits reviewed; no fixes lost
(the 6.0.x release-branch backports are all forward-ported into 7.0.0).

- object_store 0.13 moved get/put/head/rename/delete behind a new ObjectStoreExt
  trait (list/list_with_delimiter/put_opts stay on the core trait). Add
  `use object_store::ObjectStoreExt` in storage.rs and db/manifest/namespace.rs;
  no call-site changes. Mirrors Lance's own migration in PR #6672.
- roaring pinned to 0.11.4 (cargo update -p roaring --precise 0.11.4). Lance
  7.0.0's UpdatedFragmentOffsets newtype (lance#6650) derives Eq over
  HashMap<u64, RoaringBitmap>, which needs RoaringBitmap: Eq, added in roaring
  0.11.4; the loose `roaring = "0.11"` constraint otherwise resolves 0.11.3 and
  lance itself fails to compile.
- lance#6774: merge-insert INSERT rows now stamp _row_created_at_version with the
  commit version (was a fallback of 1). Flip the lance_version_columns assertion
  to `== v2` and correct the changes/mod.rs rationale comment. Production
  change-detection keys on _row_last_updated_at_version + ID membership, so its
  logic is unaffected.

Refs lance#6650, lance#6774, lance#6672.

* fix(storage): pin WriteParams::auto_cleanup = None (lance#6755 default flip)

lance#6755 flipped the WriteParams::auto_cleanup default from on (a full cleanup
pass every 20th commit) to None. On 6.0.1 the on-by-default hook could silently
GC versions that __manifest pins for snapshots/time-travel. OmniGraph owns
cleanup explicitly (optimize.rs::cleanup_all_tables) and never set auto_cleanup,
so it was relying on a default that is both wrong for our snapshot model and now
changed upstream.

Pin auto_cleanup: None explicitly at all 11 production WriteParams sites
(table_store ×6, commit_graph ×2, recovery_audit ×1, manifest/graph ×2 — the
__manifest + sub-table Create paths). Removes the dependency on a default-flag
value and locks in the snapshot-safe behavior regardless of future upstream
re-flips.

Refs lance#6755.

* test(lance): pin BTREE range-boundary correctness (lance#6796)

lance#6796 (issue #6792) fixed a BTREE scalar-index range-query bound
inclusiveness bug: `x <= hi AND x > lo` returned the wrong boundary row.
Add lance_surface_guards.rs::btree_range_query_boundary_is_correct, which
reproduces the exact #6792 shape (5 rows + an explicit BTREE drives the index
path even on tiny data) and pins the corrected inclusive-<= / exclusive->
semantics. It turns red if a future Lance regression reintroduces the bug.

OmniGraph today builds BTREE only on string @key columns and queries them by
equality/IN, so its current patterns do not hit this; the guard protects any
future BTREE-range path (BTREE-on-properties, range-on-key).

Refs lance#6796.

* docs(dev): align Lance docs + invariants to 7.0.0

- docs/dev/lance.md: new 2026-06-14 alignment stanza for the 6.0.1 → 7.0.0 bump
  (object_store ObjectStoreExt move, roaring 0.11.4, #6774/#6796/#6755 behavior,
  #6658 shipped → MR-A unblocked but separate, #6666 + blob compaction still
  open); prior 6.0.1 stanza demoted to historical.
- AGENTS.md: storage substrate 6.x → 7.x (line + architecture diagram).
- docs/dev/invariants.md: deletes/vector known gap updated — the staged
  two-phase delete API (lance#6658) now exists and MR-A is unblocked, but
  delete_where stays inline and D2 stays in place until the migration lands;
  create_vector_index still gated on lance#6666.

* fix(storage): skip Lance auto-cleanup on commit paths for legacy datasets

Addresses PR #229 review (Codex P1). `WriteParams::auto_cleanup` is create-time
config with no effect on existing datasets (Lance write.rs docs), so the previous
`auto_cleanup: None` change alone did NOT protect graphs created before the v7
bump: 6.0.1 defaulted auto_cleanup ON, leaving `lance.auto_cleanup.*` config on
those datasets, and Lance's per-commit hook (io/commit.rs: `if
!commit_config.skip_auto_cleanup`) fires off that stored config — so omnigraph's
own writes would GC versions the __manifest pins for snapshots/time-travel.

Skip the hook on every commit path, covering new and legacy datasets alike:
- commit_staged: CommitBuilder::with_skip_auto_cleanup(true) — the staged data path.
- __manifest publisher: MergeInsertBuilder::skip_auto_cleanup(true).
- all 11 WriteParams: skip_auto_cleanup: true (direct Dataset::write/append paths;
  auto_cleanup: None retained so new datasets store no cleanup config at all).

Tests:
- lance_surface_guards::skip_auto_cleanup_suppresses_version_gc — substrate:
  negative control (config GCs v1 without skip) + with-skip survival.
- staged_writes::commit_staged_skips_auto_cleanup_so_pinned_versions_survive —
  omnigraph usage: commit_staged on a legacy-config dataset preserves the pinned
  create version.

Refs lance#6755.

* test(lance): assert created_at-preserved + updated_at-bumped on merge_insert UPDATE

Addresses PR #229 review follow-up. `lance_merge_insert_update_preserves_created_at_version`
documented (in a comment) that a merge_insert UPDATE preserves created_at and
bumps updated_at, but only asserted the value change — leaving the change-feed
invariant unguarded. Add the two missing assertions:

- bob created_at == v1 (preserved across UPDATE; what the test name promises;
  lance#6774 only changed INSERT-row stamping).
- bob updated_at == v2 (bumped to the commit version) — the invariant
  OmniGraph's insert/update classification relies on (changes/mod.rs keys on
  _row_last_updated_at_version). A regression here would silently drop updates
  from the diff/change feed.
											
										
										
											2026-06-14 20:42:24 +02:00
+								**Storage substrate:** Lance 7.x (columnar, versioned, branchable)
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								**License:** MIT
 								**Toolchain:** Rust stable, edition 2024
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Start here — what is this?
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets. Highlights:
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								- **Storage**: per node/edge type a separate Lance dataset; multi-dataset commits coordinated atomically through one `__manifest` table.
 								- **Languages**: a `.pg` schema language and a `.gq` query language, both Pest-based, with a typed IR.
 								- **Multi-modal querying**: vector ANN (`nearest`), full-text (`search`/`fuzzy`/`match_text`/`bm25`), Reciprocal Rank Fusion (`rrf`), and graph traversal (`Expand`, anti-join `not { … }`) in one runtime.
 								- **Branches and commits across the whole graph**: Git-style — every successful publish appends to a commit DAG; merges are three-way at the row level.
-												MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup

Refresh user-facing and agent-facing docs for the staged-write rewire
and clean up stale Run-state-machine references that survived MR-771.

MR-794-specific updates:
* docs/runs.md — remove "Known limitation: mid-query partial failure"
  section; document the in-memory accumulator + D₂ rule + the
  LoadMode::Overwrite residual.
* docs/invariants.md §VI.25 — flip from aspirational/open to
  upheld for inserts/updates. Within-query read-your-writes is now
  load-bearing for the publisher CAS contract.
* docs/architecture.md — add "Mutation atomicity — in-memory
  accumulator (MR-794)" subsection with per-op flow; refresh the
  engine + state diagrams to drop RunRegistry and add MutationStaging.
* docs/execution.md — rewrite the mutation flow sequence diagram
  for the staged-write path; updated the LoadMode table to call
  out per-mode commit semantics; rewrote load vs ingest.
* docs/query-language.md — document the D₂ parse-time rule.
* docs/errors.md — add the D₂ BadRequest rejection path.
* docs/testing.md — extend the runs.rs row to cover the new MR-794
  contract tests; add the staged_writes.rs row.
* docs/releases/v0.4.1.md (new) — release note covering the rewire,
  test additions, residuals, and files changed.
* AGENTS.md (CLAUDE.md symlink) — update the atomic-per-query
  description and the L2 capability matrix row.

Stale-reference cleanup (MR-771 leftovers):
* docs/storage.md — drop live _graph_runs.lance / _graph_run_actors.lance
  from the layout diagram and prose; mark legacy.
* docs/branches-commits.md — move __run__<id> to a legacy note;
  remove publish_run from the publish-trigger list.
* docs/audit.md — refresh _as API list (drop begin_run_as / publish_run_as);
  legacy RunRecord.actor_id moved to a historical note.
* docs/constants.md — mark run registry / branch-prefix rows as legacy.
* docs/cli.md — replace the legacy omnigraph run * quickstart block
  with omnigraph commit list/show.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-01 10:43:19 +02:00
+								- **Atomic per-query writes**: `mutate_as` and `load` accumulate insert/update batches into an in-memory `MutationStaging.pending` per touched table; one `stage_*` + `commit_staged` per table runs at end-of-query, then `ManifestBatchPublisher::publish` commits the manifest atomically with per-table `expected_table_versions` CAS. A mid-query failure leaves Lance HEAD untouched on staged tables — no drift, no run state machine, no staging branches. Deletes still inline-commit; D₂ at parse time prevents inserts/updates and deletes from coexisting in one query.
-												feat!: delete the legacy OmnigraphConfig + config migrate; finish the omnigraph.yaml docs sweep (#252)

* refactor(cli): own ReadOutputFormat/TableCellLayout in the CLI

The two output-presentation enums lived in `omnigraph-server::config` and were
re-exported for the CLI, even though the server never used them. Move both
definitions into `omnigraph-cli/src/read_format.rs` (where the renderer already
lives) and drop them from the server's public re-export. This is a step toward
deleting the legacy `omnigraph-server::config` module entirely — a CLI
presentation concern has no business in the server crate.

No behavior change. The server keeps private copies in `config.rs` only for the
soon-to-be-deleted legacy `CliDefaults`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(cli)!: remove the `config migrate` command and migrate.rs

`config migrate` was the last CLI consumer of the legacy `omnigraph.yaml`
(`OmnigraphConfig` + `load_config`). With the excision complete there is no
legacy file to split, so the whole `omnigraph config` command group is removed
along with `migrate.rs`. The `OmnigraphConfig` type, `load_config`, and the
deprecation machinery are deleted next.

- Remove `Command::Config` / `ConfigCommand` from the clap surface and the
  dispatch arm; drop `mod migrate;` and the now-unused `load_config` import.
- Drop the `Command::Config` arms in `planes.rs`.
- Delete the `config_migrate_splits_legacy_config` integration test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(server)!: delete the legacy OmnigraphConfig type and load_config

With `config migrate` gone, nothing loads `omnigraph.yaml` anymore. Delete the
entire `omnigraph-server::config` module: the `OmnigraphConfig` type and its
sub-structs (`ProjectConfig`, `TargetConfig`, `CliDefaults`, `ServerDefaults`,
`AuthDefaults`, `QueryDefaults`, `AliasConfig`, `AliasCommand`, `PolicySettings`,
`QueryEntry`, `McpSettings`), `load_config`, and the RFC-008 deprecation
machinery (`OMNIGRAPH_CONFIG`, `OMNIGRAPH_NO_LEGACY_CONFIG`,
`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION`, the deprecation map + warner).

- `QueryRegistry::load` (the only `OmnigraphConfig`/`QueryEntry` consumer; its
  only caller was its own test) is removed — server boot and the CLI both build
  registries via `QueryRegistry::from_specs`.
- `graph_resource_id_for_selection` (CLI-only) moves into the CLI
  (`helpers.rs`), with its unit test; the server no longer exports it.
- Drop the already-dead `format_registry_load_errors` helper (config-adjacent).

No behavior change — every deleted item was unreachable after the excision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: purge the legacy omnigraph.yaml surface from the docs

Finish the RFC-011 excision in the docs: the CLI no longer reads omnigraph.yaml
and the server boots cluster-only, so every doc that described the legacy file
as a live config is now wrong.

- AGENTS.md: rewrite the HTTP-server line to cluster-only boot (drop the
  single-graph/flat-route and omnigraph.yaml-boot framing); rewrite the CLI
  two-surface-config passage (drop `config migrate`, the deprecation env vars,
  and "Never extend omnigraph.yaml"); fix the topic table + capability rows.
- cli/reference.md: delete the entire "omnigraph.yaml schema (legacy combined
  file)" section and the `config migrate` row; re-home the `policy` row, the
  bearer-token chain, the actor/format/param-precedence references, and the
  `--config` mentions to the operator config + `--cluster`.
- cli/index.md: rewrite the multi-graph-server + add-graph paragraphs to
  cluster (`--cluster` + `cluster apply`); fix the policy examples to
  `--cluster`; replace the `## Config` omnigraph.yaml example with the
  operator/cluster two-surface model.
- operations/policy.md: rewrite per-graph-vs-server-level policy to the cluster
  `policies:`/`applies_to` model; re-home the actor + CLI tooling sections.
- clusters/config.md, clusters/index.md, deployment.md: server boots from the
  cluster only; per-operator facts come from ~/.omnigraph/config.yaml.
- architecture.md, testing.md: drop the stale omnigraph.yaml / deleted-test
  references.

RFCs, design specs, and prior release notes are left as historical records.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-15 22:31:29 +03:00
+								- **HTTP server**: Axum + utoipa OpenAPI, bearer auth (SHA-256 hashed, optional AWS Secrets Manager). Cedar policy enforcement is engine-wide — every `_as` writer calls `Omnigraph::enforce(action, scope, actor)`, so HTTP, CLI, and embedded SDK consumers all hit the same gate. **Cluster-only boot** (RFC-011): the server always boots from a cluster directory (`--cluster <dir | s3://…>`, RFC-005) and serves N graphs (N ≥ 1) under multi-graph routes (`/graphs/{graph_id}/...` + read-only `GET /graphs` enumeration); there are no single-graph flat routes and no positional-URI boot. Per-graph + server-level Cedar policies. Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`) is not exposed — operators run `cluster apply` and restart.
 								- **CLI** with two-surface config (RFC-007/008): the team-owned cluster directory (`cluster.yaml`) plus the per-operator `~/.omnigraph/config.yaml` (servers, clusters, credentials, actor, profiles, aliases, defaults). Graphs are addressed via `--store`/`--server`/`--cluster`/`--profile`/operator defaults (RFC-011). Multi-format output (json/jsonl/csv/kv/table).
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								Throughout the docs, capabilities are split into **L1 — Inherited from Lance** vs **L2 — Added by OmniGraph**.
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Architecture at a glance
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
 								```
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								CLI (omnigraph)        HTTP Server (omnigraph-server, Axum)
 								        │                            │
 								        └─────────────┬──────────────┘
 								                      ▼
 								           omnigraph-compiler  ── Pest grammars, catalog, IR, lowering, lint, migration plan
 								                      │
 								                      ▼
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								           omnigraph (engine)  ── ManifestCoordinator, CommitGraph, RunRegistry, GraphIndex (CSR/CSC), exec
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								                      │
 								                      ▼
-												build(deps): bump Lance 6.0.1 → 7.0.0 (correct-by-design substrate alignment) (#229)

* build(deps): bump Lance 6.0.1 → 7.0.0 (object_store 0.13.2, roaring 0.11.4)

Arrow stays 58 and DataFusion stays 53 (no change). The only transitive bump
is object_store 0.12.5 → 0.13.2. 141 upstream commits reviewed; no fixes lost
(the 6.0.x release-branch backports are all forward-ported into 7.0.0).

- object_store 0.13 moved get/put/head/rename/delete behind a new ObjectStoreExt
  trait (list/list_with_delimiter/put_opts stay on the core trait). Add
  `use object_store::ObjectStoreExt` in storage.rs and db/manifest/namespace.rs;
  no call-site changes. Mirrors Lance's own migration in PR #6672.
- roaring pinned to 0.11.4 (cargo update -p roaring --precise 0.11.4). Lance
  7.0.0's UpdatedFragmentOffsets newtype (lance#6650) derives Eq over
  HashMap<u64, RoaringBitmap>, which needs RoaringBitmap: Eq, added in roaring
  0.11.4; the loose `roaring = "0.11"` constraint otherwise resolves 0.11.3 and
  lance itself fails to compile.
- lance#6774: merge-insert INSERT rows now stamp _row_created_at_version with the
  commit version (was a fallback of 1). Flip the lance_version_columns assertion
  to `== v2` and correct the changes/mod.rs rationale comment. Production
  change-detection keys on _row_last_updated_at_version + ID membership, so its
  logic is unaffected.

Refs lance#6650, lance#6774, lance#6672.

* fix(storage): pin WriteParams::auto_cleanup = None (lance#6755 default flip)

lance#6755 flipped the WriteParams::auto_cleanup default from on (a full cleanup
pass every 20th commit) to None. On 6.0.1 the on-by-default hook could silently
GC versions that __manifest pins for snapshots/time-travel. OmniGraph owns
cleanup explicitly (optimize.rs::cleanup_all_tables) and never set auto_cleanup,
so it was relying on a default that is both wrong for our snapshot model and now
changed upstream.

Pin auto_cleanup: None explicitly at all 11 production WriteParams sites
(table_store ×6, commit_graph ×2, recovery_audit ×1, manifest/graph ×2 — the
__manifest + sub-table Create paths). Removes the dependency on a default-flag
value and locks in the snapshot-safe behavior regardless of future upstream
re-flips.

Refs lance#6755.

* test(lance): pin BTREE range-boundary correctness (lance#6796)

lance#6796 (issue #6792) fixed a BTREE scalar-index range-query bound
inclusiveness bug: `x <= hi AND x > lo` returned the wrong boundary row.
Add lance_surface_guards.rs::btree_range_query_boundary_is_correct, which
reproduces the exact #6792 shape (5 rows + an explicit BTREE drives the index
path even on tiny data) and pins the corrected inclusive-<= / exclusive->
semantics. It turns red if a future Lance regression reintroduces the bug.

OmniGraph today builds BTREE only on string @key columns and queries them by
equality/IN, so its current patterns do not hit this; the guard protects any
future BTREE-range path (BTREE-on-properties, range-on-key).

Refs lance#6796.

* docs(dev): align Lance docs + invariants to 7.0.0

- docs/dev/lance.md: new 2026-06-14 alignment stanza for the 6.0.1 → 7.0.0 bump
  (object_store ObjectStoreExt move, roaring 0.11.4, #6774/#6796/#6755 behavior,
  #6658 shipped → MR-A unblocked but separate, #6666 + blob compaction still
  open); prior 6.0.1 stanza demoted to historical.
- AGENTS.md: storage substrate 6.x → 7.x (line + architecture diagram).
- docs/dev/invariants.md: deletes/vector known gap updated — the staged
  two-phase delete API (lance#6658) now exists and MR-A is unblocked, but
  delete_where stays inline and D2 stays in place until the migration lands;
  create_vector_index still gated on lance#6666.

* fix(storage): skip Lance auto-cleanup on commit paths for legacy datasets

Addresses PR #229 review (Codex P1). `WriteParams::auto_cleanup` is create-time
config with no effect on existing datasets (Lance write.rs docs), so the previous
`auto_cleanup: None` change alone did NOT protect graphs created before the v7
bump: 6.0.1 defaulted auto_cleanup ON, leaving `lance.auto_cleanup.*` config on
those datasets, and Lance's per-commit hook (io/commit.rs: `if
!commit_config.skip_auto_cleanup`) fires off that stored config — so omnigraph's
own writes would GC versions the __manifest pins for snapshots/time-travel.

Skip the hook on every commit path, covering new and legacy datasets alike:
- commit_staged: CommitBuilder::with_skip_auto_cleanup(true) — the staged data path.
- __manifest publisher: MergeInsertBuilder::skip_auto_cleanup(true).
- all 11 WriteParams: skip_auto_cleanup: true (direct Dataset::write/append paths;
  auto_cleanup: None retained so new datasets store no cleanup config at all).

Tests:
- lance_surface_guards::skip_auto_cleanup_suppresses_version_gc — substrate:
  negative control (config GCs v1 without skip) + with-skip survival.
- staged_writes::commit_staged_skips_auto_cleanup_so_pinned_versions_survive —
  omnigraph usage: commit_staged on a legacy-config dataset preserves the pinned
  create version.

Refs lance#6755.

* test(lance): assert created_at-preserved + updated_at-bumped on merge_insert UPDATE

Addresses PR #229 review follow-up. `lance_merge_insert_update_preserves_created_at_version`
documented (in a comment) that a merge_insert UPDATE preserves created_at and
bumps updated_at, but only asserted the value change — leaving the change-feed
invariant unguarded. Add the two missing assertions:

- bob created_at == v1 (preserved across UPDATE; what the test name promises;
  lance#6774 only changed INSERT-row stamping).
- bob updated_at == v2 (bumped to the commit version) — the invariant
  OmniGraph's insert/update classification relies on (changes/mod.rs keys on
  _row_last_updated_at_version). A regression here would silently drop updates
  from the diff/change feed.
											
										
										
											2026-06-14 20:42:24 +02:00
+								              Lance 7.x         ── columnar Arrow, fragments, per-dataset versions/branches, indexes
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								                      │
 								                      ▼
 								        Object store (file / s3 / RustFS / MinIO / S3-compat)
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								```
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architecture.md).
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Where to find each topic
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								| Area | Read |
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								|---|---|
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								| **User docs entry point (public CLI/API/operator docs)** | **[docs/user/index.md](docs/user/index.md)** |
 								| **Developer docs entry point (architecture, invariants, testing, internals)** | **[docs/dev/index.md](docs/dev/index.md)** |
 								| **Architectural invariants & deny-list (read before any non-trivial proposal or review)** | **[docs/dev/invariants.md](docs/dev/invariants.md)** |
 								| **Lance docs index — fetch upstream Lance docs by problem domain** | **[docs/dev/lance.md](docs/dev/lance.md)** |
 								| **Test coverage map — what's covered, what helpers to reuse, before-every-task checklist** | **[docs/dev/testing.md](docs/dev/testing.md)** |
 								| Architecture, L1/L2 framing, concurrency model | [docs/dev/architecture.md](docs/dev/architecture.md) |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| Storage layout, `__manifest` schema, URI schemes, S3 env vars | [docs/user/concepts/storage.md](docs/user/concepts/storage.md) |
 								| `.pg` schema language, types, constraints, annotations, migration planning | [docs/user/schema/index.md](docs/user/schema/index.md) |
 								| Schema-lint codes (`OG-XXX-NNN`), families, severity, suppression | [docs/user/schema/lint.md](docs/user/schema/lint.md) |
 								| `.gq` query language, MATCH/RETURN/ORDER, IR ops, lint codes | [docs/user/queries/index.md](docs/user/queries/index.md) |
 								| Mutations — insert/update/delete, D2, atomicity | [docs/user/mutations/index.md](docs/user/mutations/index.md) |
 								| Search funcs (`nearest`/`bm25`/`rrf`), hybrid ranking | [docs/user/search/index.md](docs/user/search/index.md) |
 								| Indexes (BTREE / inverted / vector / graph topology) | [docs/user/search/indexes.md](docs/user/search/indexes.md) |
-												refactor(compiler): remove dead OpenAI embedding client (RFC-012 Phase 1)

The compiler-side EmbeddingClient (OpenAI/`text-embedding-3-small`) was pub(crate), #![allow(dead_code)], and had zero callers anywhere in the workspace; the live nearest("string") path and the offline `omnigraph embed` CLI both use the engine Gemini client. It carried the only NANOGRAPH_* env vars (vestigial 'nanograph os' naming) and was the sole user of reqwest+tokio in omnigraph-compiler — dropping them removes an HTTP client and async runtime from a crate that advertises 'Zero Lance dependency' (invariant 11).

Rewrites docs/user/search/embeddings.md to the single-client reality and corrects the false 'engine embeds @embed at ingest' claim. Verified: build green, 238 compiler tests pass, `rg NANOGRAPH_` empty.

											
										
										
											2026-06-15 15:07:54 +02:00
+								| Embeddings (engine client, env vars, `@embed`) | [docs/user/search/embeddings.md](docs/user/search/embeddings.md) |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| Concepts — what OmniGraph is, L1/L2 framing | [docs/user/concepts/index.md](docs/user/concepts/index.md) |
 								| Quickstart — init → load → query → branch | [docs/user/quickstart.md](docs/user/quickstart.md) |
 								| Branches, commit graph, system branches | [docs/user/branching/index.md](docs/user/branching/index.md) |
 								| Snapshots & time travel | [docs/user/branching/time-travel.md](docs/user/branching/time-travel.md) |
 								| Three-way merge and conflict kinds (user-facing) | [docs/user/branching/merge.md](docs/user/branching/merge.md) |
 								| Transactions and atomicity (per-query atomic; branches as multi-query transactions) | [docs/user/branching/transactions.md](docs/user/branching/transactions.md) |
-												docs: rename runs.md/runs.rs → writes and repoint all references (#131)

The Run state machine was removed in MR-771 (v0.4.0); `docs/dev/runs.md`
and `crates/omnigraph/tests/runs.rs` have since documented and tested the
direct-publish write path, so the "runs" name was misleading.

- git mv docs/dev/runs.md → docs/dev/writes.md (reframe H1 + intro;
  keep MR-771 history note)
- git mv crates/omnigraph/tests/runs.rs → tests/writes.rs (reframe header)
- repoint every runs.md / runs.rs reference across docs, AGENTS.md, and
  source comments
- fix four pre-existing broken `docs/runs.md` links (the file never lived
  at that path) to `docs/dev/writes.md`
- fix the stale v0.4.0 anchor to the live section

No behavior change: every source edit is a comment. Engine builds and the
renamed test passes 25/25; scripts/check-agents-md.sh passes.

The run-removal cleanup itself (run_registry.rs guard, __run__ prefix) is
deferred to MR-770.
											
										
										
											2026-05-30 23:20:56 +02:00
+								| Direct-publish write path (staging, D2, recovery sidecars; the former Run state machine) | [docs/dev/writes.md](docs/dev/writes.md) |
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								| Three-way merge and conflict kinds | [docs/dev/merge.md](docs/dev/merge.md) |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| Diff / change feed (`diff_between`, `diff_commits`) | [docs/user/branching/changes.md](docs/user/branching/changes.md) |
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								| Query execution, mutation execution, bulk loader, `load` vs `ingest` | [docs/dev/execution.md](docs/dev/execution.md) |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| `optimize` (compaction) and `cleanup` (version GC) | [docs/user/operations/maintenance.md](docs/user/operations/maintenance.md) |
 								| Cluster operator guide (deploy/manage clusters, approvals, recovery, serving) | [docs/user/clusters/index.md](docs/user/clusters/index.md) |
 								| Cedar policy actions, scopes, CLI | [docs/user/operations/policy.md](docs/user/operations/policy.md) |
 								| HTTP server endpoints, auth, error model, body limits | [docs/user/operations/server.md](docs/user/operations/server.md) |
 								| CLI quick-start | [docs/user/cli/index.md](docs/user/cli/index.md) |
-												feat!: delete the legacy OmnigraphConfig + config migrate; finish the omnigraph.yaml docs sweep (#252)

* refactor(cli): own ReadOutputFormat/TableCellLayout in the CLI

The two output-presentation enums lived in `omnigraph-server::config` and were
re-exported for the CLI, even though the server never used them. Move both
definitions into `omnigraph-cli/src/read_format.rs` (where the renderer already
lives) and drop them from the server's public re-export. This is a step toward
deleting the legacy `omnigraph-server::config` module entirely — a CLI
presentation concern has no business in the server crate.

No behavior change. The server keeps private copies in `config.rs` only for the
soon-to-be-deleted legacy `CliDefaults`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(cli)!: remove the `config migrate` command and migrate.rs

`config migrate` was the last CLI consumer of the legacy `omnigraph.yaml`
(`OmnigraphConfig` + `load_config`). With the excision complete there is no
legacy file to split, so the whole `omnigraph config` command group is removed
along with `migrate.rs`. The `OmnigraphConfig` type, `load_config`, and the
deprecation machinery are deleted next.

- Remove `Command::Config` / `ConfigCommand` from the clap surface and the
  dispatch arm; drop `mod migrate;` and the now-unused `load_config` import.
- Drop the `Command::Config` arms in `planes.rs`.
- Delete the `config_migrate_splits_legacy_config` integration test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(server)!: delete the legacy OmnigraphConfig type and load_config

With `config migrate` gone, nothing loads `omnigraph.yaml` anymore. Delete the
entire `omnigraph-server::config` module: the `OmnigraphConfig` type and its
sub-structs (`ProjectConfig`, `TargetConfig`, `CliDefaults`, `ServerDefaults`,
`AuthDefaults`, `QueryDefaults`, `AliasConfig`, `AliasCommand`, `PolicySettings`,
`QueryEntry`, `McpSettings`), `load_config`, and the RFC-008 deprecation
machinery (`OMNIGRAPH_CONFIG`, `OMNIGRAPH_NO_LEGACY_CONFIG`,
`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION`, the deprecation map + warner).

- `QueryRegistry::load` (the only `OmnigraphConfig`/`QueryEntry` consumer; its
  only caller was its own test) is removed — server boot and the CLI both build
  registries via `QueryRegistry::from_specs`.
- `graph_resource_id_for_selection` (CLI-only) moves into the CLI
  (`helpers.rs`), with its unit test; the server no longer exports it.
- Drop the already-dead `format_registry_load_errors` helper (config-adjacent).

No behavior change — every deleted item was unreachable after the excision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: purge the legacy omnigraph.yaml surface from the docs

Finish the RFC-011 excision in the docs: the CLI no longer reads omnigraph.yaml
and the server boots cluster-only, so every doc that described the legacy file
as a live config is now wrong.

- AGENTS.md: rewrite the HTTP-server line to cluster-only boot (drop the
  single-graph/flat-route and omnigraph.yaml-boot framing); rewrite the CLI
  two-surface-config passage (drop `config migrate`, the deprecation env vars,
  and "Never extend omnigraph.yaml"); fix the topic table + capability rows.
- cli/reference.md: delete the entire "omnigraph.yaml schema (legacy combined
  file)" section and the `config migrate` row; re-home the `policy` row, the
  bearer-token chain, the actor/format/param-precedence references, and the
  `--config` mentions to the operator config + `--cluster`.
- cli/index.md: rewrite the multi-graph-server + add-graph paragraphs to
  cluster (`--cluster` + `cluster apply`); fix the policy examples to
  `--cluster`; replace the `## Config` omnigraph.yaml example with the
  operator/cluster two-surface model.
- operations/policy.md: rewrite per-graph-vs-server-level policy to the cluster
  `policies:`/`applies_to` model; re-home the actor + CLI tooling sections.
- clusters/config.md, clusters/index.md, deployment.md: server boots from the
  cluster only; per-operator facts come from ~/.omnigraph/config.yaml.
- architecture.md, testing.md: drop the stale omnigraph.yaml / deleted-test
  references.

RFCs, design specs, and prior release notes are left as historical records.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-15 22:31:29 +03:00
+								| CLI command surface and config schema (`~/.omnigraph/config.yaml`) | [docs/user/cli/reference.md](docs/user/cli/reference.md) |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| Audit / actor tracking | [docs/user/operations/audit.md](docs/user/operations/audit.md) |
 								| Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) |
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								| Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) |
-												docs: onboarding-first README + in-repo agent skill + drop RustFS script (#257)

* docs: optimize README for dev onboarding; fix 0.7.0 staleness

The README's setup half drifted from the shipped 0.7.0 CLI and led with the
heaviest path (Docker + RustFS). This reworks it for fast, correct onboarding:

README.md
- New zero-dependency "Your first graph in 60 seconds" hero: a fully
  copy-pasteable local file-backed loop (schema → init → load → query → branch).
- Add a correct "Serve it" section (cluster apply + omnigraph-server --cluster);
  the server is cluster-only on main, so the old positional-URI boot is gone.
- Demote the RustFS bootstrap to "rehearse the S3 path locally"; reframe the
  storage bullet as "filesystem or any S3-compatible store (AWS S3, R2, MinIO,
  RustFS)" — RustFS is a provider, not a storage class.
- Fix crate/MCP descriptions (query/mutate/load, not read/change/ingest).

docs/user/quickstart.md
- Fix the query example: `read --name <q> … <uri>` is removed — the query name
  is positional and the graph is addressed with `--store` (`omnigraph query
  find_people --query queries.gq --store graph.omni`).

scripts/local-rustfs-bootstrap.sh
- Convert to cluster mode: write a cluster.yaml (storage: s3://…), then
  validate → import → apply, load the fixture into the derived root with the
  now-required --mode, and serve with `omnigraph-server --cluster`. The old
  flow (`load` without --mode, `omnigraph-server <URI>` positional boot) no
  longer works on a cluster-only server.

* docs: move agent skill into the repo, add agent-setup snippet, drop rustfs script

skills/omnigraph
- The operational skill (formerly `omnigraph-best-practices` in the cookbooks
  repo) now lives with the engine it documents, co-versioned. Renamed to
  `omnigraph`; repository metadata repointed here.
- Broadened the description to trigger on intent — storing/retrieving/querying
  knowledge, agent memory, building a knowledge graph, operating Omnigraph — as
  well as on CLI/artifact sightings (stays ≤1024 chars).
- Install: `npx skills add ModernRelay/omnigraph@omnigraph`.

README
- New "Set it up with an AI agent" paste snippet: installs the skill, reads the
  docs (URL), browses the cookbooks, and asks the user about a use case before
  standing up a first graph.
- "Agent skill & starter graphs" section points at skills/omnigraph + cookbooks.

Drop scripts/local-rustfs-bootstrap.sh
- Not CI-tested (so it rotted: it broke on the cluster-only migration — positional
  server boot, load without --mode), demoed the now-optional S3 path, and was the
  most fragile artifact in the repo. Replaced with a "Testing against S3 locally"
  guide in deployment.md (docker run RustFS/MinIO + AWS_* env + cluster-on-S3).
  README/AGENTS references updated.
											
										
										
											2026-06-16 11:48:13 +02:00
+								| Deployment (binary / container / S3-local testing / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								| CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) |
 								| Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) |
 								| Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| Constants & tunables cheat sheet | [docs/user/reference/constants.md](docs/user/reference/constants.md) |
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								| Per-version release notes | [docs/releases/](docs/releases/) |
 								---
-												docs: lead AGENTS.md first principle with integrated-over-time framing

Reframes the first-principle section to lead with Winters' "engineering
is programming integrated over time" as the lens, keeping "minimize
ongoing liability" as the operative directive and folding in "complexity
should be earned." Adds a new Tiebreakers subsection with two rules
that the prior section lacked clean appeals for:

- correctness > simplicity > performance (lexicographic)
- reversibility shapes evidence demand (reversible → prod metrics over
  napkin math over RFCs; irreversible → RFC up-front)

Adds a Hyrum's-Law deny-list entry in both AGENTS.md and
docs/invariants.md §IX: shipping observable behavior is shipping a
contract, even when undocumented.

Net always-on context cost: ~7 lines. No renumbering of §I–VIII
invariants; Hyrum's Law lands in the deny-list to avoid breaking
back-references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-12 16:27:24 -07:00
+								## First principle: engineering is programming integrated over time
-												AGENTS.md: add 'minimize ongoing liability' as a first principle

The always-on rules are concretizations of a broader engineering posture
that wasn't stated explicitly. Add a short section that frames the spirit
behind those rules:

- One centralized detection point, not many heal hooks.
- One dispatcher, not branch-on-shape in every consumer.
- One canonical shape after migration, not forks on "old vs new".
- Three similar lines beats a premature abstraction.
- Delete dead paths when their last caller leaves.

Plus a forward-looking review prompt ("what do these paths look like after
5 more changes like this?") so the principle bites at design time, not
just at review time.

The internal-schema-version mechanism we just shipped is a concrete
application: one stamp + one dispatcher + one match arm per change, no
heal hooks scattered across the engine. Codify the pattern so future
work doesn't drift back to ad-hoc.

											
										
										
											2026-04-29 11:48:47 +00:00
-												docs: lead AGENTS.md first principle with integrated-over-time framing

Reframes the first-principle section to lead with Winters' "engineering
is programming integrated over time" as the lens, keeping "minimize
ongoing liability" as the operative directive and folding in "complexity
should be earned." Adds a new Tiebreakers subsection with two rules
that the prior section lacked clean appeals for:

- correctness > simplicity > performance (lexicographic)
- reversibility shapes evidence demand (reversible → prod metrics over
  napkin math over RFCs; irreversible → RFC up-front)

Adds a Hyrum's-Law deny-list entry in both AGENTS.md and
docs/invariants.md §IX: shipping observable behavior is shipping a
contract, even when undocumented.

Net always-on context cost: ~7 lines. No renumbering of §I–VIII
invariants; Hyrum's Law lands in the deny-list to avoid breaking
back-references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-12 16:27:24 -07:00
+								Software engineering is **programming integrated over time** (Winters, *Software Engineering at Google*). A line of code costs you at every future read, refactor, migration, and dependent change — not just at write-time. So the operative question for any change is: **which option has the lower ongoing liability?** Not "shorter now," not "fastest to ship," but which leaves the codebase narrower in the long run. **Complexity should be earned** — by demonstrated correctness, performance, or future-shape cost; never by speculation.
-												AGENTS.md: add 'minimize ongoing liability' as a first principle

The always-on rules are concretizations of a broader engineering posture
that wasn't stated explicitly. Add a short section that frames the spirit
behind those rules:

- One centralized detection point, not many heal hooks.
- One dispatcher, not branch-on-shape in every consumer.
- One canonical shape after migration, not forks on "old vs new".
- Three similar lines beats a premature abstraction.
- Delete dead paths when their last caller leaves.

Plus a forward-looking review prompt ("what do these paths look like after
5 more changes like this?") so the principle bites at design time, not
just at review time.

The internal-schema-version mechanism we just shipped is a concrete
application: one stamp + one dispatcher + one match arm per change, no
heal hooks scattered across the engine. Codify the pattern so future
work doesn't drift back to ad-hoc.

											
										
										
											2026-04-29 11:48:47 +00:00
-												AGENTS.md: reframe 'minimize ongoing liability' as a general decision lens

The previous bullets read like a migration pattern (centralized
dispatcher, one match arm, no shape forks). That's one application, not
the principle. Reframe it as a bidirectional decision lens: ask "which
option has lower ongoing cost over time?" and let the answer be more
code, less code, DRYing, duplication, removal, addition, a new
abstraction, or flattening one — whichever shape converges over the
expected change horizon.

Add explicit examples of cases where the lower-liability option is
*more* code (dispatcher, migration framework, typed error variants) and
where it's *less* (premature abstractions, "just in case" paths,
helpers that wedge independently-evolving callers together) so readers
don't collapse the principle into "minimize code".

											
										
										
											2026-04-29 12:32:30 +00:00
+								This is a decision lens, not a code-size rule. It cuts both ways. Sometimes the lower-liability option is:
-												AGENTS.md: add 'minimize ongoing liability' as a first principle

The always-on rules are concretizations of a broader engineering posture
that wasn't stated explicitly. Add a short section that frames the spirit
behind those rules:

- One centralized detection point, not many heal hooks.
- One dispatcher, not branch-on-shape in every consumer.
- One canonical shape after migration, not forks on "old vs new".
- Three similar lines beats a premature abstraction.
- Delete dead paths when their last caller leaves.

Plus a forward-looking review prompt ("what do these paths look like after
5 more changes like this?") so the principle bites at design time, not
just at review time.

The internal-schema-version mechanism we just shipped is a concrete
application: one stamp + one dispatcher + one match arm per change, no
heal hooks scattered across the engine. Codify the pattern so future
work doesn't drift back to ad-hoc.

											
										
										
											2026-04-29 11:48:47 +00:00
-												AGENTS.md: reframe 'minimize ongoing liability' as a general decision lens

The previous bullets read like a migration pattern (centralized
dispatcher, one match arm, no shape forks). That's one application, not
the principle. Reframe it as a bidirectional decision lens: ask "which
option has lower ongoing cost over time?" and let the answer be more
code, less code, DRYing, duplication, removal, addition, a new
abstraction, or flattening one — whichever shape converges over the
expected change horizon.

Add explicit examples of cases where the lower-liability option is
*more* code (dispatcher, migration framework, typed error variants) and
where it's *less* (premature abstractions, "just in case" paths,
helpers that wedge independently-evolving callers together) so readers
don't collapse the principle into "minimize code".

											
										
										
											2026-04-29 12:32:30 +00:00
+								- **More code.** A centralized dispatcher costs more lines than an ad-hoc heal hook, but each future change adds a match arm instead of a new hook scattered through the engine.
 								- **Less code.** Three similar lines that may diverge later cost less to maintain than a premature abstraction that has to be retrofitted every time a caller deviates.
 								- **DRYing.** Two copies of business logic that must stay in sync are a perpetual drift risk.
 								- **Duplication.** Two callers that look similar today but have independent evolution pressure shouldn't be wedged through a shared helper just because the lines match.
 								- **Removal.** A "just in case" code path with no caller is pure surface area: tests for it, docs that mention it, future changes that have to consider it.
 								- **Addition.** A migration framework, a typed error variant, a feature flag — each adds code now and lowers the cost of every future change in its surface.
 								- **A new abstraction**, when the absence forces every consumer to re-derive the same logic. Or **flattening one**, when the abstraction has accumulated more special-cases than the code it replaced.
 								When evaluating a design, ask: *"what does this look like after 5 more changes like it?"* If the answer is "this converges to one shape", cost is bounded. If it's "this forks every time", the option is mortgaging the future for present convenience — pick differently.
-												docs: lead AGENTS.md first principle with integrated-over-time framing

Reframes the first-principle section to lead with Winters' "engineering
is programming integrated over time" as the lens, keeping "minimize
ongoing liability" as the operative directive and folding in "complexity
should be earned." Adds a new Tiebreakers subsection with two rules
that the prior section lacked clean appeals for:

- correctness > simplicity > performance (lexicographic)
- reversibility shapes evidence demand (reversible → prod metrics over
  napkin math over RFCs; irreversible → RFC up-front)

Adds a Hyrum's-Law deny-list entry in both AGENTS.md and
docs/invariants.md §IX: shipping observable behavior is shipping a
contract, even when undocumented.

Net always-on context cost: ~7 lines. No renumbering of §I–VIII
invariants; Hyrum's Law lands in the deny-list to avoid breaking
back-references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-12 16:27:24 -07:00
+								### Tiebreakers when liability alone is silent
 								- **Correctness > simplicity > performance.** Lexicographic — give up performance for simpler code; give up simplicity for correct code; never give up correctness. The deny-list ("no silent failures," "no acks before durable persistence," "no reads of partial commits") is this rule's hard floor.
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								- **Reversibility shapes evidence demand.** Reversible changes wait for evidence: prefer prod metrics over napkin math over RFCs. Irreversible changes (substrate choice, on-disk format, database guarantees) earn an RFC, because by the time prod tells you they were wrong, you've shipped years of dependent code. Reviewers should spot both failure modes — RFC-ing a one-line config, and measuring-your-way into a substrate decision.
-												docs: lead AGENTS.md first principle with integrated-over-time framing

Reframes the first-principle section to lead with Winters' "engineering
is programming integrated over time" as the lens, keeping "minimize
ongoing liability" as the operative directive and folding in "complexity
should be earned." Adds a new Tiebreakers subsection with two rules
that the prior section lacked clean appeals for:

- correctness > simplicity > performance (lexicographic)
- reversibility shapes evidence demand (reversible → prod metrics over
  napkin math over RFCs; irreversible → RFC up-front)

Adds a Hyrum's-Law deny-list entry in both AGENTS.md and
docs/invariants.md §IX: shipping observable behavior is shipping a
contract, even when undocumented.

Net always-on context cost: ~7 lines. No renumbering of §I–VIII
invariants; Hyrum's Law lands in the deny-list to avoid breaking
back-references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-12 16:27:24 -07:00
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								The always-on rules below and the deny-list in [docs/dev/invariants.md](docs/dev/invariants.md) are specific applications of this principle; when the rules are silent, fall back to it.
-												AGENTS.md: add 'minimize ongoing liability' as a first principle

The always-on rules are concretizations of a broader engineering posture
that wasn't stated explicitly. Add a short section that frames the spirit
behind those rules:

- One centralized detection point, not many heal hooks.
- One dispatcher, not branch-on-shape in every consumer.
- One canonical shape after migration, not forks on "old vs new".
- Three similar lines beats a premature abstraction.
- Delete dead paths when their last caller leaves.

Plus a forward-looking review prompt ("what do these paths look like after
5 more changes like this?") so the principle bites at design time, not
just at review time.

The internal-schema-version mechanism we just shipped is a concrete
application: one stamp + one dispatcher + one match arm per change, no
heal hooks scattered across the engine. Codify the pattern so future
work doesn't drift back to ad-hoc.

											
										
										
											2026-04-29 11:48:47 +00:00
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Always-on rules (load these into your working memory)
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								These are architectural rules that need to be in scope on every change. They're framed at the level that survives renames and refactors — the deeper implementation specifics (function names, lock names, branch-prefix conventions, enforcement points) live in the per-area docs and may evolve. The full architectural invariants and deny-list are in [docs/dev/invariants.md](docs/dev/invariants.md); the deny-list is the fastest first-pass when reviewing any change.
-												Trim always-on rules to architectural-level invariants

Drops four rules whose phrasing leaned on implementation specifics
(nearest LIMIT, __run__<id> branches, __schema_apply_lock__,
branch_list filter convention) — those are real constraints, but they
live at the implementation layer and would go stale if internals are
renamed or refactored. The architectural intent is captured by the
remaining six rules and by the per-area docs.

Reframes the kept rules at the survives-a-rename level: "multi-dataset
publish is atomic across the whole graph" rather than naming the
manifest table or the publisher type, etc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:38:24 +02:00
 . **Multi-dataset publish is atomic across the whole graph.** A graph commit flips every relevant sub-table version visible together, in one manifest write. Don't introduce code paths that publish per sub-table outside the unified publish path — that loses cross-table snapshot isolation.
 . **Snapshot isolation per query.** A query holds one snapshot for its lifetime. Don't re-read the current head mid-query.
 . **Mutations are atomic at the commit boundary.** Multi-statement change queries publish one commit. Don't commit per-statement.
 . **Bearer-token plaintext never persists in process memory.** Tokens are hashed at startup; auth uses constant-time comparison; the actor id is server-resolved from the hash match and must not be settable by the client.
 . **Reads always see the current index state for the branch they're reading.** Indexes track the branch head, not historical snapshots. If you change index lifecycle, preserve this guarantee.
 . **Stable type IDs survive renames.** Schema migration relies on identity that's stable across rename — don't mint new IDs on rename.
-												Index materialization is derived state: defer off the write path, reconcile via optimize (iss-848) (#246)

* test(engine): reproduce empty-table Vector @index aborting schema apply

A Vector (IVF) index trains k-means centroids over the column, so Lance
cannot build it on 0 vectors ("Creating empty vector indices with
train=False is not yet implemented"). schema apply reconciles a table's
whole index set whenever any @index on it changes, so adding an unrelated
scalar @index materializes the dormant empty vector index and aborts the
entire migration (all-or-nothing).

This regression test inits a 0-row Doc with a Vector @index, adds a scalar
@index, and asserts the apply succeeds (then loads one embedded row and
asserts the deferred index materializes). It fails today at the apply step
with the vector-index abort; the fix lands in the next commit.

Refs dev-graph iss-empty-vector-index-schema-apply, iss-848.

* fix(engine): defer Vector @index on an empty table instead of aborting schema apply

build_indices_on_dataset_for_catalog materialized a declared Vector @index
unconditionally. On a 0-row table Lance cannot train the IVF index
("Creating empty vector indices with train=False is not yet implemented"),
so any later migration that touches the table (e.g. adding an unrelated
scalar @index, which reconciles the table's whole index set) aborted the
entire migration on the dormant vector index — all-or-nothing.

Guard the vector arm with a row-count check, matching the guard
ensure_indices_for_branch and the branch-merge rebuild already use: an
untrainable column becomes a pending index that a later ensure_indices /
optimize materializes once the table has rows. Reads stay correct meanwhile
(vector search degrades to a brute-force scan).

Stop-gap: the residual rows-present-but-vectors-null window and the full
decoupling (intent recorded at apply, an idempotent coverage reconciler)
are dev-graph iss-848. Turns the green half of the regression test added in
the previous commit.

Refs dev-graph iss-empty-vector-index-schema-apply, iss-848, iss-687.

* docs(invariants): record the logical-contract-over-physical-state principle

The bug class behind the empty-table vector-index abort (and the schema-apply
vs optimize version drift) is one shape: a physical operation allowed to fail
a logical one. Several hard invariants (2, 5, 7, 13) and deny-list items are
already instances of this, but the unifying rule was never written down.

Add it to docs/dev/invariants.md as a "Governing principle" section above the
hard invariants, naming which invariants and deny-list items instantiate it
and the smell to watch for (a logical operation gated on a physical fact).
Add a one-line always-on rule (7) in AGENTS.md so it stays in working memory,
with the qualifier that genuine logical conflicts still fail loudly — the
licence to lag covers physical convergence, not correctness.

Audience-neutral: no private ticket refs. check-agents-md.sh passes.

* test(engine): index build must tolerate rows with null vectors (load-before-embed)

Loading rows whose vector column is null into a `Vector @index` table fails
today: build_indices (reached via the loader's prepare_updates_for_commit)
calls create_vector_index, and Lance's IVF KMeans errors "cannot train 1
centroids with 0 vectors". The same abort hits ensure_indices/optimize/schema
apply/merge, since they all funnel through build_indices_on_dataset_for_catalog.

This test loads two null-embedding rows and calls ensure_indices; it must not
abort (the untrainable vector column is deferred, sibling indexes still build).
Fails today at the load step; fixed in the next commit.

Refs dev-graph iss-848, iss-empty-vector-index-schema-apply.

* fix(engine): defer unbuildable index columns instead of aborting the write path

build_indices_on_dataset_for_catalog is the chokepoint every write path funnels
through (load/mutate via prepare_updates_for_commit, schema apply, ensure_indices,
optimize, branch merge). Its vector arm called create_vector_index
unconditionally, so a column with no trainable vectors yet — an empty table, or
rows loaded before `embed` populates them — aborted the whole operation with
Lance's IVF KMeans error.

Fault-isolate the vector build: on failure, record the column as a PendingIndex
(table, column, reason), log it, and continue building the sibling indexes; a
later ensure_indices/optimize materializes it once the column is trainable, and
reads use brute-force meanwhile. Manifest/CAS/IO errors at the publish boundary
still propagate. Isolating at the single chokepoint realizes the governing
principle (physical index state never fails a logical operation) for every write
path, and supersedes the earlier symptomatic count_rows==0 stop-gap (removed) —
closing the residual rows-present-but-vectors-null window it left open.

Surfacing pending index status rather than failing is the database norm
(Postgres indisvalid, LanceDB list_indices). ensure_indices and the build_indices
wrappers now return Vec<PendingIndex>; optimize surfaces it in a later commit.

Refs dev-graph iss-848, iss-951 (vector index stays inline-commit until lance#6666).

* test(engine): index-only schema apply must not touch table data

Adding an @index to an existing column should be a pure metadata change once
index materialization moves to the reconciler (iss-848): the apply records the
intent in the catalog/IR but builds nothing inline, so the table's manifest
version is unchanged. Today the indexed_tables block builds the index inline
and bumps the version (4 -> 5). Fixed in the next commit.

Refs dev-graph iss-848.

* fix(engine): schema apply records index intent only; index-only apply is metadata

Schema apply no longer builds indexes inline. The four build_indices calls
(added/renamed/rewritten/index-only tables) are removed; the @index/@key intent
is already persisted in the catalog/IR the apply writes, and the physical index
is materialized off the critical path by ensure_indices/optimize (iss-848).

Concretely:
- AddConstraint (an @index addition — every other added constraint plans as
  UnsupportedChange) becomes a pure metadata step alongside the metadata-only
  steps: it touches no table data, so the table version is unchanged.
- added/renamed/rewritten tables still write their data; only the trailing
  index build is gone. The rewritten table's coverage is restored later by
  optimize_indices.
- recovery_pins drops index-only tables (they no longer advance Lance HEAD) and
  keeps rewritten tables; their post_commit_pin = expected+1 is now exact (one
  rewrite commit), strengthening recovery classification.
- the now-orphaned Omnigraph::build_indices_on_dataset_for_catalog wrapper is
  removed.

A migration can no longer abort on an index build, for any index type at any
cardinality. Turns the green half of index_only_constraint_apply_touches_no_table_data.

Refs dev-graph iss-848.

* test(engine): optimize must converge a declared-but-unbuilt index

After iss-848, adding an @index post-data is a metadata-only apply that defers
the physical build, so the column is declared-indexed but unbuilt (reads scan).
`optimize` — the operator's cron reconciler — must materialize it. Today optimize
only maintains coverage of EXISTING indexes (optimize_indices) and never creates
missing ones, so the rank BTREE stays Degraded after optimize. Fixed next commit.

Refs dev-graph iss-848.

* fix(engine): optimize materializes declared-but-unbuilt indexes (the reconciler)

`omnigraph optimize` is the operator's cron reconciler. It already compacts and
folds new fragments into EXISTING indexes (optimize_indices); now it also builds
declared-but-missing indexes, so the indexes schema apply / load defer (iss-848)
converge on the next optimize.

Done inside optimize_one_table (not by composing the all-tables ensure_indices,
which is drift-blind and would re-publish the uncovered HEAD>manifest drift that
optimize deliberately skips): after the per-table drift/blob skips and under the
queue + Optimize sidecar already held, a needs_index_create gate (reusing
needs_index_work_node/edge — "declared index missing AND row_count > 0", so empty
tables stay no-ops) admits index-only work, and Phase B builds the missing index
over the just-compacted layout via the build chokepoint. An untrainable vector
column fault-isolates into the new TableOptimizeStats.pending_indexes (the
list_indices/indisvalid analog operators read), not a failure. committed now
reflects index commits, so the existing post-publish cache invalidation covers
them. LanceDB's optimize only maintains existing indexes; creating
declared-but-missing ones is the L2 behavior omnigraph's declarative @index needs.

Turns the green half of optimize_materializes_index_declared_but_unbuilt.

Refs dev-graph iss-848.

* docs: index materialization is deferred to the reconciler (iss-848)

Update the index-lifecycle docs to reflect the new contract: @index/@key
declares intent and the physical index is derived state that never fails a
logical operation. Schema apply builds nothing (records intent only);
load/mutate build inline through one chokepoint that defers an untrainable
Vector column as pending; optimize/ensure_indices is the reconciler that
creates declared-but-missing indexes and maintains coverage, reporting
still-pending columns.

Touches: dev/invariants.md (truth-matrix Index-lifecycle row), AGENTS.md
(capability matrix), user/search/indexes.md (L2 orchestration), user/operations/
maintenance.md (optimize reconciler bullet), dev/testing.md (new tests).

* test(server): schema_apply_route_can_add_index reflects deferred index build

iss-848 made schema apply record @index intent without building the physical
index inline. The route test asserted the index count increased after apply;
on an empty graph it now stays unchanged (the build is deferred to
ensure_indices/optimize). Assert the new contract: apply succeeds and the
physical index count is unchanged.

* fix(engine): precheck vector trainability — don't pin or swallow (PR review)

Two issues Cursor Bugbot caught in the chokepoint fault-isolation:

1. (HIGH) Pending vector pins roll back siblings. needs_index_work_node counted
   a missing vector index as work whenever the table had rows, so a column with
   no trainable vectors got pinned in the EnsureIndices recovery sidecar — but
   the build deferred it (zero commit). On a crash before manifest publish the
   classifier sees NoMovement and the all-or-nothing decision (recovery.rs
   decide()) rolls back the WHOLE sidecar, undoing a sibling table's committed
   index work.
2. (MED) Vector build swallowed fatal errors. The match arm converted every
   create_vector_index error into a deferred PendingIndex, hiding genuine
   I/O/manifest/Lance failures as "pending".

Fix both with one trainability precheck (vector_column_trainable: >=1 non-null
vector, the ivf_flat(1) minimum) used identically by needs_index_work_node and
the build arm: an untrainable column is never counted as work (so never pinned —
no zero-commit pin) and never attempted (so it can't fail); only a trainable
column is built, and then any error PROPAGATES (stays fatal). The deferred
column is still recorded as a PendingIndex with a clear reason.

Refs dev-graph iss-848.

* feat(cli): surface pending index column + reason in optimize output (PR review)

Codex (P2): pending_indexes was documented as visible in `optimize --json` but
the CLI projection never emitted it — operators would lose the only signal that
optimize has deferred index work. Greptile (P2): the stat dropped the reason, so
operators saw which column was stuck, not why.

Carry the reason: TableOptimizeStats.pending_indexes is now Vec<PendingIndex>
(column + reason), and `omnigraph optimize --json` emits {column, reason} per
pending index; human output prints a "↳ index pending on '<col>': <reason>" line.

Refs dev-graph iss-848.

* test: align CLI index-add test with deferred build; cover post-rename reconcile

- schema_apply_json_adds_index_for_existing_property (cli_schema_config.rs): the
  CLI analog of the server test — asserted the index count grew after apply;
  under iss-848 the apply defers the build, so the count is unchanged on an
  empty graph. Assert the deferred contract. (The only full-suite failure.)
- optimize_materializes_index_after_type_rename (maintenance.rs, new): covers
  the gap Greptile flagged — a RenameType writes the renamed table with rows but
  no indexes (inline build removed in Commit B); assert the rank index is
  Degraded post-rename and Indexed after optimize reconciles it.

Refs dev-graph iss-848.

* test(engine): in-source apply tests reflect deferred index materialization

The two db::omnigraph in-source unit tests asserted the old "schema apply builds
/ preserves indexes inline" behavior (the only remaining full-suite failures):

- test_apply_schema_defers_index_then_reconciler_builds_it (was
  test_apply_schema_adds_index_for_existing_property): apply records the @index
  intent but builds nothing; assert the BTREE on `age` is absent after apply and
  present after ensure_indices. (Uses `age`, unindexed in TEST_SCHEMA — `name
  @key` is already FTS-indexed at seed.)
- test_apply_schema_rewrite_defers_index_then_reconciler_restores (was
  test_apply_schema_rewrite_preserves_existing_indices): an AddProperty rewrite
  no longer rebuilds indexes inline; assert ensure_indices restores id BTREE +
  name FTS after the rewrite.

Verified by grep that these + the server/CLI tests are the complete set of
"apply builds an index" assertions; all other index-presence tests run after
load/ensure_indices/primitives, which still build.

Refs dev-graph iss-848.

* fix(engine): optimize always reports pending indexes, not only on create-work (PR review)

Cursor Bugbot (MED): pending_indexes was filled only when needs_index_create was
true, but the vector trainability precheck makes needs_index_work_node exclude an
untrainable Vector column. So a table whose sole missing index is untrainable, but
which optimize still compacts or reindexes, returned an empty pending_indexes —
contradicting the documented operator contract for deferred columns.

Run the (idempotent) build chokepoint unconditionally once past the no-op gate,
rather than gating it on needs_index_create. It skips existing indexes, builds
any buildable missing one, and reports an untrainable column as pending whether
the table entered for compaction, reindex, or index creation. needs_index_create
still gates the no-op decision (so an index-only table still enters the path).

Refs dev-graph iss-848.

* test(engine): reframe staged-BTREE-failure failpoint onto the reconciler path

ensure_indices_stage_btree_failure_leaves_existing_tables_writable fired
`ensure_indices.post_stage_pre_commit_btree` and expected `apply_schema` (adding
a type) to fail mid-BTREE-build. iss-848 removed apply's inline index build, so
that apply now succeeds and the test's unwrap_err panicked — it exercised a
removed code path.

Reframe onto where BTREE builds happen now: seed Person, add an `@index` on
`age` (apply records intent, defers the build), then `ensure_indices` builds the
deferred BTREE and the failpoint fires between stage and commit. Person's HEAD
is unchanged (no drift) and its EnsureIndices sidecar pins NoMovement; a write to
a different, unpinned table (Company) is unaffected (mutations/loads heal
roll-forward and proceed, unlike optimize/repair which refuse on a pending
sidecar). Preserves the original coverage (staged-index stage failure leaves
other tables writable, no drift) in the new architecture.

Refs dev-graph iss-848.

* feat(server): converge deferred indexes promptly after schema apply (iss-848)

Schema apply records @index intent but defers the physical build. On a
long-lived server, spawn a detached best-effort ensure_indices after a
successful apply so the indexes converge promptly instead of waiting for the
operator's next optimize. Fire-and-forget: it never blocks or fails the apply
response, and a failure is logged (the index still converges on the next
optimize). Guarded on result.applied. The CLI is one-shot, so it has no
equivalent; its convergence path is the optimize cadence.

handle.engine is already an Arc, so the spawn takes an owned clone. Convergence
itself is covered by the engine ensure_indices/optimize tests; the existing
empty-graph schema-apply route tests confirm the response is unaffected (the
spawn is a read-only no-op on an empty table).

Refs dev-graph iss-848.

* docs(maintenance): list pending_indexes in optimize per-table stats (consistency)
											
										
										
											2026-06-15 18:48:43 +02:00
+. **Logical contract over physical state.** Physical state (index coverage, fragment layout, compaction versions, staged writes) is derived and rebuildable; it must never fail a logical operation. Check preconditions against logical state and let reconciliation converge the physical state idempotently — genuine logical conflicts still fail loudly. This is the rule rules 1–6 instantiate; full statement and applications in [docs/dev/invariants.md](docs/dev/invariants.md).
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								### Deny-list (fast-pass review filter — full reasoning in [docs/dev/invariants.md](docs/dev/invariants.md))
-												Add architectural invariants & deny-list as docs/invariants.md

A standing reference for invariants that hold across storage, engine,
server, schema, indexing, observability, and the OSS/Cloud split. Used
to check RFCs and PRs against the substrate boundaries (don't rebuild
what Lance gives us), layering rules (one trait boundary per layer),
distributability constraints (Send+Sync, location-neutral IR), honesty
expectations (estimate-vs-actual, bounded failure modes), unified
patterns (reconciler, Union polymorphism, SIP, factorize), the §IX
deny-list, and the §X review checklist.

§IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split)
are referenced but not yet drafted — flagged as placeholders pending
upstream fill-in.

AGENTS.md surfaces it from the topic index, the always-on rules
section, and the maintenance contract; the deny-list is also inlined
there as a fast-pass review filter so it stays in scope every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:34:44 +02:00
 								If a proposal fits one of these, the burden is on the proposer to justify why this case is the exception:
 								- Synchronous-inline index updates for indexes expensive to build (vector ANN, FTS) — use the reconciler pattern.
 								- Custom WAL / transaction manager / buffer pool — Lance owns these.
 								- Job queue for state derivable from manifest — reconciler pattern instead.
 								- Per-feature lowering for shapes that share a structure (interfaces, wildcards, alternation) — use one mechanism.
 								- Eager materialization of cross-products in multi-hop — factorize; flatten only when needed.
 								- Ad-hoc IN-list filtering when SIP fits.
 								- String-flattened SQL filter generation when structured pushdown is available.
 								- In-process-only `Dataset` impls — `Send + Sync`, remote descriptors.
 								- Cost-blind plan choice — lowering-order execution is not a planner.
 								- Hidden statistics — if a metric matters for plan choice, it must be exposed through the trait surface.
 								- Side-channels for query semantics — search modes, mutations, polymorphism are first-class IR concepts.
 								- Discarding rank in retrieval — score and rank propagate as columns.
 								- State that drifts from the manifest — derive from observable state.
 								- Cloud-only correctness fixes — correctness is always OSS.
 								- Forking the codebase for Cloud — trait-extension only.
 								- Hand-rolling something Lance already does — check the spec first.
 								- Mutating in place state that should be immutable (Lance fragments, index segments) — new segments instead.
 								- Silent failures — OOM, timeout, partial result must all be surfaced and bounded.
-												docs: lead AGENTS.md first principle with integrated-over-time framing

Reframes the first-principle section to lead with Winters' "engineering
is programming integrated over time" as the lens, keeping "minimize
ongoing liability" as the operative directive and folding in "complexity
should be earned." Adds a new Tiebreakers subsection with two rules
that the prior section lacked clean appeals for:

- correctness > simplicity > performance (lexicographic)
- reversibility shapes evidence demand (reversible → prod metrics over
  napkin math over RFCs; irreversible → RFC up-front)

Adds a Hyrum's-Law deny-list entry in both AGENTS.md and
docs/invariants.md §IX: shipping observable behavior is shipping a
contract, even when undocumented.

Net always-on context cost: ~7 lines. No renumbering of §I–VIII
invariants; Hyrum's Law lands in the deny-list to avoid breaking
back-references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-12 16:27:24 -07:00
+								- Shipping observable behavior as if it weren't part of the contract — output ordering, error-message text, timestamp precision, default-flag values, latency profile. Per Hyrum's Law, every observable behavior gets depended on once shipped; don't expose what you don't want to commit to.
-												Add architectural invariants & deny-list as docs/invariants.md

A standing reference for invariants that hold across storage, engine,
server, schema, indexing, observability, and the OSS/Cloud split. Used
to check RFCs and PRs against the substrate boundaries (don't rebuild
what Lance gives us), layering rules (one trait boundary per layer),
distributability constraints (Send+Sync, location-neutral IR), honesty
expectations (estimate-vs-actual, bounded failure modes), unified
patterns (reconciler, Union polymorphism, SIP, factorize), the §IX
deny-list, and the §X review checklist.

§IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split)
are referenced but not yet drafted — flagged as placeholders pending
upstream fill-in.

AGENTS.md surfaces it from the topic index, the always-on rules
section, and the maintenance contract; the deny-list is also inlined
there as a fast-pass review filter so it stays in scope every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:34:44 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								---
-												docs: rewrite README opening + add AGENTS.md dev commands (#122)

* docs(agents): add build/test/lint dev-command section

AGENTS.md (CLAUDE.md) covered architecture and invariants but had no
developer command surface — only runtime `omnigraph` CLI usage. Add a
concise "Build, test, lint" section with the non-obvious gotchas:

- crate dir `crates/omnigraph` is package `omnigraph-engine` (the `-p` name)
- canonical CI gate is `cargo test --workspace --locked`
- how to run one file / one fn
- feature-gated suites (`failpoints`, server `aws`)
- S3 tests skip without `OMNIGRAPH_S3_TEST_BUCKET`
- the two non-test CI checks (check-agents-md, OpenAPI drift)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(readme): rewrite opening, dedupe, fix stale references

- New manifesto-style opening (tagline, X-as-code, features, core
  use cases, coordination-layer line); drop the old prose intro,
  Use Cases, and Capabilities sections.
- Remove Capabilities, which restated the new opening line-for-line.
- Harmonize heading case: "## Core Use Cases".
- Dedupe the verbatim Slack invite (kept the Community section) and
  the double-linked cli.md (kept the contextual pointer).
- Fix stale references that no longer match the code:
  - drop "transactional runs" / "and runs" — no run concept remains;
    writes are atomic per-query, multi-query workflows use branches.
  - update the CLI crate command list to canonical query/mutate plus
    commit/lint/optimize/cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
											
										
										
											2026-05-30 00:53:42 +01:00
+								## Build, test, lint
 								Rust stable workspace (edition 2024). `protoc` is a build dependency (`brew install protobuf` / `apt-get install protobuf-compiler libprotobuf-dev`). **Crate dir ≠ package name** for the engine: the directory is `crates/omnigraph` but its Cargo package is `omnigraph-engine` (use that in `-p`). The CLI binary built from `omnigraph-cli` is named `omnigraph`.
 								```bash
 								cargo build --workspace --locked              # build everything
 								cargo test  --workspace --locked              # the canonical CI gate (matches CI exactly)
 								cargo run -p omnigraph-cli -- <args>          # run the `omnigraph` CLI from source
-												[codex] fix RFC-011 follow-up regressions (#258)

* fix rfc-011 follow-up regressions

* test(cli): remove served schema-apply tests obsoleted by the cluster 409

This PR disables server-side schema apply for cluster-backed serving (409 →
`omnigraph cluster apply`). Two system_local tests still drove *served* schema
apply against a spawned `--cluster` server and asserted the pre-409 behavior, so
they failed under `cargo test --workspace`:

- `local_cli_schema_apply_enforces_engine_layer_policy` — expected a per-actor
  policy `denied`/allow on the served route; the route now 409s for everyone
  before policy runs.
- `local_cli_schema_apply_rejects_stored_query_breakage_before_publish` —
  expected a served apply to reject a stored-query breakage; the route now 409s
  before any apply.

Both exercise a path the PR intentionally removed. Their surviving coverage:
the 409 itself is pinned by `schema_routes::schema_apply_route_refuses_cluster_backed_server_mode`
(asserts 409 + no mutation); stored-query-breakage-before-publish stays covered
by `schema_routes::schema_apply_route_rejects_stored_query_breakage_before_publish`
(single-mode); engine-layer schema_apply Cedar enforcement stays covered by
`policy_engine_chassis`. Remove the obsolete served versions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(server): report the cluster-backed schema-apply 409 after the Cedar gate

The 409 ("schema apply is disabled for cluster-backed serving") fired at the top
of `server_schema_apply`, before `authorize_request`. An authenticated-but-
unauthorized actor therefore learned the server is cluster-backed (409) instead
of getting a normal 403 — leaking topology before authorization, against the
same posture that keeps `GET /graphs` default-deny.

Move the 409 below the Cedar gate so the route reports 401 → 403 → 409: an
unauthorized actor gets 403, and only an actor authorized for `schema_apply`
sees the actionable "use `omnigraph cluster apply`" 409. (An open/unauthenticated
server still 409s, as it has no topology to protect.)

Regression: `schema_apply_route_cluster_backed_denies_unauthorized_actor_before_409`
(POLICY_YAML grants no schema_apply → act-ragnor gets 403, not 409). Addresses the
bot-review finding on #258.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-16 03:11:43 +03:00
+								cargo run -p omnigraph-server -- --cluster <dir|s3://...> --bind 0.0.0.0:8080   # run the server from source
-												docs: rewrite README opening + add AGENTS.md dev commands (#122)

* docs(agents): add build/test/lint dev-command section

AGENTS.md (CLAUDE.md) covered architecture and invariants but had no
developer command surface — only runtime `omnigraph` CLI usage. Add a
concise "Build, test, lint" section with the non-obvious gotchas:

- crate dir `crates/omnigraph` is package `omnigraph-engine` (the `-p` name)
- canonical CI gate is `cargo test --workspace --locked`
- how to run one file / one fn
- feature-gated suites (`failpoints`, server `aws`)
- S3 tests skip without `OMNIGRAPH_S3_TEST_BUCKET`
- the two non-test CI checks (check-agents-md, OpenAPI drift)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(readme): rewrite opening, dedupe, fix stale references

- New manifesto-style opening (tagline, X-as-code, features, core
  use cases, coordination-layer line); drop the old prose intro,
  Use Cases, and Capabilities sections.
- Remove Capabilities, which restated the new opening line-for-line.
- Harmonize heading case: "## Core Use Cases".
- Dedupe the verbatim Slack invite (kept the Community section) and
  the double-linked cli.md (kept the contextual pointer).
- Fix stale references that no longer match the code:
  - drop "transactional runs" / "and runs" — no run concept remains;
    writes are atomic per-query, multi-query workflows use branches.
  - update the CLI crate command list to canonical query/mutate plus
    commit/lint/optimize/cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
											
										
										
											2026-05-30 00:53:42 +01:00
 								# Run one crate / one test file / one test fn
 								cargo test -p omnigraph-engine --test traversal           # one integration-test file (see docs/dev/testing.md)
-												docs: rename runs.md/runs.rs → writes and repoint all references (#131)

The Run state machine was removed in MR-771 (v0.4.0); `docs/dev/runs.md`
and `crates/omnigraph/tests/runs.rs` have since documented and tested the
direct-publish write path, so the "runs" name was misleading.

- git mv docs/dev/runs.md → docs/dev/writes.md (reframe H1 + intro;
  keep MR-771 history note)
- git mv crates/omnigraph/tests/runs.rs → tests/writes.rs (reframe header)
- repoint every runs.md / runs.rs reference across docs, AGENTS.md, and
  source comments
- fix four pre-existing broken `docs/runs.md` links (the file never lived
  at that path) to `docs/dev/writes.md`
- fix the stale v0.4.0 anchor to the live section

No behavior change: every source edit is a comment. Engine builds and the
renamed test passes 25/25; scripts/check-agents-md.sh passes.

The run-removal cleanup itself (run_registry.rs guard, __run__ prefix) is
deferred to MR-770.
											
										
										
											2026-05-30 23:20:56 +02:00
+								cargo test -p omnigraph-engine --test writes concurrent   # one test fn by name substring
-												docs: rewrite README opening + add AGENTS.md dev commands (#122)

* docs(agents): add build/test/lint dev-command section

AGENTS.md (CLAUDE.md) covered architecture and invariants but had no
developer command surface — only runtime `omnigraph` CLI usage. Add a
concise "Build, test, lint" section with the non-obvious gotchas:

- crate dir `crates/omnigraph` is package `omnigraph-engine` (the `-p` name)
- canonical CI gate is `cargo test --workspace --locked`
- how to run one file / one fn
- feature-gated suites (`failpoints`, server `aws`)
- S3 tests skip without `OMNIGRAPH_S3_TEST_BUCKET`
- the two non-test CI checks (check-agents-md, OpenAPI drift)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(readme): rewrite opening, dedupe, fix stale references

- New manifesto-style opening (tagline, X-as-code, features, core
  use cases, coordination-layer line); drop the old prose intro,
  Use Cases, and Capabilities sections.
- Remove Capabilities, which restated the new opening line-for-line.
- Harmonize heading case: "## Core Use Cases".
- Dedupe the verbatim Slack invite (kept the Community section) and
  the double-linked cli.md (kept the contextual pointer).
- Fix stale references that no longer match the code:
  - drop "transactional runs" / "and runs" — no run concept remains;
    writes are atomic per-query, multi-query workflows use branches.
  - update the CLI crate command list to canonical query/mutate plus
    commit/lint/optimize/cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
											
										
										
											2026-05-30 00:53:42 +01:00
+								cargo test -p omnigraph-engine some_inline_test -- --nocapture   # show stdout
 								# Feature-gated suites (each is its own job in CI, not part of the default run)
 								cargo test -p omnigraph-engine --features failpoints --test failpoints   # fault injection
 								cargo build -p omnigraph-server --features aws   # AWS Secrets Manager bearer-token source
 								```
-												docs: onboarding-first README + in-repo agent skill + drop RustFS script (#257)

* docs: optimize README for dev onboarding; fix 0.7.0 staleness

The README's setup half drifted from the shipped 0.7.0 CLI and led with the
heaviest path (Docker + RustFS). This reworks it for fast, correct onboarding:

README.md
- New zero-dependency "Your first graph in 60 seconds" hero: a fully
  copy-pasteable local file-backed loop (schema → init → load → query → branch).
- Add a correct "Serve it" section (cluster apply + omnigraph-server --cluster);
  the server is cluster-only on main, so the old positional-URI boot is gone.
- Demote the RustFS bootstrap to "rehearse the S3 path locally"; reframe the
  storage bullet as "filesystem or any S3-compatible store (AWS S3, R2, MinIO,
  RustFS)" — RustFS is a provider, not a storage class.
- Fix crate/MCP descriptions (query/mutate/load, not read/change/ingest).

docs/user/quickstart.md
- Fix the query example: `read --name <q> … <uri>` is removed — the query name
  is positional and the graph is addressed with `--store` (`omnigraph query
  find_people --query queries.gq --store graph.omni`).

scripts/local-rustfs-bootstrap.sh
- Convert to cluster mode: write a cluster.yaml (storage: s3://…), then
  validate → import → apply, load the fixture into the derived root with the
  now-required --mode, and serve with `omnigraph-server --cluster`. The old
  flow (`load` without --mode, `omnigraph-server <URI>` positional boot) no
  longer works on a cluster-only server.

* docs: move agent skill into the repo, add agent-setup snippet, drop rustfs script

skills/omnigraph
- The operational skill (formerly `omnigraph-best-practices` in the cookbooks
  repo) now lives with the engine it documents, co-versioned. Renamed to
  `omnigraph`; repository metadata repointed here.
- Broadened the description to trigger on intent — storing/retrieving/querying
  knowledge, agent memory, building a knowledge graph, operating Omnigraph — as
  well as on CLI/artifact sightings (stays ≤1024 chars).
- Install: `npx skills add ModernRelay/omnigraph@omnigraph`.

README
- New "Set it up with an AI agent" paste snippet: installs the skill, reads the
  docs (URL), browses the cookbooks, and asks the user about a use case before
  standing up a first graph.
- "Agent skill & starter graphs" section points at skills/omnigraph + cookbooks.

Drop scripts/local-rustfs-bootstrap.sh
- Not CI-tested (so it rotted: it broke on the cluster-only migration — positional
  server boot, load without --mode), demoed the now-optional S3 path, and was the
  most fragile artifact in the repo. Replaced with a "Testing against S3 locally"
  guide in deployment.md (docker run RustFS/MinIO + AWS_* env + cluster-on-S3).
  README/AGENTS references updated.
											
										
										
											2026-06-16 11:48:13 +02:00
+								S3-backed tests (`s3_storage`, and the S3 paths in server/CLI system tests) **skip** unless `OMNIGRAPH_S3_TEST_BUCKET` + `AWS_*` (incl. `AWS_ENDPOINT_URL_S3` for non-AWS) are set; CI runs them against containerized RustFS. To run RustFS/MinIO yourself, see [docs/user/deployment.md](docs/user/deployment.md) → *Testing against S3 locally*.
-												docs: rewrite README opening + add AGENTS.md dev commands (#122)

* docs(agents): add build/test/lint dev-command section

AGENTS.md (CLAUDE.md) covered architecture and invariants but had no
developer command surface — only runtime `omnigraph` CLI usage. Add a
concise "Build, test, lint" section with the non-obvious gotchas:

- crate dir `crates/omnigraph` is package `omnigraph-engine` (the `-p` name)
- canonical CI gate is `cargo test --workspace --locked`
- how to run one file / one fn
- feature-gated suites (`failpoints`, server `aws`)
- S3 tests skip without `OMNIGRAPH_S3_TEST_BUCKET`
- the two non-test CI checks (check-agents-md, OpenAPI drift)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(readme): rewrite opening, dedupe, fix stale references

- New manifesto-style opening (tagline, X-as-code, features, core
  use cases, coordination-layer line); drop the old prose intro,
  Use Cases, and Capabilities sections.
- Remove Capabilities, which restated the new opening line-for-line.
- Harmonize heading case: "## Core Use Cases".
- Dedupe the verbatim Slack invite (kept the Community section) and
  the double-linked cli.md (kept the contextual pointer).
- Fix stale references that no longer match the code:
  - drop "transactional runs" / "and runs" — no run concept remains;
    writes are atomic per-query, multi-query workflows use branches.
  - update the CLI crate command list to canonical query/mutate plus
    commit/lint/optimize/cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
											
										
										
											2026-05-30 00:53:42 +01:00
 								CI does **not** run `clippy` or `rustfmt` as gates — but `cargo test --workspace --locked` is the exact gate, so run it before pushing. Two non-test CI checks: `scripts/check-agents-md.sh` (doc cross-link integrity — run it after moving/renaming docs) and OpenAPI drift (`crates/omnigraph-server/tests/openapi.rs` regenerates `openapi.json`; set `OMNIGRAPH_UPDATE_OPENAPI=1` to update the checked-in copy when a server/API change is intentional).
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Quick-reference flows
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								```bash
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								# Initialize an S3-backed graph
 								omnigraph init --schema ./schema.pg s3://my-bucket/graph.omni
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								# Bulk load
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								omnigraph load --data ./seed.jsonl --mode overwrite s3://my-bucket/graph.omni
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												feat(cli)!: unified load command; deprecate ingest as an alias

omnigraph load is now the single data-write command:
- works against remote graphs (POSTs the server's /ingest endpoint with the
  same bearer/actor resolution as other remote commands) — previously load
  was the only data command forced to open Lance storage directly
- --from <base> opts into fork-if-missing for --branch (the former ingest
  semantics); without --from a missing branch is an error, never a fork
- --mode is now required: overwrite is destructive, so there is no implicit
  default (the old silent default was overwrite)
- output gains base_branch/branch_created (and table sums on remote loads)

omnigraph ingest stays as a deprecated alias (defaults preserved: --from
main --mode merge) that prints a one-line warning to stderr, matching the
read/change deprecation convention; removal in a later release.

Docs updated in the same change: cli.md, cli-reference.md, policy.md,
audit.md, execution.md (unified load section), AGENTS.md quick-flow,
README.md.

BREAKING CHANGE: scripts running omnigraph load without --mode must now
pass it explicitly (previously defaulted to the destructive overwrite).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

											
										
										
											2026-06-11 04:18:00 +03:00
+								# Load a review batch onto its own branch (--from forks it if missing)
 								omnigraph load --branch review/2026-04-25 --from main --mode merge --data ./batch.jsonl s3://my-bucket/graph.omni
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												docs(readme): align with cluster-first paradigm + RFC-011 CLI ergonomics (#262)

* docs(readme): align with the cluster-first paradigm + RFC-011 CLI ergonomics

The README's command examples and crate list predated 0.7.0. Update them:

- Common Commands: capability/addressing model — positional/`--store` for direct
  storage, `--server` for served graphs (no positional `http(s)://`), `query`/
  `mutate` invoke a stored query by name (positional is the query name), `load`
  requires `--mode`. Drop the false "same URI works for local/s3/http" claim and
  the deprecated `read`/`change` + `--name` forms; mention `~/.omnigraph/config.yaml`.
- New "Serving (cluster-first)" section: a deployment is a cluster.yaml converged
  with `cluster apply` and served by `omnigraph-server --cluster <dir|s3://…>`
  (no single-graph mode; config-free boot from a bucket).
- Fix the stale docs link (`docs/user/cli.md` → `cli/index.md` + the CLI reference)
  after the docs were restructured into topic sections.
- Workspace Crates: list all seven (add omnigraph-policy, omnigraph-api-types,
  omnigraph-cluster) with cluster-first framing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(agents): fix the stale `read --name` quick-reference example

AGENTS.md (always-loaded; symlinked to CLAUDE.md) still showed the deprecated
`omnigraph read --query … --name … <s3-uri>` form. Update it to the RFC-011
shape: `omnigraph query --query … <name> … --store <uri>` (read→query, --name→
positional query name, positional URI→--store). Addresses the bot-review finding
on #262.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-16 11:18:16 +03:00
+								# Run a hybrid (vector + BM25) query — ad-hoc .gq against a store (positional = query name)
 								omnigraph query --query ./queries.gq find_similar \
 								  --params '{"q":"trends in AI safety"}' --format table --store s3://my-bucket/graph.omni
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								# Plan + apply schema migration
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								omnigraph schema plan  --schema ./next.pg s3://my-bucket/graph.omni
 								omnigraph schema apply --schema ./next.pg s3://my-bucket/graph.omni --json
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								# Merge review branch back
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								omnigraph branch merge review/2026-04-25 --into main s3://my-bucket/graph.omni
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												fix(maintenance): route uncovered drift through repair (#156)

* docs(invariants): note the non-atomic manifest->commit-graph publish gap

Every graph publish commits __manifest then appends _graph_commits as two
separate writes; a crash between them leaves the manifest ahead of the commit
DAG. Live reads + durability are unaffected (reads resolve via the manifest) and
recovery does not repair it; impact is bounded to commit history / time-travel
by commit id / merge-base completeness. Pre-existing across all publishes, not
the optimize reconcile specifically. Documented as a Known Gap; the fix is a
commit-graph reconcilable from the manifest, not a recovery sidecar.

* fix(maintenance): route uncovered drift through repair

* fix(maintenance): harden repair review feedback
											
										
										
											2026-06-09 14:42:54 +02:00
+								# Compact, preview any uncovered drift, then repair/GC after review
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								omnigraph optimize s3://my-bucket/graph.omni
-												fix(maintenance): route uncovered drift through repair (#156)

* docs(invariants): note the non-atomic manifest->commit-graph publish gap

Every graph publish commits __manifest then appends _graph_commits as two
separate writes; a crash between them leaves the manifest ahead of the commit
DAG. Live reads + durability are unaffected (reads resolve via the manifest) and
recovery does not repair it; impact is bounded to commit history / time-travel
by commit id / merge-base completeness. Pre-existing across all publishes, not
the optimize reconcile specifically. Documented as a Known Gap; the fix is a
commit-graph reconcilable from the manifest, not a recovery sidecar.

* fix(maintenance): route uncovered drift through repair

* fix(maintenance): harden repair review feedback
											
										
										
											2026-06-09 14:42:54 +02:00
+								omnigraph repair s3://my-bucket/graph.omni
 								omnigraph repair --confirm s3://my-bucket/graph.omni
 								# For suspicious/unverifiable drift only after deliberate review:
 								# omnigraph repair --force --confirm s3://my-bucket/graph.omni
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								omnigraph cleanup  --keep 10 --older-than 7d s3://my-bucket/graph.omni
 								omnigraph cleanup  --keep 10 --older-than 7d --confirm s3://my-bucket/graph.omni
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								# Stand up the HTTP server (token from env)
 								OMNIGRAPH_SERVER_BEARER_TOKEN=xxxx \
-												[codex] fix RFC-011 follow-up regressions (#258)

* fix rfc-011 follow-up regressions

* test(cli): remove served schema-apply tests obsoleted by the cluster 409

This PR disables server-side schema apply for cluster-backed serving (409 →
`omnigraph cluster apply`). Two system_local tests still drove *served* schema
apply against a spawned `--cluster` server and asserted the pre-409 behavior, so
they failed under `cargo test --workspace`:

- `local_cli_schema_apply_enforces_engine_layer_policy` — expected a per-actor
  policy `denied`/allow on the served route; the route now 409s for everyone
  before policy runs.
- `local_cli_schema_apply_rejects_stored_query_breakage_before_publish` —
  expected a served apply to reject a stored-query breakage; the route now 409s
  before any apply.

Both exercise a path the PR intentionally removed. Their surviving coverage:
the 409 itself is pinned by `schema_routes::schema_apply_route_refuses_cluster_backed_server_mode`
(asserts 409 + no mutation); stored-query-breakage-before-publish stays covered
by `schema_routes::schema_apply_route_rejects_stored_query_breakage_before_publish`
(single-mode); engine-layer schema_apply Cedar enforcement stays covered by
`policy_engine_chassis`. Remove the obsolete served versions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(server): report the cluster-backed schema-apply 409 after the Cedar gate

The 409 ("schema apply is disabled for cluster-backed serving") fired at the top
of `server_schema_apply`, before `authorize_request`. An authenticated-but-
unauthorized actor therefore learned the server is cluster-backed (409) instead
of getting a normal 403 — leaking topology before authorization, against the
same posture that keeps `GET /graphs` default-deny.

Move the 409 below the Cedar gate so the route reports 401 → 403 → 409: an
unauthorized actor gets 403, and only an actor authorized for `schema_apply`
sees the actionable "use `omnigraph cluster apply`" 409. (An open/unauthenticated
server still 409s, as it has no topology to protect.)

Regression: `schema_apply_route_cluster_backed_denies_unauthorized_actor_before_409`
(POLICY_YAML grants no schema_apply → act-ragnor gets 403, not 409). Addresses the
bot-review finding on #258.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-16 03:11:43 +03:00
+								  omnigraph-server --cluster s3://my-bucket/cluster --bind 0.0.0.0:8080
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								# Cedar policy explain
-												[codex] fix RFC-011 follow-up regressions (#258)

* fix rfc-011 follow-up regressions

* test(cli): remove served schema-apply tests obsoleted by the cluster 409

This PR disables server-side schema apply for cluster-backed serving (409 →
`omnigraph cluster apply`). Two system_local tests still drove *served* schema
apply against a spawned `--cluster` server and asserted the pre-409 behavior, so
they failed under `cargo test --workspace`:

- `local_cli_schema_apply_enforces_engine_layer_policy` — expected a per-actor
  policy `denied`/allow on the served route; the route now 409s for everyone
  before policy runs.
- `local_cli_schema_apply_rejects_stored_query_breakage_before_publish` —
  expected a served apply to reject a stored-query breakage; the route now 409s
  before any apply.

Both exercise a path the PR intentionally removed. Their surviving coverage:
the 409 itself is pinned by `schema_routes::schema_apply_route_refuses_cluster_backed_server_mode`
(asserts 409 + no mutation); stored-query-breakage-before-publish stays covered
by `schema_routes::schema_apply_route_rejects_stored_query_breakage_before_publish`
(single-mode); engine-layer schema_apply Cedar enforcement stays covered by
`policy_engine_chassis`. Remove the obsolete served versions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(server): report the cluster-backed schema-apply 409 after the Cedar gate

The 409 ("schema apply is disabled for cluster-backed serving") fired at the top
of `server_schema_apply`, before `authorize_request`. An authenticated-but-
unauthorized actor therefore learned the server is cluster-backed (409) instead
of getting a normal 403 — leaking topology before authorization, against the
same posture that keeps `GET /graphs` default-deny.

Move the 409 below the Cedar gate so the route reports 401 → 403 → 409: an
unauthorized actor gets 403, and only an actor authorized for `schema_apply`
sees the actionable "use `omnigraph cluster apply`" 409. (An open/unauthenticated
server still 409s, as it has no topology to protect.)

Regression: `schema_apply_route_cluster_backed_denies_unauthorized_actor_before_409`
(POLICY_YAML grants no schema_apply → act-ragnor gets 403, not 409). Addresses the
bot-review finding on #258.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-16 03:11:43 +03:00
+								omnigraph policy explain --cluster ./company-brain --graph knowledge --actor act-alice --action change --branch main
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								```
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Capability matrix — "Lens by default vs. added by OmniGraph"
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
 								| Capability | L1 (Lance default) | L2 (OmniGraph adds) |
 								|---|---|---|
 								| Columnar storage on object store | ✅ Arrow/Lance | URI normalization, S3 env-var plumbing |
 								| Per-dataset versioning + time travel | ✅ | `snapshot_at_version`, `entity_at`, snapshot-pinned reads across many tables |
 								| Per-dataset branches | ✅ | **Graph-level** branches (atomic across all sub-tables), lazy fork, system branch filtering |
-												Recovery liveness, storage fault-injection matrix, and one storage implementation over object_store (#203)

* test(engine): pin the long-lived-handle heal contract for sidecar-covered drift

A Phase B -> Phase C failure (commit_staged advanced Lance HEAD, manifest
publish did not land, recovery sidecar persists) currently wedges every
subsequent staged write on the same engine handle: the commit-time drift
guard rejects with 'run omnigraph repair', but repair itself refuses
while a recovery sidecar is pending, so a long-lived server can only
recover by restart. The documented contract (writes.md 'Long-running
servers', invariants.md invariant 5) says refresh-time roll-forward
closes this residual without restart -- but no write path runs it.

Two red tests pin the intended contract at the write entry points:
a follow-up load (the POST /ingest shape: shared handle, no reopen)
and a follow-up mutation must heal roll-forward-eligible sidecars
in-process and then succeed.

Currently failing with:
  table 'node:Company' has Lance HEAD version 2 ahead of manifest
  version 1; run `omnigraph repair` before writing

The fix lands in the next commit.

* fix(engine): heal pending recovery sidecars at the staged-write entry points

Close the long-lived-process gap in the recovery protocol: a Phase B ->
Phase C residual (per-table commit_staged landed, manifest publish did
not, sidecar persists) previously recovered only at the next ReadWrite
open or via an explicit refresh() that no production write path called,
so a long-lived server wedged every subsequent write on the commit-time
drift guard until restart.

New recovery::heal_pending_sidecars_roll_forward:
- one list_dir of __recovery/ at write entry (empty -> immediate
  return, the steady state), so the per-write cost is one storage list;
- per sidecar, acquires the same per-(table_key, table_branch) write
  queues every sidecar writer holds from before write_sidecar until
  after delete_sidecar, then re-checks sidecar existence -- this
  serializes the heal against live writers instead of rolling an
  in-flight sidecar forward from under its writer (which would fail
  that writer's publish CAS spuriously). Lock order queues ->
  coordinator matches every writer's commit->publish path. This is the
  queue-acquisition design recovery.rs and write_queue.rs already
  documented for in-process recovery;
- processes in RollForwardOnly mode: the common residual rolls forward
  in-process; rollback-eligible sidecars still defer to the next
  ReadWrite open (Dataset::restore is unsafe under concurrency).

Wire it into load_as and mutate_as (before the inline delete path can
advance any HEAD), and rebase Omnigraph::refresh onto the same helper
so refresh stops racing live writers' sidecars.

The maintenance entry points (apply_schema_as, branch_merge_as,
ensure_indices) intentionally keep their strict fail-loud preconditions
for now; wiring the same heal there is a follow-up with its own tests.

Turns the previous commit's two red tests green.

* fix(engine): name the right recovery path in the commit-time drift guard

The drift guard's 'run omnigraph repair before writing' advice is a
dead end when the drift is covered by a pending recovery sidecar:
repair refuses while a sidecar is pending. With the write-entry heal in
place, reaching this guard with sidecar-covered drift means the heal
deferred it (rollback-eligible), and the actual recovery path is a
read-write reopen. Distinguish the two classes on the error path only
(one sidecar list, after the conflict is already certain); a listing
failure falls back to the uncovered-drift wording rather than masking
the conflict.

Pinned by extending refresh_defers_rollback_eligible_sidecar_to_next_open
with a write attempt against the deferred sidecar.

* docs: write-entry in-process sidecar heal — contract and coverage

Update the recovery contract docs to match the previous two commits:
invariant 5 now states that the staged-write entry points and refresh
run in-process roll-forward recovery (long-lived processes converge on
the next write, not at restart); writes.md 'Long-running servers'
describes the heal's queue-acquisition concurrency contract, the
improved drift-guard error, and the entry points that intentionally do
not heal yet; testing.md indexes the new failpoint tests; AGENTS.md
capability matrix drops the claim that in-process recovery is entirely
future work (only the rollback path remains with the background
reconciler).

* test(engine): pin the entry heal contract for schema apply and branch merge

Without the write-entry heal, the two maintenance writers do worse than
wedge on sidecar-covered drift -- they proceed and decide its fate
implicitly:

- schema apply re-plans table rewrites from the manifest pin, orphaning
  the drifted Phase-B commit (its rows silently vanish from the
  rewritten table) while the stale sidecar lingers to misclassify
  against the post-apply pins;
- branch merge publishes over the drift, making the failed writer's
  commit visible as an unattributed side effect (no recovery audit
  row), and leaves the stale sidecar behind.

Two red tests pin the intended contract: both entry points heal the
sidecar first (attributed roll-forward), then run on the converged
state. Currently failing on the stale-sidecar / dropped-rows
assertions; the fix lands in the next commit.

* fix(engine): heal pending recovery sidecars at the schema-apply and branch-merge entries

Extend the write-entry heal to the remaining two write entry points.
Unlike load/mutate (which wedge on the drift guard), these proceeded
over sidecar-covered drift and decided its fate implicitly:

- schema apply re-planned table rewrites from the manifest pin,
  orphaning the drifted Phase-B commit -- its rows silently vanished
  from the rewritten table -- while the stale sidecar lingered to
  misclassify against the post-apply pins;
- branch merge published over the drift, making the failed writer's
  commit visible without a recovery audit row, and left the stale
  sidecar behind.

Both now run the same queue-serialized roll-forward heal at entry,
before their own sidecar exists, so recovery is attributed (audit row)
and deterministic. ensure_indices stays heal-free: it runs inside the
load / schema-apply flows after their entry heal.

Turns the previous commit's two red tests green. Docs updated in the
same change (invariant 5, writes.md, testing.md, AGENTS.md).

* test(engine): pin Phase A sidecar-write failure semantics

Storage fault-injection matrix, row 1: a sidecar PUT failure (S3
PutObject / fs write) in Phase A. New failpoint recovery.sidecar_write
at the top of write_sidecar -- the single choke point all five sidecar
writers go through -- models the storage error backend-generically.

Also adds the other three storage-fault failpoints used by the
following commits (recovery.sidecar_delete, recovery.sidecar_list,
recovery.record_audit); each is a no-op without the failpoints feature.

Pinned contract: every writer writes its sidecar BEFORE its first
HEAD-advancing commit, so a put failure aborts with zero drift (no
sidecar, Lance HEAD == manifest pin, no rows) and a transient fault
never wedges the graph -- the same handle writes/merges normally once
it clears. Covered for load (the staging writer) and branch_merge (the
multi-table writer, forced onto the RewriteMerged path by diverging
both sides).

* test(engine): pin Phase D delete, list, and audit-append storage-fault semantics

Storage fault-injection matrix, rows 2/3/5, plus the real-backend run:

- recovery.sidecar_delete: a Phase D delete failure (S3 DeleteObject)
  must NOT fail the user's write -- the manifest publish already
  landed, so the caller's data is durable. The swallowed failure
  leaves a stale sidecar; the next write's entry heal consumes it via
  the stale-sidecar audit-recovery path (RolledForward, attributed).

- recovery.sidecar_list: a __recovery/ list failure (S3 ListObjectsV2)
  is loud at every consumer -- the write-entry heal fails the write
  and the open-time sweep fails the open. Silently skipping recovery
  over a pending sidecar would be consumer tolerance of drift. Once
  the fault clears, open recovers the pending sidecar normally.

- recovery.record_audit: an audit write failure after the
  roll-forward's manifest publish aborts that recovery attempt and
  keeps the sidecar; re-entry detects the already-published manifest,
  records exactly ONE RolledForward audit row, and converges -- the
  retry tolerance documented on record_audit, exercised end-to-end.

- s3_load_recovers_after_publisher_failure_without_reopen: the
  same-handle heal scenario on a real bucket (gated on
  OMNIGRAPH_S3_TEST_BUCKET, skips locally), exercising sidecar
  put/list/delete through S3StorageAdapter instead of the local-FS
  adapter. CI wiring lands in a follow-up commit.

* test(engine): refuse corrupt recovery sidecars loudly

Storage fault-injection matrix, row 4 (no failpoint needed -- the
corrupt file is written by hand, sibling to the unknown-schema-version
refusal test): a truncated/garbage __recovery/{ulid}.json must be
refused loudly by both the write-entry heal (the write fails naming
the parse error) and the open-time sweep (ReadWrite open fails naming
the file), with the file left on disk for operator inspection.
Read-only opens still work -- the sweep is skipped there.

* test(engine): run the S3 sidecar-lifecycle coverage in CI + document the fault matrix

- ci.yml rustfs_integration: new step running the bucket-gated
  failpoints tests (name filter s3_) against the RustFS container, so
  sidecar put/list/delete are exercised through S3StorageAdapter on
  every storage-affecting PR.
- writes.md: sidecar I/O failure semantics -- Phase A put failure
  aborts with zero drift; Phase D delete failure is swallowed (write
  already durable) and healed by the next write; list failures are
  loud at heal and open; corrupt sidecars are refused with the file
  kept for inspection; audit-append failures are retried to exactly
  one audit row.
- testing.md: index the storage-fault matrix in the failpoints.rs row
  and the new RustFS CI line.

* test(engine): pin read-visibility of acknowledged local if-absent writes

The cluster lib test import_missing_state_creates_state_with_graph_-
observation flakes at ~50% under full-workspace load ('EOF while
parsing a value' reading back the state.json its own import just
acknowledged). Root cause is in the engine's local storage adapter:
write_text_if_absent writes through a buffered tokio::fs::File and
returns when write_all resolves -- which, per tokio's documented File
semantics, means the bytes reached tokio's internal buffer, not the
file. The actual write completes in a background blocking task after
drop, so a caller that acknowledges success and reads the object back
can see an empty or partial file. Under load the window widens; the
red run fails at iteration 0 with 0 of 8192 bytes on disk.

The regression test pins the contract at the adapter boundary: when
write_text_if_absent resolves, the full contents are visible to any
reader; a losing second claim leaves the winner's object untouched.

The fix lands in the next commit.

* fix(engine): publish local storage writes with atomic visibility

Close the class, not the instance. The local adapter admitted three
ways for a reader to observe a write that was acknowledged or visible
before its bytes were complete:

1. write_text_if_absent acknowledged success when the buffered
   tokio::fs::File write_all resolved -- i.e. when the bytes reached
   tokio's internal buffer, not the file. A caller reading back its own
   acknowledged write could see an empty object (the ~50% cluster
   import flake under full-workspace load; the regression test failed
   at iteration 0 with 0 of 8192 bytes visible).
2. The same call published its CLAIM (create_new) before its CONTENT,
   so concurrent readers saw an empty claimed file in the window.
3. write_text (plain tokio::fs::write) exposed truncated content
   mid-replace -- silently falsifying write_sidecar's 'readers either
   see the complete sidecar or none' contract on local FS (true on S3,
   where PutObject is atomic).

A flush in write_text_if_absent would have fixed only (1). Instead,
both local write paths now publish complete temp files atomically:
rename for replace (write_text -- the idiom write_text_if_match
already used) and hard_link for no-replace (write_text_if_absent --
link fails AlreadyExists, so exactly one of N concurrent claimants
wins and the winner's object is fully readable at the instant it
becomes visible). The local adapter now honors the same object-level
atomic-visibility contract as the S3 adapter, which is what every
caller (recovery sidecar protocol, cluster state CAS) was written
against. Crash-orphaned *.tmp.* files are inert: the sidecar sweep
filters to .json, and cluster state reads address state.json by name.

fsync/durability policy is unchanged (no fsync before, none now);
this fix is about visibility ordering, not power-loss durability.

Pre-existing on main (landed with the multi-graph server mode change,
PR #119); surfaced by this branch's heal work only because one extra
list_dir per write shifted test timing. Cluster lib suite: 12/25
failures before, 0/25 after. Turns the previous commit's red test
green.

* refactor(engine): one storage implementation over object_store for every backend

Collapse LocalStorageAdapter (hand-rolled tokio::fs) and
S3StorageAdapter into a single ObjectStorageAdapter backed by
Arc<dyn object_store::ObjectStore> -- LocalFileSystem for local URIs,
the existing AmazonS3 build for s3://, plus a pub in_memory()
constructor (full contract including TRUE conditional updates; the
in-memory test backend testing.md asked for at the adapter level).

Why: the acknowledged-before-visible bug showed the two-impl shape has
no referee -- one prose contract, two independent answers. Upstream
LocalFileSystem::put_opts is byte-for-byte the staged-temp+rename/
hard_link idiom that fix converged on, and Lance's own commit protocol
is built on the same primitives (put-if-not-exists / rename-if-not-
exists), so the substrate-aligned move is to stop hand-rolling it.
The per-backend residue shrinks to a UriCodec (URI <-> object path)
and one capability flag.

Semantics preserved by construction, with three deliberate deltas:
- exists() is now object-store-semantics everywhere (head + non-empty
  prefix fallback): an EMPTY local directory no longer 'exists'. The
  only dir-shaped caller (_graph_commits.lance probes) self-heals via
  ensure_commit_graph_initialized where it previously wedged loudly.
- A directory at an object path reads as NotFound, not as an IO error
  ('only objects exist'). The cluster unreadable-payload test used a
  same-named directory as a portable non-NotFound trigger; it now uses
  chmod 000, which still models genuine transient IO.
- write_text_if_match keeps content-token semantics on local
  (PutMode::Update is NotImplemented upstream for LocalFileSystem in
  0.12.5 and 0.13.2); the capability flag gates the token SOURCE in
  read_text_versioned too -- an ETag token with content-compare writes
  would lose every CAS.

delete_prefix keeps a local remove_dir_all branch: directories are a
local-FS concept, and list+delete would leave empty skeletons that
cluster graph_root_exists (raw Path::exists) reports as still present.

LocalStorageAdapter remains as a delegating shim so the pinned
contract tests gate this swap textually unchanged; the shim and the
test parameterization over local + in-memory land next. Cargo gains
the explicit 'fs' feature (already transitively enabled by lance).

* test(engine): one executable storage contract, run against every backend

Remove the LocalStorageAdapter delegation shim and migrate its
construction sites to ObjectStorageAdapter::local(). Replace the
per-backend duplicated tests with a single contract_suite asserting
the trait's promises (atomic replace, exists incl. the dataset-root
prefix probe, one-winner if_absent, versioned CAS with loud CAS-lost,
rename, list round-trip with no sibling-prefix bleed, idempotent
delete/delete_prefix), run against the local backend and the new
in-memory backend -- which implements true conditional updates, so the
strong-CAS path is exercised without a bucket. The bucket-gated S3
variant already exists (s3_adapter_conditional_writes_contract).

New local-specific pins for the deliberate semantic edges of the
collapse: empty directories are not objects (exists=false; the Lance
dataset-root probe shape is the non-empty case), file://-anchored and
spaces-in-path list output round-trips byte-identically into
read_text, dot-segment paths are lexically absolutized (the CLI's
./graph.omni shape), and upstream rename creating missing destination
parents. The acknowledged-write visibility regression test stays, now
documenting that the cross-API std::fs read-back is the point.

* refactor(cluster): drop put_json's per-backend atomicity branch

The local temp+rename dance predates the storage adapter guaranteeing
atomic visibility; now that write_text publishes via a staged temp +
rename on the filesystem (and a single atomic PUT on object stores) by
contract, the branch duplicated upstream behavior. One call, both
backends.

* docs: storage adapter collapse — contract, in-memory backend, local CAS gap

- testing.md: the 'no MemStorage backend' note is half-closed —
  ObjectStorageAdapter::in_memory() covers the text-object layer with
  the full contract (true conditional updates); Lance datasets bypass
  the adapter, so the engine substrate ask stays open.
- invariants.md: truth-matrix Tests row updated; new Known Gap for
  local write_text_if_match (upstream PutMode::Update is unimplemented
  for LocalFileSystem; content-token emulation is safe only under the
  cluster lock protocol — close before admitting a lock-free caller).
- writes.md: backend notes for the unified adapter (name#N staging
  residue invisible to the sweep, backend-wrapped error text with
  exists()-probing for missing-vs-error, loud permission failures).

* docs: finish renaming the storage adapters in user docs and test comments

storage.md's URI-scheme table and the S3 failpoint test's doc comment
still named the deleted LocalStorageAdapter/S3StorageAdapter; both now
describe the unified ObjectStorageAdapter over object_store, including
the relative-path absolutization note for local URIs.

* test(engine): pin branch-awareness of the drift guard's recovery advice

A pending sidecar on ANOTHER branch does not cover this branch's
drift: with a deferred feature-branch sidecar on disk and genuinely
uncovered drift on main, the main write's error must still point at
omnigraph repair -- a read-write reopen recovers the sidecar but
cannot repair main's uncovered drift. Currently red: the guard
matches sidecar pins by table_key only, so the feature sidecar flips
main's advice to the reopen path. Fix in the next commit.

Surfaced by external review of the drift-guard change.

* fix(engine): branch-aware sidecar matching in the drift guard's advice

The commit-time drift guard's sidecar-covered check matched pins by
table_key alone, so a pending sidecar on another branch flipped this
branch's uncovered-drift advice from 'run omnigraph repair' to the
reopen path -- and a reopen recovers that sidecar but cannot repair
this branch's drift. Compare the pin's table_branch too. Turns the
previous commit's red test green.

Surfaced by external review of the drift-guard change.

* test(engine): pin heal non-interference with a live schema apply

The write-entry heal's schema-staging reconcile runs before any queue
acquisition, so a load on the same handle, overlapping a schema apply
parked between its staging write and manifest commit, promotes the
apply's staging files (new catalog live against the old manifest),
classifies the LIVE apply's sidecar, and publishes its registrations
out from under it. The resumed apply then collides with its own stolen
commit. Currently red with:

  Lance("Concurrent modification: table version 3 already exists for
  node:Tag")

The fix (per-sidecar reconcile under the sidecar's write-queue guards,
plus a serialization key the schema-apply writer and the heal both
acquire) lands in the next commit.

Surfaced by external review of the write-entry heal.

* fix(engine): serialize the heal's schema-staging reconcile with live schema applies

The write-entry heal ran recover_schema_state_files up front, before
acquiring any queue guards. Overlapping a live schema apply parked
between its staging write and manifest commit, the heal promoted the
apply's staging files (new catalog live against the old manifest),
classified the LIVE apply's sidecar, and published its registrations —
the resumed apply then collided with its own stolen commit.

Correct by construction:

- New schema-apply serialization queue key, acquired by the schema-
  apply writer (alongside its per-table keys) from before write_sidecar
  until after delete_sidecar. Per-table keys alone don't cover a
  registration-only migration, which pins no existing tables but has a
  sidecar and staging files on disk.
- The heal reconciles schema staging lazily, PER SchemaApply sidecar,
  after acquiring that sidecar's guards (including the serialization
  key) and re-confirming the sidecar exists — a sidecar that survives
  the queue wait belongs to a dead writer, so the reconcile can no
  longer race a live apply. Recomputing per sidecar also removes the
  staleness of one up-front result across a multi-sidecar pass.
- Omnigraph::refresh drops its up-front reconcile-and-pass-through
  (same race, and a pre-promoted result would make the heal's guarded
  reconcile see clean staging and wrongly defer the sidecar): it now
  reconciles standalone only when NO sidecar exists — which cannot
  race a live apply, whose sidecar always precedes its staging files —
  and otherwise defers entirely to the heal.

The open-time sweep keeps its precomputed reconcile: open has no
concurrent writers. Turns the previous commit's red test green.

Surfaced by external review of the write-entry heal.

Self-audit addendum folded in: refresh's no-sidecar gate had a TOCTOU
(a live apply could write its sidecar + staging between the empty
check and the reconcile) — the standalone reconcile now holds the
serialization key across the list-then-reconcile pair. The remaining
residual is cross-process only (in-process queues cannot serialize
against a writer in another process; the open-time sweep has the same
pre-existing exposure) and is now an explicit Known Gap in
invariants.md rather than an implicit one.

* test(engine): pin catalog reload after the heal recovers a schema apply

When the write-entry heal rolls a crashed apply's SchemaApply sidecar
forward on the same handle, disk and manifest move to the new schema
(staging promoted, registrations published) but the handle's in-memory
schema_source/catalog do not. Subsequent writes then validate against
the stale catalog and reject rows of types the graph already has.
Currently red with:

  record 1: unknown node type 'Tag'

refresh() reloads after its heal; the write entry points must too.
Fix in the next commit.

Surfaced by external review of the write-entry heal.

* fix(engine): reload the in-memory catalog after the heal recovers a schema apply

heal_pending_recovery_sidecars refreshed the coordinator and
invalidated the runtime cache after processing sidecars, but never
reloaded schema_source/catalog — so a write whose entry heal rolled a
crashed SchemaApply sidecar forward proceeded to validate against the
OLD schema while disk and manifest were already on the new one.
reload_schema_if_source_changed is the same post-heal step refresh()
already runs; it no-ops on the (overwhelmingly common) non-schema heal
because the on-disk source is unchanged. Turns the previous commit's
red test green.

Surfaced by external review of the write-entry heal.

* test(engine): pin that a deleted-branch sidecar cannot wedge the graph

A rollback-eligible sidecar pinned to a branch is deferred by every
roll-forward-only pass; if the branch is then deleted, the sidecar
survives, referencing a branch with no manifest tree. The heal (every
write entry) and the open-time sweep (every ReadWrite open) both fail
opening the dead branch, and repair refuses while a sidecar is pending
-- a terminal read-only state with manual sidecar surgery as the only
exit. Currently red with:

  Lance("Not found: .../__manifest/tree/feature/_versions")

The branch's tree and forks are already reclaimed, so the pinned drift
is unreachable and the sidecar is provably moot; the fix classifies it
as an orphaned-branch terminal state (audit + discard) in both passes.

Surfaced by review (P1, verified by repro).

* fix(engine): classify deleted-branch sidecars as orphaned instead of wedging

A deferred (rollback-eligible) sidecar pinned to a branch survives
branch_delete; both the write-entry heal and the open-time sweep then
failed unconditionally opening the dead branch -- every write and
every ReadWrite open errored, and repair refuses while a sidecar
pends. Terminal state, manual sidecar surgery the only exit.

The branch's tree and per-table forks are already reclaimed at delete,
so the drift the sidecar pins is unreachable and the sidecar is
provably moot. Both passes now check the sidecar's branch against the
manifest's branch list (the authority -- deliberately NOT inferred
from a Not-found on open, which could be a transient storage error
masking real recovery intent) and discard orphans with an
OrphanedBranchDiscarded audit row, commit appended on main since the
sidecar's own branch no longer has a commit graph.

The open-time half is pre-existing; the write-entry heal made it hot.
Turns the previous commit's red test green.

Surfaced by review (P1, verified by repro).

* chore: harden review nits — vacuous CI filter, root-runner skip, liveness note

- ci.yml: the RustFS sidecar-lifecycle step now fails loudly if the
  's3_' name filter matches zero tests (cargo passes vacuously on an
  empty filter; the step exists specifically to prove S3 sidecar I/O
  coverage). The pre-existing CLI smoke step has the same shape and is
  left for a follow-up.
- cluster unreadable-payload test: cfg(unix) + a skip-with-log when
  running as root (mode 000 is still readable to root, common in
  container dev runners), so the test degrades instead of failing.
- refresh: document the one-pass-late convergence for legacy staging
  residue while non-SchemaApply sidecars pend, so nobody 'fixes' it by
  re-running the reconcile unserialized — the exact race the
  serialization key closes.

* test(engine): pin orphan-discard idempotency across a delete fault

discard_orphaned_branch_sidecar writes its audit row and main commit
before deleting the sidecar; a Phase D delete fault leaves the sidecar
on disk with the audit already durable, and the retry repeated the
whole path -- a second OrphanedBranchDiscarded audit row (and commit)
for the same operation. Currently red: 2 rows after one fault + retry.
The retry must only finish the delete. Fix next.

Also promotes the recovery-audit kinds reader into the shared test
helpers (it was recovery.rs-local).

Surfaced by external review of the orphan-discard fix.

* fix(engine): orphan-discard idempotency + heal reports acted-vs-deferred

Two review findings on the recovery surface:

- discard_orphaned_branch_sidecar now checks the audit table for an
  existing (operation_id, OrphanedBranchDiscarded) row before appending
  the commit + audit pair, so a Phase D delete fault retries ONLY the
  delete instead of duplicating audit rows and commit-graph entries.
  Cold path: the list scan runs only when an orphaned sidecar exists.
  Turns the previous commit's red test green (exactly one audit row
  across fault + retry).

- process_sidecar returns whether durable state changed; the heal sets
  processed_any only for sidecars that were actually rolled forward /
  rolled back / audit-recovered (orphan discards count). Deferred
  sidecars (rollback-eligible, invariant-violating, unpromoted
  SchemaApply) no longer trigger a per-write schema reload + full
  runtime-cache invalidation while they pend -- the cache is
  snapshot-keyed so this was waste, not corruption, but it was paid on
  every write until reopen. Acted-paths' processed=true remains pinned
  by load_after_schema_apply_phase_b_failure_uses_recovered_catalog
  (the reload depends on it).

Surfaced by external review.

* test(engine): pin the orphan-discard audit-append fault leg as documented tolerance

The orphan discard's commit append and audit append are two writes; a
failure between them leaves a recovery commit with no audit row, and
the retry (keyed on the audit row, the operator-facing record) appends
a second commit before the audit lands. This is the same
not-atomic-pair-write tolerance record_audit documents and the
manifest->commit-graph Known Gap covers for every publish: bounded
commit-graph noise, audit row exactly-once under clean failures.
Keying idempotency on commit rows instead would need an operation_id
column on _graph_commits, and audit-before-commit would dangle the
graph_commit_id join -- both worse than the documented residual.

Make the tolerance explicit instead of implicit: docstring names the
window, a failpoint sits inside it, and the new test pins convergence
across the fault (sidecar consumed, exactly one audit row), completing
the orphan-discard fault matrix alongside the delete-fault leg.

Surfaced by external review of the orphan-discard idempotency.

* test(engine): pin honest drift-guard advice when sidecar listing fails

The guard's unwrap_or(false) conflated 'classified as uncovered' with
'could not classify': a transient list fault on the guard's second
list (the entry heal's first list having succeeded) confidently routed
the operator to omnigraph repair even when the heal had just deferred
a rollback-eligible sidecar -- and repair refuses while a sidecar is
pending. Currently red: the error says 'run omnigraph repair' with no
mention of the reopen path. The fix names both paths plus the failure
cause when classification is impossible.

Surfaced by external review of the drift-guard fallback.

* fix(engine): admit ambiguity in the drift guard when sidecar listing fails

Replace the unwrap_or(false) fallback with a tri-state: covered ->
reopen advice; uncovered -> repair advice; listing FAILED -> say the
drift could not be classified, name the cause, and give both paths in
order ('run repair, or reopen read-write if repair reports a pending
sidecar'). The old fallback confidently routed a transient list fault
to repair, which refuses while a sidecar is pending -- a self-
correcting but pointless detour. The conflict itself is still always
raised; only the advice degrades honestly. Turns the previous commit's
red test green.

Surfaced by external review of the drift-guard fallback.
											
										
										
											2026-06-13 11:20:08 +02:00
+								| Atomic single-dataset commits | ✅ | **Multi-table publish via three layers**, NOT a single Lance primitive: (1) per-table Lance `commit_staged` for the data write, (2) `__manifest` row-level CAS via `ManifestBatchPublisher` for cross-table ordering, (3) the open-time recovery sweep for the residual gap between (1) and (2). All three layers ship; the five migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`, `optimize_all_tables`) write a `__recovery/{ulid}.json` sidecar before Phase B and delete it after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the sweep in `db/manifest/recovery.rs`: classify, decide all-or-nothing per sidecar, roll forward via single `ManifestBatchPublisher::publish` or roll back via `Dataset::restore` followed by a manifest publish of the restored version (so both directions converge to `manifest == HEAD` — no residual drift), and record an audit row in `_graph_commit_recoveries.lance` (queryable via `omnigraph commit list --filter actor=omnigraph:recovery`). The write entry points (`load_as`, `mutate_as`, `apply_schema_as`, `branch_merge_as`) and `refresh` additionally run an in-process roll-forward-only heal (serialized against live writers via the per-table write queues), so a long-lived server converges on its next write without restart; only rollback-eligible sidecars still defer to the next read-write open (a future background reconciler's goal). Engine writes route through a sealed `TableStorage` trait (`db.storage()`) exposing only `stage_*` + `commit_staged` + reads; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so the default surface cannot couple a write with a HEAD advance — §1 holds by construction. `delete_where` and `create_vector_index` stay inline until upstream Lance ships a public two-phase API ([#6658](https://github.com/lance-format/lance/issues/6658), [#6666](https://github.com/lance-format/lance/issues/6666)); `LoadMode::Overwrite` uses Lance `Overwrite` staged transactions. |
-												fix(engine): scalar index coverage + filter literal coercion (query latency) (#216)

* fix(engine): lower date/datetime filter literals as typed Arrow scalars

`literal_to_expr` lowered `Date`/`DateTime` query literals as Utf8 strings,
relying on DataFusion implicit casts. Against a physical `Date32`/`Date64`
column that can coerce the column side (`CAST(col AS Utf8)`), which defeats a
scalar BTREE and degrades the scan to a full filtered read. Lower to typed
`Date32`/`Date64` scalars instead (reusing the loader's
`parse_date32_literal`/`parse_date64_literal`, already used by the in-memory
comparison arm), so the predicate stays a direct column comparison and the
index is used. Malformed literals fall back to the Utf8 string so pushdown
behavior never regresses.

Tests: unit goldens asserting the lowered literal is typed (red before, green
after) + inline-binding pushdown equality in literal_filters confirming the
epoch conversion selects the right rows.

* fix(engine): build scalar BTREE for enum and orderable-scalar @index columns

`build_indices_on_dataset_for_catalog` only handled `String` (-> FTS) and
`Vector` (-> vector). Enums are physically `String`, so an enum `@index`
column (e.g. `status`) got an FTS inverted index, which Lance never consults
for `=`; and `DateTime`/`Date`/numeric/`Bool` `@index` columns fell through
and built nothing. Both meant equality/range filters degraded to full scans
with `indices_loaded=0`.

Dispatch index kind by property type via a shared `node_prop_index_kind`:
enum + orderable scalar -> BTREE, free-text String -> FTS, Vector -> vector,
list/Blob -> none. The helper is shared by the builder and
`needs_index_work_node` so they cannot drift — the latter decides recovery-
sidecar pinning, and under-reporting would leave a HEAD-advancing index build
uncovered (invariant 5).

Tests: scalar_indexes.rs asserts enum/DateTime/numeric @index columns report
`IndexCoverage::Indexed` while free-text String/un-annotated columns stay
`Degraded` (negative control). Docs: docs/user/indexes.md.

* feat(engine): reindex in optimize to keep index coverage current

A scalar/FTS/vector index only covers the fragments it was built over. Rows
appended after the build (e.g. `ingest --mode merge`, whose commit does not
rebuild an existing index) are scanned unindexed, and `compact_files` rewrites
fragments out of coverage. Nothing folded them back in, so coverage decayed as
the graph grew — even the id/src/dst BTREEs that power traversal.

`optimize_one_table` now runs Lance `optimize_indices` after `compact_files`
(incremental merge, not retrain — the same compact->optimize_indices sequence
LanceDB's `optimize()` uses) and enters the publish path on compaction work OR
stale index coverage (new `TableStore::has_unindexed_fragments`, reusing the
fragment_bitmap logic). `optimize_indices` is a committing call with no
uncommitted variant in lance-6.0.1, so it is an inline-commit residual covered
by the existing `SidecarKind::Optimize` recovery sidecar spanning both ops.
Blob-bearing tables are still skipped (the Lance blob-compaction bug is
compaction-specific; reindex-for-blob deferred as a noted follow-up).

Tests: maintenance.rs asserts an appended fragment is uncovered before and
covered after optimize, and idempotency holds (second pass is a no-op).
lance_surface_guards pins the `optimize_indices` signature and its incremental-
coverage behavior. The existing optimize Phase-B recovery failpoint now also
exercises a crash after reindex. Docs: maintenance.md, writes.md, invariants.md,
lance.md, AGENTS.md.

* fix(engine): coerce pushdown filter literals to the column type

Filter literals were pushed to Lance in their natural Arrow type (every integer
Int64, every float Float64). Against a narrower indexed column DataFusion widens
to the literal's type and casts the COLUMN (`CAST(n32 AS Int64)`), which defeats
the scalar BTREE and degrades to a full filtered read. A physical-plan probe
confirms it: an Int32 column filtered by an i32 literal uses `ScalarIndexQuery`;
by an i64 literal it does not.

Thread the scan's `arrow_schema` through `build_lance_filter_expr` ->
`ir_filter_to_expr` and coerce each literal operand to the opposite column's
exact Arrow type, reusing `projection::literal_to_array` + `arrow_cast` (the same
path the in-memory arm uses, so the two arms agree). Coercion never demotes a
filter to None: on failure it falls back to the natural literal, because a node
scan has no in-memory fallback for inline filters.

Supersedes the date-specific change in e4ef67b (PR1): the probe shows dates were
never index-defeated — temporal coercion casts the LITERAL, not the column — so
PR1's index-use rationale was wrong though harmless. The generic coercion
subsumes it; `literal_to_expr`'s date arms revert to the natural Utf8 fallback,
and its unit tests now assert the live coerced path.

Tests: surface guard `scalar_index_use_requires_matched_literal_type` pins the
substrate behavior (matched -> index, widened -> column-cast full scan); unit
tests cover Int32/UInt32/Float32 coercion, range op, reversed operand order, and
the natural fallback; `literal_filters` adds an I32 column with equality + range
and an F32 pushdown case.

* fix(engine): only coerce filter literals when the cast is lossless

The literal coercion in f064121 narrowed unconditionally. typecheck permits
numeric cross-type comparisons (`types_compatible`), so an out-of-domain literal
reaches `literal_to_typed_expr` and casts lossily: a fractional float vs an
integer column truncates (`{ count: 2.7 }` -> `count = 2`, wrongly matching the
count=2 row) and an out-of-range integer overflows to null (`count < 3e9` on I32
-> `count < NULL` -> empty). Both silently change results, and a node scan has no
in-memory fallback for inline filters.

Add a lossless guard for integer targets: round-trip the cast back to the natural
type and, on mismatch, return None so the caller keeps the natural literal
(correct via DataFusion coercion; the index is just unused for that out-of-domain
predicate). Float targets stay coerced -- narrowing F64 -> F32 is the column's own
precision domain, not a value error.

Resolves the two valid review findings on PR #216 (Codex float truncation, Greptile
out-of-range). Tests: unit cases for fractional/out-of-range fallback vs
whole-float/in-range coerce vs F32 exemption; e2e `{ count: 2.7 }` returns no rows.
											
										
										
											2026-06-14 16:31:19 +02:00
+								| Compaction (`compact_files`) + reindex (`optimize_indices`) | ✅ | `omnigraph optimize` orchestrates over all node/edge tables, bounded concurrency; per table runs `compact_files` **then Lance `optimize_indices`** (folds appended/rewritten fragments back into existing indexes — incremental merge, not retrain) and **publishes the resulting version to `__manifest`** (so the manifest tracks the Lance HEAD — required for reads to observe the work and for schema apply / strict writes to pass their HEAD-vs-manifest precondition), under the per-`(table, main)` write queue with `SidecarKind::Optimize` recovery coverage spanning both ops; **commits even with no compaction work if index coverage is stale**; **refuses on an unrecovered graph**; **skips uncovered HEAD > manifest drift** with `DriftNeedsRepair`; **skips blob-bearing tables** (reported via `TableOptimizeStats.skipped`, not silent; reindex is skipped for them too today), gated on `LANCE_SUPPORTS_BLOB_COMPACTION` until the upstream blob-v2 compaction-decode bug is fixed (see [docs/dev/invariants.md](docs/dev/invariants.md) Known Gaps) |
-												fix(maintenance): route uncovered drift through repair (#156)

* docs(invariants): note the non-atomic manifest->commit-graph publish gap

Every graph publish commits __manifest then appends _graph_commits as two
separate writes; a crash between them leaves the manifest ahead of the commit
DAG. Live reads + durability are unaffected (reads resolve via the manifest) and
recovery does not repair it; impact is bounded to commit history / time-travel
by commit id / merge-base completeness. Pre-existing across all publishes, not
the optimize reconcile specifically. Documented as a Known Gap; the fix is a
commit-graph reconcilable from the manifest, not a recovery sidecar.

* fix(maintenance): route uncovered drift through repair

* fix(maintenance): harden repair review feedback
											
										
										
											2026-06-09 14:42:54 +02:00
+								| Repair uncovered drift | — | `omnigraph repair` explicitly classifies uncovered table `HEAD > manifest` drift: verified maintenance drift (`ReserveFragments`/`Rewrite`) can be published with `--confirm`; suspicious or unverifiable drift requires `--force --confirm`. Sidecar-covered crash residuals still recover automatically on open. |
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								| Cleanup (`cleanup_old_versions`) | ✅ | `omnigraph cleanup` with `--keep` / `--older-than` policy |
-												Index materialization is derived state: defer off the write path, reconcile via optimize (iss-848) (#246)

* test(engine): reproduce empty-table Vector @index aborting schema apply

A Vector (IVF) index trains k-means centroids over the column, so Lance
cannot build it on 0 vectors ("Creating empty vector indices with
train=False is not yet implemented"). schema apply reconciles a table's
whole index set whenever any @index on it changes, so adding an unrelated
scalar @index materializes the dormant empty vector index and aborts the
entire migration (all-or-nothing).

This regression test inits a 0-row Doc with a Vector @index, adds a scalar
@index, and asserts the apply succeeds (then loads one embedded row and
asserts the deferred index materializes). It fails today at the apply step
with the vector-index abort; the fix lands in the next commit.

Refs dev-graph iss-empty-vector-index-schema-apply, iss-848.

* fix(engine): defer Vector @index on an empty table instead of aborting schema apply

build_indices_on_dataset_for_catalog materialized a declared Vector @index
unconditionally. On a 0-row table Lance cannot train the IVF index
("Creating empty vector indices with train=False is not yet implemented"),
so any later migration that touches the table (e.g. adding an unrelated
scalar @index, which reconciles the table's whole index set) aborted the
entire migration on the dormant vector index — all-or-nothing.

Guard the vector arm with a row-count check, matching the guard
ensure_indices_for_branch and the branch-merge rebuild already use: an
untrainable column becomes a pending index that a later ensure_indices /
optimize materializes once the table has rows. Reads stay correct meanwhile
(vector search degrades to a brute-force scan).

Stop-gap: the residual rows-present-but-vectors-null window and the full
decoupling (intent recorded at apply, an idempotent coverage reconciler)
are dev-graph iss-848. Turns the green half of the regression test added in
the previous commit.

Refs dev-graph iss-empty-vector-index-schema-apply, iss-848, iss-687.

* docs(invariants): record the logical-contract-over-physical-state principle

The bug class behind the empty-table vector-index abort (and the schema-apply
vs optimize version drift) is one shape: a physical operation allowed to fail
a logical one. Several hard invariants (2, 5, 7, 13) and deny-list items are
already instances of this, but the unifying rule was never written down.

Add it to docs/dev/invariants.md as a "Governing principle" section above the
hard invariants, naming which invariants and deny-list items instantiate it
and the smell to watch for (a logical operation gated on a physical fact).
Add a one-line always-on rule (7) in AGENTS.md so it stays in working memory,
with the qualifier that genuine logical conflicts still fail loudly — the
licence to lag covers physical convergence, not correctness.

Audience-neutral: no private ticket refs. check-agents-md.sh passes.

* test(engine): index build must tolerate rows with null vectors (load-before-embed)

Loading rows whose vector column is null into a `Vector @index` table fails
today: build_indices (reached via the loader's prepare_updates_for_commit)
calls create_vector_index, and Lance's IVF KMeans errors "cannot train 1
centroids with 0 vectors". The same abort hits ensure_indices/optimize/schema
apply/merge, since they all funnel through build_indices_on_dataset_for_catalog.

This test loads two null-embedding rows and calls ensure_indices; it must not
abort (the untrainable vector column is deferred, sibling indexes still build).
Fails today at the load step; fixed in the next commit.

Refs dev-graph iss-848, iss-empty-vector-index-schema-apply.

* fix(engine): defer unbuildable index columns instead of aborting the write path

build_indices_on_dataset_for_catalog is the chokepoint every write path funnels
through (load/mutate via prepare_updates_for_commit, schema apply, ensure_indices,
optimize, branch merge). Its vector arm called create_vector_index
unconditionally, so a column with no trainable vectors yet — an empty table, or
rows loaded before `embed` populates them — aborted the whole operation with
Lance's IVF KMeans error.

Fault-isolate the vector build: on failure, record the column as a PendingIndex
(table, column, reason), log it, and continue building the sibling indexes; a
later ensure_indices/optimize materializes it once the column is trainable, and
reads use brute-force meanwhile. Manifest/CAS/IO errors at the publish boundary
still propagate. Isolating at the single chokepoint realizes the governing
principle (physical index state never fails a logical operation) for every write
path, and supersedes the earlier symptomatic count_rows==0 stop-gap (removed) —
closing the residual rows-present-but-vectors-null window it left open.

Surfacing pending index status rather than failing is the database norm
(Postgres indisvalid, LanceDB list_indices). ensure_indices and the build_indices
wrappers now return Vec<PendingIndex>; optimize surfaces it in a later commit.

Refs dev-graph iss-848, iss-951 (vector index stays inline-commit until lance#6666).

* test(engine): index-only schema apply must not touch table data

Adding an @index to an existing column should be a pure metadata change once
index materialization moves to the reconciler (iss-848): the apply records the
intent in the catalog/IR but builds nothing inline, so the table's manifest
version is unchanged. Today the indexed_tables block builds the index inline
and bumps the version (4 -> 5). Fixed in the next commit.

Refs dev-graph iss-848.

* fix(engine): schema apply records index intent only; index-only apply is metadata

Schema apply no longer builds indexes inline. The four build_indices calls
(added/renamed/rewritten/index-only tables) are removed; the @index/@key intent
is already persisted in the catalog/IR the apply writes, and the physical index
is materialized off the critical path by ensure_indices/optimize (iss-848).

Concretely:
- AddConstraint (an @index addition — every other added constraint plans as
  UnsupportedChange) becomes a pure metadata step alongside the metadata-only
  steps: it touches no table data, so the table version is unchanged.
- added/renamed/rewritten tables still write their data; only the trailing
  index build is gone. The rewritten table's coverage is restored later by
  optimize_indices.
- recovery_pins drops index-only tables (they no longer advance Lance HEAD) and
  keeps rewritten tables; their post_commit_pin = expected+1 is now exact (one
  rewrite commit), strengthening recovery classification.
- the now-orphaned Omnigraph::build_indices_on_dataset_for_catalog wrapper is
  removed.

A migration can no longer abort on an index build, for any index type at any
cardinality. Turns the green half of index_only_constraint_apply_touches_no_table_data.

Refs dev-graph iss-848.

* test(engine): optimize must converge a declared-but-unbuilt index

After iss-848, adding an @index post-data is a metadata-only apply that defers
the physical build, so the column is declared-indexed but unbuilt (reads scan).
`optimize` — the operator's cron reconciler — must materialize it. Today optimize
only maintains coverage of EXISTING indexes (optimize_indices) and never creates
missing ones, so the rank BTREE stays Degraded after optimize. Fixed next commit.

Refs dev-graph iss-848.

* fix(engine): optimize materializes declared-but-unbuilt indexes (the reconciler)

`omnigraph optimize` is the operator's cron reconciler. It already compacts and
folds new fragments into EXISTING indexes (optimize_indices); now it also builds
declared-but-missing indexes, so the indexes schema apply / load defer (iss-848)
converge on the next optimize.

Done inside optimize_one_table (not by composing the all-tables ensure_indices,
which is drift-blind and would re-publish the uncovered HEAD>manifest drift that
optimize deliberately skips): after the per-table drift/blob skips and under the
queue + Optimize sidecar already held, a needs_index_create gate (reusing
needs_index_work_node/edge — "declared index missing AND row_count > 0", so empty
tables stay no-ops) admits index-only work, and Phase B builds the missing index
over the just-compacted layout via the build chokepoint. An untrainable vector
column fault-isolates into the new TableOptimizeStats.pending_indexes (the
list_indices/indisvalid analog operators read), not a failure. committed now
reflects index commits, so the existing post-publish cache invalidation covers
them. LanceDB's optimize only maintains existing indexes; creating
declared-but-missing ones is the L2 behavior omnigraph's declarative @index needs.

Turns the green half of optimize_materializes_index_declared_but_unbuilt.

Refs dev-graph iss-848.

* docs: index materialization is deferred to the reconciler (iss-848)

Update the index-lifecycle docs to reflect the new contract: @index/@key
declares intent and the physical index is derived state that never fails a
logical operation. Schema apply builds nothing (records intent only);
load/mutate build inline through one chokepoint that defers an untrainable
Vector column as pending; optimize/ensure_indices is the reconciler that
creates declared-but-missing indexes and maintains coverage, reporting
still-pending columns.

Touches: dev/invariants.md (truth-matrix Index-lifecycle row), AGENTS.md
(capability matrix), user/search/indexes.md (L2 orchestration), user/operations/
maintenance.md (optimize reconciler bullet), dev/testing.md (new tests).

* test(server): schema_apply_route_can_add_index reflects deferred index build

iss-848 made schema apply record @index intent without building the physical
index inline. The route test asserted the index count increased after apply;
on an empty graph it now stays unchanged (the build is deferred to
ensure_indices/optimize). Assert the new contract: apply succeeds and the
physical index count is unchanged.

* fix(engine): precheck vector trainability — don't pin or swallow (PR review)

Two issues Cursor Bugbot caught in the chokepoint fault-isolation:

1. (HIGH) Pending vector pins roll back siblings. needs_index_work_node counted
   a missing vector index as work whenever the table had rows, so a column with
   no trainable vectors got pinned in the EnsureIndices recovery sidecar — but
   the build deferred it (zero commit). On a crash before manifest publish the
   classifier sees NoMovement and the all-or-nothing decision (recovery.rs
   decide()) rolls back the WHOLE sidecar, undoing a sibling table's committed
   index work.
2. (MED) Vector build swallowed fatal errors. The match arm converted every
   create_vector_index error into a deferred PendingIndex, hiding genuine
   I/O/manifest/Lance failures as "pending".

Fix both with one trainability precheck (vector_column_trainable: >=1 non-null
vector, the ivf_flat(1) minimum) used identically by needs_index_work_node and
the build arm: an untrainable column is never counted as work (so never pinned —
no zero-commit pin) and never attempted (so it can't fail); only a trainable
column is built, and then any error PROPAGATES (stays fatal). The deferred
column is still recorded as a PendingIndex with a clear reason.

Refs dev-graph iss-848.

* feat(cli): surface pending index column + reason in optimize output (PR review)

Codex (P2): pending_indexes was documented as visible in `optimize --json` but
the CLI projection never emitted it — operators would lose the only signal that
optimize has deferred index work. Greptile (P2): the stat dropped the reason, so
operators saw which column was stuck, not why.

Carry the reason: TableOptimizeStats.pending_indexes is now Vec<PendingIndex>
(column + reason), and `omnigraph optimize --json` emits {column, reason} per
pending index; human output prints a "↳ index pending on '<col>': <reason>" line.

Refs dev-graph iss-848.

* test: align CLI index-add test with deferred build; cover post-rename reconcile

- schema_apply_json_adds_index_for_existing_property (cli_schema_config.rs): the
  CLI analog of the server test — asserted the index count grew after apply;
  under iss-848 the apply defers the build, so the count is unchanged on an
  empty graph. Assert the deferred contract. (The only full-suite failure.)
- optimize_materializes_index_after_type_rename (maintenance.rs, new): covers
  the gap Greptile flagged — a RenameType writes the renamed table with rows but
  no indexes (inline build removed in Commit B); assert the rank index is
  Degraded post-rename and Indexed after optimize reconciles it.

Refs dev-graph iss-848.

* test(engine): in-source apply tests reflect deferred index materialization

The two db::omnigraph in-source unit tests asserted the old "schema apply builds
/ preserves indexes inline" behavior (the only remaining full-suite failures):

- test_apply_schema_defers_index_then_reconciler_builds_it (was
  test_apply_schema_adds_index_for_existing_property): apply records the @index
  intent but builds nothing; assert the BTREE on `age` is absent after apply and
  present after ensure_indices. (Uses `age`, unindexed in TEST_SCHEMA — `name
  @key` is already FTS-indexed at seed.)
- test_apply_schema_rewrite_defers_index_then_reconciler_restores (was
  test_apply_schema_rewrite_preserves_existing_indices): an AddProperty rewrite
  no longer rebuilds indexes inline; assert ensure_indices restores id BTREE +
  name FTS after the rewrite.

Verified by grep that these + the server/CLI tests are the complete set of
"apply builds an index" assertions; all other index-presence tests run after
load/ensure_indices/primitives, which still build.

Refs dev-graph iss-848.

* fix(engine): optimize always reports pending indexes, not only on create-work (PR review)

Cursor Bugbot (MED): pending_indexes was filled only when needs_index_create was
true, but the vector trainability precheck makes needs_index_work_node exclude an
untrainable Vector column. So a table whose sole missing index is untrainable, but
which optimize still compacts or reindexes, returned an empty pending_indexes —
contradicting the documented operator contract for deferred columns.

Run the (idempotent) build chokepoint unconditionally once past the no-op gate,
rather than gating it on needs_index_create. It skips existing indexes, builds
any buildable missing one, and reports an untrainable column as pending whether
the table entered for compaction, reindex, or index creation. needs_index_create
still gates the no-op decision (so an index-only table still enters the path).

Refs dev-graph iss-848.

* test(engine): reframe staged-BTREE-failure failpoint onto the reconciler path

ensure_indices_stage_btree_failure_leaves_existing_tables_writable fired
`ensure_indices.post_stage_pre_commit_btree` and expected `apply_schema` (adding
a type) to fail mid-BTREE-build. iss-848 removed apply's inline index build, so
that apply now succeeds and the test's unwrap_err panicked — it exercised a
removed code path.

Reframe onto where BTREE builds happen now: seed Person, add an `@index` on
`age` (apply records intent, defers the build), then `ensure_indices` builds the
deferred BTREE and the failpoint fires between stage and commit. Person's HEAD
is unchanged (no drift) and its EnsureIndices sidecar pins NoMovement; a write to
a different, unpinned table (Company) is unaffected (mutations/loads heal
roll-forward and proceed, unlike optimize/repair which refuse on a pending
sidecar). Preserves the original coverage (staged-index stage failure leaves
other tables writable, no drift) in the new architecture.

Refs dev-graph iss-848.

* feat(server): converge deferred indexes promptly after schema apply (iss-848)

Schema apply records @index intent but defers the physical build. On a
long-lived server, spawn a detached best-effort ensure_indices after a
successful apply so the indexes converge promptly instead of waiting for the
operator's next optimize. Fire-and-forget: it never blocks or fails the apply
response, and a failure is logged (the index still converges on the next
optimize). Guarded on result.applied. The CLI is one-shot, so it has no
equivalent; its convergence path is the optimize cadence.

handle.engine is already an Arc, so the spawn takes an owned clone. Convergence
itself is covered by the engine ensure_indices/optimize tests; the existing
empty-graph schema-apply route tests confirm the response is unaffected (the
spawn is a read-only no-op on an empty table).

Refs dev-graph iss-848.

* docs(maintenance): list pending_indexes in optimize per-table stats (consistency)
											
										
										
											2026-06-15 18:48:43 +02:00
+								| BTREE / inverted (FTS) / vector indexes | ✅ | `@index`/`@key` declares intent; the physical index is derived state that never fails a logical op. Built per column through one chokepoint (`build_indices_on_dataset_for_catalog`, type-dispatched by `node_prop_index_kind`: enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector); idempotent; lazy across branches. **Schema apply builds nothing** (records intent only); `load`/`mutate` build inline but **defer an untrainable Vector column** (no trainable vectors yet) as *pending* rather than aborting. `ensure_indices`/`optimize` is the reconciler that materializes declared-but-missing indexes and restores coverage of appended/rewritten fragments (`optimize_indices`), reporting still-pending columns (see Compaction row). |
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								| `merge_insert` upsert | ✅ | `LoadMode::Merge`, mutation `update`/`insert`/`delete` lowering |
 								| Vector search | ✅ | `nearest()` query op; embedding pipeline (Gemini / OpenAI clients); `@embed` in schema |
 								| Full-text search | ✅ | `search/fuzzy/match_text/bm25` query ops |
 								| Hybrid ranking | — | `rrf(...)` Reciprocal Rank Fusion in one runtime |
 								| Graph traversal | — | CSR/CSC topology index, `Expand` IR op, variable-length hops, `not { }` anti-join |
 								| Schema language | — | `.pg` + Pest grammar + catalog + interfaces + constraints + annotations |
 								| Query language | — | `.gq` + Pest grammar + IR + lowering + linter |
 								| Schema migration planning | — | `plan_schema_migration` + `apply_schema` step types + `__schema_apply_lock__` |
-												Rename repo terminology to graph (#118)
											
										
										
											2026-05-24 16:46:00 +01:00
+								| Commit graph (DAG) across whole graph | — | `_graph_commits.lance` with linear + merge parents, ULID ids, actor map |
-												MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup

Refresh user-facing and agent-facing docs for the staged-write rewire
and clean up stale Run-state-machine references that survived MR-771.

MR-794-specific updates:
* docs/runs.md — remove "Known limitation: mid-query partial failure"
  section; document the in-memory accumulator + D₂ rule + the
  LoadMode::Overwrite residual.
* docs/invariants.md §VI.25 — flip from aspirational/open to
  upheld for inserts/updates. Within-query read-your-writes is now
  load-bearing for the publisher CAS contract.
* docs/architecture.md — add "Mutation atomicity — in-memory
  accumulator (MR-794)" subsection with per-op flow; refresh the
  engine + state diagrams to drop RunRegistry and add MutationStaging.
* docs/execution.md — rewrite the mutation flow sequence diagram
  for the staged-write path; updated the LoadMode table to call
  out per-mode commit semantics; rewrote load vs ingest.
* docs/query-language.md — document the D₂ parse-time rule.
* docs/errors.md — add the D₂ BadRequest rejection path.
* docs/testing.md — extend the runs.rs row to cover the new MR-794
  contract tests; add the staged_writes.rs row.
* docs/releases/v0.4.1.md (new) — release note covering the rewire,
  test additions, residuals, and files changed.
* AGENTS.md (CLAUDE.md symlink) — update the atomic-per-query
  description and the L2 capability matrix row.

Stale-reference cleanup (MR-771 leftovers):
* docs/storage.md — drop live _graph_runs.lance / _graph_run_actors.lance
  from the layout diagram and prose; mark legacy.
* docs/branches-commits.md — move __run__<id> to a legacy note;
  remove publish_run from the publish-trigger list.
* docs/audit.md — refresh _as API list (drop begin_run_as / publish_run_as);
  legacy RunRecord.actor_id moved to a historical note.
* docs/constants.md — mark run registry / branch-prefix rows as legacy.
* docs/cli.md — replace the legacy omnigraph run * quickstart block
  with omnigraph commit list/show.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
											2026-05-01 10:43:19 +02:00
+								| Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
+								| Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
 								| Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+								| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/operations/policy.md](docs/user/operations/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as` — the deprecated `ingest_as` shims route through it — `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. |
-												feat!: delete the legacy OmnigraphConfig + config migrate; finish the omnigraph.yaml docs sweep (#252)

* refactor(cli): own ReadOutputFormat/TableCellLayout in the CLI

The two output-presentation enums lived in `omnigraph-server::config` and were
re-exported for the CLI, even though the server never used them. Move both
definitions into `omnigraph-cli/src/read_format.rs` (where the renderer already
lives) and drop them from the server's public re-export. This is a step toward
deleting the legacy `omnigraph-server::config` module entirely — a CLI
presentation concern has no business in the server crate.

No behavior change. The server keeps private copies in `config.rs` only for the
soon-to-be-deleted legacy `CliDefaults`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(cli)!: remove the `config migrate` command and migrate.rs

`config migrate` was the last CLI consumer of the legacy `omnigraph.yaml`
(`OmnigraphConfig` + `load_config`). With the excision complete there is no
legacy file to split, so the whole `omnigraph config` command group is removed
along with `migrate.rs`. The `OmnigraphConfig` type, `load_config`, and the
deprecation machinery are deleted next.

- Remove `Command::Config` / `ConfigCommand` from the clap surface and the
  dispatch arm; drop `mod migrate;` and the now-unused `load_config` import.
- Drop the `Command::Config` arms in `planes.rs`.
- Delete the `config_migrate_splits_legacy_config` integration test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(server)!: delete the legacy OmnigraphConfig type and load_config

With `config migrate` gone, nothing loads `omnigraph.yaml` anymore. Delete the
entire `omnigraph-server::config` module: the `OmnigraphConfig` type and its
sub-structs (`ProjectConfig`, `TargetConfig`, `CliDefaults`, `ServerDefaults`,
`AuthDefaults`, `QueryDefaults`, `AliasConfig`, `AliasCommand`, `PolicySettings`,
`QueryEntry`, `McpSettings`), `load_config`, and the RFC-008 deprecation
machinery (`OMNIGRAPH_CONFIG`, `OMNIGRAPH_NO_LEGACY_CONFIG`,
`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION`, the deprecation map + warner).

- `QueryRegistry::load` (the only `OmnigraphConfig`/`QueryEntry` consumer; its
  only caller was its own test) is removed — server boot and the CLI both build
  registries via `QueryRegistry::from_specs`.
- `graph_resource_id_for_selection` (CLI-only) moves into the CLI
  (`helpers.rs`), with its unit test; the server no longer exports it.
- Drop the already-dead `format_registry_load_errors` helper (config-adjacent).

No behavior change — every deleted item was unreachable after the excision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: purge the legacy omnigraph.yaml surface from the docs

Finish the RFC-011 excision in the docs: the CLI no longer reads omnigraph.yaml
and the server boots cluster-only, so every doc that described the legacy file
as a live config is now wrong.

- AGENTS.md: rewrite the HTTP-server line to cluster-only boot (drop the
  single-graph/flat-route and omnigraph.yaml-boot framing); rewrite the CLI
  two-surface-config passage (drop `config migrate`, the deprecation env vars,
  and "Never extend omnigraph.yaml"); fix the topic table + capability rows.
- cli/reference.md: delete the entire "omnigraph.yaml schema (legacy combined
  file)" section and the `config migrate` row; re-home the `policy` row, the
  bearer-token chain, the actor/format/param-precedence references, and the
  `--config` mentions to the operator config + `--cluster`.
- cli/index.md: rewrite the multi-graph-server + add-graph paragraphs to
  cluster (`--cluster` + `cluster apply`); fix the policy examples to
  `--cluster`; replace the `## Config` omnigraph.yaml example with the
  operator/cluster two-surface model.
- operations/policy.md: rewrite per-graph-vs-server-level policy to the cluster
  `policies:`/`applies_to` model; re-home the actor + CLI tooling sections.
- clusters/config.md, clusters/index.md, deployment.md: server boots from the
  cluster only; per-operator facts come from ~/.omnigraph/config.yaml.
- architecture.md, testing.md: drop the stale omnigraph.yaml / deleted-test
  references.

RFCs, design specs, and prior release notes are left as historical records.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-15 22:31:29 +03:00
+								| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **cluster-only boot (RFC-011): always `--cluster <dir | s3://…>`, serving N graphs (N ≥ 1) under multi-graph routes + read-only `GET /graphs` enumeration + per-graph + server-level Cedar policies. Add/remove graphs via `cluster apply` and restart.** |
 								| CLI with config | — | two-surface config (team `cluster.yaml` dir + per-operator `~/.omnigraph/config.yaml`), scope addressing (`--store`/`--server`/`--cluster`/`--profile`/defaults, RFC-011), aliases, multi-format output (json/jsonl/csv/kv/table) |
-												MR-771: demote Run to direct-publish via expected_table_versions CAS

mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-30 08:52:50 +02:00
+								| Audit / actor tracking | — | `_as` write APIs + actor map in commit graph |
-												docs: onboarding-first README + in-repo agent skill + drop RustFS script (#257)

* docs: optimize README for dev onboarding; fix 0.7.0 staleness

The README's setup half drifted from the shipped 0.7.0 CLI and led with the
heaviest path (Docker + RustFS). This reworks it for fast, correct onboarding:

README.md
- New zero-dependency "Your first graph in 60 seconds" hero: a fully
  copy-pasteable local file-backed loop (schema → init → load → query → branch).
- Add a correct "Serve it" section (cluster apply + omnigraph-server --cluster);
  the server is cluster-only on main, so the old positional-URI boot is gone.
- Demote the RustFS bootstrap to "rehearse the S3 path locally"; reframe the
  storage bullet as "filesystem or any S3-compatible store (AWS S3, R2, MinIO,
  RustFS)" — RustFS is a provider, not a storage class.
- Fix crate/MCP descriptions (query/mutate/load, not read/change/ingest).

docs/user/quickstart.md
- Fix the query example: `read --name <q> … <uri>` is removed — the query name
  is positional and the graph is addressed with `--store` (`omnigraph query
  find_people --query queries.gq --store graph.omni`).

scripts/local-rustfs-bootstrap.sh
- Convert to cluster mode: write a cluster.yaml (storage: s3://…), then
  validate → import → apply, load the fixture into the derived root with the
  now-required --mode, and serve with `omnigraph-server --cluster`. The old
  flow (`load` without --mode, `omnigraph-server <URI>` positional boot) no
  longer works on a cluster-only server.

* docs: move agent skill into the repo, add agent-setup snippet, drop rustfs script

skills/omnigraph
- The operational skill (formerly `omnigraph-best-practices` in the cookbooks
  repo) now lives with the engine it documents, co-versioned. Renamed to
  `omnigraph`; repository metadata repointed here.
- Broadened the description to trigger on intent — storing/retrieving/querying
  knowledge, agent memory, building a knowledge graph, operating Omnigraph — as
  well as on CLI/artifact sightings (stays ≤1024 chars).
- Install: `npx skills add ModernRelay/omnigraph@omnigraph`.

README
- New "Set it up with an AI agent" paste snippet: installs the skill, reads the
  docs (URL), browses the cookbooks, and asks the user about a use case before
  standing up a first graph.
- "Agent skill & starter graphs" section points at skills/omnigraph + cookbooks.

Drop scripts/local-rustfs-bootstrap.sh
- Not CI-tested (so it rotted: it broke on the cluster-only migration — positional
  server boot, load without --mode), demoed the now-optional S3 path, and was the
  most fragile artifact in the repo. Replaced with a "Testing against S3 locally"
  guide in deployment.md (docker run RustFS/MinIO + AWS_* env + cluster-on-S3).
  README/AGENTS references updated.
											
										
										
											2026-06-16 11:48:13 +02:00
+								| Local S3 testing | — | run RustFS/MinIO + the `AWS_*` env; see [docs/user/deployment.md](docs/user/deployment.md) → *Testing against S3 locally* |
 								| Agent skill | — | `skills/omnigraph` — operational playbook for driving Omnigraph; install with `npx skills add ModernRelay/omnigraph@omnigraph` |
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
 								---
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								## Maintenance contract for agents
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								When you change something user-visible, **update the relevant `docs/user/<area>.md` in the same change**. Use [docs/user/index.md](docs/user/index.md) for public behavior and [docs/dev/index.md](docs/dev/index.md) for contributor/internal mechanics. Pointers from this file to those docs must keep working — CI enforces cross-link integrity via `scripts/check-agents-md.sh`.
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								When proposing or reviewing a non-trivial change, walk [docs/dev/invariants.md](docs/dev/invariants.md) — at minimum the deny-list and review checklist. Add to the deny-list when a new anti-pattern surfaces; relaxing an invariant requires the same review process as code.
-												Add architectural invariants & deny-list as docs/invariants.md

A standing reference for invariants that hold across storage, engine,
server, schema, indexing, observability, and the OSS/Cloud split. Used
to check RFCs and PRs against the substrate boundaries (don't rebuild
what Lance gives us), layering rules (one trait boundary per layer),
distributability constraints (Send+Sync, location-neutral IR), honesty
expectations (estimate-vs-actual, bounded failure modes), unified
patterns (reconciler, Union polymorphism, SIP, factorize), the §IX
deny-list, and the §X review checklist.

§IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split)
are referenced but not yet drafted — flagged as placeholders pending
upstream fill-in.

AGENTS.md surfaces it from the topic index, the always-on rules
section, and the maintenance contract; the deny-list is also inlined
there as a fast-pass review filter so it stays in scope every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:34:44 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+								Rules:
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:31:08 +02:00
+. **Update in the same PR.** New endpoint, query function, CLI flag, env var, constant, schema construct, or invariant: update both the source code and the doc in the same change. Never split documentation drift into a follow-up.
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+. **Bump version on release.** When a release boundary crosses (e.g. v0.3.1 → v0.3.2), update the version line at the top of this file and add a `docs/releases/<version>.md` describing the user-visible delta. Update [docs/dev/architecture.md](docs/dev/architecture.md) only if the architecture itself changed.
-												release: prepare omnigraph 0.4.2

											
										
										
											2026-05-10 14:02:28 +00:00
+. **Write OSS-facing release notes.** Release docs are public project history. Describe capabilities, behavior changes, breaking changes, upgrade notes, and user impact; do not reference private ticket systems, internal codenames, or planning shorthand that an outside contributor cannot inspect.
 . **Keep versioning coherent.** A release bump must update every published crate manifest, local path dependency constraint, `Cargo.lock`, generated API metadata such as `openapi.json`, and this file's surveyed version. Do not leave mixed package versions unless the release plan explicitly calls for them.
 . **Keep docs audience-neutral.** Prefer stable public identifiers (versions, PR numbers, public issue links, crate names, endpoint names) over organization-specific labels. If internal context is useful for maintainers, translate it into a durable public rationale before committing it.
 . **Don't lie.** If a section becomes wrong but you can't rewrite it fully right now, replace the wrong line with `*(stale — needs update after <change>)*` rather than leaving silently incorrect text. Then fix it ASAP.
 . **Re-verify before recommending.** If you cite a flag, env var, endpoint, or constant to the user or in code, grep for it in source first. Memory and docs go stale; the code is authoritative.
-												agents: keep guide short for context

											
										
										
											2026-05-10 14:41:02 +00:00
+. **Keep AGENTS.md short.** This file is always loaded into agent context, so every added line has a recurring context-window cost. Prefer pointers and terse invariants here; put detail in `docs/`.
 . **Keep AGENTS.md a map, not an encyclopedia.** New deep content goes into `docs/`. Add an entry to "Where to find each topic" instead of pasting prose into this file. The "Always-on rules" section is the exception — it's for invariants that should always be in scope.
-												docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
											
										
										
											2026-06-14 13:53:46 +03:00
+. **Re-read on schema/query/IR changes.** Edits to `schema.pest`, `query.pest`, `ir/lower.rs`, `query/typecheck.rs`, or `query/lint.rs` should trigger a re-read of [docs/user/schema/index.md](docs/user/schema/index.md), [docs/user/queries/index.md](docs/user/queries/index.md), and [docs/dev/execution.md](docs/dev/execution.md) to confirm they still describe reality.
-												agents: keep guide short for context

											
										
										
											2026-05-10 14:41:02 +00:00
+. **Always make smaller commits.** Each commit does one thing, compiles, and passes tests; mechanical refactors land separately from the behavior changes they enable.
 . **Test-first for bug fixes.** When fixing an identified bug, write a regression test that reproduces the failure first. Confirm it fails against the current code with the predicted symptom (not an unrelated error). Then land the fix in a separate commit and confirm the test turns green. The test commit lands just before the fix commit so the red → green pair is visible in `git log` and a reviewer can check out the test commit alone and reproduce the failure.
 . **Correct by design over symptomatic patches.** When a bug surfaces, identify the root cause and make the fix correct by construction. Don't patch the symptom. If the design admits the bug class, the fix is to close the class, not to add a guard around the latest instance. A symptomatic patch is acceptable only as a stop-gap, with an explicit note in the commit message and a follow-up issue tracking the design fix.
-												Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it

Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-28 23:10:09 +02:00
-												docs: split user and developer docs (#93)
											
										
										
											2026-05-15 03:45:22 +03:00
+								CI check: `scripts/check-agents-md.sh` verifies that docs links in this file and the audience indexes resolve, and that every canonical doc is linked from either [docs/user/index.md](docs/user/index.md) or [docs/dev/index.md](docs/dev/index.md). Run it locally before opening a PR if you've moved or renamed docs.