diff --git a/AGENTS.md b/AGENTS.md
index e1919cb..e3c492e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,17 +2,17 @@
 
 This file is the always-on map for AI coding agents (Claude Code, Codex, Cursor, Cline) working in this repo. It is loaded into context on every turn, so it stays as a **map plus the rules and invariants that need to be in scope at all times** — the encyclopedia content lives under [`docs/`](docs/). When you need depth, follow a pointer.
 
-> **Required reading every session, every change:**
->
-> 1. **[docs/invariants.md](docs/invariants.md)** — the architectural invariants and §IX deny-list. Apply to every PR, not only architecture work.
-> 2. **[docs/lance.md](docs/lance.md)** — the curated index of upstream Lance docs. **Consult it before every task** to identify which Lance pages are relevant to what you're about to do, then fetch those upstream URLs before grepping our code or guessing. Lance is the substrate; behavior is documented there, not here.
-> 3. **[docs/testing.md](docs/testing.md)** — the test-coverage map. **Always check what already covers your change before writing a new test.** Extending an existing test (an assertion, a fixture row, a parameterization) is preferred over a duplicated `init_and_load()` block. Walk the before-every-task checklist to identify existing coverage, run those tests as a clean baseline, and only add a new test fn or file when no existing one owns the area.
->
-> Tools that support `@`-imports (Claude Code) auto-include all three files via the imports below. Other agents (Codex, Cursor, Cline, …) must open them explicitly at the start of each session.
->
-> @docs/invariants.md
-> @docs/lance.md
-> @docs/testing.md
+**Required reading every session, every change:**
+
+1. **[docs/invariants.md](docs/invariants.md)** — the architectural invariants and §IX deny-list. Apply to every PR, not only architecture work.
+2. **[docs/lance.md](docs/lance.md)** — the curated index of upstream Lance docs. **Consult it before every task** to identify which Lance pages are relevant to what you're about to do, then fetch those upstream URLs before grepping our code or guessing. Lance is the substrate; behavior is documented there, not here.
+3. **[docs/testing.md](docs/testing.md)** — the test-coverage map. **Always check what already covers your change before writing a new test.** Extending an existing test (an assertion, a fixture row, a parameterization) is preferred over a duplicated `init_and_load()` block. Walk the before-every-task checklist to identify existing coverage, run those tests as a clean baseline, and only add a new test fn or file when no existing one owns the area.
+
+Tools that support `@`-imports (Claude Code) auto-include all three files via the imports below — note these must sit at column 0 (not inside a blockquote) for the parser to recognize them. Other agents (Codex, Cursor, Cline, …) must open them explicitly at the start of each session.
+
+@docs/invariants.md
+@docs/lance.md
+@docs/testing.md
 
 `CLAUDE.md` is a symlink to this file — there is exactly one source of truth. Edit `AGENTS.md`.
 
diff --git a/docs/branches-commits.md b/docs/branches-commits.md
index eaa6cab..2e8cca8 100644
--- a/docs/branches-commits.md
+++ b/docs/branches-commits.md
@@ -16,7 +16,7 @@ OmniGraph builds *graph branches* on top by branching every sub-table coherently
 
 ## L2 — Commit graph (`db/commit_graph.rs`)
 
-Stored as a Lance dataset `_graph_commits.lance` (with stable row IDs):
+In-memory shape of a graph commit:
 
 ```
 GraphCommit {
@@ -25,14 +25,20 @@ GraphCommit {
     manifest_version: u64,
     parent_commit_id: Option,
     merged_parent_commit_id: Option, // populated for merge commits
-    actor_id: Option,
+    actor_id: Option, // joined in-memory from _graph_commit_actors.lance, NOT a column on _graph_commits.lance
     created_at: i64 (microseconds since epoch),
 }
 ```
 
+Storage is split across two Lance datasets (both with stable row IDs):
+
+- `_graph_commits.lance` — every column above *except* `actor_id`.
+- `_graph_commit_actors.lance` — optional separate `(graph_commit_id, actor_id)` map, created on demand. The `actor_id` field above is populated by joining this dataset in-memory at load time.
+
+Notes:
+
 - Every successful publish (load / change / merge / schema_apply / publish_run) appends one commit.
 - Merge commits have two parents; linear commits have one.
-- `_graph_commit_actors.lance` — optional separate actor map (created on demand).
 - API: `list_commits(branch)`, `get_commit(id)`, `head_commit_id_for_branch(branch)`.
 
 ## L2 — Snapshots & time travel
diff --git a/docs/ci.md b/docs/ci.md
index 3424830..d9855b0 100644
--- a/docs/ci.md
+++ b/docs/ci.md
@@ -6,5 +6,5 @@
 - **AWS feature build job**: `cargo build/test -p omnigraph-server --features aws` on ubuntu-latest.
 - **RustFS S3 integration**: spins up RustFS in Docker, runs `s3_storage`, `server_opens_s3_repo_directly_and_serves_snapshot_and_read`, and `local_cli_s3_end_to_end_init_load_read_flow`.
 - **release-edge.yml**: on every push to main, retags `edge`, builds Linux/macOS-Intel/macOS-arm64 archives + sha256, publishes a rolling prerelease.
-- **release.yml**: on `v*` tags, builds the 3-platform matrix and updates the Homebrew tap (`scripts/update-homebrew-formula.sh`) by force-pushing the regenerated formula to `ModernRelay/homebrew-tap`.
+- **release.yml**: on `v*` tags, builds the 3-platform matrix and updates the Homebrew tap (`scripts/update-homebrew-formula.sh`) by pushing the regenerated formula to `ModernRelay/homebrew-tap`.
 - **package.yml**: manual ECR image build; emits two image tags per commit (``, `-aws`) via CodeBuild.
diff --git a/docs/cli-reference.md b/docs/cli-reference.md
index dee5cfa..599ee13 100644
--- a/docs/cli-reference.md
+++ b/docs/cli-reference.md
@@ -2,7 +2,7 @@
 
 A reference for the `omnigraph` binary's command surface and `omnigraph.yaml` schema. For a quick-start guide, see [cli.md](cli.md).
 
-13 top-level command families, 40+ subcommands. All commands accept either a positional `URI`, `--uri`, or a `--target ` resolved against `omnigraph.yaml`.
+17 top-level command families, 40+ subcommands. All commands accept either a positional `URI`, `--uri`, or a `--target ` resolved against `omnigraph.yaml`.
 
 ## Top-level commands
 
diff --git a/docs/errors.md b/docs/errors.md
index bd8a2df..4a86a5b 100644
--- a/docs/errors.md
+++ b/docs/errors.md
@@ -9,7 +9,7 @@
 - `Manifest(ManifestError { kind: BadRequest|NotFound|Conflict|Internal, … })`
 - `MergeConflicts(Vec)`
 
-Compiler-side `NanoError` covers parse / catalog / type / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics.
+Compiler-side `NanoError` covers parse / catalog / type / storage / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics.
 
 ## Result serialization (`omnigraph_compiler::result::QueryResult`)
 
diff --git a/docs/schema-language.md b/docs/schema-language.md
index 64d7eaa..8ef1285 100644
--- a/docs/schema-language.md
+++ b/docs/schema-language.md
@@ -59,7 +59,7 @@ Edge bodies only allow `@unique` and `@index`.
 
 ## Schema IR & stable type IDs
 
-- `IR_VERSION = 1` (`catalog/schema_ir.rs`).
+- `SCHEMA_IR_VERSION = 1` (`catalog/schema_ir.rs`).
 - Each interface/node/edge gets a `stable_type_id` (kind+name hashed) so renames can be tracked.
 - Serialized as JSON for diff/migration plans.
 
diff --git a/docs/storage.md b/docs/storage.md
index 1c8dd76..21e73b4 100644
--- a/docs/storage.md
+++ b/docs/storage.md
@@ -27,7 +27,7 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin
 - `object_type` ∈ `table | table_version | table_tombstone`
 - `table_key` ∈ `node: | edge:`
 - `table_branch` is `null` for the main lineage and the branch name otherwise
-- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones whose `tombstone_version >= table_version`.
+- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones — rows where `object_type = table_tombstone`, whose own `table_version` (acting as the tombstone version) is `>=` the entry's `table_version`.
 - **Atomic publish**: multi-dataset commits publish via a `ManifestBatchPublisher` so a single write to `__manifest` flips all the new sub-table versions visible at once.
 
 ## URI scheme support (`storage.rs`)
diff --git a/docs/testing.md b/docs/testing.md
index c23c6ec..f4330db 100644
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -6,7 +6,7 @@ This file is the always-on map of the test surface. **Consult it before every ta
 
 | Crate | Path | Style |
 |---|---|---|
-| `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (16 files), fixture-driven, share `tests/helpers/mod.rs` |
+| `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (15 files), fixture-driven, share `tests/helpers/mod.rs` |
 | `omnigraph-cli` | `crates/omnigraph-cli/tests/` | `cli.rs` (unit-ish), `system_local.rs`, `system_remote.rs`, share `tests/support/mod.rs` |
 | `omnigraph-server` | `crates/omnigraph-server/tests/` | `server.rs` (HTTP-level), `openapi.rs` (OpenAPI drift / regeneration) |
 | `omnigraph-compiler` | mostly in-source `#[cfg(test)] mod tests` | Parser, type-checker, IR lowering, lint |
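
Review note on the `docs/storage.md` hunk: the reworded snapshot-reconstruction rule may be easier to check against a small model. The sketch below is one possible reading, not the engine's code. It assumes that every row passed in is already visible, that a tombstone applies to the same `(table_key, table_branch)` pair, and that the latest entry is dropped when any tombstone for that pair has a version greater than or equal to it. `ManifestRow`, `ObjectType`, `row`, and `reconstruct_snapshot` are hypothetical names introduced for illustration, as are keys such as `node:Person`.

```rust
use std::collections::HashMap;

// Hypothetical mirror of the `object_type` values listed in docs/storage.md.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ObjectType {
    Table,
    TableVersion,
    TableTombstone,
}

// Hypothetical in-memory shape of one manifest row (illustration only).
#[derive(Clone, Debug)]
struct ManifestRow {
    object_type: ObjectType,
    table_key: String,
    table_branch: Option<String>, // None stands for the main lineage
    table_version: u64,
}

type Key = (String, Option<String>);

// Convenience constructor used by the examples.
fn row(object_type: ObjectType, key: &str, branch: Option<&str>, version: u64) -> ManifestRow {
    ManifestRow {
        object_type,
        table_key: key.to_string(),
        table_branch: branch.map(str::to_string),
        table_version: version,
    }
}

// Assumed reading of the rule: keep the highest `table_version` per
// (table_key, table_branch), then drop entries covered by a tombstone
// whose own `table_version` is >= the entry's version.
fn reconstruct_snapshot(rows: &[ManifestRow]) -> HashMap<Key, u64> {
    let mut latest: HashMap<Key, u64> = HashMap::new();
    let mut tombstones: HashMap<Key, u64> = HashMap::new();

    for r in rows {
        let key = (r.table_key.clone(), r.table_branch.clone());
        let slot = match r.object_type {
            ObjectType::TableVersion => &mut latest,
            ObjectType::TableTombstone => &mut tombstones,
            ObjectType::Table => continue, // registration rows carry no version to pick
        };
        // Track the maximum version seen for this key.
        let v = slot.entry(key).or_insert(r.table_version);
        *v = (*v).max(r.table_version);
    }

    // An entry survives only if no tombstone reaches its version.
    latest.retain(|key, version| match tombstones.get(key) {
        Some(t) => *t < *version,
        None => true,
    });
    latest
}
```

Under these assumptions, a tombstone at version 2 hides a table whose latest version is 2, while a tombstone at version 1 leaves a table at version 5 visible. If the engine instead scopes tombstones differently across branches, the `Key` used for the `tombstones` map is the part that would change.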