Commit graph

27 commits

Author SHA1 Message Date
Ragnor Comerford
97c6490c1d
docs: post-merge code-review fixes (Cursor + cubic on PR #60 era)
Two factual mismatches caught during code-grounded re-review:

- docs/architecture.md: "13 cmd families" was stale — the CLI has 17
  Command variants (Version, Embed, Init, Load, Ingest, Branch, Schema,
  Query, Snapshot, Export, Run, Commit, Read, Change, Policy, Optimize,
  Cleanup). Replaced the count with "command families" so the diagram
  doesn't drift again.

- docs/execution.md: the mutation prose said "every mutation runs on a
  fresh __run__<id> branch", which over-claims. mutation.rs:555 short-
  circuits when the target is already a __run__ branch — the assumption
  there is the caller is managing the surrounding run lifecycle. Added a
  one-paragraph caveat noting the exception with the file:line citation.

Both diagrams unchanged; only annotations / counts adjusted.
2026-04-29 17:14:48 +02:00
Ragnor Comerford
411d86b743
docs/execution: fix mutation atomicity narrative — run-branch + publish_run
The mutation flow diagram and prose previously claimed multi-statement
mutations publish through a single ManifestRepo::commit. That's wrong:
each statement commits independently to a __run__<id> branch via
commit_updates; atomicity comes from the publish_run step that promotes
the run-branch into the target (fast path) or three-way merges it
(merge path).

Diagram now shows:
- begin_run forks __run__<id> from target head
- per-statement commit_updates on the run-branch (loop)
- OCC pre-check on target head
- publish_run with fast-path / merge-path branches
- terminate_run(Published)

Prose now points at runs.md for the full run lifecycle and cites the
correct entry points: mutate_with_current_actor (mutation.rs:539),
publish_run (omnigraph.rs:858).

Addresses Codex review comment on PR #64.
2026-04-29 17:09:29 +02:00
Ragnor Comerford
64b9d56476
docs: add Mermaid architecture diagrams across architecture / storage / execution
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.

Static structure (docs/architecture.md):

- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
  via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
  server/CLI. Each zoom-in cites file:line entry points.

Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.

On-disk layout (docs/storage.md):

- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
  _graph_runs.lance, _refs/branches/ down into Lance's per-dataset
  internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
  not move or rewrite that content.

Runtime flows (docs/execution.md):

- Read flow sequence: client → Omnigraph::query → typecheck → lower →
  execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
  Lance write op (Append / merge_insert) → ManifestRepo::commit →
  __manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
  file:line citations so readers can navigate from diagram element to
  source in one step.

Conventions established (this is the first Mermaid in the repo):

- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
  storage trait / Lance / object store. No accidental synonyms.

Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
2026-04-29 16:58:56 +02:00
Claude
243c0c3464
Add internal-schema versioning + auto-migration for __manifest
The on-disk shape of `__manifest` is reconciled with the binary via a single
stamp + dispatcher in `db/manifest/migrations.rs`:

- `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes.
- The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest
  dataset's schema-level metadata (Lance `update_schema_metadata`).
- `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from
  the on-disk stamp until it matches the binary, then returns. Idempotent.
- `init_manifest_repo` stamps the current version at creation; the publisher's
  open-for-write path runs pending migrations before reading state. Reads
  stay side-effect-free.
- Forward-version protection: a stamp higher than the binary's known version
  triggers a clear "upgrade omnigraph first" error so an old binary cannot
  clobber a newer schema.

Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step:
the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id`
that engages Lance's row-level CAS at commit time. New repos created via
`init` are stamped at v2 immediately and don't need migration.

Adding a future on-disk shape change is one constant bump, one match arm in
`migrate_internal_schema`, and one test — no new branches in unrelated code
paths. Code outside the migration module never inspects the stamp.

New tests in `manifest/tests.rs`:
- `test_init_stamps_internal_schema_version`
- `test_publish_migrates_pre_stamp_manifest_to_current_version`
- `test_publish_rejects_manifest_stamped_at_future_version`

Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated
per the AGENTS.md maintenance contract.
2026-04-29 11:44:14 +00:00
Claude
5eb47b8c13
docs: surface MR-766 publisher OCC in storage / errors / constants
- storage.md: document the row-level CAS annotation on `__manifest.object_id`
  and the `expected_table_versions` OCC contract on `ManifestBatchPublisher::publish`.
- errors.md: list `ManifestConflictDetails` and its variants alongside `ManifestError`.
- constants.md: add `PUBLISHER_RETRY_BUDGET = 5`.

Per AGENTS.md "Maintenance contract": new schema construct, new constant, and
new typed error shape all need to ship with the source change.
2026-04-29 07:56:18 +00:00
Ragnor Comerford
56b30c5c5a
Restructure invariants doc: drop commercial, separate patterns from invariants
- Removed §IX (OSS / Cloud kernel-product split) — business strategy belongs
  in MR-738, not the technical invariants doc.
- Filled the §IV (Additivity / migration) placeholder with five evolution
  invariants.
- Reframed §I to be substrate-agnostic: invariants are about respecting any
  substrate; Lance / DataFusion are noted as the current chosen substrate
  rather than as the invariant itself.
- Added §VI Database guarantees (12 invariants): atomicity, schema integrity,
  isolation, durability, causal consistency, determinism, idempotency, no
  silent loss, bounded operations, failure scope, crash recovery, consistency
  model.
- Added §II.8 wire-protocol agnosticism (kernel transport-agnostic,
  Flight/HTTP at the server boundary).
- Reframed §VII as "Current architectural patterns" — explicitly distinct
  from invariants. Each pattern entry now names the underlying invariant it
  realizes (reconciler / Union / mutations-wrap-reads / SIP / factorize /
  stable row IDs / rank columns / policy predicates / Source).
- Pulled specific config defaults out of §VI (timeouts, memory caps);
  invariant is that bounds exist, values live in docs/constants.md.
- Split §IX deny-list into "invariant violations" (high bar) and "pattern
  violations" (overridable with justification).
- Added status legend: decided / open — see MR-X / aspirational. Annotated
  invariants and patterns that are not yet upheld in current code.
- Updated review checklist (§X) to cover database-guarantee dimensions and
  the wire-protocol / Source / patterns sections.
- Updated Living Document policy (§XI) to spell out how to revise patterns,
  resolve open invariants, and lift aspirational annotations.

Source tickets: MR-737, MR-744, MR-765, MR-694 family, MR-722/MR-725.
2026-04-29 00:39:11 +02:00
Ragnor Comerford
6f25c4f9f8
Address reviewer feedback (Cursor + cubic) on PR #60
All eight comments verified against source and applied:

- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
  the markdown blockquote. Claude Code's @-import parser expects @ at
  column 0; the leading "> " of a blockquote silently broke
  recognition, so the claimed auto-include did nothing. (Cursor,
  Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
  enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
  variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
  force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
  P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
  exists at error.rs:88-89 but the doc enumerated only 10 of 11.
  (cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
  tombstone_version column; state.rs:180 reads the tombstone version
  from the table_version column on rows where object_type =
  table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
  the underlying storage. actor_id is joined in-memory from
  _graph_commit_actors.lance, not a column on _graph_commits.lance.
  (cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
  match the actual constant name in catalog/schema_ir.rs:11.
  (cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
  `ls crates/omnigraph/tests/*.rs`). (cubic P3.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:09:06 +02:00
Ragnor Comerford
ada58ccd7b
Make "check existing coverage first" a top-level testing principle
The original docs/testing.md mentioned finding existing tests as step 1
of the checklist but never explicitly said "if existing coverage
already addresses your case, extend it; don't duplicate." Adds a
prominent "First principle" section that names extend-vs-new as the
preferred outcome and lists three duplicated init_and_load blocks as
the most common form of test rot.

Adds an extra checklist item: verify your change makes an *existing*
test fail before it makes a new one pass — if you can break the code
without breaking a test, that coverage gap is the bug to fix first.

Strengthens the AGENTS.md callout so the principle ("always check what
already covers it") is in scope from the top of every session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:03:50 +02:00
Ragnor Comerford
8be0e6a067
Add docs/testing.md as required-read every session
Maps the test surface (engine integration tests by area, CLI/server
tests, helpers harness, fixtures, failpoints feature, RustFS S3
integration, OpenAPI drift) and gives a before-every-task checklist:
find existing tests for the area, run them as a clean baseline, plan
the new test up front, reuse helpers, mind the layer boundary per
invariants §VII.33.

Notes that there's no coverage tooling today — coverage knowledge
comes from reading and running the relevant integration tests, not a
tarpaulin/codecov report.

Threaded into AGENTS.md as the third required-reading file alongside
invariants.md and lance.md, with a Claude-Code @-import so agents
load it on every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:55:21 +02:00
Ragnor Comerford
b6440d6b17
Add docs/lance.md — task-organized index of Lance upstream docs
Curates the Lance documentation site (lance.org) into a problem-domain
index so agents fetch the right page when working on Lance-touching
code instead of guessing or grepping our codebase. Organized by topic:
storage format & file layout, branching/tags/time travel, indexes
(scalar + system + vector), reads/writes, schema evolution, object
store, data types, performance, compaction, DataFusion integration,
SDK reference, plus quick-starts and the upstream AGENTS.md.

Skips ~200 irrelevant URLs from the upstream sitemap (Namespace REST
API model surface, Spark/Trino/Databricks/etc. integrations,
Python/Ray/HuggingFace docs, community pages) since omnigraph is
Rust-only and doesn't run a Lance Namespace catalog.

AGENTS.md surfaces it in the topic index and adds a directive: "when
you hit a Lance-shaped problem, consult docs/lance.md and fetch the
upstream URL before guessing."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:48:28 +02:00
Ragnor Comerford
c924e121d2
Add architectural invariants & deny-list as docs/invariants.md
A standing reference for invariants that hold across storage, engine,
server, schema, indexing, observability, and the OSS/Cloud split. Used
to check RFCs and PRs against the substrate boundaries (don't rebuild
what Lance gives us), layering rules (one trait boundary per layer),
distributability constraints (Send+Sync, location-neutral IR), honesty
expectations (estimate-vs-actual, bounded failure modes), unified
patterns (reconciler, Union polymorphism, SIP, factorize), the §IX
deny-list, and the §X review checklist.

§IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split)
are referenced but not yet drafted — flagged as placeholders pending
upstream fill-in.

AGENTS.md surfaces it from the topic index, the always-on rules
section, and the maintenance contract; the deny-list is also inlined
there as a fast-pass review filter so it stays in scope every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:34:44 +02:00
Ragnor Comerford
a335d98854
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
Andrew Altshuler
8649b2084f
Prepare v0.3.0 release (#44)
* Prepare v0.3.0 release

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

* ci: retrigger CI on latest openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-21 19:11:34 +03:00
andrew
d830ebcb64 Document AWS build variant and bearer-token sources
- docs/deployment.md: new "Token sources" section listing the three
  bearer-token source precedences (AWS SM, JSON file/env, single token).
  New "Build Variants" section explaining default vs aws builds and
  their release-artifact naming. New "AWS Secrets Manager" section
  covering env var, secret payload format, IAM role credential
  discovery, and the hard error for feature-less builds.
- CONTRIBUTING.md: documents the `aws` feature and the two test
  commands contributors should run when touching auth code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 04:04:45 +03:00
andrew
33bdab1fcb Prepare v0.2.2 release 2026-04-14 20:13:00 +03:00
andrew
3d74cbfc20 Prepare v0.2.1 release 2026-04-14 19:19:00 +03:00
andrew
1a26e2e654 Rename config targets to graphs 2026-04-14 04:12:14 +03:00
andrew
1bf55fa52d Add query lint and check commands 2026-04-13 00:37:44 +03:00
andrew
ea24efbf24 Harden local RustFS bootstrap repo recovery 2026-04-12 21:08:04 +03:00
andrew
5daeae7571 Prepare v0.2.0 release 2026-04-12 20:35:34 +03:00
andrew
92fa3189f7 Add schema apply command and policy support 2026-04-12 04:01:14 +03:00
andrew
4b058b9813 Fix CLI ergonomics and stream export output 2026-04-11 19:01:48 +03:00
andrew
02acd47a94 Remove stale Homebrew source-build note 2026-04-11 14:12:49 +03:00
andrew
446075f333 Update workflow actions and add Homebrew install docs 2026-04-11 04:01:39 +03:00
andrew
816b24d05e Fix public binary install flow 2026-04-11 02:19:21 +03:00
andrew
cbb312e74f Split binary and source install flows 2026-04-10 23:26:09 +03:00
andrew
338289656a Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00