Commit graph

145 commits

Author SHA1 Message Date
Ragnor Comerford
1a906403cb
MR-771: address cubic comment — drop vacuous __run__ check in cancel test
cubic correctly flagged that the assertion `!branches_after.iter().any(|b| b.starts_with("__run__"))` is vacuous because `branch_list()` already filters `__run__*` via `is_internal_system_branch`. The real structural property (no `__run__` branches can ever be created) is enforced by MR-771's deletion of `begin_run` etc. — that's a build-time invariant, not a runtime one.

Drop the vacuous assertion; document why. The remaining checks (public branch list unchanged + `_graph_runs.lance` never reappears) cover the actual cancel-safety properties.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 15:31:49 +02:00
Ragnor Comerford
a7109d5fba
MR-771: address PR review feedback
Three fixes from automated PR review on #65:

1. Internal-branch guard in mutation/load (Cursor Bugbot, Medium).
   Pre-MR-771 the begin_run path called ensure_public_branch_ref;
   the direct-publish replacements only normalized the name. A caller
   passing __run__* or __schema_apply_lock__ verbatim could write
   directly to a system branch. Re-add the explicit guard at the
   public write boundary in mutate_with_current_actor and load.

2. Panic-safe coordinator restoration (Cursor Bugbot, High).
   The previous swap-and-restore pattern would skip restore_coordinator
   if execute_named_mutation panicked, leaving the handle pinned to
   the wrong branch indefinitely. Replace with a CoordinatorRestoreGuard
   RAII type that captures the previous coordinator on swap and
   restores it in Drop.

3. Flaky cancel-safety test (cubic, P2).
   tests/runs.rs::cancelled_mutation_future_leaves_no_state asserted
   manifest version equality after handle.abort(), but abort races
   the spawned task. Re-frame around what actually defines cancel
   safety: no __run__* branches, no _graph_runs.lance, no synthesized
   public branches.

The fourth comment (Codex P1: branch_delete losing its in-flight
write barrier) is bigger in scope — fits in the MR-794 storage-trait
staging story rather than a hotfix here. Tracked there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 15:17:00 +02:00
Ragnor Comerford
35be20cb05
MR-771: demote Run to direct-publish via expected_table_versions CAS
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 08:52:50 +02:00
Ragnor Comerford
4e5374a85e
Merge pull request #63 from ModernRelay/claude/extend-manifest-batch-publisher-GQzB6 2026-04-29 14:45:33 +02:00
Claude
63babede4a
AGENTS.md: reframe 'minimize ongoing liability' as a general decision lens
The previous bullets read like a migration pattern (centralized
dispatcher, one match arm, no shape forks). That's one application, not
the principle. Reframe it as a bidirectional decision lens: ask "which
option has lower ongoing cost over time?" and let the answer be more
code, less code, DRYing, duplication, removal, addition, a new
abstraction, or flattening one — whichever shape converges over the
expected change horizon.

Add explicit examples of cases where the lower-liability option is
*more* code (dispatcher, migration framework, typed error variants) and
where it's *less* (premature abstractions, "just in case" paths,
helpers that wedge independently-evolving callers together) so readers
don't collapse the principle into "minimize code".
2026-04-29 12:32:30 +00:00
Claude
b6a596670a
AGENTS.md: add 'minimize ongoing liability' as a first principle
The always-on rules are concretizations of a broader engineering posture
that wasn't stated explicitly. Add a short section that frames the spirit
behind those rules:

- One centralized detection point, not many heal hooks.
- One dispatcher, not branch-on-shape in every consumer.
- One canonical shape after migration, not forks on "old vs new".
- Three similar lines beats a premature abstraction.
- Delete dead paths when their last caller leaves.

Plus a forward-looking review prompt ("what do these paths look like after
5 more changes like this?") so the principle bites at design time, not
just at review time.

The internal-schema-version mechanism we just shipped is a concrete
application: one stamp + one dispatcher + one match arm per change, no
heal hooks scattered across the engine. Codify the pattern so future
work doesn't drift back to ad-hoc.
2026-04-29 11:48:47 +00:00
Claude
243c0c3464
Add internal-schema versioning + auto-migration for __manifest
The on-disk shape of `__manifest` is reconciled with the binary via a single
stamp + dispatcher in `db/manifest/migrations.rs`:

- `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes.
- The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest
  dataset's schema-level metadata (Lance `update_schema_metadata`).
- `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from
  the on-disk stamp until it matches the binary, then returns. Idempotent.
- `init_manifest_repo` stamps the current version at creation; the publisher's
  open-for-write path runs pending migrations before reading state. Reads
  stay side-effect-free.
- Forward-version protection: a stamp higher than the binary's known version
  triggers a clear "upgrade omnigraph first" error so an old binary cannot
  clobber a newer schema.

Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step:
the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id`
that engages Lance's row-level CAS at commit time. New repos created via
`init` are stamped at v2 immediately and don't need migration.

Adding a future on-disk shape change is one constant bump, one match arm in
`migrate_internal_schema`, and one test — no new branches in unrelated code
paths. Code outside the migration module never inspects the stamp.

New tests in `manifest/tests.rs`:
- `test_init_stamps_internal_schema_version`
- `test_publish_migrates_pre_stamp_manifest_to_current_version`
- `test_publish_rejects_manifest_stamped_at_future_version`

Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated
per the AGENTS.md maintenance contract.
2026-04-29 11:44:14 +00:00
Claude
5eb47b8c13
docs: surface MR-766 publisher OCC in storage / errors / constants
- storage.md: document the row-level CAS annotation on `__manifest.object_id`
  and the `expected_table_versions` OCC contract on `ManifestBatchPublisher::publish`.
- errors.md: list `ManifestConflictDetails` and its variants alongside `ManifestError`.
- constants.md: add `PUBLISHER_RETRY_BUDGET = 5`.

Per AGENTS.md "Maintenance contract": new schema construct, new constant, and
new typed error shape all need to ship with the source change.
2026-04-29 07:56:18 +00:00
Claude
df0e158190
Add per-table expected-version OCC to ManifestBatchPublisher (MR-766)
Layered approach selected by the CAS-granularity investigation
(.context/merge-insert-cas-granularity.md):

- Annotate __manifest.object_id with `lance-schema:unenforced-primary-key`,
  enabling Lance row-level CAS via the bloom-filter conflict resolver.
  Closes a latent silent-duplicate bug where two concurrent publishes of
  the same `version:T@v=N+1` row could both land in disjoint fragments.
- Extend `ManifestBatchPublisher::publish` with `expected_table_versions:
  &HashMap<String, u64>`. Empty map preserves today's behavior; populated
  map asserts the manifest's latest non-tombstoned version per table
  matches the caller's view. Mismatches surface as a typed
  `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected,
  actual }` so callers can match without parsing strings.
- Set `merge_builder.conflict_retries(0)` so Lance's transparent rebase
  cannot silently break the OCC contract; retries are owned by the
  publisher loop, where each attempt re-runs `load_publish_state` and
  the expected-version pre-check.
- Surface `ManifestCoordinator::commit_with_expected` for the callers
  that need strict OCC (the run-demotion ticket); existing `commit` and
  `commit_changes` paths are unaffected.

New tests in `manifest/tests.rs` cover: matching expected versions,
stale expected with typed details, drift on an untouched expected
table, unknown expected table (actual=0), and the headline case of two
concurrent publishes with overlapping expected versions where exactly
one succeeds.
2026-04-29 00:34:20 +00:00
Claude
bb95fdceda
Investigate Lance MergeInsertBuilder CAS granularity (MR-766 prereq)
Confirms Lance v4.0.0 has row-level CAS for merge_insert only when the
join-key column carries lance-schema:unenforced-primary-key=true. Our
__manifest schema does not, so the publisher silently allows duplicate
object_id rows under concurrent writers. Note + reproducible scratch
crate select the layered (pre-check + row-level CAS) approach for the
publisher API ticket.
2026-04-28 23:30:17 +00:00
Ragnor Comerford
58dba6210e
Merge pull request #61 from ModernRelay/ragnorc/query-execution-deep-dive
docs/invariants: drop commercial section; separate patterns from invariants
2026-04-29 00:44:26 +02:00
Ragnor Comerford
56b30c5c5a
Restructure invariants doc: drop commercial, separate patterns from invariants
- Removed §IX (OSS / Cloud kernel-product split) — business strategy belongs
  in MR-738, not the technical invariants doc.
- Filled the §IV (Additivity / migration) placeholder with five evolution
  invariants.
- Reframed §I to be substrate-agnostic: invariants are about respecting any
  substrate; Lance / DataFusion are noted as the current chosen substrate
  rather than as the invariant itself.
- Added §VI Database guarantees (12 invariants): atomicity, schema integrity,
  isolation, durability, causal consistency, determinism, idempotency, no
  silent loss, bounded operations, failure scope, crash recovery, consistency
  model.
- Added §II.8 wire-protocol agnosticism (kernel transport-agnostic,
  Flight/HTTP at the server boundary).
- Reframed §VII as "Current architectural patterns" — explicitly distinct
  from invariants. Each pattern entry now names the underlying invariant it
  realizes (reconciler / Union / mutations-wrap-reads / SIP / factorize /
  stable row IDs / rank columns / policy predicates / Source).
- Pulled specific config defaults out of §VI (timeouts, memory caps);
  invariant is that bounds exist, values live in docs/constants.md.
- Split §IX deny-list into "invariant violations" (high bar) and "pattern
  violations" (overridable with justification).
- Added status legend: decided / open — see MR-X / aspirational. Annotated
  invariants and patterns that are not yet upheld in current code.
- Updated review checklist (§X) to cover database-guarantee dimensions and
  the wire-protocol / Source / patterns sections.
- Updated Living Document policy (§XI) to spell out how to revise patterns,
  resolve open invariants, and lift aspirational annotations.

Source tickets: MR-737, MR-744, MR-765, MR-694 family, MR-722/MR-725.
2026-04-29 00:39:11 +02:00
Ragnor Comerford
a9430978fb
Merge pull request #60 from ModernRelay/ragnorc/omnigraph-spec
Add AGENTS.md (map) + docs/ knowledge base + CI link check
2026-04-29 00:15:19 +02:00
Ragnor Comerford
6f25c4f9f8
Address reviewer feedback (Cursor + cubic) on PR #60
All eight comments verified against source and applied:

- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
  the markdown blockquote. Claude Code's @-import parser expects @ at
  column 0; the leading "> " of a blockquote silently broke
  recognition, so the claimed auto-include did nothing. (Cursor,
  Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
  enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
  variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
  force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
  P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
  exists at error.rs:88-89 but the doc enumerated only 10 of 11.
  (cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
  tombstone_version column; state.rs:180 reads the tombstone version
  from the table_version column on rows where object_type =
  table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
  the underlying storage. actor_id is joined in-memory from
  _graph_commit_actors.lance, not a column on _graph_commits.lance.
  (cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
  match the actual constant name in catalog/schema_ir.rs:11.
  (cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
  `ls crates/omnigraph/tests/*.rs`). (cubic P3.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:09:06 +02:00
Ragnor Comerford
ada58ccd7b
Make "check existing coverage first" a top-level testing principle
The original docs/testing.md mentioned finding existing tests as step 1
of the checklist but never explicitly said "if existing coverage
already addresses your case, extend it; don't duplicate." Adds a
prominent "First principle" section that names extend-vs-new as the
preferred outcome and lists three duplicated init_and_load blocks as
the most common form of test rot.

Adds an extra checklist item: verify your change makes an *existing*
test fail before it makes a new one pass — if you can break the code
without breaking a test, that coverage gap is the bug to fix first.

Strengthens the AGENTS.md callout so the principle ("always check what
already covers it") is in scope from the top of every session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:03:50 +02:00
Ragnor Comerford
8be0e6a067
Add docs/testing.md as required-read every session
Maps the test surface (engine integration tests by area, CLI/server
tests, helpers harness, fixtures, failpoints feature, RustFS S3
integration, OpenAPI drift) and gives a before-every-task checklist:
find existing tests for the area, run them as a clean baseline, plan
the new test up front, reuse helpers, mind the layer boundary per
invariants §VII.33.

Notes that there's no coverage tooling today — coverage knowledge
comes from reading and running the relevant integration tests, not a
tarpaulin/codecov report.

Threaded into AGENTS.md as the third required-reading file alongside
invariants.md and lance.md, with a Claude-Code @-import so agents
load it on every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:55:21 +02:00
Ragnor Comerford
a06d8bcf82
Promote docs/lance.md to required-read every session
Adds @docs/lance.md alongside @docs/invariants.md so the Lance index
loads on every turn (Claude Code @-import; explicit-open instruction
for other agents). Reframes the directive from "when you hit a
Lance-shaped problem" to "consult before every task to identify which
upstream pages are relevant." The Lance docs are the authoritative
source for substrate behavior, so reasoning about them should start
every change rather than be triggered conditionally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:50:42 +02:00
Ragnor Comerford
b6440d6b17
Add docs/lance.md — task-organized index of Lance upstream docs
Curates the Lance documentation site (lance.org) into a problem-domain
index so agents fetch the right page when working on Lance-touching
code instead of guessing or grepping our codebase. Organized by topic:
storage format & file layout, branching/tags/time travel, indexes
(scalar + system + vector), reads/writes, schema evolution, object
store, data types, performance, compaction, DataFusion integration,
SDK reference, plus quick-starts and the upstream AGENTS.md.

Skips ~200 irrelevant URLs from the upstream sitemap (Namespace REST
API model surface, Spark/Trino/Databricks/etc. integrations,
Python/Ray/HuggingFace docs, community pages) since omnigraph is
Rust-only and doesn't run a Lance Namespace catalog.

AGENTS.md surfaces it in the topic index and adds a directive: "when
you hit a Lance-shaped problem, consult docs/lance.md and fetch the
upstream URL before guessing."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:48:28 +02:00
Ragnor Comerford
43724b9f18
Make docs/invariants.md required reading every session
Adds a top-of-file directive plus a Claude-Code @-import so the full
invariants document is loaded into context on every turn, not only
when an agent follows a pointer. Other agents are instructed to open
it explicitly at session start. The §IX deny-list and §X review
checklist apply to every change, so they should be in scope by
default rather than gated on the agent remembering to look.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:39:09 +02:00
Ragnor Comerford
1e7334275a
Trim always-on rules to architectural-level invariants
Drops four rules whose phrasing leaned on implementation specifics
(nearest LIMIT, __run__<id> branches, __schema_apply_lock__,
branch_list filter convention) — those are real constraints, but they
live at the implementation layer and would go stale if internals are
renamed or refactored. The architectural intent is captured by the
remaining six rules and by the per-area docs.

Reframes the kept rules at the survives-a-rename level: "multi-dataset
publish is atomic across the whole graph" rather than naming the
manifest table or the publisher type, etc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:38:24 +02:00
Ragnor Comerford
c924e121d2
Add architectural invariants & deny-list as docs/invariants.md
A standing reference for invariants that hold across storage, engine,
server, schema, indexing, observability, and the OSS/Cloud split. Used
to check RFCs and PRs against the substrate boundaries (don't rebuild
what Lance gives us), layering rules (one trait boundary per layer),
distributability constraints (Send+Sync, location-neutral IR), honesty
expectations (estimate-vs-actual, bounded failure modes), unified
patterns (reconciler, Union polymorphism, SIP, factorize), the §IX
deny-list, and the §X review checklist.

§IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split)
are referenced but not yet drafted — flagged as placeholders pending
upstream fill-in.

AGENTS.md surfaces it from the topic index, the always-on rules
section, and the maintenance contract; the deny-list is also inlined
there as a fast-pass review filter so it stays in scope every turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:34:44 +02:00
Ragnor Comerford
a335d98854
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
Ragnor Comerford
cfea41e942
Add AGENTS.md as canonical agent guide; symlink CLAUDE.md to it
Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:10:09 +02:00
Andrew Altshuler
56b6319197
Enforce schema validators on every write path (#59)
Several validators were defined but only called from a subset of write
paths, so writes that violated @unique, @range, @check, enum, or
@cardinality constraints could silently succeed and corrupt data.

Adds two new helpers in loader/mod.rs:

- validate_enum_constraints — batch-level enum check, scans Arrow
  string columns (and list-of-string columns) for values outside the
  declared set
- enforce_unique_constraints_intra_batch — single-batch duplicate
  detection over named columns; partial enforcement (does not check
  against committed rows yet — cross-batch enforcement is a separate
  effort)

Wires the validators into:

- load_jsonl_reader nodes (alongside the existing
  validate_value_constraints call) and edges (which had no enum or
  unique check at all)
- exec/mutation.rs node insert, edge insert, and update paths
- mutation edge insert now also calls validate_edge_cardinality after
  the row lands but before the manifest commit, matching the loader's
  Phase 3 behavior

A new tests/validators.rs suite asserts rejection on every entry path
for invalid enum values, @range violations, intra-batch @unique
duplicates, and edge @card excesses.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 04:51:10 +03:00
Andrew Altshuler
c8047b6620
Sharpen README tagline; add incident-response and compliance use cases (#58)
* docs: sharpen README tagline; add incident-response and compliance use cases

Lead with "lakehouse-native graph engine with git-style workflows" and a
supporting line that names the action ("branch, commit, and merge typed
graph data like source code") rather than restating capabilities.

Adds incident-response and compliance graphs to the use-case list and
fixes "multi-agentic" to "multi-agent".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: lift Capabilities above Quick Install; rename from "Omnigraph CORE"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 03:46:21 +03:00
Andrew Altshuler
f75b941a9e
Make schema apply atomic across crashes (#57)
Schema apply previously committed the manifest before writing the schema
source and IR contract files. A crash in that window left the manifest
pointing at the new schema while _schema.pg, _schema.ir.json, and
__schema_state.json still reflected the old one — a silent inconsistency
that subsequent reads hit as type errors.

Reorders the apply: write to staging filenames first, commit the
manifest, then atomically rename staging → final. On open, a recovery
sweep reconciles any leftover staging files against the manifest's table
set: pre-commit crashes get the staging files deleted, post-commit
crashes get the renames completed (idempotent — handles partial
renames). Property-only migrations where both schemas imply the same
table set return an operator-actionable error rather than guessing.

Adds rename_text + delete to StorageAdapter (atomic on local FS via
tokio::fs::rename; copy + delete on S3 — recovery is tolerant of the
non-atomic case). Failpoints test coverage at both crash boundaries plus
a partial-rename scenario.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:21:00 +03:00
Andrew Altshuler
372f793ad6
Drop macOS x86_64 build target (#55)
Stop producing the omnigraph-macos-x86_64 archive in both the
stable and edge release workflows. The macos-15-intel runner
build was the slowest of the matrix and Apple Silicon is now
the default Mac developer target.

- release.yml + release-edge.yml: drop the macos-15-intel matrix entry
- install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear
  "no prebuilt binary" error instead of attempting an absent download
- update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit
  an arm64-only Homebrew formula. The on_macos block now declares
  `depends_on arch: :arm64` so Intel `brew install` fails fast with
  a clear architecture message instead of installing an arm64 binary
  that errors at exec time.

Linux x86_64 build is unaffected.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:19:26 +03:00
andrew
0469b6883e Ignore local-only working files
Keep machine-local state (.claude/, .worktrees/, local
omnigraph.yaml, CLAUDE.md, and schema design notes) from
showing up as untracked in git status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:41:15 +03:00
Andrew Altshuler
7310f69928
Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54)
This reverts commit b352fca13c, reversing
changes made to 748ad334a9.
2026-04-26 15:56:29 +03:00
Ragnor Comerford
b352fca13c
Merge pull request #49 from ModernRelay/ragnorc/x-request-id
Add X-Request-Id middleware
2026-04-26 12:33:33 +02:00
Ragnor Comerford
e14b203208
Reuse X_REQUEST_ID constant for inbound header lookup
Both Cursor Bugbot and Cubic flagged that the inbound `headers().get(...)`
call constructed `HeaderName::from_static("x-request-id")` inline instead
of reusing the `X_REQUEST_ID` constant defined at the top of the file.
The two were already kept in sync by both being `from_static("x-request-id")`,
but a future rename would have to touch both sites or risk silent drift
between read and write.

Also drops the now-unused `header` module import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 12:05:19 +02:00
Ragnor Comerford
748ad334a9
Merge pull request #48 from ModernRelay/ragnorc/api-sdk-research
Polish OpenAPI spec for SDK generation
2026-04-26 11:52:46 +02:00
Ragnor Comerford
189caf893c
Merge pull request #47 from ModernRelay/perf/expand-dense-ids
perf(expand): dense u32 ids end-to-end (follow-up to #45)
2026-04-25 23:54:03 +02:00
Ragnor Comerford
284c9377c2
Add X-Request-Id middleware
Per-request ULID minted at the edge, exposed in request extensions and
on the response header. Caller-supplied X-Request-Id is echoed when
well-formed (1..=128 ASCII printable characters); otherwise rejected
and replaced with a fresh ULID so the value is always safe to log.

Companion to the TypeScript SDK redesign — clients now correlate logs
across the wire by reading X-Request-Id from response headers (and the
SDK already surfaces it on every OmnigraphError as `requestId`).

No spec change required; the header is a transport-layer concern.

Tests:
- mint a ULID when no header is provided
- echo a valid caller-supplied id
- reject overlong header (200 chars), mint a fresh ULID

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 22:56:17 +02:00
Ragnor Comerford
7809bf607e
Polish OpenAPI spec for SDK generation
Add operation descriptions and examples to utoipa annotations so the
generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs
and any /openapi.json docs UI benefit from the same effort.

- Doc comments on all 18 handlers (utoipa picks up summary/description)
- #[schema(example = ...)] on free-text fields (query_source,
  schema_source, NDJSON data) and i64 timestamps
- Destructive/irreversible warnings on change, applySchema, ingest,
  mergeBranches, deleteBranch, publishRun, abortRun

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 16:36:51 +02:00
Ragnor Comerford
7ea868485e
Update README.md 2026-04-25 16:16:24 +02:00
Ragnor Comerford
7101565929
Update README.md 2026-04-25 16:16:07 +02:00
Andrew Altshuler
74eb5a5380
Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46)
* Parallel per-type load writes + omnigraph optimize/cleanup CLI

## MR-677.3 — parallel per-type load writes

The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).

## MR-676 — `omnigraph optimize` and `omnigraph cleanup`

New CLI subcommands that walk every node + edge table in the repo:

- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
  table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
  runs Lance `cleanup_old_versions` to prune historical manifests +
  unique fragments. Requires `--confirm` because it's destructive.
  Supports both count-based and time-based retention (or both AND'd
  together). Time uses chrono `DateTime<Utc>` (added as a workspace
  dep, default-features off).

Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.

Public API on `Omnigraph`:

  pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
  pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
      -> Result<Vec<TableCleanupStats>>

All 10 existing loader tests still pass.

Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-25 14:22:14 +03:00
Ragnor Comerford
53d7f47909
Pass dense u32 ids through expand instead of round-tripping via String
BFS now emits Vec<u32> dense ids directly with HashSet<u32> per-source
dedup. Only the deduped set is stringified for Lance's IN-list. The
post-hydrate alignment uses a dense-indexed Vec<Option<u32>> instead of
HashMap<&str, usize>, giving O(1) lookup without repeated string hashing.

End-to-end on the bench_expand harness (release, M-series):

  query     baseline    after     speedup
  1k  hop3   460.2 ms    23.7 ms     19x
  10k hop2     4.21 s   139.9 ms     30x
  10k hop3    40.59 s    898.5 ms    45x
  30k hop2    11.71 s    490.2 ms    24x
  30k hop3   197.38 s      3.22 s    61x

The cost lived in stringifying every (src,dst) pair and re-hashing the
strings during alignment; once dense ids stay dense, the BFS inner loop
and the final fan-out both collapse to integer ops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:07:25 +02:00
andrew
628bc2e607 Clean up bench_expand example
Remove vestigial code left from removed hasher variants: unused
BuildHasherDefault import, PhantomData suppression line, orphan planning
comments for Variant C/E. Also drop an unused `mut` on the PRNG closure
binding. No behavior change; compiles warning-free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 00:59:21 +03:00
Ragnor Comerford
d8e0bfeb22
Dedupe dst ids before hydrating nodes in execute_expand (#45)
The BFS in execute_expand emits one (src_idx, dst_id) pair per edge, so
dst_id_list contains heavy duplication when multi-hop traversals revisit
the same destination nodes. hydrate_nodes then built an
"id IN ('a', 'b', ...)" filter from the full list, passing it verbatim
to Lance. On a 30k-node Person graph, a 3-hop query produced a 15.4M-
entry IN-list against a 30k-row target — 512x more entries than unique
ids.

Deduplicate before the Lance scan; the post-hydrate alignment HashMap
already fans results back out to the original (src, dst) pairs, so
output is bit-identical.

Bench numbers (crates/omnigraph/examples/bench_expand.rs, min of 2-3
runs, release build):

  query         before     after    speedup
  1k   hop3     460 ms     28 ms     16x
  10k  hop2    4.21 s     188 ms     22x
  10k  hop3   40.59 s    1.30 s     31x
  30k  hop2   11.71 s    678 ms     17x
  30k  hop3  197.38 s    4.86 s     41x

All existing omnigraph-engine tests pass (72/72, 0 failures).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:56:18 +03:00
andrew
a1b00e2d06 Fix release.yml: move HOMEBREW_TAP_TOKEN guard into steps
GitHub Actions rejects `secrets.*` in job-level `if:` conditions at
runtime (job-level `if` is evaluated before secrets are available),
causing the workflow to abort in 0s with "workflow file issue" on
every trigger. Moving the guard into a step-level check that writes
`HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps
conditionally no-op when the tap token isn't configured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 19:24:41 +03:00
Andrew Altshuler
8649b2084f
Prepare v0.3.0 release (#44)
* Prepare v0.3.0 release

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

* ci: retrigger CI on latest openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-21 19:11:34 +03:00
Andrew Altshuler
102ccc05f7
Merge pull request #43 from ModernRelay/fix/mr-674-ephemeral-run-branches
Delete __run__ branches on every terminal state (MR-674)
2026-04-21 14:35:49 +03:00
andrew
2df578eab8 Delete __run__ branches on every terminal state (MR-674)
Run branches are transactional scaffolding — the durable audit lives
on RunRecord. Invariant: every terminal state (Published, Aborted,
Failed) deletes the __run__ branch.

- Add `terminate_run` helper: appends terminal RunRecord, then
  deletes the run branch. Delete errors are swallowed — the record
  is authoritative; `cleanup_terminal_run_branches_for_target`
  retries on later `branch_delete` of the target.
- Wire into `publish_run_as`, `abort_run`, `fail_run`.
- Include `Failed` in the cleanup filter (was `Published | Aborted`
  only) for legacy-repo GC during branch_delete.
- Cleanup now checks `coordinator.all_branches()` first to skip
  branches already deleted by a concurrent handle — avoids Lance
  NotFound when two handles publish/clean up independently.
- Drop `Failed` from `ensure_branch_delete_safe` — post-fix, Failed
  means the branch is already gone, so there's no reason to block
  target deletion (MR-674 "Downstream effects").

Tests:
- New regression: `run_branches_do_not_accumulate_across_repeated_loads`
  — 10 loads + 1 abort → `branch_list() == ["main"]`.
- New `failed_load_deletes_run_branch` asserts Failed path cleans up.
- Rename `abort_run_keeps_target_unchanged_and_preserves_hidden_branch_for_inspection`
  → `abort_run_leaves_target_unchanged_and_deletes_run_branch`, invert
  the hidden-branch assertion.
- Rewrite `public_{load,mutation}_preserves_staged_edge_ids_on_publish`
  to capture staged IDs before publish instead of inspecting the run
  branch after (branch is gone now).
- Update MR-670 regression test to assert the run branch is *absent*
  after publish.

Deferred to follow-up: `--keep-run-branch` debug flag, `omnigraph run gc`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:15:39 +03:00
Andrew Altshuler
674eee16bb
Merge pull request #42 from ModernRelay/refactor/extract-compiler-tests-p3
Extract remaining crowded compiler tests (Phase 3)
2026-04-20 23:09:14 +03:00
andrew
54101f7e2c Extract remaining crowded compiler test modules
Files where inline tests crowded out production code (test/prod ratio
≥ 0.8) move to sibling files via `#[path]`. Files where production
dominates (query_input.rs, schema_plan.rs) stay inline — extracting
would add noise, not reduce it.

- ir/lower.rs: 1239 → 577 lines (ratio 1.15)
- catalog/mod.rs: 594 → 326 lines (ratio 0.83)
- query/lint.rs: 562 → 314 lines (ratio 0.80)

catalog/tests.rs uses the shorter name since it's inside a module
directory (no ambiguity with filename).

All 229 compiler tests green, identical count to before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:49:09 +03:00
Andrew Altshuler
f2c3b11508
Merge pull request #41 from ModernRelay/refactor/extract-compiler-tests
Extract compiler test modules to sibling files (Phase 2)
2026-04-20 19:15:45 +03:00
andrew
94849a50b4 Extract compiler test modules to sibling files
typecheck.rs, schema/parser.rs, and query/parser.rs each had
~1000-line inline `mod tests` blocks that overshadowed the production
code in the file. Move each to a sibling `*_tests.rs` using
`#[path = "..."] mod tests;`.

- typecheck.rs: 2865 → 1708 lines; typecheck_tests.rs: 1156 lines
- schema/parser.rs: 1950 → 994 lines; parser_tests.rs: 955 lines
- query/parser.rs: 1737 → 803 lines; parser_tests.rs: 933 lines

No visibility change — the sibling module still has `use super::*`
access to crate-privates. No semantic edits beyond de-indenting by
4 spaces (mechanical). All 229 compiler tests green, identical
count to before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:50:18 +03:00
Andrew Altshuler
789c0633e8
Merge pull request #40 from ModernRelay/refactor/extract-omnigraph-tests
Extract public-API tests from omnigraph.rs to integration tests
2026-04-20 14:15:16 +03:00