Lance's distributed-write API splits "write fragment files" from "advance
HEAD": write_fragments returns a Transaction with FragmentMetadata; a
later CommitBuilder::execute(transaction) commits via the manifest CAS.
The same shape exists for merge_insert via
MergeInsertBuilder::execute_uncommitted. Scanner::with_fragments(staged)
lets in-flight reads see uncommitted staged data.
Adds wrappers for these primitives:
- StagedWrite carries the uncommitted Transaction plus the new Fragments
(extracted for read-your-writes via Scanner::with_fragments).
- TableStore::stage_append wraps InsertBuilder::execute_uncommitted.
- TableStore::stage_merge_insert wraps MergeInsertBuilder::execute_uncommitted.
- TableStore::commit_staged wraps CommitBuilder::execute.
- TableStore::scan_with_staged / count_rows_with_staged thread the staged
fragments into a Scanner alongside the dataset's committed fragments.
The MutationStaging integration that uses these primitives is the next
step in MR-794 — it requires a coordinated rewrite of execute_insert /
execute_update / execute_delete plus the load_jsonl_reader path, plus
end-of-query commit logic. Doc comment on MutationStaging is updated to
reference MR-794 and these primitives so the followup is well-anchored.
The current MR-771 limitation in docs/runs.md ("mid-query partial failure
leaves Lance HEAD ahead of __manifest") still applies until the
follow-up lands; the primitives are the building blocks but not yet the
fix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cubic correctly flagged that the assertion `!branches_after.iter().any(|b| b.starts_with("__run__"))` is vacuous because `branch_list()` already filters `__run__*` via `is_internal_system_branch`. The real structural property (no `__run__` branches can ever be created) is enforced by MR-771's deletion of `begin_run` etc. — that's a build-time invariant, not a runtime one.
Drop the vacuous assertion; document why. The remaining checks (public branch list unchanged + `_graph_runs.lance` never reappears) cover the actual cancel-safety properties.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three fixes from automated PR review on #65:
1. Internal-branch guard in mutation/load (Cursor Bugbot, Medium).
Pre-MR-771 the begin_run path called ensure_public_branch_ref;
the direct-publish replacements only normalized the name. A caller
passing __run__* or __schema_apply_lock__ verbatim could write
directly to a system branch. Re-add the explicit guard at the
public write boundary in mutate_with_current_actor and load.
2. Panic-safe coordinator restoration (Cursor Bugbot, High).
The previous swap-and-restore pattern would skip restore_coordinator
if execute_named_mutation panicked, leaving the handle pinned to
the wrong branch indefinitely. Replace with a CoordinatorRestoreGuard
RAII type that captures the previous coordinator on swap and
restores it in Drop.
3. Flaky cancel-safety test (cubic, P2).
tests/runs.rs::cancelled_mutation_future_leaves_no_state asserted
manifest version equality after handle.abort(), but abort races
the spawned task. Re-frame around what actually defines cancel
safety: no __run__* branches, no _graph_runs.lance, no synthesized
public branches.
The fourth comment (Codex P1: branch_delete losing its in-flight
write barrier) is bigger in scope — fits in the MR-794 storage-trait
staging story rather than a hotfix here. Tracked there.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two factual mismatches caught during code-grounded re-review:
- docs/architecture.md: "13 cmd families" was stale — the CLI has 17
Command variants (Version, Embed, Init, Load, Ingest, Branch, Schema,
Query, Snapshot, Export, Run, Commit, Read, Change, Policy, Optimize,
Cleanup). Replaced the count with "command families" so the diagram
doesn't drift again.
- docs/execution.md: the mutation prose said "every mutation runs on a
fresh __run__<id> branch", which over-claims. mutation.rs:555 short-
circuits when the target is already a __run__ branch — the assumption
there is the caller is managing the surrounding run lifecycle. Added a
one-paragraph caveat noting the exception with the file:line citation.
Both diagrams unchanged; only annotations / counts adjusted.
The mutation flow diagram and prose previously claimed multi-statement
mutations publish through a single ManifestRepo::commit. That's wrong:
each statement commits independently to a __run__<id> branch via
commit_updates; atomicity comes from the publish_run step that promotes
the run-branch into the target (fast path) or three-way merges it
(merge path).
Diagram now shows:
- begin_run forks __run__<id> from target head
- per-statement commit_updates on the run-branch (loop)
- OCC pre-check on target head
- publish_run with fast-path / merge-path branches
- terminate_run(Published)
Prose now points at runs.md for the full run lifecycle and cites the
correct entry points: mutate_with_current_actor (mutation.rs:539),
publish_run (omnigraph.rs:858).
Addresses Codex review comment on PR #64.
Replace the single ASCII stack in docs/architecture.md with a hierarchy of
Mermaid diagrams that show the system from external context down to the
component level. Add an on-disk layout diagram in docs/storage.md and two
sequence diagrams (read query, mutation) in docs/execution.md so readers
can navigate from "what is OmniGraph" to "how does a query run" without
opening source.
Static structure (docs/architecture.md):
- System context — agents/clients, embedding providers, Cedar, object store.
- Layer view — eight-layer stack with L1 (Lance) / L2 (OmniGraph) styling
via classDef, replacing the pre-existing ASCII art.
- Component zoom-ins — compiler, engine, storage trait, index lifecycle,
server/CLI. Each zoom-in cites file:line entry points.
Aspirational shapes (storage trait, full reconciler) are visually marked
and pointed at the relevant invariants.md section so readers see the
intended seam without thinking it's already implemented.
On-disk layout (docs/storage.md):
- Tree from repo URI through __manifest, nodes/, edges/, _graph_commits.lance,
_graph_runs.lance, _refs/branches/ down into Lance's per-dataset
internals (_versions/, data/, _indices/, _refs/, _transactions/).
- Annotated with the actual filenames so readers can `ls` the same paths.
- Slots in below the existing __manifest CAS / OCC / migration prose; does
not move or rewrite that content.
Runtime flows (docs/execution.md):
- Read flow sequence: client → Omnigraph::query → typecheck → lower →
execute_query → table_store → Lance scanner → RecordBatch stream.
- Mutation flow sequence: Omnigraph::mutate → resolve literals →
Lance write op (Append / merge_insert) → ManifestRepo::commit →
__manifest upsert.
- Both diagrams are followed by a "Code paths" block with verified
file:line citations so readers can navigate from diagram element to
source in one step.
Conventions established (this is the first Mermaid in the repo):
- L1 = orange (#fef3e8), L2 = blue (#e8f4fd), aspirational = dashed.
- Diagram size cap ~9 elements; more detail goes in a sub-diagram.
- Diagrams paired with prose; code-path citations follow each diagram.
- Consistent vocabulary across diagrams: frontend / compiler / engine /
storage trait / Lance / object store. No accidental synonyms.
Subsequent PRs will add flow diagrams for schema apply, branch + merge,
run isolation, index reconcile, and the embedding pipeline in the same
conventions.
The previous bullets read like a migration pattern (centralized
dispatcher, one match arm, no shape forks). That's one application, not
the principle. Reframe it as a bidirectional decision lens: ask "which
option has lower ongoing cost over time?" and let the answer be more
code, less code, DRYing, duplication, removal, addition, a new
abstraction, or flattening one — whichever shape converges over the
expected change horizon.
Add explicit examples of cases where the lower-liability option is
*more* code (dispatcher, migration framework, typed error variants) and
where it's *less* (premature abstractions, "just in case" paths,
helpers that wedge independently-evolving callers together) so readers
don't collapse the principle into "minimize code".
The always-on rules are concretizations of a broader engineering posture
that wasn't stated explicitly. Add a short section that frames the spirit
behind those rules:
- One centralized detection point, not many heal hooks.
- One dispatcher, not branch-on-shape in every consumer.
- One canonical shape after migration, not forks on "old vs new".
- Three similar lines beats a premature abstraction.
- Delete dead paths when their last caller leaves.
Plus a forward-looking review prompt ("what do these paths look like after
5 more changes like this?") so the principle bites at design time, not
just at review time.
The internal-schema-version mechanism we just shipped is a concrete
application: one stamp + one dispatcher + one match arm per change, no
heal hooks scattered across the engine. Codify the pattern so future
work doesn't drift back to ad-hoc.
The on-disk shape of `__manifest` is reconciled with the binary via a single
stamp + dispatcher in `db/manifest/migrations.rs`:
- `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` declares the shape this binary writes.
- The on-disk stamp `omnigraph:internal_schema_version` lives in the manifest
dataset's schema-level metadata (Lance `update_schema_metadata`).
- `migrate_internal_schema(&mut dataset)` walks `match`-arm steps forward from
the on-disk stamp until it matches the binary, then returns. Idempotent.
- `init_manifest_repo` stamps the current version at creation; the publisher's
open-for-write path runs pending migrations before reading state. Reads
stay side-effect-free.
- Forward-version protection: a stamp higher than the binary's known version
triggers a clear "upgrade omnigraph first" error so an old binary cannot
clobber a newer schema.
Self-heals existing pre-MR-766 deployments by auto-applying the v1→v2 step:
the `lance-schema:unenforced-primary-key` annotation on `__manifest.object_id`
that engages Lance's row-level CAS at commit time. New repos created via
`init` are stamped at v2 immediately and don't need migration.
Adding a future on-disk shape change is one constant bump, one match arm in
`migrate_internal_schema`, and one test — no new branches in unrelated code
paths. Code outside the migration module never inspects the stamp.
New tests in `manifest/tests.rs`:
- `test_init_stamps_internal_schema_version`
- `test_publish_migrates_pre_stamp_manifest_to_current_version`
- `test_publish_rejects_manifest_stamped_at_future_version`
Docs: `docs/storage.md`, `docs/maintenance.md`, `docs/constants.md` updated
per the AGENTS.md maintenance contract.
- storage.md: document the row-level CAS annotation on `__manifest.object_id`
and the `expected_table_versions` OCC contract on `ManifestBatchPublisher::publish`.
- errors.md: list `ManifestConflictDetails` and its variants alongside `ManifestError`.
- constants.md: add `PUBLISHER_RETRY_BUDGET = 5`.
Per AGENTS.md "Maintenance contract": new schema construct, new constant, and
new typed error shape all need to ship with the source change.
Layered approach selected by the CAS-granularity investigation
(.context/merge-insert-cas-granularity.md):
- Annotate __manifest.object_id with `lance-schema:unenforced-primary-key`,
enabling Lance row-level CAS via the bloom-filter conflict resolver.
Closes a latent silent-duplicate bug where two concurrent publishes of
the same `version:T@v=N+1` row could both land in disjoint fragments.
- Extend `ManifestBatchPublisher::publish` with `expected_table_versions:
&HashMap<String, u64>`. Empty map preserves today's behavior; populated
map asserts the manifest's latest non-tombstoned version per table
matches the caller's view. Mismatches surface as a typed
`ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected,
actual }` so callers can match without parsing strings.
- Set `merge_builder.conflict_retries(0)` so Lance's transparent rebase
cannot silently break the OCC contract; retries are owned by the
publisher loop, where each attempt re-runs `load_publish_state` and
the expected-version pre-check.
- Surface `ManifestCoordinator::commit_with_expected` for the callers
that need strict OCC (the run-demotion ticket); existing `commit` and
`commit_changes` paths are unaffected.
New tests in `manifest/tests.rs` cover: matching expected versions,
stale expected with typed details, drift on an untouched expected
table, unknown expected table (actual=0), and the headline case of two
concurrent publishes with overlapping expected versions where exactly
one succeeds.
Confirms Lance v4.0.0 has row-level CAS for merge_insert only when the
join-key column carries lance-schema:unenforced-primary-key=true. Our
__manifest schema does not, so the publisher silently allows duplicate
object_id rows under concurrent writers. Note + reproducible scratch
crate select the layered (pre-check + row-level CAS) approach for the
publisher API ticket.
- Removed §IX (OSS / Cloud kernel-product split) — business strategy belongs
in MR-738, not the technical invariants doc.
- Filled the §IV (Additivity / migration) placeholder with five evolution
invariants.
- Reframed §I to be substrate-agnostic: invariants are about respecting any
substrate; Lance / DataFusion are noted as the current chosen substrate
rather than as the invariant itself.
- Added §VI Database guarantees (12 invariants): atomicity, schema integrity,
isolation, durability, causal consistency, determinism, idempotency, no
silent loss, bounded operations, failure scope, crash recovery, consistency
model.
- Added §II.8 wire-protocol agnosticism (kernel transport-agnostic,
Flight/HTTP at the server boundary).
- Reframed §VII as "Current architectural patterns" — explicitly distinct
from invariants. Each pattern entry now names the underlying invariant it
realizes (reconciler / Union / mutations-wrap-reads / SIP / factorize /
stable row IDs / rank columns / policy predicates / Source).
- Pulled specific config defaults out of §VI (timeouts, memory caps);
invariant is that bounds exist, values live in docs/constants.md.
- Split §IX deny-list into "invariant violations" (high bar) and "pattern
violations" (overridable with justification).
- Added status legend: decided / open — see MR-X / aspirational. Annotated
invariants and patterns that are not yet upheld in current code.
- Updated review checklist (§X) to cover database-guarantee dimensions and
the wire-protocol / Source / patterns sections.
- Updated Living Document policy (§XI) to spell out how to revise patterns,
resolve open invariants, and lift aspirational annotations.
Source tickets: MR-737, MR-744, MR-765, MR-694 family, MR-722/MR-725.
All eight comments verified against source and applied:
- AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of
the markdown blockquote. Claude Code's @-import parser expects @ at
column 0; the leading "> " of a blockquote silently broke
recognition, so the claimed auto-include did nothing. (Cursor,
Medium severity.)
- docs/cli-reference.md: command-family count 13 → 17. The current
enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level
variants. (cubic P2.)
- docs/ci.md: Homebrew tap update is a regular `git push`, not a
force-push (release.yml:117 is `git push origin HEAD:main`). (cubic
P2.)
- docs/errors.md: add the Storage variant to the NanoError list — it
exists at error.rs:88-89 but the doc enumerated only 10 of 11.
(cubic P2.)
- docs/storage.md: clarify tombstone semantics. There is no
tombstone_version column; state.rs:180 reads the tombstone version
from the table_version column on rows where object_type =
table_tombstone. (cubic P2.)
- docs/branches-commits.md: split the GraphCommit pseudo-struct from
the underlying storage. actor_id is joined in-memory from
_graph_commit_actors.lance, not a column on _graph_commits.lance.
(cubic P2.)
- docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to
match the actual constant name in catalog/schema_ir.rs:11.
(cubic P3.)
- docs/testing.md: engine integration test count 16 → 15 (matches
`ls crates/omnigraph/tests/*.rs`). (cubic P3.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original docs/testing.md mentioned finding existing tests as step 1
of the checklist but never explicitly said "if existing coverage
already addresses your case, extend it; don't duplicate." Adds a
prominent "First principle" section that names extend-vs-new as the
preferred outcome and lists three duplicated init_and_load blocks as
the most common form of test rot.
Adds an extra checklist item: verify your change makes an *existing*
test fail before it makes a new one pass — if you can break the code
without breaking a test, that coverage gap is the bug to fix first.
Strengthens the AGENTS.md callout so the principle ("always check what
already covers it") is in scope from the top of every session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Maps the test surface (engine integration tests by area, CLI/server
tests, helpers harness, fixtures, failpoints feature, RustFS S3
integration, OpenAPI drift) and gives a before-every-task checklist:
find existing tests for the area, run them as a clean baseline, plan
the new test up front, reuse helpers, mind the layer boundary per
invariants §VII.33.
Notes that there's no coverage tooling today — coverage knowledge
comes from reading and running the relevant integration tests, not a
tarpaulin/codecov report.
Threaded into AGENTS.md as the third required-reading file alongside
invariants.md and lance.md, with a Claude-Code @-import so agents
load it on every turn.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds @docs/lance.md alongside @docs/invariants.md so the Lance index
loads on every turn (Claude Code @-import; explicit-open instruction
for other agents). Reframes the directive from "when you hit a
Lance-shaped problem" to "consult before every task to identify which
upstream pages are relevant." The Lance docs are the authoritative
source for substrate behavior, so reasoning about them should start
every change rather than be triggered conditionally.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Curates the Lance documentation site (lance.org) into a problem-domain
index so agents fetch the right page when working on Lance-touching
code instead of guessing or grepping our codebase. Organized by topic:
storage format & file layout, branching/tags/time travel, indexes
(scalar + system + vector), reads/writes, schema evolution, object
store, data types, performance, compaction, DataFusion integration,
SDK reference, plus quick-starts and the upstream AGENTS.md.
Skips ~200 irrelevant URLs from the upstream sitemap (Namespace REST
API model surface, Spark/Trino/Databricks/etc. integrations,
Python/Ray/HuggingFace docs, community pages) since omnigraph is
Rust-only and doesn't run a Lance Namespace catalog.
AGENTS.md surfaces it in the topic index and adds a directive: "when
you hit a Lance-shaped problem, consult docs/lance.md and fetch the
upstream URL before guessing."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a top-of-file directive plus a Claude-Code @-import so the full
invariants document is loaded into context on every turn, not only
when an agent follows a pointer. Other agents are instructed to open
it explicitly at session start. The §IX deny-list and §X review
checklist apply to every change, so they should be in scope by
default rather than gated on the agent remembering to look.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops four rules whose phrasing leaned on implementation specifics
(nearest LIMIT, __run__<id> branches, __schema_apply_lock__,
branch_list filter convention) — those are real constraints, but they
live at the implementation layer and would go stale if internals are
renamed or refactored. The architectural intent is captured by the
remaining six rules and by the per-area docs.
Reframes the kept rules at the survives-a-rename level: "multi-dataset
publish is atomic across the whole graph" rather than naming the
manifest table or the publisher type, etc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A standing reference for invariants that hold across storage, engine,
server, schema, indexing, observability, and the OSS/Cloud split. Used
to check RFCs and PRs against the substrate boundaries (don't rebuild
what Lance gives us), layering rules (one trait boundary per layer),
distributability constraints (Send+Sync, location-neutral IR), honesty
expectations (estimate-vs-actual, bounded failure modes), unified
patterns (reconciler, Union polymorphism, SIP, factorize), the §IX
deny-list, and the §X review checklist.
§IV (additivity / migration) and §VIII (OSS/Cloud kernel-product split)
are referenced but not yet drafted — flagged as placeholders pending
upstream fill-in.
AGENTS.md surfaces it from the topic index, the always-on rules
section, and the maintenance contract; the deny-list is also inlined
there as a fast-pass review filter so it stays in scope every turn.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the v0.3.1 feature spec (storage, schema/query languages, IR,
indexes, embeddings, branches/commits/runs, merge, server, CLI, policy,
deployment) and adds a §26 maintenance contract instructing agents to
keep this file current alongside any user-visible change. CLAUDE.md is
a symlink so there's one source of truth.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Several validators were defined but only called from a subset of write
paths, so writes that violated @unique, @range, @check, enum, or
@cardinality constraints could silently succeed and corrupt data.
Adds two new helpers in loader/mod.rs:
- validate_enum_constraints — batch-level enum check, scans Arrow
string columns (and list-of-string columns) for values outside the
declared set
- enforce_unique_constraints_intra_batch — single-batch duplicate
detection over named columns; partial enforcement (does not check
against committed rows yet — cross-batch enforcement is a separate
effort)
Wires the validators into:
- load_jsonl_reader nodes (alongside the existing
validate_value_constraints call) and edges (which had no enum or
unique check at all)
- exec/mutation.rs node insert, edge insert, and update paths
- mutation edge insert now also calls validate_edge_cardinality after
the row lands but before the manifest commit, matching the loader's
Phase 3 behavior
A new tests/validators.rs suite asserts rejection on every entry path
for invalid enum values, @range violations, intra-batch @unique
duplicates, and edge @card excesses.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: sharpen README tagline; add incident-response and compliance use cases
Lead with "lakehouse-native graph engine with git-style workflows" and a
supporting line that names the action ("branch, commit, and merge typed
graph data like source code") rather than restating capabilities.
Adds incident-response and compliance graphs to the use-case list and
fixes "multi-agentic" to "multi-agent".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: lift Capabilities above Quick Install; rename from "Omnigraph CORE"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema apply previously committed the manifest before writing the schema
source and IR contract files. A crash in that window left the manifest
pointing at the new schema while _schema.pg, _schema.ir.json, and
__schema_state.json still reflected the old one — a silent inconsistency
that subsequent reads hit as type errors.
Reorders the apply: write to staging filenames first, commit the
manifest, then atomically rename staging → final. On open, a recovery
sweep reconciles any leftover staging files against the manifest's table
set: pre-commit crashes get the staging files deleted, post-commit
crashes get the renames completed (idempotent — handles partial
renames). Property-only migrations where both schemas imply the same
table set return an operator-actionable error rather than guessing.
Adds rename_text + delete to StorageAdapter (atomic on local FS via
tokio::fs::rename; copy + delete on S3 — recovery is tolerant of the
non-atomic case). Failpoints test coverage at both crash boundaries plus
a partial-rename scenario.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop producing the omnigraph-macos-x86_64 archive in both the
stable and edge release workflows. The macos-15-intel runner
build was the slowest of the matrix and Apple Silicon is now
the default Mac developer target.
- release.yml + release-edge.yml: drop the macos-15-intel matrix entry
- install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear
"no prebuilt binary" error instead of attempting an absent download
- update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit
an arm64-only Homebrew formula. The on_macos block now declares
`depends_on arch: :arm64` so Intel `brew install` fails fast with
a clear architecture message instead of installing an arm64 binary
that errors at exec time.
Linux x86_64 build is unaffected.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep machine-local state (.claude/, .worktrees/, local
omnigraph.yaml, CLAUDE.md, and schema design notes) from
showing up as untracked in git status.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both Cursor Bugbot and Cubic flagged that the inbound `headers().get(...)`
call constructed `HeaderName::from_static("x-request-id")` inline instead
of reusing the `X_REQUEST_ID` constant defined at the top of the file.
The two were already kept in sync by both being `from_static("x-request-id")`,
but a future rename would have to touch both sites or risk silent drift
between read and write.
Also drops the now-unused `header` module import.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per-request ULID minted at the edge, exposed in request extensions and
on the response header. Caller-supplied X-Request-Id is echoed when
well-formed (1..=128 ASCII printable characters); otherwise rejected
and replaced with a fresh ULID so the value is always safe to log.
Companion to the TypeScript SDK redesign — clients now correlate logs
across the wire by reading X-Request-Id from response headers (and the
SDK already surfaces it on every OmnigraphError as `requestId`).
No spec change required; the header is a transport-layer concern.
Tests:
- mint a ULID when no header is provided
- echo a valid caller-supplied id
- reject overlong header (200 chars), mint a fresh ULID
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add operation descriptions and examples to utoipa annotations so the
generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs
and any /openapi.json docs UI benefit from the same effort.
- Doc comments on all 18 handlers (utoipa picks up summary/description)
- #[schema(example = ...)] on free-text fields (query_source,
schema_source, NDJSON data) and i64 timestamps
- Destructive/irreversible warnings on change, applySchema, ingest,
mergeBranches, deleteBranch, publishRun, abortRun
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Parallel per-type load writes + omnigraph optimize/cleanup CLI
## MR-677.3 — parallel per-type load writes
The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).
## MR-676 — `omnigraph optimize` and `omnigraph cleanup`
New CLI subcommands that walk every node + edge table in the repo:
- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
runs Lance `cleanup_old_versions` to prune historical manifests +
unique fragments. Requires `--confirm` because it's destructive.
Supports both count-based and time-based retention (or both AND'd
together). Time uses chrono `DateTime<Utc>` (added as a workspace
dep, default-features off).
Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.
Public API on `Omnigraph`:
pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
-> Result<Vec<TableCleanupStats>>
All 10 existing loader tests still pass.
Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate openapi.json
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
BFS now emits Vec<u32> dense ids directly with HashSet<u32> per-source
dedup. Only the deduped set is stringified for Lance's IN-list. The
post-hydrate alignment uses a dense-indexed Vec<Option<u32>> instead of
HashMap<&str, usize>, giving O(1) lookup without repeated string hashing.
End-to-end on the bench_expand harness (release, M-series):
query baseline after speedup
1k hop3 460.2 ms 23.7 ms 19x
10k hop2 4.21 s 139.9 ms 30x
10k hop3 40.59 s 898.5 ms 45x
30k hop2 11.71 s 490.2 ms 24x
30k hop3 197.38 s 3.22 s 61x
The cost lived in stringifying every (src,dst) pair and re-hashing the
strings during alignment; once dense ids stay dense, the BFS inner loop
and the final fan-out both collapse to integer ops.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remove vestigial code left from removed hasher variants: unused
BuildHasherDefault import, PhantomData suppression line, orphan planning
comments for Variant C/E. Also drop an unused `mut` on the PRNG closure
binding. No behavior change; compiles warning-free.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The BFS in execute_expand emits one (src_idx, dst_id) pair per edge, so
dst_id_list contains heavy duplication when multi-hop traversals revisit
the same destination nodes. hydrate_nodes then built an
"id IN ('a', 'b', ...)" filter from the full list, passing it verbatim
to Lance. On a 30k-node Person graph, a 3-hop query produced a 15.4M-
entry IN-list against a 30k-row target — 512x more entries than unique
ids.
Deduplicate before the Lance scan; the post-hydrate alignment HashMap
already fans results back out to the original (src, dst) pairs, so
output is bit-identical.
Bench numbers (crates/omnigraph/examples/bench_expand.rs, min of 2-3
runs, release build):
query before after speedup
1k hop3 460 ms 28 ms 16x
10k hop2 4.21 s 188 ms 22x
10k hop3 40.59 s 1.30 s 31x
30k hop2 11.71 s 678 ms 17x
30k hop3 197.38 s 4.86 s 41x
All existing omnigraph-engine tests pass (72/72, 0 failures).
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
GitHub Actions rejects `secrets.*` in job-level `if:` conditions at
runtime (job-level `if` is evaluated before secrets are available),
causing the workflow to abort in 0s with "workflow file issue" on
every trigger. Moving the guard into a step-level check that writes
`HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps
conditionally no-op when the tap token isn't configured.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>