Enable the Validated-tier enum tightenings. The planner now emits
ChangeEnumConstraint (instead of UnsupportedChange) for:
- narrow (remove allowed variants) → OG-MF-105
- String→enum (constrain a free String) → OG-MF-107
and apply gains a read-only pre-publish scan: for every Validated-tier
ChangeEnumConstraint it opens the affected table at the base snapshot and
runs the existing loader::validate_enum_constraints over every batch. A
row holding a now-disallowed value aborts the whole apply with the
offending value, before any staging write, manifest publish, or schema
rename — so the graph is left untouched (invariant: integrity failures
are loud, never silent data loss). The scan keys targets by table_key in
a BTreeMap for deterministic order and a single scan per type, and runs
before the per-table rewrite loop so a narrow combined with an unrelated
AddProperty aborts before the rewrite advances any HEAD.
This wires the Validated tier the lint chassis reserved but never used,
which also lays the groundwork for OG-MF-104 (nullable tightening).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a ChangeEnumConstraint migration step and detect enum value-set
deltas in the planner before the generic prop-type-change rejection.
This commit lands the Safe (metadata-only) cases:
- widen (add allowed variants): every existing row is still valid, so
it's a no-code, no-scan change.
- enum→String (loosen to a free string): every enum value is a valid
String, so likewise Safe.
Enums are stored physically as Arrow String, so these are catalog-only
changes — no table rewrite, the manifest version doesn't advance, and
apply rides the metadata-only path (handled as a no-op in the step
loop). The CLI `schema plan` renderer shows the step (with code+tier
when present). The new variant's diagnostic() resolves its attached code
so apply/render derive the tier from one source of truth.
Validated tightenings (narrow, String→enum) deliberately still fall
through to UnsupportedChange here; they're enabled in the next commit
together with the apply-time row scan, so we never accept a tightening
we can't validate against existing data.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mint two schema-lint codes for the enum-migration planner work:
- OG-MF-105 "narrow enum value set" (Validated): removing allowed enum
variants — apply scans existing rows and fails loudly on a row holding
a now-disallowed value.
- OG-MF-107 "constrain String to enum" (Validated): tightening a free
String to an enum — same validated-scan semantics.
Both are appended to ALL_CODES and EMITTED_IN_V0. The OG-MF-106 doc is
narrowed to mean a genuine scalar-type change only, now that enum
value-set deltas are split out to 105/107.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins two foundations for enum-migration work:
- Planner + integration coverage that adding a *nullable* enum property
is supported (AddProperty), backfills existing rows as NULL, and that
the enum value-set is enforced on subsequent writes. Closes the gap
where this path was exercised only implicitly via generic
nullable-property logic.
- A regression test for the metadata-only apply path (UpdateTypeMetadata
from an added @description): the manifest version does not advance,
yet the schema contract is persisted, `applied` is true, and a reopen
re-plans the same source as a no-op. Enum *widen* will ride exactly
this path, so the contract is now nailed down before building on it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* gitignore: exclude docs/internal/ from publication
Mirrors the existing "Local-only working files (not for the public
repo)" pattern. Working notes filed under docs/internal/ stay on the
contributor's machine instead of cluttering the published doc tree
or tripping the AGENTS.md / docs-index cross-link check
(scripts/check-agents-md.sh enumerates every docs/*.md and requires
each one to be linked from an audience index — internal notes don't
have an audience index by definition).
Incidental to the v0.5.0 release; lands separately from the version
bump commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: skip docs/internal/ in agents-md cross-link check
Matches the .gitignore exclusion. Mirrors the existing 'docs/releases/'
exclusion pattern: notes under docs/internal/ aren't part of the
published doc tree and don't need to be linked from an audience index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* release: v0.5.0 — Lance 6 substrate, Cedar policy engine, schema-lint v1
Bumps the workspace from 0.4.2 to 0.5.0. Release notes at
docs/releases/v0.5.0.md.
Three user-visible pillars motivate the minor bump:
1. Lance 6.0.1 substrate (DataFusion 52→53, Arrow 57→58)
2. Engine-wide Cedar policy enforcement on every _as writer; server
defaults to deny-all; signed-token-claim-only actor identity
3. Schema-lint v1 chassis: OG-XXX-NNN codes, soft drops, and
`--allow-data-loss` (Hard mode) for destructive migrations
Plus structured DataFusion Expr filter pushdown (unblocks
CompOp::Contains via array_has), HTTP allow_data_loss parity, inline
.gq sources on CLI/HTTP, optional CORS layer, and bug fixes
(merge-insert dup-rowid, branch-merge coordinator restore on error,
blob columns in branch merge).
Sites bumped:
- 5 crate [package].version lines (omnigraph, omnigraph-cli,
omnigraph-compiler, omnigraph-policy, omnigraph-server)
- 10 internal path-dep `version = "..."` constraints across the
four manifests that depend on sister crates (engine, server, cli,
plus engine's dev-dep on the compiler)
- Cargo.lock (regenerated via cargo update --workspace)
- AGENTS.md "Version surveyed:"
- openapi.json `info.version` (regenerated via
OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test
openapi)
Verification:
- cargo test --workspace --locked: 907/907 green
- cargo test -p omnigraph-engine --test failpoints --features
failpoints: 19/19 green
- cargo test -p omnigraph-engine --test lance_surface_guards: 3/3
- scripts/check-agents-md.sh: clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the second half of the dormant Drop* family. Per
docs/dev/schema-lint-v1-plan.md, commit #4 of the schema-lint chassis
v1 series (MR-694). Builds on commit #3 (PR #90, DropProperty Soft).
Planner (schema_plan.rs):
- plan_nodes leftover loop: emit DropType { Node, name, Soft }
instead of UnsupportedChange (OG-DS-102) for node-type removals.
- plan_edges leftover loop: emit DropType { Edge, name, Soft }
instead of UnsupportedChange (OG-DS-103) for edge-type removals.
Apply (schema_apply.rs):
- New dropped_tables: BTreeSet<String> accumulator alongside
added_tables / renamed_tables / rewritten_tables.
- DropType arm in the metadata loop populates dropped_tables for
Soft mode. Hard mode errors (lands in commit #5 with
--allow-data-loss).
- New tombstone-emission loop after the rename sidecar build:
for each dropped table, push to sidecar_tombstones AND populate
table_tombstones with table_version + 1. The existing manifest
publish path converts table_tombstones into ManifestChange::Tombstone
operations — no new manifest plumbing needed.
- Soft DropType has no Phase B per-table write; the tombstone is the
entire change. Lance dataset files are retained — prior __manifest
versions still reference them, so time travel + branch-from-snapshot
can read the dropped table until cleanup_old_versions runs.
- Rides on SidecarKind::SchemaApply per MR-847 (already established
by commit #3).
Tests:
- Planner unit test plan_emits_soft_drop_for_removed_node_and_edge_types
asserts both Node and Edge DropType { Soft } emission for the
Company + WorksAt combined drop, plus no UnsupportedChange.
- Integration test apply_schema_drops_node_and_referencing_edge_softly
(replaces apply_schema_rejects_dropping_a_node_type): asserts
plan emission, apply success, current manifest entries absent,
pre-drop manifest entries present (time-travel reversibility),
reopen consistency.
- Integration test apply_schema_drops_an_edge_type_softly (replaces
apply_schema_rejects_dropping_an_edge_type): single edge drop,
asserts other tables untouched, time-travel reversibility.
Test results:
- cargo test -p omnigraph-compiler --lib: 239 passed (1 new + 238)
- cargo test -p omnigraph-engine --test schema_apply: 11 passed
(2 converted + 9 unchanged)
Pending for v1 completion:
- Commit #5: --allow-data-loss CLI flag + Hard mode promotion in
planner + immediate compact_files + cleanup_old_versions for
both DropProperty and DropType.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* schema-lint chassis v1 (WIP): tier surfacing + plan doc
First commit of the chassis v1 branch. Lands a small, foundational
slice without behavior change, plus a planning doc that lays out the
remaining 7 commits in sequence so the PR can be reviewed
incrementally.
This commit:
- Adds SchemaMigrationStep::diagnostic() returning the full
&'static DiagnosticCode (family + tier + severity) for
UnsupportedChange steps with codes. Renderers can now reach the
tier without re-implementing the code → tier lookup.
- CLI `omnigraph schema plan` output now displays tier alongside
code:
unsupported change on node:Person.age [OG-DS-104, destructive]:
removing property 'Person.age' is not supported in schema
migration v1
Operators see at-a-glance the kind of risk each rejection
represents — not just the rule identifier.
- No behavior change. All 11 existing schema_apply tests still pass.
Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining
commits to bring v1 to feature-complete:
1. (this commit) Tier surfacing in plan output.
2. Soft / Hard mode enum on drop steps.
3. Tombstone fields on catalog IR.
4. Planner emits DropProperty { Soft } by default.
5. Apply path implements Soft mode.
6. Convert PR #62 destructive-rejection tests.
7. --allow-data-loss flag + Hard mode.
8. (optional) Tombstone unhide / restore command.
Delete the planning doc when v1 lands. Intentionally checked in to
the WIP branch so the scope is reviewable; not intended as a
permanent doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* schema-lint v1 commit 2: DropMode + dormant Drop* variants
Second commit of the chassis v1 branch. Lands the type-level shape
of soft/hard drops without wiring them up. Variants are reachable
from emitters but the planner doesn't produce them yet; the apply
path returns an explicit not-yet-implemented error if one shows up
via deserialization.
Added:
- `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier
classifies the rule's risk class; mode is the operator's intent
for data treatment.
- `Soft` → catalog tombstone, data retained. Tier: safe.
- `Hard` → Lance-level removal. Tier: destructive; will require
--allow-data-loss to apply (commit 7).
- `SchemaMigrationStep::DropType { type_kind, name, mode }` and
`SchemaMigrationStep::DropProperty { type_kind, type_name,
property_name, mode }` variants.
- Re-export `DropMode` from `omnigraph_compiler::DropMode` so
downstream crates don't reach into the catalog submodule.
- CLI `render_schema_plan_step` arms for both variants, surfacing
the mode in plan output: `drop property 'Person.age' of node
'Person' (soft mode)`.
- `apply_schema_with_lock` exhaustive match arm for the two new
variants that returns `manifest_internal` with a clear
not-yet-implemented message. If a SchemaIR JSON containing
Drop{Type,Property} arrives (e.g. from a future tool or hand-
written), the apply path fails explicitly rather than silently
misclassifying.
- Two new in-source tests:
- `drop_steps_round_trip_through_serde` — pins the wire shape
for all four (variant × mode) combinations.
- `drop_mode_serde_uses_snake_case` — pins external-tool-
friendly serialization (`"soft"` / `"hard"`).
Build: clean, only pre-existing warnings.
Tests:
- omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new).
- omnigraph-engine schema_apply: 11/11 (unchanged — planner still
emits UnsupportedChange for removal paths).
Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the
`tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the
catalog representation of soft-mode tombstones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* plan doc: reframe v1 around Lance native drop_columns
After a substrate audit of the Lance data-evolution guide on
2026-05-13, the v1 plan was simplified. Two key findings:
1. Lance's `drop_columns()` is already metadata-only and reversible
via time travel until cleanup. No need for a parallel
`tombstoned: bool` field in our catalog IR — Lance's version
graph IS the tombstone.
2. The full schema_apply substrate migration (add_columns,
drop_columns, alter_columns vs. stage_overwrite across all step
types) is consolidated in MR-948 as a sibling issue. v1 only
uses the relevant slice (drop_columns for OG-DS-1XX).
Net plan changes:
- Commit 3 (original): tombstone fields on catalog IR → dropped.
No catalog IR change needed. The Lance drop_columns commit IS the
tombstone.
- Commit 5 (original): apply path writes tombstoned: true → replaced
with: apply path calls Dataset::drop_columns([name]).
- Commit 7 Hard mode: stage_overwrite removing the column → replaced
with: drop_columns + compact_files + cleanup_old_versions. Same
APIs omnigraph cleanup already uses.
- Commit 8 (original): omnigraph schema unhide → dropped. Time
travel is the undo (omnigraph snapshot --at <commit>).
Net result: 8 commits → 5 commits. ~250 LoC less surface. More
substrate-aligned.
The chassis types from commit 2 (DropMode enum, DropType /
DropProperty variants) are kept exactly as designed; only the
implementation strategy changed.
The Lance docs quote is included in the doc so future readers see
the substrate behavior cited verbatim.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* schema-lint v1 commit 3: emit + apply DropProperty { Soft }
Wire the dormant DropProperty variant end-to-end for the Soft case.
Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis
v1 series (MR-694).
Planner (schema_plan.rs):
- plan_properties: emit DropProperty { type_kind, type_name,
property_name, mode: Soft } instead of UnsupportedChange when a
property exists in accepted but not in desired. Plan is now
supported = true for drop-only changes.
Apply (schema_apply.rs):
- Route DropProperty { Soft } through rewritten_tables. The existing
batch_for_schema_apply_rewrite path already iterates the *target*
schema fields, so a property absent from desired_catalog is
naturally projected away. The prior Lance version retains the
dropped column for time-travel reversibility (until cleanup runs).
- DropType still errors (lands in commit #4 with different mechanics:
__manifest entry removal instead of column projection).
- DropProperty { Hard } still errors (lands in commit #5 with
--allow-data-loss CLI flag + immediate compact_files +
cleanup_old_versions).
Tests:
- Planner unit test plan_emits_soft_drop_for_removed_nullable_property
asserts the variant emission + supported = true + no UnsupportedChange.
- Integration test apply_schema_drops_a_nullable_property_softly_
preserves_prior_version (replaces the former
apply_schema_rejects_dropping_a_property_with_data) asserts:
(a) plan contains DropProperty { Soft }
(b) apply succeeds + manifest advances + row count unchanged
(c) current dataset schema lacks the dropped column
(d) snapshot_at_version(pre_drop) still has the dropped column
(e) reopen consistency — drop preserved across engine restart
Recovery: rides on SidecarKind::SchemaApply per MR-847. No new
sidecar kind needed; the entire apply path is already sidecar-wrapped.
Substrate alignment: this commit uses the stage_overwrite full-rewrite
path (full_rewrite cost class) rather than Lance native drop_columns
(catalog_only cost class). MR-948 is the follow-up substrate-alignment
refactor that introduces a LanceColumnOp surface and switches the
metadata-only case onto drop_columns. Functional outcome is identical;
cost-class improvement deferred.
Test results:
- cargo test -p omnigraph-compiler --lib: 238 passed
- cargo test -p omnigraph-engine --test schema_apply: 11 passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: move schema-lint-v1-plan into docs/dev/ + add to index
Post-rebase fixup for the docs split (#93). The plan doc was added
to docs/ at the top level before main reorganized to docs/{user,dev}/.
This moves it into docs/dev/ and adds an entry to docs/dev/index.md
under a new "Active Implementation Plans" section so the
check-agents-md.sh link check passes.
Per the original commit message (617a77d), the plan doc is intentionally
temporary — it will be deleted when v1 lands.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN`
codes to schema-migration rejections so operators can suppress, look
up, and filter on identifiers rather than free-text prose. Atlas-style
chassis adapted to omnigraph's typed-IR substrate (no SQL injection
vector, no per-engine locks, native edge/vector/embedding types).
What's in v0:
- New `omnigraph-compiler/src/lint/` module with:
- `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten
families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF
are populated in this PR.
- `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105,
OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to
real emission sites; the other three are reserved.
- Unit tests for catalog invariants: codes unique, prefix matches
family, suffixes are 3-digit, destructive defaults to error,
lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES.
- `SchemaMigrationStep::UnsupportedChange` gains an optional
`code: Option<String>` field. New `unsupported_error_message()`
helper prefixes the message with `[code]` when present.
- 5 of 17 existing rejection paths now carry codes:
- `removing node type` → OG-DS-102
- `removing edge type` → OG-DS-103
- `removing property` → OG-DS-104
- `adding required property without backfill` → OG-MF-103
- `changing property type` → OG-MF-106
Remaining 12 paths carry `code: None` and are tagged as future work.
- `schema_apply` surfaces the formatted error (with `[code]` prefix);
CLI `omnigraph schema plan` renders the code on the
`unsupported change on <entity>` line.
- PR #62 destructive-rejection tests in `tests/schema_apply.rs` now
assert on the stable code (`msg.contains("OG-DS-104")`) instead of
the error-message substring. 11/11 tests pass.
- New `docs/schema-lint.md` documents the v0 catalog + the 10 families
+ Atlas prior art. AGENTS.md index updated.
What's explicitly NOT in v0 (subsequent PRs):
- No severity config in `omnigraph.yaml` (MR-694 §2).
- No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3).
- No `--allow-data-loss` flag or destructive-tier enforcement.
- No new `SchemaMigrationStep` variants (soft/hard drops, default,
widen/narrow). MR-700, MR-697 land those.
- No pre-migration checks (MR-941).
- No CD / VE / LK / NM family rules (MR-942..945).
- No CI integration (MR-946).
Tests: 235 compiler tests, 11 schema_apply integration tests, 14
lint module tests, 55 CLI tests — all green.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Parallel per-type load writes + omnigraph optimize/cleanup CLI
## MR-677.3 — parallel per-type load writes
The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).
## MR-676 — `omnigraph optimize` and `omnigraph cleanup`
New CLI subcommands that walk every node + edge table in the repo:
- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
runs Lance `cleanup_old_versions` to prune historical manifests +
unique fragments. Requires `--confirm` because it's destructive.
Supports both count-based and time-based retention (or both AND'd
together). Time uses chrono `DateTime<Utc>` (added as a workspace
dep, default-features off).
Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.
Public API on `Omnigraph`:
pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
-> Result<Vec<TableCleanupStats>>
All 10 existing loader tests still pass.
Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate openapi.json
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Files where inline tests crowded out production code (test/prod ratio
≥ 0.8) move to sibling files via `#[path]`. Files where production
dominates (query_input.rs, schema_plan.rs) stay inline — extracting
would add noise, not reduce it.
- ir/lower.rs: 1239 → 577 lines (ratio 1.15)
- catalog/mod.rs: 594 → 326 lines (ratio 0.83)
- query/lint.rs: 562 → 314 lines (ratio 0.80)
catalog/tests.rs uses the shorter name since it's inside a module
directory (no ambiguity with filename).
All 229 compiler tests green, identical count to before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
typecheck.rs, schema/parser.rs, and query/parser.rs each had
~1000-line inline `mod tests` blocks that overshadowed the production
code in the file. Move each to a sibling `*_tests.rs` using
`#[path = "..."] mod tests;`.
- typecheck.rs: 2865 → 1708 lines; typecheck_tests.rs: 1156 lines
- schema/parser.rs: 1950 → 994 lines; parser_tests.rs: 955 lines
- query/parser.rs: 1737 → 803 lines; parser_tests.rs: 933 lines
No visibility change — the sibling module still has `use super::*`
access to crate-privates. No semantic edits beyond de-indenting by
4 spaces (mechanical). All 229 compiler tests green, identical
count to before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unit tests covering gaps identified by systematic matrix of:
topology (fan-out, fan-in, cycle) × deferral × filter type × direction.
New unit tests:
- fan-out: one root fans to two deferred destinations via different edges
- fan-in: two sources converge on one destination via reverse expand
- cycle: deferred binding + genuine cycle-closing on return edge
- multiple filters on single deferred binding (name + age)
- param filter on deferred binding (IRExpr::Param in dst_filters)
- negation with inner binding (documents current NodeScan+cycle-close behavior)
New integration tests:
- fan-out projection (friend × company cross-product per source)
- deferred filter matching nothing (empty result propagation)
- negation with inner destination binding filter
Also: guard anti-join fast path against non-empty dst_filters. The bulk
CSR existence check only tests neighbor existence, not destination
properties — it must fall back to the slow path when dst_filters are
present to avoid false negatives.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The anonymous wildcard variable _ was included as a regular node in the
undirected adjacency graph used for component analysis. When multiple
traversals referenced $_, it falsely bridged otherwise-independent
components, causing bindings in separate components to be deferred.
The deferred binding would never be introduced (since _ is never added
to bound_vars), leading to silently dropped traversals.
Fix: skip edges involving _ when building the adjacency graph.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The retain-based loop swallowed catalog.lookup_edge_by_name errors by
keeping the traversal for the next pass, where it could never succeed.
This caused the no-progress break to fire, silently dropping the
traversal and producing incorrect query results with missing joins.
Replaced retain with a manual for-loop that propagates errors via ?.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The iterative lowering now handles traversals declared in non-topological
order (e.g. `$b worksAt $c` before `$a knows $b`). Each pass processes
traversals that have at least one bound endpoint, repeating until all are
consumed. Caught during self-review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The IR lowering previously emitted independent NodeScans for every binding
in a match clause, even when bindings were connected by traversals. This
created O(N×M) cross-joins followed by cycle-closing filters — correct but
extremely slow for large datasets.
Two changes fix this by design:
1. **Deferred bindings** — When multiple bindings are connected by
traversals, only the first-declared binding gets a NodeScan. The rest
are introduced by Expand operations, eliminating cross-joins entirely.
2. **Filter fusion into Expand** — Deferred binding filters are attached
directly to IROp::Expand (new `dst_filters` field) and pushed into
Lance SQL during hydrate_nodes(), so the storage layer skips
non-matching rows. Non-pushable filters (list-contains, FTS) fall back
to in-memory application after hconcat.
For a query like:
match { $p: Person $p worksAt $c $c: Company { name: "Acme" } }
Old plan: NodeScan($p) → NodeScan($c) → cross-join → Expand(__temp) → cycle-close
New plan: NodeScan($p) → Expand($p→$c, Lance SQL: id IN (...) AND name='Acme')
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The early return at line 273 for None/Value::Null params was skipping
the null-fill loop, leaving declared nullable params absent from the
map. Downstream code would then error with "parameter not provided".
https://claude.ai/code/session_014oGFKL7EVg1b2cyPgt9Gne
Parameters declared with `?` (e.g. `$changelogUrl: String?`) now correctly
accept omission or explicit null in JSON input instead of requiring empty
strings as a workaround. Adds `Literal::Null` variant and threads it through
parameter parsing, type-checking, and Arrow array conversion.
https://claude.ai/code/session_014oGFKL7EVg1b2cyPgt9Gne
Add runtime support for aggregate functions (count, sum, avg, min, max)
with GROUP BY semantics, built on a single wide RecordBatch that
eliminates correlation tracking by construction.
Execution engine (exec/query.rs):
- Replace HashMap<String, RecordBatch> with Option<RecordBatch> where
columns are prefixed as <variable>.<property>
- NodeScan prefixes columns and cross-joins with existing batch
- Expand collects (src_row, dst_id) pairs, takes wide batch rows,
appends prefixed destination columns via hconcat
- Filter applies single mask to entire wide batch
- AntiJoin: fast-path returns BooleanArray mask; slow-path slices
one row for inner pipeline execution
Projection engine (exec/projection.rs):
- aggregate_return groups rows by non-aggregate key columns using
length-prefixed string encoding, computes per-group aggregates
- SUM accumulates into f64 to avoid integer overflow
- MIN/MAX support both numeric and string types
- Empty input returns count=0, others=null
Compiler (typecheck.rs):
- T8: split MIN/MAX from SUM/AVG — allow string arguments
- T9: non-aggregate expressions in aggregate queries must be
property accesses or variables
- SUM type inference returns Float64 (matching runtime)
Tests: 8 new integration tests covering grouped count, global count,
sum/avg/min/max per company, aggregate+order+limit, string min/max,
multi-hop aggregates, and edge cases.
https://claude.ai/code/session_019o5NRyYomgETFyd7hpiLey
Allow mutation queries to contain multiple sequential statements that
execute atomically within a single transactional run. This enables
patterns like inserting a node and its edges in one query:
query add_and_link($name: String, $age: I32, $friend: String) {
insert Person { name: $name, age: $age }
insert Knows { from: $name, to: $friend }
}
Changes span the full compiler-to-execution pipeline:
- Grammar: mutation_body = { mutation_stmt+ }
- AST: QueryDecl.mutations: Vec<Mutation>
- IR: MutationIR.ops: Vec<MutationOpIR>
- Execution: loop over ops, accumulate affected counts
Cross-statement visibility works because each statement's commit_updates
advances the manifest state, so subsequent statements see prior writes.
Atomicity comes from the existing run mechanism (begin_run/publish_run).
https://claude.ai/code/session_01E4VG2WXrZW8aeXFiqr8NwF