omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-18 02:24:27 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	c142dafdf3	schema-lint chassis v0: code-tagged diagnostics (MR-694) (#87)
First slice of the schema-lint chassis. Adds stable `OG-XXX-NNN` codes to schema-migration rejections so operators can suppress, look up, and filter on identifiers rather than free-text prose. Atlas-style chassis adapted to omnigraph's typed-IR substrate (no SQL injection vector, no per-engine locks, native edge/vector/embedding types).
What's in v0:
- New `omnigraph-compiler/src/lint/` module with:
  - `diagnostic.rs` — Family / SafetyTier / Severity enums covering ten families (DS, MF, CD, BC, NM, OW, NL, VE, ED, LK). Only DS and MF are populated in this PR.
  - `codes.rs` — 8 DiagnosticCode constants (OG-DS-101..105, OG-MF-103, OG-MF-104, OG-MF-106). Five of the eight are wired to real emission sites; the other three are reserved.
  - Unit tests for catalog invariants: codes unique, prefix matches family, suffixes are 3-digit, destructive defaults to error, lookup() works, EMITTED_IN_V0 codes exist in ALL_CODES.
- `SchemaMigrationStep::UnsupportedChange` gains an optional `code: Option<String>` field. New `unsupported_error_message()` helper prefixes the message with `[code]` when present.
- 5 of 17 existing rejection paths now carry codes:
  - `removing node type` → OG-DS-102
  - `removing edge type` → OG-DS-103
  - `removing property` → OG-DS-104
  - `adding required property without backfill` → OG-MF-103
  - `changing property type` → OG-MF-106
  Remaining 12 paths carry `code: None` and are tagged as future work.
- `schema_apply` surfaces the formatted error (with `[code]` prefix); CLI `omnigraph schema plan` renders the code on the `unsupported change on <entity>` line.
- PR #62 destructive-rejection tests in `tests/schema_apply.rs` now assert on the stable code (`msg.contains("OG-DS-104")`) instead of the error-message substring. 11/11 tests pass.
- New `docs/schema-lint.md` documents the v0 catalog + the 10 families + Atlas prior art. AGENTS.md index updated.
What's explicitly NOT in v0 (subsequent PRs):
- No severity config in `omnigraph.yaml` (MR-694 §2).
- No `@allow(OG-XXX-NNN, "rationale")` suppression directive (§3).
- No `--allow-data-loss` flag or destructive-tier enforcement.
- No new `SchemaMigrationStep` variants (soft/hard drops, default, widen/narrow). MR-700, MR-697 land those.
- No pre-migration checks (MR-941).
- No CD / VE / LK / NM family rules (MR-942..945).
- No CI integration (MR-946).
Tests: 235 compiler tests, 11 schema_apply integration tests, 14 lint module tests, 55 CLI tests — all green.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:08:18 +03:00
andrew	54101f7e2c	Extract remaining crowded compiler test modules
Files where inline tests crowded out production code (test/prod ratio ≥ 0.8) move to sibling files via `#[path]`. Files where production dominates (query_input.rs, schema_plan.rs) stay inline — extracting would add noise, not reduce it.
- ir/lower.rs: 1239 → 577 lines (ratio 1.15)
- catalog/mod.rs: 594 → 326 lines (ratio 0.83)
- query/lint.rs: 562 → 314 lines (ratio 0.80)
catalog/tests.rs uses the shorter name since it's inside a module directory (no ambiguity with filename).
All 229 compiler tests green, identical count to before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 22:49:09 +03:00
andrew	94849a50b4	Extract compiler test modules to sibling files
typecheck.rs, schema/parser.rs, and query/parser.rs each had ~1000-line inline `mod tests` blocks that overshadowed the production code in the file. Move each to a sibling `_tests.rs` using `#[path = "..."] mod tests;`.
- typecheck.rs: 2865 → 1708 lines; typecheck_tests.rs: 1156 lines
- schema/parser.rs: 1950 → 994 lines; parser_tests.rs: 955 lines
- query/parser.rs: 1737 → 803 lines; parser_tests.rs: 933 lines
No visibility change — the sibling module still has `use super::` access to crate-privates. No semantic edits beyond de-indenting by 4 spaces (mechanical).
All 229 compiler tests green, identical count to before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:50:18 +03:00
Ragnor Comerford	063be3ddc7	Merge pull request #16 from ModernRelay/tin-epoch Fix join alignment for traversal-introduced bindings	2026-04-13 16:54:52 +02:00
Ragnor Comerford	6e43ceac08	Add comprehensive tests from morphological matrix analysis
Unit tests covering gaps identified by systematic matrix of: topology (fan-out, fan-in, cycle) × deferral × filter type × direction.
New unit tests:
- fan-out: one root fans to two deferred destinations via different edges
- fan-in: two sources converge on one destination via reverse expand
- cycle: deferred binding + genuine cycle-closing on return edge
- multiple filters on single deferred binding (name + age)
- param filter on deferred binding (IRExpr::Param in dst_filters)
- negation with inner binding (documents current NodeScan+cycle-close behavior)
New integration tests:
- fan-out projection (friend × company cross-product per source)
- deferred filter matching nothing (empty result propagation)
- negation with inner destination binding filter
Also: guard anti-join fast path against non-empty dst_filters. The bulk CSR existence check only tests neighbor existence, not destination properties — it must fall back to the slow path when dst_filters are present to avoid false negatives.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:31:08 +02:00
Ragnor Comerford	3461aa123d	Fix: exclude wildcard $_ from traversal adjacency graph The anonymous wildcard variable _ was included as a regular node in the undirected adjacency graph used for component analysis. When multiple traversals referenced $_, it falsely bridged otherwise-independent components, causing bindings in separate components to be deferred. The deferred binding would never be introduced (since _ is never added to bound_vars), leading to silently dropped traversals. Fix: skip edges involving _ when building the adjacency graph. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:11:17 +02:00
Ragnor Comerford	fabd65b08a	Fix: propagate edge-lookup errors from iterative traversal loop The retain-based loop swallowed catalog.lookup_edge_by_name errors by keeping the traversal for the next pass, where it could never succeed. This caused the no-progress break to fire, silently dropping the traversal and producing incorrect query results with missing joins. Replaced retain with a manual for-loop that propagates errors via ?. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 13:40:22 +02:00
Ragnor Comerford	88384476be	Fix traversal ordering: process in dependency order, not declaration order The iterative lowering now handles traversals declared in non-topological order (e.g. `$b worksAt $c` before `$a knows $b`). Each pass processes traversals that have at least one bound endpoint, repeating until all are consumed. Caught during self-review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:16:45 +02:00
Ragnor Comerford	853691c70e	Fix join alignment for traversal-introduced bindings with Lance filter pushdown
The IR lowering previously emitted independent NodeScans for every binding in a match clause, even when bindings were connected by traversals. This created O(N×M) cross-joins followed by cycle-closing filters — correct but extremely slow for large datasets.
Two changes fix this by design:
1. Deferred bindings — When multiple bindings are connected by traversals, only the first-declared binding gets a NodeScan. The rest are introduced by Expand operations, eliminating cross-joins entirely.
2. Filter fusion into Expand — Deferred binding filters are attached directly to IROp::Expand (new `dst_filters` field) and pushed into Lance SQL during hydrate_nodes(), so the storage layer skips non-matching rows. Non-pushable filters (list-contains, FTS) fall back to in-memory application after hconcat.
For a query like:
    match {
        $p: Person
        $p worksAt $c
        $c: Company { name: "Acme" }
    }
Old plan: NodeScan($p) → NodeScan($c) → cross-join → Expand(__temp) → cycle-close
New plan: NodeScan($p) → Expand($p→$c, Lance SQL: id IN (...) AND name='Acme')
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:10:50 +02:00
Claude	c943d97744	Fix null-fill for nullable params when params JSON is None/null The early return at line 273 for None/Value::Null params was skipping the null-fill loop, leaving declared nullable params absent from the map. Downstream code would then error with "parameter not provided". https://claude.ai/code/session_014oGFKL7EVg1b2cyPgt9Gne	2026-04-13 09:37:17 +00:00
Claude	37b7a94eb7	Fix nullable query parameters: accept omission and null for `?` params Parameters declared with `?` (e.g. `$changelogUrl: String?`) now correctly accept omission or explicit null in JSON input instead of requiring empty strings as a workaround. Adds `Literal::Null` variant and threads it through parameter parsing, type-checking, and Arrow array conversion. https://claude.ai/code/session_014oGFKL7EVg1b2cyPgt9Gne	2026-04-13 08:43:48 +00:00
Ragnor Comerford	c5a88cacb5	Merge pull request #6 from ModernRelay/claude/omnigraph-aggregates-a53rG Implement aggregate functions with GROUP BY support	2026-04-13 10:26:07 +02:00
andrew	1bf55fa52d	Add query lint and check commands	2026-04-13 00:37:44 +03:00
Claude	351610d18c	Implement aggregate execution with wide-batch model
Add runtime support for aggregate functions (count, sum, avg, min, max) with GROUP BY semantics, built on a single wide RecordBatch that eliminates correlation tracking by construction.
Execution engine (exec/query.rs):
- Replace HashMap<String, RecordBatch> with Option<RecordBatch> where columns are prefixed as <variable>.<property>
- NodeScan prefixes columns and cross-joins with existing batch
- Expand collects (src_row, dst_id) pairs, takes wide batch rows, appends prefixed destination columns via hconcat
- Filter applies single mask to entire wide batch
- AntiJoin: fast-path returns BooleanArray mask; slow-path slices one row for inner pipeline execution
Projection engine (exec/projection.rs):
- aggregate_return groups rows by non-aggregate key columns using length-prefixed string encoding, computes per-group aggregates
- SUM accumulates into f64 to avoid integer overflow
- MIN/MAX support both numeric and string types
- Empty input returns count=0, others=null
Compiler (typecheck.rs):
- T8: split MIN/MAX from SUM/AVG — allow string arguments
- T9: non-aggregate expressions in aggregate queries must be property accesses or variables
- SUM type inference returns Float64 (matching runtime)
Tests: 8 new integration tests covering grouped count, global count, sum/avg/min/max per company, aggregate+order+limit, string min/max, multi-hop aggregates, and edge cases.
https://claude.ai/code/session_019o5NRyYomgETFyd7hpiLey	2026-04-12 20:59:13 +00:00
Claude	d10f78530f	Support multi-statement mutations (insert + edge in one query)
Allow mutation queries to contain multiple sequential statements that execute atomically within a single transactional run. This enables patterns like inserting a node and its edges in one query:
    query add_and_link($name: String, $age: I32, $friend: String) {
        insert Person { name: $name, age: $age }
        insert Knows { from: $name, to: $friend }
    }
Changes span the full compiler-to-execution pipeline:
- Grammar: mutation_body = { mutation_stmt+ }
- AST: QueryDecl.mutations: Vec<Mutation>
- IR: MutationIR.ops: Vec<MutationOpIR>
- Execution: loop over ops, accumulate affected counts
Cross-statement visibility works because each statement's commit_updates advances the manifest state, so subsequent statements see prior writes. Atomicity comes from the existing run mechanism (begin_run/publish_run).
https://claude.ai/code/session_01E4VG2WXrZW8aeXFiqr8NwF	2026-04-11 20:27:51 +00:00
andrew	338289656a	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00

16 commits
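
Illustrative note on commits 88384476be and fabd65b08a: the messages describe lowering traversals in dependency order (repeated passes over the pending traversals, each pass consuming those with at least one bound endpoint) and propagating edge-lookup errors with `?` instead of retrying them on a later pass. The sketch below shows that idea only; it is not the repository's code. `Traversal`, `lookup_edge`, and `order_traversals` are hypothetical names, the catalog is reduced to a slice of known edge names, and returning an error when a pass makes no progress is an assumption of this sketch.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a traversal pattern such as `$a knows $b`.
#[derive(Debug, Clone, PartialEq)]
struct Traversal {
    src: String,
    edge: String,
    dst: String,
}

/// Hypothetical catalog lookup: fails for edge names the catalog does not know.
fn lookup_edge(known_edges: &[&str], name: &str) -> Result<(), String> {
    if known_edges.contains(&name) {
        Ok(())
    } else {
        Err(format!("unknown edge type: {name}"))
    }
}

/// Returns the traversals in an order where each one has at least one
/// already-bound endpoint at the time it is processed.
fn order_traversals(
    root: &str,
    traversals: &[Traversal],
    known_edges: &[&str],
) -> Result<Vec<Traversal>, String> {
    let mut bound: HashSet<String> = HashSet::from([root.to_string()]);
    let mut pending: Vec<Traversal> = traversals.to_vec();
    let mut ordered = Vec::new();

    while !pending.is_empty() {
        let mut deferred = Vec::new();
        let mut progressed = false;

        // A manual loop (rather than `retain`) lets `?` return lookup errors
        // to the caller instead of keeping the traversal for another pass.
        for t in pending {
            if bound.contains(&t.src) || bound.contains(&t.dst) {
                lookup_edge(known_edges, &t.edge)?;
                bound.insert(t.src.clone());
                bound.insert(t.dst.clone());
                ordered.push(t);
                progressed = true;
            } else {
                deferred.push(t);
            }
        }

        // Assumption of this sketch: a pass without progress is an error.
        if !progressed {
            return Err(format!(
                "{} traversal(s) not reachable from bound variables",
                deferred.len()
            ));
        }
        pending = deferred;
    }
    Ok(ordered)
}
```

With `$a` as the only bound variable, passing `$b worksAt $c` followed by `$a knows $b` yields the two traversals in the opposite order, and an edge name missing from the catalog surfaces as an `Err` rather than a silently dropped traversal.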