mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-21 02:28:07 +02:00

Andrew Altshuler 57348cf7fa

fix(engine): preserve identifier case in filter pushdown (#283) (#285)

* test(engine): regression tests for #283 camelCase property filters

Red against current code. A query (or chained mutation) that filters on a
camelCase schema field lints and plans cleanly but fails at run time with
"No field named reponame" because the identifier's case is destroyed at the
engine->Lance boundary.

Coverage added:
- query.rs unit: ir_filter_to_expr on a camelCase property must emit an
  Expr::Column named `repoName`, not `reponame` (red); plus a green coercion
  guard that a camelCase int column still gets a coerced literal.
- mutation.rs unit: predicate_to_sql must emit the column UNQUOTED and
  case-preserved (green guard documenting the committed-scan contract).
- literal_filters.rs e2e: a camelCase @index field with an inline-binding
  pushdown filter returns the seeded row (red — read pushdown).
- writes.rs e2e: an update+delete on a camelCase predicate, and a chained
  update that re-reads the pending side of scan_with_pending by the same
  camelCase predicate (red — pending MemTable scan).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

* fix(engine): preserve identifier case in filter pushdown (#283)

Two engine->Lance boundaries lowercased camelCase column identifiers,
breaking any filter on a camelCase schema field even though the IR,
compiler, projection, and in-memory filtering all preserve case.

Read pushdown (exec/query.rs, ir_expr_to_expr): build the column reference
with datafusion::prelude::ident() instead of col(). col() routes through SQL
identifier normalization and lowercases an unquoted identifier
(`repoName` -> `reponame`); ident() builds an unqualified, case-preserved
Column. Property refs here are always bare column names, so there is no
qualified-name handling to lose. No-op for the lowercase columns that work
today.
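
As a rough illustration of the failure mode described above, the sketch below is a std-only toy model, not DataFusion's code. The helper names (`normalize_sql_ident`, `preserve_ident`, `find_field`) are hypothetical. `normalize_sql_ident` stands in for the col()-style path that folds an unquoted identifier to lowercase, and `preserve_ident` for the ident()-style path that takes the name verbatim.

```rust
/// Toy model of SQL identifier normalization (an assumption for
/// illustration, not DataFusion's implementation): an unquoted identifier
/// is folded to lowercase; a double-quoted one keeps its case and loses
/// the surrounding quotes.
fn normalize_sql_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        raw[1..raw.len() - 1].to_string()
    } else {
        raw.to_lowercase()
    }
}

/// Case-preserving construction: the property name is used exactly as given.
fn preserve_ident(raw: &str) -> String {
    raw.to_string()
}

/// Hypothetical schema lookup with an exact, case-sensitive match,
/// producing the same shape of error the bug report shows.
fn find_field<'a>(schema: &[&'a str], name: &str) -> Result<&'a str, String> {
    schema
        .iter()
        .copied()
        .find(|field| *field == name)
        .ok_or_else(|| format!("No field named {name}"))
}
```

With a schema of `["id", "repoName"]`, looking up `normalize_sql_ident("repoName")` fails because the lookup sees `reponame`, while looking up `preserve_ident("repoName")` succeeds. An all-lowercase name such as `id` resolves the same way through either helper, which matches the "no-op for lowercase columns" note above.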

Pending mutation scan (table_store.rs, scan_pending_batches): the
committed-scan consumer (Lance Scanner::filter(&str)) preserves an unquoted
identifier's case but treats a double-quoted "col" as a string literal, so
predicate_to_sql must keep the column unquoted. The pending side splices that
same unquoted predicate into a DataFusion `SELECT ... WHERE`, which would
lowercase it. Make that path case-preserving by disabling
sql_parser.enable_ident_normalization on its SessionContext rather than
quoting (quoting would match zero committed rows). predicate_to_sql gains
only a clarifying comment; its emitted string is unchanged.
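
The constraint above (one predicate string, two consumers with different quoting rules) can be sketched with a small std-only model. This is an assumption-laden illustration rather than Lance or DataFusion code: the `Consumer` enum and `column_seen` function are hypothetical names, and each arm only encodes the behavior the paragraph above describes.

```rust
/// Hypothetical stand-ins for the two consumers of the same predicate text.
#[derive(Clone, Copy)]
enum Consumer {
    /// Committed scan: keeps an unquoted identifier's case, but reads a
    /// double-quoted token as a string literal rather than a column.
    CommittedScan,
    /// Pending scan through a SQL parser, with identifier normalization
    /// either enabled or disabled.
    PendingSql { normalize: bool },
}

/// Returns the column name the consumer would look up for `token`, or
/// `None` when the consumer would not treat the token as a column at all.
fn column_seen(token: &str, consumer: Consumer) -> Option<String> {
    let quoted = token.len() >= 2 && token.starts_with('"') && token.ends_with('"');
    match consumer {
        // Quoted token is a string literal here, so no column is matched.
        Consumer::CommittedScan if quoted => None,
        Consumer::CommittedScan => Some(token.to_string()),
        // A SQL parser reads a quoted identifier verbatim, minus the quotes.
        Consumer::PendingSql { .. } if quoted => {
            Some(token[1..token.len() - 1].to_string())
        }
        // Unquoted identifier with normalization on: folded to lowercase.
        Consumer::PendingSql { normalize: true } => Some(token.to_lowercase()),
        // Unquoted identifier with normalization off: case preserved.
        Consumer::PendingSql { normalize: false } => Some(token.to_string()),
    }
}
```

In this model, the unquoted token `repoName` resolves to `repoName` on both sides only when the pending side runs with `normalize: false`. The quoted token `"repoName"` would fix the pending side but yields `None` on the committed side, which is the "matches zero committed rows" outcome that rules out quoting.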

Full engine suite green (579 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

* docs(dev): case study for #283 camelCase filter bug

Record the root cause, the two-boundary fix (read pushdown col→ident; pending
mutation scan ident-normalization off), and why the obvious symmetric
"quote the column" fix is wrong (Lance reads a double-quoted column as a string
literal and silently matches zero committed rows). Linked from a new
"Case Studies" section in the dev index so the link check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-19 18:42:56 +03:00


Developer Docs

Audience: contributors, maintainers, and coding agents

This is the contributor-facing entry point. These docs explain architecture, invariants, implementation contracts, test ownership, and upstream Lance constraints. User-facing behavior should still be documented through docs/user/index.md and the relevant public reference docs.

Required For Every Non-Trivial Change

Need	Read
Architectural rules, known gaps, deny-list	invariants.md
Upstream Lance source-of-truth index	lance.md
Existing test coverage and test placement	testing.md

Architecture And Storage

Area	Read
System structure, L1/L2 framing, component diagrams	architecture.md
On-disk layout, manifest schema, URI behavior	storage.md
Direct-publish writes, D2, staged writes, recovery sidecars	writes.md
Query execution, mutation execution, loader flow	execution.md
Index lifecycle and graph topology indexes	indexes.md
Branch and commit internals	branches-commits.md
Three-way merge implementation and conflicts	merge.md
Diff/change-feed implementation	changes.md
Branch protection policy	branch-protection.md

Language, Runtime, And Boundaries

Area	Read
Schema grammar, catalog, migration planner	schema-language.md
Query grammar, IR, lints, mutation restrictions	query-language.md
Embedding client and `@embed` integration	embeddings.md
Cedar policy surface and server gating	policy.md
Server auth, OpenAPI, endpoint handlers	server.md
Error taxonomy and serialization	errors.md
Constants and tunables	constants.md
Transaction model public contract	transactions.md

Project Operations

Area	Read
CI and release workflows	ci.md
Install and deployment packaging	install.md, deployment.md
Release history	releases/

Contribution & Governance

Area	Read
How to contribute (external)	CONTRIBUTING.md
Governance model, roles, decision authority	GOVERNANCE.md
Public contribution RFC track	rfcs/

The docs/rfcs/ track is the public, externally-authorable RFC process. The maintainer/internal RFCs below (rfc-00N-*.md) are a separate, team-owned track; don't conflate the two.

Case Studies

Worked write-ups of specific bugs — root cause, fix, and the reasoning that ruled out the tempting-but-wrong alternatives. Read these for the debugging pattern, not just the outcome.

Area	Read
camelCase property filters lowercased at runtime (#283) — two engine→Lance boundaries, two different fixes	bug-case-fix.md

Active Implementation Plans

Working documents for in-flight feature work. Removed when the work lands.

Area	Read
Schema-lint chassis v1 (MR-694) — `--allow-data-loss`, soft/hard drops	schema-lint-v1-plan.md
Inline + stored queries, request/response envelope, MCP (MR-656 / MR-976 / MR-969)	rfc-001-queries-envelope-mcp.md
Config & CLI architecture — layered config, client targeting, file naming (MR-973 / MR-974 / MR-981)	rfc-002-config-cli-architecture.md
MCP server surface — full tool parity, stored queries, modular auth (MR-969 / MR-956 / MR-974)	rfc-003-mcp-server-surface.md
Future cluster control plane — declarative as-code config, JSON state ledger, reconciler	cluster-config-specs.md, cluster-axioms.md, cluster-config-implementation-spec.md
Cluster graph & schema apply — Phase 4 sidecars, roll-forward recovery, approval artifacts	rfc-004-cluster-graph-schema-apply.md
Server boots from cluster state — Phase 5 mode switch, applied-revision serving	rfc-005-server-cluster-boot.md
Per-operator config — `~/.omnigraph/` identity, keyed credentials, named servers (the operator slice of RFC-002)	rfc-007-operator-config.md
Deprecate `omnigraph.yaml` — one concern per config surface; key-by-key migration map and staged retirement	rfc-008-deprecate-omnigraph-yaml.md
Unify CLI embedded/remote access paths — parity referee, shared wire-DTO crate, `GraphClient` trait, declared plane capabilities	rfc-009-unify-access-paths.md
Restructure the CLI around explicit planes — one graph-addressing model, declared capability surface, plane-grouped help (expands RFC-009 Phase 4)	rfc-010-cli-planes-restructure.md
CLI refactoring — one addressing & config model post-`omnigraph.yaml`: scope + `--graph` + derived access path, served-default / privileged-direct, profiles, named queries, capability classifier (completes RFC-008)	rfc-011-cli-refactoring.md
Provider-independent embedding configuration — one resolved `EmbeddingConfig` + sealed provider enum (Gemini/OpenAI/Mock), identity recorded in the schema IR, query-time same-space validation, NFR floor	rfc-012-embedding-provider-config.md

Boundary

Developer docs may mention implementation details, stale gaps, upstream Lance blockers, and review rules. User docs should not require that context unless the detail changes the public contract.
