omnigraph/docs/dev/index.md
Andrew Altshuler 57348cf7fa
fix(engine): preserve identifier case in filter pushdown (#283) (#285)
* test(engine): regression tests for #283 camelCase property filters

Red against current code. A query (or chained mutation) that filters on a
camelCase schema field lints and plans cleanly but fails at run time with
"No field named reponame" because the identifier's case is destroyed at the
engine->Lance boundary.

Coverage added:
- query.rs unit: ir_filter_to_expr on a camelCase property must emit an
  Expr::Column named `repoName`, not `reponame` (red); plus a green coercion
  guard that a camelCase int column still gets a coerced literal.
- mutation.rs unit: predicate_to_sql must emit the column UNQUOTED and
  case-preserved (green guard documenting the committed-scan contract).
- literal_filters.rs e2e: a camelCase @index field with an inline-binding
  pushdown filter returns the seeded row (red — read pushdown).
- writes.rs e2e: an update+delete on a camelCase predicate, and a chained
  update that re-reads the pending side of scan_with_pending by the same
  camelCase predicate (red — pending MemTable scan).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

* fix(engine): preserve identifier case in filter pushdown (#283)

Two engine->Lance boundaries lowercased camelCase column identifiers,
breaking any filter on a camelCase schema field even though the IR,
compiler, projection, and in-memory filtering all preserve case.

Read pushdown (exec/query.rs, ir_expr_to_expr): build the column reference
with datafusion::prelude::ident() instead of col(). col() routes through SQL
identifier normalization and lowercases an unquoted identifier
(`repoName` -> `reponame`); ident() builds an unqualified, case-preserved
Column. Property refs here are always bare column names, so there is no
qualified-name handling to lose. No-op for the lowercase columns that work
today.

Pending mutation scan (table_store.rs, scan_pending_batches): the
committed-scan consumer (Lance Scanner::filter(&str)) preserves an unquoted
identifier's case but treats a double-quoted "col" as a string literal, so
predicate_to_sql must keep the column unquoted. The pending side splices that
same unquoted predicate into a DataFusion `SELECT ... WHERE`, which would
lowercase it. Make that path case-preserving by disabling
sql_parser.enable_ident_normalization on its SessionContext rather than
quoting (quoting would match zero committed rows). predicate_to_sql gains
only a clarifying comment; its emitted string is unchanged.

Full engine suite green (579 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

* docs(dev): case study for #283 camelCase filter bug

Record the root cause, the two-boundary fix (read pushdown col→ident; pending
mutation scan ident-normalization off), and why the obvious symmetric
"quote the column" fix is wrong (Lance reads a double-quoted column as a string
literal and silently matches zero committed rows). Linked from a new
"Case Studies" section in the dev index so the link check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 18:42:56 +03:00

6.1 KiB

Developer Docs

Audience: contributors, maintainers, and coding agents

This is the contributor-facing entry point. These docs explain architecture, invariants, implementation contracts, test ownership, and upstream Lance constraints. User-facing behavior should still be documented through docs/user/index.md and the relevant public reference docs.

Required For Every Non-Trivial Change

Need Read
Architectural rules, known gaps, deny-list invariants.md
Upstream Lance source-of-truth index lance.md
Existing test coverage and test placement testing.md

Architecture And Storage

Area Read
System structure, L1/L2 framing, component diagrams architecture.md
On-disk layout, manifest schema, URI behavior storage.md
Direct-publish writes, D2, staged writes, recovery sidecars writes.md
Query execution, mutation execution, loader flow execution.md
Index lifecycle and graph topology indexes indexes.md
Branch and commit internals branches-commits.md
Three-way merge implementation and conflicts merge.md
Diff/change-feed implementation changes.md
Branch protection policy branch-protection.md

Language, Runtime, And Boundaries

Area Read
Schema grammar, catalog, migration planner schema-language.md
Query grammar, IR, lints, mutation restrictions query-language.md
Embedding client and @embed integration embeddings.md
Cedar policy surface and server gating policy.md
Server auth, OpenAPI, endpoint handlers server.md
Error taxonomy and serialization errors.md
Constants and tunables constants.md
Transaction model public contract transactions.md

Project Operations

Area Read
CI and release workflows ci.md
Install and deployment packaging install.md, deployment.md
Release history releases/

Contribution & Governance

Area Read
How to contribute (external) CONTRIBUTING.md
Governance model, roles, decision authority GOVERNANCE.md
Public contribution RFC track rfcs/

The docs/rfcs/ track is the public, externally-authorable RFC process. The maintainer/internal RFCs below (rfc-00N-*.md) are a separate, team-owned track; don't conflate the two.

Case Studies

Worked write-ups of specific bugs — root cause, fix, and the reasoning that ruled out the tempting-but-wrong alternatives. Read these for the debugging pattern, not just the outcome.

Area Read
camelCase property filters lowercased at runtime (#283) — two engine→Lance boundaries, two different fixes bug-case-fix.md

Active Implementation Plans

Working documents for in-flight feature work. Removed when the work lands.

Area Read
Schema-lint chassis v1 (MR-694) — --allow-data-loss, soft/hard drops schema-lint-v1-plan.md
Inline + stored queries, request/response envelope, MCP (MR-656 / MR-976 / MR-969) rfc-001-queries-envelope-mcp.md
Config & CLI architecture — layered config, client targeting, file naming (MR-973 / MR-974 / MR-981) rfc-002-config-cli-architecture.md
MCP server surface — full tool parity, stored queries, modular auth (MR-969 / MR-956 / MR-974) rfc-003-mcp-server-surface.md
Future cluster control plane — declarative as-code config, JSON state ledger, reconciler cluster-config-specs.md, cluster-axioms.md, cluster-config-implementation-spec.md
Cluster graph & schema apply — Phase 4 sidecars, roll-forward recovery, approval artifacts rfc-004-cluster-graph-schema-apply.md
Server boots from cluster state — Phase 5 mode switch, applied-revision serving rfc-005-server-cluster-boot.md
Per-operator config — ~/.omnigraph/ identity, keyed credentials, named servers (the operator slice of RFC-002) rfc-007-operator-config.md
Deprecate omnigraph.yaml — one concern per config surface; key-by-key migration map and staged retirement rfc-008-deprecate-omnigraph-yaml.md
Unify CLI embedded/remote access paths — parity referee, shared wire-DTO crate, GraphClient trait, declared plane capabilities rfc-009-unify-access-paths.md
Restructure the CLI around explicit planes — one graph-addressing model, declared capability surface, plane-grouped help (expands RFC-009 Phase 4) rfc-010-cli-planes-restructure.md
CLI refactoring — one addressing & config model post-omnigraph.yaml: scope + --graph + derived access path, served-default / privileged-direct, profiles, named queries, capability classifier (completes RFC-008) rfc-011-cli-refactoring.md
Provider-independent embedding configuration — one resolved EmbeddingConfig + sealed provider enum (Gemini/OpenAI/Mock), identity recorded in the schema IR, query-time same-space validation, NFR floor rfc-012-embedding-provider-config.md

Boundary

Developer docs may mention implementation details, stale gaps, upstream Lance blockers, and review rules. User docs should not require that context unless the detail changes the public contract.