omnigraph/docs/dev
Ragnor Comerford e0d88d1295
fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types (#160)
* fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types

Hardening on top of #133. That PR introduced a shared
`loader::composite_unique_key(parts)` joining per-column scalars with U+001F
and routed both intake and branch-merge through it, closing the original
'|' vs U+001F separator drift. This takes the shared keying the rest of the
way to correct-by-design:

- Collision-free by construction: the key is now the tuple of per-column
  scalar strings (Vec<String>) keyed directly, no separator, so no data value
  (not even a literal U+001F) can forge a collision.
- One scalar converter across both paths: intake used an explicit type-match,
  merge used Arrow's array_value_to_string. Both now derive the key through
  composite_unique_key(group_columns, row), so they can't drift on conversion.
- Loud on un-keyable types: the scalar converter returned None for any Arrow
  type it didn't recognize, and the caller treated None as null-exempt, so a
  @unique on a column type it couldn't reduce (list, blob) was silently
  un-enforced. It now returns Err, surfacing the constraint it can't enforce
  instead of weakening it in silence.

Tests:
- consistency::composite_unique_key_is_consistent_across_intake_and_merge pins
  that intake and merge key the tuple identically (load-on-branch then merge
  of values containing '|').
- loader unit tests pin tuple keying + null exemption and the loud error on an
  un-keyable (binary) column.

Docs: invariants truth-matrix updated; stale loader/mod.rs line pointers fixed.
Scope unchanged: intra-batch / merge-candidate-set only; cross-version
uniqueness against committed rows stays a documented gap.

* fix(unique): cover all string encodings; make format_tuple private (PR #160 review)

Addresses two Greptile P2 comments on PR #160:

- unique_key_scalar handled only StringArray (Utf8). The loud-on-unknown-type
  behavior turned any legal string column that read back as LargeUtf8 or
  Utf8View into a hard write failure (the old code silently returned None). Add
  LargeStringArray and StringViewArray arms so a legal string column is keyable
  in every physical Arrow encoding; the Err path now fires only for a genuinely
  un-keyable logical type (list/blob/vector), never a legal value in an
  unenumerated encoding.
- format_tuple was pub(crate) but only used within loader/mod.rs; make it a
  private fn (matches the old format_unique_columns it replaced, minimal
  exposed surface).

New unit test unique_key_scalar_handles_all_string_encodings pins that Utf8 /
LargeUtf8 / Utf8View all render rather than error.
2026-06-09 19:28:21 +02:00
..
architecture.md docs: rename runs.md/runs.rs → writes and repoint all references (#131) 2026-05-30 23:20:56 +02:00
branch-protection.md ci(codeowners): un-trap required checks, auto-render, generate owner tables (#142) 2026-06-06 18:09:47 +03:00
ci.md Add Windows release binaries (#127) 2026-05-30 14:23:40 +02:00
cluster-axioms.md docs: add cluster config specs 2026-06-08 17:31:36 +03:00
cluster-config-implementation-spec.md docs: add cluster config specs 2026-06-08 17:31:36 +03:00
cluster-config-specs.md docs: add cluster config specs 2026-06-08 17:31:36 +03:00
codeowners.md ci(codeowners): add aaltshuler to engineering role (#147) 2026-06-07 18:05:01 +03:00
execution.md docs: rename runs.md/runs.rs → writes and repoint all references (#131) 2026-05-30 23:20:56 +02:00
index.md docs: add cluster config specs 2026-06-08 17:31:36 +03:00
invariants.md fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types (#160) 2026-06-09 19:28:21 +02:00
lance.md fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) 2026-06-02 17:12:00 +02:00
merge.md docs: split user and developer docs (#93) 2026-05-15 03:45:22 +03:00
rfc-001-queries-envelope-mcp.md feat: inline query strings in CLI and HTTP server (#110) 2026-05-29 13:41:54 +02:00
rfc-002-config-cli-architecture.md Stored-query registry foundation + config/CLI RFC-002 (#128) 2026-06-01 22:50:31 +02:00
rfc-003-mcp-server-surface.md Stored-query registry foundation + config/CLI RFC-002 (#128) 2026-06-01 22:50:31 +02:00
schema-lint-v1-plan.md schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90) 2026-05-16 16:30:03 +03:00
testing.md Merge origin/main into cluster-config-docs 2026-06-09 18:11:12 +03:00
writes.md fix: optimize publishes compaction; recovery roll-back converges manifest (#141) 2026-06-08 02:50:12 +03:00