mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types (#160)
* fix(unique): collision-free tuple key shared by intake and merge, loud on un-keyable types Hardening on top of #133. That PR introduced a shared `loader::composite_unique_key(parts)` joining per-column scalars with U+001F and routed both intake and branch-merge through it, closing the original '|' vs U+001F separator drift. This takes the shared keying the rest of the way to correct-by-design: - Collision-free by construction: the key is now the tuple of per-column scalar strings (Vec<String>) keyed directly, no separator, so no data value (not even a literal U+001F) can forge a collision. - One scalar converter across both paths: intake used an explicit type-match, merge used Arrow's array_value_to_string. Both now derive the key through composite_unique_key(group_columns, row), so they can't drift on conversion. - Loud on un-keyable types: the scalar converter returned None for any Arrow type it didn't recognize, and the caller treated None as null-exempt, so a @unique on a column type it couldn't reduce (list, blob) was silently un-enforced. It now returns Err, surfacing the constraint it can't enforce instead of weakening it in silence. Tests: - consistency::composite_unique_key_is_consistent_across_intake_and_merge pins that intake and merge key the tuple identically (load-on-branch then merge of values containing '|'). - loader unit tests pin tuple keying + null exemption and the loud error on an un-keyable (binary) column. Docs: invariants truth-matrix updated; stale loader/mod.rs line pointers fixed. Scope unchanged: intra-batch / merge-candidate-set only; cross-version uniqueness against committed rows stays a documented gap. * fix(unique): cover all string encodings; make format_tuple private (PR #160 review) Addresses two Greptile P2 comments on PR #160: - unique_key_scalar handled only StringArray (Utf8). The loud-on-unknown-type behavior turned any legal string column that read back as LargeUtf8 or Utf8View into a hard write failure (the old code silently returned None). Add LargeStringArray and StringViewArray arms so a legal string column is keyable in every physical Arrow encoding; the Err path now fires only for a genuinely un-keyable logical type (list/blob/vector), never a legal value in an unenumerated encoding. - format_tuple was pub(crate) but only used within loader/mod.rs; make it a private fn (matches the old format_unique_columns it replaced, minimal exposed surface). New unit test unique_key_scalar_handles_all_string_encodings pins that Utf8 / LargeUtf8 / Utf8View all render rather than error.
This commit is contained in:
parent
dbfdddc952
commit
e0d88d1295
5 changed files with 235 additions and 73 deletions
|
|
@ -101,7 +101,7 @@ Use it this way:
|
|||
| Deletes | Inline-commit residual; delete-only queries allowed, mixed insert/update/delete rejected by D2 | [query-language.md](../user/query-language.md), [writes.md](writes.md) |
|
||||
| Branch delete | Manifest is the single authority, flipped atomically first; per-table forks + commit-graph branch are derived state, reclaimed best-effort (`force_delete_branch`) with the `cleanup` reconciler as the guaranteed backstop. Reusing a name whose reclaim failed before `cleanup` surfaces an actionable error | [branches-commits.md](../user/branches-commits.md), [maintenance.md](../user/maintenance.md) |
|
||||
| Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema-language.md), [execution.md](execution.md) |
|
||||
| Unique constraints | Intra-batch and write-path checks exist; full cross-version uniqueness is still a gap | [schema-language.md](../user/schema-language.md) |
|
||||
| Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec<String>` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema-language.md) |
|
||||
| Storage trait | `TableStorage` exists as the sealed staged-write surface; full call-site migration and capability/stat surfaces are incomplete | [writes.md](writes.md), [architecture.md](architecture.md) |
|
||||
| Index lifecycle | `ensure_indices` is explicit today; reconciler-based convergence is roadmap | [indexes.md](../user/indexes.md), [maintenance.md](../user/maintenance.md) |
|
||||
| Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/query-language.md) |
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue