vestige/crates
Sam Valladares 988a31c207 fix(search+demo): rotation-audit fixes — FTS tokenizer match, honest demo labels
3-model rotation audit (DeepSeek V4-Pro / Kimi K2.7 / MiniMax M3, max thinking,
each model × each of 3 sections). Claude verified every finding against code.

CONFIRMED + FIXED:
- [FTS, consensus DeepSeek+MiniMax] sanitize_fts5_or_query split only on chars
  that were neither is_alphanumeric() nor '_', but the index uses
  tokenize='porter ascii', which splits on '_' and non-ASCII. So "API_TIMEOUT"
  and "café" became single phrases that could NEVER match. It now splits on
  !is_ascii_alphanumeric() and lowercases to mirror the tokenizer, and caps
  token count (64) and token length (64) for DoS hardening. This also fixes the
  pre-existing storage.search bug (multi-word queries silently returned
  nothing). 5 new tests pin the behavior.
- [Demo honesty, consensus Kimi+DeepSeek] the demo's contrast output labeled
  keyword search as "SIMILARITY SEARCH" and asserted "NONE of these is the
  cause" universally. It now prints the REAL engine ("keyword (BM25)" vs
  "semantic (vector + BM25 hybrid)") and claims only what's true ("ranked by
  RESEMBLANCE; its top hit is a lookalike"). Replaced the hardcoded
  "Service crashed:" munging with a generic label-strip.
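
For illustration, the splitting rule from the FTS bullet above might look roughly
like the sketch below. This is not the code from this commit: the function name
`sanitize_fts5_or_query_sketch`, the `Option<String>` return type, the choice to
truncate (rather than drop) overlong tokens, and joining quoted tokens with `OR`
are all assumptions. It follows the commit's description of the tokenizer; how
`porter ascii` actually treats non-ASCII characters is worth checking against
the SQLite FTS5 docs.

```rust
// Caps quoted in the commit message; the constant names are hypothetical.
const MAX_TOKENS: usize = 64;
const MAX_TOKEN_LEN: usize = 64;

/// Sketch: turn free-form user input into an FTS5 OR-query.
/// Returns None when no usable token survives (assumed behavior).
fn sanitize_fts5_or_query_sketch(input: &str) -> Option<String> {
    let tokens: Vec<String> = input
        // Split on anything that is not an ASCII letter or digit,
        // so '_' and non-ASCII characters act as separators.
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        // Bound the number of tokens (DoS hardening).
        .take(MAX_TOKENS)
        .map(|t| {
            // Tokens are pure ASCII here, so byte slicing cannot
            // land inside a multi-byte character.
            let t = &t[..t.len().min(MAX_TOKEN_LEN)];
            // Quote each token so FTS5 never parses it as an operator.
            format!("\"{}\"", t.to_ascii_lowercase())
        })
        .collect();

    if tokens.is_empty() {
        None
    } else {
        Some(tokens.join(" OR "))
    }
}
```

Under these assumptions, "API_TIMEOUT" becomes `"api" OR "timeout"` and an input
with no ASCII alphanumerics yields `None`. Because every surviving token contains
only ASCII letters and digits, no embedded quote can reach the query string.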

VERIFIED FALSE POSITIVE (not changed): MiniMax "fts.id non-existent column" —
the FTS5 table is declared `fts5(id, content, tags, ...)`, the JOIN is valid.
No injection found by any model (quote-doubling + operator-stripping confirmed safe).

clippy clean; 527 core + 453 mcp tests pass; demo verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 18:05:01 -05:00
vestige-core fix(search+demo): rotation-audit fixes — FTS tokenizer match, honest demo labels 2026-06-27 18:05:01 -05:00
vestige-mcp fix(search+demo): rotation-audit fixes — FTS tokenizer match, honest demo labels 2026-06-27 18:05:01 -05:00