vestige

apunkt/vestige


mirror of https://github.com/samvallad33/vestige.git synced 2026-07-02 22:01:01 +02:00

Commit graph

Author

SHA1

Message

Date

Sam Valladares

d0a403111e

docs(demo): 60-second funding demo script — the category-error pitch

A separate, investor-grade script (vs the viral clip): leads with the thesis
"the entire AI-memory industry is trapped in a category error — a root cause
never looks like the bug it creates", proves it live with the similarity-vs-
Postdict contrast, then closes on the moat (faithful Nature port + incumbents'
architecture IS the category error) and the market (every agent that touches
production).

Verified the exact on-screen commands end-to-end: the 3-memory scenario (cause
+ billing-500 lookalike + crash) makes the contrast devastating — similarity
search returns the billing lookalike as its #1, Postdict reaches back 3 days to
the real env-var cause. (Without the lookalike distractor the contrast collapses
— similarity would also surface the cause — so the script plants all three.)
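Here is a toy illustration of why ranking by resemblance favors the lookalike. It does not model the project's engine. Jaccard word overlap stands in for real similarity search, Postdict itself is not modeled, and the three memory texts and the function names (`words`, `jaccard`, `rank_by_similarity`) are hypothetical.

```rust
use std::collections::HashSet;

// Lowercased ASCII-alphanumeric word set: a crude stand-in for a real
// similarity index, used only to illustrate ranking by resemblance.
fn words(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_ascii_lowercase())
        .collect()
}

// Jaccard similarity |A ∩ B| / |A ∪ B| over the two word sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let (wa, wb) = (words(a), words(b));
    let union = wa.union(&wb).count();
    if union == 0 {
        return 0.0;
    }
    wa.intersection(&wb).count() as f64 / union as f64
}

// Rank candidate memories by resemblance to the query, most similar first.
fn rank_by_similarity<'a>(query: &str, memories: &[&'a str]) -> Vec<&'a str> {
    let mut scored: Vec<(f64, &'a str)> = memories
        .iter()
        .map(|m| (jaccard(query, m), *m))
        .collect();
    // Scores are finite (zero-size unions return 0.0), so partial_cmp succeeds.
    scored.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap());
    scored.into_iter().map(|(_, m)| m).collect()
}

fn main() {
    // Hypothetical texts shaped like the demo's cause / lookalike / crash.
    let cause = "Changed the PAYMENT_API_URL env var in the deploy config";
    let lookalike = "Billing service returned HTTP 500 errors";
    let crash = "Checkout service crashed with HTTP 500 errors";

    // Query with the crash; the candidates are the other two memories.
    for m in rank_by_similarity(crash, &[cause, lookalike]) {
        println!("{:.2}  {}", jaccard(crash, m), m);
    }
}
```

The crash shares four words with the billing lookalike (score 0.44) and none with the env-var cause (0.00). If the lookalike is removed, the cause becomes the only candidate and therefore the top hit, so the contrast disappears.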

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 18:19:41 -05:00

Sam Valladares

561b2301db

docs(demo): full run-it-yourself README + unify failure detection

demo/README.md: the complete self-serve demo artifact — one-command run, the
seeded scenario explained, a "build your own scenario" section, the honest
boundary (won't invent a cause; can't reach a cause that was never recorded),
the Nature citation + the "field admits this is unsolved" sources, and the
recording playbook + paste-ready caption.

Writing/testing the README surfaced a real inconsistency, now fixed:
- The CLI's failure-finder used a hardcoded content-only marker subset and
  ignored tags, so a "Checkout latency spiked" memory (regression tag, no crash
  word in content) was never picked as the failure. The CLI now calls the SAME
  public `looks_like_failure` (content + tags, full list) the backfill tool uses
  — one definition, no drift.
- Extended FAILURE_MARKERS with performance/degradation failures (spiked,
  latency, degraded, slow, hang, throttled, oom, 502/503/504, flaky, ...) so the
  feature backfills from perf regressions, not just hard crashes.
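Here is a minimal sketch of the unified check, assuming `looks_like_failure` takes the memory content plus its tags. Only the perf/degradation markers come from the commit. The remaining marker entries, the word-prefix matching rule, the helper `text_has_marker`, and the signature are all assumptions and do not reproduce the project's code.

```rust
// Hypothetical, partial marker list: the perf/degradation markers named in the
// commit plus a few assumed entries. The real FAILURE_MARKERS list is longer
// and may differ.
const FAILURE_MARKERS: &[&str] = &[
    "crash", "error", "regression", // assumed entries, not from the commit
    "spiked", "latency", "degraded", "slow", "hang", "throttled", "oom",
    "502", "503", "504", "flaky",
];

// True if any ASCII-alphanumeric word in `text` starts with a marker.
// Word-prefix matching (an assumption) lets "crash" match "crashed"
// without letting "hang" match inside "changed".
fn text_has_marker(text: &str) -> bool {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_ascii_lowercase())
        .any(|w| FAILURE_MARKERS.iter().any(|m| w.starts_with(*m)))
}

// One shared definition: content AND tags are checked against the full list.
// Hypothetical signature; the project's `looks_like_failure` may differ.
pub fn looks_like_failure(content: &str, tags: &[&str]) -> bool {
    text_has_marker(content) || tags.iter().any(|t| text_has_marker(t))
}

fn main() {
    // No marker word in the content: only the tag identifies the failure.
    println!("{}", looks_like_failure("Checkout p99 went from 120ms to 2s", &["regression"]));
    // The extended markers catch a perf regression from content alone.
    println!("{}", looks_like_failure("Checkout latency spiked", &[]));
}
```

When the CLI and the backfill tool both call this one function, extending the marker list changes failure detection in both places at once. That shared call is what prevents the drift described above.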

clippy clean; 527 core + 453 mcp tests; both the main demo and the README's
custom scenario verified end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 18:12:09 -05:00

Sam Valladares

988a31c207

fix(search+demo): rotation-audit fixes — FTS tokenizer match, honest demo labels

3-model rotation audit (DeepSeek V4-Pro / Kimi K2.7 / MiniMax M3, max thinking,
each model × each of 3 sections). Claude verified every finding against code.

CONFIRMED + FIXED:
- [FTS, consensus DeepSeek+MiniMax] sanitize_fts5_or_query split only on
  characters that were neither alphanumeric nor '_', but the index uses
  tokenize='porter ascii', which splits on '_' and non-ASCII. So "API_TIMEOUT"
  and "café" became single phrases that could NEVER match. Now splits on
  !is_ascii_alphanumeric() and lowercases to mirror the tokenizer; caps token
  count (64) and token length (64) for DoS hardening. Also fixes the
  pre-existing storage.search bug (multi-word queries silently returned
  nothing). 5 new tests pin it.
- [Demo honesty, consensus Kimi+DeepSeek] the contrast labeled keyword search as
  "SIMILARITY SEARCH" and asserted "NONE of these is the cause" universally. Now
  prints the REAL engine ("keyword (BM25)" vs "semantic (vector + BM25 hybrid)")
  and claims only what's true ("ranked by RESEMBLANCE; its top hit is a lookalike").
  De-hardcoded the "Service crashed:" munging to a generic label-strip.
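Here is a minimal sketch of the sanitizer described in the FTS item. It reuses the name `sanitize_fts5_or_query` from the commit, but the signature is an assumption: it takes the raw query and returns an FTS5 MATCH string of OR-ed terms. The two caps of 64 come from the commit. The commit does not say whether over-long tokens are truncated or dropped, so truncating them here is an assumption, and so is quoting each term.

```rust
// Caps from the commit message: at most 64 tokens, each at most 64 bytes.
const MAX_TOKENS: usize = 64;
const MAX_TOKEN_LEN: usize = 64;

// Split on anything that is not ASCII alphanumeric (so '_' separates tokens),
// lowercase each token, apply the caps, and OR the tokens together as
// quoted FTS5 string terms.
fn sanitize_fts5_or_query(input: &str) -> String {
    input
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        .take(MAX_TOKENS)
        .map(|t| {
            let lower = t.to_ascii_lowercase();
            // Tokens are pure ASCII here, so byte slicing is char-safe.
            let capped = &lower[..lower.len().min(MAX_TOKEN_LEN)];
            format!("\"{}\"", capped)
        })
        .collect::<Vec<String>>()
        .join(" OR ")
}

fn main() {
    // "API_TIMEOUT" now yields two terms instead of one unmatchable phrase.
    println!("{}", sanitize_fts5_or_query("API_TIMEOUT in checkout"));
}
```

Because every term is quoted, a user-typed word like `or` or `near` is treated as an ordinary string term, not as FTS5 query syntax. If the input contains no ASCII letters or digits, the function returns an empty string, and the caller would need to handle that case before running MATCH.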

VERIFIED FALSE POSITIVE (not changed): MiniMax "fts.id non-existent column" —
the FTS5 table is declared `fts5(id, content, tags, ...)`, the JOIN is valid.
No injection found by any model (quote-doubling + operator-stripping confirmed safe).
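For arbitrary text that has not been pre-tokenized, the quote-doubling defense can be sketched as shown here. FTS5 allows a double quote inside a string when it is written twice. Wrapping the input in quotes and doubling every embedded quote therefore keeps all of it inside one string term. `fts5_quote` is a hypothetical name and is not the project's function.

```rust
// Wrap `text` as a single FTS5 string term. Embedded double quotes are
// doubled (FTS5's escape rule), so input such as `x" OR y` cannot close
// the string early and smuggle in operators.
fn fts5_quote(text: &str) -> String {
    format!("\"{}\"", text.replace('"', "\"\""))
}

fn main() {
    let hostile = r#"timeout" OR NEAR(a b"#;
    println!("{}", fts5_quote(hostile));
}
```

This protects only the FTS5 query grammar. The resulting string would still be passed to SQLite as a bound parameter rather than spliced into the SQL text.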

clippy clean; 527 core + 453 mcp tests pass; demo verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-27 18:05:01 -05:00

3 commits