V2 Implementation Audit
Date: March 2026
Reference: Flakestorm v2.md, flakestorm-v2-addendum.md
Scope
Verification of the codebase against the PRD and addendum: behavior, config schema, CLI, and examples.
1. PRD §8.1 — Environment Chaos
| Requirement |
Status |
Implementation |
| Tool faults: timeout, error, malformed, slow, malicious_response |
✅ |
chaos/faults.py, chaos/http_transport.py (by match_url or tool *) |
| LLM faults: timeout, truncated_response, rate_limit, empty, garbage |
✅ |
chaos/llm_proxy.py, chaos/interceptor.py |
probability, after_calls, tool * |
✅ |
chaos/faults.should_trigger, transport and interceptor |
| Built-in profiles: api_outage, degraded_llm, hostile_tools, high_latency, cascading_failure |
✅ |
chaos/profiles/*.yaml |
| InstrumentedAgentAdapter / httpx transport |
✅ |
ChaosInterceptor, ChaosHttpTransport, HTTPAgentAdapter(transport=...) |
2. PRD §8.2 — Behavioral Contracts
| Requirement |
Status |
Implementation |
| Contract with id, severity, when, negate |
✅ |
ContractInvariantConfig, contracts/engine.py |
| Chaos matrix (scenarios) |
✅ |
contract.chaos_matrix, scenario → ChaosConfig per run |
| Resilience matrix N×M, weighted score |
✅ |
contracts/matrix.py (critical×3, high×2, medium×1), FAIL if any critical |
| Invariant types: contains_any, output_not_empty, completes, excludes_pattern, behavior_unchanged |
✅ |
Assertions + verifier; contract engine runs verifier with contract invariants |
| reset_endpoint / reset_function |
✅ |
AgentConfig, ContractEngine._reset_agent() before each cell |
| Stateful warning when no reset |
✅ |
ContractEngine._detect_stateful_and_warn(), STATEFUL_WARNING |
3. PRD §8.3 — Replay-Based Regression
| Requirement |
Status |
Implementation |
| Replay session: input, tool_responses, contract |
✅ |
ReplaySessionConfig, replay/loader.py, replay/runner.py |
| Contract by name or path |
✅ |
resolve_contract() in loader |
| Verify against contract |
✅ |
ReplayRunner.run() uses InvariantVerifier with resolved contract |
| Export from report |
✅ |
flakestorm replay export --from-report FILE |
| Replays in config: sessions with file or inline |
✅ |
replays.sessions; session can have file only (load from file) or full inline |
4. PRD §9 — Combined Modes & Resilience Score
| Requirement |
Status |
Implementation |
| Mutation only, chaos only, mutation+chaos, contract, replay |
✅ |
run (with --chaos, --chaos-only), contract run, replay run |
| Unified resilience score (mutation_robustness, chaos_resilience, contract_compliance, replay_regression, overall) |
✅ |
reports/models.TestResults.resilience_scores; flakestorm ci computes overall from scoring.weights |
5. PRD §10 — CLI
| Command |
Status |
| flakestorm run --chaos, --chaos-profile, --chaos-only |
✅ |
| flakestorm chaos |
✅ |
| flakestorm contract run / validate / score |
✅ |
| flakestorm replay run [PATH] |
✅ (replay run, replay export) |
| flakestorm replay export --from-report FILE |
✅ |
| flakestorm ci |
✅ (mutation + contract + chaos + replay + overall score) |
6. Addendum (flakestorm-v2-addendum.md) — Full Checklist
Addition 1 — Context Attacks Module
| Requirement |
Status |
Notes |
chaos/context_attacks.py |
✅ |
ContextAttackEngine, maybe_inject_indirect() |
| indirect_injection (inject payloads into tool response) |
✅ |
Wired via engine; profile indirect_injection.yaml |
| memory_poisoning, system_prompt_leak_probe |
⚠️ |
Docstring/config types exist; memory_poisoning inject step and leak probe as contract assertion are not fully wired in execution flow |
| Contract invariants: excludes_pattern, behavior_unchanged |
✅ |
assertions/verifier.py; use for system_prompt_not_leaked, injection_not_executed |
Config: chaos.context_attacks list with type (e.g. indirect_injection) |
✅ |
ContextAttackConfig in core/config.py |
Addition 2 — Model Version Drift (response_drift)
| Requirement |
Status |
Notes |
response_drift in llm_faults |
✅ |
chaos/llm_proxy.py: apply_llm_response_drift, drift_type, severity, direction, factor |
| drift_type: json_field_rename, verbosity_shift, format_change, refusal_rephrase, tone_shift |
✅ |
Implemented in llm_proxy |
Profile model_version_drift.yaml |
✅ |
chaos/profiles/model_version_drift.yaml |
Addition 3 — Multi-Agent Failure Propagation
| Requirement |
Status |
Notes |
| v3 roadmap placeholder, no v2 implementation |
✅ |
Documented in ROADMAP.md as V3; no code required |
Addition 4 — Resilience Certificate Export
| Requirement |
Status |
Notes |
flakestorm certificate CLI command |
❌ |
Not implemented |
reports/certificate.py (PDF/HTML certificate) |
❌ |
Not implemented |
Config certificate.tester_name, pass_threshold, output_format |
❌ |
Not implemented |
Addition 5 — LangSmith Replay Import
| Requirement |
Status |
Notes |
Import single run by ID: flakestorm replay --from-langsmith RUN_ID |
✅ |
replay/loader.py: load_langsmith_run(run_id); CLI option |
Import and run: --from-langsmith RUN_ID --run |
✅ |
_replay_async supports run_after_import |
| Schema validation (fail clearly if LangSmith API changed) |
✅ |
_validate_langsmith_run_schema |
| Map run inputs/outputs/child_runs to ReplaySessionConfig |
✅ |
_langsmith_run_to_session |
--from-langsmith-project PROJECT + --filter-status + --output |
✅ |
replay run --from-langsmith-project X --filter-status error -o ./replays/; writes YAML per run |
replays.sources (type: langsmith |
langsmith_run, project, filter, auto_import) |
✅ |
Addition 6 — Implicit Spec Clarifications
| Requirement |
Status |
Notes |
| 6.1 Python callables: fail loudly if tool_faults but no tools/ToolRegistry |
✅ |
create_instrumented_adapter raises with clear message for type=python |
| 6.2 Contract matrix: reset between cells (reset_endpoint / reset_function) |
✅ |
ContractEngine._reset_agent(); config fields on AgentConfig |
| 6.3 Resilience score formula in spec (weighted, auto-FAIL on critical) |
✅ |
contracts/matrix.py docstring and implementation; docs/V2_SPEC.md |
Summary: Addendum Additions 1, 2, 3, 5, 6 are implemented (with minor gaps on full memory_poisoning/leak_probe wiring). Addition 4 (Resilience Certificate) is not implemented.
7. Config Schema (v2.0)
version: "2.0" supported; v1.0 backward compatible.
chaos, contract, chaos_matrix, replays, scoring present and used.
- Replay session can be
file: "path" only; full session loaded from file. Validation updated so id/input/contract optional when file is set.
8. Changes Made During This Audit
- Replay session file-only —
ReplaySessionConfig allows session with only file; id/input/contract optional when file is set (defaults/loaded from file).
- CI replay path — Replay session file path resolved relative to config file directory:
config_path.parent / s.file.
- V2 example — Added
examples/v2_research_agent/: working HTTP agent (FastAPI), v2 flakestorm.yaml (chaos, contract, replays, scoring), replay file, README, requirements.txt.
9. Example: V2 Research Agent
- Agent:
examples/v2_research_agent/agent.py — FastAPI app with /invoke and /reset.
- Config:
examples/v2_research_agent/flakestorm.yaml — version 2.0, chaos, contract, chaos_matrix, replays.sessions with file, scoring.
- Replay:
examples/v2_research_agent/replays/incident_001.yaml.
- Usage: See
examples/v2_research_agent/README.md (start agent, then run flakestorm run, flakestorm contract run, flakestorm replay run, flakestorm ci).
10. Test Status
- 181 tests passing (including chaos, contract, replay integration tests).
- V2 example config loads successfully (
load_config("examples/v2_research_agent/flakestorm.yaml")).
Audit complete. Implementation aligns with PRD and addendum; optional config and path resolution improved; V2 example added.