mirror of
https://github.com/flakestorm/flakestorm.git
synced 2026-04-25 00:36:54 +02:00
116 lines
5.8 KiB
Markdown
116 lines
5.8 KiB
Markdown
# V2 Implementation Audit
|
||
|
||
**Date:** March 2026
|
||
**Reference:** [Flakestorm v2.md](Flakestorm%20v2.md), [flakestorm-v2-addendum.md](flakestorm-v2-addendum.md)
|
||
|
||
## Scope
|
||
|
||
Verification of the codebase against the PRD and addendum: behavior, config schema, CLI, and examples.
|
||
|
||
---
|
||
|
||
## 1. PRD §8.1 — Environment Chaos
|
||
|
||
| Requirement | Status | Implementation |
|
||
|-------------|--------|----------------|
|
||
| Tool faults: timeout, error, malformed, slow, malicious_response | ✅ | `chaos/faults.py`, `chaos/http_transport.py` (by match_url or tool `*`) |
|
||
| LLM faults: timeout, truncated_response, rate_limit, empty, garbage | ✅ | `chaos/llm_proxy.py`, `chaos/interceptor.py` |
|
||
| probability, after_calls, tool `*` | ✅ | `chaos/faults.should_trigger`, transport and interceptor |
|
||
| Built-in profiles: api_outage, degraded_llm, hostile_tools, high_latency, cascading_failure | ✅ | `chaos/profiles/*.yaml` |
|
||
| InstrumentedAgentAdapter / httpx transport | ✅ | `ChaosInterceptor`, `ChaosHttpTransport`, `HTTPAgentAdapter(transport=...)` |
|
||
|
||
---
|
||
|
||
## 2. PRD §8.2 — Behavioral Contracts
|
||
|
||
| Requirement | Status | Implementation |
|
||
|-------------|--------|----------------|
|
||
| Contract with id, severity, when, negate | ✅ | `ContractInvariantConfig`, `contracts/engine.py` |
|
||
| Chaos matrix (scenarios) | ✅ | `contract.chaos_matrix`, scenario → ChaosConfig per run |
|
||
| Resilience matrix N×M, weighted score | ✅ | `contracts/matrix.py` (critical×3, high×2, medium×1), FAIL if any critical |
|
||
| Invariant types: contains_any, output_not_empty, completes, excludes_pattern, behavior_unchanged | ✅ | Assertions + verifier; contract engine runs verifier with contract invariants |
|
||
| reset_endpoint / reset_function | ✅ | `AgentConfig`, `ContractEngine._reset_agent()` before each cell |
|
||
| Stateful warning when no reset | ✅ | `ContractEngine._detect_stateful_and_warn()`, `STATEFUL_WARNING` |
|
||
|
||
---
|
||
|
||
## 3. PRD §8.3 — Replay-Based Regression
|
||
|
||
| Requirement | Status | Implementation |
|
||
|-------------|--------|----------------|
|
||
| Replay session: input, tool_responses, contract | ✅ | `ReplaySessionConfig`, `replay/loader.py`, `replay/runner.py` |
|
||
| Contract by name or path | ✅ | `resolve_contract()` in loader |
|
||
| Verify against contract | ✅ | `ReplayRunner.run()` uses `InvariantVerifier` with resolved contract |
|
||
| Export from report | ✅ | `flakestorm replay export --from-report FILE` |
|
||
| Replays in config: sessions with file or inline | ✅ | `replays.sessions`; session can have `file` only (load from file) or full inline |
|
||
|
||
---
|
||
|
||
## 4. PRD §9 — Combined Modes & Resilience Score
|
||
|
||
| Requirement | Status | Implementation |
|
||
|-------------|--------|----------------|
|
||
| Mutation only, chaos only, mutation+chaos, contract, replay | ✅ | `run` (with --chaos, --chaos-only), `contract run`, `replay run` |
|
||
| Unified resilience score (mutation_robustness, chaos_resilience, contract_compliance, replay_regression, overall) | ✅ | `reports/models.TestResults.resilience_scores`; `flakestorm ci` computes overall from `scoring.weights` |
|
||
|
||
---
|
||
|
||
## 5. PRD §10 — CLI
|
||
|
||
| Command | Status |
|
||
|---------|--------|
|
||
| flakestorm run --chaos, --chaos-profile, --chaos-only | ✅ |
|
||
| flakestorm chaos | ✅ |
|
||
| flakestorm contract run / validate / score | ✅ |
|
||
| flakestorm replay run [PATH] | ✅ (replay run, replay export) |
|
||
| flakestorm replay export --from-report FILE | ✅ |
|
||
| flakestorm ci | ✅ (mutation + contract + chaos + replay + overall score) |
|
||
|
||
---
|
||
|
||
## 6. Addendum — Context Attacks, Model Drift, LangSmith, Spec
|
||
|
||
| Item | Status |
|
||
|------|--------|
|
||
| Context attacks module (indirect_injection, etc.) | ✅ `chaos/context_attacks.py`; profile `indirect_injection.yaml` |
|
||
| response_drift in llm_proxy | ✅ `chaos/llm_proxy.py` (json_field_rename, verbosity_shift, format_change, refusal_rephrase, tone_shift) |
|
||
| LangSmith load + schema check | ✅ `replay/loader.py`: `load_langsmith_run`, `_validate_langsmith_run_schema` |
|
||
| Python tool fault: fail loudly when no tools | ✅ `create_instrumented_adapter` raises if type=python and tool_faults |
|
||
| Contract matrix isolation (reset) | ✅ Optional reset; warning if stateful and no reset |
|
||
| Resilience score formula (addendum §6.3) | ✅ In `contracts/matrix.py` and `docs/V2_SPEC.md` |
|
||
|
||
---
|
||
|
||
## 7. Config Schema (v2.0)
|
||
|
||
- `version: "2.0"` supported; v1.0 backward compatible.
|
||
- `chaos`, `contract`, `chaos_matrix`, `replays`, `scoring` present and used.
|
||
- Replay session can be `file: "path"` only; full session loaded from file. Validation updated so `id`/`input`/`contract` optional when `file` is set.
|
||
|
||
---
|
||
|
||
## 8. Changes Made During This Audit
|
||
|
||
1. **Replay session file-only** — `ReplaySessionConfig` allows session with only `file`; `id`/`input`/`contract` optional when `file` is set (defaults/loaded from file).
|
||
2. **CI replay path** — Replay session file path resolved relative to config file directory: `config_path.parent / s.file`.
|
||
3. **V2 example** — Added `examples/v2_research_agent/`: working HTTP agent (FastAPI), v2 flakestorm.yaml (chaos, contract, replays, scoring), replay file, README, requirements.txt.
|
||
|
||
---
|
||
|
||
## 9. Example: V2 Research Agent
|
||
|
||
- **Agent:** `examples/v2_research_agent/agent.py` — FastAPI app with `/invoke` and `/reset`.
|
||
- **Config:** `examples/v2_research_agent/flakestorm.yaml` — version 2.0, chaos, contract, chaos_matrix, replays.sessions with file, scoring.
|
||
- **Replay:** `examples/v2_research_agent/replays/incident_001.yaml`.
|
||
- **Usage:** See `examples/v2_research_agent/README.md` (start agent, then run `flakestorm run`, `flakestorm contract run`, `flakestorm replay run`, `flakestorm ci`).
|
||
|
||
---
|
||
|
||
## 10. Test Status
|
||
|
||
- **181 tests passing** (including chaos, contract, replay integration tests).
|
||
- V2 example config loads successfully (`load_config("examples/v2_research_agent/flakestorm.yaml")`).
|
||
|
||
---
|
||
|
||
*Audit complete. Implementation aligns with PRD and addendum; optional config and path resolution improved; V2 example added.*
|