From 8eccb9da00cf5a707f0c8fdb35ec696c55962233 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 18 Feb 2026 14:13:00 +0000 Subject: [PATCH] Add test coverage analysis with prioritized improvement recommendations Analyzes test coverage across all four components (Rust crates, Python CLI, JS/TS apps, E2E tests) and identifies 16 specific areas for improvement, prioritized by impact on production reliability. https://claude.ai/code/session_01Shz5qKiTB9m6oxzEZWJVKk --- TEST_COVERAGE_ANALYSIS.md | 182 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 182 insertions(+) create mode 100644 TEST_COVERAGE_ANALYSIS.md diff --git a/TEST_COVERAGE_ANALYSIS.md b/TEST_COVERAGE_ANALYSIS.md new file mode 100644 index 00000000..cc1940bc --- /dev/null +++ b/TEST_COVERAGE_ANALYSIS.md @@ -0,0 +1,182 @@ +# Test Coverage Analysis + +**Date:** 2026-02-18 + +## Executive Summary + +The Plano codebase has significant test coverage gaps across all components. The Rust crates have ~262 unit tests covering roughly 0.35% of ~75,800 lines of code. The Python CLI has 29 tests covering 4 of 12 modules. The JavaScript/TypeScript apps and packages have **zero tests**. The E2E suite covers the core happy-path flows well but lacks error, edge-case, and performance scenarios. + +Below is a prioritized breakdown of gaps and recommendations. + +--- + +## 1. Rust Crates (`crates/`) + +### Current State + +| Crate | LOC | Tests | Status | +|-------|-----|-------|--------| +| common | 3,912 | 33 | Partial | +| hermesllm | 17,540 | 134 | Partial | +| prompt_gateway | 1,717 | 4 | Critical gap | +| llm_gateway | 1,399 | 0 | Critical gap | +| brightstaff | 13,342 | 91 | Partial | +| **Total** | **~75,800** | **262** | | + +### Critical Gaps + +**llm_gateway — 0 tests, 1,399 LOC.** This WASM filter handles LLM request/response processing and streaming. It has no tests at all. `stream_context.rs` alone is ~1,000 lines of complex streaming logic with zero coverage. + +**prompt_gateway — 4 tests, 1,717 LOC.** The WASM filter for prompt processing and guardrails has near-zero coverage. Untested modules include `filter_context.rs`, `http_context.rs`, `context.rs`, and `metrics.rs`. The intent-matching logic in `stream_context.rs` (~900 lines) has only 1 test. + +**brightstaff pipeline and state management — ~2,200 LOC untested.** The core request pipeline (`handlers/pipeline_processor.rs`, 834 lines), state persistence layer (`state/memory.rs`, `state/postgresql.rs`, `state/response_state_processor.rs` — 1,370 lines combined), and key handler endpoints (`handlers/llm.rs`, `handlers/agent_chat_completions.rs`) have no tests. + +### Partially Covered Areas Needing More Tests + +- **hermesllm streaming transforms** — The non-streaming request/response transforms are well-tested (134 tests), but the streaming buffer modules (`sse.rs`, `amazon_bedrock_binary_frame.rs`, `to_openai_streaming.rs`, `to_anthropic_streaming.rs` — ~5,000 LOC) are untested. +- **common/routing.rs, common/errors.rs, common/http.rs, common/stats.rs, common/tracing.rs** — Utility modules totaling ~560 lines with no coverage. +- **brightstaff router services** — `llm_router.rs` and `plano_orchestrator.rs` (~400 lines) lack tests despite handling routing decisions. + +### Recommendations + +1. **Add unit tests for llm_gateway.** Start with `stream_context.rs` — test streaming chunk assembly, partial frame handling, error recovery, and the filter lifecycle. A WASM-mocking test harness or extracting the core logic into testable pure functions would help. + +2. **Add unit tests for prompt_gateway filter logic.** Test `http_context.rs` request/response handling, `filter_context.rs` lifecycle, and the guardrail filtering paths in `stream_context.rs`. + +3. **Test the brightstaff pipeline processor.** This is the central message processing pipeline. Mock the downstream dependencies and test the orchestration logic, error paths, and streaming assembly. + +4. **Test state persistence.** Both the in-memory and PostgreSQL backends need tests for basic CRUD, concurrent access, state expiration, and connection failure recovery. + +5. **Test hermesllm streaming transforms.** The SSE parser, Bedrock binary frame decoder, and streaming-to-OpenAI/Anthropic converters need unit tests, especially for edge cases like partial frames, malformed chunks, and connection resets. + +--- + +## 2. Python CLI (`cli/`) + +### Current State + +| Module | LOC | Tested? | +|--------|-----|---------| +| config_generator.py | 514 | Yes | +| versioning.py | 70 | Yes | +| init_cmd.py | 303 | Yes | +| trace_cmd.py | 993 | Minimal (2 tests) | +| main.py | 441 | No | +| targets.py | 365 | No | +| core.py | 234 | No | +| docker_cli.py | 143 | No | +| template_sync.py | 122 | No | +| utils.py | 285 | Partial | + +**29 total tests across 4 files. 8 of 12 modules are untested or minimally tested.** + +### Critical Gaps + +**main.py — 0 tests, 441 LOC.** All CLI commands (`up`, `down`, `build`, `logs`, `cli_agent`, `generate_prompt_targets`) are untested. The `up` command alone contains complex logic for port conflict detection, API key validation, and container orchestration. + +**targets.py — 0 tests, 365 LOC.** The AST-based Python code parser for extracting prompt targets from Flask/FastAPI routes and Pydantic models is entirely untested. This is complex parsing logic prone to edge cases. + +**core.py — 0 tests, 234 LOC.** Docker container lifecycle management (start, stop, health check retry loop, timeout handling) is untested. + +**docker_cli.py — 0 tests, 143 LOC.** All 7 Docker subprocess wrapper functions lack tests. + +**trace_cmd.py — 2 tests for 993 LOC.** Only gRPC server bind error handling is tested. The trace collection, OTEL processing, and trace analysis logic are untested. + +### Recommendations + +6. **Add CLI command tests using Click's CliRunner.** Test `planoai up`, `planoai down`, and `planoai build` with mocked Docker operations. Verify argument validation, error messages, and exit codes. + +7. **Add tests for targets.py.** Test Flask route extraction, FastAPI route extraction, Pydantic model field parsing, type annotation handling, and edge cases (nested decorators, complex type hints, missing docstrings). + +8. **Add tests for core.py with mocked subprocess/Docker calls.** Test the health check retry loop, container state transitions (not found → start, running → restart), timeout behavior, and port forwarding. + +9. **Add a shared conftest.py** with common fixtures for environment setup, temporary config files, and Docker mocking. + +--- + +## 3. JavaScript/TypeScript (`apps/`, `packages/`) + +### Current State + +**Zero test files. No test framework configured. No test scripts in any package.json.** + +The codebase has 70+ TypeScript/React source files across two Next.js apps (`apps/www`, `apps/katanemo-www`) and shared packages (`packages/ui`, `packages/shared-styles`). + +Quality tooling is limited to type checking (`tsc --noEmit`) and linting (Biome). + +### Notable Untested Code + +- **`apps/www/src/utils/asciiBuilder.ts`** (425 lines) — Pure utility functions for ASCII diagram generation (`calculateCenterPadding`, `createArrow`, `buildBox`, `fixDiagramSpacing`, `createFlowDiagram`). This is the most testable code in the frontend. +- **`packages/ui/src/`** — 5 shared UI components (Navbar, Footer, Logo, Button, Dialog) used across apps. +- **`apps/www/src/app/api/contact/route.ts`** — API route handler. + +### Recommendations + +10. **Set up Vitest** (or Jest) in the Turbo workspace with a root-level `test` script. Add `@testing-library/react` for component testing. + +11. **Add unit tests for `asciiBuilder.ts`.** These are pure functions with clear inputs and outputs — ideal first candidates. + +12. **Add component tests for shared `packages/ui` components.** These are reused across apps and should have rendering and interaction tests. + +Note: The JS/TS apps are marketing websites, not the core proxy. Prioritize this lower than Rust and Python testing. + +--- + +## 4. E2E Tests (`tests/e2e/`) + +### Current State + +~40 active tests across 4 test files, covering: +- OpenAI and Anthropic SDK integration (streaming and non-streaming) +- Model alias routing and format translation +- Function calling end-to-end flows +- OpenAI Responses API (v1/responses) +- Conversation state management (memory backend) +- Cross-provider format translation (OpenAI client → Claude model, etc.) + +### Gaps + +**Error and failure scenarios are underrepresented.** Only 2 tests cover error handling (400 errors with aliases). There are no tests for: +- Upstream provider unavailability or timeouts +- Malformed request payloads +- Rate limiting behavior +- Invalid API keys +- Partial stream failures or disconnections + +**Bedrock tests are all skipped.** 6 AWS Bedrock tests are marked as unreliable and skipped, leaving this provider path untested in CI. + +**PostgreSQL state storage is untested.** State management E2E tests only use the memory backend. The PostgreSQL backend (which is the production path) has no E2E coverage. + +**No concurrent request testing.** There are no tests validating behavior under concurrent load or verifying resource cleanup. + +**No configuration validation E2E tests.** Invalid config files, missing required fields, and config hot-reload are not tested end-to-end. + +### Recommendations + +13. **Add E2E error scenario tests.** Test upstream timeouts, 5xx errors from providers, malformed responses, and rate limit responses. These are the scenarios most likely to cause production incidents. + +14. **Fix or replace the skipped Bedrock tests.** If Bedrock is flaky, consider using a mock provider or stub that mimics the Bedrock binary event stream format. + +15. **Add PostgreSQL state storage E2E tests.** Use a PostgreSQL container in Docker Compose and test state persistence, multi-turn retrieval, and state cleanup. + +16. **Add concurrent request tests.** Use `pytest-xdist` (already in dependencies) to validate behavior under parallel requests. + +--- + +## Priority Summary + +| Priority | Area | Recommendation | +|----------|------|----------------| +| **P0** | Rust: llm_gateway | Add unit tests for streaming response handling (#1) | +| **P0** | Rust: prompt_gateway | Add unit tests for filter logic and guardrails (#2) | +| **P0** | Rust: brightstaff pipeline | Test the core pipeline processor (#3) | +| **P1** | Rust: state persistence | Test memory and PostgreSQL backends (#4) | +| **P1** | Rust: streaming transforms | Test hermesllm streaming modules (#5) | +| **P1** | Python: CLI commands | Test main.py commands with CliRunner (#6) | +| **P1** | Python: targets.py | Test AST parsing logic (#7) | +| **P1** | E2E: error scenarios | Test upstream failures, timeouts, rate limits (#13) | +| **P2** | Python: core.py | Test Docker lifecycle management (#8) | +| **P2** | E2E: PostgreSQL state | Test production state backend (#15) | +| **P2** | E2E: Bedrock | Fix skipped Bedrock tests (#14) | +| **P3** | JS/TS: test setup | Set up Vitest, test utilities (#10, #11) | +| **P3** | E2E: concurrency | Add parallel request tests (#16) |