create md files for coding agents and for humans

2026-06-17 15:25:17 +02:00 · 2026-02-09 23:34:18 -08:00 · 2026-02-09 23:34:18 -08:00 · 3f8aa14e4c
commit 3f8aa14e4c
parent 46de89590b
12 changed files with 1407 additions and 0 deletions
--- a/docs/ADR/001-envoy-as-data-plane.md
+++ b/docs/ADR/001-envoy-as-data-plane.md
@ -0,0 +1,35 @@
+# ADR 001: Envoy as the Data Plane
+
+**Status:** Accepted
+
+## Context
+
+Plano needs to proxy all traffic between clients, LLM providers, and developer APIs. The options were:
+1. Build a custom proxy from scratch in Rust (e.g., using `hyper`/`axum` directly)
+2. Use an existing L7 proxy (Envoy, NGINX, HAProxy) and extend it
+3. Use a service mesh sidecar approach
+
+We need: TLS termination, connection pooling, retry policies, load balancing, header-based routing, streaming support (SSE), compression, and observability — all at production quality.
+
+## Decision
+
+Use **Envoy Proxy** as the data plane. All external traffic — both inbound client requests and outbound LLM/API calls — flows through Envoy. The native Rust service (Brightstaff) never makes direct outbound connections to external hosts.
+
+## Consequences
+
+**Enables:**
+- Production-grade L7 proxying (TLS, HTTP/2, connection pooling, retries) without building it ourselves
+- WASM filter extension model for inline request/response processing
+- Standard observability (access logs, stats, tracing) out of the box
+- Header-based routing via Envoy's route configuration — no custom routing code needed for cluster selection
+- Hot-restart and graceful draining for zero-downtime updates
+
+**Requires:**
+- All Brightstaff external calls must go through Envoy listeners (localhost:12001 for LLMs, localhost:11000 for APIs)
+- Custom headers (`x-arch-*`) for routing decisions — Envoy matches on these in its route config
+- Envoy configuration must be generated from user config (Jinja2 template → envoy.yaml)
+- Team must understand Envoy's configuration model (listeners, clusters, filter chains)
+
+**Prevents:**
+- Direct HTTP calls from Brightstaff to external services (this is intentional — it ensures all traffic gets WASM filter processing, auth injection, rate limiting, and observability)
+- Simple single-binary deployment (we need Envoy + Brightstaff, managed by Supervisord)
--- a/docs/ADR/002-wasm-filters-over-native.md
+++ b/docs/ADR/002-wasm-filters-over-native.md
@ -0,0 +1,42 @@
+# ADR 002: WASM Filters Over Native Envoy Filters
+
+**Status:** Accepted
+
+## Context
+
+Envoy supports four extension mechanisms:
+1. **Native C++ filters** — compiled into the Envoy binary, highest performance
+2. **WASM filters** — compiled to WebAssembly, loaded at runtime via Envoy's WASM VM
+3. **Lua filters** — scripted, limited functionality
+4. **External processing (ext_proc)** — gRPC callout to an external service
+
+We need filters that: parse and transform LLM request/response bodies, perform intent matching, inject authentication headers, enforce rate limits, and handle SSE stream reassembly.
+
+## Decision
+
+Use **WASM filters** written in Rust, compiled to `wasm32-wasip1`, loaded by Envoy's V8 runtime. We have two filters:
+- `prompt_gateway.wasm` — inbound prompt processing (intent matching, guardrails, function calling)
+- `llm_gateway.wasm` — outbound LLM processing (provider routing, auth, rate limiting, format translation)
+
+## Consequences
+
+**Enables:**
+- Filters written in Rust with strong type safety and shared crates (`common`, `hermesllm`)
+- Runtime-loadable: no need to rebuild Envoy itself
+- Sandboxed execution: a filter crash doesn't bring down Envoy
+- Same language (Rust) for WASM filters and Brightstaff — shared types and logic via workspace crates
+
+**Requires:**
+- No `tokio`, `async/await`, threads, filesystem, or network sockets in WASM crates
+- All I/O must use `proxy-wasm` SDK's `dispatch_http_call` (callback-based)
+- Dependencies must be WASM-compatible: `governor` needs `no_std` feature, no crates using `std::net`
+- `crate-type = ["cdylib"]` — these build as shared libraries, not binaries
+- Testing runs natively (`cargo test`), but building requires `--target wasm32-wasip1`
+
+**Prevents:**
+- Using async Rust patterns in filter code (callback-based `on_http_call_response` instead)
+- Using popular HTTP client crates (`reqwest`, `hyper`) in filters
+- Easy debugging — WASM filters run inside Envoy's V8 VM with limited introspection
+
+**Trade-off vs. ext_proc:**
+External processing would allow using Brightstaff (native Rust with full async) for all processing, but would add network round-trips for every request. WASM filters run inline in Envoy's filter chain — zero additional network hops for common operations like auth injection and rate limiting.
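Annotation: the callback constraint listed under "Requires" can be pictured with a small model. This is a Python sketch, not the `proxy-wasm` SDK. `FakeHost`, `dispatch`, and `deliver` are hypothetical names standing in for the host dispatching a call and later invoking the response callback.

```python
from typing import Callable, Dict, List, Tuple


class FakeHost:
    """Hypothetical stand-in for the proxy host: hands out call tokens
    and later delivers each response to the callback registered for it."""

    def __init__(self) -> None:
        self._next_token = 0
        self._pending: Dict[int, Callable[[int, bytes], None]] = {}

    def dispatch(self, on_response: Callable[[int, bytes], None]) -> int:
        # Register the callback and return immediately; nothing blocks here.
        token = self._next_token
        self._next_token += 1
        self._pending[token] = on_response
        return token

    def deliver(self, token: int, status: int, body: bytes) -> None:
        # Invoked by the "host" once the upstream response has arrived.
        callback = self._pending.pop(token)
        callback(status, body)


host = FakeHost()
results: List[Tuple[int, bytes]] = []
token = host.dispatch(lambda status, body: results.append((status, body)))
# Control has returned to the caller, but no response has been seen yet.
seen_before_delivery = len(results)
host.deliver(token, 200, b"ok")
```

The point of the sketch is that the caller never waits: any state needed after the response has to be stored somewhere the callback can reach, which is why async/await code does not translate directly.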
--- a/docs/ADR/003-single-container-supervisord.md
+++ b/docs/ADR/003-single-container-supervisord.md
@ -0,0 +1,42 @@
+# ADR 003: Single Container with Supervisord
+
+**Status:** Accepted
+
+## Context
+
+Plano has three runtime processes:
+1. **Envoy Proxy** — the data plane with WASM filters
+2. **Brightstaff** — the Rust HTTP service for routing and orchestration
+3. **Config generator** — Python script that validates config and renders Envoy's YAML (runs at startup)
+
+The options for deployment were:
+1. **Separate containers** — each process in its own container, orchestrated by Docker Compose / K8s
+2. **Single container with process manager** — all processes in one container, managed by Supervisord
+3. **Single binary** — embed Envoy or reimplement its core functionality
+
+## Decision
+
+Run all processes in a **single container** managed by **Supervisord**. The startup sequence:
+1. Config generator validates `arch_config.yaml` and renders `envoy.yaml`
+2. Supervisord starts Brightstaff and Envoy in parallel
+3. A log tail process unifies access log output
+
+## Consequences
+
+**Enables:**
+- Simple deployment: one container, one image, `docker run` just works
+- No network latency between Envoy and Brightstaff (localhost communication)
+- Config generation happens at container startup — no external config rendering step
+- Easy development: `docker compose up` with volume mounts for hot-reload
+
+**Requires:**
+- Supervisord configuration (`config/supervisord.conf`) to manage process lifecycle
+- Health checks must account for both Envoy and Brightstaff readiness
+- Logs from all processes need unified output (handled by the tail process)
+
+**Prevents:**
+- Independent scaling of Envoy vs. Brightstaff (they scale together as one unit)
+- Kubernetes sidecar pattern (though this could be reconsidered)
+- Process-level fault isolation (though Supervisord restarts failed processes)
+
+**Trade-off:** Simplicity of deployment over horizontal scaling flexibility. For a gateway that needs to be deployed at the edge or as a sidecar, single-container simplicity is more valuable than the ability to scale components independently.
--- a/docs/ADR/004-hermesllm-pure-rust.md
+++ b/docs/ADR/004-hermesllm-pure-rust.md
@ -0,0 +1,45 @@
+# ADR 004: hermesllm as a Pure Rust Library
+
+**Status:** Accepted
+
+## Context
+
+LLM providers use different API formats (OpenAI Chat Completions, Anthropic Messages, Amazon Bedrock Converse, Gemini). The gateway needs to translate between these formats in two places:
+1. In the `llm_gateway` WASM filter (inline in Envoy)
+2. In Brightstaff (for routing decisions and response processing)
+
+The options were:
+1. Duplicate translation logic in both places
+2. Put translation logic in `common` (shared crate, but WASM-constrained)
+3. Create a separate pure Rust library with no WASM dependencies
+
+## Decision
+
+Create **`hermesllm`** as a standalone Rust library that handles all LLM protocol translation. It must never depend on `proxy-wasm` or `common`. `prompt_gateway` reaches it through `common`, `llm_gateway` uses it both through `common` and directly, and Brightstaff depends on it directly.
+
+## Consequences
+
+**Enables:**
+- Single source of truth for LLM protocol translation
+- Reusable outside the gateway context (could be published as an independent crate)
+- Full Rust standard library available (no WASM constraints on the library itself)
+- Clean separation: protocol knowledge lives in `hermesllm`, gateway logic lives in filters
+
+**Requires:**
+- `hermesllm` must not import `proxy-wasm`, `common`, or any WASM-specific crate
+- Adding a new provider requires changes only in `hermesllm` (plus config in `common/configuration.rs` and `envoy.template.yaml`)
+- Types shared between `hermesllm` and the filters go through `common`'s re-exports
+
+**Prevents:**
+- Circular dependencies (hermesllm is always a leaf in the dependency graph)
+- Accidentally coupling protocol translation to WASM runtime specifics
+- Needing to maintain two separate translation implementations
+
+**Dependency direction:**
+```
+prompt_gateway → common → hermesllm
+llm_gateway    → common → hermesllm
+llm_gateway    → hermesllm (direct)
+brightstaff    → hermesllm (direct)
+hermesllm      → (no workspace deps)
+```
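Annotation: the leaf rule above could be checked mechanically. The following is a minimal Python sketch that copies the edges from the ADR into a dictionary. The helper names `reachable` and `has_cycle` are illustrative only and do not exist in the repository.

```python
from typing import Dict, List, Set

# Workspace dependency edges as listed in the ADR.
DEPS: Dict[str, List[str]] = {
    "prompt_gateway": ["common"],
    "llm_gateway": ["common", "hermesllm"],
    "brightstaff": ["hermesllm"],
    "common": ["hermesllm"],
    "hermesllm": [],
}


def reachable(crate: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every crate reachable from `crate` by following dependency edges."""
    seen: Set[str] = set()
    stack = list(deps[crate])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """A crate that can reach itself is part of a dependency cycle."""
    return any(crate in reachable(crate, deps) for crate in deps)
```

With the edges as written, `has_cycle(DEPS)` is false and `DEPS["hermesllm"]` is empty. Adding a `hermesllm` to `common` edge would make the check fail.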
--- a/docs/ADR/005-header-based-routing.md
+++ b/docs/ADR/005-header-based-routing.md
@ -0,0 +1,40 @@
+# ADR 005: Header-Based Routing Protocol
+
+**Status:** Accepted
+
+## Context
+
+Envoy needs to route requests to different upstream clusters (LLM providers, developer APIs, agents) based on runtime decisions made by WASM filters and Brightstaff. The options were:
+1. **Path-based routing** — different URL paths for different upstreams
+2. **Header-based routing** — custom headers to signal routing decisions
+3. **Dynamic cluster selection** — programmatic cluster selection in filters
+
+## Decision
+
+Use **custom `x-arch-*` headers** for all routing decisions. WASM filters and Brightstaff set headers like `x-arch-llm-provider` and `x-arch-upstream`, and Envoy's route configuration matches on these headers to select the upstream cluster.
+
+All header names are defined as constants in `common/src/consts.rs` — this is the single source of truth.
+
+## Consequences
+
+**Enables:**
+- Decoupled routing: WASM filters decide *where* to route, Envoy handles *how* to connect
+- Transparent to the client — custom headers are internal, clients see standard HTTP
+- Easy to debug: inspect headers to understand routing decisions
+- Composable: multiple filters can add/modify routing headers in the filter chain
+
+**Requires:**
+- Header names must be consistent between `consts.rs` and `envoy.template.yaml`
+- Any new routing dimension needs a new header constant + Envoy route match rule
+- Developers must grep all consumers when changing a header name
+
+**Prevents:**
+- Routing logic in Envoy's configuration alone (routing decisions are made by Rust code, not Envoy config)
+- Using Envoy's native routing features (like weighted clusters) independently — they must be combined with header matching
+
+**Key headers:**
+- `x-arch-llm-provider` — LLM provider cluster selection (Envoy route matching)
+- `x-arch-llm-provider-hint` — Provider hint from Brightstaff to llm_gateway
+- `x-arch-upstream` — Agent/API endpoint cluster selection
+- `x-arch-streaming-request` — Streaming mode signal
+- `x-arch-state` — Multi-turn conversation state (prompt_gateway internal)
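Annotation: the matching step can be modelled in a few lines. This Python sketch assumes that the header value is used directly as the cluster name and that `x-arch-llm-provider` takes precedence over `x-arch-upstream`. Neither assumption is confirmed by the ADR, and the real matching lives in Envoy's route configuration, not in application code.

```python
from typing import Mapping, Optional

# Header names taken from the ADR; the precedence order is an assumption.
LLM_PROVIDER_HEADER = "x-arch-llm-provider"
UPSTREAM_HEADER = "x-arch-upstream"


def select_cluster(headers: Mapping[str, str]) -> Optional[str]:
    """Pick an upstream cluster name from routing headers, or None if unset."""
    # HTTP header names are case-insensitive, so normalise before matching.
    lowered = {name.lower(): value for name, value in headers.items()}
    for header in (LLM_PROVIDER_HEADER, UPSTREAM_HEADER):
        value = lowered.get(header)
        if value:
            return value
    return None
```

A request carrying neither header yields `None`, which mirrors the failure mode the ADR warns about: without a routing header there is nothing for the route table to match.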
--- a/docs/ADR/006-config-generation-pipeline.md
+++ b/docs/ADR/006-config-generation-pipeline.md
@ -0,0 +1,48 @@
+# ADR 006: Config Generation Pipeline (Python + Jinja2)
+
+**Status:** Accepted
+
+## Context
+
+Envoy's configuration is a large YAML file that must describe all listeners, clusters, filter chains, TLS contexts, and WASM filter configs. This configuration depends on user-provided settings (which LLM providers to use, which agents to connect, which endpoints to expose).
+
+The options were:
+1. **Static Envoy config** — users edit Envoy YAML directly
+2. **Rust-based config generator** — generate Envoy config from a Rust binary
+3. **Python + Jinja2 template** — validate user config against a schema, then render Envoy config from a template
+
+## Decision
+
+Use a **Python config generator** (`cli/planoai/config_generator.py`) that:
+1. Validates the user's `arch_config.yaml` against a JSON Schema (`config/arch_config_schema.yaml`)
+2. Applies transformations (legacy format conversion, cluster inference, internal model injection)
+3. Renders `config/envoy.template.yaml` (Jinja2) into the final `envoy.yaml`
+4. Produces `arch_config_rendered.yaml` for Brightstaff and WASM filter consumption
+
+This runs at container startup, before Envoy starts.
+
+## Consequences
+
+**Enables:**
+- Simple user-facing config format (`arch_config.yaml`) — users don't need to understand Envoy internals
+- JSON Schema validation catches errors before Envoy starts
+- Jinja2 templating is mature, well-understood, and powerful for generating complex YAML
+- Python CLI (`planoai`) can also handle Docker management and other tooling
+- Config validation is independently testable (`cli/test/test_config_generator.py`)
+
+**Requires:**
+- Python runtime in the Docker image (adds image size)
+- Config changes need updates in 4 places: schema, template, Python validator, Rust struct
+- Understanding of Jinja2 templating for Envoy config modifications
+- `arch_config_rendered.yaml` must be kept in sync between Python generator and Rust deserialization
+
+**Prevents:**
+- Dynamic config reloading without container restart (config is generated at startup)
+- Using Envoy's xDS protocol for dynamic configuration (could be added later)
+- Rust-only development workflow — Python is required for config generation
+
+**4-file update rule:** Every new user-facing config field requires changes to:
+1. `config/arch_config_schema.yaml` — JSON Schema definition
+2. `config/envoy.template.yaml` — Jinja2 template (if Envoy needs the value)
+3. `cli/planoai/config_generator.py` — Python validation and rendering logic
+4. `common/src/configuration.rs` — Rust `Configuration` struct (for runtime consumption)
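Annotation: the validate-then-render order is the core of this pipeline. The sketch below shows that shape using only the Python standard library. `string.Template` stands in for Jinja2 and a hand-written type check stands in for JSON Schema validation. The field names `provider` and `listener_port` are hypothetical and not taken from `arch_config.yaml`.

```python
from string import Template
from typing import Any, Dict

# Hypothetical, heavily simplified stand-ins for the schema and the template.
REQUIRED_FIELDS = {"provider": str, "listener_port": int}
ENVOY_TEMPLATE = Template("cluster: $provider\nport: $listener_port\n")


def validate(config: Dict[str, Any]) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in config:
            raise ValueError(f"missing field: {name}")
        if not isinstance(config[name], expected):
            raise ValueError(f"wrong type for field: {name}")


def render(config: Dict[str, Any]) -> str:
    """Validate first, then render, so bad input never reaches the output."""
    validate(config)
    return ENVOY_TEMPLATE.substitute(config)
```

Because `render` calls `validate` before touching the template, a config with a missing field fails with a readable error instead of producing a broken proxy configuration.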
--- a/docs/ADR/README.md
+++ b/docs/ADR/README.md
@ -0,0 +1,22 @@
+# Architecture Decision Records
+
+This directory contains Architecture Decision Records (ADRs) for the Plano project. ADRs document key architectural decisions, their context, and rationale — preventing future contributors (human or AI) from unknowingly reversing deliberate choices.
+
+## Index
+
+| ADR | Title | Status |
+|-----|-------|--------|
+| [001](001-envoy-as-data-plane.md) | Envoy as the Data Plane | Accepted |
+| [002](002-wasm-filters-over-native.md) | WASM Filters Over Native Envoy Filters | Accepted |
+| [003](003-single-container-supervisord.md) | Single Container with Supervisord | Accepted |
+| [004](004-hermesllm-pure-rust.md) | hermesllm as a Pure Rust Library | Accepted |
+| [005](005-header-based-routing.md) | Header-Based Routing Protocol | Accepted |
+| [006](006-config-generation-pipeline.md) | Config Generation Pipeline (Python + Jinja2) | Accepted |
+
+## ADR Format
+
+Each ADR follows this structure:
+- **Status**: Proposed / Accepted / Deprecated / Superseded
+- **Context**: What problem or question prompted this decision
+- **Decision**: What was decided
+- **Consequences**: Trade-offs, implications, and what this enables or prevents
--- a/docs/DATA_CONTRACTS.md
+++ b/docs/DATA_CONTRACTS.md
@ -0,0 +1,221 @@
+# Data Contracts — Inter-Component Communication
+
+This document defines the contracts between Plano's components: custom HTTP headers, internal API formats, streaming protocols, and Envoy routing conventions. Breaking any of these contracts will cause silent routing failures.
+
+---
+
+## 1. Custom Header Protocol
+
+All custom headers are defined in `common/src/consts.rs`. This is the **single source of truth** — if a header name appears in `envoy.template.yaml` or Brightstaff code, it must match the constant in `consts.rs`.
+
+### Routing Headers (Envoy-critical)
+
+These headers are used in Envoy's `route_config` for cluster selection. Changing them requires updating `envoy.template.yaml`.
+
+| Header | Constant | Set By | Read By | Value Format | Purpose |
+|---|---|---|---|---|---|
+| `x-arch-llm-provider` | `ARCH_ROUTING_HEADER` | WASM filters | Envoy routes | Provider slug (e.g., `openai`, `anthropic`) | Selects the LLM provider cluster in Envoy |
+| `x-arch-upstream` | `ARCH_UPSTREAM_HOST_HEADER` | WASM filters, Brightstaff | Envoy routes | Cluster name (e.g., agent endpoint name) | Routes to a specific upstream cluster |
+| `x-arch-llm-provider-hint` | `ARCH_PROVIDER_HINT_HEADER` | Brightstaff | llm_gateway | `provider/model` (e.g., `openai/gpt-4`) | Hints which provider+model to use |
+| `x-arch-agent-listener-name` | — | Envoy (set in route config) | Brightstaff | Listener name string | Identifies which agent listener a request arrived on |
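Annotation: the `provider/model` value format of `x-arch-llm-provider-hint` can be parsed as sketched below. Splitting on the first slash only is an assumption, made so that model names containing a slash stay intact. The function name `parse_provider_hint` is hypothetical.

```python
from typing import Tuple


def parse_provider_hint(value: str) -> Tuple[str, str]:
    """Split a `provider/model` hint into (provider, model)."""
    # Split on the first slash only, so model names containing "/" survive.
    provider, sep, model = value.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    return provider, model
```

For example, `parse_provider_hint("openai/gpt-4")` returns `("openai", "gpt-4")`, and a value without a slash raises `ValueError`.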
+
+### Internal State Headers (WASM filter internal)
+
+These headers pass state between the prompt_gateway filter's request/response phases or between prompt_gateway and the function calling service.
+
+| Header | Constant | Set By | Read By | Value Format | Purpose |
+|---|---|---|---|---|---|
+| `x-arch-state` | `X_ARCH_STATE_HEADER` | prompt_gateway | prompt_gateway | Base64-encoded JSON (`ArchState`) | Multi-turn conversation state across filter invocations |
+| `x-arch-tool-call-message` | `X_ARCH_TOOL_CALL` | prompt_gateway | prompt_gateway | JSON string | Tool call metadata for API orchestration |
+| `x-arch-api-response-message` | `X_ARCH_API_RESPONSE` | prompt_gateway | prompt_gateway | JSON string | Developer API response data |
+| `x-arch-fc-model-response` | `X_ARCH_FC_MODEL_RESPONSE` | prompt_gateway | prompt_gateway | JSON string | Raw Arch-Function model response |
+| `x-arch-llm-route` | `LLM_ROUTE_HEADER` | Brightstaff | llm_gateway | Route name string | LLM route decision result |
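Annotation: the Base64-encoded JSON format used by `x-arch-state` can be illustrated with a round trip. This Python sketch uses the standard Base64 alphabet, which is an assumption. The state fields shown in the usage note are hypothetical and do not reflect the real `ArchState` layout.

```python
import base64
import json
from typing import Any, Dict


def encode_state(state: Dict[str, Any]) -> str:
    """Serialise state to compact JSON, then Base64, for use as a header value."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_state(header_value: str) -> Dict[str, Any]:
    """Reverse of encode_state: Base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(header_value))
```

Encoding `{"turn": 2}` and decoding the result returns the same dictionary. The Base64 step keeps the value free of quotes, newlines, and non-ASCII characters that are awkward in HTTP headers.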
+
+### Signaling Headers
+
+| Header | Constant | Set By | Read By | Purpose |
+|---|---|---|---|---|
+| `x-arch-streaming-request` | `ARCH_IS_STREAMING_HEADER` | Brightstaff | llm_gateway | Indicates that the request is in streaming mode |
+| `x-arch-ratelimit-selector` | `RATELIMIT_SELECTOR_HEADER_KEY` | Client / Envoy | llm_gateway | Key for per-tenant rate limit partitioning |
+
+### Standard Headers Used
+
+| Header | Constant | Purpose |
+|---|---|---|
+| `x-request-id` | `REQUEST_ID_HEADER` | Request tracing (set by Envoy or caller) |
+| `x-envoy-original-path` | `ENVOY_ORIGINAL_PATH_HEADER` | Original path before Envoy rewrites |
+| `x-envoy-max-retries` | `ENVOY_RETRY_HEADER` | Retry count for Envoy's retry policy |
+| `traceparent` | `TRACE_PARENT_HEADER` | W3C Trace Context for OpenTelemetry |
+
+---
+
+## 2. Internal Cluster Names
+
+Defined in `consts.rs` and referenced in `envoy.template.yaml`:
+
+| Constant | Value | Target | Purpose |
+|---|---|---|---|
+| `MODEL_SERVER_NAME` | `"bright_staff"` | localhost:9091 | Brightstaff service |
+| `ARCH_INTERNAL_CLUSTER_NAME` | `"arch_internal"` | localhost:11000 | Outbound API router |
+| `ARCH_FC_CLUSTER` | `"arch"` | archfc.katanemo.dev:443 | Katanemo Arch-Function model |
+
+Additional clusters generated from config:
+- `arch_prompt_gateway_listener` → localhost:10001
+- `arch_listener_llm` → localhost:12001
+- Per-provider clusters (e.g., `openai`, `anthropic`, `gemini`) from `envoy.template.yaml`
+- Per-agent/endpoint clusters from user config
+
+---
+
+## 3. Internal API Formats
+
+### Brightstaff → Envoy (LLM requests via :12001)
+
+Brightstaff sends OpenAI-compatible `ChatCompletionsRequest` JSON to `localhost:12001` with:
+- `x-arch-llm-provider-hint: <provider>/<model>` to select the provider
+- `x-arch-streaming-request: true/false` to indicate streaming
+- Standard `Content-Type: application/json`
+- `traceparent` for distributed tracing
+
+The `llm_gateway` WASM filter at :12001 transforms the request to the target provider's format.
+
+### Brightstaff → Envoy (Agent/API requests via :11000)
+
+Brightstaff sends requests to `localhost:11000` with:
+- `x-arch-upstream: <cluster_name>` to route to the target agent/API
+- `x-envoy-max-retries: 3` for resilience
+
+**MCP Agent Protocol:**
+```
+POST /  (with x-arch-upstream)
+Content-Type: application/json
+
+# Step 1: Initialize
+{"jsonrpc":"2.0","method":"initialize","id":"<uuid>","params":{...}}
+
+# Step 2: Initialized notification
+{"jsonrpc":"2.0","method":"notifications/initialized"}
+
+# Step 3: Tool call
+{"jsonrpc":"2.0","method":"tools/call","id":"<uuid>","params":{"name":"<tool>","arguments":{...}}}
+```
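Annotation: the three-step sequence above can be built as follows. This is a Python sketch. The `initialize` parameters are elided in the source, so an empty placeholder is used here, and a real MCP server will likely require more. The function name is hypothetical.

```python
import json
import uuid
from typing import Any, Dict, List


def mcp_tool_call_sequence(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Return the three JSON-RPC 2.0 request bodies in the order shown above."""
    initialize = {
        "jsonrpc": "2.0",
        "method": "initialize",
        "id": str(uuid.uuid4()),
        "params": {},  # elided in the source; empty placeholder only
    }
    # Notifications carry no "id" because no response is expected.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    call = {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "id": str(uuid.uuid4()),
        "params": {"name": tool, "arguments": arguments},
    }
    return [json.dumps(message) for message in (initialize, initialized, call)]
```

Each returned string would be sent as the body of its own `POST /` request, in order.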
+
+**HTTP Agent Protocol:**
+```
+POST /  (with x-arch-upstream)
+Content-Type: application/json
+
+[{"role":"user","content":"..."},{"role":"assistant","content":"..."}]
+```
+Response: Array of messages.
+
+### prompt_gateway → Arch-Function (/function_calling)
+
+```
+POST /function_calling
+Content-Type: application/json
+
+{
+  "messages": [...],
+  "tools": [...],
+  "model": "Arch-Function",
+  "stream": false,
+  "metadata": {"raw_response": true, "logprobs": true}
+}
+```
+
+Response contains `tool_calls`, `response`, or `clarification` in the assistant message content (JSON string).
+
+---
+
+## 4. Streaming Protocol
+
+### SSE (Server-Sent Events) — Standard LLM Streaming
+
+All streaming LLM responses use SSE format:
+```
+data: {"id":"...","choices":[...]}\n\n
+data: {"id":"...","choices":[...]}\n\n
+data: [DONE]\n\n
+```
+
+**Important:** SSE events can be split across HTTP chunks. The `llm_gateway` uses `SseStreamBuffer` and `SseChunkProcessor` (from `hermesllm`) to reassemble events across chunk boundaries before processing.
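Annotation: reassembly across chunk boundaries can be sketched as below. This is an illustrative Python model, not the `SseStreamBuffer` implementation. It assumes events are terminated by `\n\n` and ignores `\r\n` line endings and non-`data` fields. The class name `SseBuffer` is hypothetical.

```python
from typing import List


class SseBuffer:
    """Accumulates raw bytes and returns only complete SSE events."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> List[str]:
        # Everything after the last blank-line terminator is an incomplete
        # event, so it stays buffered until a later chunk completes it.
        self._pending += chunk
        *complete, self._pending = self._pending.split(b"\n\n")
        events: List[str] = []
        for raw in complete:
            data_lines = []
            for line in raw.decode("utf-8").split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # Drop the single optional space after the colon.
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                events.append("\n".join(data_lines))
        return events
```

Buffering bytes instead of decoded text also avoids splitting a multi-byte UTF-8 character when a chunk boundary falls inside it.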
+
+### Bedrock Binary Streaming
+
+Amazon Bedrock uses AWS Event Stream binary protocol instead of SSE. The `BedrockBinaryFrameDecoder` in `hermesllm` handles decoding.
+
+### Brightstaff Streaming
+
+Brightstaff uses `tokio::sync::mpsc` channels to stream responses:
+1. Spawns a background task to read from upstream (via `reqwest`)
+2. Parses SSE events, optionally transforms them
+3. Sends chunks through the mpsc channel
+4. Axum's `StreamBody` delivers to the client
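Annotation: the producer and consumer shape of these four steps can be modelled with `asyncio.Queue`, used here as a Python analogue of a `tokio::sync::mpsc` channel. The bounded queue size and the `None` end-of-stream sentinel are assumptions for the sketch, not details taken from Brightstaff.

```python
import asyncio
from typing import List


async def read_upstream(chunks: List[bytes], queue: asyncio.Queue) -> None:
    """Producer: forwards each upstream chunk, then a None sentinel."""
    for chunk in chunks:
        await queue.put(chunk)
    await queue.put(None)


async def stream_to_client(chunks: List[bytes]) -> List[bytes]:
    """Consumer: drains the queue until the sentinel arrives."""
    # Bounded queue, so a slow consumer makes the producer wait.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    producer = asyncio.create_task(read_upstream(chunks, queue))
    delivered: List[bytes] = []
    while True:
        item = await queue.get()
        if item is None:
            break
        delivered.append(item)
    await producer
    return delivered
```

Running `asyncio.run(stream_to_client([b"a", b"b", b"c"]))` delivers the chunks in order, with the background task and the consumer decoupled by the queue.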
+
+---
+
+## 5. Configuration Injection
+
+### WASM Filter Configuration
+
+Envoy injects config into WASM filters via the `configuration` field in the filter definition:
+
+- **prompt_gateway** receives: `prompt_targets`, `prompt_guards`, `system_prompt`, `endpoints`, `overrides`, `tracing`
+- **llm_gateway** receives: `model_providers`, `ratelimits`, `overrides`
+
+Both receive YAML strings parsed by `serde_yaml` in each filter's `RootContext::on_configure()`.
+
+### Brightstaff Configuration
+
+Brightstaff reads `arch_config_rendered.yaml` (path from `ARCH_CONFIG_PATH_RENDERED` env var), which contains the full rendered config including `model_providers`, `agents`, `filters`, `listeners`, `routing`, `model_aliases`, `state_storage`, `tracing`, and `overrides`.
+
+---
+
+## 6. Timeouts
+
+Application-level timeouts are defined in `consts.rs`:
+
+| Constant | Value | Used For |
+|---|---|---|
+| `ARCH_FC_REQUEST_TIMEOUT_MS` | 30,000 ms | Arch-Function model calls from prompt_gateway |
+| `DEFAULT_TARGET_REQUEST_TIMEOUT_MS` | 30,000 ms | Default prompt target endpoint calls |
+| `API_REQUEST_TIMEOUT_MS` | 30,000 ms | Developer API calls from prompt_gateway |
+| `MODEL_SERVER_REQUEST_TIMEOUT_MS` | 30,000 ms | Model server calls |
+
+Envoy also enforces its own route-level timeouts configured in `envoy.template.yaml` (default 300s for LLM routes).
+
+---
+
+## 7. Error Response Format
+
+All error responses from Brightstaff follow this format:
+
+```json
+{
+  "error": {
+    "message": "Human-readable error description",
+    "type": "error_type",
+    "code": 400
+  }
+}
+```
+
+The `llm_gateway` WASM filter returns errors as:
+- HTTP 429 for rate limit exceeded
+- HTTP 503 for provider unavailable
+- The original upstream error status code for pass-through errors
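Annotation: a helper that produces the Brightstaff error envelope might look like the sketch below. The function name `error_body` and the example `type` string are hypothetical. Only the three field names come from the format above.

```python
import json


def error_body(message: str, error_type: str, code: int) -> str:
    """Build the JSON error envelope with message, type, and code fields."""
    return json.dumps(
        {"error": {"message": message, "type": error_type, "code": code}}
    )
```

For instance, `error_body("Missing model field", "invalid_request", 400)` yields a body whose `error.code` is `400`, matching the HTTP status the caller would send alongside it.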
+
+---
+
+## 8. Contract Change Checklist
+
+When modifying any data contract:
+
+- [ ] Update the constant in `common/src/consts.rs`
+- [ ] Grep the entire codebase for the old value (`grep -r "old_value" crates/`)
+- [ ] Update `config/envoy.template.yaml` if the header is used in routing
+- [ ] Update `cli/planoai/config_generator.py` if the config schema changed
+- [ ] Update `config/arch_config_schema.yaml` if user-facing config changed
+- [ ] Run `cargo test --workspace` to catch compile/test failures
+- [ ] Run `cd cli && python -m pytest test/` for config generation tests
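Annotation: the grep step in this checklist could be partly automated. The Python sketch below compares `x-arch-*` names found in two pieces of text. It assumes the file contents are passed in as strings and uses a simple regular expression, so it is a rough drift check and not a parser for either file. The function names are hypothetical.

```python
import re
from typing import Set

# Matches header names such as x-arch-llm-provider.
HEADER_PATTERN = re.compile(r"x-arch-[a-z0-9-]+")


def arch_headers(text: str) -> Set[str]:
    """Collect every x-arch-* header name that appears in the text."""
    return set(HEADER_PATTERN.findall(text))


def undefined_headers(consts_source: str, template_source: str) -> Set[str]:
    """Headers referenced by the template but absent from the constants file."""
    return arch_headers(template_source) - arch_headers(consts_source)
```

Headers that are set only in the Envoy config, such as `x-arch-agent-listener-name`, which the routing table lists without a constant, would need an allow-list before a check like this could run in CI.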