create md files for coding agents and for humans

Adil Hafeez 2026-02-09 23:34:18 -08:00
parent 46de89590b
commit 3f8aa14e4c
12 changed files with 1407 additions and 0 deletions


@ -0,0 +1,35 @@
# ADR 001: Envoy as the Data Plane
**Status:** Accepted
## Context
Plano needs to proxy all traffic between clients, LLM providers, and developer APIs. The options were:
1. Build a custom proxy from scratch in Rust (e.g., using `hyper`/`axum` directly)
2. Use an existing L7 proxy (Envoy, NGINX, HAProxy) and extend it
3. Use a service mesh sidecar approach
We need: TLS termination, connection pooling, retry policies, load balancing, header-based routing, streaming support (SSE), compression, and observability — all at production quality.
## Decision
Use **Envoy Proxy** as the data plane. All external traffic — both inbound client requests and outbound LLM/API calls — flows through Envoy. The native Rust service (Brightstaff) never makes direct outbound connections to external hosts.
## Consequences
**Enables:**
- Production-grade L7 proxying (TLS, HTTP/2, connection pooling, retries) without building it ourselves
- WASM filter extension model for inline request/response processing
- Standard observability (access logs, stats, tracing) out of the box
- Header-based routing via Envoy's route configuration — no custom routing code needed for cluster selection
- Hot-restart and graceful draining for zero-downtime updates
**Requires:**
- All Brightstaff external calls must go through Envoy listeners (localhost:12001 for LLMs, localhost:11000 for APIs)
- Custom headers (`x-arch-*`) for routing decisions — Envoy matches on these in its route config
- Envoy configuration must be generated from user config (Jinja2 template → envoy.yaml)
- Team must understand Envoy's configuration model (listeners, clusters, filter chains)
**Prevents:**
- Direct HTTP calls from Brightstaff to external services (this is intentional — it ensures all traffic gets WASM filter processing, auth injection, rate limiting, and observability)
- Simple single-binary deployment (we need Envoy + Brightstaff, managed by Supervisord)
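**Illustration (not project code):** The "all external calls go through Envoy listeners" rule can be sketched as a small URL rewrite. The listener addresses come from the list above and the header names from ADR 005; the helper `via_envoy`, the provider value `openai`, and the choice of which header the caller sets are assumptions made for this example. Brightstaff itself is written in Rust, so this Python sketch only shows the shape of the idea.

```python
from urllib.parse import urlsplit

# Listener addresses taken from the "Requires" list above.
LLM_LISTENER = "http://localhost:12001"
API_LISTENER = "http://localhost:11000"


def via_envoy(url: str, kind: str, target: str) -> tuple[str, dict[str, str]]:
    """Rewrite an external URL so the request goes to a local Envoy listener.

    `kind` is "llm" or "api"; `target` is the value placed in the routing
    header. Hypothetical helper, not Brightstaff's actual implementation.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    if kind == "llm":
        # Assumption: the caller passes a provider hint to the LLM listener.
        return LLM_LISTENER + path, {"x-arch-llm-provider-hint": target}
    if kind == "api":
        return API_LISTENER + path, {"x-arch-upstream": target}
    raise ValueError(f"unknown kind: {kind!r}")


url, headers = via_envoy(
    "https://api.openai.com/v1/chat/completions", "llm", "openai"
)
print(url)      # http://localhost:12001/v1/chat/completions
print(headers)  # {'x-arch-llm-provider-hint': 'openai'}
```

The external host name is dropped on purpose: the caller only ever connects to localhost, and Envoy picks the real upstream from the header.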


@ -0,0 +1,42 @@
# ADR 002: WASM Filters Over Native Envoy Filters
**Status:** Accepted
## Context
Envoy supports four extension mechanisms:
1. **Native C++ filters** — compiled into the Envoy binary, highest performance
2. **WASM filters** — compiled to WebAssembly, loaded at runtime via Envoy's WASM VM
3. **Lua filters** — scripted, limited functionality
4. **External processing (ext_proc)** — gRPC callout to an external service
We need filters that: parse and transform LLM request/response bodies, perform intent matching, inject authentication headers, enforce rate limits, and handle SSE stream reassembly.
## Decision
Use **WASM filters** written in Rust, compiled to `wasm32-wasip1`, loaded by Envoy's V8 runtime. We have two filters:
- `prompt_gateway.wasm` — inbound prompt processing (intent matching, guardrails, function calling)
- `llm_gateway.wasm` — outbound LLM processing (provider routing, auth, rate limiting, format translation)
## Consequences
**Enables:**
- Filters written in Rust with strong type safety and shared crates (`common`, `hermesllm`)
- Runtime-loadable: no need to rebuild Envoy itself
- Sandboxed execution: a filter crash doesn't bring down Envoy
- Same language (Rust) for WASM filters and Brightstaff — shared types and logic via workspace crates
**Requires:**
- No `tokio`, `async/await`, threads, filesystem, or network sockets in WASM crates
- All I/O must use `proxy-wasm` SDK's `dispatch_http_call` (callback-based)
- Dependencies must be WASM-compatible: `governor` needs `no_std` feature, no crates using `std::net`
- `crate-type = ["cdylib"]` — these build as shared libraries, not binaries
- Testing runs natively (`cargo test`), but building requires `--target wasm32-wasip1`
**Prevents:**
- Using async Rust patterns in filter code (callback-based `on_http_call_response` instead)
- Using popular HTTP client crates (`reqwest`, `hyper`) in filters
- Easy debugging — WASM filters run inside Envoy's V8 VM with limited introspection
**Trade-off vs. ext_proc:**
External processing would allow using Brightstaff (native Rust with full async) for all processing, but would add network round-trips for every request. WASM filters run inline in Envoy's filter chain — zero additional network hops for common operations like auth injection and rate limiting.
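**Illustration (not project code):** The callback-based constraint can be sketched with a small simulation. The names `dispatch_http_call` and `on_http_call_response` come from this ADR, but the signatures here are simplified, and `FakeHost`, `IntentFilter`, and the `intent_model` cluster are hypothetical. The real filters are Rust code built on the `proxy-wasm` SDK; this Python sketch only shows the control flow.

```python
from typing import Callable, Dict


class FakeHost:
    """Stand-in for the proxy host: records calls, delivers responses later."""

    def __init__(self) -> None:
        self._next_token = 0
        self.pending: Dict[int, str] = {}

    def dispatch_http_call(self, cluster: str) -> int:
        # Returns immediately with a token; no response exists yet.
        self._next_token += 1
        self.pending[self._next_token] = cluster
        return self._next_token

    def deliver(
        self, token: int, body: str, on_response: Callable[[int, str], None]
    ) -> None:
        # Simulates the host invoking the filter once the call completes.
        del self.pending[token]
        on_response(token, body)


class IntentFilter:
    def __init__(self, host: FakeHost) -> None:
        self.host = host
        self.waiting: Dict[int, str] = {}   # token -> prompt awaiting a reply
        self.intents: Dict[str, str] = {}   # prompt -> resolved intent

    def on_request(self, prompt: str) -> str:
        token = self.host.dispatch_http_call("intent_model")
        self.waiting[token] = prompt        # state must be stored by hand
        return "pause"                      # nothing to await; just return

    def on_http_call_response(self, token: int, body: str) -> None:
        prompt = self.waiting.pop(token)    # recover state from the token
        self.intents[prompt] = body


host = FakeHost()
flt = IntentFilter(host)
action = flt.on_request("what is the weather?")
pending_before = dict(host.pending)
host.deliver(1, "get_weather", flt.on_http_call_response)
print(action, pending_before, flt.intents)
```

The point of the sketch is that the request handler cannot wait for the reply. It stores whatever it needs under the token and picks it up again in the response callback.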


@ -0,0 +1,42 @@
# ADR 003: Single Container with Supervisord
**Status:** Accepted
## Context
Plano has three runtime processes:
1. **Envoy Proxy** — the data plane with WASM filters
2. **Brightstaff** — the Rust HTTP service for routing and orchestration
3. **Config generator** — Python script that validates config and renders Envoy's YAML (runs at startup)
The options for deployment were:
1. **Separate containers** — each process in its own container, orchestrated by Docker Compose / K8s
2. **Single container with process manager** — all processes in one container, managed by Supervisord
3. **Single binary** — embed Envoy or reimplement its core functionality
## Decision
Run all processes in a **single container** managed by **Supervisord**. The startup sequence:
1. Config generator validates `arch_config.yaml` and renders `envoy.yaml`
2. Supervisord starts Brightstaff and Envoy in parallel
3. A log tail process unifies access log output
## Consequences
**Enables:**
- Simple deployment: one container, one image, `docker run` just works
- No network latency between Envoy and Brightstaff (localhost communication)
- Config generation happens at container startup — no external config rendering step
- Easy development: `docker compose up` with volume mounts for hot-reload
**Requires:**
- Supervisord configuration (`config/supervisord.conf`) to manage process lifecycle
- Health checks must account for both Envoy and Brightstaff readiness
- Logs from all processes need unified output (handled by the tail process)
**Prevents:**
- Independent scaling of Envoy vs. Brightstaff (they scale together as one unit)
- Kubernetes sidecar pattern (though this could be reconsidered)
- Process-level fault isolation (though Supervisord restarts failed processes)
**Trade-off:** Simplicity of deployment over horizontal scaling flexibility. For a gateway that needs to be deployed at the edge or as a sidecar, single-container simplicity is more valuable than the ability to scale components independently.
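**Illustration (not project code):** The requirement that health checks account for both processes can be sketched as a combined readiness function. The function name `container_ready` and the probe callables are assumptions for this example; how the real image probes Envoy and Brightstaff is not described here.

```python
from typing import Callable, Dict, List, Tuple


def container_ready(
    probes: Dict[str, Callable[[], bool]]
) -> Tuple[bool, List[str]]:
    """Return (ready, failing): ready is True only if every probe passes."""
    failing = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except OSError:
            # A refused or timed-out connection counts as "not ready".
            ok = False
        if not ok:
            failing.append(name)
    return (not failing, failing)


# Hypothetical probes; real ones would contact each process.
ready, failing = container_ready(
    {"envoy": lambda: True, "brightstaff": lambda: False}
)
print(ready, failing)  # False ['brightstaff']
```

Because the two processes ship as one unit, a single failing probe marks the whole container as not ready.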


@ -0,0 +1,45 @@
# ADR 004: hermesllm as a Pure Rust Library
**Status:** Accepted
## Context
LLM providers use different API formats (OpenAI Chat Completions, Anthropic Messages, Amazon Bedrock Converse, Gemini). The gateway needs to translate between these formats in two places:
1. In the `llm_gateway` WASM filter (inline in Envoy)
2. In Brightstaff (for routing decisions and response processing)
The options were:
1. Duplicate translation logic in both places
2. Put translation logic in `common` (shared crate, but WASM-constrained)
3. Create a separate pure Rust library with no WASM dependencies
## Decision
Create **`hermesllm`** as a standalone Rust library that handles all LLM protocol translation. It must never depend on `proxy-wasm` or `common`. The WASM crates reach `hermesllm` through `common` (`llm_gateway` also depends on it directly), and Brightstaff depends on it directly.
## Consequences
**Enables:**
- Single source of truth for LLM protocol translation
- Reusable outside the gateway context (could be published as an independent crate)
- Full Rust standard library available (no WASM constraints on the library itself)
- Clean separation: protocol knowledge lives in `hermesllm`, gateway logic lives in filters
**Requires:**
- `hermesllm` must not import `proxy-wasm`, `common`, or any WASM-specific crate
- Adding a new provider should touch protocol code only in `hermesllm`, plus configuration in `common/configuration.rs` and `envoy.template.yaml`
- Types shared between `hermesllm` and the filters go through `common`'s re-exports
**Prevents:**
- Circular dependencies (hermesllm is always a leaf in the dependency graph)
- Accidentally coupling protocol translation to WASM runtime specifics
- Needing to maintain two separate translation implementations
**Dependency direction:**
```
prompt_gateway → common → hermesllm
llm_gateway → common → hermesllm
llm_gateway → hermesllm (direct)
brightstaff → hermesllm (direct)
hermesllm → (no workspace deps)
```
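**Illustration (not project code):** The dependency rule can be checked mechanically. The sketch below encodes the edges from the diagram above and verifies that `hermesllm` has no workspace dependencies and that the graph has no cycle. The `DEPS` table and `find_cycle` are hypothetical; a real check would read the workspace's `Cargo.toml` files instead of a hand-written dictionary.

```python
from typing import Dict, List, Optional

# Edges copied from the dependency diagram above.
DEPS: Dict[str, List[str]] = {
    "prompt_gateway": ["common"],
    "llm_gateway": ["common", "hermesllm"],
    "brightstaff": ["hermesllm"],
    "common": ["hermesllm"],
    "hermesllm": [],
}


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search; returns the crates forming a cycle, or None."""
    in_progress: List[str] = []   # current DFS path
    done = set()                  # crates fully explored

    def visit(crate: str) -> Optional[List[str]]:
        if crate in in_progress:
            return in_progress[in_progress.index(crate):] + [crate]
        if crate in done:
            return None
        in_progress.append(crate)
        for dep in deps.get(crate, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        in_progress.pop()
        done.add(crate)
        return None

    for crate in deps:
        cycle = visit(crate)
        if cycle:
            return cycle
    return None


print(DEPS["hermesllm"] == [])   # True: hermesllm is a leaf
print(find_cycle(DEPS))          # None: no circular dependency

# If hermesllm ever imported common, the check would flag it.
broken = dict(DEPS, hermesllm=["common"])
print(find_cycle(broken))        # ['common', 'hermesllm', 'common']
```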


@ -0,0 +1,40 @@
# ADR 005: Header-Based Routing Protocol
**Status:** Accepted
## Context
Envoy needs to route requests to different upstream clusters (LLM providers, developer APIs, agents) based on runtime decisions made by WASM filters and Brightstaff. The options were:
1. **Path-based routing** — different URL paths for different upstreams
2. **Header-based routing** — custom headers to signal routing decisions
3. **Dynamic cluster selection** — programmatic cluster selection in filters
## Decision
Use **custom `x-arch-*` headers** for all routing decisions. WASM filters and Brightstaff set headers like `x-arch-llm-provider` and `x-arch-upstream`, and Envoy's route configuration matches on these headers to select the upstream cluster.
All header names are defined as constants in `common/src/consts.rs` — this is the single source of truth.
## Consequences
**Enables:**
- Decoupled routing: WASM filters decide *where* to route, Envoy handles *how* to connect
- Transparent to the client — custom headers are internal, clients see standard HTTP
- Easy to debug: inspect headers to understand routing decisions
- Composable: multiple filters can add/modify routing headers in the filter chain
**Requires:**
- Header names must be consistent between `consts.rs` and `envoy.template.yaml`
- Any new routing dimension needs a new header constant + Envoy route match rule
- Developers must grep all consumers when changing a header name
**Prevents:**
- Routing logic in Envoy's configuration alone (routing decisions are made by Rust code, not Envoy config)
- Using Envoy's native routing features (like weighted clusters) independently — they must be combined with header matching
**Key headers:**
- `x-arch-llm-provider` — LLM provider cluster selection (Envoy route matching)
- `x-arch-llm-provider-hint` — Provider hint from Brightstaff to llm_gateway
- `x-arch-upstream` — Agent/API endpoint cluster selection
- `x-arch-streaming-request` — Streaming mode signal
- `x-arch-state` — Multi-turn conversation state (prompt_gateway internal)
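**Illustration (not project code):** The split between "filters decide where, Envoy decides how" can be sketched as a first-match lookup over header rules. The header names come from the list above; the `ROUTES` table, the cluster names, and `select_cluster` are hypothetical stand-ins for the route entries in `envoy.template.yaml`, which is YAML rather than Python.

```python
from typing import Dict, List, Optional, Tuple

# (required headers, cluster) pairs; the first matching rule wins.
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"x-arch-llm-provider": "openai"}, "openai"),
    ({"x-arch-llm-provider": "anthropic"}, "anthropic"),
    ({"x-arch-upstream": "weather_api"}, "weather_api"),
]


def select_cluster(
    headers: Dict[str, str],
    routes: List[Tuple[Dict[str, str], str]] = ROUTES,
) -> Optional[str]:
    """Return the cluster of the first rule whose headers all match."""
    # HTTP header names are case-insensitive, so normalize before matching.
    normalized = {name.lower(): value for name, value in headers.items()}
    for match, cluster in routes:
        if all(normalized.get(name) == value for name, value in match.items()):
            return cluster
    return None


# A filter sets the header; the route table turns it into a cluster.
print(select_cluster({"X-Arch-LLM-Provider": "anthropic"}))  # anthropic
print(select_cluster({"x-arch-upstream": "weather_api"}))    # weather_api
print(select_cluster({"content-type": "application/json"}))  # None
```

Adding a new routing dimension in this model means adding both a header constant and a rule that matches on it, which mirrors the "Requires" list above.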


@ -0,0 +1,48 @@
# ADR 006: Config Generation Pipeline (Python + Jinja2)
**Status:** Accepted
## Context
Envoy's configuration is a large YAML file that must describe all listeners, clusters, filter chains, TLS contexts, and WASM filter configs. This configuration depends on user-provided settings (which LLM providers to use, which agents to connect, which endpoints to expose).
The options were:
1. **Static Envoy config** — users edit Envoy YAML directly
2. **Rust-based config generator** — generate Envoy config from a Rust binary
3. **Python + Jinja2 template** — validate user config against a schema, then render Envoy config from a template
## Decision
Use a **Python config generator** (`cli/planoai/config_generator.py`) that:
1. Validates user's `arch_config.yaml` against a JSON Schema (`config/arch_config_schema.yaml`)
2. Applies transformations (legacy format conversion, cluster inference, internal model injection)
3. Renders `config/envoy.template.yaml` (Jinja2) into the final `envoy.yaml`
4. Produces `arch_config_rendered.yaml` for Brightstaff and WASM filter consumption
This runs at container startup, before Envoy starts.
## Consequences
**Enables:**
- Simple user-facing config format (`arch_config.yaml`) — users don't need to understand Envoy internals
- JSON Schema validation catches errors before Envoy starts
- Jinja2 templating is mature, well-understood, and powerful for generating complex YAML
- Python CLI (`planoai`) can also handle Docker management and other tooling
- Config validation is independently testable (`cli/test/test_config_generator.py`)
**Requires:**
- Python runtime in the Docker image (adds image size)
- Config changes need updates in 4 places: schema, template, Python validator, Rust struct
- Understanding of Jinja2 templating for Envoy config modifications
- `arch_config_rendered.yaml` must be kept in sync between Python generator and Rust deserialization
**Prevents:**
- Dynamic config reloading without container restart (config is generated at startup)
- Using Envoy's xDS protocol for dynamic configuration (could be added later)
- Rust-only development workflow — Python is required for config generation
**4-file update rule:** Every new user-facing config field requires changes to:
1. `config/arch_config_schema.yaml` — JSON Schema definition
2. `config/envoy.template.yaml` — Jinja2 template (if Envoy needs the value)
3. `cli/planoai/config_generator.py` — Python validation and rendering logic
4. `common/src/configuration.rs` — Rust `Configuration` struct (for runtime consumption)
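**Illustration (not project code):** The validate-then-render flow can be sketched in a few lines. To keep the example dependency-free it uses `string.Template` from the Python standard library as a stand-in for Jinja2, and a hand-written field check as a stand-in for JSON Schema validation. The field names, `REQUIRED_FIELDS`, and the template text are all hypothetical and do not reflect the real `arch_config.yaml` format.

```python
from string import Template
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"listener_port": int, "providers": list}

# Stand-in for envoy.template.yaml (the real template uses Jinja2).
ENVOY_TEMPLATE = Template(
    "listeners:\n"
    "  - port: $listener_port\n"
    "clusters:\n"
    "$clusters"
)


def validate(config: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            errors.append(f"missing field: {field}")
        elif not isinstance(config[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


def render(config: Dict[str, Any]) -> str:
    """Validate first, then render; invalid config never reaches the template."""
    errors = validate(config)
    if errors:
        raise ValueError("; ".join(errors))
    clusters = "".join(f"  - name: {name}\n" for name in config["providers"])
    return ENVOY_TEMPLATE.substitute(
        listener_port=config["listener_port"], clusters=clusters
    )


rendered = render({"listener_port": 10000, "providers": ["openai", "anthropic"]})
print(rendered)
print(validate({}))  # ['missing field: listener_port', 'missing field: providers']
```

Running validation before rendering is what lets errors surface before Envoy starts, as described under "Enables" above.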

docs/ADR/README.md Normal file

@ -0,0 +1,22 @@
# Architecture Decision Records
This directory contains Architecture Decision Records (ADRs) for the Plano project. ADRs document key architectural decisions, their context, and rationale — preventing future contributors (human or AI) from unknowingly reversing deliberate choices.
## Index
| ADR | Title | Status |
|-----|-------|--------|
| [001](001-envoy-as-data-plane.md) | Envoy as the Data Plane | Accepted |
| [002](002-wasm-filters-over-native.md) | WASM Filters Over Native Envoy Filters | Accepted |
| [003](003-single-container-supervisord.md) | Single Container with Supervisord | Accepted |
| [004](004-hermesllm-pure-rust.md) | hermesllm as a Pure Rust Library | Accepted |
| [005](005-header-based-routing.md) | Header-Based Routing Protocol | Accepted |
| [006](006-config-generation-pipeline.md) | Config Generation Pipeline (Python + Jinja2) | Accepted |
## ADR Format
Each ADR follows this structure:
- **Status**: Proposed / Accepted / Deprecated / Superseded
- **Context**: What problem or question prompted this decision
- **Decision**: What was decided
- **Consequences**: Trade-offs, implications, and what this enables or prevents