# Plano Rust Crates

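This workspace contains five Rust crates that form the core of the Plano AI gateway: two WASM filters (`prompt_gateway`, `llm_gateway`), two libraries (`common`, `hermesllm`), and one native binary (`brightstaff`). They are organized by compilation target and responsibility.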

## Workspace Layout

```
crates/
├── Cargo.toml          # Workspace root (resolver = "2")
├── build.sh            # Builds WASM filters + native binary
├── brightstaff/        # Native Rust HTTP server (Axum)
├── common/             # Shared library (WASM-compatible)
├── hermesllm/          # LLM protocol translation (pure Rust)
├── llm_gateway/        # WASM filter: LLM routing & auth
└── prompt_gateway/     # WASM filter: intent matching & guardrails
```

---

## Crate Details

### `prompt_gateway` — Inbound Prompt Processing

| | |
|---|---|
| **Type** | `cdylib` (WASM filter) |
| **Target** | `wasm32-wasip1` |
| **Envoy listener** | `ingress_traffic_prompt` (:10001) |
| **Root ID** | `prompt_gateway` |
| **Depends on** | `common`, `proxy-wasm` |

**Responsibilities:**
- Intercepts incoming chat completion requests
- Converts `prompt_targets` into OpenAI tool definitions
- Dispatches to `Arch-Function` model for intent classification
- If intent matches: calls developer API endpoints, augments prompt with response context
- If no match: prepends system prompt, forwards to upstream LLM
- Manages multi-turn state via `x-arch-state` header
- Applies `prompt_guards` (jailbreak detection)

**Key modules:**
- `filter_context.rs` — RootContext, config parsing
- `http_context.rs` — Request interception, tool definition construction
- `stream_context.rs` — Core orchestration (intent matching, API calls, response handling)
- `tools.rs` — URL path/query parameter substitution for API calls
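
The path/query substitution handled by `tools.rs` can be pictured with the sketch below. It is a minimal illustration, not the crate's code: the function name `substitute_path_params`, the `{name}` placeholder syntax, and the rule that unused parameters become query parameters are all assumptions made for this example, and URL encoding is omitted.

```rust
use std::collections::HashMap;

/// Replaces `{name}` placeholders in `path` with values from `params`.
/// Parameters that the path does not consume are returned (sorted) so the
/// caller can append them as query parameters. Hypothetical helper.
pub fn substitute_path_params(
    path: &str,
    params: &HashMap<String, String>,
) -> Result<(String, Vec<(String, String)>), String> {
    let mut out = String::with_capacity(path.len());
    let mut used: Vec<&str> = Vec::new();
    let mut rest = path;

    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        let close = after_open
            .find('}')
            .ok_or_else(|| format!("unclosed placeholder in path: {}", path))?;
        let name = &after_open[..close];
        let value = params
            .get(name)
            .ok_or_else(|| format!("missing value for path parameter: {}", name))?;
        out.push_str(value);
        used.push(name);
        rest = &after_open[close + 1..];
    }
    out.push_str(rest);

    // Whatever was not used in the path is left over for the query string.
    let mut leftover: Vec<(String, String)> = params
        .iter()
        .filter(|(k, _)| !used.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    leftover.sort();
    Ok((out, leftover))
}
```

With `user_id = 42` and `limit = 10`, the template `/users/{user_id}/orders` resolves to `/users/42/orders` and leaves `limit` for the query string. A placeholder with no matching parameter yields an error rather than a partially filled path.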

**Constraints:**
- No `tokio`, `async/await`, threads, or network sockets
- All HTTP calls via `proxy-wasm` `dispatch_http_call`

---

### `llm_gateway` — LLM Provider Routing & Translation

| | |
|---|---|
| **Type** | `cdylib` (WASM filter) |
| **Target** | `wasm32-wasip1` |
| **Envoy listeners** | `ingress_traffic_prompt` (:10001), `egress_traffic_llm` (:12001) |
| **Root ID** | `llm_gateway` |
| **Depends on** | `common`, `hermesllm`, `proxy-wasm` |

**Responsibilities:**
- Selects LLM provider based on `x-arch-llm-provider-hint` header or default
- Injects authentication credentials (Bearer token, x-api-key, passthrough)
- Rewrites request path for target provider API
- Transforms request/response formats between providers (OpenAI ↔ Anthropic ↔ Bedrock) via `hermesllm`
- Enforces token-based rate limits (`governor` with `no_std`)
- Handles SSE stream reassembly across chunk boundaries (`SseStreamBuffer`)
- Records metrics: TTFT, tokens/sec, request latency, rate-limited count
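
The first two responsibilities (provider selection and credential injection) can be sketched as follows. This is an illustration under assumptions: the `Provider` and `AuthStyle` types are invented for the example, and falling back to the default provider when the hint matches nothing is a guess at the behavior, not something this README states.

```rust
/// How credentials are attached to the upstream request (illustrative).
#[derive(Debug, Clone, PartialEq)]
pub enum AuthStyle {
    Bearer,      // Authorization: Bearer <key>
    XApiKey,     // x-api-key: <key>
    Passthrough, // leave the client's own credentials untouched
}

/// Hypothetical, simplified provider entry.
#[derive(Debug, Clone)]
pub struct Provider {
    pub name: String,
    pub api_key: String,
    pub auth: AuthStyle,
    pub is_default: bool,
}

/// Picks the provider named by the hint header value, else the default.
pub fn select_provider<'a>(
    providers: &'a [Provider],
    hint: Option<&str>,
) -> Option<&'a Provider> {
    hint.and_then(|h| providers.iter().find(|p| p.name == h))
        .or_else(|| providers.iter().find(|p| p.is_default))
}

/// Returns the header to add for the chosen provider, if any.
pub fn auth_header(p: &Provider) -> Option<(String, String)> {
    match p.auth {
        AuthStyle::Bearer => Some(("authorization".into(), format!("Bearer {}", p.api_key))),
        AuthStyle::XApiKey => Some(("x-api-key".into(), p.api_key.clone())),
        AuthStyle::Passthrough => None,
    }
}
```

Here `hint` would carry the value of the `x-arch-llm-provider-hint` header when the client sends one.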

**Key modules:**
- `filter_context.rs` — RootContext, provider & rate limit initialization
- `stream_context.rs` — Request/response transformation, auth, rate limiting, streaming
- `metrics.rs` — Gauge, counter, histogram definitions

**Constraints:**
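- Same WASM constraints as `prompt_gateway`: no `tokio`, `async/await`, threads, or network sockets, and all HTTP calls via `proxy-wasm` `dispatch_http_call`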
- Uses `hermesllm` for protocol translation — do NOT duplicate translation logic here

---

### `common` — Shared Types & Utilities

| | |
|---|---|
| **Type** | `lib` |
| **Target** | Both native and `wasm32-wasip1` |
| **Depends on** | `hermesllm`, `proxy-wasm`, `governor` (no_std), `tiktoken-rs` |

**Responsibilities:**
- Central configuration schema (`Configuration`, `LlmProvider`, `PromptTarget`, `PromptGuards`, etc.)
- `LlmProviders` collection — provider lookup with slug matching and wildcard expansion
- HTTP client trait wrapping `proxy-wasm` `dispatch_http_call`
- All `x-arch-*` header constants and path constants (`consts.rs`)
- Token-based rate limiting (`governor`, keyed by model + header selector)
- Token counting via `tiktoken-rs`
- OpenAI-compatible API types (`ChatCompletionsRequest`, `Message`, `ToolCall`, etc.)
- Error types (`ClientError`, `ServerError`)
- Metrics primitives (`Gauge`, `Counter`, `Histogram`)
- URL path parameter substitution
- PII obfuscation for logging
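
To make "token-based rate limiting keyed by model + header selector" concrete, here is a plain token-bucket sketch using only the standard library. It is deliberately not the `governor` algorithm the crate uses. The type `TokenRateLimiter`, its parameters, and the explicit `now_secs` clock are assumptions chosen to keep the example deterministic.

```rust
use std::collections::HashMap;

/// Per-key bucket state: remaining LLM tokens and the last refill time.
struct Bucket {
    available: f64,
    last_refill_secs: f64,
}

/// Token bucket where the budget is counted in LLM tokens, not requests.
/// One bucket exists per (model, selector value) pair. Hypothetical type.
pub struct TokenRateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<(String, String), Bucket>,
}

impl TokenRateLimiter {
    pub fn new(capacity: u32, refill_per_sec: u32) -> Self {
        Self {
            capacity: f64::from(capacity),
            refill_per_sec: f64::from(refill_per_sec),
            buckets: HashMap::new(),
        }
    }

    /// Tries to spend `tokens` for `(model, selector)` at time `now_secs`.
    /// Returns false (and spends nothing) when the budget is insufficient.
    pub fn try_consume(&mut self, model: &str, selector: &str, tokens: u32, now_secs: f64) -> bool {
        let (capacity, refill) = (self.capacity, self.refill_per_sec);
        let bucket = self
            .buckets
            .entry((model.to_string(), selector.to_string()))
            .or_insert(Bucket { available: capacity, last_refill_secs: now_secs });

        // Refill proportionally to elapsed time, capped at capacity.
        let elapsed = (now_secs - bucket.last_refill_secs).max(0.0);
        bucket.available = (bucket.available + elapsed * refill).min(capacity);
        bucket.last_refill_secs = now_secs;

        let cost = f64::from(tokens);
        if cost <= bucket.available {
            bucket.available -= cost;
            true
        } else {
            false
        }
    }
}
```

Keying on both the model and the selector value means one caller exhausting a budget does not affect another caller on the same model.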

**Key modules:**
- `configuration.rs` — All config structs, deserialization, validation
- `consts.rs` — Canonical header names, paths, timeouts, cluster names
- `llm_providers.rs` — Provider collection with lookup logic
- `ratelimit.rs` — Token-based rate limiter (global `OnceLock`)
- `http.rs` — `Client` trait for WASM HTTP dispatch
- `tokenizer.rs` — Token counting (tiktoken, GPT-4 fallback)

**Constraints:**
- Must compile for `wasm32-wasip1` — no std networking, no threads
- Must NOT depend on `brightstaff`

---

### `hermesllm` — LLM Protocol Translation

| | |
|---|---|
| **Type** | `lib` |
| **Target** | Native and `wasm32-wasip1` (no WASM-incompatible deps by default; linked into the WASM filters via `llm_gateway` and `common`) |
| **Depends on** | `serde`, `serde_json`, `aws-smithy-eventstream`, `uuid` |

**Responsibilities:**
- Cross-provider request/response translation (OpenAI ↔ Anthropic ↔ Amazon Bedrock ↔ Gemini)
- `ProviderRequest` / `ProviderResponse` / `ProviderStreamResponse` traits
- SSE stream parsing (`SseStreamIter`, `SseStreamBuffer`, `SseChunkProcessor`)
- AWS Event Stream binary frame decoding (Bedrock)
- Provider identification (`ProviderId` enum with model catalog from `provider_models.yaml`)
- Target endpoint path rewriting (`/v1/chat/completions` → provider-specific paths)
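
SSE parsing has to cope with events that are split across network chunks. The sketch below shows one way to reassemble them, assuming events are terminated by a blank line (`\n\n`). `SseReassembler` is a hypothetical name used for illustration and is not the crate's `SseStreamBuffer`; `\r\n` line endings are not handled here.

```rust
/// Accumulates raw bytes and yields only complete SSE events.
#[derive(Default)]
pub struct SseReassembler {
    pending: Vec<u8>,
}

impl SseReassembler {
    /// Appends a network chunk and returns every event completed by it.
    /// A trailing partial event stays buffered until a later chunk ends it.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut events = Vec::new();
        loop {
            // Find the blank line that terminates the next event.
            let pos = match self.pending.windows(2).position(|w| w == b"\n\n") {
                Some(p) => p,
                None => break,
            };
            // Remove the event plus its two-byte terminator from the buffer.
            let event: Vec<u8> = self.pending.drain(..pos + 2).collect();
            let text = String::from_utf8_lossy(&event[..pos]).into_owned();
            if !text.is_empty() {
                events.push(text);
            }
        }
        events
    }
}
```

Buffering bytes and decoding only complete events also avoids corrupting a multi-byte UTF-8 character that happens to straddle two chunks.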

**Key modules:**
- `apis/` — Format definitions: `openai.rs`, `anthropic.rs`, `amazon_bedrock.rs`, `openai_responses.rs`
- `apis/streaming_shapes/` — SSE and binary stream parsing
- `providers/` — `id.rs` (ProviderId), `request.rs`, `response.rs`, `streaming_response.rs`
- `clients/endpoints.rs` — API path mapping
- `transforms/` — Request/response transformations organized by direction

**Constraints:**
- **MUST NOT depend on `proxy-wasm` or `common`** — this is a pure Rust library
- Must remain usable outside of the WASM/Envoy context
- Optional `model-fetch` feature gates network dependencies (`ureq`)

---

### `brightstaff` — Native HTTP Server

| | |
|---|---|
| **Type** | Binary (Axum) |
| **Target** | Native only |
| **Port** | `0.0.0.0:9091` |
| **Depends on** | `hermesllm`, `common` (non-WASM parts), `tokio`, `axum`, `reqwest`, `opentelemetry` |

**Responsibilities:**
- LLM request routing via `Arch-Router` model (selects best provider/model)
- Agent orchestration via `Plano-Orchestrator` model (selects and chains agents)
- Agent execution pipeline: filter chains → agent invocation (MCP JSON-RPC or HTTP)
- `Arch-Function` handler: tool calling with hallucination detection
- Conversation state management for Responses API (memory or PostgreSQL)
- Model alias resolution
- OpenTelemetry tracing with per-component service names
- Interaction signal analysis (frustration, repetition, escalation detection)

**Key modules:**
- `handlers/llm.rs` — LLM passthrough with routing
- `handlers/agent_chat_completions.rs` — Agent orchestration entry point
- `handlers/agent_selector.rs` — Agent selection logic
- `handlers/pipeline_processor.rs` — Sequential agent/filter execution
- `handlers/function_calling.rs` — Arch-Function tool calling
- `router/llm_router.rs` — `RouterService` (Arch-Router model)
- `router/plano_orchestrator.rs` — `OrchestratorService` (Plano-Orchestrator model)
- `state/` — `StateStorage` trait, memory & PostgreSQL backends
- `signals/` — Conversation quality analysis
- `tracing/` — OpenTelemetry setup with custom service name routing
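
The idea behind a storage trait with interchangeable backends (`state/`) can be sketched as below. Everything here is an assumption for illustration: the trait name `ConversationStore`, its synchronous methods, and the `Vec<String>` message representation are not taken from `brightstaff`, whose `StateStorage` trait may well be async and richer.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical storage abstraction: a PostgreSQL backend would implement
/// the same trait as the in-memory one below.
pub trait ConversationStore {
    fn put(&self, response_id: &str, messages: Vec<String>);
    fn get(&self, response_id: &str) -> Option<Vec<String>>;
}

/// In-memory backend guarded by a mutex so it can be shared across tasks.
#[derive(Default)]
pub struct MemoryStore {
    inner: Mutex<HashMap<String, Vec<String>>>,
}

impl ConversationStore for MemoryStore {
    fn put(&self, response_id: &str, messages: Vec<String>) {
        self.inner.lock().unwrap().insert(response_id.to_string(), messages);
    }

    fn get(&self, response_id: &str) -> Option<Vec<String>> {
        self.inner.lock().unwrap().get(response_id).cloned()
    }
}

/// Rebuilds the history for a follow-up turn: prior messages (if the client
/// referenced an earlier response) followed by the new input.
pub fn continue_conversation(
    store: &dyn ConversationStore,
    previous_response_id: Option<&str>,
    new_input: &str,
) -> Vec<String> {
    let mut history = previous_response_id
        .and_then(|id| store.get(id))
        .unwrap_or_default();
    history.push(new_input.to_string());
    history
}
```

Handlers that depend only on the trait can switch between memory and a database backend through configuration, without code changes.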

**Constraints:**
- All external calls go through Envoy (localhost:12001 for LLMs, localhost:11000 for agents)
- Does NOT use `common`'s `proxy-wasm` Client trait — uses `reqwest` instead

---

## Dependency Graph

```
prompt_gateway ──► common ──► hermesllm
llm_gateway ───┬► common ──► hermesllm
               └► hermesllm
brightstaff ───┬► hermesllm
               └► common (config types only, not WASM code)

hermesllm ────► (standalone — no proxy-wasm, no common)
```

**Direction is strictly enforced:**
- Arrows point toward dependencies
- No cycles allowed
- `hermesllm` is the leaf node — it must never depend on any other workspace crate
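
The rules above can be checked mechanically. The following sketch encodes the graph from this section as an adjacency map and runs a depth-first cycle check. The function names are made up for this example, and it does not read `Cargo.toml`; the edges are transcribed by hand from the diagram.

```rust
use std::collections::{HashMap, HashSet};

type Graph = HashMap<&'static str, Vec<&'static str>>;

/// Workspace crates mapped to the workspace crates they depend on.
pub fn workspace_graph() -> Graph {
    HashMap::from([
        ("prompt_gateway", vec!["common"]),
        ("llm_gateway", vec!["common", "hermesllm"]),
        ("brightstaff", vec!["hermesllm", "common"]),
        ("common", vec!["hermesllm"]),
        ("hermesllm", vec![]),
    ])
}

/// Depth-first visit; returns true if a cycle is reachable from `node`.
fn visit(
    graph: &Graph,
    node: &'static str,
    visiting: &mut HashSet<&'static str>,
    done: &mut HashSet<&'static str>,
) -> bool {
    if done.contains(node) {
        return false;
    }
    // Re-entering a node that is still on the current path is a back edge.
    if !visiting.insert(node) {
        return true;
    }
    let deps = graph.get(node).cloned().unwrap_or_default();
    for dep in deps {
        if visit(graph, dep, visiting, done) {
            return true;
        }
    }
    visiting.remove(node);
    done.insert(node);
    false
}

/// Returns true if the dependency graph contains any cycle.
pub fn has_cycle(graph: &Graph) -> bool {
    let mut visiting = HashSet::new();
    let mut done = HashSet::new();
    graph.keys().any(|&n| visit(graph, n, &mut visiting, &mut done))
}
```

For the graph as documented, `has_cycle` returns `false` and `hermesllm` has no outgoing edges. Adding a `hermesllm` to `common` edge would make it return `true`.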

---

## Build Commands

```bash
# Everything (recommended)
./build.sh

# Equivalent to:
cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway
cargo build --release -p brightstaff

# Tests (all crates, native target)
cargo test --workspace

# Tests for a single crate (run whichever you need)
cargo test -p common
cargo test -p hermesllm
cargo test -p prompt_gateway
cargo test -p llm_gateway
cargo test -p brightstaff
```

## WASM Output Location

After building, the WASM filter binaries are at the following paths, relative to the workspace root (`crates/`):

```
target/wasm32-wasip1/release/prompt_gateway.wasm
target/wasm32-wasip1/release/llm_gateway.wasm
```

In the Docker image, Envoy loads these binaries at startup from `/etc/envoy/proxy-wasm-plugins/`.