# Copilot Instructions for Plano (ArchGW)

## System Identity

Plano is an AI-native gateway built on Envoy Proxy. It uses WASM filters for inline request processing and a native Rust service (Brightstaff) for orchestration. All components run in a single container managed by Supervisord.

## Critical Architectural Rules

### 1. Envoy Is the Data Plane — Never Bypass It

All external traffic MUST flow through Envoy. Brightstaff NEVER makes direct outbound HTTP calls to LLM providers or developer APIs. It always routes through Envoy listeners:
- LLM requests → `localhost:12001` (egress LLM listener with `llm_gateway.wasm`)
- Agent/API requests → `localhost:11000` (outbound API listener)

**Do not** add direct HTTP calls from Brightstaff to external services. Use Envoy's cluster routing via `x-arch-*` headers instead.
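
As a minimal sketch of this rule, the helper below describes an outbound call so that it always targets an Envoy listener. `EnvoyTarget`, `OutboundRequest`, and `via_envoy` are hypothetical names used only for illustration; the ports and header names are the ones listed above. Which header Brightstaff actually sets for a given call is an assumption here.

```rust
/// Envoy listeners that Brightstaff may call (ports taken from this document).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EnvoyTarget {
    /// Egress LLM listener (runs `llm_gateway.wasm`).
    Llm,
    /// Outbound API listener for agents and developer APIs.
    Api,
}

/// Plain description of an outbound request (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OutboundRequest {
    pub url: String,
    pub headers: Vec<(String, String)>,
}

/// Builds a request that goes to an Envoy listener on localhost.
/// `upstream` is a provider hint (LLM) or a cluster name (API); it is
/// carried in a header and is never used as a hostname to dial.
pub fn via_envoy(target: EnvoyTarget, path: &str, upstream: &str) -> OutboundRequest {
    // Assumption: one routing header per target is enough for this sketch.
    let (port, header) = match target {
        EnvoyTarget::Llm => (12001, "x-arch-llm-provider-hint"),
        EnvoyTarget::Api => (11000, "x-arch-upstream"),
    };
    OutboundRequest {
        url: format!("http://localhost:{}{}", port, path),
        headers: vec![(header.to_string(), upstream.to_string())],
    }
}
```

The point of the shape is that the external destination only ever appears as a header value, so Envoy's cluster routing decides where the request goes.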

### 2. WASM Crate Constraints

`prompt_gateway` and `llm_gateway` compile to `wasm32-wasip1`. This means:
- **No `tokio`, no `async/await`, no threads, no filesystem, no network sockets**
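- All I/O goes through the `proxy-wasm` SDK's `dispatch_http_call`, which is non-blocking and callback-based: the call returns immediately and the response is delivered to a callback later
- No crates that need `std` networking features; use `no_std` configurations instead (for example, `governor` with `no_std`)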
- The `crate-type` is `["cdylib"]` — these are shared libraries, not binaries
- Test with `cargo test` (native), but build with `--target wasm32-wasip1`

**Do not** add dependencies to WASM crates that require `std::net`, `tokio`, `reqwest`, `hyper`, or any async runtime.
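
The callback-based style can be sketched without the SDK. The code below is a std-only simulation, not the `proxy-wasm` API: `FakeHost`, `FilterContext`, and `Pending` are hypothetical, and the real `dispatch_http_call` takes more arguments (such as headers and a timeout) and can fail. It only shows the pattern of storing state under a call token and resuming when the callback fires.

```rust
use std::collections::HashMap;

/// Stand-in for the proxy-wasm host: hands out one token per dispatched call.
#[derive(Default)]
pub struct FakeHost {
    next_token: u32,
}

impl FakeHost {
    /// Mimics a non-blocking dispatch: returns a token immediately, no response yet.
    pub fn dispatch_http_call(&mut self, _upstream: &str, _body: &[u8]) -> u32 {
        self.next_token += 1;
        self.next_token
    }
}

/// What the filter was doing when it issued a call (hypothetical steps).
#[derive(Debug, PartialEq)]
pub enum Pending {
    IntentMatch,
    Guardrail,
}

/// Filter context: all state lives in fields, because nothing can be awaited.
#[derive(Default)]
pub struct FilterContext {
    pending: HashMap<u32, Pending>,
    pub log: Vec<String>,
}

impl FilterContext {
    /// Starts a call and records why it was made, keyed by the returned token.
    pub fn start(&mut self, host: &mut FakeHost, upstream: &str, why: Pending) -> u32 {
        let token = host.dispatch_http_call(upstream, b"{}");
        self.pending.insert(token, why);
        token
    }

    /// Called when a response arrives; looks up the token to resume the right step.
    pub fn on_http_call_response(&mut self, token: u32, body: &[u8]) {
        match self.pending.remove(&token) {
            Some(why) => self
                .log
                .push(format!("{:?} finished with {} bytes", why, body.len())),
            None => self.log.push(format!("unknown token {}", token)),
        }
    }
}
```

Responses may arrive in any order, which is why the pending step is looked up by token rather than assumed from call order.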

### 3. Crate Dependency Direction

```
prompt_gateway → common
llm_gateway    → common, hermesllm
common         → hermesllm
brightstaff    → common (non-WASM parts), hermesllm
hermesllm      → (standalone, no proxy-wasm)
```

- `hermesllm` must NEVER depend on `proxy-wasm` or `common` — it's a pure Rust library usable outside WASM
- `common` provides the `proxy-wasm` abstractions — WASM crates use `common`, not raw `proxy-wasm` directly (except for the SDK traits)
- `brightstaff` uses `hermesllm` directly for LLM types but does NOT use `common`'s WASM-specific code (such as the `proxy-wasm` `Client` trait)

### 4. Header-Based Routing Protocol

Envoy routes requests using custom headers. These are the canonical header names defined in `common/src/consts.rs`:

| Header | Purpose | Coupling (why it must not change) |
|--------|---------|---------------|
| `x-arch-llm-provider` | Envoy route matching for LLM provider cluster | Used in envoy.template.yaml |
| `x-arch-llm-provider-hint` | Brightstaff → llm_gateway provider selection | Both sides must agree |
| `x-arch-upstream` | Targets a specific agent/API cluster in Envoy | Used in envoy.template.yaml |
| `x-arch-streaming-request` | Signals streaming mode | llm_gateway reads this |
| `x-arch-state` | Multi-turn conversation state in prompt_gateway | Serialized JSON |
| `x-arch-tool-call-message` | Tool call metadata | prompt_gateway internal |
| `x-arch-api-response-message` | Developer API response | prompt_gateway internal |
| `x-arch-agent-listener-name` | Identifies agent listener | Set by Envoy, read by Brightstaff |
| `x-arch-llm-route` | LLM route decision result | Brightstaff ↔ llm_gateway |

Changing a header name requires updating `consts.rs`, `envoy.template.yaml`, and all consumers in the same change.
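
A sketch of how these names might be centralized so that no consumer hardcodes a string. The constant identifiers and the `is_arch_header` helper are hypothetical and may not match what `common/src/consts.rs` actually defines; only the header string values come from the table above.

```rust
// Hypothetical constant names; the values are the header names from the table.
pub const ARCH_LLM_PROVIDER_HEADER: &str = "x-arch-llm-provider";
pub const ARCH_LLM_PROVIDER_HINT_HEADER: &str = "x-arch-llm-provider-hint";
pub const ARCH_UPSTREAM_HEADER: &str = "x-arch-upstream";

/// Every routing header in one place, so a single test can guard the convention.
pub const ALL_ARCH_HEADERS: [&str; 9] = [
    ARCH_LLM_PROVIDER_HEADER,
    ARCH_LLM_PROVIDER_HINT_HEADER,
    ARCH_UPSTREAM_HEADER,
    "x-arch-streaming-request",
    "x-arch-state",
    "x-arch-tool-call-message",
    "x-arch-api-response-message",
    "x-arch-agent-listener-name",
    "x-arch-llm-route",
];

/// Returns true when `name` is a known routing header.
/// HTTP header names are case-insensitive, so the comparison ignores ASCII case.
pub fn is_arch_header(name: &str) -> bool {
    ALL_ARCH_HEADERS.iter().any(|h| h.eq_ignore_ascii_case(name))
}
```

Referencing constants rather than literals means a rename fails to compile in Rust consumers, leaving `envoy.template.yaml` as the remaining place to update by hand.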

### 5. Build System

```bash
# WASM filters — must use wasm32-wasip1 target
cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway

# Brightstaff — native binary
cargo build --release -p brightstaff
```

The workspace uses Rust edition 2021 and resolver "2". The workspace root is `crates/Cargo.toml`.

### 6. Configuration Flow

User config (`arch_config.yaml`) is validated and rendered by `cli/planoai/config_generator.py`:
- Schema: `config/arch_config_schema.yaml`
- Template: `config/envoy.template.yaml` (Jinja2)
- Output: `envoy.yaml` (for Envoy) + `arch_config_rendered.yaml` (for Brightstaff + WASM filter configs)

When adding new config fields: update the schema, the template (if Envoy-relevant), the Python generator, AND the Rust `Configuration` struct in `common/src/configuration.rs`.

### 7. Internal Model Names

These are reserved model names used internally — do not conflict with them:
- `Arch-Function` — intent classification / function calling
- `Arch-Router` (reserved as a route name prefix; not a model name itself)
- `Plano-Orchestrator` — agent selection orchestrator
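
A guard against such conflicts could look like the sketch below. The function name is hypothetical, and treating the comparison as case-sensitive (and `Arch-Router` as a plain string prefix) is an assumption, not something this document specifies.

```rust
/// Reserved exact model names from the list above.
const RESERVED_MODEL_NAMES: [&str; 2] = ["Arch-Function", "Plano-Orchestrator"];

/// Reserved route name prefix from the list above.
const RESERVED_ROUTE_PREFIX: &str = "Arch-Router";

/// Returns true when a user-supplied name collides with an internal one.
/// Assumption: matching is case-sensitive.
pub fn conflicts_with_reserved(name: &str) -> bool {
    RESERVED_MODEL_NAMES.iter().any(|reserved| *reserved == name)
        || name.starts_with(RESERVED_ROUTE_PREFIX)
}
```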

### 8. API Compatibility

Brightstaff exposes these endpoints (OpenAI-compatible unless noted otherwise):
- `/v1/chat/completions` — Chat Completions API
- `/v1/messages` — Anthropic Messages API compatible
- `/v1/responses` — OpenAI Responses API with state management
- `/function_calling` — Internal Arch-Function endpoint

Variants under the `/agents/` prefix mirror these endpoints for agent orchestration.

Do NOT change these path structures without updating `consts.rs`, Brightstaff router, and `envoy.template.yaml`.
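
To make the path contract concrete, here is a small classifier sketch. The `Endpoint` enum and `classify` function are hypothetical, and it assumes the agent variants are the same paths with `/agents` in front (for example `/agents/v1/messages`); the real Brightstaff router may differ. Query strings are not handled.

```rust
/// The endpoint families listed above (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum Endpoint {
    ChatCompletions,
    Messages,
    Responses,
    FunctionCalling,
}

/// Returns the endpoint and whether the request used the `/agents` prefix,
/// or `None` when the path is not one of the known structures.
pub fn classify(path: &str) -> Option<(Endpoint, bool)> {
    // Assumption: agent variants are the plain paths prefixed with `/agents`.
    let (rest, is_agent) = match path.strip_prefix("/agents") {
        Some(rest) => (rest, true),
        None => (path, false),
    };
    let endpoint = match rest {
        "/v1/chat/completions" => Endpoint::ChatCompletions,
        "/v1/messages" => Endpoint::Messages,
        "/v1/responses" => Endpoint::Responses,
        "/function_calling" => Endpoint::FunctionCalling,
        _ => return None,
    };
    Some((endpoint, is_agent))
}
```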

### 9. Streaming

- Streaming LLM responses use the SSE (Server-Sent Events) format: `data: {json}\n\n`
- The `llm_gateway` WASM filter handles SSE stream reassembly across chunk boundaries via `SseStreamBuffer`
- Brightstaff uses `mpsc` channels for streaming responses back to clients
- Bedrock uses AWS Event Stream binary protocol — decoded by `hermesllm`
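
Reassembly across chunk boundaries can be sketched as follows. `SseBuffer` is a hypothetical, simplified stand-in and not the actual `SseStreamBuffer`: it only recognizes `\n\n` as the event terminator (not `\r\n\r\n`) and returns each `data:` line as its own payload instead of joining multi-line data.

```rust
/// Accumulates raw bytes and yields complete `data:` payloads (hypothetical type).
#[derive(Default)]
pub struct SseBuffer {
    pending: Vec<u8>,
}

impl SseBuffer {
    /// Appends a chunk and returns the payload of every event completed so far.
    /// Incomplete trailing bytes stay buffered until a later chunk finishes them.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut events = Vec::new();
        // An event ends at a blank line, i.e. the byte sequence "\n\n".
        while let Some(end) = self.pending.windows(2).position(|w| w == b"\n\n") {
            // Remove the event and its terminator from the buffer.
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            // Decode only complete events, so a multi-byte UTF-8 character
            // split across two chunks is never decoded in halves.
            let text = String::from_utf8_lossy(&raw[..end]);
            for line in text.lines() {
                if let Some(data) = line.strip_prefix("data:") {
                    events.push(data.trim_start().to_string());
                }
            }
        }
        events
    }
}
```

Buffering bytes rather than strings is the key design choice: network chunks can split an event anywhere, including in the middle of the `\n\n` terminator.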

### 10. Testing Conventions

- WASM crates: unit tests run natively (`cargo test`), NOT under WASM runtime
- Brightstaff: unit tests with `mockito` for HTTP mocking
- E2E tests: separate `tests/` directory, run via GitHub Actions workflows
- Config validation tests: `cli/test/test_config_generator.py`

## File Layout Reference

```
crates/
  Cargo.toml          # Workspace root
  brightstaff/        # Native Rust HTTP server (Axum)
  common/             # Shared types, config, HTTP, rate limiting
  hermesllm/          # LLM protocol translation (pure Rust)
  llm_gateway/        # WASM filter: provider routing, auth, rate limits
  prompt_gateway/     # WASM filter: intent matching, guardrails
config/
  arch_config_schema.yaml   # User config JSON schema
  envoy.template.yaml       # Jinja2 template → envoy.yaml
  docker-compose.dev.yaml   # Dev environment
cli/
  planoai/                  # Python CLI (config generator, Docker management)
```