plano/.github/copilot-instructions.md
2026-02-09 23:34:18 -08:00

Copilot Instructions for Plano (ArchGW)

System Identity

Plano is an AI-native gateway built on Envoy Proxy. It uses WASM filters for inline request processing and a native Rust service (Brightstaff) for orchestration. All components run in a single container managed by Supervisord.

Critical Architectural Rules

1. Envoy Is the Data Plane — Never Bypass It

All external traffic MUST flow through Envoy. Brightstaff NEVER makes direct outbound HTTP calls to LLM providers or developer APIs. It always routes through Envoy listeners:

  • LLM requests → localhost:12001 (egress LLM listener with llm_gateway.wasm)
  • Agent/API requests → localhost:11000 (outbound API listener)

Do not add direct HTTP calls from Brightstaff to external services. Use Envoy's cluster routing via x-arch-* headers instead.
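
As an illustration of this rule, the sketch below shows how an outbound request description could be built so that it always points at a local Envoy listener. The `Destination` and `EnvoyRequest` types and the `via_envoy` function are hypothetical names invented for this example, not Brightstaff code. Only the listener ports and header names come from this document.

```rust
/// Local Envoy listeners named in the rule above.
const LLM_LISTENER: &str = "http://localhost:12001";
const API_LISTENER: &str = "http://localhost:11000";

/// Where the caller wants a request to end up (hypothetical type).
pub enum Destination<'a> {
    /// An LLM provider; the hint header carries the provider choice.
    Llm { provider: &'a str },
    /// An agent or developer API cluster known to Envoy.
    Upstream { cluster: &'a str },
}

/// A request description: always a local Envoy URL plus routing headers.
#[derive(Debug, PartialEq)]
pub struct EnvoyRequest {
    pub url: String,
    pub headers: Vec<(String, String)>,
}

/// Builds a request that targets Envoy, never the external host itself.
pub fn via_envoy(dest: &Destination, path: &str) -> EnvoyRequest {
    let (base, header, value) = match dest {
        Destination::Llm { provider } => {
            (LLM_LISTENER, "x-arch-llm-provider-hint", *provider)
        }
        Destination::Upstream { cluster } => {
            (API_LISTENER, "x-arch-upstream", *cluster)
        }
    };
    EnvoyRequest {
        url: format!("{base}{path}"),
        headers: vec![(header.to_string(), value.to_string())],
    }
}
```

With this shape, an external hostname never appears in Brightstaff code; the choice of cluster lives entirely in the header value that Envoy matches on.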

2. WASM Crate Constraints

prompt_gateway and llm_gateway compile to wasm32-wasip1. This means:

  • No tokio, no async/await, no threads, no filesystem, no network sockets
  • All I/O goes through proxy-wasm SDK's dispatch_http_call (async callback-based)
  • No crates that rely on std networking features; for example, use governor in its no_std configuration
  • The crate-type is ["cdylib"] — these are shared libraries, not binaries
  • Test with cargo test (native), but build with --target wasm32-wasip1

Do not add dependencies to WASM crates that require std::net, tokio, reqwest, hyper, or any async runtime.
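
To make "callback-based" concrete, here is a std-only simulation of the control flow. It deliberately does not use the proxy-wasm SDK: `FakeHost`, `Filter`, the `arch_internal` cluster name, and the `classify_intent` label are all stand-ins invented for this sketch. The point is the shape: the filter dispatches a call, gets a token back immediately, and resumes later when the host invokes a response callback with that token.

```rust
use std::collections::HashMap;

/// Stand-in for the proxy host. This is NOT the proxy-wasm SDK.
#[derive(Default)]
pub struct FakeHost {
    next_token: u32,
    /// Calls that were dispatched but not yet answered: (token, cluster).
    pub pending: Vec<(u32, String)>,
}

impl FakeHost {
    /// Queues a call and returns at once with a token; nothing blocks.
    pub fn dispatch(&mut self, upstream: &str) -> u32 {
        let token = self.next_token;
        self.next_token += 1;
        self.pending.push((token, upstream.to_string()));
        token
    }
}

/// Filter state: remembers why each outstanding call was made.
#[derive(Default)]
pub struct Filter {
    callouts: HashMap<u32, &'static str>,
    pub log: Vec<String>,
}

impl Filter {
    /// Request path: start the call, record the token, return.
    pub fn on_request(&mut self, host: &mut FakeHost) {
        let token = host.dispatch("arch_internal"); // hypothetical cluster
        self.callouts.insert(token, "classify_intent");
    }

    /// Invoked by the host later, when the response for `token` arrives.
    pub fn on_http_call_response(&mut self, token: u32, body: &str) {
        if let Some(purpose) = self.callouts.remove(&token) {
            self.log.push(format!("{purpose}: {body}"));
        }
    }
}
```

Because all state lives in the filter struct and is keyed by token, no async runtime, thread, or socket is needed, which is what keeps this style compatible with wasm32-wasip1.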

3. Crate Dependency Direction

prompt_gateway → common
llm_gateway    → common, hermesllm
common         → hermesllm
brightstaff    → common (non-WASM parts), hermesllm
hermesllm      → (standalone, no proxy-wasm)

  • hermesllm must NEVER depend on proxy-wasm or common; it is a pure Rust library usable outside WASM
  • common provides the proxy-wasm abstractions — WASM crates use common, not raw proxy-wasm directly (except for the SDK traits)
  • brightstaff uses hermesllm directly for LLM types but does NOT use common's WASM-specific code (such as the proxy-wasm Client trait)
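
The dependency diagram above can also be written down as data, which makes a proposed new edge easy to check. This is only a sketch: `ALLOWED` and `is_allowed` are hypothetical names, and nothing in the repository is known to enforce the rule this way.

```rust
/// Allowed workspace edges, copied from the diagram above.
const ALLOWED: &[(&str, &[&str])] = &[
    ("prompt_gateway", &["common"]),
    ("llm_gateway", &["common", "hermesllm"]),
    ("common", &["hermesllm"]),
    ("brightstaff", &["common", "hermesllm"]),
    ("hermesllm", &[]), // standalone: no workspace dependencies
];

/// Returns true if crate `from` may declare a direct dependency on `to`.
pub fn is_allowed(from: &str, to: &str) -> bool {
    ALLOWED
        .iter()
        .any(|(krate, deps)| *krate == from && deps.contains(&to))
}
```

Any edge that starts at hermesllm is rejected, which matches the rule that it must stay usable outside WASM.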

4. Header-Based Routing Protocol

Envoy routes requests using custom headers. These are the canonical header names defined in common/src/consts.rs:

| Header | Purpose | Do NOT change |
|---|---|---|
| x-arch-llm-provider | Envoy route matching for LLM provider cluster | Used in envoy.template.yaml |
| x-arch-llm-provider-hint | Brightstaff → llm_gateway provider selection | Both sides must agree |
| x-arch-upstream | Targets a specific agent/API cluster in Envoy | Used in envoy.template.yaml |
| x-arch-streaming-request | Signals streaming mode | llm_gateway reads this |
| x-arch-state | Multi-turn conversation state in prompt_gateway | Serialized JSON |
| x-arch-tool-call-message | Tool call metadata | prompt_gateway internal |
| x-arch-api-response-message | Developer API response | prompt_gateway internal |
| x-arch-agent-listener-name | Identifies agent listener | Set by Envoy, read by Brightstaff |
| x-arch-llm-route | LLM route decision result | Brightstaff ↔ llm_gateway |

Changing header names requires updating: consts.rs, envoy.template.yaml, and all consumers.
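
For orientation, a constants module for these headers might look roughly like the sketch below. The constant names, `ALL_ARCH_HEADERS`, and `is_arch_header` are assumptions made for illustration; check common/src/consts.rs for the real identifiers. Only the header strings are taken from the table above.

```rust
// Hypothetical constant names; only the string values come from this document.
pub const ARCH_LLM_PROVIDER_HEADER: &str = "x-arch-llm-provider";
pub const ARCH_LLM_PROVIDER_HINT_HEADER: &str = "x-arch-llm-provider-hint";
pub const ARCH_UPSTREAM_HEADER: &str = "x-arch-upstream";
pub const ARCH_STREAMING_REQUEST_HEADER: &str = "x-arch-streaming-request";
pub const ARCH_STATE_HEADER: &str = "x-arch-state";
pub const ARCH_TOOL_CALL_MESSAGE_HEADER: &str = "x-arch-tool-call-message";
pub const ARCH_API_RESPONSE_MESSAGE_HEADER: &str = "x-arch-api-response-message";
pub const ARCH_AGENT_LISTENER_NAME_HEADER: &str = "x-arch-agent-listener-name";
pub const ARCH_LLM_ROUTE_HEADER: &str = "x-arch-llm-route";

/// Every routing header in one place, so consumers never spell them inline.
pub const ALL_ARCH_HEADERS: [&str; 9] = [
    ARCH_LLM_PROVIDER_HEADER,
    ARCH_LLM_PROVIDER_HINT_HEADER,
    ARCH_UPSTREAM_HEADER,
    ARCH_STREAMING_REQUEST_HEADER,
    ARCH_STATE_HEADER,
    ARCH_TOOL_CALL_MESSAGE_HEADER,
    ARCH_API_RESPONSE_MESSAGE_HEADER,
    ARCH_AGENT_LISTENER_NAME_HEADER,
    ARCH_LLM_ROUTE_HEADER,
];

/// HTTP header names are case-insensitive, so compare without regard to case.
pub fn is_arch_header(name: &str) -> bool {
    ALL_ARCH_HEADERS.iter().any(|h| h.eq_ignore_ascii_case(name))
}
```

Referring to headers only through constants like these means a rename is a single Rust change, leaving envoy.template.yaml as the other place that must be updated by hand.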

5. Build System

# WASM filters — must use wasm32-wasip1 target
cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway

# Brightstaff — native binary
cargo build --release -p brightstaff

The workspace uses Rust edition 2021 and resolver "2". The workspace root is crates/Cargo.toml.

6. Configuration Flow

User config (arch_config.yaml) is validated and rendered by cli/planoai/config_generator.py:

  • Schema: config/arch_config_schema.yaml
  • Template: config/envoy.template.yaml (Jinja2)
  • Output: envoy.yaml (for Envoy) + arch_config_rendered.yaml (for Brightstaff + WASM filter configs)

When adding new config fields: update the schema, the template (if Envoy-relevant), the Python generator, AND the Rust Configuration struct in common/src/configuration.rs.

7. Internal Model Names

These are reserved model names used internally — do not conflict with them:

  • Arch-Function — intent classification / function calling
  • Arch-Router (used as a route name prefix, not as a direct model name)
  • Plano-Orchestrator — agent selection orchestrator

8. API Compatibility

Brightstaff exposes OpenAI- and Anthropic-compatible endpoints, plus one internal endpoint:

  • /v1/chat/completions — Chat Completions API
  • /v1/messages — Anthropic Messages API compatible
  • /v1/responses — OpenAI Responses API with state management
  • /function_calling — Internal Arch-Function endpoint

The /agents/ prefix variants mirror these for agent orchestration.

Do NOT change these path structures without updating consts.rs, Brightstaff router, and envoy.template.yaml.
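
A minimal sketch of how these paths could be recognized, including the /agents variants, is shown below. It assumes the agent variants are formed by prepending /agents to the same path (for example /agents/v1/messages) and that every listed endpoint has such a variant. The `Endpoint` enum and `match_path` function are hypothetical and are not the Brightstaff router.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Endpoint {
    ChatCompletions,
    Messages,
    Responses,
    FunctionCalling,
}

/// Returns the matched endpoint and whether the /agents prefix was used.
pub fn match_path(path: &str) -> Option<(Endpoint, bool)> {
    // Assumption: agent variants are "/agents" + the plain path.
    let (rest, is_agent) = match path.strip_prefix("/agents") {
        Some(rest) => (rest, true),
        None => (path, false),
    };
    let endpoint = match rest {
        "/v1/chat/completions" => Endpoint::ChatCompletions,
        "/v1/messages" => Endpoint::Messages,
        "/v1/responses" => Endpoint::Responses,
        "/function_calling" => Endpoint::FunctionCalling,
        _ => return None,
    };
    Some((endpoint, is_agent))
}
```

Keeping the path literals in one match makes it obvious which strings must stay in sync with consts.rs and envoy.template.yaml.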

9. Streaming

  • LLM responses use SSE (Server-Sent Events) format: data: {json}\n\n
  • The llm_gateway WASM filter handles SSE stream reassembly across chunk boundaries via SseStreamBuffer
  • Brightstaff uses mpsc channels for streaming responses back to clients
  • Bedrock uses AWS Event Stream binary protocol — decoded by hermesllm
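
Reassembly across chunk boundaries can be sketched as follows. This is a simplified stand-in, not the actual SseStreamBuffer: the `SseBuffer` name and its `push` method are invented here, and the sketch only handles "\n\n" event terminators and `data:` fields (no "\r\n" line endings, comments, or other SSE fields).

```rust
/// Accumulates raw bytes and yields the data payload of each complete event.
#[derive(Default)]
pub struct SseBuffer {
    pending: Vec<u8>,
}

impl SseBuffer {
    /// Appends a chunk and returns the payloads of all events it completed.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut events = Vec::new();
        // An event is complete once a blank line ("\n\n") has been seen.
        while let Some(end) = self.pending.windows(2).position(|w| w == b"\n\n") {
            // Remove the event and its terminator from the buffer.
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            let text = String::from_utf8_lossy(&raw[..end]);
            // Collect every `data:` line, dropping one optional leading space.
            let data: Vec<&str> = text
                .lines()
                .filter_map(|line| line.strip_prefix("data:"))
                .map(|d| d.strip_prefix(' ').unwrap_or(d))
                .collect();
            if !data.is_empty() {
                // Multiple data lines in one event are joined with newlines.
                events.push(data.join("\n"));
            }
        }
        events
    }
}
```

Bytes after the last terminator stay in the buffer until a later chunk completes them, so a JSON payload split in the middle is never handed to the parser half-formed.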

10. Testing Conventions

  • WASM crates: unit tests run natively (cargo test), NOT under WASM runtime
  • Brightstaff: unit tests with mockito for HTTP mocking
  • E2E tests: separate tests/ directory, run via GitHub Actions workflows
  • Config validation tests: cli/test/test_config_generator.py

File Layout Reference

crates/
  Cargo.toml          # Workspace root
  brightstaff/        # Native Rust HTTP server (Axum)
  common/             # Shared types, config, HTTP, rate limiting
  hermesllm/          # LLM protocol translation (pure Rust)
  llm_gateway/        # WASM filter: provider routing, auth, rate limits
  prompt_gateway/     # WASM filter: intent matching, guardrails
config/
  arch_config_schema.yaml   # User config JSON schema
  envoy.template.yaml       # Jinja2 template → envoy.yaml
  docker-compose.dev.yaml   # Dev environment
cli/
  planoai/                  # Python CLI (config generator, Docker management)