# Plano Rust Crates

This workspace contains 5 Rust crates that form the core of the Plano AI gateway. They are organized by compilation target and responsibility.

## Workspace Layout

```
crates/
├── Cargo.toml          # Workspace root (resolver = "2")
├── build.sh            # Builds WASM filters + native binary
├── brightstaff/        # Native Rust HTTP server (Axum)
├── common/             # Shared library (WASM-compatible)
├── hermesllm/          # LLM protocol translation (pure Rust)
├── llm_gateway/        # WASM filter: LLM routing & auth
└── prompt_gateway/     # WASM filter: intent matching & guardrails
```

---

## Crate Details

### `prompt_gateway` — Inbound Prompt Processing

| | |
|---|---|
| **Type** | `cdylib` (WASM filter) |
| **Target** | `wasm32-wasip1` |
| **Envoy listener** | `ingress_traffic_prompt` (:10001) |
| **Root ID** | `prompt_gateway` |
| **Depends on** | `common`, `proxy-wasm` |

**Responsibilities:**
- Intercepts incoming chat completion requests
- Converts `prompt_targets` into OpenAI tool definitions
- Dispatches to `Arch-Function` model for intent classification
- If intent matches: calls developer API endpoints, augments prompt with response context
- If no match: prepends system prompt, forwards to upstream LLM
- Manages multi-turn state via `x-arch-state` header
- Applies `prompt_guards` (jailbreak detection)

**Key modules:**
- `filter_context.rs`: RootContext, config parsing
- `http_context.rs`: Request interception, tool definition construction
- `stream_context.rs`: Core orchestration (intent matching, API calls, response handling)
- `tools.rs`: URL path/query parameter substitution for API calls

**Constraints:**
- No `tokio`, `async/await`, threads, or network sockets
- All HTTP calls via `proxy-wasm` `dispatch_http_call`
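
The path and query substitution that `tools.rs` is described as performing can be illustrated with a minimal sketch. It uses only the standard library, so it stays within the WASM constraints listed above. The function name `substitute_path`, the `{name}` placeholder syntax, and the rule that unused parameters become query parameters are assumptions made for illustration, not the crate's actual API. Percent-encoding is omitted, and the template is assumed to carry no query string of its own.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replaces `{name}` placeholders in a URL path template
/// with values from `params`. Parameters that do not appear in the template
/// are appended as query parameters (assumed behavior, no percent-encoding).
fn substitute_path(template: &str, params: &HashMap<String, String>) -> Result<String, String> {
    let mut path = String::new();
    let mut used: Vec<&str> = Vec::new();
    let mut rest = template;

    // Walk the template, copying literal text and filling each placeholder.
    while let Some(open) = rest.find('{') {
        let close = rest[open..]
            .find('}')
            .map(|i| open + i)
            .ok_or_else(|| format!("unclosed placeholder in {template}"))?;
        let name = &rest[open + 1..close];
        let value = params
            .get(name)
            .ok_or_else(|| format!("missing parameter: {name}"))?;
        path.push_str(&rest[..open]);
        path.push_str(value);
        used.push(name);
        rest = &rest[close + 1..];
    }
    path.push_str(rest);

    // Remaining parameters become a query string, sorted for stable output.
    let mut extra: Vec<(&String, &String)> = params
        .iter()
        .filter(|(k, _)| !used.contains(&k.as_str()))
        .collect();
    extra.sort();
    for (i, (k, v)) in extra.iter().enumerate() {
        path.push(if i == 0 { '?' } else { '&' });
        path.push_str(k);
        path.push('=');
        path.push_str(v);
    }
    Ok(path)
}
```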

---

### `llm_gateway` — LLM Provider Routing & Translation

| | |
|---|---|
| **Type** | `cdylib` (WASM filter) |
| **Target** | `wasm32-wasip1` |
| **Envoy listeners** | `ingress_traffic_prompt` (:10001), `egress_traffic_llm` (:12001) |
| **Root ID** | `llm_gateway` |
| **Depends on** | `common`, `hermesllm`, `proxy-wasm` |

**Responsibilities:**
- Selects LLM provider based on `x-arch-llm-provider-hint` header or default
- Injects authentication credentials (Bearer token, x-api-key, passthrough)
- Rewrites request path for target provider API
- Transforms request/response formats between providers (OpenAI ↔ Anthropic ↔ Bedrock) via `hermesllm`
- Enforces token-based rate limits (`governor` with `no_std`)
- Handles SSE stream reassembly across chunk boundaries (`SseStreamBuffer`)
- Records metrics: TTFT, tokens/sec, request latency, rate-limited count
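
The first two responsibilities above (provider selection by hint header, then credential injection) can be sketched as follows. The `Provider` struct, the `AuthStyle` enum, and both function names are hypothetical stand-ins rather than types from `common`. Falling back to the default provider when the hint names an unknown provider is also an assumption; the real filter may reject such requests instead.

```rust
/// How credentials are attached to the upstream request (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum AuthStyle {
    Bearer,      // Authorization: Bearer <key>
    XApiKey,     // x-api-key: <key>
    Passthrough, // leave the client's own credentials untouched
}

/// Hypothetical provider record; the real config types live in `common`.
#[derive(Debug, Clone, PartialEq)]
struct Provider {
    name: String,
    api_key: String,
    auth: AuthStyle,
    is_default: bool,
}

const PROVIDER_HINT_HEADER: &str = "x-arch-llm-provider-hint";

/// Picks the provider named by the hint header, else the default provider.
/// Header names are compared case-insensitively.
fn select_provider<'a>(
    headers: &[(String, String)],
    providers: &'a [Provider],
) -> Option<&'a Provider> {
    let hint = headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(PROVIDER_HINT_HEADER))
        .map(|(_, v)| v.as_str());
    hint.and_then(|h| providers.iter().find(|p| p.name == h))
        .or_else(|| providers.iter().find(|p| p.is_default))
}

/// Returns the header to add for this provider, or `None` for passthrough.
fn auth_header(provider: &Provider) -> Option<(String, String)> {
    match provider.auth {
        AuthStyle::Bearer => Some((
            "authorization".to_string(),
            format!("Bearer {}", provider.api_key),
        )),
        AuthStyle::XApiKey => Some(("x-api-key".to_string(), provider.api_key.clone())),
        AuthStyle::Passthrough => None,
    }
}
```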

**Key modules:**
- `filter_context.rs`: RootContext, provider & rate limit initialization
- `stream_context.rs`: Request/response transformation, auth, rate limiting, streaming
- `metrics.rs`: Gauge, counter, histogram definitions

**Constraints:**
- Same WASM constraints as `prompt_gateway`
- Uses `hermesllm` for protocol translation; do NOT duplicate translation logic here
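
SSE reassembly across chunk boundaries, listed among the responsibilities of this crate, comes down to buffering bytes until a full event has arrived. The sketch below is a simplified stand-in, not the real `SseStreamBuffer`: the `EventBuffer` type is hypothetical, and it only recognizes `\n\n` as the event terminator (the SSE format also allows `\r\n\r\n`). Buffering raw bytes rather than strings keeps multi-byte UTF-8 sequences intact when a chunk boundary splits them.

```rust
/// Hypothetical reassembly buffer; a simplified stand-in for `SseStreamBuffer`.
#[derive(Default)]
struct EventBuffer {
    pending: Vec<u8>,
}

impl EventBuffer {
    /// Appends a network chunk and returns every event that is now complete.
    /// An event ends at a blank line ("\n\n"); trailing bytes stay buffered
    /// until a later chunk completes them.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut events = Vec::new();
        while let Some(end) = self.pending.windows(2).position(|w| w == b"\n\n") {
            // Remove the event and its terminator from the front of the buffer.
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            events.push(String::from_utf8_lossy(&raw[..end]).into_owned());
        }
        events
    }
}
```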

---

### `common` — Shared Types & Utilities

| | |
|---|---|
| **Type** | `lib` |
| **Target** | Both native and `wasm32-wasip1` |
| **Depends on** | `hermesllm`, `proxy-wasm`, `governor` (no_std), `tiktoken-rs` |

**Responsibilities:**
- Central configuration schema (`Configuration`, `LlmProvider`, `PromptTarget`, `PromptGuards`, etc.)
- `LlmProviders` collection: provider lookup with slug matching and wildcard expansion
- HTTP client trait wrapping `proxy-wasm` `dispatch_http_call`
- All `x-arch-*` header constants and path constants (`consts.rs`)
- Token-based rate limiting (`governor`, keyed by model + header selector)
- Token counting via `tiktoken-rs`
- OpenAI-compatible API types (`ChatCompletionsRequest`, `Message`, `ToolCall`, etc.)
- Error types (`ClientError`, `ServerError`)
- Metrics primitives (`Gauge`, `Counter`, `Histogram`)
- URL path parameter substitution
- PII obfuscation for logging

**Key modules:**
- `configuration.rs`: All config structs, deserialization, validation
- `consts.rs`: Canonical header names, paths, timeouts, cluster names
- `llm_providers.rs`: Provider collection with lookup logic
- `ratelimit.rs`: Token-based rate limiter (global `OnceLock`)
- `http.rs`: `Client` trait for WASM HTTP dispatch
- `tokenizer.rs`: Token counting (tiktoken, GPT-4 fallback)

**Constraints:**
- Must compile for `wasm32-wasip1`: no std networking, no threads
- Must NOT depend on `brightstaff`
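
Provider lookup with slug matching and a wildcard fallback, as attributed to `LlmProviders` above, might look roughly like the sketch below. The `ProviderIndex` type, the `provider/model` slug shape, and the `provider/*` wildcard syntax are all assumptions for illustration; the real collection in `llm_providers.rs` may expand wildcards differently. It uses only the standard library and no threads or networking, so it would also build for `wasm32-wasip1`.

```rust
use std::collections::HashMap;

/// Hypothetical registry keyed by "provider/model" slugs. A key ending in
/// "/*" is treated as a wildcard covering every model of that provider.
struct ProviderIndex {
    /// slug or wildcard -> upstream name (values are illustrative)
    entries: HashMap<String, String>,
}

impl ProviderIndex {
    /// Exact slug match wins; otherwise fall back to the provider wildcard.
    fn lookup(&self, slug: &str) -> Option<&str> {
        if let Some(hit) = self.entries.get(slug) {
            return Some(hit.as_str());
        }
        // Slugs without a '/' cannot match a wildcard entry.
        let (provider, _model) = slug.split_once('/')?;
        self.entries
            .get(&format!("{provider}/*"))
            .map(String::as_str)
    }
}
```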

---

### `hermesllm` — LLM Protocol Translation

| | |
|---|---|
| **Type** | `lib` |
| **Target** | Native and `wasm32-wasip1` (no WASM-incompatible deps; it is linked into the WASM filters through `llm_gateway` and `common`) |
| **Depends on** | `serde`, `serde_json`, `aws-smithy-eventstream`, `uuid` |

**Responsibilities:**
- Cross-provider request/response translation (OpenAI ↔ Anthropic ↔ Amazon Bedrock ↔ Gemini)
- `ProviderRequest` / `ProviderResponse` / `ProviderStreamResponse` traits
- SSE stream parsing (`SseStreamIter`, `SseStreamBuffer`, `SseChunkProcessor`)
- AWS Event Stream binary frame decoding (Bedrock)
- Provider identification (`ProviderId` enum with model catalog from `provider_models.yaml`)
- Target endpoint path rewriting (`/v1/chat/completions` → provider-specific paths)

**Key modules:**
- `apis/`: Format definitions: `openai.rs`, `anthropic.rs`, `amazon_bedrock.rs`, `openai_responses.rs`
- `apis/streaming_shapes/`: SSE and binary stream parsing
- `providers/`: `id.rs` (ProviderId), `request.rs`, `response.rs`, `streaming_response.rs`
- `clients/endpoints.rs`: API path mapping
- `transforms/`: Request/response transformations organized by direction

**Constraints:**
- **MUST NOT depend on `proxy-wasm` or `common`**; this is a pure Rust library
- Must remain usable outside of the WASM/Envoy context
- Optional `model-fetch` feature gates network dependencies (`ureq`)
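
Target endpoint path rewriting, the last responsibility listed above, can be pictured as a small mapping from the OpenAI-style path to each provider's native path. The `Upstream` enum and `target_path` function are hypothetical and are not the contents of `clients/endpoints.rs`. The Anthropic and Bedrock paths are taken from those providers' public APIs (Messages and Converse); whether `hermesllm` maps to exactly these endpoints is an assumption.

```rust
/// Hypothetical subset of upstream providers (the real enum is `ProviderId`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Upstream {
    OpenAI,
    Anthropic,
    AmazonBedrock,
}

/// Maps an OpenAI-style chat completions path to a provider-specific path.
/// Any other request path is returned unchanged.
fn target_path(upstream: Upstream, request_path: &str, model: &str) -> String {
    if request_path != "/v1/chat/completions" {
        return request_path.to_string();
    }
    match upstream {
        Upstream::OpenAI => request_path.to_string(),
        // Anthropic Messages API (assumed target).
        Upstream::Anthropic => "/v1/messages".to_string(),
        // Bedrock Converse API embeds the model ID in the path (assumed target).
        Upstream::AmazonBedrock => format!("/model/{model}/converse"),
    }
}
```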

---

### `brightstaff` — Native HTTP Server

| | |
|---|---|
| **Type** | Binary (Axum) |
| **Target** | Native only |
| **Port** | `0.0.0.0:9091` |
| **Depends on** | `hermesllm`, `common` (non-WASM parts), `tokio`, `axum`, `reqwest`, `opentelemetry` |

**Responsibilities:**
- LLM request routing via `Arch-Router` model (selects best provider/model)
- Agent orchestration via `Plano-Orchestrator` model (selects and chains agents)
- Agent execution pipeline: filter chains → agent invocation (MCP JSON-RPC or HTTP)
- `Arch-Function` handler: tool calling with hallucination detection
- Conversation state management for Responses API (memory or PostgreSQL)
- Model alias resolution
- OpenTelemetry tracing with per-component service names
- Interaction signal analysis (frustration, repetition, escalation detection)

**Key modules:**
- `handlers/llm.rs`: LLM passthrough with routing
- `handlers/agent_chat_completions.rs`: Agent orchestration entry point
- `handlers/agent_selector.rs`: Agent selection logic
- `handlers/pipeline_processor.rs`: Sequential agent/filter execution
- `handlers/function_calling.rs`: Arch-Function tool calling
- `router/llm_router.rs`: `RouterService` (Arch-Router model)
- `router/plano_orchestrator.rs`: `OrchestratorService` (Plano-Orchestrator model)
- `state/`: `StateStorage` trait, memory & PostgreSQL backends
- `signals/`: Conversation quality analysis
- `tracing/`: OpenTelemetry setup with custom service name routing

**Constraints:**
- All external calls go through Envoy (localhost:12001 for LLMs, localhost:11000 for agents)
- Does NOT use `common`'s `proxy-wasm` `Client` trait; uses `reqwest` instead
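
The pluggable conversation state described above (a storage trait with memory and PostgreSQL backends) follows a familiar pattern, sketched here with a deliberately simplified, synchronous trait. `ConversationStore` and `MemoryStore` are hypothetical names; the real `StateStorage` trait in `state/` is probably async and stores richer data than a list of strings.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, synchronous stand-in for a state storage trait.
/// Handlers depend on the trait, so backends can be swapped freely.
trait ConversationStore {
    fn put(&self, response_id: &str, messages: Vec<String>);
    fn get(&self, response_id: &str) -> Option<Vec<String>>;
}

/// In-memory backend: a mutex-guarded map, suitable for a single process.
#[derive(Default)]
struct MemoryStore {
    inner: Mutex<HashMap<String, Vec<String>>>,
}

impl ConversationStore for MemoryStore {
    fn put(&self, response_id: &str, messages: Vec<String>) {
        self.inner
            .lock()
            .unwrap()
            .insert(response_id.to_string(), messages);
    }

    fn get(&self, response_id: &str) -> Option<Vec<String>> {
        // Clone so the lock is released before the caller uses the data.
        self.inner.lock().unwrap().get(response_id).cloned()
    }
}
```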

---

## Dependency Graph

```
prompt_gateway ──► common ──► hermesllm
llm_gateway ───┬► common ──► hermesllm
               └► hermesllm
brightstaff ───┬► hermesllm
               └► common (config types only, not WASM code)

hermesllm ────► (standalone: no proxy-wasm, no common)
```

**Direction is strictly enforced:**
- Arrows point toward dependencies
- No cycles allowed
- `hermesllm` is the leaf node; it must never depend on any other workspace crate
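
Cargo itself rejects cyclic dependencies between workspace crates, so the no-cycles rule is checked at build time. As an illustration of what that rule means for the graph above, the sketch below encodes the documented edges and runs a depth-first search for cycles. The function names `has_cycle` and `workspace_graph` are made up for this example and are not part of the workspace.

```rust
use std::collections::HashMap;

/// Returns true if the dependency graph (crate -> its dependencies) has a cycle.
fn has_cycle(graph: &HashMap<&str, Vec<&str>>) -> bool {
    // State per node: 1 = on the current DFS path, 2 = fully explored.
    fn visit<'a>(
        node: &'a str,
        graph: &HashMap<&'a str, Vec<&'a str>>,
        state: &mut HashMap<&'a str, u8>,
    ) -> bool {
        match state.get(node).copied() {
            Some(1) => return true,  // reached a node already on the path
            Some(2) => return false, // already known to be cycle-free
            _ => {}
        }
        state.insert(node, 1);
        for &dep in graph.get(node).into_iter().flatten() {
            if visit(dep, graph, state) {
                return true;
            }
        }
        state.insert(node, 2);
        false
    }

    let mut state = HashMap::new();
    graph.keys().any(|&node| visit(node, graph, &mut state))
}

/// The workspace edges exactly as drawn in the dependency graph above.
fn workspace_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("prompt_gateway", vec!["common"]),
        ("llm_gateway", vec!["common", "hermesllm"]),
        ("brightstaff", vec!["hermesllm", "common"]),
        ("common", vec!["hermesllm"]),
        ("hermesllm", vec![]),
    ])
}
```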

---

## Build Commands

```bash
# Everything (recommended)
./build.sh

# Equivalent to:
cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway
cargo build --release -p brightstaff

# Tests (all crates, native target)
cargo test --workspace

# Single crate test
cargo test -p common
cargo test -p hermesllm
cargo test -p prompt_gateway
cargo test -p llm_gateway
cargo test -p brightstaff
```

## WASM Output Location

After building, WASM filter binaries are at:
```
target/wasm32-wasip1/release/prompt_gateway.wasm
target/wasm32-wasip1/release/llm_gateway.wasm
```

These are loaded by Envoy at startup from `/etc/envoy/proxy-wasm-plugins/` in the Docker image.