mirror of
https://github.com/katanemo/plano.git
synced 2026-06-17 15:25:17 +02:00
Plano Rust Crates
This workspace contains 5 Rust crates that form the core of the Plano AI gateway. They are organized by compilation target and responsibility.
Workspace Layout
crates/
├── Cargo.toml # Workspace root (resolver = "2")
├── build.sh # Builds WASM filters + native binary
├── brightstaff/ # Native Rust HTTP server (Axum)
├── common/ # Shared library (WASM-compatible)
├── hermesllm/ # LLM protocol translation (pure Rust)
├── llm_gateway/ # WASM filter: LLM routing & auth
└── prompt_gateway/ # WASM filter: intent matching & guardrails
Crate Details
prompt_gateway — Inbound Prompt Processing
| Type | cdylib (WASM filter) |
| Target | wasm32-wasip1 |
| Envoy listener | ingress_traffic_prompt (:10001) |
| Root ID | prompt_gateway |
| Depends on | common, proxy-wasm |
Responsibilities:
- Intercepts incoming chat completion requests
- Converts prompt_targets into OpenAI tool definitions
- Dispatches to Arch-Function model for intent classification
- If intent matches: calls developer API endpoints, augments prompt with response context
- If no match: prepends system prompt, forwards to upstream LLM
- Manages multi-turn state via x-arch-state header
- Applies prompt_guards (jailbreak detection)
Key modules:
- filter_context.rs: RootContext, config parsing
- http_context.rs: Request interception, tool definition construction
- stream_context.rs: Core orchestration (intent matching, API calls, response handling)
- tools.rs: URL path/query parameter substitution for API calls
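The path substitution handled by tools.rs can be pictured with a minimal sketch. The function name, signature, and error type below are hypothetical and chosen for illustration only; the sketch assumes `{name}` placeholders and does not percent-encode values or handle query parameters.

```rust
use std::collections::HashMap;

/// Replaces each `{name}` placeholder in `path` with the matching value from
/// `params`. Returns an error naming the first placeholder that has no value.
/// Hypothetical helper for illustration; not the signature used in tools.rs.
fn substitute_path_params(
    path: &str,
    params: &HashMap<String, String>,
) -> Result<String, String> {
    let mut out = String::with_capacity(path.len());
    let mut rest = path;
    while let Some(open) = rest.find('{') {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        let close = after_open
            .find('}')
            .ok_or_else(|| format!("unclosed placeholder in {path:?}"))?;
        let name = &after_open[..close];
        let value = params
            .get(name)
            .ok_or_else(|| format!("missing value for parameter {name:?}"))?;
        out.push_str(value);
        rest = &after_open[close + 1..];
    }
    // Copy whatever follows the last placeholder.
    out.push_str(rest);
    Ok(out)
}
```

Returning an error for a missing parameter, rather than leaving the placeholder in the URL, keeps a malformed request from reaching the developer API.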
Constraints:
- No tokio, async/await, threads, or network sockets
- All HTTP calls via proxy-wasm dispatch_http_call
llm_gateway — LLM Provider Routing & Translation
| Type | cdylib (WASM filter) |
| Target | wasm32-wasip1 |
| Envoy listeners | ingress_traffic_prompt (:10001), egress_traffic_llm (:12001) |
| Root ID | llm_gateway |
| Depends on | common, hermesllm, proxy-wasm |
Responsibilities:
- Selects LLM provider based on x-arch-llm-provider-hint header or default
- Injects authentication credentials (Bearer token, x-api-key, passthrough)
- Rewrites request path for target provider API
- Transforms request/response formats between providers (OpenAI ↔ Anthropic ↔ Bedrock) via hermesllm
- Enforces token-based rate limits (governor with no_std)
- Handles SSE stream reassembly across chunk boundaries (SseStreamBuffer)
- Records metrics: TTFT, tokens/sec, request latency, rate-limited count
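SSE reassembly matters because Envoy hands the filter body bytes in arbitrary chunks, so a single event may arrive split across two callbacks. The sketch below shows the general technique, assuming events are terminated by a blank line ("\n\n"). The type `SseReassembler` is hypothetical and is not the actual SseStreamBuffer API; it also ignores "\r\n" line endings.

```rust
/// Accumulates raw bytes and yields only complete SSE events.
/// Hypothetical type for illustration; the real SseStreamBuffer may differ.
#[derive(Default)]
struct SseReassembler {
    pending: Vec<u8>,
}

impl SseReassembler {
    /// Appends a network chunk and returns every event that is now complete.
    /// A trailing partial event stays buffered until a later chunk ends it.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut events = Vec::new();
        while let Some(end) = find_blank_line(&self.pending) {
            // Remove the event together with its two-byte terminator.
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            events.push(String::from_utf8_lossy(&raw[..end]).into_owned());
        }
        events
    }
}

/// Returns the index of the first "\n\n" in `buf`, if any.
fn find_blank_line(buf: &[u8]) -> Option<usize> {
    buf.windows(2).position(|w| w == b"\n\n")
}
```

Buffering bytes rather than strings means a multi-byte UTF-8 character split across two chunks is decoded only once the whole event is present.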
Key modules:
- filter_context.rs: RootContext, provider & rate limit initialization
- stream_context.rs: Request/response transformation, auth, rate limiting, streaming
- metrics.rs: Gauge, counter, histogram definitions
Constraints:
- Same WASM constraints as prompt_gateway
- Uses hermesllm for protocol translation; do NOT duplicate translation logic here
common — Shared Types & Utilities
| Type | lib |
| Target | Both native and wasm32-wasip1 |
| Depends on | hermesllm, proxy-wasm, governor (no_std), tiktoken-rs |
Responsibilities:
- Central configuration schema (Configuration, LlmProvider, PromptTarget, PromptGuards, etc.)
- LlmProviders collection: provider lookup with slug matching and wildcard expansion
- HTTP client trait wrapping proxy-wasm dispatch_http_call
- All x-arch-* header constants and path constants (consts.rs)
- Token-based rate limiting (governor, keyed by model + header selector)
- Token counting via tiktoken-rs
- OpenAI-compatible API types (ChatCompletionsRequest, Message, ToolCall, etc.)
- Error types (ClientError, ServerError)
- Metrics primitives (Gauge, Counter, Histogram)
- URL path parameter substitution
- PII obfuscation for logging
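To make "token-based rate limiting keyed by model + header selector" concrete, here is a minimal sketch of a plain token bucket with one bucket per (model, selector) pair. It is not governor's algorithm or API, and every name in it is hypothetical. The clock is passed in explicitly, which also fits a WASM filter where the host supplies the time.

```rust
use std::collections::HashMap;

/// State of one bucket: remaining budget and the time it was last refilled.
struct Bucket {
    tokens: f64,
    last_refill_secs: f64,
}

/// LLM-token budget limiter keyed by (model, selector value).
/// Hypothetical sketch: a simple token bucket, not governor's algorithm.
struct TokenLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<(String, String), Bucket>,
}

impl TokenLimiter {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Tries to spend `cost` tokens for this key at time `now_secs`.
    /// Returns false, spending nothing, if the bucket holds too few tokens.
    fn try_consume(&mut self, model: &str, selector: &str, cost: f64, now_secs: f64) -> bool {
        let capacity = self.capacity;
        let refill = self.refill_per_sec;
        let bucket = self
            .buckets
            .entry((model.to_string(), selector.to_string()))
            .or_insert(Bucket { tokens: capacity, last_refill_secs: now_secs });
        // Refill in proportion to elapsed time, capped at capacity.
        let elapsed = (now_secs - bucket.last_refill_secs).max(0.0);
        bucket.tokens = (bucket.tokens + elapsed * refill).min(capacity);
        bucket.last_refill_secs = now_secs;
        if bucket.tokens >= cost {
            bucket.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```

The cost passed to `try_consume` would be the prompt's token count, so a single large request can exhaust a budget that many small requests would fit within.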
Key modules:
- configuration.rs: All config structs, deserialization, validation
- consts.rs: Canonical header names, paths, timeouts, cluster names
- llm_providers.rs: Provider collection with lookup logic
- ratelimit.rs: Token-based rate limiter (global OnceLock)
- http.rs: Client trait for WASM HTTP dispatch
- tokenizer.rs: Token counting (tiktoken, GPT-4 fallback)
Constraints:
- Must compile for wasm32-wasip1: no std networking, no threads
- Must NOT depend on brightstaff
hermesllm — LLM Protocol Translation
| Type | lib |
| Target | Native and wasm32-wasip1 (linked into common and llm_gateway, so no WASM-incompatible deps) |
| Depends on | serde, serde_json, aws-smithy-eventstream, uuid |
Responsibilities:
- Cross-provider request/response translation (OpenAI ↔ Anthropic ↔ Amazon Bedrock ↔ Gemini)
- ProviderRequest / ProviderResponse / ProviderStreamResponse traits
- SSE stream parsing (SseStreamIter, SseStreamBuffer, SseChunkProcessor)
- AWS Event Stream binary frame decoding (Bedrock)
- Provider identification (ProviderId enum with model catalog from provider_models.yaml)
- Target endpoint path rewriting (/v1/chat/completions → provider-specific paths)
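Endpoint path rewriting can be sketched as a match on provider and incoming path. The enum and function below are hypothetical and cover only a subset of providers. The upstream paths are the vendors' public, non-streaming endpoints (Anthropic Messages, Bedrock Converse) and are an assumption about what the crate targets, not a copy of clients/endpoints.rs.

```rust
/// Illustrative subset of providers; the real ProviderId enum is larger.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    OpenAi,
    Anthropic,
    AmazonBedrock,
}

/// Maps the client-facing path to an upstream path for `provider`.
/// Paths that need no translation are passed through unchanged.
fn target_path(provider: Provider, request_path: &str, model: &str) -> String {
    match (provider, request_path) {
        (Provider::Anthropic, "/v1/chat/completions") => "/v1/messages".to_string(),
        (Provider::AmazonBedrock, "/v1/chat/completions") => {
            // Bedrock addresses the model in the path rather than the body.
            format!("/model/{model}/converse")
        }
        // OpenAI already serves this path; anything else is left untouched.
        _ => request_path.to_string(),
    }
}
```

Keeping this mapping in one function mirrors the constraint that translation logic lives in hermesllm and is not duplicated in the filters.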
Key modules:
- apis/: Format definitions (openai.rs, anthropic.rs, amazon_bedrock.rs, openai_responses.rs)
- apis/streaming_shapes/: SSE and binary stream parsing
- providers/: id.rs (ProviderId), request.rs, response.rs, streaming_response.rs
- clients/endpoints.rs: API path mapping
- transforms/: Request/response transformations organized by direction
Constraints:
- MUST NOT depend on proxy-wasm or common; this is a pure Rust library
- Must remain usable outside of the WASM/Envoy context
- Optional model-fetch feature gates network dependencies (ureq)
brightstaff — Native HTTP Server
| Type | Binary (Axum) |
| Target | Native only |
| Port | 0.0.0.0:9091 |
| Depends on | hermesllm, common (non-WASM parts), tokio, axum, reqwest, opentelemetry |
Responsibilities:
- LLM request routing via Arch-Router model (selects best provider/model)
- Agent orchestration via Plano-Orchestrator model (selects and chains agents)
- Agent execution pipeline: filter chains → agent invocation (MCP JSON-RPC or HTTP)
- Arch-Function handler: tool calling with hallucination detection
- Conversation state management for Responses API (memory or PostgreSQL)
- Model alias resolution
- OpenTelemetry tracing with per-component service names
- Interaction signal analysis (frustration, repetition, escalation detection)
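The "memory or PostgreSQL" choice for conversation state suggests a storage trait with interchangeable backends. The sketch below shows that pattern with an in-memory backend. The trait name, its methods, and the synchronous design are all assumptions made for illustration; the actual StateStorage trait in brightstaff is likely async and may look quite different.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, synchronous stand-in for a conversation state trait.
trait ConversationStore {
    /// Appends one message to the conversation, creating it if needed.
    fn append(&self, conversation_id: &str, message: String);
    /// Returns the stored messages in insertion order (empty if unknown).
    fn history(&self, conversation_id: &str) -> Vec<String>;
}

/// In-memory backend; contents are lost when the process exits.
#[derive(Default)]
struct MemoryStore {
    conversations: Mutex<HashMap<String, Vec<String>>>,
}

impl ConversationStore for MemoryStore {
    fn append(&self, conversation_id: &str, message: String) {
        let mut map = self.conversations.lock().unwrap();
        map.entry(conversation_id.to_string())
            .or_default()
            .push(message);
    }

    fn history(&self, conversation_id: &str) -> Vec<String> {
        let map = self.conversations.lock().unwrap();
        map.get(conversation_id).cloned().unwrap_or_default()
    }
}
```

Handlers written against the trait do not need to know which backend is configured, which is what lets a database-backed store replace the in-memory one without touching request handling.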
Key modules:
- handlers/llm.rs: LLM passthrough with routing
- handlers/agent_chat_completions.rs: Agent orchestration entry point
- handlers/agent_selector.rs: Agent selection logic
- handlers/pipeline_processor.rs: Sequential agent/filter execution
- handlers/function_calling.rs: Arch-Function tool calling
- router/llm_router.rs: RouterService (Arch-Router model)
- router/plano_orchestrator.rs: OrchestratorService (Plano-Orchestrator model)
- state/: StateStorage trait, memory & PostgreSQL backends
- signals/: Conversation quality analysis
- tracing/: OpenTelemetry setup with custom service name routing
Constraints:
- All external calls go through Envoy (localhost:12001 for LLMs, localhost:11000 for agents)
- Does NOT use common's proxy-wasm Client trait; uses reqwest instead
Dependency Graph
prompt_gateway ──► common ──► hermesllm
llm_gateway ───┬► common ──► hermesllm
               └► hermesllm
brightstaff ───┬► hermesllm
               └► common (config types only, not WASM code)
hermesllm ────► (standalone — no proxy-wasm, no common)
Direction is strictly enforced:
- Arrows point toward dependencies
- No cycles allowed
- hermesllm is the leaf node: it must never depend on any other workspace crate
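The rules above can be checked mechanically. The sketch below encodes the edges from the dependency graph in this section and runs a depth-first search for cycles. Both function names are hypothetical, and in practice Cargo itself rejects cyclic path dependencies, so this is only an illustration of the invariant.

```rust
use std::collections::HashMap;

/// Returns true if `graph` (crate -> direct dependencies) contains a cycle.
fn has_cycle<'a>(graph: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    // Node states: 0 = unvisited, 1 = on the current DFS path, 2 = done.
    fn visit<'a>(
        node: &'a str,
        graph: &HashMap<&'a str, Vec<&'a str>>,
        state: &mut HashMap<&'a str, u8>,
    ) -> bool {
        match state.get(node).copied().unwrap_or(0) {
            1 => return true, // reached a node already on the path: cycle
            2 => return false,
            _ => {}
        }
        state.insert(node, 1);
        if let Some(deps) = graph.get(node) {
            for &dep in deps {
                if visit(dep, graph, state) {
                    return true;
                }
            }
        }
        state.insert(node, 2);
        false
    }

    let mut state = HashMap::new();
    graph.keys().any(|&node| visit(node, graph, &mut state))
}

/// Workspace edges as listed in the dependency graph of this section.
fn workspace_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("prompt_gateway", vec!["common"]),
        ("llm_gateway", vec!["common", "hermesllm"]),
        ("brightstaff", vec!["hermesllm", "common"]),
        ("common", vec!["hermesllm"]),
        ("hermesllm", vec![]),
    ])
}
```

Adding an edge from hermesllm to any other workspace crate would both violate the leaf rule and, because common already depends on hermesllm, introduce a cycle.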
Build Commands
# Everything (recommended)
./build.sh
# Equivalent to:
cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway
cargo build --release -p brightstaff
# Tests (all crates, native target)
cargo test --workspace
# Single crate test
cargo test -p common
cargo test -p hermesllm
cargo test -p prompt_gateway
cargo test -p llm_gateway
cargo test -p brightstaff
WASM Output Location
After building, WASM filter binaries are at:
target/wasm32-wasip1/release/prompt_gateway.wasm
target/wasm32-wasip1/release/llm_gateway.wasm
These are loaded by Envoy at startup from /etc/envoy/proxy-wasm-plugins/ in the Docker image.