mirror of https://github.com/katanemo/plano.git synced 2026-06-17 15:25:17 +02:00

Adil Hafeez 3f8aa14e4c create md files for coding agents and for humans		2026-02-09 23:34:18 -08:00

README.md

Plano Rust Crates

This workspace contains five Rust crates that form the core of the Plano AI gateway. They are organized by compilation target and responsibility.

Workspace Layout

crates/
├── Cargo.toml          # Workspace root (resolver = "2")
├── build.sh            # Builds WASM filters + native binary
├── brightstaff/        # Native Rust HTTP server (Axum)
├── common/             # Shared library (WASM-compatible)
├── hermesllm/          # LLM protocol translation (pure Rust)
├── llm_gateway/        # WASM filter: LLM routing & auth
└── prompt_gateway/     # WASM filter: intent matching & guardrails

Crate Details

`prompt_gateway` — Inbound Prompt Processing


Type	`cdylib` (WASM filter)
Target	`wasm32-wasip1`
Envoy listener	`ingress_traffic_prompt` (:10001)
Root ID	`prompt_gateway`
Depends on	`common`, `proxy-wasm`

Responsibilities:

Intercepts incoming chat completion requests
Converts prompt_targets into OpenAI tool definitions
Dispatches to Arch-Function model for intent classification
If intent matches: calls developer API endpoints, augments prompt with response context
If no match: prepends system prompt, forwards to upstream LLM
Manages multi-turn state via x-arch-state header
Applies prompt_guards (jailbreak detection)

Key modules:

filter_context.rs — RootContext, config parsing
http_context.rs — Request interception, tool definition construction
stream_context.rs — Core orchestration (intent matching, API calls, response handling)
tools.rs — URL path/query parameter substitution for API calls

Constraints:

No tokio, async/await, threads, or network sockets
All HTTP calls via proxy-wasm dispatch_http_call
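
The path substitution that tools.rs is described as handling can be sketched as follows. This is a minimal illustration, not the crate's actual code: the function name `substitute_path_params`, its signature, and the choice to return unused parameters for use as query parameters are all assumptions, and URL encoding is deliberately left out.

```rust
use std::collections::HashMap;

/// Replaces `{name}` placeholders in `template` with values from `params`.
/// Returns the rendered path plus the parameters that were not consumed,
/// which a caller could then send as query parameters.
/// NOTE: hypothetical helper for illustration; not the actual tools.rs API.
fn substitute_path_params(
    template: &str,
    params: &HashMap<String, String>,
) -> Result<(String, Vec<(String, String)>), String> {
    let mut rendered = String::with_capacity(template.len());
    let mut used: Vec<&str> = Vec::new();
    let mut rest = template;

    while let Some(open) = rest.find('{') {
        // Copy the literal text that precedes the placeholder.
        rendered.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        let close = after_open
            .find('}')
            .ok_or_else(|| format!("unclosed '{{' in template: {template}"))?;
        let name = &after_open[..close];
        let value = params
            .get(name)
            .ok_or_else(|| format!("missing value for path parameter '{name}'"))?;
        rendered.push_str(value);
        used.push(name);
        rest = &after_open[close + 1..];
    }
    rendered.push_str(rest);

    // Anything not used in the path is handed back, sorted for stable output.
    let mut leftover: Vec<(String, String)> = params
        .iter()
        .filter(|(k, _)| !used.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    leftover.sort();
    Ok((rendered, leftover))
}

fn main() {
    let mut params = HashMap::new();
    params.insert("device_id".to_string(), "42".to_string());
    params.insert("days".to_string(), "7".to_string());
    let (path, query) =
        substitute_path_params("/devices/{device_id}/stats", &params).unwrap();
    println!("{path} {query:?}");
}
```

The sketch uses no async code, threads, or sockets, so it stays within the WASM constraints listed above.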

`llm_gateway` — LLM Provider Routing & Translation


Type	`cdylib` (WASM filter)
Target	`wasm32-wasip1`
Envoy listeners	`ingress_traffic_prompt` (:10001), `egress_traffic_llm` (:12001)
Root ID	`llm_gateway`
Depends on	`common`, `hermesllm`, `proxy-wasm`

Responsibilities:

Selects LLM provider based on x-arch-llm-provider-hint header or default
Injects authentication credentials (Bearer token, x-api-key, passthrough)
Rewrites request path for target provider API
Transforms request/response formats between providers (OpenAI ↔ Anthropic ↔ Bedrock) via hermesllm
Enforces token-based rate limits (governor with no_std)
Handles SSE stream reassembly across chunk boundaries (SseStreamBuffer)
Records metrics: TTFT, tokens/sec, request latency, rate-limited count

Key modules:

filter_context.rs — RootContext, provider & rate limit initialization
stream_context.rs — Request/response transformation, auth, rate limiting, streaming
metrics.rs — Gauge, counter, histogram definitions

Constraints:

Same WASM constraints as prompt_gateway
Uses hermesllm for protocol translation — do NOT duplicate translation logic here
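
To illustrate why SSE reassembly is needed, the sketch below buffers raw body chunks and emits only complete events. It is a simplified stand-in, not the real SseStreamBuffer: the type `SseReassembler` and its `push` method are hypothetical, and it assumes events are terminated by `\n\n` (it does not handle `\r\n` line endings).

```rust
/// Accumulates raw body chunks and yields only complete SSE events.
/// NOTE: illustrative stand-in; the real SseStreamBuffer may differ.
#[derive(Default)]
struct SseReassembler {
    // Bytes are buffered (not strings) so a multi-byte UTF-8 character
    // split across two chunks is decoded only once it is complete.
    pending: Vec<u8>,
}

impl SseReassembler {
    /// Appends `chunk` and returns the `data:` payload of every event
    /// that is now complete (terminated by a blank line).
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut events = Vec::new();
        while let Some(end) = find_subslice(&self.pending, b"\n\n") {
            // Remove the event and its terminator from the buffer.
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            let text = String::from_utf8_lossy(&raw[..end]);
            let data: Vec<&str> = text
                .lines()
                .filter_map(|line| line.strip_prefix("data:"))
                .map(|line| line.strip_prefix(' ').unwrap_or(line))
                .collect();
            if !data.is_empty() {
                events.push(data.join("\n"));
            }
        }
        events
    }
}

/// Returns the index of the first occurrence of `needle` in `haystack`.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    let mut buf = SseReassembler::default();
    // The JSON payload is split across two network chunks.
    for chunk in [&b"data: {\"a\":"[..], &b"1}\n\ndata: [DONE]\n\n"[..]] {
        for event in buf.push(chunk) {
            println!("event: {event}");
        }
    }
}
```

A partial event stays in `pending` until a later chunk completes it, so downstream parsing never sees half a JSON object.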

`common` — Shared Types & Utilities


Type	`lib`
Target	Both native and `wasm32-wasip1`
Depends on	`hermesllm`, `proxy-wasm`, `governor` (no_std), `tiktoken-rs`

Responsibilities:

Central configuration schema (Configuration, LlmProvider, PromptTarget, PromptGuards, etc.)
LlmProviders collection — provider lookup with slug matching and wildcard expansion
HTTP client trait wrapping proxy-wasm dispatch_http_call
All x-arch-* header constants and path constants (consts.rs)
Token-based rate limiting (governor, keyed by model + header selector)
Token counting via tiktoken-rs
OpenAI-compatible API types (ChatCompletionsRequest, Message, ToolCall, etc.)
Error types (ClientError, ServerError)
Metrics primitives (Gauge, Counter, Histogram)
URL path parameter substitution
PII obfuscation for logging

Key modules:

configuration.rs — All config structs, deserialization, validation
consts.rs — Canonical header names, paths, timeouts, cluster names
llm_providers.rs — Provider collection with lookup logic
ratelimit.rs — Token-based rate limiter (global OnceLock)
http.rs — Client trait for WASM HTTP dispatch
tokenizer.rs — Token counting (tiktoken, GPT-4 fallback)

Constraints:

Must compile for wasm32-wasip1 — no std networking, no threads
Must NOT depend on brightstaff
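
The idea behind token-based rate limiting keyed by model and header selector can be sketched with a plain token bucket. This is an assumption-laden illustration: the crate itself uses `governor`, whereas `TokenLimiter`, `try_consume`, and the explicit `now_secs` clock argument below are hypothetical and use only the standard library.

```rust
use std::collections::HashMap;

/// Per-key bucket state: tokens left and the time of the last refill.
struct Bucket {
    available: f64,
    last_refill_secs: f64,
}

/// Token-based limiter keyed by (model, header selector value).
/// NOTE: hypothetical std-only sketch; the crate itself uses `governor`.
struct TokenLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<(String, String), Bucket>,
}

impl TokenLimiter {
    fn new(tokens_per_minute: u32) -> Self {
        Self {
            capacity: tokens_per_minute as f64,
            refill_per_sec: tokens_per_minute as f64 / 60.0,
            buckets: HashMap::new(),
        }
    }

    /// Tries to spend `tokens` for the given key at time `now_secs`.
    /// The clock is passed in so the behavior is deterministic.
    fn try_consume(&mut self, model: &str, selector: &str, tokens: u32, now_secs: f64) -> bool {
        let capacity = self.capacity;
        let bucket = self
            .buckets
            .entry((model.to_string(), selector.to_string()))
            .or_insert(Bucket { available: capacity, last_refill_secs: now_secs });

        // Refill in proportion to elapsed time, capped at capacity.
        let elapsed = (now_secs - bucket.last_refill_secs).max(0.0);
        bucket.available = (bucket.available + elapsed * self.refill_per_sec).min(capacity);
        bucket.last_refill_secs = now_secs;

        let cost = tokens as f64;
        if cost <= bucket.available {
            bucket.available -= cost;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = TokenLimiter::new(600); // refills 10 tokens per second
    println!("{}", limiter.try_consume("gpt-4o", "user-a", 500, 0.0));
    println!("{}", limiter.try_consume("gpt-4o", "user-a", 200, 0.0));
}
```

Because the cost is the request's token count rather than 1, a single large prompt can exhaust the budget that many small prompts would share.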

`hermesllm` — LLM Protocol Translation


Type	`lib`
Target	Native and `wasm32-wasip1` (no WASM-incompatible deps; the WASM crates `common` and `llm_gateway` depend on it)
Depends on	`serde`, `serde_json`, `aws-smithy-eventstream`, `uuid`

Responsibilities:

Cross-provider request/response translation (OpenAI ↔ Anthropic ↔ Amazon Bedrock ↔ Gemini)
ProviderRequest / ProviderResponse / ProviderStreamResponse traits
SSE stream parsing (SseStreamIter, SseStreamBuffer, SseChunkProcessor)
AWS Event Stream binary frame decoding (Bedrock)
Provider identification (ProviderId enum with model catalog from provider_models.yaml)
Target endpoint path rewriting (/v1/chat/completions → provider-specific paths)

Key modules:

apis/ — Format definitions: openai.rs, anthropic.rs, amazon_bedrock.rs, openai_responses.rs
apis/streaming_shapes/ — SSE and binary stream parsing
providers/ — id.rs (ProviderId), request.rs, response.rs, streaming_response.rs
clients/endpoints.rs — API path mapping
transforms/ — Request/response transformations organized by direction

Constraints:

MUST NOT depend on proxy-wasm or common — this is a pure Rust library
Must remain usable outside of the WASM/Envoy context
Optional model-fetch feature gates network dependencies (ureq)
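
Target endpoint path rewriting can be pictured as a pure function from provider and client path to upstream path. The sketch below is an assumption, not the contents of clients/endpoints.rs: the `Provider` enum and `target_path` function are hypothetical, and the upstream paths shown are the publicly documented Anthropic Messages and Amazon Bedrock Converse routes rather than values taken from this repository.

```rust
/// Hypothetical subset of provider identifiers, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    OpenAi,
    Anthropic,
    AmazonBedrock,
}

/// Maps an OpenAI-style client path to the path the upstream provider expects.
/// Paths that have no special mapping are passed through unchanged.
fn target_path(provider: Provider, client_path: &str, model: &str, streaming: bool) -> String {
    match (provider, client_path) {
        (Provider::Anthropic, "/v1/chat/completions") => "/v1/messages".to_string(),
        (Provider::AmazonBedrock, "/v1/chat/completions") => {
            // Bedrock puts the model in the path and uses a separate
            // operation for streaming responses.
            let op = if streaming { "converse-stream" } else { "converse" };
            format!("/model/{model}/{op}")
        }
        _ => client_path.to_string(),
    }
}

fn main() {
    for provider in [Provider::OpenAi, Provider::Anthropic, Provider::AmazonBedrock] {
        let path = target_path(provider, "/v1/chat/completions", "example-model", true);
        println!("{provider:?} -> {path}");
    }
}
```

The function has no dependency on proxy-wasm or on any networking crate, which is what lets the same translation logic be shared by the WASM filters and the native server.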

`brightstaff` — Native HTTP Server


Type	Binary (Axum)
Target	Native only
Port	`0.0.0.0:9091`
Depends on	`hermesllm`, `common` (non-WASM parts), `tokio`, `axum`, `reqwest`, `opentelemetry`

Responsibilities:

LLM request routing via Arch-Router model (selects best provider/model)
Agent orchestration via Plano-Orchestrator model (selects and chains agents)
Agent execution pipeline: filter chains → agent invocation (MCP JSON-RPC or HTTP)
Arch-Function handler: tool calling with hallucination detection
Conversation state management for Responses API (memory or PostgreSQL)
Model alias resolution
OpenTelemetry tracing with per-component service names
Interaction signal analysis (frustration, repetition, escalation detection)

Key modules:

handlers/llm.rs — LLM passthrough with routing
handlers/agent_chat_completions.rs — Agent orchestration entry point
handlers/agent_selector.rs — Agent selection logic
handlers/pipeline_processor.rs — Sequential agent/filter execution
handlers/function_calling.rs — Arch-Function tool calling
router/llm_router.rs — RouterService (Arch-Router model)
router/plano_orchestrator.rs — OrchestratorService (Plano-Orchestrator model)
state/ — StateStorage trait, memory & PostgreSQL backends
signals/ — Conversation quality analysis
tracing/ — OpenTelemetry setup with custom service name routing

Constraints:

All external calls go through Envoy (localhost:12001 for LLMs, localhost:11000 for agents)
Does NOT use common's proxy-wasm Client trait — uses reqwest instead
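
Conversation state management behind a storage trait might look roughly like the sketch below. It is a simplified, synchronous illustration only: the method names `put` and `get`, the `MemoryStorage` type, and the `resume` helper are hypothetical, and the real StateStorage trait in state/ is likely to be async and to store richer types than plain strings.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, synchronous simplification of a conversation-state store.
trait StateStorage {
    fn put(&self, response_id: &str, messages: Vec<String>);
    fn get(&self, response_id: &str) -> Option<Vec<String>>;
}

/// In-memory backend; a PostgreSQL backend would implement the same trait.
#[derive(Default)]
struct MemoryStorage {
    inner: Mutex<HashMap<String, Vec<String>>>,
}

impl StateStorage for MemoryStorage {
    fn put(&self, response_id: &str, messages: Vec<String>) {
        self.inner.lock().unwrap().insert(response_id.to_string(), messages);
    }

    fn get(&self, response_id: &str) -> Option<Vec<String>> {
        self.inner.lock().unwrap().get(response_id).cloned()
    }
}

/// Rebuilds the full history for a follow-up turn that references a
/// previous response. Unknown or absent IDs start a fresh conversation.
fn resume(store: &dyn StateStorage, previous_id: Option<&str>, new_input: &str) -> Vec<String> {
    let mut history = previous_id
        .and_then(|id| store.get(id))
        .unwrap_or_default();
    history.push(new_input.to_string());
    history
}

fn main() {
    let store = MemoryStorage::default();
    store.put("resp_1", vec!["user: hi".to_string(), "assistant: hello".to_string()]);
    println!("{:?}", resume(&store, Some("resp_1"), "user: what did I say?"));
}
```

Keeping callers on `&dyn StateStorage` means the choice between memory and PostgreSQL can be made at startup without touching handler code.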

Dependency Graph

prompt_gateway ──► common ──► hermesllm
llm_gateway ───┬► common ──► hermesllm
               └► hermesllm
brightstaff ───┬► hermesllm
               └► common (config types only, not WASM code)

hermesllm ────► (standalone — no proxy-wasm, no common)

Direction is strictly enforced:

Arrows point toward dependencies
No cycles allowed
hermesllm is the leaf node — it must never depend on any other workspace crate
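
The rules above can be checked mechanically. The sketch below encodes the edges exactly as drawn in the graph and runs a depth-first search for cycles; the function names are hypothetical and this is not a tool that ships with the workspace.

```rust
use std::collections::HashMap;

/// Workspace dependency edges, copied from the dependency graph.
fn workspace_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("prompt_gateway", vec!["common"]),
        ("llm_gateway", vec!["common", "hermesllm"]),
        ("brightstaff", vec!["hermesllm", "common"]),
        ("common", vec!["hermesllm"]),
        ("hermesllm", vec![]),
    ])
}

/// Depth-first search: returns true if following edges from `node`
/// reaches a crate that is already on the current path.
fn has_cycle_from<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    node: &'a str,
    path: &mut Vec<&'a str>,
) -> bool {
    if path.contains(&node) {
        return true;
    }
    path.push(node);
    let deps = graph.get(node).cloned().unwrap_or_default();
    let found = deps.into_iter().any(|dep| has_cycle_from(graph, dep, path));
    path.pop();
    found
}

/// Returns true if any crate can reach itself through its dependencies.
fn has_cycle<'a>(graph: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    graph
        .keys()
        .any(|&start| has_cycle_from(graph, start, &mut Vec::new()))
}

fn main() {
    let graph = workspace_graph();
    println!("cycle present: {}", has_cycle(&graph));
    println!("hermesllm is a leaf: {}", graph["hermesllm"].is_empty());
}
```

Adding an edge from hermesllm back to common would make `has_cycle` return true, which is the situation the leaf-node rule is meant to rule out.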

Build Commands

# Everything (recommended)
./build.sh

# Equivalent to:
cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway
cargo build --release -p brightstaff

# Tests (all crates, native target)
cargo test --workspace

# Single crate test
cargo test -p common
cargo test -p hermesllm
cargo test -p prompt_gateway
cargo test -p llm_gateway
cargo test -p brightstaff

WASM Output Location

After building, WASM filter binaries are at:

target/wasm32-wasip1/release/prompt_gateway.wasm
target/wasm32-wasip1/release/llm_gateway.wasm

These are loaded by Envoy at startup from /etc/envoy/proxy-wasm-plugins/ in the Docker image.
