apunkt/plano

Fork 0

mirror of https://github.com/katanemo/plano.git synced 2026-06-17 15:25:17 +02:00

Adil Hafeez 3f8aa14e4c

create md files for coding agents and for humans

2026-02-09 23:34:18 -08:00

23 KiB

Raw Permalink Blame History

Plano (ArchGW) — High-Level Architecture

Overview

Plano is an AI-native gateway built on Envoy Proxy, extended with custom WebAssembly (WASM) filters and a native Rust service called Brightstaff. It acts as an intelligent intermediary between client applications, AI agents, and LLM providers — handling intent-based routing, prompt guardrails, function calling, agent orchestration, rate limiting, and multi-provider LLM translation.

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Plano Gateway                                  │
│                                                                             │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │                         Envoy Proxy (L7)                             │  │
│   │                                                                      │  │
│   │   ┌──────────────────┐       ┌──────────────────┐                    │  │
│   │   │  prompt_gateway  │──────▶│   llm_gateway     │                   │  │
│   │   │    (WASM)        │       │     (WASM)        │                   │  │
│   │   │                  │       │                   │                   │  │
│   │   │ • Intent matching│       │ • Provider routing│                   │  │
│   │   │ • Guardrails     │       │ • Auth injection  │                   │  │
│   │   │ • Function call  │       │ • Rate limiting   │                   │  │
│   │   │ • Prompt targets │       │ • API translation │                   │  │
│   │   └──────────────────┘       └────────┬─────────┘                   │  │
│   │                                       │                              │  │
│   └───────────────────────────────────────┼──────────────────────────────┘  │
│                                           │                                 │
│   ┌───────────────────────────────────────┼──────────────────────────────┐  │
│   │                    Brightstaff (Rust HTTP Server :9091)               │  │
│   │                                                                      │  │
│   │   • LLM request routing (Arch-Router model)                          │  │
│   │   • Agent orchestration (Plano-Orchestrator model)                   │  │
│   │   • Conversation state management (memory / PostgreSQL)              │  │
│   │   • Function calling handler (Arch-Function model)                   │  │
│   │   • Observability & signal analysis                                  │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
         │                    │                         │
         ▼                    ▼                         ▼
   ┌──────────┐      ┌──────────────┐          ┌──────────────┐
   │  Agents  │      │ Developer    │          │ LLM Providers│
   │ (MCP/HTTP)│     │   APIs       │          │ (OpenAI, etc)│
   └──────────┘      └──────────────┘          └──────────────┘

The Role of Envoy

Envoy is the data plane of Plano. All client traffic — both inbound prompts and outbound LLM calls — flows through Envoy. It provides:

L7 HTTP routing based on paths and custom headers
WASM filter execution for inline request/response transformation
Connection pooling and TLS to upstream LLM providers
Retry policies for resilience
Compression/decompression for LLM streaming responses

Envoy Listeners

Envoy defines six listener types, each serving a distinct role in the request flow:

Listener	Port	Direction	Purpose
`ingress_traffic`	10000 (configurable)	Inbound	Client-facing entry point. Forwards all traffic to the prompt gateway listener.
`ingress_traffic_prompt`	10001	Inbound	Core processing listener. Runs both WASM filters (`prompt_gateway` → `llm_gateway`). Routes to LLM providers by `x-arch-llm-provider` header.
`outbound_api_traffic`	11000	Internal	Routes to upstream developer APIs and agents using `x-arch-upstream` header. No WASM filters.
Agent listeners	Per-config	Inbound	One per agent listener in config. Routes to Brightstaff with `/agents/` path prefix.
`egress_traffic`	12000 (configurable)	Outbound	LLM gateway entry for agents/services reaching LLMs. Routes to Brightstaff for routing decisions.
`egress_traffic_llm`	12001	Outbound	Final outbound LLM listener. Runs `llm_gateway.wasm` for auth injection, provider translation, and rate limiting before reaching the actual LLM provider.

Envoy Clusters

Envoy manages connections to all upstream services:

LLM Provider Clusters — Pre-configured TLS clusters for: OpenAI, Anthropic (Claude), Groq, Mistral, DeepSeek, Gemini, xAI, MoonshotAI, Zhipu, Together AI, and Katanemo's hosted Arch models. Custom-URL providers (e.g., Azure OpenAI, Ollama) are dynamically added from config.

Internal Clusters:

Cluster	Target	Purpose
`bright_staff`	localhost:9091	The Brightstaff Rust service
`arch_prompt_gateway_listener`	localhost:10001	Internal forwarding from ingress
`arch_listener_llm`	localhost:12001	Internal forwarding for LLM egress
`arch_internal`	localhost:11000	Outbound API router

Dynamic Clusters — Generated from endpoints and agents config sections (developer APIs, agent services).

Custom Headers Used for Routing

Header	Set By	Used By	Purpose
`x-arch-llm-provider`	WASM filters	Envoy routes	Selects the LLM provider cluster
`x-arch-llm-provider-hint`	Brightstaff	llm_gateway	Hints which provider/model to use
`x-arch-upstream` / `x-arch-upstream-host`	WASM filters / Brightstaff	Envoy routes	Targets a specific agent or API endpoint
`x-arch-is-streaming`	Brightstaff	llm_gateway	Indicates streaming mode
`x-arch-state`	prompt_gateway	prompt_gateway	Carries multi-turn conversation state
`x-arch-tool-call`	prompt_gateway	prompt_gateway	Carries tool call metadata
`x-arch-api-response`	prompt_gateway	prompt_gateway	Carries developer API response data
`x-arch-agent-listener-name`	Envoy	Brightstaff	Identifies which agent listener a request arrived on

Request Flows

Flow 1: Direct LLM Chat (`POST /v1/chat/completions`)

This is the standard path for client-to-LLM requests with optional intent matching and routing.

Client
  │
  ▼
[Envoy :10000 — ingress_traffic]
  │  (simple passthrough)
  ▼
[Envoy :10001 — ingress_traffic_prompt]
  │
  ├── prompt_gateway.wasm
  │     1. Parse ChatCompletions request
  │     2. Convert prompt_targets → tool definitions
  │     3. Dispatch to Arch-Function model at /function_calling
  │     4. If intent matched:
  │         → Call developer API endpoint via :11000
  │         → Augment prompt with API response context
  │     5. If no intent matched:
  │         → Prepend system prompt, forward to LLM
  │
  ├── llm_gateway.wasm
  │     1. Select LLM provider (from header hint or default)
  │     2. Enforce rate limits (token-based via tiktoken)
  │     3. Inject auth credentials (Bearer / x-api-key)
  │     4. Transform request format (OpenAI ↔ Anthropic ↔ Bedrock)
  │     5. Rewrite upstream path for target provider
  │
  ▼
LLM Provider (OpenAI, Anthropic, Gemini, etc.)
  │
  ▼
(Response flows back through llm_gateway for format translation)
  │
  ▼
Client

Flow 2: Brightstaff LLM Routing (`POST /v1/chat/completions` via egress)

When requests reach Brightstaff (directly or via agent listeners), it performs intelligent model routing.

Client / Agent
  │
  ▼
[Brightstaff :9091]
  │
  ├── Resolve model aliases
  ├── Validate model exists in configured providers
  ├── Retrieve conversation state (if using Responses API)
  │
  ├── Call Arch-Router model ──► [Envoy :12001]
  │     (determines best model/provider for the request    ──► LLM Provider
  │      based on routing_preferences in config)
  │
  ├── Forward actual request ──► [Envoy :12001]
  │     (with x-arch-llm-provider-hint header)             ──► LLM Provider
  │
  ▼
[Stream response back with metrics, signal analysis, state capture]
  │
  ▼
Client / Agent

Flow 3: Agent Orchestration (`POST /agents/v1/chat/completions`)

The agentic flow where Brightstaff selects and chains agents based on user intent.

Client
  │
  ▼
[Envoy — Agent Listener :configurable]
  │  (path rewrite: /agents/...)
  ▼
[Brightstaff :9091]
  │
  ├── Identify listener from x-arch-agent-listener-name
  ├── Find configured agents for this listener
  │
  ├── If multiple agents:
  │     Call Plano-Orchestrator model ──► [Envoy :12001] ──► LLM
  │     (selects which agents to run and in what order)
  │
  ├── For each selected agent:
  │     │
  │     ├── Run filter chain (pre-processing)
  │     │     └── [Envoy :11000] ──► Filter Service (MCP/HTTP)
  │     │
  │     ├── Invoke agent
  │     │     └── [Envoy :11000] ──► Agent Service (MCP/HTTP)
  │     │
  │     ├── If intermediate agent:
  │     │     Collect full response → feed as input to next agent
  │     │
  │     └── If final agent:
  │           Stream response directly to client
  │
  ▼
Client

Brightstaff Service

Brightstaff is a native Rust HTTP server (0.0.0.0:9091) built with Axum. It is the control plane brain of Plano — while Envoy handles the data plane (proxying, filtering), Brightstaff handles the intelligent decision-making.

Endpoints

Method	Path	Handler	Purpose
`POST`	`/v1/chat/completions`	`llm_chat`	LLM passthrough with model routing
`POST`	`/v1/messages`	`llm_chat`	Anthropic Messages API compat
`POST`	`/v1/responses`	`llm_chat`	OpenAI Responses API with state
`POST`	`/agents/v1/chat/completions`	`agent_chat`	Agent orchestration pipeline
`POST`	`/agents/v1/messages`	`agent_chat`	Agent orchestration (Messages)
`POST`	`/agents/v1/responses`	`agent_chat`	Agent orchestration (Responses)
`POST`	`/function_calling`	`function_calling_chat_handler`	Arch-Function tool calling
`GET`	`/v1/models`	`list_models`	List configured LLM models

Core Components

RouterService (LLM Routing)

Uses the Arch-Router model — a specialized LLM that determines which provider/model best matches a user's request based on routing_preferences defined in config. Constructs a system prompt describing available routes, sends the conversation, and parses a {"route": "route_name"} response.

OrchestratorService (Agent Selection)

Uses the Plano-Orchestrator model to determine which agent(s) should handle a request when multiple agents are available on a listener. Returns an ordered list of agents: {"route": ["agent1", "agent2"]}.

PipelineProcessor (Agent Execution)

Manages the sequential execution of agent filter chains and agent invocations:

MCP agents: JSON-RPC 2.0 protocol over SSE transport (initialize → notifications/initialized → tools/call)
HTTP agents: Direct POST with message array
Routes through Envoy at :11000 using x-arch-upstream-host header

Function Calling Handler

Specialized handler for the Arch-Function model:

Converts OpenAI tool definitions into prompts
Parses structured JSON responses (tool_calls, clarifications)
Includes hallucination detection using entropy/varentropy/probability thresholds from logprobs

State Management

Manages conversation state for the OpenAI Responses API (v1/responses):

Memory backend — HashMap behind Arc<RwLock> for single-instance dev
PostgreSQL backend — Persistent storage with upsert semantics
ResponsesStateProcessor intercepts streaming responses to capture response_id and output items, storing them asynchronously for future conversation chaining via previous_response_id

Signal Analysis (Observability)

Analyzes conversation patterns for interaction quality:

Frustration, repetition/looping, escalation requests, positive feedback, repair patterns
Quality graded as Good / Fair / Poor / Severe
Concerning signals flag spans with indicators for monitoring

Rust Crate Architecture

┌─────────────────────────────────────────────────────────────┐
│                     brightstaff (binary)                     │
│                                                             │
│   Native Rust HTTP server — routing, orchestration, state   │
│   Depends on: hermesllm, common (non-WASM parts)           │
└─────────────────────────────────────────────────────────────┘

┌──────────────────────┐    ┌──────────────────────┐
│   prompt_gateway     │    │   llm_gateway         │
│      (WASM)          │    │      (WASM)           │
│                      │    │                       │
│  Intent matching     │    │  Provider routing     │
│  Prompt guards       │    │  Auth injection       │
│  Function calling    │    │  Rate limiting        │
│  API orchestration   │    │  Request/Response     │
│                      │    │  format translation   │
├──────────────────────┤    ├───────────────────────┤
│  depends on: common  │    │  depends on: common,  │
│                      │    │  hermesllm            │
└──────────┬───────────┘    └──────────┬────────────┘
           │                           │
           ▼                           ▼
┌──────────────────────────────────────────────────────────────┐
│                        common (lib)                          │
│                                                             │
│  Configuration types, LlmProviders, HTTP client trait,      │
│  rate limiting (governor), tokenization (tiktoken),         │
│  OpenAI API types, routing, metrics, tracing, constants     │
│  Depends on: hermesllm                                      │
└─────────────────────────────┬───────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                       hermesllm (lib)                        │
│                                                             │
│  LLM protocol abstraction — cross-provider request/response │
│  translation (OpenAI ↔ Anthropic ↔ Bedrock ↔ Gemini)       │
│  SSE stream parsing, provider model catalog, endpoint       │
│  mapping. No proxy-wasm dependency (pure Rust).             │
└──────────────────────────────────────────────────────────────┘

WASM Compilation

Both prompt_gateway and llm_gateway compile to cdylib targets for wasm32-wasip1 using the proxy-wasm SDK (v0.2.1). Envoy loads them via its V8 WASM runtime. Each filter implements RootContext (for config parsing and per-stream creation) and HttpContext (for per-request processing).

Deployment Architecture

All components run inside a single container managed by Supervisord:

┌─────────────────────────────────────────────────────────────┐
│                     Docker Container                         │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                   Supervisord                        │    │
│  │                                                     │    │
│  │  ┌─────────────┐  ┌───────────────┐  ┌───────────┐ │    │
│  │  │ Brightstaff  │  │  Envoy Proxy  │  │  Log Tail │ │    │
│  │  │  (Rust)      │  │  + WASM       │  │           │ │    │
│  │  │  :9091       │  │  :10000-12001 │  │           │ │    │
│  │  └─────────────┘  └───────────────┘  └───────────┘ │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Startup sequence:                                          │
│   1. config_generator.py validates arch_config.yaml         │
│   2. Renders envoy.template.yaml → envoy.yaml (Jinja2)     │
│   3. Starts Brightstaff + Envoy in parallel                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Docker multi-stage build:

deps — Rust 1.93.0 with wasm32-wasip1 target, dependency pre-compilation
wasm-builder — Builds prompt_gateway.wasm + llm_gateway.wasm (release)
brightstaff-builder — Builds the brightstaff native binary (release)
envoy — Pulls envoyproxy/envoy:v1.37.0
arch (final) — Python 3.13.6-slim base with Envoy binary, WASM plugins, Brightstaff binary, and the planoai CLI

Configuration Pipeline

User-facing configuration flows through a generation pipeline before reaching Envoy and Brightstaff:

arch_config.yaml (user-authored)
        │
        ▼
config_generator.py (Python CLI)
  1. Validate against arch_config_schema.yaml (JSON Schema)
  2. Normalize legacy formats (llm_providers → model_providers)
  3. Parse agents, filters, endpoints → infer Envoy clusters
  4. Parse model_providers → validate provider/model format
  5. Auto-add internal models (arch-function, arch-router, plano-orchestrator)
  6. Validate model aliases, routing preferences, prompt target endpoints
        │
        ├──► envoy.yaml (rendered from envoy.template.yaml via Jinja2)
        │      → consumed by Envoy
        │
        └──► arch_config_rendered.yaml
               → consumed by Brightstaff
               → injected into WASM filter configs

Key Config Sections

Section	Consumed By	Purpose
`model_providers`	llm_gateway, Brightstaff	LLM provider definitions with models, auth, routing preferences
`prompt_targets`	prompt_gateway	Intent-to-API mappings with parameter schemas
`prompt_guards`	prompt_gateway	Input guardrails (jailbreak detection)
`endpoints`	prompt_gateway, Envoy	Named upstream API endpoint definitions
`agents`	Brightstaff, Envoy	Agent service definitions (id, URL, type)
`listeners`	Brightstaff, Envoy	Listener configs binding agents to ports
`ratelimits`	llm_gateway	Per-model rate limits with token-based quotas
`routing`	Brightstaff	LLM routing model/provider config
`model_aliases`	Brightstaff	Friendly name → provider/model mappings
`state_storage`	Brightstaff	Conversation state backend (memory / postgres)
`tracing`	All components	OpenTelemetry config (sampling, OTLP endpoint)
`overrides`	prompt_gateway, Brightstaff	Tuning (intent threshold, agent orchestrator toggle)

Supported LLM Providers

Provider	Cluster	Auth Method
OpenAI	api.openai.com	Bearer token
Anthropic (Claude)	api.anthropic.com	x-api-key header
Google (Gemini)	generativelanguage.googleapis.com	API key in URL
Groq	api.groq.com	Bearer token
Mistral	api.mistral.ai	Bearer token
DeepSeek	api.deepseek.com	Bearer token
xAI	api.x.ai	Bearer token
Together AI	api.together.xyz	Bearer token
MoonshotAI	api.moonshot.ai	Bearer token
Zhipu	open.bigmodel.cn	Bearer token
Amazon Bedrock	Custom base_url	AWS Sig v4
Azure OpenAI	Custom base_url	Bearer / API key
Ollama	Custom base_url	None
Katanemo (Arch)	archfc.katanemo.dev	Bearer token

The hermesllm crate handles cross-provider request/response translation so clients can use a single API format (typically OpenAI-compatible) regardless of which upstream provider serves the request.

23 KiB Raw Permalink Blame History