The AI-native proxy server and data plane for agentic apps.
Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn’t be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for model agility. Use any language or AI framework, and deliver agents faster to production.
Quickstart Guide • Build Agentic Apps with Plano • Documentation • Contact
Star ⭐️ the repo if you found Plano useful — new releases and updates land here first.
Overview
Building agentic demos is easy. Shipping agentic applications safely, reliably, and repeatably to production is hard. After the thrill of a quick hack, you end up building the “hidden middleware” to reach production: routing logic to reach the right agent, guardrail hooks for safety and moderation, evaluation and observability glue for continuous learning, and model/provider quirks scattered across frameworks and application code.
Plano solves this by moving core delivery concerns into a unified, out-of-process dataplane.
- 🚦 Orchestration: Low-latency orchestration between agents; add new agents without modifying app code.
- 🔗 Model Agility: Route by model name, alias (semantic names) or automatically via preferences.
- 🕵 Agentic Signals™: Zero-code capture of Signals plus OTEL traces/metrics across every agent.
- 🛡️ Moderation & Memory Hooks: Build jailbreak protection, add moderation policies and memory consistently via Filter Chains.
Plano pulls rote plumbing out of your framework so you can stay focused on what matters most: the core product logic of your agentic applications. Plano is backed by industry-leading LLM research and built on Envoy by its core contributors, who built critical infrastructure at scale for modern workloads.
High-Level Network Sequence Diagram:

Jump to our docs to learn how you can use Plano to improve the speed, safety, and observability of your agentic applications.
Important
Plano and the Arch family of LLMs (like Plano-Orchestrator-4B, Arch-Router, etc) are hosted free of charge in the US-central region to give you a great first-run developer experience of Plano. To scale and run in production, you can either run these LLMs locally or contact us on Discord for API keys.
Build Agentic Apps with Plano
Plano handles orchestration, model management, and observability as modular building blocks, letting you configure only what you need (edge proxying for agentic orchestration and guardrails, LLM routing from your services, or both together) to fit cleanly into existing architectures. Below is a simple multi-agent travel agent built with Plano that showcases all three core capabilities.
📁 Full working code: See demos/use_cases/travel_agents/ for complete weather and flight agents you can run locally.
1. Define Your Agents in YAML
# config.yaml
version: v0.3.0

# What you declare: Agent URLs and natural language descriptions
# What you don't write: Intent classifiers, routing logic, model fallbacks, provider adapters, or tracing instrumentation

agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520

model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - model: anthropic/claude-3-5-sonnet
    access_key: $ANTHROPIC_API_KEY

listeners:
  - type: agent
    name: travel_assistant
    port: 8001
    router: plano_orchestrator_v1 # Powered by our 4B-parameter routing model. You can change this to different models
    agents:
      - id: weather_agent
        description: |
          Gets real-time weather and forecasts for any city worldwide.
          Handles: "What's the weather in Paris?", "Will it rain in Tokyo?"
      - id: flight_agent
        description: |
          Searches flights between airports with live status and schedules.
          Handles: "Flights from NYC to LA", "Show me flights to Seattle"

tracing:
  random_sampling: 100 # Auto-capture traces for evaluation
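A quick sanity check on this kind of config can catch a listener that references an agent that was never declared. The sketch below is not part of Plano: it assumes the YAML has already been loaded into a plain dict by a YAML parser of your choice, and `find_undeclared_agents` is a hypothetical helper name used only for illustration.

```python
def find_undeclared_agents(config: dict) -> list[str]:
    """Return listener agent ids that are missing from the top-level `agents` list."""
    declared = {agent["id"] for agent in config.get("agents", [])}
    missing = []
    for listener in config.get("listeners", []):
        for agent in listener.get("agents", []):
            if agent["id"] not in declared:
                missing.append(agent["id"])
    return missing


# The travel assistant config, expressed as the dict a YAML parser would produce
# (descriptions and providers omitted for brevity).
config = {
    "agents": [
        {"id": "weather_agent", "url": "http://localhost:10510"},
        {"id": "flight_agent", "url": "http://localhost:10520"},
    ],
    "listeners": [
        {
            "type": "agent",
            "name": "travel_assistant",
            "port": 8001,
            "agents": [{"id": "weather_agent"}, {"id": "flight_agent"}],
        }
    ],
}

undeclared = find_undeclared_agents(config)
```

An empty result means every agent a listener routes to has a URL declared at the top level.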
2. Write Simple Agent Code
Your agents are just HTTP servers that implement the OpenAI-compatible chat completions endpoint. Use any language or framework:
# weather_agent.py
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()

# Point to Plano's LLM gateway - it handles model routing for you
llm = AsyncOpenAI(base_url="http://localhost:12001/v1", api_key="EMPTY")

@app.post("/v1/chat/completions")
async def chat(request: Request):
    body = await request.json()
    messages = body.get("messages", [])
    days = 7

    # Your agent logic: fetch data, call APIs, run tools
    # See demos/use_cases/travel_agents/ for the full implementation
    weather_data = await get_weather_data(request, messages, days)

    # Stream the response back through Plano
    async def generate():
        stream = await llm.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "system", "content": f"Weather: {weather_data}"}, *messages],
            stream=True
        )
        async for chunk in stream:
            yield f"data: {chunk.model_dump_json()}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
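The agent above emits each chunk as a server-sent event of the form `data: <json>` followed by a blank line. As a rough illustration of what a consumer of that stream does, the sketch below decodes such lines and joins the `delta.content` fragments. The helper names `iter_sse_payloads` and `collect_text` are hypothetical, and the `[DONE]` sentinel is handled only defensively: it is the terminator used by OpenAI's streaming API, which the agent code above does not itself emit.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream marker
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the `delta.content` fragments from chat completion chunks."""
    parts = []
    for chunk in iter_sse_payloads(lines):
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written sample lines in the shape the agent streams back.
sample = [
    'data: {"choices": [{"delta": {"content": "Sunny"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " in Paris"}}]}',
    "",
    "data: [DONE]",
]
text = collect_text(sample)
```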
3. Start Plano & Query Your Agents
Prerequisites: Follow the prerequisites guide to install Plano and set up your environment.
# Start Plano
planoai up config.yaml
...

# Query - Plano intelligently routes to both agents in a single conversation
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "I want to travel from NYC to Paris next week. What is the weather like there, and can you find me some flights?"}
    ]
  }'

# → Plano routes to weather_agent for Paris weather ✓
# → Then routes to flight_agent for NYC → Paris flights ✓
# → Returns a complete travel plan with both weather info and flight options
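If you would rather issue the same query from Python than from curl, a minimal sketch using only the standard library might look like the following. The `build_chat_request` helper is a hypothetical name; the URL, port, and payload simply mirror the curl command above. The request is built but not sent, so the snippet runs even without Plano listening.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8001") -> urllib.request.Request:
    """Build a POST request for the travel_assistant listener's chat completions endpoint."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "I want to travel from NYC to Paris next week. "
    "What is the weather like there, and can you find me some flights?"
)

# With Plano running, send the request like this:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```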
4. Get Observability and Model Agility for Free
Every request is traced end-to-end with OpenTelemetry - no instrumentation code needed.
What You Didn't Have to Build
| Infrastructure Concern | Without Plano | With Plano |
|---|---|---|
| Agent Orchestration | Write intent classifier + routing logic | Declare agent descriptions in YAML |
| Model Management | Handle each provider's API quirks | Unified LLM APIs with state management |
| Rich Tracing | Instrument every service with OTEL | Automatic end-to-end traces and logs |
| Learning Signals | Build pipeline to capture/export spans | Zero-code agentic signals |
| Adding Agents | Update routing code, test, redeploy | Add to config, restart |
Why it's efficient: Plano uses purpose-built, lightweight LLMs (like our 4B-parameter orchestrator) instead of heavyweight frameworks or GPT-4 for routing - giving you production-grade routing at a fraction of the cost and latency.
Contact
To get in touch with us, please join our Discord server. We actively monitor it and offer support there.
Getting Started
Ready to try Plano? Check out our comprehensive documentation:
- Quickstart Guide - Get up and running in minutes
- LLM Routing - Route by model name, alias, or intelligent preferences
- Agent Orchestration - Build multi-agent workflows
- Filter Chains - Add guardrails, moderation, and memory hooks
- Prompt Targets - Turn prompts into deterministic API calls
- Observability - Traces, metrics, and logs
Contribution
We would love feedback on our Roadmap, and we welcome contributions to Plano! Whether you're fixing bugs, adding new features, improving documentation, or creating tutorials, your help is much appreciated. Please visit our Contribution Guide for more details.
