mirror of https://github.com/katanemo/plano.git synced 2026-06-17 15:25:17 +02:00

Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic. https://planoai.dev



Salman Paracha b53fb81a3f removed tests for guards in configuration.rs		2025-12-20 10:58:01 -08:00
.github/workflows	Improve end to end tracing (#628 )	2025-12-11 15:21:57 -08:00
arch	Use mcp tools for filter chain (#621 )	2025-12-17 17:30:14 -08:00
crates	removed tests for guards in configuration.rs	2025-12-20 10:58:01 -08:00
demos	removed prompt guards altogether from our repo - use filters	2025-12-20 10:50:39 -08:00
docs	removed prompt guards altogether from our repo - use filters	2025-12-20 10:50:39 -08:00
tests	removed prompt guards altogether from our repo - use filters	2025-12-20 10:50:39 -08:00
www	updating readme and docs with note about Arch-Function (#285 )	2024-11-19 08:43:56 -08:00
.dockerignore	update .dockerignore file after filter move	2024-10-18 14:44:39 -07:00
.gitignore	removing model_server. buh bye (#619 )	2025-11-22 15:04:41 -08:00
.gitmodules	Remove OMF (#78 )	2024-09-24 15:18:20 -07:00
.pre-commit-config.yaml	run rust tests for all crates upon commit (#393 )	2025-02-05 18:57:01 -08:00
archgw.code-workspace	Improve end to end tracing (#628 )	2025-12-11 15:21:57 -08:00
build_filter_image.sh	release 0.3.22 (#629 )	2025-12-11 11:20:19 -08:00
CONTRIBUTING.md	Update discord server invite url (#428 )	2025-03-05 13:21:35 -08:00
LICENSE	Create LICENSE	2024-10-10 06:30:23 -07:00
README.md	removed prompt guards altogether from our repo - use filters	2025-12-20 10:50:39 -08:00

README.md

Plano is a models-native proxy and data plane for agents.

Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn’t be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for moderation, and smart LLM routing APIs for UX and DX agility. Use any language or AI framework, and deliver agents faster to production.

Quickstart • Demos • Route LLMs • Build Agentic Apps with Plano • Documentation • Contact

Overview

Building agentic demos is easy. Shipping agentic applications safely, reliably, and repeatably to production is hard. After the thrill of a quick hack, you end up building the “hidden middleware” to reach production: routing logic to reach the right agent, guardrail hooks for safety and moderation, evaluation and observability glue for continuous learning, and model/provider quirks scattered across frameworks and application code.

Plano solves this by moving core delivery concerns into a unified, out-of-process dataplane.

🚦 Orchestration: Low-latency orchestration between agents; add new agents without changing app code.
🔗 Model Agility: Route by model name, alias (semantic names) or automatically via preferences
🕵 Agentic Signals™: Zero-code capture of behavior signals plus OTEL traces/metrics across every agent.
🛡️ Moderation & Memory Hooks: Build jailbreak protection, add moderation policies and memory consistently via Filter Chains.

Plano pulls rote plumbing out of your framework so you can stay focused on what matters most: the core product logic of your agentic applications. Plano is backed by industry-leading LLM research and built on Envoy by its core contributors, who built critical infrastructure at scale for modern workloads.

High-Level Network Sequence Diagram:

Jump to our docs to learn how you can use Plano to improve the speed, safety and observability of your agentic applications.

Important

Plano and the Arch family of LLMs (like Plano-Orchestrator-4B, Arch-Router, etc) are hosted free of charge in the US-central region to give you a great first-run developer experience of Plano. To scale and run in production, you can either run these LLMs locally or contact us on Discord for API keys.

Contact

To get in touch with us, please join our Discord server. We actively monitor it and offer support there.

Demos

Sample App: Weather Forecast Agent - A sample agentic weather forecasting app that highlights core function calling capabilities of Plano.
Sample App: Network Operator Agent - A simple network device switch operator agent that can retrieve device statistics and reboot them.

Quickstart

Follow this quickstart guide to use Plano as a router for local or hosted LLMs, including dynamic routing. Later in the section we will see how you can use Plano to build highly capable agentic applications and to provide end-to-end observability.

Prerequisites

Before you begin, ensure you have the following:

Docker System (v24)
Docker compose (v2.29)
Python (v3.13)

Plano's CLI allows you to manage and interact with the Plano gateway efficiently. To install the CLI, simply run the following command:

Tip

We recommend that developers create a new Python virtual environment to isolate dependencies before installing Plano. This ensures that plano and its dependencies do not interfere with other packages on your system.

$ python3 -m venv venv
$ source venv/bin/activate   # On Windows, use: venv\Scripts\activate
$ pip install plano==0.4.0

Use Plano as an LLM Router

Plano supports multiple powerful routing strategies for LLMs. Model-based routing gives you direct control over specific models and supports 11+ LLM providers including OpenAI, Anthropic, DeepSeek, Mistral, Groq, and more. Alias-based routing lets you create semantic model names that decouple your application code from specific providers, making it easy to experiment with different models or handle provider changes without refactoring. For full configuration examples and code walkthroughs, see our routing guides.
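The idea behind alias-based routing can be pictured as a small lookup table: application code asks for a semantic name, and the mapping to a provider model lives in one place. The sketch below is only an illustration of that decoupling. The alias names (`fast-chat`, `code-helper`) and the table itself are hypothetical and are not Plano configuration syntax; see the routing guides for the real format.

```python
# Hypothetical alias table: the alias names and this mapping are illustrative
# assumptions, not Plano's configuration format.
ALIASES = {
    "fast-chat": "openai/gpt-4o",
    "code-helper": "deepseek/deepseek-coder",
}


def resolve_model(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Return the provider model behind an alias.

    Names that are not aliases are passed through unchanged, so callers can
    still request a concrete model directly (model-based routing).
    """
    return aliases.get(name, name)
```

Swapping the model behind `fast-chat` then means editing one table entry rather than every call site.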

Preference-aligned Routing

Preference-aligned routing provides intelligent, dynamic model selection based on natural language descriptions of tasks and preferences. Instead of hardcoded routing logic, you describe what each model is good at using plain English.

version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: complex_reasoning
        description: deep analysis, mathematical problem solving, and logical reasoning
      - name: creative_writing
        description: storytelling, creative content, and artistic writing

  - model: deepseek/deepseek-coder
    access_key: $DEEPSEEK_API_KEY
    routing_preferences:
      - name: code_generation
        description: generating new code, writing functions, and creating scripts
      - name: code_review
        description: analyzing existing code for bugs, improvements, and optimization

Plano uses a lightweight 1.5B autoregressive model to intelligently map user prompts to these preferences, automatically selecting the best model for each request. This approach adapts to intent drift, supports multi-turn conversations, and avoids brittle embedding-based classifiers or manual if/else chains. No retraining required when adding models or updating policies — routing is governed entirely by human-readable rules.
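Once a preference name has been selected for a prompt, resolving it to a provider is a plain lookup over the configuration. The sketch below shows only that second step, using the `llm_providers` entries from the config above written as Python data. It does not reproduce the routing model, and the function names `build_preference_index` and `model_for_preference` are hypothetical helpers, not part of Plano.

```python
# Mirrors the llm_providers section of the config above as plain Python data,
# so the sketch needs no YAML parser.
LLM_PROVIDERS = [
    {
        "model": "openai/gpt-4o",
        "routing_preferences": [
            {"name": "complex_reasoning",
             "description": "deep analysis, mathematical problem solving, and logical reasoning"},
            {"name": "creative_writing",
             "description": "storytelling, creative content, and artistic writing"},
        ],
    },
    {
        "model": "deepseek/deepseek-coder",
        "routing_preferences": [
            {"name": "code_generation",
             "description": "generating new code, writing functions, and creating scripts"},
            {"name": "code_review",
             "description": "analyzing existing code for bugs, improvements, and optimization"},
        ],
    },
]


def build_preference_index(providers: list[dict]) -> dict[str, str]:
    """Map each routing preference name to the model that declares it."""
    index: dict[str, str] = {}
    for provider in providers:
        for preference in provider.get("routing_preferences", []):
            index[preference["name"]] = provider["model"]
    return index


def model_for_preference(name: str, index: dict[str, str], default: str) -> str:
    """Resolve a preference name to a model, falling back to a default."""
    return index.get(name, default)
```

Because the index is derived from the config, adding a model or a preference changes the data only, which matches the point that no retraining is needed when policies are updated.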

Learn More: Check our documentation for comprehensive provider setup guides and routing strategies. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:

Build Agentic Apps with Plano

In the following quickstart we will show you how easy it is to build an AI agent with the Plano gateway. We will build a currency exchange agent using the simple steps below. For this demo we will use https://api.frankfurter.dev/ to fetch the latest exchange rates, with USD as the base currency.

Step 1. Create plano config file

Create a plano_config.yaml file with the following content:

version: v0.1.0

listeners:
  ingress_traffic:
    address: 0.0.0.0
    port: 10000
    message_format: openai
    timeout: 30s

llm_providers:
  - access_key: $OPENAI_API_KEY
    model: openai/gpt-4o

system_prompt: |
  You are a helpful assistant.

prompt_targets:
  - name: currency_exchange
    description: Get currency exchange rate from USD to other currencies
    parameters:
      - name: currency_symbol
        description: the currency that needs conversion
        required: true
        type: str
        in_path: true
    endpoint:
      name: frankfurter_api
      path: /v1/latest?base=USD&symbols={currency_symbol}
    system_prompt: |
      You are a helpful assistant. Show me the currency symbol you want to convert from USD.

  - name: get_supported_currencies
    description: Get list of supported currencies for conversion
    endpoint:
      name: frankfurter_api
      path: /v1/currencies

endpoints:
  frankfurter_api:
    endpoint: api.frankfurter.dev:443
    protocol: https
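To make the `in_path: true` parameter concrete, the sketch below shows how a value extracted from the user's prompt could be substituted into the `path` template and combined with the `endpoints` entry to form the upstream URL. This is an assumed reading of the config for illustration, not Plano's implementation, and `build_target_url` is a hypothetical helper.

```python
from urllib.parse import quote


def build_target_url(protocol: str, endpoint: str, path_template: str,
                     params: dict[str, str]) -> str:
    """Fill {name} placeholders in the path and prepend protocol and host."""
    filled = path_template
    for key, value in params.items():
        # Percent-encode each value so it is safe inside a path or query string.
        filled = filled.replace("{" + key + "}", quote(str(value), safe=""))

    # Drop the port when it is the default for the protocol (443 for https).
    host, _, port = endpoint.partition(":")
    default_port = {"https": "443", "http": "80"}.get(protocol)
    authority = host if port in ("", default_port) else endpoint
    return f"{protocol}://{authority}{filled}"


url = build_target_url(
    "https",
    "api.frankfurter.dev:443",
    "/v1/latest?base=USD&symbols={currency_symbol}",
    {"currency_symbol": "GBP"},
)
print(url)  # https://api.frankfurter.dev/v1/latest?base=USD&symbols=GBP
```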

Step 2. Start plano gateway with currency conversion config


$ plano up plano_config.yaml
2024-12-05 16:56:27,979 - cli.main - INFO - Starting plano cli version: 0.4.0
2024-12-05 16:56:28,485 - cli.utils - INFO - Schema validation successful!
2024-12-05 16:56:28,485 - cli.main - INFO - Starting plano model server and plano gateway
2024-12-05 16:56:51,647 - cli.core - INFO - Container is healthy!

Once the gateway is up, you can start interacting with it on port 10000 using the OpenAI chat completions API.

Sample queries you can ask include "what is currency rate for gbp?" or "show me list of currencies for conversion".

Step 3. Interacting with gateway using curl command

Here is a sample curl command you can use to interact with the gateway:

$ curl --header 'Content-Type: application/json' \
  --data '{"messages": [{"role": "user","content": "what is exchange rate for gbp"}], "model": "none"}' \
  http://localhost:10000/v1/chat/completions | jq ".choices[0].message.content"

"As of the date provided in your context, December 5, 2024, the exchange rate for GBP (British Pound) from USD (United States Dollar) is 0.78558. This means that 1 USD is equivalent to 0.78558 GBP."

And to get the list of supported currencies:

$ curl --header 'Content-Type: application/json' \
  --data '{"messages": [{"role": "user","content": "show me list of currencies that are supported for conversion"}], "model": "none"}' \
  http://localhost:10000/v1/chat/completions | jq ".choices[0].message.content"

"Here is a list of the currencies that are supported for conversion from USD, along with their symbols:\n\n1. AUD - Australian Dollar\n2. BGN - Bulgarian Lev\n3. BRL - Brazilian Real\n4. CAD - Canadian Dollar\n5. CHF - Swiss Franc\n6. CNY - Chinese Renminbi Yuan\n7. CZK - Czech Koruna\n8. DKK - Danish Krone\n9. EUR - Euro\n10. GBP - British Pound\n11. HKD - Hong Kong Dollar\n12. HUF - Hungarian Forint\n13. IDR - Indonesian Rupiah\n14. ILS - Israeli New Sheqel\n15. INR - Indian Rupee\n16. ISK - Icelandic Króna\n17. JPY - Japanese Yen\n18. KRW - South Korean Won\n19. MXN - Mexican Peso\n20. MYR - Malaysian Ringgit\n21. NOK - Norwegian Krone\n22. NZD - New Zealand Dollar\n23. PHP - Philippine Peso\n24. PLN - Polish Złoty\n25. RON - Romanian Leu\n26. SEK - Swedish Krona\n27. SGD - Singapore Dollar\n28. THB - Thai Baht\n29. TRY - Turkish Lira\n30. USD - United States Dollar\n31. ZAR - South African Rand\n\nIf you want to convert USD to any of these currencies, you can select the one you are interested in."

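The same requests can be made from Python. The sketch below mirrors the curl commands above using only the standard library: it posts the same JSON body to the same URL and extracts the field that the `jq` filter selects. The function names are hypothetical, and `ask` assumes the gateway from Step 2 is running locally on port 10000.

```python
import json
import urllib.request

# Same URL as in the curl examples above.
GATEWAY_URL = "http://localhost:10000/v1/chat/completions"


def build_chat_request(prompt: str, url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "model": "none",  # same placeholder value as in the curl example
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_content(response_body: dict) -> str:
    """Equivalent of the jq filter .choices[0].message.content."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """Send the prompt to a running gateway and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        return extract_content(json.load(response))
```

With the gateway up, `ask("what is exchange rate for gbp")` should return reply text of the kind shown above.
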
Observability

Plano is designed to support best-in-class observability by supporting open standards. Please read our docs on observability for more details on tracing, metrics, and logs. The screenshot below is from our integration with Signoz (among others).

Contribution

We would love feedback on our Roadmap, and we welcome contributions to Plano! Whether you're fixing bugs, adding new features, improving documentation, or creating tutorials, your help is much appreciated. Please visit our Contribution Guide for more details.
