mirror of
https://github.com/katanemo/plano.git
synced 2026-06-17 15:25:17 +02:00
merge origin/main into musa/custom-trace-attributes
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
commit
e30f93b1cd
24 changed files with 268 additions and 45 deletions
|
|
@ -17,7 +17,7 @@ from sphinxawesome_theme.postprocess import Icons
|
|||
project = "Plano Docs"
|
||||
copyright = "2025, Katanemo Labs, Inc"
|
||||
author = "Katanemo Labs, Inc"
|
||||
release = " v0.4.7"
|
||||
release = " v0.4.8"
|
||||
|
||||
# -- General configuration ---------------------------------------------------
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
|
||||
|
|
|
|||
|
|
@ -37,7 +37,7 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
|
|||
|
||||
.. code-block:: console
|
||||
|
||||
$ uv tool install planoai==0.4.7
|
||||
$ uv tool install planoai==0.4.8
|
||||
|
||||
**Option 2: Install with pip (Traditional)**
|
||||
|
||||
|
|
@ -45,7 +45,7 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
|
|||
|
||||
$ python -m venv venv
|
||||
$ source venv/bin/activate # On Windows, use: venv\Scripts\activate
|
||||
$ pip install planoai==0.4.7
|
||||
$ pip install planoai==0.4.8
|
||||
|
||||
|
||||
.. _llm_routing_quickstart:
|
||||
|
|
@ -90,7 +90,7 @@ Start Plano:
|
|||
|
||||
$ planoai up plano_config.yaml
|
||||
# Or if installed with uv tool: uvx planoai up plano_config.yaml
|
||||
2024-12-05 11:24:51,288 - planoai.main - INFO - Starting plano cli version: 0.4.7
|
||||
2024-12-05 11:24:51,288 - planoai.main - INFO - Starting plano cli version: 0.4.8
|
||||
2024-12-05 11:24:51,825 - planoai.utils - INFO - Schema validation successful!
|
||||
2024-12-05 11:24:51,825 - planoai.main - INFO - Starting plano
|
||||
...
|
||||
|
|
|
|||
|
|
@ -25,7 +25,7 @@ Create a ``docker-compose.yml`` file with the following configuration:
|
|||
# docker-compose.yml
|
||||
services:
|
||||
plano:
|
||||
image: katanemo/plano:0.4.7
|
||||
image: katanemo/plano:0.4.8
|
||||
container_name: plano
|
||||
ports:
|
||||
- "10000:10000" # ingress (client -> plano)
|
||||
|
|
|
|||
|
|
@ -46,6 +46,117 @@ Also, Plano utilizes `Envoy event-based thread model <https://blog.envoyproxy.io
|
|||
Worker threads rarely share state and operate in a trivially parallel fashion. This threading model
|
||||
enables scaling to very high core count CPUs.
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ P L A N O │
|
||||
│ AI-native proxy and data plane for agentic applications │
|
||||
│ │
|
||||
│ ┌─────────────────────┐ │
|
||||
│ │ YOUR CLIENTS │ │
|
||||
│ │ (apps· agents · UI) │ │
|
||||
│ └──────────┬──────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────────────────────────┼──────────────────────────┐ │
|
||||
│ │ │ │ │
|
||||
│ ┌──────▼──────────┐ ┌─────────▼────────┐ ┌────────▼─────────┐ │
|
||||
│ │ Agent Port(s) │ │ Model Port │ │ Function-Call │ │
|
||||
│ │ :8001+ │ │ :12000 │ │ Port :10000 │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ route your │ │ direct LLM │ │ prompt-target / │ │
|
||||
│ │ prompts to │ │ calls with │ │ tool dispatch │ │
|
||||
│ │ the right │ │ model-alias │ │ with parameter │ │
|
||||
│ │ agent │ │ translation │ │ extraction │ │
|
||||
│ └──────┬──────────┘ └─────────┬────────┘ └────────┬─────────┘ │
|
||||
│ └──────────────────────────────┼─────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ╔══════════════════════════════════════▼══════════════════════════════════════╗ │
|
||||
│ ║ BRIGHTSTAFF (SUBSYSTEM) — Agentic Control Plane ║ │
|
||||
│ ║ Async · non-blocking · parallel per-request Tokio tasks ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ┌─────────────────────────────────────────────────────────────────────┐ ║ │
|
||||
│ ║ │ Agentic ROUTER │ ║ │
|
||||
│ ║ │ Reads listener config · maps incoming request to execution path │ ║ │
|
||||
│ ║ │ │ ║ │
|
||||
│ ║ │ /agents/* ──────────────────────► AGENT PATH │ ║ │
|
||||
│ ║ │ /v1/chat|messages|responses ──────► LLM PATH │ ║ │
|
||||
│ ║ └─────────────────────────────────────────────────────────────────────┘ ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ─────────────────────── AGENT PATH ──────────────────────────────────── ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ┌──────────────────────────────────────────────────────────────────────┐ ║ │
|
||||
│ ║ │ FILTER CHAIN (pipeline_processor.rs) │ ║ │
|
||||
│ ║ │ │ ║ │
|
||||
│ ║ │ prompt ──► [input_guards] ──► [query_rewrite] ──► [context_builder] │ ║ │
|
||||
│ ║ │ guardrails prompt mutation RAG / enrichment │ ║ │
|
||||
│ ║ │ │ ║ │
|
||||
│ ║ │ Each filter: HTTP or MCP · can mutate, enrich, or short-circuit │ ║ │
|
||||
│ ║ └──────────────────────────────────┬───────────────────────────────────┘ ║ │
|
||||
│ ║ │ ║ │
|
||||
│ ║ ┌──────────────────────────────────▼───────────────────────────────────┐ ║ │
|
||||
│ ║ │ AGENT ORCHESTRATOR (agent_chat_completions.rs) │ ║ │
|
||||
│ ║ │ Select agent · forward enriched request · manage conversation state │ ║ │
|
||||
│ ║ │ Stream response back · multi-turn aware │ ║ │
|
||||
│ ║ └──────────────────────────────────────────────────────────────────────┘ ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ─────────────────────── LLM PATH ────────────────────────────────────── ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ┌──────────────────────────────────────────────────────────────────────┐ ║ │
|
||||
│ ║ │ MODEL ROUTER (llm_router.rs + router_chat.rs) │ ║ │
|
||||
│ ║ │ Model alias resolution · preference-based provider selection │ ║ │
|
||||
│ ║ │ "fast-llm" → gpt-4o-mini · "smart-llm" → gpt-4o │ ║ │
|
||||
│ ║ └──────────────────────────────────────────────────────────────────────┘ ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ─────────────────── ALWAYS ON (every request) ───────────────────────── ║ │
|
||||
│ ║ ║ │
|
||||
│ ║ ┌────────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ ║ │
|
||||
│ ║ │ SIGNALS ANALYZER │ │ STATE STORAGE │ │ OTEL TRACING │ ║ │
|
||||
│ ║ │ loop detection │ │ memory / postgres │ │ traceparent │ ║ │
|
||||
│ ║ │ repetition score │ │ /v1/responses │ │ span injection │ ║ │
|
||||
│ ║ │ quality indicators│ │ stateful API │ │ trace export │ ║ │
|
||||
│ ║ └────────────────────┘ └─────────────────────┘ └──────────────────┘ ║ │
|
||||
│ ╚═════════════════════════════════════╤═══════════════════════════════════════╝ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────────────────────▼──────────────────────────────────────┐ │
|
||||
│ │ LLM GATEWAY (llm_gateway.wasm — embedded in Envoy egress filter chain) │ │
|
||||
│ │ │ │
|
||||
│ │ Rate limiting · Provider format translation · TTFT metrics │ │
|
||||
│ │ OpenAI → Anthropic · Gemini · Mistral · Groq · DeepSeek · xAI · Bedrock │ │
|
||||
│ │ │ │
|
||||
│ │ Envoy handles beneath this: TLS origination · SNI · retry + backoff │ │
|
||||
│ │ connection pooling · LOGICAL_DNS · structured access logs │ │
|
||||
│ └─────────────────────────────────────┬──────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
└─────────────────────────────────────────┼───────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────────────────────┼────────────────────────────┐
|
||||
│ │ │
|
||||
┌─────────▼──────────┐ ┌────────────▼──────────┐ ┌────────────▼──────────┐
|
||||
│ LLM PROVIDERS │ │ EXTERNAL AGENTS │ │ TOOL / API BACKENDS │
|
||||
│ OpenAI · Anthropic│ │ (filter chain svc) │ │ (endpoint clusters) │
|
||||
│ Gemini · Mistral │ │ HTTP / MCP :10500+ │ │ user-defined hosts │
|
||||
│ Groq · DeepSeek │ │ input_guards │ │ │
|
||||
│ xAI · Together.ai │ │ query_rewriter │ │ │
|
||||
└────────────────────┘ │ context_builder │ └───────────────────────┘
|
||||
└───────────────────────┘
|
||||
|
||||
|
||||
HOW PLANO IS DIFFERENT
|
||||
─────────────────────────────────────────────────────────────────────────────────
|
||||
Brightstaff is the entire agentic brain — one async Rust binary that handles
|
||||
agent selection, filter chain orchestration, model routing, state, and signals
|
||||
without blocking a thread per request.
|
||||
|
||||
Filter chains are programmable dataplane steps — reusable HTTP/MCP services
|
||||
you wire into any agent, executing in-path before the agent ever sees the prompt.
|
||||
|
||||
The LLM gateway is a zero-overhead WASM plugin inside Envoy — format translation
|
||||
and rate limiting happen in-process with the proxy, not as a separate service hop.
|
||||
|
||||
Envoy provides the transport substrate (TLS, HTTP codecs, retries, connection
|
||||
pools, access logs) so Plano never reimplements solved infrastructure problems.
|
||||
|
||||
|
||||
Request Flow (Ingress)
|
||||
----------------------
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue