Mirror of https://github.com/katanemo/plano.git
synced 2026-06-23 15:38:07 +02:00

deploy: 90b926c2ce
parent 0dd2552f91
commit 07b84a0d42
35 changed files with 105 additions and 105 deletions

@@ -1,6 +1,6 @@
 Plano Docs v0.4.18
 llms.txt (auto-generated)
-Generated (UTC): 2026-04-14T02:31:14.825020+00:00
+Generated (UTC): 2026-04-15T23:42:11.682797+00:00
 Table of contents
 - Agents (concepts/agents)

@@ -3760,9 +3760,9 @@ response = client.chat.completions.create(
-Preference-aligned routing (Arch-Router)
+Preference-aligned routing (Plano-Orchestrator)
-Preference-aligned routing uses the Arch-Router model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
+Preference-aligned routing uses the Plano-Orchestrator model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
 Domain: High-level topic of the request (e.g., legal, healthcare, programming).

@@ -3770,7 +3770,7 @@ Action: What the user wants to do (e.g., summarize, generate code, translate).
 Routing preferences: Your mapping from (domain, action) to preferred models.
-Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
+Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
 Configuration

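The preference-aligned routing text above separates routing policy (which model should handle a given domain and action) from model assignment. The following is a minimal sketch of the policy half only, assuming the (domain, action) pair has already been inferred; in Plano that inference is done by the routing model from the prompt, whereas here it is simply passed in. The function `select_model`, the `"*"` wildcard, and all model names are hypothetical and only illustrate a lookup table keyed on (domain, action).

```python
from typing import Dict, Tuple

# Hypothetical policy table: (domain, action) -> preferred model.
# Model names are placeholders, not Plano defaults.
RoutingPolicy = Dict[Tuple[str, str], str]

POLICY: RoutingPolicy = {
    ("programming", "generate code"): "coding-model",
    ("legal", "summarize"): "long-context-model",
    ("legal", "*"): "careful-model",
}


def select_model(policy: RoutingPolicy, domain: str, action: str, default_model: str) -> str:
    """Return the preferred model for an already-classified request."""
    # 1. Exact (domain, action) preference.
    if (domain, action) in policy:
        return policy[(domain, action)]
    # 2. Domain-wide preference; "*" as a wildcard action is a convention
    #    invented for this sketch, not Plano configuration syntax.
    if (domain, "*") in policy:
        return policy[(domain, "*")]
    # 3. Nothing matched: fall back to the default model.
    return default_model


if __name__ == "__main__":
    print(select_model(POLICY, "legal", "translate", "general-model"))  # careful-model
    print(select_model(POLICY, "travel", "book appointment", "general-model"))  # general-model
```

Swapping or adding a model only changes values in the table, not the lookup logic, which is the decoupling of policy from assignment that the docs describe.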
@@ -3810,20 +3810,20 @@ Client usage
 Clients can let the router decide or still specify aliases:
-# Let Arch-Router choose based on content
+# Let Plano-Orchestrator choose based on content
 response = client.chat.completions.create(
 messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
 # No model specified - router will analyze and choose claude-sonnet-4-5
 )
-Arch-Router
+Plano-Orchestrator
-The Arch-Router is a state-of-the-art preference-based routing model specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
+Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
 Addressing Traditional Routing Limitations:
 Human Preference Alignment
-Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
+Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
 Flexible Model Integration
 The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.

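The client-usage snippet in this hunk omits the model so that the router chooses. Below is a stdlib-only sketch of the corresponding request body, assuming Plano exposes an OpenAI-compatible `/v1/chat/completions` route and accepts a body with no `model` field, as the snippet implies. `build_chat_request` and the base URL are hypothetical placeholders; the request is built but never sent.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_chat_request(
    base_url: str,
    messages: List[Dict[str, str]],
    model: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, an OpenAI-style chat completions request."""
    body: Dict[str, object] = {"messages": messages}
    if model is not None:
        # Clients can still pin a model or alias explicitly.
        body["model"] = model
    # With no "model" key, the choice is left to the router.
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address; point this at your own Plano listener.
    req = build_chat_request(
        "http://localhost:12000",
        [{"role": "user", "content": "Write a creative story about space exploration"}],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing `model="some-alias"` adds the field, matching the note that clients can still specify aliases. Actually sending the request would be `urllib.request.urlopen(req)` against a running gateway.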
@@ -3831,15 +3831,15 @@ The system supports seamlessly adding new models for routing without requiring r
 Preference-Encoded Routing
 Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
-To support effective routing, Arch-Router introduces two key concepts:
+To support effective routing, Plano-Orchestrator introduces two key concepts:
 Domain – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
 Action – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).
-Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
+Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
-In summary, Arch-Router demonstrates:
+In summary, Plano-Orchestrator demonstrates:
 Structured Preference Routing: Aligns prompt request with model strengths using explicit domain–action mappings.

@@ -3849,9 +3849,9 @@ Flexible and Adaptive: Supports evolving user needs, model updates, and new doma
 Production-Ready Performance: Optimized for low-latency, high-throughput applications in multi-model environments.
-Self-hosting Arch-Router
+Self-hosting Plano-Orchestrator
-By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either Ollama or vLLM.
+By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either Ollama or vLLM.
 Using Ollama (recommended for local development)

@@ -3859,14 +3859,14 @@ Install Ollama
 Download and install from ollama.ai.
-Pull and serve Arch-Router
+Pull and serve the routing model
 ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
 ollama serve
 This downloads the quantized GGUF model from HuggingFace and starts serving on http://localhost:11434.
-Configure Plano to use local Arch-Router
+Configure Plano to use local routing model
 overrides:
 llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M

@@ -3919,7 +3919,7 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
 --load-format gguf \
 --chat-template ${SNAPSHOT_DIR}template.jinja \
 --tokenizer katanemo/Arch-Router-1.5B \
---served-model-name Arch-Router \
+--served-model-name Plano-Orchestrator \
 --gpu-memory-utilization 0.3 \
 --tensor-parallel-size 1 \
 --enable-prefix-caching

@@ -3927,10 +3927,10 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
 Configure Plano to use the vLLM endpoint
 overrides:
-llm_routing_model: plano/Arch-Router
+llm_routing_model: plano/Plano-Orchestrator
 model_providers:
-- model: plano/Arch-Router
+- model: plano/Plano-Orchestrator
 base_url: http://<your-server-ip>:10000
 - model: openai/gpt-5.2

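This commit renames the served model (`--served-model-name Plano-Orchestrator`) and the config reference (`plano/Plano-Orchestrator`) together; if only one side is updated, the names stop lining up. A minimal consistency check, assuming that the part of the configured name after the first `/` is the name the upstream server reports (an inference from the vLLM and Ollama examples, not a documented rule). It parses the JSON listing from vLLM's OpenAI-compatible `/v1/models` endpoint (the `http://localhost:10000/v1/models` URL used in these docs) or from Ollama's `/api/tags`. All helper names are hypothetical.

```python
import json
from typing import Set


def upstream_name(configured_model: str) -> str:
    """'plano/Plano-Orchestrator' -> 'Plano-Orchestrator'.

    Assumes everything after the first '/' is the upstream model name.
    """
    _, sep, rest = configured_model.partition("/")
    return rest if sep else configured_model


def served_names(response_text: str) -> Set[str]:
    """Model names from vLLM's /v1/models or Ollama's /api/tags JSON."""
    payload = json.loads(response_text)
    # OpenAI-compatible listing (vLLM): {"data": [{"id": ...}, ...]}
    names = {entry["id"] for entry in payload.get("data", [])}
    # Ollama listing: {"models": [{"name": ...}, ...]}
    names |= {entry["name"] for entry in payload.get("models", [])}
    return names


def routing_model_is_served(configured_model: str, response_text: str) -> bool:
    return upstream_name(configured_model) in served_names(response_text)


if __name__ == "__main__":
    vllm_listing = '{"object": "list", "data": [{"id": "Plano-Orchestrator", "object": "model"}]}'
    print(routing_model_is_served("plano/Plano-Orchestrator", vllm_listing))  # True
    print(routing_model_is_served("plano/Arch-Router", vllm_listing))  # False
```

The listing text can be fetched with `urllib.request.urlopen("http://localhost:10000/v1/models").read().decode()`. Ollama may report tags in a normalized form, so a miss there is a cue to inspect the listing rather than proof that the model is absent.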
@@ -3950,16 +3950,16 @@ curl http://localhost:10000/v1/models
 Using vLLM on Kubernetes (GPU nodes)
-For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
+For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
 The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:
-vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
+vllm-deployment.yaml — Plano-Orchestrator served by vLLM, with an init container to download
 the model from HuggingFace
-plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
+plano-deployment.yaml — Plano proxy configured to use the in-cluster Plano-Orchestrator
 config_k8s.yaml — Plano config with llm_routing_model pointing at
-http://arch-router:10000 instead of the default hosted endpoint
+http://plano-orchestrator:10000 instead of the default hosted endpoint
 Key things to know before deploying:

@@ -4092,7 +4092,7 @@ Let the router decide: No model specified, router analyzes content
 Example Use Cases
-Here are common scenarios where Arch-Router excels:
+Here are common scenarios where Plano-Orchestrator excels:
 Coding Tasks: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.

@@ -4134,13 +4134,13 @@ Best practices
 Unsupported Features
-The following features are not supported by the Arch-Router model:
+The following features are not supported by the Plano-Orchestrator routing model:
 Multi-modality: The model is not trained to process raw image or audio inputs. It can handle textual queries about these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.
-Function calling: Arch-Router is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
+Function calling: Plano-Orchestrator is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
-System prompt dependency: Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
+System prompt dependency: Plano-Orchestrator routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
 ---

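The unsupported-features list states that the routing model looks only at the conversation history and ignores system prompts, so any intent that should influence routing has to appear in the user or assistant turns. A short sketch of that view of the conversation, for illustration only: it mirrors the documented behavior, it is not Plano's code, and `routing_view` is a hypothetical name.

```python
from typing import Dict, List

Message = Dict[str, str]


def routing_view(messages: List[Message]) -> List[Message]:
    """Drop system prompts, keeping the rest of the conversation history.

    Mirrors the documented routing behavior for illustration; not Plano's code.
    Returns a new list and leaves the input untouched.
    """
    return [m for m in messages if m.get("role") != "system"]


if __name__ == "__main__":
    conversation = [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Summarize this contract."},
    ]
    print([m["role"] for m in routing_view(conversation)])  # ['user']
```

In this example, the instruction placed in the system prompt would not affect which model is picked; only "Summarize this contract." is available for routing.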
@@ -6455,7 +6455,7 @@ model_providers:
 # routing_preferences: tags a model with named capabilities so Plano's LLM router
 # can select the best model for each request based on intent. Requires the
-# Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
+# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
 # Each preference has a name (short label) and a description (used for intent matching).
 - model: groq/llama-3.3-70b-versatile
 access_key: $GROQ_API_KEY

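The `routing_preferences` comment describes each preference as a short `name` plus a `description` used for intent matching. Plano performs that matching with the routing model; the sketch below swaps in naive word overlap purely to show how name/description data can drive a choice. The provider entries reuse model names from these docs, but the preference names and descriptions are made up, and `pick_model` is hypothetical.

```python
import re
from typing import List, Optional, Set

# Hypothetical providers shaped like the routing_preferences described above.
PROVIDERS: List[dict] = [
    {
        "model": "groq/llama-3.3-70b-versatile",
        "routing_preferences": [
            {"name": "quick answers", "description": "short factual questions and quick lookups"},
        ],
    },
    {
        "model": "openai/gpt-5.2",
        "routing_preferences": [
            {"name": "code generation", "description": "write or refactor code, implement functions"},
        ],
    },
]


def _words(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_model(query: str, providers: List[dict], default: Optional[str] = None) -> Optional[str]:
    """Toy stand-in for the routing model: score each preference description
    by word overlap with the query and return the best-scoring model."""
    query_words = _words(query)
    best_model, best_score = default, 0
    for provider in providers:
        for pref in provider.get("routing_preferences", []):
            score = len(query_words & _words(pref["description"]))
            if score > best_score:
                best_model, best_score = provider["model"], score
    return best_model


if __name__ == "__main__":
    print(pick_model("Please write a Python function that parses dates", PROVIDERS))
    print(pick_model("A quick factual question about rivers", PROVIDERS))
```

A semantic matcher would also connect "function" with "functions" or "fix this error" with debugging; word overlap misses such cases, which is one reason to prefer a learned router over keyword rules.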
@@ -6591,7 +6591,7 @@ overrides:
 # Path to the trusted CA bundle for upstream TLS verification
 upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
 # Model used for intent-based LLM routing (must be listed in model_providers)
-llm_routing_model: Arch-Router
+llm_routing_model: Plano-Orchestrator
 # Model used for agent orchestration (must be listed in model_providers)
 agent_orchestration_model: Plano-Orchestrator

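The comments in this hunk require both `llm_routing_model` and `agent_orchestration_model` to be listed in `model_providers`, and this commit changes one of those references. Below is a minimal pre-deploy check over an already-parsed config (a plain dict, so no YAML library is needed). It assumes that "listed" means the name matches a provider's `model` field either exactly or after its provider prefix, since the excerpts show both `plano/Plano-Orchestrator` and bare `Plano-Orchestrator`. Plano's own validation may differ, and the function names are hypothetical.

```python
from typing import Any, Dict, List


def _bare(name: str) -> str:
    # 'plano/Plano-Orchestrator' -> 'Plano-Orchestrator'; bare names are unchanged.
    return name.split("/", 1)[1] if "/" in name else name


def unlisted_override_models(config: Dict[str, Any]) -> List[str]:
    """Return 'key=value' for override models not found in model_providers."""
    listed = set()
    for provider in config.get("model_providers", []):
        name = provider.get("model")
        if name:
            # Accept both the full and the prefix-stripped name (an assumption).
            listed.update({name, _bare(name)})
    overrides = config.get("overrides", {})
    missing = []
    for key in ("llm_routing_model", "agent_orchestration_model"):
        value = overrides.get(key)
        if value is not None and value not in listed and _bare(value) not in listed:
            missing.append(f"{key}={value}")
    return missing


if __name__ == "__main__":
    stale = {
        "overrides": {"llm_routing_model": "plano/Arch-Router"},
        "model_providers": [{"model": "plano/Plano-Orchestrator"}],
    }
    print(unlisted_override_models(stale))  # ['llm_routing_model=plano/Arch-Router']
```

Run against a config that still says `llm_routing_model: plano/Arch-Router`, it reports that key, which is the kind of leftover reference a rename like this one can leave behind.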