adilhafeez 2026-04-15 23:42:15 +00:00
parent 0dd2552f91
commit 07b84a0d42
35 changed files with 105 additions and 105 deletions

@ -1,6 +1,6 @@
Plano Docs v0.4.18
llms.txt (auto-generated)
Generated (UTC): 2026-04-14T02:31:14.825020+00:00
Generated (UTC): 2026-04-15T23:42:11.682797+00:00
Table of contents
- Agents (concepts/agents)
@ -3760,9 +3760,9 @@ response = client.chat.completions.create(
Preference-aligned routing (Arch-Router)
Preference-aligned routing (Plano-Orchestrator)
Preference-aligned routing uses the Arch-Router model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
Preference-aligned routing uses the Plano-Orchestrator model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
Domain: High-level topic of the request (e.g., legal, healthcare, programming).
@ -3770,7 +3770,7 @@ Action: What the user wants to do (e.g., summarize, generate code, translate).
Routing preferences: Your mapping from (domain, action) to preferred models.
Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
Configuration
@ -3810,20 +3810,20 @@ Client usage
Clients can let the router decide or still specify aliases:
# Let Arch-Router choose based on content
# Let Plano-Orchestrator choose based on content
response = client.chat.completions.create(
messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
# No model specified - router will analyze and choose claude-sonnet-4-5
)
Arch-Router
Plano-Orchestrator
The Arch-Router is a state-of-the-art preference-based routing model specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
Addressing Traditional Routing Limitations:
Human Preference Alignment
Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
Flexible Model Integration
The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
@ -3831,15 +3831,15 @@ The system supports seamlessly adding new models for routing without requiring r
Preference-Encoded Routing
Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
To support effective routing, Arch-Router introduces two key concepts:
To support effective routing, Plano-Orchestrator introduces two key concepts:
Domain: the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
Action: the specific type of operation the user wants performed (e.g., summarization, code generation, booking an appointment, translation).
Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
In summary, Arch-Router demonstrates:
In summary, Plano-Orchestrator demonstrates:
Structured Preference Routing: Aligns prompt requests with model strengths using explicit domain-action mappings.
@ -3849,9 +3849,9 @@ Flexible and Adaptive: Supports evolving user needs, model updates, and new doma
Production-Ready Performance: Optimized for low-latency, high-throughput applications in multi-model environments.
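
The preference mechanism summarized above can be sketched in a few lines. This is a minimal illustration, not Plano's implementation: it assumes the routing model has already inferred a (domain, action) pair, and the PREFERENCES table, the select_model helper, the default model, and the provider prefixes on the model names are all hypothetical.

```python
from typing import Optional

# Hypothetical preference table: (domain, action) -> preferred model.
# The entries are illustrative and not taken from a real Plano config.
PREFERENCES = {
    ("programming", "generate code"): "openai/gpt-5.2",
    ("legal", "summarize"): "anthropic/claude-sonnet-4-5",
}

# Hypothetical fallback used when no preference matches.
DEFAULT_MODEL = "groq/llama-3.3-70b-versatile"


def select_model(domain: str, action: str,
                 default: Optional[str] = DEFAULT_MODEL) -> Optional[str]:
    """Apply user-defined preferences to an inferred (domain, action) pair."""
    # Normalize so that "Programming" and " programming " hit the same key.
    key = (domain.strip().lower(), action.strip().lower())
    return PREFERENCES.get(key, default)


print(select_model("programming", "generate code"))
print(select_model("healthcare", "translate"))
```

Under these assumptions, adding or swapping a model only changes the table, which is the separation between routing policy and model assignment described earlier.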
Self-hosting Arch-Router
Self-hosting Plano-Orchestrator
By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either Ollama or vLLM.
By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either Ollama or vLLM.
Using Ollama (recommended for local development)
@ -3859,14 +3859,14 @@ Install Ollama
Download and install from ollama.ai.
Pull and serve Arch-Router
Pull and serve the routing model
ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
ollama serve
This downloads the quantized GGUF model from HuggingFace and starts serving on http://localhost:11434.
Configure Plano to use local Arch-Router
Configure Plano to use the local routing model
overrides:
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@ -3919,7 +3919,7 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
--load-format gguf \
--chat-template ${SNAPSHOT_DIR}template.jinja \
--tokenizer katanemo/Arch-Router-1.5B \
--served-model-name Arch-Router \
--served-model-name Plano-Orchestrator \
--gpu-memory-utilization 0.3 \
--tensor-parallel-size 1 \
--enable-prefix-caching
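
Once the server is up, it can be worth confirming that the name passed to --served-model-name is the one the endpoint reports, since the Plano config refers to the model by that name. The sketch below is an illustration under stated assumptions: it inspects an already-parsed /v1/models response in the OpenAI-compatible list shape, it assumes the part of llm_routing_model after the provider prefix is the served name, and the is_model_served helper and the sample payload are hypothetical.

```python
import json


def is_model_served(models_payload: dict, routing_model: str) -> bool:
    """Return True if routing_model (e.g. "plano/Plano-Orchestrator") matches a served id."""
    # Assumption: everything after the first "/" is the served model name.
    served_name = routing_model.split("/", 1)[-1]
    # Collect the ids from the "data" list; tolerate a missing or empty list.
    served_ids = {entry.get("id") for entry in models_payload.get("data", [])}
    return served_name in served_ids


# Hand-written sample payload for illustration, not captured server output.
sample = json.loads('{"object": "list", "data": [{"id": "Plano-Orchestrator"}]}')
print(is_model_served(sample, "plano/Plano-Orchestrator"))
print(is_model_served(sample, "plano/Arch-Router"))
```

In practice the payload would come from the /v1/models endpoint of the running server (for example via curl http://localhost:10000/v1/models) rather than from a literal string.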
@ -3927,10 +3927,10 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
Configure Plano to use the vLLM endpoint
overrides:
llm_routing_model: plano/Arch-Router
llm_routing_model: plano/Plano-Orchestrator
model_providers:
- model: plano/Arch-Router
- model: plano/Plano-Orchestrator
base_url: http://<your-server-ip>:10000
- model: openai/gpt-5.2
@ -3950,16 +3950,16 @@ curl http://localhost:10000/v1/models
Using vLLM on Kubernetes (GPU nodes)
For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:
vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
vllm-deployment.yaml — Plano-Orchestrator served by vLLM, with an init container to download
the model from HuggingFace
plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
plano-deployment.yaml — Plano proxy configured to use the in-cluster Plano-Orchestrator
config_k8s.yaml — Plano config with llm_routing_model pointing at
http://arch-router:10000 instead of the default hosted endpoint
http://plano-orchestrator:10000 instead of the default hosted endpoint
Key things to know before deploying:
@ -4092,7 +4092,7 @@ Let the router decide: No model specified, router analyzes content
Example Use Cases
Here are common scenarios where Arch-Router excels:
Here are common scenarios where Plano-Orchestrator excels:
Coding Tasks: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.
@ -4134,13 +4134,13 @@ Best practices
Unsupported Features
The following features are not supported by the Arch-Router model:
The following features are not supported by the Plano-Orchestrator routing model:
Multi-modality: The model is not trained to process raw image or audio inputs. It can handle textual queries about these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.
Function calling: Arch-Router is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
Function calling: Plano-Orchestrator is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
System prompt dependency: Arch-Router routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.
System prompt dependency: Plano-Orchestrator routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.
---
@ -6455,7 +6455,7 @@ model_providers:
# routing_preferences: tags a model with named capabilities so Plano's LLM router
# can select the best model for each request based on intent. Requires the
# Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
# Each preference has a name (short label) and a description (used for intent matching).
- model: groq/llama-3.3-70b-versatile
access_key: $GROQ_API_KEY
@ -6591,7 +6591,7 @@ overrides:
# Path to the trusted CA bundle for upstream TLS verification
upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
# Model used for intent-based LLM routing (must be listed in model_providers)
llm_routing_model: Arch-Router
llm_routing_model: Plano-Orchestrator
# Model used for agent orchestration (must be listed in model_providers)
agent_orchestration_model: Plano-Orchestrator