deploy: 90b926c2ce

2026-06-23 15:38:07 +02:00 · 2026-04-15 23:42:15 +00:00 · 2026-04-15 23:42:15 +00:00 · 07b84a0d42
commit 07b84a0d42
parent 0dd2552f91
35 changed files with 105 additions and 105 deletions
--- a/includes/llms.txt
+++ b/includes/llms.txt
@@ -1,6 +1,6 @@
 Plano Docs v0.4.18
 llms.txt (auto-generated)
-Generated (UTC): 2026-04-14T02:31:14.825020+00:00
+Generated (UTC): 2026-04-15T23:42:11.682797+00:00

 Table of contents
 - Agents (concepts/agents)
@@ -3760,9 +3760,9 @@ response = client.chat.completions.create(



-Preference-aligned routing (Arch-Router)
+Preference-aligned routing (Plano-Orchestrator)

-Preference-aligned routing uses the Arch-Router model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
+Preference-aligned routing uses the Plano-Orchestrator model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.

 Domain: High-level topic of the request (e.g., legal, healthcare, programming).

@@ -3770,7 +3770,7 @@ Action: What the user wants to do (e.g., summarize, generate code, translate).

 Routing preferences: Your mapping from (domain, action) to preferred models.

-Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
+Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.

 Configuration

@@ -3810,20 +3810,20 @@ Client usage

 Clients can let the router decide or still specify aliases:

-# Let Arch-Router choose based on content
+# Let Plano-Orchestrator choose based on content
 response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
    # No model specified - router will analyze and choose claude-sonnet-4-5
 )

-Arch-Router
+Plano-Orchestrator

-The Arch-Router is a state-of-the-art preference-based routing model specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
+Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.

 Addressing Traditional Routing Limitations:

 Human Preference Alignment
-Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
+Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.

 Flexible Model Integration
 The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
@@ -3831,15 +3831,15 @@ The system supports seamlessly adding new models for routing without requiring r
 Preference-Encoded Routing
 Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.

-To support effective routing, Arch-Router introduces two key concepts:
+To support effective routing, Plano-Orchestrator introduces two key concepts:

 Domain – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).

 Action – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).

-Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
+Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.

-In summary, Arch-Router demonstrates:
+In summary, Plano-Orchestrator demonstrates:

 Structured Preference Routing: Aligns prompt request with model strengths using explicit domain–action mappings.

@@ -3849,9 +3849,9 @@ Flexible and Adaptive: Supports evolving user needs, model updates, and new doma

 Production-Ready Performance: Optimized for low-latency, high-throughput applications in multi-model environments.

-Self-hosting Arch-Router
+Self-hosting Plano-Orchestrator

-By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either Ollama or vLLM.
+By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either Ollama or vLLM.

 Using Ollama (recommended for local development)

@@ -3859,14 +3859,14 @@ Install Ollama

 Download and install from ollama.ai.

-Pull and serve Arch-Router
+Pull and serve the routing model

 ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
 ollama serve

 This downloads the quantized GGUF model from HuggingFace and starts serving on http://localhost:11434.

-Configure Plano to use local Arch-Router
+Configure Plano to use local routing model

 overrides:
  llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@@ -3919,7 +3919,7 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
    --load-format gguf \
    --chat-template ${SNAPSHOT_DIR}template.jinja \
    --tokenizer katanemo/Arch-Router-1.5B \
-    --served-model-name Arch-Router \
+    --served-model-name Plano-Orchestrator \
    --gpu-memory-utilization 0.3 \
    --tensor-parallel-size 1 \
    --enable-prefix-caching
@@ -3927,10 +3927,10 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
 Configure Plano to use the vLLM endpoint

 overrides:
-  llm_routing_model: plano/Arch-Router
+  llm_routing_model: plano/Plano-Orchestrator

 model_providers:
-  - model: plano/Arch-Router
+  - model: plano/Plano-Orchestrator
    base_url: http://<your-server-ip>:10000

  - model: openai/gpt-5.2
@@ -3950,16 +3950,16 @@ curl http://localhost:10000/v1/models

 Using vLLM on Kubernetes (GPU nodes)

-For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
+For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
 The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:

-vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
+vllm-deployment.yaml — Plano-Orchestrator served by vLLM, with an init container to download
 the model from HuggingFace

-plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
+plano-deployment.yaml — Plano proxy configured to use the in-cluster Plano-Orchestrator

 config_k8s.yaml — Plano config with llm_routing_model pointing at
-http://arch-router:10000 instead of the default hosted endpoint
+http://plano-orchestrator:10000 instead of the default hosted endpoint

 Key things to know before deploying:

@@ -4092,7 +4092,7 @@ Let the router decide: No model specified, router analyzes content

 Example Use Cases

-Here are common scenarios where Arch-Router excels:
+Here are common scenarios where Plano-Orchestrator excels:

 Coding Tasks: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.

@@ -4134,13 +4134,13 @@ Best practices

 Unsupported Features

-The following features are not supported by the Arch-Router model:
+The following features are not supported by the Plano-Orchestrator routing model:

 Multi-modality: The model is not trained to process raw image or audio inputs. It can handle textual queries about these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.

-Function calling: Arch-Router is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
+Function calling: Plano-Orchestrator is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.

-System prompt dependency: Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
+System prompt dependency: Plano-Orchestrator routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.

 ---

@@ -6455,7 +6455,7 @@ model_providers:

  # routing_preferences: tags a model with named capabilities so Plano's LLM router
  # can select the best model for each request based on intent. Requires the
-  # Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
+  # Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
  # Each preference has a name (short label) and a description (used for intent matching).
  - model: groq/llama-3.3-70b-versatile
    access_key: $GROQ_API_KEY
@@ -6591,7 +6591,7 @@ overrides:
  # Path to the trusted CA bundle for upstream TLS verification
  upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
  # Model used for intent-based LLM routing (must be listed in model_providers)
-  llm_routing_model: Arch-Router
+  llm_routing_model: Plano-Orchestrator
  # Model used for agent orchestration (must be listed in model_providers)
  agent_orchestration_model: Plano-Orchestrator