With the rapid proliferation of large language models (LLMs) — each optimized for different strengths, styles, or latency/cost profiles — routing has become an essential technique for operationalizing multiple models. Plano provides three distinct routing approaches to meet different use cases: :ref:`Model-based routing <model_based_routing>`, :ref:`Alias-based routing <alias_based_routing>`, and :ref:`Preference-aligned routing <preference_aligned_routing>`. This enables optimal performance, cost efficiency, and response quality by matching requests with the most suitable model from your available LLM fleet.
Preference-aligned routing uses the `Plano-Orchestrator <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B>`_ model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples **routing policy** (how to choose) from **model assignment** (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
Plano-Orchestrator is a **preference-based routing model** specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
**Preference-Encoded Routing**
Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
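As a sketch, such preferences might be declared per provider. The field names below (``llm_providers``, ``routing_preferences``) and the model identifiers are illustrative — verify them against the configuration reference for your Plano release:

.. code-block:: yaml

   # Illustrative only — field names and model IDs are assumptions.
   llm_providers:
     - model: openai/gpt-4o
       routing_preferences:
         - name: code generation
           description: generating new functions, scripts, or boilerplate
     - model: anthropic/claude-sonnet-4
       routing_preferences:
         - name: code understanding
           description: explaining, reviewing, or walking through existing code

At inference time, a prompt such as "why does this loop never terminate?" would match the *code understanding* preference and be routed to the second provider.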
By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either **Ollama** or **vLLM**.
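For example, a local deployment might look like the following. The exact flags and model tags are assumptions — ``vllm serve`` accepts the Hugging Face checkpoint name directly, while the Ollama tag depends on what is published in the Ollama registry; check the model card for the canonical names:

.. code-block:: shell

   # Option A: vLLM — serve the Hugging Face checkpoint
   # (adjust port and parallelism flags to your hardware)
   vllm serve katanemo/Plano-Orchestrator-30B-A3B --port 8000

   # Option B: Ollama — assumes an Ollama model tag exists;
   # see the model card for the exact name
   ollama run plano-orchestrator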
In agentic loops — where a single user request triggers multiple LLM calls through tool use — Plano's router classifies each turn independently. Because successive prompts differ in intent (tool selection looks like code generation, reasoning about results looks like analysis), the router may select different models mid-session. This causes behavioral inconsistency and invalidates provider-side KV caches, increasing both latency and cost.
**Model affinity** pins the routing decision for the duration of a session. Send an ``X-Model-Affinity`` header with any string identifier (typically a UUID). The first request routes normally and caches the result. All subsequent requests with the same affinity ID skip routing and reuse the cached model.
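A minimal client-side sketch, assuming Plano exposes an OpenAI-compatible HTTP endpoint (the URL, port, and ``"model": "auto"`` placeholder below are hypothetical):

.. code-block:: python

   import json
   import uuid

   def affinity_headers(session_id: str) -> dict:
       """Headers that pin Plano's routing decision to one session."""
       return {
           "Content-Type": "application/json",
           "X-Model-Affinity": session_id,
       }

   # Generate one affinity ID per user session; every turn of the
   # agentic loop reuses it so all calls hit the same pinned model.
   session_id = str(uuid.uuid4())
   headers = affinity_headers(session_id)

   payload = {
       "model": "auto",  # hypothetical placeholder; Plano selects the model
       "messages": [{"role": "user", "content": "Refactor this function"}],
   }
   # requests.post("http://localhost:12000/v1/chat/completions",
   #               headers=headers, data=json.dumps(payload))

Any stable string works as the affinity ID; a UUID simply guarantees uniqueness across concurrent sessions.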
By default, Plano stores session affinity state in an in-process LRU cache. This works well for single-instance deployments, but sessions are not shared across replicas — each instance has its own independent cache.
For deployments with multiple Plano replicas (Kubernetes, Docker Compose with ``scale``, or any load-balanced setup), use Redis as the session cache backend. All replicas connect to the same Redis instance, so an affinity decision made by one replica is honoured by every other replica in the pool.
**In-memory (default)**
No configuration required. Sessions live only for the lifetime of the process and are lost on restart.
.. code-block:: yaml

   routing:
     session_ttl_seconds: 600  # How long affinity lasts (default: 10 min)
Requires a reachable Redis instance. The ``url`` field supports standard Redis URI syntax, including authentication (``redis://:password@host:6379``) and TLS (``rediss://host:6380``). Redis handles TTL expiry natively, so no periodic cleanup is needed.
.. code-block:: yaml

   routing:
     session_ttl_seconds: 600
     session_cache:
       type: redis
       url: redis://localhost:6379
.. note::

   When using Redis in a multi-tenant environment, construct the ``X-Model-Affinity`` header value to include a tenant identifier, for example ``{tenant_id}:{session_id}``. Plano stores each key under the internal namespace ``plano:affinity:{key}``, so tenant-scoped values avoid cross-tenant collisions without any additional configuration.
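The key composition can be sketched as follows. The helper names are illustrative; only the ``plano:affinity:`` prefix comes from Plano's documented namespace:

.. code-block:: python

   def affinity_key(tenant_id: str, session_id: str) -> str:
       """Compose a tenant-scoped X-Model-Affinity header value."""
       return f"{tenant_id}:{session_id}"

   def redis_key(affinity_value: str) -> str:
       """Plano prefixes header values with its internal namespace."""
       return f"plano:affinity:{affinity_value}"

   value = affinity_key("acme", "abc-123")
   # redis_key(value) -> "plano:affinity:acme:abc-123"

Because the tenant ID is part of the key, two tenants that happen to reuse the same session ID still map to distinct Redis entries.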
**Example: Kubernetes multi-replica deployment**
Deploy a Redis instance alongside your Plano pods and point all replicas at it:
.. code-block:: yaml

   routing:
     session_ttl_seconds: 600
     session_cache:
       type: redis
       url: redis://redis.plano.svc.cluster.local:6379
With this configuration, any replica that first receives a request for affinity ID ``abc-123`` caches the routing decision in Redis. Subsequent requests for ``abc-123`` — regardless of which replica they land on — retrieve the same pinned model.
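The lookup each replica performs can be sketched as below. The function names are hypothetical and ``FakeRedis`` is an in-memory stand-in used here so the sketch is self-contained; a real deployment would use a ``redis.Redis`` client pointed at the shared instance:

.. code-block:: python

   import time

   class FakeRedis:
       """Minimal in-memory stand-in for redis.Redis (illustrative only)."""
       def __init__(self):
           self._data = {}
       def get(self, key):
           value, expires = self._data.get(key, (None, 0.0))
           return value if value is not None and time.time() < expires else None
       def set(self, key, value, ex):
           self._data[key] = (value, time.time() + ex)

   def resolve_model(r, affinity_id, route, ttl=600):
       """Return the pinned model for a session, routing only on a cache miss."""
       key = f"plano:affinity:{affinity_id}"
       cached = r.get(key)
       if cached is not None:
           return cached          # pinned: skip routing entirely
       model = route()            # first request: run preference routing
       r.set(key, model, ex=ttl)  # publish the decision to all replicas
       return model

   r = FakeRedis()
   resolve_model(r, "abc-123", lambda: "gpt-4o")       # routes and caches
   resolve_model(r, "abc-123", lambda: "other-model")  # reuses "gpt-4o"

Because the TTL is set on write (Redis ``SET`` with ``EX``), expiry is handled by Redis itself and no cleanup job is needed, matching the behavior described above.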
- **Coding Tasks**: Distinguish between code generation requests ("write a Python function"), debugging needs ("fix this error"), and code optimization ("make this faster"), routing each to appropriately specialized models.
- **Content Processing Workflows**: Classify requests as summarization ("summarize this document"), translation ("translate to Spanish"), or analysis ("what are the key themes"), enabling targeted model selection.
- **Multi-Domain Applications**: Accurately identify whether requests fall into legal, healthcare, technical, or general domains, even when the subject matter isn't explicitly stated in the prompt.
- **Conversational Routing**: Track conversation context to identify when topics shift between domains or when the type of assistance needed changes mid-conversation.
- **💡 Clear Usage Description:** Make your route names and descriptions specific and unambiguous, and minimize overlap between routes. The router performs better when it can clearly distinguish between different types of requests.

  - ❌ Bad:

    .. code-block:: json

       {"name": "math", "description": "anything closely related to mathematics"}

  - ✅ Good:

    .. code-block:: json

       {"name": "math", "description": "solving, explaining math problems, concepts"}
- **💡 Noun Descriptors:** Preference-based routers perform better with noun-centric descriptors, which offer more stable and semantically rich signals for matching.
- **💡 Domain Inclusion:** For the best user experience, always include a domain route. This lets the router fall back to the domain when the action cannot be confidently inferred.
- **Multi-modality**: The model is not trained to process raw image or audio inputs. It can handle textual queries *about* these modalities (e.g., "generate an image of a cat"), but cannot interpret encoded multimedia data directly.
- **Function calling**: Plano-Orchestrator is designed for **semantic preference matching**, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
- **System prompt dependency**: Plano-Orchestrator routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.