several improvements to docs. TODOS: Tracing and Filters

2026-06-17 15:25:17 +02:00 · 2025-12-21 22:10:32 -08:00 · 2025-12-21 22:10:32 -08:00 · e0404d305c
commit e0404d305c
parent 1d6a1613a2
9 changed files with 297 additions and 301 deletions
--- a/README.md
+++ b/README.md
@ -3,8 +3,8 @@
 </div>
 <div align="center">

- _Plano is a models-native proxy and data plane for agents._<br><br>
- Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn’t be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for moderation, and smart LLM routing APIs for UX and DX agility. Use any language or AI framework, and deliver agents faster to production.
+ _Plano is a models-native proxy server and data plane for agents._<br><br>
+ Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn’t be bespoke in every codebase: agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for UX and DX agility. Use any language or AI framework, and deliver agents faster to production.


 [Quickstart](#Quickstart) •
@ -26,7 +26,7 @@ Building agentic demos is easy. Shipping agentic applications safely, reliably,

 Plano solves this by moving core delivery concerns into a unified, out-of-process dataplane.

- **🚦 Orchestration:** Low-latency orchestration between agents, and add new agents without changing app code
+- **🚦 Orchestration:** Low-latency orchestration between agents; add new agents without changing application code
 - **🔗 Model Agility:** Route [by model name, alias (semantic names) or automatically via preferences](#use-plano-as-a-llm-router)
 - **🕵 Agentic Signals&trade;:** Zero-code capture of [behavior signals](#observability) plus OTEL traces/metrics across every agent.
 - **🛡️ Moderation & Memory Hooks:** Build jailbreak protection, add moderation policies and memory consistently via [Filter Chains](https://docs.planoai.dev/concepts/filter_chain.html).
@ -75,8 +75,8 @@ $ pip install plano==0.4.0
 ### Use Plano as a LLM Router
 Plano supports multiple powerful routing strategies for LLMs. [Model-based routing](https://docs.arch.com/guides/llm_router.html#model-based-routing) gives you direct control over specific models and supports 11+ LLM providers including OpenAI, Anthropic, DeepSeek, Mistral, Groq, and more. [Alias-based routing](https://docs.arch.com/guides/llm_router.html#alias-based-routing) lets you create semantic model names that decouple your application code from specific providers, making it easy to experiment with different models or handle provider changes without refactoring. For full configuration examples and code walkthroughs, see our [routing guides](https://docs.arch.com/guides/llm_router.html).

-#### Preference-aligned Routing
-Preference-aligned routing provides intelligent, dynamic model selection based on natural language descriptions of tasks and preferences. Instead of hardcoded routing logic, you describe what each model is good at using plain English.
+#### Policy-based Routing
+Policy-based routing provides deterministic constructs for automatic routing: intelligent, dynamic model selection based on natural-language descriptions of tasks and preferences. Instead of hardcoding routing logic, you describe in plain English what each model is good at.

 ```yaml
 version: v0.1.0
@ -106,8 +106,6 @@ llm_providers:
        description: analyzing existing code for bugs, improvements, and optimization
 ```

-
-
 Plano uses a lightweight 1.5B autoregressive model to intelligently map user prompts to these preferences, automatically selecting the best model for each request. This approach adapts to intent drift, supports multi-turn conversations, and avoids brittle embedding-based classifiers or manual if/else chains. No retraining required when adding models or updating policies — routing is governed entirely by human-readable rules.

 **Learn More**: Check our [documentation](https://docs.plano.com/concepts/llm_providers/llm_providers.html) for comprehensive provider setup guides and routing strategies. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:
@ -120,11 +118,78 @@ Plano uses a lightweight 1.5B autoregressive model to intelligently map user pro

 ### Build Agentic Apps with Plano

-In following quickstart we will show you how easy it is to build AI agent with Plano gateway. We will build a currency exchange agent using following simple steps. For this demo we will use `https://api.frankfurter.dev/` to fetch latest price for currencies and assume USD as base currency.
+Plano helps you build agentic applications in two complementary ways:

-#### Step 1. Create plano config file
+- **Orchestrate agents**: Let Plano decide which agent or LLM should handle each request and in what sequence.
+- **Call deterministic backends**: Use prompt targets to turn natural-language prompts into structured, validated API calls.

-Create `plano_config.yaml` file with following content,
+You focus on product logic (agents and APIs) while Plano handles routing, parameter extraction, and wiring. The full examples used here are available in the [`plano-quickstart` repository](https://github.com/plano-ai/plano-quickstart).
+
+#### Build agents with Plano orchestration
+
+Agents are where your business logic lives (the "inner loop"). Plano takes care of the "outer loop"—routing, sequencing, and managing calls across agents and LLMs. In this quick example, we show a simplified **Travel Assistant** that routes between a `flight_agent` and a `hotel_agent`.
+
+##### Step 1. Minimal orchestration config
+
+Create a `plano_config.yaml` that wires Plano-Orchestrator to your agents:
+
+```yaml
+version: v0.1.0
+
+agents:
+  - id: flight_agent
+    url: http://host.docker.internal:10520  # your flights service
+  - id: hotel_agent
+    url: http://host.docker.internal:10530  # your hotels service
+
+model_providers:
+  - model: openai/gpt-4o
+    access_key: $OPENAI_API_KEY
+
+listeners:
+  - type: agent
+    name: travel_assistant
+    port: 8001
+    router: plano_orchestrator_v1
+    agents:
+      - id: flight_agent
+        description: Search for flights and provide flight status.
+      - id: hotel_agent
+        description: Find hotels and check availability.
+
+tracing:
+  random_sampling: 100
+```
+
+##### Step 2. Start your agents and Plano
+
+Run your `flight_agent` and `hotel_agent` services (see the [Orchestration guide](https://docs.planoai.dev/guides/orchestration.html) for a full Travel Booking example), then start Plano with the config above:
+
+```console
+$ plano up plano_config.yaml
+```
+
+Plano will start the orchestrator and expose an agent listener on port `8001`.
+
+##### Step 3. Send a prompt and let Plano route
+
+Send a request to Plano using the OpenAI-compatible chat completions API—the orchestrator will analyze the prompt and route it to the right agent based on intent:
+
+```bash
+$ curl --header 'Content-Type: application/json' \
+  --data '{"messages": [{"role": "user","content": "Find me flights from SFO to JFK tomorrow"}], "model": "openai/gpt-4o"}' \
+  http://localhost:8001/v1/chat/completions
+```
+
+You can then ask a follow-up like "Also book me a hotel near JFK" and Plano-Orchestrator will route to `hotel_agent`—your agents stay focused on business logic while Plano handles routing.
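For readers calling Plano from code rather than curl, here is a minimal Python sketch of the same request using only the standard library. It assumes the agent listener replies in the standard OpenAI chat completions shape (the jq filter used later in this README relies on the same `.choices[0].message.content` path), and that a follow-up is sent as the full message history, as is usual for stateless chat completions APIs. `build_chat_request`, `extract_content`, and `send` are hypothetical helper names, not part of Plano.

```python
import json
import urllib.request

# Agent listener exposed by the travel_assistant config (port 8001).
PLANO_AGENT_URL = "http://localhost:8001/v1/chat/completions"


def build_chat_request(messages: list, model: str = "openai/gpt-4o") -> dict:
    """Build the same JSON body the curl example sends."""
    return {"messages": messages, "model": model}


def extract_content(response: dict) -> str:
    """Python equivalent of jq '.choices[0].message.content' (assumes OpenAI shape)."""
    return response["choices"][0]["message"]["content"]


def send(messages: list, url: str = PLANO_AGENT_URL) -> str:
    """POST the conversation to Plano and return the assistant's reply text."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_content(json.load(response))
```

With Plano and both agents running, `send([{"role": "user", "content": "Find me flights from SFO to JFK tomorrow"}])` returns the routed agent's reply; append that reply plus the hotel question to the list and call `send` again for the follow-up.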
+
+#### Deterministic API calls with prompt targets
+
+Next, we'll show Plano's deterministic API calling using a single prompt target. We'll build a currency exchange backend powered by `https://api.frankfurter.dev/`, assuming USD as the base currency.
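To make the backend side concrete, here is a rough Python sketch of the kind of Frankfurter request this prompt target ultimately depends on. The `/v1/latest` path, its `base`/`symbols` query parameters, and the `rates` mapping in the response are assumptions about Frankfurter's API rather than details taken from the Plano config below; check the Frankfurter docs before relying on them. Plano itself reaches the service through the endpoint you configure, so this code only illustrates the request shape.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Frankfurter's versioned endpoint layout (/v1/latest).
FRANKFURTER_BASE = "https://api.frankfurter.dev/v1"
BASE_CURRENCY = "USD"  # the quickstart assumes USD as the base currency


def latest_rate_url(symbol: str, base: str = BASE_CURRENCY) -> str:
    """URL for the latest base->symbol rate, e.g. USD->GBP."""
    query = urllib.parse.urlencode({"base": base, "symbols": symbol.upper()})
    return f"{FRANKFURTER_BASE}/latest?{query}"


def parse_rate(payload: dict, symbol: str) -> float:
    """Pull one rate out of a response shaped like {"rates": {"GBP": ...}}."""
    return float(payload["rates"][symbol.upper()])


def fetch_rate(symbol: str) -> float:
    """Fetch and parse the live rate (requires network access)."""
    with urllib.request.urlopen(latest_rate_url(symbol)) as response:
        return parse_rate(json.load(response), symbol)
```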
+
+##### Step 1. Create plano config file
+
+Create `plano_config.yaml` with the following content:

 ```yaml
 version: v0.1.0
@ -170,10 +235,9 @@ endpoints:
    protocol: https
 ```

-#### Step 2. Start plano gateway with currency conversion config
+##### Step 2. Start Plano with currency conversion config

 ```sh
-
 $ plano up plano_config.yaml
 2024-12-05 16:56:27,979 - cli.main - INFO - Starting plano cli version: 0.4.0
 2024-12-05 16:56:28,485 - cli.utils - INFO - Schema validation successful!
@ -181,13 +245,13 @@ $ plano up plano_config.yaml
 2024-12-05 16:56:51,647 - cli.core - INFO - Container is healthy!
 ```

-Once the gateway is up you can start interacting with at port 10000 using openai chat completion API.
+Once the gateway is up, you can start interacting with it on port `10000` using the OpenAI chat completions API.

-Some of the sample queries you can ask could be `what is currency rate for gbp?` or `show me list of currencies for conversion`.
+Some sample queries you can ask include: `what is currency rate for gbp?` or `show me list of currencies for conversion`.

-#### Step 3. Interacting with gateway using curl command
+##### Step 3. Interact with the gateway using curl

-Here is a sample curl command you can use to interact,
+Here is a sample curl command you can use to interact:

 ```bash
 $ curl --header 'Content-Type: application/json' \
@ -195,10 +259,9 @@ $ curl --header 'Content-Type: application/json' \
  http://localhost:10000/v1/chat/completions | jq ".choices[0].message.content"

 "As of the date provided in your context, December 5, 2024, the exchange rate for GBP (British Pound) from USD (United States Dollar) is 0.78558. This means that 1 USD is equivalent to 0.78558 GBP."
-
 ```

-And to get list of supported currencies,
+And to get the list of supported currencies:

 ```bash
 $ curl --header 'Content-Type: application/json' \
@ -206,7 +269,6 @@ $ curl --header 'Content-Type: application/json' \
  http://localhost:10000/v1/chat/completions | jq ".choices[0].message.content"

 "Here is a list of the currencies that are supported for conversion from USD, along with their symbols:\n\n1. AUD - Australian Dollar\n2. BGN - Bulgarian Lev\n3. BRL - Brazilian Real\n4. CAD - Canadian Dollar\n5. CHF - Swiss Franc\n6. CNY - Chinese Renminbi Yuan\n7. CZK - Czech Koruna\n8. DKK - Danish Krone\n9. EUR - Euro\n10. GBP - British Pound\n11. HKD - Hong Kong Dollar\n12. HUF - Hungarian Forint\n13. IDR - Indonesian Rupiah\n14. ILS - Israeli New Sheqel\n15. INR - Indian Rupee\n16. ISK - Icelandic Króna\n17. JPY - Japanese Yen\n18. KRW - South Korean Won\n19. MXN - Mexican Peso\n20. MYR - Malaysian Ringgit\n21. NOK - Norwegian Krone\n22. NZD - New Zealand Dollar\n23. PHP - Philippine Peso\n24. PLN - Polish Złoty\n25. RON - Romanian Leu\n26. SEK - Swedish Krona\n27. SGD - Singapore Dollar\n28. THB - Thai Baht\n29. TRY - Turkish Lira\n30. USD - United States Dollar\n31. ZAR - South African Rand\n\nIf you want to convert USD to any of these currencies, you can select the one you are interested in."
-
 ```

 ## [Observability](https://docs.plano.com/guides/observability/observability.html)
--- a/docs/source/concepts/agents.rst
+++ b/docs/source/concepts/agents.rst
@ -48,7 +48,7 @@ Your agent controls:
   * **Rich agentic signals**: Automatic capture of function calls, tool usage, reasoning steps, and model behavior—surfaced through traces and metrics without instrumenting your agent code.
   * **Smart model routing**: Leverage :ref:`model-based, alias-based, or preference-aligned routing <llm_providers>` to dynamically select the best model for each task based on cost, performance, or custom policies.

-   By routing LLM calls through the Model Proxy, your agents remain decoupled from specific providers and can benefit from centralized policy enforcement, observability, and intelligent routing—all managed in the outer loop. For a step-by-step guide, see :ref:`implementing_routing` in the LLM Router guide.
+   By routing LLM calls through the Model Proxy, your agents remain decoupled from specific providers and can benefit from centralized policy enforcement, observability, and intelligent routing—all managed in the outer loop. For a step-by-step guide, see :ref:`llm_router` in the LLM Router guide.

 Outer Loop (Orchestration)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
--- a/docs/source/concepts/llm_providers/llm_providers.rst
+++ b/docs/source/concepts/llm_providers/llm_providers.rst
@ -2,16 +2,16 @@

 Model (LLM) Providers
 =====================
-**LLM Providers** are a top-level primitive in Plano, helping developers centrally define, secure, observe,
-and manage the usage of their LLMs. Plano builds on Envoy's reliable `cluster subsystem <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/upstream/cluster_manager>`_
-to manage egress traffic to LLMs, which includes intelligent routing, retry and fail-over mechanisms,
-ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly
-switch between LLM providers or upgrade LLM versions, simplifying the integration and scaling of LLMs
-across applications.
+**Model Providers** are a top-level primitive in Plano, helping developers centrally define, secure, observe,
+and manage the usage of their models. Plano builds on Envoy's reliable `cluster subsystem <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/upstream/cluster_manager>`_ to manage egress traffic to models, which includes intelligent routing, retry and fail-over mechanisms,
+ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly switch between model providers or upgrade model versions, simplifying the integration and scaling of models across applications.

-Today, we are enabling you to connect to 11+ different AI providers through a unified interface with advanced routing and management capabilities.
+Today, Plano lets you connect to 15+ different AI providers through a unified interface with advanced routing and management capabilities.
 Whether you're using OpenAI, Anthropic, Azure OpenAI, local Ollama models, or any OpenAI-compatible provider, Plano provides seamless integration with enterprise-grade features.

+.. note::
+    Please refer to the quickstart guide :ref:`here <llm_routing_quickstart>` to configure and use LLM providers via common client libraries, such as the OpenAI and Anthropic Python SDKs, or via direct HTTP/cURL requests.
+
 Core Capabilities
 -----------------

--- a/docs/source/concepts/llm_providers/supported_providers.rst
+++ b/docs/source/concepts/llm_providers/supported_providers.rst
@ -8,6 +8,9 @@ Plano provides first-class support for multiple LLM providers through native int
 .. note::
   **Model Support:** Plano supports all chat models from each provider, not just the examples shown in this guide. The configurations below demonstrate common models for reference, but you can use any chat model available from your chosen provider.

+   Please refer to the quickstart guide :ref:`here <llm_routing_quickstart>` to configure and use LLM providers via common client libraries, such as the OpenAI and Anthropic Python SDKs, or via direct HTTP/cURL requests.
+
+
 Configuration Structure
 -----------------------

--- a/docs/source/get_started/overview.rst
+++ b/docs/source/get_started/overview.rst
@ -2,7 +2,7 @@

 Overview
 ========
-`Plano <https://github.com/katanemo/plano>`_ is delivery infrastructure for agentic apps —a models-native proxy and dataplane that helps you ship agents to production faster and operate them reliably.
+`Plano <https://github.com/katanemo/plano>`_ is delivery infrastructure for agentic apps: a models-native proxy server and data plane designed to help you build agents faster and deliver them reliably to production.

 Plano pulls out the rote plumbing work (the “hidden AI middleware”) and decouples you from brittle, ever‑changing framework abstractions. It centralizes what shouldn’t be bespoke in every codebase like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for UX and DX agility. Use any language or AI framework, and ship agents to production faster with Plano.

--- a/docs/source/get_started/quickstart.rst
+++ b/docs/source/get_started/quickstart.rst
@ -3,8 +3,16 @@
 Quickstart
 ==========

-Follow this guide to learn how to quickly set up Plano and integrate it into your generative AI applications.
+Follow this guide to learn how to quickly set up Plano and integrate it into your generative AI applications. You can:

+- :ref:`Build agents <quickstart_agents>` for multi-step workflows (e.g., travel assistants with flights and hotels).
+- :ref:`Call deterministic APIs via prompt targets <quickstart_prompt_targets>` to turn instructions directly into function calls.
+- :ref:`Use Plano as a model proxy (Gateway) <llm_routing_quickstart>` to standardize access to multiple LLM providers.
+
+.. note::
+  This quickstart assumes basic familiarity with agents and prompt targets from the Concepts section. For background, see :ref:`Agents <agents>` and :ref:`Prompt Target <prompt_target>`.
+
+  The full agent and backend API implementations used here are available in the `plano-quickstart repository <https://github.com/plano-ai/plano-quickstart>`_. This guide focuses on wiring and configuring Plano (orchestration, prompt targets, and the model proxy), not application code.

 Prerequisites
 -------------
@ -30,10 +38,92 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
 Build Agentic Apps with Plano
 -----------------------------

-In the following quickstart, we will show you how easy it is to build an AI agent with the Plano. We will build a currency exchange agent using the following simple steps. For this demo, we will use `https://api.frankfurter.dev/` to fetch the latest prices for currencies and assume USD as the base currency.
+Plano helps you build agentic applications in two complementary ways:
+
+* **Orchestrate agents**: Let Plano decide which agent or LLM should handle each request and in what sequence.
+* **Call deterministic backends**: Use prompt targets to turn natural-language prompts into structured, validated API calls.
+
+.. _quickstart_agents:
+
+Building agents with Plano orchestration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Agents are where your business logic lives (the "inner loop"). Plano takes care of the "outer loop"—routing, sequencing, and managing calls across agents and LLMs.
+
+At a high level, building agents with Plano looks like this:
+
+1. **Implement your agent** in your framework of choice (Python, JS/TS, etc.), exposing it as an HTTP service.
+2. **Route LLM calls through Plano's Model Proxy**, so all models share a consistent interface and observability.
+3. **Configure Plano to orchestrate**: define which agent(s) can handle which kinds of prompts, and let Plano decide when to call an agent vs. an LLM.
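As a starting point for step 1, here is a hypothetical Python sketch of a ``flight_agent`` exposed as an HTTP service using only the standard library. It assumes Plano forwards OpenAI-style chat completion bodies to the agent's ``url`` and expects an OpenAI-style response back; the real agent contract is described in the Orchestration guide, so treat the request and response shapes here as placeholders. The canned reply stands in for real business logic, which would call its LLM through Plano's Model Proxy (step 2).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical agent: assumes OpenAI-style chat completion bodies in and out.


def handle_chat(body: dict) -> dict:
    """Inner-loop business logic: answer the latest user message."""
    user_turns = [m.get("content", "") for m in body.get("messages", []) if m.get("role") == "user"]
    last = user_turns[-1] if user_turns else ""
    reply = f"flight_agent received: {last}"  # placeholder for a real flight search
    return {
        "object": "chat.completion",
        "model": body.get("model", ""),
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": reply}, "finish_reason": "stop"}
        ],
    }


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body (treat an empty body as {}) and answer with JSON.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(handle_chat(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 10520) -> None:
    """Block forever, serving the agent on all interfaces."""
    HTTPServer(("0.0.0.0", port), AgentHandler).serve_forever()
```

Calling ``serve()`` listens on port 10520, which matches the ``flight_agent`` URL in the config below when Plano runs in Docker and reaches the host via ``host.docker.internal``.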
+
+This quickstart uses a simplified version of the Travel Booking Assistant; for the full multi-agent walkthrough, see :ref:`Orchestration <agent_routing>`.
+
+Step 1. Minimal orchestration config
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Here is a minimal configuration that wires Plano-Orchestrator to two HTTP services: one for flights and one for hotels.
+
+.. code-block:: yaml
+
+  version: v0.1.0
+
+  agents:
+    - id: flight_agent
+      url: http://host.docker.internal:10520  # your flights service
+    - id: hotel_agent
+      url: http://host.docker.internal:10530  # your hotels service
+
+  model_providers:
+    - model: openai/gpt-4o
+      access_key: $OPENAI_API_KEY
+
+  listeners:
+    - type: agent
+      name: travel_assistant
+      port: 8001
+      router: plano_orchestrator_v1
+      agents:
+        - id: flight_agent
+          description: Search for flights and provide flight status.
+        - id: hotel_agent
+          description: Find hotels and check availability.
+
+  tracing:
+    random_sampling: 100
+
+Step 2. Start your agents and Plano
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Run your ``flight_agent`` and ``hotel_agent`` services (see :ref:`Orchestration <agent_routing>` for a full Travel Booking example), then start Plano with the config above:
+
+.. code-block:: console
+
+  $ plano up plano_config.yaml
+
+Plano will start the orchestrator and expose an agent listener on port ``8001``.
+
+Step 3. Send a prompt and let Plano route
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Now send a request to Plano using the OpenAI-compatible chat completions API—the orchestrator will analyze the prompt and route it to the right agent based on intent:
+
+.. code-block:: bash
+
+  $ curl --header 'Content-Type: application/json' \
+    --data '{"messages": [{"role": "user","content": "Find me flights from SFO to JFK tomorrow"}], "model": "openai/gpt-4o"}' \
+    http://localhost:8001/v1/chat/completions
+
+You can then ask a follow-up like "Also book me a hotel near JFK" and Plano-Orchestrator will route to ``hotel_agent``—your agents stay focused on business logic while Plano handles routing.
+
+.. _quickstart_prompt_targets:
+
+Deterministic API calls with prompt targets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Next, we'll show Plano's deterministic API calling using a single prompt target. We'll build a currency exchange backend powered by `https://api.frankfurter.dev/`, assuming USD as the base currency.

 Step 1. Create plano config file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Create ``plano_config.yaml`` file with the following content:

@ -82,7 +172,7 @@ Create ``plano_config.yaml`` file with the following content:
       protocol: https

 Step 2. Start plano with currency conversion config
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. code-block:: sh

@ -99,7 +189,7 @@ Once the gateway is up, you can start interacting with it at port 10000 using th
 Some sample queries you can ask include: ``what is currency rate for gbp?`` or ``show me list of currencies for conversion``.

 Step 3. Interacting with gateway using curl command
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Here is a sample curl command you can use to interact:

@ -122,8 +212,10 @@ And to get the list of supported currencies:
   "Here is a list of the currencies that are supported for conversion from USD, along with their symbols:\n\n1. AUD - Australian Dollar\n2. BGN - Bulgarian Lev\n3. BRL - Brazilian Real\n4. CAD - Canadian Dollar\n5. CHF - Swiss Franc\n6. CNY - Chinese Renminbi Yuan\n7. CZK - Czech Koruna\n8. DKK - Danish Krone\n9. EUR - Euro\n10. GBP - British Pound\n11. HKD - Hong Kong Dollar\n12. HUF - Hungarian Forint\n13. IDR - Indonesian Rupiah\n14. ILS - Israeli New Sheqel\n15. INR - Indian Rupee\n16. ISK - Icelandic Króna\n17. JPY - Japanese Yen\n18. KRW - South Korean Won\n19. MXN - Mexican Peso\n20. MYR - Malaysian Ringgit\n21. NOK - Norwegian Krone\n22. NZD - New Zealand Dollar\n23. PHP - Philippine Peso\n24. PLN - Polish Złoty\n25. RON - Romanian Leu\n26. SEK - Swedish Krona\n27. SGD - Singapore Dollar\n28. THB - Thai Baht\n29. TRY - Turkish Lira\n30. USD - United States Dollar\n31. ZAR - South African Rand\n\nIf you want to convert USD to any of these currencies, you can select the one you are interested in."


-Use Plano as a LLM Router
-------------------------
+.. _llm_routing_quickstart:
+
+Use Plano as a Model Proxy (Gateway)
+------------------------------------

 Step 1. Create plano config file
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -156,14 +248,14 @@ Step 2. Start plano

 Once the config file is created, ensure that you have environment variables set up for ``MISTRAL_API_KEY`` and ``OPENAI_API_KEY`` (or these are defined in a ``.env`` file).
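Before running ``plano up``, a quick preflight check can confirm that both keys are available. This is a hypothetical Python helper, not part of the Plano CLI: its ``.env`` reader handles only simple ``KEY=VALUE`` lines and may differ from how Plano actually loads ``.env`` files.

```python
import os
from pathlib import Path

# Keys this quickstart asks you to set.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "OPENAI_API_KEY")


def read_dotenv(path: str = ".env") -> dict:
    """Minimal KEY=VALUE reader: skips blank lines and # comments, strips quotes."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: dict, dotenv: dict) -> list:
    """Return required keys that are set neither in env nor in .env."""
    return [key for key in REQUIRED_KEYS if not env.get(key) and not dotenv.get(key)]


def check(path: str = ".env") -> list:
    """Check the current process environment plus an optional .env file."""
    return missing_keys(dict(os.environ), read_dotenv(path))
```

``check()`` returns the names of any keys missing from both the environment and ``.env``; an empty list means you are ready to start Plano.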

-Start the Plano gateway:
+Start Plano:

 .. code-block:: console

   $ plano up plano_config.yaml
   2024-12-05 11:24:51,288 - cli.main - INFO - Starting plano cli version: 0.1.5
   2024-12-05 11:24:51,825 - cli.utils - INFO - Schema validation successful!
-   2024-12-05 11:24:51,825 - cli.main - INFO - Starting plano model server and plano gateway
+   2024-12-05 11:24:51,825 - cli.main - INFO - Starting plano
   ...
   2024-12-05 11:25:16,131 - cli.core - INFO - Container is healthy!

@ -171,7 +263,7 @@ Step 3: Interact with LLM
 ~~~~~~~~~~~~~~~~~~~~~~~~~

 Step 3.1: Using OpenAI Python client
-++++++++++++++++++++++++++++++++++++
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Make outbound calls via the Plano gateway:

@ -196,7 +288,7 @@ Make outbound calls via the Plano gateway:
   print("OpenAI Response:", response.choices[0].message.content)

 Step 3.2: Using curl command
-++++++++++++++++++++++++++++
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. code-block:: bash

@ -218,31 +310,6 @@ Step 3.2: Using curl command
     ],
   }

-You can override model selection using the ``x-plano-llm-provider-hint`` header. For example, to use Mistral, use the following curl command:
-
-.. code-block:: bash
-
-   $ curl --header 'Content-Type: application/json' \
-     --header 'x-plano-llm-provider-hint: ministral-3b' \
-     --data '{"messages": [{"role": "user","content": "What is the capital of France?"}], "model": "none"}' \
-     http://localhost:12000/v1/chat/completions
-
-   {
-     ...
-     "model": "ministral-3b-latest",
-     "choices": [
-       {
-         "messages": {
-           "role": "assistant",
-           "content": "The capital of France is Paris. It is the most populous city in France and is known for its iconic landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also a major global center for art, fashion, gastronomy, and culture.",
-         },
-         ...
-       }
-     ],
-     ...
-   }
-
-
 Next Steps
 ==========

--- a/docs/source/guides/llm_router.rst
+++ b/docs/source/guides/llm_router.rst
@ -3,21 +3,17 @@
 LLM Routing
 ==============================================================

-With the rapid proliferation of large language models (LLM) — each optimized for different strengths, style, or latency/cost profile — routing has become an essential technique to operationalize the use of different models.
-
-Plano provides three distinct routing approaches to meet different use cases:
-
-1. **Model-based Routing**: Direct routing to specific models using provider/model names
-2. **Alias-based Routing**: Semantic routing using custom aliases that map to underlying models
-3. **Preference-aligned Routing**: Intelligent routing using the Arch-Router model based on context and user-defined preferences
-
-This enables optimal performance, cost efficiency, and response quality by matching requests with the most suitable model from your available LLM fleet.
+With the rapid proliferation of large language models (LLMs), each optimized for different strengths, styles, or latency/cost profiles, routing has become an essential technique for operationalizing the use of different models. Plano provides three distinct routing approaches to meet different use cases: :ref:`Model-based routing <model_based_routing>`, :ref:`Alias-based routing <alias_based_routing>`, and :ref:`Preference-aligned routing <preference_aligned_routing>`. Matching each request with the most suitable model from your available LLM fleet improves performance, cost efficiency, and response quality.

+.. note::
+  For details on supported model providers, configuration options, and client libraries, see :ref:`LLM Providers <llm_providers>`.

 Routing Methods
 ---------------

-Model-based Routing
+.. _model_based_routing:
+
+Model-based routing
 ~~~~~~~~~~~~~~~~~~~

 Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:
@ -26,147 +22,10 @@ Direct routing allows you to specify exact provider and model combinations using
 - Provides full control and transparency over which model handles each request
 - Ideal for production workloads where you want predictable routing behavior
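The ``provider/model-name`` format can be illustrated with a small Python helper that splits a model identifier into its two parts. This is a sketch of the naming convention only, not Plano's own parser; splitting at the first ``/`` is an assumption that keeps any further slashes inside the model name.

```python
def split_model_id(model_id: str) -> tuple:
    """Split 'provider/model-name' into (provider, model-name) at the first '/'."""
    provider, sep, model = model_id.partition("/")
    # Reject ids with no '/' or with an empty provider or model part.
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model-name', got {model_id!r}")
    return provider, model
```

For example, a request body with ``"model": "openai/gpt-4o"`` names provider ``openai`` and model ``gpt-4o``.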

-Alias-based Routing
-~~~~~~~~~~~~~~~~~~~
+Configuration
+^^^^^^^^^^^^^

-Alias-based routing lets you create semantic model names that decouple your application from specific providers:
-
- Use meaningful names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1`` (see :ref:`model_aliases`)
- Maps semantic names to underlying provider models for easier experimentation and provider switching
- Ideal for applications that want abstraction from specific model names while maintaining control
-
-.. _preference_aligned_routing:
-
-Preference-aligned Routing (Arch-Router)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Traditional LLM routing approaches face significant limitations: they evaluate performance using benchmarks that often fail to capture human preferences, select from fixed model pools, and operate as "black boxes" without practical mechanisms for encoding user preferences.
-
-Plano's preference-aligned routing addresses these challenges by applying a fundamental engineering principle: decoupling. The framework separates route selection (matching queries to human-readable policies) from model assignment (mapping policies to specific LLMs). This separation allows you to define routing policies using descriptive labels like ``Domain: 'finance', Action: 'analyze_earnings_report'`` rather than cryptic identifiers, while independently configuring which models handle each policy.
-
-The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ model automatically selects the most appropriate LLM based on:
-
- Domain Analysis: Identifies the subject matter (e.g., legal, healthcare, programming)
- Action Classification: Determines the type of operation (e.g., summarization, code generation, translation)
- User-Defined Preferences: Maps domains and actions to preferred models using transparent, configurable routing decisions
- Human Preference Alignment: Uses domain-action mappings that capture subjective evaluation criteria, ensuring routing aligns with real-world user needs rather than just benchmark scores
-
-This approach supports seamlessly adding new models without retraining and is ideal for dynamic, context-aware routing that adapts to request content and intent.
-
-
-Model-based Routing Workflow
----------------------------
-
-For direct model routing, the process is straightforward:
-
-#. **Client Request**
-
-    The client specifies the exact model using provider/model format (``openai/gpt-4o``).
-
-#. **Provider Validation**
-
-    Plano validates that the specified provider and model are configured and available.
-
-#. **Direct Routing**
-
-    The request is sent directly to the specified model without analysis or decision-making.
-
-#. **Response Handling**
-
-    The response is returned to the client with optional metadata about the routing decision.
-
-
-Alias-based Routing Workflow
-----------------------------
-
-For alias-based routing, the process includes name resolution:
-
-#. **Client Request**
-
-    The client specifies a semantic alias name (``reasoning-model``).
-
-#. **Alias Resolution**
-
-    Plano resolves the alias to the actual provider/model name based on configuration.
-
-#. **Model Selection**
-
-    If the alias maps to multiple models, Plano selects one based on availability and load balancing.
-
-#. **Request Forwarding**
-
-    The request is forwarded to the resolved model.
-
-#. **Response Handling**
-
-    The response is returned with optional metadata about the alias resolution.
-
-
-.. _preference_aligned_routing_workflow:
-
-Preference-aligned Routing Workflow (Arch-Router)
-------------------------------------------------
-
-For preference-aligned dynamic routing, the process involves intelligent analysis:
-
-#. **Prompt Analysis**
-
-    When a user submits a prompt without specifying a model, the Arch-Router analyzes it to determine the domain (subject matter) and action (type of operation requested).
-
-#. **Model Selection**
-
-    Based on the analyzed intent and your configured routing preferences, the Router selects the most appropriate model from your available LLM fleet.
-
-#. **Request Forwarding**
-
-    Once the optimal model is identified, our gateway forwards the original prompt to the selected LLM endpoint. The routing decision is transparent and can be logged for monitoring and optimization purposes.
-
-#. **Response Handling**
-
-    After the selected model processes the request, the response is returned through the gateway. The gateway can optionally add routing metadata or performance metrics to help you understand and optimize your routing decisions.
-
-Arch-Router
-------------------------
-The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ is a state-of-the-art **preference-based routing model** specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
-
-**Addressing Traditional Routing Limitations:**
-
-**Human Preference Alignment**
-Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
-
-**Flexible Model Integration**
-The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
-
-**Preference-Encoded Routing**
-Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
-
-To support effective routing, Arch-Router introduces two key concepts:
-
- **Domain** – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
-
- **Action** – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).
-
-Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
-
-In summary, Arch-Router demonstrates:
-
- **Structured Preference Routing**: Aligns prompt request with model strengths using explicit domain–action mappings.
-
- **Transparent and Controllable**: Makes routing decisions transparent and configurable, empowering users to customize system behavior.
-
- **Flexible and Adaptive**: Supports evolving user needs, model updates, and new domains/actions without retraining the router.
-
- **Production-Ready Performance**: Optimized for low-latency, high-throughput applications in multi-model environments.
-
-
-.. _implementing_routing:
-
-Implementing Routing
--------------------
-
-**Model-based Routing**
-
-For direct model routing, configure your LLM providers with specific provider/model names:
+Configure your LLM providers with specific provider/model names:

 .. code-block:: yaml
    :caption: Model-based Routing Configuration
@ -189,6 +48,9 @@ For direct model routing, configure your LLM providers with specific provider/mo
      - model: anthropic/claude-3-5-sonnet-20241022
        access_key: $ANTHROPIC_API_KEY

+Client usage
+^^^^^^^^^^^^
+
 Clients specify exact models:

 .. code-block:: python
@ -204,7 +66,19 @@ Clients specify exact models:
        messages=[{"role": "user", "content": "Write a story"}]
    )

-**Alias-based Routing**
+.. _alias_based_routing:
+
+Alias-based routing
+~~~~~~~~~~~~~~~~~~~
+
+Alias-based routing lets you create semantic model names that decouple your application from specific providers:
+
+- Use meaningful names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1`` (see :ref:`model_aliases`)
+- Map semantic names to underlying provider models, making experimentation and provider switching easier
+- Abstract your application away from specific model names while keeping control over which model each alias targets
+
+Configuration
+^^^^^^^^^^^^^

 Configure semantic aliases that map to underlying models:

@ -239,6 +113,9 @@ Configure semantic aliases that map to underlying models:
      creative-model:
        target: claude-3-5-sonnet-20241022

+Client usage
+^^^^^^^^^^^^
+
 Clients use semantic names:

 .. code-block:: python
@ -254,9 +131,23 @@ Clients use semantic names:
        messages=[{"role": "user", "content": "Solve this complex problem"}]
    )

-**Preference-aligned Routing (Arch-Router)**
+.. _preference_aligned_routing:

-To configure preference-aligned dynamic routing, you need to define routing preferences that map domains and actions to specific models:
+Preference-aligned routing (Arch-Router)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Preference-aligned routing uses the `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
+
+- **Domain**: High-level topic of the request (e.g., legal, healthcare, programming).
+- **Action**: What the user wants to do (e.g., summarize, generate code, translate).
+- **Routing preferences**: Your mapping from (domain, action) to preferred models.
+
+Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples **routing policy** (how to choose) from **model assignment** (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
+
+Configuration
+^^^^^^^^^^^^^
+
+To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:

 .. code-block:: yaml
    :caption: Preference-Aligned Dynamic Routing Configuration
@ -289,7 +180,10 @@ To configure preference-aligned dynamic routing, you need to define routing pref
          - name: code generation
            description: generating new code snippets, functions, or boilerplate based on user prompts

-Clients can let the router decide or use aliases:
+Client usage
+^^^^^^^^^^^^
+
+Clients can let the router decide or still specify aliases:

 .. code-block:: python

@ -300,6 +194,40 @@ Clients can let the router decide or use aliases:
    )


+Arch-Router
+-----------
+The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ is a state-of-the-art **preference-based routing model** specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
+
+**Addressing Traditional Routing Limitations:**
+
+- **Human Preference Alignment**: Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
+
+- **Flexible Model Integration**: Arch-Router supports adding new models for routing without retraining or architectural changes, so routing can adapt as the set of available models evolves.
+
+- **Preference-Encoded Routing**: Arch-Router provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
+
+To support effective routing, Arch-Router introduces two key concepts:
+
+- **Domain** – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
+
+- **Action** – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).
+
+Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
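To make the split between routing policy and model assignment concrete, here is a hypothetical Python sketch of the assignment step only. It assumes the router has already returned a route name (a domain such as ``programming`` or an action such as ``code generation``); Arch-Router's own inference is not reproduced. The model names, the ``code understanding`` route, the one-model-per-route rule, and the default-model fallback are illustrative assumptions, not Plano configuration or behavior.

```python
from typing import Optional

# Hypothetical preferences: each model lists the route names it prefers.
# Route names may be domains ("programming") or actions ("code generation").
ROUTING_PREFERENCES: dict[str, list[str]] = {
    "openai/gpt-4o": ["code generation"],
    "anthropic/claude-3-5-sonnet-20241022": ["programming", "code understanding"],
}

# Hypothetical fallback used when no route is inferred or the route is unknown.
DEFAULT_MODEL = "openai/gpt-4o-mini"


def build_route_table(prefs: dict[str, list[str]]) -> dict[str, str]:
    """Invert model -> routes into route -> model (the model assignment)."""
    table: dict[str, str] = {}
    for model, routes in prefs.items():
        for route in routes:
            # Design choice of this sketch: each route maps to exactly one model.
            if route in table:
                raise ValueError(
                    f"route {route!r} is claimed by both {table[route]} and {model}"
                )
            table[route] = model
    return table


def select_model(inferred_route: Optional[str], table: dict[str, str]) -> str:
    """Look up the model for the route the router inferred, else fall back."""
    if inferred_route is None:
        return DEFAULT_MODEL
    return table.get(inferred_route, DEFAULT_MODEL)


if __name__ == "__main__":
    table = build_route_table(ROUTING_PREFERENCES)
    # In Plano the route name would come from Arch-Router, not be hard-coded.
    print(select_model("code generation", table))  # openai/gpt-4o
    print(select_model(None, table))  # openai/gpt-4o-mini
```

In this sketch, swapping which model serves a route only touches the preference table; the inference step that produces the route name is unaffected.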
+
+In summary, Arch-Router demonstrates:
+
+- **Structured Preference Routing**: Aligns prompt requests with model strengths using explicit domain–action mappings.
+
+- **Transparent and Controllable**: Makes routing decisions transparent and configurable, empowering users to customize system behavior.
+
+- **Flexible and Adaptive**: Supports evolving user needs, model updates, and new domains/actions without retraining the router.
+
+- **Production-Ready Performance**: Optimized for low-latency, high-throughput applications in multi-model environments.
+
+
 Combining Routing Methods
 -------------------------

@ -343,7 +271,7 @@ This configuration allows clients to:
 2. **Let the router decide**: No model specified, router analyzes content

 Example Use Cases
-------------------------
+-----------------
 Here are common scenarios where Arch-Router excels:

 - **Coding Tasks**: Distinguish between code generation requests ("write a Python function"), debugging needs ("fix this error"), and code optimization ("make this faster"), routing each to appropriately specialized models.
@ -354,9 +282,8 @@ Here are common scenarios where Arch-Router excels:

 - **Conversational Routing**: Track conversation context to identify when topics shift between domains or when the type of assistance needed changes mid-conversation.

-
-Best practicesm
-------------------------
+Best practices
+--------------
 - **💡Consistent Naming:**  Route names should align with their descriptions.

  - ❌ Bad:
@ -381,18 +308,15 @@ Best practicesm

 - **💡Noun Descriptors:** Preference-based routers perform better with noun-centric descriptors, as they offer more stable and semantically rich signals for matching.

- **💡Domain Inclusion:** for best user experience, you should always include domain route. This help the router fall back to domain when action is not
+- **💡Domain Inclusion:** For the best user experience, always include a domain route. This lets the router fall back to the domain when the action cannot be confidently inferred.

-.. Unsupported Features
-.. -------------------------
+Unsupported Features
+--------------------

-.. The following features are **not supported** by the Arch-Router model:
+The following features are **not supported** by the Arch-Router model:

-.. - **❌ Multi-Modality:**
-..   The model is not trained to process raw image or audio inputs. While it can handle textual queries *about* these modalities (e.g., "generate an image of a cat"), it cannot interpret encoded multimedia data directly.
+- **Multi-modality**: The model is not trained to process raw image or audio inputs. It can handle textual queries *about* these modalities (e.g., "generate an image of a cat"), but cannot interpret encoded multimedia data directly.

-.. - **❌ Function Calling:**
-..   This model is designed for **semantic preference matching**, not exact intent classification or tool execution. For structured function invocation, use models in the **Plano-Function-Calling** collection.
+- **Function calling**: Arch-Router is designed for **semantic preference matching**, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.

-.. - **❌ System Prompt Dependency:**
-..   Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
+- **System prompt dependency**: Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
--- a/docs/source/guides/observability/tracing.rst
+++ b/docs/source/guides/observability/tracing.rst
@ -17,9 +17,9 @@ requests in an AI application. With tracing, you can capture a detailed view of
 through various services and components, which is crucial for **debugging**, **performance optimization**,
 and understanding complex AI agent architectures like Co-pilots.

-**Arch** propagates trace context using the W3C Trace Context standard, specifically through the
+**Plano** propagates trace context using the W3C Trace Context standard, specifically through the
 ``traceparent`` header. This allows each component in the system to record its part of the request
-flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures
+flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Plano ensures
 that developers can capture this trace data consistently and in a format compatible with various observability
 tools.

@ -43,7 +43,7 @@ How to Initiate A Trace

 1. **Enable Tracing Configuration**: Set ``random_sampling`` to ``100`` in the ``tracing`` section of the :ref:`listener <plano_overview_listeners>` config.

-2. **Trace Context Propagation**: Arch automatically propagates the ``traceparent`` header. When a request is received, Arch will:
+2. **Trace Context Propagation**: Plano automatically propagates the ``traceparent`` header. When a request is received, Plano will:

   - Generate a new ``traceparent`` header if one is not present.
   - Extract the trace context from the ``traceparent`` header if it exists.
@ -57,7 +57,7 @@ How to Initiate A Trace
 Trace Propagation
 -----------------

-Arch uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
+Plano uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
 This header carries tracing information in a standardized format, enabling interoperability between different
 tracing systems.
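The propagation rules above can be sketched in Python. The helpers below are a minimal, hypothetical illustration (not Plano's implementation) of the W3C ``traceparent`` format ``00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>``: reuse a valid incoming header, start a new trace when it is missing or malformed, and derive a child context for downstream calls by keeping the trace-id and minting a new parent-id. Assumptions: only version ``00`` is accepted, header names are already lowercased dict keys, and new traces are marked sampled (flags ``01``).

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: version-traceid-parentid-flags, lowercase hex only.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: str

    def header(self) -> str:
        """Serialize back to a version-00 traceparent header value."""
        return f"00-{self.trace_id}-{self.parent_id}-{self.flags}"


def parse_traceparent(value: Optional[str]) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is absent or invalid."""
    if not value:
        return None
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # The spec treats all-zero trace-id or parent-id values as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, flags)


def ensure_traceparent(headers: dict[str, str]) -> TraceParent:
    """Reuse a valid incoming context, otherwise start a new sampled trace."""
    incoming = parse_traceparent(headers.get("traceparent"))
    if incoming is not None:
        return incoming
    return TraceParent(secrets.token_hex(16), secrets.token_hex(8), "01")


def child_of(parent: TraceParent) -> TraceParent:
    """Keep the trace-id and flags, but give this hop its own span id."""
    return TraceParent(parent.trace_id, secrets.token_hex(8), parent.flags)


if __name__ == "__main__":
    incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    ctx = ensure_traceparent(incoming)
    outgoing = {"traceparent": child_of(ctx).header()}
    print(parse_traceparent(outgoing["traceparent"]).trace_id == ctx.trace_id)  # True
```

Because every hop keeps the same trace-id, spans recorded by different services can be stitched into one end-to-end trace by any W3C-compatible backend.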

@ -77,7 +77,7 @@ Instrumentation
 ~~~~~~~~~~~~~~~

 To integrate AI tracing, your application needs to follow a few simple steps. The steps
-below are very common practice, and not unique to Arch, when you reading tracing headers and export
+below are common practice, and not unique to Plano, when reading tracing headers and exporting
 `spans <https://docs.lightstep.com/docs/understand-distributed-tracing>`_ for distributed tracing.

 - Read the ``traceparent`` header from incoming requests.
@ -148,66 +148,6 @@ Handle incoming requests:
           print(f"Payment service response: {response.content}")


-AI Agent Tracing Visualization Example
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The following is an example of tracing for an AI-powered customer support system.
-A customer interacts with AI agents, which forward their requests through different
-specialized services and external systems.
-
-::
-
-    +--------------------------+
-    |   Customer Interaction   |
-    +--------------------------+
-               |
-               v
-    +--------------------------+        +--------------------------+
-    |  Agent 1 (Main - Arch)   | ---->  | External Payment Service |
-    +--------------------------+        +--------------------------+
-               |                                  |
-               v                                  v
-    +--------------------------+        +--------------------------+
-    |  Agent 2 (Support - Arch)| ---->  |   Internal Tech Support  |
-    +--------------------------+        +--------------------------+
-               |                                  |
-               v                                  v
-    +--------------------------+        +--------------------------+
-    | Agent 3 (Orders- Arch)   | ---->  |   Inventory Management   |
-    +--------------------------+        +--------------------------+
-
-Trace Breakdown:
-****************
-
- Customer Interaction:
-    - Span 1: Customer initiates a request via the AI-powered chatbot for billing support (e.g., asking for payment details).
-
- AI Agent 1 (Main - Arch):
-    - Span 2: AI Agent 1 (Main) processes the request and identifies it as related to billing, forwarding the request
-      to an external payment service.
-    - Span 3: AI Agent 1 determines that additional technical support is needed for processing and forwards the request
-      to AI Agent 2.
-
- External Payment Service:
-    - Span 4: The external payment service processes the payment-related request (e.g., verifying payment status) and sends
-      the response back to AI Agent 1.
-
- AI Agent 2 (Tech - Arch):
-    - Span 5: AI Agent 2, responsible for technical queries, processes a request forwarded from AI Agent 1 (e.g., checking for
-      any account issues).
-    - Span 6: AI Agent 2 forwards the query to Internal Tech Support for further investigation.
-
- Internal Tech Support:
-    - Span 7: Internal Tech Support processes the request (e.g., resolving account access issues) and responds to AI Agent 2.
-
- AI Agent 3 (Orders - Arch):
-    - Span 8: AI Agent 3 handles order-related queries. AI Agent 1 forwards the request to AI Agent 3 after payment verification
-      is completed.
-    - Span 9: AI Agent 3 forwards a request to the Inventory Management system to confirm product availability for a pending order.
-
- Inventory Management:
-    - Span 10: The Inventory Management system checks stock and availability and returns the information to AI Agent 3.
-
 Integrating with Tracing Tools
 ------------------------------

@ -292,11 +232,11 @@ To send tracing data to `Datadog <https://docs.datadoghq.com/getting_started/tra
 Langtrace
 ~~~~~~~~~

-Langtrace is an observability tool designed specifically for large language models (LLMs). It helps you capture, analyze, and understand how LLMs are used in your applications including those built using Arch.
+Langtrace is an observability tool designed specifically for large language models (LLMs). It helps you capture, analyze, and understand how LLMs are used in your applications, including those built with Plano.

 To send tracing data to `Langtrace <https://docs.langtrace.ai/supported-integrations/llm-tools/arch>`_:

-1. **Configure Arch**: Make sure Arch is installed and setup correctly. For more information, refer to the `installation guide <https://github.com/katanemo/archgw?tab=readme-ov-file#prerequisites>`_.
+1. **Configure Plano**: Make sure Plano is installed and set up correctly. For more information, refer to the `installation guide <https://github.com/katanemo/archgw?tab=readme-ov-file#prerequisites>`_.

 2. **Install Langtrace**: Install the Langtrace SDK:

@ -348,7 +288,7 @@ Best Practices
 Summary
 ----------

-By leveraging the ``traceparent`` header for trace context propagation, Arch enables developers to implement
+By leveraging the ``traceparent`` header for trace context propagation, Plano enables developers to implement
 tracing efficiently. This approach simplifies the process of collecting and analyzing tracing data in common
 tools like AWS X-Ray and Datadog, enhancing observability and facilitating faster debugging and optimization.

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -5,7 +5,7 @@ Welcome to Plano!
   :width: 100%
   :align: center

-`Plano <https://github.com/katanemo/plano>`_ is delivery infrastructure for agentic apps. A models-native proxy and data plane designed to help you build agents faster, and deliver them reliably to production.
+`Plano <https://github.com/katanemo/plano>`_ is delivery infrastructure for agentic apps. A models-native proxy server and data plane designed to help you build agents faster, and deliver them reliably to production.

 Plano pulls out the rote plumbing work (aka “hidden AI middleware”) and decouples you from brittle, ever‑changing framework abstractions. It centralizes what shouldn’t be bespoke in every codebase like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for UX and DX agility. Use any language or AI framework, and ship agents to production faster with Plano.