updating plano docs, README and CLI

2026-06-17 15:25:17 +02:00 · 2025-12-19 17:45:51 -08:00 · 2025-12-19 17:45:51 -08:00 · 28fd430efd
commit 28fd430efd
parent 2f9121407b
61 changed files with 1449 additions and 1306 deletions
--- a/docs/source/guides/agent_routing.rst
+++ b/docs/source/guides/agent_routing.rst
@ -1,105 +0,0 @@
-.. _agent_routing:
-
-Agent Routing and Hand Off
-===========================
-
-Agent Routing and Hand Off is a key feature in Arch that enables intelligent routing of user prompts to specialized AI agents or human agents based on the nature and complexity of the user's request.
-
-This capability significantly enhances the efficiency and personalization of interactions, ensuring each prompt receives the most appropriate and effective handling. The following section describes
-the workflow, configuration, and implementation of Agent routing and hand off in Arch.
-
-#. **Agent Selection**
-   When a user submits a prompt, Arch analyzes the input to determine the intent and complexity. Based on the analysis, Arch selects the most suitable agent configured within your application to handle the specific category of the user's request—such as sales inquiries, technical issues, or complex scenarios requiring human attention.
-
-#. **Prompt Routing**
-   After selecting the appropriate agent, Arch routes the user's prompt to the designated agent's endpoint and waits for the agent to respond back with the processed output or further instructions.
-
-#. **Hand Off**
-   Based on follow-up queries from the user, Arch repeats the process of analysis, agent selection, and routing to ensure a seamless hand off between AI agents as needed.
-
-.. code-block:: yaml
-    :caption: Agent Routing and Hand Off Configuration Example
-
-    prompt_targets:
-      - name: sales_agent
-        description: Handles queries related to sales and purchases
-
-      - name: issues_and_repairs
-        description: handles issues, repairs, or refunds
-
-      - name: escalate_to_human
-        description: escalates to human agent
-
-.. code-block:: python
-    :caption: Agent Routing and Hand Off Implementation Example via FastAPI
-
-    class Agent:
-        def __init__(self, role: str, instructions: str):
-            self.system_prompt = f"You are a {role}.\n{instructions}"
-
-        def handle(self, req: ChatCompletionsRequest):
-            messages = [{"role": "system", "content": self.get_system_prompt()}] + [
-                message.model_dump() for message in req.messages
-            ]
-            return call_openai(messages, req.stream) #call_openai is a placeholder for the actual API call
-
-        def get_system_prompt(self) -> str:
-            return self.system_prompt
-
-    # Define your agents
-    AGENTS = {
-        "sales_agent": Agent(
-            role="sales agent",
-            instructions=(
-                "Always answer in a sentence or less.\n"
-                "Follow the following routine with the user:\n"
-                "1. Engage\n"
-                "2. Quote ridiculous price\n"
-                "3. Reveal caveat if user agrees."
-            ),
-        ),
-        "issues_and_repairs": Agent(
-            role="issues and repairs agent",
-            instructions="Propose a solution, offer refund if necessary.",
-        ),
-        "escalate_to_human": Agent(
-            role="human escalation agent", instructions="Escalate issues to a human."
-        ),
-        "unknown_agent": Agent(
-            role="general assistant", instructions="Assist the user in general queries."
-        ),
-    }
-
-    #handle the request from arch gateway
-    @app.post("/v1/chat/completions")
-    def completion_api(req: ChatCompletionsRequest, request: Request):
-
-        agent_name = req.metadata.get("agent-name", "unknown_agent")
-        agent = AGENTS.get(agent_name)
-        logger.info(f"Routing to agent: {agent_name}")
-
-        return agent.handle(req)
-
-.. note::
-    The above example demonstrates a simple implementation of Agent Routing and Hand Off using FastAPI. For the full implementation of this example
-    please see our `GitHub demo <https://github.com/katanemo/archgw/tree/main/demos/use_cases/orchestrating_agents>`_.
-
-Example Use Cases
-----------------
-Agent Routing and Hand Off is particularly beneficial in scenarios such as:
-
- **Customer Support**: Routing common customer queries to automated support agents, while escalating complex or sensitive issues to human support staff.
- **Sales and Marketing**: Automatically directing potential leads and sales inquiries to specialized sales agents for timely and targeted follow-ups.
- **Technical Assistance**: Managing user-reported issues, repairs, or refunds by assigning them to the correct technical or support agent efficiently.
-
-Best Practices and Tips
------------------------
-When implementing Agent Routing and Hand Off in your applications, consider these best practices:
-
- Clearly define agent responsibilities: Ensure each agent or human endpoint has a clear, specific description of the prompts they handle, reducing mis-routing.
- Monitor and optimize routes: Regularly review how prompts are routed to adjust and optimize agent definitions and configurations.
-
-.. note::
-    To observe traffic to and from agents, please read more about :ref:`observability <observability>` in Arch.
-
-By carefully configuring and managing your Agent routing and hand off, you can significantly improve your application's responsiveness, performance, and overall user satisfaction.
--- a/docs/source/guides/function_calling.rst
+++ b/docs/source/guides/function_calling.rst
@ -126,7 +126,7 @@ Step 3: Arch Takes Over
 Once you have defined the functions and configured the prompt targets, Arch Gateway takes care of the remaining work.
 It will automatically validate parameters, and ensure that the required parameters (e.g., location) are present in the prompt, and add validation rules if necessary.

-.. figure:: /_static/img/arch_network_diagram_high_level.png
+.. figure:: /_static/img/plano_network_diagram_high_level.png
   :width: 100%
   :align: center

--- a/docs/source/guides/llm_router.rst
+++ b/docs/source/guides/llm_router.rst
@ -5,7 +5,7 @@ LLM Routing

 With the rapid proliferation of large language models (LLM) — each optimized for different strengths, style, or latency/cost profile — routing has become an essential technique to operationalize the use of different models.

-Arch provides three distinct routing approaches to meet different use cases:
+Plano provides three distinct routing approaches to meet different use cases:

 1. **Model-based Routing**: Direct routing to specific models using provider/model names
 2. **Alias-based Routing**: Semantic routing using custom aliases that map to underlying models
@ -42,7 +42,7 @@ Preference-aligned Routing (Arch-Router)

 Traditional LLM routing approaches face significant limitations: they evaluate performance using benchmarks that often fail to capture human preferences, select from fixed model pools, and operate as "black boxes" without practical mechanisms for encoding user preferences.

-Arch's preference-aligned routing addresses these challenges by applying a fundamental engineering principle: decoupling. The framework separates route selection (matching queries to human-readable policies) from model assignment (mapping policies to specific LLMs). This separation allows you to define routing policies using descriptive labels like ``Domain: 'finance', Action: 'analyze_earnings_report'`` rather than cryptic identifiers, while independently configuring which models handle each policy.
+Plano's preference-aligned routing addresses these challenges by applying a fundamental engineering principle: decoupling. The framework separates route selection (matching queries to human-readable policies) from model assignment (mapping policies to specific LLMs). This separation allows you to define routing policies using descriptive labels like ``Domain: 'finance', Action: 'analyze_earnings_report'`` rather than cryptic identifiers, while independently configuring which models handle each policy.

 The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ model automatically selects the most appropriate LLM based on:

@ -65,7 +65,7 @@ For direct model routing, the process is straightforward:

 #. **Provider Validation**

-    Arch validates that the specified provider and model are configured and available.
+    Plano validates that the specified provider and model are configured and available.

 #. **Direct Routing**

@ -87,11 +87,11 @@ For alias-based routing, the process includes name resolution:

 #. **Alias Resolution**

-    Arch resolves the alias to the actual provider/model name based on configuration.
+    Plano resolves the alias to the actual provider/model name based on configuration.

 #. **Model Selection**

-    If the alias maps to multiple models, Arch selects one based on availability and load balancing.
+    If the alias maps to multiple models, Plano selects one based on availability and load balancing.

 #. **Request Forwarding**

@ -159,6 +159,8 @@ In summary, Arch-Router demonstrates:
 - **Production-Ready Performance**: Optimized for low-latency, high-throughput applications in multi-model environments.


+.. _implementing_routing:
+
 Implementing Routing
 --------------------

@ -390,7 +392,7 @@ Best practicesm
 ..   The model is not trained to process raw image or audio inputs. While it can handle textual queries *about* these modalities (e.g., "generate an image of a cat"), it cannot interpret encoded multimedia data directly.

 .. - **❌ Function Calling:**
-..   This model is designed for **semantic preference matching**, not exact intent classification or tool execution. For structured function invocation, use models in the **Arch-Function-Calling** collection.
+..   This model is designed for **semantic preference matching**, not exact intent classification or tool execution. For structured function invocation, use models in the **Plano-Function-Calling** collection.

 .. - **❌ System Prompt Dependency:**
 ..   Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
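Note on the alias-based routing hunks in ``llm_router.rst`` above: the two steps they describe (alias resolution, then model selection when an alias maps to several models) can be sketched in a few lines. This is only an illustration under stated assumptions, not Plano's implementation: the alias table, the ``resolve_alias`` helper, and the use of a random pick as a stand-in for load balancing are all hypothetical.

```python
import random

# Hypothetical alias table: alias -> candidate provider/model names.
ALIASES = {
    "fast": ["openai/gpt-4o-mini"],
    "smart": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
}


def resolve_alias(name, aliases, available, rng=random):
    """Resolve `name` to one provider/model name.

    Names that are not aliases pass through unchanged. A random pick among
    the available candidates stands in for real load balancing (assumption).
    """
    candidates = aliases.get(name, [name])
    healthy = [model for model in candidates if model in available]
    if not healthy:
        raise LookupError(f"no available model for {name!r}")
    return rng.choice(healthy)
```

Calling ``resolve_alias("smart", ALIASES, {"openai/gpt-4o"})`` would return the only candidate that is currently available, while an alias with no available candidates raises ``LookupError`` instead of silently routing nowhere.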
--- a/docs/source/guides/observability/access_logging.rst
+++ b/docs/source/guides/observability/access_logging.rst
@ -3,14 +3,14 @@
 Access Logging
 ==============

-Access logging in Arch refers to the logging of detailed information about each request and response that flows through Arch.
-It provides visibility into the traffic passing through Arch, which is crucial for monitoring, debugging, and analyzing the
+Access logging in Plano refers to the logging of detailed information about each request and response that flows through Plano.
+It provides visibility into the traffic passing through Plano, which is crucial for monitoring, debugging, and analyzing the
 behavior of AI applications and their interactions.

 Key Features
 ^^^^^^^^^^^^
 * **Per-Request Logging**:
-  Each request that passes through Arch is logged. This includes important metadata such as HTTP method,
+  Each request that passes through Plano is logged. This includes important metadata such as HTTP method,
  path, response status code, request duration, upstream host, and more.
 * **Integration with Monitoring Tools**:
  Access logs can be exported to centralized logging systems (e.g., ELK stack or Fluentd) or used to feed monitoring and alerting systems.
@ -19,7 +19,7 @@ Key Features
 How It Works
 ^^^^^^^^^^^^

-Arch gateway exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/archgw_logs``. For example:
+Plano exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/archgw_logs``. For example:

 .. code-block:: console

--- a/docs/source/guides/observability/monitoring.rst
+++ b/docs/source/guides/observability/monitoring.rst
@ -8,11 +8,11 @@ and instrumentation for generating, collecting, processing, and exporting teleme
 metrics, and logs. Its flexible design supports a wide range of backends and seamlessly integrates with
 modern application tools.

-Arch acts a *source* for several monitoring metrics related to **prompts** and **LLMs** natively integrated
+Plano acts as a *source* for several monitoring metrics related to **agents** and **LLMs** natively integrated
 via `OpenTelemetry <https://opentelemetry.io/>`_ to help you understand three critical aspects of your application:
 latency, token usage, and error rates by an upstream LLM provider. Latency measures the speed at which your application
 is responding to users, which includes metrics like time to first token (TFT), time per output token (TOT) metrics, and
-the total latency as perceived by users. Below are some screenshots how Arch integrates natively with tools like
+the total latency as perceived by users. Below are some screenshots of how Plano integrates natively with tools like
 `Grafana <https://grafana.com/grafana/dashboards/>`_ via `Prometheus <https://prometheus.io/>`_


@ -32,7 +32,7 @@ Metrics Dashboard (via Grafana)

 Configure Monitoring
 ~~~~~~~~~~~~~~~~~~~~
-Arch gateway publishes stats endpoint at http://localhost:19901/stats. As noted above, Arch is a source for metrics. To view and manipulate dashbaords, you will
+Plano publishes a stats endpoint at http://localhost:19901/stats. As noted above, Plano is a source for metrics. To view and manipulate dashboards, you will
 need to configure `Prometheus <https://prometheus.io/>`_ (as a metrics store) and `Grafana <https://grafana.com/grafana/dashboards/>`_ for dashboards. Below
 are some sample configuration files for both, respectively.

--- a/docs/source/guides/observability/tracing.rst
+++ b/docs/source/guides/observability/tracing.rst
@ -41,7 +41,7 @@ Benefits of Using ``Traceparent`` Headers
 How to Initiate A Trace
 -----------------------

-1. **Enable Tracing Configuration**: Simply add the ``random_sampling`` in ``tracing`` section to 100`` flag to in the :ref:`listener <arch_overview_listeners>` config
+1. **Enable Tracing Configuration**: Set ``random_sampling`` to ``100`` in the ``tracing`` section of the :ref:`listener <plano_overview_listeners>` config

 2. **Trace Context Propagation**: Arch automatically propagates the ``traceparent`` header. When a request is received, Arch will:

--- a/docs/source/guides/orchestration.rst
+++ b/docs/source/guides/orchestration.rst
@ -0,0 +1,224 @@
+.. _agent_routing:
+
+Orchestration
+==============
+
+Multi-agent systems allow you to route requests across multiple specialized agents, each designed to handle specific types of tasks.
+Plano makes it easy to build and scale these systems by managing the orchestration layer—deciding which agent(s) should handle each request—while you focus on implementing individual agent logic.
+
+This guide shows you how to configure and implement multi-agent orchestration in Plano.
+
+How It Works
+------------
+
+Plano's orchestration layer analyzes incoming prompts and routes them to the most appropriate agent based on user intent and conversation context. The workflow is:
+
+1. **User submits a prompt**: The request arrives at Plano's agent listener.
+2. **Agent selection**: Plano analyzes the prompt to determine user intent and complexity, and routes the request to the most suitable agent configured in your system—such as a sales agent, technical support agent, or RAG agent.
+3. **Agent handles request**: The selected agent processes the prompt using its specialized logic and tools.
+4. **Seamless handoffs**: For multi-turn conversations, Plano repeats the intent analysis for each follow-up query, enabling smooth handoffs between agents as the conversation evolves.
+
+Configuration
+-------------
+
+Configure your agents in the ``listeners`` section of your ``plano_config.yaml``:
+
+.. code-block:: yaml
+    :caption: Multi-Agent Configuration Example
+
+    listeners:
+      - type: agent
+        name: agent_listener
+        port: 8001
+        agents:
+          - id: sales_agent
+            description: Handles sales inquiries, product recommendations, and pricing questions
+            endpoint: http://sales-service:8000/agent
+
+          - id: support_agent
+            description: Handles technical issues, troubleshooting, and repairs
+            endpoint: http://support-service:8000/agent
+
+          - id: rag_agent
+            description: Answers questions using retrieval augmented generation
+            endpoint: http://rag-service:8000/agent
+            filter_chain:
+              - query_rewriter
+              - context_builder
+
+**Key Configuration Elements:**
+
+* **agent listener**: A listener of ``type: agent`` tells Plano to perform intent analysis and routing for incoming requests.
+* **agents list**: Define each agent with an ``id``, ``description`` (used for routing decisions), and ``endpoint`` (where Plano forwards requests).
+* **filter_chain**: Optionally attach :ref:`filter chains <filter_chain>` to agents for guardrails, query rewriting, or context enrichment.
+
+Implementation
+--------------
+
+Agents are HTTP services that receive routed requests from Plano. Here's how to implement a simple multi-agent system using Python and FastAPI:
+
+.. code-block:: python
+    :caption: Multi-Agent Implementation Example
+
+    class Agent:
+        def __init__(self, role: str, instructions: str):
+            self.system_prompt = f"You are a {role}.\n{instructions}"
+
+        def handle(self, req: ChatCompletionsRequest):
+            messages = [{"role": "system", "content": self.get_system_prompt()}] + [
+                message.model_dump() for message in req.messages
+            ]
+            return call_openai(messages, req.stream) #call_openai is a placeholder for the actual API call
+
+        def get_system_prompt(self) -> str:
+            return self.system_prompt
+
+    # Define your agents
+    AGENTS = {
+        "sales_agent": Agent(
+            role="sales agent",
+            instructions=(
+                "Always answer in a sentence or less.\n"
+                "Follow the following routine with the user:\n"
+                "1. Engage\n"
+                "2. Quote ridiculous price\n"
+                "3. Reveal caveat if user agrees."
+            ),
+        ),
+        "issues_and_repairs": Agent(
+            role="issues and repairs agent",
+            instructions="Propose a solution, offer refund if necessary.",
+        ),
+        "escalate_to_human": Agent(
+            role="human escalation agent", instructions="Escalate issues to a human."
+        ),
+        "unknown_agent": Agent(
+            role="general assistant", instructions="Assist the user in general queries."
+        ),
+    }
+
+    # Handle the request routed by Plano
+    @app.post("/v1/chat/completions")
+    def completion_api(req: ChatCompletionsRequest, request: Request):
+
+        agent_name = req.metadata.get("agent-name", "unknown_agent")
+        # Fall back to the general assistant if the name is not a known agent
+        agent = AGENTS.get(agent_name, AGENTS["unknown_agent"])
+        logger.info(f"Routing to agent: {agent_name}")
+
+        return agent.handle(req)
+
+**How Requests Flow:**
+
+1. User sends a prompt to Plano's agent listener (e.g., "I need help with a billing issue").
+2. Plano-Orchestrator analyzes the intent and routes to the ``support_agent``.
+3. Plano forwards the request to ``http://support-service:8000/agent`` with metadata indicating which agent was selected.
+4. Your agent service receives the request, processes it using its specialized logic, and returns a response.
+5. Plano forwards the agent's response back to the user.
+
+.. note::
+    For a complete working example with multiple agents, see our `multi-agent orchestration demo <https://github.com/katanemo/archgw/tree/main/demos/use_cases/orchestrating_agents>`_ on GitHub.
+
+Common Use Cases
+----------------
+
+Multi-agent orchestration is particularly powerful for:
+
+**Customer Support**
+
+Route common queries to automated support agents while escalating complex or sensitive issues to human support staff.
+
+.. code-block:: yaml
+
+    agents:
+      - id: tier1_support
+        description: Handles common FAQs, password resets, and basic troubleshooting
+      - id: tier2_support
+        description: Handles complex technical issues requiring deep product knowledge
+      - id: human_escalation
+        description: Escalates sensitive issues or unresolved problems to human agents
+
+**Sales and Marketing**
+
+Direct potential leads and sales inquiries to specialized sales agents for timely, targeted follow-ups.
+
+.. code-block:: yaml
+
+    agents:
+      - id: product_recommendation
+        description: Recommends products based on user needs and preferences
+      - id: pricing_agent
+        description: Provides pricing information and quotes
+      - id: sales_closer
+        description: Handles final negotiations and closes deals
+
+**Technical Documentation and Support**
+
+Combine RAG agents for documentation lookup with specialized troubleshooting agents.
+
+.. code-block:: yaml
+
+    agents:
+      - id: docs_agent
+        description: Retrieves relevant documentation and guides
+        filter_chain:
+          - query_rewriter
+          - context_builder
+      - id: troubleshoot_agent
+        description: Diagnoses and resolves technical issues step by step
+
+Best Practices
+--------------
+
+**Write Clear Agent Descriptions**
+
+Agent descriptions are used by Plano-Orchestrator to make routing decisions. Be specific about what each agent handles:
+
+.. code-block:: yaml
+
+    # Good - specific and actionable
+    - id: refund_agent
+      description: Processes refund requests for orders within 30 days, validates return eligibility
+
+    # Less ideal - too vague
+    - id: refund_agent
+      description: Handles refunds
+
+**Use Filter Chains for Cross-Cutting Concerns**
+
+Apply :ref:`filter chains <filter_chain>` to agents that need guardrails, context enrichment, or query rewriting:
+
+.. code-block:: yaml
+
+    agents:
+      - id: rag_agent
+        description: Answers questions using company knowledge base
+        filter_chain:
+          - compliance_check  # Ensure queries comply with policies
+          - query_rewriter    # Optimize query for retrieval
+          - context_builder   # Fetch relevant docs
+
+**Monitor and Optimize Routing**
+
+Regularly review which agents handle which requests to identify mis-routing patterns and adjust agent descriptions:
+
+.. code-block:: yaml
+
+    # Monitor routing decisions through Plano's tracing
+    # Adjust descriptions if you see unexpected routing behavior
+
+**Route LLM Calls Through Plano's Model Proxy**
+
+When your agents need to call LLMs, route those calls through Plano's :ref:`Model Proxy <llm_providers>` for consistent responses, smart routing, and rich observability. See :ref:`Making LLM Calls from Agents <agents>` for details.
+
+Next Steps
+----------
+
+* Learn more about :ref:`agents <agents>` and the inner vs. outer loop model
+* Explore :ref:`filter chains <filter_chain>` for adding guardrails and context enrichment
+* See :ref:`observability <observability>` for monitoring multi-agent workflows
+* Review the :ref:`LLM Providers <llm_providers>` guide for model routing within agents
+
+.. note::
+    To observe traffic to and from agents, please read more about :ref:`observability <observability>` in Plano.
+
+By carefully configuring and managing your Agent routing and hand off, you can significantly improve your application's responsiveness, performance, and overall user satisfaction.
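Note on ``orchestration.rst`` above: the "How Requests Flow" steps and the advice to write specific agent descriptions can be illustrated with a toy selector. This is a deliberately naive sketch, not how Plano-Orchestrator works: word overlap between the prompt and each description stands in for the real intent analysis, and the ``select_agent`` helper, the ``AGENTS`` table, and the ``unknown_agent`` default are assumptions made for the example.

```python
import re

# Hypothetical agent table mirroring the `agents` list in the YAML example.
AGENTS = [
    {
        "id": "sales_agent",
        "description": "Handles sales inquiries, product recommendations, and pricing questions",
    },
    {
        "id": "support_agent",
        "description": "Handles technical issues, troubleshooting, and repairs",
    },
]


def tokens(text):
    """Lowercase word set; a crude stand-in for real intent analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_agent(prompt, agents, default="unknown_agent"):
    """Pick the agent whose description shares the most words with the prompt."""
    prompt_words = tokens(prompt)
    best_id, best_score = default, 0
    for agent in agents:
        score = len(prompt_words & tokens(agent["description"]))
        if score > best_score:
            best_id, best_score = agent["id"], score
    return best_id


# Selection is repeated on every turn, which is what allows a handoff
# between agents as the conversation changes topic.
conversation = [
    "What is the pricing for the pro plan?",
    "I have technical issues with repairs",
]
for turn in conversation:
    print(turn, "->", select_agent(turn, AGENTS))
```

Even in this toy form, a vague description such as "Handles refunds" gives the selector very little to match against, which is the motivation behind the "Write Clear Agent Descriptions" practice.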
--- a/docs/source/guides/prompt_guard.rst
+++ b/docs/source/guides/prompt_guard.rst
@ -1,66 +1,89 @@
 .. _prompt_guard:

-Prompt Guard
-=============
+Guardrails
+==========

-**Prompt guard** is a security and validation feature offered in Arch to protect agents, by filtering and analyzing prompts before they reach your application logic.
-In applications where prompts generate responses or execute specific actions based on user inputs, prompt guard minimizes risks like malicious inputs (or misaligned outputs).
-By adding a layer of input scrutiny, prompt guards ensures safer, more reliable, and accurate interactions with agents.
+**Guardrails** are Plano's way of applying safety and validation checks to prompts before they reach your application logic. They are typically implemented as
+filters in a :ref:`Filter Chain <filter_chain>` attached to an agent, so every request passes through a consistent processing layer.
+
+
+Why Guardrails
+--------------
+Guardrails are essential for maintaining control over AI-driven applications. They help enforce organizational policies, ensure compliance with regulations
+(like GDPR or HIPAA), and protect users from harmful or inappropriate content. In applications where prompts generate responses or trigger actions, guardrails
+minimize risks like malicious inputs, off-topic queries, or misaligned outputs—adding a consistent layer of input scrutiny that makes interactions safer,
+more reliable, and easier to reason about.

-Why Prompt Guard
----------------

 .. vale Vale.Spelling = NO

- **Prompt Sanitization via Arch-Guard**
-    - **Jailbreak Prevention**: Detects and filters inputs that might attempt jailbreak attacks, like alternating LLM intended behavior, exposing the system prompt, or bypassing ethnics safety.
+- **Jailbreak Prevention**: Detect and filter inputs that attempt to change LLM behavior, expose system prompts, or bypass safety policies.
+- **Domain and Topicality Enforcement**: Ensure that agents only respond to prompts within an approved domain (for example, finance-only or healthcare-only use cases) and reject unrelated queries.
+- **Dynamic Error Handling**: Provide clear error messages when requests violate policy, helping users correct their inputs.

- **Dynamic Error Handling**
-    - **Automatic Correction**: Applies error-handling techniques to suggest corrections for minor input errors, such as typos or misformatted data.
-    - **Feedback Mechanism**: Provides informative error messages to users, helping them understand how to correct input mistakes or adhere to guidelines.
+How Guardrails Work
+-------------------

-.. Note::
-    Today, Arch offers support for jailbreak via Arch-Guard. We will be adding support for additional guards in Q1, 2025 (including response guardrails)
+In Plano, guardrails are usually implemented as filters that run as HTTP services. Each filter receives the incoming prompt and related metadata, evaluates it
+against policy, and either lets the request continue (HTTP 200) or terminates it early with an appropriate error code (typically HTTP 4xx for policy failures).

-What Is Arch-Guard
-~~~~~~~~~~~~~~~~~~
-`Arch-Guard <https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d>`_ is a robust classifier model specifically trained on a diverse corpus of prompt attacks.
-It excels at detecting explicitly malicious prompts, providing an essential layer of security for LLM applications.
+The example below shows a simple, plain-Python HTTP service that acts as a topicality guardrail: it rejects any prompt that is not related to the
+"weather" domain.

-By embedding Arch-Guard within the Arch architecture, we empower developers to build robust, LLM-powered applications while prioritizing security and safety. With Arch-Guard, you can navigate the complexities of prompt management with confidence, knowing you have a reliable defense against malicious input.
+.. code-block:: python
+    :caption: Example topicality guard filter in plain Python (FastAPI)
+
+    from fastapi import FastAPI, Request, HTTPException
+
+    app = FastAPI()
+
+    ALLOWED_KEYWORDS = {"weather", "forecast", "temperature", "rain", "snow", "humidity"}
+
+    @app.post("/guardrails/topic")
+    async def topic_guard(request: Request):
+        body = await request.json()
+        # Expecting an OpenAI-style request body with messages
+        messages = body.get("messages", [])
+        user_content = " ".join(
+            m["content"] for m in messages if m.get("role") == "user"
+        ).lower()
+
+        if not any(keyword in user_content for keyword in ALLOWED_KEYWORDS):
+            # Return 400 to indicate a policy failure (not a server error)
+            raise HTTPException(
+                status_code=400,
+                detail={
+                    "error": "off_topic",
+                    "message": "This assistant only answers weather-related questions.",
+                },
+            )
+
+        # If the prompt is on-topic, just pass the original body through
+        return body


-Example Configuration
-~~~~~~~~~~~~~~~~~~~~~
-Here is an example of using Arch-Guard in Arch:
+To wire this guardrail into Plano, you define a listener of ``type: agent`` and attach a filter chain with a single filter that points
+to the Python service above.

-.. literalinclude:: includes/arch_config.yaml
-    :language: yaml
-    :linenos:
-    :lines: 22-26
-    :caption: Arch-Guard Example Configuration
+.. code-block:: yaml
+    :caption: Listener (type: agent) with a topicality guard filter

-How Arch-Guard Works
----------------------
+    filters:
+      - id: topicality_guard
+        url: http://topic-guard:8000/guardrails/topic

-#. **Pre-Processing Stage**
-
-    As a request or prompt is received, Arch Guard first performs validation. If any violations are detected, the input is flagged, and a tailored error message may be returned.
-
-#. **Error Handling and Feedback**
-
-    If the prompt contains errors or does not meet certain criteria, the user receives immediate feedback or correction suggestions, enhancing usability and reducing the chance of repeated input mistakes.
-
-Benefits of Using Arch Guard
------------------------------
-
- **Enhanced Security**: Protects against injection attacks, harmful content, and misuse, securing both system and user data.
-
- **Better User Experience**: Clear feedback and error correction improve user interactions by guiding them to correct input formats and constraints.
+    listeners:
+      - type: agent
+        name: agent_listener
+        port: 8001
+        router: arch_agent_router
+        agents:
+          - id: rag_agent
+            description: virtual assistant for retrieval augmented generation tasks
+            filter_chain:
+              - topicality_guard


-Summary
-------
-
-Prompt guard is an essential tool for any prompt-based system that values security, accuracy, and compliance.
-By implementing Prompt Guard, developers can provide a robust layer of input validation and security, leading to better-performing, reliable, and safer applications.
+When a request arrives at ``agent_listener``, Plano will first call the ``topicality_guard`` filter. If the filter returns **HTTP 200**,
+the request continues on to the configured agent or prompt target. If the filter returns **HTTP 400**, Plano returns that error back to
+the caller and does not forward the request further—enforcing your domain guardrail without changing any application code.
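Note on ``prompt_guard.rst`` above: the continue-on-200, stop-on-error behaviour described for filter chains can be sketched without any HTTP machinery. The sketch below makes several assumptions that are not in the docs: filters are modelled as plain in-process functions rather than HTTP services, each returns a ``(status, body)`` pair, and the body returned by one filter is handed to the next. ``run_filter_chain`` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-process filter signature: body in, (HTTP-style status, body) out.
Filter = Callable[[Dict], Tuple[int, Dict]]

ALLOWED_KEYWORDS = {"weather", "forecast", "temperature", "rain", "snow", "humidity"}


def topicality_guard(body: Dict) -> Tuple[int, Dict]:
    """Return 200 with the original body if any user message is on-topic, else 400."""
    text = " ".join(
        m["content"]
        for m in body.get("messages", [])
        if m.get("role") == "user" and isinstance(m.get("content"), str)
    ).lower()
    if any(keyword in text for keyword in ALLOWED_KEYWORDS):
        return 200, body
    return 400, {"error": "off_topic"}


def run_filter_chain(body: Dict, filters: List[Filter]) -> Tuple[int, Dict]:
    """Run filters in order; the first non-200 result ends the chain early."""
    for apply_filter in filters:
        status, result = apply_filter(body)
        if status != 200:
            return status, result  # later filters and the agent are never reached
        body = result  # assumption: the next filter sees the returned body
    return 200, body
```

With this shape, ordering matters: placing a cheap policy check such as ``topicality_guard`` first means rejected requests never reach more expensive filters like query rewriting.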
--- a/docs/source/guides/state.rst
+++ b/docs/source/guides/state.rst
@ -0,0 +1,255 @@
+.. _managing_conversational_state:
+
+Conversational State
+=====================
+
+The OpenAI Responses API (``v1/responses``) is designed for multi-turn conversations where context needs to persist across requests. Plano provides a unified ``v1/responses`` API that works with **any LLM provider**—OpenAI, Anthropic, Azure OpenAI, DeepSeek, or any OpenAI-compatible provider—while automatically managing conversational state for you.
+
+Unlike the traditional Chat Completions API where you manually manage conversation history by including all previous messages in each request, Plano handles state management behind the scenes. This means you can use the Responses API with any model provider, and Plano will persist conversation context across requests—making it ideal for building conversational agents that remember context without bloating every request with full message history.
+
+How It Works
+------------
+
+When a client calls the Responses API:
+
+1. **First request**: Plano generates a unique ``resp_id`` and stores the conversation state (messages, model, provider, timestamp).
+2. **Subsequent requests**: The client passes the ID of the previous response as ``previous_response_id``. Plano retrieves the stored conversation state, merges it with the new input, and sends the combined context to the LLM.
+3. **Response**: The LLM sees the full conversation history without the client needing to resend all previous messages.
+
+This pattern dramatically reduces bandwidth and makes it easier to build multi-turn agents—Plano handles the state plumbing so you can focus on agent logic.
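+
+The flow above can be sketched in a few lines of Python. This illustrates the pattern only and is not Plano's implementation: the ``InMemoryStateStore`` class, its method names, and the ``resp_`` ID format are all hypothetical.
+
+.. code-block:: python
+
+    import uuid
+
+    class InMemoryStateStore:
+        """Hypothetical store mapping a response ID to its message history."""
+
+        def __init__(self):
+            self._states = {}
+
+        def build_context(self, new_input, previous_response_id=None):
+            # Start from the stored history when the client chains a request.
+            history = list(self._states.get(previous_response_id, []))
+            history.append({"role": "user", "content": new_input})
+            return history
+
+        def save(self, context, assistant_reply):
+            # Persist the full exchange under a fresh ID for the next turn.
+            response_id = f"resp_{uuid.uuid4().hex}"
+            self._states[response_id] = context + [
+                {"role": "assistant", "content": assistant_reply}
+            ]
+            return response_id
+
+    store = InMemoryStateStore()
+    first = store.build_context("My name is Alice")
+    first_id = store.save(first, "Nice to meet you, Alice.")
+    second = store.build_context("What is my name?", previous_response_id=first_id)
+    # ``second`` now holds three messages: the first exchange plus the new input.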
+
+**Example Using OpenAI Python SDK:**
+
+.. code-block:: python
+
+    from openai import OpenAI
+
+    # Point to Plano's Model Proxy endpoint
+    client = OpenAI(
+        api_key="test-key",
+        base_url="http://127.0.0.1:12000/v1"
+    )
+
+    # First turn - Plano creates a new conversation state
+    response = client.responses.create(
+        model="gpt-4o-mini",  # Works with any configured provider
+        input="My name is Alice and I like Python"
+    )
+
+    # Save the response_id for conversation continuity
+    resp_id = response.id
+    print(f"Assistant: {response.output_text}")
+
+    # Second turn - Plano automatically retrieves previous context
+    resp2 = client.responses.create(
+        model="claude-sonnet-4-20250514",  # Different model/provider; make sure it's configured in plano_config.yaml
+        input="Please list all the messages you have received in our conversation, numbering each one.",
+        previous_response_id=resp_id,
+    )
+
+    print(f"Assistant: {resp2.output_text}")
+    # The reply should list both user messages, including the first turn about Alice and Python
+
+Notice how the second request only includes the new user message—Plano automatically merges it with the stored conversation history before sending to the LLM.
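+
+For longer conversations, the same chaining can be wrapped in a small helper. The sketch below is one possible shape and is not part of Plano or the OpenAI SDK: the ``Conversation`` class is hypothetical, and it only assumes a client that exposes ``responses.create`` as used above.
+
+.. code-block:: python
+
+    class Conversation:
+        """Hypothetical helper that threads response IDs between turns."""
+
+        def __init__(self, client, model):
+            self.client = client
+            self.model = model
+            self.last_response_id = None
+
+        def send(self, text):
+            kwargs = {"model": self.model, "input": text}
+            # Only chain when there is an earlier response to refer to.
+            if self.last_response_id is not None:
+                kwargs["previous_response_id"] = self.last_response_id
+            response = self.client.responses.create(**kwargs)
+            self.last_response_id = response.id
+            return response.output_text
+
+With the ``client`` from the example above, ``conv = Conversation(client, "gpt-4o-mini")`` followed by repeated ``conv.send(...)`` calls would send only the new message on each turn.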
+
+Configuration Overview
+----------------------
+
+State storage is configured in the ``state_storage`` section of your ``plano_config.yaml``:
+
+.. literalinclude:: ../resources/includes/arch_config_state_storage_example.yaml
+    :language: yaml
+    :lines: 21-30
+    :linenos:
+    :emphasize-lines: 3,6-10
+
+Plano supports two storage backends:
+
+* **Memory**: Fast, ephemeral storage for development and testing. State is lost when Plano restarts.
+* **PostgreSQL**: Durable, production-ready storage with support for Supabase and self-hosted PostgreSQL instances.
+
+.. note::
+   If you don't configure ``state_storage``, conversation state management is **disabled**. The Responses API will still work, but clients must manually include full conversation history in each request (similar to the Chat Completions API behavior).
+
+Memory Storage (Development)
+----------------------------
+
+Memory storage keeps conversation state in-memory using a thread-safe ``HashMap``. It's perfect for local development, demos, and testing, but all state is lost when Plano restarts.
+
+**Configuration**
+
+Add this to your ``plano_config.yaml``:
+
+.. code-block:: yaml
+
+   state_storage:
+     type: memory
+
+That's it. No additional setup required.
+
+**When to Use Memory Storage**
+
+* Local development and debugging
+* Demos and proof-of-concepts
+* Automated testing environments
+* Single-instance deployments where persistence isn't critical
+
+**Limitations**
+
+* State is lost on restart
+* Not suitable for production workloads
+* Cannot scale across multiple Plano instances
+
+PostgreSQL Storage (Production)
+--------------------------------
+
+PostgreSQL storage provides durable, production-grade conversation state management. It works with both self-hosted PostgreSQL and Supabase (PostgreSQL-as-a-service), making it ideal for scaling multi-agent systems in production.
+
+Prerequisites
+^^^^^^^^^^^^^
+
+Before configuring PostgreSQL storage, you need:
+
+1. A PostgreSQL database (version 12 or later)
+2. Database credentials (host, user, password)
+3. The ``conversation_states`` table created in your database
+
+**Setting Up the Database**
+
+Run the SQL schema to create the required table:
+
+.. literalinclude:: ../resources/db_setup/conversation_states.sql
+    :language: sql
+    :linenos:
+
+**Using psql:**
+
+.. code-block:: bash
+
+   psql "$DATABASE_URL" -f docs/db_setup/conversation_states.sql
+
+**Using Supabase Dashboard:**
+
+1. Log in to your Supabase project
+2. Navigate to the SQL Editor
+3. Copy and paste the SQL from ``docs/db_setup/conversation_states.sql``
+4. Run the query
+
+Configuration
+^^^^^^^^^^^^^
+
+Once the database table is created, configure Plano to use PostgreSQL storage:
+
+.. code-block:: yaml
+
+   state_storage:
+     type: postgres
+     connection_string: "postgresql://user:password@host:5432/database"
+
+**Using Environment Variables**
+
+You should **never** hardcode credentials. Use environment variables instead:
+
+.. code-block:: yaml
+
+   state_storage:
+     type: postgres
+     connection_string: "postgresql://myuser:$DB_PASSWORD@db.example.com:5432/postgres"
+
+Then set the environment variable before running Plano:
+
+.. code-block:: bash
+
+   export DB_PASSWORD="your-secure-password"
+   # Run Plano or config validation
+   ./plano
+
+.. warning::
+   **Special Characters in Passwords**: If your password contains special characters like ``#``, ``@``, or ``&``, you must URL-encode them in the connection string. For example, ``MyPass#123`` becomes ``MyPass%23123``.
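+
+One way to produce the encoded value is Python's standard ``urllib.parse.quote``. The sketch below uses the sample password from the warning above; the ``encode_password`` helper is a name chosen here for illustration, and passing ``safe=""`` is a choice made so that every reserved character, including ``/``, is encoded.
+
+.. code-block:: python
+
+    from urllib.parse import quote
+
+    def encode_password(raw_password):
+        # safe="" percent-encodes all reserved characters, including "/".
+        return quote(raw_password, safe="")
+
+    print(encode_password("MyPass#123"))  # prints MyPass%23123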
+
+Supabase Connection Strings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Supabase requires different connection strings depending on your network setup. Most users should use the **Session Pooler** connection string.
+
+**IPv4 Networks (Most Common)**
+
+Use the Session Pooler connection string (port 5432):
+
+.. code-block:: text
+
+   postgresql://postgres.[PROJECT-REF]:[PASSWORD]@aws-0-[REGION].pooler.supabase.com:5432/postgres
+
+**IPv6 Networks**
+
+Use the direct connection (port 5432):
+
+.. code-block:: text
+
+   postgresql://postgres:[PASSWORD]@db.[PROJECT-REF].supabase.co:5432/postgres
+
+**Finding Your Connection String**
+
+1. Go to your Supabase project dashboard
+2. Navigate to **Settings → Database → Connection Pooling**
+3. Copy the **Session mode** connection string
+4. Replace ``[YOUR-PASSWORD]`` with your actual database password
+5. URL-encode special characters in the password
+
+**Example Configuration**
+
+.. code-block:: yaml
+
+   state_storage:
+     type: postgres
+     connection_string: "postgresql://postgres.myproject:$DB_PASSWORD@aws-0-us-west-2.pooler.supabase.com:5432/postgres"
+
+Then set the environment variable:
+
+.. code-block:: bash
+
+   # If your password is "MyPass#123", encode it as "MyPass%23123"
+   export DB_PASSWORD="MyPass%23123"
+
+Troubleshooting
+---------------
+
+**"Table 'conversation_states' does not exist"**
+
+Run the SQL schema from ``docs/db_setup/conversation_states.sql`` against your database.
+
+**Connection errors with Supabase**
+
+* Verify you're using the correct connection string format (Session Pooler for IPv4)
+* Check that your password is URL-encoded if it contains special characters
+* Ensure your Supabase project hasn't paused due to inactivity (free tier)
+
+**Permission errors**
+
+Ensure your database user has the following permissions:
+
+.. code-block:: sql
+
+   GRANT SELECT, INSERT, UPDATE, DELETE ON conversation_states TO your_user;
+
+**State not persisting across requests**
+
+* Verify ``state_storage`` is configured in your ``plano_config.yaml``
+* Check Plano logs for state storage initialization messages
+* Ensure the client sets ``previous_response_id`` to the ``id`` returned by the previous response
+
+Best Practices
+--------------
+
+1. **Use environment variables for credentials**: Never hardcode database passwords in configuration files.
+2. **Start with memory storage for development**: Switch to PostgreSQL when moving to production.
+3. **Implement cleanup policies**: Prevent unbounded growth by regularly archiving or deleting old conversations.
+4. **Monitor storage usage**: Track conversation state table size and query performance in production.
+5. **Test failover scenarios**: Ensure your application handles storage backend failures gracefully.
+
+Next Steps
+----------
+
+* Learn more about building :ref:`agents <agents>` that leverage conversational state
+* Explore :ref:`filter chains <filter_chain>` for enriching conversation context
+* See the :ref:`LLM Providers <llm_providers>` guide for configuring model routing