http-filter: add fully http based demo (remove MCP) (#681)

* http-filter: add fully http based demo (remove MCP) * Fix pre-commit formatting * add instructions about uv build/sync in cli (#675) * Musa/demo fix (#676) * fix demo with travel agent * Update .gitignore * remove sse chunk rendering * ensure that request id is consistent (#677) * ensure that request id is consistent * remove test debug/info statements * Introduce signals change (#655) * adding support for signals * reducing false positives for signals like positive interaction * adding docs. Still need to fix the messages list, but waiting on PR #621 * Improve frustration detection: normalize contractions and refine punctuation * Further refine test cases with longer messages * minor doc changes * fixing echo statement for build * fixing the messages construction and using the trait for signals * update signals docs * fixed some minor doc changes * added more tests and fixed docuemtnation. PR 100% ready * made fixes based on PR comments * Optimize latency 1. replace sliding window approach with trigram containment check 2. add code to pre-compute ngrams for patterns * removed some debug statements to make tests easier to read * PR comments to make ObservableStreamProcessor accept optonal Vec<Messagges> * fixed PR comments --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local> Co-authored-by: MeiyuZhong <mariazhong9612@gmail.com> Co-authored-by: nehcgs <54548843+nehcgs@users.noreply.github.com> * pass request_id in orchestrator and routing model (#678) * release 0.4.2 (#679) * tweaks to web and docs to align to 0.4.2 (#680) * tweaks to web and docs to align to 0.4.2 * made our release banner clickable --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local> * Added request id across agents * fix http_filter agent: request id + pre-commit --------- Co-authored-by: Adil Hafeez <adil@katanemo.com> Co-authored-by: Musa <malikmusa1323@gmail.com> Co-authored-by: Salman Paracha <salman.paracha@gmail.com> Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local> Co-authored-by: nehcgs <54548843+nehcgs@users.noreply.github.com>
2026-07-23 16:51:04 +02:00 · 2026-01-13 13:51:12 -08:00 · 2026-01-13 13:51:12 -08:00 · 6a29f8398f
commit 6a29f8398f
parent ab391f96c7
19 changed files with 3655 additions and 0 deletions
--- a/demos/use_cases/http_filter/Dockerfile
+++ b/demos/use_cases/http_filter/Dockerfile
@ -0,0 +1,26 @@
+FROM python:3.13-slim
+
+WORKDIR /app
+
+# Install bash and uv
+RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
+RUN pip install --no-cache-dir uv
+
+# Copy dependency files
+COPY pyproject.toml README.md ./
+
+# Copy source code
+COPY src/ ./src/
+COPY start_agents.sh ./
+
+# Install dependencies using uv
+RUN uv pip install --system --no-cache click fastmcp pydantic fastapi uvicorn openai
+
+# Make start script executable
+RUN chmod +x start_agents.sh
+
+# Expose ports for all agents
+EXPOSE 10500 10501 10502 10505
+
+# Run the start script with bash
+CMD ["bash", "./start_agents.sh"]
--- a/demos/use_cases/http_filter/README.md
+++ b/demos/use_cases/http_filter/README.md
@ -0,0 +1,128 @@
+# RAG Agent Demo
+
+A multi-agent RAG system demonstrating plano's agent filter chain with MCP protocol.
+
+## Architecture
+
+This demo consists of four components:
+1. **Input Guards** (MCP filter) - Validates queries are within TechCorp's domain
+2. **Query Rewriter** (MCP filter) - Rewrites user queries for better retrieval
+3. **Context Builder** (MCP filter) - Retrieves relevant context from knowledge base
+4. **RAG Agent** (REST) - Generates final responses based on augmented context
+
+## Components
+
+### Input Guards Filter (MCP)
+- **Port**: 10500
+- **Tool**: `input_guards`
+- Validates queries are within TechCorp's domain
+- Rejects queries about other companies or unrelated topics
+
+### Query Rewrit3r Filter (MCP)
+- **Port**: 10501
+- **Tool**: `query_rewriter`
+- Improves queries using LLM before retrieval
+
+### Context Builder Filter (MCP)
+- **Port**: 10502
+- **Tool**: `context_builder`
+- Augments queries with relevant passages from knowledge base
+
+### RAG Agent (REST/OpenAI)
+- **Port**: 10505
+- **Endpoint**: `/v1/chat/completions`
+- Generates responses using OpenAI-compatible API
+
+## Quick Start
+
+### 1. Start everything with Docker Compose
+```bash
+docker compose up --build
+```
+
+This brings up:
+- Input Guards MCP server on port 10500
+- Query Rewriter MCP server on port 10501
+- Context Builder MCP server on port 10502
+- RAG Agent REST server on port 10505
+- Plano listener on port 8001 (and gateway on 12000)
+- Jaeger UI for viewing traces at http://localhost:16686
+- Open WebUI at http://localhost:8080 for interactive queries
+
+> Set `OPENAI_API_KEY` in your environment before running; `LLM_GATEWAY_ENDPOINT` defaults to `http://host.docker.internal:12000/v1`.
+
+### 2. Test the system
+
+**Option A: Using Open WebUI (recommended)**
+
+Navigate to http://localhost:8080 and send queries through the chat interface.
+
+**Option B: Using curl**
+```bash
+curl -X POST http://localhost:8001/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [{"role": "user", "content": "What is the guaranteed uptime for TechCorp?"}]
+  }'
+```
+
+## Configuration
+
+The `config.yaml` defines how agents are connected:
+
+```yaml
+filters:
+  - id: input_guards
+    url: http://host.docker.internal:10500
+    # type: mcp (default)
+    # tool: input_guards (default - same as filter id)
+
+  - id: query_rewriter
+    url: http://host.docker.internal:10501
+    # type: mcp (default)
+
+  - id: context_builder
+    url: http://host.docker.internal:10502
+```
+
+## How It Works
+
+1. User sends request to plano listener on port 8001
+2. Request passes through MCP filter chain:
+   - **Input Guards** validates the query is within TechCorp's domain
+   - **Query Rewriter** rewrites the query for better retrieval
+   - **Context Builder** augments query with relevant knowledge base passages
+3. Augmented request is forwarded to **RAG Agent** REST endpoint
+4. RAG Agent generates final response using LLM
+
+## Additional Configuration
+
+See `config.yaml` for the complete filter chain setup. The MCP filters use default settings:
+- `type: mcp` (default)
+- `transport: streamable-http` (default)
+- Tool name defaults to filter ID
+
+See `sample_queries.md` for example queries to test the RAG system.
+
+Example request:
+```bash
+curl -X POST http://localhost:8001/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [
+      {
+        "role": "user",
+        "content": "What is the guaranteed uptime for TechCorp?"
+      }
+    ]
+  }'
+```
+- `LLM_GATEWAY_ENDPOINT` - lpano endpoint (default: `http://localhost:12000/v1`)
+- `OPENAI_API_KEY` - OpenAI API key for model providers
+
+## Additional Resources
+
+- See `sample_queries.md` for more example queries
+- See `config.yaml` for complete configuration details
--- a/demos/use_cases/http_filter/config.yaml
+++ b/demos/use_cases/http_filter/config.yaml
@ -0,0 +1,50 @@
+version: v0.3.0
+
+agents:
+  - id: rag_agent
+    url: http://rag-agents:10505
+
+filters:
+  - id: input_guards
+    url: http://rag-agents:10500
+    type: http
+    # type: mcp (default)
+    # transport: streamable-http (default)
+    # tool: input_guards (default - same as filter id)
+  - id: query_rewriter
+    url: http://rag-agents:10501
+    type: http
+    # type: mcp (default)
+    # transport: streamable-http (default)
+    # tool: query_rewriter (default - same as filter id)
+  - id: context_builder
+    url: http://rag-agents:10502
+    type: http
+
+model_providers:
+  - model: openai/gpt-4o-mini
+    access_key: $OPENAI_API_KEY
+    default: true
+  - model: openai/gpt-4o
+    access_key: $OPENAI_API_KEY
+
+model_aliases:
+  fast-llm:
+    target: gpt-4o-mini
+  smart-llm:
+    target: gpt-4o
+
+listeners:
+  - type: agent
+    name: agent_1
+    port: 8001
+    router: plano_orchestrator_v1
+    agents:
+      - id: rag_agent
+        description: virtual assistant for retrieval augmented generation tasks
+        filter_chain:
+          - input_guards
+          - query_rewriter
+          - context_builder
+tracing:
+  random_sampling: 100
--- a/demos/use_cases/http_filter/docker-compose.yaml
+++ b/demos/use_cases/http_filter/docker-compose.yaml
@ -0,0 +1,42 @@
+services:
+  rag-agents:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports:
+      - "10500:10500"
+      - "10501:10501"
+      - "10502:10502"
+      - "10505:10505"
+    environment:
+      - LLM_GATEWAY_ENDPOINT=${LLM_GATEWAY_ENDPOINT:-http://host.docker.internal:12000/v1}
+      - OPENAI_API_KEY=${OPENAI_API_KEY:?OPENAI_API_KEY environment variable is required but not set}
+  plano:
+    build:
+      context: ../../../
+      dockerfile: Dockerfile
+    ports:
+      - "12000:12000"
+      - "8001:8001"
+    environment:
+      - ARCH_CONFIG_PATH=/config/config.yaml
+      - OPENAI_API_KEY=${OPENAI_API_KEY:?OPENAI_API_KEY environment variable is required but not set}
+    volumes:
+      - ./config.yaml:/app/arch_config.yaml
+      - /etc/ssl/cert.pem:/etc/ssl/cert.pem
+  jaeger:
+    build:
+      context: ../../shared/jaeger
+    ports:
+      - "16686:16686"
+      - "4317:4317"
+      - "4318:4318"
+  open-web-ui:
+    image: dyrnq/open-webui:main
+    restart: always
+    ports:
+      - "8080:8080"
+    environment:
+      - DEFAULT_MODEL=gpt-4o-mini
+      - ENABLE_OPENAI_API=true
+      - OPENAI_API_BASE_URL=http://host.docker.internal:8001/v1
--- a/demos/use_cases/http_filter/http.rest
+++ b/demos/use_cases/http_filter/http.rest
@ -0,0 +1,77 @@
+# http.rest (replacement for mcp_query.rest)
+
+@host = http://localhost
+@model = gpt-4o-mini
+
+# Filter endpoints (HTTP-only)
+@inputGuards = {{host}}:10500
+@queryRewriter = {{host}}:10501
+@contextBuilder = {{host}}:10502
+
+# Plano agent listener (the thing Open WebUI calls)
+@planoAgent = {{host}}:8001
+
+
+POST {{queryRewriter}}/
+Content-Type: application/json
+
+[
+  {
+    "role": "user",
+    "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
+  }
+]
+
+
+
+POST {{inputGuards}}/
+Content-Type: application/json
+
+[
+  {
+    "role": "user",
+    "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
+  }
+]
+
+
+
+POST {{contextBuilder}}/
+Content-Type: application/json
+
+[
+  {
+    "role": "user",
+    "content": "What is TechCorp's uptime SLA?"
+  }
+]
+
+
+POST {{planoAgent}}/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "fast-llm",
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
+    }
+  ],
+  "stream": false
+}
+
+
+POST {{planoAgent}}/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "fast-llm",
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
+    }
+  ],
+  "stream": true
+}
--- a/demos/use_cases/http_filter/mcp_query.rest
+++ b/demos/use_cases/http_filter/mcp_query.rest
@ -0,0 +1,86 @@
+### Initialize MCP Session (SSE)
+POST http://localhost:10501/mcp
+Content-Type: application/json
+Accept: application/json, text/event-stream
+
+{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"capabilities":{},"protocolVersion":"2024-11-05","clientInfo":{"name":"test","version":"1.0.0"}}}
+
+### Send Initialized Notification
+POST http://localhost:10501/mcp
+Content-Type: application/json
+Accept: application/json, text/event-stream
+mcp-session-id: 35d455dc07b8400887f86668590f12bb
+
+{
+  "jsonrpc": "2.0",
+  "method": "notifications/initialized"
+}
+
+### List Tools
+POST http://localhost:10501/mcp
+Content-Type: application/json
+Accept: application/json, text/event-stream
+mcp-session-id: eb10a691b36e4547b6c93c5dc5b47e11
+
+{
+  "jsonrpc": "2.0",
+  "id": "list-tools-1",
+  "method": "tools/list"
+}
+
+### Call Query Rewriter Tool
+POST http://localhost:10501/mcp
+Content-Type: application/json
+Accept: application/json, text/event-stream
+mcp-session-id: 6b95ff75825a402b90eb3ea07e23fbce
+
+{
+  "jsonrpc": "2.0",
+  "id": "3d3b886a-6216-4a26-a422-7a972529c0e7",
+  "method": "tools/call",
+  "params": {
+    "arguments": {
+      "messages": [
+        {
+          "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?",
+          "role": "user"
+        }
+      ]
+    },
+    "name": "query_rewriter"
+  }
+}
+
+### another test
+
+# Content-Type: application/json
+# Accept: application/json, text/event-stream
+# mcp-session-id: ed7a81a1d39549ecaadb867a6b2daf1e
+
+POST http://localhost:10501/mcp
+content-type: application/json
+mcp-session-id: e4ec1ae904e14e06b7d194da10e5f74c
+accept: application/json, text/event-stream
+
+{"jsonrpc":"2.0","id":"4bb1043a-2953-4bcd-b801-f270b0ae8c39","method":"tools/call","params":{"arguments":{"messages":[{"content":"What is the guaranteed uptime percentage for TechCorp's cloud services?","role":"user"}]},"name":"query_rewriter"}}
+
+
+
+### stream test
+
+POST http://localhost:10501/mcp
+content-type: application/json
+mcp-session-id: 35d455dc07b8400887f86668590f12bb
+accept: application/json, text/event-stream
+
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "method": "tools/call",
+  "params": {
+    "name": "long_job",
+    "arguments": {
+      "n": 3
+    }
+  }
+}
--- a/demos/use_cases/http_filter/pyproject.toml
+++ b/demos/use_cases/http_filter/pyproject.toml
@ -0,0 +1,22 @@
+[project]
+name = "rag_agent"
+version = "0.1.0"
+description = "RAG Agent"
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+    "click>=8.2.1",
+    "mcp>=1.13.1",
+    "fastmcp>=2.14",
+    "pydantic>=2.11.7",
+    "fastapi>=0.104.1",
+    "uvicorn>=0.24.0",
+    "openai==2.13.0",
+]
+
+[project.scripts]
+rag_agent = "rag_agent:main"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
--- a/demos/use_cases/http_filter/sample_queries.md
+++ b/demos/use_cases/http_filter/sample_queries.md
@ -0,0 +1,64 @@
+# Sample Queries for Knowledge Base RAG Agent
+
+## Service Level Agreement Queries
+- What is the guaranteed uptime percentage for TechCorp's cloud services?
+- What remedies are available if the API response time exceeds the agreed threshold?
+- How quickly must TechCorp respond to critical support issues?
+- What monitoring and reporting requirements are specified in the SLA?
+- When was the TechCorp service agreement signed and by whom?
+
+## Privacy Policy Queries
+- What encryption methods does DataSecure use to protect data?
+- How long does DataSecure retain personal data after account deletion?
+- What rights do users have regarding their personal information?
+- Can DataSecure sell user data to third parties for marketing?
+- Who should be contacted for privacy-related concerns at DataSecure?
+
+## Supply Chain Agreement Queries
+- What types of automotive components does PrecisionParts supply?
+- What are the payment terms and volume discount structure?
+- What quality standards must the supplied components meet?
+- What are the penalties for late delivery?
+- What insurance coverage requirements apply to the supplier?
+
+## Student Data Management Queries
+- What federal laws must EduTech comply with regarding student data?
+- What security measures are in place to protect student information?
+- How long are student records retained after graduation?
+- What consent is required for students under 13 years old?
+- Who can access student educational records?
+
+## Investment Advisory Queries
+- What is FinanceFirst's management fee structure?
+- What types of investments are included in the advisory services?
+- What regulatory body oversees FinanceFirst Advisors?
+- How often are portfolio reviews conducted?
+- What are the client's responsibilities under this agreement?
+
+## Healthcare Standards Queries
+- What is the target response time for emergency code teams?
+- What hand hygiene compliance rate is required?
+- How quickly must medical records be completed after patient encounters?
+- What continuing education requirements apply to nursing staff?
+- What patient safety protocols are mandatory upon admission?
+
+## Cross-Document Queries
+- Which agreements include confidentiality or data protection provisions?
+- What are the common termination notice periods across different contract types?
+- Which documents specify insurance or liability coverage requirements?
+- What compliance and regulatory requirements are mentioned across agreements?
+- Which contracts include performance metrics or service level commitments?
+
+## Complex Analysis Queries
+- Compare the data retention policies across the privacy policy and student data management documents.
+- What are the different approaches to risk management across the supply chain and investment advisory agreements?
+- How do the security measures in the healthcare standards compare to those in the privacy policy?
+- Which agreements provide the most detailed compliance and regulatory frameworks?
+- What common themes exist in the quality assurance requirements across different industries?
+
+## Document-Specific Detail Queries
+- List all the specific percentages, timeframes, and numerical requirements mentioned in the SLA.
+- What are all the contact persons and their roles mentioned across the documents?
+- Identify all the compliance standards and certifications referenced in the supply chain agreement.
+- What are the specific consequences or penalties mentioned for non-compliance across agreements?
+- List all the third-party systems, tools, or services mentioned in the documents.
--- a/demos/use_cases/http_filter/src/rag_agent/init.py
+++ b/demos/use_cases/http_filter/src/rag_agent/init.py
@ -0,0 +1,109 @@
+import click
+from fastmcp import FastMCP
+
+mcp = None
+
+
+@click.command()
+@click.option(
+    "--transport",
+    "transport",
+    default="streamable-http",
+    help="Transport type: stdio or sse",
+)
+@click.option("--host", "host", default="localhost", help="Host to bind MCP server to")
+@click.option("--port", "port", type=int, default=10500, help="Port for MCP server")
+@click.option(
+    "--agent",
+    "agent",
+    required=True,
+    help="Agent name: query_rewriter, context_builder, or response_generator",
+)
+@click.option(
+    "--name",
+    "agent_name",
+    default=None,
+    help="Custom MCP server name (defaults to agent type)",
+)
+@click.option(
+    "--rest-server",
+    "rest_server",
+    is_flag=True,
+    help="Start REST server instead of MCP server",
+)
+@click.option("--rest-port", "rest_port", default=8000, help="Port for REST server")
+def main(host, port, agent, transport, agent_name, rest_server, rest_port):
+    """Start a RAG agent as an MCP server or REST server."""
+
+    # Map friendly names to agent modules
+    agent_map = {
+        "input_guards": ("rag_agent.input_guards", "Input Guards Agent"),
+        "query_rewriter": ("rag_agent.query_rewriter", "Query Rewriter Agent"),
+        "context_builder": ("rag_agent.context_builder", "Context Builder Agent"),
+        "response_generator": (
+            "rag_agent.rag_agent",
+            "Response Generator Agent",
+        ),
+    }
+
+    if agent not in agent_map:
+        print(f"Error: Unknown agent '{agent}'")
+        print(f"Available agents: {', '.join(agent_map.keys())}")
+        return
+
+    module_name, default_name = agent_map[agent]
+    mcp_name = agent_name or default_name
+
+    if rest_server:
+        # REST server mode - supported for query_rewriter and response_generator
+        if agent == "response_generator":
+            print(f"Starting REST server on {host}:{rest_port} for agent: {agent}")
+            from rag_agent.rag_agent import start_server
+
+            start_server(host=host, port=rest_port)
+            return
+        elif agent == "query_rewriter":
+            print(f"Starting REST server on {host}:{rest_port} for agent: {agent}")
+            from rag_agent.query_rewriter import start_server
+
+            start_server(host=host, port=rest_port)
+            return
+        else:
+            print(f"Error: Agent '{agent}' does not support REST server mode.")
+            print(
+                f"REST server is only supported for: query_rewriter, response_generator"
+            )
+            print(f"Remove --rest-server flag to start {agent} as an MCP server.")
+            return
+    else:
+        # Only input_guards, query_rewriter and context_builder support MCP
+        if agent not in ["input_guards", "query_rewriter", "context_builder"]:
+            print(f"Error: Agent '{agent}' does not support MCP mode.")
+            print(
+                f"MCP is only supported for: input_guards, query_rewriter, context_builder"
+            )
+            print(f"Use --rest-server flag to start {agent} as a REST server.")
+            return
+
+        global mcp
+        mcp = FastMCP(mcp_name, host=host, port=port)
+
+        print(f"Starting MCP server: {mcp_name}")
+        print(f"  Agent: {agent}")
+        print(f"  Transport: {transport}")
+        print(f"  Host: {host}")
+        print(f"  Port: {port}")
+
+        # Import the agent module to register its tools
+        import importlib
+
+        importlib.import_module(module_name)
+
+        print(f"Agent '{agent}' loaded successfully")
+        print(f"MCP server ready on {transport}://{host}:{port}")
+
+        mcp.run(transport=transport)
+
+
+if __name__ == "__main__":
+    main()
--- a/demos/use_cases/http_filter/src/rag_agent/main.py
+++ b/demos/use_cases/http_filter/src/rag_agent/main.py
@ -0,0 +1,4 @@
+from . import main
+
+if __name__ == "__main__":
+    main()
--- a/demos/use_cases/http_filter/src/rag_agent/api.py
+++ b/demos/use_cases/http_filter/src/rag_agent/api.py
@ -0,0 +1,36 @@
+from pydantic import BaseModel
+from typing import List, Optional, Dict, Any
+
+
+class ChatMessage(BaseModel):
+    role: str
+    content: str
+
+
+class ChatCompletionRequest(BaseModel):
+    model: str
+    messages: List[ChatMessage]
+    temperature: Optional[float] = 1.0
+    max_tokens: Optional[int] = None
+    top_p: Optional[float] = 1.0
+    frequency_penalty: Optional[float] = 0.0
+    presence_penalty: Optional[float] = 0.0
+    stream: Optional[bool] = False
+    stop: Optional[List[str]] = None
+
+
+class ChatCompletionResponse(BaseModel):
+    id: str
+    object: str = "chat.completion"
+    created: int
+    model: str
+    choices: List[Dict[str, Any]]
+    usage: Dict[str, int]
+
+
+class ChatCompletionStreamResponse(BaseModel):
+    id: str
+    object: str = "chat.completion.chunk"
+    created: int
+    model: str
+    choices: List[Dict[str, Any]]
--- a/demos/use_cases/http_filter/src/rag_agent/context_builder.py
+++ b/demos/use_cases/http_filter/src/rag_agent/context_builder.py
@ -0,0 +1,228 @@
+import json
+from typing import List, Optional, Dict, Any
+from openai import AsyncOpenAI
+import os
+import logging
+import csv
+from pathlib import Path
+
+from .api import ChatMessage
+from fastapi import Request, FastAPI
+
+# from . import mcp
+# from fastmcp.server.dependencies import get_http_headers
+
+# Set up logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - [CONTEXT_BUILDER]    - %(levelname)s - %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+### add new setup
+app = FastAPI(title="RAG Agent Context Builder", version="1.0.0")
+
+# Configuration for archgw LLM gateway
+LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
+RAG_MODEL = "gpt-4o-mini"
+
+# Initialize OpenAI client for archgw
+archgw_client = AsyncOpenAI(
+    base_url=LLM_GATEWAY_ENDPOINT,
+    api_key="EMPTY",  # archgw doesn't require a real API key
+)
+
+# Global variable to store the knowledge base
+knowledge_base = []
+
+
+def load_knowledge_base():
+    """Load the sample_knowledge_base.csv file into memory on startup."""
+    global knowledge_base
+
+    # Get the path to the CSV file relative to this script
+    current_dir = Path(__file__).parent
+    csv_path = current_dir / "sample_knowledge_base.csv"
+
+    print(f"Loading knowledge base from {csv_path}")
+
+    try:
+        knowledge_base = []
+        with open(csv_path, "r", encoding="utf-8-sig") as file:
+            csv_reader = csv.DictReader(file)
+            for row in csv_reader:
+                knowledge_base.append({"path": row["path"], "content": row["content"]})
+
+        logger.info(f"Loaded {len(knowledge_base)} documents from knowledge base")
+
+    except Exception as e:
+        logger.error(f"Error loading knowledge base: {e}")
+        knowledge_base = []
+
+
+async def find_relevant_passages(
+    query: str,
+    traceparent: Optional[str] = None,
+    request_id: Optional[str] = None,
+    top_k: int = 3,
+) -> List[Dict[str, str]]:
+    """Use the LLM to find the most relevant passages from the knowledge base."""
+
+    if not knowledge_base:
+        logger.warning("Knowledge base is empty")
+        return []
+
+    # Create a system prompt for passage selection
+    system_prompt = f"""You are a retrieval assistant that selects the most relevant document passages for a given query.
+
+                    Given a user query and a list of document passages, identify the {top_k} most relevant passages that would help answer the query.
+
+                    Query: {query}
+
+                    Available passages:
+                    """
+
+    # Add all passages with indices
+    for i, doc in enumerate(knowledge_base):
+        system_prompt += (
+            f"\n[{i}] Path: {doc['path']}\nContent: {doc['content'][:500]}...\n"
+        )
+
+    system_prompt += f"""
+
+        Please respond with ONLY the indices of the {top_k} most relevant passages, separated by commas (e.g., "0,3,7").
+        If fewer than {top_k} passages are relevant, return only the relevant ones.
+        If no passages are relevant, return "NONE"."""
+
+    try:
+        # Call archgw to select relevant passages
+        logger.info(f"Calling archgw to find relevant passages for query: '{query}'")
+
+        # Prepare extra headers if traceparent is provided
+        extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
+        if traceparent:
+            extra_headers["traceparent"] = traceparent
+
+        response = await archgw_client.chat.completions.create(
+            model=RAG_MODEL,
+            messages=[{"role": "system", "content": system_prompt}],
+            temperature=0.1,
+            max_tokens=50,
+            extra_headers=extra_headers,
+        )
+
+        result = response.choices[0].message.content.strip()
+        logger.info(f"LLM selected passages: {result}")
+
+        # Parse the indices
+        if result.upper() == "NONE":
+            return []
+
+        selected_passages = []
+        indices = [
+            int(idx.strip()) for idx in result.split(",") if idx.strip().isdigit()
+        ]
+
+        for idx in indices:
+            if 0 <= idx < len(knowledge_base):
+                selected_passages.append(knowledge_base[idx])
+
+        logger.info(f"Selected {len(selected_passages)} relevant passages")
+        return selected_passages
+
+    except Exception as e:
+        logger.error(f"Error finding relevant passages: {e}")
+        return []
+
+
+async def augment_query_with_context(
+    messages: List[ChatMessage],
+    traceparent: Optional[str] = None,
+    request_id: Optional[str] = None,
+) -> List[ChatMessage]:
+    """Extract user query, find relevant context, and augment the messages."""
+
+    # Find the last user message
+    last_user_message = None
+    last_user_index = -1
+
+    for i in range(len(messages) - 1, -1, -1):
+        if messages[i].role == "user":
+            last_user_message = messages[i].content
+            last_user_index = i
+            break
+
+    if not last_user_message:
+        logger.warning("No user message found in conversation")
+        return messages
+
+    logger.info(f"Processing user query: '{last_user_message}'")
+
+    # Find relevant passages
+    relevant_passages = await find_relevant_passages(
+        last_user_message, traceparent, request_id
+    )
+
+    if not relevant_passages:
+        logger.info("No relevant passages found, returning original messages")
+        return messages
+
+    # Build context from relevant passages
+    context_parts = []
+    for i, passage in enumerate(relevant_passages):
+        context_parts.append(
+            f"Document {i+1} ({passage['path']}):\n{passage['content']}"
+        )
+
+    context = "\n\n".join(context_parts)
+
+    # Create augmented content with original query and context
+    augmented_content = f"""{last_user_message} RELEVANT CONTEXT:
+    {context}"""
+
+    # Create updated messages with the augmented query
+    updated_messages = messages.copy()
+    updated_messages[last_user_index] = ChatMessage(
+        role="user", content=augmented_content
+    )
+
+    logger.info(f"Augmented user query with {len(relevant_passages)} relevant passages")
+
+    return updated_messages
+
+
+# Load knowledge base on module import
+load_knowledge_base()
+
+
+@app.post("/")
+async def context_builder(
+    messages: List[ChatMessage], request: Request
+) -> List[ChatMessage]:
+    """MCP tool that augments user queries with relevant context from the knowledge base."""
+    logger.info(f"Received chat completion request with {len(messages)} messages")
+
+    # Get traceparent header from MCP request
+    # headers = get_http_headers()
+    # traceparent_header = headers.get("traceparent")
+
+    traceparent_header = request.headers.get("traceparent")
+    request_id = request.headers.get("x-request-id")
+
+    if traceparent_header:
+        logger.info(f"Received traceparent header: {traceparent_header}")
+    else:
+        logger.info("No traceparent header found")
+
+    # Augment the user query with relevant context
+    updated_messages = await augment_query_with_context(
+        messages, traceparent_header, request_id
+    )
+
+    # Return as dict to minimize text serialization
+    return [{"role": msg.role, "content": msg.content} for msg in updated_messages]
+
+
+# Register MCP tool only if mcp is available
+# if mcp is not None:
+#     mcp.tool()(context_builder)
--- a/demos/use_cases/http_filter/src/rag_agent/input_guards.py
+++ b/demos/use_cases/http_filter/src/rag_agent/input_guards.py
@ -0,0 +1,172 @@
+import asyncio
+import json
+import time
+from typing import List, Optional, Dict, Any
+import uuid
+from fastapi import FastAPI, Depends, Request, HTTPException
+
+# from fastmcp.exceptions import ToolError
+from openai import AsyncOpenAI
+import os
+import logging
+
+from .api import ChatCompletionRequest, ChatCompletionResponse, ChatMessage
+from . import mcp
+
+# from fastmcp.server.dependencies import get_http_headers
+
+# Set up logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - [INPUT_GUARDS]       - %(levelname)s - %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+# Configuration for archgw LLM gateway
+LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
+GUARD_MODEL = "gpt-4o-mini"
+
+# Initialize OpenAI client for archgw
+archgw_client = AsyncOpenAI(
+    base_url=LLM_GATEWAY_ENDPOINT,
+    api_key="EMPTY",  # archgw doesn't require a real API key
+)
+
+app = FastAPI(title="RAG Agent Input Guards", version="1.0.0")
+
+
+async def validate_query_scope(
+    messages: List[ChatMessage],
+    traceparent_header: Optional[str] = None,
+    request_id: Optional[str] = None,
+) -> Dict[str, Any]:
+    """Validate that the user query is within TechCorp's domain.
+
+    Returns a dict with:
+        - is_valid: bool indicating if query is within scope
+        - reason: str explaining why query is out of scope (if applicable)
+    """
+    system_prompt = """You are an input validation guard for TechCorp's customer support system.
+
+Your job is to determine if a user's query is related to TechCorp and its services/products.
+
+TechCorp is a technology company that provides:
+- Cloud services and infrastructure
+- SaaS products
+- Technical support
+- Service level agreements (SLAs)
+- Uptime guarantees
+- Enterprise solutions
+
+ALLOW queries about:
+- TechCorp's services, products, or offerings
+- TechCorp's pricing, SLAs, uptime, or policies
+- Technical support for TechCorp products
+- General questions about TechCorp as a company
+
+REJECT queries about:
+- Other companies or their products
+- General knowledge questions unrelated to TechCorp
+- Personal advice or topics outside TechCorp's domain
+- Anything that doesn't relate to TechCorp's business
+
+Respond in JSON format:
+{
+    "is_valid": true/false,
+    "reason": "brief explanation if invalid"
+}"""
+
+    # Get the last user message for validation
+    last_user_message = None
+    for msg in reversed(messages):
+        if msg.role == "user":
+            last_user_message = msg.content
+            break
+
+    if not last_user_message:
+        return {"is_valid": True, "reason": ""}
+
+    # Prepare messages for the guard
+    guard_messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": f"Query to validate: {last_user_message}"},
+    ]
+
+    try:
+        # Call archgw using OpenAI client
+        extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
+        if traceparent_header:
+            extra_headers["traceparent"] = traceparent_header
+
+        logger.info(f"Validating query scope: '{last_user_message}'")
+        response = await archgw_client.chat.completions.create(
+            model=GUARD_MODEL,
+            messages=guard_messages,
+            temperature=0.1,
+            max_tokens=150,
+            extra_headers=extra_headers,
+        )
+
+        result_text = response.choices[0].message.content.strip()
+
+        # Parse JSON response
+        try:
+            result = json.loads(result_text)
+            logger.info(f"Validation result: {result}")
+            return result
+        except json.JSONDecodeError:
+            logger.error(f"Failed to parse validation response: {result_text}")
+            # Default to allowing if parsing fails
+            return {"is_valid": True, "reason": ""}
+
+    except Exception as e:
+        logger.error(f"Error validating query: {e}")
+        # Default to allowing if validation fails
+        return {"is_valid": True, "reason": ""}
+
+
+# @mcp.tool
+@app.post("/")
+async def input_guards(
+    messages: List[ChatMessage], request: Request
+) -> List[ChatMessage]:
+    """Input guard that validates queries are within TechCorp's domain.
+
+    If the query is out of scope, replaces the user message with a rejection notice.
+    """
+    logger.info(f"Received request with {len(messages)} messages")
+
+    # Get traceparent header from HTTP request using FastMCP's dependency function
+    # headers = get_http_headers()
+    # traceparent_header = headers.get("traceparent")
+    traceparent_header = request.headers.get("traceparent")
+    request_id = request.headers.get("x-request-id")
+
+    if traceparent_header:
+        logger.info(f"Received traceparent header: {traceparent_header}")
+    else:
+        logger.info("No traceparent header found")
+
+    # Validate the query scope
+    validation_result = await validate_query_scope(
+        messages, traceparent_header, request_id
+    )
+
+    if not validation_result.get("is_valid", True):
+        reason = validation_result.get("reason", "Query is outside TechCorp's domain")
+        logger.warning(f"Query rejected: {reason}")
+
+        # Throw ToolError
+        error_message = f"I apologize, but I can only assist with questions related to TechCorp and its services. Your query appears to be outside this scope. {reason}\n\nPlease ask me about TechCorp's products, services, pricing, SLAs, or technical support."
+        # raise ToolError(error_message)
+        raise HTTPException(
+            status_code=400, detail={"error": "out_of_scope", "message": error_message}
+        )
+
+    logger.info("Query validation passed - forwarding to next filter")
+    return messages
+
+
+@app.get("/health")
+async def health():
+    return {"status": "healthy"}
--- a/demos/use_cases/http_filter/src/rag_agent/query_rewriter.py
+++ b/demos/use_cases/http_filter/src/rag_agent/query_rewriter.py
@ -0,0 +1,133 @@
+import asyncio
+import json
+import time
+from typing import List, Optional, Dict, Any
+import uuid
+from fastapi import FastAPI, Depends, Request
+from openai import AsyncOpenAI
+import os
+import logging
+
+from .api import ChatCompletionRequest, ChatCompletionResponse, ChatMessage
+
+# from . import mcp
+# from fastmcp.server.dependencies import get_http_headers
+
+# Set up logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - [QUERY_REWRITER]     - %(levelname)s - %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+# Configuration for archgw LLM gateway
+LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
+QUERY_REWRITE_MODEL = "gpt-4o-mini"
+
+# Initialize OpenAI client for archgw
+archgw_client = AsyncOpenAI(
+    base_url=LLM_GATEWAY_ENDPOINT,
+    api_key="EMPTY",  # archgw doesn't require a real API key
+)
+
+app = FastAPI(title="RAG Agent Query Rewriter", version="1.0.0")
+
+
+async def rewrite_query_with_archgw(
+    messages: List[ChatMessage],
+    traceparent_header: Optional[str] = None,
+    request_id: Optional[str] = None,
+) -> str:
+    """Rewrite the last user message for better retrieval. Returns the rewritten query."""
+    system_prompt = """You are a query rewriter that improves user queries for better retrieval.
+
+Given a conversation history, rewrite the last user message to be more specific and context-aware.
+The rewritten query should:
+1. Include relevant context from previous messages
+2. Be clear and specific for information retrieval
+3. Maintain the user's intent
+4. Be concise but comprehensive
+
+Return only the rewritten query, nothing else."""
+
+    rewrite_messages = [{"role": "system", "content": system_prompt}]
+    for msg in messages:
+        rewrite_messages.append({"role": msg.role, "content": msg.content})
+
+    extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
+    if traceparent_header:
+        extra_headers["traceparent"] = traceparent_header
+
+    try:
+        logger.info(f"Calling archgw at {LLM_GATEWAY_ENDPOINT} to rewrite query")
+        resp = await archgw_client.chat.completions.create(
+            model=QUERY_REWRITE_MODEL,
+            messages=rewrite_messages,
+            temperature=0.3,
+            max_tokens=200,
+            extra_headers=extra_headers,
+        )
+        rewritten = resp.choices[0].message.content.strip()
+        logger.info(f"Query rewritten successfully: '{rewritten}'")
+        return rewritten
+    except Exception as e:
+        logger.error(f"Error rewriting query: {e}")
+
+    # Fallback: return the original last user message
+    for m in reversed(messages):
+        if m.role == "user":
+            logger.info("Falling back to original user message")
+            return m.content
+    return ""
+
+
+@app.post("/")
+async def query_rewriter_http(
+    messages: List[ChatMessage], request: Request
+) -> List[ChatMessage]:
+    """HTTP filter endpoint used by Plano (type: http)."""
+    logger.info(f"Received request with {len(messages)} messages")
+
+    traceparent_header = request.headers.get("traceparent")
+    request_id = request.headers.get("x-request-id")
+
+    if traceparent_header:
+        logger.info(f"Received traceparent header: {traceparent_header}")
+    else:
+        logger.info("No traceparent header found")
+
+    rewritten_query = await rewrite_query_with_archgw(
+        messages, traceparent_header, request_id
+    )
+    # Create updated messages with the rewritten query
+    updated_messages = messages.copy()
+
+    # Find and update the last user message with the rewritten query
+    for i in range(len(updated_messages) - 1, -1, -1):
+        if updated_messages[i].role == "user":
+            original_query = updated_messages[i].content
+            updated_messages[i] = ChatMessage(role="user", content=rewritten_query)
+            logger.info(
+                f"Updated user query from '{original_query}' to '{rewritten_query}'"
+            )
+            break
+    updated_messages_data = [
+        {"role": msg.role, "content": msg.content} for msg in updated_messages
+    ]
+    updated_messages = [ChatMessage(**msg) for msg in updated_messages_data]
+
+    logger.info("Returning rewritten chat completion response")
+    return updated_messages
+
+
+@app.get("/health")
+async def health():
+    return {"status": "healthy"}
+
+
+def start_server(host: str = "0.0.0.0", port: int = 10501):
+    """Start the FastAPI server for query rewriter."""
+    import uvicorn
+
+    logger.info(f"Starting Query Rewriter REST server on {host}:{port}")
+    uvicorn.run(app, host=host, port=port)
--- a/demos/use_cases/http_filter/src/rag_agent/rag_agent.py
+++ b/demos/use_cases/http_filter/src/rag_agent/rag_agent.py
@ -0,0 +1,221 @@
+import json
+from fastapi import FastAPI, Request
+from fastapi.responses import StreamingResponse
+from openai import AsyncOpenAI
+import os
+import logging
+import time
+import uuid
+import uvicorn
+import asyncio
+
+from .api import (
+    ChatCompletionRequest,
+    ChatCompletionResponse,
+    ChatCompletionStreamResponse,
+)
+
+# Set up logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - [RESPONSE_GENERATOR] - %(levelname)s - %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+# Configuration for archgw LLM gateway
+LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
+RESPONSE_MODEL = "gpt-4o"
+
+# System prompt for response generation
+SYSTEM_PROMPT = """You are a helpful assistant that generates coherent, contextual responses.
+
+Given a conversation history, generate a helpful and relevant response based on all the context available in the messages.
+Your response should:
+1. Be contextually aware of the entire conversation
+2. Address the user's needs appropriately
+3. Be helpful and informative
+4. Maintain a natural conversational tone
+
+Generate a complete response to assist the user."""
+
+# Initialize OpenAI client for archgw
+archgw_client = AsyncOpenAI(
+    base_url=LLM_GATEWAY_ENDPOINT,
+    api_key="EMPTY",  # archgw doesn't require a real API key
+)
+
+# FastAPI app for REST server
+app = FastAPI(title="RAG Agent Response Generator", version="1.0.0")
+
+
+def prepare_response_messages(request_body: ChatCompletionRequest):
+    """Prepare messages for response generation by adding system prompt."""
+    response_messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+
+    # Add conversation history
+    for msg in request_body.messages:
+        response_messages.append({"role": msg.role, "content": msg.content})
+
+    return response_messages
+
+
+@app.post("/v1/chat/completions")
+async def chat_completion_http(request: Request, request_body: ChatCompletionRequest):
+    """HTTP endpoint for chat completions with streaming support."""
+    logger.info(
+        f"Received chat completion request with {len(request_body.messages)} messages"
+    )
+
+    # Get traceparent header from HTTP request
+    traceparent_header = request.headers.get("traceparent")
+    request_id = request.headers.get("x-request-id") or f"req-{uuid.uuid4().hex}"
+
+    if traceparent_header:
+        logger.info(f"Received traceparent header: {traceparent_header}")
+    else:
+        logger.info("No traceparent header found")
+
+    return StreamingResponse(
+        stream_chat_completions(request_body, traceparent_header, request_id),
+        media_type="text/plain",
+        headers={
+            "content-type": "text/event-stream",
+            "x-request-id": request_id,
+        },
+    )
+
+
+async def stream_chat_completions(
+    request_body: ChatCompletionRequest,
+    traceparent_header: str = None,
+    request_id: str = None,
+):
+    """Generate streaming chat completions."""
+    # Prepare messages for response generation
+    response_messages = prepare_response_messages(request_body)
+
+    try:
+        # Call archgw using OpenAI client for streaming
+        logger.info(
+            f"Calling archgw at {LLM_GATEWAY_ENDPOINT} to generate streaming response"
+        )
+
+        # Prepare extra headers if traceparent is provided
+        extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
+        if traceparent_header:
+            extra_headers["traceparent"] = traceparent_header
+
+        response_stream = await archgw_client.chat.completions.create(
+            model=RESPONSE_MODEL,
+            messages=response_messages,
+            temperature=request_body.temperature or 0.7,
+            max_tokens=request_body.max_tokens or 1000,
+            stream=True,
+            extra_headers=extra_headers,
+        )
+
+        completion_id = f"chatcmpl-{uuid.uuid4().hex[:8]}"
+        created_time = int(time.time())
+        collected_content = []
+
+        async for chunk in response_stream:
+            if chunk.choices and chunk.choices[0].delta.content:
+                content = chunk.choices[0].delta.content
+                collected_content.append(content)
+
+                # Create streaming response chunk
+                stream_chunk = ChatCompletionStreamResponse(
+                    id=completion_id,
+                    created=created_time,
+                    model=request_body.model,
+                    choices=[
+                        {
+                            "index": 0,
+                            "delta": {"content": content},
+                            "finish_reason": None,
+                        }
+                    ],
+                )
+
+                yield f"data: {stream_chunk.model_dump_json()}\n\n"
+
+        # Send final chunk with complete response in expected format
+        full_response = "".join(collected_content)
+        updated_history = [{"role": "assistant", "content": full_response}]
+
+        final_chunk = ChatCompletionStreamResponse(
+            id=completion_id,
+            created=created_time,
+            model=request_body.model,
+            choices=[
+                {
+                    "index": 0,
+                    "delta": {},
+                    "finish_reason": "stop",
+                    "message": {
+                        "role": "assistant",
+                        "content": json.dumps(updated_history),
+                    },
+                }
+            ],
+        )
+
+        yield f"data: {final_chunk.model_dump_json()}\n\n"
+        yield "data: [DONE]\n\n"
+
+    except Exception as e:
+        logger.error(f"Error generating streaming response: {e}")
+
+        # Send error as streaming response
+        error_chunk = ChatCompletionStreamResponse(
+            id=f"chatcmpl-{uuid.uuid4().hex[:8]}",
+            created=int(time.time()),
+            model=request_body.model,
+            choices=[
+                {
+                    "index": 0,
+                    "delta": {
+                        "content": "I apologize, but I'm having trouble generating a response right now. Please try again."
+                    },
+                    "finish_reason": "stop",
+                }
+            ],
+        )
+
+        yield f"data: {error_chunk.model_dump_json()}\n\n"
+        yield "data: [DONE]\n\n"
+
+
+@app.get("/health")
+async def health_check():
+    """Health check endpoint."""
+    return {"status": "healthy"}
+
+
+def start_server(host: str = "localhost", port: int = 8000):
+    """Start the REST server."""
+    uvicorn.run(
+        app,
+        host=host,
+        port=port,
+        log_config={
+            "version": 1,
+            "disable_existing_loggers": False,
+            "formatters": {
+                "default": {
+                    "format": "%(asctime)s - [RESPONSE_GENERATOR] - %(levelname)s - %(message)s",
+                },
+            },
+            "handlers": {
+                "default": {
+                    "formatter": "default",
+                    "class": "logging.StreamHandler",
+                    "stream": "ext://sys.stdout",
+                },
+            },
+            "root": {
+                "level": "INFO",
+                "handlers": ["default"],
+            },
+        },
+    )
--- a/demos/use_cases/http_filter/src/rag_agent/sample_knowledge_base.csv
+++ b/demos/use_cases/http_filter/src/rag_agent/sample_knowledge_base.csv
@ -0,0 +1,257 @@
+path,content
+TechCorp_CloudServices_SLA_Agreement_2024,"SERVICE LEVEL AGREEMENT
+This Service Level Agreement (""SLA"") is entered into on March 15, 2024, between TechCorp Solutions Inc., a Delaware corporation (""Provider""), and CloudFirst Enterprises LLC (""Customer"").
+
+DEFINITIONS
+Service Availability: The percentage of time during which the cloud services are operational and accessible.
+Downtime: Any period when the services are unavailable or inaccessible to Customer.
+Response Time: The time between service request submission and initial response from Provider.
+
+SERVICE COMMITMENTS
+Provider guarantees 99.9% uptime for all cloud infrastructure services during any calendar month.
+Average response time for API calls shall not exceed 200 milliseconds under normal operating conditions.
+Customer support response times: Critical issues within 1 hour, Standard issues within 4 hours.
+
+REMEDIES
+For each full percentage point below 99.9% availability, Customer receives 10% credit on monthly fees.
+If response times exceed 500ms for more than 5 minutes in any hour, Customer receives 5% monthly credit.
+
+MONITORING AND REPORTING
+Provider will maintain real-time monitoring systems and provide monthly performance reports.
+All metrics will be measured from Provider's monitoring systems located in primary data centers.
+
+This SLA remains in effect for the duration of the underlying service agreement.
+
+Executed by:
+TechCorp Solutions Inc.
+Sarah Mitchell, VP Operations
+Date: March 15, 2024
+
+CloudFirst Enterprises LLC
+Robert Chen, CTO
+Date: March 16, 2024"
+
+DataSecure_Privacy_Policy_v3.2,"PRIVACY POLICY
+DataSecure Analytics, Inc. (""Company"") Privacy Policy
+Effective Date: January 1, 2024
+Last Updated: February 28, 2024
+
+INFORMATION COLLECTION
+We collect information you provide directly, such as account details, usage preferences, and communication records.
+Automatically collected data includes IP addresses, browser types, device information, and service interaction logs.
+Third-party integrations may provide additional user behavior and demographic information with consent.
+
+DATA USAGE
+Personal information is used to provide services, improve user experience, and communicate service updates.
+Aggregated, non-identifiable data may be used for analytics, research, and service enhancement.
+We do not sell personal information to third parties for marketing purposes.
+
+DATA PROTECTION
+All data is encrypted in transit using TLS 1.3 and at rest using AES-256 encryption.
+Access controls limit data access to authorized personnel only on a need-to-know basis.
+Regular security audits and penetration testing ensure ongoing protection measures.
+
+DATA RETENTION
+Personal data is retained for the duration of active service plus 24 months.
+Logs and analytics data are retained for 12 months unless legally required otherwise.
+Upon account deletion, personal data is permanently removed within 30 days.
+
+USER RIGHTS
+Users may request access to, correction of, or deletion of their personal information.
+Data portability requests will be fulfilled in standard formats within 30 days.
+Marketing communications can be opted out of at any time.
+
+CONTACT
+For privacy concerns, contact: privacy@datasecure.com
+Data Protection Officer: Jennifer Walsh, jwalsh@datasecure.com"
+
+GlobalManufacturing_SupplyChain_Contract_Q2_2024,"SUPPLY CHAIN AGREEMENT
+This Supply Chain Agreement is entered into between GlobalManufacturing Corp (""Buyer"") and PrecisionParts Ltd (""Supplier"") effective April 1, 2024.
+
+SCOPE OF SERVICES
+Supplier will provide automotive components including brake assemblies, suspension parts, and electrical harnesses.
+All products must meet ISO 9001 quality standards and automotive industry specifications.
+Delivery schedule: Weekly shipments every Tuesday, with 48-hour advance shipping notifications.
+
+PRICING AND PAYMENT
+Component pricing is fixed for initial 6-month term with quarterly price review thereafter.
+Payment terms: Net 45 days from invoice date via electronic transfer.
+Volume discounts apply: 5% for orders exceeding 10,000 units per month, 8% for orders exceeding 25,000 units.
+
+QUALITY REQUIREMENTS
+All components must pass incoming inspection with less than 0.1% defect rate.
+Supplier maintains quality certifications including IATF 16949 and environmental compliance.
+Batch tracking and traceability required for all delivered components.
+
+LOGISTICS AND DELIVERY
+Supplier responsible for packaging, labeling, and delivery to Buyer's distribution centers.
+Delivery windows: 8 AM - 4 PM, Monday through Friday, with advance appointment scheduling.
+Late delivery penalties: 2% of shipment value for each day beyond scheduled delivery.
+
+RISK MANAGEMENT
+Supplier maintains business continuity plans and alternative sourcing strategies.
+Force majeure events must be reported within 24 hours with mitigation plans.
+Insurance requirements: $5M general liability, $2M product liability coverage.
+
+INTELLECTUAL PROPERTY
+All custom tooling and specifications remain property of Buyer.
+Supplier grants license to use necessary patents for component manufacturing.
+
+This agreement shall remain in effect for 24 months with automatic renewal unless terminated.
+
+GlobalManufacturing Corp
+Michael Rodriguez, Supply Chain Director
+Date: April 1, 2024
+
+PrecisionParts Ltd
+Amanda Foster, VP Sales
+Date: April 2, 2024"
+
+EduTech_StudentData_Management_Policy_2024,"STUDENT DATA MANAGEMENT POLICY
+EduTech Learning Platform - Data Management and Protection Policy
+Document Version: 2.1
+Effective Date: August 15, 2024
+
+SCOPE AND PURPOSE
+This policy governs the collection, use, storage, and protection of student educational records and personal information.
+Applies to all employees, contractors, and third-party service providers accessing student data.
+Compliance with FERPA, COPPA, and state student privacy laws is mandatory.
+
+DATA CLASSIFICATION
+Educational Records: Grades, attendance, assignments, and academic progress information.
+Personal Information: Names, addresses, contact details, and demographic information.
+Behavioral Data: Learning patterns, platform usage, and engagement metrics.
+
+COLLECTION PRINCIPLES
+Data collection is limited to educational purposes and service improvement only.
+Parental consent required for students under 13 years of age.
+Students and parents have right to review and request corrections to educational records.
+
+ACCESS CONTROLS
+Role-based access ensures personnel see only data necessary for their functions.
+Multi-factor authentication required for all system access.
+Access logs maintained and reviewed monthly for unauthorized activity.
+
+DATA SHARING
+Educational records shared only with authorized school personnel and parents/students.
+No data sharing with third parties for commercial purposes without explicit consent.
+Research data must be de-identified and aggregated before external sharing.
+
+SECURITY MEASURES
+Data encrypted using industry-standard protocols during transmission and storage.
+Regular security assessments and vulnerability testing conducted quarterly.
+Incident response plan includes notification procedures for data breaches.
+
+RETENTION AND DISPOSAL
+Student records retained according to school district policies, typically 5-7 years post-graduation.
+Inactive accounts and associated data purged after 2 years of non-use.
+Secure data destruction protocols ensure complete removal of sensitive information.
+
+COMPLIANCE MONITORING
+Annual privacy training required for all staff handling student data.
+Regular audits ensure ongoing compliance with applicable privacy regulations.
+Privacy impact assessments conducted for new features or data uses.
+
+Contact: Dr. Lisa Thompson, Chief Privacy Officer
+Email: privacy@edutech-learning.com
+Phone: (555) 123-4567"
+
+FinanceFirst_Investment_Advisory_Agreement_2024,"INVESTMENT ADVISORY AGREEMENT
+This Investment Advisory Agreement is entered into between FinanceFirst Advisors LLC (""Advisor"") and Madison Investment Group (""Client"") on May 20, 2024.
+
+ADVISORY SERVICES
+Advisor will provide comprehensive investment management and financial planning services.
+Services include portfolio construction, asset allocation, risk assessment, and performance monitoring.
+Regular portfolio reviews conducted quarterly with detailed performance reporting.
+
+INVESTMENT AUTHORITY
+Client grants Advisor discretionary authority to make investment decisions within agreed parameters.
+Investment universe includes stocks, bonds, ETFs, mutual funds, and alternative investments as appropriate.
+All trades executed through qualified broker-dealers with best execution practices.
+
+FEE STRUCTURE
+Management fee: 1.25% annually on assets under management, calculated and billed quarterly.
+Performance fee: 15% of returns exceeding S&P 500 benchmark, calculated annually.
+Additional fees may apply for specialized services such as tax planning or estate planning.
+
+CLIENT RESPONSIBILITIES
+Client must provide accurate financial information and promptly communicate changes in circumstances.
+Investment objectives and risk tolerance should be reviewed and updated annually.
+Client responsible for reviewing and approving investment policy statement.
+
+RISK DISCLOSURE
+All investments carry risk of loss, and past performance does not guarantee future results.
+Diversification does not ensure profit or protect against loss in declining markets.
+Alternative investments may have limited liquidity and higher volatility.
+
+REGULATORY COMPLIANCE
+Advisor is registered with the Securities and Exchange Commission as an investment advisor.
+All activities conducted in accordance with Investment Advisers Act of 1940 and applicable regulations.
+Form ADV Part 2 brochure provided annually with material updates.
+
+CONFIDENTIALITY
+All client information treated as confidential and shared only as necessary for service provision.
+Third-party service providers bound by confidentiality agreements.
+Client data protected through secure systems and access controls.
+
+TERMINATION
+Either party may terminate agreement with 30 days written notice.
+Upon termination, Advisor will assist with orderly transfer of assets to new custodian or advisor.
+Final fee calculation prorated to date of termination.
+
+FinanceFirst Advisors LLC
+Thomas Anderson, Managing Partner
+Date: May 20, 2024
+
+Madison Investment Group
+Rebecca Martinez, Chief Investment Officer
+Date: May 21, 2024"
+
+HealthSystem_PatientCare_Standards_2024,"PATIENT CARE STANDARDS AND PROTOCOLS
+Metropolitan Health System - Clinical Care Standards
+Document ID: MHS-PCS-2024-001
+Effective Date: June 1, 2024
+
+PATIENT SAFETY PROTOCOLS
+All patients must have proper identification verification using two unique identifiers.
+Medication administration requires independent double-check for high-risk medications.
+Fall risk assessments completed within 4 hours of admission with appropriate interventions.
+
+CLINICAL DOCUMENTATION
+Medical records must be completed within 24 hours of patient encounter.
+All entries require electronic signature with timestamp and provider identification.
+Critical values and abnormal results must be communicated and documented immediately.
+
+INFECTION CONTROL
+Hand hygiene compliance monitored with target rate of 95% or higher.
+Personal protective equipment used according to transmission-based precautions.
+Isolation procedures implemented within 2 hours of identification of infectious conditions.
+
+EMERGENCY RESPONSE
+Code team response time target: 3 minutes from activation to arrival.
+Crash cart and emergency equipment checks performed daily and documented.
+All staff required to maintain current CPR and emergency response certifications.
+
+PATIENT COMMUNICATION
+Patient rights and responsibilities communicated upon admission.
+Informed consent obtained and documented prior to procedures and treatments.
+Family involvement encouraged with respect for patient privacy preferences.
+
+QUALITY MEASURES
+Patient satisfaction scores monitored monthly with target of 4.5/5.0 or higher.
+Medication error rates tracked with goal of less than 1 per 1000 patient days.
+Hospital-acquired infection rates measured and benchmarked against national standards.
+
+STAFF COMPETENCY
+Annual competency assessments required for all clinical staff.
+Continuing education requirements: 24 hours annually for nurses, 40 hours for physicians.
+Specialty certifications maintained according to department and role requirements.
+
+TECHNOLOGY STANDARDS
+Electronic health record system used for all patient documentation.
+Telemedicine capabilities available for remote consultations and monitoring.
+Clinical decision support tools integrated to assist with diagnosis and treatment decisions.
+
+Contact: Dr. Patricia Williams, Chief Medical Officer
+Email: pwilliams@metrohealthsystem.org
+Phone: (555) 987-6543"
--- a/demos/use_cases/http_filter/start_agents.sh
+++ b/demos/use_cases/http_filter/start_agents.sh
@ -0,0 +1,78 @@
+# #!/bin/bash
+# set -e
+
+# WAIT_FOR_PIDS=()
+
+# log() {
+#   timestamp=$(python3 -c 'from datetime import datetime; print(datetime.now().strftime("%Y-%m-%d %H:%M:%S,%f")[:23])')
+#   message="$*"
+#   echo "$timestamp - $message"
+# }
+
+# cleanup() {
+#     log "Caught signal, terminating all user processes ..."
+#     for PID in "${WAIT_FOR_PIDS[@]}"; do
+#         if kill $PID 2> /dev/null; then
+#             log "killed process: $PID"
+#         fi
+#     done
+#     exit 1
+# }
+
+# trap cleanup EXIT
+
+# log "Starting input_guards agent on port 10500/mcp..."
+# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10500 --agent input_guards &
+# WAIT_FOR_PIDS+=($!)
+
+# log "Starting query_rewriter agent on port 10501/mcp..."
+# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10501 --agent query_rewriter &
+# WAIT_FOR_PIDS+=($!)
+
+# log "Starting context_builder agent on port 10502/mcp..."
+# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10502 --agent context_builder &
+# WAIT_FOR_PIDS+=($!)
+
+# # log "Starting response_generator agent on port 10400..."
+# # uv run python -m rag_agent --host 0.0.0.0 --port 10400 --agent response_generator &
+# # WAIT_FOR_PIDS+=($!)
+
+# log "Starting response_generator agent on port 10505..."
+# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10505 --agent response_generator &
+# WAIT_FOR_PIDS+=($!)
+
+# for PID in "${WAIT_FOR_PIDS[@]}"; do
+#     wait "$PID"
+# done
+
+
+
+
+#!/bin/bash
+set -e
+
+export PYTHONPATH=/app/src
+
+pids=()
+
+log() { echo "$(date '+%F %T') - $*"; }
+
+log "Starting input_guards HTTP server on :10500"
+uv run uvicorn rag_agent.input_guards:app --host 0.0.0.0 --port 10500 &
+pids+=($!)
+
+log "Starting query_rewriter HTTP server on :10501"
+uv run uvicorn rag_agent.query_rewriter:app --host 0.0.0.0 --port 10501 &
+pids+=($!)
+
+log "Starting context_builder HTTP server on :10502"
+uv run uvicorn rag_agent.context_builder:app --host 0.0.0.0 --port 10502 &
+pids+=($!)
+
+log "Starting response_generator (OpenAI-compatible) on :10505"
+uv run uvicorn rag_agent.rag_agent:app --host 0.0.0.0 --port 10505 &
+pids+=($!)
+
+for PID in "${pids[@]}"; do
+    wait "$PID"
+done
--- a/demos/use_cases/http_filter/test.rest
+++ b/demos/use_cases/http_filter/test.rest
@ -0,0 +1,92 @@
+@baseUrl = http://0.0.0.0:10502
+@model = gpt-4o
+
+# Health Check
+GET {{baseUrl}}/health
+
+###
+
+# Test 1: Simple Non-Streaming Chat Completion
+POST {{baseUrl}}/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "{{model}}",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello! Can you help me understand what machine learning is?"
+    }
+  ]
+}
+
+###
+
+# Test 2: Simple Streaming Chat Completion
+POST {{baseUrl}}/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "{{model}}",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Explain the concept of artificial intelligence in simple terms."
+    }
+  ],
+  "stream": true
+}
+
+### Test 3
+POST http://localhost:8001/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "{{model}}",
+  "messages": [
+    {
+      "role": "user",
+      "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
+    }
+  ],
+  "stream": true
+}
+
+### send request to query_rewriter agent
+POST http://localhost:10500/
+Content-Type: application/json
+
+[
+  {
+    "role": "user",
+    "content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
+  }
+]
+
+### test fast-llm
+POST http://localhost:12000/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "fast-llm",
+  "messages": [
+    {
+      "role": "user",
+      "content": "hello"
+    }
+  ]
+}
+
+### test smart-llm
+POST http://localhost:12000/v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "smart-llm",
+  "messages": [
+    {
+      "role": "user",
+      "content": "hello"
+    }
+  ]
+}
--- a/demos/use_cases/http_filter/uv.lock
+++ b/demos/use_cases/http_filter/uv.lock