mirror of
https://github.com/katanemo/plano.git
synced 2026-06-08 14:55:14 +02:00
http-filter: add fully http based demo (remove MCP) (#681)
* http-filter: add fully http based demo (remove MCP) * Fix pre-commit formatting * add instructions about uv build/sync in cli (#675) * Musa/demo fix (#676) * fix demo with travel agent * Update .gitignore * remove sse chunk rendering * ensure that request id is consistent (#677) * ensure that request id is consistent * remove test debug/info statements * Introduce signals change (#655) * adding support for signals * reducing false positives for signals like positive interaction * adding docs. Still need to fix the messages list, but waiting on PR #621 * Improve frustration detection: normalize contractions and refine punctuation * Further refine test cases with longer messages * minor doc changes * fixing echo statement for build * fixing the messages construction and using the trait for signals * update signals docs * fixed some minor doc changes * added more tests and fixed docuemtnation. PR 100% ready * made fixes based on PR comments * Optimize latency 1. replace sliding window approach with trigram containment check 2. add code to pre-compute ngrams for patterns * removed some debug statements to make tests easier to read * PR comments to make ObservableStreamProcessor accept optonal Vec<Messagges> * fixed PR comments --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local> Co-authored-by: MeiyuZhong <mariazhong9612@gmail.com> Co-authored-by: nehcgs <54548843+nehcgs@users.noreply.github.com> * pass request_id in orchestrator and routing model (#678) * release 0.4.2 (#679) * tweaks to web and docs to align to 0.4.2 (#680) * tweaks to web and docs to align to 0.4.2 * made our release banner clickable --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local> * Added request id across agents * fix http_filter agent: request id + pre-commit --------- Co-authored-by: Adil Hafeez <adil@katanemo.com> Co-authored-by: Musa <malikmusa1323@gmail.com> Co-authored-by: Salman Paracha <salman.paracha@gmail.com> Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local> Co-authored-by: nehcgs <54548843+nehcgs@users.noreply.github.com>
This commit is contained in:
parent
ab391f96c7
commit
6a29f8398f
19 changed files with 3655 additions and 0 deletions
26
demos/use_cases/http_filter/Dockerfile
Normal file
26
demos/use_cases/http_filter/Dockerfile
Normal file
|
|
@ -0,0 +1,26 @@
|
|||
FROM python:3.13-slim
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
# Install bash and uv
|
||||
RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
|
||||
RUN pip install --no-cache-dir uv
|
||||
|
||||
# Copy dependency files
|
||||
COPY pyproject.toml README.md ./
|
||||
|
||||
# Copy source code
|
||||
COPY src/ ./src/
|
||||
COPY start_agents.sh ./
|
||||
|
||||
# Install dependencies using uv
|
||||
RUN uv pip install --system --no-cache click fastmcp pydantic fastapi uvicorn openai
|
||||
|
||||
# Make start script executable
|
||||
RUN chmod +x start_agents.sh
|
||||
|
||||
# Expose ports for all agents
|
||||
EXPOSE 10500 10501 10502 10505
|
||||
|
||||
# Run the start script with bash
|
||||
CMD ["bash", "./start_agents.sh"]
|
||||
128
demos/use_cases/http_filter/README.md
Normal file
128
demos/use_cases/http_filter/README.md
Normal file
|
|
@ -0,0 +1,128 @@
|
|||
# RAG Agent Demo
|
||||
|
||||
A multi-agent RAG system demonstrating plano's agent filter chain with MCP protocol.
|
||||
|
||||
## Architecture
|
||||
|
||||
This demo consists of four components:
|
||||
1. **Input Guards** (MCP filter) - Validates queries are within TechCorp's domain
|
||||
2. **Query Rewriter** (MCP filter) - Rewrites user queries for better retrieval
|
||||
3. **Context Builder** (MCP filter) - Retrieves relevant context from knowledge base
|
||||
4. **RAG Agent** (REST) - Generates final responses based on augmented context
|
||||
|
||||
## Components
|
||||
|
||||
### Input Guards Filter (MCP)
|
||||
- **Port**: 10500
|
||||
- **Tool**: `input_guards`
|
||||
- Validates queries are within TechCorp's domain
|
||||
- Rejects queries about other companies or unrelated topics
|
||||
|
||||
### Query Rewrit3r Filter (MCP)
|
||||
- **Port**: 10501
|
||||
- **Tool**: `query_rewriter`
|
||||
- Improves queries using LLM before retrieval
|
||||
|
||||
### Context Builder Filter (MCP)
|
||||
- **Port**: 10502
|
||||
- **Tool**: `context_builder`
|
||||
- Augments queries with relevant passages from knowledge base
|
||||
|
||||
### RAG Agent (REST/OpenAI)
|
||||
- **Port**: 10505
|
||||
- **Endpoint**: `/v1/chat/completions`
|
||||
- Generates responses using OpenAI-compatible API
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Start everything with Docker Compose
|
||||
```bash
|
||||
docker compose up --build
|
||||
```
|
||||
|
||||
This brings up:
|
||||
- Input Guards MCP server on port 10500
|
||||
- Query Rewriter MCP server on port 10501
|
||||
- Context Builder MCP server on port 10502
|
||||
- RAG Agent REST server on port 10505
|
||||
- Plano listener on port 8001 (and gateway on 12000)
|
||||
- Jaeger UI for viewing traces at http://localhost:16686
|
||||
- Open WebUI at http://localhost:8080 for interactive queries
|
||||
|
||||
> Set `OPENAI_API_KEY` in your environment before running; `LLM_GATEWAY_ENDPOINT` defaults to `http://host.docker.internal:12000/v1`.
|
||||
|
||||
### 2. Test the system
|
||||
|
||||
**Option A: Using Open WebUI (recommended)**
|
||||
|
||||
Navigate to http://localhost:8080 and send queries through the chat interface.
|
||||
|
||||
**Option B: Using curl**
|
||||
```bash
|
||||
curl -X POST http://localhost:8001/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "gpt-4o",
|
||||
"messages": [{"role": "user", "content": "What is the guaranteed uptime for TechCorp?"}]
|
||||
}'
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
The `config.yaml` defines how agents are connected:
|
||||
|
||||
```yaml
|
||||
filters:
|
||||
- id: input_guards
|
||||
url: http://host.docker.internal:10500
|
||||
# type: mcp (default)
|
||||
# tool: input_guards (default - same as filter id)
|
||||
|
||||
- id: query_rewriter
|
||||
url: http://host.docker.internal:10501
|
||||
# type: mcp (default)
|
||||
|
||||
- id: context_builder
|
||||
url: http://host.docker.internal:10502
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
1. User sends request to plano listener on port 8001
|
||||
2. Request passes through MCP filter chain:
|
||||
- **Input Guards** validates the query is within TechCorp's domain
|
||||
- **Query Rewriter** rewrites the query for better retrieval
|
||||
- **Context Builder** augments query with relevant knowledge base passages
|
||||
3. Augmented request is forwarded to **RAG Agent** REST endpoint
|
||||
4. RAG Agent generates final response using LLM
|
||||
|
||||
## Additional Configuration
|
||||
|
||||
See `config.yaml` for the complete filter chain setup. The MCP filters use default settings:
|
||||
- `type: mcp` (default)
|
||||
- `transport: streamable-http` (default)
|
||||
- Tool name defaults to filter ID
|
||||
|
||||
See `sample_queries.md` for example queries to test the RAG system.
|
||||
|
||||
Example request:
|
||||
```bash
|
||||
curl -X POST http://localhost:8001/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "gpt-4o",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime for TechCorp?"
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
- `LLM_GATEWAY_ENDPOINT` - lpano endpoint (default: `http://localhost:12000/v1`)
|
||||
- `OPENAI_API_KEY` - OpenAI API key for model providers
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- See `sample_queries.md` for more example queries
|
||||
- See `config.yaml` for complete configuration details
|
||||
50
demos/use_cases/http_filter/config.yaml
Normal file
50
demos/use_cases/http_filter/config.yaml
Normal file
|
|
@ -0,0 +1,50 @@
|
|||
version: v0.3.0
|
||||
|
||||
agents:
|
||||
- id: rag_agent
|
||||
url: http://rag-agents:10505
|
||||
|
||||
filters:
|
||||
- id: input_guards
|
||||
url: http://rag-agents:10500
|
||||
type: http
|
||||
# type: mcp (default)
|
||||
# transport: streamable-http (default)
|
||||
# tool: input_guards (default - same as filter id)
|
||||
- id: query_rewriter
|
||||
url: http://rag-agents:10501
|
||||
type: http
|
||||
# type: mcp (default)
|
||||
# transport: streamable-http (default)
|
||||
# tool: query_rewriter (default - same as filter id)
|
||||
- id: context_builder
|
||||
url: http://rag-agents:10502
|
||||
type: http
|
||||
|
||||
model_providers:
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
- model: openai/gpt-4o
|
||||
access_key: $OPENAI_API_KEY
|
||||
|
||||
model_aliases:
|
||||
fast-llm:
|
||||
target: gpt-4o-mini
|
||||
smart-llm:
|
||||
target: gpt-4o
|
||||
|
||||
listeners:
|
||||
- type: agent
|
||||
name: agent_1
|
||||
port: 8001
|
||||
router: plano_orchestrator_v1
|
||||
agents:
|
||||
- id: rag_agent
|
||||
description: virtual assistant for retrieval augmented generation tasks
|
||||
filter_chain:
|
||||
- input_guards
|
||||
- query_rewriter
|
||||
- context_builder
|
||||
tracing:
|
||||
random_sampling: 100
|
||||
42
demos/use_cases/http_filter/docker-compose.yaml
Normal file
42
demos/use_cases/http_filter/docker-compose.yaml
Normal file
|
|
@ -0,0 +1,42 @@
|
|||
services:
|
||||
rag-agents:
|
||||
build:
|
||||
context: .
|
||||
dockerfile: Dockerfile
|
||||
ports:
|
||||
- "10500:10500"
|
||||
- "10501:10501"
|
||||
- "10502:10502"
|
||||
- "10505:10505"
|
||||
environment:
|
||||
- LLM_GATEWAY_ENDPOINT=${LLM_GATEWAY_ENDPOINT:-http://host.docker.internal:12000/v1}
|
||||
- OPENAI_API_KEY=${OPENAI_API_KEY:?OPENAI_API_KEY environment variable is required but not set}
|
||||
plano:
|
||||
build:
|
||||
context: ../../../
|
||||
dockerfile: Dockerfile
|
||||
ports:
|
||||
- "12000:12000"
|
||||
- "8001:8001"
|
||||
environment:
|
||||
- ARCH_CONFIG_PATH=/config/config.yaml
|
||||
- OPENAI_API_KEY=${OPENAI_API_KEY:?OPENAI_API_KEY environment variable is required but not set}
|
||||
volumes:
|
||||
- ./config.yaml:/app/arch_config.yaml
|
||||
- /etc/ssl/cert.pem:/etc/ssl/cert.pem
|
||||
jaeger:
|
||||
build:
|
||||
context: ../../shared/jaeger
|
||||
ports:
|
||||
- "16686:16686"
|
||||
- "4317:4317"
|
||||
- "4318:4318"
|
||||
open-web-ui:
|
||||
image: dyrnq/open-webui:main
|
||||
restart: always
|
||||
ports:
|
||||
- "8080:8080"
|
||||
environment:
|
||||
- DEFAULT_MODEL=gpt-4o-mini
|
||||
- ENABLE_OPENAI_API=true
|
||||
- OPENAI_API_BASE_URL=http://host.docker.internal:8001/v1
|
||||
77
demos/use_cases/http_filter/http.rest
Normal file
77
demos/use_cases/http_filter/http.rest
Normal file
|
|
@ -0,0 +1,77 @@
|
|||
# http.rest (replacement for mcp_query.rest)
|
||||
|
||||
@host = http://localhost
|
||||
@model = gpt-4o-mini
|
||||
|
||||
# Filter endpoints (HTTP-only)
|
||||
@inputGuards = {{host}}:10500
|
||||
@queryRewriter = {{host}}:10501
|
||||
@contextBuilder = {{host}}:10502
|
||||
|
||||
# Plano agent listener (the thing Open WebUI calls)
|
||||
@planoAgent = {{host}}:8001
|
||||
|
||||
|
||||
POST {{queryRewriter}}/
|
||||
Content-Type: application/json
|
||||
|
||||
[
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
|
||||
POST {{inputGuards}}/
|
||||
Content-Type: application/json
|
||||
|
||||
[
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
|
||||
POST {{contextBuilder}}/
|
||||
Content-Type: application/json
|
||||
|
||||
[
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is TechCorp's uptime SLA?"
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
POST {{planoAgent}}/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "fast-llm",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
|
||||
}
|
||||
],
|
||||
"stream": false
|
||||
}
|
||||
|
||||
|
||||
POST {{planoAgent}}/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "fast-llm",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
|
||||
}
|
||||
],
|
||||
"stream": true
|
||||
}
|
||||
86
demos/use_cases/http_filter/mcp_query.rest
Normal file
86
demos/use_cases/http_filter/mcp_query.rest
Normal file
|
|
@ -0,0 +1,86 @@
|
|||
### Initialize MCP Session (SSE)
|
||||
POST http://localhost:10501/mcp
|
||||
Content-Type: application/json
|
||||
Accept: application/json, text/event-stream
|
||||
|
||||
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"capabilities":{},"protocolVersion":"2024-11-05","clientInfo":{"name":"test","version":"1.0.0"}}}
|
||||
|
||||
### Send Initialized Notification
|
||||
POST http://localhost:10501/mcp
|
||||
Content-Type: application/json
|
||||
Accept: application/json, text/event-stream
|
||||
mcp-session-id: 35d455dc07b8400887f86668590f12bb
|
||||
|
||||
{
|
||||
"jsonrpc": "2.0",
|
||||
"method": "notifications/initialized"
|
||||
}
|
||||
|
||||
### List Tools
|
||||
POST http://localhost:10501/mcp
|
||||
Content-Type: application/json
|
||||
Accept: application/json, text/event-stream
|
||||
mcp-session-id: eb10a691b36e4547b6c93c5dc5b47e11
|
||||
|
||||
{
|
||||
"jsonrpc": "2.0",
|
||||
"id": "list-tools-1",
|
||||
"method": "tools/list"
|
||||
}
|
||||
|
||||
### Call Query Rewriter Tool
|
||||
POST http://localhost:10501/mcp
|
||||
Content-Type: application/json
|
||||
Accept: application/json, text/event-stream
|
||||
mcp-session-id: 6b95ff75825a402b90eb3ea07e23fbce
|
||||
|
||||
{
|
||||
"jsonrpc": "2.0",
|
||||
"id": "3d3b886a-6216-4a26-a422-7a972529c0e7",
|
||||
"method": "tools/call",
|
||||
"params": {
|
||||
"arguments": {
|
||||
"messages": [
|
||||
{
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?",
|
||||
"role": "user"
|
||||
}
|
||||
]
|
||||
},
|
||||
"name": "query_rewriter"
|
||||
}
|
||||
}
|
||||
|
||||
### another test
|
||||
|
||||
# Content-Type: application/json
|
||||
# Accept: application/json, text/event-stream
|
||||
# mcp-session-id: ed7a81a1d39549ecaadb867a6b2daf1e
|
||||
|
||||
POST http://localhost:10501/mcp
|
||||
content-type: application/json
|
||||
mcp-session-id: e4ec1ae904e14e06b7d194da10e5f74c
|
||||
accept: application/json, text/event-stream
|
||||
|
||||
{"jsonrpc":"2.0","id":"4bb1043a-2953-4bcd-b801-f270b0ae8c39","method":"tools/call","params":{"arguments":{"messages":[{"content":"What is the guaranteed uptime percentage for TechCorp's cloud services?","role":"user"}]},"name":"query_rewriter"}}
|
||||
|
||||
|
||||
|
||||
### stream test
|
||||
|
||||
POST http://localhost:10501/mcp
|
||||
content-type: application/json
|
||||
mcp-session-id: 35d455dc07b8400887f86668590f12bb
|
||||
accept: application/json, text/event-stream
|
||||
|
||||
{
|
||||
"jsonrpc": "2.0",
|
||||
"id": 1,
|
||||
"method": "tools/call",
|
||||
"params": {
|
||||
"name": "long_job",
|
||||
"arguments": {
|
||||
"n": 3
|
||||
}
|
||||
}
|
||||
}
|
||||
22
demos/use_cases/http_filter/pyproject.toml
Normal file
22
demos/use_cases/http_filter/pyproject.toml
Normal file
|
|
@ -0,0 +1,22 @@
|
|||
[project]
|
||||
name = "rag_agent"
|
||||
version = "0.1.0"
|
||||
description = "RAG Agent"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.10"
|
||||
dependencies = [
|
||||
"click>=8.2.1",
|
||||
"mcp>=1.13.1",
|
||||
"fastmcp>=2.14",
|
||||
"pydantic>=2.11.7",
|
||||
"fastapi>=0.104.1",
|
||||
"uvicorn>=0.24.0",
|
||||
"openai==2.13.0",
|
||||
]
|
||||
|
||||
[project.scripts]
|
||||
rag_agent = "rag_agent:main"
|
||||
|
||||
[build-system]
|
||||
requires = ["hatchling"]
|
||||
build-backend = "hatchling.build"
|
||||
64
demos/use_cases/http_filter/sample_queries.md
Normal file
64
demos/use_cases/http_filter/sample_queries.md
Normal file
|
|
@ -0,0 +1,64 @@
|
|||
# Sample Queries for Knowledge Base RAG Agent
|
||||
|
||||
## Service Level Agreement Queries
|
||||
- What is the guaranteed uptime percentage for TechCorp's cloud services?
|
||||
- What remedies are available if the API response time exceeds the agreed threshold?
|
||||
- How quickly must TechCorp respond to critical support issues?
|
||||
- What monitoring and reporting requirements are specified in the SLA?
|
||||
- When was the TechCorp service agreement signed and by whom?
|
||||
|
||||
## Privacy Policy Queries
|
||||
- What encryption methods does DataSecure use to protect data?
|
||||
- How long does DataSecure retain personal data after account deletion?
|
||||
- What rights do users have regarding their personal information?
|
||||
- Can DataSecure sell user data to third parties for marketing?
|
||||
- Who should be contacted for privacy-related concerns at DataSecure?
|
||||
|
||||
## Supply Chain Agreement Queries
|
||||
- What types of automotive components does PrecisionParts supply?
|
||||
- What are the payment terms and volume discount structure?
|
||||
- What quality standards must the supplied components meet?
|
||||
- What are the penalties for late delivery?
|
||||
- What insurance coverage requirements apply to the supplier?
|
||||
|
||||
## Student Data Management Queries
|
||||
- What federal laws must EduTech comply with regarding student data?
|
||||
- What security measures are in place to protect student information?
|
||||
- How long are student records retained after graduation?
|
||||
- What consent is required for students under 13 years old?
|
||||
- Who can access student educational records?
|
||||
|
||||
## Investment Advisory Queries
|
||||
- What is FinanceFirst's management fee structure?
|
||||
- What types of investments are included in the advisory services?
|
||||
- What regulatory body oversees FinanceFirst Advisors?
|
||||
- How often are portfolio reviews conducted?
|
||||
- What are the client's responsibilities under this agreement?
|
||||
|
||||
## Healthcare Standards Queries
|
||||
- What is the target response time for emergency code teams?
|
||||
- What hand hygiene compliance rate is required?
|
||||
- How quickly must medical records be completed after patient encounters?
|
||||
- What continuing education requirements apply to nursing staff?
|
||||
- What patient safety protocols are mandatory upon admission?
|
||||
|
||||
## Cross-Document Queries
|
||||
- Which agreements include confidentiality or data protection provisions?
|
||||
- What are the common termination notice periods across different contract types?
|
||||
- Which documents specify insurance or liability coverage requirements?
|
||||
- What compliance and regulatory requirements are mentioned across agreements?
|
||||
- Which contracts include performance metrics or service level commitments?
|
||||
|
||||
## Complex Analysis Queries
|
||||
- Compare the data retention policies across the privacy policy and student data management documents.
|
||||
- What are the different approaches to risk management across the supply chain and investment advisory agreements?
|
||||
- How do the security measures in the healthcare standards compare to those in the privacy policy?
|
||||
- Which agreements provide the most detailed compliance and regulatory frameworks?
|
||||
- What common themes exist in the quality assurance requirements across different industries?
|
||||
|
||||
## Document-Specific Detail Queries
|
||||
- List all the specific percentages, timeframes, and numerical requirements mentioned in the SLA.
|
||||
- What are all the contact persons and their roles mentioned across the documents?
|
||||
- Identify all the compliance standards and certifications referenced in the supply chain agreement.
|
||||
- What are the specific consequences or penalties mentioned for non-compliance across agreements?
|
||||
- List all the third-party systems, tools, or services mentioned in the documents.
|
||||
109
demos/use_cases/http_filter/src/rag_agent/__init__.py
Normal file
109
demos/use_cases/http_filter/src/rag_agent/__init__.py
Normal file
|
|
@ -0,0 +1,109 @@
|
|||
import click
|
||||
from fastmcp import FastMCP
|
||||
|
||||
mcp = None
|
||||
|
||||
|
||||
@click.command()
|
||||
@click.option(
|
||||
"--transport",
|
||||
"transport",
|
||||
default="streamable-http",
|
||||
help="Transport type: stdio or sse",
|
||||
)
|
||||
@click.option("--host", "host", default="localhost", help="Host to bind MCP server to")
|
||||
@click.option("--port", "port", type=int, default=10500, help="Port for MCP server")
|
||||
@click.option(
|
||||
"--agent",
|
||||
"agent",
|
||||
required=True,
|
||||
help="Agent name: query_rewriter, context_builder, or response_generator",
|
||||
)
|
||||
@click.option(
|
||||
"--name",
|
||||
"agent_name",
|
||||
default=None,
|
||||
help="Custom MCP server name (defaults to agent type)",
|
||||
)
|
||||
@click.option(
|
||||
"--rest-server",
|
||||
"rest_server",
|
||||
is_flag=True,
|
||||
help="Start REST server instead of MCP server",
|
||||
)
|
||||
@click.option("--rest-port", "rest_port", default=8000, help="Port for REST server")
|
||||
def main(host, port, agent, transport, agent_name, rest_server, rest_port):
|
||||
"""Start a RAG agent as an MCP server or REST server."""
|
||||
|
||||
# Map friendly names to agent modules
|
||||
agent_map = {
|
||||
"input_guards": ("rag_agent.input_guards", "Input Guards Agent"),
|
||||
"query_rewriter": ("rag_agent.query_rewriter", "Query Rewriter Agent"),
|
||||
"context_builder": ("rag_agent.context_builder", "Context Builder Agent"),
|
||||
"response_generator": (
|
||||
"rag_agent.rag_agent",
|
||||
"Response Generator Agent",
|
||||
),
|
||||
}
|
||||
|
||||
if agent not in agent_map:
|
||||
print(f"Error: Unknown agent '{agent}'")
|
||||
print(f"Available agents: {', '.join(agent_map.keys())}")
|
||||
return
|
||||
|
||||
module_name, default_name = agent_map[agent]
|
||||
mcp_name = agent_name or default_name
|
||||
|
||||
if rest_server:
|
||||
# REST server mode - supported for query_rewriter and response_generator
|
||||
if agent == "response_generator":
|
||||
print(f"Starting REST server on {host}:{rest_port} for agent: {agent}")
|
||||
from rag_agent.rag_agent import start_server
|
||||
|
||||
start_server(host=host, port=rest_port)
|
||||
return
|
||||
elif agent == "query_rewriter":
|
||||
print(f"Starting REST server on {host}:{rest_port} for agent: {agent}")
|
||||
from rag_agent.query_rewriter import start_server
|
||||
|
||||
start_server(host=host, port=rest_port)
|
||||
return
|
||||
else:
|
||||
print(f"Error: Agent '{agent}' does not support REST server mode.")
|
||||
print(
|
||||
f"REST server is only supported for: query_rewriter, response_generator"
|
||||
)
|
||||
print(f"Remove --rest-server flag to start {agent} as an MCP server.")
|
||||
return
|
||||
else:
|
||||
# Only input_guards, query_rewriter and context_builder support MCP
|
||||
if agent not in ["input_guards", "query_rewriter", "context_builder"]:
|
||||
print(f"Error: Agent '{agent}' does not support MCP mode.")
|
||||
print(
|
||||
f"MCP is only supported for: input_guards, query_rewriter, context_builder"
|
||||
)
|
||||
print(f"Use --rest-server flag to start {agent} as a REST server.")
|
||||
return
|
||||
|
||||
global mcp
|
||||
mcp = FastMCP(mcp_name, host=host, port=port)
|
||||
|
||||
print(f"Starting MCP server: {mcp_name}")
|
||||
print(f" Agent: {agent}")
|
||||
print(f" Transport: {transport}")
|
||||
print(f" Host: {host}")
|
||||
print(f" Port: {port}")
|
||||
|
||||
# Import the agent module to register its tools
|
||||
import importlib
|
||||
|
||||
importlib.import_module(module_name)
|
||||
|
||||
print(f"Agent '{agent}' loaded successfully")
|
||||
print(f"MCP server ready on {transport}://{host}:{port}")
|
||||
|
||||
mcp.run(transport=transport)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
4
demos/use_cases/http_filter/src/rag_agent/__main__.py
Normal file
4
demos/use_cases/http_filter/src/rag_agent/__main__.py
Normal file
|
|
@ -0,0 +1,4 @@
|
|||
from . import main
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
36
demos/use_cases/http_filter/src/rag_agent/api.py
Normal file
36
demos/use_cases/http_filter/src/rag_agent/api.py
Normal file
|
|
@ -0,0 +1,36 @@
|
|||
from pydantic import BaseModel
|
||||
from typing import List, Optional, Dict, Any
|
||||
|
||||
|
||||
class ChatMessage(BaseModel):
|
||||
role: str
|
||||
content: str
|
||||
|
||||
|
||||
class ChatCompletionRequest(BaseModel):
|
||||
model: str
|
||||
messages: List[ChatMessage]
|
||||
temperature: Optional[float] = 1.0
|
||||
max_tokens: Optional[int] = None
|
||||
top_p: Optional[float] = 1.0
|
||||
frequency_penalty: Optional[float] = 0.0
|
||||
presence_penalty: Optional[float] = 0.0
|
||||
stream: Optional[bool] = False
|
||||
stop: Optional[List[str]] = None
|
||||
|
||||
|
||||
class ChatCompletionResponse(BaseModel):
|
||||
id: str
|
||||
object: str = "chat.completion"
|
||||
created: int
|
||||
model: str
|
||||
choices: List[Dict[str, Any]]
|
||||
usage: Dict[str, int]
|
||||
|
||||
|
||||
class ChatCompletionStreamResponse(BaseModel):
|
||||
id: str
|
||||
object: str = "chat.completion.chunk"
|
||||
created: int
|
||||
model: str
|
||||
choices: List[Dict[str, Any]]
|
||||
228
demos/use_cases/http_filter/src/rag_agent/context_builder.py
Normal file
228
demos/use_cases/http_filter/src/rag_agent/context_builder.py
Normal file
|
|
@ -0,0 +1,228 @@
|
|||
import json
|
||||
from typing import List, Optional, Dict, Any
|
||||
from openai import AsyncOpenAI
|
||||
import os
|
||||
import logging
|
||||
import csv
|
||||
from pathlib import Path
|
||||
|
||||
from .api import ChatMessage
|
||||
from fastapi import Request, FastAPI
|
||||
|
||||
# from . import mcp
|
||||
# from fastmcp.server.dependencies import get_http_headers
|
||||
|
||||
# Set up logging
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - [CONTEXT_BUILDER] - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
### add new setup
|
||||
app = FastAPI(title="RAG Agent Context Builder", version="1.0.0")
|
||||
|
||||
# Configuration for archgw LLM gateway
|
||||
LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
|
||||
RAG_MODEL = "gpt-4o-mini"
|
||||
|
||||
# Initialize OpenAI client for archgw
|
||||
archgw_client = AsyncOpenAI(
|
||||
base_url=LLM_GATEWAY_ENDPOINT,
|
||||
api_key="EMPTY", # archgw doesn't require a real API key
|
||||
)
|
||||
|
||||
# Global variable to store the knowledge base
|
||||
knowledge_base = []
|
||||
|
||||
|
||||
def load_knowledge_base():
|
||||
"""Load the sample_knowledge_base.csv file into memory on startup."""
|
||||
global knowledge_base
|
||||
|
||||
# Get the path to the CSV file relative to this script
|
||||
current_dir = Path(__file__).parent
|
||||
csv_path = current_dir / "sample_knowledge_base.csv"
|
||||
|
||||
print(f"Loading knowledge base from {csv_path}")
|
||||
|
||||
try:
|
||||
knowledge_base = []
|
||||
with open(csv_path, "r", encoding="utf-8-sig") as file:
|
||||
csv_reader = csv.DictReader(file)
|
||||
for row in csv_reader:
|
||||
knowledge_base.append({"path": row["path"], "content": row["content"]})
|
||||
|
||||
logger.info(f"Loaded {len(knowledge_base)} documents from knowledge base")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error loading knowledge base: {e}")
|
||||
knowledge_base = []
|
||||
|
||||
|
||||
async def find_relevant_passages(
|
||||
query: str,
|
||||
traceparent: Optional[str] = None,
|
||||
request_id: Optional[str] = None,
|
||||
top_k: int = 3,
|
||||
) -> List[Dict[str, str]]:
|
||||
"""Use the LLM to find the most relevant passages from the knowledge base."""
|
||||
|
||||
if not knowledge_base:
|
||||
logger.warning("Knowledge base is empty")
|
||||
return []
|
||||
|
||||
# Create a system prompt for passage selection
|
||||
system_prompt = f"""You are a retrieval assistant that selects the most relevant document passages for a given query.
|
||||
|
||||
Given a user query and a list of document passages, identify the {top_k} most relevant passages that would help answer the query.
|
||||
|
||||
Query: {query}
|
||||
|
||||
Available passages:
|
||||
"""
|
||||
|
||||
# Add all passages with indices
|
||||
for i, doc in enumerate(knowledge_base):
|
||||
system_prompt += (
|
||||
f"\n[{i}] Path: {doc['path']}\nContent: {doc['content'][:500]}...\n"
|
||||
)
|
||||
|
||||
system_prompt += f"""
|
||||
|
||||
Please respond with ONLY the indices of the {top_k} most relevant passages, separated by commas (e.g., "0,3,7").
|
||||
If fewer than {top_k} passages are relevant, return only the relevant ones.
|
||||
If no passages are relevant, return "NONE"."""
|
||||
|
||||
try:
|
||||
# Call archgw to select relevant passages
|
||||
logger.info(f"Calling archgw to find relevant passages for query: '{query}'")
|
||||
|
||||
# Prepare extra headers if traceparent is provided
|
||||
extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
|
||||
if traceparent:
|
||||
extra_headers["traceparent"] = traceparent
|
||||
|
||||
response = await archgw_client.chat.completions.create(
|
||||
model=RAG_MODEL,
|
||||
messages=[{"role": "system", "content": system_prompt}],
|
||||
temperature=0.1,
|
||||
max_tokens=50,
|
||||
extra_headers=extra_headers,
|
||||
)
|
||||
|
||||
result = response.choices[0].message.content.strip()
|
||||
logger.info(f"LLM selected passages: {result}")
|
||||
|
||||
# Parse the indices
|
||||
if result.upper() == "NONE":
|
||||
return []
|
||||
|
||||
selected_passages = []
|
||||
indices = [
|
||||
int(idx.strip()) for idx in result.split(",") if idx.strip().isdigit()
|
||||
]
|
||||
|
||||
for idx in indices:
|
||||
if 0 <= idx < len(knowledge_base):
|
||||
selected_passages.append(knowledge_base[idx])
|
||||
|
||||
logger.info(f"Selected {len(selected_passages)} relevant passages")
|
||||
return selected_passages
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error finding relevant passages: {e}")
|
||||
return []
|
||||
|
||||
|
||||
async def augment_query_with_context(
|
||||
messages: List[ChatMessage],
|
||||
traceparent: Optional[str] = None,
|
||||
request_id: Optional[str] = None,
|
||||
) -> List[ChatMessage]:
|
||||
"""Extract user query, find relevant context, and augment the messages."""
|
||||
|
||||
# Find the last user message
|
||||
last_user_message = None
|
||||
last_user_index = -1
|
||||
|
||||
for i in range(len(messages) - 1, -1, -1):
|
||||
if messages[i].role == "user":
|
||||
last_user_message = messages[i].content
|
||||
last_user_index = i
|
||||
break
|
||||
|
||||
if not last_user_message:
|
||||
logger.warning("No user message found in conversation")
|
||||
return messages
|
||||
|
||||
logger.info(f"Processing user query: '{last_user_message}'")
|
||||
|
||||
# Find relevant passages
|
||||
relevant_passages = await find_relevant_passages(
|
||||
last_user_message, traceparent, request_id
|
||||
)
|
||||
|
||||
if not relevant_passages:
|
||||
logger.info("No relevant passages found, returning original messages")
|
||||
return messages
|
||||
|
||||
# Build context from relevant passages
|
||||
context_parts = []
|
||||
for i, passage in enumerate(relevant_passages):
|
||||
context_parts.append(
|
||||
f"Document {i+1} ({passage['path']}):\n{passage['content']}"
|
||||
)
|
||||
|
||||
context = "\n\n".join(context_parts)
|
||||
|
||||
# Create augmented content with original query and context
|
||||
augmented_content = f"""{last_user_message} RELEVANT CONTEXT:
|
||||
{context}"""
|
||||
|
||||
# Create updated messages with the augmented query
|
||||
updated_messages = messages.copy()
|
||||
updated_messages[last_user_index] = ChatMessage(
|
||||
role="user", content=augmented_content
|
||||
)
|
||||
|
||||
logger.info(f"Augmented user query with {len(relevant_passages)} relevant passages")
|
||||
|
||||
return updated_messages
|
||||
|
||||
|
||||
# Load knowledge base on module import
|
||||
load_knowledge_base()
|
||||
|
||||
|
||||
@app.post("/")
|
||||
async def context_builder(
|
||||
messages: List[ChatMessage], request: Request
|
||||
) -> List[ChatMessage]:
|
||||
"""MCP tool that augments user queries with relevant context from the knowledge base."""
|
||||
logger.info(f"Received chat completion request with {len(messages)} messages")
|
||||
|
||||
# Get traceparent header from MCP request
|
||||
# headers = get_http_headers()
|
||||
# traceparent_header = headers.get("traceparent")
|
||||
|
||||
traceparent_header = request.headers.get("traceparent")
|
||||
request_id = request.headers.get("x-request-id")
|
||||
|
||||
if traceparent_header:
|
||||
logger.info(f"Received traceparent header: {traceparent_header}")
|
||||
else:
|
||||
logger.info("No traceparent header found")
|
||||
|
||||
# Augment the user query with relevant context
|
||||
updated_messages = await augment_query_with_context(
|
||||
messages, traceparent_header, request_id
|
||||
)
|
||||
|
||||
# Return as dict to minimize text serialization
|
||||
return [{"role": msg.role, "content": msg.content} for msg in updated_messages]
|
||||
|
||||
|
||||
# Register MCP tool only if mcp is available
|
||||
# if mcp is not None:
|
||||
# mcp.tool()(context_builder)
|
||||
172
demos/use_cases/http_filter/src/rag_agent/input_guards.py
Normal file
172
demos/use_cases/http_filter/src/rag_agent/input_guards.py
Normal file
|
|
@ -0,0 +1,172 @@
|
|||
import asyncio
|
||||
import json
|
||||
import time
|
||||
from typing import List, Optional, Dict, Any
|
||||
import uuid
|
||||
from fastapi import FastAPI, Depends, Request, HTTPException
|
||||
|
||||
# from fastmcp.exceptions import ToolError
|
||||
from openai import AsyncOpenAI
|
||||
import os
|
||||
import logging
|
||||
|
||||
from .api import ChatCompletionRequest, ChatCompletionResponse, ChatMessage
|
||||
from . import mcp
|
||||
|
||||
# from fastmcp.server.dependencies import get_http_headers
|
||||
|
||||
# Set up logging
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - [INPUT_GUARDS] - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration for archgw LLM gateway
|
||||
LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
|
||||
GUARD_MODEL = "gpt-4o-mini"
|
||||
|
||||
# Initialize OpenAI client for archgw
|
||||
archgw_client = AsyncOpenAI(
|
||||
base_url=LLM_GATEWAY_ENDPOINT,
|
||||
api_key="EMPTY", # archgw doesn't require a real API key
|
||||
)
|
||||
|
||||
app = FastAPI(title="RAG Agent Input Guards", version="1.0.0")
|
||||
|
||||
|
||||
async def validate_query_scope(
|
||||
messages: List[ChatMessage],
|
||||
traceparent_header: Optional[str] = None,
|
||||
request_id: Optional[str] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""Validate that the user query is within TechCorp's domain.
|
||||
|
||||
Returns a dict with:
|
||||
- is_valid: bool indicating if query is within scope
|
||||
- reason: str explaining why query is out of scope (if applicable)
|
||||
"""
|
||||
system_prompt = """You are an input validation guard for TechCorp's customer support system.
|
||||
|
||||
Your job is to determine if a user's query is related to TechCorp and its services/products.
|
||||
|
||||
TechCorp is a technology company that provides:
|
||||
- Cloud services and infrastructure
|
||||
- SaaS products
|
||||
- Technical support
|
||||
- Service level agreements (SLAs)
|
||||
- Uptime guarantees
|
||||
- Enterprise solutions
|
||||
|
||||
ALLOW queries about:
|
||||
- TechCorp's services, products, or offerings
|
||||
- TechCorp's pricing, SLAs, uptime, or policies
|
||||
- Technical support for TechCorp products
|
||||
- General questions about TechCorp as a company
|
||||
|
||||
REJECT queries about:
|
||||
- Other companies or their products
|
||||
- General knowledge questions unrelated to TechCorp
|
||||
- Personal advice or topics outside TechCorp's domain
|
||||
- Anything that doesn't relate to TechCorp's business
|
||||
|
||||
Respond in JSON format:
|
||||
{
|
||||
"is_valid": true/false,
|
||||
"reason": "brief explanation if invalid"
|
||||
}"""
|
||||
|
||||
# Get the last user message for validation
|
||||
last_user_message = None
|
||||
for msg in reversed(messages):
|
||||
if msg.role == "user":
|
||||
last_user_message = msg.content
|
||||
break
|
||||
|
||||
if not last_user_message:
|
||||
return {"is_valid": True, "reason": ""}
|
||||
|
||||
# Prepare messages for the guard
|
||||
guard_messages = [
|
||||
{"role": "system", "content": system_prompt},
|
||||
{"role": "user", "content": f"Query to validate: {last_user_message}"},
|
||||
]
|
||||
|
||||
try:
|
||||
# Call archgw using OpenAI client
|
||||
extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
|
||||
if traceparent_header:
|
||||
extra_headers["traceparent"] = traceparent_header
|
||||
|
||||
logger.info(f"Validating query scope: '{last_user_message}'")
|
||||
response = await archgw_client.chat.completions.create(
|
||||
model=GUARD_MODEL,
|
||||
messages=guard_messages,
|
||||
temperature=0.1,
|
||||
max_tokens=150,
|
||||
extra_headers=extra_headers,
|
||||
)
|
||||
|
||||
result_text = response.choices[0].message.content.strip()
|
||||
|
||||
# Parse JSON response
|
||||
try:
|
||||
result = json.loads(result_text)
|
||||
logger.info(f"Validation result: {result}")
|
||||
return result
|
||||
except json.JSONDecodeError:
|
||||
logger.error(f"Failed to parse validation response: {result_text}")
|
||||
# Default to allowing if parsing fails
|
||||
return {"is_valid": True, "reason": ""}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error validating query: {e}")
|
||||
# Default to allowing if validation fails
|
||||
return {"is_valid": True, "reason": ""}
|
||||
|
||||
|
||||
# @mcp.tool
|
||||
@app.post("/")
|
||||
async def input_guards(
|
||||
messages: List[ChatMessage], request: Request
|
||||
) -> List[ChatMessage]:
|
||||
"""Input guard that validates queries are within TechCorp's domain.
|
||||
|
||||
If the query is out of scope, replaces the user message with a rejection notice.
|
||||
"""
|
||||
logger.info(f"Received request with {len(messages)} messages")
|
||||
|
||||
# Get traceparent header from HTTP request using FastMCP's dependency function
|
||||
# headers = get_http_headers()
|
||||
# traceparent_header = headers.get("traceparent")
|
||||
traceparent_header = request.headers.get("traceparent")
|
||||
request_id = request.headers.get("x-request-id")
|
||||
|
||||
if traceparent_header:
|
||||
logger.info(f"Received traceparent header: {traceparent_header}")
|
||||
else:
|
||||
logger.info("No traceparent header found")
|
||||
|
||||
# Validate the query scope
|
||||
validation_result = await validate_query_scope(
|
||||
messages, traceparent_header, request_id
|
||||
)
|
||||
|
||||
if not validation_result.get("is_valid", True):
|
||||
reason = validation_result.get("reason", "Query is outside TechCorp's domain")
|
||||
logger.warning(f"Query rejected: {reason}")
|
||||
|
||||
# Throw ToolError
|
||||
error_message = f"I apologize, but I can only assist with questions related to TechCorp and its services. Your query appears to be outside this scope. {reason}\n\nPlease ask me about TechCorp's products, services, pricing, SLAs, or technical support."
|
||||
# raise ToolError(error_message)
|
||||
raise HTTPException(
|
||||
status_code=400, detail={"error": "out_of_scope", "message": error_message}
|
||||
)
|
||||
|
||||
logger.info("Query validation passed - forwarding to next filter")
|
||||
return messages
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
async def health():
|
||||
return {"status": "healthy"}
|
||||
133
demos/use_cases/http_filter/src/rag_agent/query_rewriter.py
Normal file
133
demos/use_cases/http_filter/src/rag_agent/query_rewriter.py
Normal file
|
|
@ -0,0 +1,133 @@
|
|||
import asyncio
|
||||
import json
|
||||
import time
|
||||
from typing import List, Optional, Dict, Any
|
||||
import uuid
|
||||
from fastapi import FastAPI, Depends, Request
|
||||
from openai import AsyncOpenAI
|
||||
import os
|
||||
import logging
|
||||
|
||||
from .api import ChatCompletionRequest, ChatCompletionResponse, ChatMessage
|
||||
|
||||
# from . import mcp
|
||||
# from fastmcp.server.dependencies import get_http_headers
|
||||
|
||||
# Set up logging
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - [QUERY_REWRITER] - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration for archgw LLM gateway
|
||||
LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
|
||||
QUERY_REWRITE_MODEL = "gpt-4o-mini"
|
||||
|
||||
# Initialize OpenAI client for archgw
|
||||
archgw_client = AsyncOpenAI(
|
||||
base_url=LLM_GATEWAY_ENDPOINT,
|
||||
api_key="EMPTY", # archgw doesn't require a real API key
|
||||
)
|
||||
|
||||
app = FastAPI(title="RAG Agent Query Rewriter", version="1.0.0")
|
||||
|
||||
|
||||
async def rewrite_query_with_archgw(
|
||||
messages: List[ChatMessage],
|
||||
traceparent_header: Optional[str] = None,
|
||||
request_id: Optional[str] = None,
|
||||
) -> str:
|
||||
"""Rewrite the last user message for better retrieval. Returns the rewritten query."""
|
||||
system_prompt = """You are a query rewriter that improves user queries for better retrieval.
|
||||
|
||||
Given a conversation history, rewrite the last user message to be more specific and context-aware.
|
||||
The rewritten query should:
|
||||
1. Include relevant context from previous messages
|
||||
2. Be clear and specific for information retrieval
|
||||
3. Maintain the user's intent
|
||||
4. Be concise but comprehensive
|
||||
|
||||
Return only the rewritten query, nothing else."""
|
||||
|
||||
rewrite_messages = [{"role": "system", "content": system_prompt}]
|
||||
for msg in messages:
|
||||
rewrite_messages.append({"role": msg.role, "content": msg.content})
|
||||
|
||||
extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
|
||||
if traceparent_header:
|
||||
extra_headers["traceparent"] = traceparent_header
|
||||
|
||||
try:
|
||||
logger.info(f"Calling archgw at {LLM_GATEWAY_ENDPOINT} to rewrite query")
|
||||
resp = await archgw_client.chat.completions.create(
|
||||
model=QUERY_REWRITE_MODEL,
|
||||
messages=rewrite_messages,
|
||||
temperature=0.3,
|
||||
max_tokens=200,
|
||||
extra_headers=extra_headers,
|
||||
)
|
||||
rewritten = resp.choices[0].message.content.strip()
|
||||
logger.info(f"Query rewritten successfully: '{rewritten}'")
|
||||
return rewritten
|
||||
except Exception as e:
|
||||
logger.error(f"Error rewriting query: {e}")
|
||||
|
||||
# Fallback: return the original last user message
|
||||
for m in reversed(messages):
|
||||
if m.role == "user":
|
||||
logger.info("Falling back to original user message")
|
||||
return m.content
|
||||
return ""
|
||||
|
||||
|
||||
@app.post("/")
|
||||
async def query_rewriter_http(
|
||||
messages: List[ChatMessage], request: Request
|
||||
) -> List[ChatMessage]:
|
||||
"""HTTP filter endpoint used by Plano (type: http)."""
|
||||
logger.info(f"Received request with {len(messages)} messages")
|
||||
|
||||
traceparent_header = request.headers.get("traceparent")
|
||||
request_id = request.headers.get("x-request-id")
|
||||
|
||||
if traceparent_header:
|
||||
logger.info(f"Received traceparent header: {traceparent_header}")
|
||||
else:
|
||||
logger.info("No traceparent header found")
|
||||
|
||||
rewritten_query = await rewrite_query_with_archgw(
|
||||
messages, traceparent_header, request_id
|
||||
)
|
||||
# Create updated messages with the rewritten query
|
||||
updated_messages = messages.copy()
|
||||
|
||||
# Find and update the last user message with the rewritten query
|
||||
for i in range(len(updated_messages) - 1, -1, -1):
|
||||
if updated_messages[i].role == "user":
|
||||
original_query = updated_messages[i].content
|
||||
updated_messages[i] = ChatMessage(role="user", content=rewritten_query)
|
||||
logger.info(
|
||||
f"Updated user query from '{original_query}' to '{rewritten_query}'"
|
||||
)
|
||||
break
|
||||
updated_messages_data = [
|
||||
{"role": msg.role, "content": msg.content} for msg in updated_messages
|
||||
]
|
||||
updated_messages = [ChatMessage(**msg) for msg in updated_messages_data]
|
||||
|
||||
logger.info("Returning rewritten chat completion response")
|
||||
return updated_messages
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
async def health():
|
||||
return {"status": "healthy"}
|
||||
|
||||
|
||||
def start_server(host: str = "0.0.0.0", port: int = 10501):
|
||||
"""Start the FastAPI server for query rewriter."""
|
||||
import uvicorn
|
||||
|
||||
logger.info(f"Starting Query Rewriter REST server on {host}:{port}")
|
||||
uvicorn.run(app, host=host, port=port)
|
||||
221
demos/use_cases/http_filter/src/rag_agent/rag_agent.py
Normal file
221
demos/use_cases/http_filter/src/rag_agent/rag_agent.py
Normal file
|
|
@ -0,0 +1,221 @@
|
|||
import json
|
||||
from fastapi import FastAPI, Request
|
||||
from fastapi.responses import StreamingResponse
|
||||
from openai import AsyncOpenAI
|
||||
import os
|
||||
import logging
|
||||
import time
|
||||
import uuid
|
||||
import uvicorn
|
||||
import asyncio
|
||||
|
||||
from .api import (
|
||||
ChatCompletionRequest,
|
||||
ChatCompletionResponse,
|
||||
ChatCompletionStreamResponse,
|
||||
)
|
||||
|
||||
# Set up logging
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - [RESPONSE_GENERATOR] - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration for archgw LLM gateway
|
||||
LLM_GATEWAY_ENDPOINT = os.getenv("LLM_GATEWAY_ENDPOINT", "http://localhost:12000/v1")
|
||||
RESPONSE_MODEL = "gpt-4o"
|
||||
|
||||
# System prompt for response generation
|
||||
SYSTEM_PROMPT = """You are a helpful assistant that generates coherent, contextual responses.
|
||||
|
||||
Given a conversation history, generate a helpful and relevant response based on all the context available in the messages.
|
||||
Your response should:
|
||||
1. Be contextually aware of the entire conversation
|
||||
2. Address the user's needs appropriately
|
||||
3. Be helpful and informative
|
||||
4. Maintain a natural conversational tone
|
||||
|
||||
Generate a complete response to assist the user."""
|
||||
|
||||
# Initialize OpenAI client for archgw
|
||||
archgw_client = AsyncOpenAI(
|
||||
base_url=LLM_GATEWAY_ENDPOINT,
|
||||
api_key="EMPTY", # archgw doesn't require a real API key
|
||||
)
|
||||
|
||||
# FastAPI app for REST server
|
||||
app = FastAPI(title="RAG Agent Response Generator", version="1.0.0")
|
||||
|
||||
|
||||
def prepare_response_messages(request_body: ChatCompletionRequest):
|
||||
"""Prepare messages for response generation by adding system prompt."""
|
||||
response_messages = [{"role": "system", "content": SYSTEM_PROMPT}]
|
||||
|
||||
# Add conversation history
|
||||
for msg in request_body.messages:
|
||||
response_messages.append({"role": msg.role, "content": msg.content})
|
||||
|
||||
return response_messages
|
||||
|
||||
|
||||
@app.post("/v1/chat/completions")
|
||||
async def chat_completion_http(request: Request, request_body: ChatCompletionRequest):
|
||||
"""HTTP endpoint for chat completions with streaming support."""
|
||||
logger.info(
|
||||
f"Received chat completion request with {len(request_body.messages)} messages"
|
||||
)
|
||||
|
||||
# Get traceparent header from HTTP request
|
||||
traceparent_header = request.headers.get("traceparent")
|
||||
request_id = request.headers.get("x-request-id") or f"req-{uuid.uuid4().hex}"
|
||||
|
||||
if traceparent_header:
|
||||
logger.info(f"Received traceparent header: {traceparent_header}")
|
||||
else:
|
||||
logger.info("No traceparent header found")
|
||||
|
||||
return StreamingResponse(
|
||||
stream_chat_completions(request_body, traceparent_header, request_id),
|
||||
media_type="text/plain",
|
||||
headers={
|
||||
"content-type": "text/event-stream",
|
||||
"x-request-id": request_id,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
async def stream_chat_completions(
|
||||
request_body: ChatCompletionRequest,
|
||||
traceparent_header: str = None,
|
||||
request_id: str = None,
|
||||
):
|
||||
"""Generate streaming chat completions."""
|
||||
# Prepare messages for response generation
|
||||
response_messages = prepare_response_messages(request_body)
|
||||
|
||||
try:
|
||||
# Call archgw using OpenAI client for streaming
|
||||
logger.info(
|
||||
f"Calling archgw at {LLM_GATEWAY_ENDPOINT} to generate streaming response"
|
||||
)
|
||||
|
||||
# Prepare extra headers if traceparent is provided
|
||||
extra_headers = {"x-envoy-max-retries": "3", "x-request-id": request_id}
|
||||
if traceparent_header:
|
||||
extra_headers["traceparent"] = traceparent_header
|
||||
|
||||
response_stream = await archgw_client.chat.completions.create(
|
||||
model=RESPONSE_MODEL,
|
||||
messages=response_messages,
|
||||
temperature=request_body.temperature or 0.7,
|
||||
max_tokens=request_body.max_tokens or 1000,
|
||||
stream=True,
|
||||
extra_headers=extra_headers,
|
||||
)
|
||||
|
||||
completion_id = f"chatcmpl-{uuid.uuid4().hex[:8]}"
|
||||
created_time = int(time.time())
|
||||
collected_content = []
|
||||
|
||||
async for chunk in response_stream:
|
||||
if chunk.choices and chunk.choices[0].delta.content:
|
||||
content = chunk.choices[0].delta.content
|
||||
collected_content.append(content)
|
||||
|
||||
# Create streaming response chunk
|
||||
stream_chunk = ChatCompletionStreamResponse(
|
||||
id=completion_id,
|
||||
created=created_time,
|
||||
model=request_body.model,
|
||||
choices=[
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {"content": content},
|
||||
"finish_reason": None,
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
yield f"data: {stream_chunk.model_dump_json()}\n\n"
|
||||
|
||||
# Send final chunk with complete response in expected format
|
||||
full_response = "".join(collected_content)
|
||||
updated_history = [{"role": "assistant", "content": full_response}]
|
||||
|
||||
final_chunk = ChatCompletionStreamResponse(
|
||||
id=completion_id,
|
||||
created=created_time,
|
||||
model=request_body.model,
|
||||
choices=[
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {},
|
||||
"finish_reason": "stop",
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": json.dumps(updated_history),
|
||||
},
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
yield f"data: {final_chunk.model_dump_json()}\n\n"
|
||||
yield "data: [DONE]\n\n"
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error generating streaming response: {e}")
|
||||
|
||||
# Send error as streaming response
|
||||
error_chunk = ChatCompletionStreamResponse(
|
||||
id=f"chatcmpl-{uuid.uuid4().hex[:8]}",
|
||||
created=int(time.time()),
|
||||
model=request_body.model,
|
||||
choices=[
|
||||
{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"content": "I apologize, but I'm having trouble generating a response right now. Please try again."
|
||||
},
|
||||
"finish_reason": "stop",
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
yield f"data: {error_chunk.model_dump_json()}\n\n"
|
||||
yield "data: [DONE]\n\n"
|
||||
|
||||
|
||||
@app.get("/health")
|
||||
async def health_check():
|
||||
"""Health check endpoint."""
|
||||
return {"status": "healthy"}
|
||||
|
||||
|
||||
def start_server(host: str = "localhost", port: int = 8000):
|
||||
"""Start the REST server."""
|
||||
uvicorn.run(
|
||||
app,
|
||||
host=host,
|
||||
port=port,
|
||||
log_config={
|
||||
"version": 1,
|
||||
"disable_existing_loggers": False,
|
||||
"formatters": {
|
||||
"default": {
|
||||
"format": "%(asctime)s - [RESPONSE_GENERATOR] - %(levelname)s - %(message)s",
|
||||
},
|
||||
},
|
||||
"handlers": {
|
||||
"default": {
|
||||
"formatter": "default",
|
||||
"class": "logging.StreamHandler",
|
||||
"stream": "ext://sys.stdout",
|
||||
},
|
||||
},
|
||||
"root": {
|
||||
"level": "INFO",
|
||||
"handlers": ["default"],
|
||||
},
|
||||
},
|
||||
)
|
||||
|
|
@ -0,0 +1,257 @@
|
|||
path,content
|
||||
TechCorp_CloudServices_SLA_Agreement_2024,"SERVICE LEVEL AGREEMENT
|
||||
This Service Level Agreement (""SLA"") is entered into on March 15, 2024, between TechCorp Solutions Inc., a Delaware corporation (""Provider""), and CloudFirst Enterprises LLC (""Customer"").
|
||||
|
||||
DEFINITIONS
|
||||
Service Availability: The percentage of time during which the cloud services are operational and accessible.
|
||||
Downtime: Any period when the services are unavailable or inaccessible to Customer.
|
||||
Response Time: The time between service request submission and initial response from Provider.
|
||||
|
||||
SERVICE COMMITMENTS
|
||||
Provider guarantees 99.9% uptime for all cloud infrastructure services during any calendar month.
|
||||
Average response time for API calls shall not exceed 200 milliseconds under normal operating conditions.
|
||||
Customer support response times: Critical issues within 1 hour, Standard issues within 4 hours.
|
||||
|
||||
REMEDIES
|
||||
For each full percentage point below 99.9% availability, Customer receives 10% credit on monthly fees.
|
||||
If response times exceed 500ms for more than 5 minutes in any hour, Customer receives 5% monthly credit.
|
||||
|
||||
MONITORING AND REPORTING
|
||||
Provider will maintain real-time monitoring systems and provide monthly performance reports.
|
||||
All metrics will be measured from Provider's monitoring systems located in primary data centers.
|
||||
|
||||
This SLA remains in effect for the duration of the underlying service agreement.
|
||||
|
||||
Executed by:
|
||||
TechCorp Solutions Inc.
|
||||
Sarah Mitchell, VP Operations
|
||||
Date: March 15, 2024
|
||||
|
||||
CloudFirst Enterprises LLC
|
||||
Robert Chen, CTO
|
||||
Date: March 16, 2024"
|
||||
|
||||
DataSecure_Privacy_Policy_v3.2,"PRIVACY POLICY
|
||||
DataSecure Analytics, Inc. (""Company"") Privacy Policy
|
||||
Effective Date: January 1, 2024
|
||||
Last Updated: February 28, 2024
|
||||
|
||||
INFORMATION COLLECTION
|
||||
We collect information you provide directly, such as account details, usage preferences, and communication records.
|
||||
Automatically collected data includes IP addresses, browser types, device information, and service interaction logs.
|
||||
Third-party integrations may provide additional user behavior and demographic information with consent.
|
||||
|
||||
DATA USAGE
|
||||
Personal information is used to provide services, improve user experience, and communicate service updates.
|
||||
Aggregated, non-identifiable data may be used for analytics, research, and service enhancement.
|
||||
We do not sell personal information to third parties for marketing purposes.
|
||||
|
||||
DATA PROTECTION
|
||||
All data is encrypted in transit using TLS 1.3 and at rest using AES-256 encryption.
|
||||
Access controls limit data access to authorized personnel only on a need-to-know basis.
|
||||
Regular security audits and penetration testing ensure ongoing protection measures.
|
||||
|
||||
DATA RETENTION
|
||||
Personal data is retained for the duration of active service plus 24 months.
|
||||
Logs and analytics data are retained for 12 months unless legally required otherwise.
|
||||
Upon account deletion, personal data is permanently removed within 30 days.
|
||||
|
||||
USER RIGHTS
|
||||
Users may request access to, correction of, or deletion of their personal information.
|
||||
Data portability requests will be fulfilled in standard formats within 30 days.
|
||||
Marketing communications can be opted out of at any time.
|
||||
|
||||
CONTACT
|
||||
For privacy concerns, contact: privacy@datasecure.com
|
||||
Data Protection Officer: Jennifer Walsh, jwalsh@datasecure.com"
|
||||
|
||||
GlobalManufacturing_SupplyChain_Contract_Q2_2024,"SUPPLY CHAIN AGREEMENT
|
||||
This Supply Chain Agreement is entered into between GlobalManufacturing Corp (""Buyer"") and PrecisionParts Ltd (""Supplier"") effective April 1, 2024.
|
||||
|
||||
SCOPE OF SERVICES
|
||||
Supplier will provide automotive components including brake assemblies, suspension parts, and electrical harnesses.
|
||||
All products must meet ISO 9001 quality standards and automotive industry specifications.
|
||||
Delivery schedule: Weekly shipments every Tuesday, with 48-hour advance shipping notifications.
|
||||
|
||||
PRICING AND PAYMENT
|
||||
Component pricing is fixed for initial 6-month term with quarterly price review thereafter.
|
||||
Payment terms: Net 45 days from invoice date via electronic transfer.
|
||||
Volume discounts apply: 5% for orders exceeding 10,000 units per month, 8% for orders exceeding 25,000 units.
|
||||
|
||||
QUALITY REQUIREMENTS
|
||||
All components must pass incoming inspection with less than 0.1% defect rate.
|
||||
Supplier maintains quality certifications including IATF 16949 and environmental compliance.
|
||||
Batch tracking and traceability required for all delivered components.
|
||||
|
||||
LOGISTICS AND DELIVERY
|
||||
Supplier responsible for packaging, labeling, and delivery to Buyer's distribution centers.
|
||||
Delivery windows: 8 AM - 4 PM, Monday through Friday, with advance appointment scheduling.
|
||||
Late delivery penalties: 2% of shipment value for each day beyond scheduled delivery.
|
||||
|
||||
RISK MANAGEMENT
|
||||
Supplier maintains business continuity plans and alternative sourcing strategies.
|
||||
Force majeure events must be reported within 24 hours with mitigation plans.
|
||||
Insurance requirements: $5M general liability, $2M product liability coverage.
|
||||
|
||||
INTELLECTUAL PROPERTY
|
||||
All custom tooling and specifications remain property of Buyer.
|
||||
Supplier grants license to use necessary patents for component manufacturing.
|
||||
|
||||
This agreement shall remain in effect for 24 months with automatic renewal unless terminated.
|
||||
|
||||
GlobalManufacturing Corp
|
||||
Michael Rodriguez, Supply Chain Director
|
||||
Date: April 1, 2024
|
||||
|
||||
PrecisionParts Ltd
|
||||
Amanda Foster, VP Sales
|
||||
Date: April 2, 2024"
|
||||
|
||||
EduTech_StudentData_Management_Policy_2024,"STUDENT DATA MANAGEMENT POLICY
|
||||
EduTech Learning Platform - Data Management and Protection Policy
|
||||
Document Version: 2.1
|
||||
Effective Date: August 15, 2024
|
||||
|
||||
SCOPE AND PURPOSE
|
||||
This policy governs the collection, use, storage, and protection of student educational records and personal information.
|
||||
Applies to all employees, contractors, and third-party service providers accessing student data.
|
||||
Compliance with FERPA, COPPA, and state student privacy laws is mandatory.
|
||||
|
||||
DATA CLASSIFICATION
|
||||
Educational Records: Grades, attendance, assignments, and academic progress information.
|
||||
Personal Information: Names, addresses, contact details, and demographic information.
|
||||
Behavioral Data: Learning patterns, platform usage, and engagement metrics.
|
||||
|
||||
COLLECTION PRINCIPLES
|
||||
Data collection is limited to educational purposes and service improvement only.
|
||||
Parental consent required for students under 13 years of age.
|
||||
Students and parents have right to review and request corrections to educational records.
|
||||
|
||||
ACCESS CONTROLS
|
||||
Role-based access ensures personnel see only data necessary for their functions.
|
||||
Multi-factor authentication required for all system access.
|
||||
Access logs maintained and reviewed monthly for unauthorized activity.
|
||||
|
||||
DATA SHARING
|
||||
Educational records shared only with authorized school personnel and parents/students.
|
||||
No data sharing with third parties for commercial purposes without explicit consent.
|
||||
Research data must be de-identified and aggregated before external sharing.
|
||||
|
||||
SECURITY MEASURES
|
||||
Data encrypted using industry-standard protocols during transmission and storage.
|
||||
Regular security assessments and vulnerability testing conducted quarterly.
|
||||
Incident response plan includes notification procedures for data breaches.
|
||||
|
||||
RETENTION AND DISPOSAL
|
||||
Student records retained according to school district policies, typically 5-7 years post-graduation.
|
||||
Inactive accounts and associated data purged after 2 years of non-use.
|
||||
Secure data destruction protocols ensure complete removal of sensitive information.
|
||||
|
||||
COMPLIANCE MONITORING
|
||||
Annual privacy training required for all staff handling student data.
|
||||
Regular audits ensure ongoing compliance with applicable privacy regulations.
|
||||
Privacy impact assessments conducted for new features or data uses.
|
||||
|
||||
Contact: Dr. Lisa Thompson, Chief Privacy Officer
|
||||
Email: privacy@edutech-learning.com
|
||||
Phone: (555) 123-4567"
|
||||
|
||||
FinanceFirst_Investment_Advisory_Agreement_2024,"INVESTMENT ADVISORY AGREEMENT
|
||||
This Investment Advisory Agreement is entered into between FinanceFirst Advisors LLC (""Advisor"") and Madison Investment Group (""Client"") on May 20, 2024.
|
||||
|
||||
ADVISORY SERVICES
|
||||
Advisor will provide comprehensive investment management and financial planning services.
|
||||
Services include portfolio construction, asset allocation, risk assessment, and performance monitoring.
|
||||
Regular portfolio reviews conducted quarterly with detailed performance reporting.
|
||||
|
||||
INVESTMENT AUTHORITY
|
||||
Client grants Advisor discretionary authority to make investment decisions within agreed parameters.
|
||||
Investment universe includes stocks, bonds, ETFs, mutual funds, and alternative investments as appropriate.
|
||||
All trades executed through qualified broker-dealers with best execution practices.
|
||||
|
||||
FEE STRUCTURE
|
||||
Management fee: 1.25% annually on assets under management, calculated and billed quarterly.
|
||||
Performance fee: 15% of returns exceeding S&P 500 benchmark, calculated annually.
|
||||
Additional fees may apply for specialized services such as tax planning or estate planning.
|
||||
|
||||
CLIENT RESPONSIBILITIES
|
||||
Client must provide accurate financial information and promptly communicate changes in circumstances.
|
||||
Investment objectives and risk tolerance should be reviewed and updated annually.
|
||||
Client responsible for reviewing and approving investment policy statement.
|
||||
|
||||
RISK DISCLOSURE
|
||||
All investments carry risk of loss, and past performance does not guarantee future results.
|
||||
Diversification does not ensure profit or protect against loss in declining markets.
|
||||
Alternative investments may have limited liquidity and higher volatility.
|
||||
|
||||
REGULATORY COMPLIANCE
|
||||
Advisor is registered with the Securities and Exchange Commission as an investment advisor.
|
||||
All activities conducted in accordance with Investment Advisers Act of 1940 and applicable regulations.
|
||||
Form ADV Part 2 brochure provided annually with material updates.
|
||||
|
||||
CONFIDENTIALITY
|
||||
All client information treated as confidential and shared only as necessary for service provision.
|
||||
Third-party service providers bound by confidentiality agreements.
|
||||
Client data protected through secure systems and access controls.
|
||||
|
||||
TERMINATION
|
||||
Either party may terminate agreement with 30 days written notice.
|
||||
Upon termination, Advisor will assist with orderly transfer of assets to new custodian or advisor.
|
||||
Final fee calculation prorated to date of termination.
|
||||
|
||||
FinanceFirst Advisors LLC
|
||||
Thomas Anderson, Managing Partner
|
||||
Date: May 20, 2024
|
||||
|
||||
Madison Investment Group
|
||||
Rebecca Martinez, Chief Investment Officer
|
||||
Date: May 21, 2024"
|
||||
|
||||
HealthSystem_PatientCare_Standards_2024,"PATIENT CARE STANDARDS AND PROTOCOLS
|
||||
Metropolitan Health System - Clinical Care Standards
|
||||
Document ID: MHS-PCS-2024-001
|
||||
Effective Date: June 1, 2024
|
||||
|
||||
PATIENT SAFETY PROTOCOLS
|
||||
All patients must have proper identification verification using two unique identifiers.
|
||||
Medication administration requires independent double-check for high-risk medications.
|
||||
Fall risk assessments completed within 4 hours of admission with appropriate interventions.
|
||||
|
||||
CLINICAL DOCUMENTATION
|
||||
Medical records must be completed within 24 hours of patient encounter.
|
||||
All entries require electronic signature with timestamp and provider identification.
|
||||
Critical values and abnormal results must be communicated and documented immediately.
|
||||
|
||||
INFECTION CONTROL
|
||||
Hand hygiene compliance monitored with target rate of 95% or higher.
|
||||
Personal protective equipment used according to transmission-based precautions.
|
||||
Isolation procedures implemented within 2 hours of identification of infectious conditions.
|
||||
|
||||
EMERGENCY RESPONSE
|
||||
Code team response time target: 3 minutes from activation to arrival.
|
||||
Crash cart and emergency equipment checks performed daily and documented.
|
||||
All staff required to maintain current CPR and emergency response certifications.
|
||||
|
||||
PATIENT COMMUNICATION
|
||||
Patient rights and responsibilities communicated upon admission.
|
||||
Informed consent obtained and documented prior to procedures and treatments.
|
||||
Family involvement encouraged with respect for patient privacy preferences.
|
||||
|
||||
QUALITY MEASURES
|
||||
Patient satisfaction scores monitored monthly with target of 4.5/5.0 or higher.
|
||||
Medication error rates tracked with goal of less than 1 per 1000 patient days.
|
||||
Hospital-acquired infection rates measured and benchmarked against national standards.
|
||||
|
||||
STAFF COMPETENCY
|
||||
Annual competency assessments required for all clinical staff.
|
||||
Continuing education requirements: 24 hours annually for nurses, 40 hours for physicians.
|
||||
Specialty certifications maintained according to department and role requirements.
|
||||
|
||||
TECHNOLOGY STANDARDS
|
||||
Electronic health record system used for all patient documentation.
|
||||
Telemedicine capabilities available for remote consultations and monitoring.
|
||||
Clinical decision support tools integrated to assist with diagnosis and treatment decisions.
|
||||
|
||||
Contact: Dr. Patricia Williams, Chief Medical Officer
|
||||
Email: pwilliams@metrohealthsystem.org
|
||||
Phone: (555) 987-6543"
|
||||
|
78
demos/use_cases/http_filter/start_agents.sh
Normal file
78
demos/use_cases/http_filter/start_agents.sh
Normal file
|
|
@ -0,0 +1,78 @@
|
|||
# #!/bin/bash
|
||||
# set -e
|
||||
|
||||
# WAIT_FOR_PIDS=()
|
||||
|
||||
# log() {
|
||||
# timestamp=$(python3 -c 'from datetime import datetime; print(datetime.now().strftime("%Y-%m-%d %H:%M:%S,%f")[:23])')
|
||||
# message="$*"
|
||||
# echo "$timestamp - $message"
|
||||
# }
|
||||
|
||||
# cleanup() {
|
||||
# log "Caught signal, terminating all user processes ..."
|
||||
# for PID in "${WAIT_FOR_PIDS[@]}"; do
|
||||
# if kill $PID 2> /dev/null; then
|
||||
# log "killed process: $PID"
|
||||
# fi
|
||||
# done
|
||||
# exit 1
|
||||
# }
|
||||
|
||||
# trap cleanup EXIT
|
||||
|
||||
# log "Starting input_guards agent on port 10500/mcp..."
|
||||
# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10500 --agent input_guards &
|
||||
# WAIT_FOR_PIDS+=($!)
|
||||
|
||||
# log "Starting query_rewriter agent on port 10501/mcp..."
|
||||
# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10501 --agent query_rewriter &
|
||||
# WAIT_FOR_PIDS+=($!)
|
||||
|
||||
# log "Starting context_builder agent on port 10502/mcp..."
|
||||
# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10502 --agent context_builder &
|
||||
# WAIT_FOR_PIDS+=($!)
|
||||
|
||||
# # log "Starting response_generator agent on port 10400..."
|
||||
# # uv run python -m rag_agent --host 0.0.0.0 --port 10400 --agent response_generator &
|
||||
# # WAIT_FOR_PIDS+=($!)
|
||||
|
||||
# log "Starting response_generator agent on port 10505..."
|
||||
# uv run python -m rag_agent --rest-server --host 0.0.0.0 --rest-port 10505 --agent response_generator &
|
||||
# WAIT_FOR_PIDS+=($!)
|
||||
|
||||
# for PID in "${WAIT_FOR_PIDS[@]}"; do
|
||||
# wait "$PID"
|
||||
# done
|
||||
|
||||
|
||||
|
||||
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
export PYTHONPATH=/app/src
|
||||
|
||||
pids=()
|
||||
|
||||
log() { echo "$(date '+%F %T') - $*"; }
|
||||
|
||||
log "Starting input_guards HTTP server on :10500"
|
||||
uv run uvicorn rag_agent.input_guards:app --host 0.0.0.0 --port 10500 &
|
||||
pids+=($!)
|
||||
|
||||
log "Starting query_rewriter HTTP server on :10501"
|
||||
uv run uvicorn rag_agent.query_rewriter:app --host 0.0.0.0 --port 10501 &
|
||||
pids+=($!)
|
||||
|
||||
log "Starting context_builder HTTP server on :10502"
|
||||
uv run uvicorn rag_agent.context_builder:app --host 0.0.0.0 --port 10502 &
|
||||
pids+=($!)
|
||||
|
||||
log "Starting response_generator (OpenAI-compatible) on :10505"
|
||||
uv run uvicorn rag_agent.rag_agent:app --host 0.0.0.0 --port 10505 &
|
||||
pids+=($!)
|
||||
|
||||
for PID in "${pids[@]}"; do
|
||||
wait "$PID"
|
||||
done
|
||||
92
demos/use_cases/http_filter/test.rest
Normal file
92
demos/use_cases/http_filter/test.rest
Normal file
|
|
@ -0,0 +1,92 @@
|
|||
@baseUrl = http://0.0.0.0:10502
|
||||
@model = gpt-4o
|
||||
|
||||
# Health Check
|
||||
GET {{baseUrl}}/health
|
||||
|
||||
###
|
||||
|
||||
# Test 1: Simple Non-Streaming Chat Completion
|
||||
POST {{baseUrl}}/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "{{model}}",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hello! Can you help me understand what machine learning is?"
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
###
|
||||
|
||||
# Test 2: Simple Streaming Chat Completion
|
||||
POST {{baseUrl}}/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "{{model}}",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Explain the concept of artificial intelligence in simple terms."
|
||||
}
|
||||
],
|
||||
"stream": true
|
||||
}
|
||||
|
||||
### Test 3
|
||||
POST http://localhost:8001/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "{{model}}",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
|
||||
}
|
||||
],
|
||||
"stream": true
|
||||
}
|
||||
|
||||
### send request to query_rewriter agent
|
||||
POST http://localhost:10500/
|
||||
Content-Type: application/json
|
||||
|
||||
[
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What is the guaranteed uptime percentage for TechCorp's cloud services?"
|
||||
}
|
||||
]
|
||||
|
||||
### test fast-llm
|
||||
POST http://localhost:12000/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "fast-llm",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "hello"
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
### test smart-llm
|
||||
POST http://localhost:12000/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "smart-llm",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "hello"
|
||||
}
|
||||
]
|
||||
}
|
||||
1830
demos/use_cases/http_filter/uv.lock
generated
Normal file
1830
demos/use_cases/http_filter/uv.lock
generated
Normal file
File diff suppressed because it is too large
Load diff
Loading…
Add table
Add a link
Reference in a new issue