updated all demo READMes and minor doc changes (#154)

* updated all demo READMes and minor doc changes
* minor typo fixes
* updated main Readme
* fixed README and docs
* fixed README and docs

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2026-07-26 17:01:04 +02:00 · 2024-10-08 23:58:55 -07:00 · 2024-10-08 23:58:55 -07:00 · 42d4a28e13
commit 42d4a28e13
parent b63a01fe82
22 changed files with 324 additions and 1455 deletions
--- a/README.md
+++ b/README.md
@@ -10,30 +10,177 @@ Arch is an intelligent [Layer 7](https://www.cloudflare.com/learning/ddos/what-i

 *Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests including secure handling, intelligent routing, robust observability, and integration with backend (API) systems for personalization – all outside business logic.*

+**Core Features**:
+  - Built on [Envoy](https://envoyproxy.io): Arch runs alongside application servers and builds on Envoy's proven HTTP management and scalability features to handle ingress and egress prompts and LLM traffic.
+  - Engineered with purpose-built [(fast) LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68): Arch is optimized for sub-billion parameter LLMs to handle fast, cost-effective, and accurate prompt-based tasks like function/API calling.
+  - Prompt [Guardrails](https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d): Arch centralizes prompt guardrails to prevent jailbreak attempts and ensure safe user interactions without writing extra code.
+  - Traffic Management: Arch manages LLM calls, offering smart retries, automatic cutover, and resilient upstream connections for continuous availability.
+  - Open Observability: Arch uses the W3C Trace Context standard to enable complete request tracing across applications, ensuring compatibility with observability tools, and provides metrics to monitor latency, token usage, and error rates, helping optimize AI application performance.
+  - [Coming Soon] Intent-Markers: Arch helps developers detect when users shift their intent, improving response relevance, token cost, and speed.
+
+**Jump to our [docs](https://docs.archgw.com)** to learn more about how you can use Arch to improve the speed, robustness, and personalization of your GenAI apps.

 # Contact
-To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively.
+To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively and offering support there.

 # Demos
-## Complete
-* [Weather Forecast](demos/function-calling/README.md)
-  * Showing function calling capability
-## In progress
-* Network Co-pilot
-## Not Started
-* Show routing between different prompt targets (keyword search vs. top-k semantic search).
-* Show routing between different prompt-resolver vs RAG-based resolver targets.
-* Text Summarization Based on Lightweight vs. Thoughtful Dialogue using OpenAI
-* Show conversational and system observability metrics. This includes topic/intent detection
-* Show how we can help developers implement safeguards customized to their application requirements and responsible AI policies.
+* [Function Calling](demos/function_calling/README.md) - Showcases critical function-calling capability
+* [Insurance Agent](demos/insurance_agent/README.md) - Build a full insurance agent with Arch
+* [Network Agent](demos/network_agent/README.md) - Build a networking co-pilot/agent with Arch

-# Dev setup
+# Quickstart

-## Pre-commit
-Use instructions at [pre-commit.com](https://pre-commit.com/#install) to set it up for your machine. Once installed make sure github hooks are setup, so that when you upstream your change pre-commit hooks can run and validate your change. Follow command below to setup github hooks,
+Follow this guide to learn how to quickly set up Arch and integrate it into your generative AI applications.

-```sh
-$ brew install pre-commit
-$ pre-commit install
-pre-commit installed at .git/hooks/pre-commit
+## Prerequisites
+
+Before you begin, ensure you have the following:
+
+- `Docker` & `Python` installed on your system
+- `API Keys` for LLM providers (if using external LLMs)
+
+The fastest way to get started using Arch is to use [katanemo/arch](https://hub.docker.com/r/katanemo/arch) pre-built binaries.
+You can also build it from source.
+
+## Step 1: Install Arch
+
+Arch's CLI allows you to manage and interact with the Arch gateway efficiently. To install the CLI, run the commands below.
+
+Tip: We recommend creating a new Python virtual environment before installing Arch. This isolates archgw and its dependencies so they do not interfere with other packages on your system.
+
+
+```console
+$ python -m venv venv
+$ source venv/bin/activate   # On Windows, use: venv\Scripts\activate
+$ pip install archgw
 ```
+
+## Step 2: Configure Arch with your application
+
+Arch operates based on a configuration file where you can define LLM providers, prompt targets, guardrails, etc.
+Below is an example configuration to get you started:
+
+```yaml
+version: v0.1
+
+listen:
+  address: 0.0.0.0 # or 127.0.0.1
+  port: 10000
+  # Defines how Arch should parse the content of an HTTP request with an application/json or text/plain Content-Type
+  message_format: huggingface
+
+# Central place to manage LLM providers, access keys, retry logic, failover, and limits
+llm_providers:
+  - name: OpenAI
+    provider: openai
+    access_key: OPENAI_API_KEY
+    model: gpt-4o
+    default: true
+    stream: true
+
+# default system prompt used by all prompt targets
+system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
+
+prompt_targets:
+  - name: reboot_devices
+    description: Reboot specific devices or device groups
+
+    path: /agent/device_reboot
+    parameters:
+      - name: device_ids
+        type: list
+        description: A list of device identifiers (IDs) to reboot.
+        required: false
+      - name: device_group
+        type: str
+        description: The name of the device group to reboot
+        required: false
+
+# Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
+endpoints:
+  app_server:
+    # value could be ip address or a hostname with port
+    # this could also be a list of endpoints for load balancing
+    # for example endpoint: [ ip1:port, ip2:port ]
+    endpoint: 127.0.0.1:80
+    # max time to wait for a connection to be established
+    connect_timeout: 0.005s
+```
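As a rough illustration of what an application behind the `reboot_devices` prompt target might do with the parameters Arch extracts, the sketch below validates the two fields named in the example configuration. The function name `handle_device_reboot`, the response shape, and the rule that at least one parameter must be present are all assumptions made for illustration; the configuration only gives the parameter names, their types, and that neither is required.

```python
from typing import Any


def handle_device_reboot(payload: dict[str, Any]) -> dict[str, Any]:
    """Validate a hypothetical request body for /agent/device_reboot."""
    device_ids = payload.get("device_ids")
    device_group = payload.get("device_group")

    # Assumption: both parameters are optional in the config,
    # but a reboot request is only meaningful with at least one of them.
    if device_ids is None and device_group is None:
        raise ValueError("provide device_ids, device_group, or both")

    # Types mirror the `list` and `str` declarations in the config.
    if device_ids is not None and not isinstance(device_ids, list):
        raise TypeError("device_ids must be a list")
    if device_group is not None and not isinstance(device_group, str):
        raise TypeError("device_group must be a str")

    # Hypothetical response shape; a real service would trigger the reboot here.
    return {
        "status": "accepted",
        "device_ids": device_ids or [],
        "device_group": device_group,
    }


if __name__ == "__main__":
    print(handle_device_reboot({"device_ids": ["sw-1", "sw-2"]}))
```

In a real deployment this logic would sit behind whatever web framework serves the `app_server` endpoint; it is kept framework-free here so it runs on its own.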
+## Step 3: Using OpenAI Client with Arch as an Egress Gateway
+
+Make outbound LLM calls through Arch:
+
+```python
+import openai
+
+# Set the OpenAI API base URL to the Arch gateway endpoint
+openai.api_base = "http://127.0.0.1:51001/v1"
+
+# No need to set openai.api_key since it's configured in Arch's gateway
+
+# Use the OpenAI client as usual
+response = openai.Completion.create(
+   model="text-davinci-003",
+   prompt="What is the capital of France?"
+)
+
+print("OpenAI Response:", response.choices[0].text.strip())
+
+```
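For readers who prefer not to depend on the `openai` package, the same idea can be sketched with the standard library alone. The example below only builds the request and does not send it. The base URL is taken from the snippet above; the `/chat/completions` path and the JSON body follow OpenAI's public HTTP API, and whether the gateway accepts exactly this shape is an assumption, not something the README confirms.

```python
import json
import urllib.request

# Gateway address copied from the example above.
ARCH_BASE_URL = "http://127.0.0.1:51001/v1"


def build_chat_request(prompt: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    ).encode("utf-8")
    # No Authorization header: the README states the key is configured in Arch.
    return urllib.request.Request(
        url=f"{ARCH_BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("What is the capital of France?")
    print(req.get_method(), req.full_url)
    # To send it, a running gateway is required:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```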
+
+## Observability
+
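The Core Features list notes that Arch uses the W3C Trace Context standard for request tracing. As a minimal sketch of what that standard puts on the wire, the code below generates a version `00` `traceparent` header value (`version-traceid-parentid-flags`). The helper name `new_traceparent` is hypothetical, and how Arch itself creates or propagates this header is not described here.

```python
import re
import secrets

# Shape of a version-00 traceparent value per the W3C Trace Context spec.
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def new_traceparent(sampled: bool = True) -> str:
    """Return a fresh traceparent header value (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # lowest bit marks the request as sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


if __name__ == "__main__":
    header = new_traceparent()
    print({"traceparent": header})
```

An application would attach this value as the `traceparent` HTTP header on outgoing requests so that downstream spans can join the same trace.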
+
+## Contribution
+We would love feedback on our [Roadmap](https://github.com/orgs/katanemo/projects/1) and we welcome contributions to **Arch**!
+Whether you're fixing bugs, adding new features, improving documentation, or creating tutorials, your help is much appreciated.
+
+## How to Contribute
+
+### 1. Fork the Repository
+
+Fork the repository to create your own version of **Arch**:
+
+- Navigate to the [Arch GitHub repository](https://github.com/katanemo/arch).
+- Click the "Fork" button in the upper right corner.
+- This will create a copy of the repository under your GitHub account.
+
+### 2. Clone Your Fork
+
+Once you've forked the repository, clone it to your local machine:
+
+```bash
+$ git clone https://github.com/<your-username>/arch.git
+$ cd arch
+```
+
+### 3. Create a branch
+Use a descriptive name for your branch (e.g., fix-bug-123, add-feature-x).
+```bash
+$ git checkout -b <your-branch-name>
+```
+
+### 4. Make Your changes
+
+Make your changes in the relevant files. If you're adding new features or fixing bugs, please include tests where applicable.
+
+### 5. Test your changes
+```bash
+cd arch
+cargo test
+```
+
+### 6. Push changes, and create a Pull request
+
+Push your branch to your fork (for example, `git push origin <your-branch-name>`). Then go back to the original Arch repository, where you should see a "Compare & pull request" button. Click it to submit a Pull Request (PR). In your PR description, clearly explain the changes you made and why they are necessary.
+
+We will review your pull request and provide feedback. Once approved, your contribution will be merged into the main repository!
+
+### Contribution Guidelines
+
+- Ensure that all existing tests pass.
+- Write clear commit messages.
+- Add tests for any new functionality.
+- Follow the existing coding style.
+- Update documentation as needed.
+
+To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively and offering support there.
--- a/demos/employee_details_copilot/Bolt-FC-1B-Q4_K_M.model_file
+++ b/demos/employee_details_copilot/Bolt-FC-1B-Q4_K_M.model_file
@@ -1,24 +0,0 @@
-FROM Bolt-Function-Calling-1B-Q4_K_M.gguf
-
-# Set the size of the context window used to generate the next token
-PARAMETER num_ctx 4096
-
-# Set parameters for response generation
-PARAMETER num_predict 1024
-PARAMETER temperature 0.1
-PARAMETER top_p 0.5
-PARAMETER top_k 32022
-PARAMETER repeat_penalty 1.0
-PARAMETER stop "<|EOT|>"
-
-# Set the random number seed to use for generation
-PARAMETER seed 42
-
-# Set the prompt template to be passed into the model
-TEMPLATE """{{ if .System }}<｜begin▁of▁sentence｜>
-{{ .System }}
-{{ end }}{{ if .Prompt }}### Instruction:
-{{ .Prompt }}
-{{ end }}### Response:
-{{ .Response }}
-<|EOT|>"""
--- a/demos/employee_details_copilot/README.md
+++ b/demos/employee_details_copilot/README.md
@@ -1,24 +0,0 @@
-# Function calling
-This demo shows how you can use the intelligent prompt gateway to act as a copilot for calling the correct proc by capturing the required and optional parameters from the prompt. This demo assumes you are running ollama natively. If you want to run ollama inside docker, please update the ollama endpoint in the docker-compose file.
-
-# Starting the demo
-1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
-1. Start services
-   ```sh
-   docker compose up
-   ```
-1. Download Bolt-FC model. This demo assumes we have downloaded [Bolt-Function-Calling-1B:Q4_K_M](https://huggingface.co/katanemolabs/Bolt-Function-Calling-1B.gguf/blob/main/Bolt-Function-Calling-1B-Q4_K_M.gguf) to local folder.
-1. If running ollama natively run
-   ```sh
-   ollama serve
-   ```
-2. Create model file in ollama repository
-   ```sh
-   ollama create Bolt-Function-Calling-1B:Q4_K_M -f Bolt-FC-1B-Q4_K_M.model_file
-   ```
-3. Navigate to http://localhost:18080/
-4. You can type in queries like "show me the top 5 employees in each department with highest salary"
-   - You can also ask follow up questions like "just show the top 2"
-5. To see metrics navigate to "http://localhost:3000/" (use admin/grafana for login)
-   - Open up the dashboard named "Intelligent Gateway Overview"
-   - On this dashboard you can see request latency and number of requests
--- a/demos/employee_details_copilot/api_server/.vscode/launch.json
+++ b/demos/employee_details_copilot/api_server/.vscode/launch.json
@@ -1,16 +0,0 @@
-{
-  // Use IntelliSense to learn about possible attributes.
-  // Hover to view descriptions of existing attributes.
-  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
-  "version": "0.2.0",
-  "configurations": [
-    {
-      "name": "function-calling api server",
-      "cwd": "${workspaceFolder}/app",
-      "type": "debugpy",
-      "request": "launch",
-      "module": "uvicorn",
-      "args": ["main:app","--reload", "--port", "8001"],
-    }
-  ]
-}
--- a/demos/employee_details_copilot/api_server/Dockerfile
+++ b/demos/employee_details_copilot/api_server/Dockerfile
@@ -1,19 +0,0 @@
-FROM python:3 AS base
-
-FROM base AS builder
-
-WORKDIR /src
-
-COPY requirements.txt /src/
-RUN pip install --prefix=/runtime --force-reinstall -r requirements.txt
-
-COPY . /src
-
-FROM python:3-slim AS output
-
-COPY --from=builder /runtime /usr/local
-
-COPY /app /app
-WORKDIR /app
-
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
--- a/demos/employee_details_copilot/api_server/app/functions.py
+++ b/demos/employee_details_copilot/api_server/app/functions.py
@@ -1,29 +0,0 @@
-from typing import List, Optional
-
-# Function for top_employees
-def top_employees(grouping: str, ranking_criteria: str, top_n: int):
-    pass
-
-# Function for aggregate_stats
-def aggregate_stats(grouping: str, aggregate_criteria: str, aggregate_type: str):
-    pass
-
-# Function for employees_projects
-def employees_projects(min_performance_score: float, min_years_experience: int, department: str, min_project_count: int = None, months_range: int = None):
-    pass
-
-# Function for salary_growth
-def salary_growth(min_salary_increase_percentage: float, department: str = None):
-    pass
-
-# Function for promotions_increases
-def promotions_increases(year: int, min_salary_increase_percentage: float = None, department: str = None):
-    pass
-
-# Function for avg_project_performance
-def avg_project_performance(min_project_count: int, min_performance_score: float, department: str = None):
-    pass
-
-# Function for certifications_experience
-def certifications_experience(certifications: list, min_years_experience: int, department: str = None):
-    pass
--- a/demos/employee_details_copilot/api_server/app/generate_config.py
+++ b/demos/employee_details_copilot/api_server/app/generate_config.py
@@ -1,78 +0,0 @@
-import inspect
-import yaml
-import functions  # This is your module containing the function definitions
-import os
-
-
-def generate_config_from_function(func):
-    func_name = func.__name__
-    func_doc = func.__doc__
-
-    # Get function signature
-    sig = inspect.signature(func)
-    params = []
-
-    # Extract parameter info
-    for name, param in sig.parameters.items():
-        param_info = {
-            'name': name,
-            'description': f"Provide the {name.replace('_', ' ')}",  # Customize as needed
-            'required': param.default == inspect.Parameter.empty,  # True if no default value
-            'type': param.annotation.__name__ if param.annotation != inspect.Parameter.empty else 'str'  # Get type if available
-        }
-        params.append(param_info)
-
-    # Define the config for this function
-    config = {
-        'name': func_name,
-        'description': func_doc or "",
-        'parameters': params,
-        'endpoint': {
-            'cluster': 'api_server',
-            'path': f"/{func_name}"
-        },
-        'system_prompt': f"You are responsible for handling {func_name} requests."
-    }
-
-    return config
-
-
-def generate_full_config(module):
-    config = {'prompt_targets': []}
-
-    # Automatically get all functions from the module
-    functions_list = inspect.getmembers(module, inspect.isfunction)
-
-    for func_name, func_obj in functions_list:
-        func_config = generate_config_from_function(func_obj)
-        config['prompt_targets'].append(func_config)
-
-    return config
-
-
-def replace_prompt_targets_in_config(file_path, new_prompt_targets):
-    # Load the existing arch_config.yaml
-    with open(file_path, 'r') as file:
-        config_data = yaml.safe_load(file)
-
-    # Replace the prompt_targets section with the new one
-    config_data['prompt_targets'] = new_prompt_targets
-
-    # Write the updated config back to the YAML file
-    with open("arch_config.yaml", 'w+') as file:
-        yaml.dump(config_data, file, sort_keys=False)
-
-    print(f"Updated prompt_targets in arch_config.yaml")
-
-
-# Main execution
-if __name__ == "__main__":
-    # Path to the existing arch_config.yaml two directories up
-    arch_config_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '../../arch_config.yaml'))
-
-    # Generate new prompt_targets from the functions module
-    new_config = generate_full_config(functions)
-    new_prompt_targets = new_config['prompt_targets']
-
-    # Replace the prompt_targets in the existing arch_config.yaml
-    replace_prompt_targets_in_config(arch_config_path, new_prompt_targets)
--- a/demos/employee_details_copilot/api_server/app/main.py
+++ b/demos/employee_details_copilot/api_server/app/main.py
@@ -1,288 +0,0 @@
-import random
-from typing import List
-from fastapi import FastAPI, HTTPException, Response
-import logging
-from pydantic import BaseModel
-from utils import load_sql
-import pandas as pd
-
-
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
-logger = logging.getLogger(__name__)
-
-app = FastAPI()
-
-@app.get("/healthz")
-async def healthz():
-    return {
-        "status": "ok"
-    }
-
-conn = load_sql()
-name_col = "name"
-
-
-class TopEmployees(BaseModel):
-    grouping: str
-    ranking_criteria: str
-    top_n: int
-
-
-@app.post("/top_employees")
-async def top_employees(req: TopEmployees, res: Response):
-    name_col = "name"
-    # Check if `req.ranking_criteria` is a Text object and extract its value accordingly
-    logger.info(
-        f"{'* ' * 50}\n\nCaptured Ranking Criteria: {req.ranking_criteria}\n\n{'* ' * 50}"
-    )
-
-    if req.ranking_criteria == "yoe":
-        req.ranking_criteria = "years_of_experience"
-    elif req.ranking_criteria == "rating":
-        req.ranking_criteria = "performance_score"
-
-    logger.info(
-        f"{'* ' * 50}\n\nFinal Ranking Criteria: {req.ranking_criteria}\n\n{'* ' * 50}"
-    )
-
-    query = f"""
-    SELECT {req.grouping}, {name_col}, {req.ranking_criteria}
-    FROM (
-        SELECT {req.grouping}, {name_col}, {req.ranking_criteria},
-               DENSE_RANK() OVER (PARTITION BY {req.grouping} ORDER BY {req.ranking_criteria} DESC) as emp_rank
-        FROM employees
-    ) ranked_employees
-    WHERE emp_rank <= {req.top_n};
-    """
-    result_df = pd.read_sql_query(query, conn)
-    result = result_df.to_dict(orient="records")
-    return result
-
-
-class AggregateStats(BaseModel):
-    grouping: str
-    aggregate_criteria: str
-    aggregate_type: str
-
-
-@app.post("/aggregate_stats")
-async def aggregate_stats(req: AggregateStats, res: Response):
-    logger.info(
-        f"{'* ' * 50}\n\nCaptured Aggregate Criteria: {req.aggregate_criteria}\n\n{'* ' * 50}"
-    )
-
-    if req.aggregate_criteria == "yoe":
-        req.aggregate_criteria = "years_of_experience"
-
-    logger.info(
-        f"{'* ' * 50}\n\nFinal Aggregate Criteria: {req.aggregate_criteria}\n\n{'* ' * 50}"
-    )
-
-    logger.info(
-        f"{'* ' * 50}\n\nCaptured Aggregate Type: {req.aggregate_type}\n\n{'* ' * 50}"
-    )
-    if req.aggregate_type.lower() not in ["sum", "avg", "min", "max"]:
-        if req.aggregate_type.lower() == "count":
-            req.aggregate_type = "COUNT"
-        elif req.aggregate_type.lower() == "total":
-            req.aggregate_type = "SUM"
-        elif req.aggregate_type.lower() == "average":
-            req.aggregate_type = "AVG"
-        elif req.aggregate_type.lower() == "minimum":
-            req.aggregate_type = "MIN"
-        elif req.aggregate_type.lower() == "maximum":
-            req.aggregate_type = "MAX"
-        else:
-            raise HTTPException(status_code=400, detail="Invalid aggregate type")
-
-    logger.info(
-        f"{'* ' * 50}\n\nFinal Aggregate Type: {req.aggregate_type}\n\n{'* ' * 50}"
-    )
-
-    query = f"""
-    SELECT {req.grouping}, {req.aggregate_type}({req.aggregate_criteria}) as {req.aggregate_type}_{req.aggregate_criteria}
-    FROM employees
-    GROUP BY {req.grouping};
-    """
-    result_df = pd.read_sql_query(query, conn)
-    result = result_df.to_dict(orient="records")
-    return result
-
-# 1. Top Employees by Performance, Projects, and Timeframe
-class TopEmployeesProjects(BaseModel):
-    min_performance_score: float
-    min_years_experience: int
-    department: str
-    min_project_count: int = None  # Optional
-    months_range: int = None  # Optional (for filtering recent projects)
-
-
-@app.post("/employees_projects")
-async def employees_projects(req: TopEmployeesProjects, res: Response):
-    params, filters = {}, []
-
-    # Add optional months_range filter
-    if req.months_range:
-        params['months_range'] = req.months_range
-        filters.append(f"p.start_date >= DATE('now', '-{req.months_range} months')")
-
-    # Add project count filter if provided
-    if req.min_project_count:
-        filters.append(f"COUNT(p.project_id) >= {req.min_project_count}")
-
-    where_clause = " AND ".join(filters)
-    if where_clause:
-        where_clause = "AND " + where_clause
-
-    query = f"""
-    SELECT e.name, e.department, e.years_of_experience, e.performance_score, COUNT(p.project_id) as project_count
-    FROM employees e
-    LEFT JOIN projects p ON e.eid = p.eid
-    WHERE e.performance_score >= {req.min_performance_score}
-      AND e.years_of_experience >= {req.min_years_experience}
-      AND e.department = '{req.department}'
-      {where_clause}
-    GROUP BY e.eid, e.name, e.department, e.years_of_experience, e.performance_score
-    ORDER BY e.performance_score DESC;
-    """
-
-    result_df = pd.read_sql_query(query, conn, params=params)
-    return result_df.to_dict(orient='records')
-
-
-# 2. Employees with Salary Growth Since Last Promotion
-class SalaryGrowthRequest(BaseModel):
-    min_salary_increase_percentage: float
-    department: str = None  # Optional
-
-
-@app.post("/salary_growth")
-async def salary_growth(req: SalaryGrowthRequest, res: Response):
-    params, filters = {}, []
-
-    if req.department:
-        filters.append("e.department = :department")
-        params['department'] = req.department
-
-    where_clause = " AND ".join(filters)
-    if where_clause:
-        where_clause = "AND " + where_clause
-
-    query = f"""
-    SELECT e.name, e.department, s.salary_increase_percentage
-    FROM employees e
-    JOIN salary_history s ON e.eid = s.eid
-    WHERE s.salary_increase_percentage >= {req.min_salary_increase_percentage}
-      AND s.promotion_date IS NOT NULL
-      {where_clause}
-    ORDER BY s.salary_increase_percentage DESC;
-    """
-
-    result_df = pd.read_sql_query(query, conn, params=params)
-    return result_df.to_dict(orient='records')
-
-
-# 4. Employees with Promotions and Salary Increases
-class PromotionsIncreasesRequest(BaseModel):
-    year: int
-    min_salary_increase_percentage: float = None  # Optional
-    department: str = None  # Optional
-
-
-@app.post("/promotions_increases")
-async def promotions_increases(req: PromotionsIncreasesRequest, res: Response):
-    params, filters = {}, []
-
-    if req.min_salary_increase_percentage:
-        filters.append(f"s.salary_increase_percentage >= {req.min_salary_increase_percentage}")
-
-    if req.department:
-        filters.append("e.department = :department")
-        params['department'] = req.department
-
-    where_clause = " AND ".join(filters)
-    if where_clause:
-        where_clause = "AND " + where_clause
-
-    query = f"""
-    SELECT e.name, e.department, s.salary_increase_percentage, s.promotion_date
-    FROM employees e
-    JOIN salary_history s ON e.eid = s.eid
-    WHERE strftime('%Y', s.promotion_date) = '{req.year}'
-      {where_clause}
-    ORDER BY s.salary_increase_percentage DESC;
-    """
-
-    result_df = pd.read_sql_query(query, conn, params=params)
-    return result_df.to_dict(orient='records')
-
-
-# 5. Employees with Highest Average Project Performance
-class AvgProjPerformanceRequest(BaseModel):
-    min_project_count: int
-    min_performance_score: float
-    department: str = None  # Optional
-
-
-@app.post("/project_performance")
-async def project_performance(req: AvgProjPerformanceRequest, res: Response):
-    params, filters = {}, []
-
-    if req.department:
-        filters.append("e.department = :department")
-        params['department'] = req.department
-
-    filters.append(f"p.performance_score >= {req.min_performance_score}")
-
-    where_clause = " AND ".join(filters)
-
-    query = f"""
-    SELECT e.name, e.department, AVG(p.performance_score) as avg_performance_score, COUNT(p.project_id) as project_count
-    FROM employees e
-    JOIN projects p ON e.eid = p.eid
-    WHERE {where_clause}
-    GROUP BY e.eid, e.name, e.department
-    HAVING COUNT(p.project_id) >= {req.min_project_count}
-    ORDER BY avg_performance_score DESC;
-    """
-
-    result_df = pd.read_sql_query(query, conn, params=params)
-    return result_df.to_dict(orient='records')
-
-
-# 6. Employees by Certification and Years of Experience
-class CertificationsExperienceRequest(BaseModel):
-    certifications: List[str]
-    min_years_experience: int
-    department: str = None  # Optional
-
-@app.post("/certifications_experience")
-async def certifications_experience(req: CertificationsExperienceRequest, res: Response):
-    # Convert the list of certifications into a format for SQL query
-    certs_filter = ', '.join([f"'{cert}'" for cert in req.certifications])
-
-    params, filters = {}, []
-
-    # Add department filter if provided
-    if req.department:
-        filters.append("e.department = :department")
-        params['department'] = req.department
-
-    filters.append("e.years_of_experience >= :min_years_experience")
-    params['min_years_experience'] = req.min_years_experience
-
-    where_clause = " AND ".join(filters)
-
-    query = f"""
-    SELECT e.name, e.department, e.years_of_experience, COUNT(c.certification_name) as cert_count
-    FROM employees e
-    JOIN certifications c ON e.eid = c.eid
-    WHERE c.certification_name IN ({certs_filter})
-      AND {where_clause}
-    GROUP BY e.eid, e.name, e.department, e.years_of_experience
-    HAVING COUNT(c.certification_name) = {len(req.certifications)}
-    ORDER BY e.years_of_experience DESC;
-    """
-
-    result_df = pd.read_sql_query(query, conn, params=params)
-    return result_df.to_dict(orient='records')
--- a/demos/employee_details_copilot/api_server/app/utils.py
+++ b/demos/employee_details_copilot/api_server/app/utils.py
@@ -1,157 +0,0 @@
-import pandas as pd
-import random
-import datetime
-import sqlite3
-
-def load_sql():
-    # Example Usage
-    conn = sqlite3.connect(":memory:")
-
-    # create and load the employees table
-    generate_employee_data(conn)
-
-    # create and load the projects table
-    generate_project_data(conn)
-
-    # create and load the salary_history table
-    generate_salary_history(conn)
-
-    # create and load the certifications table
-    generate_certifications(conn)
-
-    return conn
-
-# Function to generate random employee data with `eid` as the primary key
-def generate_employee_data(conn):
-    # List of possible names, positions, departments, and locations
-    names = [
-        "Alice",
-        "Bob",
-        "Charlie",
-        "David",
-        "Eve",
-        "Frank",
-        "Grace",
-        "Hank",
-        "Ivy",
-        "Jack",
-    ]
-    positions = [
-        "Manager",
-        "Engineer",
-        "Salesperson",
-        "HR Specialist",
-        "Marketing Analyst",
-    ]
-    # List of possible names, positions, departments, locations, and certifications
-    names = ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Hank", "Ivy", "Jack"]
-    positions = ["Manager", "Engineer", "Salesperson", "HR Specialist", "Marketing Analyst"]
-    departments = ["Engineering", "Marketing", "HR", "Sales", "Finance"]
-    locations = ["New York", "San Francisco", "Austin", "Boston", "Chicago"]
-    certifications = ["AWS Certified", "Google Cloud Certified", "PMP", "Scrum Master", "Cisco Certified"]
-
-    # Generate random hire dates
-    def random_hire_date():
-        start_date = datetime.date(2000, 1, 1)
-        end_date = datetime.date(2023, 12, 31)
-        time_between_dates = end_date - start_date
-        days_between_dates = time_between_dates.days
-        random_number_of_days = random.randrange(days_between_dates)
-        return start_date + datetime.timedelta(days=random_number_of_days)
-
-    # Generate random employee records with an employee ID (eid)
-    employees = []
-    for eid in range(1, 101):  # 100 employees with `eid` starting from 1
-        name = random.choice(names)
-        position = random.choice(positions)
-        salary = round(random.uniform(50000, 150000), 2)  # Salary between 50,000 and 150,000
-        department = random.choice(departments)
-        location = random.choice(locations)
-        hire_date = random_hire_date()
-        performance_score = round(random.uniform(1, 5), 2)  # Performance score between 1.0 and 5.0
-        years_of_experience = random.randint(1, 30)  # Years of experience between 1 and 30
-
-        employee = {
-            "eid": eid,  # Employee ID
-            "name": name,
-            "position": position,
-            "salary": salary,
-            "department": department,
-            "location": location,
-            "hire_date": hire_date,
-            "performance_score": performance_score,
-            "years_of_experience": years_of_experience
-        }
-
-        employees.append(employee)
-
-    # Convert the list of dictionaries to a DataFrame and save to DB
-    df_employees = pd.DataFrame(employees)
-    df_employees.to_sql('employees', conn, index=False, if_exists='replace')
-
-# Function to generate random project data with `eid`
-def generate_project_data(conn):
-    employees = pd.read_sql_query("SELECT eid FROM employees", conn)
-    projects = []
-
-    for _ in range(500):  # 500 projects
-        eid = random.choice(employees['eid'])
-        project_name = f"Project_{random.randint(1, 100)}"
-        start_date = datetime.date(2020, 1, 1) + datetime.timedelta(days=random.randint(0, 365 * 3))  # Within the last 3 years
-        performance_score = round(random.uniform(1, 5), 2)  # Performance score for the project between 1.0 and 5.0
-
-        project = {
-            "eid": eid,  # Foreign key from employees table
-            "project_name": project_name,
-            "start_date": start_date,
-            "performance_score": performance_score
-        }
-
-        projects.append(project)
-
-    # Convert the list of dictionaries to a DataFrame and save to DB
-    df_projects = pd.DataFrame(projects)
-    df_projects.to_sql('projects', conn, index=False, if_exists='replace')
-
-# Function to generate random salary history data with `eid`
-def generate_salary_history(conn):
-    employees = pd.read_sql_query("SELECT eid FROM employees", conn)
-    salary_history = []
-
-    for _ in range(300):  # 300 salary records
-        eid = random.choice(employees['eid'])
-        salary_increase_percentage = round(random.uniform(5, 30), 2)  # Salary increase between 5% and 30%
-        promotion_date = datetime.date(2018, 1, 1) + datetime.timedelta(days=random.randint(0, 365 * 5))  # Promotions in the last 5 years
-
-        salary_record = {
-            "eid": eid,  # Foreign key from employees table
-            "salary_increase_percentage": salary_increase_percentage,
-            "promotion_date": promotion_date
-        }
-
-        salary_history.append(salary_record)
-
-    # Convert the list of dictionaries to a DataFrame and save to DB
-    df_salary_history = pd.DataFrame(salary_history)
-    df_salary_history.to_sql('salary_history', conn, index=False, if_exists='replace')
-
-# Function to generate random certifications data with `eid`
-def generate_certifications(conn):
-    employees = pd.read_sql_query("SELECT eid FROM employees", conn)
-    certifications_list = ["AWS Certified", "Google Cloud Certified", "PMP", "Scrum Master", "Cisco Certified"]
-    employee_certifications = []
-
-    for _ in range(300):  # 300 certification records
-        eid = random.choice(employees['eid'])
-        certification = random.choice(certifications_list)
-
-        cert_record = {
-            "eid": eid,  # Foreign key from employees table
-            "certification_name": certification
-        }
-
-        employee_certifications.append(cert_record)
-
-    # Convert the list of dictionaries to a DataFrame and save to DB
-    df_certifications = pd.DataFrame(employee_certifications)
-    df_certifications.to_sql('certifications', conn, index=False, if_exists='replace')
--- a/demos/employee_details_copilot/api_server/requirements.txt
+++ b/demos/employee_details_copilot/api_server/requirements.txt
@ -1,4 +0,0 @@
-fastapi
-uvicorn
-pandas
-dateparser
--- a/demos/employee_details_copilot/bolt_config.yaml
+++ b/demos/employee_details_copilot/bolt_config.yaml
@ -1,199 +0,0 @@
-default_prompt_endpoint: "127.0.0.1"
-load_balancing: "round_robin"
-timeout_ms: 5000
-
-overrides:
-  # confidence threshold for prompt target intent matching
-  prompt_target_intent_matching_threshold: 0.7
-
-llm_providers:
-
-  - name: open-ai-gpt-4
-    api_key: $OPEN_AI_API_KEY
-    model: gpt-4
-    default: true
-
-prompt_targets:
-
-  - type: function_resolver
-    name: top_employees
-    description: |
-      Allows you to find the top employees in different groups, such as departments, locations, or position. You can rank the employees by different criteria, like salary, yoe, or rating. Returns the best-ranked employees for each group, helping you identify top n in the list.
-    parameters:
-      - name: grouping
-        description: |
-          Select how you'd like to group the employees. For example, you can group them by department, location, or their position. The tool will provide the top-ranked employees within each group you choose.
-        required: true
-        type: string
-        enum: [department, location, position]
-      - name: ranking_criteria
-        required: true
-        type: string
-        description: |
-          Choose how you'd like to rank the employees. You can rank them by their salary, their years of experience, or their rating. The tool will sort the employees based on this ranking and return the best ones from each group.
-        enum: [salary, years_of_experience, performance_score]
-      - name: top_n
-        required: true
-        type: integer
-        description: |
-          Enter how many of the top employees you want to see in each group. For example, if you enter 3, the tool will show you the top 3 employees for each group you selected.
-    endpoint:
-      cluster: api_server
-      path: /top_employees
-    system_prompt: |
-      You are responsible for retrieving the top N employees per group ranked by a constraint.
-
-  - type: function_resolver
-    name: aggregate_stats
-    description: |
-       Calculate summary statistics for groups of employees. You can group employees by categories like department or location and then compute totals, averages, or other statistics for specific attributes such as salary or years of experience.
-    parameters:
-      - name: grouping
-        description: |
-          Choose how you'd like to organize the employees. For example, you can group them by department, location, or position. The tool will calculate the summary statistics for each group.
-        required: true
-        enum: [department, location, position]
-      - name: aggregate_criteria
-        description: |
-          Select the specific attribute you'd like to analyze. This could be something like salary, years of experience, or rating. The tool will calculate the statistic you request for this attribute.
-        required: true
-        enum: [salary, years_of_experience, performance_score]
-      - name: aggregate_type
-        description: |
-          Choose the type of statistic you'd like to calculate for the selected attribute. For example, you can calculate the sum, average, minimum, or maximum value for each group.
-        required: true
-        enum: [SUM, AVG, MIN, MAX]
-    endpoint:
-      cluster: api_server
-      path: /aggregate_stats
-    system_prompt: |
-      You help calculate summary statistics for groups of employees. First, organize the employees by the specified grouping (e.g., department, location, or position). Then, compute the requested statistic (e.g., total, average, minimum, or maximum) for a specific attribute like salary, experience, or rating.
-
-  # 1. Top Employees by Performance, Projects, and Timeframe
-  - type: function_resolver
-    name: employees_projects
-    description: |
-      Fetch employees with the highest performance scores, considering their project participation and years of experience. You can filter by minimum performance score, years of experience, and department. Optionally, you can also filter by recent project participation within the last Y months.
-    parameters:
-      - name: min_performance_score
-        description: Minimum performance score to filter employees.
-        required: true
-        type: float
-      - name: min_years_experience
-        description: Minimum years of experience to filter employees.
-        required: true
-        type: integer
-      - name: department
-        description: Department to filter employees by.
-        required: true
-        type: string
-      - name: min_project_count
-        description: Minimum number of projects employees participated in (optional).
-        required: false
-        type: integer
-      - name: months_range
-        description: Timeframe (in months) for filtering recent projects (optional).
-        required: false
-        type: integer
-    endpoint:
-      cluster: api_server
-      path: /employees_projects
-    system_prompt: |
-      You are responsible for retrieving the top N employees ranked by performance and project participation. Use filters for experience and optional project criteria.
-
-
-  # 2. Employees with Salary Growth Since Last Promotion
-  - type: function_resolver
-    name: salary_growth
-    description: |
-      Fetch employees with the highest salary growth since their last promotion, grouped by department. You can filter by a minimum salary increase percentage and department.
-    parameters:
-      - name: min_salary_increase_percentage
-        description: Minimum percentage increase in salary since the last promotion.
-        required: true
-        type: float
-      - name: department
-        description: Department to filter employees by (optional).
-        required: false
-        type: string
-    endpoint:
-      cluster: api_server
-      path: /salary_growth
-    system_prompt: |
-      You are responsible for retrieving employees with the highest salary growth since their last promotion. Filter by minimum salary increase percentage and department.
-
-  # 4. Employees with Promotions and Salary Increases by Year
-  - type: function_resolver
-    name: promotions_increases
-    description: |
-      Fetch employees who were promoted and received a salary increase in a specific year, grouped by department. You can optionally filter by minimum percentage salary increase and department.
-    parameters:
-      - name: year
-        description: The year in which the promotion and salary increase occurred.
-        required: true
-        type: integer
-      - name: min_salary_increase_percentage
-        description: Minimum percentage salary increase to filter employees.
-        required: false
-        type: float
-      - name: department
-        description: Department to filter by (optional).
-        required: false
-        type: string
-    endpoint:
-      cluster: api_server
-      path: /promotions_increases
-    system_prompt: |
-      You are responsible for fetching employees who were promoted and received a salary increase in a specific year. Apply filters for salary increase percentage and department.
-
-
-  # 5. Employees with Highest Average Project Performance
-  - type: function_resolver
-    name: project_performance
-    description: |
-      Fetch employees with the highest average performance across all projects they have worked on over time. You can filter by minimum project count, department, and minimum performance score.
-    parameters:
-      - name: min_project_count
-        description: Minimum number of projects an employee must have participated in.
-        required: true
-        type: integer
-      - name: min_performance_score
-        description: Minimum performance score to filter employees.
-        required: true
-        type: float
-        default: 4.0
-      - name: department
-        description: Department to filter by (optional).
-        required: false
-        type: string
-    endpoint:
-      cluster: api_server
-      path: /project_performance
-    system_prompt: |
-      You are responsible for fetching employees with the highest average performance across all projects they’ve worked on. Apply filters for minimum project count, performance score, and department.
-
-
-  # 6. Employees by Certification and Years of Experience
-  - type: function_resolver
-    name: certifications_experience
-    description: |
-      Fetch employees who have all the required certifications and meet the minimum years of experience. You can filter by department and provide a list of certifications to match.
-    parameters:
-      - name: certifications
-        description: List of required certifications.
-        required: true
-        type: list
-      - name: min_years_experience
-        description: Minimum years of experience.
-        required: true
-        type: integer
-      - name: department
-        description: Department to filter employees by (optional).
-        required: false
-        type: string
-        default: "Engineering"
-    endpoint:
-      cluster: api_server
-      path: /certifications_experience
-    system_prompt: |
-      You are responsible for fetching employees who have the required certifications and meet the minimum years of experience. Optionally, filter by department.
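Each entry under `prompt_targets` in the removed config above is effectively a small schema: parameter names, a `required` flag, and sometimes an `enum` or `default`. As a rough illustration only, the sketch below checks a set of extracted parameters against one such entry. The function `validate_params` and the dictionary layout are assumptions made for this example; this is not how Arch itself validates parameters.

```python
def validate_params(target, params):
    """Return a list of problems found in `params` for one prompt target.

    `target` mirrors one `prompt_targets` entry as a plain dict
    (hypothetical layout chosen for this sketch).
    """
    problems = []
    declared = {p["name"]: p for p in target["parameters"]}

    for name, spec in declared.items():
        if name not in params:
            # A missing parameter is only a problem when it is required
            # and the config supplies no default for it.
            if spec.get("required", False) and "default" not in spec:
                problems.append(f"missing required parameter: {name}")
            continue
        allowed = spec.get("enum")
        if allowed is not None and params[name] not in allowed:
            problems.append(f"{name} must be one of {allowed}")

    for name in params:
        if name not in declared:
            problems.append(f"unknown parameter: {name}")
    return problems


# The `top_employees` target from the config above, transcribed by hand.
TOP_EMPLOYEES = {
    "name": "top_employees",
    "parameters": [
        {"name": "grouping", "required": True, "type": "string",
         "enum": ["department", "location", "position"]},
        {"name": "ranking_criteria", "required": True, "type": "string",
         "enum": ["salary", "years_of_experience", "performance_score"]},
        {"name": "top_n", "required": True, "type": "integer"},
    ],
}
```

Calling `validate_params(TOP_EMPLOYEES, {"grouping": "department", "ranking_criteria": "salary", "top_n": 3})` returns an empty list, while an unsupported `grouping` value or a missing `ranking_criteria` is reported as a problem string.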
--- a/demos/employee_details_copilot/docker-compose.yaml
+++ b/demos/employee_details_copilot/docker-compose.yaml
@ -1,143 +0,0 @@
-services:
-
-  config_generator:
-    build:
-      context: ../../
-      dockerfile: config_generator/Dockerfile
-    volumes:
-      - ../../arch/envoy.template.yaml:/usr/src/app/envoy.template.yaml
-      - ./arch_config.yaml:/usr/src/app/arch_config.yaml
-      - ./generated:/usr/src/app/out
-
-  arch:
-    build:
-      context: ../../
-      dockerfile: arch/Dockerfile
-    hostname: arch
-    ports:
-      - "10000:10000"
-      - "19901:9901"
-    volumes:
-      - ./generated/envoy.yaml:/etc/envoy/envoy.yaml
-      - /etc/ssl/cert.pem:/etc/ssl/cert.pem
-      - ./arch_config.yaml:/config/arch_config.yaml
-    depends_on:
-      config_generator:
-        condition: service_completed_successfully
-      model_server:
-        condition: service_healthy
-    environment:
-      - LOG_LEVEL=debug
-
-  model_server:
-    build:
-      context: ../../model_server
-      dockerfile: Dockerfile
-    ports:
-      - "18081:80"
-    healthcheck:
-        test: ["CMD", "curl" ,"http://localhost:80/healthz"]
-        interval: 5s
-        retries: 20
-    volumes:
-      - ~/.cache/huggingface:/root/.cache/huggingface
-      - ./arch_config.yaml:/root/arch_config.yaml
-
-  api_server:
-    build:
-      context: api_server
-      dockerfile: Dockerfile
-    ports:
-      - "18083:80"
-    healthcheck:
-        test: ["CMD", "curl" ,"http://localhost:80/healthz"]
-        interval: 5s
-        retries: 20
-
-  function_resolver:
-    build:
-      context: ../../function_resolver
-      dockerfile: Dockerfile
-    ports:
-      - "18082:80"
-    healthcheck:
-        test: ["CMD", "curl" ,"http://localhost:80/healthz"]
-        interval: 5s
-        retries: 20
-    volumes:
-      - ~/.cache/huggingface:/root/.cache/huggingface
-    environment:
-      # use ollama endpoint that is hosted by host machine (no virtualization)
-      - OLLAMA_ENDPOINT=${OLLAMA_ENDPOINT:-host.docker.internal}
-      # uncomment following line to use ollama endpoint that is hosted by docker
-      # - OLLAMA_ENDPOINT=ollama
-      - OLLAMA_MODEL=Bolt-Function-Calling-1B:Q4_K_M
-
-  ollama:
-    image: ollama/ollama
-    container_name: ollama
-    volumes:
-      - ./ollama:/root/.ollama
-    restart: unless-stopped
-    ports:
-      - '11434:11434'
-    profiles:
-      - manual
-
-  open-webui:
-    image: ghcr.io/open-webui/open-webui:${WEBUI_DOCKER_TAG-main}
-    container_name: open-webui
-    volumes:
-      - ./open-webui:/app/backend/data
-    # depends_on:
-      # - ollama
-    ports:
-      - 18090:8080
-    environment:
-      - OLLAMA_BASE_URL=http://${OLLAMA_ENDPOINT:-host.docker.internal}:11434
-      - WEBUI_AUTH=false
-    extra_hosts:
-      - host.docker.internal:host-gateway
-    restart: unless-stopped
-    profiles:
-      - monitoring
-
-  chatbot_ui:
-    build:
-      context: ../../chatbot_ui
-      dockerfile: Dockerfile
-    ports:
-      - "18080:8080"
-    environment:
-      - OPENAI_API_KEY=${OPENAI_API_KEY:?error}
-      - CHAT_COMPLETION_ENDPOINT=http://arch:10000/v1
-
-  prometheus:
-    image: prom/prometheus
-    container_name: prometheus
-    command:
-      - '--config.file=/etc/prometheus/prometheus.yaml'
-    ports:
-      - 9090:9090
-    restart: unless-stopped
-    volumes:
-      - ./prometheus:/etc/prometheus
-      - ./prom_data:/prometheus
-    profiles:
-      - monitoring
-
-  grafana:
-    image: grafana/grafana
-    container_name: grafana
-    ports:
-      - 3000:3000
-    restart: unless-stopped
-    environment:
-      - GF_SECURITY_ADMIN_USER=admin
-      - GF_SECURITY_ADMIN_PASSWORD=grafana
-    volumes:
-      - ./grafana:/etc/grafana/provisioning/datasources
-      - ./grafana/dashboard.yaml:/etc/grafana/provisioning/dashboards/main.yaml
-      - ./grafana/dashboards:/var/lib/grafana/dashboards
-    profiles:
-      - monitoring
--- a/demos/employee_details_copilot/grafana/dashboard.yaml
+++ b/demos/employee_details_copilot/grafana/dashboard.yaml
@ -1,12 +0,0 @@
-apiVersion: 1
-
-providers:
-  - name: "Dashboard provider"
-    orgId: 1
-    type: file
-    disableDeletion: false
-    updateIntervalSeconds: 10
-    allowUiUpdates: false
-    options:
-      path: /var/lib/grafana/dashboards
-      foldersFromFilesStructure: true
--- a/demos/employee_details_copilot/grafana/dashboards/envoy_overview.json
+++ b/demos/employee_details_copilot/grafana/dashboards/envoy_overview.json
@ -1,355 +0,0 @@
-{
-  "annotations": {
-    "list": [
-      {
-        "builtIn": 1,
-        "datasource": {
-          "type": "grafana",
-          "uid": "-- Grafana --"
-        },
-        "enable": true,
-        "hide": true,
-        "iconColor": "rgba(0, 211, 255, 1)",
-        "name": "Annotations & Alerts",
-        "type": "dashboard"
-      }
-    ]
-  },
-  "editable": true,
-  "fiscalYearStartMonth": 0,
-  "graphTooltip": 1,
-  "links": [],
-  "panels": [
-    {
-      "datasource": {
-        "type": "prometheus",
-        "uid": "PBFA97CFB590B2093"
-      },
-      "fieldConfig": {
-        "defaults": {
-          "color": {
-            "mode": "palette-classic"
-          },
-          "custom": {
-            "axisBorderShow": false,
-            "axisCenteredZero": false,
-            "axisColorMode": "text",
-            "axisLabel": "",
-            "axisPlacement": "auto",
-            "barAlignment": 0,
-            "drawStyle": "line",
-            "fillOpacity": 0,
-            "gradientMode": "none",
-            "hideFrom": {
-              "legend": false,
-              "tooltip": false,
-              "viz": false
-            },
-            "insertNulls": false,
-            "lineInterpolation": "linear",
-            "lineWidth": 1,
-            "pointSize": 5,
-            "scaleDistribution": {
-              "type": "linear"
-            },
-            "showPoints": "auto",
-            "spanNulls": false,
-            "stacking": {
-              "group": "A",
-              "mode": "none"
-            },
-            "thresholdsStyle": {
-              "mode": "off"
-            }
-          },
-          "mappings": [],
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              {
-                "color": "green",
-                "value": null
-              },
-              {
-                "color": "red",
-                "value": 80
-              }
-            ]
-          }
-        },
-        "overrides": []
-      },
-      "gridPos": {
-        "h": 8,
-        "w": 12,
-        "x": 0,
-        "y": 0
-      },
-      "id": 2,
-      "options": {
-        "legend": {
-          "calcs": [],
-          "displayMode": "list",
-          "placement": "bottom",
-          "showLegend": true
-        },
-        "tooltip": {
-          "mode": "single",
-          "sort": "none"
-        }
-      },
-      "targets": [
-        {
-          "datasource": {
-            "type": "prometheus",
-            "uid": "PBFA97CFB590B2093"
-          },
-          "disableTextWrap": false,
-          "editorMode": "code",
-          "expr": "avg(rate(envoy_cluster_internal_upstream_rq_time_sum[1m]) / rate(envoy_cluster_internal_upstream_rq_time_count[1m])) by (envoy_cluster_name)",
-          "fullMetaSearch": false,
-          "hide": false,
-          "includeNullMetadata": true,
-          "instant": false,
-          "legendFormat": "__auto",
-          "range": true,
-          "refId": "A",
-          "useBackend": false
-        }
-      ],
-      "title": "request latency - internal (ms)",
-      "type": "timeseries"
-    },
-    {
-      "datasource": {
-        "type": "prometheus",
-        "uid": "PBFA97CFB590B2093"
-      },
-      "fieldConfig": {
-        "defaults": {
-          "color": {
-            "mode": "palette-classic"
-          },
-          "custom": {
-            "axisBorderShow": false,
-            "axisCenteredZero": false,
-            "axisColorMode": "text",
-            "axisLabel": "",
-            "axisPlacement": "auto",
-            "barAlignment": 0,
-            "drawStyle": "line",
-            "fillOpacity": 0,
-            "gradientMode": "none",
-            "hideFrom": {
-              "legend": false,
-              "tooltip": false,
-              "viz": false
-            },
-            "insertNulls": false,
-            "lineInterpolation": "linear",
-            "lineWidth": 1,
-            "pointSize": 5,
-            "scaleDistribution": {
-              "type": "linear"
-            },
-            "showPoints": "auto",
-            "spanNulls": false,
-            "stacking": {
-              "group": "A",
-              "mode": "none"
-            },
-            "thresholdsStyle": {
-              "mode": "off"
-            }
-          },
-          "mappings": [],
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              {
-                "color": "green",
-                "value": null
-              },
-              {
-                "color": "red",
-                "value": 80
-              }
-            ]
-          }
-        },
-        "overrides": []
-      },
-      "gridPos": {
-        "h": 8,
-        "w": 12,
-        "x": 12,
-        "y": 0
-      },
-      "id": 1,
-      "options": {
-        "legend": {
-          "calcs": [],
-          "displayMode": "list",
-          "placement": "bottom",
-          "showLegend": true
-        },
-        "tooltip": {
-          "mode": "single",
-          "sort": "none"
-        }
-      },
-      "targets": [
-        {
-          "datasource": {
-            "type": "prometheus",
-            "uid": "PBFA97CFB590B2093"
-          },
-          "disableTextWrap": false,
-          "editorMode": "code",
-          "expr": "avg(rate(envoy_cluster_external_upstream_rq_time_sum[1m]) / rate(envoy_cluster_external_upstream_rq_time_count[1m])) by (envoy_cluster_name)",
-          "fullMetaSearch": false,
-          "hide": false,
-          "includeNullMetadata": true,
-          "instant": false,
-          "legendFormat": "__auto",
-          "range": true,
-          "refId": "A",
-          "useBackend": false
-        }
-      ],
-      "title": "request latency - external (ms)",
-      "type": "timeseries"
-    },
-    {
-      "datasource": {
-        "type": "prometheus",
-        "uid": "PBFA97CFB590B2093"
-      },
-      "fieldConfig": {
-        "defaults": {
-          "color": {
-            "mode": "palette-classic"
-          },
-          "custom": {
-            "axisBorderShow": false,
-            "axisCenteredZero": false,
-            "axisColorMode": "text",
-            "axisLabel": "",
-            "axisPlacement": "auto",
-            "barAlignment": 0,
-            "drawStyle": "line",
-            "fillOpacity": 0,
-            "gradientMode": "none",
-            "hideFrom": {
-              "legend": false,
-              "tooltip": false,
-              "viz": false
-            },
-            "insertNulls": false,
-            "lineInterpolation": "linear",
-            "lineWidth": 1,
-            "pointSize": 5,
-            "scaleDistribution": {
-              "type": "linear"
-            },
-            "showPoints": "auto",
-            "spanNulls": false,
-            "stacking": {
-              "group": "A",
-              "mode": "none"
-            },
-            "thresholdsStyle": {
-              "mode": "off"
-            }
-          },
-          "mappings": [],
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              {
-                "color": "green",
-                "value": null
-              },
-              {
-                "color": "red",
-                "value": 80
-              }
-            ]
-          }
-        },
-        "overrides": []
-      },
-      "gridPos": {
-        "h": 8,
-        "w": 12,
-        "x": 0,
-        "y": 8
-      },
-      "id": 3,
-      "options": {
-        "legend": {
-          "calcs": [],
-          "displayMode": "list",
-          "placement": "bottom",
-          "showLegend": true
-        },
-        "tooltip": {
-          "mode": "single",
-          "sort": "none"
-        }
-      },
-      "targets": [
-        {
-          "datasource": {
-            "type": "prometheus",
-            "uid": "PBFA97CFB590B2093"
-          },
-          "disableTextWrap": false,
-          "editorMode": "code",
-          "expr": "avg(rate(envoy_cluster_internal_upstream_rq_completed[1m])) by (envoy_cluster_name)",
-          "fullMetaSearch": false,
-          "includeNullMetadata": true,
-          "instant": false,
-          "legendFormat": "__auto",
-          "range": true,
-          "refId": "A",
-          "useBackend": false
-        },
-        {
-          "datasource": {
-            "type": "prometheus",
-            "uid": "PBFA97CFB590B2093"
-          },
-          "disableTextWrap": false,
-          "editorMode": "code",
-          "expr": "avg(rate(envoy_cluster_external_upstream_rq_completed[1m])) by (envoy_cluster_name)",
-          "fullMetaSearch": false,
-          "hide": false,
-          "includeNullMetadata": true,
-          "instant": false,
-          "legendFormat": "__auto",
-          "range": true,
-          "refId": "B",
-          "useBackend": false
-        }
-      ],
-      "title": "Upstream request count",
-      "type": "timeseries"
-    }
-  ],
-  "schemaVersion": 39,
-  "tags": [],
-  "templating": {
-    "list": []
-  },
-  "time": {
-    "from": "now-15m",
-    "to": "now"
-  },
-  "timepicker": {},
-  "timezone": "browser",
-  "title": "Intelligent Gateway Overview",
-  "uid": "adt6uhx5lk8aob",
-  "version": 3,
-  "weekStart": ""
-}
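The latency panels in the removed dashboard above divide `rate(envoy_cluster_internal_upstream_rq_time_sum[1m])` by `rate(envoy_cluster_internal_upstream_rq_time_count[1m])`. Because both rates are taken over the same window, the ratio reduces to "increase in total request time divided by increase in request count", which is the average time per request. The sketch below shows that arithmetic on two hand-made scrapes. It is a simplification: the helper name `average_latency_ms` is hypothetical, and real PromQL `rate()` also handles counter resets and extrapolation, which this ignores.

```python
def average_latency_ms(prev, curr):
    """Average request time between two scrapes of cumulative counters.

    `prev` and `curr` are (time_sum_ms, request_count) pairs taken at two
    points in time. Returns None when no requests completed in between.
    """
    delta_sum = curr[0] - prev[0]      # extra milliseconds accumulated
    delta_count = curr[1] - prev[1]    # extra requests completed
    if delta_count <= 0:
        return None
    return delta_sum / delta_count


# Two scrapes: 4 new requests took 600 ms in total.
print(average_latency_ms((1000.0, 10), (1600.0, 14)))  # 150.0
```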
--- a/demos/employee_details_copilot/grafana/datasource.yaml
+++ b/demos/employee_details_copilot/grafana/datasource.yaml
@ -1,9 +0,0 @@
-apiVersion: 1
-
-datasources:
-- name: Prometheus
-  type: prometheus
-  url: http://prometheus:9090
-  isDefault: true
-  access: proxy
-  editable: true
--- a/demos/employee_details_copilot/prometheus/prometheus.yaml
+++ b/demos/employee_details_copilot/prometheus/prometheus.yaml
@ -1,23 +0,0 @@
-global:
-  scrape_interval: 15s
-  scrape_timeout: 10s
-  evaluation_interval: 15s
-alerting:
-  alertmanagers:
-  - static_configs:
-    - targets: []
-    scheme: http
-    timeout: 10s
-    api_version: v1
-scrape_configs:
-- job_name: envoy
-  honor_timestamps: true
-  scrape_interval: 15s
-  scrape_timeout: 10s
-  metrics_path: /stats
-  scheme: http
-  static_configs:
-  - targets:
-    - arch:9901
-  params:
-    format: ['prometheus']
--- a/demos/function_calling/README.md
+++ b/demos/function_calling/README.md
@ -1,29 +1,20 @@
 # Function calling
-This demo shows how you can use intelligent prompt gateway to do function calling. This demo assumes you are using ollama running natively. If you want to run ollama running inside docker then please update ollama endpoint in docker-compose file.
+This demo shows how you can use Arch's function calling capabilities.

 # Starting the demo
-1. Ensure that submodule is up to date
-   ```sh
-   git submodule sync --recursive
-   ```
 1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
-1. Start services
+2. Start Arch
   ```sh
-   docker compose up
+   archgw up arch_config.yaml
   ```
-1. Download Bolt-FC model. This demo assumes we have downloaded [Arch-Function-Calling-1.5B:Q4_K_M](https://huggingface.co/katanemolabs/Arch-Function-Calling-1.5B.gguf/blob/main/Arch-Function-Calling-1.5B-Q4_K_M.gguf) to local folder.
-1. If running ollama natively run
-   ```sh
-   ollama serve
+3. Start Network Agent
+    ```sh
+    docker compose up
   ```
-2. Create model file in ollama repository
-   ```sh
-   ollama create Arch-Function-Calling-1.5B:Q4_K_M -f Arch-Function-Calling-1.5B-Q4_K_M.model_file
-   ```
-3. Navigate to http://localhost:18080/
+4. Navigate to http://localhost:18080/
 5. You can type in queries like "how is the weather in Seattle"
   - You can also ask follow up questions like "show me sunny days"
-5. To see metrics navigate to "http://localhost:3000/" (use admin/grafana for login)
+6. To see metrics navigate to "http://localhost:3000/" (use admin/grafana for login)
   - Open up the dashboard named "Intelligent Gateway Overview"
   - On this dashboard you can see request latency and number of requests

@ -38,6 +29,5 @@ Arch gateway publishes stats endpoint at http://localhost:19901/stats. In this d
 1. From grafana left nav click on dashboards and select "Intelligent Gateway Overview" to view arch gateway stats


-Here is sample interaction,
-
+Here is a sample interaction,
 <img width="575" alt="image" src="https://github.com/user-attachments/assets/e0929490-3eb2-4130-ae87-a732aea4d059">
--- a/demos/insurance_agent/README.md
+++ b/demos/insurance_agent/README.md
@ -1 +1,64 @@
-The following demo
+# Insurance Agent Demo
+
+This demo showcases how **Arch** can be used to manage insurance-related tasks such as policy inquiries, initiating policies, and updating claims or deductibles. In this demo, the assistant provides factual information related to insurance policies (e.g., car, boat, house, motorcycle).
+
+The system can perform a variety of tasks, such as answering insurance-related questions, retrieving policy coverage details, initiating policies, and updating claims or deductibles.
+
+## Available Functions:
+
+- **Policy Q/A**: Handles general Q&A related to insurance policies.
+  - **Endpoint**: `/policy/qa`
+  - This function answers general inquiries related to insurance, such as coverage details or policy types. It is the default target for insurance-related queries.
+
+- **Get Policy Coverage**: Retrieves the coverage details for a given policy type (car, boat, house, motorcycle).
+  - **Endpoint**: `/policy/coverage`
+  - Parameters:
+    - `policy_type` (required): The type of policy. Available options: `car`, `boat`, `house`, `motorcycle`. Defaults to `car`.
+
+- **Initiate Policy**: Starts a policy coverage for car, boat, motorcycle, or house.
+  - **Endpoint**: `/policy/initiate`
+  - Parameters:
+    - `policy_type` (required): The type of policy. Available options: `car`, `boat`, `house`, `motorcycle`. Defaults to `car`.
+    - `deductible` (required): The deductible amount set for the policy.
+
+- **Update Claim**: Updates the notes on a specific insurance claim.
+  - **Endpoint**: `/policy/claim`
+  - Parameters:
+    - `claim_id` (required): The claim number.
+    - `notes` (optional): Notes about the claim number for the adjustor to see.
+
+- **Update Deductible**: Updates the deductible amount for a specific policy coverage.
+  - **Endpoint**: `/policy/deductible`
+  - Parameters:
+    - `policy_id` (required): The ID of the policy.
+    - `deductible` (required): The deductible amount to be set for the policy.
+
+**Arch** is designed to intelligently route prompts to the appropriate functions based on the target, allowing for seamless interaction with various insurance-related services.
+
+# Starting the demo
+1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
+2. Start Arch
+   ```sh
+   archgw up [path to arch_config.yaml]
+   ```
+3. Start Insurance Agent
+    ```sh
+    docker compose up
+   ```
+4. Navigate to http://localhost:18080/
+5. You can type in queries like "show me device statistics for the past 7 days"
+
+
+# Observability
+The Arch gateway publishes a stats endpoint at http://localhost:19901/stats. In this demo we use Prometheus to pull stats from
+Arch and Grafana to visualize them in a dashboard. To see the Grafana dashboard, follow the instructions below:
+
+1. Start Grafana and Prometheus using the following command
+   ```sh
+   docker compose --profile monitoring up
+   ```
+1. Navigate to http://localhost:3000/ to open grafana UI (use admin/grafana as credentials)
+1. From grafana left nav click on dashboards and select "Arch" to view the arch gateway stats
+
+Here is a sample interaction:
+<img width="575" alt="image" src="https://github.com/user-attachments/assets/25d40f46-616e-41ea-be8e-1623055c84ec">
--- a/demos/network_agent/README.md
+++ b/demos/network_agent/README.md
@ -0,0 +1,47 @@
+# Network Agent Demo
+
+This demo illustrates how **Arch** can be used to perform function calling with network-related tasks. In this demo, you act as a **network assistant** that provides factual information, without offering advice on manufacturers or purchasing decisions.
+
+The assistant can perform several key operations, including rebooting devices, answering general networking questions, and retrieving device statistics. By default, the system prompt ensures that the assistant's responses are factual and neutral.
+
+## Available Functions:
+- **Reboot Devices**: Allows rebooting specific devices or device groups, with an optional time range for scheduling the reboot.
+  - Parameters:
+    - `device_ids` (required): A list of device IDs to reboot.
+    - `time_range` (optional): Specifies the time range in days, defaulting to 7 days if not provided.
+
+- **Network Q/A**: Handles general Q&A related to networking. This function is the default target for general networking queries.
+
+- **Device Summary**: Retrieves statistics for specific devices within a given time range.
+  - Parameters:
+    - `device_ids` (required): A list of device IDs for which statistics will be retrieved.
+    - `time_range` (optional): Specifies the time range in days for gathering statistics, with a default of 7 days.
+
+
+# Starting the demo
+1. Create a `.env` file and set your OpenAI key using the env var `OPENAI_API_KEY`
+2. Start Arch
+   ```sh
+   archgw up [path to arch_config.yaml]
+   ```
+3. Start Network Agent
+   ```sh
+   docker compose up
+   ```
+4. Navigate to http://localhost:18080/
+5. You can type in queries like "show me device statistics for the past 7 days"
+
+
+# Observability
+Arch gateway publishes a stats endpoint at http://localhost:19901/stats. In this demo we use Prometheus to pull stats from
+Arch and Grafana to visualize them in a dashboard. To see the Grafana dashboard, follow the instructions below:
+
+1. Start Grafana and Prometheus using the following command:
+   ```sh
+   docker compose --profile monitoring up
+   ```
+1. Navigate to http://localhost:3000/ to open the Grafana UI (use admin/grafana as credentials)
+1. From the Grafana left nav, click on Dashboards and select "Arch" to view the Arch gateway stats
+
+Here is a sample interaction:
+<img width="575" alt="image" src="https://github.com/user-attachments/assets/25d40f46-616e-41ea-be8e-1623055c84ec">
--- a/docs/source/build_with_arch/rag.rst
+++ b/docs/source/build_with_arch/rag.rst
@ -6,16 +6,45 @@ RAG Application
 The following section describes how Arch can help you build faster, smarter and more accurate
 Retrieval-Augmented Generation (RAG) applications.

-Intent-drift Detection
----------------------
-Developers struggle to handle ``follow-up`` or ``clarification`` questions.
-Specifically, when users ask for changes or additions to previous responses their AI applications often generate entirely new responses instead of adjusting previous ones.
-Arch offers **intent-drift** tracking as a feature so that developers can know when the user has shifted away from a previous intent so that they can dramatically improve retrieval accuracy, lower overall token cost and  improve the speed of their responses back to users.
+Parameter Extraction for RAG
+----------------------------
+
+To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
+enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
+retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
+the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
+streamline data retrieval and processing to build more efficient and precise RAG applications.
+
+Step 1: Define Prompt Targets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. literalinclude:: includes/rag/prompt_targets.yaml
+    :language: yaml
+    :caption: Prompt Targets
+    :linenos:
+
+Step 2: Process Request Parameters in Flask
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the prompt targets are configured as above, you can handle those parameters in your Flask application:
+
+.. literalinclude:: includes/rag/parameter_handling.py
+    :language: python
+    :caption: Parameter handling with Flask
+    :linenos:
+
+[Coming Soon] `Drift Detection via Arch Intent-Markers <https://github.com/orgs/katanemo/projects/1/views/1?pane=issue&itemId=82697909>`_
+-----------------------------------------------------------------------------------------------------------------------------------------
+Developers struggle to efficiently handle ``follow-up`` or ``clarification`` questions. Specifically, when users ask for
+changes or additions to previous responses, their AI applications often generate entirely new responses instead of adjusting
+previous ones. Arch offers **intent** tracking as a feature so that developers can know when the user has shifted away from a
+previous intent, which lets them improve retrieval accuracy, lower overall token cost, and improve the speed of
+their responses back to users.

 Arch uses its built-in lightweight NLI and embedding models to know if the user has steered away from an active intent.
 Arch's intent-drift detection mechanism is based on its' :ref:`prompt_targets <prompt_target>` primtive. Arch tries to match an incoming
 prompt to one of the prompt_targets configured in the gateway. Once it detects that the user has moved away from an active
-active intent, Arch adds the ``x-arch-intent-drift`` headers to the request before sending it your application servers.
+intent, Arch adds the ``x-arch-intent-marker`` header to the request before sending it to your application servers.

 .. literalinclude:: includes/rag/intent_detection.py
    :language: python
@ -61,30 +90,3 @@ Step 3: Get Messages based on latest drift
 You can used the last set of messages that match to an intent to prompt an LLM, use it with an vector-DB for
 improved retrieval, etc. With Arch and a few lines of code, you can improve the retrieval accuracy, lower overall
 token cost and dramatically improve the speed of their responses back to users.
-
-Parameter Extraction for RAG
----------------------------
-
-To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
-enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
-retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
-the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
-streamline data retrieval and processing to build more efficient and precise RAG applications.
-
-Step 1: Define Prompt Targets
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. literalinclude:: includes/rag/prompt_targets.yaml
-    :language: yaml
-    :caption: Prompt Targets
-    :linenos:
-
-Step 2: Process Request Parameters in Flask
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Once the prompt targets are configured as above, handling those parameters is
-
-.. literalinclude:: includes/rag/parameter_handling.py
-    :language: python
-    :caption: Parameter handling with Flask
-    :linenos:
--- a/docs/source/concepts/tech_overview/prompt.rst
+++ b/docs/source/concepts/tech_overview/prompt.rst
@ -108,7 +108,7 @@ traffic, apply rate limits, and utilize a large set of traffic management capabi
 .. Attention::
   When you start Arch, it automatically creates a listener port for egress calls to upstream LLMs. This is based on the
   ``llm_providers`` configuration section in the ``prompt_config.yml`` file. Arch binds itself to a local address such as
-   127.0.0.1:51001/v1.
+   127.0.0.1:12000/v1.


 Example: Using OpenAI Client with Arch as an Egress Gateway
@ -119,7 +119,7 @@ Example: Using OpenAI Client with Arch as an Egress Gateway
   import openai

   # Set the OpenAI API base URL to the Arch gateway endpoint
-   openai.api_base = "http://127.0.0.1:51001/v1"
+   openai.api_base = "http://127.0.0.1:12000/v1"

   # No need to set openai.api_key since it's configured in Arch's gateway

--- a/docs/source/concepts/tech_overview/terminology.rst
+++ b/docs/source/concepts/tech_overview/terminology.rst
@ -21,7 +21,7 @@ before forwarding them to your application server endpoints. rch enables you to
 .. Note::

   When you start Arch, you specify a listener address/port that you want to bind downstream. But, Arch uses are predefined port
-   that you can use (``127.0.0.1:10000``) to proxy egress calls originating from your application to LLMs (API-based or hosted).
+   that you can use (``127.0.0.1:12000``) to proxy egress calls originating from your application to LLMs (API-based or hosted).
   For more details, check out :ref:`LLM provider <llm_provider>`.

 **Instance**: An instance of the Arch gateway. When you start Arch it creates at most two processes. One to handle Layer 7