updated all demo READMes and minor doc changes (#154)

* updated all demo READMes and minor doc changes

* minor typo fixes

* updated main Readme

* fixed README and docs

* fixed README and docs

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
This commit is contained in:
Salman Paracha 2024-10-08 23:58:55 -07:00 committed by GitHub
parent b63a01fe82
commit 42d4a28e13
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
22 changed files with 324 additions and 1455 deletions

185
README.md
View file

@ -10,30 +10,177 @@ Arch is an intelligent [Layer 7](https://www.cloudflare.com/learning/ddos/what-i
*Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests including secure handling, intelligent routing, robust observability, and integration with backend (API) systems for personalization all outside business logic.*
**Core Features**:
- Built on [Envoy](https://envoyproxy.io): Arch runs alongside application servers build on top of Envoy's proven HTTP management and scalability features to handle ingress and egreess prompts and LLM traffic
- Engineered with purpose-built [(fast) LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68): Arch is optimized for sub-billion parameter LLMs to handle fast, cost-effective, and accurate prompt-based tasks like function/API calling.
- Prompt [Guardrails](https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d): Arch centralizes prompt guardrails to prevent jailbreak attempts and ensure safe user interactions without writing extra code.
- Traffic Management: Arch manages LLM calls, offering smart retries, automatic cutover, and resilient upstream connections for continuous availability.
- Open Observability: Arch uses the W3C Trace Context standard to enable complete request tracing across applications, ensuring compatibility with observability tools, and provides metrics to monitor latency, token usage, and error rates, helping optimize AI application performance.
- [Coming Soon] Intent-Markers: Arch helps developers detect when users shift their intent, improving response relevance, token cost, and speed.
**Jump to our [docs](https://docs.archgw.com)** to learn more about how you can use Arch to improve the speed, robustneess and personalization of your GenAI apps
# Contact
To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively.
To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively and offering support there.
# Demos
## Complete
* [Weather Forecast](demos/function-calling/README.md)
* Showing function calling cabaility
## In progress
* Network Co-pilot
## Not Started
* Show routing between different prompt targets (keyword search vs. top-k semantic search).
* Show routing between different prompt-resolver vs RAG-based resolver targets.
* Text Summarization Based on Lightweight vs. Thoughtful Dialogue using OpenAI
* Show conversational and system observability metrics. This includes topic/intent detection
* Show how we can help developers implement safeguards customized to their application requirements and responsible AI policies.
* [Function Calling](demos/function_calling/README.md) -Showcases critical function calling cabaility
* [Insurance Agent](demos/insurance_agent/README.md) -Build a full insurance agent with arch
* [Network Agent](demos/network_agent/README.md) - Build a networking co-pilot/agent agent with arch
# Dev setup
# Quickstart
## Pre-commit
Use instructions at [pre-commit.com](https://pre-commit.com/#install) to set it up for your machine. Once installed make sure github hooks are setup, so that when you upstream your change pre-commit hooks can run and validate your change. Follow command below to setup github hooks,
Follow this guide to learn how to quickly set up Arch and integrate it into your generative AI applications.
```sh
$ brew install pre-commit
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
## Prerequisites
Before you begin, ensure you have the following:
- `Docker` & `Python` installed on your system
- `API Keys` for LLM providers (if using external LLMs)
The fastest way to get started using Arch is to use [katanemo/arch](https://hub.docker.com/r/katanemo/arch) pre-built binaries.
You can also build it from source.
## Step 1: Install Arch
Arch's CLI allows you to manage and interact with the Arch gateway efficiently. To install the CLI, simply run the following command:
Tip: We recommend that developers create a new Python virtual environment to isolate dependencies before installing Arch. This ensures that archgw and its dependencies do not interfere with other packages on your system.
```console
$ python -m venv venv
$ source venv/bin/activate # On Windows, use: venv\Scripts\activate
$ pip install archgw
```
## Step 2: Configure Arch with your application
Arch operates based on a configuration file where you can define LLM providers, prompt targets, guardrails, etc.
Below is an example configuration to get you started:
```yaml
version: v0.1
listen:
address: 0.0.0.0 # or 127.0.0.1
port: 10000
# Defines how Arch should parse the content from application/json or text/pain Content-type in the http request
message_format: huggingface
# Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
llm_providers:
- name: OpenAI
provider: openai
access_key: OPENAI_API_KEY
model: gpt-4o
default: true
stream: true
# default system prompt used by all prompt targets
system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
prompt_targets:
- name: reboot_devices
description: Reboot specific devices or device groups
path: /agent/device_reboot
parameters:
- name: device_ids
type: list
description: A list of device identifiers (IDs) to reboot.
required: false
- name: device_group
type: str
description: The name of the device group to reboot
required: false
# Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
endpoints:
app_server:
# value could be ip address or a hostname with port
# this could also be a list of endpoints for load balancing
# for example endpoint: [ ip1:port, ip2:port ]
endpoint: 127.0.0.1:80
# max time to wait for a connection to be established
connect_timeout: 0.005s
```
## Step 3: Using OpenAI Client with Arch as an Egress Gateway
Make outbound calls via Arch
```python
import openai
# Set the OpenAI API base URL to the Arch gateway endpoint
openai.api_base = "http://127.0.0.1:51001/v1"
# No need to set openai.api_key since it's configured in Arch's gateway
# Use the OpenAI client as usual
response = openai.Completion.create(
model="text-davinci-003",
prompt="What is the capital of France?"
)
print("OpenAI Response:", response.choices[0].text.strip())
```
## Observability
## Contribution
We would love feedback on our [Roadmap](https://github.com/orgs/katanemo/projects/1) and we welcome contributions to **Arch**!
Whether you're fixing bugs, adding new features, improving documentation, or creating tutorials, your help is much appreciated.
## How to Contribute
### 1. Fork the Repository
Fork the repository to create your own version of **Arch**:
- Navigate to the [Arch GitHub repository](https://github.com/katanemo/arch).
- Click the "Fork" button in the upper right corner.
- This will create a copy of the repository under your GitHub account.
### 2. Clone Your Fork
Once you've forked the repository, clone it to your local machine:
```bash
$ git clone https://github.com/katanemo/arch.git
$ cd arch
```
### 3. Create a branch
Use a descriptive name for your branch (e.g., fix-bug-123, add-feature-x).
```bash
$ git checkout -b <your-branch-name>
```
### 4. Make Your changes
Make your changes in the relevant files. If you're adding new features or fixing bugs, please include tests where applicable.
### 5. Test your changes
```bash
cd arch
cargo test
```
### 6. Push changes, and create a Pull request
Go back to the original Arch repository, and you should see a "Compare & pull request" button. Click that to submit a Pull Request (PR). In your PR description, clearly explain the changes you made and why they are necessary.
We will review your pull request and provide feedback. Once approved, your contribution will be merged into the main repository!
Contribution Guidelines
Ensure that all existing tests pass.
Write clear commit messages.
Add tests for any new functionality.
Follow the existing coding style.
Update documentation as needed.
To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively and offering support there.

View file

@ -1,24 +0,0 @@
FROM Bolt-Function-Calling-1B-Q4_K_M.gguf
# Set the size of the context window used to generate the next token
PARAMETER num_ctx 4096
# Set parameters for response generation
PARAMETER num_predict 1024
PARAMETER temperature 0.1
PARAMETER top_p 0.5
PARAMETER top_k 32022
PARAMETER repeat_penalty 1.0
PARAMETER stop "<|EOT|>"
# Set the random number seed to use for generation
PARAMETER seed 42
# Set the prompt template to be passed into the model
TEMPLATE """{{ if .System }}<begin▁of▁sentence>
{{ .System }}
{{ end }}{{ if .Prompt }}### Instruction:
{{ .Prompt }}
{{ end }}### Response:
{{ .Response }}
<|EOT|>"""

View file

@ -1,24 +0,0 @@
# Function calling
This demo shows how you can use intelligent prompt gateway to act a copilot for calling the correct proc by capturing the required and optional parametrs from the prompt. This demo assumes you are using ollama running natively. If you want to run ollama running inside docker then please update ollama endpoint in docker-compose file.
# Starting the demo
1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
1. Start services
```sh
docker compose up
```
1. Download Bolt-FC model. This demo assumes we have downloaded [Bolt-Function-Calling-1B:Q4_K_M](https://huggingface.co/katanemolabs/Bolt-Function-Calling-1B.gguf/blob/main/Bolt-Function-Calling-1B-Q4_K_M.gguf) to local folder.
1. If running ollama natively run
```sh
ollama serve
```
2. Create model file in ollama repository
```sh
ollama create Bolt-Function-Calling-1B:Q4_K_M -f Bolt-FC-1B-Q4_K_M.model_file
```
3. Navigate to http://localhost:18080/
4. You can type in queries like "show me the top 5 employees in each department with highest salary"
- You can also ask follow up questions like "just show the top 2"
5. To see metrics navigate to "http://localhost:3000/" (use admin/grafana for login)
- Open up dahsboard named "Intelligent Gateway Overview"
- On this dashboard you can see reuqest latency and number of requests

View file

@ -1,16 +0,0 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "function-calling api server",
"cwd": "${workspaceFolder}/app",
"type": "debugpy",
"request": "launch",
"module": "uvicorn",
"args": ["main:app","--reload", "--port", "8001"],
}
]
}

View file

@ -1,19 +0,0 @@
FROM python:3 AS base
FROM base AS builder
WORKDIR /src
COPY requirements.txt /src/
RUN pip install --prefix=/runtime --force-reinstall -r requirements.txt
COPY . /src
FROM python:3-slim AS output
COPY --from=builder /runtime /usr/local
COPY /app /app
WORKDIR /app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

View file

@ -1,29 +0,0 @@
from typing import List, Optional
# Function for top_employees
def top_employees(grouping: str, ranking_criteria: str, top_n: int):
pass
# Function for aggregate_stats
def aggregate_stats(grouping: str, aggregate_criteria: str, aggregate_type: str):
pass
# Function for employees_projects
def employees_projects(min_performance_score: float, min_years_experience: int, department: str, min_project_count: int = None, months_range: int = None):
pass
# Function for salary_growth
def salary_growth(min_salary_increase_percentage: float, department: str = None):
pass
# Function for promotions_increases
def promotions_increases(year: int, min_salary_increase_percentage: float = None, department: str = None):
pass
# Function for avg_project_performance
def avg_project_performance(min_project_count: int, min_performance_score: float, department: str = None):
pass
# Function for certifications_experience
def certifications_experience(certifications: list, min_years_experience: int, department: str = None):
pass

View file

@ -1,78 +0,0 @@
import inspect
import yaml
import functions # This is your module containing the function definitions
import os
def generate_config_from_function(func):
func_name = func.__name__
func_doc = func.__doc__
# Get function signature
sig = inspect.signature(func)
params = []
# Extract parameter info
for name, param in sig.parameters.items():
param_info = {
'name': name,
'description': f"Provide the {name.replace('_', ' ')}", # Customize as needed
'required': param.default == inspect.Parameter.empty, # True if no default value
'type': param.annotation.__name__ if param.annotation != inspect.Parameter.empty else 'str' # Get type if available
}
params.append(param_info)
# Define the config for this function
config = {
'name': func_name,
'description': func_doc or "",
'parameters': params,
'endpoint': {
'cluster': 'api_server',
'path': f"/{func_name}"
},
'system_prompt': f"You are responsible for handling {func_name} requests."
}
return config
def generate_full_config(module):
config = {'prompt_targets': []}
# Automatically get all functions from the module
functions_list = inspect.getmembers(module, inspect.isfunction)
for func_name, func_obj in functions_list:
func_config = generate_config_from_function(func_obj)
config['prompt_targets'].append(func_config)
return config
def replace_prompt_targets_in_config(file_path, new_prompt_targets):
# Load the existing arch_config.yaml
with open(file_path, 'r') as file:
config_data = yaml.safe_load(file)
# Replace the prompt_targets section with the new one
config_data['prompt_targets'] = new_prompt_targets
# Write the updated config back to the YAML file
with open("arch_config.yaml", 'w+') as file:
yaml.dump(config_data, file, sort_keys=False)
print(f"Updated prompt_targets in arch_config.yaml")
# Main execution
if __name__ == "__main__":
# Path to the existing arch_config.yaml two directories up
arch_config_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '../../arch_config.yaml'))
# Generate new prompt_targets from the functions module
new_config = generate_full_config(functions)
new_prompt_targets = new_config['prompt_targets']
# Replace the prompt_targets in the existing arch_config.yaml
replace_prompt_targets_in_config(arch_config_path, new_prompt_targets)

View file

@ -1,288 +0,0 @@
import random
from typing import List
from fastapi import FastAPI, HTTPException, Response
import logging
from pydantic import BaseModel
from utils import load_sql
import pandas as pd
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
app = FastAPI()
@app.get("/healthz")
async def healthz():
return {
"status": "ok"
}
conn = load_sql()
name_col = "name"
class TopEmployees(BaseModel):
grouping: str
ranking_criteria: str
top_n: int
@app.post("/top_employees")
async def top_employees(req: TopEmployees, res: Response):
name_col = "name"
# Check if `req.ranking_criteria` is a Text object and extract its value accordingly
logger.info(
f"{'* ' * 50}\n\nCaptured Ranking Criteria: {req.ranking_criteria}\n\n{'* ' * 50}"
)
if req.ranking_criteria == "yoe":
req.ranking_criteria = "years_of_experience"
elif req.ranking_criteria == "rating":
req.ranking_criteria = "performance_score"
logger.info(
f"{'* ' * 50}\n\nFinal Ranking Criteria: {req.ranking_criteria}\n\n{'* ' * 50}"
)
query = f"""
SELECT {req.grouping}, {name_col}, {req.ranking_criteria}
FROM (
SELECT {req.grouping}, {name_col}, {req.ranking_criteria},
DENSE_RANK() OVER (PARTITION BY {req.grouping} ORDER BY {req.ranking_criteria} DESC) as emp_rank
FROM employees
) ranked_employees
WHERE emp_rank <= {req.top_n};
"""
result_df = pd.read_sql_query(query, conn)
result = result_df.to_dict(orient="records")
return result
class AggregateStats(BaseModel):
grouping: str
aggregate_criteria: str
aggregate_type: str
@app.post("/aggregate_stats")
async def aggregate_stats(req: AggregateStats, res: Response):
logger.info(
f"{'* ' * 50}\n\nCaptured Aggregate Criteria: {req.aggregate_criteria}\n\n{'* ' * 50}"
)
if req.aggregate_criteria == "yoe":
req.aggregate_criteria = "years_of_experience"
logger.info(
f"{'* ' * 50}\n\nFinal Aggregate Criteria: {req.aggregate_criteria}\n\n{'* ' * 50}"
)
logger.info(
f"{'* ' * 50}\n\nCaptured Aggregate Type: {req.aggregate_type}\n\n{'* ' * 50}"
)
if req.aggregate_type.lower() not in ["sum", "avg", "min", "max"]:
if req.aggregate_type.lower() == "count":
req.aggregate_type = "COUNT"
elif req.aggregate_type.lower() == "total":
req.aggregate_type = "SUM"
elif req.aggregate_type.lower() == "average":
req.aggregate_type = "AVG"
elif req.aggregate_type.lower() == "minimum":
req.aggregate_type = "MIN"
elif req.aggregate_type.lower() == "maximum":
req.aggregate_type = "MAX"
else:
raise HTTPException(status_code=400, detail="Invalid aggregate type")
logger.info(
f"{'* ' * 50}\n\nFinal Aggregate Type: {req.aggregate_type}\n\n{'* ' * 50}"
)
query = f"""
SELECT {req.grouping}, {req.aggregate_type}({req.aggregate_criteria}) as {req.aggregate_type}_{req.aggregate_criteria}
FROM employees
GROUP BY {req.grouping};
"""
result_df = pd.read_sql_query(query, conn)
result = result_df.to_dict(orient="records")
return result
# 1. Top Employees by Performance, Projects, and Timeframe
class TopEmployeesProjects(BaseModel):
min_performance_score: float
min_years_experience: int
department: str
min_project_count: int = None # Optional
months_range: int = None # Optional (for filtering recent projects)
@app.post("/employees_projects")
async def employees_projects(req: TopEmployeesProjects, res: Response):
params, filters = {}, []
# Add optional months_range filter
if req.months_range:
params['months_range'] = req.months_range
filters.append(f"p.start_date >= DATE('now', '-{req.months_range} months')")
# Add project count filter if provided
if req.min_project_count:
filters.append(f"COUNT(p.project_id) >= {req.min_project_count}")
where_clause = " AND ".join(filters)
if where_clause:
where_clause = "AND " + where_clause
query = f"""
SELECT e.name, e.department, e.years_of_experience, e.performance_score, COUNT(p.project_id) as project_count
FROM employees e
LEFT JOIN projects p ON e.eid = p.eid
WHERE e.performance_score >= {req.min_performance_score}
AND e.years_of_experience >= {req.min_years_experience}
AND e.department = '{req.department}'
{where_clause}
GROUP BY e.eid, e.name, e.department, e.years_of_experience, e.performance_score
ORDER BY e.performance_score DESC;
"""
result_df = pd.read_sql_query(query, conn, params=params)
return result_df.to_dict(orient='records')
# 2. Employees with Salary Growth Since Last Promotion
class SalaryGrowthRequest(BaseModel):
min_salary_increase_percentage: float
department: str = None # Optional
@app.post("/salary_growth")
async def salary_growth(req: SalaryGrowthRequest, res: Response):
params, filters = {}, []
if req.department:
filters.append("e.department = :department")
params['department'] = req.department
where_clause = " AND ".join(filters)
if where_clause:
where_clause = "AND " + where_clause
query = f"""
SELECT e.name, e.department, s.salary_increase_percentage
FROM employees e
JOIN salary_history s ON e.eid = s.eid
WHERE s.salary_increase_percentage >= {req.min_salary_increase_percentage}
AND s.promotion_date IS NOT NULL
{where_clause}
ORDER BY s.salary_increase_percentage DESC;
"""
result_df = pd.read_sql_query(query, conn, params=params)
return result_df.to_dict(orient='records')
# 4. Employees with Promotions and Salary Increases
class PromotionsIncreasesRequest(BaseModel):
year: int
min_salary_increase_percentage: float = None # Optional
department: str = None # Optional
@app.post("/promotions_increases")
async def promotions_increases(req: PromotionsIncreasesRequest, res: Response):
params, filters = {}, []
if req.min_salary_increase_percentage:
filters.append(f"s.salary_increase_percentage >= {req.min_salary_increase_percentage}")
if req.department:
filters.append("e.department = :department")
params['department'] = req.department
where_clause = " AND ".join(filters)
if where_clause:
where_clause = "AND " + where_clause
query = f"""
SELECT e.name, e.department, s.salary_increase_percentage, s.promotion_date
FROM employees e
JOIN salary_history s ON e.eid = s.eid
WHERE strftime('%Y', s.promotion_date) = '{req.year}'
{where_clause}
ORDER BY s.salary_increase_percentage DESC;
"""
result_df = pd.read_sql_query(query, conn, params=params)
return result_df.to_dict(orient='records')
# 5. Employees with Highest Average Project Performance
class AvgProjPerformanceRequest(BaseModel):
min_project_count: int
min_performance_score: float
department: str = None # Optional
@app.post("/project_performance")
async def project_performance(req: AvgProjPerformanceRequest, res: Response):
params, filters = {}, []
if req.department:
filters.append("e.department = :department")
params['department'] = req.department
filters.append(f"p.performance_score >= {req.min_performance_score}")
where_clause = " AND ".join(filters)
query = f"""
SELECT e.name, e.department, AVG(p.performance_score) as avg_performance_score, COUNT(p.project_id) as project_count
FROM employees e
JOIN projects p ON e.eid = p.eid
WHERE {where_clause}
GROUP BY e.eid, e.name, e.department
HAVING COUNT(p.project_id) >= {req.min_project_count}
ORDER BY avg_performance_score DESC;
"""
result_df = pd.read_sql_query(query, conn, params=params)
return result_df.to_dict(orient='records')
# 6. Employees by Certification and Years of Experience
class CertificationsExperienceRequest(BaseModel):
certifications: List[str]
min_years_experience: int
department: str = None # Optional
@app.post("/certifications_experience")
async def certifications_experience(req: CertificationsExperienceRequest, res: Response):
# Convert the list of certifications into a format for SQL query
certs_filter = ', '.join([f"'{cert}'" for cert in req.certifications])
params, filters = {}, []
# Add department filter if provided
if req.department:
filters.append("e.department = :department")
params['department'] = req.department
filters.append("e.years_of_experience >= :min_years_experience")
params['min_years_experience'] = req.min_years_experience
where_clause = " AND ".join(filters)
query = f"""
SELECT e.name, e.department, e.years_of_experience, COUNT(c.certification_name) as cert_count
FROM employees e
JOIN certifications c ON e.eid = c.eid
WHERE c.certification_name IN ({certs_filter})
AND {where_clause}
GROUP BY e.eid, e.name, e.department, e.years_of_experience
HAVING COUNT(c.certification_name) = {len(req.certifications)}
ORDER BY e.years_of_experience DESC;
"""
result_df = pd.read_sql_query(query, conn, params=params)
return result_df.to_dict(orient='records')

View file

@ -1,157 +0,0 @@
import pandas as pd
import random
import datetime
import sqlite3
def load_sql():
# Example Usage
conn = sqlite3.connect(":memory:")
# create and load the employees table
generate_employee_data(conn)
# create and load the projects table
generate_project_data(conn)
# create and load the salary_history table
generate_salary_history(conn)
# create and load the certifications table
generate_certifications(conn)
return conn
# Function to generate random employee data with `eid` as the primary key
def generate_employee_data(conn):
# List of possible names, positions, departments, and locations
names = [
"Alice",
"Bob",
"Charlie",
"David",
"Eve",
"Frank",
"Grace",
"Hank",
"Ivy",
"Jack",
]
positions = [
"Manager",
"Engineer",
"Salesperson",
"HR Specialist",
"Marketing Analyst",
]
# List of possible names, positions, departments, locations, and certifications
names = ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Hank", "Ivy", "Jack"]
positions = ["Manager", "Engineer", "Salesperson", "HR Specialist", "Marketing Analyst"]
departments = ["Engineering", "Marketing", "HR", "Sales", "Finance"]
locations = ["New York", "San Francisco", "Austin", "Boston", "Chicago"]
certifications = ["AWS Certified", "Google Cloud Certified", "PMP", "Scrum Master", "Cisco Certified"]
# Generate random hire dates
def random_hire_date():
start_date = datetime.date(2000, 1, 1)
end_date = datetime.date(2023, 12, 31)
time_between_dates = end_date - start_date
days_between_dates = time_between_dates.days
random_number_of_days = random.randrange(days_between_dates)
return start_date + datetime.timedelta(days=random_number_of_days)
# Generate random employee records with an employee ID (eid)
employees = []
for eid in range(1, 101): # 100 employees with `eid` starting from 1
name = random.choice(names)
position = random.choice(positions)
salary = round(random.uniform(50000, 150000), 2) # Salary between 50,000 and 150,000
department = random.choice(departments)
location = random.choice(locations)
hire_date = random_hire_date()
performance_score = round(random.uniform(1, 5), 2) # Performance score between 1.0 and 5.0
years_of_experience = random.randint(1, 30) # Years of experience between 1 and 30
employee = {
"eid": eid, # Employee ID
"name": name,
"position": position,
"salary": salary,
"department": department,
"location": location,
"hire_date": hire_date,
"performance_score": performance_score,
"years_of_experience": years_of_experience
}
employees.append(employee)
# Convert the list of dictionaries to a DataFrame and save to DB
df_employees = pd.DataFrame(employees)
df_employees.to_sql('employees', conn, index=False, if_exists='replace')
# Function to generate random project data with `eid`
def generate_project_data(conn):
employees = pd.read_sql_query("SELECT eid FROM employees", conn)
projects = []
for _ in range(500): # 500 projects
eid = random.choice(employees['eid'])
project_name = f"Project_{random.randint(1, 100)}"
start_date = datetime.date(2020, 1, 1) + datetime.timedelta(days=random.randint(0, 365 * 3)) # Within the last 3 years
performance_score = round(random.uniform(1, 5), 2) # Performance score for the project between 1.0 and 5.0
project = {
"eid": eid, # Foreign key from employees table
"project_name": project_name,
"start_date": start_date,
"performance_score": performance_score
}
projects.append(project)
# Convert the list of dictionaries to a DataFrame and save to DB
df_projects = pd.DataFrame(projects)
df_projects.to_sql('projects', conn, index=False, if_exists='replace')
# Function to generate random salary history data with `eid`
def generate_salary_history(conn):
employees = pd.read_sql_query("SELECT eid FROM employees", conn)
salary_history = []
for _ in range(300): # 300 salary records
eid = random.choice(employees['eid'])
salary_increase_percentage = round(random.uniform(5, 30), 2) # Salary increase between 5% and 30%
promotion_date = datetime.date(2018, 1, 1) + datetime.timedelta(days=random.randint(0, 365 * 5)) # Promotions in the last 5 years
salary_record = {
"eid": eid, # Foreign key from employees table
"salary_increase_percentage": salary_increase_percentage,
"promotion_date": promotion_date
}
salary_history.append(salary_record)
# Convert the list of dictionaries to a DataFrame and save to DB
df_salary_history = pd.DataFrame(salary_history)
df_salary_history.to_sql('salary_history', conn, index=False, if_exists='replace')
# Function to generate random certifications data with `eid`
def generate_certifications(conn):
employees = pd.read_sql_query("SELECT eid FROM employees", conn)
certifications_list = ["AWS Certified", "Google Cloud Certified", "PMP", "Scrum Master", "Cisco Certified"]
employee_certifications = []
for _ in range(300): # 300 certification records
eid = random.choice(employees['eid'])
certification = random.choice(certifications_list)
cert_record = {
"eid": eid, # Foreign key from employees table
"certification_name": certification
}
employee_certifications.append(cert_record)
# Convert the list of dictionaries to a DataFrame and save to DB
df_certifications = pd.DataFrame(employee_certifications)
df_certifications.to_sql('certifications', conn, index=False, if_exists='replace')

View file

@ -1,4 +0,0 @@
fastapi
uvicorn
pandas
dateparser

View file

@ -1,199 +0,0 @@
default_prompt_endpoint: "127.0.0.1"
load_balancing: "round_robin"
timeout_ms: 5000
overrides:
# confidence threshold for prompt target intent matching
prompt_target_intent_matching_threshold: 0.7
llm_providers:
- name: open-ai-gpt-4
api_key: $OPEN_AI_API_KEY
model: gpt-4
default: true
prompt_targets:
- type: function_resolver
name: top_employees
description: |
Allows you to find the top employees in different groups, such as departments, locations, or position. You can rank the employees by different criteria, like salary, yoe, or rating. Returns the best-ranked employees for each group, helping you identify top n in the list.
parameters:
- name: grouping
description: |
Select how you'd like to group the employees. For example, you can group them by department, location, or their position. The tool will provide the top-ranked employees within each group you choose.
required: true
type: string
enum: [department, location, position]
- name: ranking_criteria
required: true
type: string
description: |
Choose how you'd like to rank the employees. You can rank them by their salary, their years of experience, or their rating. The tool will sort the employees based on this ranking and return the best ones from each group.
enum: [salary, years_of_experience, performance_score]
- name: top_n
required: true
type: integer
description: |
Enter how many of the top employees you want to see in each group. For example, if you enter 3, the tool will show you the top 3 employees for each group you selected.
endpoint:
cluster: api_server
path: /top_employees
system_prompt: |
You are responsible for retrieving the top N employees per group ranked by a constraint.
- type: function_resolver
name: aggregate_stats
description: |
Calculate summary statistics for groups of employees. You can group employees by categories like department or location and then compute totals, averages, or other statistics for specific attributes such as salary or years of experience.
parameters:
- name: grouping
description: |
Choose how you'd like to organize the employees. For example, you can group them by department, location, or position. The tool will calculate the summary statistics for each group.
required: true
enum: [department, location, position]
- name: aggregate_criteria
description: |
Select the specific attribute you'd like to analyze. This could be something like salary, years of experience, or rating. The tool will calculate the statistic you request for this attribute.
required: true
enum: [salary, years_of_experience, performance_score]
- name: aggregate_type
description: |
Choose the type of statistic you'd like to calculate for the selected attribute. For example, you can calculate the sum, average, minimum, or maximum value for each group.
required: true
enum: [SUM, AVG, MIN, MAX]
endpoint:
cluster: api_server
path: /aggregate_stats
system_prompt: |
You help calculate summary statistics for groups of employees. First, organize the employees by the specified grouping (e.g., department, location, or position). Then, compute the requested statistic (e.g., total, average, minimum, or maximum) for a specific attribute like salary, experience, or rating.
# 1. Top Employees by Performance, Projects, and Timeframe
- type: function_resolver
name: employees_projects
description: |
Fetch employees with the highest performance scores, considering their project participation and years of experience. You can filter by minimum performance score, years of experience, and department. Optionally, you can also filter by recent project participation within the last Y months.
parameters:
- name: min_performance_score
description: Minimum performance score to filter employees.
required: true
type: float
- name: min_years_experience
description: Minimum years of experience to filter employees.
required: true
type: integer
- name: department
description: Department to filter employees by.
required: true
type: string
- name: min_project_count
description: Minimum number of projects employees participated in (optional).
required: false
type: integer
- name: months_range
description: Timeframe (in months) for filtering recent projects (optional).
required: false
type: integer
endpoint:
cluster: api_server
path: /employees_projects
system_prompt: |
You are responsible for retrieving the top N employees ranked by performance and project participation. Use filters for experience and optional project criteria.
# 2. Employees with Salary Growth Since Last Promotion
- type: function_resolver
name: salary_growth
description: |
Fetch employees with the highest salary growth since their last promotion, grouped by department. You can filter by a minimum salary increase percentage and department.
parameters:
- name: min_salary_increase_percentage
description: Minimum percentage increase in salary since the last promotion.
required: true
type: float
- name: department
description: Department to filter employees by (optional).
required: false
type: string
endpoint:
cluster: api_server
path: /salary_growth
system_prompt: |
You are responsible for retrieving employees with the highest salary growth since their last promotion. Filter by minimum salary increase percentage and department.
# 4. Employees with Promotions and Salary Increases by Year
- type: function_resolver
name: promotions_increases
description: |
Fetch employees who were promoted and received a salary increase in a specific year, grouped by department. You can optionally filter by minimum percentage salary increase and department.
parameters:
- name: year
description: The year in which the promotion and salary increase occurred.
required: true
type: integer
- name: min_salary_increase_percentage
description: Minimum percentage salary increase to filter employees.
required: false
type: float
- name: department
description: Department to filter by (optional).
required: false
type: string
endpoint:
cluster: api_server
path: /promotions_increases
system_prompt: |
You are responsible for fetching employees who were promoted and received a salary increase in a specific year. Apply filters for salary increase percentage and department.
# 5. Employees with Highest Average Project Performance
- type: function_resolver
name: project_performance
description: |
Fetch employees with the highest average performance across all projects they have worked on over time. You can filter by minimum project count, department, and minimum performance score.
parameters:
- name: min_project_count
description: Minimum number of projects an employee must have participated in.
required: true
type: integer
- name: min_performance_score
description: Minimum performance score to filter employees.
required: true
type: float
default: 4.0
- name: department
description: Department to filter by (optional).
required: false
type: string
endpoint:
cluster: api_server
path: /project_performance
system_prompt: |
You are responsible for fetching employees with the highest average performance across all projects theyve worked on. Apply filters for minimum project count, performance score, and department.
# 6. Employees by Certification and Years of Experience
- type: function_resolver
name: certifications_experience
description: |
Fetch employees who have all the required certifications and meet the minimum years of experience. You can filter by department and provide a list of certifications to match.
parameters:
- name: certifications
description: List of required certifications.
required: true
type: list
- name: min_years_experience
description: Minimum years of experience.
required: true
type: integer
- name: department
description: Department to filter employees by (optional).
required: false
type: string
default: "Engineering"
endpoint:
cluster: api_server
path: /certifications_experience
system_prompt: |
You are responsible for fetching employees who have the required certifications and meet the minimum years of experience. Optionally, filter by department.

View file

@ -1,143 +0,0 @@
services:
config_generator:
build:
context: ../../
dockerfile: config_generator/Dockerfile
volumes:
- ../../arch/envoy.template.yaml:/usr/src/app/envoy.template.yaml
- ./arch_config.yaml:/usr/src/app/arch_config.yaml
- ./generated:/usr/src/app/out
arch:
build:
context: ../../
dockerfile: arch/Dockerfile
hostname: arch
ports:
- "10000:10000"
- "19901:9901"
volumes:
- ./generated/envoy.yaml:/etc/envoy/envoy.yaml
- /etc/ssl/cert.pem:/etc/ssl/cert.pem
- ./arch_config.yaml:/config/arch_config.yaml
depends_on:
config_generator:
condition: service_completed_successfully
model_server:
condition: service_healthy
environment:
- LOG_LEVEL=debug
model_server:
build:
context: ../../model_server
dockerfile: Dockerfile
ports:
- "18081:80"
healthcheck:
test: ["CMD", "curl" ,"http://localhost:80/healthz"]
interval: 5s
retries: 20
volumes:
- ~/.cache/huggingface:/root/.cache/huggingface
- ./arch_config.yaml:/root/arch_config.yaml
api_server:
build:
context: api_server
dockerfile: Dockerfile
ports:
- "18083:80"
healthcheck:
test: ["CMD", "curl" ,"http://localhost:80/healthz"]
interval: 5s
retries: 20
function_resolver:
build:
context: ../../function_resolver
dockerfile: Dockerfile
ports:
- "18082:80"
healthcheck:
test: ["CMD", "curl" ,"http://localhost:80/healthz"]
interval: 5s
retries: 20
volumes:
- ~/.cache/huggingface:/root/.cache/huggingface
environment:
# use ollama endpoint that is hosted by host machine (no virtualization)
- OLLAMA_ENDPOINT=${OLLAMA_ENDPOINT:-host.docker.internal}
# uncomment following line to use ollama endpoint that is hosted by docker
# - OLLAMA_ENDPOINT=ollama
- OLLAMA_MODEL=Bolt-Function-Calling-1B:Q4_K_M
ollama:
image: ollama/ollama
container_name: ollama
volumes:
- ./ollama:/root/.ollama
restart: unless-stopped
ports:
- '11434:11434'
profiles:
- manual
open-webui:
image: ghcr.io/open-webui/open-webui:${WEBUI_DOCKER_TAG-main}
container_name: open-webui
volumes:
- ./open-webui:/app/backend/data
# depends_on:
# - ollama
ports:
- 18090:8080
environment:
- OLLAMA_BASE_URL=http://${OLLAMA_ENDPOINT:-host.docker.internal}:11434
- WEBUI_AUTH=false
extra_hosts:
- host.docker.internal:host-gateway
restart: unless-stopped
profiles:
- monitoring
chatbot_ui:
build:
context: ../../chatbot_ui
dockerfile: Dockerfile
ports:
- "18080:8080"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY:?error}
- CHAT_COMPLETION_ENDPOINT=http://arch:10000/v1
prometheus:
image: prom/prometheus
container_name: prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yaml'
ports:
- 9090:9090
restart: unless-stopped
volumes:
- ./prometheus:/etc/prometheus
- ./prom_data:/prometheus
profiles:
- monitoring
grafana:
image: grafana/grafana
container_name: grafana
ports:
- 3000:3000
restart: unless-stopped
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=grafana
volumes:
- ./grafana:/etc/grafana/provisioning/datasources
- ./grafana/dashboard.yaml:/etc/grafana/provisioning/dashboards/main.yaml
- ./grafana/dashboards:/var/lib/grafana/dashboards
profiles:
- monitoring

View file

@ -1,12 +0,0 @@
apiVersion: 1
providers:
- name: "Dashboard provider"
orgId: 1
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: false
options:
path: /var/lib/grafana/dashboards
foldersFromFilesStructure: true

View file

@ -1,355 +0,0 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "avg(rate(envoy_cluster_internal_upstream_rq_time_sum[1m]) / rate(envoy_cluster_internal_upstream_rq_time_count[1m])) by (envoy_cluster_name)",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": true,
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A",
"useBackend": false
}
],
"title": "request latency - internal (ms)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "avg(rate(envoy_cluster_external_upstream_rq_time_sum[1m]) / rate(envoy_cluster_external_upstream_rq_time_count[1m])) by (envoy_cluster_name)",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": true,
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A",
"useBackend": false
}
],
"title": "request latency - external (ms)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "avg(rate(envoy_cluster_internal_upstream_rq_completed[1m])) by (envoy_cluster_name)",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A",
"useBackend": false
},
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "avg(rate(envoy_cluster_external_upstream_rq_completed[1m])) by (envoy_cluster_name)",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": true,
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "B",
"useBackend": false
}
],
"title": "Upstream request count",
"type": "timeseries"
}
],
"schemaVersion": 39,
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-15m",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Intelligent Gateway Overview",
"uid": "adt6uhx5lk8aob",
"version": 3,
"weekStart": ""
}

View file

@ -1,9 +0,0 @@
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus:9090
isDefault: true
access: proxy
editable: true

View file

@ -1,23 +0,0 @@
global:
scrape_interval: 15s
scrape_timeout: 10s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets: []
scheme: http
timeout: 10s
api_version: v1
scrape_configs:
- job_name: envoy
honor_timestamps: true
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /stats
scheme: http
static_configs:
- targets:
- arch:9901
params:
format: ['prometheus']

View file

@ -1,29 +1,20 @@
# Function calling
This demo shows how you can use intelligent prompt gateway to do function calling. This demo assumes you are using ollama running natively. If you want to run ollama running inside docker then please update ollama endpoint in docker-compose file.
This demo shows how you can use Arch's function calling capabilites.
# Starting the demo
1. Ensure that submodule is up to date
```sh
git submodule sync --recursive
```
1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
1. Start services
2. Start Arch
```sh
docker compose up
archgw up arch_config.yaml
```
1. Download Bolt-FC model. This demo assumes we have downloaded [Arch-Function-Calling-1.5B:Q4_K_M](https://huggingface.co/katanemolabs/Arch-Function-Calling-1.5B.gguf/blob/main/Arch-Function-Calling-1.5B-Q4_K_M.gguf) to local folder.
1. If running ollama natively run
```sh
ollama serve
3. Start Network Agent
```sh
docker compose up
```
2. Create model file in ollama repository
```sh
ollama create Arch-Function-Calling-1.5B:Q4_K_M -f Arch-Function-Calling-1.5B-Q4_K_M.model_file
```
3. Navigate to http://localhost:18080/
4. Navigate to http://localhost:18080/
4. You can type in queries like "how is the weather in Seattle"
- You can also ask follow up questions like "show me sunny days"
5. To see metrics navigate to "http://localhost:3000/" (use admin/grafana for login)
6. To see metrics navigate to "http://localhost:3000/" (use admin/grafana for login)
- Open up dahsboard named "Intelligent Gateway Overview"
- On this dashboard you can see reuqest latency and number of requests
@ -38,6 +29,5 @@ Arch gateway publishes stats endpoint at http://localhost:19901/stats. In this d
1. From grafana left nav click on dashboards and select "Intelligent Gateway Overview" to view arch gateway stats
Here is sample interaction,
Here is a sample interaction,
<img width="575" alt="image" src="https://github.com/user-attachments/assets/e0929490-3eb2-4130-ae87-a732aea4d059">

View file

@ -1 +1,64 @@
The following demo
# Insurance Agent Demo
This demo showcases how the **Arch** can be used to manage insurance-related tasks such as policy inquiries, initiating policies, and updating claims or deductibles. In this demo, the assistant provides factual information related to insurance policies (e.g., car, boat, house, motorcycle).
The system can perform a variety of tasks, such as answering insurance-related questions, retrieving policy coverage details, initiating policies, and updating claims or deductibles.
## Available Functions:
- **Policy Q/A**: Handles general Q&A related to insurance policies.
- **Endpoint**: `/policy/qa`
- This function answers general inquiries related to insurance, such as coverage details or policy types. It is the default target for insurance-related queries.
- **Get Policy Coverage**: Retrieves the coverage details for a given policy type (car, boat, house, motorcycle).
- **Endpoint**: `/policy/coverage`
- Parameters:
- `policy_type` (required): The type of policy. Available options: `car`, `boat`, `house`, `motorcycle`. Defaults to `car`.
- **Initiate Policy**: Starts a policy coverage for car, boat, motorcycle, or house.
- **Endpoint**: `/policy/initiate`
- Parameters:
- `policy_type` (required): The type of policy. Available options: `car`, `boat`, `house`, `motorcycle`. Defaults to `car`.
- `deductible` (required): The deductible amount set for the policy.
- **Update Claim**: Updates the notes on a specific insurance claim.
- **Endpoint**: `/policy/claim`
- Parameters:
- `claim_id` (required): The claim number.
- `notes` (optional): Notes about the claim number for the adjustor to see.
- **Update Deductible**: Updates the deductible amount for a specific policy coverage.
- **Endpoint**: `/policy/deductible`
- Parameters:
- `policy_id` (required): The ID of the policy.
- `deductible` (required): The deductible amount to be set for the policy.
**Arch** is designed to intelligently routes prompts to the appropriate functions based on the target, allowing for seamless interaction with various insurance-related services.
# Starting the demo
1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
2. Start Arch
```sh
archgw up [path to arch_config.yaml]
```
3. Start Network Agent
```sh
docker compose up
```
3. Navigate to http://localhost:18080/
4. You can type in queries like "show me device statics for the past 7 days"
# Observability
Arch gateway publishes stats endpoint at http://localhost:19901/stats. In this demo we are using grafana to pull stats from
arch and we are using grafana to visalize the stats in dashboard. To see grafana dashboard follow instructions below,
1. Start grafana and prometheus using following command
```yaml
docker compose --profile monitoring up
```
1. Navigate to http://localhost:3000/ to open grafana UI (use admin/grafana as credentials)
1. From grafana left nav click on dashboards and select "Arch" to view the arch gateway stats
Here is sample interaction,
<img width="575" alt="image" src="https://github.com/user-attachments/assets/25d40f46-616e-41ea-be8e-1623055c84ec">

View file

@ -0,0 +1,47 @@
# Network Agent Demo
This demo illustrates how **Arch** can be used to perform function calling with network-related tasks. In this demo, you act as a **network assistant** that provides factual information, without offering advice on manufacturers or purchasing decisions.
The assistant can perform several key operations, including rebooting devices, answering general networking questions, and retrieving device statistics. By default, the system prompt ensures that the assistant's responses are factual and neutral.
## Available Functions:
- **Reboot Devices**: Allows rebooting specific devices or device groups, with an optional time range for scheduling the reboot.
- Parameters:
- `device_ids` (required): A list of device IDs to reboot.
- `time_range` (optional): Specifies the time range in days, defaulting to 7 days if not provided.
- **Network Q/A**: Handles general Q&A related to networking. This function is the default target for general networking queries.
- **Device Summary**: Retrieves statistics for specific devices within a given time range.
- Parameters:
- `device_ids` (required): A list of device IDs for which statistics will be retrieved.
- `time_range` (optional): Specifies the time range in days for gathering statistics, with a default of 7 days.
# Starting the demo
1. Create `.env` file and set OpenAI key using env var `OPENAI_API_KEY`
2. Start Arch
```sh
archgw up [path to arch_config.yaml]
```
3. Start Network Agent
```sh
docker compose up
```
3. Navigate to http://localhost:18080/
4. You can type in queries like "show me device statics for the past 7 days"
# Observability
Arch gateway publishes stats endpoint at http://localhost:19901/stats. In this demo we are using grafana to pull stats from
arch and we are using grafana to visalize the stats in dashboard. To see grafana dashboard follow instructions below,
1. Start grafana and prometheus using following command
```yaml
docker compose --profile monitoring up
```
1. Navigate to http://localhost:3000/ to open grafana UI (use admin/grafana as credentials)
1. From grafana left nav click on dashboards and select "Arch" to view the arch gateway stats
Here is sample interaction,
<img width="575" alt="image" src="https://github.com/user-attachments/assets/25d40f46-616e-41ea-be8e-1623055c84ec">

View file

@ -6,16 +6,45 @@ RAG Application
The following section describes how Arch can help you build faster, smarter and more accurate
Retrieval-Augmented Generation (RAG) applications.
Intent-drift Detection
----------------------
Developers struggle to handle ``follow-up`` or ``clarification`` questions.
Specifically, when users ask for changes or additions to previous responses their AI applications often generate entirely new responses instead of adjusting previous ones.
Arch offers **intent-drift** tracking as a feature so that developers can know when the user has shifted away from a previous intent so that they can dramatically improve retrieval accuracy, lower overall token cost and improve the speed of their responses back to users.
Parameter Extraction for RAG
----------------------------
To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
streamline data retrieval and processing to build more efficient and precise RAG applications.
Step 1: Define Prompt Targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. literalinclude:: includes/rag/prompt_targets.yaml
:language: yaml
:caption: Prompt Targets
:linenos:
Step 2: Process Request Parameters in Flask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once the prompt targets are configured as above, handling those parameters is
.. literalinclude:: includes/rag/parameter_handling.py
:language: python
:caption: Parameter handling with Flask
:linenos:
[Coming Soon] `Drift Detection via Arch Intent-Markers <https://github.com/orgs/katanemo/projects/1/views/1?pane=issue&itemId=82697909>`_
-----------------------------------------------------------------------------------------------------------------------------------------
Developers struggle to efficiently handle ``follow-up`` or ``clarification`` questions. Specifically, when users ask for
changes or additions to previous responses their AI applications often generate entirely new responses instead of adjusting
previous ones.Arch offers **intent** tracking as a feature so that developers can know when the user has shifted away from a
previous intent so that they can dramatically improve retrieval accuracy, lower overall token cost and improve the speed of
their responses back to users.
Arch uses its built-in lightweight NLI and embedding models to know if the user has steered away from an active intent.
Arch's intent-drift detection mechanism is based on its' :ref:`prompt_targets <prompt_target>` primtive. Arch tries to match an incoming
prompt to one of the prompt_targets configured in the gateway. Once it detects that the user has moved away from an active
active intent, Arch adds the ``x-arch-intent-drift`` headers to the request before sending it your application servers.
active intent, Arch adds the ``x-arch-intent-marker`` headers to the request before sending it your application servers.
.. literalinclude:: includes/rag/intent_detection.py
:language: python
@ -61,30 +90,3 @@ Step 3: Get Messages based on latest drift
You can used the last set of messages that match to an intent to prompt an LLM, use it with an vector-DB for
improved retrieval, etc. With Arch and a few lines of code, you can improve the retrieval accuracy, lower overall
token cost and dramatically improve the speed of their responses back to users.
Parameter Extraction for RAG
----------------------------
To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
streamline data retrieval and processing to build more efficient and precise RAG applications.
Step 1: Define Prompt Targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. literalinclude:: includes/rag/prompt_targets.yaml
:language: yaml
:caption: Prompt Targets
:linenos:
Step 2: Process Request Parameters in Flask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once the prompt targets are configured as above, handling those parameters is
.. literalinclude:: includes/rag/parameter_handling.py
:language: python
:caption: Parameter handling with Flask
:linenos:

View file

@ -108,7 +108,7 @@ traffic, apply rate limits, and utilize a large set of traffic management capabi
.. Attention::
When you start Arch, it automatically creates a listener port for egress calls to upstream LLMs. This is based on the
``llm_providers`` configuration section in the ``prompt_config.yml`` file. Arch binds itself to a local address such as
127.0.0.1:51001/v1.
127.0.0.1:12000/v1.
Example: Using OpenAI Client with Arch as an Egress Gateway
@ -119,7 +119,7 @@ Example: Using OpenAI Client with Arch as an Egress Gateway
import openai
# Set the OpenAI API base URL to the Arch gateway endpoint
openai.api_base = "http://127.0.0.1:51001/v1"
openai.api_base = "http://127.0.0.1:12000/v1"
# No need to set openai.api_key since it's configured in Arch's gateway

View file

@ -21,7 +21,7 @@ before forwarding them to your application server endpoints. rch enables you to
.. Note::
When you start Arch, you specify a listener address/port that you want to bind downstream. But, Arch uses are predefined port
that you can use (``127.0.0.1:10000``) to proxy egress calls originating from your application to LLMs (API-based or hosted).
that you can use (``127.0.0.1:12000``) to proxy egress calls originating from your application to LLMs (API-based or hosted).
For more details, check out :ref:`LLM provider <llm_provider>`.
**Instance**: An instance of the Arch gateway. When you start Arch it creates at most two processes. One to handle Layer 7