updating readme and see how it flows (#556)

* updating readme and see how it flows

* fixed links
---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-4.local>
Salman Paracha 2025-08-21 06:29:47 -07:00 committed by GitHub
parent 89ab51697a
commit 95d28df725

README.md

<div align="center">
_Arch is a smart proxy server designed as a modular edge and AI gateway for agents._<br><br>
Arch handles the *pesky low-level work* in building agentic apps — like applying guardrails, clarifying vague user input, routing prompts to the right agent, and unifying access to any LLM. It's a language- and framework-friendly infrastructure layer designed to help you build and ship agentic apps faster.
[Quickstart](#Quickstart) •
[Demos](#Demos) •
[Route LLMs](#Use-Arch-as-an-LLM-Router) •
[Build agentic apps with Arch](#Build-Agentic-Apps-with-Arch) •
[Documentation](https://docs.archgw.com) •
[Contact](#Contact)
# Overview
<a href="https://www.producthunt.com/posts/arch-3?embed=true&utm_source=badge-top-post-badge&utm_medium=badge&utm_souce=badge-arch&#0045;3" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/top-post-badge.svg?post_id=565761&theme=dark&period=daily&t=1742359429995" alt="Arch - Build&#0032;fast&#0044;&#0032;hyper&#0045;personalized&#0032;agents&#0032;with&#0032;intelligent&#0032;infra | Product Hunt" style="width: 188px; height: 41px;" width="188" height="41" /></a>
AI demos are easy to hack. But once you move past the prototype stage, you're stuck building and maintaining low-level plumbing code that slows down real innovation. For example:
- **Routing & orchestration.** Frameworks handle routing and handoffs in tightly coupled ways, so if you want to plug in your own router, planner, or policy engine, youre stuck with a heavy refactor or brittle overrides.
- **Model integration churn.** Frameworks wire LLM integrations directly into code abstractions, making it hard to add or swap models without touching application code — meaning youll have to bounce your app every time you want to experiment with a new provider or version.
- **Observability & governance.** Logging, tracing, and guardrails are baked in as tightly coupled features, so bringing in best-of-breed solutions is painful and often requires digging through the guts of a framework.
- **Prompt engineering overhead**. Input validation, clarifying vague user input, and coercing outputs into the right schema all pile up, turning what should be design work into low-level plumbing work.
With Arch, you can move faster by focusing on higher-level objectives in a language- and framework-agnostic way. **Arch** was built by the contributors of [Envoy Proxy](https://www.envoyproxy.io/) with the belief that:
**Core Features**:
- `🚦 Route to Agents`: Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) for fast (<100ms) agent routing and hand-off
- `🔗 Route to LLMs`: Unify access to LLMs with support for [dynamic routing](#Preference-based-Routing). Model aliases [coming soon](https://github.com/katanemo/archgw/issues/557)
- `⛨ Guardrails`: Centrally configure guardrails to prevent harmful outcomes and ensure safe user interactions
- `⚡ Tool Use`: For common agentic scenarios, let Arch instantly clarify and convert prompts to tool/API calls
- `🕵 Observability`: W3C-compatible request tracing and LLM metrics that plug in instantly with popular tools
```sh
$ source venv/bin/activate # On Windows, use: venv\Scripts\activate
$ pip install archgw==0.3.10
```
### Use Arch as an LLM Router
Arch supports two primary routing strategies for LLMs: model-based routing and preference-based routing.
#### Model-based Routing
Model-based routing allows you to configure static model names for routing. This is useful when you always want to use a specific model for certain tasks, or to swap models manually. Below is an example configuration for model-based routing; you can follow our [usage guide](demos/use_cases/README.md) to get it working.
```yaml
version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - access_key: $OPENAI_API_KEY
    model: openai/gpt-4o
    default: true

  - access_key: $MISTRAL_API_KEY
    model: mistral/mistral-3b-latest
```
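Because the egress listener speaks the OpenAI message format, any OpenAI-compatible client can point at it. Below is a minimal sketch, assuming the gateway exposes the OpenAI-compatible API at `http://localhost:12000/v1` (check the usage guide for the exact endpoint), showing how a client selects a configured model by name:

```python
# Minimal sketch: call Arch's egress listener with the OpenAI Python client.
# Assumes the gateway is running locally and serves an OpenAI-compatible
# API at http://localhost:12000/v1 (the listener configured above).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12000/v1",
    api_key="n/a",  # provider keys are injected via the gateway's access_key config
)

# Target a configured provider explicitly by model name.
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Explain what a reverse proxy does."}],
)
print(response.choices[0].message.content)
```

Swapping `model` to `mistral/mistral-3b-latest` reroutes the same request to Mistral, with no other client changes.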
#### Preference-based Routing
Preference-based routing is designed for more dynamic and intelligent selection of models. Instead of static model names, you write plain-language routing policies that describe the type of task or preference — for example:
```yaml
version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-4.1
    access_key: $OPENAI_API_KEY
    default: true
    routing_preferences:
      - name: code generation
        description: generating new code snippets, functions, or boilerplate based on user prompts or requirements

  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: code understanding
        description: understand and explain existing code snippets, functions, or libraries
```
Arch uses a lightweight 1.5B autoregressive model to map prompts (and conversation context) to these policies. This approach adapts to intent drift, supports multi-turn conversations, and avoids the brittleness of embedding-based classifiers or manual if/else chains. No retraining is required when adding new models or updating policies — routing is governed entirely by human-readable rules. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:
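As a rough sketch of what this looks like from the client side (same assumed endpoint as above; the `model` value here is a hypothetical placeholder, since the policy engine rather than the client picks the provider):

```python
# Sketch: two prompts that should match different routing preferences.
# Assumes the OpenAI-compatible listener at http://localhost:12000/v1;
# "arch" below is a hypothetical placeholder model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12000/v1", api_key="n/a")

prompts = [
    "Write a Python function that retries an HTTP call with backoff.",  # code generation
    "Explain what `sorted(data, key=lambda r: r[1])` does.",            # code understanding
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="arch",  # hypothetical placeholder; routing is decided by policy
        messages=[{"role": "user", "content": prompt}],
    )
    # response.model should report which provider the policy selected.
    print(response.model, "->", response.choices[0].message.content[:80])
```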
<div align="left">
<a href="https://arxiv.org/abs/2506.16655" target="_blank">
<img src="docs/source/_static/img/arch_router_paper_preview.png" alt="Arch Router Paper Preview">
</a>
</div>
### Build Agentic Apps with Arch
In the following quickstart, we will show you how easy it is to build an AI agent with Arch gateway. We will build a currency exchange agent in a few simple steps. For this demo, we will use `https://api.frankfurter.dev/` to fetch the latest prices for currencies, with USD as the base currency.
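As a flavor of the tool behind that agent, here is a small sketch of the rate lookup, assuming Frankfurter's `latest` endpoint with `base` and `symbols` query parameters (see `https://api.frankfurter.dev/` for the exact contract):

```python
# Sketch of the currency-lookup tool the agent would call.
# Assumes the Frankfurter "latest" endpoint with `base`/`symbols` parameters;
# see https://api.frankfurter.dev/ for the exact API contract.
import requests

def get_exchange_rate(currency: str, base: str = "USD") -> float:
    """Return how many units of `currency` one unit of `base` buys."""
    resp = requests.get(
        "https://api.frankfurter.dev/v1/latest",
        params={"base": base, "symbols": currency},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["rates"][currency]

print(get_exchange_rate("EUR"))
```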
## [Observability](https://docs.archgw.com/guides/observability/observability.html)
Arch is designed for best-in-class observability built on open standards. Please read our [docs](https://docs.archgw.com/guides/observability/observability.html) on observability for more details on tracing, metrics, and logs. The screenshot below is from our integration with Signoz (among others).
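For example, because tracing follows the W3C Trace Context standard, a client can attach its own `traceparent` header and correlate gateway spans with application spans. A sketch, assuming the same local listener as in the routing examples above:

```python
# Sketch: propagate W3C Trace Context through the gateway.
# Assumes the OpenAI-compatible listener at http://localhost:12000/v1 and
# that Arch records/forwards the standard `traceparent` header.
import secrets
from openai import OpenAI

trace_id = secrets.token_hex(16)        # 32 hex chars
parent_span_id = secrets.token_hex(8)   # 16 hex chars
traceparent = f"00-{trace_id}-{parent_span_id}-01"  # version-traceid-parentid-flags

client = OpenAI(base_url="http://localhost:12000/v1", api_key="n/a")
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    extra_headers={"traceparent": traceparent},
)
```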