diff --git a/README.md b/README.md
index f299e029..09c8aaca 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,14 @@
-_Arch is a smart proxy server designed as a modular edge and AI gateway for agentic apps_
+_Arch is a smart proxy server designed as a modular edge and AI gateway for agents._
Arch handles the *pesky low-level work* in building agentic apps — like applying guardrails, clarifying vague user input, routing prompts to the right agent, and unifying access to any LLM. It’s a language- and framework-friendly infrastructure layer designed to help you build and ship agentic apps faster.
[Quickstart](#Quickstart) •
[Demos](#Demos) •
-[Build agentic apps with Arch](#Build-AI-Agent-with-Arch-Gateway) •
[Route LLMs](#Use-Arch-as-a-LLM-Router) •
+[Build agentic apps with Arch](#Build-Agentic-Apps-with-Arch) •
[Documentation](https://docs.archgw.com) •
[Contact](#Contact)
@@ -26,12 +26,12 @@ _Arch is a smart proxy server designed as a modular edge and AI gateway for agen
# Overview

-AI demos are easy to build. But past the thrill of a quick hack, you are left building, maintaining and scaling low-level plumbing code for agents that slows down AI innovation. For example:
+AI demos are easy to hack. But once you move past the prototype stage, you’re stuck building and maintaining low-level plumbing code that slows down real innovation. For example:
-- You want to build specialized agents, but get stuck building **routing and handoff** code.
-- You want use new LLMs, but struggle to **quickly and safely add LLMs** without writing integration code.
-- You're bogged down with prompt engineering work to **clarify user intent and validate inputs**.
-- You're wasting cycles choosing and integrating code for **observability** instead of it happening transparently.
+- **Routing & orchestration.** Frameworks handle routing and handoffs in tightly coupled ways, so if you want to plug in your own router, planner, or policy engine, you’re stuck with a heavy refactor or brittle overrides.
+- **Model integration churn.** Frameworks wire LLM integrations directly into code abstractions, making it hard to add or swap models without touching application code — meaning you’ll have to bounce your app every time you want to experiment with a new provider or version.
+- **Observability & governance.** Logging, tracing, and guardrails are baked in as tightly coupled features, so bringing in best-of-breed solutions is painful and often requires digging through the guts of a framework.
+- **Prompt engineering overhead.** Input validation, clarifying vague user input, and coercing outputs into the right schema all pile up, turning what should be design work into low-level plumbing work.
With Arch, you can move faster by focusing on higher-level objectives in a language and framework agnostic way. **Arch** was built by the contributors of [Envoy Proxy](https://www.envoyproxy.io/) with the belief that:
@@ -39,8 +39,8 @@ With Arch, you can move faster by focusing on higher-level objectives in a langu
**Core Features**:
- - `🚦 Routing to Agents`. Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) for fast (<100ms) agent routing and hand-off scenarios
- - `🔗 Routing to LLMs`: Unify access and routing to any LLM, including dynamic routing via [preference policies](#Preference-based-Routing).
+ - `🚦 Route to Agents`: Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) for fast (<100ms) agent routing and hand-off
+ - `🔗 Route to LLMs`: Unify access to LLMs with support for [dynamic routing](#Preference-based-Routing). Model aliases [coming soon](https://github.com/katanemo/archgw/issues/557)
- `⛨ Guardrails`: Centrally configure guardrails to prevent harmful outcomes and ensure safe user interactions
- `⚡ Tools Use`: For common agentic scenarios, let Arch instantly clarify and convert prompts to tool/API calls
- `🕵 Observability`: W3C-compatible request tracing and LLM metrics that instantly plug in with popular tools
@@ -85,7 +85,68 @@ $ source venv/bin/activate # On Windows, use: venv\Scripts\activate
$ pip install archgw==0.3.10
```
-### Build Agentic Apps with Arch Gateway
+### Use Arch as a LLM Router
+Arch supports two primary routing strategies for LLMs: model-based routing and preference-based routing.
+
+#### Model-based Routing
+Model-based routing allows you to configure static model names for routing. This is useful when you always want to use a specific model for certain tasks, or want to swap models manually. Below is an example configuration for model-based routing; you can follow our [usage guide](demos/use_cases/README.md) to get it working.
+
+```yaml
+version: v0.1.0
+
+listeners:
+ egress_traffic:
+ address: 0.0.0.0
+ port: 12000
+ message_format: openai
+ timeout: 30s
+
+llm_providers:
+ - access_key: $OPENAI_API_KEY
+ model: openai/gpt-4o
+ default: true
+
+ - access_key: $MISTRAL_API_KEY
+ model: mistral/mistral-3b-latest
+```
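With the configuration above, Arch exposes an OpenAI-compatible endpoint on its egress listener. A minimal sketch of the request body a client would send through the gateway — the base URL `http://localhost:12000/v1` is an assumption for a local deployment, and we only construct the payload here rather than send it:

```python
import json

# Assumption: archgw is running locally with the egress listener above,
# so clients point at http://localhost:12000/v1 and speak OpenAI format.
ARCH_BASE_URL = "http://localhost:12000/v1"

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-format chat-completions body for the gateway."""
    return {
        "model": model,  # e.g. "mistral/mistral-3b-latest" from the config
        "messages": [{"role": "user", "content": prompt}],
    }

# Target the Mistral provider explicitly; per the config above, the
# provider marked `default: true` handles requests with no explicit match.
body = chat_request("mistral/mistral-3b-latest", "Summarize this repo.")
print(json.dumps(body))
```

Any OpenAI-compatible client library can be pointed at `ARCH_BASE_URL` to send this request through the gateway.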
+
+#### Preference-based Routing
+Preference-based routing is designed for more dynamic and intelligent selection of models. Instead of static model names, you write plain-language routing policies that describe the type of task or preference — for example:
+
+```yaml
+version: v0.1.0
+
+listeners:
+ egress_traffic:
+ address: 0.0.0.0
+ port: 12000
+ message_format: openai
+ timeout: 30s
+
+llm_providers:
+ - model: openai/gpt-4.1
+ access_key: $OPENAI_API_KEY
+ default: true
+ routing_preferences:
+ - name: code generation
+ description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
+
+ - model: openai/gpt-4o-mini
+ access_key: $OPENAI_API_KEY
+ routing_preferences:
+ - name: code understanding
+ description: understand and explain existing code snippets, functions, or libraries
+```
+
+Arch uses a lightweight 1.5B autoregressive model to map prompts (and conversation context) to these policies. This approach adapts to intent drift, supports multi-turn conversations, and avoids the brittleness of embedding-based classifiers or manual if/else chains. No retraining is required when adding new models or updating policies — routing is governed entirely by human-readable rules. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:
+
+
+
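Conceptually, preference-based routing maps each incoming prompt to one of the configured policies and forwards it to that policy's model. The toy sketch below illustrates the routing *outcome* only — Arch uses its 1.5B autoregressive model for the mapping, while here a simple keyword heuristic stands in so the behavior is easy to follow:

```python
# Toy illustration only: Arch's real router is a 1.5B LLM that reads the
# plain-language policy descriptions; this keyword heuristic is a stand-in.
ROUTES = {
    "code generation": "openai/gpt-4.1",
    "code understanding": "openai/gpt-4o-mini",
}

def route(prompt: str, default: str = "openai/gpt-4.1") -> str:
    """Pick a model for a prompt based on the configured preferences."""
    text = prompt.lower()
    if any(w in text for w in ("explain", "what does", "understand")):
        return ROUTES["code understanding"]
    if any(w in text for w in ("write", "generate", "implement")):
        return ROUTES["code generation"]
    return default  # falls back to the provider marked `default: true`

print(route("Explain what this function does"))  # openai/gpt-4o-mini
print(route("Write a function to parse CSV"))    # openai/gpt-4.1
```

Because the real mapping is driven by the policy descriptions themselves, adding a new model or policy only changes configuration, never application code.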
+### Build Agentic Apps with Arch
In the following quickstart we'll show how easy it is to build an AI agent with Arch Gateway. We'll build a currency exchange agent in a few simple steps. For this demo we use `https://api.frankfurter.dev/` to fetch the latest currency prices, with USD as the base currency.
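The core of the agent is a small conversion step over the fetched rates. A minimal sketch, assuming a frankfurter-style `/latest` response (the sample payload below is illustrative, not live data):

```python
# Illustrative frankfurter-style response; in the demo this comes from
# a live request to https://api.frankfurter.dev/ with USD as the base.
sample_response = {"base": "USD", "rates": {"EUR": 0.92, "GBP": 0.79}}

def convert(amount: float, to_currency: str, response: dict) -> float:
    """Convert an amount from the base currency using fetched rates."""
    rate = response["rates"][to_currency]
    return round(amount * rate, 2)

print(convert(100, "EUR", sample_response))  # 92.0
```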
@@ -182,67 +243,6 @@ $ curl --header 'Content-Type: application/json' \
```
-### Use Arch as a LLM Router
-Arch supports two primary routing strategies for LLMs: model-based routing and preference-based routing.
-
-#### Model-based Routing
-Model-based routing allows you to configure static model names for routing. This is useful when you always want to use a specific model for certain tasks, or manually swap between models. Below an example configuration for model-based routing, and you can follow our [usage guide](demos/use_cases/README.md) on how to get working.
-
-```yaml
-version: v0.1.0
-
-listeners:
- egress_traffic:
- address: 0.0.0.0
- port: 12000
- message_format: openai
- timeout: 30s
-
-llm_providers:
- - access_key: $OPENAI_API_KEY
- model: openai/gpt-4o
- default: true
-
- - access_key: $MISTRAL_API_KEY
- model: mistral/mistral-3b-latest
-```
-
-#### Preference-based Routing
-Preference-based routing is designed for more dynamic and intelligent selection of models. Instead of static model names, you write plain-language routing policies that describe the type of task or preference — for example:
-
-```yaml
-version: v0.1.0
-
-listeners:
- egress_traffic:
- address: 0.0.0.0
- port: 12000
- message_format: openai
- timeout: 30s
-
-llm_providers:
- - model: openai/gpt-4.1
- access_key: $OPENAI_API_KEY
- default: true
- routing_preferences:
- - name: code generation
- description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
-
- - model: openai/gpt-4o-mini
- access_key: $OPENAI_API_KEY
- routing_preferences:
- - name: code understanding
- description: understand and explain existing code snippets, functions, or libraries
-```
-
-Arch uses a lightweight 1.5B autoregressive model to map prompts (and conversation context) to these policies. This approach adapts to intent drift, supports multi-turn conversations, and avoids the brittleness of embedding-based classifiers or manual if/else chains. No retraining is required when adding new models or updating policies — routing is governed entirely by human-readable rules. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:
-
-
-
## [Observability](https://docs.archgw.com/guides/observability/observability.html)
Arch is designed for best-in-class observability built on open standards. Please read our [docs](https://docs.archgw.com/guides/observability/observability.html) on observability for more details on tracing, metrics, and logs. The screenshot below is from our integration with Signoz (among others).