diff --git a/README.md b/README.md index f299e029..09c8aaca 100644 --- a/README.md +++ b/README.md @@ -4,14 +4,14 @@
-_Arch is a smart proxy server designed as a modular edge and AI gateway for agentic apps_

+_Arch is a smart proxy server designed as a modular edge and AI gateway for agents._

Arch handles the *pesky low-level work* in building agentic apps — like applying guardrails, clarifying vague user input, routing prompts to the right agent, and unifying access to any LLM. It’s a language and framework friendly infrastructure layer designed to help you build and ship agentic apps faster. [Quickstart](#Quickstart) • [Demos](#Demos) • -[Build agentic apps with Arch](#Build-AI-Agent-with-Arch-Gateway) • [Route LLMs](#Use-Arch-as-a-LLM-Router) • +[Build agentic apps with Arch](#Build-Agentic-Apps-with-Arch) • [Documentation](https://docs.archgw.com) • [Contact](#Contact) @@ -26,12 +26,12 @@ _Arch is a smart proxy server designed as a modular edge and AI gateway for agen # Overview Arch - Build fast, hyper-personalized agents with intelligent infra | Product Hunt -AI demos are easy to build. But past the thrill of a quick hack, you are left building, maintaining and scaling low-level plumbing code for agents that slows down AI innovation. For example: +AI demos are easy to hack. But once you move past the prototype stage, you’re stuck building and maintaining low-level plumbing code that slows down real innovation. For example: -- You want to build specialized agents, but get stuck building **routing and handoff** code. -- You want use new LLMs, but struggle to **quickly and safely add LLMs** without writing integration code. -- You're bogged down with prompt engineering work to **clarify user intent and validate inputs**. -- You're wasting cycles choosing and integrating code for **observability** instead of it happening transparently. +- **Routing & orchestration.** Frameworks handle routing and handoffs in tightly coupled ways, so if you want to plug in your own router, planner, or policy engine, you’re stuck with a heavy refactor or brittle overrides. +- **Model integration churn.** Frameworks wire LLM integrations directly into code abstractions, making it hard to add or swap models without touching application code — meaning you’ll have to bounce your app every time you want to experiment with a new provider or version. +- **Observability & governance.** Logging, tracing, and guardrails are baked in as tightly coupled features, so bringing in best-of-breed solutions is painful and often requires digging through the guts of a framework. +- **Prompt engineering overhead**. Input validation, clarifying vague user input, and coercing outputs into the right schema all pile up, turning what should be design work into low-level plumbing work. With Arch, you can move faster by focusing on higher-level objectives in a language and framework agnostic way. **Arch** was built by the contributors of [Envoy Proxy](https://www.envoyproxy.io/) with the belief that: @@ -39,8 +39,8 @@ With Arch, you can move faster by focusing on higher-level objectives in a langu **Core Features**: - - `🚦 Routing to Agents`. Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) for fast (<100ms) agent routing and hand-off scenarios - - `🔗 Routing to LLMs`: Unify access and routing to any LLM, including dynamic routing via [preference policies](#Preference-based-Routing). + - `🚦 Route to Agents`: Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) for fast (<100ms) agent routing and hand-off + - `🔗 Route to LLMs`: Unify access to LLMs with support for [dynamic routing](#Preference-based-Routing). Model aliases [coming soon](https://github.com/katanemo/archgw/issues/557) - `⛨ Guardrails`: Centrally configure and prevent harmful outcomes and ensure safe user interactions - `⚡ Tools Use`: For common agentic scenarios let Arch instantly clarify and convert prompts to tools/API calls - `🕵 Observability`: W3C compatible request tracing and LLM metrics that instantly plugin with popular tools @@ -85,7 +85,68 @@ $ source venv/bin/activate # On Windows, use: venv\Scripts\activate $ pip install archgw==0.3.10 ``` -### Build Agentic Apps with Arch Gateway +### Use Arch as a LLM Router +Arch supports two primary routing strategies for LLMs: model-based routing and preference-based routing. + +#### Model-based Routing +Model-based routing allows you to configure static model names for routing. This is useful when you always want to use a specific model for certain tasks, or manually swap between models. Below an example configuration for model-based routing, and you can follow our [usage guide](demos/use_cases/README.md) on how to get working. + +```yaml +version: v0.1.0 + +listeners: + egress_traffic: + address: 0.0.0.0 + port: 12000 + message_format: openai + timeout: 30s + +llm_providers: + - access_key: $OPENAI_API_KEY + model: openai/gpt-4o + default: true + + - access_key: $MISTRAL_API_KEY + model: mistral/mistral-3b-latest +``` + +#### Preference-based Routing +Preference-based routing is designed for more dynamic and intelligent selection of models. Instead of static model names, you write plain-language routing policies that describe the type of task or preference — for example: + +```yaml +version: v0.1.0 + +listeners: + egress_traffic: + address: 0.0.0.0 + port: 12000 + message_format: openai + timeout: 30s + +llm_providers: + - model: openai/gpt-4.1 + access_key: $OPENAI_API_KEY + default: true + routing_preferences: + - name: code generation + description: generating new code snippets, functions, or boilerplate based on user prompts or requirements + + - model: openai/gpt-4o-mini + access_key: $OPENAI_API_KEY + routing_preferences: + - name: code understanding + description: understand and explain existing code snippets, functions, or libraries +``` + +Arch uses a lightweight 1.5B autoregressive model to map prompts (and conversation context) to these policies. This approach adapts to intent drift, supports multi-turn conversations, and avoids the brittleness of embedding-based classifiers or manual if/else chains. No retraining is required when adding new models or updating policies — routing is governed entirely by human-readable rules. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper: + +
+ + Arch Router Paper Preview + +
+ +### Build Agentic Apps with Arch In following quickstart we will show you how easy it is to build AI agent with Arch gateway. We will build a currency exchange agent using following simple steps. For this demo we will use `https://api.frankfurter.dev/` to fetch latest price for currencies and assume USD as base currency. @@ -182,67 +243,6 @@ $ curl --header 'Content-Type: application/json' \ ``` -### Use Arch as a LLM Router -Arch supports two primary routing strategies for LLMs: model-based routing and preference-based routing. - -#### Model-based Routing -Model-based routing allows you to configure static model names for routing. This is useful when you always want to use a specific model for certain tasks, or manually swap between models. Below an example configuration for model-based routing, and you can follow our [usage guide](demos/use_cases/README.md) on how to get working. - -```yaml -version: v0.1.0 - -listeners: - egress_traffic: - address: 0.0.0.0 - port: 12000 - message_format: openai - timeout: 30s - -llm_providers: - - access_key: $OPENAI_API_KEY - model: openai/gpt-4o - default: true - - - access_key: $MISTRAL_API_KEY - model: mistral/mistral-3b-latest -``` - -#### Preference-based Routing -Preference-based routing is designed for more dynamic and intelligent selection of models. Instead of static model names, you write plain-language routing policies that describe the type of task or preference — for example: - -```yaml -version: v0.1.0 - -listeners: - egress_traffic: - address: 0.0.0.0 - port: 12000 - message_format: openai - timeout: 30s - -llm_providers: - - model: openai/gpt-4.1 - access_key: $OPENAI_API_KEY - default: true - routing_preferences: - - name: code generation - description: generating new code snippets, functions, or boilerplate based on user prompts or requirements - - - model: openai/gpt-4o-mini - access_key: $OPENAI_API_KEY - routing_preferences: - - name: code understanding - description: understand and explain existing code snippets, functions, or libraries -``` - -Arch uses a lightweight 1.5B autoregressive model to map prompts (and conversation context) to these policies. This approach adapts to intent drift, supports multi-turn conversations, and avoids the brittleness of embedding-based classifiers or manual if/else chains. No retraining is required when adding new models or updating policies — routing is governed entirely by human-readable rules. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper: - -
- - Arch Router Paper Preview - -
- ## [Observability](https://docs.archgw.com/guides/observability/observability.html) Arch is designed to support best-in class observability by supporting open standards. Please read our [docs](https://docs.archgw.com/guides/observability/observability.html) on observability for more details on tracing, metrics, and logs. The screenshot below is from our integration with Signoz (among others)