fixed docs and added ollama as a first-class LLM provider

Salman Paracha 2025-09-18 23:10:21 -07:00
parent 8d0b468345
commit f7c9d04da9
16 changed files with 1612 additions and 149 deletions

104
README.md

@ -10,7 +10,7 @@ _Arch is a smart proxy server designed as a modular edge and AI gateway for agen
[Quickstart](#Quickstart) •
[Demos](#Demos) •
[Route LLMs](#Use-Arch-as-a-LLM-Router) •
[Route LLMs](#use-arch-as-a-llm-router) •
[Build agentic apps with Arch](#Build-Agentic-Apps-with-Arch) •
[Documentation](https://docs.archgw.com) •
[Contact](#Contact)
@ -41,7 +41,7 @@ With Arch, you can move faster by focusing on higher-level objectives in a langu
**Core Features**:
- `🚦 Route to Agents`: Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) for fast (<100ms) agent routing and hand-off
- `🔗 Route to LLMs`: Unify access to LLMs with support for [dynamic routing](#Preference-based-Routing). Model aliases [coming soon](https://github.com/katanemo/archgw/issues/557)
- `🔗 Route to LLMs`: Unify access to LLMs with support for [three routing strategies](#use-arch-as-a-llm-router).
- `⛨ Guardrails`: Centrally configure and prevent harmful outcomes and ensure safe user interactions
- `⚡ Tools Use`: For common agentic scenarios let Arch instantly clarify and convert prompts to tools/API calls
- `🕵 Observability`: W3C-compatible request tracing and LLM metrics that plug in to popular tools
@ -87,10 +87,10 @@ $ pip install archgw==0.3.12
```
### Use Arch as a LLM Router
Arch supports two primary routing strategies for LLMs: model-based routing and preference-based routing.
Arch supports three routing strategies for LLMs: model-based routing, alias-based routing, and preference-aligned routing. Each strategy offers a different level of abstraction and control for managing your LLM infrastructure.
#### Model-based Routing
Model-based routing allows you to configure static model names for routing. This is useful when you always want to use a specific model for certain tasks, or manually swap between models. Below is an example configuration for model-based routing, and you can follow our [usage guide](demos/use_cases/README.md) on how to get it working.
Model-based routing allows you to configure specific models with static routing. This is ideal when you need direct control over which models handle specific requests. Arch supports 11+ LLM providers including OpenAI, Anthropic, DeepSeek, Mistral, Groq, and more.
```yaml
version: v0.1.0
@ -103,16 +103,31 @@ listeners:
    timeout: 30s
llm_providers:
  - access_key: $OPENAI_API_KEY
    model: openai/gpt-4o
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - access_key: $MISTRAL_API_KEY
    model: mistral/mistral-3b-latest
  - model: anthropic/claude-3-5-sonnet-20241022
    access_key: $ANTHROPIC_API_KEY
```
#### Preference-based Routing
Preference-based routing is designed for more dynamic and intelligent selection of models. Instead of static model names, you write plain-language routing policies that describe the type of task or preference — for example:
You can then route to specific models using any OpenAI-compatible client:
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

# Route to specific model
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
```
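For readers without the `openai` package, the same call can be pictured as a plain HTTP request. The sketch below only builds the request and does not send it. It assumes the listener address and placeholder key from the example above and the standard OpenAI-style `/chat/completions` path; `build_chat_request` is a hypothetical helper, not part of Arch.

```python
import json
import urllib.request

# Assumed values: the listener address and placeholder key from the example above.
BASE_URL = "http://127.0.0.1:12000/v1"


def build_chat_request(model: str, prompt: str, api_key: str = "test") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # "<provider>/<model>", as configured under llm_providers
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request("anthropic/claude-3-5-sonnet-20241022", "Explain quantum computing")
print(request.get_method(), request.full_url)
# To send it against a running gateway: urllib.request.urlopen(request)
```

Because the model name travels in the request body, switching models in model-based routing is a one-string change on the client side.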
#### Alias-based Routing
Alias-based routing lets you create semantic model names that map to underlying providers. This approach decouples your application code from specific model names, making it easy to experiment with different models or handle provider changes.
```yaml
version: v0.1.0
@ -125,21 +140,68 @@ listeners:
    timeout: 30s
llm_providers:
  - model: openai/gpt-4.1
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
    routing_preferences:
      - name: code generation
        description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: code understanding
        description: understand and explain existing code snippets, functions, or libraries
  - model: anthropic/claude-3-5-sonnet-20241022
    access_key: $ANTHROPIC_API_KEY
model_aliases:
  - alias: fast-model
    models:
      - openai/gpt-4o-mini
      - anthropic/claude-3-5-haiku-20241022
  - alias: reasoning-model
    models:
      - openai/gpt-4o
      - anthropic/claude-3-5-sonnet-20241022
```
Arch uses a lightweight 1.5B autoregressive model to map prompts (and conversation context) to these policies. This approach adapts to intent drift, supports multi-turn conversations, and avoids the brittleness of embedding-based classifiers or manual if/else chains. No retraining is required when adding new models or updating policies — routing is governed entirely by human-readable rules. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:
Use semantic aliases in your application code:
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

# Your code uses semantic names instead of provider-specific ones
response = client.chat.completions.create(
    model="reasoning-model",  # Routes to best available reasoning model
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)
```
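To make the decoupling concrete, here is a minimal sketch of what resolving an alias could look like. This is an illustration only: Arch resolves aliases inside the gateway, and its actual selection logic is not described here. `MODEL_ALIASES`, `CONFIGURED_MODELS`, and `resolve_model` are hypothetical names, and the "first configured candidate wins" rule is an assumption.

```python
# Hypothetical illustration only: the gateway performs this step, and the
# "first configured candidate wins" rule below is an assumption.
MODEL_ALIASES = {
    "fast-model": ["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"],
    "reasoning-model": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
}

# Assumed to be the models declared under llm_providers.
CONFIGURED_MODELS = {
    "openai/gpt-4o",
    "openai/gpt-4o-mini",
    "anthropic/claude-3-5-sonnet-20241022",
}


def resolve_model(name: str) -> str:
    """Return a concrete model for an alias, or the name itself if it is not an alias."""
    candidates = MODEL_ALIASES.get(name, [name])
    for candidate in candidates:
        if candidate in CONFIGURED_MODELS:
            return candidate
    raise LookupError(f"no configured model for {name!r}")


print(resolve_model("reasoning-model"))
print(resolve_model("openai/gpt-4o-mini"))
```

The application only ever sends `reasoning-model`; swapping the underlying model means editing the alias list, not the calling code.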
#### Preference-aligned Routing
Preference-aligned routing provides intelligent, dynamic model selection based on natural language descriptions of tasks and preferences. Instead of hardcoded routing logic, you describe what each model is good at using plain English.
```yaml
version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: complex_reasoning
        description: deep analysis, mathematical problem solving, and logical reasoning
      - name: creative_writing
        description: storytelling, creative content, and artistic writing

  - model: deepseek/deepseek-coder
    access_key: $DEEPSEEK_API_KEY
    routing_preferences:
      - name: code_generation
        description: generating new code, writing functions, and creating scripts
      - name: code_review
        description: analyzing existing code for bugs, improvements, and optimization
```
Arch uses a lightweight 1.5B autoregressive model to map user prompts (and conversation context) to these preferences and select the matching model for each request. This approach adapts to intent drift, supports multi-turn conversations, and avoids brittle embedding-based classifiers or manual if/else chains. No retraining is required when adding models or updating policies; routing is governed entirely by human-readable rules.
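One way to picture the configuration is as a lookup table from preference name to model: the router model picks a preference name from the descriptions, and the table determines which model serves it. The sketch below mirrors the `llm_providers` section as plain Python data; `PROVIDERS` and `build_route_table` are hypothetical names for illustration, and the router model itself is not modeled here.

```python
# Mirrors the llm_providers section of the config as plain Python data.
PROVIDERS = [
    {
        "model": "openai/gpt-4o",
        "routing_preferences": [
            {"name": "complex_reasoning",
             "description": "deep analysis, mathematical problem solving, and logical reasoning"},
            {"name": "creative_writing",
             "description": "storytelling, creative content, and artistic writing"},
        ],
    },
    {
        "model": "deepseek/deepseek-coder",
        "routing_preferences": [
            {"name": "code_generation",
             "description": "generating new code, writing functions, and creating scripts"},
            {"name": "code_review",
             "description": "analyzing existing code for bugs, improvements, and optimization"},
        ],
    },
]


def build_route_table(providers):
    """Map each preference name to the model that declares it."""
    table = {}
    for provider in providers:
        for preference in provider.get("routing_preferences", []):
            name = preference["name"]
            if name in table:
                raise ValueError(f"preference {name!r} is declared by more than one model")
            table[name] = provider["model"]
    return table


ROUTE_TABLE = build_route_table(PROVIDERS)
for name, model in ROUTE_TABLE.items():
    print(f"{name} -> {model}")
```

Adding a model or changing a policy only changes this data, which is why no retraining is needed.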
**Learn More**: Check our [documentation](https://docs.archgw.com/concepts/llm_providers/llm_providers.html) for comprehensive provider setup guides and routing strategies. You can learn more about the design, benchmarks, and methodology behind preference-based routing in our paper:
<div align="left">
<a href="https://arxiv.org/abs/2506.16655" target="_blank">