feat(docs): refresh routing models

Musa 2025-12-23 14:02:20 -08:00
parent 9c515116cb
commit ec17252403
3 changed files with 31 additions and 30 deletions

@ -17,6 +17,7 @@ Configuration Structure
All providers are configured in the ``llm_providers`` section of your ``plano_config.yaml`` file: All providers are configured in the ``llm_providers`` section of your ``plano_config.yaml`` file:
.. code-block:: yaml .. code-block:: yaml
llm_providers: llm_providers:
# Provider configurations go here # Provider configurations go here
- model: provider/model-name - model: provider/model-name

@ -18,7 +18,7 @@ Model-based routing
Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``: Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:
- Use provider-specific names like ``openai/gpt-4o`` or ``anthropic/claude-3-5-sonnet-20241022`` - Use provider-specific names like ``openai/gpt-5.2`` or ``anthropic/claude-sonnet-4-5``
- Provides full control and transparency over which model handles each request - Provides full control and transparency over which model handles each request
- Ideal for production workloads where you want predictable routing behavior - Ideal for production workloads where you want predictable routing behavior
@ -38,14 +38,14 @@ Configure your LLM providers with specific provider/model names:
timeout: 30s timeout: 30s
llm_providers: llm_providers:
- model: openai/gpt-4o-mini - model: openai/gpt-5.2
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
default: true default: true
- model: openai/gpt-4o - model: openai/gpt-5
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
- model: anthropic/claude-3-5-sonnet-20241022 - model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY access_key: $ANTHROPIC_API_KEY
Client usage Client usage
@ -57,12 +57,12 @@ Clients specify exact models:
# Direct provider/model specification # Direct provider/model specification
response = client.chat.completions.create( response = client.chat.completions.create(
model="openai/gpt-4o-mini", model="openai/gpt-5.2",
messages=[{"role": "user", "content": "Hello!"}] messages=[{"role": "user", "content": "Hello!"}]
) )
response = client.chat.completions.create( response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022", model="anthropic/claude-sonnet-4-5",
messages=[{"role": "user", "content": "Write a story"}] messages=[{"role": "user", "content": "Write a story"}]
) )
@ -73,7 +73,7 @@ Alias-based routing
Alias-based routing lets you create semantic model names that decouple your application from specific providers: Alias-based routing lets you create semantic model names that decouple your application from specific providers:
- Use meaningful names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1`` (see :ref:`model_aliases`) - Use meaningful names like ``fast-model``, ``reasoning-model``, or ``plano.summarize.v1`` (see :ref:`model_aliases`)
- Maps semantic names to underlying provider models for easier experimentation and provider switching - Maps semantic names to underlying provider models for easier experimentation and provider switching
- Ideal for applications that want abstraction from specific model names while maintaining control - Ideal for applications that want abstraction from specific model names while maintaining control
@ -93,25 +93,25 @@ Configure semantic aliases that map to underlying models:
timeout: 30s timeout: 30s
llm_providers: llm_providers:
- model: openai/gpt-4o-mini - model: openai/gpt-5.2
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
- model: openai/gpt-4o - model: openai/gpt-5
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
- model: anthropic/claude-3-5-sonnet-20241022 - model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY access_key: $ANTHROPIC_API_KEY
model_aliases: model_aliases:
# Model aliases - friendly names that map to actual provider names # Model aliases - friendly names that map to actual provider names
fast-model: fast-model:
target: gpt-4o-mini target: gpt-5.2
reasoning-model: reasoning-model:
target: gpt-4o target: gpt-5
creative-model: creative-model:
target: claude-3-5-sonnet-20241022 target: claude-sonnet-4-5
Client usage Client usage
^^^^^^^^^^^^ ^^^^^^^^^^^^
@ -160,11 +160,11 @@ To configure preference-aligned dynamic routing, define routing preferences that
timeout: 30s timeout: 30s
llm_providers: llm_providers:
- model: openai/gpt-4o-mini - model: openai/gpt-5.2
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
default: true default: true
- model: openai/gpt-4o - model: openai/gpt-5
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
routing_preferences: routing_preferences:
- name: code understanding - name: code understanding
@ -172,7 +172,7 @@ To configure preference-aligned dynamic routing, define routing preferences that
- name: complex reasoning - name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning description: deep analysis, mathematical problem solving, and logical reasoning
- model: anthropic/claude-3-5-sonnet-20241022 - model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY access_key: $ANTHROPIC_API_KEY
routing_preferences: routing_preferences:
- name: creative writing - name: creative writing
@ -190,7 +190,7 @@ Clients can let the router decide or still specify aliases:
# Let Arch-Router choose based on content # Let Arch-Router choose based on content
response = client.chat.completions.create( response = client.chat.completions.create(
messages=[{"role": "user", "content": "Write a creative story about space exploration"}] messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
# No model specified - router will analyze and choose claude-3-5-sonnet-20241022 # No model specified - router will analyze and choose claude-sonnet-4-5
) )
@ -237,17 +237,17 @@ You can combine static model selection with dynamic routing preferences for maxi
:caption: Hybrid Routing Configuration :caption: Hybrid Routing Configuration
llm_providers: llm_providers:
- model: openai/gpt-4o-mini - model: openai/gpt-5.2
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
default: true default: true
- model: openai/gpt-4o - model: openai/gpt-5
access_key: $OPENAI_API_KEY access_key: $OPENAI_API_KEY
routing_preferences: routing_preferences:
- name: complex_reasoning - name: complex_reasoning
description: deep analysis and complex problem solving description: deep analysis and complex problem solving
- model: anthropic/claude-3-5-sonnet-20241022 - model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY access_key: $ANTHROPIC_API_KEY
routing_preferences: routing_preferences:
- name: creative_tasks - name: creative_tasks
@ -256,14 +256,14 @@ You can combine static model selection with dynamic routing preferences for maxi
model_aliases: model_aliases:
# Model aliases - friendly names that map to actual provider names # Model aliases - friendly names that map to actual provider names
fast-model: fast-model:
target: gpt-4o-mini target: gpt-5.2
reasoning-model: reasoning-model:
target: gpt-4o target: gpt-5
# Aliases that can also participate in dynamic routing # Aliases that can also participate in dynamic routing
creative-model: creative-model:
target: claude-3-5-sonnet-20241022 target: claude-sonnet-4-5
This configuration allows clients to: This configuration allows clients to:

@ -19,24 +19,24 @@ Key Features
How It Works How It Works
^^^^^^^^^^^^ ^^^^^^^^^^^^
Plano exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/archgw_logs``. For example: Plano exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/plano_logs``. For example:
.. code-block:: console .. code-block:: console
$ tail -F ~/archgw_logs/access_*.log $ tail -F ~/plano_logs/access_*.log
==> /Users/adilhafeez/archgw_logs/access_llm.log <== ==> /Users/username/plano_logs/access_llm.log <==
[2024-10-10T03:55:49.537Z] "POST /v1/chat/completions HTTP/1.1" 0 DC 0 0 770 - "-" "OpenAI/Python 1.51.0" "469793af-b25f-9b57-b265-f376e8d8c586" "api.openai.com" "162.159.140.245:443" [2024-10-10T03:55:49.537Z] "POST /v1/chat/completions HTTP/1.1" 0 DC 0 0 770 - "-" "OpenAI/Python 1.51.0" "469793af-b25f-9b57-b265-f376e8d8c586" "api.openai.com" "162.159.140.245:443"
==> /Users/adilhafeez/archgw_logs/access_internal.log <== ==> /Users/username/plano_logs/access_internal.log <==
[2024-10-10T03:56:03.906Z] "POST /embeddings HTTP/1.1" 200 - 52 21797 54 53 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000" [2024-10-10T03:56:03.906Z] "POST /embeddings HTTP/1.1" 200 - 52 21797 54 53 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
[2024-10-10T03:56:03.961Z] "POST /zeroshot HTTP/1.1" 200 - 106 218 87 87 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000" [2024-10-10T03:56:03.961Z] "POST /zeroshot HTTP/1.1" 200 - 106 218 87 87 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
[2024-10-10T03:56:04.050Z] "POST /v1/chat/completions HTTP/1.1" 200 - 1301 614 441 441 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000" [2024-10-10T03:56:04.050Z] "POST /v1/chat/completions HTTP/1.1" 200 - 1301 614 441 441 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
[2024-10-10T03:56:04.492Z] "POST /hallucination HTTP/1.1" 200 - 556 127 104 104 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000" [2024-10-10T03:56:04.492Z] "POST /hallucination HTTP/1.1" 200 - 556 127 104 104 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
[2024-10-10T03:56:04.598Z] "POST /insurance_claim_details HTTP/1.1" 200 - 447 125 17 17 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "api_server" "192.168.65.254:18083" [2024-10-10T03:56:04.598Z] "POST /insurance_claim_details HTTP/1.1" 200 - 447 125 17 17 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "api_server" "192.168.65.254:18083"
==> /Users/adilhafeez/archgw_logs/access_ingress.log <== ==> /Users/username/plano_logs/access_ingress.log <==
[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "arch_llm_listener" "0.0.0.0:12000" [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"
Log Format Log Format
@ -58,6 +58,6 @@ For example for following request:
.. code-block:: console .. code-block:: console
[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "arch_llm_listener" "0.0.0.0:12000" [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"
Total duration was 1695ms, and the upstream service took 984ms to process the request. Bytes received and sent were 463 and 1022 respectively. Total duration was 1695ms, and the upstream service took 984ms to process the request. Bytes received and sent were 463 and 1022 respectively.
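Reviewer note: the reading above can be checked mechanically. The following is a minimal sketch, not part of Plano, that assumes the field order implied by the sample line (status, response flags, bytes received, bytes sent, total duration, upstream service time, then five quoted fields). The names ``ACCESS_LOG_RE`` and ``parse_access_log`` are hypothetical and exist only for this illustration.

```python
import re

# Assumed field order, inferred from the sample access log line above:
# [start time] "METHOD PATH PROTOCOL" status flags bytes_received bytes_sent
# duration_ms upstream_ms "x-forwarded-for" "user-agent" "request-id"
# "authority" "upstream-host"
ACCESS_LOG_RE = re.compile(
    r'\[(?P<start_time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d+) (?P<flags>\S+) '
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) '
    r'(?P<duration_ms>\d+) (?P<upstream_ms>\S+) '
    r'"(?P<forwarded_for>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<request_id>[^"]*)" "(?P<authority>[^"]*)" '
    r'"(?P<upstream_host>[^"]*)"'
)


def parse_access_log(line):
    """Parse one access log line into a dict, or return None if it does not match."""
    match = ACCESS_LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert the always-numeric fields to integers.
    for key in ("status", "bytes_received", "bytes_sent", "duration_ms"):
        record[key] = int(record[key])
    # Upstream service time is logged as "-" when no upstream response was seen.
    upstream = record["upstream_ms"]
    record["upstream_ms"] = int(upstream) if upstream.isdigit() else None
    return record


sample = (
    '[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - '
    '463 1022 1695 984 "-" "OpenAI/Python 1.51.0" '
    '"604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"'
)
record = parse_access_log(sample)
# Time spent outside the upstream service: total duration minus upstream time.
overhead_ms = record["duration_ms"] - record["upstream_ms"]
print(record["duration_ms"], record["upstream_ms"], overhead_ms)
```

For the sample line this yields a total duration of 1695 ms and an upstream time of 984 ms, so roughly 711 ms was spent outside the upstream service. Lines where the upstream time is logged as ``-`` return ``None`` for that field instead of raising an error.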