mirror of
https://github.com/katanemo/plano.git
synced 2026-04-25 16:56:24 +02:00
Model affinity for consistent model selection in agentic loops (#827)
Some checks are pending
CI / pre-commit (push) Waiting to run
CI / plano-tools-tests (push) Waiting to run
CI / native-smoke-test (push) Waiting to run
CI / docker-build (push) Waiting to run
CI / validate-config (push) Waiting to run
CI / security-scan (push) Blocked by required conditions
CI / test-prompt-gateway (push) Blocked by required conditions
CI / test-model-alias-routing (push) Blocked by required conditions
CI / test-responses-api-with-state (push) Blocked by required conditions
CI / e2e-plano-tests (3.10) (push) Blocked by required conditions
CI / e2e-plano-tests (3.11) (push) Blocked by required conditions
CI / e2e-plano-tests (3.12) (push) Blocked by required conditions
CI / e2e-plano-tests (3.13) (push) Blocked by required conditions
CI / e2e-plano-tests (3.14) (push) Blocked by required conditions
CI / e2e-demo-preference (push) Blocked by required conditions
CI / e2e-demo-currency (push) Blocked by required conditions
Publish docker image (latest) / build-arm64 (push) Waiting to run
Publish docker image (latest) / build-amd64 (push) Waiting to run
Publish docker image (latest) / create-manifest (push) Blocked by required conditions
Build and Deploy Documentation / build (push) Waiting to run
Some checks are pending
CI / pre-commit (push) Waiting to run
CI / plano-tools-tests (push) Waiting to run
CI / native-smoke-test (push) Waiting to run
CI / docker-build (push) Waiting to run
CI / validate-config (push) Waiting to run
CI / security-scan (push) Blocked by required conditions
CI / test-prompt-gateway (push) Blocked by required conditions
CI / test-model-alias-routing (push) Blocked by required conditions
CI / test-responses-api-with-state (push) Blocked by required conditions
CI / e2e-plano-tests (3.10) (push) Blocked by required conditions
CI / e2e-plano-tests (3.11) (push) Blocked by required conditions
CI / e2e-plano-tests (3.12) (push) Blocked by required conditions
CI / e2e-plano-tests (3.13) (push) Blocked by required conditions
CI / e2e-plano-tests (3.14) (push) Blocked by required conditions
CI / e2e-demo-preference (push) Blocked by required conditions
CI / e2e-demo-currency (push) Blocked by required conditions
Publish docker image (latest) / build-arm64 (push) Waiting to run
Publish docker image (latest) / build-amd64 (push) Waiting to run
Publish docker image (latest) / create-manifest (push) Blocked by required conditions
Build and Deploy Documentation / build (push) Waiting to run
This commit is contained in:
parent
978b1ea722
commit
8dedf0bec1
13 changed files with 614 additions and 43 deletions
|
|
@ -120,6 +120,49 @@ routing_preferences:
|
|||
|
||||
---
|
||||
|
||||
## Model Affinity
|
||||
|
||||
In agentic loops where the same session makes multiple LLM calls, send an `X-Model-Affinity` header to pin the routing decision. The first request routes normally and caches the result. All subsequent requests with the same affinity ID return the cached model without re-running routing.
|
||||
|
||||
```json
|
||||
POST /v1/chat/completions
|
||||
X-Model-Affinity: a1b2c3d4-5678-...
|
||||
|
||||
{
|
||||
"model": "openai/gpt-4o-mini",
|
||||
"messages": [...]
|
||||
}
|
||||
```
|
||||
|
||||
The routing decision endpoint also supports model affinity:
|
||||
|
||||
```json
|
||||
POST /routing/v1/chat/completions
|
||||
X-Model-Affinity: a1b2c3d4-5678-...
|
||||
```
|
||||
|
||||
Response when pinned:
|
||||
```json
|
||||
{
|
||||
"models": ["anthropic/claude-sonnet-4-20250514"],
|
||||
"route": "code generation",
|
||||
"trace_id": "...",
|
||||
"session_id": "a1b2c3d4-5678-...",
|
||||
"pinned": true
|
||||
}
|
||||
```
|
||||
|
||||
Without the header, routing runs fresh every time (no breaking change).
|
||||
|
||||
Configure TTL and cache size:
|
||||
```yaml
|
||||
routing:
|
||||
session_ttl_seconds: 600 # default: 10 min
|
||||
session_max_entries: 10000 # upper limit
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Version Requirements
|
||||
|
||||
| Version | Top-level `routing_preferences` |
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue