Mirror of https://github.com/katanemo/plano.git, synced 2026-05-08 15:22:43 +02:00
rename x-session-id to x-routing-session-id and fix routing config field name
Parent: f699cfb059
Commit: 5789694d2f
11 changed files with 41 additions and 34 deletions
@@ -108,13 +108,13 @@ The response contains the model list — your client should try `models[0]` firs
 
 ## Session Pinning
 
-Send an `X-Session-Id` header to pin the routing decision for a session. Once a model is selected, all subsequent requests with the same session ID return the same model without re-running routing.
+Send an `X-Routing-Session-Id` header to pin the routing decision for a session. Once a model is selected, all subsequent requests with the same session ID return the same model without re-running routing.
 
 ```bash
 # First call — runs routing, caches result
 curl http://localhost:12000/routing/v1/chat/completions \
 -H "Content-Type: application/json" \
--H "X-Session-Id: my-session-123" \
+-H "X-Routing-Session-Id: my-session-123" \
 -d '{
 "model": "gpt-4o-mini",
 "messages": [{"role": "user", "content": "Write a Python function for binary search"}]
@@ -136,7 +136,7 @@ Response (first call):
 # Second call — same session, returns cached result
 curl http://localhost:12000/routing/v1/chat/completions \
 -H "Content-Type: application/json" \
--H "X-Session-Id: my-session-123" \
+-H "X-Routing-Session-Id: my-session-123" \
 -d '{
 "model": "gpt-4o-mini",
 "messages": [{"role": "user", "content": "Now explain merge sort"}]
@@ -161,7 +161,7 @@ routing:
 session_max_entries: 10000 # default: 10000
 ```
 
-Without the `X-Session-Id` header, routing runs fresh every time (no breaking change).
+Without the `X-Routing-Session-Id` header, routing runs fresh every time (no breaking change).
 
 ## Kubernetes Deployment (Self-hosted Arch-Router on GPU)
 

@@ -114,7 +114,7 @@ echo "--- 7. Session pinning - first call (fresh routing decision) ---"
 echo ""
 curl -s "$PLANO_URL/routing/v1/chat/completions" \
 -H "Content-Type: application/json" \
--H "X-Session-Id: demo-session-001" \
+-H "X-Routing-Session-Id: demo-session-001" \
 -d '{
 "model": "gpt-4o-mini",
 "messages": [
@@ -129,7 +129,7 @@ echo " Notice: same model returned with \"pinned\": true, routing was skipped
 echo ""
 curl -s "$PLANO_URL/routing/v1/chat/completions" \
 -H "Content-Type: application/json" \
--H "X-Session-Id: demo-session-001" \
+-H "X-Routing-Session-Id: demo-session-001" \
 -d '{
 "model": "gpt-4o-mini",
 "messages": [
@@ -143,7 +143,7 @@ echo "--- 9. Different session gets its own fresh routing ---"
 echo ""
 curl -s "$PLANO_URL/routing/v1/chat/completions" \
 -H "Content-Type: application/json" \
--H "X-Session-Id: demo-session-002" \
+-H "X-Routing-Session-Id: demo-session-002" \
 -d '{
 "model": "gpt-4o-mini",
 "messages": [
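The README change above describes pinning as a per-session cache: the first request carrying a given `X-Routing-Session-Id` runs routing and caches the chosen model, later requests with the same ID get that model back without re-running routing, and requests without the header always route fresh. A minimal bash sketch of those semantics follows, assuming an in-memory map capped by `session_max_entries`. The names `route`, `route_fresh`, `PINNED_MODEL`, `ROUTE_RESULT`, and the model names are hypothetical; this is not Plano's code, and what happens once the cap is reached is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical in-memory sketch of session pinning; NOT Plano's implementation.
# Requires bash 4+ (associative arrays).

declare -A PINNED_MODEL=()   # X-Routing-Session-Id value -> pinned model
SESSION_MAX_ENTRIES=10000    # mirrors the documented session_max_entries default
ROUTE_RESULT=""              # set by route(); kept global because calling
                             # route inside $(...) would drop the pinned state

# route_fresh PROMPT: toy stand-in for the real router (hypothetical rule).
route_fresh() {
  case "$1" in
    *code*|*function*|*Python*) echo "coding-model" ;;
    *) echo "general-model" ;;
  esac
}

# route SESSION_ID PROMPT: an empty SESSION_ID models a request sent
# without the X-Routing-Session-Id header.
route() {
  local session_id="$1" prompt="$2" model
  # Already pinned: return the cached model and skip routing.
  if [[ -n "$session_id" ]] && [[ -n "${PINNED_MODEL[$session_id]:-}" ]]; then
    ROUTE_RESULT="${PINNED_MODEL[$session_id]} pinned=true"
    return 0
  fi
  model="$(route_fresh "$prompt")"
  # Pin only if a session id was sent and the cache has room. The real
  # behavior at the cap is not shown here; this sketch just stops pinning.
  if [[ -n "$session_id" ]] && (( ${#PINNED_MODEL[@]} < SESSION_MAX_ENTRIES )); then
    PINNED_MODEL["$session_id"]="$model"
  fi
  ROUTE_RESULT="$model pinned=false"
}
```

Calling `route demo-session-001 "Write a Python function for binary search"` and then `route demo-session-001 "Now explain merge sort"` leaves `ROUTE_RESULT` as `coding-model pinned=false` and then `coding-model pinned=true`: the second prompt alone would route to `general-model`, but the pinned session keeps the first choice. An empty session ID never pins, so every such call routes fresh.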