Mirror of https://github.com/katanemo/plano.git (synced 2026-06-02 14:35:14 +02:00)
use plano-orchestrator for LLM routing, remove arch-router
Replace RouterService/RouterModelV1 (arch-router prompt) with OrchestratorService/OrchestratorModelV1 (plano-orchestrator prompt) for LLM routing. This ensures the correct system prompt is used when llm_routing_model points at a Plano-Orchestrator model.

- Extend OrchestratorService with session caching, ModelMetricsService, top-level routing preferences, and determine_route() for LLM routing
- Delete RouterService, RouterModel trait, RouterModelV1, and ARCH_ROUTER_V1_SYSTEM_PROMPT
- Unify defaults to Plano-Orchestrator / plano-orchestrator
- Update CLI config generator, demos, docs, and config schema

Made-with: Cursor
parent 980faef6be
commit af724fcc1e
27 changed files with 380 additions and 1412 deletions
@@ -1,18 +1,18 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: arch-router
+  name: plano-orchestrator
   labels:
-    app: arch-router
+    app: plano-orchestrator
 spec:
   replicas: 1
   selector:
     matchLabels:
-      app: arch-router
+      app: plano-orchestrator
   template:
     metadata:
       labels:
-        app: arch-router
+        app: plano-orchestrator
     spec:
       tolerations:
         - key: nvidia.com/gpu
@@ -53,7 +53,7 @@ spec:
 - "--tokenizer"
 - "katanemo/Arch-Router-1.5B"
 - "--served-model-name"
-- "Arch-Router"
+- "Plano-Orchestrator"
 - "--gpu-memory-utilization"
 - "0.3"
 - "--tensor-parallel-size"
@@ -94,10 +94,10 @@ spec:
 apiVersion: v1
 kind: Service
 metadata:
-  name: arch-router
+  name: plano-orchestrator
 spec:
   selector:
-    app: arch-router
+    app: plano-orchestrator
   ports:
     - name: http
       port: 10000