Mirror of https://github.com/katanemo/plano.git (synced 2026-05-21 13:55:15 +02:00)
commit c7d8ba7556

Merge remote-tracking branch 'origin/main' into adil/refactor_brightstaff

Made-with: Cursor

# Conflicts:
#	crates/brightstaff/src/main.rs
#	crates/brightstaff/src/router/plano_orchestrator.rs

49 changed files with 1088 additions and 398 deletions

.claude/skills/build-brightstaff/SKILL.md (Normal file, 12 lines changed)
@@ -0,0 +1,12 @@
+---
+name: build-brightstaff
+description: Build the brightstaff native binary. Use when brightstaff code changes.
+---
+
+Build brightstaff:
+
+```
+cd crates && cargo build --release -p brightstaff
+```
+
+If the build fails, diagnose and fix the errors.

.claude/skills/build-cli/SKILL.md (Normal file, 10 lines changed)
@@ -0,0 +1,10 @@
+---
+name: build-cli
+description: Build and install the Python CLI (planoai). Use after making changes to cli/ code to install locally.
+---
+
+1. `cd cli && uv sync` — ensure dependencies are installed
+2. `cd cli && uv tool install --editable .` — install the CLI locally
+3. Verify the installation: `cd cli && uv run planoai --help`
+
+If the build or install fails, diagnose and fix the issues.

.claude/skills/build-wasm/SKILL.md (Normal file, 12 lines changed)
@@ -0,0 +1,12 @@
+---
+name: build-wasm
+description: Build the WASM plugins for Envoy. Use when WASM plugin code changes.
+---
+
+Build the WASM plugins:
+
+```
+cd crates && cargo build --release --target=wasm32-wasip1 -p llm_gateway -p prompt_gateway
+```
+
+If the build fails, diagnose and fix the errors.

.claude/skills/check/SKILL.md (Normal file, 12 lines changed)
@@ -0,0 +1,12 @@
+---
+name: check
+description: Run Rust fmt, clippy, and unit tests. Use after making Rust code changes.
+---
+
+Run all local checks in order:
+
+1. `cd crates && cargo fmt --all -- --check` — if formatting fails, run `cargo fmt --all` to fix it
+2. `cd crates && cargo clippy --locked --all-targets --all-features -- -D warnings` — fix any warnings
+3. `cd crates && cargo test --lib` — ensure all unit tests pass
+
+Report a summary of what passed/failed.

.claude/skills/new-provider/SKILL.md (Normal file, 17 lines changed)
@@ -0,0 +1,17 @@
+---
+name: new-provider
+description: Add a new LLM provider to hermesllm. Use when integrating a new AI provider.
+disable-model-invocation: true
+user-invocable: true
+---
+
+Add a new LLM provider to hermesllm. The user will provide the provider name as $ARGUMENTS.
+
+1. Add a new variant to `ProviderId` enum in `crates/hermesllm/src/providers/id.rs`
+2. Implement string parsing in the `TryFrom<&str>` impl for the new provider
+3. If the provider uses a non-OpenAI API format, create request/response types in `crates/hermesllm/src/apis/`
+4. Add variant to `ProviderRequestType` and `ProviderResponseType` enums and update all match arms
+5. Add model list to `crates/hermesllm/src/providers/provider_models.yaml`
+6. Update `SupportedUpstreamAPIs` mapping if needed
+
+After making changes, run `cd crates && cargo test --lib` to verify everything compiles and tests pass.

.claude/skills/pr/SKILL.md (Normal file, 16 lines changed)
@@ -0,0 +1,16 @@
+---
+name: pr
+description: Create a feature branch and open a pull request for the current changes.
+disable-model-invocation: true
+user-invocable: true
+---
+
+Create a pull request for the current changes:
+
+1. Determine the GitHub username via `gh api user --jq .login`. If the login is `adilhafeez`, use `adil` instead.
+2. Create a feature branch using format `<username>/<feature_name>` — infer the feature name from the changes
+3. Run `cd crates && cargo fmt --all -- --check` and `cd crates && cargo clippy --locked --all-targets --all-features -- -D warnings` to verify Rust code is clean
+4. Commit all changes with a short, concise commit message (one line, no Co-Authored-By)
+5. Push the branch and create a PR targeting `main`
+
+Keep the PR title short (under 70 chars). Include a brief summary in the body. Never include a "Test plan" section or any "Generated with Claude Code" attribution.

.claude/skills/release/SKILL.md (Normal file, 30 lines changed)
@@ -0,0 +1,30 @@
+---
+name: release
+description: Bump the Plano version across all required files. Use when preparing a release.
+disable-model-invocation: true
+user-invocable: true
+---
+
+Prepare a release version bump. The user may provide the new version number as $ARGUMENTS (e.g., `/release 0.4.12`), or a bump type (`major`, `minor`, `patch`).
+
+If no argument is provided, read the current version from `cli/planoai/__init__.py`, auto-increment the patch version (e.g., `0.4.11` → `0.4.12`), and confirm with the user before proceeding.
+
+Update the version string in ALL of these files:
+
+- `.github/workflows/ci.yml`
+- `cli/planoai/__init__.py`
+- `cli/planoai/consts.py`
+- `cli/pyproject.toml`
+- `build_filter_image.sh`
+- `config/validate_plano_config.sh`
+- `docs/source/conf.py`
+- `docs/source/get_started/quickstart.rst`
+- `docs/source/resources/deployment.rst`
+- `apps/www/src/components/Hero.tsx`
+- `demos/llm_routing/preference_based_routing/README.md`
+
+Do NOT change version strings in `*.lock` files or `Cargo.lock`.
+
+After updating all version strings, run `cd cli && uv lock` to update the lock file with the new version.
+
+After making changes, show a summary of all files modified and the old → new version.

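Reviewer note: the default behaviour the release skill describes (read the current version, then bump the patch component unless a bump type is given) could look roughly like the sketch below. This is an illustration only, not code from the repository: `bump_version` is a hypothetical helper, and it assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffix.

```python
import re

# Hypothetical helper, not part of the planoai CLI. It only understands
# plain MAJOR.MINOR.PATCH version strings (an assumption for this sketch).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(current: str, bump: str = "patch") -> str:
    """Return `current` bumped by `bump` ("major", "minor", or "patch")."""
    match = _VERSION_RE.match(current)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")


if __name__ == "__main__":
    # Patch is the default, matching the skill's "no argument" behaviour.
    print(bump_version("0.4.11"))
```

Rejecting anything that is not three dot-separated integers keeps the helper from silently rewriting strings such as dependency versions, which the skill says must be left alone.
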
.claude/skills/test-python/SKILL.md (Normal file, 9 lines changed)
@@ -0,0 +1,9 @@
+---
+name: test-python
+description: Run Python CLI tests. Use after making changes to cli/ code.
+---
+
+1. `cd cli && uv sync` — ensure dependencies are installed
+2. `cd cli && uv run pytest -v` — run all tests
+
+If tests fail, diagnose and fix the issues.

.github/workflows/ci.yml (vendored, 4 lines changed)
@@ -133,13 +133,13 @@ jobs:
           load: true
           tags: |
             ${{ env.PLANO_DOCKER_IMAGE }}
-            ${{ env.DOCKER_IMAGE }}:0.4.11
+            ${{ env.DOCKER_IMAGE }}:0.4.12
             ${{ env.DOCKER_IMAGE }}:latest
           cache-from: type=gha
           cache-to: type=gha,mode=max
 
       - name: Save image as artifact
-        run: docker save ${{ env.PLANO_DOCKER_IMAGE }} ${{ env.DOCKER_IMAGE }}:0.4.11 ${{ env.DOCKER_IMAGE }}:latest -o /tmp/plano-image.tar
+        run: docker save ${{ env.PLANO_DOCKER_IMAGE }} ${{ env.DOCKER_IMAGE }}:0.4.12 ${{ env.DOCKER_IMAGE }}:latest -o /tmp/plano-image.tar
 
       - name: Upload image artifact
         uses: actions/upload-artifact@v6

.gitignore (vendored, 1 line changed)
@@ -152,3 +152,4 @@ apps/*/dist/
 
 .cursor/
 .agents
+docs/do/

@@ -4,6 +4,7 @@ repos:
     hooks:
       - id: check-yaml
         exclude: config/envoy.template*
+        args: [--allow-multiple-documents]
       - id: end-of-file-fixer
       - id: trailing-whitespace
   - repo: local

CLAUDE.md (152 lines changed)
@@ -1,152 +1,106 @@
 # CLAUDE.md
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
 ## Project Overview
 
 Plano is an AI-native proxy server and data plane for agentic applications, built on Envoy proxy. It centralizes agent orchestration, LLM routing, observability, and safety guardrails as an out-of-process dataplane.
 
 ## Build & Test Commands
 
-### Rust (crates/)
-
 ```bash
-# Build WASM plugins (must target wasm32-wasip1)
+# Rust — WASM plugins (must target wasm32-wasip1)
 cd crates && cargo build --release --target=wasm32-wasip1 -p llm_gateway -p prompt_gateway
 
-# Build brightstaff binary (native target)
+# Rust — brightstaff binary (native target)
 cd crates && cargo build --release -p brightstaff
 
-# Run unit tests
+# Rust — tests, format, lint
 cd crates && cargo test --lib
-
-# Format check
 cd crates && cargo fmt --all -- --check
-
-# Lint
 cd crates && cargo clippy --locked --all-targets --all-features -- -D warnings
-```
 
-### Python CLI (cli/)
+# Python CLI
+cd cli && uv sync && uv run pytest -v
 
-```bash
-cd cli && uv sync # Install dependencies
-cd cli && uv run pytest -v # Run tests
-cd cli && uv run planoai --help # Run CLI
-```
+# JS/TS (Turbo monorepo)
+npm run build && npm run lint && npm run typecheck
 
-### JavaScript/TypeScript (apps/, packages/)
-
-```bash
-npm run build # Build all (via Turbo)
-npm run lint # Lint all
-npm run dev # Dev servers
-npm run typecheck # Type check
-```
-
-### Pre-commit (runs fmt, clippy, cargo test, black, yaml checks)
-
-```bash
+# Pre-commit (fmt, clippy, cargo test, black, yaml)
 pre-commit run --all-files
-```
 
-### Docker
-
-```bash
+# Docker
 docker build -t katanemo/plano:latest .
 ```
 
-### E2E Tests (tests/e2e/)
-
-E2E tests require a built Docker image and API keys. They run via `tests/e2e/run_e2e_tests.sh` which executes four test suites: `test_prompt_gateway.py`, `test_model_alias_routing.py`, `test_openai_responses_api_client.py`, and `test_openai_responses_api_client_with_state.py`.
+E2E tests require a Docker image and API keys: `tests/e2e/run_e2e_tests.sh`
 
 ## Architecture
 
 ### Core Data Flow
 
 Requests flow through Envoy proxy with two WASM filter plugins, backed by a native Rust binary:
 
 ```
 Client → Envoy (prompt_gateway.wasm → llm_gateway.wasm) → Agents/LLM Providers
 ↕
 brightstaff (native binary: state, routing, signals, tracing)
 ```
 
-### Rust Crates (crates/)
+### Crates (crates/)
 
 All crates share a Cargo workspace. Two compile to `wasm32-wasip1` for Envoy, the rest are native:
 
-- **prompt_gateway** (WASM) — Proxy-WASM filter for prompt/message processing, guardrails, and filter chains
+- **prompt_gateway** (WASM) — Proxy-WASM filter for prompt processing, guardrails, filter chains
 - **llm_gateway** (WASM) — Proxy-WASM filter for LLM request/response handling and routing
-- **brightstaff** (native binary) — Core application server: handlers, router, signals, state management, tracing
-- **common** (library) — Shared across all crates: configuration, LLM provider abstractions, HTTP utilities, routing logic, rate limiting, tokenizer, PII detection, tracing
-- **hermesllm** (library) — Translates LLM API formats between providers (OpenAI, Anthropic, Gemini, Mistral, Grok, AWS Bedrock, Azure, together.ai). Key types: `ProviderId`, `ProviderRequest`, `ProviderResponse`, `ProviderStreamResponse`
+- **brightstaff** (native) — Core server: handlers, router, signals, state, tracing
+- **common** (lib) — Shared: config, HTTP, routing, rate limiting, tokenizer, PII, tracing
+- **hermesllm** (lib) — LLM API translation between providers. Key types: `ProviderId`, `ProviderRequest`, `ProviderResponse`, `ProviderStreamResponse`
 
 ### Python CLI (cli/planoai/)
 
-The `planoai` CLI manages the Plano lifecycle. Key commands:
-- `planoai up <config.yaml>` — Validate config, check API keys, start Docker container
-- `planoai down` — Stop container
-- `planoai build` — Build Docker image from repo root
-- `planoai logs` — Stream access/debug logs
-- `planoai trace` — OTEL trace collection and analysis
-- `planoai init` — Initialize new project
-- `planoai cli_agent` — Start a CLI agent connected to Plano
-- `planoai generate_prompt_targets` — Generate prompt_targets from python methods
+Entry point: `main.py`. Built with `rich-click`. Commands: `up`, `down`, `build`, `logs`, `trace`, `init`, `cli_agent`, `generate_prompt_targets`.
 
-Entry point: `cli/planoai/main.py`. Container lifecycle in `core.py`. Docker operations in `docker_cli.py`.
+### Config (config/)
 
-### Configuration System (config/)
+- `plano_config_schema.yaml` — JSON Schema for validating user configs
+- `envoy.template.yaml` — Jinja2 template → Envoy config
+- `supervisord.conf` — Process supervisor for Envoy + brightstaff
 
-- `plano_config_schema.yaml` — JSON Schema (draft-07) for validating user config files
-- `envoy.template.yaml` — Jinja2 template rendered into Envoy proxy config
-- `supervisord.conf` — Process supervisor for Envoy + brightstaff in the container
+### JS Apps (apps/, packages/)
 
-User configs define: `agents` (id + url), `model_providers` (model + access_key), `listeners` (type: agent/model/prompt, with router strategy), `filters` (filter chains), and `tracing` settings.
+Turbo monorepo with Next.js 16 / React 19. Not part of the core proxy.
 
-### JavaScript Apps (apps/, packages/)
+## WASM Plugin Rules
 
-Turbo monorepo with Next.js 16 / React 19 applications and shared packages (UI components, Tailwind config, TypeScript config). Not part of the core proxy — these are web applications.
+Code in `prompt_gateway` and `llm_gateway` runs in Envoy's WASM sandbox:
 
+- **No std networking/filesystem** — use proxy-wasm host calls only
+- **No tokio/async** — synchronous, callback-driven. `Action::Pause` / `Action::Continue` for flow control
+- **Lifecycle**: `RootContext` → `on_configure`, `create_http_context`; `HttpContext` → `on_http_request/response_headers/body`
+- **HTTP callouts**: `dispatch_http_call()` → store context in `callouts: RefCell<HashMap<u32, CallContext>>` → match in `on_http_call_response()`
+- **Config**: `Rc`-wrapped, loaded once in `on_configure()` via `serde_yaml::from_slice()`
+- **Dependencies must be no_std compatible** (e.g., `governor` with `features = ["no_std"]`)
+- **Crate type**: `cdylib` → produces `.wasm`
+
+## Adding a New LLM Provider
+
+1. Add variant to `ProviderId` in `crates/hermesllm/src/providers/id.rs` + `TryFrom<&str>`
+2. Create request/response types in `crates/hermesllm/src/apis/` if non-OpenAI format
+3. Add variant to `ProviderRequestType`/`ProviderResponseType` enums, update all match arms
+4. Add models to `crates/hermesllm/src/providers/provider_models.yaml`
+5. Update `SupportedUpstreamAPIs` mapping if needed
+
 ## Release Process
 
-To prepare a release (e.g., bumping from `0.4.6` to `0.4.7`), update the version string in all of the following files:
+Update version (e.g., `0.4.11` → `0.4.12`) in all of these files:
 
-**CI Workflow:**
-- `.github/workflows/ci.yml` — docker build/save tags
+- `.github/workflows/ci.yml`, `build_filter_image.sh`, `config/validate_plano_config.sh`
+- `cli/planoai/__init__.py`, `cli/planoai/consts.py`, `cli/pyproject.toml`
+- `docs/source/conf.py`, `docs/source/get_started/quickstart.rst`, `docs/source/resources/deployment.rst`
+- `apps/www/src/components/Hero.tsx`, `demos/llm_routing/preference_based_routing/README.md`
 
-**CLI:**
-- `cli/planoai/__init__.py` — `__version__`
-- `cli/planoai/consts.py` — `PLANO_DOCKER_IMAGE` default
-- `cli/pyproject.toml` — `version`
-
-**Build & Config:**
-- `build_filter_image.sh` — docker build tag
-- `config/validate_plano_config.sh` — docker image tag
-
-**Docs:**
-- `docs/source/conf.py` — `release`
-- `docs/source/get_started/quickstart.rst` — install commands and example output
-- `docs/source/resources/deployment.rst` — docker image tag
-
-**Website & Demos:**
-- `apps/www/src/components/Hero.tsx` — version badge
-- `demos/llm_routing/preference_based_routing/README.md` — example output
-
-**Important:** Do NOT change `0.4.6` references in `*.lock` files or `Cargo.lock` — those refer to the `colorama` and `http-body` dependency versions, not Plano.
-
-Commit message format: `release X.Y.Z`
+Do NOT change version strings in `*.lock` files or `Cargo.lock`. Commit message: `release X.Y.Z`
 
 ## Workflow Preferences
 
-- **Git commits:** Do NOT add `Co-Authored-By` lines. Keep commit messages short and concise (one line, no verbose descriptions). NEVER commit and push directly to `main`—always use a feature branch and PR.
-- **Git branches:** Use the format `<github_username>/<feature_name>` when creating branches for PRs. Determine the username from `gh api user --jq .login`.
-- **GitHub issues:** When a GitHub issue URL is pasted, fetch all requirements and context from the issue first. The end goal is always a PR with all tests passing.
+- **Commits:** No `Co-Authored-By`. Short one-line messages. Never push directly to `main` — always feature branch + PR.
+- **Branches:** Use `adil/<feature_name>` format.
+- **Issues:** When a GitHub issue URL is pasted, fetch all context first. Goal is always a PR with passing tests.
 
 ## Key Conventions
 
-- Rust edition 2021, formatted with `cargo fmt`, linted with `cargo clippy -D warnings`
-- Python formatted with Black
-- WASM plugins must target `wasm32-wasip1` — they run inside Envoy, not as native binaries
-- The Docker image bundles Envoy + WASM plugins + brightstaff + Python CLI into a single container managed by supervisord
-- API keys come from environment variables or `.env` files, never hardcoded
+- Rust edition 2021, `cargo fmt`, `cargo clippy -D warnings`
+- Python: Black. Rust errors: `thiserror` with `#[from]`
+- API keys from env vars or `.env`, never hardcoded
+- Provider dispatch: `ProviderRequestType`/`ProviderResponseType` enums implementing `ProviderRequest`/`ProviderResponse` traits

@@ -49,6 +49,7 @@ FROM python:3.14-slim AS arch
 
 RUN set -eux; \
     apt-get update; \
+    apt-get upgrade -y; \
     apt-get install -y --no-install-recommends gettext-base curl; \
     apt-get clean; rm -rf /var/lib/apt/lists/*
 

@@ -24,7 +24,7 @@ export function Hero() {
           >
             <div className="inline-flex flex-wrap items-center gap-1.5 sm:gap-2 px-3 sm:px-4 py-1 rounded-full bg-[rgba(185,191,255,0.4)] border border-[var(--secondary)] shadow backdrop-blur hover:bg-[rgba(185,191,255,0.6)] transition-colors cursor-pointer">
               <span className="text-xs sm:text-sm font-medium text-black/65">
-                v0.4.11
+                v0.4.12
               </span>
               <span className="text-xs sm:text-sm font-medium text-black ">
                 —

@@ -1 +1 @@
-docker build -f Dockerfile . -t katanemo/plano -t katanemo/plano:0.4.11
+docker build -f Dockerfile . -t katanemo/plano -t katanemo/plano:0.4.12

@@ -1,3 +1,3 @@
 """Plano CLI - Intelligent Prompt Gateway."""
 
-__version__ = "0.4.11"
+__version__ = "0.4.12"

@@ -3,18 +3,17 @@ import os
 from planoai.utils import convert_legacy_listeners
 from jinja2 import Environment, FileSystemLoader
 import yaml
-from jsonschema import validate
+from jsonschema import validate, ValidationError
 from urllib.parse import urlparse
 from copy import deepcopy
 from planoai.consts import DEFAULT_OTEL_TRACING_GRPC_ENDPOINT
-
 
 SUPPORTED_PROVIDERS_WITH_BASE_URL = [
     "azure_openai",
     "ollama",
     "qwen",
     "amazon_bedrock",
-    "arch",
+    "plano",
 ]
 
 SUPPORTED_PROVIDERS_WITHOUT_BASE_URL = [
@@ -368,47 +367,52 @@ def validate_and_render_schema():
                 llms_with_endpoint.append(model_provider)
                 llms_with_endpoint_cluster_names.add(cluster_name)
 
-    if len(model_usage_name_keys) > 0:
-        routing_model_provider = config_yaml.get("routing", {}).get(
-            "model_provider", None
+    overrides_config = config_yaml.get("overrides", {})
+    # Build lookup of model names (already prefix-stripped by config processing)
+    model_name_set = {mp.get("model") for mp in updated_model_providers}
+
+    # Auto-add arch-router provider if routing preferences exist and no provider matches the router model
+    router_model = overrides_config.get("llm_routing_model", "Arch-Router")
+    # Strip provider prefix for comparison since config processing strips prefixes from model names
+    router_model_id = (
+        router_model.split("/", 1)[1] if "/" in router_model else router_model
+    )
+    if len(model_usage_name_keys) > 0 and router_model_id not in model_name_set:
+        updated_model_providers.append(
+            {
+                "name": "arch-router",
+                "provider_interface": "plano",
+                "model": router_model_id,
+                "internal": True,
+            }
         )
-        if (
-            routing_model_provider
-            and routing_model_provider not in model_provider_name_set
-        ):
-            raise Exception(
-                f"Routing model_provider {routing_model_provider} is not defined in model_providers"
-            )
-        if (
-            routing_model_provider is None
-            and "arch-router" not in model_provider_name_set
-        ):
-            updated_model_providers.append(
-                {
-                    "name": "arch-router",
-                    "provider_interface": "arch",
-                    "model": config_yaml.get("routing", {}).get("model", "Arch-Router"),
-                    "internal": True,
-                }
-            )
 
     # Always add arch-function model provider if not already defined
     if "arch-function" not in model_provider_name_set:
         updated_model_providers.append(
             {
                 "name": "arch-function",
-                "provider_interface": "arch",
+                "provider_interface": "plano",
                 "model": "Arch-Function",
                 "internal": True,
             }
         )
 
-    if "plano-orchestrator" not in model_provider_name_set:
+    # Auto-add plano-orchestrator provider if no provider matches the orchestrator model
+    orchestrator_model = overrides_config.get(
+        "agent_orchestration_model", "Plano-Orchestrator"
+    )
+    orchestrator_model_id = (
+        orchestrator_model.split("/", 1)[1]
+        if "/" in orchestrator_model
+        else orchestrator_model
+    )
+    if orchestrator_model_id not in model_name_set:
         updated_model_providers.append(
             {
-                "name": "plano-orchestrator",
-                "provider_interface": "arch",
-                "model": "Plano-Orchestrator",
+                "name": "plano/orchestrator",
+                "provider_interface": "plano",
+                "model": orchestrator_model_id,
                 "internal": True,
             }
         )
@@ -503,11 +507,15 @@ def validate_prompt_config(plano_config_file, plano_config_schema_file):
 
     try:
         validate(config_yaml, config_schema_yaml)
-    except Exception as e:
-        print(
-            f"Error validating plano_config file: {plano_config_file}, schema file: {plano_config_schema_file}, error: {e}"
+    except ValidationError as e:
+        path = (
+            " → ".join(str(p) for p in e.absolute_path) if e.absolute_path else "root"
         )
-        raise e
+        raise ValidationError(
+            f"{e.message}\n Location: {path}\n Value: {e.instance}"
+        ) from None
+    except Exception as e:
+        raise
 
 
 if __name__ == "__main__":

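Reviewer note: the router hunk above does two things: it strips an optional provider prefix from the configured model name, and it auto-adds an internal provider only when routing preferences exist and no configured provider already serves that model. A standalone sketch of that rule follows. The function names and the `has_routing_preferences` flag are hypothetical stand-ins (the real code checks `len(model_usage_name_keys) > 0` inline inside `validate_and_render_schema`), so treat this as an illustration rather than the project's API.

```python
from typing import Any

# Default copied from the hunk above.
DEFAULT_ROUTER_MODEL = "Arch-Router"


def strip_provider_prefix(model: str) -> str:
    """Drop everything up to and including the first "/" ("plano/X" -> "X")."""
    return model.split("/", 1)[1] if "/" in model else model


def ensure_router_provider(
    model_providers: list[dict[str, Any]],
    overrides: dict[str, Any],
    has_routing_preferences: bool,
) -> list[dict[str, Any]]:
    """Return a copy of model_providers, adding an internal router if needed.

    Hypothetical helper for illustration; not a function in config_generator.
    """
    providers = list(model_providers)
    router_model_id = strip_provider_prefix(
        overrides.get("llm_routing_model", DEFAULT_ROUTER_MODEL)
    )
    known_models = {provider.get("model") for provider in providers}
    if has_routing_preferences and router_model_id not in known_models:
        providers.append(
            {
                "name": "arch-router",
                "provider_interface": "plano",
                "model": router_model_id,
                "internal": True,
            }
        )
    return providers


if __name__ == "__main__":
    user_providers = [{"name": "openai", "model": "gpt-4o"}]
    for provider in ensure_router_provider(user_providers, {}, True):
        print(provider["name"], provider["model"])
```

Comparing on the prefix-stripped model name is what lets a user-supplied provider for the router model take precedence over the built-in default.
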
@@ -5,7 +5,7 @@ PLANO_COLOR = "#969FF4"
 
 SERVICE_NAME_ARCHGW = "plano"
 PLANO_DOCKER_NAME = "plano"
-PLANO_DOCKER_IMAGE = os.getenv("PLANO_DOCKER_IMAGE", "katanemo/plano:0.4.11")
+PLANO_DOCKER_IMAGE = os.getenv("PLANO_DOCKER_IMAGE", "katanemo/plano:0.4.12")
 DEFAULT_OTEL_TRACING_GRPC_ENDPOINT = "http://localhost:4317"
 
 # Native mode constants

@@ -420,9 +420,16 @@ def native_validate_config(plano_config_file):
     with _temporary_env(overrides):
         from planoai.config_generator import validate_and_render_schema
 
-        # Suppress verbose print output from config_generator
-        with contextlib.redirect_stdout(io.StringIO()):
-            validate_and_render_schema()
+        # Suppress verbose print output from config_generator but capture errors
+        captured = io.StringIO()
+        try:
+            with contextlib.redirect_stdout(captured):
+                validate_and_render_schema()
+        except SystemExit:
+            # validate_and_render_schema calls exit(1) on failure after
+            # printing to stdout; re-raise so the caller gets a useful message.
+            output = captured.getvalue().strip()
+            raise Exception(output) if output else Exception("Config validation failed")
 
 
 def native_logs(debug=False, follow=False):

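Reviewer note: the pattern in the hunk above (silence a noisy function, but surface whatever it printed if it bails out via `exit()`) can be tried in isolation with the sketch below. `run_quietly` and `failing_validator` are hypothetical names, the printed message is a made-up stand-in, and `RuntimeError` is used here where the hunk raises a bare `Exception`.

```python
import contextlib
import io
import sys


def run_quietly(func):
    """Run func with stdout captured; turn exit() into an error carrying that output."""
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            func()
    except SystemExit:
        # The callee printed its diagnostics before exiting, so reuse them.
        output = captured.getvalue().strip()
        raise RuntimeError(output or "Config validation failed") from None


def failing_validator():
    # Stand-in for a validator that prints a message and then exits.
    print("schema check failed (stand-in message)")
    sys.exit(1)


if __name__ == "__main__":
    try:
        run_quietly(failing_validator)
    except RuntimeError as err:
        print(f"caught: {err}")
```

Catching `SystemExit` explicitly matters because it derives from `BaseException`, so a plain `except Exception` would let the process terminate without showing the captured message.
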
@@ -1,6 +1,6 @@
 [project]
 name = "planoai"
-version = "0.4.11"
+version = "0.4.12"
 description = "Python-based CLI tool to manage Plano."
 authors = [{name = "Katanemo Labs, Inc."}]
 readme = "README.md"

cli/uv.lock (generated, 2 lines changed)
@@ -337,7 +337,7 @@ wheels = [
 
 [[package]]
 name = "planoai"
-version = "0.4.9"
+version = "0.4.12"
 source = { editable = "." }
 dependencies = [
     { name = "click" },

@@ -594,13 +594,13 @@ static_resources:
 
   clusters:
 
-    - name: arch
+    - name: plano
       connect_timeout: {{ upstream_connect_timeout | default('5s') }}
       type: LOGICAL_DNS
       dns_lookup_family: V4_ONLY
       lb_policy: ROUND_ROBIN
       load_assignment:
-        cluster_name: arch
+        cluster_name: plano
         endpoints:
           - lb_endpoints:
               - endpoint:

@@ -173,7 +173,7 @@ properties:
           provider_interface:
             type: string
             enum:
-              - arch
+              - plano
               - claude
               - deepseek
               - groq
@@ -220,7 +220,7 @@ properties:
           provider_interface:
             type: string
             enum:
-              - arch
+              - plano
               - claude
               - deepseek
               - groq
@@ -271,6 +271,12 @@ properties:
       upstream_tls_ca_path:
         type: string
         description: "Path to the trusted CA bundle for upstream TLS verification. Default is '/etc/ssl/certs/ca-certificates.crt'."
+      llm_routing_model:
+        type: string
+        description: "Model name for the LLM router (e.g., 'Arch-Router'). Must match a model in model_providers."
+      agent_orchestration_model:
+        type: string
+        description: "Model name for the agent orchestrator (e.g., 'Plano-Orchestrator'). Must match a model in model_providers."
   system_prompt:
     type: string
   prompt_targets:
@@ -408,14 +414,6 @@ properties:
         enum:
           - llm
           - prompt
-  routing:
-    type: object
-    properties:
-      llm_provider:
-        type: string
-      model:
-        type: string
-    additionalProperties: false
   state_storage:
     type: object
     properties:

@@ -178,6 +178,7 @@ mod tests {
         Arc::new(OrchestratorService::new(
             "http://localhost:8080".to_string(),
             "test-model".to_string(),
+            "plano-orchestrator".to_string(),
         ))
     }
 

@@ -22,6 +22,7 @@ mod tests {
         Arc::new(OrchestratorService::new(
             "http://localhost:8080".to_string(),
             "test-model".to_string(),
+            "plano-orchestrator".to_string(),
         ))
     }
 

@@ -12,9 +12,7 @@ use brightstaff::state::StateStorage;
 use brightstaff::tracing::init_tracer;
 use bytes::Bytes;
 use common::configuration::Configuration;
-use common::consts::{
-    CHAT_COMPLETIONS_PATH, MESSAGES_PATH, OPENAI_RESPONSES_API_PATH, PLANO_ORCHESTRATOR_MODEL_NAME,
-};
+use common::consts::{CHAT_COMPLETIONS_PATH, MESSAGES_PATH, OPENAI_RESPONSES_API_PATH};
 use common::llm_providers::LlmProviders;
 use http_body_util::{combinators::BoxBody, BodyExt, Empty};
 use hyper::body::Incoming;
@@ -35,6 +33,8 @@ use tracing::{debug, info, warn};
 const BIND_ADDRESS: &str = "0.0.0.0:9091";
 const DEFAULT_ROUTING_LLM_PROVIDER: &str = "arch-router";
 const DEFAULT_ROUTING_MODEL_NAME: &str = "Arch-Router";
+const DEFAULT_ORCHESTRATOR_LLM_PROVIDER: &str = "plano-orchestrator";
+const DEFAULT_ORCHESTRATOR_MODEL_NAME: &str = "Plano-Orchestrator";
 
 // ---------------------------------------------------------------------------
 // Helpers
@@ -111,16 +111,20 @@ async fn init_app_state(
     let llm_providers = LlmProviders::try_from(config.model_providers.clone())
         .map_err(|e| format!("failed to create LlmProviders: {e}"))?;
 
-    let routing_model_name = config
-        .routing
-        .as_ref()
-        .and_then(|r| r.model.clone())
-        .unwrap_or_else(|| DEFAULT_ROUTING_MODEL_NAME.to_string());
+    let overrides = config.overrides.clone().unwrap_or_default();
+
+    let routing_model_name: String = overrides
+        .llm_routing_model
+        .as_deref()
+        .map(|m| m.split_once('/').map(|(_, id)| id).unwrap_or(m))
+        .unwrap_or(DEFAULT_ROUTING_MODEL_NAME)
+        .to_string();
 
     let routing_llm_provider = config
-        .routing
-        .as_ref()
-        .and_then(|r| r.model_provider.clone())
+        .model_providers
+        .iter()
+        .find(|p| p.model.as_deref() == Some(routing_model_name.as_str()))
+        .map(|p| p.name.clone())
         .unwrap_or_else(|| DEFAULT_ROUTING_LLM_PROVIDER.to_string());
 
     let router_service = Arc::new(RouterService::new(
@@ -130,9 +134,24 @@ async fn init_app_state(
         routing_llm_provider,
     ));
 
+    let orchestrator_model_name: String = overrides
+        .agent_orchestration_model
+        .as_deref()
+        .map(|m| m.split_once('/').map(|(_, id)| id).unwrap_or(m))
+        .unwrap_or(DEFAULT_ORCHESTRATOR_MODEL_NAME)
+        .to_string();
+
+    let orchestrator_llm_provider: String = config
+        .model_providers
+        .iter()
+        .find(|p| p.model.as_deref() == Some(orchestrator_model_name.as_str()))
+        .map(|p| p.name.clone())
+        .unwrap_or_else(|| DEFAULT_ORCHESTRATOR_LLM_PROVIDER.to_string());
+
     let orchestrator_service = Arc::new(OrchestratorService::new(
         format!("{llm_provider_url}{CHAT_COMPLETIONS_PATH}"),
-        PLANO_ORCHESTRATOR_MODEL_NAME.to_string(),
+        orchestrator_model_name,
+        orchestrator_llm_provider,
     ));
 
     let state_storage = init_state_storage(config).await?;

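Reviewer note: the two `let` chains added to `init_app_state` resolve a (model name, provider name) pair: take the override if present, strip any provider prefix, then look for the first configured provider serving that model and fall back to a default provider name otherwise. Below is a Python rendering of that resolution order, written only to make the fallback behaviour easy to test by hand. `resolve_model_and_provider` is a hypothetical helper and the dict-based provider records are an assumption; the Rust code works on its own configuration structs.

```python
from typing import Optional

# Defaults copied from the constants added in the hunk above.
DEFAULT_ORCHESTRATOR_MODEL_NAME = "Plano-Orchestrator"
DEFAULT_ORCHESTRATOR_LLM_PROVIDER = "plano-orchestrator"


def resolve_model_and_provider(
    model_providers: list[dict],
    override_model: Optional[str],
    default_model: str,
    default_provider: str,
) -> tuple[str, str]:
    """Return (model_id, provider_name) following the order used in the diff."""
    model = override_model if override_model is not None else default_model
    # Mirror Rust's split_once('/'): keep the part after the first slash,
    # or the whole string when there is no slash.
    _, sep, tail = model.partition("/")
    model_id = tail if sep else model
    provider_name = next(
        (p["name"] for p in model_providers if p.get("model") == model_id),
        default_provider,
    )
    return model_id, provider_name


if __name__ == "__main__":
    providers = [{"name": "my-orchestrator", "model": "Custom-Orchestrator"}]
    print(
        resolve_model_and_provider(
            providers,
            "acme/Custom-Orchestrator",
            DEFAULT_ORCHESTRATOR_MODEL_NAME,
            DEFAULT_ORCHESTRATOR_LLM_PROVIDER,
        )
    )
```

The provider name returned here is the value the orchestrator service later sends as its provider hint, which is why an unmatched model still needs a sensible default.
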
@@ -2,7 +2,7 @@ use std::{collections::HashMap, sync::Arc};
 
 use common::{
     configuration::{AgentUsagePreference, OrchestrationPreference},
-    consts::{ARCH_PROVIDER_HINT_HEADER, PLANO_ORCHESTRATOR_MODEL_NAME, REQUEST_ID_HEADER},
+    consts::{ARCH_PROVIDER_HINT_HEADER, REQUEST_ID_HEADER},
 };
 use hermesllm::apis::openai::Message;
 use hyper::header;
@@ -20,6 +20,7 @@ pub struct OrchestratorService {
     orchestrator_url: String,
     client: reqwest::Client,
     orchestrator_model: Arc<dyn OrchestratorModel>,
+    orchestrator_provider_name: String,
 }
 
 #[derive(Debug, Error)]
@@ -34,7 +35,11 @@ pub enum OrchestrationError {
 pub type Result<T> = std::result::Result<T, OrchestrationError>;
 
 impl OrchestratorService {
-    pub fn new(orchestrator_url: String, orchestration_model_name: String) -> Self {
+    pub fn new(
+        orchestrator_url: String,
+        orchestration_model_name: String,
+        orchestrator_provider_name: String,
+    ) -> Self {
         let agent_orchestrations: HashMap<String, Vec<OrchestrationPreference>> = HashMap::new();
 
         let orchestrator_model = Arc::new(orchestrator_model_v1::OrchestratorModelV1::new(
@@ -47,6 +52,7 @@ impl OrchestratorService {
             orchestrator_url,
             client: reqwest::Client::new(),
             orchestrator_model,
+            orchestrator_provider_name,
         }
     }
 
@@ -88,7 +94,8 @@ impl OrchestratorService {
         );
         headers.insert(
             header::HeaderName::from_static(ARCH_PROVIDER_HINT_HEADER),
-            header::HeaderValue::from_static(PLANO_ORCHESTRATOR_MODEL_NAME),
+            header::HeaderValue::from_str(&self.orchestrator_provider_name)
+                .unwrap_or_else(|_| header::HeaderValue::from_static("plano-orchestrator")),
         );
 
         // Inject OpenTelemetry trace context from current span
@@ -106,7 +113,8 @@ impl OrchestratorService {
 
         headers.insert(
             header::HeaderName::from_static("model"),
-            header::HeaderValue::from_static(PLANO_ORCHESTRATOR_MODEL_NAME),
+            header::HeaderValue::from_str(&self.orchestrator_provider_name)
|
||||
.unwrap_or_else(|_| header::HeaderValue::from_static("plano-orchestrator")),
|
||||
);
|
||||
|
||||
let Some((content, elapsed)) =
|
||||
|
|
|
|||
|
|
@ -7,12 +7,6 @@ use crate::api::open_ai::{
|
|||
ChatCompletionTool, FunctionDefinition, FunctionParameter, FunctionParameters, ParameterType,
|
||||
};
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct Routing {
|
||||
pub model_provider: Option<String>,
|
||||
pub model: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct ModelAlias {
|
||||
pub target: String,
|
||||
|
|
@ -72,7 +66,6 @@ pub struct Configuration {
|
|||
pub ratelimits: Option<Vec<Ratelimit>>,
|
||||
pub tracing: Option<Tracing>,
|
||||
pub mode: Option<GatewayMode>,
|
||||
pub routing: Option<Routing>,
|
||||
pub agents: Option<Vec<Agent>>,
|
||||
pub filters: Option<Vec<Agent>>,
|
||||
pub listeners: Vec<Listener>,
|
||||
|
|
@ -84,6 +77,8 @@ pub struct Overrides {
|
|||
pub prompt_target_intent_matching_threshold: Option<f64>,
|
||||
pub optimize_context_window: Option<bool>,
|
||||
pub use_agent_orchestrator: Option<bool>,
|
||||
pub llm_routing_model: Option<String>,
|
||||
pub agent_orchestration_model: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
|
||||
|
|
@ -207,8 +202,6 @@ pub struct EmbeddingProviver {
|
|||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Hash)]
|
||||
pub enum LlmProviderType {
|
||||
#[serde(rename = "arch")]
|
||||
Arch,
|
||||
#[serde(rename = "anthropic")]
|
||||
Anthropic,
|
||||
#[serde(rename = "deepseek")]
|
||||
|
|
@ -237,12 +230,13 @@ pub enum LlmProviderType {
|
|||
Qwen,
|
||||
#[serde(rename = "amazon_bedrock")]
|
||||
AmazonBedrock,
|
||||
#[serde(rename = "plano")]
|
||||
Plano,
|
||||
}
|
||||
|
||||
impl Display for LlmProviderType {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
LlmProviderType::Arch => write!(f, "arch"),
|
||||
LlmProviderType::Anthropic => write!(f, "anthropic"),
|
||||
LlmProviderType::Deepseek => write!(f, "deepseek"),
|
||||
LlmProviderType::Groq => write!(f, "groq"),
|
||||
|
|
@ -257,6 +251,7 @@ impl Display for LlmProviderType {
|
|||
LlmProviderType::Zhipu => write!(f, "zhipu"),
|
||||
LlmProviderType::Qwen => write!(f, "qwen"),
|
||||
LlmProviderType::AmazonBedrock => write!(f, "amazon_bedrock"),
|
||||
LlmProviderType::Plano => write!(f, "plano"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -591,14 +586,14 @@ mod test {
|
|||
},
|
||||
LlmProvider {
|
||||
name: "arch-router".to_string(),
|
||||
provider_interface: LlmProviderType::Arch,
|
||||
provider_interface: LlmProviderType::Plano,
|
||||
model: Some("Arch-Router".to_string()),
|
||||
internal: Some(true),
|
||||
..Default::default()
|
||||
},
|
||||
LlmProvider {
|
||||
name: "plano-orchestrator".to_string(),
|
||||
provider_interface: LlmProviderType::Arch,
|
||||
provider_interface: LlmProviderType::Plano,
|
||||
model: Some("Plano-Orchestrator".to_string()),
|
||||
internal: Some(true),
|
||||
..Default::default()
|
||||
|
|
|
|||
|
|
@ -33,5 +33,4 @@ pub const OTEL_COLLECTOR_HTTP: &str = "opentelemetry_collector_http";
|
|||
pub const LLM_ROUTE_HEADER: &str = "x-arch-llm-route";
|
||||
pub const ENVOY_RETRY_HEADER: &str = "x-envoy-max-retries";
|
||||
pub const BRIGHT_STAFF_SERVICE_NAME: &str = "brightstaff";
|
||||
pub const PLANO_ORCHESTRATOR_MODEL_NAME: &str = "Plano-Orchestrator";
|
||||
pub const ARCH_FC_CLUSTER: &str = "arch";
|
||||
pub const PLANO_FC_CLUSTER: &str = "plano";
|
||||
|
|
|
|||
|
|
@ -1,183 +1,16 @@
|
|||
version: '1.0'
|
||||
source: canonical-apis
|
||||
providers:
|
||||
mistralai:
|
||||
- mistralai/mistral-medium-2505
|
||||
- mistralai/mistral-medium-2508
|
||||
- mistralai/mistral-medium-latest
|
||||
- mistralai/mistral-medium
|
||||
- mistralai/mistral-vibe-cli-with-tools
|
||||
- mistralai/open-mistral-nemo
|
||||
- mistralai/open-mistral-nemo-2407
|
||||
- mistralai/mistral-tiny-2407
|
||||
- mistralai/mistral-tiny-latest
|
||||
- mistralai/mistral-large-2411
|
||||
- mistralai/pixtral-large-2411
|
||||
- mistralai/pixtral-large-latest
|
||||
- mistralai/mistral-large-pixtral-2411
|
||||
- mistralai/codestral-2508
|
||||
- mistralai/codestral-latest
|
||||
- mistralai/devstral-small-2507
|
||||
- mistralai/devstral-medium-2507
|
||||
- mistralai/devstral-2512
|
||||
- mistralai/mistral-vibe-cli-latest
|
||||
- mistralai/devstral-medium-latest
|
||||
- mistralai/devstral-latest
|
||||
- mistralai/labs-devstral-small-2512
|
||||
- mistralai/devstral-small-latest
|
||||
- mistralai/mistral-small-2506
|
||||
- mistralai/mistral-small-latest
|
||||
- mistralai/labs-mistral-small-creative
|
||||
- mistralai/magistral-medium-2509
|
||||
- mistralai/magistral-medium-latest
|
||||
- mistralai/magistral-small-2509
|
||||
- mistralai/magistral-small-latest
|
||||
- mistralai/mistral-large-2512
|
||||
- mistralai/mistral-large-latest
|
||||
- mistralai/ministral-3b-2512
|
||||
- mistralai/ministral-3b-latest
|
||||
- mistralai/ministral-8b-2512
|
||||
- mistralai/ministral-8b-latest
|
||||
- mistralai/ministral-14b-2512
|
||||
- mistralai/ministral-14b-latest
|
||||
- mistralai/mistral-small-2501
|
||||
- mistralai/mistral-embed-2312
|
||||
- mistralai/mistral-embed
|
||||
- mistralai/codestral-embed
|
||||
- mistralai/codestral-embed-2505
|
||||
openai:
|
||||
- openai/gpt-4-0613
|
||||
- openai/gpt-4
|
||||
- openai/gpt-3.5-turbo
|
||||
- openai/gpt-5.2-codex
|
||||
- openai/gpt-3.5-turbo-instruct
|
||||
- openai/gpt-3.5-turbo-instruct-0914
|
||||
- openai/gpt-4-1106-preview
|
||||
- openai/gpt-3.5-turbo-1106
|
||||
- openai/gpt-4-0125-preview
|
||||
- openai/gpt-4-turbo-preview
|
||||
- openai/gpt-3.5-turbo-0125
|
||||
- openai/gpt-4-turbo
|
||||
- openai/gpt-4-turbo-2024-04-09
|
||||
- openai/gpt-4o
|
||||
- openai/gpt-4o-2024-05-13
|
||||
- openai/gpt-4o-mini-2024-07-18
|
||||
- openai/gpt-4o-mini
|
||||
- openai/gpt-4o-2024-08-06
|
||||
- openai/chatgpt-4o-latest
|
||||
- openai/o1-2024-12-17
|
||||
- openai/o1
|
||||
- openai/computer-use-preview
|
||||
- openai/o3-mini
|
||||
- openai/o3-mini-2025-01-31
|
||||
- openai/gpt-4o-2024-11-20
|
||||
- openai/computer-use-preview-2025-03-11
|
||||
- openai/gpt-4o-search-preview-2025-03-11
|
||||
- openai/gpt-4o-search-preview
|
||||
- openai/gpt-4o-mini-search-preview-2025-03-11
|
||||
- openai/gpt-4o-mini-search-preview
|
||||
- openai/o1-pro-2025-03-19
|
||||
- openai/o1-pro
|
||||
- openai/o3-2025-04-16
|
||||
- openai/o4-mini-2025-04-16
|
||||
- openai/o3
|
||||
- openai/o4-mini
|
||||
- openai/gpt-4.1-2025-04-14
|
||||
- openai/gpt-4.1
|
||||
- openai/gpt-4.1-mini-2025-04-14
|
||||
- openai/gpt-4.1-mini
|
||||
- openai/gpt-4.1-nano-2025-04-14
|
||||
- openai/gpt-4.1-nano
|
||||
- openai/o3-pro
|
||||
- openai/o3-pro-2025-06-10
|
||||
- openai/o4-mini-deep-research
|
||||
- openai/o3-deep-research
|
||||
- openai/o3-deep-research-2025-06-26
|
||||
- openai/o4-mini-deep-research-2025-06-26
|
||||
- openai/gpt-5-chat-latest
|
||||
- openai/gpt-5-2025-08-07
|
||||
- openai/gpt-5
|
||||
- openai/gpt-5-mini-2025-08-07
|
||||
- openai/gpt-5-mini
|
||||
- openai/gpt-5-nano-2025-08-07
|
||||
- openai/gpt-5-nano
|
||||
- openai/gpt-5-codex
|
||||
- openai/gpt-5-pro-2025-10-06
|
||||
- openai/gpt-5-pro
|
||||
- openai/gpt-5-search-api
|
||||
- openai/gpt-5-search-api-2025-10-14
|
||||
- openai/gpt-5.1-chat-latest
|
||||
- openai/gpt-5.1-2025-11-13
|
||||
- openai/gpt-5.1
|
||||
- openai/gpt-5.1-codex
|
||||
- openai/gpt-5.1-codex-mini
|
||||
- openai/gpt-5.1-codex-max
|
||||
- openai/gpt-5.2-2025-12-11
|
||||
- openai/gpt-5.2
|
||||
- openai/gpt-5.2-pro-2025-12-11
|
||||
- openai/gpt-5.2-pro
|
||||
- openai/gpt-5.2-chat-latest
|
||||
- openai/gpt-3.5-turbo-16k
|
||||
- openai/ft:gpt-3.5-turbo-0613:katanemo::8CMZbm0P
|
||||
deepseek:
|
||||
- deepseek/deepseek-chat
|
||||
- deepseek/deepseek-reasoner
|
||||
x-ai:
|
||||
- x-ai/grok-2-vision-1212
|
||||
- x-ai/grok-3
|
||||
- x-ai/grok-3-mini
|
||||
- x-ai/grok-4-0709
|
||||
- x-ai/grok-4-1-fast-non-reasoning
|
||||
- x-ai/grok-4-1-fast-reasoning
|
||||
- x-ai/grok-4-fast-non-reasoning
|
||||
- x-ai/grok-4-fast-reasoning
|
||||
- x-ai/grok-code-fast-1
|
||||
- x-ai/grok-imagine-image
|
||||
- x-ai/grok-imagine-video
|
||||
moonshotai:
|
||||
- moonshotai/kimi-k2-thinking
|
||||
- moonshotai/kimi-k2.5
|
||||
- moonshotai/moonshot-v1-128k-vision-preview
|
||||
- moonshotai/moonshot-v1-8k
|
||||
- moonshotai/kimi-k2-turbo-preview
|
||||
- moonshotai/moonshot-v1-128k
|
||||
- moonshotai/moonshot-v1-32k-vision-preview
|
||||
- moonshotai/kimi-k2-thinking-turbo
|
||||
- moonshotai/kimi-latest
|
||||
- moonshotai/moonshot-v1-32k
|
||||
- moonshotai/moonshot-v1-auto
|
||||
- moonshotai/kimi-k2-0711-preview
|
||||
- moonshotai/kimi-k2-0905-preview
|
||||
- moonshotai/moonshot-v1-8k-vision-preview
|
||||
anthropic:
|
||||
- anthropic/claude-opus-4-6
|
||||
- anthropic/claude-opus-4-5-20251101
|
||||
- anthropic/claude-opus-4-5
|
||||
- anthropic/claude-haiku-4-5-20251001
|
||||
- anthropic/claude-haiku-4-5
|
||||
- anthropic/claude-sonnet-4-5-20250929
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- anthropic/claude-opus-4-1-20250805
|
||||
- anthropic/claude-opus-4-1
|
||||
- anthropic/claude-opus-4-20250514
|
||||
- anthropic/claude-opus-4
|
||||
- anthropic/claude-sonnet-4-20250514
|
||||
- anthropic/claude-sonnet-4
|
||||
- anthropic/claude-3-7-sonnet-20250219
|
||||
- anthropic/claude-3-7-sonnet
|
||||
- anthropic/claude-3-5-haiku-20241022
|
||||
- anthropic/claude-3-5-haiku
|
||||
- anthropic/claude-3-haiku-20240307
|
||||
- anthropic/claude-3-haiku
|
||||
google:
|
||||
- google/gemini-2.5-flash
|
||||
- google/gemini-2.5-pro
|
||||
- google/gemini-2.0-flash
|
||||
- google/gemini-2.0-flash-001
|
||||
- google/gemini-2.0-flash-exp-image-generation
|
||||
- google/gemini-2.0-flash-lite-001
|
||||
- google/gemini-2.0-flash-lite
|
||||
- google/gemini-exp-1206
|
||||
- google/gemini-2.5-flash-preview-tts
|
||||
- google/gemini-2.5-pro-preview-tts
|
||||
- google/gemma-3-1b-it
|
||||
|
|
@ -191,12 +24,15 @@ providers:
|
|||
- google/gemini-pro-latest
|
||||
- google/gemini-2.5-flash-lite
|
||||
- google/gemini-2.5-flash-image
|
||||
- google/gemini-2.5-flash-preview-09-2025
|
||||
- google/gemini-2.5-flash-lite-preview-09-2025
|
||||
- google/gemini-3-pro-preview
|
||||
- google/gemini-3-flash-preview
|
||||
- google/gemini-3.1-pro-preview
|
||||
- google/gemini-3.1-pro-preview-customtools
|
||||
- google/gemini-3.1-flash-lite-preview
|
||||
- google/gemini-3-pro-image-preview
|
||||
- google/nano-banana-pro-preview
|
||||
- google/gemini-3.1-flash-image-preview
|
||||
- google/gemini-robotics-er-1.5-preview
|
||||
- google/gemini-2.5-computer-use-preview-10-2025
|
||||
- google/deep-research-pro-preview-12-2025
|
||||
|
|
@ -212,7 +48,37 @@ providers:
|
|||
- amazon/amazon.nova-premier-v1:0
|
||||
- amazon/amazon.nova-lite-v1:0
|
||||
- amazon/amazon.nova-micro-v1:0
|
||||
x-ai:
|
||||
- x-ai/grok-3
|
||||
- x-ai/grok-3-mini
|
||||
- x-ai/grok-4-0709
|
||||
- x-ai/grok-4-1-fast-non-reasoning
|
||||
- x-ai/grok-4-1-fast-reasoning
|
||||
- x-ai/grok-4-fast-non-reasoning
|
||||
- x-ai/grok-4-fast-reasoning
|
||||
- x-ai/grok-4.20-beta-0309-non-reasoning
|
||||
- x-ai/grok-4.20-beta-0309-reasoning
|
||||
- x-ai/grok-4.20-multi-agent-beta-0309
|
||||
- x-ai/grok-code-fast-1
|
||||
- x-ai/grok-imagine-image
|
||||
- x-ai/grok-imagine-video
|
||||
z-ai:
|
||||
- z-ai/glm-4.5
|
||||
- z-ai/glm-4.5-air
|
||||
- z-ai/glm-4.6
|
||||
- z-ai/glm-4.7
|
||||
- z-ai/glm-5
|
||||
qwen:
|
||||
- qwen/qwen3-asr-flash-2026-02-10
|
||||
- qwen/qwen3.5-flash-2026-02-23
|
||||
- qwen/qwen3.5-flash
|
||||
- qwen/qwen3.5-122b-a10b
|
||||
- qwen/qwen3.5-35b-a3b
|
||||
- qwen/qwen3.5-27b
|
||||
- qwen/qwen3-coder-next
|
||||
- qwen/qwen3.5-397b-a17b
|
||||
- qwen/qwen3.5-plus-2026-02-15
|
||||
- qwen/qwen3.5-plus
|
||||
- qwen/qwen3-vl-flash-2026-01-22
|
||||
- qwen/qwen3-max-2026-01-23
|
||||
- qwen/qwen-plus-character
|
||||
|
|
@ -294,13 +160,161 @@ providers:
|
|||
- qwen/qwen-max
|
||||
- qwen/qwen-plus
|
||||
- qwen/qwen-turbo
|
||||
z-ai:
|
||||
- z-ai/glm-4.5
|
||||
- z-ai/glm-4.5-air
|
||||
- z-ai/glm-4.6
|
||||
- z-ai/glm-4.7
|
||||
- z-ai/glm-5
|
||||
mistralai:
|
||||
- mistralai/mistral-medium-2505
|
||||
- mistralai/mistral-medium-2508
|
||||
- mistralai/mistral-medium-latest
|
||||
- mistralai/mistral-medium
|
||||
- mistralai/mistral-vibe-cli-with-tools
|
||||
- mistralai/open-mistral-nemo
|
||||
- mistralai/open-mistral-nemo-2407
|
||||
- mistralai/mistral-tiny-2407
|
||||
- mistralai/mistral-tiny-latest
|
||||
- mistralai/codestral-2508
|
||||
- mistralai/codestral-latest
|
||||
- mistralai/devstral-2512
|
||||
- mistralai/mistral-vibe-cli-latest
|
||||
- mistralai/devstral-medium-latest
|
||||
- mistralai/devstral-latest
|
||||
- mistralai/mistral-small-2506
|
||||
- mistralai/mistral-small-latest
|
||||
- mistralai/labs-mistral-small-creative
|
||||
- mistralai/magistral-medium-2509
|
||||
- mistralai/magistral-medium-latest
|
||||
- mistralai/magistral-small-2509
|
||||
- mistralai/magistral-small-latest
|
||||
- mistralai/mistral-large-2512
|
||||
- mistralai/mistral-large-latest
|
||||
- mistralai/ministral-3b-2512
|
||||
- mistralai/ministral-3b-latest
|
||||
- mistralai/ministral-8b-2512
|
||||
- mistralai/ministral-8b-latest
|
||||
- mistralai/ministral-14b-2512
|
||||
- mistralai/ministral-14b-latest
|
||||
- mistralai/mistral-large-2411
|
||||
- mistralai/pixtral-large-2411
|
||||
- mistralai/pixtral-large-latest
|
||||
- mistralai/mistral-large-pixtral-2411
|
||||
- mistralai/devstral-small-2507
|
||||
- mistralai/devstral-medium-2507
|
||||
- mistralai/labs-devstral-small-2512
|
||||
- mistralai/devstral-small-latest
|
||||
- mistralai/mistral-squarepoint-2602
|
||||
- mistralai/mistral-embed-2312
|
||||
- mistralai/mistral-embed
|
||||
- mistralai/codestral-embed
|
||||
- mistralai/codestral-embed-2505
|
||||
moonshotai:
|
||||
- moonshotai/kimi-k2.5
|
||||
- moonshotai/kimi-k2-0905-preview
|
||||
- moonshotai/moonshot-v1-32k
|
||||
- moonshotai/moonshot-v1-128k
|
||||
- moonshotai/kimi-k2-thinking-turbo
|
||||
- moonshotai/moonshot-v1-8k-vision-preview
|
||||
- moonshotai/kimi-k2-0711-preview
|
||||
- moonshotai/moonshot-v1-auto
|
||||
- moonshotai/kimi-k2-thinking
|
||||
- moonshotai/moonshot-v1-128k-vision-preview
|
||||
- moonshotai/kimi-k2-turbo-preview
|
||||
- moonshotai/moonshot-v1-32k-vision-preview
|
||||
- moonshotai/moonshot-v1-8k
|
||||
anthropic:
|
||||
- anthropic/claude-sonnet-4-6
|
||||
- anthropic/claude-opus-4-6
|
||||
- anthropic/claude-opus-4-5-20251101
|
||||
- anthropic/claude-opus-4-5
|
||||
- anthropic/claude-haiku-4-5-20251001
|
||||
- anthropic/claude-haiku-4-5
|
||||
- anthropic/claude-sonnet-4-5-20250929
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- anthropic/claude-opus-4-1-20250805
|
||||
- anthropic/claude-opus-4-1
|
||||
- anthropic/claude-opus-4-20250514
|
||||
- anthropic/claude-opus-4
|
||||
- anthropic/claude-sonnet-4-20250514
|
||||
- anthropic/claude-sonnet-4
|
||||
- anthropic/claude-3-haiku-20240307
|
||||
- anthropic/claude-3-haiku
|
||||
openai:
|
||||
- openai/gpt-4-0613
|
||||
- openai/gpt-4
|
||||
- openai/gpt-3.5-turbo
|
||||
- openai/gpt-5.4
|
||||
- openai/gpt-5.3-chat-latest
|
||||
- openai/gpt-5.4-2026-03-05
|
||||
- openai/gpt-5.4-pro
|
||||
- openai/gpt-5.4-pro-2026-03-05
|
||||
- openai/gpt-3.5-turbo-instruct
|
||||
- openai/gpt-3.5-turbo-instruct-0914
|
||||
- openai/gpt-4-1106-preview
|
||||
- openai/gpt-3.5-turbo-1106
|
||||
- openai/gpt-4-0125-preview
|
||||
- openai/gpt-4-turbo-preview
|
||||
- openai/gpt-3.5-turbo-0125
|
||||
- openai/gpt-4-turbo
|
||||
- openai/gpt-4-turbo-2024-04-09
|
||||
- openai/gpt-4o
|
||||
- openai/gpt-4o-2024-05-13
|
||||
- openai/gpt-4o-mini-2024-07-18
|
||||
- openai/gpt-4o-mini
|
||||
- openai/gpt-4o-2024-08-06
|
||||
- openai/o1-2024-12-17
|
||||
- openai/o1
|
||||
- openai/computer-use-preview
|
||||
- openai/o3-mini
|
||||
- openai/o3-mini-2025-01-31
|
||||
- openai/gpt-4o-2024-11-20
|
||||
- openai/computer-use-preview-2025-03-11
|
||||
- openai/gpt-4o-mini-search-preview-2025-03-11
|
||||
- openai/gpt-4o-mini-search-preview
|
||||
- openai/o1-pro-2025-03-19
|
||||
- openai/o1-pro
|
||||
- openai/o3-2025-04-16
|
||||
- openai/o4-mini-2025-04-16
|
||||
- openai/o3
|
||||
- openai/o4-mini
|
||||
- openai/gpt-4.1-2025-04-14
|
||||
- openai/gpt-4.1
|
||||
- openai/gpt-4.1-mini-2025-04-14
|
||||
- openai/gpt-4.1-mini
|
||||
- openai/gpt-4.1-nano-2025-04-14
|
||||
- openai/gpt-4.1-nano
|
||||
- openai/o3-pro
|
||||
- openai/o3-pro-2025-06-10
|
||||
- openai/o4-mini-deep-research
|
||||
- openai/o3-deep-research
|
||||
- openai/o3-deep-research-2025-06-26
|
||||
- openai/o4-mini-deep-research-2025-06-26
|
||||
- openai/gpt-5-chat-latest
|
||||
- openai/gpt-5-2025-08-07
|
||||
- openai/gpt-5
|
||||
- openai/gpt-5-mini-2025-08-07
|
||||
- openai/gpt-5-mini
|
||||
- openai/gpt-5-nano-2025-08-07
|
||||
- openai/gpt-5-nano
|
||||
- openai/gpt-5-codex
|
||||
- openai/gpt-5-pro-2025-10-06
|
||||
- openai/gpt-5-pro
|
||||
- openai/gpt-5-search-api
|
||||
- openai/gpt-5-search-api-2025-10-14
|
||||
- openai/gpt-5.1-chat-latest
|
||||
- openai/gpt-5.1-2025-11-13
|
||||
- openai/gpt-5.1
|
||||
- openai/gpt-5.1-codex
|
||||
- openai/gpt-5.1-codex-mini
|
||||
- openai/gpt-5.1-codex-max
|
||||
- openai/gpt-5.2-2025-12-11
|
||||
- openai/gpt-5.2
|
||||
- openai/gpt-5.2-pro-2025-12-11
|
||||
- openai/gpt-5.2-pro
|
||||
- openai/gpt-5.2-chat-latest
|
||||
- openai/gpt-5.2-codex
|
||||
- openai/gpt-5.3-codex
|
||||
- openai/gpt-4o-search-preview
|
||||
- openai/gpt-4o-search-preview-2025-03-11
|
||||
- openai/gpt-3.5-turbo-16k
|
||||
- openai/ft:gpt-3.5-turbo-0613:katanemo::8CMZbm0P
|
||||
metadata:
|
||||
total_providers: 10
|
||||
total_models: 289
|
||||
last_updated: 2026-02-13T22:44:30.413065+00:00
|
||||
total_models: 303
|
||||
last_updated: 2026-03-15T16:47:22.207197+00:00
|
||||
|
|
|
|||
|
|
@ -35,7 +35,7 @@ mod tests {
|
|||
ProviderId::Mistral
|
||||
);
|
||||
assert_eq!(ProviderId::try_from("groq").unwrap(), ProviderId::Groq);
|
||||
assert_eq!(ProviderId::try_from("arch").unwrap(), ProviderId::Arch);
|
||||
assert_eq!(ProviderId::try_from("plano").unwrap(), ProviderId::Plano);
|
||||
|
||||
// Test aliases
|
||||
assert_eq!(ProviderId::try_from("google").unwrap(), ProviderId::Gemini);
|
||||
|
|
|
|||
|
|
@ -34,7 +34,7 @@ pub enum ProviderId {
|
|||
Gemini,
|
||||
Anthropic,
|
||||
GitHub,
|
||||
Arch,
|
||||
Plano,
|
||||
AzureOpenAI,
|
||||
XAI,
|
||||
TogetherAI,
|
||||
|
|
@ -58,7 +58,7 @@ impl TryFrom<&str> for ProviderId {
|
|||
"google" => Ok(ProviderId::Gemini), // alias
|
||||
"anthropic" => Ok(ProviderId::Anthropic),
|
||||
"github" => Ok(ProviderId::GitHub),
|
||||
"arch" => Ok(ProviderId::Arch),
|
||||
"plano" => Ok(ProviderId::Plano),
|
||||
"azure_openai" => Ok(ProviderId::AzureOpenAI),
|
||||
"xai" => Ok(ProviderId::XAI),
|
||||
"together_ai" => Ok(ProviderId::TogetherAI),
|
||||
|
|
@ -135,7 +135,7 @@ impl ProviderId {
|
|||
| ProviderId::Groq
|
||||
| ProviderId::Mistral
|
||||
| ProviderId::Deepseek
|
||||
| ProviderId::Arch
|
||||
| ProviderId::Plano
|
||||
| ProviderId::Gemini
|
||||
| ProviderId::GitHub
|
||||
| ProviderId::AzureOpenAI
|
||||
|
|
@ -153,7 +153,7 @@ impl ProviderId {
|
|||
| ProviderId::Groq
|
||||
| ProviderId::Mistral
|
||||
| ProviderId::Deepseek
|
||||
| ProviderId::Arch
|
||||
| ProviderId::Plano
|
||||
| ProviderId::Gemini
|
||||
| ProviderId::GitHub
|
||||
| ProviderId::AzureOpenAI
|
||||
|
|
@ -219,7 +219,7 @@ impl Display for ProviderId {
|
|||
ProviderId::Gemini => write!(f, "Gemini"),
|
||||
ProviderId::Anthropic => write!(f, "Anthropic"),
|
||||
ProviderId::GitHub => write!(f, "GitHub"),
|
||||
ProviderId::Arch => write!(f, "Arch"),
|
||||
ProviderId::Plano => write!(f, "Plano"),
|
||||
ProviderId::AzureOpenAI => write!(f, "azure_openai"),
|
||||
ProviderId::XAI => write!(f, "xai"),
|
||||
ProviderId::TogetherAI => write!(f, "together_ai"),
|
||||
|
|
|
|||
|
|
@ -873,7 +873,7 @@ impl HttpContext for StreamContext {
|
|||
// ensure that the provider has an endpoint if the access key is missing else return a bad request
|
||||
if self.llm_provider.as_ref().unwrap().endpoint.is_none()
|
||||
&& self.llm_provider.as_ref().unwrap().provider_interface
|
||||
!= LlmProviderType::Arch
|
||||
!= LlmProviderType::Plano
|
||||
{
|
||||
self.send_server_error(error, Some(StatusCode::BAD_REQUEST));
|
||||
}
|
||||
|
|
|
|||
|
|
@ -123,6 +123,42 @@ Each agent:
|
|||
|
||||
Both agents run as native local processes and communicate with Plano running natively on the host.
|
||||
|
||||
## Running with local Plano-Orchestrator (via vLLM)
|
||||
|
||||
By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:
|
||||
|
||||
1. Install vLLM (the model itself is pulled by `vllm serve` in the next step):
|
||||
```bash
|
||||
pip install vllm
|
||||
```
|
||||
|
||||
2. Start the vLLM server with the 4B model:
|
||||
```bash
|
||||
vllm serve katanemo/Plano-Orchestrator-4B \
|
||||
--host 0.0.0.0 \
|
||||
--port 8000 \
|
||||
--tensor-parallel-size 1 \
|
||||
--gpu-memory-utilization 0.3 \
|
||||
--tokenizer katanemo/Plano-Orchestrator-4B \
|
||||
--chat-template chat_template.jinja \
|
||||
--served-model-name katanemo/Plano-Orchestrator-4B \
|
||||
--enable-prefix-caching
|
||||
```
|
||||
|
||||
3. Start the demo with the local orchestrator config:
|
||||
```bash
|
||||
./run_demo.sh --local-orchestrator
|
||||
```
|
||||
|
||||
4. Test with curl:
|
||||
```bash
|
||||
curl -X POST http://localhost:8001/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
|
||||
```
|
||||
|
||||
You should see Plano use your local orchestrator to route the request to the weather agent.
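
The local orchestrator config names the model as `plano/katanemo/Plano-Orchestrator-4B`. The sketch below shows one way that override could be resolved into a model name and a provider name, mirroring the `split_once('/')` chain in the `main.rs` hunk of this commit. The `Provider` struct, both function names, and the default strings used in the usage note are assumptions made for illustration, not brightstaff's actual types or values.

```rust
/// Hypothetical, trimmed-down provider entry (assumption, not the real `LlmProvider`).
struct Provider {
    name: String,
    model: Option<String>,
}

/// Drop the leading `provider/` segment from an override, if one is present.
/// Only the first `/` is treated as the separator, so
/// "plano/katanemo/Plano-Orchestrator-4B" keeps "katanemo/Plano-Orchestrator-4B".
fn resolve_model_name(override_model: Option<&str>, default: &str) -> String {
    override_model
        .map(|m| m.split_once('/').map(|(_, id)| id).unwrap_or(m))
        .unwrap_or(default)
        .to_string()
}

/// Find the provider whose `model` equals the resolved name,
/// falling back to a default provider name when none matches.
fn resolve_provider_name(providers: &[Provider], model_name: &str, default: &str) -> String {
    providers
        .iter()
        .find(|p| p.model.as_deref() == Some(model_name))
        .map(|p| p.name.clone())
        .unwrap_or_else(|| default.to_string())
}
```

Under these assumptions, `resolve_model_name(Some("plano/katanemo/Plano-Orchestrator-4B"), "Plano-Orchestrator")` yields `katanemo/Plano-Orchestrator-4B`, and an absent override yields the default unchanged. Splitting on the first `/` only is what lets Hugging Face style ids that contain their own slash survive intact.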
|
||||
|
||||
## Observability
|
||||
|
||||
This demo includes full OpenTelemetry (OTel) compatible distributed tracing to monitor and debug agent interactions:
|
||||
|
|
|
|||
|
|
@ -0,0 +1,66 @@
|
|||
version: v0.3.0
|
||||
|
||||
overrides:
|
||||
agent_orchestration_model: plano/katanemo/Plano-Orchestrator-4B
|
||||
|
||||
agents:
|
||||
- id: weather_agent
|
||||
url: http://localhost:10510
|
||||
- id: flight_agent
|
||||
url: http://localhost:10520
|
||||
|
||||
model_providers:
|
||||
- model: plano/katanemo/Plano-Orchestrator-4B
|
||||
base_url: http://localhost:8000
|
||||
|
||||
- model: openai/gpt-5.2
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY # smaller, faster, cheaper model for extracting entities like location
|
||||
|
||||
listeners:
|
||||
- type: agent
|
||||
name: travel_booking_service
|
||||
port: 8001
|
||||
router: plano_orchestrator_v1
|
||||
agents:
|
||||
- id: weather_agent
|
||||
description: |
|
||||
|
||||
WeatherAgent is a specialized AI assistant for real-time weather information and forecasts. It provides accurate weather data for any city worldwide using the Open-Meteo API, helping travelers plan their trips with up-to-date weather conditions.
|
||||
|
||||
Capabilities:
|
||||
* Get real-time weather conditions and multi-day forecasts for any city worldwide using Open-Meteo API (free, no API key needed)
|
||||
* Provides current temperature
|
||||
* Provides multi-day forecasts
|
||||
* Provides weather conditions
|
||||
* Provides sunrise/sunset times
|
||||
* Provides detailed weather information
|
||||
* Understands conversation context to resolve location references from previous messages
|
||||
* Handles weather-related questions including "What's the weather in [city]?", "What's the forecast for [city]?", "How's the weather in [city]?"
|
||||
* When queries include both weather and other travel questions (e.g., flights, currency), this agent answers ONLY the weather part
|
||||
|
||||
- id: flight_agent
|
||||
description: |
|
||||
|
||||
FlightAgent is an AI-powered tool specialized in providing live flight information between airports. It leverages the FlightAware AeroAPI to deliver real-time flight status, gate information, and delay updates.
|
||||
|
||||
Capabilities:
|
||||
* Get live flight information between airports using FlightAware AeroAPI
|
||||
* Shows real-time flight status
|
||||
* Shows scheduled/estimated/actual departure and arrival times
|
||||
* Shows gate and terminal information
|
||||
* Shows delays
|
||||
* Shows aircraft type
|
||||
* Shows flight status
|
||||
* Automatically resolves city names to airport codes (IATA/ICAO)
|
||||
* Understands conversation context to infer origin/destination from follow-up questions
|
||||
* Handles flight-related questions including "What flights go from [city] to [city]?", "Do flights go to [city]?", "Are there direct flights from [city]?"
|
||||
* When queries include both flight and other travel questions (e.g., weather, currency), this agent answers ONLY the flight part
|
||||
|
||||
tracing:
|
||||
random_sampling: 100
|
||||
span_attributes:
|
||||
header_prefixes:
|
||||
- x-acme-
|
||||
|
|
@ -31,8 +31,13 @@ start_demo() {
|
|||
fi
|
||||
|
||||
# Step 4: Start Plano
|
||||
echo "Starting Plano with config.yaml..."
|
||||
planoai up config.yaml
|
||||
PLANO_CONFIG="config.yaml"
|
||||
if [ "$1" = "--local-orchestrator" ]; then
|
||||
PLANO_CONFIG="config_local_orchestrator.yaml"
|
||||
echo "Using local orchestrator config..."
|
||||
fi
|
||||
echo "Starting Plano with $PLANO_CONFIG..."
|
||||
planoai up "$PLANO_CONFIG"
|
||||
|
||||
# Step 5: Start agents natively
|
||||
echo "Starting agents..."
|
||||
|
|
|
|||
|
|
@ -1,6 +1,54 @@
|
|||
# Model Routing Service Demo
|
||||
|
||||
This demo shows how to use the `/routing/v1/*` endpoints to get routing decisions without proxying requests to an LLM. The endpoint accepts standard LLM request formats and returns which model Plano's router would select.
|
||||
Plano is an AI-native proxy and data plane for agentic apps — with built-in orchestration, safety, observability, and intelligent LLM routing.
|
||||
|
||||
```
|
||||
┌───────────┐ ┌─────────────────────────────────┐ ┌──────────────┐
|
||||
│ Client │ ───► │ Plano │ ───► │ OpenAI │
|
||||
│ (any │ │ │ │ Anthropic │
|
||||
│ language)│ │ Arch-Router (1.5B model) │ │ Any Provider│
|
||||
└───────────┘ │ analyzes intent → picks model │ └──────────────┘
|
||||
└─────────────────────────────────┘
|
||||
```
|
||||
|
||||
- **One endpoint, many models** — apps call Plano using standard OpenAI/Anthropic APIs; Plano handles provider selection, keys, and failover
|
||||
- **Intelligent routing** — a lightweight 1.5B router model classifies user intent and picks the best model per request
|
||||
- **Platform governance** — centralize API keys, rate limits, guardrails, and observability without touching app code
|
||||
- **Runs anywhere** — single binary; self-host the router for full data privacy
|
||||
|
||||
## How Routing Works
|
||||
|
||||
The entire routing configuration is plain YAML — no code:
|
||||
|
||||
```yaml
|
||||
model_providers:
|
||||
- model: openai/gpt-4o-mini
|
||||
default: true # fallback for unmatched requests
|
||||
|
||||
- model: openai/gpt-4o
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: complex reasoning tasks, multi-step analysis
|
||||
|
||||
- model: anthropic/claude-sonnet-4-20250514
|
||||
routing_preferences:
|
||||
- name: code_generation
|
||||
description: generating new code, writing functions
|
||||
```
|
||||
|
||||
When a request arrives, Plano sends the conversation and routing preferences to Arch-Router, which classifies the intent and returns the matching route:
|
||||
|
||||
```
|
||||
1. Request arrives → "Write binary search in Python"
|
||||
2. Preferences serialized → [{"name":"code_generation", ...}, {"name":"complex_reasoning", ...}]
|
||||
3. Arch-Router classifies → {"route": "code_generation"}
|
||||
4. Route → Model lookup → code_generation → anthropic/claude-sonnet-4-20250514
|
||||
5. Request forwarded → Claude generates the response
|
||||
```
|
||||
|
||||
No match? Arch-Router returns `other` → Plano falls back to the default model.
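
The route-to-model lookup and the default fallback described above can be sketched as follows. This is a minimal illustration, not Plano's implementation: `ModelProvider`, `select_model`, and `example_providers` are hypothetical names, and the data simply restates the YAML example from this section.

```rust
/// Hypothetical, simplified view of one `model_providers` entry (not Plano's real type).
struct ModelProvider {
    model: &'static str,
    is_default: bool,
    routing_preferences: Vec<&'static str>,
}

/// Map the route name returned by the router to a model.
/// If no provider lists that route (for example the router answered `other`),
/// fall back to the provider marked `default: true`.
fn select_model<'a>(providers: &'a [ModelProvider], route: &str) -> Option<&'a str> {
    providers
        .iter()
        .find(|p| p.routing_preferences.iter().any(|name| *name == route))
        .or_else(|| providers.iter().find(|p| p.is_default))
        .map(|p| p.model)
}

/// The three providers from the YAML example above.
fn example_providers() -> Vec<ModelProvider> {
    vec![
        ModelProvider {
            model: "openai/gpt-4o-mini",
            is_default: true,
            routing_preferences: vec![],
        },
        ModelProvider {
            model: "openai/gpt-4o",
            is_default: false,
            routing_preferences: vec!["complex_reasoning"],
        },
        ModelProvider {
            model: "anthropic/claude-sonnet-4-20250514",
            is_default: false,
            routing_preferences: vec!["code_generation"],
        },
    ]
}
```

With this data, `select_model(&example_providers(), "code_generation")` picks the Claude model, while an unmatched route such as `other` falls through to `openai/gpt-4o-mini`. Returning `Option` keeps the case where no default is configured explicit rather than hiding it behind a panic.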
|
||||
|
||||
The `/routing/v1/*` endpoints return the routing decision **without** forwarding to the LLM — useful for testing and validating routing behavior before going to production.
|
||||
|
||||
## Setup
|
||||
|
||||
|
|
@ -55,6 +103,69 @@ Response:
|
|||
|
||||
The response tells you which model would handle this request and which route was matched, without actually making the LLM call.
|
||||
|
||||
## Kubernetes Deployment (Self-hosted Arch-Router on GPU)
|
||||
|
||||
To run Arch-Router in-cluster using vLLM instead of the default hosted endpoint:
|
||||
|
||||
**0. Check your GPU node labels and taints**
|
||||
|
||||
```bash
|
||||
kubectl get nodes --show-labels | grep -i gpu
|
||||
kubectl get node <gpu-node-name> -o jsonpath='{.spec.taints}'
|
||||
```
|
||||
|
||||
GPU nodes commonly carry an `nvidia.com/gpu:NoSchedule` taint, and `vllm-deployment.yaml` includes a matching toleration. If you have multiple GPU node pools and need to pin to a specific one, uncomment and set the `nodeSelector` in `vllm-deployment.yaml` using the GPU node label for your cloud provider.

**1. Deploy Arch-Router and Plano:**

```bash
# arch-router deployment
kubectl apply -f vllm-deployment.yaml

# plano deployment
kubectl create secret generic plano-secrets \
  --from-literal=OPENAI_API_KEY="$OPENAI_API_KEY" \
  --from-literal=ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY"

kubectl create configmap plano-config \
  --from-file=plano_config.yaml=config_k8s.yaml \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl apply -f plano-deployment.yaml
```

**2. Wait for both pods to be ready:**

```bash
# Arch-Router downloads the model (~1 min) then vLLM loads it (~2 min)
# (-w keeps watching; press Ctrl-C once the pod is Ready)
kubectl get pods -l app=arch-router -w
kubectl rollout status deployment/plano
```

**3. Test:**

```bash
# port-forward runs in the foreground; run demo.sh from a second terminal
kubectl port-forward svc/plano 12000:12000
./demo.sh
```

To confirm requests are hitting your in-cluster Arch-Router (not just health checks):

```bash
kubectl logs -l app=arch-router -f --tail=0
# Look for POST /v1/chat/completions entries
```
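
If you prefer a count over eyeballing the stream, a small filter can tally those entries. This is a hypothetical helper (`count_router_calls` is not part of the demo), and it only assumes that each routed request produces a log line containing the text `POST /v1/chat/completions`; the rest of the log format may differ between vLLM versions.

```python
import sys


def count_router_calls(log_lines) -> int:
    """Count log lines that record a chat-completions POST."""
    return sum(1 for line in log_lines if "POST /v1/chat/completions" in line)


if __name__ == "__main__":
    # Reads log lines from stdin and prints how many routing calls were seen.
    print(count_router_calls(sys.stdin))
```

Assuming the script is saved as `count_calls.py`, it can be fed a bounded slice of the logs, for example `kubectl logs -l app=arch-router --tail=200 | python count_calls.py`. A count of zero after running `./demo.sh` suggests Plano is still using a different routing endpoint.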

**Updating the config:**

```bash
kubectl create configmap plano-config \
  --from-file=plano_config.yaml=config_k8s.yaml \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/plano
```

## Demo Output

```
|
||||
|
|
|
|||
33
demos/llm_routing/model_routing_service/config_k8s.yaml
Normal file
|
|
@ -0,0 +1,33 @@
|
|||
version: v0.3.0
|
||||
|
||||
overrides:
|
||||
llm_routing_model: plano/Arch-Router
|
||||
|
||||
listeners:
|
||||
- type: model
|
||||
name: model_listener
|
||||
port: 12000
|
||||
|
||||
model_providers:
|
||||
|
||||
- model: plano/Arch-Router
|
||||
base_url: http://arch-router:10000
|
||||
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
|
||||
- model: openai/gpt-4o
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: complex reasoning tasks, multi-step analysis, or detailed explanations
|
||||
|
||||
- model: anthropic/claude-sonnet-4-20250514
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: code_generation
|
||||
description: generating new code, writing functions, or creating boilerplate
|
||||
|
||||
tracing:
|
||||
random_sampling: 100
|
||||
|
|
@ -0,0 +1,68 @@
|
|||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: plano
|
||||
labels:
|
||||
app: plano
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: plano
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: plano
|
||||
spec:
|
||||
containers:
|
||||
- name: plano
|
||||
image: katanemo/plano:0.4.12
|
||||
ports:
|
||||
- containerPort: 12000 # LLM gateway (chat completions, model routing)
|
||||
name: llm-gateway
|
||||
envFrom:
|
||||
- secretRef:
|
||||
name: plano-secrets
|
||||
env:
|
||||
- name: LOG_LEVEL
|
||||
value: "info"
|
||||
volumeMounts:
|
||||
- name: plano-config
|
||||
mountPath: /app/plano_config.yaml
|
||||
subPath: plano_config.yaml
|
||||
readOnly: true
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /healthz
|
||||
port: 12000
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 10
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /healthz
|
||||
port: 12000
|
||||
initialDelaySeconds: 10
|
||||
periodSeconds: 30
|
||||
resources:
|
||||
requests:
|
||||
memory: "256Mi"
|
||||
cpu: "250m"
|
||||
limits:
|
||||
memory: "512Mi"
|
||||
cpu: "1000m"
|
||||
volumes:
|
||||
- name: plano-config
|
||||
configMap:
|
||||
name: plano-config
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: plano
|
||||
spec:
|
||||
selector:
|
||||
app: plano
|
||||
ports:
|
||||
- name: llm-gateway
|
||||
port: 12000
|
||||
targetPort: 12000
|
||||
36
demos/llm_routing/model_routing_service/test.rest
Normal file
|
|
@ -0,0 +1,36 @@
|
|||
### Code generation query (OpenAI format) — expects anthropic/claude-sonnet
|
||||
POST http://localhost:12000/routing/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role": "user", "content": "Write a Python function for binary search"}]
|
||||
}
|
||||
|
||||
### Complex reasoning query (OpenAI format) — expects openai/gpt-4o
|
||||
POST http://localhost:12000/routing/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role": "user", "content": "Analyze the trade-offs between microservices and monolithic architecture"}]
|
||||
}
|
||||
|
||||
### Simple query — no routing match, expects default model
|
||||
POST http://localhost:12000/routing/v1/chat/completions
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role": "user", "content": "Hello"}]
|
||||
}
|
||||
|
||||
### Code generation query (Anthropic format)
|
||||
POST http://localhost:12000/routing/v1/messages
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"model": "claude-sonnet-4-20250514",
|
||||
"max_tokens": 1024,
|
||||
"messages": [{"role": "user", "content": "Write a REST API in Go using Gin"}]
|
||||
}
|
||||
104
demos/llm_routing/model_routing_service/vllm-deployment.yaml
Normal file
|
|
@ -0,0 +1,104 @@
|
|||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: arch-router
|
||||
labels:
|
||||
app: arch-router
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: arch-router
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: arch-router
|
||||
spec:
|
||||
tolerations:
|
||||
- key: nvidia.com/gpu
|
||||
operator: Exists
|
||||
effect: NoSchedule
|
||||
# Optional: add a nodeSelector to pin to a specific GPU node pool.
|
||||
# The nvidia.com/gpu resource request below is sufficient for most clusters.
|
||||
# nodeSelector:
|
||||
# DigitalOcean: doks.digitalocean.com/gpu-model: l40s
|
||||
# GKE: cloud.google.com/gke-accelerator: nvidia-l4
|
||||
# EKS: eks.amazonaws.com/nodegroup: gpu-nodes
|
||||
# AKS: kubernetes.azure.com/agentpool: gpupool
|
||||
initContainers:
|
||||
- name: download-model
|
||||
image: python:3.11-slim
|
||||
command:
|
||||
- sh
|
||||
- -c
|
||||
- |
|
||||
pip install huggingface_hub[cli] && \
|
||||
python -c "from huggingface_hub import snapshot_download; snapshot_download('katanemo/Arch-Router-1.5B.gguf', local_dir='/models/Arch-Router-1.5B.gguf')"
|
||||
volumeMounts:
|
||||
- name: model-cache
|
||||
mountPath: /models
|
||||
containers:
|
||||
- name: vllm
|
||||
image: vllm/vllm-openai:latest
|
||||
command:
|
||||
- vllm
|
||||
- serve
|
||||
- /models/Arch-Router-1.5B.gguf/Arch-Router-1.5B-Q4_K_M.gguf
|
||||
- "--host"
|
||||
- "0.0.0.0"
|
||||
- "--port"
|
||||
- "10000"
|
||||
- "--load-format"
|
||||
- "gguf"
|
||||
- "--tokenizer"
|
||||
- "katanemo/Arch-Router-1.5B"
|
||||
- "--served-model-name"
|
||||
- "Arch-Router"
|
||||
- "--gpu-memory-utilization"
|
||||
- "0.3"
|
||||
- "--tensor-parallel-size"
|
||||
- "1"
|
||||
- "--enable-prefix-caching"
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 10000
|
||||
protocol: TCP
|
||||
resources:
|
||||
requests:
|
||||
cpu: "1"
|
||||
memory: "4Gi"
|
||||
nvidia.com/gpu: "1"
|
||||
limits:
|
||||
cpu: "4"
|
||||
memory: "8Gi"
|
||||
nvidia.com/gpu: "1"
|
||||
volumeMounts:
|
||||
- name: model-cache
|
||||
mountPath: /models
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: 10000
|
||||
initialDelaySeconds: 60
|
||||
periodSeconds: 10
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: 10000
|
||||
initialDelaySeconds: 180
|
||||
periodSeconds: 30
|
||||
volumes:
|
||||
- name: model-cache
|
||||
emptyDir: {}
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: arch-router
|
||||
spec:
|
||||
selector:
|
||||
app: arch-router
|
||||
ports:
|
||||
- name: http
|
||||
port: 10000
|
||||
targetPort: 10000
|
||||
|
|
@ -1,8 +1,7 @@
|
|||
version: v0.1.0
|
||||
|
||||
routing:
|
||||
model: Arch-Router
|
||||
llm_provider: arch-router
|
||||
overrides:
|
||||
llm_routing_model: Arch-Router
|
||||
|
||||
listeners:
|
||||
egress_traffic:
|
||||
|
|
|
|||
|
|
@ -1,8 +1,7 @@
|
|||
version: v0.3.0
|
||||
|
||||
routing:
|
||||
model: Arch-Router
|
||||
llm_provider: arch-router
|
||||
overrides:
|
||||
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
|
||||
listeners:
|
||||
- type: model
|
||||
|
|
@ -11,8 +10,7 @@ listeners:
|
|||
|
||||
model_providers:
|
||||
|
||||
- name: arch-router
|
||||
model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
- model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
base_url: http://localhost:11434
|
||||
|
||||
- model: openai/gpt-4o-mini
|
||||
|
|
|
|||
|
|
@ -17,7 +17,7 @@ from sphinxawesome_theme.postprocess import Icons
|
|||
project = "Plano Docs"
|
||||
copyright = "2025, Katanemo Labs, Inc"
|
||||
author = "Katanemo Labs, Inc"
|
||||
release = " v0.4.11"
|
||||
release = " v0.4.12"
|
||||
|
||||
# -- General configuration ---------------------------------------------------
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
|
||||
|
|
|
|||
|
|
@ -43,7 +43,7 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
|
|||
|
||||
.. code-block:: console
|
||||
|
||||
$ uv tool install planoai==0.4.11
|
||||
$ uv tool install planoai==0.4.12
|
||||
|
||||
**Option 2: Install with pip (Traditional)**
|
||||
|
||||
|
|
@ -51,7 +51,7 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
|
|||
|
||||
$ python -m venv venv
|
||||
$ source venv/bin/activate # On Windows, use: venv\Scripts\activate
|
||||
$ pip install planoai==0.4.11
|
||||
$ pip install planoai==0.4.12
|
||||
|
||||
|
||||
.. _llm_routing_quickstart:
|
||||
|
|
|
|||
|
|
@ -253,13 +253,11 @@ Using Ollama (recommended for local development)
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
routing:
|
||||
model: Arch-Router
|
||||
llm_provider: arch-router
|
||||
overrides:
|
||||
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
|
||||
model_providers:
|
||||
- name: arch-router
|
||||
model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
- model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
base_url: http://localhost:11434
|
||||
|
||||
- model: openai/gpt-5.2
|
||||
|
|
@ -324,13 +322,11 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
routing:
|
||||
model: Arch-Router
|
||||
llm_provider: arch-router
|
||||
overrides:
|
||||
llm_routing_model: plano/Arch-Router
|
||||
|
||||
model_providers:
|
||||
- name: arch-router
|
||||
model: Arch-Router
|
||||
- model: plano/Arch-Router
|
||||
base_url: http://<your-server-ip>:10000
|
||||
|
||||
- model: openai/gpt-5.2
|
||||
|
|
@ -351,6 +347,35 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
|
|||
curl http://localhost:10000/v1/models
|
||||
|
||||
|
||||
Using vLLM on Kubernetes (GPU nodes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
The ``demos/llm_routing/model_routing_service/`` directory includes ready-to-use manifests:

- ``vllm-deployment.yaml``: Arch-Router served by vLLM, with an init container that downloads
  the model from HuggingFace
- ``plano-deployment.yaml``: Plano proxy configured to use the in-cluster Arch-Router
- ``config_k8s.yaml``: Plano config that sets ``llm_routing_model`` to ``plano/Arch-Router``
  and points that provider's ``base_url`` at ``http://arch-router:10000`` instead of the
  default hosted endpoint

Key things to know before deploying:

- GPU nodes commonly carry an ``nvidia.com/gpu:NoSchedule`` taint, and ``vllm-deployment.yaml``
  includes a matching toleration. The ``nvidia.com/gpu: "1"`` resource request is sufficient
  for scheduling in most clusters; a ``nodeSelector`` is optional and commented out in the
  manifest for cases where you need to pin to a specific GPU node pool.
- Model download takes ~1 minute; vLLM then loads the model in ~1-2 minutes. The
  ``livenessProbe`` has a 180-second ``initialDelaySeconds`` to avoid premature restarts.
- The Plano config ConfigMap must be created with ``--from-file=plano_config.yaml=config_k8s.yaml``
  and mounted with ``subPath`` in the Deployment. Omitting ``subPath`` causes Kubernetes to
  mount a directory instead of a file.

For the canonical Plano Kubernetes deployment (ConfigMap, Secrets, Deployment YAML), see
:ref:`deployment`. For full step-by-step commands specific to this demo, see the
`demo README <https://github.com/katanemo/plano/tree/main/demos/llm_routing/model_routing_service/README.md>`_.
|
||||
|
||||
|
||||
Combining Routing Methods
|
||||
-------------------------
|
||||
|
||||
|
|
|
|||
|
|
@ -335,6 +335,90 @@ Combine RAG agents for documentation lookup with specialized troubleshooting age
|
|||
- id: troubleshoot_agent
|
||||
description: Diagnoses and resolves technical issues step by step
|
||||
|
||||
Self-hosting Plano-Orchestrator
-------------------------------

By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model, serve it with **vLLM** on a server with an NVIDIA GPU.

.. note::
   vLLM requires a Linux server with an NVIDIA GPU (CUDA). For local development on macOS, a GGUF version for Ollama is coming soon.

The following model variants are available on HuggingFace:

* `Plano-Orchestrator-4B <https://huggingface.co/katanemo/Plano-Orchestrator-4B>`_: lighter model, suitable for development and testing
* `Plano-Orchestrator-4B-FP8 <https://huggingface.co/katanemo/Plano-Orchestrator-4B-FP8>`_: FP8 quantized 4B model, lower memory usage
* `Plano-Orchestrator-30B-A3B <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B>`_: full-size model for production
* `Plano-Orchestrator-30B-A3B-FP8 <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B-FP8>`_: FP8 quantized 30B model, recommended for production deployments

Using vLLM
|
||||
~~~~~~~~~~
|
||||
|
||||
1. **Install vLLM**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install vllm
|
||||
|
||||
2. **Download the model and chat template**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install huggingface_hub
|
||||
huggingface-cli download katanemo/Plano-Orchestrator-4B
|
||||
|
||||
3. **Start the vLLM server**
|
||||
|
||||
For the 4B model (development):
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
vllm serve katanemo/Plano-Orchestrator-4B \
|
||||
--host 0.0.0.0 \
|
||||
--port 8000 \
|
||||
--tensor-parallel-size 1 \
|
||||
--gpu-memory-utilization 0.3 \
|
||||
--tokenizer katanemo/Plano-Orchestrator-4B \
|
||||
--chat-template chat_template.jinja \
|
||||
--served-model-name katanemo/Plano-Orchestrator-4B \
|
||||
--enable-prefix-caching
|
||||
|
||||
For the 30B-A3B-FP8 model (production):
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
vllm serve katanemo/Plano-Orchestrator-30B-A3B-FP8 \
|
||||
--host 0.0.0.0 \
|
||||
--port 8000 \
|
||||
--tensor-parallel-size 1 \
|
||||
--gpu-memory-utilization 0.9 \
|
||||
--tokenizer katanemo/Plano-Orchestrator-30B-A3B-FP8 \
|
||||
--chat-template chat_template.jinja \
|
||||
--max-model-len 32768 \
|
||||
--served-model-name katanemo/Plano-Orchestrator-30B-A3B-FP8 \
|
||||
--enable-prefix-caching
|
||||
|
||||
4. **Configure Plano to use the local orchestrator**
|
||||
|
||||
Use the model name matching your ``--served-model-name``:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
overrides:
|
||||
agent_orchestration_model: plano/katanemo/Plano-Orchestrator-4B
|
||||
|
||||
model_providers:
|
||||
- model: katanemo/Plano-Orchestrator-4B
|
||||
provider_interface: plano
|
||||
base_url: http://<your-server-ip>:8000
|
||||
|
||||
5. **Verify the server is running**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
curl http://localhost:8000/health
|
||||
curl http://localhost:8000/v1/models
|
||||
|
||||
|
||||
Next Steps
|
||||
----------
|
||||
|
||||
|
|
|
|||
|
|
@ -65,7 +65,7 @@ Create a ``docker-compose.yml`` file with the following configuration:
|
|||
# docker-compose.yml
|
||||
services:
|
||||
plano:
|
||||
image: katanemo/plano:0.4.11
|
||||
image: katanemo/plano:0.4.12
|
||||
container_name: plano
|
||||
ports:
|
||||
- "10000:10000" # ingress (client -> plano)
|
||||
|
|
@ -153,7 +153,7 @@ Create a ``plano-deployment.yaml``:
|
|||
spec:
|
||||
containers:
|
||||
- name: plano
|
||||
image: katanemo/plano:0.4.11
|
||||
image: katanemo/plano:0.4.12
|
||||
ports:
|
||||
- containerPort: 12000 # LLM gateway (chat completions, model routing)
|
||||
name: llm-gateway
|
||||
|
|
|
|||
|
|
@ -107,11 +107,11 @@ model_providers:
|
|||
- internal: true
|
||||
model: Arch-Function
|
||||
name: arch-function
|
||||
provider_interface: arch
|
||||
provider_interface: plano
|
||||
- internal: true
|
||||
model: Plano-Orchestrator
|
||||
name: plano-orchestrator
|
||||
provider_interface: arch
|
||||
name: plano/orchestrator
|
||||
provider_interface: plano
|
||||
prompt_targets:
|
||||
- description: Get current weather at a location.
|
||||
endpoint:
|
||||
|
|
|
|||