Compare commits

...

15 commits
0.4.20 ... main

Author SHA1 Message Date
Musa
473ec70b5c
Bump version to 0.4.21 (#911)
Some checks are pending
CI / pre-commit (push) Waiting to run
CI / plano-tools-tests (push) Waiting to run
CI / native-smoke-test (push) Waiting to run
CI / docker-build (push) Waiting to run
CI / validate-config (push) Waiting to run
CI / security-scan (push) Blocked by required conditions
CI / test-prompt-gateway (push) Blocked by required conditions
CI / test-model-alias-routing (push) Blocked by required conditions
CI / test-responses-api-with-state (push) Blocked by required conditions
CI / e2e-plano-tests (3.10) (push) Blocked by required conditions
CI / e2e-plano-tests (3.11) (push) Blocked by required conditions
CI / e2e-plano-tests (3.12) (push) Blocked by required conditions
CI / e2e-plano-tests (3.13) (push) Blocked by required conditions
CI / e2e-plano-tests (3.14) (push) Blocked by required conditions
CI / e2e-demo-preference (push) Blocked by required conditions
CI / e2e-demo-currency (push) Blocked by required conditions
Publish docker image (latest) / build-arm64 (push) Waiting to run
Publish docker image (latest) / build-amd64 (push) Waiting to run
Publish docker image (latest) / create-manifest (push) Blocked by required conditions
Build and Deploy Documentation / build (push) Waiting to run
2026-04-24 14:27:32 -07:00
Syed A. Hashmi
dafd245332
signals: restore the pre-port flag marker emoji (🚩) (#913)
* signals: restore the pre-port flag marker emoji

#903 inadvertently replaced the legacy FLAG_MARKER (U+1F6A9, '🚩') with
'[!]', which broke any downstream dashboard / alert that searches span
names for the flag emoji. Restores the original marker and updates the
#910 docs pass to match.

- crates/brightstaff/src/signals/analyzer.rs: FLAG_MARKER back to
  "\\u{1F6A9}" with a comment noting the backwards-compatibility
  reason so it doesn't drift again.
- docs/source/concepts/signals.rst and docs/source/guides/observability/
  tracing.rst: swap every '[!]' reference (subheading text, example
  span name, tip box, dashboard query hint) back to 🚩.

Verified: cargo test -p brightstaff --lib (162 passed, 1 ignored);
sphinx-build clean on both files; rendered HTML shows 🚩 in all
flag-marker references.
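For reference, the restored marker is the single codepoint U+1F6A9; a quick Python sketch (not from the commit) confirming the identity that downstream dashboards grep for:

```python
# U+1F6A9 "Triangular Flag on Post" -- the marker this commit restores.
FLAG_MARKER = "\U0001F6A9"

assert FLAG_MARKER == "🚩"
assert ord(FLAG_MARKER) == 0x1F6A9
assert len(FLAG_MARKER) == 1  # a single codepoint, not an emoji ZWJ sequence
```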

Made-with: Cursor

* fix: silence manual_checked_ops clippy lint (rustc 1.95)

Pre-existing warning in router/stress_tests.rs that becomes an error
under CI's -D warnings with rustc 1.95. Replace the manual if/else
with growth.checked_div(num_iterations).unwrap_or(0) as clippy
suggests.
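The Rust change swaps a manual zero check for `checked_div(...).unwrap_or(0)`; the same guard sketched in Python (function and variable names hypothetical):

```python
def growth_per_iteration(growth: int, num_iterations: int) -> int:
    # Equivalent of Rust's growth.checked_div(num_iterations).unwrap_or(0):
    # checked division yields None for a zero divisor, which unwrap_or maps to 0.
    return growth // num_iterations if num_iterations != 0 else 0

assert growth_per_iteration(100, 4) == 25
assert growth_per_iteration(100, 0) == 0  # no ZeroDivisionError
```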

Made-with: Cursor
2026-04-24 13:54:53 -07:00
Musa
897fda2deb
fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level (#912)
* fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level

Lift inline routing_preferences under each model_provider into the
top-level routing_preferences list with merged models[] and bump
version to v0.4.0, with a deprecation warning. Existing v0.3.0
demo configs (Claude Code, Codex, preference_based_routing, etc.)
keep working unchanged. Schema flags the inline shape as deprecated
but still accepts it. Docs and skills updated to canonical top-level
multi-model form.

* test(common): bump reference config assertion to v0.4.0

The rendered reference config was bumped to v0.4.0 when its inline
routing_preferences were lifted to the top level; align the
configuration deserialization test with that change.

* fix(config_generator): bump version to v0.4.0 up front in migration

Move the v0.3.0 -> v0.4.0 version bump to the top of
migrate_inline_routing_preferences so it runs unconditionally,
including for configs that already declare top-level
routing_preferences at v0.3.0. Previously the bump only fired
when inline migration produced entries, leaving top-level v0.3.0
configs rejected by brightstaff's v0.4.0 gate. Tests updated to
cover the new behavior and to confirm we never downgrade newer
versions.

* fix(config_generator): gate routing_preferences migration on version < v0.4.0

Short-circuit the migration when the config already declares v0.4.0
or newer. Anything at v0.4.0+ is assumed to be on the canonical
top-level shape and is passed through untouched, including stray
inline preferences (which are the author's bug to fix). Only v0.3.0
and older configs are rewritten and bumped.
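The shape change this migration performs can be illustrated with before/after fragments (the provider and preference names below are made up for illustration; only the field names come from the migration code):

```python
# v0.3.0: routing_preferences inlined under each model_providers entry.
before = {
    "version": "v0.3.0",
    "model_providers": [
        {
            "model": "openai/gpt-4o",
            "routing_preferences": [{"name": "code", "description": "code tasks"}],
        },
        {
            "model": "anthropic/claude-opus-4-7",
            "routing_preferences": [{"name": "code", "description": "code tasks"}],
        },
    ],
}

# v0.4.0: one top-level list; same-name preferences merge into a single entry
# whose models[] lists every provider in declaration order, and version bumps.
after = {
    "version": "v0.4.0",
    "model_providers": [
        {"model": "openai/gpt-4o"},
        {"model": "anthropic/claude-opus-4-7"},
    ],
    "routing_preferences": [
        {
            "name": "code",
            "description": "code tasks",
            "models": ["openai/gpt-4o", "anthropic/claude-opus-4-7"],
        }
    ],
}

assert after["routing_preferences"][0]["models"] == [
    p["model"] for p in before["model_providers"]
]
```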
2026-04-24 12:31:44 -07:00
Syed A. Hashmi
5a652eb666
docs: align signals page with paper taxonomy (#910)
* docs: align signals page with paper taxonomy

Updates docs/source/concepts/signals.rst and the tracing guide's signals
subsection to reflect the three-layer taxonomy shipped in #903:

- Introduces the paper reference (arXiv:2604.00356) and the three layers
  (interaction, execution, environment) with all 20 leaf signal types in
  three reference tables
- Documents the new layered OTel attribute set
  (signals.interaction.*, signals.execution.*, signals.environment.*)
  and marks the legacy aggregate keys (signals.follow_up.repair.*,
  signals.frustration.*, signals.repetition.count,
  signals.escalation.requested, signals.positive_feedback.count) as
  deprecated-but-still-emitted
- Adds a Span Events section describing the per-instance signal.<type>
  events with confidence / snippet / metadata attributes
- Fixes the flag marker reference ([!] in the code vs 🚩 in the old docs)
- Updates all example attributes, dashboard queries, and alert rules to
  use the layered keys
- Updates the tracing guide's behavioral-signals subsection to match
- Notes that the triage sampler is a planned follow-up and today sampling
  is consumer-side via observability-platform filters

Build verified locally: sphinx-build produces no warnings on these files.

Made-with: Cursor

* docs: reframe signals intro around the improvement flywheel

Addresses review feedback on #910:

- Replace the triage-only framing at the top with an instrument -> sample
  & triage -> construct data -> optimize -> deploy flywheel that explains
  why signals matter, not just what they surface. Paper's 82% / 1.52x
  numbers move into step 2 of the flywheel where they belong.
- Remove the 'Signals vs Response Quality' section. Per review, signals
  and response quality overlap rather than complement each other, so the
  comparison is misleading.
- Borrow the per-category summaries and leaf-type descriptions verbatim
  from the katanemo/signals reference implementation (module docstrings)
  so the documentation and the detector contract stay in sync. Drops the
  hand-crafted examples that were not strictly accurate (e.g. 'semantic
  overlap is high' for rephrase, 'user explicitly corrects the agent'
  for correction).

Made-with: Cursor

* docs: address signals flywheel review feedback

Addresses review comments on #910:

- Shorten the paper citation to (Chen et al., 2026) per common citation
  practice (replacing the full author-list form).
- Replace the Why Signals Matter section with the review-suggested
  rewrite verbatim: more formal intro framing, renumbered steps to
  Instrument / Sample & triage / Data Construction / Model Optimization
  / Deploy, removes 'routing decisions' from the data-construction
  step, and adds DPO/RLHF/SFT as model-optimization examples.
- Renders tau and O(messages) as proper math glyphs via the sphinx
  built-in :math: role (enabled by adding sphinx.ext.mathjax to
  conf.py). Using the RST role form rather than raw $...$ inline so
  sphinx only injects MathJax on pages that actually have math,
  instead of loading ~1MB of JS on every page.

Build verified locally: sphinx-build produces no warnings on the
changed files and the rendered HTML wraps tau and O(messages) in
MathJax-ready <span class="math">\(\tau\)</span> containers.

Made-with: Cursor
2026-04-24 12:31:14 -07:00
Musa
b81eb7266c
feat(providers): add Vercel AI Gateway and OpenRouter support (#902)
* add Vercel and OpenRouter as OpenAI-compatible LLM providers

* fix(fmt): fix cargo fmt line length issues in provider id tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style(hermesllm): fix rustfmt formatting in provider id tests

* Add Vercel and OpenRouter to zero-config planoai up defaults

Wires `vercel/*` and `openrouter/*` into the synthesized default config so
`planoai up` with no user config exposes both providers out of the box
(env-keyed via AI_GATEWAY_API_KEY / OPENROUTER_API_KEY, pass-through
otherwise). Registers both in SUPPORTED_PROVIDERS_WITHOUT_BASE_URL so
wildcard model entries validate without an explicit provider_interface.

---------

Co-authored-by: Musa Malik <musam@uw.edu>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 15:54:39 -07:00
Musa
78dc4edad9
Add first-class ChatGPT subscription provider support (#881)
* Add first-class ChatGPT subscription provider support

* Address PR feedback: move uuid import to top, reuse parsed config in up()

* Add ChatGPT token watchdog for seamless long-lived sessions

* Address PR feedback: error on stream=false for ChatGPT, fix auth file permissions

* Replace ChatGPT watchdog/restart with passthrough_auth

---------

Co-authored-by: Musa Malik <musam@uw.edu>
2026-04-23 15:34:44 -07:00
Adil Hafeez
aa726b1bba
add jemalloc and /debug/memstats endpoint for OOM diagnosis (#885)
2026-04-23 13:59:12 -07:00
Syed A. Hashmi
c8079ac971
signals: feature parity with the latest Signals paper. Porting logic from python repo (#903)
* signals: port to layered taxonomy with dual-emit OTel

Made-with: Cursor

* fix: silence collapsible_match clippy lint (rustc 1.95)

Made-with: Cursor

* test: parity harness for rust vs python signals analyzer

Validates the brightstaff signals port against the katanemo/signals Python
reference on lmsys/lmsys-chat-1m. Adds a signals_replay bin emitting python-
compatible JSON, a pyarrow-based driver (bypasses the datasets loader pickle
bug on python 3.14), a 3-tier comparator, and an on-demand workflow_dispatch
CI job.

Made-with: Cursor

* Remove signals test from the gitops flow

* style: format parity harness with black

Made-with: Cursor

* signals: group summary by taxonomy, factor misalignment_ratio

Addresses #903 review feedback from @nehcgs:

- generate_summary() now renders explicit Interaction / Execution /
  Environment headers so the paper taxonomy is visible at a glance,
  even when no signals fired in a given layer. Quality-driving callouts
  (high misalignment rate, looping detected, escalation requested) are
  appended after the layer summary as an alerts tail.

- repair_ratio (legacy taxonomy name) renamed to misalignment_ratio
  and factored into a single InteractionSignals::misalignment_ratio()
  helper so assess_quality and generate_summary share one source of
  truth instead of recomputing the same divide twice.

Two new unit tests pin the layer headers and the (sev N) severity
suffix. Parity with the python reference is preserved at the Tier-A
level (per-type counts + overall_quality); only the human-readable
summary string diverges, which the parity comparator already classifies
as Tier-C.

Made-with: Cursor
2026-04-23 12:02:30 -07:00
Adil Hafeez
6701195a5d
add overrides.disable_signals to skip CPU-heavy signal analysis (#906) 2026-04-23 11:38:29 -07:00
Adil Hafeez
22f332f62d
Add Prometheus metrics endpoint and Grafana dashboard for brightstaff (#904)
2026-04-22 11:19:10 -07:00
Adil Hafeez
9812540602
Improve obs model name matching, latency metrics, and error reporting (#900)
Some checks failed
2026-04-18 21:21:15 -07:00
Adil Hafeez
c3c213b2fd
Fix request closures during long-running streaming (#899) 2026-04-18 21:20:34 -07:00
Adil Hafeez
78d8c90184
Add claude-opus-4-7 to anthropic provider models (#901)
2026-04-18 19:10:57 -07:00
Adil Hafeez
ffea891dba
fix: prevent index-out-of-bounds panic in signal analyzer follow-up (#896) 2026-04-18 16:24:02 -07:00
Adil Hafeez
e7464b817a
fix(anthropic-stream): avoid bare/duplicate message_stop on OpenAI upstream (#898) 2026-04-18 15:57:34 -07:00
97 changed files with 10420 additions and 3896 deletions

View file

@@ -133,13 +133,13 @@ jobs:
load: true
tags: |
${{ env.PLANO_DOCKER_IMAGE }}
${{ env.DOCKER_IMAGE }}:0.4.20
${{ env.DOCKER_IMAGE }}:0.4.21
${{ env.DOCKER_IMAGE }}:latest
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Save image as artifact
run: docker save ${{ env.PLANO_DOCKER_IMAGE }} ${{ env.DOCKER_IMAGE }}:0.4.20 ${{ env.DOCKER_IMAGE }}:latest -o /tmp/plano-image.tar
run: docker save ${{ env.PLANO_DOCKER_IMAGE }} ${{ env.DOCKER_IMAGE }}:0.4.21 ${{ env.DOCKER_IMAGE }}:latest -o /tmp/plano-image.tar
- name: Upload image artifact
uses: actions/upload-artifact@v6

View file

@@ -24,7 +24,7 @@ export function Hero() {
>
<div className="inline-flex flex-wrap items-center gap-1.5 sm:gap-2 px-3 sm:px-4 py-1 rounded-full bg-[rgba(185,191,255,0.4)] border border-[var(--secondary)] shadow backdrop-blur hover:bg-[rgba(185,191,255,0.6)] transition-colors cursor-pointer">
<span className="text-xs sm:text-sm font-medium text-black/65">
v0.4.20
v0.4.21
</span>
<span className="text-xs sm:text-sm font-medium text-black ">

View file

@@ -1 +1 @@
docker build -f Dockerfile . -t katanemo/plano -t katanemo/plano:0.4.20
docker build -f Dockerfile . -t katanemo/plano -t katanemo/plano:0.4.21

View file

@@ -1,3 +1,3 @@
"""Plano CLI - Intelligent Prompt Gateway."""
__version__ = "0.4.20"
__version__ = "0.4.21"

cli/planoai/chatgpt_auth.py Normal file
View file

@@ -0,0 +1,290 @@
"""
ChatGPT subscription OAuth device-flow authentication.
Implements the device code flow used by OpenAI Codex CLI to authenticate
with a ChatGPT Plus/Pro subscription. Tokens are stored locally in
~/.plano/chatgpt/auth.json and auto-refreshed when expired.
"""
import base64
import json
import os
import time
from typing import Any, Dict, Optional, Tuple
import requests
from planoai.consts import PLANO_HOME
# OAuth + API constants (derived from openai/codex)
CHATGPT_AUTH_BASE = "https://auth.openai.com"
CHATGPT_DEVICE_CODE_URL = f"{CHATGPT_AUTH_BASE}/api/accounts/deviceauth/usercode"
CHATGPT_DEVICE_TOKEN_URL = f"{CHATGPT_AUTH_BASE}/api/accounts/deviceauth/token"
CHATGPT_OAUTH_TOKEN_URL = f"{CHATGPT_AUTH_BASE}/oauth/token"
CHATGPT_DEVICE_VERIFY_URL = f"{CHATGPT_AUTH_BASE}/codex/device"
CHATGPT_API_BASE = "https://chatgpt.com/backend-api/codex"
CHATGPT_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
# Local storage
CHATGPT_AUTH_DIR = os.path.join(PLANO_HOME, "chatgpt")
CHATGPT_AUTH_FILE = os.path.join(CHATGPT_AUTH_DIR, "auth.json")
# Timeouts
TOKEN_EXPIRY_SKEW_SECONDS = 60
DEVICE_CODE_TIMEOUT_SECONDS = 15 * 60
DEVICE_CODE_POLL_SECONDS = 5
def _ensure_auth_dir():
    os.makedirs(CHATGPT_AUTH_DIR, exist_ok=True)


def load_auth() -> Optional[Dict[str, Any]]:
    """Load auth data from disk."""
    try:
        with open(CHATGPT_AUTH_FILE, "r") as f:
            return json.load(f)
    except (IOError, json.JSONDecodeError):
        return None


def save_auth(data: Dict[str, Any]):
    """Save auth data to disk."""
    _ensure_auth_dir()
    fd = os.open(CHATGPT_AUTH_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)


def delete_auth():
    """Remove stored credentials."""
    try:
        os.remove(CHATGPT_AUTH_FILE)
    except FileNotFoundError:
        pass
def _decode_jwt_claims(token: str) -> Dict[str, Any]:
    """Decode JWT payload without verification."""
    try:
        parts = token.split(".")
        if len(parts) < 2:
            return {}
        payload_b64 = parts[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)
        return json.loads(base64.urlsafe_b64decode(payload_b64).decode("utf-8"))
    except Exception:
        return {}


def _get_expires_at(token: str) -> Optional[int]:
    """Extract expiration time from JWT."""
    claims = _decode_jwt_claims(token)
    exp = claims.get("exp")
    return int(exp) if isinstance(exp, (int, float)) else None


def _extract_account_id(token: Optional[str]) -> Optional[str]:
    """Extract ChatGPT account ID from JWT claims."""
    if not token:
        return None
    claims = _decode_jwt_claims(token)
    auth_claims = claims.get("https://api.openai.com/auth")
    if isinstance(auth_claims, dict):
        account_id = auth_claims.get("chatgpt_account_id")
        if isinstance(account_id, str) and account_id:
            return account_id
    return None


def _is_token_expired(auth_data: Dict[str, Any]) -> bool:
    """Check if the access token is expired."""
    expires_at = auth_data.get("expires_at")
    if expires_at is None:
        access_token = auth_data.get("access_token")
        if access_token:
            expires_at = _get_expires_at(access_token)
            if expires_at:
                auth_data["expires_at"] = expires_at
                save_auth(auth_data)
    if expires_at is None:
        return True
    return time.time() >= float(expires_at) - TOKEN_EXPIRY_SKEW_SECONDS
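The expiry helpers above hinge on one detail: JWT segments are base64url-encoded with the padding stripped, so `"=" * (-len % 4)` restores it before decoding. A self-contained sketch of that step, exercised with a synthesized, unsigned token (not a real ChatGPT token):

```python
import base64
import json
import time

def decode_jwt_claims(token: str) -> dict:
    # JWT payloads drop base64 padding; "=" * (-len % 4) restores it so
    # urlsafe_b64decode accepts the segment.
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a throwaway unsigned token just to exercise the decoder.
claims = {"exp": int(time.time()) + 3600}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = f"header.{payload}.signature"

assert decode_jwt_claims(token)["exp"] == claims["exp"]
```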
def _refresh_tokens(refresh_token: str) -> Dict[str, str]:
    """Refresh the access token using the refresh token."""
    resp = requests.post(
        CHATGPT_OAUTH_TOKEN_URL,
        json={
            "client_id": CHATGPT_CLIENT_ID,
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "scope": "openid profile email",
        },
    )
    resp.raise_for_status()
    data = resp.json()
    access_token = data.get("access_token")
    id_token = data.get("id_token")
    if not access_token or not id_token:
        raise RuntimeError(f"Refresh response missing fields: {data}")
    return {
        "access_token": access_token,
        "refresh_token": data.get("refresh_token", refresh_token),
        "id_token": id_token,
    }


def _build_auth_record(tokens: Dict[str, str]) -> Dict[str, Any]:
    """Build the auth record to persist."""
    access_token = tokens.get("access_token")
    id_token = tokens.get("id_token")
    expires_at = _get_expires_at(access_token) if access_token else None
    account_id = _extract_account_id(id_token or access_token)
    return {
        "access_token": access_token,
        "refresh_token": tokens.get("refresh_token"),
        "id_token": id_token,
        "expires_at": expires_at,
        "account_id": account_id,
    }
def request_device_code() -> Dict[str, str]:
    """Request a device code from OpenAI's device auth endpoint."""
    resp = requests.post(
        CHATGPT_DEVICE_CODE_URL,
        json={"client_id": CHATGPT_CLIENT_ID},
    )
    resp.raise_for_status()
    data = resp.json()
    device_auth_id = data.get("device_auth_id")
    user_code = data.get("user_code") or data.get("usercode")
    interval = data.get("interval")
    if not device_auth_id or not user_code:
        raise RuntimeError(f"Device code response missing fields: {data}")
    return {
        "device_auth_id": device_auth_id,
        "user_code": user_code,
        "interval": str(interval or "5"),
    }


def poll_for_authorization(device_code: Dict[str, str]) -> Dict[str, str]:
    """Poll until the user completes authorization. Returns code_data."""
    interval = int(device_code.get("interval", "5"))
    start_time = time.time()
    while time.time() - start_time < DEVICE_CODE_TIMEOUT_SECONDS:
        try:
            resp = requests.post(
                CHATGPT_DEVICE_TOKEN_URL,
                json={
                    "device_auth_id": device_code["device_auth_id"],
                    "user_code": device_code["user_code"],
                },
            )
            if resp.status_code == 200:
                data = resp.json()
                if all(
                    key in data
                    for key in ("authorization_code", "code_challenge", "code_verifier")
                ):
                    return data
            if resp.status_code in (403, 404):
                time.sleep(max(interval, DEVICE_CODE_POLL_SECONDS))
                continue
            resp.raise_for_status()
        except requests.HTTPError as exc:
            if exc.response is not None and exc.response.status_code in (403, 404):
                time.sleep(max(interval, DEVICE_CODE_POLL_SECONDS))
                continue
            raise RuntimeError(f"Polling failed: {exc}") from exc
        time.sleep(max(interval, DEVICE_CODE_POLL_SECONDS))
    raise RuntimeError("Timed out waiting for device authorization")
def exchange_code_for_tokens(code_data: Dict[str, str]) -> Dict[str, str]:
    """Exchange the authorization code for access/refresh/id tokens."""
    redirect_uri = f"{CHATGPT_AUTH_BASE}/deviceauth/callback"
    body = (
        "grant_type=authorization_code"
        f"&code={code_data['authorization_code']}"
        f"&redirect_uri={redirect_uri}"
        f"&client_id={CHATGPT_CLIENT_ID}"
        f"&code_verifier={code_data['code_verifier']}"
    )
    resp = requests.post(
        CHATGPT_OAUTH_TOKEN_URL,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        data=body,
    )
    resp.raise_for_status()
    data = resp.json()
    if not all(key in data for key in ("access_token", "refresh_token", "id_token")):
        raise RuntimeError(f"Token exchange response missing fields: {data}")
    return {
        "access_token": data["access_token"],
        "refresh_token": data["refresh_token"],
        "id_token": data["id_token"],
    }
def login() -> Dict[str, Any]:
    """Run the full device code login flow. Returns the auth record."""
    device_code = request_device_code()
    auth_record = _build_auth_record({})
    auth_record["device_code_requested_at"] = time.time()
    save_auth(auth_record)
    print(
        "\nSign in with your ChatGPT account:\n"
        f" 1) Visit: {CHATGPT_DEVICE_VERIFY_URL}\n"
        f" 2) Enter code: {device_code['user_code']}\n\n"
        "Device codes are a common phishing target. Never share this code.\n",
        flush=True,
    )
    code_data = poll_for_authorization(device_code)
    tokens = exchange_code_for_tokens(code_data)
    auth_record = _build_auth_record(tokens)
    save_auth(auth_record)
    return auth_record
def get_access_token() -> Tuple[str, Optional[str]]:
    """
    Get a valid access token and account ID.
    Refreshes automatically if expired. Raises if no auth data exists.
    Returns (access_token, account_id).
    """
    auth_data = load_auth()
    if not auth_data:
        raise RuntimeError(
            "No ChatGPT credentials found. Run 'planoai chatgpt login' first."
        )
    access_token = auth_data.get("access_token")
    if access_token and not _is_token_expired(auth_data):
        return access_token, auth_data.get("account_id")
    # Try refresh
    refresh_token = auth_data.get("refresh_token")
    if refresh_token:
        tokens = _refresh_tokens(refresh_token)
        auth_record = _build_auth_record(tokens)
        save_auth(auth_record)
        return auth_record["access_token"], auth_record.get("account_id")
    raise RuntimeError(
        "ChatGPT token expired and refresh failed. Run 'planoai chatgpt login' again."
    )

View file

@@ -0,0 +1,86 @@
"""
CLI commands for ChatGPT subscription management.
Usage:
planoai chatgpt login - Authenticate with ChatGPT via device code flow
planoai chatgpt status - Check authentication status
planoai chatgpt logout - Remove stored credentials
"""
import datetime

import click
from rich.console import Console

from planoai import chatgpt_auth

console = Console()


@click.group()
def chatgpt():
    """ChatGPT subscription management."""
    pass
@chatgpt.command()
def login():
    """Authenticate with your ChatGPT subscription using device code flow."""
    try:
        auth_record = chatgpt_auth.login()
        account_id = auth_record.get("account_id", "unknown")
        console.print(
            f"\n[green]Successfully authenticated with ChatGPT![/green]"
            f"\nAccount ID: {account_id}"
            f"\nCredentials saved to: {chatgpt_auth.CHATGPT_AUTH_FILE}"
        )
    except Exception as e:
        console.print(f"\n[red]Authentication failed:[/red] {e}")
        raise SystemExit(1)
@chatgpt.command()
def status():
    """Check ChatGPT authentication status."""
    auth_data = chatgpt_auth.load_auth()
    if not auth_data or not auth_data.get("access_token"):
        console.print(
            "[yellow]Not authenticated.[/yellow] Run 'planoai chatgpt login'."
        )
        return
    account_id = auth_data.get("account_id", "unknown")
    expires_at = auth_data.get("expires_at")
    if expires_at:
        expiry_time = datetime.datetime.fromtimestamp(
            expires_at, tz=datetime.timezone.utc
        )
        now = datetime.datetime.now(tz=datetime.timezone.utc)
        if expiry_time > now:
            remaining = expiry_time - now
            console.print(
                f"[green]Authenticated[/green]"
                f"\n Account ID: {account_id}"
                f"\n Token expires: {expiry_time.strftime('%Y-%m-%d %H:%M:%S UTC')}"
                f" ({remaining.seconds // 60}m remaining)"
            )
        else:
            console.print(
                f"[yellow]Token expired[/yellow]"
                f"\n Account ID: {account_id}"
                f"\n Expired at: {expiry_time.strftime('%Y-%m-%d %H:%M:%S UTC')}"
                f"\n Will auto-refresh on next use, or run 'planoai chatgpt login'."
            )
    else:
        console.print(
            f"[green]Authenticated[/green] (no expiry info)"
            f"\n Account ID: {account_id}"
        )
@chatgpt.command()
def logout():
    """Remove stored ChatGPT credentials."""
    chatgpt_auth.delete_auth()
    console.print("[green]ChatGPT credentials removed.[/green]")

View file

@@ -1,5 +1,6 @@
import json
import os
import uuid
from planoai.utils import convert_legacy_listeners
from jinja2 import Environment, FileSystemLoader
import yaml
@@ -28,9 +29,16 @@ SUPPORTED_PROVIDERS_WITHOUT_BASE_URL = [
    "xai",
    "moonshotai",
    "zhipu",
    "chatgpt",
    "digitalocean",
    "vercel",
    "openrouter",
]

CHATGPT_API_BASE = "https://chatgpt.com/backend-api/codex"
CHATGPT_DEFAULT_ORIGINATOR = "codex_cli_rs"
CHATGPT_DEFAULT_USER_AGENT = "codex_cli_rs/0.0.0 (Unknown 0; unknown) unknown"

SUPPORTED_PROVIDERS = (
    SUPPORTED_PROVIDERS_WITHOUT_BASE_URL + SUPPORTED_PROVIDERS_WITH_BASE_URL
)
@@ -50,6 +58,110 @@ def get_endpoint_and_port(endpoint, protocol):
    return endpoint, port
def migrate_inline_routing_preferences(config_yaml):
"""Lift v0.3.0-style inline ``routing_preferences`` under each
``model_providers`` entry to the v0.4.0 top-level ``routing_preferences``
list with ``models: [...]``.
This function is a no-op for configs whose ``version`` is already
``v0.4.0`` or newer those are assumed to be on the canonical
top-level shape and are passed through untouched.
For older configs, the version is bumped to ``v0.4.0`` up front so
brightstaff's v0.4.0 gate for top-level ``routing_preferences``
accepts the rendered config, then inline preferences under each
provider are lifted into the top-level list. Preferences with the
same ``name`` across multiple providers are merged into a single
top-level entry whose ``models`` list contains every provider's
full ``<provider>/<model>`` string in declaration order. The first
``description`` encountered wins; conflicts are warned, not errored,
so existing v0.3.0 configs keep compiling. Any top-level preference
already defined by the user is preserved as-is.
"""
current_version = str(config_yaml.get("version", ""))
if _version_tuple(current_version) >= (0, 4, 0):
return
config_yaml["version"] = "v0.4.0"
model_providers = config_yaml.get("model_providers") or []
if not model_providers:
return
migrated = {}
for model_provider in model_providers:
inline_prefs = model_provider.get("routing_preferences")
if not inline_prefs:
continue
full_model_name = model_provider.get("model")
if not full_model_name:
continue
if "/" in full_model_name and full_model_name.split("/")[-1].strip() == "*":
raise Exception(
f"Model {full_model_name} has routing_preferences but uses wildcard (*). Models with routing preferences cannot be wildcards."
)
for pref in inline_prefs:
name = pref.get("name")
description = pref.get("description", "")
if not name:
continue
if name in migrated:
entry = migrated[name]
if description and description != entry["description"]:
print(
f"WARNING: routing preference '{name}' has conflicting descriptions across providers; keeping the first one."
)
if full_model_name not in entry["models"]:
entry["models"].append(full_model_name)
else:
migrated[name] = {
"name": name,
"description": description,
"models": [full_model_name],
}
if not migrated:
return
for model_provider in model_providers:
if "routing_preferences" in model_provider:
del model_provider["routing_preferences"]
existing_top_level = config_yaml.get("routing_preferences") or []
existing_names = {entry.get("name") for entry in existing_top_level}
merged = list(existing_top_level)
for name, entry in migrated.items():
if name in existing_names:
continue
merged.append(entry)
config_yaml["routing_preferences"] = merged
print(
"WARNING: inline routing_preferences under model_providers is deprecated "
"and has been auto-migrated to top-level routing_preferences. Update your "
"config to v0.4.0 top-level form. See docs/routing-api.md"
)
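The lift-and-merge semantics above can be sketched with a minimal, self-contained re-implementation (`lift_inline_prefs` and the config values below are illustrative, not the shipped helper; version gating and user-defined top-level entries are omitted):

```python
def lift_inline_prefs(cfg):
    # Illustrative sketch only: lift inline routing_preferences under each
    # model_providers entry into a single top-level list, merging by name.
    cfg["version"] = "v0.4.0"
    migrated = {}
    for provider in cfg.get("model_providers", []):
        model = provider.get("model")
        for pref in provider.pop("routing_preferences", []):
            entry = migrated.setdefault(
                pref["name"],
                {"name": pref["name"],
                 "description": pref.get("description", ""),
                 "models": []},
            )
            if model not in entry["models"]:
                entry["models"].append(model)  # declaration order preserved
    if migrated:
        cfg["routing_preferences"] = list(migrated.values())
    return cfg


cfg = {
    "version": "v0.3.0",
    "model_providers": [
        {"model": "openai/gpt-4o",
         "routing_preferences": [{"name": "code", "description": "coding"}]},
        {"model": "anthropic/claude-3-5-sonnet",
         "routing_preferences": [{"name": "code"}]},
    ],
}
out = lift_inline_prefs(cfg)
```

Note how the first ``description`` encountered ("coding") wins and both providers' full model strings land under one merged entry.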
def _version_tuple(version_string):
stripped = version_string.strip().lstrip("vV")
if not stripped:
return (0, 0, 0)
parts = stripped.split("-", 1)[0].split(".")
out = []
for part in parts[:3]:
try:
out.append(int(part))
except ValueError:
out.append(0)
while len(out) < 3:
out.append(0)
return tuple(out)
def validate_and_render_schema():
ENVOY_CONFIG_TEMPLATE_FILE = os.getenv(
"ENVOY_CONFIG_TEMPLATE_FILE", "envoy.template.yaml"
@ -93,6 +205,8 @@ def validate_and_render_schema():
config_yaml["model_providers"] = config_yaml["llm_providers"]
del config_yaml["llm_providers"]
migrate_inline_routing_preferences(config_yaml)
listeners, llm_gateway, prompt_gateway = convert_legacy_listeners(
config_yaml.get("listeners"), config_yaml.get("model_providers")
)
@ -192,7 +306,16 @@ def validate_and_render_schema():
model_provider_name_set = set()
llms_with_usage = []
model_name_keys = set()
model_usage_name_keys = set()
top_level_preferences = config_yaml.get("routing_preferences") or []
seen_pref_names = set()
for pref in top_level_preferences:
pref_name = pref.get("name")
if pref_name in seen_pref_names:
raise Exception(
f'Duplicate routing preference name "{pref_name}", please provide unique name for each routing preference'
)
seen_pref_names.add(pref_name)
print("listeners: ", listeners)
@ -251,10 +374,6 @@ def validate_and_render_schema():
raise Exception(
f"Model {model_name} is configured as default but uses wildcard (*). Default models cannot be wildcards."
)
if model_provider.get("routing_preferences"):
raise Exception(
f"Model {model_name} has routing_preferences but uses wildcard (*). Models with routing preferences cannot be wildcards."
)
# Validate azure_openai and ollama provider requires base_url
if (provider in SUPPORTED_PROVIDERS_WITH_BASE_URL) and model_provider.get(
@ -303,13 +422,6 @@ def validate_and_render_schema():
)
model_name_keys.add(model_id)
for routing_preference in model_provider.get("routing_preferences", []):
if routing_preference.get("name") in model_usage_name_keys:
raise Exception(
f'Duplicate routing preference name "{routing_preference.get("name")}", please provide unique name for each routing preference'
)
model_usage_name_keys.add(routing_preference.get("name"))
# Warn if both passthrough_auth and access_key are configured
if model_provider.get("passthrough_auth") and model_provider.get(
"access_key"
@ -332,6 +444,25 @@ def validate_and_render_schema():
provider = model_provider["provider"]
model_provider["provider_interface"] = provider
del model_provider["provider"]
# Auto-wire ChatGPT provider: inject base_url, passthrough_auth, and extra headers
if provider == "chatgpt":
if not model_provider.get("base_url"):
model_provider["base_url"] = CHATGPT_API_BASE
if not model_provider.get("access_key") and not model_provider.get(
"passthrough_auth"
):
model_provider["passthrough_auth"] = True
headers = model_provider.get("headers", {})
headers.setdefault(
"ChatGPT-Account-Id",
os.environ.get("CHATGPT_ACCOUNT_ID", ""),
)
headers.setdefault("originator", CHATGPT_DEFAULT_ORIGINATOR)
headers.setdefault("user-agent", CHATGPT_DEFAULT_USER_AGENT)
headers.setdefault("session_id", str(uuid.uuid4()))
model_provider["headers"] = headers
updated_model_providers.append(model_provider)
if model_provider.get("base_url", None):
@ -378,7 +509,7 @@ def validate_and_render_schema():
router_model_id = (
router_model.split("/", 1)[1] if "/" in router_model else router_model
)
if len(model_usage_name_keys) > 0 and router_model_id not in model_name_set:
if len(seen_pref_names) > 0 and router_model_id not in model_name_set:
updated_model_providers.append(
{
"name": "plano-orchestrator",


@ -5,7 +5,7 @@ PLANO_COLOR = "#969FF4"
SERVICE_NAME_ARCHGW = "plano"
PLANO_DOCKER_NAME = "plano"
PLANO_DOCKER_IMAGE = os.getenv("PLANO_DOCKER_IMAGE", "katanemo/plano:0.4.20")
PLANO_DOCKER_IMAGE = os.getenv("PLANO_DOCKER_IMAGE", "katanemo/plano:0.4.21")
DEFAULT_OTEL_TRACING_GRPC_ENDPOINT = "http://localhost:4317"
# Native mode constants


@ -81,6 +81,21 @@ PROVIDER_DEFAULTS: list[ProviderDefault] = [
base_url="https://inference.do-ai.run/v1",
model_pattern="digitalocean/*",
),
ProviderDefault(
name="vercel",
env_var="AI_GATEWAY_API_KEY",
base_url="https://ai-gateway.vercel.sh/v1",
model_pattern="vercel/*",
),
# OpenRouter is a first-class provider — the `openrouter/` model prefix is
# accepted by the schema and brightstaff's ProviderId parser, so no
# provider_interface override is needed.
ProviderDefault(
name="openrouter",
env_var="OPENROUTER_API_KEY",
base_url="https://openrouter.ai/api/v1",
model_pattern="openrouter/*",
),
]


@ -37,6 +37,7 @@ from planoai.core import (
)
from planoai.init_cmd import init as init_cmd
from planoai.trace_cmd import trace as trace_cmd, start_trace_listener_background
from planoai.chatgpt_cmd import chatgpt as chatgpt_cmd
from planoai.obs_cmd import obs as obs_cmd
from planoai.consts import (
DEFAULT_OTEL_TRACING_GRPC_ENDPOINT,
@ -125,6 +126,28 @@ def _temporary_cli_log_level(level: str | None):
set_log_level(current_level)
def _inject_chatgpt_tokens_if_needed(config, env, console):
"""If config uses chatgpt providers, resolve tokens from ~/.plano/chatgpt/auth.json."""
providers = config.get("model_providers") or config.get("llm_providers") or []
has_chatgpt = any(str(p.get("model", "")).startswith("chatgpt/") for p in providers)
if not has_chatgpt:
return
try:
from planoai.chatgpt_auth import get_access_token
access_token, account_id = get_access_token()
env["CHATGPT_ACCESS_TOKEN"] = access_token
if account_id:
env["CHATGPT_ACCOUNT_ID"] = account_id
except Exception as e:
console.print(
f"\n[red]ChatGPT auth error:[/red] {e}\n"
f"[dim]Run 'planoai chatgpt login' to authenticate.[/dim]\n"
)
sys.exit(1)
def _print_missing_keys(console, missing_keys: list[str]) -> None:
console.print(f"\n[red]✗[/red] [red]Missing API keys![/red]\n")
for key in missing_keys:
@ -418,6 +441,14 @@ def up(
env = os.environ.copy()
env.pop("PATH", None)
import yaml
with open(plano_config_file, "r") as f:
plano_config = yaml.safe_load(f)
# Inject ChatGPT tokens from ~/.plano/chatgpt/auth.json if any provider needs them
_inject_chatgpt_tokens_if_needed(plano_config, env, console)
# Check access keys
access_keys = get_llm_provider_access_keys(plano_config_file=plano_config_file)
access_keys = set(access_keys)
@ -715,6 +746,7 @@ main.add_command(cli_agent)
main.add_command(generate_prompt_targets)
main.add_command(init_cmd, name="init")
main.add_command(trace_cmd, name="trace")
main.add_command(chatgpt_cmd, name="chatgpt")
main.add_command(obs_cmd, name="obs")
if __name__ == "__main__":


@ -253,6 +253,7 @@ def start_native(
log.info("Plano is running (native mode)")
for port in gateway_ports:
log.info(f" http://localhost:{port}")
break
# Check if processes are still alive
@ -367,8 +368,11 @@ def _kill_pid(pid):
pass
def stop_native():
"""Stop natively-running Envoy and brightstaff processes.
def stop_native(skip_pids: set | None = None):
"""Stop natively-running Envoy, brightstaff, and watchdog processes.
Args:
skip_pids: Set of PIDs to skip (used by the watchdog to avoid self-termination).
Returns:
bool: True if at least one process was running and received a stop signal,
@ -385,7 +389,12 @@ def stop_native():
brightstaff_pid = pids.get("brightstaff_pid")
had_running_process = False
for name, pid in [("envoy", envoy_pid), ("brightstaff", brightstaff_pid)]:
for name, pid in [
("envoy", envoy_pid),
("brightstaff", brightstaff_pid),
]:
if skip_pids and pid in skip_pids:
continue
if pid is None:
continue
try:


@ -7,6 +7,7 @@ Single-source: one fetch at startup, cached for the life of the process.
from __future__ import annotations
import logging
import re
import threading
from dataclasses import dataclass
from typing import Any
@ -123,13 +124,28 @@ class PricingCatalog:
return round(cost, 6)
_DATE_SUFFIX_RE = re.compile(r"-\d{8}$")
_PROVIDER_PREFIXES = ("anthropic", "openai", "google", "meta", "cohere", "mistral")
_ANTHROPIC_FAMILIES = {"opus", "sonnet", "haiku"}
def _model_key_candidates(model_name: str) -> list[str]:
"""Lookup-side variants of a Plano-emitted model name.
Plano resolves names like ``claude-haiku-4-5-20251001``; the catalog stores
them as ``anthropic-claude-haiku-4.5``. We strip the date suffix and the
``provider/`` prefix here; the catalog itself registers the dash/dot and
family-order aliases at parse time (see :func:`_expand_aliases`).
"""
base = model_name.strip()
out = [base]
if "/" in base:
out.append(base.split("/", 1)[1])
for k in list(out):
stripped = _DATE_SUFFIX_RE.sub("", k)
if stripped != k:
out.append(stripped)
out.extend([v.lower() for v in list(out)])
# Dedup while preserving order.
seen: set[str] = set()
uniq = []
for key in out:
@ -139,6 +155,54 @@ def _model_key_candidates(model_name: str) -> list[str]:
return uniq
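A simplified, self-contained sketch of the lookup-side candidate generation (the dedup tail mirrors the diff; `candidates` is a stand-in name for the private helper):

```python
import re

_DATE_SUFFIX_RE = re.compile(r"-\d{8}$")


def candidates(model_name: str) -> list[str]:
    # Sketch of _model_key_candidates: original, provider-stripped,
    # date-suffix-stripped, and lowercased variants, deduped in order.
    base = model_name.strip()
    out = [base]
    if "/" in base:
        out.append(base.split("/", 1)[1])
    for k in list(out):
        stripped = _DATE_SUFFIX_RE.sub("", k)
        if stripped != k:
            out.append(stripped)
    out.extend(v.lower() for v in list(out))
    seen: set[str] = set()
    uniq: list[str] = []
    for key in out:
        if key not in seen:
            seen.add(key)
            uniq.append(key)
    return uniq
```

The original name always stays first, so exact catalog hits win before any normalized fallback.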
def _expand_aliases(model_id: str) -> set[str]:
"""Catalog-side variants of a DO model id.
DO publishes Anthropic models under ids like ``anthropic-claude-opus-4.7``
or ``anthropic-claude-4.6-sonnet`` while Plano emits ``claude-opus-4-7`` /
``claude-sonnet-4-6``. Generate a set covering provider-prefix stripping,
dash/dot in version segments, and family/version word order so a single
catalog entry matches every name shape we'll see at lookup.
"""
aliases: set[str] = set()
def add(name: str) -> None:
if not name:
return
aliases.add(name)
aliases.add(name.lower())
add(model_id)
base = model_id
head, _, rest = base.partition("-")
if head.lower() in _PROVIDER_PREFIXES and rest:
add(rest)
base = rest
for key in list(aliases):
if "." in key:
add(key.replace(".", "-"))
parts = base.split("-")
if len(parts) >= 3 and parts[0].lower() == "claude":
rest_parts = parts[1:]
for i, p in enumerate(rest_parts):
if p.lower() in _ANTHROPIC_FAMILIES:
others = rest_parts[:i] + rest_parts[i + 1 :]
if not others:
break
family_last = "claude-" + "-".join(others) + "-" + p
family_first = "claude-" + p + "-" + "-".join(others)
add(family_last)
add(family_first)
add(family_last.replace(".", "-"))
add(family_first.replace(".", "-"))
break
return aliases
def _parse_do_pricing(data: Any) -> dict[str, ModelPrice]:
"""Parse DO catalog response into a ModelPrice map keyed by model id.
@ -204,11 +268,13 @@ def _parse_do_pricing(data: Any) -> dict[str, ModelPrice]:
# rates for promo/open-weight models.
if input_rate == 0 and output_rate == 0:
continue
prices[str(model_id)] = ModelPrice(
price = ModelPrice(
input_per_token_usd=input_rate,
output_per_token_usd=output_rate,
cached_input_per_token_usd=cached_rate,
)
for alias in _expand_aliases(str(model_id)):
prices.setdefault(alias, price)
return prices


@ -4,15 +4,18 @@ from __future__ import annotations
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from datetime import datetime
from http import HTTPStatus
from rich.box import SIMPLE
from rich.columns import Columns
from rich.align import Align
from rich.box import SIMPLE, SIMPLE_HEAVY
from rich.console import Group
from rich.panel import Panel
from rich.table import Table
from rich.text import Text
MAX_WIDTH = 160
from planoai.obs.collector import LLMCall
@ -24,6 +27,16 @@ class AggregateStats:
total_output_tokens: int
distinct_sessions: int
current_session: str | None
p50_latency_ms: float | None = None
p95_latency_ms: float | None = None
p99_latency_ms: float | None = None
p50_ttft_ms: float | None = None
p95_ttft_ms: float | None = None
p99_ttft_ms: float | None = None
error_count: int = 0
errors_4xx: int = 0
errors_5xx: int = 0
has_cost: bool = False
@dataclass
@ -35,10 +48,16 @@ class ModelRollup:
cache_write: int
cache_read: int
cost_usd: float
has_cost: bool = False
avg_tokens_per_sec: float | None = None
def _now() -> datetime:
return datetime.now(tz=timezone.utc).astimezone()
def _percentile(values: list[float], pct: float) -> float | None:
if not values:
return None
s = sorted(values)
k = max(0, min(len(s) - 1, int(round((pct / 100.0) * (len(s) - 1)))))
return s[k]
def aggregates(calls: list[LLMCall]) -> AggregateStats:
@ -49,6 +68,15 @@ def aggregates(calls: list[LLMCall]) -> AggregateStats:
current = next(
(c.session_id for c in reversed(calls) if c.session_id is not None), None
)
durations = [c.duration_ms for c in calls if c.duration_ms is not None]
ttfts = [c.ttft_ms for c in calls if c.ttft_ms is not None]
errors_4xx = sum(
1 for c in calls if c.status_code is not None and 400 <= c.status_code < 500
)
errors_5xx = sum(
1 for c in calls if c.status_code is not None and c.status_code >= 500
)
has_cost = any(c.cost_usd is not None for c in calls)
return AggregateStats(
count=len(calls),
total_cost_usd=total_cost,
@ -56,11 +84,22 @@ def aggregates(calls: list[LLMCall]) -> AggregateStats:
total_output_tokens=total_output,
distinct_sessions=len(session_ids),
current_session=current,
p50_latency_ms=_percentile(durations, 50),
p95_latency_ms=_percentile(durations, 95),
p99_latency_ms=_percentile(durations, 99),
p50_ttft_ms=_percentile(ttfts, 50),
p95_ttft_ms=_percentile(ttfts, 95),
p99_ttft_ms=_percentile(ttfts, 99),
error_count=errors_4xx + errors_5xx,
errors_4xx=errors_4xx,
errors_5xx=errors_5xx,
has_cost=has_cost,
)
def model_rollups(calls: list[LLMCall]) -> list[ModelRollup]:
buckets: dict[str, dict[str, float | int]] = {}
buckets: dict[str, dict[str, float | int | bool]] = {}
tps_samples: dict[str, list[float]] = {}
for c in calls:
key = c.model
b = buckets.setdefault(
@ -72,6 +111,7 @@ def model_rollups(calls: list[LLMCall]) -> list[ModelRollup]:
"cache_write": 0,
"cache_read": 0,
"cost": 0.0,
"has_cost": False,
},
)
b["requests"] = int(b["requests"]) + 1
@ -80,9 +120,16 @@ def model_rollups(calls: list[LLMCall]) -> list[ModelRollup]:
b["cache_write"] = int(b["cache_write"]) + int(c.cache_creation_tokens or 0)
b["cache_read"] = int(b["cache_read"]) + int(c.cached_input_tokens or 0)
b["cost"] = float(b["cost"]) + (c.cost_usd or 0.0)
if c.cost_usd is not None:
b["has_cost"] = True
tps = c.tokens_per_sec
if tps is not None:
tps_samples.setdefault(key, []).append(tps)
rollups: list[ModelRollup] = []
for model, b in buckets.items():
samples = tps_samples.get(model)
avg_tps = (sum(samples) / len(samples)) if samples else None
rollups.append(
ModelRollup(
model=model,
@ -92,34 +139,62 @@ def model_rollups(calls: list[LLMCall]) -> list[ModelRollup]:
cache_write=int(b["cache_write"]),
cache_read=int(b["cache_read"]),
cost_usd=float(b["cost"]),
has_cost=bool(b["has_cost"]),
avg_tokens_per_sec=avg_tps,
)
)
rollups.sort(key=lambda r: r.cost_usd, reverse=True)
rollups.sort(key=lambda r: (r.cost_usd, r.requests), reverse=True)
return rollups
def route_hits(calls: list[LLMCall]) -> list[tuple[str, int, float]]:
@dataclass
class RouteHit:
route: str
hits: int
pct: float
p95_latency_ms: float | None
error_count: int
def route_hits(calls: list[LLMCall]) -> list[RouteHit]:
counts: Counter[str] = Counter()
per_route_latency: dict[str, list[float]] = {}
per_route_errors: dict[str, int] = {}
for c in calls:
if c.route_name:
counts[c.route_name] += 1
if not c.route_name:
continue
counts[c.route_name] += 1
if c.duration_ms is not None:
per_route_latency.setdefault(c.route_name, []).append(c.duration_ms)
if c.status_code is not None and c.status_code >= 400:
per_route_errors[c.route_name] = per_route_errors.get(c.route_name, 0) + 1
total = sum(counts.values())
if total == 0:
return []
return [(r, n, (n / total) * 100.0) for r, n in counts.most_common()]
return [
RouteHit(
route=r,
hits=n,
pct=(n / total) * 100.0,
p95_latency_ms=_percentile(per_route_latency.get(r, []), 95),
error_count=per_route_errors.get(r, 0),
)
for r, n in counts.most_common()
]
def _fmt_cost(v: float | None) -> str:
def _fmt_cost(v: float | None, *, zero: str = "") -> str:
if v is None:
return ""
if v == 0:
return "$0"
# Adaptive precision so tiny costs ($3.8e-5) remain readable.
return zero
if abs(v) < 0.0001:
return f"${v:.8f}".rstrip("0").rstrip(".")
if abs(v) < 0.01:
return f"${v:.6f}".rstrip("0").rstrip(".")
return f"${v:.4f}"
if abs(v) < 1:
return f"${v:.4f}"
return f"${v:,.2f}"
def _fmt_ms(v: float | None) -> str:
@ -142,187 +217,418 @@ def _fmt_tokens(v: int | None) -> str:
return f"{v:,}"
def _request_panel(last: LLMCall | None) -> Panel:
def _fmt_tps(v: float | None) -> str:
if v is None or v <= 0:
return ""
if v >= 100:
return f"{v:.0f}/s"
return f"{v:.1f}/s"
def _latency_style(v: float | None) -> str:
if v is None:
return "dim"
if v < 500:
return "green"
if v < 2000:
return "yellow"
return "red"
def _ttft_style(v: float | None) -> str:
if v is None:
return "dim"
if v < 300:
return "green"
if v < 1000:
return "yellow"
return "red"
def _truncate_model(name: str, limit: int = 32) -> str:
if len(name) <= limit:
return name
return name[: limit - 1] + "…"
def _status_text(code: int | None) -> Text:
if code is None:
return Text("", style="dim")
if 200 <= code < 300:
return Text("● ok", style="green")
if 300 <= code < 400:
return Text(f"{code}", style="yellow")
if 400 <= code < 500:
return Text(f"{code}", style="yellow bold")
return Text(f"{code}", style="red bold")
def _summary_panel(last: LLMCall | None, stats: AggregateStats) -> Panel:
# Content-sized columns with a fixed gutter keep the two blocks close
# together instead of stretching across the full terminal on wide screens.
grid = Table.grid(padding=(0, 4))
grid.add_column(no_wrap=True)
grid.add_column(no_wrap=True)
# Left: latest request snapshot.
left = Table.grid(padding=(0, 1))
left.add_column(style="dim", no_wrap=True)
left.add_column(no_wrap=True)
if last is None:
body = Text("no requests yet", style="dim")
left.add_row("latest", Text("waiting for spans…", style="dim italic"))
else:
t = Table.grid(padding=(0, 1))
t.add_column(style="bold cyan")
t.add_column()
t.add_row("Endpoint", "chat/completions")
status = "" if last.status_code is None else str(last.status_code)
t.add_row("Status", status)
t.add_row("Model", last.model)
model_text = Text(_truncate_model(last.model, 48), style="bold cyan")
if last.is_streaming:
model_text.append(" ⟳ stream", style="dim")
left.add_row("model", model_text)
if last.request_model and last.request_model != last.model:
t.add_row("Req model", last.request_model)
left.add_row(
"requested", Text(_truncate_model(last.request_model, 48), style="cyan")
)
if last.route_name:
t.add_row("Route", last.route_name)
body = t
return Panel(body, title="[bold]Request[/]", border_style="cyan", box=SIMPLE)
def _cost_panel(last: LLMCall | None) -> Panel:
if last is None:
body = Text("", style="dim")
else:
t = Table.grid(padding=(0, 1))
t.add_column(style="bold green")
t.add_column()
t.add_row("Request", _fmt_cost(last.cost_usd))
t.add_row("Input", _fmt_tokens(last.prompt_tokens))
t.add_row("Output", _fmt_tokens(last.completion_tokens))
left.add_row("route", Text(last.route_name, style="yellow"))
left.add_row("status", _status_text(last.status_code))
tokens = Text()
tokens.append(_fmt_tokens(last.prompt_tokens))
tokens.append(" in", style="dim")
tokens.append(" · ", style="dim")
tokens.append(_fmt_tokens(last.completion_tokens), style="green")
tokens.append(" out", style="dim")
if last.cached_input_tokens:
t.add_row("Cached", _fmt_tokens(last.cached_input_tokens))
body = t
return Panel(body, title="[bold]Cost[/]", border_style="green", box=SIMPLE)
tokens.append(" · ", style="dim")
tokens.append(_fmt_tokens(last.cached_input_tokens), style="yellow")
tokens.append(" cached", style="dim")
left.add_row("tokens", tokens)
timing = Text()
timing.append("TTFT ", style="dim")
timing.append(_fmt_ms(last.ttft_ms), style=_ttft_style(last.ttft_ms))
timing.append(" · ", style="dim")
timing.append("lat ", style="dim")
timing.append(_fmt_ms(last.duration_ms), style=_latency_style(last.duration_ms))
tps = last.tokens_per_sec
if tps:
timing.append(" · ", style="dim")
timing.append(_fmt_tps(tps), style="green")
left.add_row("timing", timing)
left.add_row("cost", Text(_fmt_cost(last.cost_usd), style="green bold"))
# Right: lifetime totals.
right = Table.grid(padding=(0, 1))
right.add_column(style="dim", no_wrap=True)
right.add_column(no_wrap=True)
right.add_row(
"requests",
Text(f"{stats.count:,}", style="bold"),
)
if stats.error_count:
err_text = Text()
err_text.append(f"{stats.error_count:,}", style="red bold")
parts: list[str] = []
if stats.errors_4xx:
parts.append(f"{stats.errors_4xx} 4xx")
if stats.errors_5xx:
parts.append(f"{stats.errors_5xx} 5xx")
if parts:
err_text.append(f" ({' · '.join(parts)})", style="dim")
right.add_row("errors", err_text)
cost_str = _fmt_cost(stats.total_cost_usd) if stats.has_cost else ""
right.add_row("total cost", Text(cost_str, style="green bold"))
tokens_total = Text()
tokens_total.append(_fmt_tokens(stats.total_input_tokens))
tokens_total.append(" in", style="dim")
tokens_total.append(" · ", style="dim")
tokens_total.append(_fmt_tokens(stats.total_output_tokens), style="green")
tokens_total.append(" out", style="dim")
right.add_row("tokens", tokens_total)
lat_text = Text()
lat_text.append("p50 ", style="dim")
lat_text.append(
_fmt_ms(stats.p50_latency_ms), style=_latency_style(stats.p50_latency_ms)
)
lat_text.append(" · ", style="dim")
lat_text.append("p95 ", style="dim")
lat_text.append(
_fmt_ms(stats.p95_latency_ms), style=_latency_style(stats.p95_latency_ms)
)
lat_text.append(" · ", style="dim")
lat_text.append("p99 ", style="dim")
lat_text.append(
_fmt_ms(stats.p99_latency_ms), style=_latency_style(stats.p99_latency_ms)
)
right.add_row("latency", lat_text)
ttft_text = Text()
ttft_text.append("p50 ", style="dim")
ttft_text.append(_fmt_ms(stats.p50_ttft_ms), style=_ttft_style(stats.p50_ttft_ms))
ttft_text.append(" · ", style="dim")
ttft_text.append("p95 ", style="dim")
ttft_text.append(_fmt_ms(stats.p95_ttft_ms), style=_ttft_style(stats.p95_ttft_ms))
ttft_text.append(" · ", style="dim")
ttft_text.append("p99 ", style="dim")
ttft_text.append(_fmt_ms(stats.p99_ttft_ms), style=_ttft_style(stats.p99_ttft_ms))
right.add_row("TTFT", ttft_text)
sess = Text()
sess.append(f"{stats.distinct_sessions}")
if stats.current_session:
sess.append(" · current ", style="dim")
sess.append(stats.current_session, style="magenta")
right.add_row("sessions", sess)
def _totals_panel(stats: AggregateStats) -> Panel:
t = Table.grid(padding=(0, 1))
t.add_column(style="bold magenta")
t.add_column()
t.add_column(style="bold magenta")
t.add_column()
t.add_row(
"Total cost",
_fmt_cost(stats.total_cost_usd),
"Requests",
str(stats.count),
grid.add_row(left, right)
return Panel(
grid,
title="[bold]live LLM traffic[/]",
border_style="cyan",
box=SIMPLE_HEAVY,
padding=(0, 1),
)
t.add_row(
"Input",
_fmt_tokens(stats.total_input_tokens),
"Output",
_fmt_tokens(stats.total_output_tokens),
)
t.add_row(
"Sessions",
str(stats.distinct_sessions),
"Current session",
stats.current_session or "",
)
return Panel(t, title="[bold]Totals[/]", border_style="magenta", box=SIMPLE)
def _model_rollup_table(rollups: list[ModelRollup]) -> Table:
table = Table(
title="Totals by model",
title="by model",
title_justify="left",
title_style="bold dim",
caption="cost via DigitalOcean Gradient catalog",
caption_justify="left",
caption_style="dim italic",
box=SIMPLE,
header_style="bold",
expand=True,
pad_edge=False,
padding=(0, 1),
)
table.add_column("Model", style="cyan")
table.add_column("Req", justify="right")
table.add_column("Input", justify="right")
table.add_column("Output", justify="right", style="green")
table.add_column("Cache write", justify="right", style="yellow")
table.add_column("Cache read", justify="right", style="yellow")
table.add_column("Cost", justify="right", style="green")
table.add_column("model", style="cyan", no_wrap=True)
table.add_column("req", justify="right")
table.add_column("input", justify="right")
table.add_column("output", justify="right", style="green")
table.add_column("cache wr", justify="right", style="yellow")
table.add_column("cache rd", justify="right", style="yellow")
table.add_column("tok/s", justify="right")
table.add_column("cost", justify="right", style="green")
if not rollups:
table.add_row("", "", "", "", "", "", "")
for r in rollups:
table.add_row(
r.model,
str(r.requests),
Text("no requests yet", style="dim italic"),
*([""] * 7),
)
return table
for r in rollups:
cost_cell = _fmt_cost(r.cost_usd) if r.has_cost else ""
table.add_row(
_truncate_model(r.model),
f"{r.requests:,}",
_fmt_tokens(r.input_tokens),
_fmt_tokens(r.output_tokens),
_fmt_int(r.cache_write),
_fmt_int(r.cache_read),
_fmt_cost(r.cost_usd),
_fmt_tps(r.avg_tokens_per_sec),
cost_cell,
)
return table
def _route_hit_table(hits: list[tuple[str, int, float]]) -> Table:
def _route_hit_table(hits: list[RouteHit]) -> Table:
table = Table(
title="Route hit %",
title="route share",
title_justify="left",
title_style="bold dim",
box=SIMPLE,
header_style="bold",
expand=True,
pad_edge=False,
padding=(0, 1),
)
table.add_column("Route", style="cyan")
table.add_column("Hits", justify="right")
table.add_column("route", style="cyan")
table.add_column("hits", justify="right")
table.add_column("%", justify="right")
for route, n, pct in hits:
table.add_row(route, str(n), f"{pct:.1f}")
table.add_column("p95", justify="right")
table.add_column("err", justify="right")
for h in hits:
err_cell = (
Text(f"{h.error_count:,}", style="red bold") if h.error_count else ""
)
table.add_row(
h.route,
f"{h.hits:,}",
f"{h.pct:5.1f}%",
Text(_fmt_ms(h.p95_latency_ms), style=_latency_style(h.p95_latency_ms)),
err_cell,
)
return table
def _recent_table(calls: list[LLMCall], limit: int = 15) -> Table:
show_route = any(c.route_name for c in calls)
show_cache = any((c.cached_input_tokens or 0) > 0 for c in calls)
show_rsn = any((c.reasoning_tokens or 0) > 0 for c in calls)
caption_parts = ["in·new = fresh prompt tokens"]
if show_cache:
caption_parts.append("in·cache = cached read")
if show_rsn:
caption_parts.append("rsn = reasoning")
caption_parts.append("lat = total latency")
table = Table(
title="Recent requests",
title=f"recent · last {min(limit, len(calls)) if calls else 0}",
title_justify="left",
title_style="bold dim",
caption=" · ".join(caption_parts),
caption_justify="left",
caption_style="dim italic",
box=SIMPLE,
header_style="bold",
expand=True,
pad_edge=False,
padding=(0, 1),
)
table.add_column("time")
table.add_column("model", style="cyan")
table.add_column("time", no_wrap=True)
table.add_column("model", style="cyan", no_wrap=True)
if show_route:
table.add_column("route", style="yellow")
table.add_column("in", justify="right")
table.add_column("cache", justify="right", style="yellow")
table.add_column("route", style="yellow", no_wrap=True)
table.add_column("in·new", justify="right")
if show_cache:
table.add_column("in·cache", justify="right", style="yellow")
table.add_column("out", justify="right", style="green")
table.add_column("rsn", justify="right")
table.add_column("cost", justify="right", style="green")
if show_rsn:
table.add_column("rsn", justify="right")
table.add_column("tok/s", justify="right")
table.add_column("TTFT", justify="right")
table.add_column("lat", justify="right")
table.add_column("st")
table.add_column("cost", justify="right", style="green")
table.add_column("status")
if not calls:
cols = len(table.columns)
table.add_row(
Text("waiting for spans…", style="dim italic"),
*([""] * (cols - 1)),
)
return table
recent = list(reversed(calls))[:limit]
for c in recent:
status_cell = (
"ok"
if c.status_code and 200 <= c.status_code < 400
else str(c.status_code or "")
)
row = [
c.timestamp.strftime("%H:%M:%S"),
c.model,
for idx, c in enumerate(recent):
is_newest = idx == 0
time_style = "bold white" if is_newest else None
model_style = "bold cyan" if is_newest else "cyan"
row: list[object] = [
(
Text(c.timestamp.strftime("%H:%M:%S"), style=time_style)
if time_style
else c.timestamp.strftime("%H:%M:%S")
),
Text(_truncate_model(c.model), style=model_style),
]
if show_route:
row.append(c.route_name or "")
row.append(_fmt_tokens(c.prompt_tokens))
if show_cache:
row.append(_fmt_int(c.cached_input_tokens))
row.append(_fmt_tokens(c.completion_tokens))
if show_rsn:
row.append(_fmt_int(c.reasoning_tokens))
row.extend(
[
_fmt_tokens(c.prompt_tokens),
_fmt_int(c.cached_input_tokens),
_fmt_tokens(c.completion_tokens),
_fmt_int(c.reasoning_tokens),
_fmt_tps(c.tokens_per_sec),
Text(_fmt_ms(c.ttft_ms), style=_ttft_style(c.ttft_ms)),
Text(_fmt_ms(c.duration_ms), style=_latency_style(c.duration_ms)),
_fmt_cost(c.cost_usd),
_fmt_ms(c.ttft_ms),
_fmt_ms(c.duration_ms),
status_cell,
_status_text(c.status_code),
]
)
table.add_row(*row)
if not recent:
table.add_row(*(["no requests yet"] + [""] * (10 if show_route else 9)))
return table
def render(calls: list[LLMCall]) -> Group:
def _last_error(calls: list[LLMCall]) -> LLMCall | None:
for c in reversed(calls):
if c.status_code is not None and c.status_code >= 400:
return c
return None
def _http_reason(code: int) -> str:
try:
return HTTPStatus(code).phrase
except ValueError:
return ""
def _fmt_ago(ts: datetime) -> str:
# `ts` is produced in collector.py via datetime.now(tz=...), but fall back
# gracefully if a naive timestamp ever sneaks in.
now = datetime.now(tz=ts.tzinfo) if ts.tzinfo else datetime.now()
delta = (now - ts).total_seconds()
if delta < 0:
delta = 0
if delta < 60:
return f"{int(delta)}s ago"
if delta < 3600:
return f"{int(delta // 60)}m ago"
return f"{int(delta // 3600)}h ago"
def _error_banner(call: LLMCall) -> Panel:
code = call.status_code or 0
border = "red" if code >= 500 else "yellow"
header = Text()
header.append(f"{code}", style=f"{border} bold")
reason = _http_reason(code)
if reason:
header.append(f" {reason}", style=border)
header.append(" · ", style="dim")
header.append(_truncate_model(call.model, 48), style="cyan")
if call.route_name:
header.append(" · ", style="dim")
header.append(call.route_name, style="yellow")
header.append(" · ", style="dim")
header.append(_fmt_ago(call.timestamp), style="dim")
if call.request_id:
header.append(" · req ", style="dim")
header.append(call.request_id, style="magenta")
return Panel(
header,
title="[bold]last error[/]",
title_align="left",
border_style=border,
box=SIMPLE,
padding=(0, 1),
)
def _footer(stats: AggregateStats) -> Text:
waiting = stats.count == 0
text = Text()
text.append("Ctrl-C ", style="bold")
text.append("exit", style="dim")
text.append(" · OTLP :4317", style="dim")
text.append(" · pricing: DigitalOcean ", style="dim")
if waiting:
text.append("waiting for spans", style="yellow")
text.append(
" — set tracing.opentracing_grpc_endpoint=localhost:4317", style="dim"
)
else:
text.append(f"receiving · {stats.count:,} call(s) buffered", style="green")
return text
def render(calls: list[LLMCall]) -> Align:
last = calls[-1] if calls else None
stats = aggregates(calls)
rollups = model_rollups(calls)
hits = route_hits(calls)
header = Columns(
[_request_panel(last), _cost_panel(last), _totals_panel(stats)],
expand=True,
equal=True,
)
parts = [
header,
_model_rollup_table(rollups),
]
parts: list[object] = [_summary_panel(last, stats)]
err = _last_error(calls)
if err is not None:
parts.append(_error_banner(err))
if hits:
parts.append(_route_hit_table(hits))
split = Table.grid(padding=(0, 2))
split.add_column(no_wrap=False)
split.add_column(no_wrap=False)
split.add_row(_model_rollup_table(rollups), _route_hit_table(hits))
parts.append(split)
else:
parts.append(_model_rollup_table(rollups))
parts.append(_recent_table(calls))
parts.append(
Text(
"q quit · c clear · waiting for spans on OTLP :4317 — brightstaff needs "
"tracing.opentracing_grpc_endpoint=localhost:4317",
style="dim",
)
)
return Group(*parts)
parts.append(_footer(stats))
# Cap overall width so wide terminals don't stretch the layout into a
# mostly-whitespace gap between columns.
return Align.left(Group(*parts), width=MAX_WIDTH)

View file

@@ -91,7 +91,12 @@ def convert_legacy_listeners(
"type": "model",
"port": 12000,
"address": "0.0.0.0",
"timeout": "30s",
# LLM streaming responses routinely exceed 30s (extended thinking,
# long tool reasoning, large completions). Match the 300s ceiling
# used by the direct upstream-provider routes so Envoy doesn't
# abort streams mid-response with a UT (upstream request timeout).
# Users can override via the `listeners.timeout` field in their
# plano_config.yaml.
"timeout": "300s",
"model_providers": model_providers or [],
}
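As the comment above notes, the generated 300s default can be overridden per listener; a hypothetical `plano_config.yaml` fragment (listener fields mirror the configs elsewhere in this diff, the 120s value is illustrative):

```yaml
# Hypothetical override: keep the model listener on a shorter ceiling.
listeners:
  - name: llm
    type: model
    port: 12000
    timeout: 120s   # overrides the generated 300s default
```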
@@ -100,7 +105,7 @@ def convert_legacy_listeners(
"type": "prompt",
"port": 10000,
"address": "0.0.0.0",
"timeout": "30s",
"timeout": "300s",
}
# Handle None case

View file

@@ -1,6 +1,6 @@
[project]
name = "planoai"
version = "0.4.20"
version = "0.4.21"
description = "Python-based CLI tool to manage Plano."
authors = [{name = "Katanemo Labs, Inc."}]
readme = "README.md"

View file

@@ -1,7 +1,11 @@
import json
import pytest
import yaml
from unittest import mock
from planoai.config_generator import validate_and_render_schema
from planoai.config_generator import (
validate_and_render_schema,
migrate_inline_routing_preferences,
)
@pytest.fixture(autouse=True)
@@ -253,38 +257,72 @@ llm_providers:
base_url: "http://custom.com/api/v2"
provider_interface: openai
""",
},
{
"id": "vercel_is_supported_provider",
"expected_error": None,
"plano_config": """
version: v0.4.0
listeners:
- name: llm
type: model
port: 12000
model_providers:
- model: vercel/*
base_url: https://ai-gateway.vercel.sh/v1
passthrough_auth: true
""",
},
{
"id": "openrouter_is_supported_provider",
"expected_error": None,
"plano_config": """
version: v0.4.0
listeners:
- name: llm
type: model
port: 12000
model_providers:
- model: openrouter/*
base_url: https://openrouter.ai/api/v1
passthrough_auth: true
""",
},
{
"id": "duplicate_routing_preference_name",
"expected_error": "Duplicate routing preference name",
"plano_config": """
version: v0.1.0
version: v0.4.0
listeners:
egress_traffic:
address: 0.0.0.0
- name: llm
type: model
port: 12000
message_format: openai
timeout: 30s
llm_providers:
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
- model: openai/gpt-4.1
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code understanding
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
models:
- openai/gpt-4o
- name: code understanding
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- openai/gpt-4o-mini
tracing:
random_sampling: 100
@@ -465,3 +503,238 @@ def test_convert_legacy_llm_providers_no_prompt_gateway():
"port": 12000,
"timeout": "30s",
}
def test_inline_routing_preferences_migrated_to_top_level():
plano_config = """
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
"""
config_yaml = yaml.safe_load(plano_config)
migrate_inline_routing_preferences(config_yaml)
assert config_yaml["version"] == "v0.4.0"
for provider in config_yaml["model_providers"]:
assert "routing_preferences" not in provider
top_level = config_yaml["routing_preferences"]
by_name = {entry["name"]: entry for entry in top_level}
assert set(by_name) == {"code understanding", "code generation"}
assert by_name["code understanding"]["models"] == ["openai/gpt-4o"]
assert by_name["code generation"]["models"] == [
"anthropic/claude-sonnet-4-20250514"
]
assert (
by_name["code understanding"]["description"]
== "understand and explain existing code snippets, functions, or libraries"
)
def test_inline_same_name_across_providers_merges_models():
plano_config = """
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
"""
config_yaml = yaml.safe_load(plano_config)
migrate_inline_routing_preferences(config_yaml)
top_level = config_yaml["routing_preferences"]
assert len(top_level) == 1
entry = top_level[0]
assert entry["name"] == "code generation"
assert entry["models"] == [
"openai/gpt-4o",
"anthropic/claude-sonnet-4-20250514",
]
assert config_yaml["version"] == "v0.4.0"
def test_existing_top_level_routing_preferences_preserved():
plano_config = """
version: v0.4.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets or boilerplate
models:
- openai/gpt-4o
- anthropic/claude-sonnet-4-20250514
"""
config_yaml = yaml.safe_load(plano_config)
before = yaml.safe_dump(config_yaml, sort_keys=True)
migrate_inline_routing_preferences(config_yaml)
after = yaml.safe_dump(config_yaml, sort_keys=True)
assert before == after
def test_existing_top_level_wins_over_inline_migration():
plano_config = """
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code generation
description: inline description should lose
routing_preferences:
- name: code generation
description: user-defined top-level description wins
models:
- openai/gpt-4o
"""
config_yaml = yaml.safe_load(plano_config)
migrate_inline_routing_preferences(config_yaml)
top_level = config_yaml["routing_preferences"]
assert len(top_level) == 1
entry = top_level[0]
assert entry["description"] == "user-defined top-level description wins"
assert entry["models"] == ["openai/gpt-4o"]
def test_wildcard_with_inline_routing_preferences_errors():
plano_config = """
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openrouter/*
base_url: https://openrouter.ai/api/v1
passthrough_auth: true
routing_preferences:
- name: code generation
description: generating code
"""
config_yaml = yaml.safe_load(plano_config)
with pytest.raises(Exception) as excinfo:
migrate_inline_routing_preferences(config_yaml)
assert "wildcard" in str(excinfo.value).lower()
def test_migration_bumps_version_even_without_inline_preferences():
plano_config = """
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
"""
config_yaml = yaml.safe_load(plano_config)
migrate_inline_routing_preferences(config_yaml)
assert "routing_preferences" not in config_yaml
assert config_yaml["version"] == "v0.4.0"
def test_migration_is_noop_on_v040_config_with_stray_inline_preferences():
# v0.4.0 configs are assumed to be on the canonical top-level shape.
# The migration intentionally does not rescue stray inline preferences
# at v0.4.0+ so that the deprecation boundary is a clean version gate.
plano_config = """
version: v0.4.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code generation
description: generating new code
"""
config_yaml = yaml.safe_load(plano_config)
migrate_inline_routing_preferences(config_yaml)
assert config_yaml["version"] == "v0.4.0"
assert "routing_preferences" not in config_yaml
assert config_yaml["model_providers"][0]["routing_preferences"] == [
{"name": "code generation", "description": "generating new code"}
]
def test_migration_does_not_downgrade_newer_versions():
plano_config = """
version: v0.5.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
"""
config_yaml = yaml.safe_load(plano_config)
migrate_inline_routing_preferences(config_yaml)
assert config_yaml["version"] == "v0.5.0"
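`migrate_inline_routing_preferences` itself is not part of this diff; the following is a minimal sketch that satisfies the behaviors the tests above pin down (version gate, hoisting, merge-by-name, top-level-wins, wildcard rejection). The naive lexical version comparison and all internals are assumptions, not the shipped implementation:

```python
def migrate_inline_routing_preferences(config: dict) -> None:
    """Hoist provider-inline routing_preferences to the top level (sketch)."""
    # Version gate: v0.4.0+ configs are left untouched, and newer versions
    # are never downgraded. Lexical compare suffices for vX.Y.0 strings.
    if config.get("version", "") >= "v0.4.0":
        return
    providers = list(config.get("model_providers", []) or [])
    for listener in config.get("listeners", []) or []:
        if isinstance(listener, dict):
            providers.extend(listener.get("model_providers", []) or [])
    merged: dict[str, dict] = {}
    for provider in providers:
        prefs = provider.pop("routing_preferences", None)
        if not prefs:
            continue
        if "*" in provider.get("model", ""):
            raise ValueError(
                "wildcard providers cannot declare inline routing_preferences"
            )
        for pref in prefs:
            entry = merged.setdefault(
                pref["name"],
                {"name": pref["name"], "description": pref["description"], "models": []},
            )
            entry["models"].append(provider["model"])
    # A user-defined top-level list always wins over migrated inline entries.
    if merged and "routing_preferences" not in config:
        config["routing_preferences"] = list(merged.values())
    config["version"] = "v0.4.0"
```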

View file

@@ -28,6 +28,8 @@ def test_zero_env_vars_produces_pure_passthrough():
# All known providers should be listed.
names = {p["name"] for p in cfg["model_providers"]}
assert "digitalocean" in names
assert "vercel" in names
assert "openrouter" in names
assert "openai" in names
assert "anthropic" in names
@@ -84,3 +86,26 @@ def test_provider_defaults_digitalocean_is_configured():
assert by_name["digitalocean"].env_var == "DO_API_KEY"
assert by_name["digitalocean"].base_url == "https://inference.do-ai.run/v1"
assert by_name["digitalocean"].model_pattern == "digitalocean/*"
def test_provider_defaults_vercel_is_configured():
by_name = {p.name: p for p in PROVIDER_DEFAULTS}
assert "vercel" in by_name
assert by_name["vercel"].env_var == "AI_GATEWAY_API_KEY"
assert by_name["vercel"].base_url == "https://ai-gateway.vercel.sh/v1"
assert by_name["vercel"].model_pattern == "vercel/*"
def test_provider_defaults_openrouter_is_configured():
by_name = {p.name: p for p in PROVIDER_DEFAULTS}
assert "openrouter" in by_name
assert by_name["openrouter"].env_var == "OPENROUTER_API_KEY"
assert by_name["openrouter"].base_url == "https://openrouter.ai/api/v1"
assert by_name["openrouter"].model_pattern == "openrouter/*"
def test_openrouter_env_key_promotes_to_env_keyed():
cfg = synthesize_default_config(env={"OPENROUTER_API_KEY": "or-1"})
by_name = {p["name"]: p for p in cfg["model_providers"]}
assert by_name["openrouter"].get("access_key") == "$OPENROUTER_API_KEY"
assert by_name["openrouter"].get("passthrough_auth") is None

View file

@@ -83,6 +83,49 @@ def test_parse_do_catalog_treats_small_values_as_per_token():
assert prices["openai-gpt-oss-120b"].input_per_token_usd == 1e-7
def test_anthropic_aliases_match_plano_emitted_names():
"""DO publishes 'anthropic-claude-opus-4.7' and 'anthropic-claude-haiku-4.5';
Plano emits 'claude-opus-4-7' and 'claude-haiku-4-5-20251001'. Aliases
registered at parse time should bridge the gap."""
from planoai.obs.pricing import _parse_do_pricing
sample = {
"data": [
{
"model_id": "anthropic-claude-opus-4.7",
"pricing": {
"input_price_per_million": 15.0,
"output_price_per_million": 75.0,
},
},
{
"model_id": "anthropic-claude-haiku-4.5",
"pricing": {
"input_price_per_million": 1.0,
"output_price_per_million": 5.0,
},
},
{
"model_id": "anthropic-claude-4.6-sonnet",
"pricing": {
"input_price_per_million": 3.0,
"output_price_per_million": 15.0,
},
},
]
}
catalog = PricingCatalog(_parse_do_pricing(sample))
# Family-last shapes Plano emits.
assert catalog.price_for("claude-opus-4-7") is not None
assert catalog.price_for("claude-haiku-4-5") is not None
# Date-suffixed name (Anthropic API style).
assert catalog.price_for("claude-haiku-4-5-20251001") is not None
# Word-order swap: DO has 'claude-4.6-sonnet', Plano emits 'claude-sonnet-4-6'.
assert catalog.price_for("claude-sonnet-4-6") is not None
# Original DO ids still resolve.
assert catalog.price_for("anthropic-claude-opus-4.7") is not None
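The alias bridging this test describes can be sketched as a pure name-normalization step; the helper names and exact rules below are illustrative, inferred from the assertions rather than from the `_parse_do_pricing` internals:

```python
import re

def alias_candidates(do_model_id: str) -> set[str]:
    """Aliases to register for one DigitalOcean catalog id (sketch)."""
    base = do_model_id.removeprefix("anthropic-")
    dashed = base.replace(".", "-")  # claude-opus-4.7 -> claude-opus-4-7
    out = {do_model_id, base, dashed}
    # Word-order swap: DO's claude-4-6-sonnet vs Plano's claude-sonnet-4-6.
    m = re.fullmatch(r"claude-(\d+)-(\d+)-([a-z]+)", dashed)
    if m:
        out.add(f"claude-{m.group(3)}-{m.group(1)}-{m.group(2)}")
    return out

def strip_date_suffix(name: str) -> str:
    """Drop an Anthropic-style -YYYYMMDD suffix at lookup time (sketch)."""
    return re.sub(r"-\d{8}$", "", name)
```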
def test_parse_do_catalog_divides_large_values_as_per_million():
"""A provider that genuinely reports $5-per-million in that field gets divided."""
from planoai.obs.pricing import _parse_do_pricing

View file

@@ -94,10 +94,10 @@ def test_route_hits_only_for_routed_calls():
]
hits = route_hits(calls)
# Only calls with route names are counted.
assert sum(n for _, n, _ in hits) == 3
hits_by_name = {name: (n, pct) for name, n, pct in hits}
assert hits_by_name["code"][0] == 2
assert hits_by_name["summarization"][0] == 1
assert sum(h.hits for h in hits) == 3
hits_by_name = {h.route: h for h in hits}
assert hits_by_name["code"].hits == 2
assert hits_by_name["summarization"].hits == 1
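The updated assertions imply `route_hits` now returns named records rather than `(name, n, pct)` tuples; a plausible shape, with the field set inferred from the assertions (the real definition may differ or carry more fields):

```python
from typing import NamedTuple

class RouteHit(NamedTuple):
    # Inferred fields: .route and .hits appear in the new assertions;
    # the old third tuple slot was a percentage, kept here as .pct.
    route: str
    hits: int
    pct: float

hits = [RouteHit("code", 2, 2 / 3), RouteHit("summarization", 1, 1 / 3)]
```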
def test_route_hits_empty_when_no_routes():

cli/uv.lock generated
View file

@@ -337,7 +337,7 @@ wheels = [
[[package]]
name = "planoai"
version = "0.4.20"
version = "0.4.21"
source = { editable = "." }
dependencies = [
{ name = "click" },

View file

@@ -0,0 +1,541 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"description": "RED, LLM upstream, routing service, and process metrics for brightstaff. Pair with Envoy admin metrics from cluster=bright_staff.",
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 },
"id": 100,
"panels": [],
"title": "HTTP RED",
"type": "row"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisLabel": "req/s",
"drawStyle": "line",
"fillOpacity": 10,
"lineWidth": 1,
"showPoints": "never"
},
"unit": "reqps"
}
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 1 },
"id": 1,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (handler) (rate(brightstaff_http_requests_total[1m]))",
"legendFormat": "{{handler}}",
"refId": "A"
}
],
"title": "Rate — brightstaff RPS by handler",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "5xx fraction over 5m. Page-worthy when sustained above ~1%.",
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 0.01 },
{ "color": "red", "value": 0.05 }
]
},
"unit": "percentunit"
}
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 1 },
"id": 2,
"options": {
"colorMode": "background",
"graphMode": "area",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum(rate(brightstaff_http_requests_total{status_class=\"5xx\"}[5m])) / clamp_min(sum(rate(brightstaff_http_requests_total[5m])), 1)",
"legendFormat": "5xx rate",
"refId": "A"
}
],
"title": "Errors — brightstaff 5xx rate",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "p50/p95/p99 by handler, computed from histogram buckets over 5m.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 5, "lineWidth": 1, "showPoints": "never" },
"unit": "s"
}
},
"gridPos": { "h": 9, "w": 24, "x": 0, "y": 9 },
"id": 3,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "histogram_quantile(0.50, sum by (le, handler) (rate(brightstaff_http_request_duration_seconds_bucket[5m])))",
"legendFormat": "p50 {{handler}}",
"refId": "A"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "histogram_quantile(0.95, sum by (le, handler) (rate(brightstaff_http_request_duration_seconds_bucket[5m])))",
"legendFormat": "p95 {{handler}}",
"refId": "B"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "histogram_quantile(0.99, sum by (le, handler) (rate(brightstaff_http_request_duration_seconds_bucket[5m])))",
"legendFormat": "p99 {{handler}}",
"refId": "C"
}
],
"title": "Duration — p50 / p95 / p99 by handler",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "In-flight requests by handler. Climbs before latency does when brightstaff is saturated.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 10, "lineWidth": 1, "showPoints": "never" },
"unit": "short"
}
},
"gridPos": { "h": 8, "w": 24, "x": 0, "y": 18 },
"id": 4,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (handler) (brightstaff_http_in_flight_requests)",
"legendFormat": "{{handler}}",
"refId": "A"
}
],
"title": "In-flight requests by handler",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 26 },
"id": 200,
"panels": [],
"title": "LLM upstream",
"type": "row"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 5, "lineWidth": 1, "showPoints": "never" },
"unit": "s"
}
},
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 27 },
"id": 5,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "histogram_quantile(0.95, sum by (le, provider, model) (rate(brightstaff_llm_upstream_duration_seconds_bucket[5m])))",
"legendFormat": "p95 {{provider}}/{{model}}",
"refId": "A"
}
],
"title": "LLM upstream p95 by provider/model",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "All non-success error classes. timeout/connect = network, 5xx/429 = provider, parse = body shape mismatch, stream = mid-stream disconnect.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 30, "lineWidth": 1, "showPoints": "never", "stacking": { "mode": "normal" } },
"unit": "reqps"
}
},
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 27 },
"id": 6,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (provider, error_class) (rate(brightstaff_llm_upstream_requests_total{error_class!=\"none\"}[5m]))",
"legendFormat": "{{provider}} / {{error_class}}",
"refId": "A"
}
],
"title": "LLM upstream errors by provider / class",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "Streaming only. Empty if the route never streams.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 5, "lineWidth": 1, "showPoints": "never" },
"unit": "s"
}
},
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 36 },
"id": 7,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "histogram_quantile(0.95, sum by (le, provider, model) (rate(brightstaff_llm_time_to_first_token_seconds_bucket[5m])))",
"legendFormat": "p95 {{provider}}/{{model}}",
"refId": "A"
}
],
"title": "Time-to-first-token p95 (streaming)",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "Tokens/sec by provider/model/kind — proxy for cost. Stacked.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 30, "lineWidth": 1, "showPoints": "never", "stacking": { "mode": "normal" } },
"unit": "tokens/s"
}
},
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 36 },
"id": 8,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (provider, model, kind) (rate(brightstaff_llm_tokens_total[5m]))",
"legendFormat": "{{provider}}/{{model}} {{kind}}",
"refId": "A"
}
],
"title": "Token throughput by provider / model / kind",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 45 },
"id": 300,
"panels": [],
"title": "Routing service",
"type": "row"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "Which models the orchestrator picked over the last 15 minutes.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"unit": "short"
}
},
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 46 },
"id": 9,
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (selected_model) (increase(brightstaff_router_decisions_total[15m]))",
"legendFormat": "{{selected_model}}",
"refId": "A"
}
],
"title": "Model selection distribution (last 15m)",
"type": "bargauge"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "Fraction of decisions that fell back (orchestrator returned `none` or errored). High = router can't classify intent or no candidates configured.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 10, "lineWidth": 1, "showPoints": "never" },
"unit": "percentunit"
}
},
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 46 },
"id": 10,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (route) (rate(brightstaff_router_decisions_total{fallback=\"true\"}[5m])) / clamp_min(sum by (route) (rate(brightstaff_router_decisions_total[5m])), 1)",
"legendFormat": "{{route}}",
"refId": "A"
}
],
"title": "Fallback rate by route",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 5, "lineWidth": 1, "showPoints": "never" },
"unit": "s"
}
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 55 },
"id": 11,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "histogram_quantile(0.95, sum by (le, route) (rate(brightstaff_router_decision_duration_seconds_bucket[5m])))",
"legendFormat": "p95 {{route}}",
"refId": "A"
}
],
"title": "Router decision p95 latency",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "Hit / (hit + miss). Low ratio = sessions aren't being reused or TTL too short.",
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "red", "value": null },
{ "color": "yellow", "value": 0.5 },
{ "color": "green", "value": 0.8 }
]
},
"unit": "percentunit",
"min": 0,
"max": 1
}
},
"gridPos": { "h": 8, "w": 6, "x": 12, "y": 55 },
"id": 12,
"options": {
"colorMode": "background",
"graphMode": "area",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum(rate(brightstaff_session_cache_events_total{outcome=\"hit\"}[5m])) / clamp_min(sum(rate(brightstaff_session_cache_events_total{outcome=~\"hit|miss\"}[5m])), 1)",
"legendFormat": "hit rate",
"refId": "A"
}
],
"title": "Session cache hit rate",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "decision_served = a real model picked. no_candidates = sentinel `none` returned. policy_error = orchestrator failed.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 30, "lineWidth": 1, "showPoints": "never", "stacking": { "mode": "normal" } },
"unit": "reqps"
}
},
"gridPos": { "h": 8, "w": 6, "x": 18, "y": 55 },
"id": 13,
"options": {
"legend": { "displayMode": "list", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum by (outcome) (rate(brightstaff_routing_service_requests_total[5m]))",
"legendFormat": "{{outcome}}",
"refId": "A"
}
],
"title": "/routing/* outcomes",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 63 },
"id": 400,
"panels": [],
"title": "Process & Envoy link",
"type": "row"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"description": "Compare to brightstaff RPS (panel 1) — sustained gap = network or Envoy queueing.",
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 10, "lineWidth": 1, "showPoints": "never" },
"unit": "reqps"
}
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 64 },
"id": 14,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum(rate(envoy_cluster_upstream_rq_total{envoy_cluster_name=\"bright_staff\"}[1m]))",
"legendFormat": "envoy → bright_staff",
"refId": "A"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "sum(rate(brightstaff_http_requests_total[1m]))",
"legendFormat": "brightstaff served",
"refId": "B"
}
],
"title": "Envoy → brightstaff link health",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": { "drawStyle": "line", "fillOpacity": 10, "lineWidth": 1, "showPoints": "never" }
},
"overrides": [
{
"matcher": { "id": "byName", "options": "RSS" },
"properties": [{ "id": "unit", "value": "bytes" }]
},
{
"matcher": { "id": "byName", "options": "CPU" },
"properties": [{ "id": "unit", "value": "percentunit" }]
}
]
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 64 },
"id": 15,
"options": {
"legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "process_resident_memory_bytes{job=\"brightstaff\"}",
"legendFormat": "RSS",
"refId": "A"
},
{
"datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
"expr": "rate(process_cpu_seconds_total{job=\"brightstaff\"}[1m])",
"legendFormat": "CPU",
"refId": "B"
}
],
"title": "Brightstaff process RSS / CPU",
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 39,
"tags": ["plano", "brightstaff", "llm"],
"templating": {
"list": [
{
"name": "DS_PROMETHEUS",
"label": "Prometheus",
"type": "datasource",
"query": "prometheus",
"current": { "selected": false, "text": "Prometheus", "value": "DS_PROMETHEUS" },
"hide": 0,
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"includeAll": false,
"multi": false
}
]
},
"time": { "from": "now-1h", "to": "now" },
"timepicker": {},
"timezone": "browser",
"title": "Brightstaff (Plano dataplane)",
"uid": "brightstaff",
"version": 1,
"weekStart": ""
}

View file

@@ -0,0 +1,43 @@
# One-command Prometheus + Grafana stack for observing a locally-running
# Plano (Envoy admin :9901 + brightstaff :9092 on the host).
#
# cd config/grafana
# docker compose up -d
# open http://localhost:3000 (admin / admin)
#
# Grafana is preloaded with:
# - Prometheus datasource (uid=DS_PROMETHEUS) → http://prometheus:9090
# - Brightstaff dashboard (auto-imported from brightstaff_dashboard.json)
#
# Prometheus scrapes the host's :9092 and :9901 via host.docker.internal.
# On Linux this works because of the `extra_hosts: host-gateway` mapping
# below. On Mac/Win it works natively.
services:
prometheus:
image: prom/prometheus:latest
container_name: plano-prometheus
ports:
- "9090:9090"
volumes:
- ./prometheus_scrape.yaml:/etc/prometheus/prometheus.yml:ro
extra_hosts:
- "host.docker.internal:host-gateway"
restart: unless-stopped
grafana:
image: grafana/grafana:latest
container_name: plano-grafana
ports:
- "3000:3000"
environment:
GF_SECURITY_ADMIN_USER: admin
GF_SECURITY_ADMIN_PASSWORD: admin
GF_AUTH_ANONYMOUS_ENABLED: "true"
GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer
volumes:
- ./provisioning:/etc/grafana/provisioning:ro
- ./brightstaff_dashboard.json:/var/lib/grafana/dashboards/brightstaff_dashboard.json:ro
depends_on:
- prometheus
restart: unless-stopped

View file

@@ -0,0 +1,44 @@
# Prometheus config that scrapes Plano (Envoy admin + brightstaff). This is
# a complete Prometheus config — mount it directly at
# /etc/prometheus/prometheus.yml. The included docker-compose.yaml does this
# for you.
#
# Targets:
# - envoy:9901 Envoy admin → envoy_cluster_*, envoy_http_*, envoy_server_*.
# - brightstaff:9092 Native dataplane → brightstaff_http_*, brightstaff_llm_*,
# brightstaff_router_*, process_*.
#
# Hostname `host.docker.internal` works on Docker Desktop (Mac/Win) and on
# Linux when the container is started with `--add-host=host.docker.internal:
# host-gateway` (the included compose does this). If Plano runs *inside*
# Docker on the same network as Prometheus, replace it with the container
# name (e.g. `plano:9092`).
#
# This file is unrelated to demos/llm_routing/model_routing_service/prometheus.yaml,
# which scrapes a fake metrics service to feed the routing engine.
global:
scrape_interval: 15s
scrape_timeout: 10s
evaluation_interval: 15s
scrape_configs:
- job_name: envoy
honor_timestamps: true
metrics_path: /stats
params:
format: ["prometheus"]
static_configs:
- targets:
- host.docker.internal:9901
labels:
service: plano
- job_name: brightstaff
honor_timestamps: true
metrics_path: /metrics
static_configs:
- targets:
- host.docker.internal:9092
labels:
service: plano

View file

@@ -0,0 +1,15 @@
# Auto-load the brightstaff dashboard JSON on Grafana startup.
apiVersion: 1
providers:
  - name: brightstaff
    orgId: 1
    folder: Plano
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false


@ -0,0 +1,14 @@
# Auto-provision the Prometheus datasource so the bundled dashboard wires up
# without any clicks. The `uid: DS_PROMETHEUS` matches the templated input in
# brightstaff_dashboard.json.
apiVersion: 1
datasources:
  - name: Prometheus
    uid: DS_PROMETHEUS
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true


@ -190,9 +190,18 @@ properties:
          - openai
          - xiaomi
          - gemini
          - chatgpt
          - digitalocean
          - vercel
          - openrouter
      headers:
        type: object
        additionalProperties:
          type: string
        description: "Additional headers to send with upstream requests (e.g., ChatGPT-Account-Id, originator)."
      routing_preferences:
        type: array
        description: "[DEPRECATED] Inline routing_preferences under a model_provider are auto-migrated to the top-level routing_preferences list by the config generator. New configs should declare routing_preferences at the top level with an explicit models: [...] list. See docs/routing-api.md."
        items:
          type: object
          properties:
@ -239,9 +248,18 @@ properties:
          - openai
          - xiaomi
          - gemini
          - chatgpt
          - digitalocean
          - vercel
          - openrouter
      headers:
        type: object
        additionalProperties:
          type: string
        description: "Additional headers to send with upstream requests (e.g., ChatGPT-Account-Id, originator)."
      routing_preferences:
        type: array
        description: "[DEPRECATED] Inline routing_preferences under an llm_provider are auto-migrated to the top-level routing_preferences list by the config generator. New configs should declare routing_preferences at the top level with an explicit models: [...] list. See docs/routing-api.md."
        items:
          type: object
          properties:
@ -278,6 +296,9 @@ properties:
      type: boolean
    use_agent_orchestrator:
      type: boolean
    disable_signals:
      type: boolean
      description: "Disable agentic signal analysis (frustration, repetition, escalation, etc.) on LLM responses to save CPU. Default false."
    upstream_connect_timeout:
      type: string
      description: "Connect timeout for upstream provider clusters (e.g., '5s', '10s'). Default is '5s'."
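The schema additions above (the `headers` map, the new provider enum values, and the `disable_signals` override) would be exercised by a config fragment along these lines. This is a hypothetical sketch: the provider name and header value are illustrative, and key placement is inferred from the schema rather than copied from a real plano config.

```yaml
llm_providers:
  - name: chatgpt-main              # illustrative provider entry
    provider: chatgpt
    headers:
      ChatGPT-Account-Id: acct-123  # example value for the new headers map

overrides:
  disable_signals: true   # skip agentic signal analysis to save CPU (default false)
```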

crates/Cargo.lock generated

@ -23,6 +23,18 @@ version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8fd72866655d1904d6b0997d0b07ba561047d070fbe29de039031c641b61217"
[[package]]
name = "ahash"
version = "0.8.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a15f179cd60c4584b8a8c596927aadc462e27f2ca70c04e0071964a73ba7a75"
dependencies = [
"cfg-if",
"once_cell",
"version_check",
"zerocopy",
]
[[package]]
name = "aho-corasick"
version = "1.1.4"
@ -257,6 +269,24 @@ dependencies = [
"vsimd",
]
[[package]]
name = "bindgen"
version = "0.72.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "993776b509cfb49c750f11b8f07a46fa23e0a1386ffc01fb1e7d343efc387895"
dependencies = [
"bitflags",
"cexpr",
"clang-sys",
"itertools 0.13.0",
"proc-macro2",
"quote",
"regex",
"rustc-hash 2.1.2",
"shlex",
"syn 2.0.117",
]
[[package]]
name = "bit-set"
version = "0.5.3"
@ -316,6 +346,9 @@ dependencies = [
"hyper 1.9.0",
"hyper-util",
"lru",
"metrics 0.23.1",
"metrics-exporter-prometheus",
"metrics-process",
"mockito",
"opentelemetry",
"opentelemetry-http",
@ -325,6 +358,7 @@ dependencies = [
"pretty_assertions",
"rand 0.9.4",
"redis",
"regex",
"reqwest",
"serde",
"serde_json",
@ -332,6 +366,8 @@ dependencies = [
"serde_yaml",
"strsim",
"thiserror 2.0.18",
"tikv-jemalloc-ctl",
"tikv-jemallocator",
"time",
"tokio",
"tokio-postgres",
@ -391,6 +427,15 @@ dependencies = [
"shlex",
]
[[package]]
name = "cexpr"
version = "0.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6fac387a98bb7c37292057cffc56d62ecb629900026402633ae9160df93a8766"
dependencies = [
"nom",
]
[[package]]
name = "cfg-if"
version = "1.0.4"
@ -428,6 +473,17 @@ dependencies = [
"windows-link",
]
[[package]]
name = "clang-sys"
version = "1.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b023947811758c97c59bf9d1c188fd619ad4718dcaa767947df1cadb14f39f4"
dependencies = [
"glob",
"libc",
"libloading",
]
[[package]]
name = "cmov"
version = "0.5.3"
@ -574,6 +630,21 @@ dependencies = [
"cfg-if",
]
[[package]]
name = "crossbeam-epoch"
version = "0.9.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e"
dependencies = [
"crossbeam-utils",
]
[[package]]
name = "crossbeam-utils"
version = "0.8.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
[[package]]
name = "crypto-common"
version = "0.1.7"
@ -1070,6 +1141,12 @@ dependencies = [
"wasip3",
]
[[package]]
name = "glob"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280"
[[package]]
name = "governor"
version = "0.6.3"
@ -1128,7 +1205,7 @@ version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e91b62f79061a0bc2e046024cb7ba44b08419ed238ecbd9adbd787434b9e8c25"
dependencies = [
"ahash",
"ahash 0.3.8",
"autocfg",
]
@ -1138,6 +1215,15 @@ version = "0.12.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a9ee70c43aaf417c914396645a0fa852624801b24ebb7ae78fe8272889ac888"
[[package]]
name = "hashbrown"
version = "0.14.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e5274423e17b7c9fc20b6e7e208532f9b19825d82dfd615708b70edd83df41f1"
dependencies = [
"ahash 0.8.12",
]
[[package]]
name = "hashbrown"
version = "0.15.5"
@ -1189,6 +1275,12 @@ dependencies = [
"uuid",
]
[[package]]
name = "hermit-abi"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fc0fef456e4baa96da950455cd02c081ca953b141298e41db3fc7e36b1da849c"
[[package]]
name = "hex"
version = "0.4.3"
@ -1665,6 +1757,27 @@ version = "0.2.185"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52ff2c0fe9bc6cb6b14a0592c2ff4fa9ceb83eea9db979b0487cd054946a2b8f"
[[package]]
name = "libloading"
version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d7c4b02199fee7c5d21a5ae7d8cfa79a6ef5bb2fc834d6e9058e89c825efdc55"
dependencies = [
"cfg-if",
"windows-link",
]
[[package]]
name = "libproc"
version = "0.14.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a54ad7278b8bc5301d5ffd2a94251c004feb971feba96c971ea4063645990757"
dependencies = [
"bindgen",
"errno",
"libc",
]
[[package]]
name = "libredox"
version = "0.1.16"
@ -1745,6 +1858,12 @@ version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "112b39cec0b298b6c1999fee3e31427f74f676e4cb9879ed1a121b43661a4154"
[[package]]
name = "mach2"
version = "0.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dae608c151f68243f2b000364e1f7b186d9c29845f7d2d85bd31b9ad77ad552b"
[[package]]
name = "matchers"
version = "0.2.0"
@ -1782,6 +1901,77 @@ version = "2.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79"
[[package]]
name = "metrics"
version = "0.23.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3045b4193fbdc5b5681f32f11070da9be3609f189a79f3390706d42587f46bb5"
dependencies = [
"ahash 0.8.12",
"portable-atomic",
]
[[package]]
name = "metrics"
version = "0.24.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d5312e9ba3771cfa961b585728215e3d972c950a3eed9252aa093d6301277e8"
dependencies = [
"ahash 0.8.12",
"portable-atomic",
]
[[package]]
name = "metrics-exporter-prometheus"
version = "0.15.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b4f0c8427b39666bf970460908b213ec09b3b350f20c0c2eabcbba51704a08e6"
dependencies = [
"base64 0.22.1",
"http-body-util",
"hyper 1.9.0",
"hyper-util",
"indexmap 2.14.0",
"ipnet",
"metrics 0.23.1",
"metrics-util",
"quanta",
"thiserror 1.0.69",
"tokio",
"tracing",
]
[[package]]
name = "metrics-process"
version = "2.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4268d87f64a752f5a651314fc683f04da10be65701ea3e721ba4d74f79163cac"
dependencies = [
"libc",
"libproc",
"mach2",
"metrics 0.24.3",
"once_cell",
"procfs",
"rlimit",
"windows",
]
[[package]]
name = "metrics-util"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4259040465c955f9f2f1a4a8a16dc46726169bca0f88e8fb2dbeced487c3e828"
dependencies = [
"crossbeam-epoch",
"crossbeam-utils",
"hashbrown 0.14.5",
"metrics 0.23.1",
"num_cpus",
"quanta",
"sketches-ddsketch",
]
[[package]]
name = "mime"
version = "0.3.17"
@ -1935,6 +2125,16 @@ dependencies = [
"autocfg",
]
[[package]]
name = "num_cpus"
version = "1.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "91df4bbde75afed763b708b7eee1e8e7651e02d97f6d5dd763e89367e957b23b"
dependencies = [
"hermit-abi",
"libc",
]
[[package]]
name = "objc2-core-foundation"
version = "0.3.2"
@ -2125,6 +2325,12 @@ dependencies = [
"windows-link",
]
[[package]]
name = "paste"
version = "1.0.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a"
[[package]]
name = "percent-encoding"
version = "2.3.2"
@ -2278,6 +2484,27 @@ dependencies = [
"unicode-ident",
]
[[package]]
name = "procfs"
version = "0.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "25485360a54d6861439d60facef26de713b1e126bf015ec8f98239467a2b82f7"
dependencies = [
"bitflags",
"procfs-core",
"rustix",
]
[[package]]
name = "procfs-core"
version = "0.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e6401bf7b6af22f78b563665d15a22e9aef27775b79b149a66ca022468a4e405"
dependencies = [
"bitflags",
"hex",
]
[[package]]
name = "prompt_gateway"
version = "0.1.0"
@ -2333,6 +2560,21 @@ dependencies = [
"log",
]
[[package]]
name = "quanta"
version = "0.12.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f3ab5a9d756f0d97bdc89019bd2e4ea098cf9cde50ee7564dde6b81ccc8f06c7"
dependencies = [
"crossbeam-utils",
"libc",
"once_cell",
"raw-cpuid",
"wasi 0.11.1+wasi-snapshot-preview1",
"web-sys",
"winapi",
]
[[package]]
name = "quinn"
version = "0.11.9"
@ -2485,6 +2727,15 @@ version = "0.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "63b8176103e19a2643978565ca18b50549f6101881c443590420e4dc998a3c69"
[[package]]
name = "raw-cpuid"
version = "11.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "498cd0dc59d73224351ee52a95fee0f1a617a2eae0e7d9d720cc622c73a54186"
dependencies = [
"bitflags",
]
[[package]]
name = "redis"
version = "0.27.6"
@ -2646,6 +2897,15 @@ dependencies = [
"windows-sys 0.52.0",
]
[[package]]
name = "rlimit"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f35ee2729c56bb610f6dba436bf78135f728b7373bdffae2ec815b2d3eb98cc3"
dependencies = [
"libc",
]
[[package]]
name = "rustc-hash"
version = "1.1.0"
@ -3098,6 +3358,12 @@ version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b2aa850e253778c88a04c3d7323b043aeda9d3e30d5971937c1855769763678e"
[[package]]
name = "sketches-ddsketch"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85636c14b73d81f541e525f585c0a2109e6744e1565b5c1668e31c70c10ed65c"
[[package]]
name = "slab"
version = "0.4.12"
@ -3308,6 +3574,37 @@ dependencies = [
"rustc-hash 1.1.0",
]
[[package]]
name = "tikv-jemalloc-ctl"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "661f1f6a57b3a36dc9174a2c10f19513b4866816e13425d3e418b11cc37bc24c"
dependencies = [
"libc",
"paste",
"tikv-jemalloc-sys",
]
[[package]]
name = "tikv-jemalloc-sys"
version = "0.6.1+5.3.0-1-ge13ca993e8ccb9ba9847cc330696e02839f328f7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cd8aa5b2ab86a2cefa406d889139c162cbb230092f7d1d7cbc1716405d852a3b"
dependencies = [
"cc",
"libc",
]
[[package]]
name = "tikv-jemallocator"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0359b4327f954e0567e69fb191cf1436617748813819c94b8cd4a431422d053a"
dependencies = [
"libc",
"tikv-jemalloc-sys",
]
[[package]]
name = "time"
version = "0.3.47"
@ -4003,6 +4300,49 @@ dependencies = [
"web-sys",
]
[[package]]
name = "winapi"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
dependencies = [
"winapi-i686-pc-windows-gnu",
"winapi-x86_64-pc-windows-gnu",
]
[[package]]
name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
[[package]]
name = "windows"
version = "0.62.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "527fadee13e0c05939a6a05d5bd6eec6cd2e3dbd648b9f8e447c6518133d8580"
dependencies = [
"windows-collections",
"windows-core",
"windows-future",
"windows-numerics",
]
[[package]]
name = "windows-collections"
version = "0.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "23b2d95af1a8a14a3c7367e1ed4fc9c20e0a26e79551b1454d72583c97cc6610"
dependencies = [
"windows-core",
]
[[package]]
name = "windows-core"
version = "0.62.2"
@ -4016,6 +4356,17 @@ dependencies = [
"windows-strings",
]
[[package]]
name = "windows-future"
version = "0.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e1d6f90251fe18a279739e78025bd6ddc52a7e22f921070ccdc67dde84c605cb"
dependencies = [
"windows-core",
"windows-link",
"windows-threading",
]
[[package]]
name = "windows-implement"
version = "0.60.2"
@ -4044,6 +4395,16 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"
[[package]]
name = "windows-numerics"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6e2e40844ac143cdb44aead537bbf727de9b044e107a0f1220392177d15b0f26"
dependencies = [
"windows-core",
"windows-link",
]
[[package]]
name = "windows-registry"
version = "0.6.1"
@ -4133,6 +4494,15 @@ dependencies = [
"windows_x86_64_msvc 0.53.1",
]
[[package]]
name = "windows-threading"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3949bd5b99cafdf1c7ca86b43ca564028dfe27d66958f2470940f73d86d75b37"
dependencies = [
"windows-link",
]
[[package]]
name = "windows_aarch64_gnullvm"
version = "0.52.6"


@ -3,6 +3,18 @@ name = "brightstaff"
version = "0.1.0"
edition = "2021"

[features]
default = ["jemalloc"]
jemalloc = ["tikv-jemallocator", "tikv-jemalloc-ctl"]

[[bin]]
name = "brightstaff"
path = "src/main.rs"

[[bin]]
name = "signals_replay"
path = "src/bin/signals_replay.rs"

[dependencies]
async-openai = "0.30.1"
async-trait = "0.1"
@ -26,7 +38,11 @@ opentelemetry-stdout = "0.31"
opentelemetry_sdk = { version = "0.31", features = ["rt-tokio"] }
pretty_assertions = "1.4.1"
rand = "0.9.2"
regex = "1.10"
lru = "0.12"
metrics = "0.23"
metrics-exporter-prometheus = { version = "0.15", default-features = false, features = ["http-listener"] }
metrics-process = "2.1"
redis = { version = "0.27", features = ["tokio-comp"] }
reqwest = { version = "0.12.15", features = ["stream"] }
serde = { version = "1.0.219", features = ["derive"] }
@ -35,6 +51,8 @@ serde_with = "3.13.0"
strsim = "0.11"
serde_yaml = "0.9.34"
thiserror = "2.0.12"
tikv-jemallocator = { version = "0.6", optional = true }
tikv-jemalloc-ctl = { version = "0.6", features = ["stats"], optional = true }
tokio = { version = "1.44.2", features = ["full"] }
tokio-postgres = { version = "0.7", features = ["with-serde_json-1"] }
tokio-stream = "0.1"


@ -24,4 +24,7 @@ pub struct AppState {
    /// Shared HTTP client for upstream LLM requests (connection pooling / keep-alive).
    pub http_client: reqwest::Client,
    pub filter_pipeline: Arc<FilterPipeline>,
    /// When false, agentic signal analysis is skipped on LLM responses to save CPU.
    /// Controlled by `overrides.disable_signals` in plano config.
    pub signals_enabled: bool,
}


@ -0,0 +1,175 @@
//! `signals-replay` — batch driver for the `brightstaff` signal analyzer.
//!
//! Reads JSONL conversations from stdin (one per line) and emits matching
//! JSONL reports on stdout, one per input conversation, in the same order.
//!
//! Input shape (per line):
//! ```json
//! {"id": "convo-42", "messages": [{"from": "human", "value": "..."}, ...]}
//! ```
//!
//! Output shape (per line, success):
//! ```json
//! {"id": "convo-42", "report": { ...python-compatible SignalReport dict... }}
//! ```
//!
//! On per-line failure (parse / analyzer error), emits:
//! ```json
//! {"id": "convo-42", "error": "..."}
//! ```
//!
//! The output report dict is shaped to match the Python reference's
//! `SignalReport.to_dict()` byte-for-byte so the parity comparator can do a
//! direct structural diff.
use std::io::{self, BufRead, BufWriter, Write};

use serde::Deserialize;
use serde_json::{json, Map, Value};

use brightstaff::signals::{SignalAnalyzer, SignalGroup, SignalReport};

#[derive(Debug, Deserialize)]
struct InputLine {
    id: Value,
    messages: Vec<MessageRow>,
}

#[derive(Debug, Deserialize)]
struct MessageRow {
    #[serde(default)]
    from: String,
    #[serde(default)]
    value: String,
}

fn main() {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    let analyzer = SignalAnalyzer::default();

    for line in stdin.lock().lines() {
        let line = match line {
            Ok(l) => l,
            Err(e) => {
                eprintln!("read error: {e}");
                std::process::exit(1);
            }
        };
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let result = process_line(&analyzer, trimmed);
        // Always emit one line per input line so id ordering stays aligned.
        if let Err(e) = writeln!(out, "{result}") {
            eprintln!("write error: {e}");
            std::process::exit(1);
        }
        // Flushing periodically isn't strictly needed: BufWriter handles it,
        // and the parent process reads the whole stream when we're done.
    }
    let _ = out.flush();
}

fn process_line(analyzer: &SignalAnalyzer, line: &str) -> Value {
    let parsed: InputLine = match serde_json::from_str(line) {
        Ok(p) => p,
        Err(e) => {
            return json!({
                "id": Value::Null,
                "error": format!("input parse: {e}"),
            });
        }
    };
    let id = parsed.id.clone();
    let view: Vec<brightstaff::signals::analyzer::ShareGptMessage<'_>> = parsed
        .messages
        .iter()
        .map(|m| brightstaff::signals::analyzer::ShareGptMessage {
            from: m.from.as_str(),
            value: m.value.as_str(),
        })
        .collect();
    let report = analyzer.analyze_sharegpt(&view);
    let report_dict = report_to_python_dict(&report);
    json!({
        "id": id,
        "report": report_dict,
    })
}

/// Convert a `SignalReport` into the Python reference's `to_dict()` shape.
///
/// Ordering of category keys in each layer dict follows the Python source
/// exactly so even string-equality comparisons behave deterministically.
fn report_to_python_dict(r: &SignalReport) -> Value {
    let mut interaction = Map::new();
    interaction.insert(
        "misalignment".to_string(),
        signal_group_to_python(&r.interaction.misalignment),
    );
    interaction.insert(
        "stagnation".to_string(),
        signal_group_to_python(&r.interaction.stagnation),
    );
    interaction.insert(
        "disengagement".to_string(),
        signal_group_to_python(&r.interaction.disengagement),
    );
    interaction.insert(
        "satisfaction".to_string(),
        signal_group_to_python(&r.interaction.satisfaction),
    );

    let mut execution = Map::new();
    execution.insert(
        "failure".to_string(),
        signal_group_to_python(&r.execution.failure),
    );
    execution.insert(
        "loops".to_string(),
        signal_group_to_python(&r.execution.loops),
    );

    let mut environment = Map::new();
    environment.insert(
        "exhaustion".to_string(),
        signal_group_to_python(&r.environment.exhaustion),
    );

    json!({
        "interaction_signals": Value::Object(interaction),
        "execution_signals": Value::Object(execution),
        "environment_signals": Value::Object(environment),
        "overall_quality": r.overall_quality.as_str(),
        "summary": r.summary,
    })
}

fn signal_group_to_python(g: &SignalGroup) -> Value {
    let signals: Vec<Value> = g
        .signals
        .iter()
        .map(|s| {
            json!({
                "signal_type": s.signal_type.as_str(),
                "message_index": s.message_index,
                "snippet": s.snippet,
                "confidence": s.confidence,
                "metadata": s.metadata,
            })
        })
        .collect();
    json!({
        "category": g.category,
        "count": g.count,
        "severity": g.severity,
        "signals": signals,
    })
}
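The per-line contract in the module docs (one output record per non-empty input line, order preserved, failures isolated per line rather than aborting the batch) can be captured in a tiny std-only sketch; `analyze` here is a stand-in closure, not the real `SignalAnalyzer`:

```rust
// Drive a batch of newline-delimited records: skip blank lines, keep output
// order aligned with input order, and turn per-line failures into error
// records instead of stopping the run.
fn drive(input: &str, analyze: impl Fn(&str) -> Result<String, String>) -> Vec<String> {
    input
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| match analyze(l.trim()) {
            Ok(report) => report,
            Err(e) => format!("error: {e}"),
        })
        .collect()
}
```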


@ -0,0 +1,53 @@
use bytes::Bytes;
use http_body_util::combinators::BoxBody;
use hyper::{Response, StatusCode};

use super::full;

#[derive(serde::Serialize)]
struct MemStats {
    allocated_bytes: usize,
    resident_bytes: usize,
    #[serde(skip_serializing_if = "Option::is_none")]
    error: Option<String>,
}

/// Returns jemalloc memory statistics as JSON.
/// Falls back to a stub when the jemalloc feature is disabled.
pub async fn memstats() -> Result<Response<BoxBody<Bytes, hyper::Error>>, hyper::Error> {
    let stats = get_jemalloc_stats();
    let json = serde_json::to_string(&stats).unwrap();
    Ok(Response::builder()
        .status(StatusCode::OK)
        .header("Content-Type", "application/json")
        .body(full(json))
        .unwrap())
}

#[cfg(feature = "jemalloc")]
fn get_jemalloc_stats() -> MemStats {
    use tikv_jemalloc_ctl::{epoch, stats};

    if let Err(e) = epoch::advance() {
        return MemStats {
            allocated_bytes: 0,
            resident_bytes: 0,
            error: Some(format!("failed to advance jemalloc epoch: {e}")),
        };
    }
    MemStats {
        allocated_bytes: stats::allocated::read().unwrap_or(0),
        resident_bytes: stats::resident::read().unwrap_or(0),
        error: None,
    }
}

#[cfg(not(feature = "jemalloc"))]
fn get_jemalloc_stats() -> MemStats {
    MemStats {
        allocated_bytes: 0,
        resident_bytes: 0,
        error: Some("jemalloc feature not enabled".to_string()),
    }
}
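For reference, a std-only sketch of the JSON this endpoint returns, mirroring the `MemStats` field names and the `skip_serializing_if` behavior (hand-rolled formatting stands in for serde, which the real handler uses; the `error` key is omitted when `None`):

```rust
// Format the memstats payload by hand to show its shape: two counters, plus
// an optional "error" key that only appears when something went wrong.
fn memstats_json(allocated: usize, resident: usize, error: Option<&str>) -> String {
    let mut s = format!("{{\"allocated_bytes\":{allocated},\"resident_bytes\":{resident}");
    if let Some(e) = error {
        s.push_str(&format!(",\"error\":\"{e}\""));
    }
    s.push('}');
    s
}
```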


@ -441,10 +441,8 @@ impl ArchFunctionHandler {
}
}
// Handle str/string conversions
"str" | "string" => {
if !value.is_string() {
return Ok(json!(value.to_string()));
}
"str" | "string" if !value.is_string() => {
return Ok(json!(value.to_string()));
}
_ => {}
}


@ -24,13 +24,14 @@ use crate::app_state::AppState;
use crate::handlers::agents::pipeline::PipelineProcessor;
use crate::handlers::extract_request_id;
use crate::handlers::full;
use crate::metrics as bs_metrics;
use crate::state::response_state_processor::ResponsesStateProcessor;
use crate::state::{
extract_input_items, retrieve_and_combine_input, StateStorage, StateStorageError,
};
use crate::streaming::{
create_streaming_response, create_streaming_response_with_output_filter, truncate_message,
ObservableStreamProcessor, StreamProcessor,
LlmMetricsCtx, ObservableStreamProcessor, StreamProcessor,
};
use crate::tracing::{
collect_custom_trace_attributes, llm as tracing_llm, operation_component,
@ -142,6 +143,7 @@ async fn llm_chat_inner(
&request_path,
&state.model_aliases,
&state.llm_providers,
state.signals_enabled,
)
.await
{
@ -253,7 +255,15 @@ async fn llm_chat_inner(
if let Some(ref client_api_kind) = client_api {
let upstream_api =
provider_id.compatible_api_for_client(client_api_kind, is_streaming_request);
client_request.normalize_for_upstream(provider_id, &upstream_api);
if let Err(e) = client_request.normalize_for_upstream(provider_id, &upstream_api) {
warn!(
"request_id={}: normalize_for_upstream failed: {}",
request_id, e
);
let mut bad_request = Response::new(full(e.message));
*bad_request.status_mut() = StatusCode::BAD_REQUEST;
return Ok(bad_request);
}
}
// --- Phase 2: Resolve conversation state (v1/responses API) ---
@ -407,6 +417,7 @@ async fn parse_and_validate_request(
request_path: &str,
model_aliases: &Option<HashMap<String, ModelAlias>>,
llm_providers: &Arc<RwLock<LlmProviders>>,
signals_enabled: bool,
) -> Result<PreparedRequest, Response<BoxBody<Bytes, hyper::Error>>> {
let raw_bytes = request
.collect()
@ -485,7 +496,11 @@ async fn parse_and_validate_request(
let user_message_preview = client_request
.get_recent_user_message()
.map(|msg| truncate_message(&msg, 50));
let messages_for_signals = Some(client_request.get_messages());
let messages_for_signals = if signals_enabled {
Some(client_request.get_messages())
} else {
None
};
// Set the upstream model name and strip routing metadata
client_request.set_model(model_name_only.clone());
@ -686,6 +701,13 @@ async fn send_upstream(
let request_start_time = std::time::Instant::now();
// Labels for LLM upstream metrics. We prefer `resolved_model` (post-routing)
// and derive the provider from its `provider/model` prefix. This matches the
// same model id the cost/latency router keys off.
let (metric_provider_raw, metric_model_raw) = bs_metrics::split_provider_model(resolved_model);
let metric_provider = metric_provider_raw.to_string();
let metric_model = metric_model_raw.to_string();
let llm_response = match http_client
.post(upstream_url)
.headers(request_headers.clone())
@ -695,6 +717,14 @@ async fn send_upstream(
{
Ok(res) => res,
Err(err) => {
let err_class = bs_metrics::llm_error_class_from_reqwest(&err);
bs_metrics::record_llm_upstream(
&metric_provider,
&metric_model,
0,
err_class,
request_start_time.elapsed(),
);
let err_msg = format!("Failed to send request: {}", err);
let mut internal_error = Response::new(full(err_msg));
*internal_error.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
@ -750,7 +780,12 @@ async fn send_upstream(
span_name,
request_start_time,
messages_for_signals,
);
)
.with_llm_metrics(LlmMetricsCtx {
provider: metric_provider.clone(),
model: metric_model.clone(),
upstream_status: upstream_status.as_u16(),
});
let output_filter_request_headers = if filter_pipeline.has_output_filters() {
Some(request_headers.clone())

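The `bs_metrics::split_provider_model` helper used for the metric labels above is referenced but not shown in this diff; a hypothetical std-only sketch of the `provider/model` split it implies (an assumption about its behavior, not the actual implementation):

```rust
// Derive (provider, model) metric labels from a resolved model id such as
// "openai/gpt-4o". A bare model name has no provider prefix, so fall back
// to a fixed "unknown" label rather than emitting a free-form string.
fn split_provider_model(resolved: &str) -> (&str, &str) {
    match resolved.split_once('/') {
        Some((provider, model)) if !provider.is_empty() => (provider, model),
        _ => ("unknown", resolved),
    }
}
```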

@ -5,10 +5,24 @@ use hyper::StatusCode;
use std::sync::Arc;
use tracing::{debug, info, warn};
use crate::metrics as bs_metrics;
use crate::metrics::labels as metric_labels;
use crate::router::orchestrator::OrchestratorService;
use crate::streaming::truncate_message;
use crate::tracing::routing;
/// Classify a request path (already stripped of `/agents` or `/routing` by
/// the caller) into the fixed `route` label used on routing metrics.
fn route_label_for_path(request_path: &str) -> &'static str {
    if request_path.starts_with("/agents") {
        metric_labels::ROUTE_AGENT
    } else if request_path.starts_with("/routing") {
        metric_labels::ROUTE_ROUTING
    } else {
        metric_labels::ROUTE_LLM
    }
}
pub struct RoutingResult {
/// Primary model to use (first in the ranked list).
pub model_name: String,
@ -106,15 +120,23 @@ pub async fn router_chat_get_upstream_model(
)
.await;
let determination_ms = routing_start_time.elapsed().as_millis() as i64;
let determination_elapsed = routing_start_time.elapsed();
let determination_ms = determination_elapsed.as_millis() as i64;
let current_span = tracing::Span::current();
current_span.record(routing::ROUTE_DETERMINATION_MS, determination_ms);
let route_label = route_label_for_path(request_path);
match routing_result {
Ok(route) => match route {
Some((route_name, ranked_models)) => {
let model_name = ranked_models.first().cloned().unwrap_or_default();
current_span.record("route.selected_model", model_name.as_str());
bs_metrics::record_router_decision(
route_label,
&model_name,
false,
determination_elapsed,
);
Ok(RoutingResult {
model_name,
models: ranked_models,
@ -126,6 +148,12 @@ pub async fn router_chat_get_upstream_model(
// This signals to llm.rs to use the original validated request model
current_span.record("route.selected_model", "none");
info!("no route determined, using default model");
bs_metrics::record_router_decision(
route_label,
"none",
true,
determination_elapsed,
);
Ok(RoutingResult {
model_name: "none".to_string(),
@ -136,6 +164,7 @@ pub async fn router_chat_get_upstream_model(
},
Err(err) => {
current_span.record("route.selected_model", "unknown");
bs_metrics::record_router_decision(route_label, "unknown", true, determination_elapsed);
Err(RoutingError::internal_error(format!(
"Failed to determine route: {}",
err


@ -1,4 +1,5 @@
pub mod agents;
pub mod debug;
pub mod function_calling;
pub mod llm;
pub mod models;


@ -12,6 +12,8 @@ use tracing::{debug, info, info_span, warn, Instrument};
use super::extract_or_generate_traceparent;
use crate::handlers::llm::model_selection::router_chat_get_upstream_model;
use crate::metrics as bs_metrics;
use crate::metrics::labels as metric_labels;
use crate::router::orchestrator::OrchestratorService;
use crate::tracing::{collect_custom_trace_attributes, operation_component, set_service_name};
@ -230,6 +232,17 @@ async fn routing_decision_inner(
pinned: false,
};
// Distinguish "decision served" (a concrete model picked) from
// "no_candidates" (the sentinel "none" returned when nothing
// matched). The handler still responds 200 in both cases, so RED
// metrics alone can't tell them apart.
let outcome = if response.models.first().map(|m| m == "none").unwrap_or(true) {
metric_labels::ROUTING_SVC_NO_CANDIDATES
} else {
metric_labels::ROUTING_SVC_DECISION_SERVED
};
bs_metrics::record_routing_service_outcome(outcome);
info!(
primary_model = %response.models.first().map(|s| s.as_str()).unwrap_or("none"),
total_models = response.models.len(),
@ -249,6 +262,7 @@ async fn routing_decision_inner(
.unwrap())
}
Err(err) => {
bs_metrics::record_routing_service_outcome(metric_labels::ROUTING_SVC_POLICY_ERROR);
warn!(error = %err.message, "routing decision failed");
Ok(BrightStaffError::InternalServerError(err.message).into_response())
}

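The outcome classification the comment above describes reduces to a small predicate; a sketch with assumed label strings (`decision_served` / `no_candidates`) mirroring the `metric_labels` constants, which this diff references but does not define:

```rust
// A routing decision responds 200 whether or not a model matched, so RED
// metrics alone can't distinguish the two. Classify by the sentinel "none":
// any other first-ranked model counts as a served decision.
fn routing_outcome(models: &[String]) -> &'static str {
    match models.first() {
        Some(m) if m.as_str() != "none" => "decision_served",
        _ => "no_candidates",
    }
}
```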

@ -1,5 +1,6 @@
pub mod app_state;
pub mod handlers;
pub mod metrics;
pub mod router;
pub mod session_cache;
pub mod signals;


@ -1,10 +1,17 @@
#[cfg(feature = "jemalloc")]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
use brightstaff::app_state::AppState;
use brightstaff::handlers::agents::orchestrator::agent_chat;
use brightstaff::handlers::debug;
use brightstaff::handlers::empty;
use brightstaff::handlers::function_calling::function_calling_chat_handler;
use brightstaff::handlers::llm::llm_chat;
use brightstaff::handlers::models::list_models;
use brightstaff::handlers::routing_service::routing_decision;
use brightstaff::metrics as bs_metrics;
use brightstaff::metrics::labels as metric_labels;
use brightstaff::router::model_metrics::ModelMetricsService;
use brightstaff::router::orchestrator::OrchestratorService;
use brightstaff::session_cache::init_session_cache;
@ -326,6 +333,8 @@ async fn init_app_state(
.as_ref()
.and_then(|tracing| tracing.span_attributes.clone());
let signals_enabled = !overrides.disable_signals.unwrap_or(false);
Ok(AppState {
orchestrator_service,
model_aliases: config.model_aliases.clone(),
@ -337,6 +346,7 @@ async fn init_app_state(
span_attributes,
http_client: reqwest::Client::new(),
filter_pipeline,
signals_enabled,
})
}
@ -384,10 +394,79 @@ async fn init_state_storage(
// Request routing
// ---------------------------------------------------------------------------
/// Normalized method label — limited set so we never emit a free-form string.
fn method_label(method: &Method) -> &'static str {
match *method {
Method::GET => "GET",
Method::POST => "POST",
Method::PUT => "PUT",
Method::DELETE => "DELETE",
Method::PATCH => "PATCH",
Method::HEAD => "HEAD",
Method::OPTIONS => "OPTIONS",
_ => "OTHER",
}
}
/// Compute the fixed `handler` metric label from the request's path+method.
/// Returning `None` for fall-through means `route()` will hand the request to
/// the catch-all 404 branch.
fn handler_label_for(method: &Method, path: &str) -> &'static str {
if let Some(stripped) = path.strip_prefix("/agents") {
if matches!(
stripped,
CHAT_COMPLETIONS_PATH | MESSAGES_PATH | OPENAI_RESPONSES_API_PATH
) {
return metric_labels::HANDLER_AGENT_CHAT;
}
}
if let Some(stripped) = path.strip_prefix("/routing") {
if matches!(
stripped,
CHAT_COMPLETIONS_PATH | MESSAGES_PATH | OPENAI_RESPONSES_API_PATH
) {
return metric_labels::HANDLER_ROUTING_DECISION;
}
}
match (method, path) {
(&Method::POST, CHAT_COMPLETIONS_PATH | MESSAGES_PATH | OPENAI_RESPONSES_API_PATH) => {
metric_labels::HANDLER_LLM_CHAT
}
(&Method::POST, "/function_calling") => metric_labels::HANDLER_FUNCTION_CALLING,
(&Method::GET, "/v1/models" | "/agents/v1/models") => metric_labels::HANDLER_LIST_MODELS,
(&Method::OPTIONS, "/v1/models" | "/agents/v1/models") => {
metric_labels::HANDLER_CORS_PREFLIGHT
}
_ => metric_labels::HANDLER_NOT_FOUND,
}
}
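The prefix-stripping plus closed-set match above can be sketched stand-alone. This is a hypothetical, simplified version (taking the method as a plain `&str` instead of `hyper::Method`, and hard-coding one path per arm) just to show that every request collapses to one of a small fixed set of label strings:

```rust
/// Simplified, hypothetical sketch of the fixed-label mapping: a request
/// never produces a free-form label, only one of a closed set of strings.
fn handler_label(method: &str, path: &str) -> &'static str {
    // Mirror the `/agents` prefix case: strip it, then match the remainder.
    if let Some(rest) = path.strip_prefix("/agents") {
        if rest == "/v1/chat/completions" {
            return "agent_chat";
        }
    }
    match (method, path) {
        ("POST", "/v1/chat/completions") => "llm_chat",
        ("GET", "/v1/models") => "list_models",
        _ => "not_found",
    }
}
```

Because the return type is `&'static str` drawn from literals, the compiler itself guarantees the label set is bounded.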
/// Route an incoming HTTP request to the appropriate handler.
async fn route(
req: Request<Incoming>,
state: Arc<AppState>,
) -> Result<Response<BoxBody<Bytes, hyper::Error>>, hyper::Error> {
let handler = handler_label_for(req.method(), req.uri().path());
let method = method_label(req.method());
let started = std::time::Instant::now();
let _in_flight = bs_metrics::InFlightGuard::new(handler);
let result = dispatch(req, state).await;
let status = match &result {
Ok(resp) => resp.status().as_u16(),
// hyper::Error here means the body couldn't be produced; conventionally 500.
Err(_) => 500,
};
bs_metrics::record_http(handler, method, status, started);
result
}
/// Inner dispatcher split out so `route()` can wrap it with metrics without
/// duplicating the match tree.
async fn dispatch(
req: Request<Incoming>,
state: Arc<AppState>,
) -> Result<Response<BoxBody<Bytes, hyper::Error>>, hyper::Error> {
let parent_cx = global::get_text_map_propagator(|p| p.extract(&HeaderExtractor(req.headers())));
let path = req.uri().path().to_string();
@ -439,6 +518,7 @@ async fn route(
Ok(list_models(Arc::clone(&state.llm_providers)).await)
}
(&Method::OPTIONS, "/v1/models" | "/agents/v1/models") => cors_preflight(),
(&Method::GET, "/debug/memstats") => debug::memstats().await,
_ => {
debug!(method = %req.method(), path = %path, "no route found");
let mut not_found = Response::new(empty());
@ -503,6 +583,7 @@ async fn run_server(state: Arc<AppState>) -> Result<(), Box<dyn std::error::Erro
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
let config = load_config()?;
let _tracer_provider = init_tracer(config.tracing.as_ref());
bs_metrics::init();
info!("loaded plano_config.yaml");
let state = Arc::new(init_app_state(&config).await?);
run_server(state).await


@ -0,0 +1,38 @@
//! Fixed label-value constants so callers never emit free-form strings
//! (which would blow up cardinality).
// Handler enum — derived from the path+method match in `route()`.
pub const HANDLER_AGENT_CHAT: &str = "agent_chat";
pub const HANDLER_ROUTING_DECISION: &str = "routing_decision";
pub const HANDLER_LLM_CHAT: &str = "llm_chat";
pub const HANDLER_FUNCTION_CALLING: &str = "function_calling";
pub const HANDLER_LIST_MODELS: &str = "list_models";
pub const HANDLER_CORS_PREFLIGHT: &str = "cors_preflight";
pub const HANDLER_NOT_FOUND: &str = "not_found";
// Router "route" class — which brightstaff endpoint prompted the decision.
pub const ROUTE_AGENT: &str = "agent";
pub const ROUTE_ROUTING: &str = "routing";
pub const ROUTE_LLM: &str = "llm";
// Token kind for brightstaff_llm_tokens_total.
pub const TOKEN_KIND_PROMPT: &str = "prompt";
pub const TOKEN_KIND_COMPLETION: &str = "completion";
// LLM error_class values (match docstring in metrics/mod.rs).
pub const LLM_ERR_NONE: &str = "none";
pub const LLM_ERR_TIMEOUT: &str = "timeout";
pub const LLM_ERR_CONNECT: &str = "connect";
pub const LLM_ERR_PARSE: &str = "parse";
pub const LLM_ERR_OTHER: &str = "other";
pub const LLM_ERR_STREAM: &str = "stream";
// Routing service outcome values.
pub const ROUTING_SVC_DECISION_SERVED: &str = "decision_served";
pub const ROUTING_SVC_NO_CANDIDATES: &str = "no_candidates";
pub const ROUTING_SVC_POLICY_ERROR: &str = "policy_error";
// Session cache outcome values.
pub const SESSION_CACHE_HIT: &str = "hit";
pub const SESSION_CACHE_MISS: &str = "miss";
pub const SESSION_CACHE_STORE: &str = "store";


@ -0,0 +1,377 @@
//! Prometheus metrics for brightstaff.
//!
//! Installs the `metrics` global recorder backed by
//! `metrics-exporter-prometheus` and exposes a `/metrics` HTTP endpoint on a
//! dedicated admin port (default `0.0.0.0:9092`, overridable via
//! `METRICS_BIND_ADDRESS`).
//!
//! Emitted metric families (see `describe_all` for full list):
//! - HTTP RED: `brightstaff_http_requests_total`,
//! `brightstaff_http_request_duration_seconds`,
//! `brightstaff_http_in_flight_requests`.
//! - LLM upstream: `brightstaff_llm_upstream_requests_total`,
//! `brightstaff_llm_upstream_duration_seconds`,
//! `brightstaff_llm_time_to_first_token_seconds`,
//! `brightstaff_llm_tokens_total`,
//! `brightstaff_llm_tokens_usage_missing_total`.
//! - Routing: `brightstaff_router_decisions_total`,
//! `brightstaff_router_decision_duration_seconds`,
//! `brightstaff_routing_service_requests_total`,
//! `brightstaff_session_cache_events_total`.
//! - Process: via `metrics-process`.
//! - Build: `brightstaff_build_info`.
use std::net::SocketAddr;
use std::sync::OnceLock;
use std::time::{Duration, Instant};
use metrics::{counter, describe_counter, describe_gauge, describe_histogram, gauge, histogram};
use metrics_exporter_prometheus::{Matcher, PrometheusBuilder};
use tracing::{info, warn};
pub mod labels;
/// Guard flag so tests don't re-install the global recorder.
static INIT: OnceLock<()> = OnceLock::new();
const DEFAULT_METRICS_BIND: &str = "0.0.0.0:9092";
/// HTTP request duration buckets (seconds). Capped at 60s.
const HTTP_BUCKETS: &[f64] = &[
0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0,
];
/// LLM upstream / TTFT buckets (seconds). Capped at 120s because provider
/// completions routinely run that long.
const LLM_BUCKETS: &[f64] = &[0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, 120.0];
/// Router decision buckets (seconds). The orchestrator call itself is usually
/// sub-second but bucketed generously in case of upstream slowness.
const ROUTER_BUCKETS: &[f64] = &[
0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0,
];
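These bucket arrays are upper bounds for cumulative Prometheus histograms: an observation counts toward every bucket whose bound is at or above it, and anything past the last bound (60s for HTTP) lands only in the implicit `+Inf` bucket. A small hypothetical helper (`first_bucket` is not part of the codebase) makes that concrete, using a copy of `HTTP_BUCKETS`:

```rust
// Copy of the HTTP_BUCKETS constant above so this snippet runs on its own.
const HTTP_BUCKETS: &[f64] = &[
    0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0,
];

/// Hypothetical helper: the smallest bucket bound that still contains
/// `value`, or `None` if the value only falls into the implicit +Inf bucket.
fn first_bucket(buckets: &[f64], value: f64) -> Option<f64> {
    buckets.iter().copied().find(|&b| value <= b)
}
```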
/// Install the global recorder and spawn the `/metrics` HTTP listener.
///
/// Safe to call more than once; subsequent calls are no-ops so tests that
/// construct their own recorder still work.
pub fn init() {
if INIT.get().is_some() {
return;
}
let bind: SocketAddr = std::env::var("METRICS_BIND_ADDRESS")
.unwrap_or_else(|_| DEFAULT_METRICS_BIND.to_string())
.parse()
.unwrap_or_else(|err| {
warn!(error = %err, default = DEFAULT_METRICS_BIND, "invalid METRICS_BIND_ADDRESS, falling back to default");
DEFAULT_METRICS_BIND.parse().expect("default bind parses")
});
let builder = PrometheusBuilder::new()
.with_http_listener(bind)
.set_buckets_for_metric(
Matcher::Full("brightstaff_http_request_duration_seconds".to_string()),
HTTP_BUCKETS,
)
.and_then(|b| {
b.set_buckets_for_metric(Matcher::Prefix("brightstaff_llm_".to_string()), LLM_BUCKETS)
})
.and_then(|b| {
b.set_buckets_for_metric(
Matcher::Full("brightstaff_router_decision_duration_seconds".to_string()),
ROUTER_BUCKETS,
)
});
let builder = match builder {
Ok(b) => b,
Err(err) => {
warn!(error = %err, "failed to configure metrics buckets, using defaults");
PrometheusBuilder::new().with_http_listener(bind)
}
};
if let Err(err) = builder.install() {
warn!(error = %err, "failed to install Prometheus recorder; metrics disabled");
return;
}
let _ = INIT.set(());
describe_all();
emit_build_info();
// Register process-level collector (RSS, CPU, FDs).
let collector = metrics_process::Collector::default();
collector.describe();
// Prime once at startup, then refresh on a short interval so the process
// gauges (RSS, CPU, FDs) stay current between Prometheus scrapes.
collector.collect();
tokio::spawn(async move {
let mut tick = tokio::time::interval(Duration::from_secs(10));
tick.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);
loop {
tick.tick().await;
collector.collect();
}
});
info!(address = %bind, "metrics listener started");
}
fn describe_all() {
describe_counter!(
"brightstaff_http_requests_total",
"Total HTTP requests served by brightstaff, by handler and status class."
);
describe_histogram!(
"brightstaff_http_request_duration_seconds",
"Wall-clock duration of HTTP requests served by brightstaff, by handler."
);
describe_gauge!(
"brightstaff_http_in_flight_requests",
"Number of HTTP requests currently being served by brightstaff, by handler."
);
describe_counter!(
"brightstaff_llm_upstream_requests_total",
"LLM upstream request outcomes, by provider, model, status class and error class."
);
describe_histogram!(
"brightstaff_llm_upstream_duration_seconds",
"Wall-clock duration of LLM upstream calls (stream close for streaming), by provider and model."
);
describe_histogram!(
"brightstaff_llm_time_to_first_token_seconds",
"Time from request start to first streamed byte, by provider and model (streaming only)."
);
describe_counter!(
"brightstaff_llm_tokens_total",
"Tokens reported in the provider `usage` field, by provider, model and kind (prompt/completion)."
);
describe_counter!(
"brightstaff_llm_tokens_usage_missing_total",
"LLM responses that completed without a usable `usage` block (so token counts are unknown)."
);
describe_counter!(
"brightstaff_router_decisions_total",
"Routing decisions made by the orchestrator, by route, selected model, and whether a fallback was used."
);
describe_histogram!(
"brightstaff_router_decision_duration_seconds",
"Time spent in the orchestrator deciding a route, by route."
);
describe_counter!(
"brightstaff_routing_service_requests_total",
"Outcomes of /routing/* decision requests: decision_served, no_candidates, policy_error."
);
describe_counter!(
"brightstaff_session_cache_events_total",
"Session affinity cache lookups and stores, by outcome."
);
describe_gauge!(
"brightstaff_build_info",
"Build metadata. Always 1; labels carry version and git SHA."
);
}
fn emit_build_info() {
let version = env!("CARGO_PKG_VERSION");
let git_sha = option_env!("GIT_SHA").unwrap_or("unknown");
gauge!(
"brightstaff_build_info",
"version" => version.to_string(),
"git_sha" => git_sha.to_string(),
)
.set(1.0);
}
/// Split a provider-qualified model id like `"openai/gpt-4o"` into
/// `(provider, model)`. Returns `("unknown", raw)` when there is no `/`.
pub fn split_provider_model(full: &str) -> (&str, &str) {
match full.split_once('/') {
Some((p, m)) => (p, m),
None => ("unknown", full),
}
}
/// Bucket an HTTP status code into `"1xx"` through `"5xx"`, or `"other"`
/// for codes outside the 100..=599 range.
pub fn status_class(status: u16) -> &'static str {
match status {
100..=199 => "1xx",
200..=299 => "2xx",
300..=399 => "3xx",
400..=499 => "4xx",
500..=599 => "5xx",
_ => "other",
}
}
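As a quick check of the two pure label helpers above, here they are again verbatim (repeated only so the snippet is self-contained and runnable):

```rust
// Copies of the helpers defined above, repeated for a self-contained snippet.
pub fn split_provider_model(full: &str) -> (&str, &str) {
    match full.split_once('/') {
        Some((p, m)) => (p, m),
        None => ("unknown", full),
    }
}

pub fn status_class(status: u16) -> &'static str {
    match status {
        100..=199 => "1xx",
        200..=299 => "2xx",
        300..=399 => "3xx",
        400..=499 => "4xx",
        500..=599 => "5xx",
        _ => "other",
    }
}
```

Note that `split_once` splits at the *first* slash, so a nested id like `"org/sub/model"` yields provider `"org"` and model `"sub/model"`.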
// ---------------------------------------------------------------------------
// HTTP RED helpers
// ---------------------------------------------------------------------------
/// RAII guard that increments the in-flight gauge on construction and
/// decrements on drop. Constructed at the top of `route()` alongside the
/// request timer, so the gauge falls back down even on error paths.
pub struct InFlightGuard {
handler: &'static str,
}
impl InFlightGuard {
pub fn new(handler: &'static str) -> Self {
gauge!(
"brightstaff_http_in_flight_requests",
"handler" => handler,
)
.increment(1.0);
Self { handler }
}
}
impl Drop for InFlightGuard {
fn drop(&mut self) {
gauge!(
"brightstaff_http_in_flight_requests",
"handler" => self.handler,
)
.decrement(1.0);
}
}
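The guard relies on `Drop` running on every exit path, including early returns and panics that unwind. A minimal sketch of the same pattern, with a hypothetical `AtomicI64` standing in for the Prometheus gauge so no metrics recorder is needed:

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Hypothetical stand-in for the in-flight gauge.
static IN_FLIGHT: AtomicI64 = AtomicI64::new(0);

struct Guard;

impl Guard {
    fn new() -> Self {
        // Increment on construction, exactly like InFlightGuard::new.
        IN_FLIGHT.fetch_add(1, Ordering::SeqCst);
        Guard
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Decrement on every exit path, including unwinds.
        IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
    }
}

fn in_flight() -> i64 {
    IN_FLIGHT.load(Ordering::SeqCst)
}
```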
/// Record the HTTP request counter + duration histogram.
pub fn record_http(handler: &'static str, method: &'static str, status: u16, started: Instant) {
let class = status_class(status);
counter!(
"brightstaff_http_requests_total",
"handler" => handler,
"method" => method,
"status_class" => class,
)
.increment(1);
histogram!(
"brightstaff_http_request_duration_seconds",
"handler" => handler,
)
.record(started.elapsed().as_secs_f64());
}
// ---------------------------------------------------------------------------
// LLM upstream helpers
// ---------------------------------------------------------------------------
/// Classify an outcome of an LLM upstream call for the `error_class` label.
pub fn llm_error_class_from_reqwest(err: &reqwest::Error) -> &'static str {
if err.is_timeout() {
"timeout"
} else if err.is_connect() {
"connect"
} else if err.is_decode() {
"parse"
} else {
"other"
}
}
/// Record the outcome of an LLM upstream call. `status` is the HTTP status
/// the upstream returned (0 if the call never produced one, e.g. send failure).
/// `error_class` is `"none"` on success, or a discriminated error label.
pub fn record_llm_upstream(
provider: &str,
model: &str,
status: u16,
error_class: &str,
duration: Duration,
) {
let class = if status == 0 {
"error"
} else {
status_class(status)
};
counter!(
"brightstaff_llm_upstream_requests_total",
"provider" => provider.to_string(),
"model" => model.to_string(),
"status_class" => class,
"error_class" => error_class.to_string(),
)
.increment(1);
histogram!(
"brightstaff_llm_upstream_duration_seconds",
"provider" => provider.to_string(),
"model" => model.to_string(),
)
.record(duration.as_secs_f64());
}
pub fn record_llm_ttft(provider: &str, model: &str, ttft: Duration) {
histogram!(
"brightstaff_llm_time_to_first_token_seconds",
"provider" => provider.to_string(),
"model" => model.to_string(),
)
.record(ttft.as_secs_f64());
}
pub fn record_llm_tokens(provider: &str, model: &str, kind: &'static str, count: u64) {
counter!(
"brightstaff_llm_tokens_total",
"provider" => provider.to_string(),
"model" => model.to_string(),
"kind" => kind,
)
.increment(count);
}
pub fn record_llm_tokens_usage_missing(provider: &str, model: &str) {
counter!(
"brightstaff_llm_tokens_usage_missing_total",
"provider" => provider.to_string(),
"model" => model.to_string(),
)
.increment(1);
}
// ---------------------------------------------------------------------------
// Router helpers
// ---------------------------------------------------------------------------
pub fn record_router_decision(
route: &'static str,
selected_model: &str,
fallback: bool,
duration: Duration,
) {
counter!(
"brightstaff_router_decisions_total",
"route" => route,
"selected_model" => selected_model.to_string(),
"fallback" => if fallback { "true" } else { "false" },
)
.increment(1);
histogram!(
"brightstaff_router_decision_duration_seconds",
"route" => route,
)
.record(duration.as_secs_f64());
}
pub fn record_routing_service_outcome(outcome: &'static str) {
counter!(
"brightstaff_routing_service_requests_total",
"outcome" => outcome,
)
.increment(1);
}
pub fn record_session_cache_event(outcome: &'static str) {
counter!(
"brightstaff_session_cache_events_total",
"outcome" => outcome,
)
.increment(1);
}


@ -3,3 +3,5 @@ pub mod model_metrics;
pub mod orchestrator;
pub mod orchestrator_model;
pub mod orchestrator_model_v1;
#[cfg(test)]
mod stress_tests;


@ -15,6 +15,8 @@ use super::http::{self, post_and_extract_content};
use super::model_metrics::ModelMetricsService;
use super::orchestrator_model::OrchestratorModel;
use crate::metrics as bs_metrics;
use crate::metrics::labels as metric_labels;
use crate::router::orchestrator_model_v1;
use crate::session_cache::SessionCache;
@ -130,7 +132,13 @@ impl OrchestratorService {
tenant_id: Option<&str>,
) -> Option<CachedRoute> {
let cache = self.session_cache.as_ref()?;
cache.get(&Self::session_key(tenant_id, session_id)).await
let result = cache.get(&Self::session_key(tenant_id, session_id)).await;
bs_metrics::record_session_cache_event(if result.is_some() {
metric_labels::SESSION_CACHE_HIT
} else {
metric_labels::SESSION_CACHE_MISS
});
result
}
pub async fn cache_route(
@ -151,6 +159,7 @@ impl OrchestratorService {
self.session_ttl,
)
.await;
bs_metrics::record_session_cache_event(metric_labels::SESSION_CACHE_STORE);
}
}


@ -0,0 +1,260 @@
#[cfg(test)]
mod tests {
use crate::router::orchestrator::OrchestratorService;
use crate::session_cache::memory::MemorySessionCache;
use common::configuration::{SelectionPolicy, SelectionPreference, TopLevelRoutingPreference};
use hermesllm::apis::openai::{Message, MessageContent, Role};
use std::sync::Arc;
fn make_messages(n: usize) -> Vec<Message> {
(0..n)
.map(|i| Message {
role: if i % 2 == 0 {
Role::User
} else {
Role::Assistant
},
content: Some(MessageContent::Text(format!(
"This is message number {i} with some padding text to make it realistic."
))),
name: None,
tool_calls: None,
tool_call_id: None,
})
.collect()
}
fn make_routing_prefs() -> Vec<TopLevelRoutingPreference> {
vec![
TopLevelRoutingPreference {
name: "code_generation".to_string(),
description: "Code generation and debugging tasks".to_string(),
models: vec![
"openai/gpt-4o".to_string(),
"openai/gpt-4o-mini".to_string(),
],
selection_policy: SelectionPolicy {
prefer: SelectionPreference::None,
},
},
TopLevelRoutingPreference {
name: "summarization".to_string(),
description: "Summarizing documents and text".to_string(),
models: vec![
"anthropic/claude-3-sonnet".to_string(),
"openai/gpt-4o-mini".to_string(),
],
selection_policy: SelectionPolicy {
prefer: SelectionPreference::None,
},
},
]
}
/// Stress test: exercise the full routing code path N times using a mock
/// HTTP server and measure jemalloc allocated bytes before/after.
///
/// This catches:
/// - Memory leaks in generate_request / parse_response
/// - Leaks in reqwest connection handling
/// - String accumulation in the orchestrator model
/// - Fragmentation (jemalloc allocated vs resident)
#[tokio::test]
async fn stress_test_routing_determine_route() {
let mut server = mockito::Server::new_async().await;
let router_url = format!("{}/v1/chat/completions", server.url());
let mock_response = serde_json::json!({
"id": "chatcmpl-mock",
"object": "chat.completion",
"created": 1234567890,
"model": "plano-orchestrator",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"route\": \"code_generation\"}"
},
"finish_reason": "stop"
}],
"usage": {"prompt_tokens": 100, "completion_tokens": 10, "total_tokens": 110}
});
let _mock = server
.mock("POST", "/v1/chat/completions")
.with_status(200)
.with_header("content-type", "application/json")
.with_body(mock_response.to_string())
.expect_at_least(1)
.create_async()
.await;
let prefs = make_routing_prefs();
let session_cache = Arc::new(MemorySessionCache::new(1000));
let orchestrator_service = Arc::new(OrchestratorService::with_routing(
router_url,
"Plano-Orchestrator".to_string(),
"plano-orchestrator".to_string(),
Some(prefs.clone()),
None,
None,
session_cache,
None,
2048,
));
// Warm up: a few requests to stabilize allocator state
for _ in 0..10 {
let msgs = make_messages(5);
let _ = orchestrator_service
.determine_route(&msgs, None, "warmup")
.await;
}
// Snapshot memory after warmup
let baseline = get_allocated();
let num_iterations = 2000;
for i in 0..num_iterations {
let msgs = make_messages(5 + (i % 10));
let inline = if i % 3 == 0 {
Some(make_routing_prefs())
} else {
None
};
let _ = orchestrator_service
.determine_route(&msgs, inline, &format!("req-{i}"))
.await;
}
let after = get_allocated();
let growth = after.saturating_sub(baseline);
let growth_mb = growth as f64 / (1024.0 * 1024.0);
let per_request = growth.checked_div(num_iterations).unwrap_or(0);
eprintln!("=== Routing Stress Test Results ===");
eprintln!(" Iterations: {num_iterations}");
eprintln!(" Baseline alloc: {} bytes", baseline);
eprintln!(" Final alloc: {} bytes", after);
eprintln!(" Growth: {} bytes ({growth_mb:.2} MB)", growth);
eprintln!(" Per-request: {} bytes", per_request);
// Allow up to 256 bytes per request of retained growth (connection pool, etc.)
// A true leak would show thousands of bytes per request.
assert!(
per_request < 256,
"Possible memory leak: {per_request} bytes/request retained after {num_iterations} iterations"
);
}
/// Stress test with high concurrency: many parallel determine_route calls.
#[tokio::test]
async fn stress_test_routing_concurrent() {
let mut server = mockito::Server::new_async().await;
let router_url = format!("{}/v1/chat/completions", server.url());
let mock_response = serde_json::json!({
"id": "chatcmpl-mock",
"object": "chat.completion",
"created": 1234567890,
"model": "plano-orchestrator",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "{\"route\": \"summarization\"}"
},
"finish_reason": "stop"
}],
"usage": {"prompt_tokens": 100, "completion_tokens": 10, "total_tokens": 110}
});
let _mock = server
.mock("POST", "/v1/chat/completions")
.with_status(200)
.with_header("content-type", "application/json")
.with_body(mock_response.to_string())
.expect_at_least(1)
.create_async()
.await;
let prefs = make_routing_prefs();
let session_cache = Arc::new(MemorySessionCache::new(1000));
let orchestrator_service = Arc::new(OrchestratorService::with_routing(
router_url,
"Plano-Orchestrator".to_string(),
"plano-orchestrator".to_string(),
Some(prefs),
None,
None,
session_cache,
None,
2048,
));
// Warm up
for _ in 0..20 {
let msgs = make_messages(3);
let _ = orchestrator_service
.determine_route(&msgs, None, "warmup")
.await;
}
let baseline = get_allocated();
let concurrency = 50;
let requests_per_task = 100;
let total = concurrency * requests_per_task;
let mut handles = vec![];
for t in 0..concurrency {
let svc = Arc::clone(&orchestrator_service);
let handle = tokio::spawn(async move {
for r in 0..requests_per_task {
let msgs = make_messages(3 + (r % 8));
let _ = svc
.determine_route(&msgs, None, &format!("req-{t}-{r}"))
.await;
}
});
handles.push(handle);
}
for h in handles {
h.await.unwrap();
}
let after = get_allocated();
let growth = after.saturating_sub(baseline);
let per_request = growth / total;
eprintln!("=== Concurrent Routing Stress Test Results ===");
eprintln!(" Tasks: {concurrency} x {requests_per_task} = {total}");
eprintln!(" Baseline: {} bytes", baseline);
eprintln!(" Final: {} bytes", after);
eprintln!(
" Growth: {} bytes ({:.2} MB)",
growth,
growth as f64 / 1_048_576.0
);
eprintln!(" Per-request: {} bytes", per_request);
assert!(
per_request < 512,
"Possible memory leak under concurrency: {per_request} bytes/request retained after {total} requests"
);
}
#[cfg(feature = "jemalloc")]
fn get_allocated() -> usize {
tikv_jemalloc_ctl::epoch::advance().unwrap();
tikv_jemalloc_ctl::stats::allocated::read().unwrap_or(0)
}
#[cfg(not(feature = "jemalloc"))]
fn get_allocated() -> usize {
0
}
}

File diff suppressed because it is too large


@ -0,0 +1,347 @@
//! Environment exhaustion detector. Direct port of
//! `signals/environment/exhaustion.py`.
use std::sync::OnceLock;
use regex::Regex;
use serde_json::json;
use crate::signals::analyzer::ShareGptMessage;
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType};
pub const API_ERROR_PATTERNS: &[&str] = &[
r"500\s*(internal\s+)?server\s+error",
r"502\s*bad\s+gateway",
r"503\s*service\s+unavailable",
r"504\s*gateway\s+timeout",
r"internal\s+server\s+error",
r"service\s+unavailable",
r"server\s+error",
r"backend\s+error",
r"upstream\s+error",
r"service\s+temporarily\s+unavailable",
r"maintenance\s+mode",
r"under\s+maintenance",
r"try\s+again\s+later",
r"temporarily\s+unavailable",
r"system\s+error",
r"unexpected\s+error",
r"unhandled\s+exception",
];
pub const TIMEOUT_PATTERNS: &[&str] = &[
r"timeout",
r"timed?\s*out",
r"etimedout",
r"connection\s+timed?\s*out",
r"read\s+timed?\s*out",
r"request\s+timed?\s*out",
r"gateway\s+timeout",
r"deadline\s+exceeded",
r"took\s+too\s+long",
r"operation\s+timed?\s*out",
r"socket\s+timeout",
];
pub const RATE_LIMIT_PATTERNS: &[&str] = &[
r"rate\s+limit",
r"rate.limited",
r"(status|error|http)\s*:?\s*429",
r"429\s+(too\s+many|rate|limit)",
r"too\s+many\s+requests?",
r"quota\s+exceeded",
r"quota\s+limit",
r"throttl(ed|ing)",
r"request\s+limit",
r"api\s+limit",
r"calls?\s+per\s+(second|minute|hour|day)",
r"exceeded\s+.*\s+limit",
r"slow\s+down",
r"retry\s+after",
r"requests?\s+exceeded",
];
pub const NETWORK_PATTERNS: &[&str] = &[
r"connection\s+refused",
r"econnrefused",
r"econnreset",
r"connection\s+reset",
r"enotfound",
r"dns\s+(error|failure|lookup)",
r"host\s+not\s+found",
r"network\s+(error|failure|unreachable)",
r"no\s+route\s+to\s+host",
r"socket\s+error",
r"connection\s+failed",
r"unable\s+to\s+connect",
r"cannot\s+connect",
r"could\s+not\s+connect",
r"connect\s+error",
r"ssl\s+(error|handshake|certificate)",
r"certificate\s+(error|invalid|expired)",
];
pub const MALFORMED_PATTERNS: &[&str] = &[
r"json\s+parse\s+error",
r"invalid\s+json",
r"unexpected\s+token",
r"syntax\s+error.*json",
r"malformed\s+(response|json|data)",
r"unexpected\s+end\s+of",
r"parse\s+error",
r"parsing\s+failed",
r"invalid\s+response",
r"unexpected\s+response",
r"response\s+format",
r"missing\s+field.*response",
r"unexpected\s+schema",
r"schema\s+validation",
r"deserialization\s+error",
r"failed\s+to\s+decode",
];
pub const CONTEXT_OVERFLOW_PATTERNS: &[&str] = &[
r"context\s+(length|limit|overflow|exceeded)",
r"token\s+(limit|overflow|exceeded)",
r"max(imum)?\s+tokens?",
r"input\s+too\s+(long|large)",
r"exceeds?\s+(context|token|character|input)\s+limit",
r"message\s+too\s+(long|large)",
r"content\s+too\s+(long|large)",
r"truncat(ed|ion)\s+(due\s+to|because|for)\s+(length|size|limit)",
r"maximum\s+context",
r"prompt\s+too\s+(long|large)",
];
fn compile(patterns: &[&str]) -> Regex {
let combined = patterns
.iter()
.map(|p| format!("({})", p))
.collect::<Vec<_>>()
.join("|");
Regex::new(&format!("(?i){}", combined)).expect("exhaustion pattern regex must compile")
}
fn api_error_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(API_ERROR_PATTERNS))
}
fn timeout_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(TIMEOUT_PATTERNS))
}
fn rate_limit_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(RATE_LIMIT_PATTERNS))
}
fn network_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(NETWORK_PATTERNS))
}
fn malformed_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(MALFORMED_PATTERNS))
}
fn context_overflow_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(CONTEXT_OVERFLOW_PATTERNS))
}
fn snippet_around(text: &str, m: regex::Match<'_>, context: usize) -> String {
let start = m.start().saturating_sub(context);
let end = (m.end() + context).min(text.len());
let start = align_char_boundary(text, start, false);
let end = align_char_boundary(text, end, true);
let mut snippet = String::new();
if start > 0 {
snippet.push_str("...");
}
snippet.push_str(&text[start..end]);
if end < text.len() {
snippet.push_str("...");
}
snippet
}
fn align_char_boundary(s: &str, mut idx: usize, forward: bool) -> usize {
if idx >= s.len() {
return s.len();
}
while !s.is_char_boundary(idx) {
if forward {
idx += 1;
} else if idx == 0 {
break;
} else {
idx -= 1;
}
}
idx
}
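The boundary alignment matters because the match offsets are byte indices, and slicing a `&str` mid-character panics. A copy of `align_char_boundary` (repeated verbatim so the snippet runs on its own) shows it snapping an index off a multi-byte character:

```rust
// Copy of `align_char_boundary` above, for a self-contained snippet.
fn align_char_boundary(s: &str, mut idx: usize, forward: bool) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(idx) {
        if forward {
            idx += 1;
        } else if idx == 0 {
            break;
        } else {
            idx -= 1;
        }
    }
    idx
}
```

In `"aéb"` the 'é' occupies bytes 1..3, so byte index 2 is not a char boundary; aligning backward snaps it to 1 and aligning forward snaps it to 3.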
pub fn analyze_exhaustion(messages: &[ShareGptMessage<'_>]) -> SignalGroup {
let mut group = SignalGroup::new("exhaustion");
for (i, msg) in messages.iter().enumerate() {
if msg.from != "observation" {
continue;
}
let value = msg.value;
let lower = value.to_lowercase();
if let Some(m) = rate_limit_re().find(&lower) {
group.add_signal(emit(
SignalType::EnvironmentExhaustionRateLimit,
i,
snippet_around(value, m, 50),
0.95,
"rate_limit",
m.as_str(),
));
continue;
}
if let Some(m) = api_error_re().find(&lower) {
group.add_signal(emit(
SignalType::EnvironmentExhaustionApiError,
i,
snippet_around(value, m, 50),
0.9,
"api_error",
m.as_str(),
));
continue;
}
if let Some(m) = timeout_re().find(&lower) {
group.add_signal(emit(
SignalType::EnvironmentExhaustionTimeout,
i,
snippet_around(value, m, 50),
0.9,
"timeout",
m.as_str(),
));
continue;
}
if let Some(m) = network_re().find(&lower) {
group.add_signal(emit(
SignalType::EnvironmentExhaustionNetwork,
i,
snippet_around(value, m, 50),
0.9,
"network",
m.as_str(),
));
continue;
}
if let Some(m) = malformed_re().find(&lower) {
group.add_signal(emit(
SignalType::EnvironmentExhaustionMalformed,
i,
snippet_around(value, m, 50),
0.85,
"malformed_response",
m.as_str(),
));
continue;
}
if let Some(m) = context_overflow_re().find(&lower) {
group.add_signal(emit(
SignalType::EnvironmentExhaustionContextOverflow,
i,
snippet_around(value, m, 50),
0.9,
"context_overflow",
m.as_str(),
));
}
}
group
}
fn emit(
t: SignalType,
idx: usize,
snippet: String,
confidence: f32,
kind: &str,
matched: &str,
) -> SignalInstance {
SignalInstance::new(t, idx, snippet)
.with_confidence(confidence)
.with_metadata(json!({
"exhaustion_type": kind,
"matched": matched,
}))
}
#[cfg(test)]
mod tests {
use super::*;
fn obs(value: &str) -> ShareGptMessage<'_> {
ShareGptMessage {
from: "observation",
value,
}
}
#[test]
fn detects_rate_limit() {
let g = analyze_exhaustion(&[obs("HTTP 429: too many requests, retry after 30s")]);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::EnvironmentExhaustionRateLimit)));
}
#[test]
fn detects_api_error() {
let g = analyze_exhaustion(&[obs("503 service unavailable - try again later")]);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::EnvironmentExhaustionApiError)));
}
#[test]
fn detects_timeout() {
let g = analyze_exhaustion(&[obs("Connection timed out after 30 seconds")]);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::EnvironmentExhaustionTimeout)));
}
#[test]
fn detects_network_failure() {
let g = analyze_exhaustion(&[obs("ECONNREFUSED: connection refused by remote host")]);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::EnvironmentExhaustionNetwork)));
}
#[test]
fn detects_malformed_response() {
let g = analyze_exhaustion(&[obs("Invalid JSON: unexpected token at position 42")]);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::EnvironmentExhaustionMalformed)));
}
#[test]
fn detects_context_overflow() {
let g = analyze_exhaustion(&[obs("Maximum context length exceeded for this model")]);
assert!(g.signals.iter().any(|s| matches!(
s.signal_type,
SignalType::EnvironmentExhaustionContextOverflow
)));
}
}


@ -0,0 +1,3 @@
//! Environment signals: exhaustion (external system failures and constraints).
pub mod exhaustion;


@ -0,0 +1,388 @@
//! Execution failure detector. Direct port of `signals/execution/failure.py`.
use std::sync::OnceLock;
use regex::Regex;
use serde_json::json;
use crate::signals::analyzer::ShareGptMessage;
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType};
pub const INVALID_ARGS_PATTERNS: &[&str] = &[
r"invalid\s+argument",
r"invalid\s+parameter",
r"invalid\s+type",
r"type\s*error",
r"expected\s+\w+\s*,?\s*got\s+\w+",
r"required\s+field",
r"required\s+parameter",
r"missing\s+required",
r"missing\s+argument",
r"validation\s+failed",
r"validation\s+error",
r"invalid\s+value",
r"invalid\s+format",
r"must\s+be\s+(a|an)\s+\w+",
r"cannot\s+be\s+(null|empty|none)",
r"is\s+not\s+valid",
r"does\s+not\s+match",
r"out\s+of\s+range",
r"invalid\s+date",
r"invalid\s+json",
r"malformed\s+request",
];
pub const BAD_QUERY_PATTERNS: &[&str] = &[
r"invalid\s+query",
r"query\s+syntax\s+error",
r"malformed\s+query",
r"unknown\s+field",
r"invalid\s+field",
r"invalid\s+filter",
r"invalid\s+search",
r"unknown\s+id",
r"invalid\s+id",
r"id\s+format\s+error",
r"invalid\s+identifier",
r"query\s+failed",
r"search\s+error",
r"invalid\s+operator",
r"unsupported\s+query",
];
pub const TOOL_NOT_FOUND_PATTERNS: &[&str] = &[
r"unknown\s+function",
r"unknown\s+tool",
r"function\s+not\s+found",
r"tool\s+not\s+found",
r"no\s+such\s+function",
r"no\s+such\s+tool",
r"undefined\s+function",
r"action\s+not\s+supported",
r"invalid\s+tool",
r"invalid\s+function",
r"unrecognized\s+function",
];
pub const AUTH_MISUSE_PATTERNS: &[&str] = &[
r"\bunauthorized\b",
r"(status|error|http|code)\s*:?\s*401",
r"401\s+unauthorized",
r"403\s+forbidden",
r"permission\s+denied",
r"access\s+denied",
r"authentication\s+required",
r"invalid\s+credentials",
r"invalid\s+token",
r"token\s+expired",
r"missing\s+authorization",
r"\bforbidden\b",
r"not\s+authorized",
r"insufficient\s+permissions?",
];
pub const STATE_ERROR_PATTERNS: &[&str] = &[
r"invalid\s+state",
r"illegal\s+state",
r"must\s+call\s+\w+\s+first",
r"must\s+\w+\s+before",
r"cannot\s+\w+\s+before",
r"already\s+(exists?|created|started|finished)",
r"not\s+initialized",
r"not\s+started",
r"already\s+in\s+progress",
r"operation\s+in\s+progress",
r"sequence\s+error",
r"precondition\s+failed",
r"(status|error|http)\s*:?\s*409",
r"409\s+conflict",
r"\bconflict\b",
];
fn compile(patterns: &[&str]) -> Regex {
// Use `(?i)` flag for case-insensitive matching, matching Python's `re.IGNORECASE`.
let combined = patterns
.iter()
.map(|p| format!("({})", p))
.collect::<Vec<_>>()
.join("|");
Regex::new(&format!("(?i){}", combined)).expect("failure pattern regex must compile")
}
fn invalid_args_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(INVALID_ARGS_PATTERNS))
}
fn bad_query_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(BAD_QUERY_PATTERNS))
}
fn tool_not_found_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(TOOL_NOT_FOUND_PATTERNS))
}
fn auth_misuse_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(AUTH_MISUSE_PATTERNS))
}
fn state_error_re() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| compile(STATE_ERROR_PATTERNS))
}
/// Pull tool name + args from a `function_call` message. Mirrors
/// `_extract_tool_info` in the reference.
pub(crate) fn extract_tool_info(value: &str) -> (String, String) {
if let Ok(parsed) = serde_json::from_str::<serde_json::Value>(value) {
if let Some(obj) = parsed.as_object() {
let name = obj
.get("name")
.or_else(|| obj.get("function"))
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.unwrap_or_else(|| "unknown".to_string());
let args = match obj.get("arguments").or_else(|| obj.get("args")) {
Some(serde_json::Value::Object(o)) => {
serde_json::to_string(&serde_json::Value::Object(o.clone())).unwrap_or_default()
}
Some(other) => other
.as_str()
.map(|s| s.to_string())
.unwrap_or_else(|| serde_json::to_string(other).unwrap_or_default()),
None => String::new(),
};
return (name, args);
}
}
let snippet: String = value.chars().take(200).collect();
("unknown".to_string(), snippet)
}
/// Build a context-window snippet around a regex match, with leading/trailing
/// ellipses when truncated. Mirrors `_get_snippet`.
fn snippet_around(text: &str, m: regex::Match<'_>, context: usize) -> String {
let start = m.start().saturating_sub(context);
let end = (m.end() + context).min(text.len());
// Ensure we cut on UTF-8 boundaries.
let start = align_char_boundary(text, start, false);
let end = align_char_boundary(text, end, true);
let mut snippet = String::new();
if start > 0 {
snippet.push_str("...");
}
snippet.push_str(&text[start..end]);
if end < text.len() {
snippet.push_str("...");
}
snippet
}
fn align_char_boundary(s: &str, mut idx: usize, forward: bool) -> usize {
if idx >= s.len() {
return s.len();
}
while !s.is_char_boundary(idx) {
if forward {
idx += 1;
} else if idx == 0 {
break;
} else {
idx -= 1;
}
}
idx
}
pub fn analyze_failure(messages: &[ShareGptMessage<'_>]) -> SignalGroup {
let mut group = SignalGroup::new("failure");
let mut last_call: Option<(usize, String, String)> = None;
for (i, msg) in messages.iter().enumerate() {
match msg.from {
"function_call" => {
let (name, args) = extract_tool_info(msg.value);
last_call = Some((i, name, args));
continue;
}
"observation" => {}
_ => continue,
}
let value = msg.value;
// Patterns are matched against the lowercased text, but snippets are cut
// from the original `value` using the match's byte offsets. For ASCII input
// the offsets line up exactly; for exotic case mappings they can drift, and
// `snippet_around` re-aligns to char boundaries so slicing never panics.
let lower = value.to_lowercase();
let (call_index, tool_name) = match &last_call {
Some((idx, name, _)) => (*idx, name.clone()),
None => (i.saturating_sub(1), "unknown".to_string()),
};
if let Some(m) = invalid_args_re().find(&lower) {
group.add_signal(
SignalInstance::new(
SignalType::ExecutionFailureInvalidArgs,
i,
snippet_around(value, m, 50),
)
.with_confidence(0.9)
.with_metadata(json!({
"tool_name": tool_name,
"call_index": call_index,
"error_type": "invalid_args",
"matched": m.as_str(),
})),
);
continue;
}
if let Some(m) = tool_not_found_re().find(&lower) {
group.add_signal(
SignalInstance::new(
SignalType::ExecutionFailureToolNotFound,
i,
snippet_around(value, m, 50),
)
.with_confidence(0.95)
.with_metadata(json!({
"tool_name": tool_name,
"call_index": call_index,
"error_type": "tool_not_found",
"matched": m.as_str(),
})),
);
continue;
}
if let Some(m) = auth_misuse_re().find(&lower) {
group.add_signal(
SignalInstance::new(
SignalType::ExecutionFailureAuthMisuse,
i,
snippet_around(value, m, 50),
)
.with_confidence(0.8)
.with_metadata(json!({
"tool_name": tool_name,
"call_index": call_index,
"error_type": "auth_misuse",
"matched": m.as_str(),
})),
);
continue;
}
if let Some(m) = state_error_re().find(&lower) {
group.add_signal(
SignalInstance::new(
SignalType::ExecutionFailureStateError,
i,
snippet_around(value, m, 50),
)
.with_confidence(0.85)
.with_metadata(json!({
"tool_name": tool_name,
"call_index": call_index,
"error_type": "state_error",
"matched": m.as_str(),
})),
);
continue;
}
if let Some(m) = bad_query_re().find(&lower) {
let confidence = if ["error", "invalid", "failed"]
.iter()
.any(|w| lower.contains(w))
{
0.8
} else {
0.6
};
group.add_signal(
SignalInstance::new(
SignalType::ExecutionFailureBadQuery,
i,
snippet_around(value, m, 50),
)
.with_confidence(confidence)
.with_metadata(json!({
"tool_name": tool_name,
"call_index": call_index,
"error_type": "bad_query",
"matched": m.as_str(),
})),
);
}
}
group
}
#[cfg(test)]
mod tests {
use super::*;
fn fc(value: &str) -> ShareGptMessage<'_> {
ShareGptMessage {
from: "function_call",
value,
}
}
fn obs(value: &str) -> ShareGptMessage<'_> {
ShareGptMessage {
from: "observation",
value,
}
}
#[test]
fn detects_invalid_args() {
let msgs = vec![
fc(r#"{"name":"create_user","arguments":{"age":"twelve"}}"#),
obs("Error: validation failed - expected integer got string for field age"),
];
let g = analyze_failure(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionFailureInvalidArgs)));
}
#[test]
fn detects_tool_not_found() {
let msgs = vec![
fc(r#"{"name":"send_thought","arguments":{}}"#),
obs("Error: unknown function 'send_thought'"),
];
let g = analyze_failure(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionFailureToolNotFound)));
}
#[test]
fn detects_auth_misuse() {
let msgs = vec![
fc(r#"{"name":"get_secret","arguments":{}}"#),
obs("HTTP 401 Unauthorized"),
];
let g = analyze_failure(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionFailureAuthMisuse)));
}
#[test]
fn detects_state_error() {
let msgs = vec![
fc(r#"{"name":"commit_tx","arguments":{}}"#),
obs("must call begin_tx first"),
];
let g = analyze_failure(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionFailureStateError)));
}
}
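The windowing logic in `snippet_around` / `align_char_boundary` can be exercised in isolation. The following is a standalone, stdlib-only sketch of the same idea (it does not use the crate's types; function names and inputs here are illustrative):

```rust
// Standalone sketch of the snippet-windowing logic: clamp a byte window
// around a match, align both ends to UTF-8 char boundaries, and add
// ellipses when the window truncates the text.
fn align(s: &str, mut idx: usize, forward: bool) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(idx) {
        if forward {
            idx += 1;
        } else if idx == 0 {
            break;
        } else {
            idx -= 1;
        }
    }
    idx
}

fn window(text: &str, m_start: usize, m_end: usize, context: usize) -> String {
    let start = align(text, m_start.saturating_sub(context), false);
    let end = align(text, (m_end + context).min(text.len()), true);
    let mut out = String::new();
    if start > 0 {
        out.push_str("...");
    }
    out.push_str(&text[start..end]);
    if end < text.len() {
        out.push_str("...");
    }
    out
}

fn main() {
    // Hypothetical error text; the match span comes from `str::find` here
    // rather than a regex `Match`, but the arithmetic is the same.
    let text = "prefix prefix ERROR: invalid argument suffix suffix";
    let pos = text.find("invalid argument").unwrap();
    let snip = window(text, pos, pos + "invalid argument".len(), 5);
    println!("{}", snip);
}
```

The backward/forward alignment directions matter: the start of the window is nudged backward and the end forward, so a window that lands mid-codepoint always grows rather than splitting a character.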


@@ -0,0 +1,433 @@
//! Execution loops detector. Direct port of `signals/execution/loops.py`.
use serde_json::json;
use crate::signals::analyzer::ShareGptMessage;
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType};
pub const RETRY_THRESHOLD: usize = 3;
pub const PARAMETER_DRIFT_THRESHOLD: usize = 3;
pub const OSCILLATION_CYCLES_THRESHOLD: usize = 3;
#[derive(Debug, Clone)]
pub struct ToolCall {
pub index: usize,
pub name: String,
/// Canonical JSON string of arguments (sorted keys when parseable).
pub args: String,
pub args_dict: Option<serde_json::Map<String, serde_json::Value>>,
}
impl ToolCall {
pub fn args_equal(&self, other: &ToolCall) -> bool {
match (&self.args_dict, &other.args_dict) {
(Some(a), Some(b)) => a == b,
_ => self.args == other.args,
}
}
}
fn parse_tool_call(index: usize, msg: &ShareGptMessage<'_>) -> Option<ToolCall> {
if msg.from != "function_call" {
return None;
}
let value = msg.value;
if let Ok(parsed) = serde_json::from_str::<serde_json::Value>(value) {
if let Some(obj) = parsed.as_object() {
let name = obj
.get("name")
.or_else(|| obj.get("function"))
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.unwrap_or_else(|| "unknown".to_string());
let raw_args = obj.get("arguments").or_else(|| obj.get("args"));
let (args_str, args_dict) = match raw_args {
Some(serde_json::Value::Object(o)) => {
let mut keys: Vec<&String> = o.keys().collect();
keys.sort();
let mut canon = serde_json::Map::new();
for k in keys {
canon.insert(k.clone(), o[k].clone());
}
(
serde_json::to_string(&serde_json::Value::Object(canon.clone()))
.unwrap_or_default(),
Some(canon),
)
}
Some(other) => (
other
.as_str()
.map(|s| s.to_string())
.unwrap_or_else(|| serde_json::to_string(other).unwrap_or_default()),
None,
),
None => (String::new(), None),
};
return Some(ToolCall {
index,
name,
args: args_str,
args_dict,
});
}
}
if let Some(paren) = value.find('(') {
if paren > 0 {
let name = value[..paren].trim().to_string();
let args_part = &value[paren..];
if args_part.starts_with('(') && args_part.ends_with(')') {
let inner = args_part[1..args_part.len() - 1].trim();
if let Ok(serde_json::Value::Object(o)) =
serde_json::from_str::<serde_json::Value>(inner)
{
let mut keys: Vec<&String> = o.keys().collect();
keys.sort();
let mut canon = serde_json::Map::new();
for k in keys {
canon.insert(k.clone(), o[k].clone());
}
return Some(ToolCall {
index,
name,
args: serde_json::to_string(&serde_json::Value::Object(canon.clone()))
.unwrap_or_default(),
args_dict: Some(canon),
});
}
return Some(ToolCall {
index,
name,
args: inner.to_string(),
args_dict: None,
});
}
return Some(ToolCall {
index,
name,
args: args_part.to_string(),
args_dict: None,
});
}
}
Some(ToolCall {
index,
name: value.trim().to_string(),
args: String::new(),
args_dict: None,
})
}
fn extract_tool_calls(messages: &[ShareGptMessage<'_>]) -> Vec<ToolCall> {
let mut out = Vec::new();
for (i, msg) in messages.iter().enumerate() {
if let Some(c) = parse_tool_call(i, msg) {
out.push(c);
}
}
out
}
fn detect_retry(calls: &[ToolCall]) -> Vec<(usize, usize, String)> {
if calls.len() < RETRY_THRESHOLD {
return Vec::new();
}
let mut patterns = Vec::new();
let mut i = 0;
while i < calls.len() {
let current = &calls[i];
let mut j = i + 1;
let mut run_length = 1;
while j < calls.len() {
if calls[j].name == current.name && calls[j].args_equal(current) {
run_length += 1;
j += 1;
} else {
break;
}
}
if run_length >= RETRY_THRESHOLD {
patterns.push((calls[i].index, calls[j - 1].index, current.name.clone()));
i = j;
} else {
i += 1;
}
}
patterns
}
fn detect_parameter_drift(calls: &[ToolCall]) -> Vec<(usize, usize, String, usize)> {
if calls.len() < PARAMETER_DRIFT_THRESHOLD {
return Vec::new();
}
let mut patterns = Vec::new();
let mut i = 0;
while i < calls.len() {
let current_name = calls[i].name.clone();
let mut seen_args: Vec<String> = vec![calls[i].args.clone()];
let mut unique_args = 1;
let mut j = i + 1;
while j < calls.len() {
if calls[j].name != current_name {
break;
}
if !seen_args.iter().any(|a| a == &calls[j].args) {
seen_args.push(calls[j].args.clone());
unique_args += 1;
}
j += 1;
}
let run_length = j - i;
if run_length >= PARAMETER_DRIFT_THRESHOLD && unique_args >= 2 {
patterns.push((
calls[i].index,
calls[j - 1].index,
current_name,
unique_args,
));
i = j;
} else {
i += 1;
}
}
patterns
}
fn detect_oscillation(calls: &[ToolCall]) -> Vec<(usize, usize, Vec<String>, usize)> {
let min_calls = 2 * OSCILLATION_CYCLES_THRESHOLD;
if calls.len() < min_calls {
return Vec::new();
}
let mut patterns = Vec::new();
let mut i: usize = 0;
while i + min_calls <= calls.len() {
let max_pat_len = (5usize).min(calls.len() - i);
let mut found_for_i = false;
for pat_len in 2..=max_pat_len {
let pattern_names: Vec<String> =
(0..pat_len).map(|k| calls[i + k].name.clone()).collect();
let unique: std::collections::HashSet<&String> = pattern_names.iter().collect();
if unique.len() < 2 {
continue;
}
let mut cycles = 1;
let mut pos = i + pat_len;
while pos + pat_len <= calls.len() {
let mut all_match = true;
for k in 0..pat_len {
if calls[pos + k].name != pattern_names[k] {
all_match = false;
break;
}
}
if all_match {
cycles += 1;
pos += pat_len;
} else {
break;
}
}
if cycles >= OSCILLATION_CYCLES_THRESHOLD {
let end_idx_in_calls = i + (cycles * pat_len) - 1;
patterns.push((
calls[i].index,
calls[end_idx_in_calls].index,
pattern_names,
cycles,
));
// Mirror Python: `i = end_idx + 1 - pattern_len`. We set `i` so that
// the next outer iteration begins after we account for overlap.
i = end_idx_in_calls + 1 - pat_len;
found_for_i = true;
break;
}
}
if !found_for_i {
i += 1;
}
// When a pattern was found, `i` was already advanced above to mirror
// Python's `i = end_idx + 1 - pattern_len`; the outer while re-checks it.
}
if patterns.len() > 1 {
patterns = deduplicate_patterns(patterns);
}
patterns
}
fn deduplicate_patterns(
mut patterns: Vec<(usize, usize, Vec<String>, usize)>,
) -> Vec<(usize, usize, Vec<String>, usize)> {
if patterns.is_empty() {
return patterns;
}
patterns.sort_by(|a, b| {
let ord = a.0.cmp(&b.0);
if ord != std::cmp::Ordering::Equal {
ord
} else {
(b.1 - b.0).cmp(&(a.1 - a.0))
}
});
let mut result = Vec::new();
let mut last_end: i64 = -1;
for p in patterns {
if (p.0 as i64) > last_end {
last_end = p.1 as i64;
result.push(p);
}
}
result
}
pub fn analyze_loops(messages: &[ShareGptMessage<'_>]) -> SignalGroup {
let mut group = SignalGroup::new("loops");
let calls = extract_tool_calls(messages);
if calls.len() < RETRY_THRESHOLD {
return group;
}
let retries = detect_retry(&calls);
for (start_idx, end_idx, tool_name) in &retries {
let call_count = calls
.iter()
.filter(|c| *start_idx <= c.index && c.index <= *end_idx)
.count();
group.add_signal(
SignalInstance::new(
SignalType::ExecutionLoopsRetry,
*start_idx,
format!(
"Tool '{}' called {} times with identical arguments",
tool_name, call_count
),
)
.with_confidence(0.95)
.with_metadata(json!({
"tool_name": tool_name,
"start_index": start_idx,
"end_index": end_idx,
"call_count": call_count,
"loop_type": "retry",
})),
);
}
let drifts = detect_parameter_drift(&calls);
for (start_idx, end_idx, tool_name, variation_count) in &drifts {
let overlaps_retry = retries
.iter()
.any(|r| !(*end_idx < r.0 || *start_idx > r.1));
if overlaps_retry {
continue;
}
let call_count = calls
.iter()
.filter(|c| *start_idx <= c.index && c.index <= *end_idx)
.count();
group.add_signal(
SignalInstance::new(
SignalType::ExecutionLoopsParameterDrift,
*start_idx,
format!(
"Tool '{}' called {} times with {} different argument variations",
tool_name, call_count, variation_count
),
)
.with_confidence(0.85)
.with_metadata(json!({
"tool_name": tool_name,
"start_index": start_idx,
"end_index": end_idx,
"call_count": call_count,
"variation_count": variation_count,
"loop_type": "parameter_drift",
})),
);
}
let oscillations = detect_oscillation(&calls);
for (start_idx, end_idx, tool_names, cycle_count) in &oscillations {
let pattern_str = tool_names.join(" \u{2192} ");
group.add_signal(
SignalInstance::new(
SignalType::ExecutionLoopsOscillation,
*start_idx,
format!(
"Oscillation pattern [{}] repeated {} times",
pattern_str, cycle_count
),
)
.with_confidence(0.9)
.with_metadata(json!({
"pattern": tool_names,
"start_index": start_idx,
"end_index": end_idx,
"cycle_count": cycle_count,
"loop_type": "oscillation",
})),
);
}
group
}
#[cfg(test)]
mod tests {
use super::*;
fn fc(value: &str) -> ShareGptMessage<'_> {
ShareGptMessage {
from: "function_call",
value,
}
}
#[test]
fn detects_retry_loop() {
let arg = r#"{"name":"check_status","arguments":{"id":"abc"}}"#;
let msgs = vec![fc(arg), fc(arg), fc(arg), fc(arg)];
let g = analyze_loops(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionLoopsRetry)));
}
#[test]
fn detects_parameter_drift() {
let msgs = vec![
fc(r#"{"name":"search","arguments":{"q":"a"}}"#),
fc(r#"{"name":"search","arguments":{"q":"ab"}}"#),
fc(r#"{"name":"search","arguments":{"q":"abc"}}"#),
fc(r#"{"name":"search","arguments":{"q":"abcd"}}"#),
];
let g = analyze_loops(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionLoopsParameterDrift)));
}
#[test]
fn detects_oscillation() {
let a = r#"{"name":"toolA","arguments":{}}"#;
let b = r#"{"name":"toolB","arguments":{}}"#;
let msgs = vec![fc(a), fc(b), fc(a), fc(b), fc(a), fc(b)];
let g = analyze_loops(&msgs);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::ExecutionLoopsOscillation)));
}
#[test]
fn no_signals_when_few_calls() {
let msgs = vec![fc(r#"{"name":"only_once","arguments":{}}"#)];
let g = analyze_loops(&msgs);
assert!(g.signals.is_empty());
}
}
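The retry detector above is a run-length scan over consecutive identical `(name, args)` calls. A minimal stdlib-only sketch of that scan, with plain tuples standing in for the crate's `ToolCall` type:

```rust
// Minimal sketch of the run-length retry scan: a run of identical
// (name, args) calls at least `threshold` long is reported as
// (start, end, name), and the scan resumes after the whole run.
fn detect_retry(calls: &[(&str, &str)], threshold: usize) -> Vec<(usize, usize, String)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < calls.len() {
        let mut j = i + 1;
        while j < calls.len() && calls[j] == calls[i] {
            j += 1;
        }
        if j - i >= threshold {
            out.push((i, j - 1, calls[i].0.to_string()));
            i = j; // skip past the whole run
        } else {
            i += 1;
        }
    }
    out
}

fn main() {
    let calls = [
        ("check_status", r#"{"id":"abc"}"#),
        ("check_status", r#"{"id":"abc"}"#),
        ("check_status", r#"{"id":"abc"}"#),
        ("search", r#"{"q":"x"}"#),
    ];
    assert_eq!(detect_retry(&calls, 3), vec![(0, 2, "check_status".to_string())]);
}
```

Advancing `i` past a detected run (rather than by one) is what keeps overlapping sub-runs of the same retry burst from being reported twice, matching the full detector's behavior.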


@@ -0,0 +1,5 @@
//! Execution signals: failure (agent-caused tool errors) and loops
//! (repetitive tool-call behavior).
pub mod failure;
pub mod loops;


@@ -0,0 +1,193 @@
//! Shared constants for the interaction layer detectors.
//!
//! Direct port of `signals/interaction/constants.py`.
use std::collections::HashSet;
use std::sync::OnceLock;
pub const POSITIVE_PREFIXES: &[&str] = &[
"yes",
"yeah",
"yep",
"yup",
"sure",
"ok",
"okay",
"great",
"awesome",
"perfect",
"thanks",
"thank",
"wonderful",
"excellent",
"amazing",
"nice",
"good",
"cool",
"absolutely",
"definitely",
"please",
];
pub const CONFIRMATION_PREFIXES: &[&str] = &[
"yes",
"yeah",
"yep",
"yup",
"correct",
"right",
"that's correct",
"thats correct",
"that's right",
"thats right",
"that is correct",
"that is right",
];
const STOPWORD_LIST: &[&str] = &[
"a",
"about",
"above",
"after",
"again",
"against",
"all",
"am",
"an",
"and",
"any",
"are",
"as",
"at",
"be",
"because",
"been",
"before",
"being",
"below",
"between",
"both",
"but",
"by",
"can",
"could",
"did",
"do",
"does",
"doing",
"down",
"during",
"each",
"few",
"for",
"from",
"further",
"had",
"has",
"have",
"having",
"he",
"her",
"here",
"hers",
"herself",
"him",
"himself",
"his",
"how",
"i",
"if",
"in",
"into",
"is",
"it",
"its",
"itself",
"just",
"me",
"more",
"most",
"my",
"myself",
"no",
"nor",
"not",
"now",
"of",
"off",
"on",
"once",
"only",
"or",
"other",
"our",
"ours",
"ourselves",
"out",
"over",
"own",
"same",
"she",
"should",
"so",
"some",
"such",
"than",
"that",
"the",
"their",
"theirs",
"them",
"themselves",
"then",
"there",
"these",
"they",
"this",
"those",
"through",
"to",
"too",
"under",
"until",
"up",
"very",
"was",
"we",
"were",
"what",
"when",
"where",
"which",
"while",
"who",
"whom",
"why",
"with",
"would",
"you",
"your",
"yours",
"yourself",
"yourselves",
];
pub fn stopwords() -> &'static HashSet<&'static str> {
static SET: OnceLock<HashSet<&'static str>> = OnceLock::new();
SET.get_or_init(|| STOPWORD_LIST.iter().copied().collect())
}
/// Returns true if `text` (case-insensitive, trimmed) starts with any of the
/// given prefixes treated as **whole tokens or token sequences**: the matched
/// prefix must be followed by a non-alphanumeric character or end-of-string,
/// so `"please"` fires on `"please help"` but not on `"pleased"`.
pub fn starts_with_prefix(text: &str, prefixes: &[&str]) -> bool {
let lowered = text.to_lowercase();
let trimmed = lowered.trim_start();
for prefix in prefixes {
if let Some(rest) = trimmed.strip_prefix(prefix) {
// Token boundary: end of string or a non-alphanumeric next char.
if rest.chars().next().map_or(true, |c| !c.is_alphanumeric()) {
return true;
}
}
}
false
}
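The token-boundary intent described in the doc comment above ("please" should fire on "please help" but not on "pleased") can be sketched standalone; this is an illustrative re-implementation, not the crate's helper:

```rust
// Standalone sketch of token-boundary prefix matching: a prefix only
// counts if the character after it (if any) is not alphanumeric.
fn starts_with_token(text: &str, prefixes: &[&str]) -> bool {
    let lowered = text.to_lowercase();
    let trimmed = lowered.trim_start();
    prefixes.iter().any(|p| {
        trimmed.strip_prefix(p).map_or(false, |rest| {
            rest.chars().next().map_or(true, |c| !c.is_alphanumeric())
        })
    })
}

fn main() {
    assert!(starts_with_token("Please help me", &["please"]));
    assert!(!starts_with_token("Pleased to help", &["please"]));
    assert!(starts_with_token("ok, thanks", &["ok"]));
}
```

A bare `str::starts_with` would accept the "pleased" case, which is exactly the false positive the boundary check exists to rule out.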


@@ -0,0 +1,445 @@
//! Disengagement signals: escalation, quit, negative stance.
//!
//! Direct port of `signals/interaction/disengagement.py`.
use std::sync::OnceLock;
use regex::Regex;
use serde_json::json;
use super::constants::{starts_with_prefix, POSITIVE_PREFIXES};
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType};
use crate::signals::text_processing::{normalize_patterns, NormalizedMessage, NormalizedPattern};
const ESCALATION_PATTERN_TEXTS: &[&str] = &[
// Human requests
"speak to a human",
"talk to a human",
"connect me to a human",
"connect me with a human",
"transfer me to a human",
"get me a human",
"chat with a human",
// Person requests
"speak to a person",
"talk to a person",
"connect me to a person",
"connect me with a person",
"transfer me to a person",
"get me a person",
"chat with a person",
// Real person requests
"speak to a real person",
"talk to a real person",
"connect me to a real person",
"connect me with a real person",
"transfer me to a real person",
"get me a real person",
"chat with a real person",
// Actual person requests
"speak to an actual person",
"talk to an actual person",
"connect me to an actual person",
"connect me with an actual person",
"transfer me to an actual person",
"get me an actual person",
"chat with an actual person",
// Supervisor requests
"speak to a supervisor",
"talk to a supervisor",
"connect me to a supervisor",
"connect me with a supervisor",
"transfer me to a supervisor",
"get me a supervisor",
"chat with a supervisor",
// Manager requests
"speak to a manager",
"talk to a manager",
"connect me to a manager",
"connect me with a manager",
"transfer me to a manager",
"get me a manager",
"chat with a manager",
// Customer service requests
"speak to customer service",
"talk to customer service",
"connect me to customer service",
"connect me with customer service",
"transfer me to customer service",
"get me customer service",
"chat with customer service",
// Customer support requests
"speak to customer support",
"talk to customer support",
"connect me to customer support",
"connect me with customer support",
"transfer me to customer support",
"get me customer support",
"chat with customer support",
// Support requests
"speak to support",
"talk to support",
"connect me to support",
"connect me with support",
"transfer me to support",
"get me support",
"chat with support",
// Tech support requests
"speak to tech support",
"talk to tech support",
"connect me to tech support",
"connect me with tech support",
"transfer me to tech support",
"get me tech support",
"chat with tech support",
// Help desk requests
"speak to help desk",
"talk to help desk",
"connect me to help desk",
"connect me with help desk",
"transfer me to help desk",
"get me help desk",
"chat with help desk",
// Explicit escalation
"escalate this",
];
const QUIT_PATTERN_TEXTS: &[&str] = &[
"i give up",
"i'm giving up",
"im giving up",
"i'm going to quit",
"i quit",
"forget it",
"forget this",
"screw it",
"screw this",
"don't bother trying",
"don't bother with this",
"don't bother with it",
"don't even bother",
"why bother",
"not worth it",
"this is hopeless",
"going elsewhere",
"try somewhere else",
"look elsewhere",
];
const NEGATIVE_STANCE_PATTERN_TEXTS: &[&str] = &[
"this is useless",
"not helpful",
"doesn't help",
"not helping",
"you're not helping",
"youre not helping",
"this doesn't work",
"this doesnt work",
"this isn't working",
"this isnt working",
"still doesn't work",
"still doesnt work",
"still not working",
"still isn't working",
"still isnt working",
"waste of time",
"wasting my time",
"this is ridiculous",
"this is absurd",
"this is insane",
"this is stupid",
"this is dumb",
"this sucks",
"this is frustrating",
"not good enough",
"why can't you",
"why cant you",
"same issue",
"did that already",
"done that already",
"tried that already",
"already tried that",
"i've done that",
"ive done that",
"i've tried that",
"ive tried that",
"i'm disappointed",
"im disappointed",
"disappointed with you",
"disappointed in you",
"useless bot",
"dumb bot",
"stupid bot",
];
const AGENT_DIRECTED_PROFANITY_PATTERN_TEXTS: &[&str] = &[
"this is bullshit",
"what bullshit",
"such bullshit",
"total bullshit",
"complete bullshit",
"this is crap",
"what crap",
"this is shit",
"what the hell is wrong with you",
"what the fuck is wrong with you",
"you're fucking useless",
"youre fucking useless",
"you are fucking useless",
"fucking useless",
"this bot is shit",
"this bot is crap",
"damn bot",
"fucking bot",
"stupid fucking",
"are you fucking kidding",
"wtf is wrong with you",
"wtf is this",
"ffs just",
"for fucks sake",
"for fuck's sake",
"what the f**k",
"what the f*ck",
"what the f***",
"that's bullsh*t",
"thats bullsh*t",
"that's bull***t",
"thats bull***t",
"that's bs",
"thats bs",
"this is bullsh*t",
"this is bull***t",
"this is bs",
];
fn escalation_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(ESCALATION_PATTERN_TEXTS))
}
fn quit_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(QUIT_PATTERN_TEXTS))
}
fn negative_stance_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(NEGATIVE_STANCE_PATTERN_TEXTS))
}
fn profanity_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(AGENT_DIRECTED_PROFANITY_PATTERN_TEXTS))
}
fn re_consecutive_q() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| Regex::new(r"\?{2,}").unwrap())
}
fn re_consecutive_e() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| Regex::new(r"!{2,}").unwrap())
}
fn re_mixed_punct() -> &'static Regex {
static R: OnceLock<Regex> = OnceLock::new();
R.get_or_init(|| Regex::new(r"[?!]{3,}").unwrap())
}
pub fn analyze_disengagement(
normalized_messages: &[(usize, &str, NormalizedMessage)],
char_ngram_threshold: f32,
token_cosine_threshold: f32,
) -> SignalGroup {
let mut group = SignalGroup::new("disengagement");
for (idx, role, norm_msg) in normalized_messages {
if *role != "human" {
continue;
}
let text = &norm_msg.raw;
// All-caps shouting check.
let alpha_chars: String = text.chars().filter(|c| c.is_alphabetic()).collect();
if alpha_chars.chars().count() >= 10 {
let upper_count = alpha_chars.chars().filter(|c| c.is_uppercase()).count();
let upper_ratio = upper_count as f32 / alpha_chars.chars().count() as f32;
if upper_ratio >= 0.8 {
let snippet: String = text.chars().take(50).collect();
group.add_signal(
SignalInstance::new(SignalType::DisengagementNegativeStance, *idx, snippet)
.with_metadata(json!({
"indicator_type": "all_caps",
"upper_ratio": upper_ratio,
})),
);
}
}
// Excessive consecutive punctuation.
let starts_with_positive = starts_with_prefix(text, POSITIVE_PREFIXES);
let cq = re_consecutive_q().find_iter(text).count();
let ce = re_consecutive_e().find_iter(text).count();
let mixed = re_mixed_punct().find_iter(text).count();
if !starts_with_positive && (cq >= 1 || ce >= 1 || mixed >= 1) {
let snippet: String = text.chars().take(50).collect();
group.add_signal(
SignalInstance::new(SignalType::DisengagementNegativeStance, *idx, snippet)
.with_metadata(json!({
"indicator_type": "excessive_punctuation",
"consecutive_questions": cq,
"consecutive_exclamations": ce,
"mixed_punctuation": mixed,
})),
);
}
// Escalation patterns.
let mut found_escalation = false;
for pattern in escalation_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::DisengagementEscalation,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({"pattern_type": "escalation"})),
);
found_escalation = true;
break;
}
}
// Quit patterns (independent of escalation).
for pattern in quit_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(SignalType::DisengagementQuit, *idx, pattern.raw.clone())
.with_metadata(json!({"pattern_type": "quit"})),
);
break;
}
}
// Profanity (more specific) before generic negative stance.
let mut found_profanity = false;
for pattern in profanity_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::DisengagementNegativeStance,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({
"indicator_type": "profanity",
"pattern": pattern.raw,
})),
);
found_profanity = true;
break;
}
}
if !found_escalation && !found_profanity {
for pattern in negative_stance_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::DisengagementNegativeStance,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({
"indicator_type": "complaint",
"pattern": pattern.raw,
})),
);
break;
}
}
}
}
group
}
#[cfg(test)]
mod tests {
use super::*;
fn nm(s: &str) -> NormalizedMessage {
NormalizedMessage::from_text(s, 2000)
}
#[test]
fn detects_human_escalation_request() {
let msgs = vec![(
0usize,
"human",
nm("This is taking forever, get me a human"),
)];
let g = analyze_disengagement(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::DisengagementEscalation)));
}
#[test]
fn detects_quit_intent() {
let msgs = vec![(0usize, "human", nm("Forget it, I give up"))];
let g = analyze_disengagement(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::DisengagementQuit)));
}
#[test]
fn detects_negative_stance_complaint() {
let msgs = vec![(0usize, "human", nm("This is useless"))];
let g = analyze_disengagement(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::DisengagementNegativeStance)));
}
#[test]
fn detects_excessive_punctuation_as_negative_stance() {
let msgs = vec![(0usize, "human", nm("WHY isn't this working???"))];
let g = analyze_disengagement(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::DisengagementNegativeStance)));
}
#[test]
fn positive_excitement_is_not_disengagement() {
let msgs = vec![(0usize, "human", nm("Yes!! That's perfect!!!"))];
let g = analyze_disengagement(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.all(|s| !matches!(s.signal_type, SignalType::DisengagementNegativeStance)));
}
}
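The all-caps "shouting" heuristic used in `analyze_disengagement` (at least 10 alphabetic characters, of which at least 80% are uppercase) can be sketched as a standalone predicate; thresholds here mirror the detector, the function itself is illustrative:

```rust
// Sketch of the all-caps shouting check: require a minimum of 10
// alphabetic characters so short acronyms don't qualify, then flag
// the message when >= 80% of those characters are uppercase.
fn is_shouting(text: &str) -> bool {
    let alpha: Vec<char> = text.chars().filter(|c| c.is_alphabetic()).collect();
    if alpha.len() < 10 {
        return false;
    }
    let upper = alpha.iter().filter(|c| c.is_uppercase()).count();
    upper as f32 / alpha.len() as f32 >= 0.8
}

fn main() {
    assert!(is_shouting("WHY IS THIS NOT WORKING"));
    assert!(!is_shouting("Why is this not working"));
    assert!(!is_shouting("NO WAY")); // too few alphabetic chars to qualify
}
```

Counting only alphabetic characters keeps digits and punctuation from diluting the ratio, so `"ERROR 404!!!"`-style messages are judged on their letters alone.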


@@ -0,0 +1,338 @@
//! Misalignment signals: corrections, rephrases, clarifications.
//!
//! Direct port of `signals/interaction/misalignment.py`.
use std::sync::OnceLock;
use serde_json::json;
use super::constants::{stopwords, CONFIRMATION_PREFIXES};
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType};
use crate::signals::text_processing::{normalize_patterns, NormalizedMessage, NormalizedPattern};
const CORRECTION_PATTERN_TEXTS: &[&str] = &[
"no, i meant",
"no i meant",
"no, i said",
"no i said",
"no, i asked",
"no i asked",
"nah, i meant",
"nope, i meant",
"not what i said",
"not what i asked",
"that's not what i said",
"that's not what i asked",
"that's not what i meant",
"thats not what i said",
"thats not what i asked",
"thats not what i meant",
"that's not what you",
"no that's not what i",
"no, that's not what i",
"you're not quite right",
"youre not quite right",
"you're not exactly right",
"youre not exactly right",
"you're wrong about",
"youre wrong about",
"i just said",
"i already said",
"i already told you",
];
const REPHRASE_PATTERN_TEXTS: &[&str] = &[
"let me rephrase",
"let me explain again",
"what i'm trying to say",
"what i'm saying is",
"in other words",
];
const CLARIFICATION_PATTERN_TEXTS: &[&str] = &[
"i don't understand",
"don't understand",
"not understanding",
"can't understand",
"don't get it",
"don't follow",
"i'm confused",
"so confused",
"makes no sense",
"doesn't make sense",
"not making sense",
"what do you mean",
"what does that mean",
"what are you saying",
"i'm lost",
"totally lost",
"lost me",
"no clue what you",
"no idea what you",
"no clue what that",
"no idea what that",
"come again",
"say that again",
"repeat that",
"trouble following",
"hard to follow",
"can't follow",
];
fn correction_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(CORRECTION_PATTERN_TEXTS))
}
fn rephrase_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(REPHRASE_PATTERN_TEXTS))
}
fn clarification_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(CLARIFICATION_PATTERN_TEXTS))
}
fn is_confirmation_message(text: &str) -> bool {
let lowered = text.to_lowercase();
let trimmed = lowered.trim();
CONFIRMATION_PREFIXES.iter().any(|p| trimmed.starts_with(p))
}
/// Detect whether two user messages appear to be rephrases of each other.
pub fn is_similar_rephrase(
norm_msg1: &NormalizedMessage,
norm_msg2: &NormalizedMessage,
overlap_threshold: f32,
min_meaningful_tokens: usize,
max_new_content_ratio: f32,
) -> bool {
if norm_msg1.tokens.len() < 3 || norm_msg2.tokens.len() < 3 {
return false;
}
if is_confirmation_message(&norm_msg1.raw) {
return false;
}
let stops = stopwords();
let tokens1: std::collections::HashSet<&str> = norm_msg1
.tokens
.iter()
.filter(|t| !stops.contains(t.as_str()))
.map(|s| s.as_str())
.collect();
let tokens2: std::collections::HashSet<&str> = norm_msg2
.tokens
.iter()
.filter(|t| !stops.contains(t.as_str()))
.map(|s| s.as_str())
.collect();
if tokens1.len() < min_meaningful_tokens || tokens2.len() < min_meaningful_tokens {
return false;
}
let new_tokens: std::collections::HashSet<&&str> = tokens1.difference(&tokens2).collect();
let new_content_ratio = if tokens1.is_empty() {
0.0
} else {
new_tokens.len() as f32 / tokens1.len() as f32
};
if new_content_ratio > max_new_content_ratio {
return false;
}
let intersection = tokens1.intersection(&tokens2).count();
let min_size = tokens1.len().min(tokens2.len());
if min_size == 0 {
return false;
}
let overlap_ratio = intersection as f32 / min_size as f32;
overlap_ratio >= overlap_threshold
}
/// Analyze user messages for misalignment signals.
pub fn analyze_misalignment(
normalized_messages: &[(usize, &str, NormalizedMessage)],
char_ngram_threshold: f32,
token_cosine_threshold: f32,
) -> SignalGroup {
let mut group = SignalGroup::new("misalignment");
let mut prev_user_idx: Option<usize> = None;
let mut prev_user_msg: Option<&NormalizedMessage> = None;
for (idx, role, norm_msg) in normalized_messages {
if *role != "human" {
continue;
}
let mut found_in_turn = false;
for pattern in correction_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::MisalignmentCorrection,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({"pattern_type": "correction"})),
);
found_in_turn = true;
break;
}
}
if found_in_turn {
prev_user_idx = Some(*idx);
prev_user_msg = Some(norm_msg);
continue;
}
for pattern in rephrase_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::MisalignmentRephrase,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({"pattern_type": "rephrase"})),
);
found_in_turn = true;
break;
}
}
if found_in_turn {
prev_user_idx = Some(*idx);
prev_user_msg = Some(norm_msg);
continue;
}
for pattern in clarification_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::MisalignmentClarification,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({"pattern_type": "clarification"})),
);
found_in_turn = true;
break;
}
}
if found_in_turn {
prev_user_idx = Some(*idx);
prev_user_msg = Some(norm_msg);
continue;
}
// Semantic rephrase vs the previous user message (recent only).
if let (Some(prev_idx), Some(prev_msg)) = (prev_user_idx, prev_user_msg) {
let turns_between = idx.saturating_sub(prev_idx);
if turns_between <= 3 && is_similar_rephrase(norm_msg, prev_msg, 0.75, 4, 0.5) {
group.add_signal(
SignalInstance::new(
SignalType::MisalignmentRephrase,
*idx,
"[similar rephrase detected]",
)
.with_confidence(0.8)
.with_metadata(json!({
"pattern_type": "semantic_rephrase",
"compared_to": prev_idx,
})),
);
}
}
prev_user_idx = Some(*idx);
prev_user_msg = Some(norm_msg);
}
group
}
#[cfg(test)]
mod tests {
use super::*;
fn nm(s: &str) -> NormalizedMessage {
NormalizedMessage::from_text(s, 2000)
}
fn make(items: &[(&'static str, &str)]) -> Vec<(usize, &'static str, NormalizedMessage)> {
items
.iter()
.enumerate()
.map(|(i, (role, text))| (i, *role, nm(text)))
.collect()
}
#[test]
fn detects_explicit_correction() {
let msgs = make(&[
("human", "Show me my orders"),
("gpt", "Sure, here are your invoices"),
("human", "No, I meant my recent orders"),
]);
let g = analyze_misalignment(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::MisalignmentCorrection)));
}
#[test]
fn detects_rephrase_marker() {
let msgs = make(&[
("human", "Show me X"),
("gpt", "Sure"),
("human", "Let me rephrase: I want X grouped by date"),
]);
let g = analyze_misalignment(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::MisalignmentRephrase)));
}
#[test]
fn detects_clarification_request() {
let msgs = make(&[
("human", "Run the report"),
("gpt", "Foobar quux baz."),
("human", "I don't understand what you mean"),
]);
let g = analyze_misalignment(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::MisalignmentClarification)));
}
#[test]
fn confirmation_is_not_a_rephrase() {
let m1 = nm("Yes, that's correct, please proceed with the order");
let m2 = nm("please proceed with the order for the same product");
assert!(!is_similar_rephrase(&m1, &m2, 0.75, 4, 0.5));
}
}
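As a minimal self-contained sketch of the two ratios `is_similar_rephrase` computes (token overlap against the smaller set, and new-content against the newer message), here is the same arithmetic over plain whitespace tokens — the `overlap_and_new_content` helper is hypothetical and stands in for the crate's `NormalizedMessage` tokenization and stopword filtering:

```rust
use std::collections::HashSet;

// Illustrative only: mirrors the overlap-ratio and new-content-ratio checks
// from `is_similar_rephrase` above, without normalization or stopwords.
fn overlap_and_new_content(newer: &str, older: &str) -> (f32, f32) {
    let t1: HashSet<&str> = newer.split_whitespace().collect();
    let t2: HashSet<&str> = older.split_whitespace().collect();
    let intersection = t1.intersection(&t2).count() as f32;
    let min_size = t1.len().min(t2.len()).max(1) as f32;
    let new_tokens = t1.difference(&t2).count() as f32;
    // (overlap vs. the smaller message, fraction of the newer message that is new)
    (intersection / min_size, new_tokens / t1.len().max(1) as f32)
}

fn main() {
    // High overlap and little new content: treated as a rephrase candidate.
    let (overlap, fresh) =
        overlap_and_new_content("show orders grouped by date", "show orders sorted by date");
    assert!(overlap >= 0.75 && fresh <= 0.5);
    println!("overlap={overlap:.2} new_content={fresh:.2}");
}
```

With the thresholds used above (`overlap >= 0.75`, `new_content <= 0.5`), swapping a single content word in a five-token message still registers as a rephrase, while a mostly-new follow-up question does not.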


@ -0,0 +1,10 @@
//! Interaction signals: misalignment, stagnation, disengagement, satisfaction.
//!
//! These signals capture how the dialogue itself unfolds (semantic alignment,
//! progress, engagement, closure) independent of tool execution outcomes.
pub mod constants;
pub mod disengagement;
pub mod misalignment;
pub mod satisfaction;
pub mod stagnation;


@ -0,0 +1,177 @@
//! Satisfaction signals: gratitude, confirmation, success.
//!
//! Direct port of `signals/interaction/satisfaction.py`.
use std::sync::OnceLock;
use serde_json::json;
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType};
use crate::signals::text_processing::{normalize_patterns, NormalizedMessage, NormalizedPattern};
const GRATITUDE_PATTERN_TEXTS: &[&str] = &[
"that's helpful",
"that helps",
"this helps",
"appreciate it",
"appreciate that",
"that's perfect",
"exactly what i needed",
"just what i needed",
"you're the best",
"you rock",
"you're awesome",
"you're amazing",
"you're great",
];
const CONFIRMATION_PATTERN_TEXTS: &[&str] = &[
"that works",
"this works",
"that's great",
"that's amazing",
"this is great",
"that's awesome",
"love it",
"love this",
"love that",
];
const SUCCESS_PATTERN_TEXTS: &[&str] = &[
"it worked",
"that worked",
"this worked",
"it's working",
"that's working",
"this is working",
];
fn gratitude_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(GRATITUDE_PATTERN_TEXTS))
}
fn confirmation_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(CONFIRMATION_PATTERN_TEXTS))
}
fn success_patterns() -> &'static Vec<NormalizedPattern> {
static PATS: OnceLock<Vec<NormalizedPattern>> = OnceLock::new();
PATS.get_or_init(|| normalize_patterns(SUCCESS_PATTERN_TEXTS))
}
pub fn analyze_satisfaction(
normalized_messages: &[(usize, &str, NormalizedMessage)],
char_ngram_threshold: f32,
token_cosine_threshold: f32,
) -> SignalGroup {
let mut group = SignalGroup::new("satisfaction");
for (idx, role, norm_msg) in normalized_messages {
if *role != "human" {
continue;
}
let mut found = false;
for pattern in gratitude_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::SatisfactionGratitude,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({"pattern_type": "gratitude"})),
);
found = true;
break;
}
}
if found {
continue;
}
for pattern in confirmation_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(
SignalType::SatisfactionConfirmation,
*idx,
pattern.raw.clone(),
)
.with_metadata(json!({"pattern_type": "confirmation"})),
);
found = true;
break;
}
}
if found {
continue;
}
for pattern in success_patterns() {
if norm_msg.matches_normalized_pattern(
pattern,
char_ngram_threshold,
token_cosine_threshold,
) {
group.add_signal(
SignalInstance::new(SignalType::SatisfactionSuccess, *idx, pattern.raw.clone())
.with_metadata(json!({"pattern_type": "success"})),
);
break;
}
}
}
group
}
#[cfg(test)]
mod tests {
use super::*;
fn nm(s: &str) -> NormalizedMessage {
NormalizedMessage::from_text(s, 2000)
}
#[test]
fn detects_gratitude() {
let msgs = vec![(0usize, "human", nm("That's perfect, appreciate it!"))];
let g = analyze_satisfaction(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::SatisfactionGratitude)));
}
#[test]
fn detects_confirmation() {
let msgs = vec![(0usize, "human", nm("That works for me, thanks"))];
let g = analyze_satisfaction(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::SatisfactionConfirmation)));
}
#[test]
fn detects_success() {
let msgs = vec![(0usize, "human", nm("Great, it worked!"))];
let g = analyze_satisfaction(&msgs, 0.65, 0.6);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::SatisfactionSuccess)));
}
}
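The control flow in `analyze_satisfaction` is a strict priority order: gratitude patterns are tried first, then confirmation, then success, and the first tier that matches wins for that turn. A minimal sketch of that first-match-wins shape, using a hypothetical `classify` helper with plain substring matching in place of the crate's fuzzy `matches_normalized_pattern`:

```rust
// Illustrative only: the tiered, first-match-wins priority used by
// `analyze_satisfaction` above (gratitude > confirmation > success).
fn classify(text: &str) -> Option<&'static str> {
    let lowered = text.to_lowercase();
    let tiers: &[(&'static str, &[&'static str])] = &[
        ("gratitude", &["appreciate it", "that's perfect"]),
        ("confirmation", &["that works", "love it"]),
        ("success", &["it worked", "that worked"]),
    ];
    for &(label, patterns) in tiers {
        // The first tier with any matching pattern wins; later tiers are skipped.
        if patterns.iter().any(|p| lowered.contains(p)) {
            return Some(label);
        }
    }
    None
}

fn main() {
    // Matches both a gratitude and a success pattern; gratitude wins.
    assert_eq!(classify("That's perfect, it worked!"), Some("gratitude"));
    assert_eq!(classify("ok, that works"), Some("confirmation"));
    assert_eq!(classify("hmm"), None);
}
```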


@ -0,0 +1,241 @@
//! Stagnation signals: dragging (turn-count efficiency) and repetition.
//!
//! Direct port of `signals/interaction/stagnation.py`.
use serde_json::json;
use super::constants::{starts_with_prefix, POSITIVE_PREFIXES};
use crate::signals::schemas::{SignalGroup, SignalInstance, SignalType, TurnMetrics};
use crate::signals::text_processing::NormalizedMessage;
/// Adapter row used by the stagnation dragging detector. Mirrors the ShareGPT
/// `{"from": role, "value": text}` shape used in the Python reference; only
/// the `from` role is carried here, since dragging needs turn counts, not text.
pub struct ShareGptMsg<'a> {
pub from: &'a str,
}
pub fn analyze_dragging(
messages: &[ShareGptMsg<'_>],
baseline_turns: usize,
efficiency_threshold: f32,
) -> (SignalGroup, TurnMetrics) {
let mut group = SignalGroup::new("stagnation");
let mut user_turns: usize = 0;
let mut assistant_turns: usize = 0;
for m in messages {
match m.from {
"human" => user_turns += 1,
"gpt" => assistant_turns += 1,
_ => {}
}
}
let total_turns = user_turns;
let efficiency_score: f32 = if total_turns == 0 || total_turns <= baseline_turns {
1.0
} else {
let excess = (total_turns - baseline_turns) as f32;
1.0 / (1.0 + excess * 0.25)
};
let is_dragging = efficiency_score < efficiency_threshold;
let metrics = TurnMetrics {
total_turns,
user_turns,
assistant_turns,
is_dragging,
efficiency_score,
};
if is_dragging {
let last_idx = messages.len().saturating_sub(1);
group.add_signal(
SignalInstance::new(
SignalType::StagnationDragging,
last_idx,
format!(
"Conversation dragging: {} turns (efficiency: {:.2})",
total_turns, efficiency_score
),
)
.with_confidence(1.0 - efficiency_score)
.with_metadata(json!({
"total_turns": total_turns,
"efficiency_score": efficiency_score,
"baseline_turns": baseline_turns,
})),
);
}
(group, metrics)
}
pub fn analyze_repetition(
normalized_messages: &[(usize, &str, NormalizedMessage)],
lookback: usize,
exact_threshold: f32,
near_duplicate_threshold: f32,
) -> SignalGroup {
let mut group = SignalGroup::new("stagnation");
// Keep references into `normalized_messages` rather than cloning each
// message; the slice is borrowed for the whole function, so the references
// remain valid for as long as we need them.
let mut prev_human: Vec<(usize, &NormalizedMessage)> = Vec::new();
let mut prev_gpt: Vec<(usize, &NormalizedMessage)> = Vec::new();
for (idx, role, norm_msg) in normalized_messages {
if *role != "human" && *role != "gpt" {
continue;
}
// Skip human positive-prefix messages; they're naturally repetitive.
if *role == "human" && starts_with_prefix(&norm_msg.raw, POSITIVE_PREFIXES) {
prev_human.push((*idx, norm_msg));
continue;
}
if norm_msg.tokens.len() < 5 {
if *role == "human" {
prev_human.push((*idx, norm_msg));
} else {
prev_gpt.push((*idx, norm_msg));
}
continue;
}
let prev = if *role == "human" {
&prev_human
} else {
&prev_gpt
};
let start = prev.len().saturating_sub(lookback);
for (prev_idx, prev_msg) in &prev[start..] {
if prev_msg.tokens.len() < 5 {
continue;
}
let similarity = norm_msg.ngram_similarity_with_message(prev_msg);
if similarity >= exact_threshold {
group.add_signal(
SignalInstance::new(
SignalType::StagnationRepetition,
*idx,
format!("Exact repetition with message {}", prev_idx),
)
.with_confidence(similarity)
.with_metadata(json!({
"repetition_type": "exact",
"compared_to": prev_idx,
"similarity": similarity,
"role": role,
})),
);
break;
} else if similarity >= near_duplicate_threshold {
group.add_signal(
SignalInstance::new(
SignalType::StagnationRepetition,
*idx,
format!("Near-duplicate with message {}", prev_idx),
)
.with_confidence(similarity)
.with_metadata(json!({
"repetition_type": "near_duplicate",
"compared_to": prev_idx,
"similarity": similarity,
"role": role,
})),
);
break;
}
}
if *role == "human" {
prev_human.push((*idx, norm_msg));
} else {
prev_gpt.push((*idx, norm_msg));
}
}
group
}
/// Combined stagnation analyzer: dragging + repetition.
pub fn analyze_stagnation(
messages: &[ShareGptMsg<'_>],
normalized_messages: &[(usize, &str, NormalizedMessage)],
baseline_turns: usize,
) -> (SignalGroup, TurnMetrics) {
let (dragging_group, metrics) = analyze_dragging(messages, baseline_turns, 0.5);
let repetition_group = analyze_repetition(normalized_messages, 2, 0.95, 0.85);
let mut combined = SignalGroup::new("stagnation");
for s in dragging_group.signals.iter().cloned() {
combined.add_signal(s);
}
for s in repetition_group.signals.iter().cloned() {
combined.add_signal(s);
}
(combined, metrics)
}
#[cfg(test)]
mod tests {
use super::*;
fn nm(s: &str) -> NormalizedMessage {
NormalizedMessage::from_text(s, 2000)
}
#[test]
fn dragging_after_many_user_turns() {
let msgs: Vec<_> = (0..15)
.flat_map(|_| [ShareGptMsg { from: "human" }, ShareGptMsg { from: "gpt" }])
.collect();
let (g, m) = analyze_dragging(&msgs, 5, 0.5);
assert!(m.is_dragging);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::StagnationDragging)));
}
#[test]
fn no_dragging_below_baseline() {
let msgs = vec![
ShareGptMsg { from: "human" },
ShareGptMsg { from: "gpt" },
ShareGptMsg { from: "human" },
ShareGptMsg { from: "gpt" },
];
let (g, m) = analyze_dragging(&msgs, 5, 0.5);
assert!(!m.is_dragging);
assert!(g.signals.is_empty());
}
#[test]
fn detects_exact_repetition_in_user_messages() {
let n = vec![
(
0usize,
"human",
nm("This widget is broken and needs repair right now"),
),
(1, "gpt", nm("Sorry to hear that. Let me look into it.")),
(
2,
"human",
nm("This widget is broken and needs repair right now"),
),
];
let g = analyze_repetition(&n, 2, 0.95, 0.85);
assert!(g
.signals
.iter()
.any(|s| matches!(s.signal_type, SignalType::StagnationRepetition)));
}
}
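The dragging detector's efficiency curve is worth seeing in isolation: at or below the baseline it is 1.0, and each excess user turn adds 0.25 to the denominator, so with the default baseline of 5 the score reaches exactly 0.5 at 9 turns (dragging requires strictly below the threshold, so the tenth turn trips it). A minimal standalone sketch of the same formula:

```rust
// Illustrative only: the efficiency formula used by `analyze_dragging` above.
fn efficiency_score(total_turns: usize, baseline_turns: usize) -> f32 {
    if total_turns == 0 || total_turns <= baseline_turns {
        1.0
    } else {
        let excess = (total_turns - baseline_turns) as f32;
        1.0 / (1.0 + excess * 0.25)
    }
}

fn main() {
    assert_eq!(efficiency_score(4, 5), 1.0); // at or below baseline
    // 4 excess turns lands exactly on the 0.5 threshold (dragging needs < 0.5).
    assert!((efficiency_score(9, 5) - 0.5).abs() < 1e-6);
    assert!(efficiency_score(15, 5) < efficiency_score(10, 5)); // monotone decay
}
```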


@ -1,3 +1,26 @@
//! Plano signals: behavioral quality indicators for agent interactions.
//!
//! This is a Rust port of the paper-aligned Python reference implementation at
//! `https://github.com/katanemo/signals` (or `/Users/shashmi/repos/signals`).
//!
//! Three layers of signals are detected from a conversation transcript:
//!
//! - **Interaction**: misalignment, stagnation, disengagement, satisfaction
//! - **Execution**: failure, loops
//! - **Environment**: exhaustion
//!
//! See `SignalType` for the full hierarchy.
pub mod analyzer;
pub mod environment;
pub mod execution;
pub mod interaction;
pub mod otel;
pub mod schemas;
pub mod text_processing;
pub use analyzer::{SignalAnalyzer, FLAG_MARKER};
pub use schemas::{
EnvironmentSignals, ExecutionSignals, InteractionQuality, InteractionSignals, SignalGroup,
SignalInstance, SignalLayer, SignalReport, SignalType, TurnMetrics,
};


@ -0,0 +1,241 @@
//! Helpers for emitting `SignalReport` data to OpenTelemetry spans.
//!
//! Two sets of attributes are emitted:
//!
//! - **Legacy** keys under `signals.*` (e.g. `signals.frustration.count`),
//! computed from the new layered counts. Preserved for one release for
//! backward compatibility with existing dashboards.
//! - **New** layered keys (e.g. `signals.interaction.misalignment.count`),
//! one set of `count`/`severity` attributes per category, plus per-instance
//! span events named `signal.<dotted_signal_type>`.
use opentelemetry::trace::SpanRef;
use opentelemetry::KeyValue;
use crate::signals::schemas::{SignalGroup, SignalReport, SignalType};
/// Emit both legacy and layered OTel attributes/events for a `SignalReport`.
///
/// Returns `true` if any "concerning" signal was found, mirroring the previous
/// behavior used to flag the span operation name.
pub fn emit_signals_to_span(span: &SpanRef<'_>, report: &SignalReport) -> bool {
emit_overall(span, report);
emit_layered_attributes(span, report);
emit_legacy_attributes(span, report);
emit_signal_events(span, report);
is_concerning(report)
}
fn emit_overall(span: &SpanRef<'_>, report: &SignalReport) {
span.set_attribute(KeyValue::new(
"signals.quality",
report.overall_quality.as_str().to_string(),
));
span.set_attribute(KeyValue::new(
"signals.quality_score",
report.quality_score as f64,
));
span.set_attribute(KeyValue::new(
"signals.turn_count",
report.turn_metrics.total_turns as i64,
));
span.set_attribute(KeyValue::new(
"signals.efficiency_score",
report.turn_metrics.efficiency_score as f64,
));
}
fn emit_group(span: &SpanRef<'_>, prefix: &str, group: &SignalGroup) {
if group.count == 0 {
return;
}
span.set_attribute(KeyValue::new(
format!("{}.count", prefix),
group.count as i64,
));
span.set_attribute(KeyValue::new(
format!("{}.severity", prefix),
group.severity as i64,
));
}
fn emit_layered_attributes(span: &SpanRef<'_>, report: &SignalReport) {
emit_group(
span,
"signals.interaction.misalignment",
&report.interaction.misalignment,
);
emit_group(
span,
"signals.interaction.stagnation",
&report.interaction.stagnation,
);
emit_group(
span,
"signals.interaction.disengagement",
&report.interaction.disengagement,
);
emit_group(
span,
"signals.interaction.satisfaction",
&report.interaction.satisfaction,
);
emit_group(span, "signals.execution.failure", &report.execution.failure);
emit_group(span, "signals.execution.loops", &report.execution.loops);
emit_group(
span,
"signals.environment.exhaustion",
&report.environment.exhaustion,
);
}
fn count_of(report: &SignalReport, t: SignalType) -> usize {
report.iter_signals().filter(|s| s.signal_type == t).count()
}
/// Emit the legacy attribute keys consumed by existing dashboards. These are
/// derived from the new `SignalReport` so no detector contract is broken.
fn emit_legacy_attributes(span: &SpanRef<'_>, report: &SignalReport) {
use crate::tracing::signals as legacy;
// signals.follow_up.repair.{count,ratio} - misalignment proxies repairs.
let repair_count = report.interaction.misalignment.count;
let user_turns = report.turn_metrics.user_turns.max(1) as f32;
if repair_count > 0 {
span.set_attribute(KeyValue::new(legacy::REPAIR_COUNT, repair_count as i64));
let ratio = repair_count as f32 / user_turns;
span.set_attribute(KeyValue::new(legacy::REPAIR_RATIO, format!("{:.3}", ratio)));
}
// signals.frustration.{count,severity} - disengagement.negative_stance is
// the closest legacy analog of "frustration".
let frustration_count = count_of(report, SignalType::DisengagementNegativeStance);
if frustration_count > 0 {
span.set_attribute(KeyValue::new(
legacy::FRUSTRATION_COUNT,
frustration_count as i64,
));
let severity = match frustration_count {
0 => 0,
1..=2 => 1,
3..=4 => 2,
_ => 3,
};
span.set_attribute(KeyValue::new(legacy::FRUSTRATION_SEVERITY, severity as i64));
}
// signals.repetition.count - stagnation (repetition + dragging).
if report.interaction.stagnation.count > 0 {
span.set_attribute(KeyValue::new(
legacy::REPETITION_COUNT,
report.interaction.stagnation.count as i64,
));
}
// signals.escalation.requested - any escalation/quit signal.
let escalated = report.interaction.disengagement.signals.iter().any(|s| {
matches!(
s.signal_type,
SignalType::DisengagementEscalation | SignalType::DisengagementQuit
)
});
if escalated {
span.set_attribute(KeyValue::new(legacy::ESCALATION_REQUESTED, true));
}
// signals.positive_feedback.count - satisfaction signals.
if report.interaction.satisfaction.count > 0 {
span.set_attribute(KeyValue::new(
legacy::POSITIVE_FEEDBACK_COUNT,
report.interaction.satisfaction.count as i64,
));
}
}
fn emit_signal_events(span: &SpanRef<'_>, report: &SignalReport) {
for sig in report.iter_signals() {
let event_name = format!("signal.{}", sig.signal_type.as_str());
let mut attrs: Vec<KeyValue> = vec![
KeyValue::new("signal.type", sig.signal_type.as_str().to_string()),
KeyValue::new("signal.message_index", sig.message_index as i64),
KeyValue::new("signal.confidence", sig.confidence as f64),
];
if !sig.snippet.is_empty() {
attrs.push(KeyValue::new("signal.snippet", sig.snippet.clone()));
}
if !sig.metadata.is_null() {
attrs.push(KeyValue::new("signal.metadata", sig.metadata.to_string()));
}
span.add_event(event_name, attrs);
}
}
fn is_concerning(report: &SignalReport) -> bool {
use crate::signals::schemas::InteractionQuality;
if matches!(
report.overall_quality,
InteractionQuality::Poor | InteractionQuality::Severe
) {
return true;
}
if report.interaction.disengagement.count > 0 {
return true;
}
if report.interaction.stagnation.count > 2 {
return true;
}
if report.execution.failure.count > 0 || report.execution.loops.count > 0 {
return true;
}
false
}
#[cfg(test)]
mod tests {
use super::*;
use crate::signals::schemas::{
EnvironmentSignals, ExecutionSignals, InteractionQuality, InteractionSignals, SignalGroup,
SignalInstance, SignalReport, SignalType, TurnMetrics,
};
fn report_with_escalation() -> SignalReport {
let mut diseng = SignalGroup::new("disengagement");
diseng.add_signal(SignalInstance::new(
SignalType::DisengagementEscalation,
3,
"get me a human",
));
SignalReport {
interaction: InteractionSignals {
disengagement: diseng,
..InteractionSignals::default()
},
execution: ExecutionSignals::default(),
environment: EnvironmentSignals::default(),
overall_quality: InteractionQuality::Severe,
quality_score: 0.0,
turn_metrics: TurnMetrics {
total_turns: 3,
user_turns: 2,
assistant_turns: 1,
is_dragging: false,
efficiency_score: 1.0,
},
summary: String::new(),
}
}
#[test]
fn is_concerning_flags_disengagement() {
let r = report_with_escalation();
assert!(is_concerning(&r));
}
#[test]
fn count_of_returns_per_type_count() {
let r = report_with_escalation();
assert_eq!(count_of(&r, SignalType::DisengagementEscalation), 1);
assert_eq!(count_of(&r, SignalType::DisengagementNegativeStance), 0);
}
}
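The legacy frustration severity above uses the same count-to-severity bucketing as `SignalGroup` in `schemas.rs` (0 = none, 1 = mild, 2 = moderate, 3 = severe). Pulled out as a standalone sketch:

```rust
// Illustrative only: the shared count -> severity bucketing used by the
// legacy frustration attribute above and by `SignalGroup::update_severity`.
fn severity_for(count: usize) -> u8 {
    match count {
        0 => 0,
        1..=2 => 1,
        3..=4 => 2,
        _ => 3,
    }
}

fn main() {
    assert_eq!(severity_for(0), 0);
    assert_eq!(severity_for(2), 1);
    assert_eq!(severity_for(4), 2);
    assert_eq!(severity_for(9), 3);
}
```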


@ -0,0 +1,431 @@
//! Data shapes for the signal analyzer.
//!
//! Mirrors `signals/schemas.py` from the reference implementation. Where the
//! Python library exposes a `Dict[str, SignalGroup]` partitioned by category,
//! the Rust port uses strongly-typed sub-structs (`InteractionSignals`,
//! `ExecutionSignals`, `EnvironmentSignals`) for the same partitioning.
use serde::{Deserialize, Serialize};
/// Hierarchical signal type. The 25 leaf variants mirror the paper taxonomy
/// and the Python reference's `SignalType` string enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum SignalType {
// Interaction > Misalignment
MisalignmentCorrection,
MisalignmentRephrase,
MisalignmentClarification,
// Interaction > Stagnation
StagnationDragging,
StagnationRepetition,
// Interaction > Disengagement
DisengagementEscalation,
DisengagementQuit,
DisengagementNegativeStance,
// Interaction > Satisfaction
SatisfactionGratitude,
SatisfactionConfirmation,
SatisfactionSuccess,
// Execution > Failure
ExecutionFailureInvalidArgs,
ExecutionFailureBadQuery,
ExecutionFailureToolNotFound,
ExecutionFailureAuthMisuse,
ExecutionFailureStateError,
// Execution > Loops
ExecutionLoopsRetry,
ExecutionLoopsParameterDrift,
ExecutionLoopsOscillation,
// Environment > Exhaustion
EnvironmentExhaustionApiError,
EnvironmentExhaustionTimeout,
EnvironmentExhaustionRateLimit,
EnvironmentExhaustionNetwork,
EnvironmentExhaustionMalformed,
EnvironmentExhaustionContextOverflow,
}
impl SignalType {
/// Dotted hierarchical string identifier, e.g.
/// `"interaction.misalignment.correction"`. Matches the Python reference's
/// `SignalType` enum *value* strings byte-for-byte.
pub fn as_str(&self) -> &'static str {
match self {
SignalType::MisalignmentCorrection => "interaction.misalignment.correction",
SignalType::MisalignmentRephrase => "interaction.misalignment.rephrase",
SignalType::MisalignmentClarification => "interaction.misalignment.clarification",
SignalType::StagnationDragging => "interaction.stagnation.dragging",
SignalType::StagnationRepetition => "interaction.stagnation.repetition",
SignalType::DisengagementEscalation => "interaction.disengagement.escalation",
SignalType::DisengagementQuit => "interaction.disengagement.quit",
SignalType::DisengagementNegativeStance => "interaction.disengagement.negative_stance",
SignalType::SatisfactionGratitude => "interaction.satisfaction.gratitude",
SignalType::SatisfactionConfirmation => "interaction.satisfaction.confirmation",
SignalType::SatisfactionSuccess => "interaction.satisfaction.success",
SignalType::ExecutionFailureInvalidArgs => "execution.failure.invalid_args",
SignalType::ExecutionFailureBadQuery => "execution.failure.bad_query",
SignalType::ExecutionFailureToolNotFound => "execution.failure.tool_not_found",
SignalType::ExecutionFailureAuthMisuse => "execution.failure.auth_misuse",
SignalType::ExecutionFailureStateError => "execution.failure.state_error",
SignalType::ExecutionLoopsRetry => "execution.loops.retry",
SignalType::ExecutionLoopsParameterDrift => "execution.loops.parameter_drift",
SignalType::ExecutionLoopsOscillation => "execution.loops.oscillation",
SignalType::EnvironmentExhaustionApiError => "environment.exhaustion.api_error",
SignalType::EnvironmentExhaustionTimeout => "environment.exhaustion.timeout",
SignalType::EnvironmentExhaustionRateLimit => "environment.exhaustion.rate_limit",
SignalType::EnvironmentExhaustionNetwork => "environment.exhaustion.network",
SignalType::EnvironmentExhaustionMalformed => {
"environment.exhaustion.malformed_response"
}
SignalType::EnvironmentExhaustionContextOverflow => {
"environment.exhaustion.context_overflow"
}
}
}
pub fn layer(&self) -> SignalLayer {
match self {
SignalType::MisalignmentCorrection
| SignalType::MisalignmentRephrase
| SignalType::MisalignmentClarification
| SignalType::StagnationDragging
| SignalType::StagnationRepetition
| SignalType::DisengagementEscalation
| SignalType::DisengagementQuit
| SignalType::DisengagementNegativeStance
| SignalType::SatisfactionGratitude
| SignalType::SatisfactionConfirmation
| SignalType::SatisfactionSuccess => SignalLayer::Interaction,
SignalType::ExecutionFailureInvalidArgs
| SignalType::ExecutionFailureBadQuery
| SignalType::ExecutionFailureToolNotFound
| SignalType::ExecutionFailureAuthMisuse
| SignalType::ExecutionFailureStateError
| SignalType::ExecutionLoopsRetry
| SignalType::ExecutionLoopsParameterDrift
| SignalType::ExecutionLoopsOscillation => SignalLayer::Execution,
SignalType::EnvironmentExhaustionApiError
| SignalType::EnvironmentExhaustionTimeout
| SignalType::EnvironmentExhaustionRateLimit
| SignalType::EnvironmentExhaustionNetwork
| SignalType::EnvironmentExhaustionMalformed
| SignalType::EnvironmentExhaustionContextOverflow => SignalLayer::Environment,
}
}
/// Category name within the layer (e.g. `"misalignment"`, `"failure"`).
pub fn category(&self) -> &'static str {
// Strip the layer prefix and take everything before the next dot.
let s = self.as_str();
let after_layer = s.split_once('.').map(|(_, rest)| rest).unwrap_or(s);
after_layer
.split_once('.')
.map(|(c, _)| c)
.unwrap_or(after_layer)
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum SignalLayer {
Interaction,
Execution,
Environment,
}
impl SignalLayer {
pub fn as_str(&self) -> &'static str {
match self {
SignalLayer::Interaction => "interaction",
SignalLayer::Execution => "execution",
SignalLayer::Environment => "environment",
}
}
}
/// Overall quality assessment for an agent interaction session.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum InteractionQuality {
Excellent,
Good,
Neutral,
Poor,
Severe,
}
impl InteractionQuality {
pub fn as_str(&self) -> &'static str {
match self {
InteractionQuality::Excellent => "excellent",
InteractionQuality::Good => "good",
InteractionQuality::Neutral => "neutral",
InteractionQuality::Poor => "poor",
InteractionQuality::Severe => "severe",
}
}
}
/// A single detected signal instance.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SignalInstance {
pub signal_type: SignalType,
/// Absolute index into the original conversation `Vec<Message>`.
pub message_index: usize,
pub snippet: String,
pub confidence: f32,
/// Free-form metadata payload mirroring the Python `Dict[str, Any]`.
/// Stored as a JSON object so we can faithfully reproduce the reference's
/// flexible per-detector metadata.
#[serde(default)]
pub metadata: serde_json::Value,
}
impl SignalInstance {
pub fn new(signal_type: SignalType, message_index: usize, snippet: impl Into<String>) -> Self {
Self {
signal_type,
message_index,
snippet: snippet.into(),
confidence: 1.0,
metadata: serde_json::Value::Object(serde_json::Map::new()),
}
}
pub fn with_confidence(mut self, c: f32) -> Self {
self.confidence = c;
self
}
pub fn with_metadata(mut self, m: serde_json::Value) -> Self {
self.metadata = m;
self
}
}
/// Aggregated signals for a specific category.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SignalGroup {
pub category: String,
pub count: usize,
pub signals: Vec<SignalInstance>,
/// Severity level (0-3: none, mild, moderate, severe).
pub severity: u8,
}
impl SignalGroup {
pub fn new(category: impl Into<String>) -> Self {
Self {
category: category.into(),
count: 0,
signals: Vec::new(),
severity: 0,
}
}
pub fn add_signal(&mut self, signal: SignalInstance) {
self.signals.push(signal);
self.count = self.signals.len();
self.update_severity();
}
fn update_severity(&mut self) {
self.severity = match self.count {
0 => 0,
1..=2 => 1,
3..=4 => 2,
_ => 3,
};
}
}
/// Turn count and efficiency metrics, used by stagnation.dragging.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct TurnMetrics {
pub total_turns: usize,
pub user_turns: usize,
pub assistant_turns: usize,
pub is_dragging: bool,
pub efficiency_score: f32,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct InteractionSignals {
pub misalignment: SignalGroup,
pub stagnation: SignalGroup,
pub disengagement: SignalGroup,
pub satisfaction: SignalGroup,
}
impl Default for InteractionSignals {
fn default() -> Self {
Self {
misalignment: SignalGroup::new("misalignment"),
stagnation: SignalGroup::new("stagnation"),
disengagement: SignalGroup::new("disengagement"),
satisfaction: SignalGroup::new("satisfaction"),
}
}
}
impl InteractionSignals {
/// Ratio of misalignment instances to user turns. Used as a quality
/// scoring input and as a threshold for the "high misalignment rate"
/// summary callout. Mirrors `misalignment.count / max(user_turns, 1)`
/// from the Python reference's `_assess_quality` and `_generate_summary`.
pub fn misalignment_ratio(&self, user_turns: usize) -> f32 {
let denom = user_turns.max(1) as f32;
self.misalignment.count as f32 / denom
}
}
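The ratio computation above is a plain division with a clamped denominator; a minimal sketch of the same arithmetic as a free function (hypothetical name, not in the crate):

```rust
// Mirrors `InteractionSignals::misalignment_ratio`: misalignment count over
// user turns, with the denominator clamped to 1 so zero turns cannot divide
// by zero or produce NaN.
fn misalignment_ratio(misalignment_count: usize, user_turns: usize) -> f32 {
    misalignment_count as f32 / user_turns.max(1) as f32
}

fn main() {
    assert_eq!(misalignment_ratio(2, 4), 0.5);
    // Zero user turns falls back to a denominator of 1.
    assert_eq!(misalignment_ratio(3, 0), 3.0);
}
```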
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecutionSignals {
pub failure: SignalGroup,
pub loops: SignalGroup,
}
impl Default for ExecutionSignals {
fn default() -> Self {
Self {
failure: SignalGroup::new("failure"),
loops: SignalGroup::new("loops"),
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EnvironmentSignals {
pub exhaustion: SignalGroup,
}
impl Default for EnvironmentSignals {
fn default() -> Self {
Self {
exhaustion: SignalGroup::new("exhaustion"),
}
}
}
/// Complete signal analysis report for a conversation.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SignalReport {
pub interaction: InteractionSignals,
pub execution: ExecutionSignals,
pub environment: EnvironmentSignals,
pub overall_quality: InteractionQuality,
pub quality_score: f32,
pub turn_metrics: TurnMetrics,
pub summary: String,
}
impl Default for SignalReport {
fn default() -> Self {
Self {
interaction: InteractionSignals::default(),
execution: ExecutionSignals::default(),
environment: EnvironmentSignals::default(),
overall_quality: InteractionQuality::Neutral,
quality_score: 50.0,
turn_metrics: TurnMetrics::default(),
summary: String::new(),
}
}
}
impl SignalReport {
/// Iterate over every `SignalInstance` across all layers and groups.
pub fn iter_signals(&self) -> impl Iterator<Item = &SignalInstance> {
self.interaction
.misalignment
.signals
.iter()
.chain(self.interaction.stagnation.signals.iter())
.chain(self.interaction.disengagement.signals.iter())
.chain(self.interaction.satisfaction.signals.iter())
.chain(self.execution.failure.signals.iter())
.chain(self.execution.loops.signals.iter())
.chain(self.environment.exhaustion.signals.iter())
}
pub fn has_signal_type(&self, t: SignalType) -> bool {
self.iter_signals().any(|s| s.signal_type == t)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn signal_type_strings_match_paper_taxonomy() {
assert_eq!(
SignalType::MisalignmentCorrection.as_str(),
"interaction.misalignment.correction"
);
assert_eq!(
SignalType::ExecutionFailureInvalidArgs.as_str(),
"execution.failure.invalid_args"
);
assert_eq!(
SignalType::EnvironmentExhaustionMalformed.as_str(),
"environment.exhaustion.malformed_response"
);
}
#[test]
fn signal_type_layer_and_category() {
assert_eq!(
SignalType::MisalignmentRephrase.layer(),
SignalLayer::Interaction
);
assert_eq!(SignalType::MisalignmentRephrase.category(), "misalignment");
assert_eq!(
SignalType::ExecutionLoopsRetry.layer(),
SignalLayer::Execution
);
assert_eq!(SignalType::ExecutionLoopsRetry.category(), "loops");
assert_eq!(
SignalType::EnvironmentExhaustionTimeout.layer(),
SignalLayer::Environment
);
assert_eq!(
SignalType::EnvironmentExhaustionTimeout.category(),
"exhaustion"
);
}
#[test]
fn signal_group_severity_buckets_match_python() {
let mut g = SignalGroup::new("misalignment");
assert_eq!(g.severity, 0);
for n in 1..=2 {
g.add_signal(SignalInstance::new(
SignalType::MisalignmentCorrection,
n,
"x",
));
}
assert_eq!(g.severity, 1);
for n in 3..=4 {
g.add_signal(SignalInstance::new(
SignalType::MisalignmentCorrection,
n,
"x",
));
}
assert_eq!(g.severity, 2);
for n in 5..=6 {
g.add_signal(SignalInstance::new(
SignalType::MisalignmentCorrection,
n,
"x",
));
}
assert_eq!(g.severity, 3);
}
}



@@ -0,0 +1,401 @@
//! Text normalization and similarity primitives.
//!
//! Direct Rust port of `signals/text_processing.py` from the reference. The
//! shapes (`NormalizedMessage`, `NormalizedPattern`) and similarity formulas
//! match the Python implementation exactly so that pattern matching produces
//! the same results on the same inputs.
use std::collections::{HashMap, HashSet};
/// Size of character n-grams used for fuzzy similarity (3 = trigrams).
pub const NGRAM_SIZE: usize = 3;
const PUNCT_TRIM: &[char] = &[
'!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=',
'>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~',
];
/// Pre-processed message with normalized text and tokens for efficient matching.
#[derive(Debug, Clone, Default)]
pub struct NormalizedMessage {
pub raw: String,
pub tokens: Vec<String>,
pub token_set: HashSet<String>,
pub bigram_set: HashSet<String>,
pub char_ngram_set: HashSet<String>,
pub token_frequency: HashMap<String, usize>,
}
impl NormalizedMessage {
/// Create a normalized message from raw text. Mirrors
/// `NormalizedMessage.from_text` in the reference, including the
/// head-20%/tail-80% truncation strategy when text exceeds `max_length`.
pub fn from_text(text: &str, max_length: usize) -> Self {
let char_count = text.chars().count();
let raw: String = if char_count <= max_length {
text.to_string()
} else {
let head_len = max_length / 5;
// Reserve one char for the joining space.
let tail_len = max_length.saturating_sub(head_len + 1);
let head: String = text.chars().take(head_len).collect();
let tail: String = text
.chars()
.skip(char_count.saturating_sub(tail_len))
.collect();
format!("{} {}", head, tail)
};
// Normalize unicode punctuation to ASCII equivalents.
let normalized_unicode = raw
.replace(['\u{2019}', '\u{2018}'], "'")
.replace(['\u{201c}', '\u{201d}'], "\"")
.replace(['\u{2013}', '\u{2014}'], "-");
// Lowercase + collapse whitespace (matches Python's `" ".join(s.split())`).
let normalized: String = normalized_unicode
.to_lowercase()
.split_whitespace()
.collect::<Vec<_>>()
.join(" ");
let mut tokens: Vec<String> = Vec::new();
for word in normalized.split_whitespace() {
let stripped: String = word.trim_matches(PUNCT_TRIM).to_string();
if !stripped.is_empty() {
tokens.push(stripped);
}
}
let token_set: HashSet<String> = tokens.iter().cloned().collect();
let mut bigram_set: HashSet<String> = HashSet::new();
for i in 0..tokens.len().saturating_sub(1) {
bigram_set.insert(format!("{} {}", tokens[i], tokens[i + 1]));
}
let tokens_text = tokens.join(" ");
let char_ngram_set = char_ngrams(&tokens_text, NGRAM_SIZE);
let mut token_frequency: HashMap<String, usize> = HashMap::new();
for t in &tokens {
*token_frequency.entry(t.clone()).or_insert(0) += 1;
}
Self {
raw,
tokens,
token_set,
bigram_set,
char_ngram_set,
token_frequency,
}
}
pub fn contains_token(&self, token: &str) -> bool {
self.token_set.contains(token)
}
pub fn contains_phrase(&self, phrase: &str) -> bool {
let phrase_tokens: Vec<&str> = phrase.split_whitespace().collect();
if phrase_tokens.is_empty() {
return false;
}
if phrase_tokens.len() == 1 {
return self.contains_token(phrase_tokens[0]);
}
if phrase_tokens.len() > self.tokens.len() {
return false;
}
let n = phrase_tokens.len();
for i in 0..=self.tokens.len() - n {
if self.tokens[i..i + n]
.iter()
.zip(phrase_tokens.iter())
.all(|(a, b)| a == b)
{
return true;
}
}
false
}
/// Character n-gram (Jaccard) similarity vs another normalized message.
pub fn ngram_similarity_with_message(&self, other: &NormalizedMessage) -> f32 {
jaccard(&self.char_ngram_set, &other.char_ngram_set)
}
/// Character n-gram (Jaccard) similarity vs a raw pattern string.
pub fn ngram_similarity_with_pattern(&self, pattern: &str) -> f32 {
let normalized = strip_non_word_chars(&pattern.to_lowercase());
let pattern_ngrams = char_ngrams(&normalized, NGRAM_SIZE);
jaccard(&self.char_ngram_set, &pattern_ngrams)
}
/// Fraction of pattern's ngrams contained in this message's ngram set.
pub fn char_ngram_containment(&self, pattern: &str) -> f32 {
let normalized = strip_non_word_chars(&pattern.to_lowercase());
let pattern_ngrams = char_ngrams(&normalized, NGRAM_SIZE);
if pattern_ngrams.is_empty() {
return 0.0;
}
let contained = pattern_ngrams
.iter()
.filter(|ng| self.char_ngram_set.contains(*ng))
.count();
contained as f32 / pattern_ngrams.len() as f32
}
/// Token-frequency cosine similarity vs a raw pattern string.
pub fn token_cosine_similarity(&self, pattern: &str) -> f32 {
let mut pattern_freq: HashMap<String, usize> = HashMap::new();
for word in pattern.to_lowercase().split_whitespace() {
let stripped = word.trim_matches(PUNCT_TRIM);
if !stripped.is_empty() {
*pattern_freq.entry(stripped.to_string()).or_insert(0) += 1;
}
}
cosine_freq(&self.token_frequency, &pattern_freq)
}
/// Layered match against a pre-normalized pattern. Mirrors
/// `matches_normalized_pattern` from the reference: exact phrase ->
/// char-ngram Jaccard -> token cosine.
pub fn matches_normalized_pattern(
&self,
pattern: &NormalizedPattern,
char_ngram_threshold: f32,
token_cosine_threshold: f32,
) -> bool {
// Layer 0: exact phrase match using pre-tokenized message.
let plen = pattern.tokens.len();
let slen = self.tokens.len();
if plen > 0 && plen <= slen {
for i in 0..=slen - plen {
if self.tokens[i..i + plen] == pattern.tokens[..] {
return true;
}
}
}
// Layer 1: character n-gram Jaccard similarity.
if !self.char_ngram_set.is_empty() && !pattern.char_ngram_set.is_empty() {
let inter = self
.char_ngram_set
.intersection(&pattern.char_ngram_set)
.count();
let union = self.char_ngram_set.union(&pattern.char_ngram_set).count();
if union > 0 {
let sim = inter as f32 / union as f32;
if sim >= char_ngram_threshold {
return true;
}
}
}
// Layer 2: token frequency cosine similarity.
if !self.token_frequency.is_empty() && !pattern.token_frequency.is_empty() {
let sim = cosine_freq(&self.token_frequency, &pattern.token_frequency);
if sim >= token_cosine_threshold {
return true;
}
}
false
}
}
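The head-20%/tail-80% truncation used by `from_text` can be isolated as a small sketch; the helper name is illustrative, but the arithmetic (head of `max_length / 5` chars, one char reserved for the joining space, the rest taken from the tail) follows the implementation above:

```rust
// Sketch of the truncation strategy in `NormalizedMessage::from_text`.
fn truncate_head_tail(text: &str, max_length: usize) -> String {
    let char_count = text.chars().count();
    if char_count <= max_length {
        return text.to_string();
    }
    let head_len = max_length / 5;
    // Reserve one char for the joining space.
    let tail_len = max_length.saturating_sub(head_len + 1);
    let head: String = text.chars().take(head_len).collect();
    let tail: String = text
        .chars()
        .skip(char_count.saturating_sub(tail_len))
        .collect();
    format!("{} {}", head, tail)
}

fn main() {
    let long = "a".repeat(3000);
    let out = truncate_head_tail(&long, 2000);
    // 400 head chars + 1 space + 1599 tail chars = exactly max_length.
    assert_eq!(out.chars().count(), 2000);
    assert!(out.contains(' '));
}
```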
/// Pre-processed pattern with normalized text and pre-computed n-grams/tokens.
#[derive(Debug, Clone, Default)]
pub struct NormalizedPattern {
pub raw: String,
pub tokens: Vec<String>,
pub char_ngram_set: HashSet<String>,
pub token_frequency: HashMap<String, usize>,
}
impl NormalizedPattern {
pub fn from_text(pattern: &str) -> Self {
let normalized = pattern
.to_lowercase()
.replace(['\u{2019}', '\u{2018}'], "'")
.replace(['\u{201c}', '\u{201d}'], "\"")
.replace(['\u{2013}', '\u{2014}'], "-");
let normalized: String = normalized.split_whitespace().collect::<Vec<_>>().join(" ");
// Tokenize the same way as NormalizedMessage (trim boundary punctuation,
// keep internal punctuation).
let mut tokens: Vec<String> = Vec::new();
for word in normalized.split_whitespace() {
let stripped = word.trim_matches(PUNCT_TRIM);
if !stripped.is_empty() {
tokens.push(stripped.to_string());
}
}
// For ngrams + cosine, strip ALL punctuation (matches Python's
// `re.sub(r"[^\w\s]", "", normalized)`).
let normalized_for_ngrams = strip_non_word_chars(&normalized);
let char_ngram_set = char_ngrams(&normalized_for_ngrams, NGRAM_SIZE);
let tokens_no_punct: Vec<&str> = normalized_for_ngrams.split_whitespace().collect();
let mut token_frequency: HashMap<String, usize> = HashMap::new();
for t in &tokens_no_punct {
*token_frequency.entry((*t).to_string()).or_insert(0) += 1;
}
Self {
raw: pattern.to_string(),
tokens,
char_ngram_set,
token_frequency,
}
}
}
/// Convenience: normalize a list of raw pattern strings into `NormalizedPattern`s.
pub fn normalize_patterns(patterns: &[&str]) -> Vec<NormalizedPattern> {
patterns
.iter()
.map(|p| NormalizedPattern::from_text(p))
.collect()
}
// ---------------------------------------------------------------------------
// Similarity primitives
// ---------------------------------------------------------------------------
fn char_ngrams(s: &str, n: usize) -> HashSet<String> {
// Python iterates by character index, not byte; mirror that with .chars().
let chars: Vec<char> = s.chars().collect();
let mut out: HashSet<String> = HashSet::new();
if chars.len() < n {
return out;
}
for i in 0..=chars.len() - n {
out.insert(chars[i..i + n].iter().collect());
}
out
}
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f32 {
if a.is_empty() && b.is_empty() {
return 1.0;
}
if a.is_empty() || b.is_empty() {
return 0.0;
}
let inter = a.intersection(b).count();
let union = a.union(b).count();
if union == 0 {
0.0
} else {
inter as f32 / union as f32
}
}
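Putting `char_ngrams` and `jaccard` together, fuzzy matching on a typo works like this sketch (simplified re-implementation for illustration; the crate's functions carry the edge-case conventions shown above):

```rust
use std::collections::HashSet;

// Build character trigram sets and compare them with Jaccard similarity,
// as the layer-1 fuzzy matcher does.
fn trigrams(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    if chars.len() < 3 {
        return HashSet::new();
    }
    (0..=chars.len() - 3)
        .map(|i| chars[i..i + 3].iter().collect())
        .collect()
}

fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f32 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    a.intersection(b).count() as f32 / a.union(b).count() as f32
}

fn main() {
    let a = trigrams("no i meant");
    let b = trigrams("no i ment"); // typo: "meant" -> "ment"
    let sim = jaccard(&a, &b);
    // The typo still shares most trigrams: 5 of 10 in the union, sim = 0.5,
    // which clears a 0.4 char-ngram threshold but not an exact match.
    assert!(sim > 0.4 && sim < 1.0);
}
```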
fn cosine_freq(a: &HashMap<String, usize>, b: &HashMap<String, usize>) -> f32 {
if a.is_empty() && b.is_empty() {
return 1.0;
}
if a.is_empty() || b.is_empty() {
return 0.0;
}
let mut dot: f64 = 0.0;
let mut n1_sq: f64 = 0.0;
let mut n2_sq: f64 = 0.0;
for (token, &freq2) in b {
let freq1 = *a.get(token).unwrap_or(&0);
dot += (freq1 * freq2) as f64;
n2_sq += (freq2 * freq2) as f64;
}
for &freq1 in a.values() {
n1_sq += (freq1 * freq1) as f64;
}
let n1 = n1_sq.sqrt();
let n2 = n2_sq.sqrt();
if n1 == 0.0 || n2 == 0.0 {
0.0
} else {
(dot / (n1 * n2)) as f32
}
}
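The cosine computation above is the standard frequency-vector cosine: dot product over the product of Euclidean norms. A condensed sketch of the same formula (iterator-style restatement for illustration, not the crate's exact code):

```rust
use std::collections::HashMap;

// Token-frequency cosine similarity: dot(a, b) / (|a| * |b|), with the
// same empty-map conventions as `cosine_freq` above.
fn cosine_freq(a: &HashMap<&str, usize>, b: &HashMap<&str, usize>) -> f32 {
    if a.is_empty() || b.is_empty() {
        return if a.is_empty() && b.is_empty() { 1.0 } else { 0.0 };
    }
    let dot: f64 = b
        .iter()
        .map(|(t, &f2)| (*a.get(t).unwrap_or(&0) * f2) as f64)
        .sum();
    let n1: f64 = a.values().map(|&f| (f * f) as f64).sum::<f64>().sqrt();
    let n2: f64 = b.values().map(|&f| (f * f) as f64).sum::<f64>().sqrt();
    if n1 == 0.0 || n2 == 0.0 {
        0.0
    } else {
        (dot / (n1 * n2)) as f32
    }
}

fn main() {
    let a: HashMap<&str, usize> = [("hello", 1), ("world", 1)].into_iter().collect();
    let b: HashMap<&str, usize> = [("hello", 1)].into_iter().collect();
    let sim = cosine_freq(&a, &b);
    // dot = 1, |a| = sqrt(2), |b| = 1, so sim is about 0.707.
    assert!((sim - 0.7071).abs() < 1e-3);
}
```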
/// Python equivalent: `re.sub(r"[^\w\s]", "", text)` followed by whitespace
/// collapse. Python's `\w` is `[A-Za-z0-9_]` plus unicode word characters; we
/// use Rust's `char::is_alphanumeric()` plus `_` for an equivalent definition.
fn strip_non_word_chars(text: &str) -> String {
let mut out = String::with_capacity(text.len());
for c in text.chars() {
if c.is_alphanumeric() || c == '_' || c.is_whitespace() {
out.push(c);
}
}
out.split_whitespace().collect::<Vec<_>>().join(" ")
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn normalize_lowercases_and_strips_punctuation() {
let m = NormalizedMessage::from_text("Hello, World!", 2000);
assert_eq!(m.tokens, vec!["hello".to_string(), "world".to_string()]);
}
#[test]
fn normalizes_smart_quotes() {
let m = NormalizedMessage::from_text("don\u{2019}t", 2000);
assert!(m.tokens.contains(&"don't".to_string()));
}
#[test]
fn truncates_long_text_with_head_tail() {
let long = "a".repeat(3000);
let m = NormalizedMessage::from_text(&long, 2000);
// raw should be ~ 2000 chars (head + space + tail)
assert!(m.raw.chars().count() <= 2001);
assert!(m.raw.starts_with("aa"));
assert!(m.raw.ends_with("aa"));
}
#[test]
fn contains_phrase_matches_consecutive_tokens() {
let m = NormalizedMessage::from_text("I think this is great work", 2000);
assert!(m.contains_phrase("this is great"));
assert!(!m.contains_phrase("great this"));
}
#[test]
fn matches_pattern_via_exact_phrase() {
let m = NormalizedMessage::from_text("No, I meant the second one", 2000);
let p = NormalizedPattern::from_text("no i meant");
assert!(m.matches_normalized_pattern(&p, 0.65, 0.6));
}
#[test]
fn matches_pattern_via_char_ngram_fuzziness() {
// Typo in "meant" -> "ment" so layer 0 (exact phrase) cannot match,
// forcing the matcher to fall back to layer 1 (char n-gram Jaccard).
let m = NormalizedMessage::from_text("No I ment", 2000);
let p = NormalizedPattern::from_text("no i meant");
assert!(m.matches_normalized_pattern(&p, 0.4, 0.6));
}
#[test]
fn jaccard_identical_sets_is_one() {
let a: HashSet<String> = ["abc", "bcd"].iter().map(|s| s.to_string()).collect();
assert!((jaccard(&a, &a) - 1.0).abs() < 1e-6);
}
#[test]
fn cosine_freq_orthogonal_is_zero() {
let mut a: HashMap<String, usize> = HashMap::new();
a.insert("hello".to_string(), 1);
let mut b: HashMap<String, usize> = HashMap::new();
b.insert("world".to_string(), 1);
assert_eq!(cosine_freq(&a, &b), 0.0);
}
}


@@ -20,8 +20,11 @@ const STREAM_BUFFER_SIZE: usize = 16;
/// Most chat responses are well under this; pathological ones are dropped without
/// affecting pass-through streaming to the client.
const USAGE_BUFFER_MAX: usize = 2 * 1024 * 1024;
use crate::signals::{InteractionQuality, SignalAnalyzer, TextBasedSignalAnalyzer, FLAG_MARKER};
use crate::tracing::{llm, set_service_name, signals as signal_constants};
use crate::metrics as bs_metrics;
use crate::metrics::labels as metric_labels;
use crate::signals::otel::emit_signals_to_span;
use crate::signals::{SignalAnalyzer, FLAG_MARKER};
use crate::tracing::{llm, set_service_name};
use hermesllm::apis::openai::Message;
/// Parsed usage + resolved-model details from a provider response.
@@ -172,6 +175,18 @@ impl StreamProcessor for Box<dyn StreamProcessor> {
}
}
/// Optional Prometheus-metric context for an LLM upstream call. When present,
/// [`ObservableStreamProcessor`] emits `brightstaff_llm_*` metrics at
/// first-byte / complete / error callbacks.
#[derive(Debug, Clone)]
pub struct LlmMetricsCtx {
pub provider: String,
pub model: String,
/// HTTP status of the upstream response. Used to pick `status_class` and
/// `error_class` on `on_complete`.
pub upstream_status: u16,
}
/// A processor that tracks streaming metrics
pub struct ObservableStreamProcessor {
service_name: String,
@@ -185,6 +200,8 @@ pub struct ObservableStreamProcessor {
/// on `on_complete`. Capped at `USAGE_BUFFER_MAX`; excess chunks are dropped
/// from the buffer (they still pass through to the client).
response_buffer: Vec<u8>,
llm_metrics: Option<LlmMetricsCtx>,
metrics_recorded: bool,
}
impl ObservableStreamProcessor {
@@ -219,8 +236,17 @@ impl ObservableStreamProcessor {
time_to_first_token: None,
messages,
response_buffer: Vec::new(),
llm_metrics: None,
metrics_recorded: false,
}
}
/// Attach LLM upstream metric context so the processor emits
/// `brightstaff_llm_*` metrics on first-byte / complete / error.
pub fn with_llm_metrics(mut self, ctx: LlmMetricsCtx) -> Self {
self.llm_metrics = Some(ctx);
self
}
}
impl StreamProcessor for ObservableStreamProcessor {
@@ -240,7 +266,11 @@ impl StreamProcessor for ObservableStreamProcessor {
fn on_first_bytes(&mut self) {
// Record time to first token (only for streaming)
if self.time_to_first_token.is_none() {
self.time_to_first_token = Some(self.start_time.elapsed().as_millis());
let elapsed = self.start_time.elapsed();
self.time_to_first_token = Some(elapsed.as_millis());
if let Some(ref ctx) = self.llm_metrics {
bs_metrics::record_llm_ttft(&ctx.provider, &ctx.model, elapsed);
}
}
}
@@ -299,81 +329,56 @@ impl StreamProcessor for ObservableStreamProcessor {
otel_span.set_attribute(KeyValue::new(llm::MODEL_NAME, resolved));
}
}
// Emit LLM upstream prometheus metrics (duration + tokens) if wired.
// The upstream responded (we have a status), so status_class alone
// carries the non-2xx signal — error_class stays "none".
if let Some(ref ctx) = self.llm_metrics {
bs_metrics::record_llm_upstream(
&ctx.provider,
&ctx.model,
ctx.upstream_status,
metric_labels::LLM_ERR_NONE,
self.start_time.elapsed(),
);
if let Some(v) = usage.prompt_tokens {
bs_metrics::record_llm_tokens(
&ctx.provider,
&ctx.model,
metric_labels::TOKEN_KIND_PROMPT,
v.max(0) as u64,
);
}
if let Some(v) = usage.completion_tokens {
bs_metrics::record_llm_tokens(
&ctx.provider,
&ctx.model,
metric_labels::TOKEN_KIND_COMPLETION,
v.max(0) as u64,
);
}
if usage.prompt_tokens.is_none() && usage.completion_tokens.is_none() {
bs_metrics::record_llm_tokens_usage_missing(&ctx.provider, &ctx.model);
}
self.metrics_recorded = true;
}
// Release the buffered bytes early; nothing downstream needs them.
self.response_buffer.clear();
self.response_buffer.shrink_to_fit();
// Analyze signals if messages are available and record as span attributes
// Analyze signals if messages are available and record as span
// attributes + per-signal events. We dual-emit legacy aggregate keys
// and the new layered taxonomy so existing dashboards keep working
// while new consumers can opt into the richer hierarchy.
if let Some(ref messages) = self.messages {
let analyzer: Box<dyn SignalAnalyzer> = Box::new(TextBasedSignalAnalyzer::new());
let report = analyzer.analyze(messages);
let analyzer = SignalAnalyzer::default();
let report = analyzer.analyze_openai(messages);
// Get the current OTel span to set signal attributes
let span = tracing::Span::current();
let otel_context = span.context();
let otel_span = otel_context.span();
// Add overall quality
otel_span.set_attribute(KeyValue::new(
signal_constants::QUALITY,
format!("{:?}", report.overall_quality),
));
// Add repair/follow-up metrics if concerning
if report.follow_up.is_concerning || report.follow_up.repair_count > 0 {
otel_span.set_attribute(KeyValue::new(
signal_constants::REPAIR_COUNT,
report.follow_up.repair_count as i64,
));
otel_span.set_attribute(KeyValue::new(
signal_constants::REPAIR_RATIO,
format!("{:.3}", report.follow_up.repair_ratio),
));
}
// Add frustration metrics
if report.frustration.has_frustration {
otel_span.set_attribute(KeyValue::new(
signal_constants::FRUSTRATION_COUNT,
report.frustration.frustration_count as i64,
));
otel_span.set_attribute(KeyValue::new(
signal_constants::FRUSTRATION_SEVERITY,
report.frustration.severity as i64,
));
}
// Add repetition metrics
if report.repetition.has_looping {
otel_span.set_attribute(KeyValue::new(
signal_constants::REPETITION_COUNT,
report.repetition.repetition_count as i64,
));
}
// Add escalation metrics
if report.escalation.escalation_requested {
otel_span
.set_attribute(KeyValue::new(signal_constants::ESCALATION_REQUESTED, true));
}
// Add positive feedback metrics
if report.positive_feedback.has_positive_feedback {
otel_span.set_attribute(KeyValue::new(
signal_constants::POSITIVE_FEEDBACK_COUNT,
report.positive_feedback.positive_count as i64,
));
}
// Flag the span name if any concerning signal is detected
let should_flag = report.frustration.has_frustration
|| report.repetition.has_looping
|| report.escalation.escalation_requested
|| matches!(
report.overall_quality,
InteractionQuality::Poor | InteractionQuality::Severe
);
let should_flag = emit_signals_to_span(&otel_span, &report);
if should_flag {
otel_span.update_name(format!("{} {}", self.operation_name, FLAG_MARKER));
}
@@ -396,6 +401,18 @@ impl StreamProcessor for ObservableStreamProcessor {
duration_ms = self.start_time.elapsed().as_millis(),
"stream error"
);
if let Some(ref ctx) = self.llm_metrics {
if !self.metrics_recorded {
bs_metrics::record_llm_upstream(
&ctx.provider,
&ctx.model,
ctx.upstream_status,
metric_labels::LLM_ERR_STREAM,
self.start_time.elapsed(),
);
self.metrics_recorded = true;
}
}
}
}


@@ -234,6 +234,7 @@ pub struct Overrides {
pub llm_routing_model: Option<String>,
pub agent_orchestration_model: Option<String>,
pub orchestrator_model_context_length: Option<usize>,
pub disable_signals: Option<bool>,
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
@@ -391,6 +392,8 @@ pub enum LlmProviderType {
AmazonBedrock,
#[serde(rename = "plano")]
Plano,
#[serde(rename = "chatgpt")]
ChatGPT,
#[serde(rename = "digitalocean")]
DigitalOcean,
}
@@ -414,6 +417,7 @@ impl Display for LlmProviderType {
LlmProviderType::Qwen => write!(f, "qwen"),
LlmProviderType::AmazonBedrock => write!(f, "amazon_bedrock"),
LlmProviderType::Plano => write!(f, "plano"),
LlmProviderType::ChatGPT => write!(f, "chatgpt"),
LlmProviderType::DigitalOcean => write!(f, "digitalocean"),
}
}
@@ -481,6 +485,7 @@ pub struct LlmProvider {
pub base_url_path_prefix: Option<String>,
pub internal: Option<bool>,
pub passthrough_auth: Option<bool>,
pub headers: Option<HashMap<String, String>>,
}
pub trait IntoModels {
@@ -524,6 +529,7 @@ impl Default for LlmProvider {
base_url_path_prefix: None,
internal: None,
passthrough_auth: None,
headers: None,
}
}
}
@@ -650,7 +656,7 @@ mod test {
.expect("reference config file not found");
let config: super::Configuration = serde_yaml::from_str(&ref_config).unwrap();
assert_eq!(config.version, "v0.3.0");
assert_eq!(config.version, "v0.4.0");
if let Some(prompt_targets) = &config.prompt_targets {
assert!(
@@ -750,4 +756,29 @@ mod test {
assert!(model_ids.contains(&"openai-gpt4".to_string()));
assert!(!model_ids.contains(&"plano-orchestrator".to_string()));
}
#[test]
fn test_overrides_disable_signals_default_none() {
let overrides = super::Overrides::default();
assert_eq!(overrides.disable_signals, None);
}
#[test]
fn test_overrides_disable_signals_deserialize() {
let yaml = r#"
disable_signals: true
"#;
let overrides: super::Overrides = serde_yaml::from_str(yaml).unwrap();
assert_eq!(overrides.disable_signals, Some(true));
let yaml_false = r#"
disable_signals: false
"#;
let overrides: super::Overrides = serde_yaml::from_str(yaml_false).unwrap();
assert_eq!(overrides.disable_signals, Some(false));
let yaml_missing = "{}";
let overrides: super::Overrides = serde_yaml::from_str(yaml_missing).unwrap();
assert_eq!(overrides.disable_signals, None);
}
}


@@ -277,6 +277,7 @@ mod tests {
internal: None,
stream: None,
passthrough_auth: None,
headers: None,
}
}


@@ -1,6 +1,9 @@
use crate::apis::anthropic::MessagesStreamEvent;
use crate::apis::anthropic::{
MessagesMessageDelta, MessagesStopReason, MessagesStreamEvent, MessagesUsage,
};
use crate::apis::streaming_shapes::sse::{SseEvent, SseStreamBufferTrait};
use crate::providers::streaming_response::ProviderStreamResponseType;
use log::warn;
use std::collections::HashSet;
/// SSE Stream Buffer for Anthropic Messages API streaming.
@@ -11,13 +14,24 @@ use std::collections::HashSet;
///
/// When converting from OpenAI to Anthropic format, this buffer injects the required
/// ContentBlockStart and ContentBlockStop events to maintain proper Anthropic protocol.
///
/// Guarantees (Anthropic Messages API contract):
/// 1. `message_stop` is never emitted unless a matching `message_start` was emitted first.
/// 2. `message_stop` is emitted at most once per stream (no double-close).
/// 3. If upstream terminates with no content (empty/filtered/errored response), a
/// minimal but well-formed envelope is synthesized so the client's state machine
/// stays consistent.
pub struct AnthropicMessagesStreamBuffer {
/// Buffered SSE events ready to be written to wire
buffered_events: Vec<SseEvent>,
/// Track if we've seen a message_start event
/// Track if we've emitted a message_start event
message_started: bool,
/// Track if we've emitted a terminal message_stop event (for idempotency /
/// double-close protection).
message_stopped: bool,
/// Track content block indices that have received ContentBlockStart events
content_block_start_indices: HashSet<i32>,
@@ -42,6 +56,7 @@ impl AnthropicMessagesStreamBuffer {
Self {
buffered_events: Vec::new(),
message_started: false,
message_stopped: false,
content_block_start_indices: HashSet::new(),
needs_content_block_stop: false,
seen_message_delta: false,
@@ -49,6 +64,66 @@ impl AnthropicMessagesStreamBuffer {
}
}
/// Inject a `message_start` event into the buffer if one hasn't been emitted yet.
/// This is the single source of truth for opening a message — every handler
/// that can legitimately be the first event on the wire must call this before
/// pushing its own event.
fn ensure_message_started(&mut self) {
if self.message_started {
return;
}
let model = self.model.as_deref().unwrap_or("unknown");
let message_start = AnthropicMessagesStreamBuffer::create_message_start_event(model);
self.buffered_events.push(message_start);
self.message_started = true;
}
/// Inject a synthetic `message_delta` with `end_turn` / zero usage.
/// Used when we must close a message but upstream never produced a terminal
/// event (e.g. `[DONE]` arrives with no prior `finish_reason`).
fn push_synthetic_message_delta(&mut self) {
let event = MessagesStreamEvent::MessageDelta {
delta: MessagesMessageDelta {
stop_reason: MessagesStopReason::EndTurn,
stop_sequence: None,
},
usage: MessagesUsage {
input_tokens: 0,
output_tokens: 0,
cache_creation_input_tokens: None,
cache_read_input_tokens: None,
},
};
let sse_string: String = event.clone().into();
self.buffered_events.push(SseEvent {
data: None,
event: Some("message_delta".to_string()),
raw_line: sse_string.clone(),
sse_transformed_lines: sse_string,
provider_stream_response: Some(ProviderStreamResponseType::MessagesStreamEvent(event)),
});
self.seen_message_delta = true;
}
/// Inject a `message_stop` event into the buffer, marking the stream as closed.
/// Idempotent — subsequent calls are no-ops.
fn push_message_stop(&mut self) {
if self.message_stopped {
return;
}
let message_stop = MessagesStreamEvent::MessageStop;
let sse_string: String = message_stop.into();
self.buffered_events.push(SseEvent {
data: None,
event: Some("message_stop".to_string()),
raw_line: sse_string.clone(),
sse_transformed_lines: sse_string,
provider_stream_response: None,
});
self.message_stopped = true;
self.seen_message_delta = false;
}
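The open/close bookkeeping these helpers implement (guarantee 1 and 2 from the buffer's doc comment: no `message_stop` without a prior `message_start`, and at most one `message_stop`) can be reduced to a small state machine. This is an illustrative reduction with string events, not the real buffer:

```rust
// Minimal sketch of the envelope guarantees: stop implies start, and
// double-close is a no-op.
struct EnvelopeState {
    message_started: bool,
    message_stopped: bool,
    emitted: Vec<&'static str>,
}

impl EnvelopeState {
    fn new() -> Self {
        Self {
            message_started: false,
            message_stopped: false,
            emitted: Vec::new(),
        }
    }
    fn ensure_message_started(&mut self) {
        if !self.message_started {
            self.emitted.push("message_start");
            self.message_started = true;
        }
    }
    fn push_message_stop(&mut self) {
        if self.message_stopped {
            return; // idempotent: a duplicate close is dropped
        }
        self.ensure_message_started(); // never emit a bare message_stop
        self.emitted.push("message_stop");
        self.message_stopped = true;
    }
}

fn main() {
    let mut s = EnvelopeState::new();
    s.push_message_stop();
    s.push_message_stop(); // e.g. a duplicate [DONE] from upstream
    assert_eq!(s.emitted, vec!["message_start", "message_stop"]);
}
```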
/// Check if a content_block_start event has been sent for the given index
fn has_content_block_start_been_sent(&self, index: i32) -> bool {
self.content_block_start_indices.contains(&index)
@@ -149,6 +224,27 @@ impl SseStreamBufferTrait for AnthropicMessagesStreamBuffer {
// We match on a reference first to determine the type, then move the event
match &event.provider_stream_response {
Some(ProviderStreamResponseType::MessagesStreamEvent(evt)) => {
// If the message has already been closed, drop any trailing events
// to avoid emitting data after `message_stop` (protocol violation).
// This typically indicates a duplicate `[DONE]` from upstream or a
// replay of previously-buffered bytes — worth surfacing so we can
// spot misbehaving providers.
if self.message_stopped {
warn!(
"anthropic stream buffer: dropping event after message_stop (variant={})",
match evt {
MessagesStreamEvent::MessageStart { .. } => "message_start",
MessagesStreamEvent::ContentBlockStart { .. } => "content_block_start",
MessagesStreamEvent::ContentBlockDelta { .. } => "content_block_delta",
MessagesStreamEvent::ContentBlockStop { .. } => "content_block_stop",
MessagesStreamEvent::MessageDelta { .. } => "message_delta",
MessagesStreamEvent::MessageStop => "message_stop",
MessagesStreamEvent::Ping => "ping",
}
);
return;
}
match evt {
MessagesStreamEvent::MessageStart { .. } => {
// Add the message_start event
@@ -157,14 +253,7 @@ impl SseStreamBufferTrait for AnthropicMessagesStreamBuffer {
}
MessagesStreamEvent::ContentBlockStart { index, .. } => {
let index = *index as i32;
// Inject message_start if needed
if !self.message_started {
let model = self.model.as_deref().unwrap_or("unknown");
let message_start =
AnthropicMessagesStreamBuffer::create_message_start_event(model);
self.buffered_events.push(message_start);
self.message_started = true;
}
self.ensure_message_started();
// Add the content_block_start event (from tool calls or other sources)
self.buffered_events.push(event);
@@ -173,14 +262,7 @@ impl SseStreamBufferTrait for AnthropicMessagesStreamBuffer {
}
MessagesStreamEvent::ContentBlockDelta { index, .. } => {
let index = *index as i32;
// Inject message_start if needed
if !self.message_started {
let model = self.model.as_deref().unwrap_or("unknown");
let message_start =
AnthropicMessagesStreamBuffer::create_message_start_event(model);
self.buffered_events.push(message_start);
self.message_started = true;
}
self.ensure_message_started();
// Check if ContentBlockStart was sent for this index
if !self.has_content_block_start_been_sent(index) {
@@ -196,6 +278,11 @@ impl SseStreamBufferTrait for AnthropicMessagesStreamBuffer {
self.buffered_events.push(event);
}
MessagesStreamEvent::MessageDelta { usage, .. } => {
// `message_delta` is only meaningful inside an open message.
// Upstream can send it with no prior content (empty completion,
// content filter, etc.), so we must open a message first.
self.ensure_message_started();
// Inject ContentBlockStop before message_delta
if self.needs_content_block_stop {
let content_block_stop =
@@ -230,15 +317,52 @@ impl SseStreamBufferTrait for AnthropicMessagesStreamBuffer {
}
MessagesStreamEvent::ContentBlockStop { .. } => {
// ContentBlockStop received from upstream (e.g., Bedrock)
self.ensure_message_started();
// Clear the flag so we don't inject another one
self.needs_content_block_stop = false;
self.buffered_events.push(event);
}
MessagesStreamEvent::MessageStop => {
// MessageStop received from upstream (e.g., OpenAI via [DONE])
// Clear the flag so we don't inject another one
self.seen_message_delta = false;
// MessageStop received from upstream (e.g., OpenAI via [DONE]).
//
// The Anthropic protocol requires the full envelope
// message_start → [content blocks] → message_delta → message_stop
// so we must not emit a bare `message_stop`. Synthesize whatever
// is missing to keep the client's state machine consistent.
self.ensure_message_started();
if self.needs_content_block_stop {
let content_block_stop =
AnthropicMessagesStreamBuffer::create_content_block_stop_event();
self.buffered_events.push(content_block_stop);
self.needs_content_block_stop = false;
}
// If no message_delta has been emitted yet (empty/filtered upstream
// response), synthesize a minimal one carrying `end_turn`.
if !self.seen_message_delta {
// If we also never opened a content block, open and close one
// so clients that expect at least one block are happy.
if self.content_block_start_indices.is_empty() {
let content_block_start =
AnthropicMessagesStreamBuffer::create_content_block_start_event(
);
self.buffered_events.push(content_block_start);
self.set_content_block_start_sent(0);
let content_block_stop =
AnthropicMessagesStreamBuffer::create_content_block_stop_event(
);
self.buffered_events.push(content_block_stop);
}
self.push_synthetic_message_delta();
}
// Push the upstream-provided message_stop and mark closed.
// `push_message_stop` is idempotent but we want to reuse the
// original SseEvent so raw passthrough semantics are preserved.
self.buffered_events.push(event);
self.message_stopped = true;
self.seen_message_delta = false;
}
_ => {
// Other Anthropic event types (Ping, etc.), just accumulate
@ -254,24 +378,23 @@ impl SseStreamBufferTrait for AnthropicMessagesStreamBuffer {
}
fn to_bytes(&mut self) -> Vec<u8> {
// Convert all accumulated events to bytes and clear buffer
// Convert all accumulated events to bytes and clear buffer.
//
// NOTE: We do NOT inject ContentBlockStop here because it's injected when we see MessageDelta
// or MessageStop. Injecting it here causes premature ContentBlockStop in the middle of streaming.
// Inject MessageStop after MessageDelta if we've seen one
// This completes the Anthropic Messages API event sequence
if self.seen_message_delta {
let message_stop = MessagesStreamEvent::MessageStop;
let sse_string: String = message_stop.into();
let message_stop_event = SseEvent {
data: None,
event: Some("message_stop".to_string()),
raw_line: sse_string.clone(),
sse_transformed_lines: sse_string,
provider_stream_response: None,
};
self.buffered_events.push(message_stop_event);
self.seen_message_delta = false;
//
// Inject a synthetic `message_stop` only when:
// 1. A `message_delta` has been seen (otherwise we'd violate the Anthropic
// protocol by emitting `message_stop` without a preceding `message_delta`), AND
// 2. We haven't already emitted `message_stop` (either synthetic from a
// previous flush, or real from an upstream `[DONE]`).
//
// Without the `!message_stopped` guard, a stream whose `finish_reason` chunk
// and `[DONE]` marker land in separate HTTP body chunks would receive two
// `message_stop` events, triggering Claude Code's "Received message_stop
// without a current message" error.
if self.seen_message_delta && !self.message_stopped {
self.push_message_stop();
}
let mut buffer = Vec::new();
@ -615,4 +738,133 @@ data: [DONE]"#;
println!("✓ Stop reason: tool_use");
println!("✓ Proper Anthropic tool_use protocol\n");
}
/// Regression test for:
/// Claude Code CLI error: "Received message_stop without a current message"
///
/// Reproduces the *double-close* scenario: OpenAI's final `finish_reason`
/// chunk and the `[DONE]` marker arrive in **separate** HTTP body chunks, so
/// `to_bytes()` is called between them. Before the fix, this produced two
/// `message_stop` events on the wire (one synthetic, one from `[DONE]`).
#[test]
fn test_openai_to_anthropic_emits_single_message_stop_across_chunk_boundary() {
let client_api = SupportedAPIsFromClient::AnthropicMessagesAPI(AnthropicApi::Messages);
let upstream_api = SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions);
let mut buffer = AnthropicMessagesStreamBuffer::new();
// --- HTTP chunk 1: content + finish_reason (no [DONE] yet) -----------
let chunk_1 = r#"data: {"id":"c1","object":"chat.completion.chunk","created":1,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hi"},"finish_reason":null}]}
data: {"id":"c1","object":"chat.completion.chunk","created":1,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}"#;
for raw in SseStreamIter::try_from(chunk_1.as_bytes()).unwrap() {
let e = SseEvent::try_from((raw, &client_api, &upstream_api)).unwrap();
buffer.add_transformed_event(e);
}
let out_1 = String::from_utf8(buffer.to_bytes()).unwrap();
// --- HTTP chunk 2: just the [DONE] marker ----------------------------
let chunk_2 = "data: [DONE]";
for raw in SseStreamIter::try_from(chunk_2.as_bytes()).unwrap() {
let e = SseEvent::try_from((raw, &client_api, &upstream_api)).unwrap();
buffer.add_transformed_event(e);
}
let out_2 = String::from_utf8(buffer.to_bytes()).unwrap();
let combined = format!("{}{}", out_1, out_2);
let start_count = combined.matches("event: message_start").count();
let stop_count = combined.matches("event: message_stop").count();
assert_eq!(
start_count, 1,
"Must emit exactly one message_start across chunks, got {start_count}. Output:\n{combined}"
);
assert_eq!(
stop_count, 1,
"Must emit exactly one message_stop across chunks (no double-close), got {stop_count}. Output:\n{combined}"
);
// Every message_stop must be preceded by a message_start earlier in the stream.
let start_pos = combined.find("event: message_start").unwrap();
let stop_pos = combined.find("event: message_stop").unwrap();
assert!(
start_pos < stop_pos,
"message_start must come before message_stop. Output:\n{combined}"
);
}
/// Regression test for:
/// "Received message_stop without a current message" on empty upstream responses.
///
/// OpenAI returns only `[DONE]` with no content deltas and no `finish_reason`
/// (this happens with content filters, truncated upstream streams, and some
/// 5xx recoveries). Before the fix, the buffer emitted a bare `message_stop`
/// with no preceding `message_start`. After the fix, it synthesizes a
/// minimal but well-formed envelope.
#[test]
fn test_openai_done_only_stream_synthesizes_valid_envelope() {
let client_api = SupportedAPIsFromClient::AnthropicMessagesAPI(AnthropicApi::Messages);
let upstream_api = SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions);
let mut buffer = AnthropicMessagesStreamBuffer::new();
let raw_input = "data: [DONE]";
for raw in SseStreamIter::try_from(raw_input.as_bytes()).unwrap() {
let e = SseEvent::try_from((raw, &client_api, &upstream_api)).unwrap();
buffer.add_transformed_event(e);
}
let out = String::from_utf8(buffer.to_bytes()).unwrap();
assert!(
out.contains("event: message_start"),
"Empty upstream must still produce message_start. Output:\n{out}"
);
assert!(
out.contains("event: message_delta"),
"Empty upstream must produce a synthesized message_delta. Output:\n{out}"
);
assert_eq!(
out.matches("event: message_stop").count(),
1,
"Empty upstream must produce exactly one message_stop. Output:\n{out}"
);
// Protocol ordering: start < delta < stop.
let p_start = out.find("event: message_start").unwrap();
let p_delta = out.find("event: message_delta").unwrap();
let p_stop = out.find("event: message_stop").unwrap();
assert!(
p_start < p_delta && p_delta < p_stop,
"Bad ordering. Output:\n{out}"
);
}
/// Regression test: events arriving after `message_stop` (e.g. a stray `[DONE]`
/// echo, or late-arriving deltas from a racing upstream) must be dropped
/// rather than written after the terminal frame.
#[test]
fn test_events_after_message_stop_are_dropped() {
let client_api = SupportedAPIsFromClient::AnthropicMessagesAPI(AnthropicApi::Messages);
let upstream_api = SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions);
let mut buffer = AnthropicMessagesStreamBuffer::new();
let first = r#"data: {"id":"c1","object":"chat.completion.chunk","created":1,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"ok"},"finish_reason":"stop"}]}
data: [DONE]"#;
for raw in SseStreamIter::try_from(first.as_bytes()).unwrap() {
let e = SseEvent::try_from((raw, &client_api, &upstream_api)).unwrap();
buffer.add_transformed_event(e);
}
let _ = buffer.to_bytes();
// Simulate a duplicate / late `[DONE]` after the stream was already closed.
let late = "data: [DONE]";
for raw in SseStreamIter::try_from(late.as_bytes()).unwrap() {
let e = SseEvent::try_from((raw, &client_api, &upstream_api)).unwrap();
buffer.add_transformed_event(e);
}
let tail = String::from_utf8(buffer.to_bytes()).unwrap();
assert!(
tail.is_empty(),
"No bytes should be emitted after message_stop, got: {tail:?}"
);
}
}
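The single-close guard described in the comments above (synthesize `message_stop` only when a `message_delta` has been seen and the stream is not already closed) can be sketched as a small state machine. This is an illustrative Python model of the logic, not the crate's actual API; the names `EnvelopeBuffer`, `on_event`, and `flush` are hypothetical.

```python
# Minimal sketch of the message_stop double-close guard described above.
# Names are illustrative; this is not the crate's actual API.
class EnvelopeBuffer:
    def __init__(self):
        self.seen_message_delta = False
        self.message_stopped = False
        self.out = []

    def on_event(self, name):
        if self.message_stopped:
            return  # drop late events after the terminal frame
        if name == "message_delta":
            self.seen_message_delta = True
        elif name == "message_stop":
            self.message_stopped = True
        self.out.append(name)

    def flush(self):
        # Synthesize message_stop only if a delta was seen and we have
        # not already closed -- the guard that prevents the double-close.
        if self.seen_message_delta and not self.message_stopped:
            self.out.append("message_stop")
            self.message_stopped = True
        emitted, self.out = self.out, []
        return emitted

# finish_reason chunk and [DONE] arrive in separate HTTP body chunks:
buf = EnvelopeBuffer()
buf.on_event("message_start")
buf.on_event("message_delta")
first = buf.flush()           # flush between chunks synthesizes message_stop
buf.on_event("message_stop")  # upstream [DONE] arrives late and is dropped
second = buf.flush()
total_stops = (first + second).count("message_stop")
assert total_stops == 1
```

Without the `message_stopped` check in `flush`, the late `[DONE]` would produce a second `message_stop`, which is exactly the client-side "Received message_stop without a current message" failure the regression tests above cover.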


@ -95,6 +95,7 @@ providers:
anthropic:
- anthropic/claude-sonnet-4-6
- anthropic/claude-opus-4-6
- anthropic/claude-opus-4-7
- anthropic/claude-opus-4-5-20251101
- anthropic/claude-opus-4-5
- anthropic/claude-haiku-4-5-20251001
@ -328,6 +329,10 @@ providers:
- xiaomi/mimo-v2-flash
- xiaomi/mimo-v2-omni
- xiaomi/mimo-v2-pro
chatgpt:
- chatgpt/gpt-5.4
- chatgpt/gpt-5.3-codex
- chatgpt/gpt-5.2
digitalocean:
- digitalocean/openai-gpt-4.1
- digitalocean/openai-gpt-4o
@ -375,6 +380,6 @@ providers:
- digitalocean/qwen3-embedding-0.6b
- digitalocean/router:software-engineering
metadata:
total_providers: 12
total_models: 361
last_updated: 2026-04-16T00:00:00.000000+00:00
total_providers: 13
total_models: 364
last_updated: 2026-04-20T00:00:00.000000+00:00


@ -175,7 +175,9 @@ impl SupportedAPIsFromClient {
match self {
SupportedAPIsFromClient::AnthropicMessagesAPI(AnthropicApi::Messages) => {
match provider_id {
ProviderId::Anthropic => build_endpoint("/v1", "/messages"),
ProviderId::Anthropic | ProviderId::Vercel => {
build_endpoint("/v1", "/messages")
}
ProviderId::AmazonBedrock => {
if request_path.starts_with("/v1/") && !is_streaming {
build_endpoint("", &format!("/model/{}/converse", model_id))
@ -192,7 +194,10 @@ impl SupportedAPIsFromClient {
// For Responses API, check if provider supports it, otherwise translate to chat/completions
match provider_id {
// Providers that support /v1/responses natively
ProviderId::OpenAI | ProviderId::XAI => route_by_provider("/responses"),
ProviderId::OpenAI
| ProviderId::XAI
| ProviderId::ChatGPT
| ProviderId::Vercel => route_by_provider("/responses"),
// All other providers: translate to /chat/completions
_ => route_by_provider("/chat/completions"),
}
@ -718,4 +723,36 @@ mod tests {
"/v1/responses"
);
}
#[test]
fn test_responses_api_targets_chatgpt_native_responses_endpoint() {
let api = SupportedAPIsFromClient::OpenAIResponsesAPI(OpenAIApi::Responses);
assert_eq!(
api.target_endpoint_for_provider(
&ProviderId::ChatGPT,
"/v1/responses",
"gpt-5.4",
false,
None,
false
),
"/v1/responses"
);
}
#[test]
fn test_responses_api_targets_vercel_native_responses_endpoint() {
let api = SupportedAPIsFromClient::OpenAIResponsesAPI(OpenAIApi::Responses);
assert_eq!(
api.target_endpoint_for_provider(
&ProviderId::Vercel,
"/v1/responses",
"gpt-5.4",
false,
None,
false
),
"/v1/responses"
);
}
}


@ -44,7 +44,10 @@ pub enum ProviderId {
Zhipu,
Qwen,
AmazonBedrock,
ChatGPT,
DigitalOcean,
Vercel,
OpenRouter,
}
impl TryFrom<&str> for ProviderId {
@ -72,9 +75,12 @@ impl TryFrom<&str> for ProviderId {
"qwen" => Ok(ProviderId::Qwen),
"amazon_bedrock" => Ok(ProviderId::AmazonBedrock),
"amazon" => Ok(ProviderId::AmazonBedrock), // alias
"chatgpt" => Ok(ProviderId::ChatGPT),
"digitalocean" => Ok(ProviderId::DigitalOcean),
"do" => Ok(ProviderId::DigitalOcean), // alias
"do_ai" => Ok(ProviderId::DigitalOcean), // alias
"vercel" => Ok(ProviderId::Vercel),
"openrouter" => Ok(ProviderId::OpenRouter),
_ => Err(format!("Unknown provider: {}", value)),
}
}
@ -99,6 +105,7 @@ impl ProviderId {
ProviderId::Moonshotai => "moonshotai",
ProviderId::Zhipu => "z-ai",
ProviderId::Qwen => "qwen",
ProviderId::ChatGPT => "chatgpt",
ProviderId::DigitalOcean => "digitalocean",
_ => return Vec::new(),
};
@ -137,6 +144,17 @@ impl ProviderId {
SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions)
}
// Vercel AI Gateway natively supports all three API types
(ProviderId::Vercel, SupportedAPIsFromClient::AnthropicMessagesAPI(_)) => {
SupportedUpstreamAPIs::AnthropicMessagesAPI(AnthropicApi::Messages)
}
(ProviderId::Vercel, SupportedAPIsFromClient::OpenAIChatCompletions(_)) => {
SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions)
}
(ProviderId::Vercel, SupportedAPIsFromClient::OpenAIResponsesAPI(_)) => {
SupportedUpstreamAPIs::OpenAIResponsesAPI(OpenAIApi::Responses)
}
// OpenAI-compatible providers only support OpenAI chat completions
(
ProviderId::OpenAI
@ -154,7 +172,9 @@ impl ProviderId {
| ProviderId::Moonshotai
| ProviderId::Zhipu
| ProviderId::Qwen
| ProviderId::DigitalOcean,
| ProviderId::DigitalOcean
| ProviderId::OpenRouter
| ProviderId::ChatGPT,
SupportedAPIsFromClient::AnthropicMessagesAPI(_),
) => SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions),
@ -174,13 +194,15 @@ impl ProviderId {
| ProviderId::Moonshotai
| ProviderId::Zhipu
| ProviderId::Qwen
| ProviderId::DigitalOcean,
| ProviderId::DigitalOcean
| ProviderId::OpenRouter
| ProviderId::ChatGPT,
SupportedAPIsFromClient::OpenAIChatCompletions(_),
) => SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions),
// OpenAI Responses API - OpenAI and xAI support this natively
// OpenAI Responses API - OpenAI, xAI, and ChatGPT support this natively
(
ProviderId::OpenAI | ProviderId::XAI,
ProviderId::OpenAI | ProviderId::XAI | ProviderId::ChatGPT,
SupportedAPIsFromClient::OpenAIResponsesAPI(_),
) => SupportedUpstreamAPIs::OpenAIResponsesAPI(OpenAIApi::Responses),
@ -241,7 +263,10 @@ impl Display for ProviderId {
ProviderId::Zhipu => write!(f, "zhipu"),
ProviderId::Qwen => write!(f, "qwen"),
ProviderId::AmazonBedrock => write!(f, "amazon_bedrock"),
ProviderId::ChatGPT => write!(f, "chatgpt"),
ProviderId::DigitalOcean => write!(f, "digitalocean"),
ProviderId::Vercel => write!(f, "vercel"),
ProviderId::OpenRouter => write!(f, "openrouter"),
}
}
}
@ -344,6 +369,79 @@ mod tests {
);
}
#[test]
fn test_vercel_and_openrouter_parsing() {
assert_eq!(ProviderId::try_from("vercel"), Ok(ProviderId::Vercel));
assert!(ProviderId::try_from("vercel_ai").is_err());
assert_eq!(
ProviderId::try_from("openrouter"),
Ok(ProviderId::OpenRouter)
);
assert!(ProviderId::try_from("open_router").is_err());
}
#[test]
fn test_vercel_compatible_api() {
use crate::clients::endpoints::{SupportedAPIsFromClient, SupportedUpstreamAPIs};
let openai_client =
SupportedAPIsFromClient::OpenAIChatCompletions(OpenAIApi::ChatCompletions);
let upstream = ProviderId::Vercel.compatible_api_for_client(&openai_client, false);
assert!(
matches!(upstream, SupportedUpstreamAPIs::OpenAIChatCompletions(_)),
"Vercel should map OpenAI client to OpenAIChatCompletions upstream"
);
let anthropic_client =
SupportedAPIsFromClient::AnthropicMessagesAPI(AnthropicApi::Messages);
let upstream = ProviderId::Vercel.compatible_api_for_client(&anthropic_client, false);
assert!(
matches!(upstream, SupportedUpstreamAPIs::AnthropicMessagesAPI(_)),
"Vercel should map Anthropic client to AnthropicMessagesAPI upstream natively"
);
let responses_client = SupportedAPIsFromClient::OpenAIResponsesAPI(OpenAIApi::Responses);
let upstream = ProviderId::Vercel.compatible_api_for_client(&responses_client, false);
assert!(
matches!(upstream, SupportedUpstreamAPIs::OpenAIResponsesAPI(_)),
"Vercel should map Responses API client to OpenAIResponsesAPI upstream natively"
);
}
#[test]
fn test_openrouter_compatible_api() {
use crate::clients::endpoints::{SupportedAPIsFromClient, SupportedUpstreamAPIs};
let openai_client =
SupportedAPIsFromClient::OpenAIChatCompletions(OpenAIApi::ChatCompletions);
let upstream = ProviderId::OpenRouter.compatible_api_for_client(&openai_client, false);
assert!(
matches!(upstream, SupportedUpstreamAPIs::OpenAIChatCompletions(_)),
"OpenRouter should map OpenAI client to OpenAIChatCompletions upstream"
);
let anthropic_client =
SupportedAPIsFromClient::AnthropicMessagesAPI(AnthropicApi::Messages);
let upstream = ProviderId::OpenRouter.compatible_api_for_client(&anthropic_client, false);
assert!(
matches!(upstream, SupportedUpstreamAPIs::OpenAIChatCompletions(_)),
"OpenRouter should translate Anthropic client to OpenAIChatCompletions upstream"
);
let responses_client = SupportedAPIsFromClient::OpenAIResponsesAPI(OpenAIApi::Responses);
let upstream = ProviderId::OpenRouter.compatible_api_for_client(&responses_client, false);
assert!(
matches!(upstream, SupportedUpstreamAPIs::OpenAIChatCompletions(_)),
"OpenRouter should translate Responses API client to OpenAIChatCompletions upstream"
);
}
#[test]
fn test_vercel_and_openrouter_empty_models() {
assert!(ProviderId::Vercel.models().is_empty());
assert!(ProviderId::OpenRouter.models().is_empty());
}
#[test]
fn test_xai_uses_responses_api_for_responses_clients() {
use crate::clients::endpoints::{SupportedAPIsFromClient, SupportedUpstreamAPIs};
@ -355,4 +453,16 @@ mod tests {
SupportedUpstreamAPIs::OpenAIResponsesAPI(OpenAIApi::Responses)
));
}
#[test]
fn test_chatgpt_uses_responses_api_for_responses_clients() {
use crate::clients::endpoints::{SupportedAPIsFromClient, SupportedUpstreamAPIs};
let client_api = SupportedAPIsFromClient::OpenAIResponsesAPI(OpenAIApi::Responses);
let upstream = ProviderId::ChatGPT.compatible_api_for_client(&client_api, false);
assert!(matches!(
upstream,
SupportedUpstreamAPIs::OpenAIResponsesAPI(OpenAIApi::Responses)
));
}
}


@ -77,7 +77,7 @@ impl ProviderRequestType {
&mut self,
provider_id: ProviderId,
upstream_api: &SupportedUpstreamAPIs,
) {
) -> Result<(), ProviderRequestError> {
if provider_id == ProviderId::XAI
&& matches!(
upstream_api,
@ -89,6 +89,48 @@ impl ProviderRequestType {
req.web_search_options = None;
}
}
// ChatGPT requires instructions, store=false, and input as a list
if provider_id == ProviderId::ChatGPT {
if let Self::ResponsesAPIRequest(req) = self {
use crate::apis::openai_responses::{
InputItem, InputMessage, InputParam, MessageContent, MessageRole,
};
const CHATGPT_BASE_INSTRUCTIONS: &str =
"You are Codex, based on GPT-5. You are running as a coding agent in the Codex CLI on a user's computer.";
match &req.instructions {
Some(existing) if existing.contains(CHATGPT_BASE_INSTRUCTIONS) => {}
Some(existing) => {
req.instructions =
Some(format!("{}\n\n{}", CHATGPT_BASE_INSTRUCTIONS, existing));
}
None => {
req.instructions = Some(CHATGPT_BASE_INSTRUCTIONS.to_string());
}
}
req.store = Some(false);
if req.stream == Some(false) {
return Err(ProviderRequestError {
message: "Non-streaming requests are not supported for the ChatGPT Codex provider. Set stream=true or omit the stream field.".to_string(),
source: None,
});
}
req.stream = Some(true);
// ChatGPT backend requires input to be a list, not a plain string
if let InputParam::Text(text) = &req.input {
req.input = InputParam::Items(vec![InputItem::Message(InputMessage {
role: MessageRole::User,
content: MessageContent::Text(text.clone()),
})]);
}
if let InputParam::SingleItem(item) = &req.input {
req.input = InputParam::Items(vec![item.clone()]);
}
}
}
Ok(())
}
}
@ -824,10 +866,12 @@ mod tests {
..Default::default()
});
request.normalize_for_upstream(
ProviderId::XAI,
&SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions),
);
request
.normalize_for_upstream(
ProviderId::XAI,
&SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions),
)
.unwrap();
let ProviderRequestType::ChatCompletionsRequest(req) = request else {
panic!("expected chat request");
@ -852,10 +896,12 @@ mod tests {
..Default::default()
});
request.normalize_for_upstream(
ProviderId::OpenAI,
&SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions),
);
request
.normalize_for_upstream(
ProviderId::OpenAI,
&SupportedUpstreamAPIs::OpenAIChatCompletions(OpenAIApi::ChatCompletions),
)
.unwrap();
let ProviderRequestType::ChatCompletionsRequest(req) = request else {
panic!("expected chat request");


@ -346,12 +346,10 @@ impl TryFrom<(SseEvent, &SupportedAPIsFromClient, &SupportedUpstreamAPIs)> for S
(
SupportedAPIsFromClient::OpenAIChatCompletions(_),
SupportedUpstreamAPIs::AnthropicMessagesAPI(_),
) => {
) if transformed_event.is_event_only() && transformed_event.event.is_some() => {
// OpenAI clients don't expect separate event: lines
// Suppress upstream Anthropic event-only lines
if transformed_event.is_event_only() && transformed_event.event.is_some() {
transformed_event.sse_transformed_lines = "\n".to_string();
}
transformed_event.sse_transformed_lines = "\n".to_string();
}
_ => {
// Other cross-API combinations can be handled here as needed
@ -371,12 +369,10 @@ impl TryFrom<(SseEvent, &SupportedAPIsFromClient, &SupportedUpstreamAPIs)> for S
| (
SupportedAPIsFromClient::OpenAIResponsesAPI(_),
SupportedUpstreamAPIs::OpenAIResponsesAPI(_),
) => {
if transformed_event.is_event_only() && transformed_event.event.is_some() {
// Mark as should-skip by clearing sse_transformed_lines
// The event line is already included when the data line is transformed
transformed_event.sse_transformed_lines = String::new();
}
) if transformed_event.is_event_only() && transformed_event.event.is_some() => {
// Mark as should-skip by clearing sse_transformed_lines
// The event line is already included when the data line is transformed
transformed_event.sse_transformed_lines = String::new();
}
_ => {
// Other passthrough combinations (OpenAI ChatCompletions, etc.) don't have this issue


@ -188,14 +188,13 @@ pub fn convert_openai_message_to_anthropic_content(
// Handle regular content
match &message.content {
Some(MessageContent::Text(text)) => {
if !text.is_empty() {
blocks.push(MessagesContentBlock::Text {
text: text.clone(),
cache_control: None,
});
}
Some(MessageContent::Text(text)) if !text.is_empty() => {
blocks.push(MessagesContentBlock::Text {
text: text.clone(),
cache_control: None,
});
}
Some(MessageContent::Text(_)) => {}
Some(MessageContent::Parts(parts)) => {
for part in parts {
match part {


@ -354,10 +354,10 @@ impl TryFrom<MessagesMessage> for BedrockMessage {
MessagesMessageContent::Blocks(blocks) => {
for block in blocks {
match block {
crate::apis::anthropic::MessagesContentBlock::Text { text, .. } => {
if !text.is_empty() {
content_blocks.push(ContentBlock::Text { text });
}
crate::apis::anthropic::MessagesContentBlock::Text { text, .. }
if !text.is_empty() =>
{
content_blocks.push(ContentBlock::Text { text });
}
crate::apis::anthropic::MessagesContentBlock::ToolUse {
id,


@ -317,11 +317,10 @@ impl TryFrom<Message> for BedrockMessage {
Role::User => {
// Convert user message content to content blocks
match message.content {
Some(MessageContent::Text(text)) => {
if !text.is_empty() {
content_blocks.push(ContentBlock::Text { text });
}
Some(MessageContent::Text(text)) if !text.is_empty() => {
content_blocks.push(ContentBlock::Text { text });
}
Some(MessageContent::Text(_)) => {}
Some(MessageContent::Parts(parts)) => {
// Convert OpenAI content parts to Bedrock ContentBlocks
for part in parts {


@ -241,6 +241,14 @@ impl StreamContext {
}
}
// Apply any extra headers configured on the provider (e.g., ChatGPT-Account-Id, originator)
let headers = self.llm_provider().headers.clone();
if let Some(headers) = headers {
for (key, value) in &headers {
self.set_http_request_header(key, Some(value));
}
}
Ok(())
}
@ -1060,7 +1068,20 @@ impl HttpContext for StreamContext {
match ProviderRequestType::try_from((deserialized_client_request, upstream)) {
Ok(mut request) => {
request.normalize_for_upstream(self.get_provider_id(), upstream);
if let Err(e) =
request.normalize_for_upstream(self.get_provider_id(), upstream)
{
warn!(
"request_id={}: normalize_for_upstream failed: {}",
self.request_identifier(),
e
);
self.send_server_error(
ServerError::LogicError(e.message),
Some(StatusCode::BAD_REQUEST),
);
return Action::Pause;
}
debug!(
"request_id={}: upstream request payload: {}",
self.request_identifier(),


@ -0,0 +1,61 @@
# ChatGPT Subscription Routing
Route requests through your ChatGPT Plus/Pro subscription using Plano. Uses the OpenAI Responses API under the hood, targeting `chatgpt.com/backend-api/codex/responses`.
## Setup
### 1. Authenticate with ChatGPT
```bash
planoai chatgpt login
```
This opens a device code flow — visit the URL shown and enter the code. Tokens are saved to `~/.plano/chatgpt/auth.json`.
### 2. Start Plano
```bash
planoai up config.yaml
```
### 3. Send a request
```bash
curl http://localhost:12000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.2",
"input": "Hello, what model are you?"
}'
```
Or use the test script:
```bash
bash test_chatgpt.sh
```
## How it works
- `chatgpt/gpt-5.2` in the config tells Plano to use the ChatGPT subscription provider
- Plano reads OAuth tokens from `~/.plano/chatgpt/auth.json` (auto-refreshes if expired)
- Requests are proxied to `https://chatgpt.com/backend-api/codex/responses` with the required headers:
- `Authorization: Bearer <access_token>`
- `ChatGPT-Account-Id: <account_id>`
- `originator: codex_cli_rs`
- `session_id: <uuid>`
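The header construction listed above can be sketched in Python. The `access_token` and `account_id` field names inside `auth.json` are assumptions for illustration here, not a documented schema; only the four header names come from this README.

```python
import json
import uuid

# Sketch of building the required headers from stored credentials.
# The auth.json field names (access_token, account_id) are assumed
# for illustration, not a documented schema.
def chatgpt_headers(auth_json: str) -> dict:
    auth = json.loads(auth_json)
    return {
        "Authorization": f"Bearer {auth['access_token']}",
        "ChatGPT-Account-Id": auth["account_id"],
        "originator": "codex_cli_rs",
        "session_id": str(uuid.uuid4()),
    }

headers = chatgpt_headers('{"access_token": "tok", "account_id": "acct"}')
assert headers["Authorization"] == "Bearer tok"
```

In practice Plano injects these headers for you; the sketch just makes the mapping from stored credentials to request headers concrete.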
## Available models
```
chatgpt/gpt-5.4
chatgpt/gpt-5.3-codex
chatgpt/gpt-5.2
```
## Managing credentials
```bash
planoai chatgpt status # Check auth status
planoai chatgpt logout # Remove stored credentials
```


@ -0,0 +1,38 @@
#!/usr/bin/env python3
"""Interactive chat with a model through Plano using the OpenAI SDK."""
import sys
from openai import OpenAI
client = OpenAI(base_url="http://localhost:12000/v1", api_key="unused")
def run_chat(model):
print(f"Chatting with {model} via Plano (Ctrl+C to quit)\n")
history = []
while True:
try:
user_input = input("you> ")
except (KeyboardInterrupt, EOFError):
print("\nbye")
break
if not user_input.strip():
continue
history.append({"role": "user", "content": user_input})
stream = client.responses.create(model=model, input=history, stream=True)
print(f"{model}> ", end="", flush=True)
full = ""
for event in stream:
if event.type == "response.output_text.delta":
print(event.delta, end="", flush=True)
full += event.delta
print()
history.append({"role": "assistant", "content": full})
if __name__ == "__main__":
model = sys.argv[1] if len(sys.argv) > 1 else "gpt-5.2"
run_chat(model)


@ -0,0 +1,9 @@
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: chatgpt/*


@ -0,0 +1,18 @@
#!/bin/bash
# Test ChatGPT subscription routing through Plano
# Prerequisites: planoai chatgpt login && planoai up config.yaml
set -e
echo "Testing ChatGPT subscription via Plano Responses API..."
echo ""
curl -s http://localhost:12000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.2",
"input": "What is 2 + 2? Reply in one word."
}' | python3 -m json.tool
echo ""
echo "Done."


@ -19,7 +19,7 @@ model_providers:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
# Anthropic Models
- model: anthropic/claude-sonnet-4-5
- model: anthropic/claude-sonnet-4-6
default: true
access_key: $ANTHROPIC_API_KEY


@ -3,7 +3,7 @@ This demo shows how you can use user preferences to route user prompts to approp
## How to start the demo
Make sure you have Plano CLI installed (`pip install planoai==0.4.20` or `uv tool install planoai==0.4.20`).
Make sure you have Plano CLI installed (`pip install planoai==0.4.21` or `uv tool install planoai==0.4.21`).
```bash
cd demos/llm_routing/preference_based_routing


@ -34,11 +34,13 @@ POST /v1/chat/completions
### `routing_preferences` fields
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Route identifier. Must match the LLM router's route classification. |
| `description` | string | yes | Natural language description used by the router to match user intent. |
| `models` | string[] | yes | Ordered candidate pool. At least one entry required. Must be declared in `model_providers`. |
| Field | Type | Required | Description |
| ------------- | -------- | -------- | ------------------------------------------------------------------------------------------- |
| `name` | string | yes | Route identifier. Must match the LLM router's route classification. |
| `description` | string | yes | Natural language description used by the router to match user intent. |
| `models` | string[] | yes | Ordered candidate pool. At least one entry required. Must be declared in `model_providers`. |
### Notes
@ -64,11 +66,13 @@ POST /v1/chat/completions
### Fields
| Field | Type | Description |
|---|---|---|
| `models` | string[] | Ranked model list. Use `models[0]` as primary; retry with `models[1]` on 429/5xx, and so on. |
| `route` | string \| null | Name of the matched route. `null` if no route matched — client should use the original request `model`. |
| `trace_id` | string | Trace ID for distributed tracing and observability. |
| Field | Type | Description |
| ---------- | ------------- | ------------------------------------------------------------------------------------------------------- |
| `models` | string[] | Ranked model list. Use `models[0]` as primary; retry with `models[1]` on 429/5xx, and so on. |
| `route`    | string \| null | Name of the matched route. `null` if no route matched — client should use the original request `model`. |
| `trace_id` | string | Trace ID for distributed tracing and observability. |
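The ranked-fallback contract in the `models` field (use `models[0]` as primary, retry the next candidate on 429/5xx) can be sketched client-side. `call_model` is a hypothetical stand-in for the actual completion call, not part of Plano.

```python
# Sketch of the client-side fallback described above: try models[0],
# fall through to the next candidate on 429/5xx responses.
RETRYABLE = {429, 500, 502, 503, 504}

def complete_with_fallback(models, call_model):
    last_status = None
    for model in models:
        status, body = call_model(model)
        if status == 200:
            return model, body
        if status in RETRYABLE:
            last_status = status
            continue  # try the next ranked candidate
        raise RuntimeError(f"non-retryable status {status} from {model}")
    raise RuntimeError(f"all candidates failed (last status {last_status})")

# Example: primary is rate-limited, first fallback succeeds.
responses = {
    "openai/gpt-5.2": (429, None),
    "anthropic/claude-sonnet-4-5": (200, "ok"),
}
model, body = complete_with_fallback(
    ["openai/gpt-5.2", "anthropic/claude-sonnet-4-5"],
    lambda m: responses[m],
)
assert model == "anthropic/claude-sonnet-4-5"
```

Non-retryable statuses (e.g. 400) fail fast rather than cascading down the candidate list, since retrying a malformed request against another model would fail the same way.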
---
@ -142,6 +146,7 @@ X-Model-Affinity: a1b2c3d4-5678-...
```
Response when pinned:
```json
{
"models": ["anthropic/claude-sonnet-4-20250514"],
@ -155,6 +160,7 @@ Response when pinned:
Without the header, routing runs fresh every time (no breaking change).
Configure TTL and cache size:
```yaml
routing:
session_ttl_seconds: 600 # default: 10 min
@ -165,7 +171,8 @@ routing:
## Version Requirements
| Version | Top-level `routing_preferences` |
|---|---|
| Version | Top-level `routing_preferences` |
| ---------- | -------------------------------------- |
| `< v0.4.0` | Not allowed — startup error if present |
| `v0.4.0+` | Supported (required for model routing) |
| `v0.4.0+` | Supported (required for model routing) |


@ -158,7 +158,9 @@ Anthropic
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
# Configure all Anthropic models with wildcard
- model: anthropic/*
access_key: $ANTHROPIC_API_KEY
@ -179,8 +181,12 @@ Anthropic
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_PROD_API_KEY
routing_preferences:
- name: code_generation
routing_preferences:
- name: code_generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-20250514
DeepSeek
~~~~~~~~
@ -798,7 +804,9 @@ You can configure specific models with custom settings even when using wildcards
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
# Expand to all Anthropic models
- model: anthropic/*
access_key: $ANTHROPIC_API_KEY
@ -807,14 +815,17 @@ You can configure specific models with custom settings even when using wildcards
# This model will NOT be included in the wildcard expansion above
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_PROD_API_KEY
routing_preferences:
- name: code_generation
priority: 1
# Another specific override
- model: anthropic/claude-3-haiku-20240307
access_key: $ANTHROPIC_DEV_API_KEY
routing_preferences:
- name: code_generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-20250514
**Custom Provider Wildcards:**
For providers not in Plano's registry, wildcards enable dynamic model routing:
@ -856,24 +867,36 @@ Mark one model as the default for fallback scenarios:
Routing Preferences
~~~~~~~~~~~~~~~~~~~
Configure routing preferences for dynamic model selection:
Starting in ``v0.4.0``, configure routing preferences at the top level of the config. Each preference declares an ordered ``models`` candidate pool; the first entry is primary and the rest are fallbacks the client tries on ``429``/``5xx`` errors. Multiple providers can serve the same route — just list them all under ``models``. See :doc:`/guides/llm_router` for the full routing model.
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
- name: code_review
description: reviewing and analyzing existing code for bugs and improvements
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative_writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: complex_reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
models:
- openai/gpt-5.2
- anthropic/claude-sonnet-4-5
- name: code_review
description: reviewing and analyzing existing code for bugs and improvements
models:
- openai/gpt-5.2
- name: creative_writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
.. note::
``v0.3.0`` configs that declare ``routing_preferences`` inline under each ``model_provider`` are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update to the form above to silence the warning and gain the multi-model fallback behavior.
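The candidate-pool fallback described above can be sketched as client-side logic. This is a minimal illustration of the ordering and retry conditions only; ``send_request`` and the error policy details are assumptions, not Plano's actual implementation:

```python
def complete_with_fallback(models, send_request):
    """Try each model in the candidate pool in order.

    The first entry is primary; later entries are fallbacks tried
    on 429 (rate limit) or 5xx (server error) responses.
    """
    last_error = None
    for model in models:
        status, body = send_request(model)
        if status == 200:
            return model, body
        if status == 429 or 500 <= status < 600:
            # Retryable per the routing-preferences contract: try next candidate.
            last_error = (model, status)
            continue
        # Anything else (e.g. 400) is not a fallback case.
        raise RuntimeError(f"non-retryable {status} from {model}")
    raise RuntimeError(f"all candidates failed, last error: {last_error}")
```

With the ``complex_reasoning`` pool above, a 429 from ``openai/gpt-5.2`` would fall through to ``anthropic/claude-sonnet-4-5``.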
.. _passthrough_auth:

View file

@@ -4,333 +4,602 @@
Signals™
========
Agentic Signals are behavioral and execution quality indicators that act as early warning signs of agent performance—highlighting both brilliant successes and **severe failures**. These signals are computed directly from conversation traces without requiring manual labeling or domain expertise, making them practical for production observability at scale.
Agentic Signals are lightweight, model-free behavioral indicators computed
from live interaction trajectories and attached to your existing
OpenTelemetry traces. They are the instrumentation layer of a closed-loop
improvement flywheel for agents — turning raw production traffic into
prioritized data that can drive prompt, routing, and model updates without
running an LLM-as-judge on every session.
The Problem: Knowing What's "Good"
==================================
The framework implemented here follows the taxonomy and detector design in
*Signals: Trajectory Sampling and Triage for Agentic Interactions*
(`Chen et al., 2026 <https://arxiv.org/abs/2604.00356>`_). All detectors
are computed without model calls; the entire pipeline attaches structured
attributes and span events to existing spans so your dashboards and alerts
work unmodified.
One of the hardest parts of building agents is measuring how well they perform in the real world.
Why Signals Matter: The Improvement Flywheel
============================================
**Offline testing** relies on hand-picked examples and happy-path scenarios, missing the messy diversity of real usage. Developers manually prompt models, evaluate responses, and tune prompts by guesswork—a slow, incomplete feedback loop.
Agentic applications are increasingly deployed at scale, yet improving them
after deployment remains difficult. Production trajectories are long,
numerous, and non-deterministic, making exhaustive human review infeasible
and auxiliary LLM evaluation expensive. As a result, teams face a
bottleneck: they cannot score every response, inspect every trace, or
reliably identify which failures and successes should inform the next model
update. Without a low-cost triage layer, the feedback loop from production
behavior to model improvement remains incomplete.
**Production debugging** floods developers with traces and logs but provides little guidance on which interactions actually matter. Finding failures means painstakingly reconstructing sessions and manually labeling quality issues.
Signals close this loop by cheaply identifying which interactions among
millions are worth inspecting:
You can't score every response with an LLM-as-judge (too expensive, too slow) or manually review every trace (doesn't scale). What you need are **behavioral signals**—fast, economical proxies that don't label quality outright but dramatically shrink the search space, pointing to sessions most likely to be broken or brilliant.
1. **Instrument.** Live trajectories are scored with model-free signals
attached as structured attributes on existing OpenTelemetry spans,
organized under a fixed taxonomy of interaction, execution, and
environment signals. This requires no additional model calls,
infrastructure, or changes to online agent behavior.
2. **Sample & triage.** Signal attributes act as filters: they surface
severe failures, retrieve representative exemplars, and exclude the
uninformative middle. In our experiments, signal-based sampling
achieves 82% informativeness on :math:`\tau`-bench, compared with 54%
for random sampling, yielding a 1.52× efficiency gain per informative
trajectory.
3. **Data Construction.** The triaged subset becomes targeted input for
constructing preference datasets or supervised fine-tuning datasets
from production trajectories.
4. **Model Optimization.** The resulting preference or supervised
fine-tuning data is used to update the model through methods such as
DPO, RLHF, or supervised fine-tuning, so optimization is driven by
targeted production behavior rather than undifferentiated trace noise.
5. **Deploy.** The improved model is deployed and immediately
re-instrumented with the same signals, enabling teams to measure
whether the change improved production behavior and to feed the next
iteration.
This loop depends on the first step being nearly free. The framework is
therefore designed around fixed-taxonomy, model-free detectors with
:math:`O(\text{messages})` cost, no online behavior change, and no
dependence on expensive evaluator models. By making production traces
searchable and sampleable at scale, signals turn raw agent telemetry into a
practical model-optimization flywheel.
What Are Behavioral Signals?
============================
Behavioral signals are canaries in the coal mine—early, objective indicators that something may have gone wrong (or gone exceptionally well). They don't explain *why* an agent failed, but they reliably signal *where* attention is needed.
Behavioral signals are canaries in the coal mine — early, objective
indicators that something may have gone wrong (or gone exceptionally well).
They don't explain *why* an agent failed, but they reliably signal *where*
attention is needed.
These signals emerge naturally from the rhythm of interaction:
- A user rephrasing the same request
- A user rephrasing or correcting the same request
- Sharp increases in conversation length
- Frustrated follow-up messages (ALL CAPS, "this doesn't work", excessive !!!/???)
- Agent repetition / looping
- Expressions of gratitude or satisfaction
- Requests to speak to a human / contact support
- Negative stance markers ("this doesn't work", ALL CAPS, excessive !!! or ???)
- Agent repetition or tool-call loops
- Expressions of gratitude, confirmation, or task success
- Requests for a human agent or explicit quit intent
- Tool errors, timeouts, rate limits, and context-window exhaustion
Individually, these clues are shallow; together, they form a fingerprint of agent performance. Embedded directly into traces, they make it easy to spot friction as it happens: where users struggle, where agents loop, and where escalations occur.
Individually, these clues are shallow; together, they form a fingerprint of
agent performance. Embedded directly into traces, they make it easy to spot
friction as it happens: where users struggle, where agents loop, where tool
failures cluster, and where escalations occur.
Signals vs Response Quality
===========================
Signal Taxonomy
===============
Behavioral signals and response quality are complementary.
Signals are organized into three top-level **layers**, each with its own
intent. Every detected signal belongs to exactly one leaf type under one of
seven categories. The per-category summaries and leaf-type descriptions
below are borrowed verbatim from the reference implementation at
`katanemo/signals <https://github.com/katanemo/signals>`_ to keep the
documentation and the detector contract in sync.
**Response Quality**
Domain-specific correctness: did the agent do the right thing given business rules, user intent, and operational context? This often requires subject-matter experts or outcome instrumentation and is time-intensive but irreplaceable.
Interaction — user ↔ agent conversational quality
-------------------------------------------------
**Behavioral Signals**
Observable patterns that correlate with quality: high repair frequency, excessive turns, frustration markers, repetition, escalation, and positive feedback. Fast to compute and valuable for prioritizing which traces deserve inspection.
**Misalignment** — Misalignment signals capture semantic or intent mismatch
between the user and the agent, such as rephrasing, corrections,
clarifications, and restated constraints. These signals do not assert that
either party is "wrong"; they only indicate that shared understanding has
not yet been established.
Used together, signals tell you *where to look*, and quality evaluation tells you *what went wrong (or right)*.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``misalignment.correction``
- Explicit corrections, negations, mistake acknowledgments.
* - ``misalignment.rephrase``
- Rephrasing indicators, alternative explanations.
* - ``misalignment.clarification``
- Confusion expressions, requests for clarification.
**Stagnation** — Stagnation signals capture cases where the discourse
continues but fails to make visible progress. This includes near-duplicate
assistant responses, circular explanations, repeated scaffolding, and other
forms of linguistic degeneration.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``stagnation.dragging``
- Excessive turn count, conversation not progressing efficiently.
* - ``stagnation.repetition``
- Near-duplicate or repetitive assistant responses.
**Disengagement** — Disengagement signals mark the withdrawal of
cooperative intent from the interaction. These include explicit requests to
exit the agent flow (e.g., "talk to a human"), strong negative stances, and
abandonment markers.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``disengagement.escalation``
- Requests for human agent or support.
* - ``disengagement.quit``
- Notification to quit or leave.
* - ``disengagement.negative_stance``
- Complaints, frustration, negative sentiment.
**Satisfaction** — Satisfaction signals indicate explicit stabilization and
completion of the interaction. These include expressions of gratitude,
success confirmations, and closing utterances. We use these signals to
sample exemplar traces rather than to assign quality scores.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``satisfaction.gratitude``
- Expressions of thanks and appreciation.
* - ``satisfaction.confirmation``
- Explicit satisfaction expressions.
* - ``satisfaction.success``
- Confirmation of task completion or understanding.
Execution — agent-caused action quality
---------------------------------------
**Failure** — Detects agent-caused failures in tool/function usage. These
are issues the agent is responsible for (as opposed to environment failures
which are external system issues). Requires tool-call traces
(``function_call`` / ``observation``) to fire.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``execution.failure.invalid_args``
- Wrong type, missing required field.
* - ``execution.failure.bad_query``
- Empty results due to overly narrow/wrong query.
* - ``execution.failure.tool_not_found``
- Agent called non-existent tool.
* - ``execution.failure.auth_misuse``
- Agent didn't pass credentials correctly.
* - ``execution.failure.state_error``
- Tool called in wrong state/order.
**Loops** — Detects behavioral patterns where the agent gets stuck
repeating tool calls. These are distinct from
``interaction.stagnation`` (conversation text repetition) and
``execution.failure`` (single tool errors) — these detect tool-level
behavioral loops.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``execution.loops.retry``
- Same tool with identical args ≥3 times.
* - ``execution.loops.parameter_drift``
- Same tool with varied args ≥3 times.
* - ``execution.loops.oscillation``
- Multi-tool A→B→A→B pattern ≥3 cycles.
Environment — external system / boundary conditions
---------------------------------------------------
**Exhaustion** — Detects failures and constraints arising from the
surrounding system rather than the agent's internal policy or reasoning.
These are external issues the agent cannot control.
.. list-table::
:header-rows: 1
:widths: 30 70
* - Leaf signal type
- Description
* - ``environment.exhaustion.api_error``
- 5xx errors, service unavailable.
* - ``environment.exhaustion.timeout``
- Connection/read timeouts.
* - ``environment.exhaustion.rate_limit``
- 429, quota exceeded.
* - ``environment.exhaustion.network``
- Connection refused, DNS errors.
* - ``environment.exhaustion.malformed_response``
- Invalid JSON, unexpected schema.
* - ``environment.exhaustion.context_overflow``
- Token/context limit exceeded.
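Every leaf type above follows the same ``layer.category.leaf`` shape, and the category portion determines which layered span attributes a detection increments. A minimal sketch of that mapping (the attribute key names match the tables later in this page; the helper itself is illustrative, not part of the gateway):

```python
def layered_keys(signal_type: str):
    """Map a dotted leaf type to its category-level attribute keys.

    e.g. "interaction.misalignment.rephrase" contributes to
    signals.interaction.misalignment.count / .severity
    """
    layer, category, _leaf = signal_type.split(".", 2)
    prefix = f"signals.{layer}.{category}"
    return f"{prefix}.count", f"{prefix}.severity"
```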
How It Works
============
Signals are computed automatically by the gateway and emitted as **OpenTelemetry trace attributes** to your existing observability stack (Jaeger, Honeycomb, Grafana Tempo, etc.). No additional libraries or instrumentation required—just configure your OTEL collector endpoint.
Signals are computed automatically by the gateway after each assistant
response and emitted as **OpenTelemetry trace attributes** and **span events**
on your existing spans. No additional libraries or instrumentation are
required — just configure your OTEL collector endpoint as usual.
Each conversation trace is enriched with signal attributes that you can query, filter, and visualize in your observability platform. The gateway analyzes message content (performing text normalization, Unicode handling, and pattern matching) to compute behavioral signals in real-time.
Each conversation trace is enriched with layered signal attributes
(category-level counts and severities) plus one span event per detected
signal instance (with confidence, snippet, and per-detector metadata).
**OTEL Trace Attributes**
.. note::
Signal analysis is enabled by default and runs on the request path. It
does **not** affect the response sent to the client. Set
``overrides.disable_signals: true`` in your Plano config to skip this
CPU-heavy analysis (see the configuration reference).
Signal data is exported as structured span attributes:
OTel Span Attributes
====================
- ``signals.quality`` - Overall assessment (Excellent/Good/Neutral/Poor/Severe)
- ``signals.turn_count`` - Total number of turns in the conversation
- ``signals.efficiency_score`` - Efficiency metric (0.0-1.0)
- ``signals.repair.count`` - Number of repair attempts detected (when present)
- ``signals.repair.ratio`` - Ratio of repairs to user turns (when present)
- ``signals.frustration.count`` - Number of frustration indicators detected
- ``signals.frustration.severity`` - Frustration level (0-3)
- ``signals.repetition.count`` - Number of repetition instances detected
- ``signals.escalation.requested`` - Boolean escalation flag ("true" when present)
- ``signals.positive_feedback.count`` - Number of positive feedback indicators
Signal data is exported as structured OTel attributes. There are two tiers:
**top-level** attributes (always emitted on spans that carry signal
analysis) and **layered** attributes (emitted only when the corresponding
category has at least one signal instance).
**Visual Flag Marker**
Top-level attributes
--------------------
When concerning signals are detected (frustration, looping, escalation, or poor/severe quality), the flag marker **🚩** is automatically appended to the span's operation name, making problematic traces easy to spot in your trace visualizations.
Always emitted once signals are computed.
**Querying in Your Observability Platform**
.. list-table::
:header-rows: 1
:widths: 40 15 45
Example queries:
* - Attribute
- Type
- Value
* - ``signals.quality``
- string
- One of ``excellent``, ``good``, ``neutral``, ``poor``, ``severe``.
* - ``signals.quality_score``
- float
- Numeric score 0.0–100.0 that feeds the quality bucket.
* - ``signals.turn_count``
- int
- Total number of user + assistant turns in the interaction.
* - ``signals.efficiency_score``
- float
- Efficiency metric 0.0–1.0 (stays at 1.0 up to baseline turns,
then decays: ``1 / (1 + 0.3 * (turns - baseline))``).
- Find all severe interactions: ``signals.quality = "Severe"``
- Find flagged traces: search for **🚩** in span names
- Find long conversations: ``signals.turn_count > 10``
- Find inefficient interactions: ``signals.efficiency_score < 0.5``
- Find high repair rates: ``signals.repair.ratio > 0.3``
- Find frustrated users: ``signals.frustration.severity >= 2``
- Find looping agents: ``signals.repetition.count >= 3``
- Find positive interactions: ``signals.positive_feedback.count >= 2``
- Find escalations: ``signals.escalation.requested = "true"``
Layered attributes
------------------
Emitted per category, only when ``count > 0``. One ``.count`` and one
``.severity`` attribute per category. Severity is a 0–3 bucket (see
`Severity levels`_ below).
.. list-table::
:header-rows: 1
:widths: 50 50
* - Attribute (emitted when fired)
- Source
* - ``signals.interaction.misalignment.count``
- Any ``misalignment.*`` leaf type
* - ``signals.interaction.misalignment.severity``
- "
* - ``signals.interaction.stagnation.count``
- Any ``stagnation.*`` leaf type
* - ``signals.interaction.stagnation.severity``
- "
* - ``signals.interaction.disengagement.count``
- Any ``disengagement.*`` leaf type
* - ``signals.interaction.disengagement.severity``
- "
* - ``signals.interaction.satisfaction.count``
- Any ``satisfaction.*`` leaf type
* - ``signals.interaction.satisfaction.severity``
- "
* - ``signals.execution.failure.count``
- Any ``failure.*`` leaf type
* - ``signals.execution.failure.severity``
- "
* - ``signals.execution.loops.count``
- Any ``loops.*`` leaf type
* - ``signals.execution.loops.severity``
- "
* - ``signals.environment.exhaustion.count``
- Any ``exhaustion.*`` leaf type
* - ``signals.environment.exhaustion.severity``
- "
Legacy attributes (deprecated, still emitted)
---------------------------------------------
The following aggregate keys pre-date the paper taxonomy and are still
emitted for one release so existing dashboards keep working. They are
derived from the layered counts above and will be removed in a future
release. Migrate to the layered keys when convenient.
.. list-table::
:header-rows: 1
:widths: 50 50
* - Legacy attribute
- Layered equivalent
* - ``signals.follow_up.repair.count``
- ``signals.interaction.misalignment.count``
* - ``signals.follow_up.repair.ratio``
- (computed: ``misalignment.count / max(user_turns, 1)``)
* - ``signals.frustration.count``
- Count of ``disengagement.negative_stance`` instances
* - ``signals.frustration.severity``
- Derived severity bucket of the above
* - ``signals.repetition.count``
- ``signals.interaction.stagnation.count``
* - ``signals.escalation.requested``
- True if any ``disengagement.escalation`` or ``disengagement.quit`` fired
* - ``signals.positive_feedback.count``
- ``signals.interaction.satisfaction.count``
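The count-derived rows of this table can be sketched as a derivation over the layered attributes. Two legacy keys (``signals.frustration.count`` and ``signals.escalation.requested``) depend on per-leaf instance counts rather than category totals, so they are omitted here; the helper name and dict shape are illustrative:

```python
def legacy_attrs(layered: dict, user_turns: int) -> dict:
    """Derive the deprecated aggregate keys from layered category counts."""
    mis = layered.get("signals.interaction.misalignment.count", 0)
    return {
        "signals.follow_up.repair.count": mis,
        # Ratio guards against zero user turns, per the table.
        "signals.follow_up.repair.ratio": mis / max(user_turns, 1),
        "signals.repetition.count": layered.get(
            "signals.interaction.stagnation.count", 0),
        "signals.positive_feedback.count": layered.get(
            "signals.interaction.satisfaction.count", 0),
    }
```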
Span Events
===========
In addition to span attributes, every detected signal instance is emitted as
a span event named ``signal.<dotted-type>`` (e.g.
``signal.interaction.satisfaction.gratitude``). Each event carries:
.. list-table::
:header-rows: 1
:widths: 30 15 55
* - Event attribute
- Type
- Description
* - ``signal.type``
- string
- Full dotted signal type (same as the event name suffix).
* - ``signal.message_index``
- int
- Zero-based index of the message that triggered the signal.
* - ``signal.confidence``
- float
- Detector confidence in [0.0, 1.0].
* - ``signal.snippet``
- string
- Matched substring from the source message (when available).
* - ``signal.metadata``
- string (JSON)
- Per-detector metadata (pattern name, ratio values, etc.).
Span events are the right surface for drill-down: attribute filters narrow
traces, then events tell you *which messages* fired *which signals* with
*what evidence*.
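For example, once spans are exported, drilling into one category is a prefix filter on event names. The dict shape below is an illustrative stand-in for whatever structure your exporter or query API produces:

```python
def events_for_category(events, category):
    """Select signal span events whose dotted type falls under one category.

    Event names follow the "signal.<dotted-type>" convention, e.g.
    "signal.interaction.disengagement.escalation".
    """
    prefix = f"signal.{category}."
    return [e for e in events if e["name"].startswith(prefix)]
```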
Visual Flag Marker
------------------
When concerning signals are detected (disengagement present, stagnation
count > 2, any execution failure / loop, or overall quality ``poor``/
``severe``), the marker 🚩 (U+1F6A9) is appended to the span's operation
name.
This makes flagged sessions immediately visible in trace UIs without
requiring attribute filtering.
Querying in Your Observability Platform
---------------------------------------
Example queries against the layered keys::
signals.quality = "severe"
signals.turn_count > 10
signals.efficiency_score < 0.5
signals.interaction.disengagement.severity >= 2
signals.interaction.misalignment.count > 3
signals.interaction.satisfaction.count > 0 AND signals.quality = "good"
signals.execution.failure.count > 0
signals.environment.exhaustion.count > 0
For flagged sessions, search for 🚩 in span names.
.. image:: /_static/img/signals_trace.png
:width: 100%
:align: center
Severity Levels
===============
Core Signal Types
=================
The signals system tracks six categories of behavioral indicators.
Turn Count & Efficiency
-----------------------
**What it measures**
Number of user–assistant exchanges.
**Why it matters**
Long conversations often indicate unclear intent resolution, confusion, or inefficiency. Very short conversations can correlate with crisp resolution.
**Key metrics**
- Total turn count
- Warning thresholds (concerning: >7 turns, excessive: >12 turns)
- Efficiency score (0.0–1.0)
**Efficiency scoring**
Baseline expectation is ~5 turns (tunable). Efficiency stays at 1.0 up to the baseline, then declines with an inverse penalty as turns exceed baseline::
efficiency = 1 / (1 + 0.3 * (turns - baseline))
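A minimal sketch of this scoring rule; the early return for turns at or below baseline mirrors the "stays at 1.0" behavior described above, and the default baseline of 5 follows the stated expectation:

```python
def efficiency_score(turns: int, baseline: int = 5) -> float:
    """1.0 up to the baseline, then an inverse penalty beyond it."""
    if turns <= baseline:
        return 1.0
    return 1 / (1 + 0.3 * (turns - baseline))
```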
Follow-Up & Repair Frequency
----------------------------
**What it measures**
How often users clarify, correct, or rephrase requests. This is a **user signal** tracking query reformulation behavior—when users must repair or rephrase their requests because the agent didn't understand or respond appropriately.
**Why it matters**
High repair frequency is a proxy for misunderstanding or intent drift. When users repeatedly rephrase the same request, it indicates the agent is failing to grasp or act on the user's intent.
**Key metrics**
- Repair count and ratio (repairs / user turns)
- Concerning threshold: >30% repair ratio
- Detected repair phrases (exact or fuzzy)
**Common patterns detected**
- Explicit corrections: "I meant", "correction"
- Negations: "No, I...", "that's not"
- Rephrasing: "let me rephrase", "to clarify"
- Mistake acknowledgment: "my mistake", "I was wrong"
- "Similar rephrase" heuristic based on token overlap (with stopwords downweighted)
User Frustration
----------------
**What it measures**
Observable frustration indicators and emotional escalation.
**Why it matters**
Catching frustration early enables intervention before users abandon or escalate.
**Detection patterns**
- **Complaints**: "this doesn't work", "not helpful", "waste of time"
- **Confusion**: "I don't understand", "makes no sense", "I'm confused"
- **Tone markers**:
- ALL CAPS (>=10 alphabetic chars and >=80% uppercase)
- Excessive punctuation (>=3 exclamation marks or >=3 question marks)
- **Profanity**: token-based (avoids substring false positives like "absolute" -> "bs")
**Severity levels**
- **None (0)**: no indicators
- **Mild (1)**: 1–2 indicators
- **Moderate (2)**: 3–4 indicators
- **Severe (3)**: 5+ indicators
Repetition & Looping
--------------------
**What it measures**
Assistant repetition / degenerative loops. This is an **assistant signal** tracking when the agent repeats itself, fails to follow instructions, or gets stuck in loops—indicating the agent is not making progress or adapting its responses.
**Why it matters**
Often indicates missing state tracking, broken tool integration, prompt issues, or the agent ignoring user corrections. High repetition means the agent is not learning from the conversation context.
**Detection method**
- Compare assistant messages using **bigram Jaccard similarity**
- Classify:
- **Exact**: similarity >= 0.85
- **Near-duplicate**: similarity >= 0.50
- Looping is flagged when repetition instances exceed 2 in a session.
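A minimal sketch of the similarity measure and thresholds above. Tokenization details (lowercasing, whitespace split) are an assumption; the gateway's text normalization may differ:

```python
def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over word bigrams of two messages."""
    def bigrams(text):
        toks = text.lower().split()
        return set(zip(toks, toks[1:]))
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)

def classify(sim):
    """Apply the exact / near-duplicate thresholds from the text."""
    if sim >= 0.85:
        return "exact"
    if sim >= 0.50:
        return "near-duplicate"
    return None
```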
**Severity levels**
Every category aggregates its leaf signal counts into a severity bucket used
by both the layered ``.severity`` attribute and the overall quality score.
- **None (0)**: 0 instances
- **Mild (1)**: 1–2 instances
- **Moderate (2)**: 3–4 instances
- **Severe (3)**: 5+ instances
Positive Feedback
-----------------
**What it measures**
User expressions of satisfaction, gratitude, and success.
**Why it matters**
Strong positive signals identify exemplar traces for prompt engineering and evaluation.
**Detection patterns**
- Gratitude: "thank you", "appreciate it"
- Satisfaction: "that's great", "awesome", "love it"
- Success confirmation: "got it", "that worked", "perfect"
**Confidence scoring**
- 1 indicator: 0.6
- 2 indicators: 0.8
- 3+ indicators: 0.95
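The confidence ladder above can be written directly as a step function. The value for zero indicators is an assumption (no signal fires, so no confidence is attached):

```python
def positive_feedback_confidence(indicators: int) -> float:
    """Map the number of positive indicators to detector confidence."""
    if indicators >= 3:
        return 0.95
    if indicators == 2:
        return 0.8
    if indicators == 1:
        return 0.6
    return 0.0  # assumption: no indicators means no signal instance
```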
Escalation Requests
-------------------
**What it measures**
Requests for human help/support or threats to quit.
**Why it matters**
Escalation is a strong signal that the agent failed to resolve the interaction.
**Detection patterns**
- Human requests: "speak to a human", "real person", "live agent"
- Support: "contact support", "customer service", "help desk"
- Quit threats: "I'm done", "forget it", "I give up"
Severity is always computed per-category. For example, three instances of
``misalignment.rephrase`` plus two of ``misalignment.correction`` yield
``signals.interaction.misalignment.severity = 3`` (5 instances total).
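The per-category bucketing can be sketched as a count-to-severity step function consistent with the levels listed above (0 none, 1–2 mild, 3–4 moderate, 5+ severe):

```python
def severity_bucket(count: int) -> int:
    """0 = none, 1 = mild (1-2), 2 = moderate (3-4), 3 = severe (5+)."""
    if count == 0:
        return 0
    if count <= 2:
        return 1
    if count <= 4:
        return 2
    return 3
```

The worked example above (3 rephrases + 2 corrections = 5 misalignment instances) lands in bucket 3.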
Overall Quality Assessment
==========================
Signals are aggregated into an overall interaction quality on a 5-point scale.
Signals are aggregated into an overall interaction quality on a 5-point
scale. The scoring model starts at 50.0 (neutral), adds positive weight for
satisfaction, and subtracts weight for disengagement, misalignment (when
ratio > 30% of user turns), stagnation (when count > 2), execution failures,
execution loops, and environment exhaustion.
**Excellent**
The resulting numeric score maps to the bucket emitted in ``signals.quality``:
**Excellent (75–100)**
Strong positive signals, efficient resolution, low friction.
**Good**
Mostly positive with minor clarifications; some back-and-forth but successful.
**Good (60–74)**
Mostly positive with minor clarifications; some back-and-forth but
successful.
**Neutral**
**Neutral (40–59)**
Mixed signals; neither clearly good nor bad.
**Poor**
Concerning negative patterns (high friction, multiple repairs, moderate frustration). High abandonment risk.
**Poor (25–39)**
Concerning negative patterns (high friction, multiple misalignments,
moderate disengagement, tool failures). High abandonment risk.
**Severe**
Critical issues—escalation requested, severe frustration, severe looping, or excessive turns (>12). Requires immediate attention.
**Severe (0–24)**
Critical issues — escalation requested, severe disengagement, severe
stagnation, or compounding failures. Requires immediate attention.
This assessment uses a scoring model that weighs positive factors (efficiency, positive feedback) against negative ones (frustration, repairs, repetition, escalation).
The raw numeric score is available under ``signals.quality_score``.
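The score-to-bucket mapping can be expressed as a simple threshold function. Boundary inclusivity is an assumption consistent with the listed ranges:

```python
def quality_bucket(score: float) -> str:
    """Map the 0-100 quality score to the value emitted in signals.quality."""
    if score >= 75:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "neutral"
    if score >= 25:
        return "poor"
    return "severe"
```

An untouched score of 50.0 therefore lands in ``neutral``, the documented starting point.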
Sampling and Prioritization
===========================
In production, trace data is overwhelming. Signals provide a lightweight first layer of analysis to prioritize which sessions deserve review.
In production, trace data is overwhelming. Signals provide a lightweight
first layer of triage to select the small fraction of trajectories that are
most likely to be informative. Per the paper, signal-based sampling reaches
82% informativeness on τ-bench versus 54% for random sampling — a 1.52×
efficiency gain per informative trajectory.
Workflow:
1. Gateway captures conversation messages and computes signals
2. Signal attributes are emitted to OTEL spans automatically
2. Signal attributes and per-instance events are emitted to OTEL spans
3. Your observability platform ingests and indexes the attributes
4. Query/filter by signal attributes to surface outliers (poor/severe and exemplars)
4. Query / filter by signal attributes to surface outliers and exemplars
5. Review high-information traces to identify improvement opportunities
6. Update prompts, routing, or policies based on findings
7. Redeploy and monitor signal metrics to validate improvements
This creates a reinforcement loop where traces become both diagnostic data and training signal.
This creates a reinforcement loop where traces become both diagnostic data
and training signal for prompt engineering, routing policies, and
preference-data construction.
Trace Filtering and Telemetry
=============================
.. note::
An in-gateway triage sampler that selects informative trajectories
inline — with configurable per-category weights and budgets — is planned
as a follow-up to this release. Today, sampling is consumer-side: your
observability platform filters on the signal attributes described above.
Signal attributes are automatically added to OpenTelemetry spans, making them immediately queryable in your observability platform.
Example Span
============
**Visual Filtering**
A concerning session, showing both layered attributes and a per-instance
event::
When concerning signals are detected, the flag marker **🚩** (U+1F6A9) is automatically appended to the span's operation name. This makes flagged sessions immediately visible in trace visualizations without requiring attribute filtering.
# Span name: "POST /v1/chat/completions gpt-5.2 🚩"
**Example Span Attributes**::
# Top-level
signals.quality = "severe"
signals.quality_score = 0.0
signals.turn_count = 4
signals.efficiency_score = 1.0
# Span name: "POST /v1/chat/completions gpt-4 🚩"
signals.quality = "Severe"
signals.turn_count = 15
signals.efficiency_score = 0.234
signals.repair.count = 4
signals.repair.ratio = 0.571
signals.frustration.severity = 3
signals.frustration.count = 5
signals.escalation.requested = "true"
signals.repetition.count = 4
# Layered (only non-zero categories are emitted)
signals.interaction.disengagement.count = 6
signals.interaction.disengagement.severity = 3
**Building Dashboards**
# Legacy (deprecated, emitted while dual-emit is on)
signals.frustration.count = 4
signals.frustration.severity = 2
signals.escalation.requested = true
Use signal attributes to build monitoring dashboards in Grafana, Honeycomb, Datadog, etc.:
# Per-instance span events
event: signal.interaction.disengagement.escalation
signal.type = "interaction.disengagement.escalation"
signal.message_index = 6
signal.confidence = 1.0
signal.snippet = "get me a human"
signal.metadata = {"pattern_type":"escalation"}
Building Dashboards
===================
Use signal attributes to build monitoring dashboards in Grafana, Honeycomb,
Datadog, etc. Prefer the layered keys — they align with the paper taxonomy
and will outlive the legacy keys.
- **Quality distribution**: Count of traces by ``signals.quality``
- **P95 turn count**: 95th percentile of ``signals.turn_count``
- **Average efficiency**: Mean of ``signals.efficiency_score``
- **High repair rate**: Percentage where ``signals.repair.ratio > 0.3``
- **Frustration rate**: Percentage where ``signals.frustration.severity >= 2``
- **Escalation rate**: Percentage where ``signals.escalation.requested = "true"``
- **Looping rate**: Percentage where ``signals.repetition.count >= 3``
- **Positive feedback rate**: Percentage where ``signals.positive_feedback.count >= 1``
- **High misalignment rate**: Percentage where
``signals.interaction.misalignment.count > 3``
- **Disengagement rate**: Percentage where
``signals.interaction.disengagement.severity >= 2``
- **Satisfaction rate**: Percentage where
``signals.interaction.satisfaction.count >= 1``
- **Escalation rate**: Percentage where a ``disengagement.escalation`` or
``disengagement.quit`` event fired (via span-event filter)
- **Tool-failure rate**: Percentage where
``signals.execution.failure.count > 0``
- **Environment issue rate**: Percentage where
``signals.environment.exhaustion.count > 0``
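The rates above can be computed off-platform too. A minimal sketch, assuming each exported span's attributes arrive as a flat Python dict (the helper and its keys mirror the bullet list; this is not a Plano API):

```python
# Sketch only: compute a few of the dashboard rates above from a batch of
# exported spans, where each span's attributes are a flat dict.
def dashboard_rates(spans):
    total = len(spans) or 1

    def rate(pred):
        return sum(1 for s in spans if pred(s)) / total

    return {
        # Percentage-style rates, mirroring the bullets above
        "high_misalignment": rate(lambda s: s.get("signals.interaction.misalignment.count", 0) > 3),
        "disengagement": rate(lambda s: s.get("signals.interaction.disengagement.severity", 0) >= 2),
        "satisfaction": rate(lambda s: s.get("signals.interaction.satisfaction.count", 0) >= 1),
        "tool_failure": rate(lambda s: s.get("signals.execution.failure.count", 0) > 0),
    }

spans = [
    {"signals.interaction.disengagement.severity": 3},
    {"signals.interaction.satisfaction.count": 1},
]
print(dashboard_rates(spans))
```

In a real dashboard you would express the same predicates as query filters in your observability platform rather than in application code.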
**Creating Alerts**
Creating Alerts
===============
Set up alerts based on signal thresholds:
- Alert when severe interaction count exceeds threshold in 1-hour window
- Alert on sudden spike in frustration rate (>2x baseline)
- Alert when escalation rate exceeds 5% of total conversations
- Alert on degraded efficiency (P95 turn count increases >50%)
- Alert when ``signals.quality = "severe"`` count exceeds threshold in a
1-hour window
- Alert on sudden spike in
``signals.interaction.disengagement.severity >= 2`` (>2× baseline)
- Alert on sustained ``signals.execution.failure.count > 0`` — agent-caused
tool issues
- Alert on spikes in ``signals.environment.exhaustion.count`` — external
system degradation
- Alert on degraded efficiency (P95 ``signals.turn_count`` up > 50%)
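The windowed-threshold logic behind the first alert can be sketched as follows. Real deployments would express this as an alert rule in Prometheus, Datadog, or similar; the class below only illustrates the counting, and is not part of Plano:

```python
# Sketch only: a sliding-window threshold alert over severe-quality sessions.
from collections import deque

class SevereRateAlert:
    def __init__(self, threshold, window_s=3600):
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()  # timestamps of severe sessions

    def observe(self, attrs, now):
        if attrs.get("signals.quality") == "severe":
            self.events.append(now)
        # Drop events that have fallen out of the window
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) > self.threshold  # True => fire the alert

alert = SevereRateAlert(threshold=2)
fired = [alert.observe({"signals.quality": "severe"}, now=float(i)) for i in range(4)]
print(fired)  # the alert fires once the third severe session arrives
```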
Best Practices
==============
Start simple:
- Alert or page on **Severe** sessions (or on spikes in Severe rate)
- Review **Poor** sessions within 24 hours
- Sample **Excellent** sessions as exemplars
- Alert or page on ``severe`` sessions (or on spikes in ``severe`` rate)
- Review ``poor`` sessions within 24 hours
- Sample ``excellent`` sessions as exemplars
Combine multiple signals to infer failure modes:
- Looping: repetition severity >= 2 + excessive turns
- User giving up: frustration severity >= 2 + escalation requested
- Misunderstood intent: repair ratio > 30% + excessive turns
- Working well: positive feedback + high efficiency + no frustration
- **Silent loop**: ``signals.interaction.stagnation.severity >= 2`` +
``signals.turn_count`` above baseline
- **User giving up**: ``signals.interaction.disengagement.severity >= 2`` +
any escalation event
- **Misunderstood intent**:
``signals.interaction.misalignment.count / user_turns > 0.3``
- **Agent-caused friction**: ``signals.execution.failure.count > 0`` +
``signals.interaction.misalignment.count > 0``
- **External degradation, not agent fault**:
``signals.environment.exhaustion.count > 0`` while
``signals.execution.failure.count = 0``
- **Working well**: ``signals.interaction.satisfaction.count >= 1`` +
``signals.efficiency_score > 0.8`` + no disengagement
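The combinations above reduce to simple predicates over span attributes. A hedged sketch (attribute keys are from this guide; the function itself is illustrative, not a Plano API):

```python
# Sketch only: map combined signal attributes to the heuristic failure
# modes listed above, using the thresholds the bullets suggest.
def failure_modes(attrs):
    g = attrs.get
    modes = []
    if g("signals.interaction.stagnation.severity", 0) >= 2:
        modes.append("silent_loop")
    if g("signals.interaction.disengagement.severity", 0) >= 2:
        modes.append("user_giving_up")
    if g("signals.execution.failure.count", 0) > 0 and g("signals.interaction.misalignment.count", 0) > 0:
        modes.append("agent_caused_friction")
    if g("signals.environment.exhaustion.count", 0) > 0 and g("signals.execution.failure.count", 0) == 0:
        modes.append("external_degradation")
    if g("signals.interaction.satisfaction.count", 0) >= 1 and g("signals.efficiency_score", 0.0) > 0.8:
        modes.append("working_well")
    return modes

print(failure_modes({"signals.environment.exhaustion.count": 2}))
```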
Limitations and Considerations
==============================
Signals dont capture:
Signals don't capture:
- Task completion / real outcomes
- Factual or domain correctness
@ -339,21 +608,31 @@ Signals dont capture:
Mitigation strategies:
- Periodically sample flagged sessions and measure false positives/negatives
- Periodically sample flagged sessions and measure false positives / negatives
- Tune baselines per use case and user population
- Add domain-specific phrase libraries where needed
- Combine signals with non-text metrics (tool failures, disconnects, latency)
.. note::
Behavioral signals complement—but do not replace—domain-specific response quality evaluation. Use signals to prioritize which traces to inspect, then apply domain expertise and outcome checks to diagnose root causes.
Behavioral signals complement — but do not replace — domain-specific
response quality evaluation. Use signals to prioritize which traces to
inspect, then apply domain expertise and outcome checks to diagnose root
causes.
.. tip::
The flag marker in the span name provides instant visual feedback in trace UIs, while the structured attributes (``signals.quality``, ``signals.frustration.severity``, etc.) enable powerful querying and aggregation in your observability platform.
The 🚩 marker in the span name provides instant visual feedback in
trace UIs, while the structured attributes (``signals.quality``,
``signals.interaction.disengagement.severity``, etc.) and per-instance
span events enable powerful querying and drill-down in your observability
platform.
See Also
========
- :doc:`../guides/observability/tracing` - Distributed tracing for agent systems
- :doc:`../guides/observability/monitoring` - Metrics and dashboards
- :doc:`../guides/observability/access_logging` - Request/response logging
- :doc:`../guides/observability/observability` - Complete observability guide
- `Signals: Trajectory Sampling and Triage for Agentic Interactions
<https://arxiv.org/abs/2604.00356>`_ — the paper this framework implements
- :doc:`../guides/observability/tracing` — Distributed tracing for agent
systems
- :doc:`../guides/observability/monitoring` — Metrics and dashboards
- :doc:`../guides/observability/access_logging` — Request / response logging
- :doc:`../guides/observability/observability` — Complete observability guide


@ -17,7 +17,7 @@ from sphinxawesome_theme.postprocess import Icons
project = "Plano Docs"
copyright = "2026, Katanemo Labs, a DigitalOcean Company"
author = "Katanemo Labs, Inc"
release = " v0.4.20"
release = " v0.4.21"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
@ -33,6 +33,7 @@ extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.intersphinx",
"sphinx.ext.extlinks",
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinx_sitemap",
"sphinx_design",
@ -41,6 +42,7 @@ extensions = [
"provider_models",
]
# Paths that contain templates, relative to this directory.
templates_path = ["_templates"]


@ -43,7 +43,7 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
.. code-block:: console
$ uv tool install planoai==0.4.20
$ uv tool install planoai==0.4.21
**Option 2: Install with pip (Traditional)**
@ -51,7 +51,7 @@ Plano's CLI allows you to manage and interact with the Plano efficiently. To ins
$ python -m venv venv
$ source venv/bin/activate # On Windows, use: venv\Scripts\activate
$ pip install planoai==0.4.20
$ pip install planoai==0.4.21
.. _llm_routing_quickstart:


@ -147,38 +147,53 @@ Plano-Orchestrator analyzes each prompt to infer domain and action, then applies
Configuration
^^^^^^^^^^^^^
To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:
To configure preference-aligned dynamic routing, declare a top-level ``routing_preferences`` list and attach an ordered ``models`` candidate pool to each route. Starting in ``v0.4.0``, ``routing_preferences`` lives at the root of the config (not inline under ``model_providers``), which lets multiple models serve the same route — the first entry in ``models`` is primary, the rest are fallbacks that the client tries on ``429``/``5xx`` errors.
.. code-block:: yaml
:caption: Preference-Aligned Dynamic Routing Configuration
version: v0.4.0
listeners:
egress_traffic:
- name: egress_traffic
type: model
address: 0.0.0.0
port: 12000
message_format: openai
timeout: 30s
llm_providers:
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
- name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
models:
- openai/gpt-5
- anthropic/claude-sonnet-4-5
- name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
models:
- openai/gpt-5
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-5
.. note::
Configs still using the ``v0.3.0`` inline style (``routing_preferences`` nested under each ``model_provider``) are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update your config to the form above to silence the warning.
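The primary-plus-fallback behavior of the ordered ``models`` pool can be sketched in a few lines. This is illustrative only, not Plano's implementation: ``call_model`` stands in for the actual upstream request, and ``RetryableUpstreamError`` models a ``429``/``5xx`` response.

```python
# Sketch only: how an ordered candidate pool yields automatic fallback.
class RetryableUpstreamError(Exception):
    pass

def complete_with_fallback(models, call_model):
    last_err = None
    for model in models:  # models[0] is primary, the rest are fallbacks
        try:
            return call_model(model)
        except RetryableUpstreamError as err:  # e.g. 429 or 5xx upstream
            last_err = err
    raise last_err

def fake_call(model):
    if model == "openai/gpt-5":
        raise RetryableUpstreamError("429 Too Many Requests")
    return f"answered by {model}"

print(complete_with_fallback(["openai/gpt-5", "anthropic/claude-sonnet-4-5"], fake_call))
```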
Client usage
^^^^^^^^^^^^
@ -253,6 +268,8 @@ Using Ollama (recommended for local development)
.. code-block:: yaml
version: v0.4.0
overrides:
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@ -266,9 +283,12 @@ Using Ollama (recommended for local development)
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
4. **Verify the model is running**
@ -322,6 +342,8 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
.. code-block:: yaml
version: v0.4.0
overrides:
llm_routing_model: plano/Plano-Orchestrator
@ -335,9 +357,12 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
5. **Verify the server is running**
@ -468,22 +493,30 @@ You can combine static model selection with dynamic routing preferences for maxi
.. code-block:: yaml
:caption: Hybrid Routing Configuration
llm_providers:
version: v0.4.0
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: deep analysis and complex problem solving
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative_tasks
description: creative writing and content generation
routing_preferences:
- name: complex_reasoning
description: deep analysis and complex problem solving
models:
- openai/gpt-5
- anthropic/claude-sonnet-4-5
- name: creative_tasks
description: creative writing and content generation
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-5
model_aliases:
# Model aliases - friendly names that map to actual provider names


@ -75,3 +75,54 @@ are some sample configuration files for both, respectively.
isDefault: true
access: proxy
editable: true
Brightstaff metrics
~~~~~~~~~~~~~~~~~~~
In addition to Envoy's stats on ``:9901``, the brightstaff dataplane
process exposes its own Prometheus endpoint on ``0.0.0.0:9092`` (override
with ``METRICS_BIND_ADDRESS``). It publishes:
* HTTP RED — ``brightstaff_http_requests_total``,
``brightstaff_http_request_duration_seconds``,
``brightstaff_http_in_flight_requests`` (labels: ``handler``, ``method``,
``status_class``).
* LLM upstream — ``brightstaff_llm_upstream_requests_total``,
``brightstaff_llm_upstream_duration_seconds``,
``brightstaff_llm_time_to_first_token_seconds``,
``brightstaff_llm_tokens_total`` (labels: ``provider``, ``model``,
``error_class``, ``kind``).
* Routing — ``brightstaff_router_decisions_total``,
``brightstaff_router_decision_duration_seconds``,
``brightstaff_routing_service_requests_total``,
``brightstaff_session_cache_events_total``.
* Process & build — ``process_resident_memory_bytes``,
``process_cpu_seconds_total``, ``brightstaff_build_info``.
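The endpoint serves the standard Prometheus exposition format, so its output can be inspected without a full scrape stack. A deliberately simplified sketch (it assumes no spaces inside label values; real integrations should use a Prometheus client library):

```python
# Sketch only: parse a Prometheus exposition snippet like the one served
# on :9092 and pull out individual series values.
def parse_metrics(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        series, _, value = line.rpartition(" ")
        values[series] = float(value)
    return values

sample = """\
# HELP brightstaff_http_requests_total Total HTTP requests handled
brightstaff_http_requests_total{handler="chat",method="POST",status_class="2xx"} 42
process_cpu_seconds_total 1.5
"""
metrics = parse_metrics(sample)
print(metrics["process_cpu_seconds_total"])  # prints 1.5
```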
A self-contained Prometheus + Grafana stack is shipped under
``config/grafana/``. With Plano already running on the host, bring it up
with one command:
.. code-block:: bash
cd config/grafana
docker compose up -d
open http://localhost:3000 # admin / admin (anonymous viewer also enabled)
Grafana auto-loads the Prometheus datasource and the brightstaff
dashboard (look under the *Plano* folder). Prometheus scrapes the host's
``:9092`` and ``:9901`` via ``host.docker.internal``.
Files:
* ``config/grafana/docker-compose.yaml`` — one-command Prom + Grafana
stack with provisioning.
* ``config/grafana/prometheus_scrape.yaml`` — complete Prometheus config
with ``envoy`` and ``brightstaff`` scrape jobs (mounted by the
compose).
* ``config/grafana/brightstaff_dashboard.json`` — 19-panel dashboard
across HTTP RED, LLM upstream, Routing service, and Process & Envoy
link rows. Auto-provisioned by the compose; can also be imported by
hand via *Dashboards → New → Import*.
* ``config/grafana/provisioning/`` — Grafana provisioning files for the
datasource and dashboard provider.


@ -101,20 +101,20 @@ This creates a complete end-to-end trace showing the full request lifecycle thro
Behavioral Signals in Traces
----------------------------
Plano automatically enriches OpenTelemetry traces with :doc:`../../concepts/signals`: behavioral quality indicators computed from conversation patterns. These signals are attached as span attributes, providing immediate visibility into interaction quality.
Plano automatically enriches OpenTelemetry traces with :doc:`../../concepts/signals`: lightweight, model-free behavioral indicators organized into three layers (interaction, execution, environment) per `Chen et al., 2026 <https://arxiv.org/abs/2604.00356>`_. Signals are attached as span attributes and per-instance span events, providing immediate visibility into interaction quality.
**What Signals Provide**
Signals act as early warning indicators embedded in your traces:
- **Quality Assessment**: Overall interaction quality (Excellent/Good/Neutral/Poor/Severe)
- **Efficiency Metrics**: Turn count, efficiency scores, repair frequency
- **User Sentiment**: Frustration indicators, positive feedback, escalation requests
- **Agent Behavior**: Repetition detection, looping patterns
- **Quality Assessment**: Overall interaction quality (``excellent`` / ``good`` / ``neutral`` / ``poor`` / ``severe``) and numeric score
- **Interaction layer**: misalignment, stagnation, disengagement, satisfaction
- **Execution layer**: tool failures and loop patterns (from ``function_call`` / ``observation`` traces)
- **Environment layer**: exhaustion (API errors, timeouts, rate limits, context overflow)
**Visual Flag Markers**
When concerning signals are detected (frustration, looping, escalation, or poor/severe quality), Plano automatically appends a flag marker **🚩** to the span's operation name. This makes problematic traces immediately visible in your tracing UI without requiring additional queries.
When concerning signals are detected (disengagement, execution failures / loops, stagnation > 2, or ``poor`` / ``severe`` quality), Plano automatically appends a 🚩 marker to the span's operation name. This makes problematic traces immediately visible in your tracing UI without requiring additional queries.
**Example Span with Signals**::
@ -123,23 +123,37 @@ When concerning signals are detected (frustration, looping, escalation, or poor/
llm.model = "gpt-4"
llm.usage.total_tokens = 225
# Behavioral signal attributes:
signals.quality = "Severe"
signals.turn_count = 15
signals.efficiency_score = 0.234
signals.frustration.severity = 3
signals.escalation.requested = "true"
# Top-level signal attributes:
signals.quality = "severe"
signals.quality_score = 0.0
signals.turn_count = 15
signals.efficiency_score = 0.234
# Layered attributes (only non-zero categories are emitted):
signals.interaction.misalignment.count = 4
signals.interaction.misalignment.severity = 2
signals.interaction.disengagement.count = 5
signals.interaction.disengagement.severity = 3
# Per-instance span event:
event: signal.interaction.disengagement.escalation
signal.type = "interaction.disengagement.escalation"
signal.message_index = 14
signal.confidence = 1.0
signal.snippet = "get me a human"
**Querying Signal Data**
In your observability platform (Jaeger, Grafana Tempo, Datadog, etc.), filter traces by signal attributes:
- Find severe interactions: ``signals.quality = "Severe"``
- Find frustrated users: ``signals.frustration.severity >= 2``
- Find severe interactions: ``signals.quality = "severe"``
- Find disengaged users: ``signals.interaction.disengagement.severity >= 2``
- Find misaligned interactions: ``signals.interaction.misalignment.count > 3``
- Find tool failures: ``signals.execution.failure.count > 0``
- Find external issues: ``signals.environment.exhaustion.count > 0``
- Find inefficient flows: ``signals.efficiency_score < 0.5``
- Find escalations: ``signals.escalation.requested = "true"``
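The same filters work outside the tracing UI. A small sketch for post-processing exported spans in Python (the predicate names are illustrative; attribute keys are the ones documented above):

```python
# Sketch only: the filter expressions above as Python predicates, for
# pipelines that post-process exported spans rather than querying a UI.
FILTERS = {
    "severe": lambda a: a.get("signals.quality") == "severe",
    "disengaged": lambda a: a.get("signals.interaction.disengagement.severity", 0) >= 2,
    "tool_failures": lambda a: a.get("signals.execution.failure.count", 0) > 0,
    "inefficient": lambda a: a.get("signals.efficiency_score", 1.0) < 0.5,
}

def matching(spans, name):
    return [s for s in spans if FILTERS[name](s)]

spans = [{"signals.quality": "severe"}, {"signals.efficiency_score": 0.9}]
print(matching(spans, "severe"))
```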
For complete details on all available signals, detection methods, and best practices, see the :doc:`../../concepts/signals` guide.
For complete details on all 20 leaf signal types, severity scheme, legacy attribute deprecation, and best practices, see the :doc:`../../concepts/signals` guide.
Custom Span Attributes


@ -65,7 +65,7 @@ Create a ``docker-compose.yml`` file with the following configuration:
# docker-compose.yml
services:
plano:
image: katanemo/plano:0.4.20
image: katanemo/plano:0.4.21
container_name: plano
ports:
- "10000:10000" # ingress (client -> plano)
@ -153,7 +153,7 @@ Create a ``plano-deployment.yaml``:
spec:
containers:
- name: plano
image: katanemo/plano:0.4.20
image: katanemo/plano:0.4.21
ports:
- containerPort: 12000 # LLM gateway (chat completions, model routing)
name: llm-gateway


@ -1,5 +1,5 @@
# Plano Gateway configuration version
version: v0.3.0
version: v0.4.0
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
agents:
@ -32,17 +32,8 @@ model_providers:
- model: mistral/ministral-3b-latest
access_key: $MISTRAL_API_KEY
# routing_preferences: tags a model with named capabilities so Plano's LLM router
# can select the best model for each request based on intent. Requires the
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
# Each preference has a name (short label) and a description (used for intent matching).
- model: groq/llama-3.3-70b-versatile
access_key: $GROQ_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
- name: code review
description: reviewing, analyzing, and suggesting improvements to existing code
# passthrough_auth: forwards the client's Authorization header upstream instead of
# using the configured access_key. Useful for LiteLLM or similar proxy setups.
@ -64,6 +55,29 @@ model_aliases:
smart-llm:
target: gpt-4o
# routing_preferences: top-level list that tags named task categories with an
# ordered pool of candidate models. Plano's LLM router matches incoming requests
# against these descriptions and returns an ordered list of models; the client
# uses models[0] as primary and retries with models[1], models[2]... on 429/5xx.
# Requires overrides.llm_routing_model to point at Plano-Orchestrator (or equivalent).
# Each model in `models` must be declared in model_providers above.
# selection_policy is optional: {prefer: cheapest|fastest|none} lets the router
# reorder candidates using live cost/latency data from model_metrics_sources.
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-0
- openai/gpt-4o
- groq/llama-3.3-70b-versatile
- name: code review
description: reviewing, analyzing, and suggesting improvements to existing code
models:
- anthropic/claude-sonnet-4-0
- groq/llama-3.3-70b-versatile
selection_policy:
prefer: cheapest
# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
# Agent listener for routing requests to multiple agents
@ -173,6 +187,9 @@ overrides:
llm_routing_model: Plano-Orchestrator
# Model used for agent orchestration (must be listed in model_providers)
agent_orchestration_model: Plano-Orchestrator
# Disable agentic signal analysis (frustration, repetition, escalation, etc.)
# on LLM responses to save CPU. Default: false.
disable_signals: false
# Model affinity — pin routing decisions for agentic loops
routing:


@ -69,12 +69,6 @@ listeners:
model: llama-3.3-70b-versatile
name: groq/llama-3.3-70b-versatile
provider_interface: groq
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on
user prompts or requirements
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
name: code review
- base_url: https://litellm.example.com
cluster_name: openai_litellm.example.com
endpoint: litellm.example.com
@ -131,12 +125,6 @@ model_providers:
model: llama-3.3-70b-versatile
name: groq/llama-3.3-70b-versatile
provider_interface: groq
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on
user prompts or requirements
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
name: code review
- base_url: https://litellm.example.com
cluster_name: openai_litellm.example.com
endpoint: litellm.example.com
@ -170,6 +158,7 @@ model_providers:
provider_interface: plano
overrides:
agent_orchestration_model: Plano-Orchestrator
disable_signals: false
llm_routing_model: Plano-Orchestrator
optimize_context_window: true
prompt_target_intent_matching_threshold: 0.7
@ -220,6 +209,21 @@ routing:
type: memory
session_max_entries: 10000
session_ttl_seconds: 600
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on user
prompts or requirements
models:
- anthropic/claude-sonnet-4-0
- openai/gpt-4o
- groq/llama-3.3-70b-versatile
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
models:
- anthropic/claude-sonnet-4-0
- groq/llama-3.3-70b-versatile
name: code review
selection_policy:
prefer: cheapest
state_storage:
type: memory
system_prompt: 'You are a helpful assistant. Always respond concisely and accurately.
@ -236,4 +240,4 @@ tracing:
environment: production
service.team: platform
trace_arch_internal: false
version: v0.3.0
version: v0.4.0


@ -312,20 +312,24 @@ When a request does not match any routing preference, Plano forwards it to the `
**Incorrect (no default provider set):**
```yaml
version: v0.3.0
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini # No default: true anywhere
access_key: $OPENAI_API_KEY
routing_preferences:
- name: summarization
description: Summarizing documents and extracting key points
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code_generation
description: Writing new functions and implementing algorithms
routing_preferences:
- name: summarization
description: Summarizing documents and extracting key points
models:
- openai/gpt-4o-mini
- name: code_generation
description: Writing new functions and implementing algorithms
models:
- openai/gpt-4o
```
**Incorrect (multiple defaults — ambiguous):**
@ -344,25 +348,35 @@ model_providers:
**Correct (exactly one default, covering unmatched requests):**
```yaml
version: v0.3.0
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true # Handles general/unclassified requests
routing_preferences:
- name: summarization
description: Summarizing documents, articles, and meeting notes
- name: classification
description: Categorizing inputs, labeling, and intent detection
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code_generation
description: Writing, debugging, and reviewing code
- name: complex_reasoning
description: Multi-step math, logical analysis, research synthesis
routing_preferences:
- name: summarization
description: Summarizing documents, articles, and meeting notes
models:
- openai/gpt-4o-mini
- openai/gpt-4o
- name: classification
description: Categorizing inputs, labeling, and intent detection
models:
- openai/gpt-4o-mini
- name: code_generation
description: Writing, debugging, and reviewing code
models:
- openai/gpt-4o
- openai/gpt-4o-mini
- name: complex_reasoning
description: Multi-step math, logical analysis, research synthesis
models:
- openai/gpt-4o
```
Choose your most cost-effective capable model as the default — it handles all traffic that doesn't match specialized preferences.
@ -498,21 +512,27 @@ model_providers:
**Combined: proxy for some models, Plano-managed for others:**
```yaml
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY # Plano manages this key
default: true
routing_preferences:
- name: quick tasks
description: Short answers, simple lookups, fast completions
- model: custom/vllm-llama
base_url: http://gpu-server:8000
provider_interface: openai
passthrough_auth: true # vLLM cluster handles its own auth
routing_preferences:
- name: long context
description: Processing very long documents, multi-document analysis
routing_preferences:
- name: quick tasks
description: Short answers, simple lookups, fast completions
models:
- openai/gpt-4o-mini
- name: long context
description: Processing very long documents, multi-document analysis
models:
- custom/vllm-llama
```
Reference: https://github.com/katanemo/archgw
@ -526,67 +546,100 @@ Reference: https://github.com/katanemo/archgw
## Write Task-Specific Routing Preference Descriptions
Plano's `plano_orchestrator_v1` router uses a 1.5B preference-aligned LLM to classify incoming requests against your `routing_preferences` descriptions. It routes the request to the first provider whose preferences match. Description quality directly determines routing accuracy.
Plano's `plano_orchestrator_v1` router uses a 1.5B preference-aligned LLM to classify incoming requests against your `routing_preferences` descriptions. It returns an ordered `models` list for the matched route; the client uses `models[0]` as primary and falls back to `models[1]`, `models[2]`... on `429`/`5xx` errors. Description quality directly determines routing accuracy.
Starting in `v0.4.0`, `routing_preferences` lives at the **top level** of the config and each entry carries its own `models: [...]` candidate pool. Listing multiple models under a single route gives you automatic provider fallback without extra client logic. Configs still using the legacy v0.3.0 inline shape (under each `model_provider`) are auto-migrated with a deprecation warning — prefer the top-level form below.
**Incorrect (vague, overlapping descriptions):**
```yaml
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
routing_preferences:
- name: simple
description: easy tasks # Too vague — what is "easy"?
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: hard
description: hard tasks # Too vague — overlaps with "easy"
routing_preferences:
- name: simple
description: easy tasks # Too vague — what is "easy"?
models:
- openai/gpt-4o-mini
- name: hard
description: hard tasks # Too vague — overlaps with "easy"
models:
- openai/gpt-4o
```
**Correct (specific, distinct task descriptions):**
**Correct (specific, distinct task descriptions, multi-model fallbacks):**
```yaml
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
routing_preferences:
- name: summarization
description: >
Summarizing documents, articles, emails, or meeting transcripts.
Extracting key points, generating TL;DR sections, condensing long text.
- name: classification
description: >
Categorizing inputs, sentiment analysis, spam detection,
intent classification, labeling structured data fields.
- name: translation
description: >
Translating text between languages, localization tasks.
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code_generation
description: >
Writing new functions, classes, or modules from scratch.
Implementing algorithms, boilerplate generation, API integrations.
- name: code_review
description: >
Reviewing code for bugs, security vulnerabilities, performance issues.
Suggesting refactors, explaining complex code, debugging errors.
- name: complex_reasoning
description: >
Multi-step math problems, logical deduction, strategic planning,
research synthesis requiring chain-of-thought reasoning.
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: summarization
description: >
Summarizing documents, articles, emails, or meeting transcripts.
Extracting key points, generating TL;DR sections, condensing long text.
models:
- openai/gpt-4o-mini
- openai/gpt-4o
- name: classification
description: >
Categorizing inputs, sentiment analysis, spam detection,
intent classification, labeling structured data fields.
models:
- openai/gpt-4o-mini
- name: translation
description: >
Translating text between languages, localization tasks.
models:
- openai/gpt-4o-mini
- anthropic/claude-sonnet-4-5
- name: code_generation
description: >
Writing new functions, classes, or modules from scratch.
Implementing algorithms, boilerplate generation, API integrations.
models:
- openai/gpt-4o
- anthropic/claude-sonnet-4-5
- name: code_review
description: >
Reviewing code for bugs, security vulnerabilities, performance issues.
Suggesting refactors, explaining complex code, debugging errors.
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-4o
- name: complex_reasoning
description: >
Multi-step math problems, logical deduction, strategic planning,
research synthesis requiring chain-of-thought reasoning.
models:
- openai/gpt-4o
- anthropic/claude-sonnet-4-5
```
**Key principles for good preference descriptions:**
- Use concrete action verbs: "writing", "reviewing", "translating", "summarizing"
- List 3–5 specific sub-tasks or synonyms for each preference
- Ensure preferences across providers are mutually exclusive in scope
- Ensure preferences across routes are mutually exclusive in scope
- Order `models` from most preferred to least — the client falls back in order on `429`/`5xx`
- List multiple models under one route for automatic provider fallback without extra client logic
- Every model listed in `models` must be declared in `model_providers`
- Test with representative queries using `planoai trace` and `--where` filters to verify routing decisions
Reference: https://github.com/katanemo/archgw
@ -1451,7 +1504,7 @@ planoai cli_agent claude --path /path/to/project
**Recommended config for Claude Code routing:**
```yaml
version: v0.3.0
version: v0.4.0
listeners:
- type: model
@ -1462,19 +1515,25 @@ model_providers:
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_API_KEY
default: true
routing_preferences:
- name: general coding
description: >
Writing code, debugging, code review, explaining concepts,
answering programming questions, general development tasks.
- model: anthropic/claude-opus-4-6
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: complex architecture
description: >
System design, complex refactoring across many files,
architectural decisions, performance optimization, security audits.
routing_preferences:
- name: general coding
description: >
Writing code, debugging, code review, explaining concepts,
answering programming questions, general development tasks.
models:
- anthropic/claude-sonnet-4-20250514
- anthropic/claude-opus-4-6
- name: complex architecture
description: >
System design, complex refactoring across many files,
architectural decisions, performance optimization, security audits.
models:
- anthropic/claude-opus-4-6
- anthropic/claude-sonnet-4-20250514
model_aliases:
claude.fast.v1:
@ -1861,28 +1920,36 @@ listeners:
**Multi-listener architecture (serves all client types):**
```yaml
version: v0.3.0
version: v0.4.0
# --- Shared model providers ---
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
routing_preferences:
- name: quick tasks
description: Short answers, formatting, classification, simple generation
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex reasoning
description: Multi-step analysis, code generation, research synthesis
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: long documents
description: Summarizing or analyzing very long documents, PDFs, transcripts
# --- Shared routing_preferences (top-level, v0.4.0+) ---
routing_preferences:
- name: quick tasks
description: Short answers, formatting, classification, simple generation
models:
- openai/gpt-4o-mini
- name: complex reasoning
description: Multi-step analysis, code generation, research synthesis
models:
- openai/gpt-4o
- anthropic/claude-sonnet-4-20250514
- name: long documents
description: Summarizing or analyzing very long documents, PDFs, transcripts
models:
- anthropic/claude-sonnet-4-20250514
- openai/gpt-4o
# --- Listener 1: OpenAI-compatible API gateway ---
# For: SDK clients, Claude Code, LangChain, etc.


@ -7,67 +7,100 @@ tags: routing, model-selection, preferences, llm-routing
## Write Task-Specific Routing Preference Descriptions
Plano's `plano_orchestrator_v1` router uses a 1.5B preference-aligned LLM to classify incoming requests against your `routing_preferences` descriptions. It routes the request to the first provider whose preferences match. Description quality directly determines routing accuracy.
Plano's `plano_orchestrator_v1` router uses a 1.5B preference-aligned LLM to classify incoming requests against your `routing_preferences` descriptions. It returns an ordered `models` list for the matched route; the client uses `models[0]` as primary and falls back to `models[1]`, `models[2]`... on `429`/`5xx` errors. Description quality directly determines routing accuracy.
Starting in `v0.4.0`, `routing_preferences` lives at the **top level** of the config and each entry carries its own `models: [...]` candidate pool. Configs still using the legacy v0.3.0 inline shape (under each `model_provider`) are auto-migrated with a deprecation warning — prefer the top-level form below.
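The ordered-fallback contract can be sketched client-side. This is a minimal illustration only — `call_with_fallback` and its status-tuple callback are hypothetical names, not Plano's actual client API; it assumes the router has already resolved a route to its `models` list:

```python
def call_with_fallback(models, call, retryable=(429,)):
    """Walk a route's ordered `models` list, falling back on retryable statuses.

    `call(model)` is any callable returning `(http_status, payload)`.
    429 and all 5xx statuses trigger fallback to the next candidate;
    other 4xx statuses are treated as caller errors and raised.
    """
    last = None
    for model in models:
        status, payload = call(model)
        if status == 200:
            return model, payload
        if status in retryable or status >= 500:
            last = (model, status)
            continue  # try the next candidate in preference order
        raise RuntimeError(f"{model}: non-retryable status {status}")
    raise RuntimeError(f"all candidates failed; last = {last}")
```

For example, if `models[0]` returns a `429`, the helper transparently retries with `models[1]` — which is why ordering the list from most to least preferred matters.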
**Incorrect (vague, overlapping descriptions):**
```yaml
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
routing_preferences:
- name: simple
description: easy tasks # Too vague — what is "easy"?
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: hard
description: hard tasks # Too vague — overlaps with "easy"
routing_preferences:
- name: simple
description: easy tasks # Too vague — what is "easy"?
models:
- openai/gpt-4o-mini
- name: hard
description: hard tasks # Too vague — overlaps with "easy"
models:
- openai/gpt-4o
```
**Correct (specific, distinct task descriptions):**
**Correct (specific, distinct task descriptions, multi-model fallbacks):**
```yaml
version: v0.4.0
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
routing_preferences:
- name: summarization
description: >
Summarizing documents, articles, emails, or meeting transcripts.
Extracting key points, generating TL;DR sections, condensing long text.
- name: classification
description: >
Categorizing inputs, sentiment analysis, spam detection,
intent classification, labeling structured data fields.
- name: translation
description: >
Translating text between languages, localization tasks.
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code_generation
description: >
Writing new functions, classes, or modules from scratch.
Implementing algorithms, boilerplate generation, API integrations.
- name: code_review
description: >
Reviewing code for bugs, security vulnerabilities, performance issues.
Suggesting refactors, explaining complex code, debugging errors.
- name: complex_reasoning
description: >
Multi-step math problems, logical deduction, strategic planning,
research synthesis requiring chain-of-thought reasoning.
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: summarization
description: >
Summarizing documents, articles, emails, or meeting transcripts.
Extracting key points, generating TL;DR sections, condensing long text.
models:
- openai/gpt-4o-mini
- openai/gpt-4o
- name: classification
description: >
Categorizing inputs, sentiment analysis, spam detection,
intent classification, labeling structured data fields.
models:
- openai/gpt-4o-mini
- name: translation
description: >
Translating text between languages, localization tasks.
models:
- openai/gpt-4o-mini
- anthropic/claude-sonnet-4-5
- name: code_generation
description: >
Writing new functions, classes, or modules from scratch.
Implementing algorithms, boilerplate generation, API integrations.
models:
- openai/gpt-4o
- anthropic/claude-sonnet-4-5
- name: code_review
description: >
Reviewing code for bugs, security vulnerabilities, performance issues.
Suggesting refactors, explaining complex code, debugging errors.
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-4o
- name: complex_reasoning
description: >
Multi-step math problems, logical deduction, strategic planning,
research synthesis requiring chain-of-thought reasoning.
models:
- openai/gpt-4o
- anthropic/claude-sonnet-4-5
```
**Key principles for good preference descriptions:**
- Use concrete action verbs: "writing", "reviewing", "translating", "summarizing"
- List 3–5 specific sub-tasks or synonyms for each preference
- Ensure preferences across providers are mutually exclusive in scope
- Ensure preferences across routes are mutually exclusive in scope
- Order `models` from most preferred to least — the client will fall back in order on `429`/`5xx`
- List multiple models under one route to get automatic provider fallback without additional client logic
- Every model listed in `models` must be declared in `model_providers`
- Test with representative queries using `planoai trace` and `--where` filters to verify routing decisions
Reference: https://github.com/katanemo/archgw
Reference: [Routing API](../../docs/routing-api.md) · https://github.com/katanemo/archgw

tests/parity/signals/.gitignore

@ -0,0 +1,4 @@
out/
.venv/
__pycache__/
*.pyc


@ -0,0 +1,98 @@
# Signals Parity Harness
Validates that `crates/brightstaff/src/signals/` (Rust port) produces the same
`SignalReport` as the Python reference at <https://github.com/katanemo/signals>
on a fixed sample of `lmsys/lmsys-chat-1m` conversations.
This harness is **not** part of normal CI. It downloads several GB and is run
on demand to gate releases of the signals subsystem (or to investigate
regressions reported in production).
## What gets compared
For each conversation, both analyzers emit a `SignalReport`. The comparator
classifies any divergence into three tiers:
| Tier | Field | Action on divergence |
|------|------------------------------------------------|----------------------|
| A | set of `SignalType` present, per-type counts, `overall_quality` | Fail the run |
| B | per-instance `message_index`, instance counts per type | Log + collect, do not fail |
| C | metadata, snippet text, summary | Information only |
Quality buckets are compared by string (`excellent` / `good` / ...).
## What this harness does *not* cover
`lmsys-chat-1m` is plain user/assistant chat. It exercises the **interaction**
layer well (misalignment, stagnation, disengagement, satisfaction) but does
**not** exercise:
- `execution.failure.*`
- `execution.loops.*`
- `environment.exhaustion.*`
Those signals require `function_call` / `observation` ShareGPT roles. They are
covered by the Rust unit tests and the Python repo's own test fixtures, both
of which run on every PR. A synthetic tool-trace dataset for full coverage is
deferred to a follow-up.
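For reference, a row in the deferred synthetic tool-trace dataset could look like the sketch below. The `function_call` / `observation` role names come from the ShareGPT roles mentioned above; the payload structure is purely illustrative, not a fixed schema:

```python
import json

# One synthetic conversation exercising the execution layer: a tool that
# times out twice. Rows like this would be written to conversations.jsonl,
# one JSON object per line, exactly like the lmsys samples.
synthetic_trace = {
    "id": "synthetic-tool-failure-0",
    "messages": [
        {"from": "human", "value": "What's the weather in Istanbul?"},
        {"from": "function_call",
         "value": json.dumps({"name": "get_weather", "arguments": {"city": "Istanbul"}})},
        {"from": "observation", "value": json.dumps({"error": "timeout"})},
        {"from": "function_call",
         "value": json.dumps({"name": "get_weather", "arguments": {"city": "Istanbul"}})},
        {"from": "observation", "value": json.dumps({"error": "timeout"})},
        {"from": "gpt", "value": "I wasn't able to reach the weather service."},
    ],
}

line = json.dumps(synthetic_trace)
```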
## One-time setup
```bash
# 1. Build the Rust replay binary.
cd ../../../crates && cargo build --release -p brightstaff --bin signals_replay
# 2. Set up the Python environment for the harness driver.
cd ../tests/parity/signals
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 3. Install the Python signals reference.
# Either point at a local checkout:
pip install -e /path/to/signals
# or pull from git:
pip install 'signals @ git+https://github.com/katanemo/signals@<sha>'
```
## Running
```bash
source .venv/bin/activate
python run_parity.py \
--num-samples 2000 \
--seed 42 \
--dataset-revision <hf-dataset-revision-sha> \
--rust-binary ../../../crates/target/release/signals_replay \
--output-dir out/
python compare.py --output-dir out/
```
`run_parity.py` will:
1. Download `lmsys/lmsys-chat-1m` (cached in `~/.cache/huggingface`).
2. Pick `--num-samples` rows under `--seed`.
3. Convert each to ShareGPT, write `out/conversations.jsonl`.
4. Run the Rust binary as a subprocess → `out/rust_reports.jsonl`.
5. Run the Python analyzer in-process → `out/python_reports.jsonl`.
`compare.py` reads both report files and writes:
- `out/diffs.jsonl` — one record per mismatched conversation, with tier + structural diff
- `out/metrics.json` — agreement %, per-`SignalType` confusion matrix, quality-bucket confusion matrix
- `out/summary.md` — human-readable PR-ready report
Exit code is non-zero iff any Tier-A divergence is observed.
## Reproducibility
Every run pins:
- `dataset_revision` — the HF dataset commit
- `seed` — RNG seed for sampling
- `signals_python_version``pip show signals` version
- `plano_git_sha``git rev-parse HEAD` of this repo
- `signals_replay_binary_sha256` — the hash of the Rust bin
All are stamped into `metrics.json`.


@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""
Local smoke test for the parity harness: runs both runners on a tiny
hand-picked set of conversations without touching the lmsys dataset.
Run from this directory:
python _smoke_test.py --rust-binary <path>
"""
from __future__ import annotations
import argparse
import json
import subprocess
import sys
from pathlib import Path
from signals.analyzer import SignalAnalyzer
SAMPLES = [
{
"id": "smoke-gratitude",
"messages": [
{"from": "human", "value": "What is the weather in Istanbul?"},
{"from": "gpt", "value": "Istanbul is 14C and partly cloudy."},
{"from": "human", "value": "That worked, exactly what I needed. Thanks!"},
],
},
{
"id": "smoke-escalation",
"messages": [
{"from": "human", "value": "This isn't helpful at all"},
{"from": "gpt", "value": "I'm sorry, can you tell me more?"},
{"from": "human", "value": "Get me a human, this is useless"},
],
},
{
"id": "smoke-correction",
"messages": [
{"from": "human", "value": "Book me a flight to NYC for tomorrow"},
{"from": "gpt", "value": "Sure, here are flights to NYC for Friday."},
{
"from": "human",
"value": "No, I meant flights for Saturday, not tomorrow",
},
],
},
{
"id": "smoke-clean",
"messages": [
{"from": "human", "value": "Hi"},
{"from": "gpt", "value": "Hello, how can I help?"},
],
},
{
"id": "smoke-rephrase",
"messages": [
{"from": "human", "value": "Can you summarize the news please"},
{"from": "gpt", "value": "Sure, here is a summary."},
{"from": "human", "value": "Could you please summarize the news"},
],
},
]
def main() -> int:
p = argparse.ArgumentParser()
p.add_argument("--rust-binary", required=True, type=Path)
args = p.parse_args()
out_dir = Path("out_smoke")
out_dir.mkdir(exist_ok=True)
conv_path = out_dir / "conversations.jsonl"
rust_path = out_dir / "rust_reports.jsonl"
py_path = out_dir / "python_reports.jsonl"
with conv_path.open("w") as f:
for s in SAMPLES:
f.write(json.dumps(s) + "\n")
with conv_path.open("rb") as fin, rust_path.open("wb") as fout:
proc = subprocess.run(
[str(args.rust_binary)], stdin=fin, stdout=fout, stderr=subprocess.PIPE
)
if proc.returncode != 0:
sys.stderr.write(proc.stderr.decode("utf-8", errors="replace"))
return 2
analyzer = SignalAnalyzer()
with conv_path.open() as fin, py_path.open("w") as fout:
for line in fin:
obj = json.loads(line)
r = analyzer.analyze(obj["messages"])
fout.write(json.dumps({"id": obj["id"], "report": r.to_dict()}) + "\n")
rc = subprocess.call(
[sys.executable, "compare.py", "--output-dir", str(out_dir)],
)
return rc
if __name__ == "__main__":
sys.exit(main())


@ -0,0 +1,333 @@
#!/usr/bin/env python3
"""
Diff Rust vs Python signal reports produced by run_parity.py.
See README.md for the tier definitions. Exits non-zero iff any Tier-A
divergence is found.
"""
from __future__ import annotations
import argparse
import json
import sys
from collections import Counter, defaultdict
from pathlib import Path
from typing import Any, Dict, List, Tuple
CATEGORIES_BY_LAYER = {
"interaction_signals": [
"misalignment",
"stagnation",
"disengagement",
"satisfaction",
],
"execution_signals": ["failure", "loops"],
"environment_signals": ["exhaustion"],
}
def parse_args() -> argparse.Namespace:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument("--output-dir", type=Path, default=Path("out"))
return p.parse_args()
def load_jsonl(path: Path) -> Dict[str, Dict[str, Any]]:
"""Load a JSONL file keyed by `id`. Lines with errors are still indexed."""
out: Dict[str, Dict[str, Any]] = {}
with path.open() as f:
for line in f:
line = line.strip()
if not line:
continue
obj = json.loads(line)
out[str(obj.get("id"))] = obj
return out
def per_type_counts(report: Dict[str, Any]) -> Dict[str, int]:
"""Return {signal_type: count} across all groups in a report dict."""
counts: Counter[str] = Counter()
for layer in CATEGORIES_BY_LAYER:
groups = report.get(layer, {}) or {}
for category in CATEGORIES_BY_LAYER[layer]:
group = groups.get(category)
if not group:
continue
for sig in group.get("signals", []) or []:
counts[sig["signal_type"]] += 1
return dict(counts)
def per_type_indices(report: Dict[str, Any]) -> Dict[str, List[int]]:
out: Dict[str, List[int]] = defaultdict(list)
for layer in CATEGORIES_BY_LAYER:
groups = report.get(layer, {}) or {}
for category in CATEGORIES_BY_LAYER[layer]:
group = groups.get(category)
if not group:
continue
for sig in group.get("signals", []) or []:
out[sig["signal_type"]].append(sig.get("message_index"))
for k in out:
out[k].sort(key=lambda x: (x is None, x))
return dict(out)
def diff_counts(a: Dict[str, int], b: Dict[str, int]) -> List[Tuple[str, int, int]]:
"""Return [(signal_type, a_count, b_count)] for entries that differ."""
keys = set(a) | set(b)
out = []
for k in sorted(keys):
ac = a.get(k, 0)
bc = b.get(k, 0)
if ac != bc:
out.append((k, ac, bc))
return out
def diff_indices(
a: Dict[str, List[int]], b: Dict[str, List[int]]
) -> List[Tuple[str, List[int], List[int]]]:
keys = set(a) | set(b)
out = []
for k in sorted(keys):
ai = a.get(k, [])
bi = b.get(k, [])
if ai != bi:
out.append((k, ai, bi))
return out
def compare_one(
convo_id: str, py: Dict[str, Any], rust: Dict[str, Any]
) -> Dict[str, Any] | None:
"""Compare a single conversation. Return diff record, or None if identical."""
if "error" in py or "error" in rust:
return {
"id": convo_id,
"tier": "A",
"kind": "error_in_runner",
"python_error": py.get("error"),
"rust_error": rust.get("error"),
}
py_report = py["report"]
rust_report = rust["report"]
py_counts = per_type_counts(py_report)
rust_counts = per_type_counts(rust_report)
count_diff = diff_counts(py_counts, rust_counts)
py_quality = py_report.get("overall_quality")
rust_quality = rust_report.get("overall_quality")
quality_mismatch = py_quality != rust_quality
if count_diff or quality_mismatch:
return {
"id": convo_id,
"tier": "A",
"kind": "signal_or_quality_mismatch",
"quality": {"python": py_quality, "rust": rust_quality},
"count_diff": [
{"signal_type": st, "python": pc, "rust": rc}
for (st, pc, rc) in count_diff
],
}
py_idx = per_type_indices(py_report)
rust_idx = per_type_indices(rust_report)
idx_diff = diff_indices(py_idx, rust_idx)
if idx_diff:
return {
"id": convo_id,
"tier": "B",
"kind": "instance_index_mismatch",
"diff": [
{"signal_type": st, "python_indices": pi, "rust_indices": ri}
for (st, pi, ri) in idx_diff
],
}
return None
def confusion_matrix(
pairs: List[Tuple[str, str]], labels: List[str]
) -> Dict[str, Dict[str, int]]:
cm: Dict[str, Dict[str, int]] = {a: {b: 0 for b in labels} for a in labels}
for py, rust in pairs:
if py not in cm:
cm[py] = {b: 0 for b in labels}
if rust not in cm[py]:
cm[py][rust] = 0
cm[py][rust] += 1
return cm
def main() -> int:
args = parse_args()
out_dir = args.output_dir
py_reports = load_jsonl(out_dir / "python_reports.jsonl")
rust_reports = load_jsonl(out_dir / "rust_reports.jsonl")
common_ids = sorted(set(py_reports) & set(rust_reports))
only_py = sorted(set(py_reports) - set(rust_reports))
only_rust = sorted(set(rust_reports) - set(py_reports))
diffs: List[Dict[str, Any]] = []
quality_pairs: List[Tuple[str, str]] = []
per_type_total = Counter()
per_type_disagree = Counter()
tier_a = 0
tier_b = 0
for cid in common_ids:
d = compare_one(cid, py_reports[cid], rust_reports[cid])
if d is None:
quality_pairs.append(
(
py_reports[cid]["report"]["overall_quality"],
rust_reports[cid]["report"]["overall_quality"],
)
)
for st, _ in per_type_counts(py_reports[cid]["report"]).items():
per_type_total[st] += 1
else:
diffs.append(d)
if d["tier"] == "A":
tier_a += 1
elif d["tier"] == "B":
tier_b += 1
if "report" in py_reports[cid] and "report" in rust_reports[cid]:
quality_pairs.append(
(
py_reports[cid]["report"].get("overall_quality", "?"),
rust_reports[cid]["report"].get("overall_quality", "?"),
)
)
for cd in d.get("count_diff", []) or []:
per_type_disagree[cd["signal_type"]] += 1
per_type_total[cd["signal_type"]] += 1
n_total = len(common_ids)
n_match = n_total - len(diffs)
agreement = (n_match / n_total) if n_total else 0.0
quality_labels = ["excellent", "good", "neutral", "poor", "severe"]
cm = confusion_matrix(quality_pairs, quality_labels)
metrics = {
"n_python_reports": len(py_reports),
"n_rust_reports": len(rust_reports),
"n_common": n_total,
"n_only_python": len(only_py),
"n_only_rust": len(only_rust),
"n_full_match": n_match,
"agreement_pct": round(100.0 * agreement, 4),
"tier_a_divergences": tier_a,
"tier_b_divergences": tier_b,
"quality_confusion_matrix": cm,
"per_signal_type_total": dict(per_type_total),
"per_signal_type_disagree": dict(per_type_disagree),
}
# Pull in run metadata if present.
rm_path = out_dir / "run_metadata.json"
if rm_path.exists():
metrics["run_metadata"] = json.loads(rm_path.read_text())
(out_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
with (out_dir / "diffs.jsonl").open("w") as f:
for d in diffs:
f.write(json.dumps(d, ensure_ascii=False))
f.write("\n")
write_summary_md(out_dir / "summary.md", metrics, diffs[:20])
print(
json.dumps(
{k: v for k, v in metrics.items() if k != "quality_confusion_matrix"},
indent=2,
)
)
print(f"\ndiffs: {out_dir / 'diffs.jsonl'} metrics: {out_dir / 'metrics.json'}")
print(f"summary: {out_dir / 'summary.md'}")
if tier_a > 0:
print(f"\nFAIL: {tier_a} Tier-A divergence(s) detected.", file=sys.stderr)
return 1
return 0
def write_summary_md(
path: Path, metrics: Dict[str, Any], sample_diffs: List[Dict[str, Any]]
) -> None:
lines: List[str] = []
lines.append("# Signals Parity Report")
lines.append("")
rm = metrics.get("run_metadata", {})
if rm:
lines.append("## Run metadata")
lines.append("")
for k in (
"dataset_name",
"dataset_revision",
"seed",
"num_samples_actual",
"plano_git_sha",
"signals_python_version",
"rust_binary_sha256",
):
if k in rm:
lines.append(f"- **{k}**: `{rm[k]}`")
lines.append("")
lines.append("## Summary")
lines.append("")
lines.append(f"- Conversations compared: **{metrics['n_common']}**")
lines.append(f"- Full matches: **{metrics['n_full_match']}**")
lines.append(f"- Agreement: **{metrics['agreement_pct']}%**")
lines.append(f"- Tier-A divergences: **{metrics['tier_a_divergences']}**")
lines.append(f"- Tier-B divergences: **{metrics['tier_b_divergences']}**")
lines.append("")
lines.append("## Per-signal-type disagreement")
lines.append("")
lines.append("| Signal type | Total reports | Disagreements |")
lines.append("|---|---:|---:|")
totals = metrics["per_signal_type_total"]
disagrees = metrics["per_signal_type_disagree"]
for k in sorted(set(totals) | set(disagrees)):
lines.append(f"| `{k}` | {totals.get(k, 0)} | {disagrees.get(k, 0)} |")
lines.append("")
lines.append("## Quality bucket confusion matrix (rows = python, cols = rust)")
lines.append("")
cm = metrics["quality_confusion_matrix"]
labels = list(cm.keys())
lines.append("| | " + " | ".join(labels) + " |")
lines.append("|---|" + "|".join(["---:"] * len(labels)) + "|")
for r in labels:
lines.append(
f"| {r} | " + " | ".join(str(cm[r].get(c, 0)) for c in labels) + " |"
)
lines.append("")
if sample_diffs:
lines.append("## Sample divergences (first 20)")
lines.append("")
for d in sample_diffs:
lines.append(f"### `{d['id']}` — tier {d['tier']}: {d['kind']}")
lines.append("")
lines.append("```json")
lines.append(json.dumps(d, indent=2))
lines.append("```")
lines.append("")
path.write_text("\n".join(lines))
if __name__ == "__main__":
sys.exit(main())


@ -0,0 +1,3 @@
huggingface_hub>=0.25
pyarrow>=15
tqdm>=4.65


@ -0,0 +1,332 @@
#!/usr/bin/env python3
"""
Parity harness driver.
Samples conversations from `lmsys/lmsys-chat-1m`, runs both the Python
reference analyzer (in-process) and the Rust port (subprocess), writes both
reports to disk for `compare.py` to diff.
Usage:
python run_parity.py \\
--num-samples 2000 \\
--seed 42 \\
--dataset-revision <hf-revision-sha> \\
--rust-binary ../../../crates/target/release/signals_replay \\
--output-dir out/
"""
from __future__ import annotations
import argparse
import hashlib
import json
import random
import subprocess
import sys
import time
from pathlib import Path
from typing import Any, Dict, Iterator, List
try:
import pyarrow.parquet as pq
from huggingface_hub import hf_hub_download, list_repo_files
except ImportError:
print(
"error: install dependencies first: pip install -r requirements.txt",
file=sys.stderr,
)
sys.exit(2)
try:
from signals.analyzer import SignalAnalyzer
except ImportError:
print(
"error: the python `signals` package is not installed. "
"install it from your local checkout: pip install -e /path/to/signals",
file=sys.stderr,
)
sys.exit(2)
try:
from tqdm import tqdm
except ImportError:
def tqdm(it, **_kwargs): # type: ignore[no-redef]
return it
DATASET_NAME = "lmsys/lmsys-chat-1m"
def parse_args() -> argparse.Namespace:
p = argparse.ArgumentParser(description=__doc__)
p.add_argument("--num-samples", type=int, default=2000)
p.add_argument("--seed", type=int, default=42)
p.add_argument(
"--dataset-revision",
default=None,
help="HF dataset revision to pin (default: latest, NOT recommended for reproducibility)",
)
p.add_argument(
"--rust-binary",
type=Path,
required=True,
help="path to the `signals_replay` binary built from crates/brightstaff",
)
p.add_argument(
"--output-dir",
type=Path,
default=Path("out"),
help="directory to write the conversations + both runners' outputs",
)
p.add_argument(
"--max-conv-messages",
type=int,
default=200,
help="drop conversations with more than this many messages (the analyzer "
"truncates to last 100 anyway; this is a sanity cap on input parsing)",
)
return p.parse_args()
def lmsys_to_sharegpt(conversation: List[Dict[str, str]]) -> List[Dict[str, str]]:
"""Convert lmsys-chat-1m's `[{role, content}]` to ShareGPT's `[{from, value}]`.
lmsys uses `user` / `assistant` (no tools, no system role in `conversation`).
"""
out = []
for m in conversation:
role = m.get("role", "")
content = m.get("content", "")
if not isinstance(content, str):
content = str(content) if content is not None else ""
if role == "user":
from_ = "human"
elif role == "assistant":
from_ = "gpt"
else:
# lmsys is human/assistant only; skip anything else defensively.
continue
out.append({"from": from_, "value": content})
return out
def _list_parquet_files(revision: str | None) -> List[str]:
"""Return the list of parquet shard paths in the dataset repo."""
files = list_repo_files(DATASET_NAME, repo_type="dataset", revision=revision)
return sorted(f for f in files if f.endswith(".parquet"))
def _download_shards(paths: List[str], revision: str | None) -> List[Path]:
"""Download each parquet shard to the HF cache, return local paths."""
local: List[Path] = []
for rel in tqdm(paths, desc="downloading shards", unit="shard"):
p = hf_hub_download(
DATASET_NAME,
filename=rel,
repo_type="dataset",
revision=revision,
)
local.append(Path(p))
return local
def sample_conversations(
*,
num_samples: int,
seed: int,
revision: str | None,
max_conv_messages: int,
) -> Iterator[Dict[str, Any]]:
"""Yield `num_samples` conversations sampled uniformly across the dataset.
We bypass the `datasets` loader (which has a Python 3.14 pickle issue)
and read the parquet shards directly via pyarrow.
"""
print(
f"listing {DATASET_NAME}"
f"{' @ ' + revision if revision else ' (no revision pinned!)'}",
file=sys.stderr,
)
shard_paths = _list_parquet_files(revision)
if not shard_paths:
raise SystemExit(f"no parquet shards found for {DATASET_NAME}")
local_paths = _download_shards(shard_paths, revision)
# Collect row counts without reading data.
shard_row_counts: List[int] = []
for p in local_paths:
pf = pq.ParquetFile(str(p))
shard_row_counts.append(pf.metadata.num_rows)
total_rows = sum(shard_row_counts)
print(
f"dataset has {total_rows:,} rows across {len(local_paths)} shards",
file=sys.stderr,
)
rng = random.Random(seed)
global_indices = sorted(rng.sample(range(total_rows), num_samples))
# Bucket indices by shard.
by_shard: Dict[int, List[int]] = {}
cumulative = 0
shard_offsets = []
for c in shard_row_counts:
shard_offsets.append(cumulative)
cumulative += c
for gi in global_indices:
# Find which shard this index belongs to.
for si, off in enumerate(shard_offsets):
if gi < off + shard_row_counts[si]:
by_shard.setdefault(si, []).append(gi - off)
break
yielded = 0
for si in sorted(by_shard.keys()):
local_rows = by_shard[si]
pf = pq.ParquetFile(str(local_paths[si]))
table = pf.read(columns=["conversation"])
conv_col = table.column("conversation")
for local_idx in local_rows:
raw = conv_col[local_idx].as_py()
if not raw:
continue
conversation = raw if isinstance(raw, list) else raw.get("conversation", [])
if len(conversation) > max_conv_messages:
continue
messages = lmsys_to_sharegpt(conversation)
if not messages:
continue
global_idx = shard_offsets[si] + local_idx
yield {
"id": f"lmsys-{global_idx}",
"messages": messages,
}
yielded += 1
print(f"yielded {yielded} conversations after filtering", file=sys.stderr)
def write_conversations(out_path: Path, samples: Iterator[Dict[str, Any]]) -> int:
n = 0
with out_path.open("w") as f:
for s in tqdm(samples, desc="sampling", unit="convo"):
f.write(json.dumps(s, ensure_ascii=False))
f.write("\n")
n += 1
return n
def run_rust(rust_binary: Path, conv_path: Path, out_path: Path) -> None:
print(f"running rust analyzer: {rust_binary}", file=sys.stderr)
t0 = time.monotonic()
with conv_path.open("rb") as fin, out_path.open("wb") as fout:
proc = subprocess.run(
[str(rust_binary)],
stdin=fin,
stdout=fout,
stderr=subprocess.PIPE,
check=False,
)
if proc.returncode != 0:
sys.stderr.write(proc.stderr.decode("utf-8", errors="replace"))
raise SystemExit(f"rust runner exited {proc.returncode}")
elapsed = time.monotonic() - t0
print(f" rust runner: {elapsed:.1f}s", file=sys.stderr)
def run_python(conv_path: Path, out_path: Path) -> None:
print("running python analyzer...", file=sys.stderr)
t0 = time.monotonic()
analyzer = SignalAnalyzer()
with conv_path.open() as fin, out_path.open("w") as fout:
for line in tqdm(fin, desc="python", unit="convo"):
line = line.strip()
if not line:
continue
try:
obj = json.loads(line)
report = analyzer.analyze(obj["messages"])
fout.write(
json.dumps(
{"id": obj["id"], "report": report.to_dict()},
ensure_ascii=False,
)
)
except Exception as e:
fout.write(json.dumps({"id": obj.get("id"), "error": str(e)}))
fout.write("\n")
elapsed = time.monotonic() - t0
print(f" python runner: {elapsed:.1f}s", file=sys.stderr)
def stamp_metadata(args: argparse.Namespace, output_dir: Path, n_samples: int) -> None:
"""Write the input metadata so compare.py can include it in the report."""
binary_sha = hashlib.sha256(args.rust_binary.read_bytes()).hexdigest()
try:
plano_sha = (
subprocess.check_output(
["git", "rev-parse", "HEAD"], cwd=Path(__file__).parent
)
.decode()
.strip()
)
except Exception:
plano_sha = "unknown"
try:
signals_version = subprocess.check_output(
[sys.executable, "-m", "pip", "show", "signals"]
).decode()
signals_version = next(
(
l.split(":", 1)[1].strip()
for l in signals_version.splitlines()
if l.startswith("Version")
),
"unknown",
)
except Exception:
signals_version = "unknown"
meta = {
"dataset_name": DATASET_NAME,
"dataset_revision": args.dataset_revision,
"seed": args.seed,
"num_samples_requested": args.num_samples,
"num_samples_actual": n_samples,
"rust_binary": str(args.rust_binary.resolve()),
"rust_binary_sha256": binary_sha,
"plano_git_sha": plano_sha,
"signals_python_version": signals_version,
"max_conv_messages": args.max_conv_messages,
}
(output_dir / "run_metadata.json").write_text(json.dumps(meta, indent=2))
print(f"wrote {output_dir / 'run_metadata.json'}", file=sys.stderr)
def main() -> None:
args = parse_args()
args.output_dir.mkdir(parents=True, exist_ok=True)
if not args.rust_binary.exists():
raise SystemExit(f"rust binary not found at {args.rust_binary}")
conv_path = args.output_dir / "conversations.jsonl"
rust_path = args.output_dir / "rust_reports.jsonl"
py_path = args.output_dir / "python_reports.jsonl"
samples = sample_conversations(
num_samples=args.num_samples,
seed=args.seed,
revision=args.dataset_revision,
max_conv_messages=args.max_conv_messages,
)
n = write_conversations(conv_path, samples)
print(f"wrote {n} conversations to {conv_path}", file=sys.stderr)
run_rust(args.rust_binary, conv_path, rust_path)
run_python(conv_path, py_path)
stamp_metadata(args, args.output_dir, n)
print("done. now run: python compare.py --output-dir " + str(args.output_dir))
if __name__ == "__main__":
main()