Replaces the previous head-only truncation of oversized user messages
with a middle-trim (head + ellipsis + tail) that preserves both the task
framing (start of message) and the actual ask (end of message) — a
common shape for long pasted content like code dumps or specs. The
unicode ellipsis also signals to the router model that content was
dropped, which can improve classification accuracy on truncated prompts.
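
A minimal sketch of the middle-trim idea, not the actual helper from this change. The function name `trim_middle_utf8` and the 60/40 split come from the test notes below; the byte-based budget, the two boundary helpers, and the head-only fallback for very small budgets are assumptions made for illustration.

```rust
/// Marker inserted where content was dropped (3 bytes in UTF-8).
const ELLIPSIS: &str = "…";

/// Largest char boundary <= i (hypothetical helper).
fn floor_boundary(s: &str, mut i: usize) -> usize {
    if i >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Smallest char boundary >= i (hypothetical helper).
fn ceil_boundary(s: &str, mut i: usize) -> usize {
    if i >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Keep the head and tail of `s` within `max_bytes`, dropping the middle.
fn trim_middle_utf8(s: &str, max_bytes: usize) -> String {
    // No-op path: the message already fits.
    if s.len() <= max_bytes {
        return s.to_string();
    }
    // Assumed fallback: no room for the marker, so keep only the head.
    if max_bytes <= ELLIPSIS.len() {
        return s[..floor_boundary(s, max_bytes)].to_string();
    }
    let keep = max_bytes - ELLIPSIS.len();
    let head_len = keep * 3 / 5; // 60% of the kept bytes go to the head
    let tail_len = keep - head_len; // remaining 40% go to the tail
    // Snap both cut points inward so no multi-byte character is split.
    let head_end = floor_boundary(s, head_len);
    let tail_start = ceil_boundary(s, s.len() - tail_len);
    format!("{}{}{}", &s[..head_end], ELLIPSIS, &s[tail_start..])
}
```

Snapping the head cut down and the tail cut up means the result can be a few bytes shorter than `max_bytes`, but never longer.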
Also adds an outer guardrail: only the last `MAX_ROUTING_TURNS` (16)
filtered messages are considered when building the routing prompt. This
bounds prompt growth for long conversations before the token-budget
loop runs, matching the approach HuggingFace chat-ui takes in its
arch-router client.
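
The turn cap can be sketched roughly as below. Only the constant `MAX_ROUTING_TURNS` and its value come from this message; the function name `last_turns` and the generic slice signature are assumptions for illustration.

```rust
/// Upper bound on how many filtered messages feed the routing prompt.
const MAX_ROUTING_TURNS: usize = 16;

/// Return at most the last MAX_ROUTING_TURNS items of `messages`.
/// `last_turns` is a hypothetical name, not the function in the codebase.
fn last_turns<T>(messages: &[T]) -> &[T] {
    // saturating_sub keeps the start index at 0 for short conversations.
    let start = messages.len().saturating_sub(MAX_ROUTING_TURNS);
    &messages[start..]
}
```

Applying the cap as a slice before the token-budget loop keeps that loop's input small regardless of conversation length.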

Tests:
- test_huge_single_user_message_is_middle_trimmed: regression test for
  the 500KB user message scenario. Verifies the prompt stays bounded,
  head + tail markers both survive, and the ellipsis is present.
- test_turn_cap_limits_routing_history: builds a 32-turn conversation
  and verifies only the last 16 make it into the prompt.
- test_trim_middle_utf8_helper: unit test for the helper covering the
  no-op path, the 60/40 split, the too-small-for-marker fallback, and
  UTF-8 boundary safety for multi-byte characters.
- Updated test_conversation_trim_upto_user_message to reflect the new
  middle-trim behavior.

`post_and_extract_content` was unconditionally deserializing the upstream
response body as a `ChatCompletionsResponse`, which meant 4xx/5xx error
bodies (OpenAI-style `{"error": {...}}` envelopes) failed with confusing
messages like `missing field 'id' at line 1 column 391`. The real
upstream message (e.g. "This model's maximum context length is 32768
tokens...") only appeared once as a warn log and then got buried in the
generic "Failed to parse JSON response" path.

Now we:
- Check the HTTP status before attempting to parse the success body.
- On non-2xx, extract a human-readable message from the OpenAI-style
  error envelope (or fall back to a UTF-8-safe truncated raw body).
- Return a dedicated `HttpError::Upstream { status, message }` variant
  so callers can log / surface / retry based on the real status code.
- Truncate raw bodies in warn logs to 512 bytes (UTF-8-safe) to avoid
  flooding logs with oversized JSON or HTML error pages.
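
A rough sketch of the status-first flow described above, under several assumptions: `check_status` and `truncate_utf8` are hypothetical names, the envelope message is passed in as an already-extracted `Option<String>` (JSON parsing is left out to keep the example dependency-free), and the 512-byte limit from the log truncation is reused for the fallback message. Only the `HttpError::Upstream { status, message }` shape is taken from this message.

```rust
#[derive(Debug, PartialEq)]
enum HttpError {
    /// Non-2xx upstream response with a human-readable message.
    Upstream { status: u16, message: String },
}

/// Assumed limit, taken from the 512-byte log truncation.
const MAX_BODY_BYTES: usize = 512;

/// Cut `s` to at most `max_bytes` without splitting a UTF-8 character.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Check the status before the caller tries to parse a success body.
/// `envelope_message` stands in for the message pulled from an
/// OpenAI-style `{"error": {...}}` envelope, if one was found.
fn check_status<'a>(
    status: u16,
    body: &'a str,
    envelope_message: Option<String>,
) -> Result<&'a str, HttpError> {
    if (200..300).contains(&status) {
        return Ok(body);
    }
    // Prefer the envelope message; otherwise fall back to a bounded raw body.
    let message = envelope_message
        .unwrap_or_else(|| truncate_utf8(body, MAX_BODY_BYTES).to_string());
    Err(HttpError::Upstream { status, message })
}
```

With this shape, a caller can match on `status` to decide whether a retry makes sense instead of inspecting a parse error string.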
The orchestrator trimmer had a bypass that kept the latest user message
whole even when it alone exceeded the configured token budget. This
caused brightstaff to send a ~500KB prompt to the Plano-Orchestrator
model, which rejected it with a 400 "context length exceeded" from the
upstream 32K-token window. Brightstaff then surfaced a confusing
"missing field id" parse error instead of the real upstream message.
Fix the bypass by trimming the overflowing user message from the end
toward the beginning until it fits in the remaining token budget. The
beginning of the message (where user intent usually lives) is preserved
and the tail is dropped. Added a UTF-8-safe byte-truncation helper and a
regression test that mirrors the production payload (a single ~500KB
user message with a small budget).
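
One way the head-preserving trim could look, as a sketch only. The name `trim_to_token_budget`, the pluggable `count_tokens` closure, and the proportional shrink step are all assumptions; the actual trimmer may count tokens and step differently.

```rust
/// Drop bytes from the end of `text` until `count_tokens` reports that the
/// remaining head fits in `budget`. The returned slice is always a prefix
/// of `text` cut on a UTF-8 character boundary.
fn trim_to_token_budget<'a>(
    text: &'a str,
    budget: usize,
    count_tokens: impl Fn(&str) -> usize,
) -> &'a str {
    let mut end = text.len();
    loop {
        let used = count_tokens(&text[..end]);
        if used <= budget || end == 0 {
            return &text[..end];
        }
        // Shrink proportionally to the overshoot, and by at least one byte
        // so the loop always makes progress.
        let mut next = (end * budget / used).min(end - 1);
        // Snap down to a char boundary so multi-byte characters stay whole.
        while !text.is_char_boundary(next) {
            next -= 1;
        }
        end = next;
    }
}
```

The proportional step keeps the number of `count_tokens` calls small even for a ~500KB input, compared with dropping one character at a time.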
* feat: add initial documentation for Plano Agent Skills
* feat: readme with examples
* feat: add detailed skills documentation and examples for Plano
---------
Co-authored-by: Adil Hafeez <adil.hafeez@gmail.com>
* add pluggable session cache with Redis backend
* add Redis session affinity demos (Docker Compose and Kubernetes)
* address PR review feedback on session cache
* document Redis session cache backend for model affinity
* sync rendered config reference with session_cache addition
* add tenant-scoped Redis session cache keys and remove dead log_affinity_hit
  - Add tenant_header to SessionCacheConfig; when set, cache keys are scoped
    as plano:affinity:{tenant_id}:{session_id} for multi-tenant isolation
  - Thread tenant_id through RouterService, routing_service, and llm handlers
  - Use Cow<'_, str> in session_key to avoid allocation when no tenant is set
  - Remove unused log_affinity_hit (logging was already inlined at call sites)
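
A small sketch of how `Cow` avoids the allocation in the no-tenant case. The key format for the tenant case is taken from the bullet above; the function signature and the choice to return the bare session id when no tenant is set are assumptions.

```rust
use std::borrow::Cow;

/// Build the cache key for a session (signature is hypothetical).
fn session_key<'a>(tenant_id: Option<&str>, session_id: &'a str) -> Cow<'a, str> {
    match tenant_id {
        // Tenant set: allocate a scoped key for multi-tenant isolation.
        Some(tenant) => Cow::Owned(format!("plano:affinity:{}:{}", tenant, session_id)),
        // No tenant (assumed behavior): borrow the session id, no allocation.
        None => Cow::Borrowed(session_id),
    }
}
```

Because `Cow<str>` dereferences to `&str`, call sites can pass the result to the cache backend the same way in both cases.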
* remove session_affinity_redis and session_affinity_redis_k8s demos
* feat(provider): add xiaomi as first-class provider
* feat(demos): add xiaomi mimo integration demo
* refactor(demos): remove Xiaomi MiMo integration demo and update documentation
* update model list and add the xiaomi models
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-389.local>
* feat(web): announce DigitalOcean acquisition across sites
* fix(web): make blog routes resilient without Sanity config
* fix(web): add mobile arrow cue to announcement banner
* fix(web): point acquisition links to announcement post
* fix: route Perplexity OpenAI paths without /v1
* add tests for Perplexity provider handling in LLM module
* refactor: use constant for Perplexity provider prefix in LLM module
* moving const to top of file
* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
* Add Codex CLI support; xAI response improvements
* Add native Plano running check and update CLI agent error handling
* adding PR suggestions for transformations and code quality
* message extraction logic in ResponsesAPIRequest
* xAI support for Responses API by routing to native endpoint + refactor code