`post_and_extract_content` was unconditionally deserializing the upstream
response body as a `ChatCompletionsResponse`, which meant 4xx/5xx error
bodies (OpenAI-style `{"error": {...}}` envelopes) failed with confusing
messages like `missing field 'id' at line 1 column 391`. The real
upstream message (e.g. "This model's maximum context length is 32768
tokens...") only appeared once as a warn log and then got buried in the
generic "Failed to parse JSON response" path.
Now we:
- Check the HTTP status before attempting to parse the success body.
- On non-2xx, extract a human-readable message from the OpenAI-style
error envelope (or fall back to a UTF-8-safe truncated raw body).
- Return a dedicated `HttpError::Upstream { status, message }` variant
so callers can log / surface / retry based on the real status code.
- Truncate raw bodies in warn logs to 512 bytes (UTF-8-safe) to avoid
flooding logs with oversized JSON or HTML error pages.
The orchestrator trimmer had a bypass that kept the latest user message
whole even when it alone exceeded the configured token budget. This
caused brightstaff to send a ~500KB prompt to the Plano-Orchestrator
model, which rejected it with a 400 "context length exceeded" from the
upstream 32K-token window. Brightstaff then surfaced a confusing
"missing field id" parse error instead of the real upstream message.
Fix the bypass by trimming the overflowing user message from the end
toward the beginning until it fits in the remaining token budget. The
beginning of the message (where user intent usually lives) is preserved
and the tail is dropped. Added a UTF-8-safe byte-truncation helper and a
regression test that mirrors the production payload (a single ~500KB
user message with a small budget).
Build and Deploy Documentation / build (push) Waiting to run
* feat: add initial documentation for Plano Agent Skills
* feat: readme with examples
* feat: add detailed skills documentation and examples for Plano
---------
Co-authored-by: Adil Hafeez <adil.hafeez@gmail.com>
Publish docker image (latest) / build-arm64 (push) Has been cancelled
Publish docker image (latest) / build-amd64 (push) Has been cancelled
Build and Deploy Documentation / build (push) Has been cancelled
CI / security-scan (push) Has been cancelled
CI / test-prompt-gateway (push) Has been cancelled
CI / test-model-alias-routing (push) Has been cancelled
CI / test-responses-api-with-state (push) Has been cancelled
CI / e2e-plano-tests (3.10) (push) Has been cancelled
CI / e2e-plano-tests (3.11) (push) Has been cancelled
CI / e2e-plano-tests (3.12) (push) Has been cancelled
CI / e2e-plano-tests (3.13) (push) Has been cancelled
CI / e2e-plano-tests (3.14) (push) Has been cancelled
CI / e2e-demo-preference (push) Has been cancelled
CI / e2e-demo-currency (push) Has been cancelled
Publish docker image (latest) / create-manifest (push) Has been cancelled
* add pluggable session cache with Redis backend
* add Redis session affinity demos (Docker Compose and Kubernetes)
* address PR review feedback on session cache
* document Redis session cache backend for model affinity
* sync rendered config reference with session_cache addition
* add tenant-scoped Redis session cache keys and remove dead log_affinity_hit
- Add tenant_header to SessionCacheConfig; when set, cache keys are scoped
as plano:affinity:{tenant_id}:{session_id} for multi-tenant isolation
- Thread tenant_id through RouterService, routing_service, and llm handlers
- Use Cow<'_, str> in session_key to avoid allocation when no tenant is set
- Remove unused log_affinity_hit (logging was already inlined at call sites)
* remove session_affinity_redis and session_affinity_redis_k8s demos
CI / test-prompt-gateway (push) Has been cancelled
CI / test-model-alias-routing (push) Has been cancelled
CI / test-responses-api-with-state (push) Has been cancelled
CI / e2e-plano-tests (3.10) (push) Has been cancelled
CI / e2e-plano-tests (3.11) (push) Has been cancelled
CI / e2e-plano-tests (3.12) (push) Has been cancelled
CI / e2e-plano-tests (3.13) (push) Has been cancelled
CI / e2e-plano-tests (3.14) (push) Has been cancelled
CI / e2e-demo-preference (push) Has been cancelled
CI / e2e-demo-currency (push) Has been cancelled
Publish docker image (latest) / build-arm64 (push) Has been cancelled
Publish docker image (latest) / build-amd64 (push) Has been cancelled
Publish docker image (latest) / create-manifest (push) Has been cancelled
Build and Deploy Documentation / build (push) Has been cancelled
* feat(provider): add xiaomi as first-class provider
* feat(demos): add xiaomi mimo integration demo
* refactor(demos): remove Xiaomi MiMo integration demo and update documentation
* updating model list and adding the xiamoi models
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-389.local>
* feat(web): announce DigitalOcean acquisition across sites
* fix(web): make blog routes resilient without Sanity config
* fix(web): add mobile arrow cue to announcement banner
* fix(web): point acquisition links to announcement post
* fix: route Perplexity OpenAI paths without /v1
* add tests for Perplexity provider handling in LLM module
* refactor: use constant for Perplexity provider prefix in LLM module
* moving const to top of file
* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
* Add Codex CLI support; xAI response improvements
* Add native Plano running check and update CLI agent error handling
* adding PR suggestions for transformations and code quality
* message extraction logic in ResponsesAPIRequest
* xAI support for Responses API by routing to native endpoint + refactor code