* fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level
Lift inline routing_preferences under each model_provider into the
top-level routing_preferences list with merged models[] and bump
version to v0.4.0, with a deprecation warning. Existing v0.3.0
demo configs (Claude Code, Codex, preference_based_routing, etc.)
keep working unchanged. Schema flags the inline shape as deprecated
but still accepts it. Docs and skills updated to canonical top-level
multi-model form.
* test(common): bump reference config assertion to v0.4.0
The rendered reference config was bumped to v0.4.0 when its inline
routing_preferences were lifted to the top level; align the
configuration deserialization test with that change.
* fix(config_generator): bump version to v0.4.0 up front in migration
Move the v0.3.0 -> v0.4.0 version bump to the top of
migrate_inline_routing_preferences so it runs unconditionally,
including for configs that already declare top-level
routing_preferences at v0.3.0. Previously the bump only fired
when inline migration produced entries, leaving top-level v0.3.0
configs rejected by brightstaff's v0.4.0 gate. Tests updated to
cover the new behavior and to confirm we never downgrade newer
versions.
* fix(config_generator): gate routing_preferences migration on version < v0.4.0
Short-circuit the migration when the config already declares v0.4.0
or newer. Anything at v0.4.0+ is assumed to be on the canonical
top-level shape and is passed through untouched, including stray
inline preferences (which are the author's bug to fix). Only v0.3.0
and older configs are rewritten and bumped.
Publish docker image (latest) / build-arm64 (push) Has been cancelled
Publish docker image (latest) / build-amd64 (push) Has been cancelled
Build and Deploy Documentation / build (push) Has been cancelled
CI / security-scan (push) Has been cancelled
CI / test-prompt-gateway (push) Has been cancelled
CI / test-model-alias-routing (push) Has been cancelled
CI / test-responses-api-with-state (push) Has been cancelled
CI / e2e-plano-tests (3.10) (push) Has been cancelled
CI / e2e-plano-tests (3.11) (push) Has been cancelled
CI / e2e-plano-tests (3.12) (push) Has been cancelled
CI / e2e-plano-tests (3.13) (push) Has been cancelled
CI / e2e-plano-tests (3.14) (push) Has been cancelled
CI / e2e-demo-preference (push) Has been cancelled
CI / e2e-demo-currency (push) Has been cancelled
Publish docker image (latest) / create-manifest (push) Has been cancelled
* add pluggable session cache with Redis backend
* add Redis session affinity demos (Docker Compose and Kubernetes)
* address PR review feedback on session cache
* document Redis session cache backend for model affinity
* sync rendered config reference with session_cache addition
* add tenant-scoped Redis session cache keys and remove dead log_affinity_hit
- Add tenant_header to SessionCacheConfig; when set, cache keys are scoped
as plano:affinity:{tenant_id}:{session_id} for multi-tenant isolation
- Thread tenant_id through RouterService, routing_service, and llm handlers
- Use Cow<'_, str> in session_key to avoid allocation when no tenant is set
- Remove unused log_affinity_hit (logging was already inlined at call sites)
* remove session_affinity_redis and session_affinity_redis_k8s demos
* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
* first commit with tests to enable state mamangement via memory
* fixed logs to follow the conversational flow a bit better
* added support for supabase
* added the state_storage_v1_responses flag, and use that to store state appropriately
* cleaned up logs and fixed issue with connectivity for llm gateway in weather forecast demo
* fixed mixed inputs from openai v1/responses api (#632)
* fixed mixed inputs from openai v1/responses api
* removing tracing from model-alias-rouing
* handling additional input types from openairs
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
* resolving PR comments
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
* adding function_calling functionality via rust
* fixed rendered YAML file
* removed model_server from envoy.template and forwarding traffic to bright_staff
* fixed bugs in function_calling.rs that were breaking tests. All good now
* updating e2e test to clean up disk usage
* removing Arch* models to be used as a default model if one is not specified
* if the user sets arch-function base_url we should honor it
* fixing demos as we needed to pin to a particular version of huggingface_hub else the chatbot ui wouldn't build
* adding a constant for Arch-Function model name
* fixing some edge cases with calls made to Arch-Function
* fixed JSON parsing issues in function_calling.rs
* fixed bug where the raw response from Arch-Function was re-encoded
* removed debug from supervisord.conf
* commenting out disk cleanup
* adding back disk space
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-288.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>