Commit graph

63 commits

Author SHA1 Message Date
Musa
980faef6be
Redis-backed session cache for cross-replica model affinity (#879)
Some checks failed
CI / pre-commit (push) Has been cancelled
CI / plano-tools-tests (push) Has been cancelled
CI / native-smoke-test (push) Has been cancelled
CI / docker-build (push) Has been cancelled
CI / validate-config (push) Has been cancelled
Publish docker image (latest) / build-arm64 (push) Has been cancelled
Publish docker image (latest) / build-amd64 (push) Has been cancelled
Build and Deploy Documentation / build (push) Has been cancelled
CI / security-scan (push) Has been cancelled
CI / test-prompt-gateway (push) Has been cancelled
CI / test-model-alias-routing (push) Has been cancelled
CI / test-responses-api-with-state (push) Has been cancelled
CI / e2e-plano-tests (3.10) (push) Has been cancelled
CI / e2e-plano-tests (3.11) (push) Has been cancelled
CI / e2e-plano-tests (3.12) (push) Has been cancelled
CI / e2e-plano-tests (3.13) (push) Has been cancelled
CI / e2e-plano-tests (3.14) (push) Has been cancelled
CI / e2e-demo-preference (push) Has been cancelled
CI / e2e-demo-currency (push) Has been cancelled
Publish docker image (latest) / create-manifest (push) Has been cancelled
* add pluggable session cache with Redis backend

* add Redis session affinity demos (Docker Compose and Kubernetes)

* address PR review feedback on session cache

* document Redis session cache backend for model affinity

* sync rendered config reference with session_cache addition

* add tenant-scoped Redis session cache keys and remove dead log_affinity_hit

- Add tenant_header to SessionCacheConfig; when set, cache keys are scoped
  as plano:affinity:{tenant_id}:{session_id} for multi-tenant isolation
- Thread tenant_id through RouterService, routing_service, and llm handlers
- Use Cow<'_, str> in session_key to avoid allocation when no tenant is set
- Remove unused log_affinity_hit (logging was already inlined at call sites)

* remove session_affinity_redis and session_affinity_redis_k8s demos
2026-04-13 19:30:47 -07:00
Adil Hafeez
8dedf0bec1
Model affinity for consistent model selection in agentic loops (#827)
Some checks are pending
CI / pre-commit (push) Waiting to run
CI / plano-tools-tests (push) Waiting to run
CI / native-smoke-test (push) Waiting to run
CI / docker-build (push) Waiting to run
CI / validate-config (push) Waiting to run
CI / security-scan (push) Blocked by required conditions
CI / test-prompt-gateway (push) Blocked by required conditions
CI / test-model-alias-routing (push) Blocked by required conditions
CI / test-responses-api-with-state (push) Blocked by required conditions
CI / e2e-plano-tests (3.10) (push) Blocked by required conditions
CI / e2e-plano-tests (3.11) (push) Blocked by required conditions
CI / e2e-plano-tests (3.12) (push) Blocked by required conditions
CI / e2e-plano-tests (3.13) (push) Blocked by required conditions
CI / e2e-plano-tests (3.14) (push) Blocked by required conditions
CI / e2e-demo-preference (push) Blocked by required conditions
CI / e2e-demo-currency (push) Blocked by required conditions
Publish docker image (latest) / build-arm64 (push) Waiting to run
Publish docker image (latest) / build-amd64 (push) Waiting to run
Publish docker image (latest) / create-manifest (push) Blocked by required conditions
Build and Deploy Documentation / build (push) Waiting to run
2026-04-08 17:32:02 -07:00
Adil Hafeez
7606c55b4b
support developer role in chat completions API (#867) 2026-04-02 18:10:32 -07:00
Musa
f68c21f8df
Handle null prefer in inline routing policy (#856)
* Handle null prefer in inline routing policy

* Use serde defaulting for null selection preference

* Add tests for default selection policy behavior in routing preferences
2026-03-31 17:41:25 -07:00
Musa
3dbda9741e
fix: route Perplexity OpenAI endpoints without /v1 (#854)
* fix: route Perplexity OpenAI paths without /v1

* add tests for Perplexity provider handling in LLM module

* refactor: use constant for Perplexity provider prefix in LLM module

* moving const to top of file
2026-03-31 17:40:42 -07:00
Adil Hafeez
af98c11a6d
restructure model_metrics_sources to type + provider (#855) 2026-03-30 17:12:20 -07:00
Adil Hafeez
e5751d6b13
model routing: cost/latency ranking with ranked fallback list (#849) 2026-03-30 13:46:52 -07:00
Salman Paracha
69df124c47
the orchestrator had a bug where it was setting the wrong headers for archfc.katanemo.dev (#839)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-389.local>
2026-03-20 00:40:47 -07:00
Adil Hafeez
1ad3e0f64e
refactor brightstaff (#736) 2026-03-19 17:58:33 -07:00
Adil Hafeez
1f23c573bf
add output filter chain (#822) 2026-03-18 17:58:20 -07:00
Adil Hafeez
bc059aed4d
Unified overrides for custom router and orchestrator models (#820)
* support configurable orchestrator model via orchestration config section

* add self-hosting docs and demo for Plano-Orchestrator

* list all Plano-Orchestrator model variants in docs

* use overrides for custom routing and orchestration model

* update docs

* update orchestrator model name

* rename arch provider to plano, use llm_routing_model and agent_orchestration_model

* regenerate rendered config reference
2026-03-15 09:36:11 -07:00
Musa
6610097659
Support for Codex via Plano (#808)
* Add Codex CLI support; xAI response improvements

* Add native Plano running check and update CLI agent error handling

* adding PR suggestions for transformations and code quality

* message extraction logic in ResponsesAPIRequest

* xAI support for Responses API by routing to native endpoint + refactor code
2026-03-10 20:54:14 -07:00
Adil Hafeez
97b7a390ef
support inline routing_policy in request body (#811) (#815) 2026-03-10 12:23:18 -07:00
Adil Hafeez
028a2cd196
add routing service (#814)
fixes https://github.com/katanemo/plano/issues/810
2026-03-09 16:32:16 -07:00
Musa
2bde21ff57
add Custom Trace Attributes to extend observability (#708)
* add custom trace attributes

* refactor: prefix custom trace attributes and update schema handlers tests configs

* refactor: rename custom_attribute_prefixes to span_attribute_header_prefixes in configuration and related handlers

* docs: add section on custom span attributes

* refactor: update tracing configuration to use span attributes and adjust related handlers

* docs: custom span attributes section to include static attributes and clarify configuration

* add custom trace attributes

* refactor: prefix custom trace attributes and update schema handlers tests configs

* refactor: rename custom_attribute_prefixes to span_attribute_header_prefixes in configuration and related handlers

* docs: add section on custom span attributes

* refactor: update tracing configuration to use span attributes and adjust related handlers

* docs: custom span attributes section to include static attributes and clarify configuration

* refactor: remove TraceCollector usage and enhance logging with structured attributes

* refactor: custom trace attribute extraction to improve clarity

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-25 16:27:20 -08:00
Syed A. Hashmi
54bc8e5e52
[ISSUE 706]: Standardize returned errors from Plano (#772)
* [ISSUE 706]: Standardize returned errors from Plano

* Standardized errors in chat completion
2026-02-24 14:34:33 -08:00
Adil Hafeez
baeee56f6b
Make model field optional in request types, resolve from default provider (#768) 2026-02-18 04:43:59 -08:00
Adil Hafeez
ba651aaf71
Rename all arch references to plano (#745)
* Rename all arch references to plano across the codebase

Complete rebrand from "Arch"/"archgw" to "Plano" including:
- Config files: arch_config_schema.yaml, workflow, demo configs
- Environment variables: ARCH_CONFIG_* → PLANO_CONFIG_*
- Python CLI: variables, functions, file paths, docker mounts
- Rust crates: config paths, log messages, metadata keys
- Docker/build: Dockerfile, supervisord, .dockerignore, .gitignore
- Docker Compose: volume mounts and env vars across all demos/tests
- GitHub workflows: job/step names
- Shell scripts: log messages
- Demos: Python code, READMEs, VS Code configs, Grafana dashboard
- Docs: RST includes, code comments, config references
- Package metadata: package.json, pyproject.toml, uv.lock

External URLs (docs.archgw.com, github.com/katanemo/archgw) left as-is.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update remaining arch references in docs

- Rename RST cross-reference labels: arch_access_logging, arch_overview_tracing, arch_overview_threading → plano_*
- Update label references in request_lifecycle.rst
- Rename arch_config_state_storage_example.yaml → plano_config_state_storage_example.yaml
- Update config YAML comments: "Arch creates/uses" → "Plano creates/uses"
- Update "the Arch gateway" → "the Plano gateway" in configuration_reference.rst
- Update arch_config_schema.yaml reference in provider_models.py
- Rename arch_agent_router → plano_agent_router in config example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix remaining arch references found in second pass

- config/docker-compose.dev.yaml: ARCH_CONFIG_FILE → PLANO_CONFIG_FILE,
  arch_config.yaml → plano_config.yaml, archgw_logs → plano_logs
- config/test_passthrough.yaml: container mount path
- tests/e2e/docker-compose.yaml: source file path (was still arch_config.yaml)
- cli/planoai/core.py: comment and log message
- crates/brightstaff/src/tracing/constants.rs: doc comment
- tests/{e2e,archgw}/common.py: get_arch_messages → get_plano_messages,
  arch_state/arch_messages variables renamed
- tests/{e2e,archgw}/test_prompt_gateway.py: updated imports and usages
- demos/shared/test_runner/{common,test_demos}.py: same renames
- tests/e2e/test_model_alias_routing.py: docstring
- .dockerignore: archgw_modelserver → plano_modelserver
- demos/use_cases/claude_code_router/pretty_model_resolution.sh: container name

Note: x-arch-* HTTP header values and Rust constant names intentionally
preserved for backwards compatibility with existing deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:16:56 -08:00
Musa
e3bf2b7f71
Introduce brand new CLI experience with tracing and quickstart (#724)
Release hardens tracing and routing: clearer CLI, modular internals, updated demos/docs/tests, and improved multi-agent reliability.

Co-authored-by: Adil Hafeez <adil.hafeez@gmail.com>
2026-02-10 13:17:43 -08:00
Adil Hafeez
46de89590b
use standard tracing and logging in brightstaff (#721) 2026-02-09 13:33:27 -08:00
Salman Paracha
2941392ed1
Adding support for wildcard models in the model_providers config (#696)
* cleaning up plano cli commands

* adding support for wildcard model providers

* fixing compile errors

* fixing bugs related to default model provider, provider hint and duplicates in the model provider list

* fixed cargo fmt issues

* updating tests to always include the model id

* using default for the prompt_gateway path

* fixed the model name, as gpt-5-mini-2025-08-07 wasn't in the config

* making sure that all aliases and models match the config

* fixed the config generator to allow for base_url providers LLMs to include wildcard models

* re-ran the models list utility and added a shell script to run it

* updating docs to mention wildcard model providers

* updated provider_models.json to yaml, added that file to our docs for reference

* updating the build docs to use the new root-based build

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
2026-01-28 17:47:33 -08:00
Salman Paracha
cdc1d7cee2
making Messages.Content optional, and having the upstream LLM fail if the right fields aren't set (#699)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
2026-01-16 16:24:03 -08:00
Adil Hafeez
626f556cc6
reduce number of info statements in pipeline processor (#698)
Co-authored-by: Adil Hafeez <adil.hafeez10@t-mobile.com>
2026-01-16 15:38:43 -08:00
Adil Hafeez
11fb4cd633
remove unnecessary clones from code (#682) 2026-01-08 15:11:05 -08:00
Adil Hafeez
78b2ae0cf7
pass request_id in orchestrator and routing model (#678) 2026-01-07 12:04:10 -08:00
Salman Paracha
b4543ba56c
Introduce signals change (#655)
* adding support for signals

* reducing false positives for signals like positive interaction

* adding docs. Still need to fix the messages list, but waiting on PR #621

* Improve frustration detection: normalize contractions and refine punctuation

* Further refine test cases with longer messages

* minor doc changes

* fixing echo statement for build

* fixing the messages construction and using the trait for signals

* update signals docs

* fixed some minor doc changes

* added more tests and fixed docuemtnation. PR 100% ready

* made fixes based on PR comments

* Optimize latency

1. replace sliding window approach with trigram containment check
2. add code to pre-compute ngrams for patterns

* removed some debug statements to make tests easier to read

* PR comments to make ObservableStreamProcessor accept optonal Vec<Messagges>

* fixed PR comments

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
Co-authored-by: MeiyuZhong <mariazhong9612@gmail.com>
Co-authored-by: nehcgs <54548843+nehcgs@users.noreply.github.com>
2026-01-07 11:20:44 -08:00
Adil Hafeez
57327ba667
ensure that request id is consistent (#677)
* ensure that request id is consistent

* remove test debug/info statements
2026-01-07 08:44:41 -08:00
Adil Hafeez
ca95ffb63d
cargo clippy (#660) 2025-12-25 21:08:37 -08:00
Salman Paracha
e224cba3e3
Update docs to Plano (#639) 2025-12-23 17:14:50 -08:00
Adil Hafeez
15fbb6c3af
plano orchestration using plano orchestration 4b model (#637) 2025-12-22 18:05:49 -08:00
Adil Hafeez
2f9121407b
Use mcp tools for filter chain (#621)
* agents framework demo

* more changes

* add more changes

* pending changes

* fix tests

* fix more

* rebase with main and better handle error from mcp

* add trace for filters

* add test for client error, server error and for mcp error

* update schema validate code and rename kind => type in agent_filter

* fix agent description and pre-commit

* fix tests

* add provider specific request parsing in agents chat

* fix precommit and tests

* cleanup demo

* update readme

* fix pre-commit

* refactor tracing

* fix fmt

* fix: handle MessageContent enum in responses API conversion

- Update request.rs to handle new MessageContent enum structure from main
- MessageContent can now be Text(String) or Items(Vec<InputContent>)
- Handle new InputItem variants (ItemReference, FunctionCallOutput)
- Fixes compilation error after merging latest main (#632)

* address pr feedback

* fix span

* fix build

* update openai version
2025-12-17 17:30:14 -08:00
Shuguang Chen
cb82a83c7b
orchestration integration (#623)
* orchestration integration

* Convert compact json to spaced json
2025-12-17 17:20:19 -08:00
Salman Paracha
d5a273f740
enable state management for v1/responses (#631)
* first commit with tests to enable state mamangement via memory

* fixed logs to follow the conversational flow a bit better

* added support for supabase

* added the state_storage_v1_responses flag, and use that to store state appropriately

* cleaned up logs and fixed issue with connectivity for llm gateway in weather forecast demo

* fixed mixed inputs from openai v1/responses api (#632)

* fixed mixed inputs from openai v1/responses api

* removing tracing from model-alias-rouing

* handling additional input types from openairs

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>

* resolving PR comments

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
2025-12-17 12:18:38 -08:00
Salman Paracha
a79f55f313
Improve end to end tracing (#628)
* adding canonical tracing support via bright-staff

* improved formatting for tools in the traces

* removing anthropic from the currency exchange demo

* using Envoy to transport traces, not calling OTEL directly

* moving otel collcetor cluster outside tracing if/else

* minor fixes to not write to the OTEL collector if tracing is disabled

* fixed PR comments and added more trace attributes

* more fixes based on PR comments

* more clean up based on PR comments

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
2025-12-11 15:21:57 -08:00
Adil Hafeez
367f48bf1e
handle agent error better (#627) 2025-12-10 11:20:00 -08:00
Salman Paracha
a448c6e9cb
Add support for v1/responses API (#622)
* making first commit. still need to work on streaming respones

* making first commit. still need to work on streaming respones

* stream buffer implementation with tests

* adding grok API keys to workflow

* fixed changes based on code review

* adding support for bedrock models

* fixed issues with translation to claude code

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
2025-12-03 14:58:26 -08:00
Salman Paracha
88c2bd1851
removing model_server python module to brightstaff (function calling) (#615)
* adding function_calling functionality via rust

* fixed rendered YAML file

* removed model_server from envoy.template and forwarding traffic to bright_staff

* fixed bugs in function_calling.rs that were breaking tests. All good now

* updating e2e test to clean up disk usage

* removing Arch* models to be used as a default model if one is not specified

* if the user sets arch-function base_url we should honor it

* fixing demos as we needed to pin to a particular version of huggingface_hub else the chatbot ui wouldn't build

* adding a constant for Arch-Function model name

* fixing some edge cases with calls made to Arch-Function

* fixed JSON parsing issues in function_calling.rs

* fixed bug where the raw response from Arch-Function was re-encoded

* removed debug from supervisord.conf

* commenting out disk cleanup

* adding back disk space

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-288.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-342.local>
2025-11-22 12:55:00 -08:00
Salman Paracha
9407ae6af7
Add support for Amazon Bedrock Converse and ConverseStream (#588)
* first commit to get Bedrock Converse API working. Next commit support for streaming and binary frames

* adding translation from BedrockBinaryFrameDecoder to AnthropicMessagesEvent

* Claude Code works with Amazon Bedrock

* added tests for openai streaming from bedrock

* PR comments fixed

* adding support for bedrock in docs as supported provider

* cargo fmt

* revertted to chatgpt models for claude code routing

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-288.local>
Co-authored-by: Adil Hafeez <adil.hafeez@gmail.com>
2025-10-22 11:31:21 -07:00
Adil Hafeez
96e0732089
add support for agents (#564) 2025-10-14 14:01:11 -07:00
Adil Hafeez
f4d65e2469
stream access logs and improve access log format (#581) 2025-09-30 18:46:13 -07:00
Salman Paracha
f00870dccb
adding support for claude code routing (#575)
* fixed for claude code routing. first commit

* removing redundant enum tags for cache_control

* making sure that claude code can run via the archgw cli

* fixing broken config

* adding a README.md and updated the cli to use more of our defined patterns for params

* fixed config.yaml

* minor fixes to make sure PR is clean. Ready to ship

* adding claude-sonnet-4-5 to the config

* fixes based on PR

* fixed alias for README

* fixed 400 error handling tests, now that we write temperature to 1.0 for GPT-5

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-257.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-288.local>
2025-09-29 19:23:08 -07:00
Salman Paracha
03c2cf6f0d
fixed changes related to max_tokens and processing http error codes like 400 properly (#574)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-257.local>
2025-09-25 17:00:37 -07:00
Salman Paracha
4eb2b410c5
adding support for model aliases in archgw (#566)
* adding support for model aliases in archgw

* fixed PR based on feedback

* removing README. Not relevant for PR

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-136.local>
2025-09-16 11:12:08 -07:00
Salman Paracha
fb0581fd39
add support for v1/messages and transformations (#558)
* pushing draft PR

* transformations are working. Now need to add some tests next

* updated tests and added necessary response transformations for Anthropics' message response object

* fixed bugs for integration tests

* fixed doc tests

* fixed serialization issues with enums on response

* adding some debug logs to help

* fixed issues with non-streaming responses

* updated the stream_context to update response bytes

* the serialized bytes length must be set in the response side

* fixed the debug statement that was causing the integration tests for wasm to fail

* fixing json parsing errors

* intentionally removing the headers

* making sure that we convert the raw bytes to the correct provider type upstream

* fixing non-streaming responses to tranform correctly

* /v1/messages works with transformations to and from /v1/chat/completions

* updating the CLI and demos to support anthropic vs. claude

* adding the anthropic key to the preference based routing tests

* fixed test cases and added more structured logs

* fixed integration tests and cleaned up logs

* added python client tests for anthropic and openai

* cleaned up logs and fixed issue with connectivity for llm gateway in weather forecast demo

* fixing the tests. python dependency order was broken

* updated the openAI client to fix demos

* removed the raw response debug statement

* fixed the dup cloning issue and cleaned up the ProviderRequestType enum and traits

* fixing logs

* moved away from string literals to consts

* fixed streaming from Anthropic Client to OpenAI

* removed debug statement that would likely trip up integration tests

* fixed integration tests for llm_gateway

* cleaned up test cases and removed unnecessary crates

* fixing comments from PR

* fixed bug whereby we were sending an OpenAIChatCompletions request object to llm_gateway even though the request may have been AnthropicMessages

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-4.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-9.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-10.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-41.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-136.local>
2025-09-10 07:40:30 -07:00
Salman Paracha
89ab51697a
updating the implementation of /v1/chat/completions to use the generi… (#548)
* updating the implementation of /v1/chat/completions to use the generic provider interfaces

* saving changes, although we will need a small re-factor after this as well

* more refactoring changes, getting close

* more refactoring changes to avoid unecessary re-direction and duplication

* more clean up

* more refactoring

* more refactoring to clean code and make stream_context.rs work

* removing unecessary trait implemenations

* some more clean-up

* fixed bugs

* fixing test cases, and making sure all references to the ChatCOmpletions* objects point to the new types

* refactored changes to support enum dispatch

* removed the dependency on try_streaming_from_bytes into a try_from trait implementation

* updated readme based on new usage

* updated code based on code review comments

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-2.local>
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-4.local>
2025-08-20 12:55:29 -07:00
Adil Hafeez
04c7e5a175
bug fix - allow image content to pass through (#539)
fixes https://github.com/katanemo/archgw/issues/535
2025-07-25 01:22:06 -07:00
Adil Hafeez
d341f4365b
In request path use same format for usage preferences as arch_config (#533) 2025-07-21 18:31:19 -07:00
Adil Hafeez
83f4d33434
refactor logging in brightstaff (#532)
refactor logs, move unnecessary info log statements to debug and start logging latest chat completion message to log
2025-07-17 16:00:04 -07:00
Adil Hafeez
f819ee3507
pass model name in header when a route is selected when using usage preferences (#531) 2025-07-17 13:41:58 -07:00
Adil Hafeez
a7fddf30f9
better model names (#517) 2025-07-11 16:42:16 -07:00