# Use Arch for (Model-based) LLM Routing

## Step 1. Create arch config file
Create a `config.yaml` file with the following content:
```yaml
version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - access_key: $OPENAI_API_KEY
    model: openai/gpt-4o
    default: true

  - access_key: $MISTRAL_API_KEY
    model: mistral/ministral-3b-latest
```
## Step 2. Start arch gateway

Once the config file is created, ensure that the `MISTRAL_API_KEY` and `OPENAI_API_KEY` environment variables are set (or defined in a `.env` file), for example as sketched below.
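For illustration, a minimal `.env` file could look like the following. The key values are placeholders, and the assumption here is that the file sits in the directory you run `planoai up` from:

```
# .env (hypothetical placeholder values; replace with your real keys)
OPENAI_API_KEY=sk-your-openai-key
MISTRAL_API_KEY=your-mistral-key
```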
Start the arch gateway:
```sh
$ planoai up config.yaml
# Or if installed with uv: uvx planoai up config.yaml

2024-12-05 11:24:51,288 - planoai.main - INFO - Starting plano cli version: 0.4.2
2024-12-05 11:24:51,825 - planoai.utils - INFO - Schema validation successful!
2024-12-05 11:24:51,825 - planoai.main - INFO - Starting arch model server and arch gateway
...
2024-12-05 11:25:16,131 - planoai.core - INFO - Container is healthy!
```
## Step 3. Interact with LLM

### Step 3.1: Using the OpenAI Python client

Make outbound calls via the Arch gateway:
```python
from openai import OpenAI

# Use the OpenAI client as usual
client = OpenAI(
    # No need to set a specific OpenAI API key since it's configured in Arch's gateway
    api_key="--",
    # Set the OpenAI API base URL to the Arch gateway endpoint
    base_url="http://127.0.0.1:12000/v1",
)

response = client.chat.completions.create(
    # The model is selected from the arch config file
    model="None",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print("OpenAI Response:", response.choices[0].message.content)
```
### Step 3.2: Using the curl command
```sh
$ curl --header 'Content-Type: application/json' \
  --data '{"messages": [{"role": "user","content": "What is the capital of France?"}], "model": "none"}' \
  http://localhost:12000/v1/chat/completions
```
```json
{
  ...
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      ...
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ],
  ...
}
```
Since no provider hint was supplied, the request above was routed to the default provider (`openai/gpt-4o`). You can override model selection using the `x-arch-llm-provider-hint` header. For example, to route the request to Mistral, use the following curl command:
```sh
$ curl --header 'Content-Type: application/json' \
  --header 'x-arch-llm-provider-hint: ministral-3b' \
  --data '{"messages": [{"role": "user","content": "What is the capital of France?"}], "model": "none"}' \
  http://localhost:12000/v1/chat/completions
```
```json
{
  ...
  "model": "ministral-3b-latest",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris. It is the most populous city in France and is known for its iconic landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also a major global center for art, fashion, gastronomy, and culture."
      },
      ...
    }
  ],
  ...
}
```
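The same per-request override should also work from the OpenAI Python client, since it only needs to send the `x-arch-llm-provider-hint` header. The sketch below assumes the gateway from Step 2 is still running on port 12000 and reuses the `ministral-3b` hint value from the curl example above; the header is passed through the client's `extra_headers` option:

```python
from openai import OpenAI

# Point the client at the Arch gateway, as in Step 3.1
client = OpenAI(api_key="--", base_url="http://127.0.0.1:12000/v1")

response = client.chat.completions.create(
    model="none",  # routing is decided by the gateway config
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    # Per-request provider override via the x-arch-llm-provider-hint header
    extra_headers={"x-arch-llm-provider-hint": "ministral-3b"},
)

print("Routed to:", response.model)
print("Response:", response.choices[0].message.content)
```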