.. _client_libraries:

Client Libraries
================

Plano provides a unified interface that works seamlessly with multiple client libraries and tools. You can use your preferred client library without changing your existing code: just point it at Plano's gateway endpoints.

Supported Clients
------------------

- **OpenAI SDK** - Full compatibility with OpenAI's official client
- **Anthropic SDK** - Native support for Anthropic's client library
- **cURL** - Direct HTTP requests from any programming language
- **Custom HTTP Clients** - Any HTTP client that supports REST APIs (see the cURL and ``requests`` examples below)

Gateway Endpoints
-----------------

Plano exposes three main endpoints:

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Endpoint
     - Purpose
   * - ``http://127.0.0.1:12000/v1/chat/completions``
     - OpenAI-compatible chat completions (LLM Gateway)
   * - ``http://127.0.0.1:12000/v1/responses``
     - OpenAI Responses API with :ref:`conversational state management <managing_conversational_state>` (LLM Gateway)
   * - ``http://127.0.0.1:12000/v1/messages``
     - Anthropic-compatible messages (LLM Gateway)

OpenAI (Python) SDK
-------------------

The OpenAI SDK works with any provider through Plano's OpenAI-compatible endpoint.

**Installation:**

.. code-block:: bash

   pip install openai

**Basic Usage:**

.. code-block:: python

   from openai import OpenAI

   # Point to Plano's LLM Gateway
   client = OpenAI(
       api_key="test-key",  # Can be any value for local testing
       base_url="http://127.0.0.1:12000/v1"
   )

   # Use any model configured in your plano_config.yaml
   completion = client.chat.completions.create(
       model="gpt-4o-mini",  # Or use a model alias like "fast-model"
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "Hello, how are you?"
           }
       ]
   )

   print(completion.choices[0].message.content)

**Streaming Responses:**

.. code-block:: python

   from openai import OpenAI

   client = OpenAI(
       api_key="test-key",
       base_url="http://127.0.0.1:12000/v1"
   )

   stream = client.chat.completions.create(
       model="gpt-4o-mini",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "Tell me a short story"
           }
       ],
       stream=True
   )

   # Print streaming chunks as they arrive
   for chunk in stream:
       if chunk.choices[0].delta.content:
           print(chunk.choices[0].delta.content, end="")

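If you also need the assembled text once the stream completes, accumulate the deltas instead of printing them. A minimal variant of the loop above, reusing the same ``client``:

.. code-block:: python

   chunks = []
   stream = client.chat.completions.create(
       model="gpt-4o-mini",
       max_tokens=50,
       messages=[{"role": "user", "content": "Tell me a short story"}],
       stream=True
   )
   for chunk in stream:
       delta = chunk.choices[0].delta.content
       if delta:
           chunks.append(delta)
   full_text = "".join(chunks)
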
**Using with Non-OpenAI Models:**

The OpenAI SDK can be used with any provider configured in Plano:

.. code-block:: python

   # Using a Claude model through the OpenAI SDK
   completion = client.chat.completions.create(
       model="claude-3-5-sonnet-20241022",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "Explain quantum computing briefly"
           }
       ]
   )

   # Using an Ollama model through the OpenAI SDK
   completion = client.chat.completions.create(
       model="llama3.1",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "What's the capital of France?"
           }
       ]
   )

OpenAI Responses API (Conversational State)
-------------------------------------------

The OpenAI Responses API (``v1/responses``) enables multi-turn conversations with automatic state management. Plano handles conversation history for you, so you don't need to manually include previous messages in each request.

See :ref:`managing_conversational_state` for detailed configuration and storage backend options.

**Installation:**

.. code-block:: bash

   pip install openai

**Basic Multi-Turn Conversation:**

.. code-block:: python

   from openai import OpenAI

   # Point to Plano's LLM Gateway
   client = OpenAI(
       api_key="test-key",
       base_url="http://127.0.0.1:12000/v1"
   )

   # First turn - creates a new conversation
   response = client.chat.completions.create(
       model="gpt-4o-mini",
       messages=[
           {"role": "user", "content": "My name is Alice"}
       ]
   )

   # Extract response_id for conversation continuity
   response_id = response.id
   print(f"Assistant: {response.choices[0].message.content}")

   # Second turn - continues the conversation;
   # Plano automatically retrieves and merges previous context
   response = client.chat.completions.create(
       model="gpt-4o-mini",
       messages=[
           {"role": "user", "content": "What's my name?"}
       ],
       metadata={"response_id": response_id}  # Reference previous conversation
   )

   print(f"Assistant: {response.choices[0].message.content}")
   # Output: "Your name is Alice"

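The same pattern extends naturally to a conversation loop: thread the latest ``response_id`` into each request and send only the newest user message. A minimal interactive sketch built on the example above (it assumes the ``metadata``-based continuation behaves as shown there):

.. code-block:: python

   response_id = None
   while True:
       user_input = input("You: ")
       if not user_input:
           break
       # Reference the previous turn only once we have one
       kwargs = {"metadata": {"response_id": response_id}} if response_id else {}
       response = client.chat.completions.create(
           model="gpt-4o-mini",
           messages=[{"role": "user", "content": user_input}],
           **kwargs
       )
       response_id = response.id
       print(f"Assistant: {response.choices[0].message.content}")
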
**Using with Any Provider:**

The Responses API works with any LLM provider configured in Plano:

.. code-block:: python

   # Multi-turn conversation with Claude
   response = client.chat.completions.create(
       model="claude-3-5-sonnet-20241022",
       messages=[
           {"role": "user", "content": "Let's discuss quantum physics"}
       ]
   )

   response_id = response.id

   # Continue conversation - Plano manages state regardless of provider
   response = client.chat.completions.create(
       model="claude-3-5-sonnet-20241022",
       messages=[
           {"role": "user", "content": "Tell me more about entanglement"}
       ],
       metadata={"response_id": response_id}
   )

**Key Benefits:**

* **Reduced payload size**: No need to send full conversation history in each request
* **Provider flexibility**: Use any configured LLM provider with state management
* **Automatic context merging**: Plano handles conversation continuity behind the scenes
* **Production-ready storage**: Configure :ref:`PostgreSQL or memory storage <managing_conversational_state>` based on your needs

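To make the payload-size point concrete, here is a sketch contrasting manual history threading (required with plain chat completions) against the stateful continuation shown above, reusing the ``client`` and ``response_id`` from the earlier examples; the stateful call carries only the newest message:

.. code-block:: python

   # Manual approach: every turn resends the full transcript
   history = [
       {"role": "user", "content": "My name is Alice"},
       {"role": "assistant", "content": "Nice to meet you, Alice!"},
       {"role": "user", "content": "What's my name?"},
   ]
   response = client.chat.completions.create(model="gpt-4o-mini", messages=history)

   # Stateful approach: only the newest message plus a response_id reference
   response = client.chat.completions.create(
       model="gpt-4o-mini",
       messages=[{"role": "user", "content": "What's my name?"}],
       metadata={"response_id": response_id}
   )
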
Anthropic (Python) SDK
----------------------

The Anthropic SDK works with any provider through Plano's Anthropic-compatible endpoint.

**Installation:**

.. code-block:: bash

   pip install anthropic

**Basic Usage:**

.. code-block:: python

   import anthropic

   # Point to Plano's LLM Gateway
   client = anthropic.Anthropic(
       api_key="test-key",  # Can be any value for local testing
       base_url="http://127.0.0.1:12000"
   )

   # Use any model configured in your plano_config.yaml
   message = client.messages.create(
       model="claude-3-5-sonnet-20241022",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "Hello, please respond briefly!"
           }
       ]
   )

   print(message.content[0].text)

**Streaming Responses:**

.. code-block:: python

   import anthropic

   client = anthropic.Anthropic(
       api_key="test-key",
       base_url="http://127.0.0.1:12000"
   )

   with client.messages.stream(
       model="claude-3-5-sonnet-20241022",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "Tell me about artificial intelligence"
           }
       ]
   ) as stream:
       # Print text deltas as they arrive
       for text in stream.text_stream:
           print(text, end="")

       # Get the final assembled message
       final_message = stream.get_final_message()
       final_text = "".join(block.text for block in final_message.content if block.type == "text")

**Using with Non-Anthropic Models:**

The Anthropic SDK can be used with any provider configured in Plano:

.. code-block:: python

   # Using an OpenAI model through the Anthropic SDK
   message = client.messages.create(
       model="gpt-4o-mini",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "Explain machine learning in simple terms"
           }
       ]
   )

   # Using an Ollama model through the Anthropic SDK
   message = client.messages.create(
       model="llama3.1",
       max_tokens=50,
       messages=[
           {
               "role": "user",
               "content": "What is Python programming?"
           }
       ]
   )

cURL Examples
-------------

For direct HTTP requests or integration with any programming language:

**OpenAI-Compatible Endpoint:**

.. code-block:: bash

   # Basic request
   curl -X POST http://127.0.0.1:12000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer test-key" \
     -d '{
       "model": "gpt-4o-mini",
       "messages": [
         {"role": "user", "content": "Hello!"}
       ],
       "max_tokens": 50
     }'

   # Using a model alias
   curl -X POST http://127.0.0.1:12000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "fast-model",
       "messages": [
         {"role": "user", "content": "Summarize this text..."}
       ],
       "max_tokens": 100
     }'

   # Streaming request
   curl -X POST http://127.0.0.1:12000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "gpt-4o-mini",
       "messages": [
         {"role": "user", "content": "Tell me a story"}
       ],
       "stream": true,
       "max_tokens": 200
     }'

**Anthropic-Compatible Endpoint:**

.. code-block:: bash

   # Basic request
   curl -X POST http://127.0.0.1:12000/v1/messages \
     -H "Content-Type: application/json" \
     -H "x-api-key: test-key" \
     -H "anthropic-version: 2023-06-01" \
     -d '{
       "model": "claude-3-5-sonnet-20241022",
       "max_tokens": 50,
       "messages": [
         {"role": "user", "content": "Hello Claude!"}
       ]
     }'

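**Custom HTTP Clients:**

Any HTTP client can issue the same requests. A minimal sketch using Python's ``requests`` package (an assumption: install it separately with ``pip install requests``) against the OpenAI-compatible endpoint shown above:

.. code-block:: python

   import requests

   # Same JSON payload as the cURL example above
   resp = requests.post(
       "http://127.0.0.1:12000/v1/chat/completions",
       headers={
           "Content-Type": "application/json",
           "Authorization": "Bearer test-key",
       },
       json={
           "model": "gpt-4o-mini",
           "messages": [{"role": "user", "content": "Hello!"}],
           "max_tokens": 50,
       },
       timeout=30,
   )
   resp.raise_for_status()
   print(resp.json()["choices"][0]["message"]["content"])
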
Cross-Client Compatibility
--------------------------

One of Plano's key features is cross-client compatibility. You can:

**Use OpenAI SDK with Claude Models:**

.. code-block:: python

   # OpenAI client calling a Claude model
   from openai import OpenAI

   client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

   response = client.chat.completions.create(
       model="claude-3-5-sonnet-20241022",  # Claude model
       messages=[{"role": "user", "content": "Hello"}]
   )

**Use Anthropic SDK with OpenAI Models:**

.. code-block:: python

   # Anthropic client calling an OpenAI model
   import anthropic

   client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")

   response = client.messages.create(
       model="gpt-4o-mini",  # OpenAI model
       max_tokens=50,
       messages=[{"role": "user", "content": "Hello"}]
   )

**Mix and Match with** :ref:`Model Aliases <model_aliases>`:

.. code-block:: python

   # The same code works with different underlying models
   def ask_question(client, question):
       return client.chat.completions.create(
           model="reasoning-model",  # The alias could point to any provider
           messages=[{"role": "user", "content": question}]
       )

   # Works regardless of what "reasoning-model" actually points to
   openai_client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")
   response = ask_question(openai_client, "Solve this math problem...")

Error Handling
--------------

**OpenAI SDK Error Handling:**

.. code-block:: python

   import openai
   from openai import OpenAI

   client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

   try:
       completion = client.chat.completions.create(
           model="nonexistent-model",
           messages=[{"role": "user", "content": "Hello"}]
       )
   except openai.NotFoundError as e:
       print(f"Model not found: {e}")
   except openai.APIError as e:
       print(f"API error: {e}")

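Since Plano sits behind the standard SDKs, the SDKs' built-in resilience knobs also apply. A brief sketch, assuming openai-python v1's ``timeout`` and ``max_retries`` client options:

.. code-block:: python

   from openai import OpenAI

   # Fail fast and retry transient errors at the client level
   client = OpenAI(
       base_url="http://127.0.0.1:12000/v1",
       api_key="test",
       timeout=10.0,    # seconds per request
       max_retries=2,   # automatic retries on retryable errors
   )
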
**Anthropic SDK Error Handling:**

.. code-block:: python

   import anthropic

   client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")

   try:
       message = client.messages.create(
           model="nonexistent-model",
           max_tokens=50,
           messages=[{"role": "user", "content": "Hello"}]
       )
   except anthropic.NotFoundError as e:
       print(f"Model not found: {e}")
   except anthropic.APIError as e:
       print(f"API error: {e}")

Best Practices
--------------

**Use** :ref:`Model Aliases <model_aliases>`:

Instead of hardcoding provider-specific model names, use semantic aliases:

.. code-block:: python

   # Good - uses semantic alias
   model = "fast-model"

   # Less ideal - hardcoded provider model
   model = "openai/gpt-4o-mini"

**Environment-Based Configuration:**

Use different :ref:`model aliases <model_aliases>` for different environments:

.. code-block:: python

   import os

   # Development defaults to a cheaper/faster model alias
   model = os.getenv("MODEL_ALIAS", "dev.chat.v1")

   response = client.chat.completions.create(
       model=model,
       messages=[{"role": "user", "content": "Hello"}]
   )

**Graceful Fallbacks:**

Implement fallback logic for better reliability:

.. code-block:: python

   def chat_with_fallback(client, messages, primary_model="smart-model", fallback_model="fast-model"):
       try:
           return client.chat.completions.create(model=primary_model, messages=messages)
       except Exception as e:
           print(f"Primary model failed, trying fallback: {e}")
           return client.chat.completions.create(model=fallback_model, messages=messages)

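For example, calling the helper with the ``client`` from the earlier sections:

.. code-block:: python

   response = chat_with_fallback(
       client,
       messages=[{"role": "user", "content": "Draft a release note"}]
   )
   print(response.choices[0].message.content)
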
See Also
--------

- :ref:`supported_providers` - Configure your providers and see available models
- :ref:`model_aliases` - Create semantic model names
- :ref:`llm_router` - Intelligent routing capabilities