Update docs to Plano (#639)

This commit is contained in:
Salman Paracha 2025-12-23 17:14:50 -08:00 committed by GitHub
parent 15fbb6c3af
commit e224cba3e3
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
139 changed files with 4407 additions and 24735 deletions


@ -1,70 +0,0 @@
.. _intro_to_arch:
Intro to Arch
=============
AI demos are easy to build. But past the thrill of a quick hack, you are left building, maintaining and scaling low-level plumbing code for agents that slows down AI innovation.
For example:
- You want to build specialized agents, but get stuck writing **routing and handoff** code.
- You get bogged down with prompt engineering work to **clarify user intent and validate inputs**.
- You want to **quickly and safely use new LLMs** but get stuck writing integration code.
- You waste cycles writing and maintaining **observability** code, when it can be transparent.
- You want to **apply guardrails**, but have to write custom code for each prompt and LLM.
Arch is designed to solve these problems by providing a unified, out-of-process architecture that integrates with your existing application stack, enabling you to focus on building high-level features rather than plumbing — all without locking you into a framework.
.. figure:: /_static/img/arch_network_diagram_high_level.png
:width: 100%
:align: center
High-level network flow of where Arch Gateway sits in your agentic stack. Designed for both ingress and egress prompt traffic.
`Arch <https://github.com/katanemo/arch>`_ is a smart edge and AI gateway for AI-native apps - built by the contributors of Envoy Proxy with the belief that:
*Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests
including secure handling, intelligent routing, robust observability, and integration with backend (API)
systems for personalization - all outside business logic.*
In practice, achieving the above goal is incredibly difficult. Arch attempts to do so by providing the following high level features:
**Out-of-process architecture, built on** `Envoy <http://envoyproxy.io/>`_:
Arch takes a dependency on Envoy and is a self-contained process that is designed to run alongside your application servers.
Arch uses Envoy's HTTP connection management subsystem, HTTP L7 filtering and telemetry capabilities to extend the functionality exclusively for prompts and LLMs.
This gives Arch several advantages:
* Arch builds on Envoy's proven success. Envoy is used at massive scale by the leading technology companies of our time including `AirBnB <https://www.airbnb.com>`_, `Dropbox <https://www.dropbox.com>`_, `Google <https://www.google.com>`_, `Reddit <https://www.reddit.com>`_, `Stripe <https://www.stripe.com>`_, etc. It's battle-tested, scales linearly with usage, and enables developers to focus on what really matters: application features and business logic.
* Arch works with any application language. A single Arch deployment can act as a gateway for AI applications written in Python, Java, C++, Go, PHP, etc.
* Arch can be deployed and upgraded quickly across your infrastructure transparently without the horrid pain of deploying library upgrades in your applications.
**Engineered with Fast Task-Specific LLMs (TLMs):** Arch is engineered with specialized LLMs that are designed for the fast, cost-effective and accurate handling of prompts.
These LLMs are designed to be best-in-class for critical tasks like:
* **Function Calling:** Arch helps you easily personalize your applications by enabling calls to application-specific (API) operations via user prompts.
This involves any predefined functions or APIs you want to expose to users to perform tasks, gather information, or manipulate data.
With function calling, you have flexibility to support "agentic" experiences tailored to specific use cases - from updating insurance claims to creating ad campaigns - via prompts.
Arch analyzes prompts, extracts critical information from prompts, engages in lightweight conversation to gather any missing parameters and makes API calls so that you can focus on writing business logic.
For more details, read :ref:`Function Calling <function_calling>`.
* **Prompt Guard:** Arch helps you improve the safety of your application by applying prompt guardrails in a centralized way for better governance hygiene.
With prompt guardrails you can prevent ``jailbreak attempts`` present in users' prompts without having to write a single line of code.
To learn more about how to configure guardrails available in Arch, read :ref:`Prompt Guard <prompt_guard>`.
**Traffic Management:** Arch offers several capabilities for LLM calls originating from your applications, including smart retries on errors from upstream LLMs, and automatic cut-over to other LLMs configured in Arch for continuous availability and disaster recovery scenarios.
Arch extends Envoy's `cluster subsystem <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/cluster_manager>`_ to manage upstream connections to LLMs so that you can build resilient AI applications.
**Front/edge Gateway:** There is substantial benefit in using the same software at the edge (observability, traffic shaping algorithms, applying guardrails, etc.) as for outbound LLM inference use cases.
Arch has the feature set that makes it exceptionally well suited as an edge gateway for AI applications.
This includes TLS termination, applying guardrails early in the process, intelligent parameter gathering from prompts, and prompt-based routing to backend APIs.
**Best-in-Class Monitoring:** Arch offers several monitoring metrics that help you understand three critical aspects of
your application: latency, token usage, and error rates by an upstream LLM provider. Latency measures the speed at which
your application is responding to users, which includes metrics like time to first token (TFT), time per output token (TOT),
and the total latency as perceived by users.
**End-to-End Tracing:** Arch propagates trace context using the W3C Trace Context standard, specifically through the ``traceparent`` header.
This allows each component in the system to record its part of the request flow, enabling end-to-end tracing across the entire application.
By using OpenTelemetry, Arch ensures that developers can capture this trace data consistently and in a format compatible with various observability tools.
For more details, read :ref:`Tracing <arch_overview_tracing>`.


@ -0,0 +1,56 @@
.. _intro_to_plano:
Intro to Plano
==============
Building agentic demos is easy. Delivering agentic applications safely, reliably, and repeatably to production is hard. After a quick hack, you end up building the "hidden AI middleware" to reach production: routing logic to reach the right agent, guardrail hooks for safety and moderation, evaluation and observability glue for continuous learning, and model/provider quirks — scattered across frameworks and application code.
Plano solves this by moving core delivery concerns into a unified, out-of-process dataplane. Core capabilities:
- **🚦 Orchestration:** Low-latency orchestration between agents, with the ability to add new agents without changing app code. When routing lives inside app code, it becomes hard to evolve and easy to duplicate. Moving orchestration into a centrally managed dataplane lets you change strategies without touching your agents, improving performance and reducing maintenance burden while avoiding tight coupling.
- **🛡️ Guardrails & Memory Hooks:** Apply jailbreak protection, content policies, and context workflows (e.g., rewriting, retrieval, redaction) once via :ref:`Filter Chains <filter_chain>` at the dataplane. Instead of re-implementing these in every agentic service, you get centralized governance, reduced code duplication, and consistent behavior across your stack.
- **🔗 Model Agility:** Route by model, alias (semantic names), or automatically via preferences so agents stay decoupled from specific providers. Swap or add models without refactoring prompts, tool-calling, or streaming handlers throughout your codebase by using Plano's smart routing and unified API.
- **🕵 Agentic Signals™:** Zero-code capture of behavior signals, traces, and metrics consistently across every agent. Rather than stitching together logging and metrics per framework, Plano surfaces traces, token usage, and learning signals in one place so you can iterate safely.
Built by core contributors to the widely adopted `Envoy Proxy <https://www.envoyproxy.io/>`_, Plano gives you a production-grade foundation for agentic applications. It helps **developers** stay focused on the core logic of their agents, helps **product teams** shorten feedback loops for learning, and helps **engineering teams** standardize policy and safety across agents and LLMs. Plano is grounded in open protocols (de facto: OpenAI-style ``v1/responses``, de jure: MCP) and proven patterns like sidecar deployments, so it plugs in cleanly while remaining robust, scalable, and flexible.
In practice, achieving the above goal is incredibly difficult. Plano attempts to do so by providing the following high-level features:
.. figure:: /_static/img/plano_network_diagram_high_level.png
:width: 100%
:align: center
High-level network flow of where Plano sits in your agentic stack. Designed for both ingress and egress prompt traffic.
**Engineered with Task-Specific LLMs (TLMs):** Plano is engineered with specialized LLMs that are designed for fast, cost-effective and accurate handling of prompts.
These LLMs are designed to be best-in-class for critical tasks like:
* **Agent Orchestration:** `Plano-Orchestrator <https://huggingface.co/collections/katanemo/plano-orchestrator>`_ is a family of state-of-the-art routing and orchestration models that decide which agent(s) or LLM(s) should handle each request, and in what sequence. Built for real-world multi-agent deployments, it analyzes user intent and conversation context to make precise routing and orchestration decisions while remaining efficient enough for low-latency production use across general chat, coding, and long-context multi-turn conversations.
* **Function Calling:** Plano lets you expose application-specific (API) operations as tools so that your agents can update records, fetch data, or trigger deterministic workflows via prompts. Under the hood this is backed by Arch-Function-Chat; for more details, read :ref:`Function Calling <function_calling>`.
* **Guardrails:** Plano helps you improve the safety of your application by applying prompt guardrails in a centralized way for better governance hygiene.
With prompt guardrails you can prevent ``jailbreak attempts`` present in users' prompts without having to write a single line of code; a configuration sketch follows this list.
To learn more about how to configure guardrails available in Plano, read :ref:`Prompt Guard <prompt_guard>`.
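As an illustrative sketch (mirroring the ``prompt_guards`` block that appears later in these docs; treat the exact field names as assumptions and see the Prompt Guard guide for the current schema), a jailbreak guardrail is declared once in configuration rather than inside each agent:

.. code-block:: yaml

    # Illustrative only: the guardrail lives at the dataplane, not in agent code.
    prompt_guards:
      input_guards:
        jailbreak:
          on_exception:
            message: I can only help with this application's tasks.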
**Model Proxy:** Plano offers several capabilities for LLM calls originating from your applications, including smart retries on errors from upstream LLMs and automatic cut-over to other LLMs configured in Plano for continuous availability and disaster recovery scenarios. From your application's perspective you keep using an OpenAI-compatible API, while Plano owns resiliency and failover policies in one place.
Plano extends Envoy's `cluster subsystem <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/cluster_manager>`_ to manage upstream connections to LLMs so that you can build resilient, provider-agnostic AI applications.
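As a sketch of what this looks like in practice (field names follow the ``model_providers`` examples later in the quickstart; retry and cut-over behavior is owned by Plano rather than expressed per request):

.. code-block:: yaml

    # Illustrative: two providers behind one OpenAI-compatible API.
    model_providers:
      - model: openai/gpt-4o
        access_key: $OPENAI_API_KEY
        default: true                      # primary model
      - model: mistral/ministral-3b-latest
        access_key: $MISTRAL_API_KEY       # candidate for retries and cut-over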
**Edge Proxy:** There is substantial benefit in using the same software at the edge (observability, traffic shaping algorithms, applying guardrails, etc.) as for outbound LLM inference use cases. Plano has the feature set that makes it exceptionally well suited as an edge gateway for AI applications.
This includes TLS termination, applying guardrails early in the request flow, and intelligently deciding which agent(s) or LLM(s) should handle each request and in what sequence. In practice, you configure listeners and policies once, and every inbound and outbound call flows through the same hardened gateway.
**Zero-Code Agent Signals™ & Tracing:** Zero-code capture of behavior signals, traces, and metrics consistently across every agent. Plano propagates trace context using the W3C Trace Context standard, specifically through the ``traceparent`` header. This allows each component in the system to record its part of the request flow, enabling end-to-end tracing across the entire application. By using OpenTelemetry, Plano ensures that developers can capture this trace data consistently and in a format compatible with various observability tools.
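For example, a caller that already participates in a distributed trace can pass the standard W3C header and Plano will propagate it downstream. The endpoint and trace ID below are illustrative; the agent listener on port ``8001`` is defined in the quickstart:

.. code-block:: bash

    $ curl --header 'Content-Type: application/json' \
        --header 'traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' \
        --data '{"messages": [{"role": "user","content": "Find me flights from SFO to JFK tomorrow"}], "model": "openai/gpt-4o"}' \
        http://localhost:8001/v1/chat/completions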
**Best-in-Class Monitoring:** Plano offers several monitoring metrics that help you understand three critical aspects of your application: latency, token usage, and error rates by an upstream LLM provider. Latency measures the speed at which your application is responding to users, which includes metrics like time to first token (TFT), time per output token (TOT), and the total latency as perceived by users.
**Out-of-process architecture, built on** `Envoy <http://envoyproxy.io/>`_:
Plano takes a dependency on Envoy and is a self-contained process that is designed to run alongside your application servers. Plano uses Envoy's HTTP connection management subsystem, HTTP L7 filtering and telemetry capabilities to extend the functionality exclusively for prompts and LLMs.
This gives Plano several advantages:
* Plano builds on Envoy's proven success. Envoy is used at massive scale by the leading technology companies of our time including `AirBnB <https://www.airbnb.com>`_, `Dropbox <https://www.dropbox.com>`_, `Google <https://www.google.com>`_, `Reddit <https://www.reddit.com>`_, `Stripe <https://www.stripe.com>`_, etc. It's battle-tested, scales linearly with usage, and enables developers to focus on what really matters: application features and business logic.
* Plano works with any application language. A single Plano deployment can act as a gateway for AI applications written in Python, Java, C++, Go, PHP, etc.
* Plano can be deployed and upgraded quickly across your infrastructure transparently without the horrid pain of deploying library upgrades in your applications.


@ -1,38 +1,38 @@
.. _overview:
Overview
============
`Arch <https://github.com/katanemo/arch>`_ is a smart edge and AI gateway for AI agents - one that is natively designed to handle and process prompts, not just network traffic.
========
`Plano <https://github.com/katanemo/plano>`_ is delivery infrastructure for agentic apps. A models-native proxy server and data plane designed to help you build agents faster, and deliver them reliably to production.
Built by contributors to the widely adopted `Envoy Proxy <https://www.envoyproxy.io/>`_, Arch handles the *pesky low-level work* in building agentic apps — like applying guardrails, clarifying vague user input, routing prompts to the right agent, and unifying access to any LLM. It's a protocol-friendly and framework-agnostic infrastructure layer designed to help you build and ship agentic apps faster.
Plano pulls out the rote plumbing work (the "hidden AI middleware") and decouples you from brittle, ever-changing framework abstractions. It centralizes what shouldn't be bespoke in every codebase: agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for UX and DX agility. Use any language or AI framework, and ship agents to production faster with Plano.
In this documentation, you will learn how to quickly set up Arch to trigger API calls via prompts, apply prompt guardrails without writing any application-level logic,
simplify the interaction with upstream LLMs, and improve observability all while simplifying your application development process.
Built by core contributors to the widely adopted `Envoy Proxy <https://www.envoyproxy.io/>`_, Plano gives you a production-grade foundation for agentic applications. It helps **developers** stay focused on the core logic of their agents, helps **product teams** shorten feedback loops for learning, and helps **engineering teams** standardize policy and safety across agents and LLMs. Plano is grounded in open protocols (de facto: OpenAI-style ``v1/responses``, de jure: MCP) and proven patterns like sidecar deployments, so it plugs in cleanly while remaining robust, scalable, and flexible.
.. figure:: /_static/img/arch_network_diagram_high_level.png
In this documentation, you'll learn how to set up Plano quickly, trigger API calls via prompts, apply guardrails without tight coupling with application code, simplify model and provider integration, and improve observability — so that you can focus on what matters most: the core product logic of your agents.
.. figure:: /_static/img/plano_network_diagram_high_level.png
:width: 100%
:align: center
High-level network flow of where Arch Gateway sits in your agentic stack. Designed for both ingress and egress prompt traffic.
High-level network flow of where Plano sits in your agentic stack. Designed for both ingress and egress traffic.
Get Started
-----------
This section introduces you to Arch and helps you get set up quickly:
This section introduces you to Plano and helps you get set up quickly:
.. grid:: 3
.. grid-item-card:: :octicon:`apps` Overview
:link: overview.html
Overview of Arch and Doc navigation
Overview of Plano and Doc navigation
.. grid-item-card:: :octicon:`book` Intro to Arch
:link: intro_to_arch.html
.. grid-item-card:: :octicon:`book` Intro to Plano
:link: intro_to_plano.html
Explore Arch's features and developer workflow
Explore Plano's features and developer workflow
.. grid-item-card:: :octicon:`rocket` Quickstart
:link: quickstart.html
@ -43,61 +43,61 @@ This section introduces you to Arch and helps you get set up quickly:
Concepts
--------
Deep dive into essential ideas and mechanisms behind Arch:
Deep dive into essential ideas and mechanisms behind Plano:
.. grid:: 3
.. grid-item-card:: :octicon:`package` Tech Overview
:link: ../concepts/tech_overview/tech_overview.html
.. grid-item-card:: :octicon:`package` Agents
:link: ../concepts/agents.html
Learn about the technology stack
Learn about how to build and scale agents with Plano
.. grid-item-card:: :octicon:`webhook` LLM Providers
.. grid-item-card:: :octicon:`webhook` Model Providers
:link: ../concepts/llm_providers/llm_providers.html
Explore Arch's LLM integration options
Explore Plano's LLM integration options
.. grid-item-card:: :octicon:`workflow` Prompt Target
:link: ../concepts/prompt_target.html
Understand how Arch handles prompts
Understand how Plano handles prompts
Guides
------
Step-by-step tutorials for practical Arch use cases and scenarios:
Step-by-step tutorials for practical Plano use cases and scenarios:
.. grid:: 3
.. grid-item-card:: :octicon:`shield-check` Prompt Guard
.. grid-item-card:: :octicon:`shield-check` Guardrails
:link: ../guides/prompt_guard.html
Instructions on securing and validating prompts
.. grid-item-card:: :octicon:`code-square` Function Calling
:link: ../guides/function_calling.html
.. grid-item-card:: :octicon:`code-square` LLM Routing
:link: ../guides/llm_router.html
A guide to effective function calling
A guide to effective model selection strategies
.. grid-item-card:: :octicon:`issue-opened` Observability
:link: ../guides/observability/observability.html
.. grid-item-card:: :octicon:`issue-opened` State Management
:link: ../guides/state.html
Learn to monitor and troubleshoot Arch
Learn to manage conversation and application state
Build with Arch
---------------
Build with Plano
----------------
For developers extending and customizing Arch for specialized needs:
End to end examples demonstrating how to build agentic applications using Plano:
.. grid:: 2
.. grid-item-card:: :octicon:`dependabot` Agentic Workflow
:link: ../build_with_arch/agent.html
.. grid-item-card:: :octicon:`dependabot` Build Agentic Apps
:link: ../get_started/quickstart.html#build-agentic-apps-with-plano
Discover how to create and manage custom agents within Arch
Discover how to create and manage custom agents within Plano
.. grid-item-card:: :octicon:`stack` RAG Application
:link: ../build_with_arch/rag.html
.. grid-item-card:: :octicon:`stack` Build Multi-LLM Apps
:link: ../get_started/quickstart.html#use-plano-as-a-model-proxy-gateway
Integrate RAG for knowledge-driven responses
Learn how to route LLM calls through Plano for enhanced control and observability


@ -1,10 +1,18 @@
.. _quickstart:
Quickstart
================
==========
Follow this guide to learn how to quickly set up Arch and integrate it into your generative AI applications.
Follow this guide to learn how to quickly set up Plano and integrate it into your generative AI applications. You can:
- :ref:`Build agents <quickstart_agents>` for multi-step workflows (e.g., travel assistants with flights and hotels).
- :ref:`Call deterministic APIs via prompt targets <quickstart_prompt_targets>` to turn instructions directly into function calls.
- :ref:`Use Plano as a model proxy (Gateway) <llm_routing_quickstart>` to standardize access to multiple LLM providers.
.. note::
This quickstart assumes basic familiarity with agents and prompt targets from the Concepts section. For background, see :ref:`Agents <agents>` and :ref:`Prompt Target <prompt_target>`.
The full agent and backend API implementations used here are available in the `plano-quickstart repository <https://github.com/plano-ai/plano-quickstart>`_. This guide focuses on wiring and configuring Plano (orchestration, prompt targets, and the model proxy), not application code.
Prerequisites
-------------
@ -15,32 +23,113 @@ Before you begin, ensure you have the following:
2. `Docker Compose <https://docs.docker.com/compose/install/>`_ (v2.29)
3. `Python <https://www.python.org/downloads/>`_ (v3.10+)
Arch's CLI allows you to manage and interact with the Arch gateway efficiently. To install the CLI, simply run the following command:
Plano's CLI allows you to manage and interact with Plano efficiently. To install the CLI, simply run the following command:
.. tip::
We recommend that developers create a new Python virtual environment to isolate dependencies before installing Arch. This ensures that ``archgw`` and its dependencies do not interfere with other packages on your system.
We recommend that developers create a new Python virtual environment to isolate dependencies before installing Plano. This ensures that ``plano`` and its dependencies do not interfere with other packages on your system.
.. code-block:: console
$ python -m venv venv
$ source venv/bin/activate # On Windows, use: venv\Scripts\activate
$ pip install archgw==0.3.22
$ pip install plano==0.4.0
Build AI Agent with Arch Gateway
--------------------------------
Build Agentic Apps with Plano
-----------------------------
In the following quickstart, we will show you how easy it is to build an AI agent with the Arch gateway. We will build a currency exchange agent using the following simple steps. For this demo, we will use `https://api.frankfurter.dev/` to fetch the latest prices for currencies and assume USD as the base currency.
Plano helps you build agentic applications in two complementary ways:
Step 1. Create arch config file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Orchestrate agents**: Let Plano decide which agent or LLM should handle each request and in what sequence.
* **Call deterministic backends**: Use prompt targets to turn natural-language prompts into structured, validated API calls.
Create ``arch_config.yaml`` file with the following content:
.. _quickstart_agents:
Building agents with Plano orchestration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Agents are where your business logic lives (the "inner loop"). Plano takes care of the "outer loop"—routing, sequencing, and managing calls across agents and LLMs.
At a high level, building agents with Plano looks like this:
1. **Implement your agent** in your framework of choice (Python, JS/TS, etc.), exposing it as an HTTP service (a minimal sketch appears after this list).
2. **Route LLM calls through Plano's Model Proxy**, so all models share a consistent interface and observability.
3. **Configure Plano to orchestrate**: define which agent(s) can handle which kinds of prompts, and let Plano decide when to call an agent vs. an LLM.
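To make step 1 concrete, below is a minimal, illustrative flights agent exposed as an HTTP service on port ``10520``, matching the flights service URL in the configuration that follows. It assumes the agent accepts OpenAI-style chat completion requests; adapt the route and payload to whatever contract your agents actually use.

.. code-block:: python

    # Illustrative flights agent; assumes an OpenAI-style chat interface.
    # Install with: pip install fastapi uvicorn
    from fastapi import FastAPI, Request
    import uvicorn

    app = FastAPI()

    @app.post("/v1/chat/completions")
    async def chat(request: Request):
        body = await request.json()
        user_msg = body["messages"][-1]["content"]
        # Real business logic (flight search, status lookups, ...) goes here.
        reply = f"Searching flights for: {user_msg}"
        return {
            "object": "chat.completion",
            "choices": [
                {"index": 0, "message": {"role": "assistant", "content": reply}}
            ],
        }

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=10520)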
This quickstart uses a simplified version of the Travel Booking Assistant; for the full multi-agent walkthrough, see :ref:`Orchestration <agent_routing>`.
Step 1. Minimal orchestration config
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is a minimal configuration that wires Plano-Orchestrator to two HTTP services: one for flights and one for hotels.
.. code-block:: yaml

    version: v0.1.0

    agents:
      - id: flight_agent
        url: http://host.docker.internal:10520   # your flights service
      - id: hotel_agent
        url: http://host.docker.internal:10530   # your hotels service

    model_providers:
      - model: openai/gpt-4o
        access_key: $OPENAI_API_KEY

    listeners:
      - type: agent
        name: travel_assistant
        port: 8001
        router: plano_orchestrator_v1
        agents:
          - id: flight_agent
            description: Search for flights and provide flight status.
          - id: hotel_agent
            description: Find hotels and check availability.

    tracing:
      random_sampling: 100
Step 2. Start your agents and Plano
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Run your ``flight_agent`` and ``hotel_agent`` services (see :ref:`Orchestration <agent_routing>` for a full Travel Booking example), then start Plano with the config above:
.. code-block:: console
$ plano up plano_config.yaml
Plano will start the orchestrator and expose an agent listener on port ``8001``.
Step 3. Send a prompt and let Plano route
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Now send a request to Plano using the OpenAI-compatible chat completions API—the orchestrator will analyze the prompt and route it to the right agent based on intent:
.. code-block:: bash
$ curl --header 'Content-Type: application/json' \
--data '{"messages": [{"role": "user","content": "Find me flights from SFO to JFK tomorrow"}], "model": "openai/gpt-4o"}' \
http://localhost:8001/v1/chat/completions
You can then ask a follow-up like "Also book me a hotel near JFK" and Plano-Orchestrator will route to ``hotel_agent``—your agents stay focused on business logic while Plano handles routing.
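For example, the follow-up turn reuses the same endpoint and payload shape; only the conversation grows (the assistant message below is a placeholder for whatever the flight agent returned):

.. code-block:: bash

    $ curl --header 'Content-Type: application/json' \
        --data '{"messages": [
              {"role": "user", "content": "Find me flights from SFO to JFK tomorrow"},
              {"role": "assistant", "content": "Here are a few options on that route..."},
              {"role": "user", "content": "Also book me a hotel near JFK"}
            ], "model": "openai/gpt-4o"}' \
        http://localhost:8001/v1/chat/completions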
.. _quickstart_prompt_targets:
Deterministic API calls with prompt targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Next, we'll show Plano's deterministic API calling using a single prompt target. We'll build a currency exchange backend powered by `https://api.frankfurter.dev/`, assuming USD as the base currency.
Step 1. Create plano config file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Create ``plano_config.yaml`` file with the following content:
.. code-block:: yaml
version: v0.1.0
listeners:
ingress_traffic:
@ -49,19 +138,13 @@ Create ``arch_config.yaml`` file with the following content:
message_format: openai
timeout: 30s
llm_providers:
model_providers:
- access_key: $OPENAI_API_KEY
model: openai/gpt-4o
system_prompt: |
You are a helpful assistant.
prompt_guards:
input_guards:
jailbreak:
on_exception:
message: Looks like you're curious about my abilities, but I can only provide assistance for currency exchange.
prompt_targets:
- name: currency_exchange
description: Get currency exchange rate from USD to other currencies
@ -88,16 +171,16 @@ Create ``arch_config.yaml`` file with the following content:
endpoint: api.frankfurter.dev:443
protocol: https
Step 2. Start arch gateway with currency conversion config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 2. Start plano with currency conversion config
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: sh
$ archgw up arch_config.yaml
2024-12-05 16:56:27,979 - cli.main - INFO - Starting archgw cli version: 0.1.5
$ plano up plano_config.yaml
2024-12-05 16:56:27,979 - cli.main - INFO - Starting plano cli version: 0.1.5
...
2024-12-05 16:56:28,485 - cli.utils - INFO - Schema validation successful!
2024-12-05 16:56:28,485 - cli.main - INFO - Starting arch model server and arch gateway
2024-12-05 16:56:28,485 - cli.main - INFO - Starting plano model server and plano gateway
...
2024-12-05 16:56:51,647 - cli.core - INFO - Container is healthy!
@ -106,7 +189,7 @@ Once the gateway is up, you can start interacting with it at port 10000 using th
Some sample queries you can ask include: ``what is currency rate for gbp?`` or ``show me list of currencies for conversion``.
Step 3. Interacting with gateway using curl command
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is a sample curl command you can use to interact:
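For instance, a request shaped like the following asks for the GBP rate. This is an illustrative sketch assuming the default ingress listener on port ``10000`` and the OpenAI-compatible chat completions endpoint:

.. code-block:: bash

    $ curl --header 'Content-Type: application/json' \
        --data '{"messages": [{"role": "user","content": "what is currency rate for gbp?"}], "model": "none"}' \
        http://localhost:10000/v1/chat/completions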
@ -129,15 +212,17 @@ And to get the list of supported currencies:
"Here is a list of the currencies that are supported for conversion from USD, along with their symbols:\n\n1. AUD - Australian Dollar\n2. BGN - Bulgarian Lev\n3. BRL - Brazilian Real\n4. CAD - Canadian Dollar\n5. CHF - Swiss Franc\n6. CNY - Chinese Renminbi Yuan\n7. CZK - Czech Koruna\n8. DKK - Danish Krone\n9. EUR - Euro\n10. GBP - British Pound\n11. HKD - Hong Kong Dollar\n12. HUF - Hungarian Forint\n13. IDR - Indonesian Rupiah\n14. ILS - Israeli New Sheqel\n15. INR - Indian Rupee\n16. ISK - Icelandic Króna\n17. JPY - Japanese Yen\n18. KRW - South Korean Won\n19. MXN - Mexican Peso\n20. MYR - Malaysian Ringgit\n21. NOK - Norwegian Krone\n22. NZD - New Zealand Dollar\n23. PHP - Philippine Peso\n24. PLN - Polish Złoty\n25. RON - Romanian Leu\n26. SEK - Swedish Krona\n27. SGD - Singapore Dollar\n28. THB - Thai Baht\n29. TRY - Turkish Lira\n30. USD - United States Dollar\n31. ZAR - South African Rand\n\nIf you want to convert USD to any of these currencies, you can select the one you are interested in."
Use Arch Gateway as LLM Router
------------------------------
.. _llm_routing_quickstart:
Step 1. Create arch config file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use Plano as a Model Proxy (Gateway)
------------------------------------
Arch operates based on a configuration file where you can define LLM providers, prompt targets, guardrails, etc. Below is an example configuration that defines OpenAI and Mistral LLM providers.
Step 1. Create plano config file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Create ``arch_config.yaml`` file with the following content:
Plano operates based on a configuration file where you can define LLM providers, prompt targets, guardrails, etc. Below is an example configuration that defines OpenAI and Mistral LLM providers.
Create ``plano_config.yaml`` file with the following content:
.. code-block:: yaml
@ -150,7 +235,7 @@ Create ``arch_config.yaml`` file with the following content:
message_format: openai
timeout: 30s
llm_providers:
model_providers:
- access_key: $OPENAI_API_KEY
model: openai/gpt-4o
default: true
@ -158,19 +243,19 @@ Create ``arch_config.yaml`` file with the following content:
- access_key: $MISTRAL_API_KEY
model: mistral/ministral-3b-latest
Step 2. Start arch gateway
~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 2. Start plano
~~~~~~~~~~~~~~~~~~~
Once the config file is created, ensure that you have environment variables set up for ``MISTRAL_API_KEY`` and ``OPENAI_API_KEY`` (or these are defined in a ``.env`` file).
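For example, you can export them in your shell before starting Plano (or put the same values in a ``.env`` file next to your config; the key values below are placeholders):

.. code-block:: console

    $ export OPENAI_API_KEY=sk-...
    $ export MISTRAL_API_KEY=...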
Start the Arch gateway:
Start Plano:
.. code-block:: console
$ archgw up arch_config.yaml
2024-12-05 11:24:51,288 - cli.main - INFO - Starting archgw cli version: 0.1.5
$ plano up plano_config.yaml
2024-12-05 11:24:51,288 - cli.main - INFO - Starting plano cli version: 0.1.5
2024-12-05 11:24:51,825 - cli.utils - INFO - Schema validation successful!
2024-12-05 11:24:51,825 - cli.main - INFO - Starting arch model server and arch gateway
2024-12-05 11:24:51,825 - cli.main - INFO - Starting plano
...
2024-12-05 11:25:16,131 - cli.core - INFO - Container is healthy!
@ -178,9 +263,9 @@ Step 3: Interact with LLM
~~~~~~~~~~~~~~~~~~~~~~~~~
Step 3.1: Using OpenAI Python client
++++++++++++++++++++++++++++++++++++
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make outbound calls via the Arch gateway:
Make outbound calls via the Plano gateway:
.. code-block:: python
@ -188,14 +273,14 @@ Make outbound calls via the Arch gateway:
# Use the OpenAI client as usual
client = OpenAI(
# No need to set a specific openai.api_key since it's configured in Arch's gateway
# No need to set a specific openai.api_key since it's configured in Plano's gateway
api_key='--',
# Set the OpenAI API base URL to the Arch gateway endpoint
# Set the OpenAI API base URL to the Plano gateway endpoint
base_url="http://127.0.0.1:12000/v1"
)
response = client.chat.completions.create(
# we select model from arch_config file
# we select model from plano_config file
model="--",
messages=[{"role": "user", "content": "What is the capital of France?"}],
)
@ -203,7 +288,7 @@ Make outbound calls via the Arch gateway:
print("OpenAI Response:", response.choices[0].message.content)
Step 3.2: Using curl command
++++++++++++++++++++++++++++
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
@ -225,38 +310,13 @@ Step 3.2: Using curl command
],
}
You can override model selection using the ``x-arch-llm-provider-hint`` header. For example, to use Mistral, use the following curl command:
.. code-block:: bash
$ curl --header 'Content-Type: application/json' \
--header 'x-arch-llm-provider-hint: ministral-3b' \
--data '{"messages": [{"role": "user","content": "What is the capital of France?"}], "model": "none"}' \
http://localhost:12000/v1/chat/completions
{
...
"model": "ministral-3b-latest",
"choices": [
{
"messages": {
"role": "assistant",
"content": "The capital of France is Paris. It is the most populous city in France and is known for its iconic landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also a major global center for art, fashion, gastronomy, and culture.",
},
...
}
],
...
}
Next Steps
==========
Congratulations! You've successfully set up Arch and made your first prompt-based request. To further enhance your GenAI applications, explore the following resources:
Congratulations! You've successfully set up Plano and made your first prompt-based request. To further enhance your GenAI applications, explore the following resources:
- :ref:`Full Documentation <overview>`: Comprehensive guides and references.
- `GitHub Repository <https://github.com/katanemo/arch>`_: Access the source code, contribute, and track updates.
- `Support <https://github.com/katanemo/arch#contact>`_: Get help and connect with the Arch community.
- `GitHub Repository <https://github.com/katanemo/plano>`_: Access the source code, contribute, and track updates.
- `Support <https://github.com/katanemo/plano#contact>`_: Get help and connect with the Plano community.
With Arch, building scalable, fast, and personalized GenAI applications has never been easier. Dive deeper into Arch's capabilities and start creating innovative AI-driven experiences today!
With Plano, building scalable, fast, and personalized GenAI applications has never been easier. Dive deeper into Plano's capabilities and start creating innovative AI-driven experiences today!