Mirror of https://github.com/katanemo/plano.git (synced 2026-05-05 22:02:43 +02:00)

Doc Update (#129)

* init update
* Update terminology.rst
* fix the branch to create an index.html, and fix pre-commit issues
* Doc update
* made several changes to the docs after Shuguang's revision
* fixing pre-commit issues
* fixed the reference file to the final prompt config file
* added google analytics

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>

Parent: 2a7b95582c
Commit: 5c7567584d
49 changed files with 1185 additions and 609 deletions
docs/source/get_started/includes/quickstart.yaml (new file, 47 lines)
version: "0.1-beta"

listen:
  address: 127.0.0.1 | 0.0.0.0
  port_value: 8080 # If you configure port 443, you'll need to update the listener with tls_certificates

system_prompt: |
  You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.

llm_providers:
  - name: "OpenAI"
    provider: "openai"
    access_key: OPENAI_API_KEY
    model: gpt-4o
    stream: true

prompt_targets:
  - name: reboot_devices
    description: >
      This prompt target handles user requests to reboot devices. It ensures that when users request
      to reboot specific devices or device groups, the system processes the reboot commands accurately.

      **Examples of user prompts:**

      - "Please reboot device 12345."
      - "Restart all devices in tenant group tenant-XYZ."
      - "I need to reboot devices A, B, and C."
    path: /agent/device_reboot
    parameters:
      - name: "device_ids"
        type: list # Options: integer | float | list | dictionary | set
        description: "A list of device identifiers (IDs) to reboot."
        required: false
      - name: "device_group"
        type: string # Options: string | integer | float | list | dictionary | set
        description: "The name of the device group to reboot."
        required: false

# Arch creates round-robin load balancing between different endpoints, managed via the cluster subsystem.
endpoints:
  app_server:
    # value could be an IP address or a hostname with port
    # this could also be a list of endpoints for load balancing, for example endpoint: [ ip1:port, ip2:port ]
    endpoint: "127.0.0.1:80"
    # max time to wait for a connection to be established
    connect_timeout: 0.005s
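The `endpoints` comment above mentions round-robin load balancing across upstream endpoints. As a hypothetical illustration of that scheduling policy (not Arch's actual Envoy-based implementation; the endpoint addresses are made up), round-robin over a list of `ip:port` strings can be sketched as:

```python
from itertools import cycle

def round_robin(endpoints):
    """Yield endpoints in round-robin order, as a simple load balancer would."""
    return cycle(endpoints)

picker = round_robin(["10.0.0.1:80", "10.0.0.2:80"])
first_four = [next(picker) for _ in range(4)]
print(first_four)  # alternates between the two endpoints
```

In practice this scheduling is handled for you by Envoy's cluster subsystem; the sketch only shows the rotation order a client would observe.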
docs/source/get_started/intro_to_arch.rst (new file, 90 lines)
.. _intro_to_arch:

Intro to Arch
=============

Arch is an intelligent `(Layer 7) <https://www.cloudflare.com/learning/ddos/what-is-layer-7/>`_ gateway
designed for generative AI apps, AI agents, and co-pilots that work with prompts. Engineered with purpose-built
large language models (LLMs), Arch handles all the critical but undifferentiated tasks related to the handling and
processing of prompts, including detecting and rejecting `jailbreak <https://github.com/verazuo/jailbreak_llms>`_
attempts, intelligently calling "backend" APIs to fulfill the user's request represented in a prompt, routing to
and offering disaster recovery between upstream LLMs, and managing the observability of prompts and LLM interactions
in a centralized way.

.. image:: /_static/img/arch-logo.png
   :width: 100%
   :align: center
**The project was born out of the belief that:**

*Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests,
including secure handling, intelligent routing, robust observability, and integration with backend (API)
systems for personalization - all outside business logic.*

In practice, achieving the above goal is incredibly difficult. Arch attempts to do so by providing the
following high-level features:

_____________________________________________________________________________________________________________
**Out-of-process architecture, built on** `Envoy <http://envoyproxy.io/>`_: Arch takes a dependency on
Envoy and is a self-contained process designed to run alongside your application servers. Arch uses
Envoy's HTTP connection management subsystem, HTTP L7 filtering, and telemetry capabilities to extend that
functionality exclusively for prompts and LLMs. This gives Arch several advantages:

* Arch builds on Envoy's proven success. Envoy is used at massive scale by the leading technology companies of
  our time, including `AirBnB <https://www.airbnb.com>`_, `Dropbox <https://www.dropbox.com>`_,
  `Google <https://www.google.com>`_, `Reddit <https://www.reddit.com>`_, `Stripe <https://www.stripe.com>`_,
  etc. It's battle-tested, scales linearly with usage, and lets developers focus on what really matters:
  application features and business logic.

* Arch works with any application language. A single Arch deployment can act as a gateway for AI applications
  written in Python, Java, C++, Go, PHP, etc.

* Arch can be deployed and upgraded quickly across your infrastructure, transparently, without the pain
  of rolling out library upgrades in your applications.
**Engineered with Fast LLMs:** Arch is engineered with specialized (sub-billion parameter) LLMs that are designed for
fast, cost-effective, and accurate handling of prompts. These LLMs are designed to be
best-in-class for critical prompt-related tasks like:
* **Function/API Calling:** Arch helps you easily personalize your applications by enabling calls to
  application-specific (API) operations via user prompts. This covers any predefined functions or APIs
  you want to expose to users to perform tasks, gather information, or manipulate data. With function calling,
  you have the flexibility to support "agentic" experiences tailored to specific use cases - from updating insurance
  claims to creating ad campaigns - via prompts. Arch analyzes prompts, extracts critical information from them,
  engages in lightweight conversation to gather any missing parameters, and makes API calls so that you can
  focus on writing business logic. For more details, read :ref:`prompt processing <arch_overview_prompt_handling>`.

* **Prompt Guardrails:** Arch helps you improve the safety of your application by applying prompt guardrails in
  a centralized way for better governance hygiene. With prompt guardrails you can prevent `jailbreak <https://github.com/verazuo/jailbreak_llms>`_
  attempts or toxicity present in users' prompts without having to write a single line of code. To learn more
  about how to configure the guardrails available in Arch, read :ref:`prompt processing <arch_overview_prompt_handling>`.

* **[Coming Soon] Intent-Markers:** Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
  or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
  questions. Specifically, when users ask for modifications or additions to previous responses, their AI applications
  often generate entirely new responses instead of adjusting the previous ones. Arch offers intent-markers as a
  feature so that developers know when the user has shifted away from the previous intent, so that they can improve
  their retrieval, lower overall token cost, and dramatically improve the speed and accuracy of their responses back
  to users. For more details, read :ref:`intent markers <arch_rag_guide>`.
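To make the parameter-gathering step described above concrete, here is a hypothetical sketch (function name and schema invented for illustration; this is not Arch's internal logic) of checking a prompt target's parameter schema against what has been extracted from a prompt so far, to decide which values the gateway still needs to ask for:

```python
def missing_parameters(target_params, extracted):
    """Return names of required parameters not yet extracted from the prompt."""
    return [p["name"] for p in target_params
            if p.get("required", False) and p["name"] not in extracted]

# Schema mirroring the quickstart's reboot_devices target, but with
# device_ids marked required purely for the sake of this example.
params = [
    {"name": "device_ids", "type": "list", "required": True},
    {"name": "device_group", "type": "string", "required": False},
]
print(missing_parameters(params, {}))                       # device_ids still needed
print(missing_parameters(params, {"device_ids": [12345]}))  # nothing missing
```

When the list is non-empty, a gateway like Arch would continue the lightweight conversation to collect the missing values before making the backend API call.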
**Traffic Management:** Arch offers several capabilities for LLM calls originating from your applications, including smart
retries on errors from upstream LLMs and automatic cutover to other LLMs configured in Arch for continuous availability
and disaster-recovery scenarios. Arch extends Envoy's `cluster subsystem <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/cluster_manager>`_
to manage upstream connections to LLMs so that you can build resilient AI applications.
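The retry-and-cutover behavior described above can be sketched as follows (a hypothetical illustration, not Arch's Envoy-based implementation; the provider functions and error type are invented stand-ins):

```python
def call_with_failover(providers, request, max_retries=2):
    """Try each provider in order; retry transient failures before cutting over."""
    errors = []
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider(request)
            except RuntimeError as exc:  # stand-in for an upstream LLM error
                errors.append((provider.__name__, attempt, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_provider(request):
    raise RuntimeError("503 from upstream")

def backup_provider(request):
    return f"answer to {request!r}"

result = call_with_failover([flaky_provider, backup_provider], "reboot device 12345")
print(result)
```

The gateway version of this logic runs out of process, so every application behind Arch gets the same retry and disaster-recovery behavior without code changes.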
**Front/edge Gateway:** There is substantial benefit in using the same software at the edge (observability,
traffic-shaping algorithms, applying guardrails, etc.) as for outbound LLM inference use cases. Arch has the feature set
that makes it exceptionally well suited as an edge gateway for AI applications. This includes TLS termination, applying
guardrails early in the process, intelligent parameter gathering from prompts, and prompt-based routing to backend APIs.
**Best-in-class Monitoring:** Arch offers several monitoring metrics that help you understand three critical aspects of
your application: latency, token usage, and error rates by upstream LLM provider. Latency measures the speed at which
your application responds to users, and includes metrics like time to first token (TFT), time per output token (TOT),
and the total latency as perceived by users.
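As a hypothetical illustration of how the latency metrics above can be derived from token arrival timestamps (the numbers are invented; this is not how Arch computes them internally):

```python
def latency_metrics(request_time, token_times):
    """Compute time-to-first-token (TFT), time-per-output-token (TOT),
    and total latency, all in seconds."""
    tft = token_times[0] - request_time
    total = token_times[-1] - request_time
    # average spacing between output tokens after the first one
    tot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return {"tft": tft, "tot": tot, "total": total}

# Request sent at t=0s; four tokens streamed back at the times below.
metrics = latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
print(metrics)
```

Here TFT is 0.25s, TOT is roughly 0.05s, and total latency is 0.40s; a gateway can aggregate these per upstream provider to compare them.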
**End-to-End Tracing:** Arch propagates trace context using the W3C Trace Context standard, specifically through the
``traceparent`` header. This allows each component in the system to record its part of the request flow, enabling **end-to-end tracing**
across the entire application. By using OpenTelemetry, Arch ensures that developers can capture this trace data consistently and
in a format compatible with various observability tools. For more details, read :ref:`tracing <arch_overview_tracing>`.
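Per the W3C Trace Context specification, a ``traceparent`` header has the form ``{version}-{trace-id}-{parent-id}-{trace-flags}``, with a 32-hex-digit trace ID and a 16-hex-digit parent span ID. A small self-contained sketch of parsing one (the header value is the spec's canonical example format, not output from Arch):

```python
def parse_traceparent(header):
    """Split a W3C traceparent header into its four fields."""
    version, trace_id, parent_id, flags = header.split("-")
    assert len(trace_id) == 32 and len(parent_id) == 16  # hex field widths per spec
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}

tp = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(tp["trace_id"])  # 0af7651916cd43dd8448eb211c80319c
```

Each hop keeps the trace ID, substitutes its own span ID as the new parent ID, and forwards the header, which is what lets a collector stitch the full request flow back together.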
docs/source/get_started/overview.rst (new file, 91 lines)
Overview
========

Welcome to Arch, the intelligent prompt gateway designed to help developers build **fast**, **secure**, and **personalized** generative AI apps at ANY scale.
In this documentation, you will learn how to quickly set up Arch to trigger API calls via prompts, apply prompt guardrails without writing any application-level logic,
simplify the interaction with upstream LLMs, and improve observability, all while simplifying your application development process.


Get Started
-----------

This section introduces you to Arch and helps you get set up quickly:

.. grid:: 3

    .. grid-item-card:: Overview
        :link: overview.html

        Overview of Arch and doc navigation

    .. grid-item-card:: Intro to Arch
        :link: intro_to_arch.html

        Explore Arch's features and developer workflow

    .. grid-item-card:: Quickstart
        :link: quickstart.html

        Learn how to quickly set up and integrate Arch


Concepts
--------

Deep dive into essential ideas and mechanisms behind Arch:

.. grid:: 3

    .. grid-item-card:: Tech Overview
        :link: ../Concepts/tech_overview/tech_overview.html

        Learn about the technology stack

    .. grid-item-card:: LLM Provider
        :link: ../Concepts/llm_provider.html

        Explore Arch's LLM integration options

    .. grid-item-card:: Targets
        :link: ../Concepts/prompt_target.html

        Understand how Arch handles prompts


Guides
------

Step-by-step tutorials for practical Arch use cases and scenarios:

.. grid:: 3

    .. grid-item-card:: Prompt Guard
        :link: ../guides/tech_overview/tech_overview.html

        Instructions on securing and validating prompts

    .. grid-item-card:: Function Calling
        :link: ../guides/function_calling.html

        A guide to effective function calling

    .. grid-item-card:: Observability
        :link: ../guides/prompt_target.html

        Learn to monitor and troubleshoot Arch


Build with Arch
---------------

For developers extending and customizing Arch for specialized needs:

.. grid:: 2

    .. grid-item-card:: Agentic Workflow
        :link: ../build_with_arch/agent.html

        Discover how to create and manage custom agents within Arch

    .. grid-item-card:: RAG Application
        :link: ../build_with_arch/rag.html

        Integrate RAG for knowledge-driven responses
docs/source/get_started/quickstart.rst (new file, 84 lines)
.. _quickstart:

Quickstart
==========

Follow this guide to learn how to quickly set up Arch and integrate it into your generative AI applications.


Prerequisites
-------------

Before you begin, ensure you have the following:

.. vale Vale.Spelling = NO

- ``Docker`` & ``Python`` installed on your system
- ``API Keys`` for LLM providers (if using external LLMs)

The fastest way to get started using Arch is with the `katanemo/arch <https://hub.docker.com/r/katanemo/arch>`_ pre-built binaries.
You can also build it from source.
Step 1: Install Arch
--------------------

Arch's CLI allows you to manage and interact with the Arch gateway efficiently. To install the CLI, simply
run the following command:

.. code-block:: console

    $ pip install archgw

This will install the ``archgw`` command-line tool globally on your system.

.. tip::
    We recommend that developers create a new Python virtual environment to isolate dependencies before installing Arch.
    This ensures that ``archgw`` and its dependencies do not interfere with other packages on your system.

    To create and activate a virtual environment, you can run the following commands:

    .. code-block:: console

        $ python -m venv venv
        $ source venv/bin/activate  # On Windows, use: venv\Scripts\activate
        $ pip install archgw
Step 2: Configure Arch
----------------------

Arch operates based on a configuration file in which you can define LLM providers, prompt targets, guardrails, and more.
Below is an example configuration to get you started, including:

.. vale Vale.Spelling = NO

- ``endpoints``: Specifies where Arch listens for incoming prompts.
- ``system_prompts``: Defines predefined prompts to set the context for interactions.
- ``llm_providers``: Lists the LLM providers Arch can route prompts to.
- ``prompt_guards``: Sets up rules to detect and reject undesirable prompts.
- ``prompt_targets``: Defines endpoints that handle specific types of prompts.
- ``error_target``: Specifies where to route errors for handling.

.. literalinclude:: includes/quickstart.yaml
    :language: yaml
Step 3: Start Arch Gateway
--------------------------

.. code-block:: console

    $ archgw up [path_to_config]
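Once the gateway is up, you can send it a prompt. The snippet below is a minimal sketch under the assumption that Arch exposes an OpenAI-style chat-completions endpoint on the configured listener (``127.0.0.1:8080``); the endpoint path and payload shape are assumptions, not confirmed by this guide. The request builder runs standalone, while the commented-out call requires a running gateway:

```python
import json
from urllib import request

def build_chat_request(prompt, url="http://127.0.0.1:8080/v1/chat/completions"):
    """Build an OpenAI-style chat request aimed at the gateway (assumed endpoint)."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

req = build_chat_request("Please reboot device 12345.")
print(req.full_url)
# To actually send it (requires `archgw up` to be running):
#   with request.urlopen(req) as resp:
#       print(json.load(resp))
```

Because the prompt matches the quickstart's ``reboot_devices`` target, the gateway would extract the device ID and call the backend ``/agent/device_reboot`` path rather than simply forwarding the prompt to the LLM.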
Next Steps
----------

Congratulations! You've successfully set up Arch and made your first prompt-based request. To further enhance your GenAI applications, explore the following resources:

- Full Documentation: Comprehensive guides and references.
- `GitHub Repository <https://github.com/katanemo/arch>`_: Access the source code, contribute, and track updates.
- `Support <https://github.com/katanemo/arch#contact>`_: Get help and connect with the Arch community.

With Arch, building scalable, fast, and personalized GenAI applications has never been easier. Dive deeper into Arch's capabilities and start creating innovative AI-driven experiences today!