mirror of
https://github.com/katanemo/plano.git
synced 2026-05-10 16:22:42 +02:00
Adil/fix salman docs (#75)
* added the first set of docs for our technical docs
* more documentation changes
* added support for prompt processing and updated life of a request
* updated docs to include getting-help sections and updated life of a request
* committed local changes for the getting started guide, sample applications, and full reference spec for prompt-config
* updated configuration reference, added sample app skeleton, updated favicon
* fixed the configuration reference file and made minor changes to intent detection; commit v1 for now
* updated docs with use cases and example code, updated "what is arch", and made minor changes throughout
* fixed images and minor doc fixes
* added sphinx_book_theme
* updated README and made some minor fixes to documentation
* fixed README.md
* fixed image width

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
parent
2d31aeaa36
commit
13dff3089d
33 changed files with 931 additions and 287 deletions
What is Arch
============

Arch is an intelligent `Layer 7 <https://www.cloudflare.com/learning/ddos/what-is-layer-7/>`_ gateway
designed for generative AI apps, AI agents, and co-pilots that work with prompts. Engineered with purpose-built
:ref:`LLMs <llms_in_arch>`, Arch handles the critical but undifferentiated tasks related to the handling and
processing of prompts, including detecting and rejecting `jailbreak <https://github.com/verazuo/jailbreak_llms>`_
attempts, intelligently calling “backend” APIs to fulfill the user's request represented in a prompt, routing to
and offering disaster recovery between upstream LLMs, and managing the observability of prompts and LLM interactions
in a centralized way.

**The project was born out of the belief that:**
*Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests,
including secure handling, intelligent routing, robust observability, and integration with backend (API)
systems for personalization - all outside business logic.*

In practice, achieving the above goal is incredibly difficult. Arch attempts to do so by providing the
following high-level features:

**Out-of-process architecture, built on** `Envoy <http://envoyproxy.io/>`_: Arch takes a dependency on
Envoy and is a self-contained process designed to run alongside your application servers. Arch uses
Envoy's HTTP connection management subsystem, HTTP L7 filtering, and telemetry capabilities to extend that
functionality exclusively for prompts and LLMs. This gives Arch several advantages:

* Arch works with any application language. A single Arch deployment can act as a gateway for AI applications
  written in Python, Java, C++, Go, PHP, etc.

* Arch builds on Envoy's proven success. Envoy is used at massive scale by the leading technology companies of
  our time, including `AirBnB <https://www.airbnb.com>`_, `Dropbox <https://www.dropbox.com>`_,
  `Google <https://www.google.com>`_, `Reddit <https://www.reddit.com>`_, and `Stripe <https://www.stripe.com>`_.
  It's battle-tested, scales linearly with usage, and lets developers focus on what really matters:
  application features and business logic.

* Arch can be deployed and upgraded quickly across your infrastructure, transparently and without the pain of
  deploying library upgrades in your applications.

**Engineered with Fast LLMs:** Arch is engineered with specialized (sub-billion parameter) LLMs that are designed
for fast, cost-effective, and accurate handling of prompts. These :ref:`LLMs <llms_in_arch>` are designed to be
best-in-class for critical prompt-related tasks like:

* **Function/API Calling:** Arch helps you easily personalize your applications by enabling calls to
  application-specific (API) operations via user prompts. This involves any predefined functions or APIs
  you want to expose to users to perform tasks, gather information, or manipulate data. With function calling,
  you have the flexibility to support "agentic" experiences tailored to specific use cases - from updating insurance
  claims to creating ad campaigns - via prompts. Arch analyzes prompts, extracts critical information from
  them, engages in lightweight conversation with the user to gather any missing parameters, and makes API
  calls so that you can focus on writing business logic. For more details, read
  :ref:`prompt processing <arch_overview_prompt_handling>`.

* **Prompt Guardrails:** Arch helps you improve the safety of your application by applying prompt guardrails in
  a centralized way for better governance hygiene. With prompt guardrails you can prevent
  `jailbreak <https://github.com/verazuo/jailbreak_llms>`_ attempts or toxicity present in users' prompts without
  having to write a single line of code. To learn more about how to configure the guardrails available in Arch,
  read :ref:`prompt processing <arch_overview_prompt_handling>`.

* **Intent-Drift Detection:** Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
  or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
  questions. Specifically, when users ask for modifications or additions to previous responses, their AI applications
  often generate entirely new responses instead of adjusting the previous ones. Arch offers intent-drift detection
  so that developers know when the user has shifted away from the previous intent, enabling them to improve
  retrieval, lower overall token cost, and dramatically improve the speed and accuracy of responses back to users.

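The function-calling flow described above - extract parameters from a prompt, ask for anything missing, then invoke the target API - can be made concrete with a small sketch. Everything here (the ``get_claim_status`` handler, the registry shape, the ``dispatch`` helper) is hypothetical application-side code, not Arch's actual API:

```python
# Minimal sketch of a function-calling dispatch loop (hypothetical names).
# The gateway's job is to turn a free-form prompt into a structured call
# like the dicts below before it reaches your application server.

def get_claim_status(claim_id: str) -> dict:
    # Stand-in for a real backend API call.
    return {"claim_id": claim_id, "status": "under review"}

# Functions the application chooses to expose, with their required parameters.
FUNCTIONS = {
    "get_claim_status": {"handler": get_claim_status, "required": ["claim_id"]},
}

def dispatch(call: dict):
    """Validate extracted parameters and invoke the target function.

    `call` is the structured output derived from a prompt, e.g.
    {"name": "get_claim_status", "arguments": {"claim_id": "C-42"}}.
    """
    spec = FUNCTIONS[call["name"]]
    missing = [p for p in spec["required"] if p not in call["arguments"]]
    if missing:
        # In the real flow, the gateway asks the user a clarifying
        # question to fill these in instead of returning them.
        return {"clarify": missing}
    return spec["handler"](**call["arguments"])

print(dispatch({"name": "get_claim_status", "arguments": {"claim_id": "C-42"}}))
print(dispatch({"name": "get_claim_status", "arguments": {}}))
```

The point of the sketch is the split of responsibilities: the gateway produces the structured call and gathers missing parameters, while the application only implements handlers.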
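To make the intent-drift idea above concrete: Arch detects drift with purpose-built LLMs, but a crude illustrative baseline is a similarity threshold between the previous and current prompt. The token-overlap measure below is only a teaching device, not how Arch works:

```python
# Illustrative baseline for intent-drift detection using token overlap
# (Jaccard similarity). A low similarity suggests a fresh intent, so the
# application can skip reusing prior retrieval context.

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def intent_drifted(previous_prompt: str, new_prompt: str,
                   threshold: float = 0.2) -> bool:
    """Flag a new prompt as a new intent when it shares little with the last."""
    return jaccard(previous_prompt, new_prompt) < threshold

print(intent_drifted("show my insurance claims",
                     "show my insurance claims from March"))  # follow-up
print(intent_drifted("show my insurance claims",
                     "create a new ad campaign"))             # drift
```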
**Traffic Management:** Arch offers several capabilities for LLM calls originating from your applications, including a
vendor-agnostic SDK to make LLM calls, smart retries on errors from upstream LLMs, and automatic cutover to other LLMs
configured in Arch for continuous availability and disaster recovery scenarios. Arch extends Envoy's `cluster subsystem
<https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/cluster_manager>`_ to manage upstream connections
to LLMs so that you can build resilient AI applications.

**Front/edge Gateway:** There is substantial benefit in using the same software at the edge (observability,
traffic-shaping algorithms, applying guardrails, etc.) as for outbound LLM inference use cases. Arch has the feature set
that makes it exceptionally well suited as an edge gateway for AI applications. This includes TLS termination, rate
limiting, and prompt-based routing.

**Best-In-Class Monitoring:** Arch offers several monitoring metrics that help you understand three
critical aspects of your application: latency, token usage, and error rates by upstream LLM provider. Latency
measures the speed at which your application responds to users, and includes metrics like time to first
token (TFT), time per output token (TOT), and the total latency as perceived by users.

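The latency metrics above follow directly from per-token arrival times. A small sketch of how they are computed (the function name and illustrative timestamps are ours, not Arch's):

```python
# Computing the streaming-latency metrics described above.
# `token_times` holds the wall-clock time each output token arrived,
# relative to the same clock as `request_sent` (illustrative numbers).

def latency_metrics(request_sent: float, token_times: list[float]) -> dict:
    time_to_first_token = token_times[0] - request_sent
    total_latency = token_times[-1] - request_sent
    # Time per output token: average gap between consecutive tokens.
    n = len(token_times)
    time_per_output_token = (
        (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    )
    return {
        "TFT": round(time_to_first_token, 3),
        "TOT": round(time_per_output_token, 3),
        "total": round(total_latency, 3),
    }

print(latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40]))
```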
**End-to-End Tracing:** Arch propagates trace context using the W3C Trace Context standard, specifically through
the ``traceparent`` header. This allows each component in the system to record its part of the request flow,
enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures that
developers can capture this trace data consistently and in a format compatible with various observability tools.
For more details, read :ref:`tracing <arch_overview_tracing>`.
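The ``traceparent`` header mentioned above has a fixed four-field format defined by the W3C Trace Context standard (``version-traceid-parentid-flags``). A short parser makes the structure visible; the helper name is ours:

```python
# Parsing a W3C Trace Context `traceparent` header, the mechanism used to
# propagate trace context across every hop of a request.

def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    # Per the spec: 32 hex chars of trace-id, 16 hex chars of parent-id.
    assert len(trace_id) == 32 and len(parent_id) == 16
    return {
        "version": version,
        "trace_id": trace_id,    # shared by every span in the request
        "parent_id": parent_id,  # the caller's span id
        "sampled": flags == "01",
    }

print(parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```

Each component records its span under the shared ``trace_id`` and passes a new ``parent_id`` downstream, which is what makes the flow reconstructable end to end.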