Adil/fix salman docs (#75)

* added the first set of docs for our technical docs

* more documentation changes

* added support for prompt processing and updated life of a request

* updated docs to including getting help sections and updated life of a request

* committing local changes for getting started guide, sample applications, and full reference spec for prompt-config

* updated configuration reference, added sample app skeleton, updated favico

* fixed the configuration reference file, and made minor changes to the intent detection. commit v1 for now

* Updated docs with use cases and example code, updated what is arch, and made minor changes throughout

* fixed imaged and minor doc fixes

* add sphinx_book_theme

* updated README, and made some minor fixes to documentation

* fixed README.md

* fixed image width

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
Salman Paracha 2024-09-24 13:54:17 -07:00 committed by GitHub
parent 2d31aeaa36
commit 13dff3089d
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
33 changed files with 931 additions and 287 deletions


@@ -1,72 +1,85 @@
What is Arch
============

Arch is an intelligent `(Layer 7) <https://www.cloudflare.com/learning/ddos/what-is-layer-7/>`_ gateway
designed for generative AI apps, AI agents, and Co-pilots that work with prompts. Engineered with purpose-built
:ref:`LLMs <llms_in_arch>`, Arch handles the critical but undifferentiated tasks related to the handling and
processing of prompts, including detecting and rejecting `jailbreak <https://github.com/verazuo/jailbreak_llms>`_
attempts, intelligently calling “backend” APIs to fulfill the user's request represented in a prompt, routing to
and offering disaster recovery between upstream LLMs, and managing the observability of prompts and LLM interactions
in a centralized way.

**The project was born out of the belief that:**

*Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests
including secure handling, intelligent routing, robust observability, and integration with backend (API)
systems for personalization - all outside business logic.*

In practice, achieving the above goal is incredibly difficult. Arch attempts to do so by providing the
following high-level features:

**Out-of-process architecture, built on** `Envoy <http://envoyproxy.io/>`_: Arch takes a dependency on
Envoy and is a self-contained process that is designed to run alongside your application servers. Arch uses
Envoy's HTTP connection management subsystem, HTTP L7 filtering, and telemetry capabilities to extend that
functionality exclusively for prompts and LLMs. This gives Arch several advantages:

* Arch works with any application language. A single Arch deployment can act as a gateway for AI applications
  written in Python, Java, C++, Go, PHP, etc.

* Arch builds on Envoy's proven success. Envoy is used at massive scale by the leading technology companies of
  our time, including `AirBnB <https://www.airbnb.com>`_, `Dropbox <https://www.dropbox.com>`_,
  `Google <https://www.google.com>`_, `Reddit <https://www.reddit.com>`_, and `Stripe <https://www.stripe.com>`_.
  It's battle-tested, scales linearly with usage, and enables developers to focus on what really matters:
  application features and business logic.
* Arch can be deployed and upgraded quickly and transparently across your infrastructure, without the pain of
  deploying library upgrades in your applications.

**Engineered with Fast LLMs:** Arch is engineered with specialized (sub-billion parameter) LLMs that are designed
for fast, cost-effective and accurate handling of prompts. These :ref:`LLMs <llms_in_arch>` are designed to be
best-in-class for critical prompt-related tasks like:

* **Function/API Calling:** Arch helps you easily personalize your applications by enabling calls to
  application-specific (API) operations via user prompts. This involves any predefined functions or APIs
  you want to expose to users to perform tasks, gather information, or manipulate data. With function calling,
  you have the flexibility to support "agentic" experiences tailored to specific use cases - from updating
  insurance claims to creating ad campaigns - via prompts. Arch analyzes prompts, extracts critical information
  from them, engages in lightweight conversation with the user to gather any missing parameters, and makes API
  calls so that you can focus on writing business logic. For more details, read :ref:`prompt processing <arch_overview_prompt_handling>`.

* **Prompt Guardrails:** Arch helps you improve the safety of your application by applying prompt guardrails in
  a centralized way for better governance hygiene. With prompt guardrails you can prevent `jailbreak <https://github.com/verazuo/jailbreak_llms>`_
  attempts or toxicity present in users' prompts without having to write a single line of code. To learn more
  about how to configure the guardrails available in Arch, read :ref:`prompt processing <arch_overview_prompt_handling>`.

* **Intent-Drift Detection:** Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
  or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
  questions. Specifically, when users ask for modifications or additions to previous responses, their AI applications
  often generate entirely new responses instead of adjusting the previous ones. Arch offers intent-drift detection
  so that developers know when the user has shifted away from the previous intent, letting them improve
  their retrieval, lower overall token cost, and dramatically improve the speed and accuracy of their responses back
  to users.
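To make the function-calling flow described above concrete, here is a minimal sketch of dispatching a gateway-extracted function call to application business logic. All names and the payload shape are hypothetical, chosen to illustrate the idea; Arch's actual payload format may differ.

```python
# Hypothetical sketch: routing a gateway-produced function call to a
# developer-defined backend handler. Names are illustrative only.
import json

# A backend "API" the developer exposes; the gateway fills in parameters
# it extracted from the user's prompt.
def update_insurance_claim(claim_id: str, status: str) -> dict:
    return {"claim_id": claim_id, "status": status, "updated": True}

HANDLERS = {"update_insurance_claim": update_insurance_claim}

def dispatch(gateway_payload: str) -> dict:
    """Route a gateway-produced function call to the matching handler."""
    call = json.loads(gateway_payload)
    handler = HANDLERS[call["name"]]
    return handler(**call["arguments"])

# After analyzing a prompt like "mark claim 12345 as approved" and gathering
# any missing parameters, the gateway could emit a payload like this:
payload = json.dumps(
    {"name": "update_insurance_claim",
     "arguments": {"claim_id": "12345", "status": "approved"}}
)
print(dispatch(payload))  # {'claim_id': '12345', 'status': 'approved', 'updated': True}
```

The application only sees a structured call with validated parameters, which is what lets developers keep prompt handling out of their business logic.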

**Traffic Management:** Arch offers several capabilities for LLM calls originating from your applications, including a
vendor-agnostic SDK to make LLM calls, smart retries on errors from upstream LLMs, and automatic cutover to other LLMs
configured in Arch for continuous availability and disaster recovery scenarios. Arch extends Envoy's `cluster subsystem
<https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/cluster_manager>`_ to manage upstream connections
to LLMs so that you can build resilient AI applications.
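The retry-and-cutover behavior described above can be sketched as follows. This is an illustration of the idea, not Arch's implementation: each configured provider is tried with a bounded number of retries on transient errors before failing over to the next.

```python
# Illustrative retry-with-failover loop (not Arch internals): try each
# upstream LLM in order, retrying transient errors, then cut over.
def complete_with_failover(providers, prompt, retries_per_provider=2):
    last_error = None
    for provider in providers:
        for _ in range(retries_per_provider):
            try:
                return provider(prompt)
            except ConnectionError as exc:  # treat as a transient upstream error
                last_error = exc
    raise RuntimeError("all upstream LLMs failed") from last_error

def flaky_llm(prompt):
    raise ConnectionError("upstream timeout")

def backup_llm(prompt):
    return f"echo: {prompt}"

print(complete_with_failover([flaky_llm, backup_llm], "hello"))  # echo: hello
```

In a gateway this logic sits outside the application, so every app behind Arch gets the same availability behavior without per-app retry code.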

**Front/edge Gateway:** There is substantial benefit in using the same software at the edge (observability,
traffic-shaping algorithms, applying guardrails, etc.) as for outbound LLM inference use cases. Arch has the feature set
that makes it exceptionally well suited as an edge gateway for AI applications. This includes TLS termination, rate limiting,
and prompt-based routing.

**Best-In Class Monitoring:** Arch offers several monitoring metrics that help you understand three
critical aspects of your application: latency, token usage, and error rates by upstream LLM provider. Latency
measures the speed at which your application responds to users, and includes metrics like time to first
token (TFT), time per output token (TOT), and the total latency as perceived by users.
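As an illustration of these metrics (not Arch's internal code), TFT, TOT, and total latency can be derived from the arrival timestamps of a streamed LLM response:

```python
# Hedged sketch: computing the latency metrics named above from a
# streaming response's token arrival times (all values in seconds).
def latency_metrics(request_ts, token_ts):
    """request_ts: when the request was sent; token_ts: arrival time of each token."""
    tft = token_ts[0] - request_ts       # time to first token
    total = token_ts[-1] - request_ts    # total latency as perceived by the user
    # time per output token, averaged over the tokens after the first
    tot = (token_ts[-1] - token_ts[0]) / max(len(token_ts) - 1, 1)
    return {"tft": tft, "tot": tot, "total": total}

print(latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40]))
```

TFT dominates perceived responsiveness for chat-style apps, while TOT determines how quickly long responses stream, which is why the two are tracked separately.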

**End-to-End Tracing:** Arch propagates trace context using the W3C Trace Context standard, specifically through
the ``traceparent`` header. This allows each component in the system to record its part of the request flow,
enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures that
developers can capture this trace data consistently and in a format compatible with various observability tools.
For more details, read :ref:`tracing <arch_overview_tracing>`.
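For illustration, the ``traceparent`` header follows the W3C Trace Context format: version ``00``, a 16-byte trace-id shared across the whole request, an 8-byte span id for the current hop, and trace flags. The helper names below are hypothetical sketches, not Arch APIs.

```python
# Minimal sketch of W3C Trace Context ``traceparent`` propagation.
import re
import secrets

def make_traceparent(sampled=True):
    trace_id = secrets.token_hex(16)  # 32 hex chars, shared across the request
    span_id = secrets.token_hex(8)    # 16 hex chars, identifies this hop's span
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def child_traceparent(parent_header):
    """Each component keeps the trace-id but records its own span id."""
    version, trace_id, _parent_span, flags = parent_header.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

header = make_traceparent()
print(header)
print(child_traceparent(header))
```

Because every hop preserves the trace-id while minting a new span id, an observability backend can stitch the gateway's span and each downstream component's span into one end-to-end trace.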