Adil/fix salman docs (#75)

* added the first set of docs for our technical docs

* more documentation changes

* added support for prompt processing and updated life of a request

* updated docs to include getting-help sections and updated life of a request

* committing local changes for getting started guide, sample applications, and full reference spec for prompt-config

* updated configuration reference, added sample app skeleton, updated favico

* fixed the configuration reference file, and made minor changes to the intent detection. commit v1 for now

* Updated docs with use cases and example code, updated what is arch, and made minor changes throughout

* fixed images and minor doc fixes

* add sphinx_book_theme

* updated README, and made some minor fixes to documentation

* fixed README.md

* fixed image width

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
Salman Paracha 2024-09-24 13:54:17 -07:00 committed by GitHub
parent 2d31aeaa36
commit 13dff3089d
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
33 changed files with 931 additions and 287 deletions


@@ -0,0 +1,57 @@
Agentic (Text-to-Action) Apps
==============================

Arch helps you easily personalize your applications by enabling calls to application-specific (API) operations
via user prompts. This involves any predefined functions or APIs you want to expose to users to perform tasks,
gather information, or manipulate data. With function calling, you have the flexibility to support “agentic” apps
tailored to specific use cases - from updating insurance claims to creating ad campaigns - via prompts.
Arch analyzes prompts, extracts critical information from them, engages in lightweight conversation with
the user to gather any missing parameters, and makes API calls so that you can focus on writing business logic.
Arch does this via its purpose-built Arch-FC1B LLM - the fastest (200ms p90, 10x faster than GPT-4o) and cheapest
(100x cheaper than GPT-4o) function-calling LLM that matches the performance of frontier models.
______________________________________________________________________________________________

Single Function Call
--------------------

In the most common scenario, users request a single action via prompts, and Arch efficiently processes the
request by extracting relevant parameters, validating the input, and calling the designated function or API. Here
is how you would enable this scenario with Arch:

Step 1: Define prompt targets with functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_config/function-calling-network-agent.yml
   :language: yaml
   :linenos:
   :emphasize-lines: 16-37
   :caption: Define prompt targets that enable users to engage with the API and backend functions of an app

Step 2: Process request parameters in Flask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the prompt targets are configured as above, handling those parameters is straightforward:

.. literalinclude:: /_include/parameter_handling_flask.py
   :language: python
   :linenos:
   :caption: Flask API example for parameter extraction via HTTP request parameters
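As a concrete illustration, here is a minimal Flask handler in the spirit of the include above. The endpoint path and parameter names (``device_ids``, ``time_range``) are hypothetical, not part of Arch's configuration; Arch forwards the parameters it extracts from the prompt as ordinary HTTP request parameters.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/agent/device_summary", methods=["GET"])
def device_summary():
    # Arch extracts these parameters from the user's prompt and forwards
    # them as query parameters; the handler only contains business logic.
    device_ids = request.args.get("device_ids", "")
    time_range = request.args.get("time_range", "7d")
    if not device_ids:
        # Arch gathers missing required parameters conversationally, but the
        # handler should still validate its inputs defensively.
        return jsonify({"error": "device_ids is required"}), 400
    return jsonify({"devices": device_ids.split(","), "time_range": time_range})
```

The handler stays free of any prompt-parsing logic: by the time the request arrives, parameter extraction has already happened in the gateway.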

Parallel/Multiple Function Calling
-----------------------------------

In more complex use cases, users may request multiple actions or need multiple APIs/functions to be called
simultaneously or sequentially. With Arch, you can handle these scenarios efficiently using parallel or multiple
function calling. This allows your application to support a broader range of interactions, such as updating
different datasets, triggering events across systems, or collecting results from multiple services in one prompt.
Arch-FC1B is built to manage these parallel tasks efficiently, ensuring low latency and high throughput, even
when multiple functions are invoked. It provides two mechanisms to handle these cases:

Step 1: Define Multiple Function Targets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When enabling multiple function calling, define the prompt targets in a way that supports multiple functions or
API calls based on the user's prompt. These targets can be triggered in parallel or sequentially, depending on
the user's intent.

Example of Multiple Prompt Targets in YAML:
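The sketch below shows what such a configuration could look like. The field names follow the *prompt_targets* primitive used elsewhere in these docs, but the exact schema should be checked against the configuration reference; the target names, parameters, and endpoints are made up for illustration.

```yaml
# Hypothetical prompt-config sketch: two independent targets that Arch can
# invoke from a single prompt. Exact field names may differ from the
# configuration reference.
prompt_targets:
  - name: update_claim
    description: Update an existing insurance claim record
    parameters:
      - name: claim_id
        type: string
        required: true
      - name: status
        type: string
    endpoint:
      name: claims_api
      path: /claims/update
  - name: create_campaign
    description: Create a new ad campaign
    parameters:
      - name: campaign_name
        type: string
        required: true
      - name: budget
        type: number
    endpoint:
      name: ads_api
      path: /campaigns/create
```

With both targets defined, a prompt such as "close claim 123 and launch a spring campaign" can resolve to two function calls rather than one.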


@@ -0,0 +1,94 @@
Retrieval-Augmented Generation (RAG)
====================================

The following section describes how Arch can help you build faster, smarter, and more accurate
Retrieval-Augmented Generation (RAG) applications.

Intent-drift detection
----------------------

Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
questions. Specifically, when users ask for changes or additions to previous responses, their AI applications often
generate entirely new responses instead of adjusting previous ones. Arch offers *intent-drift* tracking as a feature so
that developers can know when the user has shifted away from a previous intent, allowing them to dramatically improve
retrieval accuracy, lower overall token cost, and improve the speed of responses back to users.

Arch uses its built-in lightweight NLI and embedding models to determine whether the user has steered away from an active intent.
Arch's intent-drift detection mechanism is based on its *prompt_targets* primitive. Arch tries to match an incoming
prompt to one of the *prompt_targets* configured in the gateway. Once it detects that the user has moved away from an
active intent, Arch adds the ``x-arch-intent-drift`` header to the request before sending it to your application servers.
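A minimal sketch of consuming this header in a Flask application follows. The endpoint path is made up, and the assumption that the header carries ``true``/``false`` is illustrative; check Arch's reference for the header's actual value format.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/v1/chat", methods=["POST"])
def chat():
    # Assumption: Arch sets x-arch-intent-drift to "true" when the user's
    # prompt has drifted away from the active intent.
    drift = request.headers.get("x-arch-intent-drift", "false").lower() == "true"
    if drift:
        # Start a fresh retrieval context rather than reusing stale history.
        context = []
    else:
        # Reuse the messages accumulated for the still-active intent.
        context = ["...previous messages for the active intent..."]
    return jsonify({"intent_drift": drift, "context_size": len(context)})
```

The key point is that the drift signal arrives as request metadata, so the application decides how much history to carry into retrieval.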

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 95-125
   :emphasize-lines: 14-22
   :caption: :download:`Intent drift detection in python </_include/intent_detection.py>`

_____________________________________________________________________________________________________________________

.. note::
   Arch is (mostly) stateless so that it can scale in an embarrassingly parallel fashion. So, while Arch offers
   intent-drift detection, you still have to maintain conversational state with intent drift as metadata. The
   following code snippets show how easily you can build and enrich conversational history with Langchain (in Python),
   so that you can use the most relevant prompts for your retrieval and for prompting upstream LLMs.

Step 1: define ConversationBufferMemory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 1-21

Step 2: update ConversationBufferMemory w/ intent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 22-62

Step 3: get Messages based on latest drift
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 64-76
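If you are not using Langchain, the same bookkeeping can be sketched in a few lines of plain Python. The class and method names below are hypothetical, not part of Arch: messages are tagged with an intent id that advances whenever drift is signaled, and only the active-intent slice is used for retrieval and prompting.

```python
from dataclasses import dataclass, field

@dataclass
class IntentAwareHistory:
    # Each entry is (intent_id, role, text); intent_id advances on drift.
    messages: list = field(default_factory=list)
    current_intent: int = 0

    def add(self, role, text, drift=False):
        if drift:
            # The gateway signaled intent drift: start a new intent bucket.
            self.current_intent += 1
        self.messages.append((self.current_intent, role, text))

    def active_messages(self):
        # Only messages belonging to the latest intent are sent for
        # retrieval or upstream prompting.
        return [(r, t) for i, r, t in self.messages if i == self.current_intent]

history = IntentAwareHistory()
history.add("user", "Summarize my network alerts")
history.add("assistant", "Here are your alerts...")
history.add("user", "Now create an ad campaign", drift=True)
```

After the drifted message, ``active_messages()`` returns only the new-intent message, so older, unrelated history never pollutes retrieval.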

You can use the last set of messages that match an intent to prompt an LLM, use them with a vector DB for
improved retrieval, and more. With Arch and a few lines of code, you can improve retrieval accuracy, lower overall
token cost, and dramatically improve the speed of responses back to users.

Smarter retrieval with parameter extraction
-------------------------------------------

To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
streamline data retrieval and processing to build more efficient and precise RAG applications.

Step 1: Define prompt targets with parameter definitions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_config/rag-prompt-targets.yml
   :language: yaml
   :linenos:
   :emphasize-lines: 16-36
   :caption: prompt-config.yaml for parameter extraction in RAG scenarios

Step 2: Process request parameters in Flask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the prompt targets are configured as above, handling those parameters is straightforward:

.. literalinclude:: /_include/parameter_handling_flask.py
   :language: python
   :linenos:
   :caption: Flask API example for parameter extraction via HTTP request parameters
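To illustrate how extracted parameters improve retrieval, here is a small, self-contained sketch of metadata-filtered retrieval: structured parameters narrow the candidate chunks first, then embedding similarity ranks what remains. The store layout and function names are assumptions for illustration, not an Arch API.

```python
# Hypothetical sketch: chunks carry metadata; parameters extracted by Arch
# act as exact-match filters before semantic ranking.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_chunks(store, query_embedding, params, top_k=3):
    # Keep only chunks whose metadata matches every extracted parameter.
    candidates = [
        c for c in store
        if all(c["meta"].get(k) == v for k, v in params.items())
    ]
    # Rank the survivors by similarity to the query embedding.
    candidates.sort(key=lambda c: dot(query_embedding, c["embedding"]), reverse=True)
    return candidates[:top_k]

store = [
    {"meta": {"region": "us"}, "embedding": [1.0, 0.0], "text": "US outage report"},
    {"meta": {"region": "eu"}, "embedding": [0.9, 0.1], "text": "EU outage report"},
    {"meta": {"region": "us"}, "embedding": [0.2, 0.8], "text": "US billing FAQ"},
]
chunks = retrieve_chunks(store, [1.0, 0.0], {"region": "us"}, top_k=1)
```

Filtering by ``region`` before ranking keeps the near-identical EU chunk out of the results entirely, which is the accuracy win the section describes.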