Mirror of https://github.com/katanemo/plano.git (synced 2026-04-27 01:36:33 +02:00)
Adil/fix salman docs (#75)
* added the first set of docs for our technical docs
* more documentation changes
* added support for prompt processing and updated life of a request
* updated docs to include getting help sections and updated life of a request
* committing local changes for getting started guide, sample applications, and full reference spec for prompt-config
* updated configuration reference, added sample app skeleton, updated favicon
* fixed the configuration reference file, and made minor changes to the intent detection. commit v1 for now
* updated docs with use cases and example code, updated what is arch, and made minor changes throughout
* fixed images and minor doc fixes
* add sphinx_book_theme
* updated README, and made some minor fixes to documentation
* fixed README.md
* fixed image width

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
parent
2d31aeaa36
commit
13dff3089d
33 changed files with 931 additions and 287 deletions
57
docs/source/getting_started/use_cases/function_calling.rst
Normal file
@@ -0,0 +1,57 @@

Agentic (Text-to-Action) Apps
==============================

Arch helps you easily personalize your applications by enabling calls to application-specific (API) operations
via user prompts. This covers any predefined functions or APIs you want to expose to users to perform tasks,
gather information, or manipulate data. With function calling, you have the flexibility to support “agentic” apps
tailored to specific use cases - from updating insurance claims to creating ad campaigns - via prompts.

Arch analyzes prompts, extracts critical information from them, engages in lightweight conversation with
the user to gather any missing parameters, and makes API calls so that you can focus on writing business logic.
Arch does this via its purpose-built Arch-FC1B LLM - the fastest (200ms p90, 10x faster than GPT-4o) and cheapest
(100x cheaper than GPT-4o) function-calling LLM that matches the performance of frontier models.

______________________________________________________________________________________________

Single Function Call
--------------------

In the most common scenario, users request a single action via a prompt, and Arch efficiently processes the
request by extracting relevant parameters, validating the input, and calling the designated function or API. Here
is how you would enable this scenario with Arch:

Step 1: Define prompt targets with functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_config/function-calling-network-agent.yml
   :language: yaml
   :linenos:
   :emphasize-lines: 16-37
   :caption: Define prompt targets that enable users to engage with the API and backend functions of an app
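
The referenced ``function-calling-network-agent.yml`` is not reproduced on this page, so the sketch below shows the
general shape of a prompt target that exposes a function to users. Field and endpoint names are illustrative; see the
configuration reference for the exact schema.

.. code-block:: yaml

   # Illustrative sketch only - not the shipped function-calling-network-agent.yml.
   # A prompt target maps a user intent to an API endpoint and declares the
   # parameters Arch should extract from the prompt before invoking it.
   prompt_targets:
     - name: reboot_devices
       description: Reboot a set of network devices
       parameters:
         - name: device_ids
           type: list
           description: IDs of the devices to reboot
           required: true
       endpoint:
         name: app_server
         path: /agent/device_reboot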

Step 2: Process request parameters in Flask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the prompt targets are configured as above, handling the extracted parameters in your application is
straightforward:

.. literalinclude:: /_include/parameter_handling_flask.py
   :language: python
   :linenos:
   :caption: Flask API example for parameter extraction via HTTP request parameters
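
``parameter_handling_flask.py`` is likewise not shown on this page. The following is a minimal sketch of the assumed
pattern: Arch calls the endpoint configured on the prompt target and passes the parameters it extracted from the
user's prompt in the HTTP request. Route and parameter names are illustrative.

.. code-block:: python

   # Minimal, illustrative sketch - not the bundled parameter_handling_flask.py.
   from flask import Flask, jsonify, request

   app = Flask(__name__)

   @app.route("/agent/device_reboot", methods=["POST"])
   def device_reboot():
       # Parameters extracted by Arch are assumed to arrive in the JSON body.
       payload = request.get_json(silent=True) or {}
       device_ids = payload.get("device_ids", [])
       if not device_ids:
           return jsonify({"error": "device_ids is required"}), 400
       # ... your business logic: trigger the reboots, record the action, etc. ...
       return jsonify({"rebooted": device_ids, "status": "ok"})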

Parallel / Multiple Function Calling
------------------------------------

In more complex use cases, users may request multiple actions or need multiple APIs/functions to be called
simultaneously or sequentially. With Arch, you can handle these scenarios efficiently using parallel or multiple
function calling. This allows your application to engage in a broader range of interactions, such as updating
different datasets, triggering events across systems, or collecting results from multiple services in one prompt.

Arch-FC1B is built to manage these parallel tasks efficiently, ensuring low latency and high throughput, even
when multiple functions are invoked. It provides two mechanisms to handle these cases:

Step 1: Define Multiple Function Targets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When enabling multiple function calling, define the prompt targets in a way that supports multiple functions or
API calls based on the user's prompt. These targets can be triggered in parallel or sequentially, depending on
the user's intent.

Example of multiple prompt targets in YAML:
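
A minimal sketch of what multiple prompt targets could look like, reusing the illustrative fields from the
single-target example above (the exact schema is documented in the configuration reference):

.. code-block:: yaml

   # Illustrative sketch only - two targets that Arch can invoke independently,
   # in parallel or in sequence, depending on the user's prompt.
   prompt_targets:
     - name: get_device_status
       description: Retrieve the current status of one or more devices
       parameters:
         - name: device_ids
           type: list
           description: IDs of the devices to query
           required: true
       endpoint:
         name: app_server
         path: /agent/device_status

     - name: reboot_devices
       description: Reboot a set of network devices
       parameters:
         - name: device_ids
           type: list
           description: IDs of the devices to reboot
           required: true
       endpoint:
         name: app_server
         path: /agent/device_reboot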
94
docs/source/getting_started/use_cases/rag.rst
Normal file
@@ -0,0 +1,94 @@
Retrieval-Augmented Generation (RAG)
====================================

The following section describes how Arch can help you build faster, smarter, and more accurate
Retrieval-Augmented Generation (RAG) applications.

Intent-drift detection
----------------------

Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
questions. Specifically, when users ask for changes or additions to previous responses, their AI applications often
generate entirely new responses instead of adjusting previous ones. Arch offers *intent-drift* tracking as a feature so
that developers can know when the user has shifted away from a previous intent, letting them dramatically improve
retrieval accuracy, lower overall token cost, and speed up responses back to users.

Arch uses its built-in lightweight NLI and embedding models to know if the user has steered away from an active intent.
Arch's intent-drift detection mechanism is based on its *prompt_targets* primitive. Arch tries to match an incoming
prompt to one of the *prompt_targets* configured in the gateway. Once it detects that the user has moved away from the
active intent, Arch adds the ``x-arch-intent-drift`` header to the request before sending it to your application servers.

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 95-125
   :emphasize-lines: 14-22
   :caption: :download:`Intent drift detection in python </_include/intent_detection.py>`
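
``intent_detection.py`` is not reproduced on this page. As a minimal sketch of the idea, an application server can
inspect the ``x-arch-intent-drift`` header on each forwarded request and branch its retrieval logic accordingly. The
Flask route below is illustrative, and it assumes the header's presence signals a drift; check the configuration
reference for the header's exact semantics.

.. code-block:: python

   # Minimal, illustrative sketch - not the bundled intent_detection.py.
   from flask import Flask, jsonify, request

   app = Flask(__name__)

   @app.route("/chat", methods=["POST"])
   def chat():
       payload = request.get_json(silent=True) or {}
       prompt = payload.get("prompt", "")
       # Assumption for this sketch: Arch sets the header only when intent drifted.
       drifted = "x-arch-intent-drift" in request.headers
       if drifted:
           context = []  # new intent: run a fresh retrieval pass for this prompt
       else:
           context = []  # same intent: reuse or refine the context you already have
       return jsonify({"prompt": prompt, "intent_drift": drifted, "context_chunks": len(context)})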

_____________________________________________________________________________________________________________________

.. note::

   Arch is (mostly) stateless so that it can scale in an embarrassingly parallel fashion. So, while Arch offers
   intent-drift detection, you still have to maintain conversational state, with intent drift as metadata. The
   following code snippets show how easily you can build and enrich conversational history with LangChain (in Python),
   so that you can use the most relevant prompts for retrieval and for prompting upstream LLMs.

Step 1: define ConversationBufferMemory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 1-21

Step 2: update ConversationBufferMemory w/ intent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 22-62

Step 3: get messages based on latest drift
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_include/intent_detection.py
   :language: python
   :linenos:
   :lines: 64-76
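
Since ``intent_detection.py`` is not shown on this page, the condensed sketch below illustrates the same three steps
with LangChain's ``ConversationBufferMemory``. How the drift signal is recorded (a simple list of intent boundaries,
driven by the ``x-arch-intent-drift`` header) is an assumption made for illustration, not the bundled implementation.

.. code-block:: python

   # Condensed, illustrative sketch of Steps 1-3 - not the bundled intent_detection.py.
   from langchain.memory import ConversationBufferMemory

   # Step 1: a buffer that stores the raw conversation turns.
   memory = ConversationBufferMemory(return_messages=True)
   intent_boundaries = [0]  # message index where each intent began (assumed bookkeeping)

   # Step 2: record a turn, marking a boundary whenever Arch signalled intent drift.
   def record_turn(user_msg: str, assistant_msg: str, intent_drift: bool) -> None:
       if intent_drift:
           intent_boundaries.append(len(memory.chat_memory.messages))
       memory.chat_memory.add_user_message(user_msg)
       memory.chat_memory.add_ai_message(assistant_msg)

   # Step 3: fetch only the messages that belong to the latest (active) intent.
   def messages_for_active_intent():
       return memory.chat_memory.messages[intent_boundaries[-1]:]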

You can use the last set of messages that match an intent to prompt an LLM, use them with a vector DB for
improved retrieval, and so on. With Arch and a few lines of code, you can improve retrieval accuracy, lower overall
token cost, and dramatically improve the speed of responses back to users.

Smarter retrieval with parameter extraction
-------------------------------------------

To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
streamline data retrieval and processing to build more efficient and precise RAG applications.

Step 1: Define prompt targets with parameter definitions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: /_config/rag-prompt-targets.yml
   :language: yaml
   :linenos:
   :emphasize-lines: 16-36
   :caption: prompt-config.yaml for parameter extraction in RAG scenarios
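
``rag-prompt-targets.yml`` is not reproduced on this page. The sketch below shows the general idea of a
retrieval-oriented prompt target whose extracted parameters become structured filters against your data store. Field
names are illustrative; see the configuration reference for the exact schema.

.. code-block:: yaml

   # Illustrative sketch only - not the shipped rag-prompt-targets.yml.
   prompt_targets:
     - name: policy_qa
       description: Answer questions about insurance policy documents
       parameters:
         - name: policy_type
           type: str
           description: Type of policy the question is about (e.g. auto, home)
           required: true
         - name: time_period
           type: str
           description: Optional time period the question refers to
           required: false
       endpoint:
         name: app_server
         path: /rag/policy_qa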

Step 2: Process request parameters in Flask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the prompt targets are configured as above, handling the extracted parameters in your application is
straightforward:

.. literalinclude:: /_include/parameter_handling_flask.py
   :language: python
   :linenos:
   :caption: Flask API example for parameter extraction via HTTP request parameters
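
As with the function-calling example, the included file is not shown on this page. A minimal sketch of the
retrieval-side handler, assuming the extracted parameters arrive in the request body and are used as filters
for your vector-store or SQL lookup (the route, parameter names, and ``search_chunks`` helper are illustrative):

.. code-block:: python

   # Minimal, illustrative sketch - not the bundled parameter_handling_flask.py.
   from flask import Flask, jsonify, request

   app = Flask(__name__)

   def search_chunks(query: str, policy_type: str) -> list[str]:
       """Hypothetical retrieval helper: query your vector DB or SQL store using the
       Arch-extracted parameters as structured filters and return matching chunks."""
       return []  # replace with your own retrieval call

   @app.route("/rag/policy_qa", methods=["POST"])
   def policy_qa():
       payload = request.get_json(silent=True) or {}
       query = payload.get("prompt", "")
       policy_type = payload.get("policy_type")
       if not policy_type:
           return jsonify({"error": "policy_type is required"}), 400
       chunks = search_chunks(query, policy_type)
       return jsonify({"chunks": chunks})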