.. _arch_rag_guide:

RAG Application
===============

The following section describes how Arch can help you build faster, smarter, and more accurate
Retrieval-Augmented Generation (RAG) applications.

Intent-drift Detection
----------------------

Developers struggle to handle ``follow-up`` or ``clarification`` questions.
Specifically, when users ask for changes or additions to previous responses, their AI applications often generate entirely new responses instead of adjusting previous ones.
Arch offers **intent-drift** tracking so that developers know when the user has shifted away from a previous intent. With that signal they can improve retrieval accuracy, lower overall token cost, and speed up responses to users.

Arch uses its built-in lightweight NLI and embedding models to determine whether the user has steered away from an active intent.
Arch's intent-drift detection mechanism is based on its :ref:`prompt_targets <prompt_target>` primitive. Arch tries to match an incoming
prompt to one of the prompt_targets configured in the gateway. Once it detects that the user has moved away from an
active intent, Arch adds the ``x-arch-intent-drift`` header to the request before sending it to your application servers.

.. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
    :lines: 101-157
    :emphasize-lines: 14-24
    :caption: Intent Detection Example
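
Independent of any web framework, the header check on the application side can be sketched as a small helper.
This is a minimal sketch rather than part of Arch's API: the helper name ``has_intent_drift`` is hypothetical, and
it assumes that Arch sets ``x-arch-intent-drift`` to the string ``true`` when drift is detected. Verify the actual
header value against your gateway before relying on it.

.. code-block:: python

    from typing import Mapping

    # Header name taken from the text above.
    INTENT_DRIFT_HEADER = "x-arch-intent-drift"


    def has_intent_drift(headers: Mapping[str, str]) -> bool:
        """Return True if the request headers signal an intent drift.

        Assumption: the gateway sets the header to "true" on drift and omits
        it (or sets another value) otherwise.
        """
        # HTTP header names are case-insensitive, so normalize before the lookup.
        normalized = {name.lower(): value for name, value in headers.items()}
        return normalized.get(INTENT_DRIFT_HEADER, "").strip().lower() == "true"

In a Flask handler, for example, ``dict(request.headers)`` could be passed to this helper before deciding
whether to start a new retrieval context.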

.. Note::
    Arch is (mostly) stateless so that it can scale in an embarrassingly parallel fashion. So, while Arch offers
    intent-drift detection, you still have to maintain conversational state with intent drift as metadata. The
    following code snippets show how easily you can build and enrich conversational history with LangChain (in Python),
    so that you can use the most relevant prompts for your retrieval and for prompting upstream LLMs.

Step 1: Define ConversationBufferMemory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
    :lines: 1-21

Step 2: Update ConversationBufferMemory with Intents
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
    :lines: 24-64

Step 3: Get Messages based on latest drift
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
    :lines: 67-80

You can use the last set of messages that match an intent to prompt an LLM, to query a vector DB for
improved retrieval, and so on. With Arch and a few lines of code, you can improve retrieval accuracy, lower overall
token cost, and speed up your responses to users.
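
The idea behind Step 3 can also be sketched in plain Python, without LangChain. This is a simplified illustration
and not the code from ``intent_detection.py``: the ``Message`` class, the ``intent_drift`` flag, and the sample
conversation are all assumptions made up for the example.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List


    @dataclass
    class Message:
        role: str                   # "user" or "assistant"
        content: str
        intent_drift: bool = False  # True if this message started a new intent


    def messages_since_latest_drift(history: List[Message]) -> List[Message]:
        """Return the messages starting at the most recent drift marker."""
        # Walk backwards to find the last message flagged as a drift.
        for index in range(len(history) - 1, -1, -1):
            if history[index].intent_drift:
                return history[index:]
        # No drift recorded: the whole conversation belongs to one intent.
        return list(history)


    # Made-up conversation for illustration only.
    history = [
        Message("user", "What is your refund policy?"),
        Message("assistant", "Refunds are available within 30 days."),
        Message("user", "How do I reset my password?", intent_drift=True),
        Message("assistant", "Use the reset link on the login page."),
        Message("user", "And if I no longer have access to my email?"),
    ]

    relevant = messages_since_latest_drift(history)
    print([message.content for message in relevant])

Only the messages from the password question onward are kept, so the refund exchange does not consume tokens or
skew retrieval for the current intent.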

Parameter Extraction for RAG
----------------------------

To build RAG applications, you can configure prompt targets with parameters,
enabling Arch to extract critical information from the conversation in a structured way. This approach improves the
retrieval quality and speed of your application: with the extracted parameters, you can pull
the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
streamline data retrieval and processing to build more efficient and precise RAG applications.
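
To make the retrieval step concrete, the following sketch uses extracted parameters to query a SQL-like store.
It is an illustration under stated assumptions, not Arch functionality: the ``chunks`` table, the ``topic`` and
``limit`` parameter names, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a
real data store.

.. code-block:: python

    import sqlite3
    from typing import List, Tuple


    def fetch_chunks(conn: sqlite3.Connection, topic: str, limit: int = 3) -> List[Tuple[int, str]]:
        """Fetch the chunks matching an extracted parameter, using bound values."""
        cursor = conn.execute(
            "SELECT id, body FROM chunks WHERE topic = ? ORDER BY id LIMIT ?",
            (topic, limit),
        )
        return cursor.fetchall()


    # Hypothetical in-memory store standing in for a real data store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, topic TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO chunks (topic, body) VALUES (?, ?)",
        [
            ("billing", "Invoices are issued monthly."),
            ("billing", "Refunds take 5 business days."),
            ("security", "Rotate API keys regularly."),
        ],
    )

    # Parameters as they might arrive from a prompt target (names are assumptions).
    extracted = {"topic": "billing", "limit": 2}
    chunks = fetch_chunks(conn, extracted["topic"], int(extracted["limit"]))
    print(chunks)

Binding the extracted values as query parameters, rather than formatting them into the SQL string, keeps
user-derived input from altering the query.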

Step 1: Define Prompt Targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: includes/rag/prompt_targets.yaml
    :language: yaml
    :caption: Prompt Targets
    :linenos:

Step 2: Process Request Parameters in Flask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the prompt targets are configured as above, handling those parameters is

.. literalinclude:: includes/rag/parameter_handling.py
    :language: python
    :caption: Parameter handling with Flask
    :linenos: