mirror of
https://github.com/katanemo/plano.git
synced 2026-06-26 15:39:40 +02:00
Doc Update (#129)
* init update * Update terminology.rst * fix the branch to create an index.html, and fix pre-commit issues * Doc update * made several changes to the docs after Shuguang's revision * fixing pre-commit issues * fixed the reference file to the final prompt config file * added google analytics --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
This commit is contained in:
parent
2a7b95582c
commit
5c7567584d
49 changed files with 1185 additions and 609 deletions
70
docs/source/build_with_arch/agent.rst
Normal file
70
docs/source/build_with_arch/agent.rst
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
.. _arch_agent_guide:
|
||||
|
||||
Agentic Workflow
|
||||
==============================
|
||||
|
||||
Arch helps you easily personalize your applications by calling application-specific (API) functions
|
||||
via user prompts. This involves any predefined functions or APIs you want to expose to users to perform tasks,
|
||||
gather information, or manipulate data. This capability is generally referred to as **function calling**, where
|
||||
you have the flexibility to support “agentic” apps tailored to specific use cases - from updating insurance
|
||||
claims to creating ad campaigns - via prompts.
|
||||
|
||||
Arch analyzes prompts, extracts critical information from prompts, engages in lightweight conversation with
|
||||
the user to gather any missing parameters and makes API calls so that you can focus on writing business logic.
|
||||
Arch does this via its purpose-built :ref:`Arch-FC LLM <function_calling>` - the fastest (200ms p90 - 10x faser than GPT-4o)
|
||||
and cheapest (100x than GPT-40) function-calling LLM that matches performance with frontier models.
|
||||
|
||||
.. image:: includes/agent/function-calling-flow.jpg
|
||||
:width: 100%
|
||||
:align: center
|
||||
|
||||
|
||||
Single Function Call
|
||||
--------------------
|
||||
In the most common scenario, users will request a single action via prompts, and Arch efficiently processes the
|
||||
request by extracting relevant parameters, validating the input, and calling the designated function or API. Here
|
||||
is how you would go about enabling this scenario with Arch:
|
||||
|
||||
Step 1: Define prompt targets with functions
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. literalinclude:: includes/agent/function-calling-agent.yaml
|
||||
:language: yaml
|
||||
:linenos:
|
||||
:emphasize-lines: 16-37
|
||||
:caption: Define prompt targets that can enable users to engage with API and backened functions of an app
|
||||
|
||||
Step 2: Process request parameters in Flask
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Once the prompt targets are configured as above, handling those parameters is
|
||||
|
||||
.. literalinclude:: includes/agent/parameter_handling.py
|
||||
:language: python
|
||||
:linenos:
|
||||
:caption: Parameter handling with Flask
|
||||
|
||||
Parallel/ Multiple Function Calling
|
||||
-----------------------------------
|
||||
In more complex use cases, users may request multiple actions or need multiple APIs/functions to be called
|
||||
simultaneously or sequentially. With Arch, you can handle these scenarios efficiently using parallel or multiple
|
||||
function calling. This allows your application to engage in a broader range of interactions, such as updating
|
||||
different datasets, triggering events across systems, or collecting results from multiple services in one prompt.
|
||||
|
||||
Arch-FC1B is built to manage these parallel tasks efficiently, ensuring low latency and high throughput, even
|
||||
when multiple functions are invoked. It provides two mechanisms to handle these cases:
|
||||
|
||||
Step 1: Define Multiple Function Targets
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
When enabling multiple function calling, define the prompt targets in a way that supports multiple functions or
|
||||
API calls based on the user's prompt. These targets can be triggered in parallel or sequentially, depending on
|
||||
the user's intent.
|
||||
|
||||
Example of Multiple Prompt Targets in YAML:
|
||||
|
||||
.. literalinclude:: includes/agent/function-calling-agent.yaml
|
||||
:language: yaml
|
||||
:linenos:
|
||||
:emphasize-lines: 16-37
|
||||
:caption: Define prompt targets that can enable users to engage with API and backened functions of an app
|
||||
|
|
@ -0,0 +1,47 @@
|
|||
version: "0.1-beta"
|
||||
listen:
|
||||
address: 127.0.0.1 | 0.0.0.0
|
||||
port_value: 8080 #If you configure port 443, you'll need to update the listener with tls_certificates
|
||||
|
||||
system_prompt: |
|
||||
You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
|
||||
|
||||
llm_providers:
|
||||
- name: "OpenAI"
|
||||
provider: "openai"
|
||||
access_key: OPENAI_API_KEY
|
||||
model: gpt-4o
|
||||
stream: true
|
||||
|
||||
prompt_targets:
|
||||
- name: reboot_devices
|
||||
description: >
|
||||
This prompt target handles user requests to reboot devices.
|
||||
It ensures that when users request to reboot specific devices or device groups, the system processes the reboot commands accurately.
|
||||
|
||||
**Examples of user prompts:**
|
||||
|
||||
- "Please reboot device 12345."
|
||||
- "Restart all devices in tenant group tenant-XYZ
|
||||
- "I need to reboot devices A, B, and C."
|
||||
|
||||
path: /agent/device_reboot
|
||||
parameters:
|
||||
- name: "device_ids"
|
||||
type: list # Options: integer | float | list | dictionary | set
|
||||
description: "A list of device identifiers (IDs) to reboot."
|
||||
required: false
|
||||
- name: "device_group"
|
||||
type: string # Options: string | integer | float | list | dictionary | set
|
||||
description: "The name of the device group to reboot."
|
||||
required: false
|
||||
|
||||
# Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
|
||||
endpoints:
|
||||
app_server:
|
||||
# value could be ip address or a hostname with port
|
||||
# this could also be a list of endpoints for load balancing
|
||||
# for example endpoint: [ ip1:port, ip2:port ]
|
||||
endpoint: "127.0.0.1:80"
|
||||
# max time to wait for a connection to be established
|
||||
connect_timeout: 0.005s
|
||||
Binary file not shown.
|
After Width: | Height: | Size: 297 KiB |
|
|
@ -0,0 +1,41 @@
|
|||
from flask import Flask, request, jsonify
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
@app.route('/agent/device_summary', methods=['POST'])
|
||||
def get_device_summary():
|
||||
"""
|
||||
Endpoint to retrieve device statistics based on device IDs and an optional time range.
|
||||
"""
|
||||
data = request.get_json()
|
||||
|
||||
# Validate 'device_ids' parameter
|
||||
device_ids = data.get('device_ids')
|
||||
if not device_ids or not isinstance(device_ids, list):
|
||||
return jsonify({'error': "'device_ids' parameter is required and must be a list"}), 400
|
||||
|
||||
# Validate 'time_range' parameter (optional, defaults to 7)
|
||||
time_range = data.get('time_range', 7)
|
||||
if not isinstance(time_range, int):
|
||||
return jsonify({'error': "'time_range' must be an integer"}), 400
|
||||
|
||||
# Simulate retrieving statistics for the given device IDs and time range
|
||||
# In a real application, you would query your database or external service here
|
||||
statistics = []
|
||||
for device_id in device_ids:
|
||||
# Placeholder for actual data retrieval
|
||||
stats = {
|
||||
'device_id': device_id,
|
||||
'time_range': f'Last {time_range} days',
|
||||
'data': f'Statistics data for device {device_id} over the last {time_range} days.'
|
||||
}
|
||||
statistics.append(stats)
|
||||
|
||||
response = {
|
||||
'statistics': statistics
|
||||
}
|
||||
|
||||
return jsonify(response), 200
|
||||
|
||||
if __name__ == '__main__':
|
||||
app.run(debug=True)
|
||||
152
docs/source/build_with_arch/includes/rag/intent_detection.py
Normal file
152
docs/source/build_with_arch/includes/rag/intent_detection.py
Normal file
|
|
@ -0,0 +1,152 @@
|
|||
from flask import Flask, request, jsonify
|
||||
from datetime import datetime
|
||||
import uuid
|
||||
from langchain.memory import ConversationBufferMemory
|
||||
from langchain.schema import AIMessage, HumanMessage
|
||||
from langchain import OpenAI
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
# Global dictionary to keep track of user memories
|
||||
user_memories = {}
|
||||
|
||||
def get_user_conversation(user_id):
|
||||
"""
|
||||
Retrieve the user's conversation memory using LangChain.
|
||||
If the user does not exist, initialize their conversation memory.
|
||||
"""
|
||||
if user_id not in user_memories:
|
||||
user_memories[user_id] = ConversationBufferMemory(return_messages=True)
|
||||
return user_memories[user_id]
|
||||
|
||||
def update_user_conversation(user_id, client_messages, intent_changed):
|
||||
"""
|
||||
Update the user's conversation memory with new messages using LangChain.
|
||||
Each message is augmented with a UUID, timestamp, and intent change marker.
|
||||
Only new messages are added to avoid duplication.
|
||||
"""
|
||||
memory = get_user_conversation(user_id)
|
||||
stored_messages = memory.chat_memory.messages
|
||||
|
||||
# Determine the number of stored messages
|
||||
num_stored_messages = len(stored_messages)
|
||||
new_messages = client_messages[num_stored_messages:]
|
||||
|
||||
# Process each new message
|
||||
for index, message in enumerate(new_messages):
|
||||
role = message.get('role')
|
||||
content = message.get('content')
|
||||
metadata = {
|
||||
'uuid': str(uuid.uuid4()),
|
||||
'timestamp': datetime.utcnow().isoformat(),
|
||||
'intent_changed': False # Default value
|
||||
}
|
||||
|
||||
# Mark the intent change on the last message if detected
|
||||
if intent_changed and index == len(new_messages) - 1:
|
||||
metadata['intent_changed'] = True
|
||||
|
||||
# Create a new message with metadata
|
||||
if role == 'user':
|
||||
memory.chat_memory.add_message(
|
||||
HumanMessage(content=content, additional_kwargs={'metadata': metadata})
|
||||
)
|
||||
elif role == 'assistant':
|
||||
memory.chat_memory.add_message(
|
||||
AIMessage(content=content, additional_kwargs={'metadata': metadata})
|
||||
)
|
||||
else:
|
||||
# Handle other roles if necessary
|
||||
pass
|
||||
|
||||
return memory
|
||||
|
||||
def get_messages_since_last_intent(messages):
|
||||
"""
|
||||
Retrieve messages from the last intent change onwards using LangChain.
|
||||
"""
|
||||
messages_since_intent = []
|
||||
for message in reversed(messages):
|
||||
# Insert message at the beginning to maintain correct order
|
||||
messages_since_intent.insert(0, message)
|
||||
metadata = message.additional_kwargs.get('metadata', {})
|
||||
# Break if intent_changed is True
|
||||
if metadata.get('intent_changed', False) == True:
|
||||
break
|
||||
return messages_since_intent
|
||||
|
||||
def forward_to_llm(messages):
|
||||
"""
|
||||
Forward messages to an upstream LLM using LangChain.
|
||||
"""
|
||||
# Convert messages to a conversation string
|
||||
conversation = ""
|
||||
for message in messages:
|
||||
role = 'User' if isinstance(message, HumanMessage) else 'Assistant'
|
||||
content = message.content
|
||||
conversation += f"{role}: {content}\n"
|
||||
# Use LangChain's LLM to get a response. This call is proxied through Arch for end-to-end observability and traffic management
|
||||
llm = OpenAI()
|
||||
# Create a prompt that includes the conversation
|
||||
prompt = f"{conversation}Assistant:"
|
||||
response = llm(prompt)
|
||||
return response
|
||||
|
||||
@app.route('/process_rag', methods=['POST'])
|
||||
def process_rag():
|
||||
# Extract JSON data from the request
|
||||
data = request.get_json()
|
||||
|
||||
user_id = data.get('user_id')
|
||||
if not user_id:
|
||||
return jsonify({'error': 'User ID is required'}), 400
|
||||
|
||||
client_messages = data.get('messages')
|
||||
if not client_messages or not isinstance(client_messages, list):
|
||||
return jsonify({'error': 'Messages array is required'}), 400
|
||||
|
||||
# Extract the intent change marker from Arch's headers if present for the current prompt
|
||||
intent_changed_header = request.headers.get('x-arch-intent-marker', '').lower()
|
||||
if intent_changed_header in ['', 'false']:
|
||||
intent_changed = False
|
||||
elif intent_changed_header == 'true':
|
||||
intent_changed = True
|
||||
else:
|
||||
# Invalid value provided
|
||||
return jsonify({'error': 'Invalid value for x-arch-prompt-intent-change header'}), 400
|
||||
|
||||
# Update user conversation based on intent change
|
||||
memory = update_user_conversation(user_id, client_messages, intent_changed)
|
||||
|
||||
# Retrieve messages since last intent change for LLM
|
||||
messages_for_llm = get_messages_since_last_intent(memory.chat_memory.messages)
|
||||
|
||||
# Forward messages to upstream LLM
|
||||
llm_response = forward_to_llm(messages_for_llm)
|
||||
|
||||
# Prepare the messages to return
|
||||
messages_to_return = []
|
||||
for message in memory.chat_memory.messages:
|
||||
role = 'user' if isinstance(message, HumanMessage) else 'assistant'
|
||||
content = message.content
|
||||
metadata = message.additional_kwargs.get('metadata', {})
|
||||
message_entry = {
|
||||
'uuid': metadata.get('uuid'),
|
||||
'timestamp': metadata.get('timestamp'),
|
||||
'role': role,
|
||||
'content': content,
|
||||
'intent_changed': metadata.get('intent_changed', False)
|
||||
}
|
||||
messages_to_return.append(message_entry)
|
||||
|
||||
# Prepare the response
|
||||
response = {
|
||||
'user_id': user_id,
|
||||
'messages': messages_to_return,
|
||||
'llm_response': llm_response
|
||||
}
|
||||
|
||||
return jsonify(response), 200
|
||||
|
||||
if __name__ == '__main__':
|
||||
app.run(debug=True)
|
||||
|
|
@ -0,0 +1,41 @@
|
|||
from flask import Flask, request, jsonify
|
||||
|
||||
app = Flask(__name__)
|
||||
|
||||
@app.route('/agent/device_summary', methods=['POST'])
|
||||
def get_device_summary():
|
||||
"""
|
||||
Endpoint to retrieve device statistics based on device IDs and an optional time range.
|
||||
"""
|
||||
data = request.get_json()
|
||||
|
||||
# Validate 'device_ids' parameter
|
||||
device_ids = data.get('device_ids')
|
||||
if not device_ids or not isinstance(device_ids, list):
|
||||
return jsonify({'error': "'device_ids' parameter is required and must be a list"}), 400
|
||||
|
||||
# Validate 'time_range' parameter (optional, defaults to 7)
|
||||
time_range = data.get('time_range', 7)
|
||||
if not isinstance(time_range, int):
|
||||
return jsonify({'error': "'time_range' must be an integer"}), 400
|
||||
|
||||
# Simulate retrieving statistics for the given device IDs and time range
|
||||
# In a real application, you would query your database or external service here
|
||||
statistics = []
|
||||
for device_id in device_ids:
|
||||
# Placeholder for actual data retrieval
|
||||
stats = {
|
||||
'device_id': device_id,
|
||||
'time_range': f'Last {time_range} days',
|
||||
'data': f'Statistics data for device {device_id} over the last {time_range} days.'
|
||||
}
|
||||
statistics.append(stats)
|
||||
|
||||
response = {
|
||||
'statistics': statistics
|
||||
}
|
||||
|
||||
return jsonify(response), 200
|
||||
|
||||
if __name__ == '__main__':
|
||||
app.run(debug=True)
|
||||
21
docs/source/build_with_arch/includes/rag/prompt_targets.yaml
Normal file
21
docs/source/build_with_arch/includes/rag/prompt_targets.yaml
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
prompt_targets:
|
||||
- name: get_device_statistics
|
||||
description: >
|
||||
This prompt target ensures that when users request device-related statistics, the system accurately retrieves and presents the relevant data
|
||||
based on the specified devices and time range. Examples of user prompts, include:
|
||||
|
||||
- "Show me the performance stats for device 12345 over the past week."
|
||||
- "What are the error rates for my devices in the last 24 hours?"
|
||||
- "I need statistics on device 789 over the last 10 days."
|
||||
|
||||
path: /agent/device_summary
|
||||
parameters:
|
||||
- name: "device_ids"
|
||||
type: list # Options: integer | float | list | dictionary | set
|
||||
description: "A list of device identifiers (IDs) for which the statistics are requested."
|
||||
required: true
|
||||
- name: "time_range"
|
||||
type: integer # Options: integer | float | list | dictionary | set
|
||||
description: "The number of days in the past over which to retrieve device statistics. Defaults to 7 days if not specified."
|
||||
required: false
|
||||
default: 7
|
||||
94
docs/source/build_with_arch/rag.rst
Normal file
94
docs/source/build_with_arch/rag.rst
Normal file
|
|
@ -0,0 +1,94 @@
|
|||
.. _arch_rag_guide:
|
||||
|
||||
RAG Application
|
||||
===============
|
||||
|
||||
The following section describes how Arch can help you build faster, smarter and more accurate
|
||||
Retrieval-Augmented Generation (RAG) applications.
|
||||
|
||||
Intent-drift Detection
|
||||
----------------------
|
||||
|
||||
Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
|
||||
or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
|
||||
questions. Specifically, when users ask for changes or additions to previous responses their AI applications often
|
||||
generate entirely new responses instead of adjusting previous ones. Arch offers *intent-drift* tracking as a feature so
|
||||
that developers can know when the user has shifted away from a previous intent so that they can dramatically improve
|
||||
retrieval accuracy, lower overall token cost and improve the speed of their responses back to users.
|
||||
|
||||
Arch uses its built-in lightweight NLI and embedding models to know if the user has steered away from an active intent.
|
||||
Arch's intent-drift detection mechanism is based on its' *prompt_targets* primtive. Arch tries to match an incoming
|
||||
prompt to one of the *prompt_targets* configured in the gateway. Once it detects that the user has moved away from an active
|
||||
active intent, Arch adds the ``x-arch-intent-drift`` headers to the request before sending it your application servers.
|
||||
|
||||
.. literalinclude:: includes/rag/intent_detection.py
|
||||
:language: python
|
||||
:linenos:
|
||||
:lines: 95-125
|
||||
:emphasize-lines: 14-22
|
||||
:caption: Intent Detection Example
|
||||
|
||||
|
||||
.. Note::
|
||||
|
||||
Arch is (mostly) stateless so that it can scale in an embarrassingly parrallel fashion. So, while Arch offers
|
||||
intent-drift detetction, you still have to maintain converational state with intent drift as meta-data. The
|
||||
following code snippets show how easily you can build and enrich conversational history with Langchain (in python),
|
||||
so that you can use the most relevant prompts for your retrieval and for prompting upstream LLMs.
|
||||
|
||||
|
||||
Step 1: Define ConversationBufferMemory
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. literalinclude:: includes/rag/intent_detection.py
|
||||
:language: python
|
||||
:linenos:
|
||||
:lines: 1-21
|
||||
|
||||
Step 2: Update ConversationBufferMemory w/ intent
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. literalinclude:: includes/rag/intent_detection.py
|
||||
:language: python
|
||||
:linenos:
|
||||
:lines: 22-62
|
||||
|
||||
Step 3: Get Messages based on latest drift
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. literalinclude:: includes/rag/intent_detection.py
|
||||
:language: python
|
||||
:linenos:
|
||||
:lines: 64-76
|
||||
|
||||
|
||||
You can used the last set of messages that match to an intent to prompt an LLM, use it with an vector-DB for
|
||||
improved retrieval, etc. With Arch and a few lines of code, you can improve the retrieval accuracy, lower overall
|
||||
token cost and dramatically improve the speed of their responses back to users.
|
||||
|
||||
Parameter Extraction for RAG
|
||||
----------------------------
|
||||
|
||||
To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
|
||||
enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
|
||||
retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
|
||||
the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
|
||||
streamline data retrieval and processing to build more efficient and precise RAG applications.
|
||||
|
||||
Step 1: Define prompt targets with parameter definitions
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. literalinclude:: includes/rag/prompt_targets.yaml
|
||||
:language: yaml
|
||||
:caption: Prompt Targets
|
||||
:linenos:
|
||||
|
||||
Step 2: Process request parameters in Flask
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Once the prompt targets are configured as above, handling those parameters is
|
||||
|
||||
.. literalinclude:: includes/rag/parameter_handling.py
|
||||
:language: python
|
||||
:caption: Parameter handling with Flask
|
||||
:linenos:
|
||||
Loading…
Add table
Add a link
Reference in a new issue