mirror of https://github.com/katanemo/plano.git (synced 2026-05-15 11:02:39 +02:00)
Fix errors and improve Doc (#143)
* Fix link issues and add icons
* Improve Doc
* Fix test
* Make minor modifications to Shuguang's doc changes

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
parent
3ed50e61d2
commit
b30ad791f7
27 changed files with 396 additions and 329 deletions
@@ -11,7 +11,7 @@ claims to creating ad campaigns - via prompts.
 Arch analyzes prompts, extracts critical information from prompts, engages in lightweight conversation with
 the user to gather any missing parameters and makes API calls so that you can focus on writing business logic.
-Arch does this via its purpose-built :ref:`Arch-FC LLM <function_calling>` - the fastest (200ms p90 - 10x faser than GPT-4o)
+Arch does this via its purpose-built :ref:`Arch-Function <function_calling>` - the fastest (200ms p90, 10x faster than GPT-4o)
 and cheapest (100x cheaper than GPT-4o) function-calling LLM that matches performance with frontier models.

 .. image:: includes/agent/function-calling-flow.jpg

@@ -25,17 +25,17 @@ In the most common scenario, users will request a single action via prompts, and
 request by extracting relevant parameters, validating the input, and calling the designated function or API. Here
 is how you would go about enabling this scenario with Arch:

-Step 1: Define prompt targets with functions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Step 1: Define Prompt Targets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. literalinclude:: includes/agent/function-calling-agent.yaml
    :language: yaml
    :linenos:
-   :emphasize-lines: 16-37
-   :caption: Define prompt targets that can enable users to engage with API and backened functions of an app
+   :emphasize-lines: 21-34
+   :caption: Prompt Target Example Configuration

-Step 2: Process request parameters in Flask
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Step 2: Process Request Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Once the prompt targets are configured as above, handling those parameters is
@@ -44,8 +44,8 @@ Once the prompt targets are configured as above, handling those parameters is
    :linenos:
    :caption: Parameter handling with Flask

-Parallel/ Multiple Function Calling
------------------------------------
+Parallel & Multiple Function Calling
+------------------------------------
 In more complex use cases, users may request multiple actions or need multiple APIs/functions to be called
 simultaneously or sequentially. With Arch, you can handle these scenarios efficiently using parallel or multiple
 function calling. This allows your application to engage in a broader range of interactions, such as updating
@@ -54,8 +54,8 @@ different datasets, triggering events across systems, or collecting results from
 Arch-FC1B is built to manage these parallel tasks efficiently, ensuring low latency and high throughput, even
 when multiple functions are invoked. It provides two mechanisms to handle these cases:

-Step 1: Define Multiple Function Targets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Step 1: Define Prompt Targets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 When enabling multiple function calling, define the prompt targets in a way that supports multiple functions or
 API calls based on the user's prompt. These targets can be triggered in parallel or sequentially, depending on
@@ -66,5 +66,5 @@ Example of Multiple Prompt Targets in YAML:
 .. literalinclude:: includes/agent/function-calling-agent.yaml
    :language: yaml
    :linenos:
-   :emphasize-lines: 16-37
-   :caption: Define prompt targets that can enable users to engage with API and backened functions of an app
+   :emphasize-lines: 21-34
+   :caption: Prompt Target Example Configuration
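The parallel function-calling behavior described above can be sketched outside of Arch with standard concurrency primitives. This is an illustrative stand-in only: the `get_stats` and `reboot` helpers are hypothetical backend functions, not Arch APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def get_stats(device_id):
    # Hypothetical backend call behind one prompt target
    return {"device_id": device_id, "status": "ok"}


def reboot(device_id):
    # Hypothetical backend call behind another prompt target
    return {"device_id": device_id, "rebooted": True}


def call_parallel(calls):
    """Run independent function calls concurrently, preserving input order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, *args) for fn, args in calls]
        return [f.result() for f in futures]


# Two independent calls dispatched in parallel, results collected in order
results = call_parallel([(get_stats, ("a1",)), (reboot, ("a1",))])
```

The ordering guarantee comes from collecting `f.result()` in submission order, even though execution is concurrent.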
@@ -1,39 +1,36 @@
-version: "0.1-beta"
+version: v0.1

 listen:
-  address: 127.0.0.1 | 0.0.0.0
-  port_value: 8080 #If you configure port 443, you'll need to update the listener with tls_certificates
-
-system_prompt: |
-  You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
+  address: 0.0.0.0 # or 127.0.0.1
+  port: 10000
+  # Defines how Arch should parse the content from application/json or text/plain Content-type in the http request
+  message_format: huggingface

 # Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
 llm_providers:
-  - name: "OpenAI"
-    provider: "openai"
+  - name: OpenAI
+    provider: openai
     access_key: OPENAI_API_KEY
     model: gpt-4o
     default: true
     stream: true

+# default system prompt used by all prompt targets
+system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
+
 prompt_targets:
   - name: reboot_devices
-    description: >
-      This prompt target handles user requests to reboot devices.
-      It ensures that when users request to reboot specific devices or device groups, the system processes the reboot commands accurately.
-
-      **Examples of user prompts:**
-
-      - "Please reboot device 12345."
-      - "Restart all devices in tenant group tenant-XYZ
-      - "I need to reboot devices A, B, and C."
+    description: Reboot specific devices or device groups

     path: /agent/device_reboot
     parameters:
-      - name: "device_ids"
-        type: list # Options: integer | float | list | dictionary | set
-        description: "A list of device identifiers (IDs) to reboot."
+      - name: device_ids
+        type: list
+        description: A list of device identifiers (IDs) to reboot.
         required: false
-      - name: "device_group"
-        type: string # Options: string | integer | float | list | dictionary | set
-        description: "The name of the device group to reboot."
+      - name: device_group
+        type: str
+        description: The name of the device group to reboot
        required: false

 # Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
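The parameter specs in the `reboot_devices` target above imply type and required checks before the backend is called. A minimal pure-Python sketch of that validation, using the names and types from the config (the validator itself is illustrative, not Arch's implementation):

```python
# Spec mirroring the reboot_devices prompt target: both parameters optional
PARAMS = [
    {"name": "device_ids", "type": list, "required": False},
    {"name": "device_group", "type": str, "required": False},
]


def validate(payload):
    """Return a list of validation errors for the given request payload."""
    errors = []
    for spec in PARAMS:
        value = payload.get(spec["name"])
        if value is None:
            if spec["required"]:
                errors.append(f"missing required parameter: {spec['name']}")
        elif not isinstance(value, spec["type"]):
            errors.append(f"{spec['name']} must be {spec['type'].__name__}")
    return errors
```

An empty error list means the payload is safe to forward to the configured `path`.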
@@ -42,6 +39,6 @@ endpoints:
    # value could be ip address or a hostname with port
    # this could also be a list of endpoints for load balancing
    # for example endpoint: [ ip1:port, ip2:port ]
-    endpoint: "127.0.0.1:80"
+    endpoint: 127.0.0.1:80
    # max time to wait for a connection to be established
    connect_timeout: 0.005s
@@ -2,7 +2,8 @@ from flask import Flask, request, jsonify

 app = Flask(__name__)

-@app.route('/agent/device_summary', methods=['POST'])
+
+@app.route("/agent/device_summary", methods=["POST"])
 def get_device_summary():
     """
     Endpoint to retrieve device statistics based on device IDs and an optional time range.
@@ -10,14 +11,16 @@ def get_device_summary():
     data = request.get_json()

     # Validate 'device_ids' parameter
-    device_ids = data.get('device_ids')
+    device_ids = data.get("device_ids")
     if not device_ids or not isinstance(device_ids, list):
-        return jsonify({'error': "'device_ids' parameter is required and must be a list"}), 400
+        return jsonify(
+            {"error": "'device_ids' parameter is required and must be a list"}
+        ), 400

     # Validate 'time_range' parameter (optional, defaults to 7)
-    time_range = data.get('time_range', 7)
+    time_range = data.get("time_range", 7)
     if not isinstance(time_range, int):
-        return jsonify({'error': "'time_range' must be an integer"}), 400
+        return jsonify({"error": "'time_range' must be an integer"}), 400

     # Simulate retrieving statistics for the given device IDs and time range
     # In a real application, you would query your database or external service here
@@ -25,17 +28,16 @@ def get_device_summary():
     for device_id in device_ids:
         # Placeholder for actual data retrieval
         stats = {
-            'device_id': device_id,
-            'time_range': f'Last {time_range} days',
-            'data': f'Statistics data for device {device_id} over the last {time_range} days.'
+            "device_id": device_id,
+            "time_range": f"Last {time_range} days",
+            "data": f"Statistics data for device {device_id} over the last {time_range} days.",
         }
         statistics.append(stats)

-    response = {
-        'statistics': statistics
-    }
+    response = {"statistics": statistics}

     return jsonify(response), 200

-if __name__ == '__main__':
+
+if __name__ == "__main__":
     app.run(debug=True)
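The handler body in this file can be exercised without Flask. A minimal sketch of the same validation and response-building logic, assuming the request body has already been parsed into a dict:

```python
def device_summary(data):
    """Mirror of the Flask handler logic: validate input, build statistics payload."""
    device_ids = data.get("device_ids")
    if not device_ids or not isinstance(device_ids, list):
        return {"error": "'device_ids' parameter is required and must be a list"}, 400

    time_range = data.get("time_range", 7)
    if not isinstance(time_range, int):
        return {"error": "'time_range' must be an integer"}, 400

    # Placeholder retrieval, as in the example: one entry per device
    statistics = [
        {
            "device_id": d,
            "time_range": f"Last {time_range} days",
            "data": f"Statistics data for device {d} over the last {time_range} days.",
        }
        for d in device_ids
    ]
    return {"statistics": statistics}, 200
```

Returning a `(payload, status)` tuple keeps the function testable; the Flask layer only adds JSON serialization.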
@@ -10,6 +10,7 @@ app = Flask(__name__)
 # Global dictionary to keep track of user memories
 user_memories = {}

+
 def get_user_conversation(user_id):
     """
     Retrieve the user's conversation memory using LangChain.
@@ -19,6 +20,7 @@ def get_user_conversation(user_id):
         user_memories[user_id] = ConversationBufferMemory(return_messages=True)
     return user_memories[user_id]

+
 def update_user_conversation(user_id, client_messages, intent_changed):
     """
     Update the user's conversation memory with new messages using LangChain.
@@ -34,26 +36,26 @@ def update_user_conversation(user_id, client_messages, intent_changed):

     # Process each new message
     for index, message in enumerate(new_messages):
-        role = message.get('role')
-        content = message.get('content')
+        role = message.get("role")
+        content = message.get("content")
         metadata = {
-            'uuid': str(uuid.uuid4()),
-            'timestamp': datetime.utcnow().isoformat(),
-            'intent_changed': False # Default value
+            "uuid": str(uuid.uuid4()),
+            "timestamp": datetime.utcnow().isoformat(),
+            "intent_changed": False,  # Default value
         }

         # Mark the intent change on the last message if detected
         if intent_changed and index == len(new_messages) - 1:
-            metadata['intent_changed'] = True
+            metadata["intent_changed"] = True

         # Create a new message with metadata
-        if role == 'user':
+        if role == "user":
             memory.chat_memory.add_message(
-                HumanMessage(content=content, additional_kwargs={'metadata': metadata})
+                HumanMessage(content=content, additional_kwargs={"metadata": metadata})
             )
-        elif role == 'assistant':
+        elif role == "assistant":
             memory.chat_memory.add_message(
-                AIMessage(content=content, additional_kwargs={'metadata': metadata})
+                AIMessage(content=content, additional_kwargs={"metadata": metadata})
             )
         else:
             # Handle other roles if necessary
@@ -61,6 +63,7 @@ def update_user_conversation(user_id, client_messages, intent_changed):

     return memory

+
 def get_messages_since_last_intent(messages):
     """
     Retrieve messages from the last intent change onwards using LangChain.
@@ -69,12 +72,14 @@ def get_messages_since_last_intent(messages):
     for message in reversed(messages):
         # Insert message at the beginning to maintain correct order
         messages_since_intent.insert(0, message)
-        metadata = message.additional_kwargs.get('metadata', {})
+        metadata = message.additional_kwargs.get("metadata", {})
         # Break if intent_changed is True
-        if metadata.get('intent_changed', False) == True:
+        if metadata.get("intent_changed", False) == True:
             break

     return messages_since_intent

+
 def forward_to_llm(messages):
     """
     Forward messages to an upstream LLM using LangChain.
@@ -82,7 +87,7 @@ def forward_to_llm(messages):
     # Convert messages to a conversation string
     conversation = ""
     for message in messages:
-        role = 'User' if isinstance(message, HumanMessage) else 'Assistant'
+        role = "User" if isinstance(message, HumanMessage) else "Assistant"
         content = message.content
         conversation += f"{role}: {content}\n"
     # Use LangChain's LLM to get a response. This call is proxied through Arch for end-to-end observability and traffic management
@@ -92,28 +97,31 @@ def forward_to_llm(messages):
     response = llm(prompt)
     return response

-@app.route('/process_rag', methods=['POST'])
+
+@app.route("/process_rag", methods=["POST"])
 def process_rag():
     # Extract JSON data from the request
     data = request.get_json()

-    user_id = data.get('user_id')
+    user_id = data.get("user_id")
     if not user_id:
-        return jsonify({'error': 'User ID is required'}), 400
+        return jsonify({"error": "User ID is required"}), 400

-    client_messages = data.get('messages')
+    client_messages = data.get("messages")
     if not client_messages or not isinstance(client_messages, list):
-        return jsonify({'error': 'Messages array is required'}), 400
+        return jsonify({"error": "Messages array is required"}), 400

     # Extract the intent change marker from Arch's headers if present for the current prompt
-    intent_changed_header = request.headers.get('x-arch-intent-marker', '').lower()
-    if intent_changed_header in ['', 'false']:
+    intent_changed_header = request.headers.get("x-arch-intent-marker", "").lower()
+    if intent_changed_header in ["", "false"]:
         intent_changed = False
-    elif intent_changed_header == 'true':
+    elif intent_changed_header == "true":
         intent_changed = True
     else:
         # Invalid value provided
-        return jsonify({'error': 'Invalid value for x-arch-prompt-intent-change header'}), 400
+        return jsonify(
+            {"error": "Invalid value for x-arch-prompt-intent-change header"}
+        ), 400

     # Update user conversation based on intent change
     memory = update_user_conversation(user_id, client_messages, intent_changed)
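The header-parsing branch in the hunk above can be isolated into a small helper. A sketch assuming the same `x-arch-intent-marker` semantics shown in the code (empty or "false" means no drift, "true" means drift, anything else is invalid):

```python
def parse_intent_marker(value):
    """Map the x-arch-intent-marker header value to a bool; None signals an invalid value."""
    value = (value or "").lower()
    if value in ("", "false"):
        return False
    if value == "true":
        return True
    return None  # caller should respond with HTTP 400
```

Keeping the tri-state result (`True`/`False`/`None`) out of the Flask handler makes the 400-on-invalid branch easy to unit test.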
@@ -127,26 +135,27 @@ def process_rag():
     # Prepare the messages to return
     messages_to_return = []
     for message in memory.chat_memory.messages:
-        role = 'user' if isinstance(message, HumanMessage) else 'assistant'
+        role = "user" if isinstance(message, HumanMessage) else "assistant"
         content = message.content
-        metadata = message.additional_kwargs.get('metadata', {})
+        metadata = message.additional_kwargs.get("metadata", {})
         message_entry = {
-            'uuid': metadata.get('uuid'),
-            'timestamp': metadata.get('timestamp'),
-            'role': role,
-            'content': content,
-            'intent_changed': metadata.get('intent_changed', False)
+            "uuid": metadata.get("uuid"),
+            "timestamp": metadata.get("timestamp"),
+            "role": role,
+            "content": content,
+            "intent_changed": metadata.get("intent_changed", False),
         }
         messages_to_return.append(message_entry)

     # Prepare the response
     response = {
-        'user_id': user_id,
-        'messages': messages_to_return,
-        'llm_response': llm_response
+        "user_id": user_id,
+        "messages": messages_to_return,
+        "llm_response": llm_response,
     }

     return jsonify(response), 200

-if __name__ == '__main__':
+
+if __name__ == "__main__":
     app.run(debug=True)
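The `get_messages_since_last_intent` logic in the file above can be sketched with plain dicts instead of LangChain message objects; the message shape used here is an assumption for illustration only:

```python
def messages_since_last_intent(messages):
    """Walk the history backwards, keeping everything from the last intent change onwards."""
    kept = []
    for message in reversed(messages):
        # Insert at the front to preserve chronological order
        kept.insert(0, message)
        if message.get("metadata", {}).get("intent_changed", False):
            break
    return kept


history = [
    {"content": "hi"},
    {"content": "new topic", "metadata": {"intent_changed": True}},
    {"content": "follow-up"},
]
```

Given `history`, the function returns only the "new topic" message and everything after it, which is the window you would forward to retrieval.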
@@ -2,7 +2,8 @@ from flask import Flask, request, jsonify

 app = Flask(__name__)

-@app.route('/agent/device_summary', methods=['POST'])
+
+@app.route("/agent/device_summary", methods=["POST"])
 def get_device_summary():
     """
     Endpoint to retrieve device statistics based on device IDs and an optional time range.
@@ -10,14 +11,16 @@ def get_device_summary():
     data = request.get_json()

     # Validate 'device_ids' parameter
-    device_ids = data.get('device_ids')
+    device_ids = data.get("device_ids")
     if not device_ids or not isinstance(device_ids, list):
-        return jsonify({'error': "'device_ids' parameter is required and must be a list"}), 400
+        return jsonify(
+            {"error": "'device_ids' parameter is required and must be a list"}
+        ), 400

     # Validate 'time_range' parameter (optional, defaults to 7)
-    time_range = data.get('time_range', 7)
+    time_range = data.get("time_range", 7)
     if not isinstance(time_range, int):
-        return jsonify({'error': "'time_range' must be an integer"}), 400
+        return jsonify({"error": "'time_range' must be an integer"}), 400

     # Simulate retrieving statistics for the given device IDs and time range
     # In a real application, you would query your database or external service here
@@ -25,17 +28,16 @@ def get_device_summary():
     for device_id in device_ids:
         # Placeholder for actual data retrieval
         stats = {
-            'device_id': device_id,
-            'time_range': f'Last {time_range} days',
-            'data': f'Statistics data for device {device_id} over the last {time_range} days.'
+            "device_id": device_id,
+            "time_range": f"Last {time_range} days",
+            "data": f"Statistics data for device {device_id} over the last {time_range} days.",
         }
         statistics.append(stats)

-    response = {
-        'statistics': statistics
-    }
+    response = {"statistics": statistics}

     return jsonify(response), 200

-if __name__ == '__main__':
+
+if __name__ == "__main__":
     app.run(debug=True)
@@ -1,21 +1,15 @@
 prompt_targets:
   - name: get_device_statistics
-    description: >
-      This prompt target ensures that when users request device-related statistics, the system accurately retrieves and presents the relevant data
-      based on the specified devices and time range. Examples of user prompts, include:
-
-      - "Show me the performance stats for device 12345 over the past week."
-      - "What are the error rates for my devices in the last 24 hours?"
-      - "I need statistics on device 789 over the last 10 days."
+    description: Retrieve and present the relevant data based on the specified devices and time range

     path: /agent/device_summary
     parameters:
-      - name: "device_ids"
-        type: list # Options: integer | float | list | dictionary | set
-        description: "A list of device identifiers (IDs) for which the statistics are requested."
+      - name: device_ids
+        type: list
+        description: A list of device identifiers (IDs) for which the statistics are requested.
         required: true
-      - name: "time_range"
-        type: integer # Options: integer | float | list | dictionary | set
-        description: "The number of days in the past over which to retrieve device statistics. Defaults to 7 days if not specified."
+      - name: time_range
+        type: int
+        description: The number of days in the past over which to retrieve device statistics
         required: false
         default: 7
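The `default: 7` on `time_range` above implies the server fills in defaults before handling the request. A minimal sketch of that behavior; the `with_defaults` helper is illustrative, not part of Arch:

```python
# Defaults taken from the prompt target spec above: time_range defaults to 7
SPEC_DEFAULTS = {"time_range": 7}


def with_defaults(payload, defaults=SPEC_DEFAULTS):
    """Return a copy of the payload with missing optional parameters filled in."""
    out = dict(payload)
    for name, default in defaults.items():
        out.setdefault(name, default)
    return out
```

Explicitly provided values always win; only absent keys get the spec default.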
@@ -8,24 +8,20 @@ Retrieval-Augmented Generation (RAG) applications.

 Intent-drift Detection
 ----------------------

-Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
-or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
-questions. Specifically, when users ask for changes or additions to previous responses their AI applications often
-generate entirely new responses instead of adjusting previous ones. Arch offers *intent-drift* tracking as a feature so
-that developers can know when the user has shifted away from a previous intent so that they can dramatically improve
-retrieval accuracy, lower overall token cost and improve the speed of their responses back to users.
+Developers struggle to handle ``follow-up`` or ``clarification`` questions.
+Specifically, when users ask for changes or additions to previous responses their AI applications often generate entirely new responses instead of adjusting previous ones.
+Arch offers **intent-drift** tracking as a feature so that developers can know when the user has shifted away from a previous intent so that they can dramatically improve retrieval accuracy, lower overall token cost and improve the speed of their responses back to users.

 Arch uses its built-in lightweight NLI and embedding models to know if the user has steered away from an active intent.
-Arch's intent-drift detection mechanism is based on its' *prompt_targets* primtive. Arch tries to match an incoming
-prompt to one of the *prompt_targets* configured in the gateway. Once it detects that the user has moved away from an active
+Arch's intent-drift detection mechanism is based on its :ref:`prompt_targets <prompt_target>` primitive. Arch tries to match an incoming
+prompt to one of the prompt_targets configured in the gateway. Once it detects that the user has moved away from an
 active intent, Arch adds the ``x-arch-intent-drift`` headers to the request before sending it to your application servers.

 .. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
-   :lines: 95-125
-   :emphasize-lines: 14-22
+   :lines: 101-157
+   :emphasize-lines: 14-24
    :caption: Intent Detection Example

@@ -38,28 +34,28 @@ active intent, Arch adds the ``x-arch-intent-drift`` headers to the request befo

 Step 1: Define ConversationBufferMemory
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
    :lines: 1-21

-Step 2: Update ConversationBufferMemory w/ intent
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Step 2: Update ConversationBufferMemory with Intents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
-   :lines: 22-62
+   :lines: 24-64

 Step 3: Get Messages based on latest drift
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. literalinclude:: includes/rag/intent_detection.py
    :language: python
    :linenos:
-   :lines: 64-76
+   :lines: 67-80


 You can use the last set of messages that match an intent to prompt an LLM, use it with a vector DB for
@@ -75,16 +71,16 @@ retrieval quality and speed of your application. By extracting parameters from t
 the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
 streamline data retrieval and processing to build more efficient and precise RAG applications.

-Step 1: Define prompt targets with parameter definitions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Step 1: Define Prompt Targets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. literalinclude:: includes/rag/prompt_targets.yaml
    :language: yaml
    :caption: Prompt Targets
    :linenos:

-Step 2: Process request parameters in Flask
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Step 2: Process Request Parameters in Flask
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Once the prompt targets are configured as above, handling those parameters is
@@ -1,4 +1,4 @@
-version: "0.1-beta"
+version: v0.1

 listener:
   address: 0.0.0.0 # or 127.0.0.1
@@ -8,52 +8,49 @@ listener:

 # Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
 llm_providers:
-  - name: "OpenAI"
-    provider: "openai"
-    access_key: $OPENAI_API_KEY
+  - name: OpenAI
+    provider: openai
+    access_key: OPENAI_API_KEY
     model: gpt-4o
     default: true
     stream: true

 # default system prompt used by all prompt targets
-system_prompt: |
-  You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
+system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.

 prompt_guards:
   input_guards:
     jailbreak:
       on_exception:
-        message: "Looks like you're curious about my abilities, but I can only provide assistance within my programmed parameters."
+        message: Looks like you're curious about my abilities, but I can only provide assistance within my programmed parameters.

 prompt_targets:
-  - name: "reboot_network_device"
-    description: "Helps network operators perform device operations like rebooting a device."
-    endpoint:
-      name: app_server
-      path: "/agent/action"
-    parameters:
-      - name: "device_id"
-        # additional type options include: int | float | bool | string | list | dict
-        type: "string"
-        description: "Identifier of the network device to reboot."
-        required: true
-      - name: "confirmation"
-        type: "string"
-        description: "Confirmation flag to proceed with reboot."
-        default: "no"
-        enum: [yes, no]
-
-  - name: "information_extraction"
+  - name: information_extraction
     default: true
-    description: "This prompt handles all scenarios that are question and answer in nature. Like summarization, information extraction, etc."
+    description: Handle all scenarios that are question and answer in nature, like summarization, information extraction, etc.
     endpoint:
       name: app_server
-      path: "/agent/summary"
+      path: /agent/summary
     # Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
     auto_llm_dispatch_on_response: true
     # override system prompt for this prompt target
-    system_prompt: |
-      You are a helpful information extraction assistant. Use the information that is provided to you.
+    system_prompt: You are a helpful information extraction assistant. Use the information that is provided to you.

+  - name: reboot_network_device
+    description: Reboot a specific network device
+    endpoint:
+      name: app_server
+      path: /agent/action
+    parameters:
+      - name: device_id
+        type: str
+        description: Identifier of the network device to reboot.
+        required: true
+      - name: confirmation
+        type: bool
+        description: Confirmation flag to proceed with reboot.
+        default: false
+        enum: [true, false]
+
 error_target:
   endpoint:
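The `default: true` flag on `information_extraction` above suggests fallback routing when no prompt target matches the incoming prompt. An illustrative sketch of that selection rule (not Arch's actual matcher, which uses its function-calling model):

```python
# Target names and default flag mirror the config above
TARGETS = [
    {"name": "information_extraction", "default": True},
    {"name": "reboot_network_device"},
]


def pick_target(matched_name=None):
    """Return the matched target, falling back to the one flagged default: true."""
    for t in TARGETS:
        if t["name"] == matched_name:
            return t
    for t in TARGETS:
        if t.get("default"):
            return t
    return None
```

With this rule, a prompt that resolves to no specific target is still handled by the Q&A-style default target rather than failing.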
@@ -66,6 +63,6 @@ endpoints:
    # value could be ip address or a hostname with port
    # this could also be a list of endpoints for load balancing
    # for example endpoint: [ ip1:port, ip2:port ]
-    endpoint: "127.0.0.1:80"
+    endpoint: 127.0.0.1:80
    # max time to wait for a connection to be established
    connect_timeout: 0.005s
@@ -3,7 +3,7 @@
 LLM Provider
 ============

-``llm_provider`` is a top-level primitive in Arch, helping developers centrally define, secure, observe,
+**LLM provider** is a top-level primitive in Arch, helping developers centrally define, secure, observe,
 and manage the usage of their LLMs. Arch builds on Envoy's reliable `cluster subsystem <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/upstream/cluster_manager>`_
 to manage egress traffic to LLMs, which includes intelligent routing, retry and fail-over mechanisms,
 ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly
@@ -1,3 +1,5 @@
+.. _prompt_target:
+
 Prompt Target
 ==============

@@ -89,9 +91,10 @@ Example Configuration
         type: str
         required: true
       - name: unit
-        description: The unit of temperature to return
+        description: The unit of temperature
         type: str
-        enum: ["celsius", "fahrenheit"]
+        default: fahrenheit
+        enum: [celsius, fahrenheit]
     endpoint:
       name: api_server
       path: /weather
@@ -1,6 +1,6 @@
 .. _error_target:

-Error Targets
+Error Target
 =============

 **Error targets** are designed to capture and manage specific issues or exceptions that occur during Arch's function or system's execution.
@@ -12,24 +12,20 @@ The errors are communicated to the application via headers like ``X-Arch-[ERROR-
 Key Concepts
 ------------

-**Error Type**: Categorizes the nature of the error, such as "ValidationError" or "RuntimeError." These error types help in identifying what
-kind of issue occurred and provide context for troubleshooting.
+- **Error Type**: Categorizes the nature of the error, such as "ValidationError" or "RuntimeError." These error types help in identifying what kind of issue occurred and provide context for troubleshooting.

-**Error Message**: A clear, human-readable message describing the error. This should provide enough detail to inform users or developers of
-the root cause or required action.
+- **Error Message**: A clear, human-readable message describing the error. This should provide enough detail to inform users or developers of the root cause or required action.

-**Target Prompt**: The specific prompt or operation where the error occurred. Understanding where the error happened helps with debugging
-and pinpointing the source of the problem.
+- **Target Prompt**: The specific prompt or operation where the error occurred. Understanding where the error happened helps with debugging and pinpointing the source of the problem.

-**Parameter-Specific Errors**: Errors that arise due to invalid or missing parameters when invoking a function. These errors are critical
-for ensuring the correctness of inputs.
+- **Parameter-Specific Errors**: Errors that arise due to invalid or missing parameters when invoking a function. These errors are critical for ensuring the correctness of inputs.


 Error Header Example
 --------------------

-.. code-block:: http
+.. code-block:: bash
+   :caption: Error Header Example

    HTTP/1.1 400 Bad Request
    X-Arch-Error-Type: FunctionValidationError
@ -38,14 +34,15 @@ Error Header Example
|
|||
Content-Type: application/json
|
||||
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Please create a user with the following ID: 1234"
|
||||
},
|
||||
{
|
||||
"role": "system",
|
||||
"content": "Expected a string for 'user_id', but got an integer."
|
||||
}]
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Please create a user with the following ID: 1234"
|
||||
},
|
||||
{
|
||||
"role": "system",
|
||||
"content": "Expected a string for 'user_id', but got an integer."
|
||||
}
|
||||
]
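For clients consuming these responses, the status code, header, and body shown above can be handled programmatically. A minimal sketch (the header name comes from the example above; the category names are assumptions, not part of Arch's API):

```python
def classify_arch_error(status: int, headers: dict) -> str:
    """Map an Arch response to a coarse error category using the
    X-Arch-Error-Type header shown in the example above."""
    if status < 400:
        return "ok"
    error_type = headers.get("X-Arch-Error-Type", "")
    if error_type == "FunctionValidationError":
        # The system message in the body explains which parameter failed.
        return "invalid_parameters"
    return "unknown_error"

category = classify_arch_error(400, {"X-Arch-Error-Type": "FunctionValidationError"})
```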

Best Practices and Tips
@@ -2,7 +2,7 @@

Listener
---------
Listener is a top-level primitive in Arch, which simplifies the configuration required to bind incoming
**Listener** is a top-level primitive in Arch, which simplifies the configuration required to bind incoming
connections from downstream clients, and for egress connections to LLMs (hosted or API).

Arch builds on Envoy's Listener subsystem to streamline connection management for developers. Arch minimizes
@@ -15,23 +15,23 @@ Downstream (Ingress)
Developers can configure Arch to accept connections from downstream clients. A downstream listener acts as the
primary entry point for incoming traffic, handling initial connection setup, including network filtering, guardrails,
and additional network security checks. For more details on prompt security and safety,
see :ref:`here <arch_overview_prompt_handling>`
see :ref:`here <arch_overview_prompt_handling>`.

Upstream (Egress)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Arch automatically configures a listener to route requests from your application to upstream LLM API providers (or hosts).
When you start Arch, it creates a listener for egress traffic based on the presence of the ``llm_providers`` configuration
section in the ``prompt_config.yml`` file. Arch binds itself to a local address such as ``127.0.0.1:9000/v1`` or a DNS-based
address like ``arch.local:9000/v1`` for outgoing traffic. For more details on LLM providers, read :ref:`here <llm_provider>`
When you start Arch, it creates a listener for egress traffic based on the presence of the ``listener`` configuration
section in the configuration file. Arch binds itself to a local address such as ``127.0.0.1:9000/v1`` or a DNS-based
address like ``arch.local:9000/v1`` for outgoing traffic. For more details on LLM providers, read :ref:`here <llm_provider>`.

Configure Listener
^^^^^^^^^^^^^^^^^^

To configure a Downstream (Ingress) Listener, simply add the ``listener`` directive to your ``prompt_config.yml`` file:
To configure a Downstream (Ingress) Listener, simply add the ``listener`` directive to your configuration file:

.. literalinclude:: ../includes/arch_config.yaml
    :language: yaml
    :linenos:
    :lines: 1-18
    :emphasize-lines: 2-5
    :emphasize-lines: 3-7
    :caption: Example Configuration
@@ -1,19 +1,18 @@
.. _arch_model_serving:
.. _model_serving:

Model Serving
-------------
=============

Arch is a set of **two** self-contained processes that are designed to run alongside your application
Arch is a set of `two` self-contained processes that are designed to run alongside your application
servers (or on a separate host connected via a network). The first process is designated to manage low-level
networking and HTTP related concerns, and the other process is for **model serving**, which helps Arch make
networking and HTTP related concerns, and the other process is for model serving, which helps Arch make
intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
LLMs in Arch.

.. image:: /_static/img/arch-system-architecture.jpg
    :align: center
    :width: 50%
    :width: 40%

_____________________________________________________________________________________________________________

Arch is designed to be deployed in your cloud VPC, on an on-premises host, and can work on devices that don't
have a GPU. Note: GPU devices are needed for fast and cost-efficient use, so that Arch (the model server, specifically)
@@ -21,7 +20,7 @@ can process prompts quickly and forward control back to the application host. The
can be configured to run its **model server** subsystem:

Local Serving (CPU - Moderate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
------------------------------
The following bash commands enable you to configure the model server subsystem in Arch to run locally on device
and only use CPU devices. This will be the slowest option but can be useful in dev/test scenarios where GPUs
might not be available.
@@ -30,18 +29,18 @@ might not be available.

    $ archgw up --local-cpu

Local Serving (GPU- Fast)
^^^^^^^^^^^^^^^^^^^^^^^^^
Local Serving (GPU - Fast)
--------------------------
The following bash commands enable you to configure the model server subsystem in Arch to run locally on the
machine and utilize the GPU available for fast inference across all model use cases, including function calling,
guardrails, etc.

.. code-block:: console

    $ archgw up --local
    $ archgw up --local-gpu

Cloud Serving (GPU - Blazing Fast)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------
The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
cloud serving for function calling and guardrail scenarios to dramatically improve the speed and overall performance
of your applications.
@@ -1,17 +1,17 @@
.. _arch_overview_prompt_handling:

Prompt
=================
Prompts
=======

Arch's primary design point is to securely accept, process and handle prompts. To do that effectively,
Arch relies on Envoy's HTTP `connection management <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/http/http_connection_management>`_
subsystem and its **prompt handler** subsystem engineered with purpose-built LLMs to
implement critical functionality on behalf of developers so that you can stay focused on business logic.

.. Note::
    Arch's **prompt handler** subsystem interacts with the **model** subsystem through Envoy's cluster manager
    system to ensure a robust, resilient and fault-tolerant experience in managing incoming prompts. Read more
    about the :ref:`model subsystem <arch_model_serving>` and how the LLMs are hosted in Arch.
    Arch's **prompt handler** subsystem interacts with the **model subsystem** through Envoy's cluster manager system to ensure a robust, resilient and fault-tolerant experience in managing incoming prompts.

.. seealso::
    Read more about the :ref:`model subsystem <model_serving>` and how the LLMs are hosted in Arch.

Messages
--------
@@ -24,7 +24,7 @@ containing two key-value pairs:
- **Content**: Contains the actual text of the message.
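As a quick illustration, a messages array in the role/content shape described above can be built and checked like this (the conversation content is hypothetical):

```python
# Each message is a dict carrying the two key-value pairs described above.
messages = [
    {"role": "user", "content": "Reboot device 12345."},
    {"role": "assistant", "content": "Device 12345 is rebooting."},
]

# Every message carries both keys.
assert all({"role", "content"} <= set(m) for m in messages)
```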

Prompt Guardrails
Prompt Guard
-----------------

Arch is engineered with :ref:`Arch-Guard <prompt_guard>`, an industry-leading safety layer, powered by a
@@ -36,12 +36,12 @@ To add jailbreak guardrails, see example below:
.. literalinclude:: ../includes/arch_config.yaml
    :language: yaml
    :linenos:
    :lines: 1-45
    :emphasize-lines: 22-26
    :lines: 1-25
    :emphasize-lines: 21-25
    :caption: Example Configuration

.. Note::
    As a roadmap item, Arch will expose the ability for developers to define custom guardrails via Arch-Guard-v2,
    As a roadmap item, Arch will expose the ability for developers to define custom guardrails via Arch-Guard,
    and add support for additional safety checks defined by developers and hazardous categories like violent crimes, privacy, hate,
    etc. To offer feedback on our roadmap, please visit our `github page <https://github.com/orgs/katanemo/projects/1>`_.
@@ -59,10 +59,14 @@ Configuring ``prompt_targets`` is simple. See example below:
.. literalinclude:: ../includes/arch_config.yaml
    :language: yaml
    :linenos:
    :emphasize-lines: 29-38
    :emphasize-lines: 39-53
    :caption: Example Configuration


.. seealso::

    Check :ref:`Prompt Target <prompt_target>` for more details!

Intent Detection and Prompt Matching:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -127,10 +131,6 @@ Example: Using OpenAI Client with Arch as an Egress Gateway

    print("OpenAI Response:", response.choices[0].text.strip())

In these examples:

The OpenAI client is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
The OpenAI client is configured to route traffic via Arch by setting the proxy to 127.0.0.1:51001, assuming Arch is
running locally and bound to that address and port.

In these examples, the OpenAI client is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
The OpenAI client is configured to route traffic via Arch by setting the proxy to ``127.0.0.1:51001``, assuming Arch is running locally and bound to that address and port.
This setup allows you to take advantage of Arch's advanced traffic management features while interacting with LLM APIs like OpenAI.
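A minimal sketch of that client-side setup (the egress address comes from the text above; whether your SDK takes it as a proxy or a base URL depends on the client, and the helper name here is hypothetical):

```python
import os

# Arch egress address from the text above; assumes Arch is running locally.
ARCH_EGRESS = "http://127.0.0.1:51001/v1"

def arch_client_kwargs() -> dict:
    """Build constructor kwargs for an OpenAI-compatible client routed through Arch."""
    return {
        "base_url": ARCH_EGRESS,
        # The real key is managed centrally via Arch's llm_providers config;
        # the client still needs a placeholder value.
        "api_key": os.environ.get("OPENAI_API_KEY", "managed-by-arch"),
    }

kwargs = arch_client_kwargs()
```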

@@ -61,7 +61,7 @@ The request processing path in Arch has three main parts:
forwarding prompts to ``prompt_targets`` and establishes the lifecycle of any **upstream** connection to a
hosted endpoint that implements domain-specific business logic for incoming prompts. This is where knowledge
of targets and endpoint health, load balancing and connection pooling exists.
* :ref:`Model serving subsystem <arch_model_serving>` which helps Arch make intelligent decisions about the
* :ref:`Model serving subsystem <model_serving>` which helps Arch make intelligent decisions about the
incoming prompts. The model server is designed to call the purpose-built LLMs in Arch.

The three subsystems are bridged with the HTTP router filter and the cluster manager subsystems of Envoy.
@@ -9,6 +9,7 @@ Tech Overview
    terminology
    threading_model
    listener
    model_serving
    prompt
    model_serving
    request_lifecycle
    error_target

@@ -14,7 +14,7 @@ to keep things consistent in logs, traces and in code.
    :width: 100%
    :align: center

**Listener**: A listener is a named network location (e.g., port, address, path etc.) that Arch listens on to process prompts
**Listener**: A :ref:`listener <arch_overview_listeners>` is a named network location (e.g., port, address, path etc.) that Arch listens on to process prompts
before forwarding them to your application server endpoints. Arch enables you to configure one listener for downstream connections
(like port 80, 443) and creates a separate internal listener for calls that initiate from your application code to LLMs.
@@ -22,25 +22,25 @@ before forwarding them to your application server endpoints. Arch enables you to

When you start Arch, you specify a listener address/port that you want to bind downstream. But Arch uses a predefined port
that you can use (``127.0.0.1:10000``) to proxy egress calls originating from your application to LLMs (API-based or hosted).
For more details, check out :ref:`LLM providers <llm_provider>`
For more details, check out :ref:`LLM provider <llm_provider>`.

**Instance**: An instance of the Arch gateway. When you start Arch it creates at most two processes. One to handle Layer 7
networking operations (auth, TLS, observability, etc.) and the second process to serve models that enable it to make smart
decisions on how to accept, handle and forward prompts. The second process is optional, as the model serving service could be
hosted on a different network (an API call). But these two processes are considered a single instance of Arch.

**Prompt Targets**: Arch offers a primitive called ``prompt_targets`` to help separate business logic from undifferentiated
**Prompt Target**: Arch offers a primitive called :ref:`prompt_target <prompt_target>` to help separate business logic from undifferentiated
work in building generative AI apps. Prompt targets are endpoints that receive prompts that are processed by Arch.
For example, Arch enriches incoming prompts with metadata like knowing when a request is a follow-up or clarifying prompt
so that you can build faster, more accurate retrieval (RAG) apps. To support agentic apps, like scheduling travel plans or
sharing comments on a document via prompts, Arch uses its function calling abilities to extract critical information from
the incoming prompt (or a set of prompts) needed by a downstream backend API or function call before calling it directly.

**Error Targets**: Error targets are those endpoints that receive forwarded errors from Arch when issues arise,
**Error Target**: :ref:`Error targets <error_target>` are those endpoints that receive forwarded errors from Arch when issues arise,
such as failing to properly call a function/API, detecting violations of guardrails, or encountering other processing errors.
These errors are communicated to the application via headers (X-Arch-[ERROR-TYPE]), allowing it to handle the errors gracefully
These errors are communicated to the application via headers ``X-Arch-[ERROR-TYPE]``, allowing it to handle the errors gracefully
and take appropriate actions.

**Model Serving**: Arch is a set of **two** self-contained processes that are designed to run alongside your application servers
(or on a separate host connected via a network). The **model serving** process helps Arch make intelligent decisions about the
**Model Serving**: Arch is a set of `two` self-contained processes that are designed to run alongside your application servers
(or on a separate host connected via a network). The :ref:`model serving <model_serving>` process helps Arch make intelligent decisions about the
incoming prompts. The model server is designed to call the (fast) purpose-built LLMs in Arch.
@@ -13,7 +13,7 @@ thread. All the functionality around prompt handling from a downstream client is
This allows the majority of Arch to be largely single threaded (embarrassingly parallel) with a small amount
of more complex code handling coordination between the worker threads.

Generally Arch is written to be 100% non-blocking.
Generally, Arch is written to be 100% non-blocking.

.. tip::
@@ -1,47 +1,44 @@
version: "0.1-beta"
version: v0.1

listen:
  address: 127.0.0.1 | 0.0.0.0
  port_value: 8080 # If you configure port 443, you'll need to update the listener with tls_certificates

system_prompt: |
  You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
  address: 0.0.0.0 # or 127.0.0.1
  port: 10000
  # Defines how Arch should parse the content from application/json or text/plain Content-type in the http request
  message_format: huggingface

# Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
llm_providers:
  - name: "OpenAI"
    provider: "openai"
  - name: OpenAI
    provider: openai
    access_key: OPENAI_API_KEY
    model: gpt-4o
    default: true
    stream: true

# default system prompt used by all prompt targets
system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.

prompt_targets:
  - name: reboot_devices
    description: >
      This prompt target handles user requests to reboot devices.
      It ensures that when users request to reboot specific devices or device groups, the system processes the reboot commands accurately.

      **Examples of user prompts:**

      - "Please reboot device 12345."
      - "Restart all devices in tenant group tenant-XYZ"
      - "I need to reboot devices A, B, and C."
    description: Reboot specific devices or device groups

    path: /agent/device_reboot
    parameters:
      - name: "device_ids"
        type: list # Options: integer | float | list | dictionary | set
        description: "A list of device identifiers (IDs) to reboot."
      - name: device_ids
        type: list
        description: A list of device identifiers (IDs) to reboot.
        required: false
      - name: "device_group"
        type: string # Options: string | integer | float | list | dictionary | set
        description: "The name of the device group to reboot."
      - name: device_group
        type: str
        description: The name of the device group to reboot
        required: false

# Arch creates round-robin load balancing between different endpoints, managed via the cluster subsystem.
endpoints:
  app_server:
    # value could be ip address or a hostname with port
    # this could also be a list of endpoints for load balancing for example endpoint: [ ip1:port, ip2:port ]
    endpoint: "127.0.0.1:80"
    # this could also be a list of endpoints for load balancing
    # for example endpoint: [ ip1:port, ip2:port ]
    endpoint: 127.0.0.1:80
    # max time to wait for a connection to be established
    connect_timeout: 0.005s
version: "0.1-beta"
@@ -1,3 +1,6 @@
.. _overview:


Overview
============
Welcome to Arch, the intelligent prompt gateway designed to help developers build **fast**, **secure**, and **personalized** generative AI apps at ANY scale.

@@ -12,17 +15,17 @@ This section introduces you to Arch and helps you get set up quickly:

.. grid:: 3

    .. grid-item-card:: Overview
    .. grid-item-card:: :octicon:`apps` Overview
        :link: overview.html

        Overview of Arch and Doc navigation

    .. grid-item-card:: Intro to Arch
    .. grid-item-card:: :octicon:`book` Intro to Arch
        :link: intro_to_arch.html

        Explore Arch's features and developer workflow

    .. grid-item-card:: Quickstart
    .. grid-item-card:: :octicon:`rocket` Quickstart
        :link: quickstart.html

        Learn how to quickly set up and integrate
@@ -35,18 +38,18 @@ Deep dive into essential ideas and mechanisms behind Arch:

.. grid:: 3

    .. grid-item-card:: Tech Overview
        :link: ../Concepts/tech_overview/tech_overview.html
    .. grid-item-card:: :octicon:`package` Tech Overview
        :link: ../concepts/tech_overview/tech_overview.html

        Learn about the technology stack

    .. grid-item-card:: LLM Provider
        :link: ../Concepts/llm_provider.html
    .. grid-item-card:: :octicon:`webhook` LLM Provider
        :link: ../concepts/llm_provider.html

        Explore Arch’s LLM integration options

    .. grid-item-card:: Targets
        :link: ../Concepts/prompt_target.html
    .. grid-item-card:: :octicon:`workflow` Prompt Target
        :link: ../concepts/prompt_target.html

        Understand how Arch handles prompts


@@ -57,18 +60,18 @@ Step-by-step tutorials for practical Arch use cases and scenarios:

.. grid:: 3

    .. grid-item-card:: Prompt Guard
        :link: ../guides/tech_overview/tech_overview.html
    .. grid-item-card:: :octicon:`shield-check` Prompt Guard
        :link: ../guides/prompt_guard.html

        Instructions on securing and validating prompts

    .. grid-item-card:: Function Calling
    .. grid-item-card:: :octicon:`code-square` Function Calling
        :link: ../guides/function_calling.html

        A guide to effective function calling

    .. grid-item-card:: Observability
        :link: ../guides/prompt_target.html
    .. grid-item-card:: :octicon:`issue-opened` Observability
        :link: ../guides/observability/observability.html

        Learn to monitor and troubleshoot Arch


@@ -80,12 +83,12 @@ For developers extending and customizing Arch for specialized needs:

.. grid:: 2

    .. grid-item-card:: Agentic Workflow
    .. grid-item-card:: :octicon:`dependabot` Agentic Workflow
        :link: ../build_with_arch/agent.html

        Discover how to create and manage custom agents within Arch

    .. grid-item-card:: RAG Application
    .. grid-item-card:: :octicon:`stack` RAG Application
        :link: ../build_with_arch/rag.html

        Integrate RAG for knowledge-driven responses
@@ -77,7 +77,7 @@ Next Steps

Congratulations! You've successfully set up Arch and made your first prompt-based request. To further enhance your GenAI applications, explore the following resources:

- Full Documentation: Comprehensive guides and references.
- :ref:`Full Documentation <overview>`: Comprehensive guides and references.
- `GitHub Repository <https://github.com/katanemo/arch>`_: Access the source code, contribute, and track updates.
- `Support <https://github.com/katanemo/arch#contact>`_: Get help and connect with the Arch community.
@@ -83,7 +83,7 @@ Here’s a step-by-step guide to configuring function calling within your Arch s

Step 1: Define the Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Create or identify the backend function you want Arch to call. This could be an API endpoint, a script, or any other executable backend logic.
First, create or identify the backend function you want Arch to call. This could be an API endpoint, a script, or any other executable backend logic.

.. code-block:: python
    :caption: Example Function

@@ -112,11 +112,11 @@ Create or identify the backend function you want Arch to call. This could be an

Step 2: Configure Prompt Targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Map the function to a prompt target, defining the intent and parameters that Arch will extract from the user’s prompt.
Next, map the function to a prompt target, defining the intent and parameters that Arch will extract from the user’s prompt.
Specify the parameters your function needs and how Arch should interpret these.

.. code-block:: yaml
    :caption: Example Config
    :caption: Prompt Target Example Configuration

    prompt_targets:
      - name: get_weather
@@ -134,10 +134,10 @@ Map the function to a prompt target, defining the intent and parameters that Arc
        name: api_server
        path: /weather
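Once this target matches, Arch calls the endpoint with the extracted parameters. A rough sketch of what the receiving handler might look like (the body shape and field names are assumptions for illustration, not the documented wire format):

```python
import json

def handle_weather_request(body: str) -> dict:
    """Handle the /weather call with parameters Arch extracted from the prompt."""
    params = json.loads(body)
    location = params["location"]   # required parameter from the prompt target
    days = params.get("days", 1)    # optional parameter with an assumed default
    return {"location": location, "days": days, "forecast": "sunny"}

result = handle_weather_request('{"location": "Seattle"}')
```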

Step 3: Validate Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Arch will validate parameters and ensure that the required parameters (e.g., location) are present in the prompt, and add validation rules if necessary.
Step 3: Arch Takes Over
~~~~~~~~~~~~~~~~~~~~~~~
Once you have defined the functions and configured the prompt targets, Arch takes care of the remaining work.
It will automatically validate parameters and ensure that the required parameters (e.g., location) are present in the prompt, and add validation rules if necessary.
Here is an example validation schema using the `jsonschema <https://json-schema.org/docs>`_ library:

.. code-block:: python

@@ -191,12 +191,8 @@ Here is an example validation schema using the `jsonschema <https://json-schema.
    print(weather_info)
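The same checks can also be approximated without the library; a hand-rolled sketch of the required/type validation (the ``location`` parameter name comes from the weather example, everything else is an assumption):

```python
def validate_params(params: dict, required: list, types: dict) -> list:
    """Return a list of validation errors; an empty list means the input is valid."""
    errors = []
    for name in required:
        if name not in params:
            errors.append(f"missing required parameter: {name}")
    for name, expected in types.items():
        if name in params and not isinstance(params[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    return errors

# A wrong type produces an error Arch could relay back to the user.
errs = validate_params({"location": 42}, ["location"], {"location": str})
```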

Step 4: Execute and Return the Response
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the function is called, format the response and send it back to Arch-Function.
Next, Arch-Function provides users with coherent and user-friendly responses.

Once the functions are called, Arch formats the response and delivers it back to users.
By completing these setup steps, you enable Arch to manage the process from validation to response, ensuring users receive consistent, reliable results.

Example Use Cases
-----------------
68
docs/source/guides/includes/arch_config.yaml
Normal file
@@ -0,0 +1,68 @@
version: v0.1

listener:
  address: 0.0.0.0 # or 127.0.0.1
  port: 10000
  # Defines how Arch should parse the content from application/json or text/plain Content-type in the http request
  message_format: huggingface

# Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
llm_providers:
  - name: OpenAI
    provider: openai
    access_key: OPENAI_API_KEY
    model: gpt-4o
    default: true
    stream: true

# default system prompt used by all prompt targets
system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.

prompt_guards:
  input_guards:
    jailbreak:
      on_exception:
        message: Looks like you're curious about my abilities, but I can only provide assistance within my programmed parameters.

prompt_targets:
  - name: information_extraction
    default: true
    description: handle all scenarios that are question and answer in nature, like summarization, information extraction, etc.
    endpoint:
      name: app_server
      path: /agent/summary
    # Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
    auto_llm_dispatch_on_response: true
    # override system prompt for this prompt target
    system_prompt: You are a helpful information extraction assistant. Use the information that is provided to you.

  - name: reboot_network_device
    description: Perform device operations like rebooting a device.
    endpoint:
      name: app_server
      path: /agent/action
    parameters:
      - name: device_id
        type: str
        description: Identifier of the network device to reboot.
        required: true
      - name: confirmation
        type: bool
        description: Confirmation flag to proceed with reboot.
        default: false
        enum: [true, false]

error_target:
  endpoint:
    name: error_target_1
    path: /error

# Arch creates round-robin load balancing between different endpoints, managed via the cluster subsystem.
endpoints:
  app_server:
    # value could be ip address or a hostname with port
    # this could also be a list of endpoints for load balancing
    # for example endpoint: [ ip1:port, ip2:port ]
    endpoint: 127.0.0.1:80
    # max time to wait for a connection to be established
    connect_timeout: 0.005s
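The round-robin behavior noted in the endpoints comment above can be pictured with a small sketch (the replica addresses are hypothetical, in the commented ``endpoint: [ ip1:port, ip2:port ]`` form):

```python
from itertools import cycle

# Two hypothetical app_server replicas behind the same logical endpoint.
endpoints = ["10.0.0.1:80", "10.0.0.2:80"]
picker = cycle(endpoints)

# Successive requests alternate between the replicas.
first, second, third = next(picker), next(picker), next(picker)
```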

@@ -47,6 +47,16 @@ It excels at detecting explicitly malicious prompts and assessing toxic content,
By embedding Arch-Guard within the Arch architecture, we empower developers to build robust, LLM-powered applications while prioritizing security and safety. With Arch-Guard, you can navigate the complexities of prompt management with confidence, knowing you have a reliable defense against malicious input.


Example Configuration
~~~~~~~~~~~~~~~~~~~~~
Here is an example of using Arch-Guard in Arch:

.. literalinclude:: includes/arch_config.yaml
    :language: yaml
    :linenos:
    :lines: 22-26
    :caption: Arch-Guard Example Configuration

How Arch-Guard Works
----------------------
@@ -64,4 +64,3 @@ Arch (built by the contributors of `Envoy <https://www.envoyproxy.io/>`_ ) was b
    :titlesonly:

    resources/configuration_reference
    resources/error_target
@@ -1,4 +1,4 @@
version: "0.1-beta"
version: v0.1

listener:
  address: 0.0.0.0 # or 127.0.0.1

@@ -8,9 +8,9 @@ listener:
  common_tls_context: # If you configure port 443, you'll need to update the listener with your TLS certificates
    tls_certificates:
      - certificate_chain:
          filename: "/etc/certs/cert.pem"
          filename: /etc/certs/cert.pem
        private_key:
          filename: "/etc/certs/key.pem"
          filename: /etc/certs/key.pem

# Arch creates round-robin load balancing between different endpoints, managed via the cluster subsystem.
endpoints:

@@ -18,42 +18,42 @@ endpoints:
    # value could be ip address or a hostname with port
    # this could also be a list of endpoints for load balancing
    # for example endpoint: [ ip1:port, ip2:port ]
    endpoint: "127.0.0.1:80"
    endpoint: 127.0.0.1:80
    # max time to wait for a connection to be established
    connect_timeout: 0.005s

  mistral_local:
    endpoint: "127.0.0.1:8001"
    endpoint: 127.0.0.1:8001

error_target:
  endpoint: "error_target_1"
  endpoint: error_target_1

# Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
llm_providers:
  - name: "OpenAI"
    provider: "openai"
    access_key: $OPENAI_API_KEY
  - name: OpenAI
    provider: openai
    access_key: OPENAI_API_KEY
    model: gpt-4o
    default: true
    stream: true
    rate_limits:
      selector: # optional headers, to add rate limiting based on http headers like JWT tokens or API keys
        http_header:
          name: "Authorization"
          name: Authorization
          value: "" # Empty value means each separate value has a separate limit
      limit:
        tokens: 100000 # Tokens per unit
        unit: "minute"
        unit: minute

  - name: "Mistral8x7b"
    provider: "mistral"
    access_key: $MISTRAL_API_KEY
    model: "mistral-8x7b"
  - name: Mistral8x7b
    provider: mistral
    access_key: MISTRAL_API_KEY
    model: mistral-8x7b

  - name: "MistralLocal7b"
    provider: "local"
    model: "mistral-7b-instruct"
    endpoint: "mistral_local"
  - name: MistralLocal7b
    provider: local
    model: mistral-7b-instruct
    endpoint: mistral_local
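The token-based rate limit configured above (100000 tokens per minute, keyed by the ``Authorization`` header value) can be sketched as a fixed-window counter; this is an illustration of the idea, not Arch's actual implementation:

```python
class TokenRateLimiter:
    """Fixed-window token counter, keyed by a header value."""

    def __init__(self, tokens_per_unit: int, unit_seconds: float = 60.0):
        self.limit = tokens_per_unit
        self.unit = unit_seconds
        self.windows = {}  # key -> (window_start, tokens_used)

    def allow(self, key: str, tokens: int, now: float) -> bool:
        start, used = self.windows.get(key, (now, 0))
        if now - start >= self.unit:  # window expired: reset the counter
            start, used = now, 0
        if used + tokens > self.limit:
            return False
        self.windows[key] = (start, used + tokens)
        return True

limiter = TokenRateLimiter(100000)
```

Because the ``value`` selector is empty, each distinct header value gets its own counter, which is why the limiter is keyed per caller here.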

# provides a way to override default settings for the arch system
overrides:

@@ -62,44 +62,41 @@ overrides:
  prompt_target_intent_matching_threshold: 0.60
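The override above gates prompt-to-target routing: a prompt is matched to a target only when similarity clears 0.60. A toy sketch with cosine similarity (the vectors are stand-ins for real embeddings, and the function names are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def matches_target(prompt_vec, target_vec, threshold=0.60):
    """Route only when similarity clears the configured threshold."""
    return cosine(prompt_vec, target_vec) >= threshold

hit = matches_target([1.0, 0.1], [1.0, 0.0])    # nearly parallel vectors
miss = matches_target([1.0, 0.0], [0.0, 1.0])   # orthogonal vectors
```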

# default system prompt used by all prompt targets
system_prompt: |
  You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.

prompt_guards:
  input_guards:
    jailbreak:
      on_exception:
        message: "Looks like you're curious about my abilities, but I can only provide assistance within my programmed parameters."
        message: Looks like you're curious about my abilities, but I can only provide assistance within my programmed parameters.

prompt_targets:
  - name: "reboot_network_device"
    description: "Helps network operators perform device operations like rebooting a device."
    endpoint:
      name: app_server
      path: "/agent/action"
    parameters:
      - name: "device_id"
        # additional type options include: int | float | bool | string | list | dict
        type: "string"
        description: "Identifier of the network device to reboot."
        required: true
      - name: "confirmation"
        type: "string"
        description: "Confirmation flag to proceed with reboot."
        default: "no"
        enum: [yes, no]

  - name: "information_extraction"
  - name: information_extraction
    default: true
    description: "This prompt handles all scenarios that are question and answer in nature. Like summarization, information extraction, etc."
    description: handle all scenarios that are question and answer in nature, like summarization, information extraction, etc.
    endpoint:
      name: app_server
      path: "/agent/summary"
      path: /agent/summary
    # Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
    auto_llm_dispatch_on_response: true
    # override system prompt for this prompt target
    system_prompt: |
      You are a helpful information extraction assistant. Use the information that is provided to you.
    system_prompt: You are a helpful information extraction assistant. Use the information that is provided to you.

  - name: reboot_network_device
    description: Reboot a specific network device
    endpoint:
      name: app_server
      path: /agent/action
    parameters:
      - name: device_id
        type: str
        description: Identifier of the network device to reboot.
        required: true
|
||||
- name: confirmation
|
||||
type: bool
|
||||
description: Confirmation flag to proceed with reboot.
|
||||
default: false
|
||||
enum: [true, false]
|
||||
|
||||
error_target:
|
||||
endpoint:
|
||||
|
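The `parameters` block on the new `reboot_network_device` target declares the schema Arch extracts from the prompt before calling `/agent/action`. A minimal sketch of the equivalent validation, mirroring the declared names, types, and default (the function and schema layout here are hypothetical, for illustration only):

```python
# Hypothetical sketch of validating extracted prompt parameters against
# the schema declared on the reboot_network_device target above.
# Names/types mirror the YAML; the code itself is not part of Arch.

PARAMETERS = [
    {"name": "device_id", "type": str, "required": True},
    {"name": "confirmation", "type": bool, "required": False, "default": False},
]

def validate(payload: dict) -> dict:
    """Check a dict of extracted parameters; fill defaults, enforce types."""
    validated = {}
    for spec in PARAMETERS:
        value = payload.get(spec["name"], spec.get("default"))
        if value is None:
            if spec["required"]:
                raise ValueError(f"missing required parameter: {spec['name']}")
            continue
        if not isinstance(value, spec["type"]):
            raise TypeError(f"{spec['name']} must be {spec['type'].__name__}")
        validated[spec["name"]] = value
    return validated
```

For example, `validate({"device_id": "sw-01"})` fills in `confirmation: False` from the default, while an empty payload fails because `device_id` is required — the same gap Arch would close by asking the user a follow-up question.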