Doc Update (#129)

* init update
* Update terminology.rst
* fix the branch to create an index.html, and fix pre-commit issues
* Doc update
* made several changes to the docs after Shuguang's revision
* fixing pre-commit issues
* fixed the reference file to the final prompt config file
* added google analytics

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2026-06-26 15:39:40 +02:00 · 2024-10-06 16:54:34 -07:00 · 2024-10-06 16:54:34 -07:00 · 5c7567584d
commit 5c7567584d
parent 2a7b95582c
49 changed files with 1185 additions and 609 deletions
--- a/docs/source/build_with_arch/agent.rst
+++ b/docs/source/build_with_arch/agent.rst
@ -0,0 +1,70 @@
+.. _arch_agent_guide:
+
+Agentic Workflow
+==============================
+
+Arch helps you easily personalize your applications by calling application-specific (API) functions
+via user prompts. This involves any predefined functions or APIs you want to expose to users to perform tasks,
+gather information, or manipulate data. This capability is generally referred to as **function calling**, where
+you have the flexibility to support “agentic” apps tailored to specific use cases - from updating insurance
+claims to creating ad campaigns - via prompts.
+
+Arch analyzes prompts, extracts critical information from prompts, engages in lightweight conversation with
+the user to gather any missing parameters and makes API calls so that you can focus on writing business logic.
+Arch does this via its purpose-built :ref:`Arch-FC LLM <function_calling>` - the fastest (200ms p90 - 10x faster than GPT-4o)
+and cheapest (100x cheaper than GPT-4o) function-calling LLM that matches performance with frontier models.
+
+.. image:: includes/agent/function-calling-flow.jpg
+   :width: 100%
+   :align: center
+
+
+Single Function Call
+--------------------
+In the most common scenario, users will request a single action via prompts, and Arch efficiently processes the
+request by extracting relevant parameters, validating the input, and calling the designated function or API. Here
+is how you would go about enabling this scenario with Arch:
+
+Step 1: Define prompt targets with functions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: includes/agent/function-calling-agent.yaml
+    :language: yaml
+    :linenos:
+    :emphasize-lines: 16-37
+    :caption: Define prompt targets that enable users to engage with the API and backend functions of an app
+
+Step 2: Process request parameters in Flask
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once the prompt targets are configured as above, handle the extracted parameters in your application, for example:
+
+.. literalinclude:: includes/agent/parameter_handling.py
+    :language: python
+    :linenos:
+    :caption: Parameter handling with Flask
+
+Parallel/Multiple Function Calling
+-----------------------------------
+In more complex use cases, users may request multiple actions or need multiple APIs/functions to be called
+simultaneously or sequentially. With Arch, you can handle these scenarios efficiently using parallel or multiple
+function calling. This allows your application to engage in a broader range of interactions, such as updating
+different datasets, triggering events across systems, or collecting results from multiple services in one prompt.
+
+Arch-FC1B is built to manage these parallel tasks efficiently, ensuring low latency and high throughput, even
+when multiple functions are invoked. It provides two mechanisms to handle these cases:
+
+Step 1: Define Multiple Function Targets
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When enabling multiple function calling, define the prompt targets in a way that supports multiple functions or
+API calls based on the user's prompt. These targets can be triggered in parallel or sequentially, depending on
+the user's intent.
+
+Example of Multiple Prompt Targets in YAML:
+
+.. literalinclude:: includes/agent/function-calling-agent.yaml
+    :language: yaml
+    :linenos:
+    :emphasize-lines: 16-37
+    :caption: Define prompt targets that enable users to engage with the API and backend functions of an app
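The parallel and sequential cases described in agent.rst can be pictured from the application side with a small sketch. This is an illustration only, not how Arch dispatches calls internally: the `calls` list shape, the `HANDLERS` table and both handler functions are hypothetical stand-ins for backend functions behind prompt targets, and the sketch assumes the calls are independent of one another.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical handlers standing in for the functions behind two prompt targets.
def reboot_devices(device_ids):
    return {"rebooted": list(device_ids)}


def get_device_statistics(device_ids, time_range=7):
    return {"device_ids": list(device_ids), "time_range": time_range}


# Hypothetical lookup table from function name to handler.
HANDLERS = {
    "reboot_devices": reboot_devices,
    "get_device_statistics": get_device_statistics,
}


def dispatch_sequential(calls):
    """Run each call in order; use this when a later call depends on an earlier one."""
    return [HANDLERS[call["name"]](**call["arguments"]) for call in calls]


def dispatch_parallel(calls):
    """Run independent calls concurrently and return results in request order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [
            pool.submit(HANDLERS[call["name"]], **call["arguments"])
            for call in calls
        ]
        return [future.result() for future in futures]
```

Collecting `future.result()` in submission order keeps the output aligned with the order of the requested calls, which makes the parallel and sequential paths interchangeable for independent calls.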
--- a/docs/source/build_with_arch/includes/agent/function-calling-agent.yaml
+++ b/docs/source/build_with_arch/includes/agent/function-calling-agent.yaml
@ -0,0 +1,47 @@
+version: "0.1-beta"
+listen:
+  address: 127.0.0.1 | 0.0.0.0
+  port_value: 8080 #If you configure port 443, you'll need to update the listener with tls_certificates
+
+system_prompt: |
+  You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
+
+llm_providers:
+  - name: "OpenAI"
+    provider: "openai"
+    access_key: OPENAI_API_KEY
+    model: gpt-4o
+    stream: true
+
+prompt_targets:
+  - name: reboot_devices
+    description: >
+      This prompt target handles user requests to reboot devices.
+      It ensures that when users request to reboot specific devices or device groups, the system processes the reboot commands accurately.
+
+      **Examples of user prompts:**
+
+      - "Please reboot device 12345."
+      - "Restart all devices in tenant group tenant-XYZ."
+      - "I need to reboot devices A, B, and C."
+
+    path: /agent/device_reboot
+    parameters:
+      - name: "device_ids"
+        type: list  # Options: integer | float | list | dictionary | set
+        description: "A list of device identifiers (IDs) to reboot."
+        required: false
+      - name: "device_group"
+        type: string  # Options: string | integer | float | list | dictionary | set
+        description: "The name of the device group to reboot."
+        required: false
+
+# Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
+endpoints:
+  app_server:
+    # value could be ip address or a hostname with port
+    # this could also be a list of endpoints for load balancing
+    # for example endpoint: [ ip1:port, ip2:port ]
+    endpoint: "127.0.0.1:80"
+    # max time to wait for a connection to be established
+    connect_timeout: 0.005s
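The `reboot_devices` target above points at `/agent/device_reboot`, while the Flask example in this commit serves `/agent/device_summary`. As a rough sketch of what validating the reboot parameters might look like, the function below checks `device_ids` and `device_group` without any web framework. The function name and the rule that at least one of the two parameters must be present are assumptions for illustration; the config itself marks both as optional.

```python
def validate_reboot_request(payload):
    """Validate the parameters of the reboot_devices prompt target.

    Returns a (body, status_code) tuple that a web handler could serialize as JSON.
    """
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}, 400

    device_ids = payload.get("device_ids")
    device_group = payload.get("device_group")

    # Assumption: each parameter is optional, but one is needed to know what to reboot.
    if device_ids is None and device_group is None:
        return {"error": "provide 'device_ids' or 'device_group'"}, 400
    if device_ids is not None and not isinstance(device_ids, list):
        return {"error": "'device_ids' must be a list"}, 400
    if device_group is not None and not isinstance(device_group, str):
        return {"error": "'device_group' must be a string"}, 400

    return {"device_ids": device_ids or [], "device_group": device_group}, 200
```

A Flask route for `/agent/device_reboot` could pass the parsed JSON body to this function and return the tuple through `jsonify`.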
--- a/docs/source/build_with_arch/includes/agent/function-calling-flow.jpg
+++ b/docs/source/build_with_arch/includes/agent/function-calling-flow.jpg
--- a/docs/source/build_with_arch/includes/agent/parameter_handling.py
+++ b/docs/source/build_with_arch/includes/agent/parameter_handling.py
@ -0,0 +1,41 @@
+from flask import Flask, request, jsonify
+
+app = Flask(__name__)
+
+@app.route('/agent/device_summary', methods=['POST'])
+def get_device_summary():
+    """
+    Endpoint to retrieve device statistics based on device IDs and an optional time range.
+    """
+    data = request.get_json()
+
+    # Validate 'device_ids' parameter
+    device_ids = data.get('device_ids')
+    if not device_ids or not isinstance(device_ids, list):
+        return jsonify({'error': "'device_ids' parameter is required and must be a list"}), 400
+
+    # Validate 'time_range' parameter (optional, defaults to 7)
+    time_range = data.get('time_range', 7)
+    if not isinstance(time_range, int):
+        return jsonify({'error': "'time_range' must be an integer"}), 400
+
+    # Simulate retrieving statistics for the given device IDs and time range
+    # In a real application, you would query your database or external service here
+    statistics = []
+    for device_id in device_ids:
+        # Placeholder for actual data retrieval
+        stats = {
+            'device_id': device_id,
+            'time_range': f'Last {time_range} days',
+            'data': f'Statistics data for device {device_id} over the last {time_range} days.'
+        }
+        statistics.append(stats)
+
+    response = {
+        'statistics': statistics
+    }
+
+    return jsonify(response), 200
+
+if __name__ == '__main__':
+    app.run(debug=True)
--- a/docs/source/build_with_arch/includes/rag/intent_detection.py
+++ b/docs/source/build_with_arch/includes/rag/intent_detection.py
@ -0,0 +1,152 @@
+from flask import Flask, request, jsonify
+from datetime import datetime
+import uuid
+from langchain.memory import ConversationBufferMemory
+from langchain.schema import AIMessage, HumanMessage
+from langchain import OpenAI
+
+app = Flask(__name__)
+
+# Global dictionary to keep track of user memories
+user_memories = {}
+
+def get_user_conversation(user_id):
+    """
+    Retrieve the user's conversation memory using LangChain.
+    If the user does not exist, initialize their conversation memory.
+    """
+    if user_id not in user_memories:
+        user_memories[user_id] = ConversationBufferMemory(return_messages=True)
+    return user_memories[user_id]
+
+def update_user_conversation(user_id, client_messages, intent_changed):
+    """
+    Update the user's conversation memory with new messages using LangChain.
+    Each message is augmented with a UUID, timestamp, and intent change marker.
+    Only new messages are added to avoid duplication.
+    """
+    memory = get_user_conversation(user_id)
+    stored_messages = memory.chat_memory.messages
+
+    # Determine the number of stored messages
+    num_stored_messages = len(stored_messages)
+    new_messages = client_messages[num_stored_messages:]
+
+    # Process each new message
+    for index, message in enumerate(new_messages):
+        role = message.get('role')
+        content = message.get('content')
+        metadata = {
+            'uuid': str(uuid.uuid4()),
+            'timestamp': datetime.utcnow().isoformat(),
+            'intent_changed': False  # Default value
+        }
+
+        # Mark the intent change on the last message if detected
+        if intent_changed and index == len(new_messages) - 1:
+            metadata['intent_changed'] = True
+
+        # Create a new message with metadata
+        if role == 'user':
+            memory.chat_memory.add_message(
+                HumanMessage(content=content, additional_kwargs={'metadata': metadata})
+            )
+        elif role == 'assistant':
+            memory.chat_memory.add_message(
+                AIMessage(content=content, additional_kwargs={'metadata': metadata})
+            )
+        else:
+            # Handle other roles if necessary
+            pass
+
+    return memory
+
+def get_messages_since_last_intent(messages):
+    """
+    Retrieve messages from the last intent change onwards using LangChain.
+    """
+    messages_since_intent = []
+    for message in reversed(messages):
+        # Insert message at the beginning to maintain correct order
+        messages_since_intent.insert(0, message)
+        metadata = message.additional_kwargs.get('metadata', {})
+        # Break if intent_changed is True
+        if metadata.get('intent_changed', False):
+            break
+    return messages_since_intent
+
+def forward_to_llm(messages):
+    """
+    Forward messages to an upstream LLM using LangChain.
+    """
+    # Convert messages to a conversation string
+    conversation = ""
+    for message in messages:
+        role = 'User' if isinstance(message, HumanMessage) else 'Assistant'
+        content = message.content
+        conversation += f"{role}: {content}\n"
+    # Use LangChain's LLM to get a response. This call is proxied through Arch for end-to-end observability and traffic management
+    llm = OpenAI()
+    # Create a prompt that includes the conversation
+    prompt = f"{conversation}Assistant:"
+    response = llm(prompt)
+    return response
+
+@app.route('/process_rag', methods=['POST'])
+def process_rag():
+    # Extract JSON data from the request
+    data = request.get_json()
+
+    user_id = data.get('user_id')
+    if not user_id:
+        return jsonify({'error': 'User ID is required'}), 400
+
+    client_messages = data.get('messages')
+    if not client_messages or not isinstance(client_messages, list):
+        return jsonify({'error': 'Messages array is required'}), 400
+
+    # Extract the intent change marker from Arch's headers if present for the current prompt
+    intent_changed_header = request.headers.get('x-arch-intent-marker', '').lower()
+    if intent_changed_header in ['', 'false']:
+        intent_changed = False
+    elif intent_changed_header == 'true':
+        intent_changed = True
+    else:
+        # Invalid value provided
+        return jsonify({'error': 'Invalid value for x-arch-intent-marker header'}), 400
+
+    # Update user conversation based on intent change
+    memory = update_user_conversation(user_id, client_messages, intent_changed)
+
+    # Retrieve messages since last intent change for LLM
+    messages_for_llm = get_messages_since_last_intent(memory.chat_memory.messages)
+
+    # Forward messages to upstream LLM
+    llm_response = forward_to_llm(messages_for_llm)
+
+    # Prepare the messages to return
+    messages_to_return = []
+    for message in memory.chat_memory.messages:
+        role = 'user' if isinstance(message, HumanMessage) else 'assistant'
+        content = message.content
+        metadata = message.additional_kwargs.get('metadata', {})
+        message_entry = {
+            'uuid': metadata.get('uuid'),
+            'timestamp': metadata.get('timestamp'),
+            'role': role,
+            'content': content,
+            'intent_changed': metadata.get('intent_changed', False)
+        }
+        messages_to_return.append(message_entry)
+
+    # Prepare the response
+    response = {
+        'user_id': user_id,
+        'messages': messages_to_return,
+        'llm_response': llm_response
+    }
+
+    return jsonify(response), 200
+
+if __name__ == '__main__':
+    app.run(debug=True)
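The two core steps of intent_detection.py, reading the intent marker and trimming history to the latest intent, can be tried without Flask or LangChain. The sketch below is a simplified stand-in that uses plain dictionaries instead of LangChain message objects; the function names and the `intent_changed` key on each message are assumptions that mirror the metadata used in the file above.

```python
def parse_intent_marker(header_value):
    """Map the raw header value to a bool; raise ValueError on anything unexpected."""
    value = (header_value or "").strip().lower()
    if value in ("", "false"):
        return False
    if value == "true":
        return True
    raise ValueError(f"invalid intent marker: {header_value!r}")


def messages_since_last_intent(messages):
    """Return the trailing messages starting at the most recent intent change.

    If no message is marked, the whole history is returned.
    """
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].get("intent_changed"):
            return messages[index:]
    return list(messages)
```

Scanning from the end stops at the first marked message, so only the turns that belong to the current intent are sent to retrieval or to the upstream LLM.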
--- a/docs/source/build_with_arch/includes/rag/parameter_handling.py
+++ b/docs/source/build_with_arch/includes/rag/parameter_handling.py
@ -0,0 +1,41 @@
+from flask import Flask, request, jsonify
+
+app = Flask(__name__)
+
+@app.route('/agent/device_summary', methods=['POST'])
+def get_device_summary():
+    """
+    Endpoint to retrieve device statistics based on device IDs and an optional time range.
+    """
+    data = request.get_json()
+
+    # Validate 'device_ids' parameter
+    device_ids = data.get('device_ids')
+    if not device_ids or not isinstance(device_ids, list):
+        return jsonify({'error': "'device_ids' parameter is required and must be a list"}), 400
+
+    # Validate 'time_range' parameter (optional, defaults to 7)
+    time_range = data.get('time_range', 7)
+    if not isinstance(time_range, int):
+        return jsonify({'error': "'time_range' must be an integer"}), 400
+
+    # Simulate retrieving statistics for the given device IDs and time range
+    # In a real application, you would query your database or external service here
+    statistics = []
+    for device_id in device_ids:
+        # Placeholder for actual data retrieval
+        stats = {
+            'device_id': device_id,
+            'time_range': f'Last {time_range} days',
+            'data': f'Statistics data for device {device_id} over the last {time_range} days.'
+        }
+        statistics.append(stats)
+
+    response = {
+        'statistics': statistics
+    }
+
+    return jsonify(response), 200
+
+if __name__ == '__main__':
+    app.run(debug=True)
--- a/docs/source/build_with_arch/includes/rag/prompt_targets.yaml
+++ b/docs/source/build_with_arch/includes/rag/prompt_targets.yaml
@ -0,0 +1,21 @@
+prompt_targets:
+  - name: get_device_statistics
+    description: >
+      This prompt target ensures that when users request device-related statistics, the system accurately retrieves and presents the relevant data
+      based on the specified devices and time range. Examples of user prompts include:
+
+      - "Show me the performance stats for device 12345 over the past week."
+      - "What are the error rates for my devices in the last 24 hours?"
+      - "I need statistics on device 789 over the last 10 days."
+
+    path: /agent/device_summary
+    parameters:
+      - name: "device_ids"
+        type: list  # Options: integer | float | list | dictionary | set
+        description: "A list of device identifiers (IDs) for which the statistics are requested."
+        required: true
+      - name: "time_range"
+        type: integer  # Options: integer | float | list | dictionary | set
+        description: "The number of days in the past over which to retrieve device statistics. Defaults to 7 days if not specified."
+        required: false
+        default: 7
--- a/docs/source/build_with_arch/rag.rst
+++ b/docs/source/build_with_arch/rag.rst
@ -0,0 +1,94 @@
+.. _arch_rag_guide:
+
+RAG Application
+===============
+
+The following section describes how Arch can help you build faster, smarter and more accurate
+Retrieval-Augmented Generation (RAG) applications.
+
+Intent-drift Detection
+----------------------
+
+Developers struggle to handle `follow-up <https://www.reddit.com/r/ChatGPTPromptGenius/comments/17dzmpy/how_to_use_rag_with_conversation_history_for/?>`_
+or `clarifying <https://www.reddit.com/r/LocalLLaMA/comments/18mqwg6/best_practice_for_rag_with_followup_chat/>`_
+questions. Specifically, when users ask for changes or additions to previous responses, their AI applications often
+generate entirely new responses instead of adjusting previous ones. Arch offers *intent-drift* tracking as a feature so
+that developers can know when the user has shifted away from a previous intent, which lets them improve
+retrieval accuracy, lower overall token cost and improve the speed of their responses back to users.
+
+Arch uses its built-in lightweight NLI and embedding models to know if the user has steered away from an active intent.
+Arch's intent-drift detection mechanism is based on its *prompt_targets* primitive. Arch tries to match an incoming
+prompt to one of the *prompt_targets* configured in the gateway. Once it detects that the user has moved away from an
+active intent, Arch adds the ``x-arch-intent-drift`` header to the request before sending it to your application servers.
+
+.. literalinclude:: includes/rag/intent_detection.py
+    :language: python
+    :linenos:
+    :lines: 95-125
+    :emphasize-lines: 14-22
+    :caption: Intent Detection Example
+
+
+.. Note::
+
+   Arch is (mostly) stateless so that it can scale in an embarrassingly parallel fashion. So, while Arch offers
+   intent-drift detection, you still have to maintain conversational state with intent drift as metadata. The
+   following code snippets show how you can build and enrich conversational history with LangChain (in Python),
+   so that you can use the most relevant prompts for your retrieval and for prompting upstream LLMs.
+
+
+Step 1: Define ConversationBufferMemory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: includes/rag/intent_detection.py
+    :language: python
+    :linenos:
+    :lines: 1-21
+
+Step 2: Update ConversationBufferMemory w/ intent
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: includes/rag/intent_detection.py
+    :language: python
+    :linenos:
+    :lines: 22-62
+
+Step 3: Get Messages based on latest drift
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: includes/rag/intent_detection.py
+    :language: python
+    :linenos:
+    :lines: 64-76
+
+
+You can use the last set of messages that match an intent to prompt an LLM, use it with a vector DB for
+improved retrieval, etc. With Arch and a few lines of code, you can improve retrieval accuracy, lower overall
+token cost and improve the speed of your responses back to users.
+
+Parameter Extraction for RAG
+----------------------------
+
+To build RAG (Retrieval-Augmented Generation) applications, you can configure prompt targets with parameters,
+enabling Arch to retrieve critical information in a structured way for processing. This approach improves the
+retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull
+the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can
+streamline data retrieval and processing to build more efficient and precise RAG applications.
+
+Step 1: Define prompt targets with parameter definitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: includes/rag/prompt_targets.yaml
+    :language: yaml
+    :caption: Prompt Targets
+    :linenos:
+
+Step 2: Process request parameters in Flask
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once the prompt targets are configured as above, handle the extracted parameters in your application, for example:
+
+.. literalinclude:: includes/rag/parameter_handling.py
+    :language: python
+    :caption: Parameter handling with Flask
+    :linenos:
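To make the "pull the appropriate chunks" step in rag.rst concrete, here is a small sketch of using the extracted `device_ids` and `time_range` parameters to narrow a candidate set before retrieval. Everything about the chunk layout (`device_id`, `recorded_at`, `text`) and the `select_chunks` helper is hypothetical; a real application would more likely express the same filter as a metadata query against its vector database or SQL store.

```python
from datetime import datetime, timedelta, timezone


def select_chunks(chunks, device_ids, time_range=7, now=None):
    """Keep chunks for the requested devices recorded within the last `time_range` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=time_range)
    wanted = set(device_ids)
    return [
        chunk
        for chunk in chunks
        if chunk["device_id"] in wanted and chunk["recorded_at"] >= cutoff
    ]
```

Filtering on structured parameters first keeps the later similarity search over a smaller set of chunks that are already known to be relevant.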