mirror of
https://github.com/katanemo/plano.git
synced 2026-05-27 14:17:15 +02:00
Doc Update (#129)
* init update
* Update terminology.rst
* fix the branch to create an index.html, and fix pre-commit issues
* Doc update
* made several changes to the docs after Shuguang's revision
* fixing pre-commit issues
* fixed the reference file to the final prompt config file
* added google analytics

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
parent 2a7b95582c
commit 5c7567584d
49 changed files with 1185 additions and 609 deletions
23
docs/source/guides/observability/access_logging.rst
Normal file
.. _arch_access_logging:

Access Logging
==============

Access logging in Arch refers to the logging of detailed information about each request and response that flows through Arch.
It provides visibility into the traffic passing through Arch, which is crucial for monitoring, debugging, and analyzing the
behavior of AI applications and their interactions.

Key Features
^^^^^^^^^^^^

* **Per-Request Logging**:
  Each request that passes through Arch is logged. This includes important metadata such as the HTTP method,
  path, response status code, request duration, upstream host, and more.
* **Integration with Monitoring Tools**:
  Access logs can be exported to centralized logging systems (e.g., the ELK stack or Fluentd) or used to feed monitoring and alerting systems.
* **Structured Logging**:
  Each request is logged as a structured record, making it easier to parse and analyze with tools like Elasticsearch and Kibana.

An example access log entry:

.. code-block:: console

   [2024-09-27T14:52:01.123Z] "ARCH REQUEST" GET /path/to/resource HTTP/1.1 200 512 1024 56 upstream_service.com D
   X-Arch-Upstream-Service-Time: 25
   X-Arch-Attempt-Count: 1

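Because each entry follows a fixed layout, it can be parsed into fields before it is shipped to a log store. The sketch below parses the sample entry shown above. The regular expression is inferred from that single example, not from a documented Arch log format, and the meaning of the three unlabeled integers (``512 1024 56``) is not stated here, so the parser keeps them as a plain list rather than guessing names for them.

.. code-block:: python

   import json
   import re

   # Pattern inferred from the single sample entry above; it is an assumption,
   # not a documented Arch access log format.
   LINE_RE = re.compile(
       r'^\[(?P<timestamp>[^\]]+)\]\s+"(?P<tag>[^"]+)"\s+'
       r"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+(?P<protocol>HTTP/[\d.]+)\s+"
       r"(?P<status>\d{3})\s+(?P<numbers>(?:\d+\s+)*)(?P<upstream>\S+)"
   )


   def parse_access_log(entry: str) -> dict:
       """Parse one multi-line access log entry into a dictionary."""
       lines = [line.strip() for line in entry.strip().splitlines() if line.strip()]
       match = LINE_RE.match(lines[0])
       if match is None:
           raise ValueError(f"unrecognized access log line: {lines[0]!r}")
       # Tokens after the upstream host (such as "D" in the sample) are ignored.
       record = match.groupdict()
       record["status"] = int(record["status"])
       # Unlabeled integer fields are kept in order; their meaning is not assumed.
       record["numbers"] = [int(n) for n in record["numbers"].split()]
       # Remaining lines are treated as "Header-Name: value" pairs.
       record["headers"] = {}
       for line in lines[1:]:
           name, _, value = line.partition(":")
           record["headers"][name.strip()] = value.strip()
       return record


   sample = """
   [2024-09-27T14:52:01.123Z] "ARCH REQUEST" GET /path/to/resource HTTP/1.1 200 512 1024 56 upstream_service.com D
   X-Arch-Upstream-Service-Time: 25
   X-Arch-Attempt-Count: 1
   """

   print(json.dumps(parse_access_log(sample), indent=2))

Emitting records like this as JSON is one way to feed the Elasticsearch/Kibana workflow mentioned above without writing custom parsing rules in the log pipeline.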
9
docs/source/guides/observability/monitoring.rst
Normal file
.. _monitoring:

Monitoring
==========

Arch offers several monitoring metrics that help you understand three critical aspects of your application:
latency, token usage, and error rates, broken down by upstream LLM provider. Latency measures how quickly your
application responds to users; it includes time to first token (TFT), time per output token (TOT), and the
total latency as perceived by users.

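To illustrate how these latency figures relate to each other, the sketch below derives them from the arrival times of streamed tokens. It uses common definitions (TFT as the delay until the first token, TOT as the average gap between subsequent tokens, total latency as the delay until the last token); these definitions and the names ``StreamLatency`` and ``compute_latency`` are assumptions for illustration, not a description of how Arch computes its metrics.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class StreamLatency:
       time_to_first_token_ms: float
       time_per_output_token_ms: float
       total_latency_ms: float


   def compute_latency(request_start_ms: float, token_times_ms: list[float]) -> StreamLatency:
       """Derive latency metrics from the arrival time of each streamed token."""
       if not token_times_ms:
           raise ValueError("at least one output token is required")
       first, last = token_times_ms[0], token_times_ms[-1]
       # Time to first token: how long the user waits before output starts.
       tft = first - request_start_ms
       # Time per output token: average gap between consecutive tokens after the first.
       gaps = len(token_times_ms) - 1
       tot = (last - first) / gaps if gaps else 0.0
       # Total latency: time until the last token arrives.
       return StreamLatency(tft, tot, last - request_start_ms)


   # Request sent at t=0 ms; four tokens arrive at the listed times.
   print(compute_latency(0.0, [250.0, 300.0, 350.0, 450.0]))

Separating TFT from TOT matters for streaming applications: a user perceives a fast TFT as responsiveness even when the total latency is long.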
11
docs/source/guides/observability/observability.rst
Normal file
.. _observability:

Observability
=============

.. toctree::
   :maxdepth: 2

   tracing
   monitoring
   access_logging

313
docs/source/guides/observability/tracing.rst
Normal file
.. _arch_overview_tracing:

Tracing
=======

Overview
--------

`OpenTelemetry <https://opentelemetry.io/>`_ is an open-source observability framework providing APIs
and instrumentation for generating, collecting, processing, and exporting telemetry data, such as traces,
metrics, and logs. Its flexible design supports a wide range of backends and integrates with
modern application tools. A key feature of OpenTelemetry is its commitment to standards like the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_.

**Tracing** is a critical tool that allows developers to visualize and understand the flow of
requests in an AI application. With tracing, you can capture a detailed view of how requests propagate
through various services and components, which is crucial for **debugging**, **performance optimization**,
and understanding complex AI agent architectures like Co-pilots.

**Arch** propagates trace context using the W3C Trace Context standard, specifically through the
``traceparent`` header. This allows each component in the system to record its part of the request
flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures
that developers can capture this trace data consistently and in a format compatible with various observability
tools.

Benefits of Using ``traceparent`` Headers
-----------------------------------------

- **Standardization**: The W3C Trace Context standard ensures compatibility across ecosystem tools, allowing
  traces to be propagated uniformly through different layers of the system.
- **Ease of Integration**: OpenTelemetry's design allows developers to integrate tracing with minimal
  changes to their codebase, enabling quick adoption of end-to-end observability.
- **Interoperability**: Works with popular tracing tools like AWS X-Ray, Datadog, Jaeger, and many others,
  making it easy to visualize traces in the tools you're already using.

How to Initiate a Trace
-----------------------

1. **Enable Tracing Configuration**: Add the ``tracing: 100`` setting to the :ref:`listener <arch_overview_listeners>` config.

2. **Trace Context Propagation**: Arch automatically propagates the ``traceparent`` header. When a request is received, Arch will:

   - Generate a new ``traceparent`` header if one is not present.
   - Extract the trace context from the ``traceparent`` header if it exists.
   - Start a new span representing its processing of the request.
   - Forward the ``traceparent`` header to downstream services.

3. **Sampling Policy**: The ``100`` in ``tracing: 100`` is the percentage of requests sampled for tracing, so every request is traced.
   You can adjust this value anywhere from 0 to 100.

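The per-request behavior in steps 2 and 3 can be sketched in a few lines of Python. This is a conceptual sketch of the generate-or-continue logic under stated assumptions, not Arch's implementation: ``propagate_traceparent`` and its ``sampling_percentage`` parameter are hypothetical names, the header key is assumed to be lowercase, and an incoming sampling decision is simply carried forward.

.. code-block:: python

   import random
   import re
   import secrets

   # Version-00 traceparent: version-traceid-parentid-flags, lowercase hex.
   TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


   def propagate_traceparent(incoming_headers: dict, sampling_percentage: float = 100) -> dict:
       """Hypothetical helper: return the headers to forward for this hop.

       Illustrates the steps above; it is not Arch's actual implementation.
       Assumes the header key is lowercase ``traceparent``.
       """
       match = TRACEPARENT_RE.match(incoming_headers.get("traceparent", ""))
       if match:
           # Continue the existing trace: keep its trace ID and sampling flags.
           trace_id, _caller_span_id, flags = match.groups()
       else:
           # No valid header: start a new trace and apply the sampling percentage.
           trace_id = secrets.token_hex(16)
           sampled = random.random() * 100 < sampling_percentage
           flags = "01" if sampled else "00"
       # New span ID for this hop; the next service records it as its parent.
       span_id = secrets.token_hex(8)
       outgoing = dict(incoming_headers)  # copy, so the caller's dict is untouched
       outgoing["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
       return outgoing


   # A request without a traceparent header starts a new, sampled trace.
   print(propagate_traceparent({}, sampling_percentage=100)["traceparent"])

   # A request with a traceparent header keeps its trace ID but gets a new parent ID.
   incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
   print(propagate_traceparent(incoming)["traceparent"])

Keeping the trace ID while replacing the parent ID is what lets a tracing backend stitch every hop into one end-to-end trace.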

Trace Propagation
-----------------

Arch uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
This header carries tracing information in a standardized format, enabling interoperability between different
tracing systems.

Header Format
~~~~~~~~~~~~~

The ``traceparent`` header has the following format::

   traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}

- ``{version}``: The version of the Trace Context specification (e.g., ``00``).
- ``{trace-id}``: A 16-byte (32-character hexadecimal) unique identifier for the trace.
- ``{parent-id}``: An 8-byte (16-character hexadecimal) identifier for the parent span.
- ``{trace-flags}``: Flags indicating trace options (e.g., sampling).

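A minimal parser for the header format above, written against the W3C Trace Context rules for version ``00``: fields are lowercase hex, version ``ff`` is invalid, all-zero trace and parent IDs are invalid, and bit ``0x01`` of the flags is the sampled flag. It handles only the four-field version ``00`` form; ``TraceParent`` and ``parse_traceparent`` are illustrative names, not part of any library.

.. code-block:: python

   import re
   from dataclasses import dataclass
   from typing import Optional

   _TRACEPARENT_RE = re.compile(
       r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
       r"(?P<parent_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})$"
   )


   @dataclass(frozen=True)
   class TraceParent:
       version: str
       trace_id: str
       parent_id: str
       trace_flags: int

       @property
       def sampled(self) -> bool:
           # Bit 0x01 of trace-flags is the "sampled" flag.
           return bool(self.trace_flags & 0x01)


   def parse_traceparent(value: str) -> Optional[TraceParent]:
       """Return the parsed header, or None if it is malformed."""
       match = _TRACEPARENT_RE.match(value.strip())
       if match is None:
           return None
       parts = match.groupdict()
       # Version "ff" is forbidden by the specification.
       if parts["version"] == "ff":
           return None
       # All-zero trace IDs and parent IDs are invalid.
       if set(parts["trace_id"]) == {"0"} or set(parts["parent_id"]) == {"0"}:
           return None
       return TraceParent(
           parts["version"], parts["trace_id"], parts["parent_id"], int(parts["trace_flags"], 16)
       )


   header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
   tp = parse_traceparent(header)
   print(tp.trace_id, tp.parent_id, tp.sampled)

Returning ``None`` for malformed input mirrors the specification's guidance that a receiver should ignore an invalid ``traceparent`` and start a new trace rather than fail the request.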

Instrumentation
~~~~~~~~~~~~~~~

To integrate AI tracing, your application needs to follow a few simple steps. These steps are common
practice for reading tracing headers and exporting
`spans <https://docs.lightstep.com/docs/understand-distributed-tracing>`_ for distributed tracing, and are not unique to Arch:

- Read the ``traceparent`` header from incoming requests.
- Start new spans as children of the extracted context.
- Include the ``traceparent`` header in outbound requests to propagate trace context.
- Send tracing data to a collector or tracing backend to export spans.

Example with OpenTelemetry in Python
************************************

Install the OpenTelemetry packages:

.. code-block:: console

   $ pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
   $ pip install opentelemetry-instrumentation-requests

Set up the tracer and exporter:

.. code-block:: python

   from opentelemetry import trace
   from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
   from opentelemetry.instrumentation.requests import RequestsInstrumentor
   from opentelemetry.sdk.resources import Resource
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor

   # Define the service name
   resource = Resource(attributes={
       "service.name": "customer-support-agent"
   })

   # Set up the tracer provider and exporter
   tracer_provider = TracerProvider(resource=resource)
   otlp_exporter = OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True)
   span_processor = BatchSpanProcessor(otlp_exporter)
   tracer_provider.add_span_processor(span_processor)
   trace.set_tracer_provider(tracer_provider)

   # Instrument HTTP requests
   RequestsInstrumentor().instrument()

Handle incoming requests:

.. code-block:: python

   from opentelemetry import trace
   from opentelemetry.propagate import extract, inject
   import requests

   def handle_request(request):
       # Extract the trace context from the incoming traceparent header
       context = extract(request.headers)
       tracer = trace.get_tracer(__name__)

       with tracer.start_as_current_span("process_customer_request", context=context):
           # Example of processing a customer request
           print("Processing customer request...")

           # Prepare headers for the outgoing request to the payment service.
           # inject() writes the current traceparent; the RequestsInstrumentor
           # configured above would also add it automatically.
           headers = {}
           inject(headers)

           # Make outgoing request to external service (e.g., payment gateway)
           response = requests.get("http://payment-service/api", headers=headers)

           print(f"Payment service response: {response.content}")

AI Agent Tracing Visualization Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following is an example of tracing for an AI-powered customer support system.
A customer interacts with AI agents, which forward their requests through different
specialized services and external systems.

::

   +--------------------------+
   |   Customer Interaction   |
   +--------------------------+
                |
                v
   +--------------------------+       +--------------------------+
   |  Agent 1 (Main - Arch)   | ----> | External Payment Service |
   +--------------------------+       +--------------------------+
                |                                  |
                v                                  v
   +--------------------------+       +--------------------------+
   | Agent 2 (Support - Arch) | ----> |  Internal Tech Support   |
   +--------------------------+       +--------------------------+
                |                                  |
                v                                  v
   +--------------------------+       +--------------------------+
   | Agent 3 (Orders - Arch)  | ----> |   Inventory Management   |
   +--------------------------+       +--------------------------+

Trace Breakdown
***************

- Customer Interaction:

  - Span 1: Customer initiates a request via the AI-powered chatbot for billing support (e.g., asking for payment details).

- AI Agent 1 (Main - Arch):

  - Span 2: AI Agent 1 (Main) processes the request, identifies it as related to billing, and forwards the request
    to an external payment service.
  - Span 3: AI Agent 1 determines that additional technical support is needed for processing and forwards the request
    to AI Agent 2.

- External Payment Service:

  - Span 4: The external payment service processes the payment-related request (e.g., verifying payment status) and sends
    the response back to AI Agent 1.

- AI Agent 2 (Support - Arch):

  - Span 5: AI Agent 2, responsible for technical queries, processes a request forwarded from AI Agent 1 (e.g., checking for
    any account issues).
  - Span 6: AI Agent 2 forwards the query to Internal Tech Support for further investigation.

- Internal Tech Support:

  - Span 7: Internal Tech Support processes the request (e.g., resolving account access issues) and responds to AI Agent 2.

- AI Agent 3 (Orders - Arch):

  - Span 8: AI Agent 3 handles order-related queries. AI Agent 1 forwards the request to AI Agent 3 after payment verification
    is completed.
  - Span 9: AI Agent 3 forwards a request to the Inventory Management system to confirm product availability for a pending order.

- Inventory Management:

  - Span 10: The Inventory Management system checks stock and availability and returns the information to AI Agent 3.

Integrating with Tracing Tools
------------------------------

AWS X-Ray
~~~~~~~~~

To send tracing data to `AWS X-Ray <https://aws.amazon.com/xray/>`_:

1. **Configure OpenTelemetry Collector**: Set up the collector to export traces to AWS X-Ray.

   Collector configuration (``otel-collector-config.yaml``):

   .. code-block:: yaml

      receivers:
        otlp:
          protocols:
            grpc:

      processors:
        batch:

      exporters:
        awsxray:
          region: <Your-Aws-Region>

      service:
        pipelines:
          traces:
            receivers: [otlp]
            processors: [batch]
            exporters: [awsxray]

2. **Deploy the Collector**: Run the collector as a Docker container, Kubernetes pod, or standalone service.
   The ``awsxray`` exporter is part of the OpenTelemetry Collector Contrib distribution.
3. **Ensure AWS Credentials**: Provide AWS credentials to the collector, preferably via IAM roles.
4. **Verify Traces**: Access the AWS X-Ray console to view your traces.

Datadog
~~~~~~~

To send tracing data to `Datadog <https://docs.datadoghq.com/getting_started/tracing/>`_:

1. **Configure OpenTelemetry Collector**: Set up the collector to export traces to Datadog.

   Collector configuration (``otel-collector-config.yaml``):

   .. code-block:: yaml

      receivers:
        otlp:
          protocols:
            grpc:

      processors:
        batch:

      exporters:
        datadog:
          api:
            key: "${DD_API_KEY}"
            site: "${DD_SITE}"

      service:
        pipelines:
          traces:
            receivers: [otlp]
            processors: [batch]
            exporters: [datadog]

2. **Set Environment Variables**: Provide your Datadog API key and site.

   .. code-block:: console

      $ export DD_API_KEY=<Your-Datadog-Api-Key>
      $ export DD_SITE=datadoghq.com  # or datadoghq.eu

3. **Deploy the Collector**: Run the collector in your environment.
4. **Verify Traces**: Access the Datadog APM dashboard to view your traces.


Best Practices
--------------

- **Consistent Instrumentation**: Ensure all services propagate the ``traceparent`` header.
- **Secure Configuration**: Protect sensitive data (for example, avoid recording credentials or personal information
  in span attributes) and secure communication between services and the collector.
- **Performance Monitoring**: Be mindful of the performance impact of tracing and adjust sampling rates accordingly.
- **Error Handling**: Implement proper error handling to prevent tracing issues from affecting your application.
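When sampling rates are lowered, it helps if every service reaches the same keep-or-drop decision for a given trace; otherwise traces arrive with missing spans. One common approach, similar in spirit to OpenTelemetry's ``TraceIdRatioBased`` sampler, is to derive the decision from the trace ID itself. The stdlib-only sketch below assumes that approach; ``should_sample`` is a hypothetical name, and in practice you would rely on your tracing SDK's built-in sampler.

.. code-block:: python

   import secrets

   TRACE_ID_LIMIT = (1 << 64) - 1


   def should_sample(trace_id_hex: str, ratio: float) -> bool:
       """Decide deterministically whether to keep a trace at the given ratio (0.0 to 1.0).

       Uses the lower 64 bits of the 128-bit trace ID, so every service that
       sees the same trace ID reaches the same decision without coordination.
       """
       if not 0.0 <= ratio <= 1.0:
           raise ValueError("ratio must be between 0.0 and 1.0")
       bound = round(ratio * (TRACE_ID_LIMIT + 1))
       return (int(trace_id_hex, 16) & TRACE_ID_LIMIT) < bound


   # Sample roughly 10% of randomly generated trace IDs.
   ids = [secrets.token_hex(16) for _ in range(10_000)]
   kept = sum(should_sample(t, 0.1) for t in ids)
   print(f"kept {kept} of {len(ids)} traces")  # expected to be close to 1,000

Because the decision depends only on the trace ID and the ratio, two services configured with the same ratio never disagree about a trace.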

Summary
-------

By leveraging the ``traceparent`` header for trace context propagation, Arch enables developers to implement
tracing efficiently. This approach simplifies the process of collecting and analyzing tracing data in common
tools like AWS X-Ray and Datadog, enhancing observability and facilitating faster debugging and optimization.

Additional Resources
--------------------

- `OpenTelemetry Documentation <https://opentelemetry.io/docs/>`_
- `W3C Trace Context Specification <https://www.w3.org/TR/trace-context/>`_
- `AWS X-Ray Exporter <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/awsxrayexporter>`_
- `Datadog Exporter <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/datadogexporter>`_

.. Note::

   Replace placeholders such as ``<Your-Aws-Region>`` and ``<Your-Datadog-Api-Key>`` with your actual configurations.