mirror of
https://github.com/katanemo/plano.git
synced 2026-04-25 16:56:24 +02:00
313 lines
12 KiB
ReStructuredText
313 lines
12 KiB
ReStructuredText
.. _arch_overview_tracing:
|
|
|
|
Tracing
|
|
=======
|
|
|
|
Overview
|
|
--------
|
|
|
|
`OpenTelemetry <https://opentelemetry.io/>`_ is an open-source observability framework providing APIs
|
|
and instrumentation for generating, collecting, processing, and exporting telemetry data, such as traces,
|
|
metrics, and logs. Its flexible design supports a wide range of backends and seamlessly integrates with
|
|
modern application tools. A key feature of OpenTelemetry is its commitment to standards like the
|
|
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_
|
|
|
|
**Tracing** is a critical tool that allows developers to visualize and understand the flow of
|
|
requests in an AI application. With tracing, you can capture a detailed view of how requests propagate
|
|
through various services and components, which is crucial for **debugging**, **performance optimization**,
|
|
and understanding complex AI agent architectures like Co-pilots.
|
|
|
|
**Arch** propagates trace context using the W3C Trace Context standard, specifically through the
|
|
``traceparent`` header. This allows each component in the system to record its part of the request
|
|
flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures
|
|
that developers can capture this trace data consistently and in a format compatible with various observability
|
|
tools.
|
|
|
|
|
|
Benefits of Using ``Traceparent`` Headers
|
|
-----------------------------------------
|
|
|
|
- **Standardization**: The W3C Trace Context standard ensures compatibility across ecosystem tools, allowing
|
|
traces to be propagated uniformly through different layers of the system.
|
|
- **Ease of Integration**: OpenTelemetry's design allows developers to easily integrate tracing with minimal
|
|
changes to their codebase, enabling quick adoption of end-to-end observability.
|
|
- **Interoperability**: Works seamlessly with popular tracing tools like AWS X-Ray, Datadog, Jaeger, and many others,
|
|
making it easy to visualize traces in the tools you're already usi
|
|
|
|
How to Initiate A Trace
|
|
-----------------------
|
|
|
|
1. **Enable Tracing Configuration**: Simply add the ``random_sampling`` in ``tracing`` section to 100`` flag to in the :ref:`listener <arch_overview_listeners>` config
|
|
|
|
2. **Trace Context Propagation**: Arch automatically propagates the ``traceparent`` header. When a request is received, Arch will:
|
|
|
|
- Generate a new ``traceparent`` header if one is not present.
|
|
- Extract the trace context from the ``traceparent`` header if it exists.
|
|
- Start a new span representing its processing of the request.
|
|
- Forward the ``traceparent`` header to downstream services.
|
|
|
|
3. **Sampling Policy**: The 100 in ``random_sampling: 100`` means that all the requests as sampled for tracing.
|
|
You can adjust this value from 0-100.
|
|
|
|
|
|
Trace Propagation
|
|
-----------------
|
|
|
|
Arch uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
|
|
This header carries tracing information in a standardized format, enabling interoperability between different
|
|
tracing systems.
|
|
|
|
Header Format
|
|
~~~~~~~~~~~~~
|
|
|
|
The ``traceparent`` header has the following format::
|
|
|
|
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
|
|
|
|
- ``{version}``: The version of the Trace Context specification (e.g., ``00``).
|
|
- ``{trace-id}``: A 16-byte (32-character hexadecimal) unique identifier for the trace.
|
|
- ``{parent-id}``: An 8-byte (16-character hexadecimal) identifier for the parent span.
|
|
- ``{trace-flags}``: Flags indicating trace options (e.g., sampling).
|
|
|
|
Instrumentation
|
|
~~~~~~~~~~~~~~~
|
|
|
|
To integrate AI tracing, your application needs to follow a few simple steps. The steps
|
|
below are very common practice, and not unique to Arch, when you reading tracing headers and export
|
|
`spans <https://docs.lightstep.com/docs/understand-distributed-tracing>`_ for distributed tracing.
|
|
|
|
- Read the ``traceparent`` header from incoming requests.
|
|
- Start new spans as children of the extracted context.
|
|
- Include the ``traceparent`` header in outbound requests to propagate trace context.
|
|
- Send tracing data to a collector or tracing backend to export spans
|
|
|
|
Example with OpenTelemetry in Python
|
|
************************************
|
|
|
|
Install OpenTelemetry packages:
|
|
|
|
.. code-block:: console
|
|
|
|
$ pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
|
|
$ pip install opentelemetry-instrumentation-requests
|
|
|
|
Set up the tracer and exporter:
|
|
|
|
.. code-block:: python
|
|
|
|
from opentelemetry import trace
|
|
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
|
|
from opentelemetry.instrumentation.requests import RequestsInstrumentor
|
|
from opentelemetry.sdk.resources import Resource
|
|
from opentelemetry.sdk.trace import TracerProvider
|
|
from opentelemetry.sdk.trace.export import BatchSpanProcessor
|
|
|
|
# Define the service name
|
|
resource = Resource(attributes={
|
|
"service.name": "customer-support-agent"
|
|
})
|
|
|
|
# Set up the tracer provider and exporter
|
|
tracer_provider = TracerProvider(resource=resource)
|
|
otlp_exporter = OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True)
|
|
span_processor = BatchSpanProcessor(otlp_exporter)
|
|
tracer_provider.add_span_processor(span_processor)
|
|
trace.set_tracer_provider(tracer_provider)
|
|
|
|
# Instrument HTTP requests
|
|
RequestsInstrumentor().instrument()
|
|
|
|
Handle incoming requests:
|
|
|
|
.. code-block:: python
|
|
|
|
from opentelemetry import trace
|
|
from opentelemetry.propagate import extract, inject
|
|
import requests
|
|
|
|
def handle_request(request):
|
|
# Extract the trace context
|
|
context = extract(request.headers)
|
|
tracer = trace.get_tracer(__name__)
|
|
|
|
with tracer.start_as_current_span("process_customer_request", context=context):
|
|
# Example of processing a customer request
|
|
print("Processing customer request...")
|
|
|
|
# Prepare headers for outgoing request to payment service
|
|
headers = {}
|
|
inject(headers)
|
|
|
|
# Make outgoing request to external service (e.g., payment gateway)
|
|
response = requests.get("http://payment-service/api", headers=headers)
|
|
|
|
print(f"Payment service response: {response.content}")
|
|
|
|
|
|
AI Agent Tracing Visualization Example
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The following is an example of tracing for an AI-powered customer support system.
|
|
A customer interacts with AI agents, which forward their requests through different
|
|
specialized services and external systems.
|
|
|
|
::
|
|
|
|
+--------------------------+
|
|
| Customer Interaction |
|
|
+--------------------------+
|
|
|
|
|
v
|
|
+--------------------------+ +--------------------------+
|
|
| Agent 1 (Main - Arch) | ----> | External Payment Service |
|
|
+--------------------------+ +--------------------------+
|
|
| |
|
|
v v
|
|
+--------------------------+ +--------------------------+
|
|
| Agent 2 (Support - Arch)| ----> | Internal Tech Support |
|
|
+--------------------------+ +--------------------------+
|
|
| |
|
|
v v
|
|
+--------------------------+ +--------------------------+
|
|
| Agent 3 (Orders- Arch) | ----> | Inventory Management |
|
|
+--------------------------+ +--------------------------+
|
|
|
|
Trace Breakdown:
|
|
****************
|
|
|
|
- Customer Interaction:
|
|
- Span 1: Customer initiates a request via the AI-powered chatbot for billing support (e.g., asking for payment details).
|
|
|
|
- AI Agent 1 (Main - Arch):
|
|
- Span 2: AI Agent 1 (Main) processes the request and identifies it as related to billing, forwarding the request
|
|
to an external payment service.
|
|
- Span 3: AI Agent 1 determines that additional technical support is needed for processing and forwards the request
|
|
to AI Agent 2.
|
|
|
|
- External Payment Service:
|
|
- Span 4: The external payment service processes the payment-related request (e.g., verifying payment status) and sends
|
|
the response back to AI Agent 1.
|
|
|
|
- AI Agent 2 (Tech - Arch):
|
|
- Span 5: AI Agent 2, responsible for technical queries, processes a request forwarded from AI Agent 1 (e.g., checking for
|
|
any account issues).
|
|
- Span 6: AI Agent 2 forwards the query to Internal Tech Support for further investigation.
|
|
|
|
- Internal Tech Support:
|
|
- Span 7: Internal Tech Support processes the request (e.g., resolving account access issues) and responds to AI Agent 2.
|
|
|
|
- AI Agent 3 (Orders - Arch):
|
|
- Span 8: AI Agent 3 handles order-related queries. AI Agent 1 forwards the request to AI Agent 3 after payment verification
|
|
is completed.
|
|
- Span 9: AI Agent 3 forwards a request to the Inventory Management system to confirm product availability for a pending order.
|
|
|
|
- Inventory Management:
|
|
- Span 10: The Inventory Management system checks stock and availability and returns the information to AI Agent 3.
|
|
|
|
Integrating with Tracing Tools
|
|
------------------------------
|
|
|
|
AWS X-Ray
|
|
~~~~~~~~~
|
|
|
|
To send tracing data to `AWS X-Ray <https://aws.amazon.com/xray/>`_ :
|
|
|
|
1. **Configure OpenTelemetry Collector**: Set up the collector to export traces to AWS X-Ray.
|
|
|
|
Collector configuration (``otel-collector-config.yaml``):
|
|
|
|
.. code-block:: yaml
|
|
|
|
receivers:
|
|
otlp:
|
|
protocols:
|
|
grpc:
|
|
|
|
processors:
|
|
batch:
|
|
|
|
exporters:
|
|
awsxray:
|
|
region: <Your-Aws-Region>
|
|
|
|
service:
|
|
pipelines:
|
|
traces:
|
|
receivers: [otlp]
|
|
processors: [batch]
|
|
exporters: [awsxray]
|
|
|
|
2. **Deploy the Collector**: Run the collector as a Docker container, Kubernetes pod, or standalone service.
|
|
3. **Ensure AWS Credentials**: Provide AWS credentials to the collector, preferably via IAM roles.
|
|
4. **Verify Traces**: Access the AWS X-Ray console to view your traces.
|
|
|
|
Datadog
|
|
~~~~~~~
|
|
|
|
Datadog
|
|
|
|
To send tracing data to `Datadog <https://docs.datadoghq.com/getting_started/tracing/>`_:
|
|
|
|
1. **Configure OpenTelemetry Collector**: Set up the collector to export traces to Datadog.
|
|
|
|
Collector configuration (``otel-collector-config.yaml``):
|
|
|
|
.. code-block:: yaml
|
|
|
|
receivers:
|
|
otlp:
|
|
protocols:
|
|
grpc:
|
|
|
|
processors:
|
|
batch:
|
|
|
|
exporters:
|
|
datadog:
|
|
api:
|
|
key: "${<Your-Datadog-Api-Key>}"
|
|
site: "${DD_SITE}"
|
|
|
|
service:
|
|
pipelines:
|
|
traces:
|
|
receivers: [otlp]
|
|
processors: [batch]
|
|
exporters: [datadog]
|
|
|
|
2. **Set Environment Variables**: Provide your Datadog API key and site.
|
|
|
|
.. code-block:: console
|
|
|
|
$ export <Your-Datadog-Api-Key>=<Your-Datadog-Api-Key>
|
|
$ export DD_SITE=datadoghq.com # Or datadoghq.eu
|
|
|
|
3. **Deploy the Collector**: Run the collector in your environment.
|
|
4. **Verify Traces**: Access the Datadog APM dashboard to view your traces.
|
|
|
|
|
|
Best Practices
|
|
--------------
|
|
|
|
- **Consistent Instrumentation**: Ensure all services propagate the ``traceparent`` header.
|
|
- **Secure Configuration**: Protect sensitive data and secure communication between services.
|
|
- **Performance Monitoring**: Be mindful of the performance impact and adjust sampling rates accordingly.
|
|
- **Error Handling**: Implement proper error handling to prevent tracing issues from affecting your application.
|
|
|
|
Summary
|
|
----------
|
|
|
|
By leveraging the ``traceparent`` header for trace context propagation, Arch enables developers to implement
|
|
tracing efficiently. This approach simplifies the process of collecting and analyzing tracing data in common
|
|
tools like AWS X-Ray and Datadog, enhancing observability and facilitating faster debugging and optimization.
|
|
|
|
Additional Resources
|
|
--------------------
|
|
|
|
- `OpenTelemetry Documentation <https://opentelemetry.io/docs/>`_
|
|
- `W3C Trace Context Specification <https://www.w3.org/TR/trace-context/>`_
|
|
- `AWS X-Ray Exporter <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/awsxrayexporter>`_
|
|
- `Datadog Exporter <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/datadogexporter>`_
|
|
|
|
.. Note::
|
|
Replace placeholders such as ``<Your-Aws-Region>`` and ``<Your-Datadog-Api-Key>`` with your actual configurations.
|