Update docs to Plano (#639)

Salman Paracha 2025-12-23 17:14:50 -08:00 committed by GitHub
parent 15fbb6c3af
commit e224cba3e3
139 changed files with 4407 additions and 24735 deletions


@@ -3,14 +3,14 @@
 Access Logging
 ==============
-Access logging in Arch refers to the logging of detailed information about each request and response that flows through Arch.
-It provides visibility into the traffic passing through Arch, which is crucial for monitoring, debugging, and analyzing the
+Access logging in Plano refers to the logging of detailed information about each request and response that flows through Plano.
+It provides visibility into the traffic passing through Plano, which is crucial for monitoring, debugging, and analyzing the
 behavior of AI applications and their interactions.
 Key Features
 ^^^^^^^^^^^^
 * **Per-Request Logging**:
-  Each request that passes through Arch is logged. This includes important metadata such as HTTP method,
+  Each request that passes through Plano is logged. This includes important metadata such as HTTP method,
   path, response status code, request duration, upstream host, and more.
 * **Integration with Monitoring Tools**:
   Access logs can be exported to centralized logging systems (e.g., ELK stack or Fluentd) or used to feed monitoring and alerting systems.
@@ -19,24 +19,24 @@ Key Features
 How It Works
 ^^^^^^^^^^^^
-Arch gateway exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/archgw_logs``. For example:
+Plano exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/plano_logs``. For example:
 .. code-block:: console
-   $ tail -F ~/archgw_logs/access_*.log
+   $ tail -F ~/plano_logs/access_*.log
-   ==> /Users/adilhafeez/archgw_logs/access_llm.log <==
+   ==> /Users/username/plano_logs/access_llm.log <==
    [2024-10-10T03:55:49.537Z] "POST /v1/chat/completions HTTP/1.1" 0 DC 0 0 770 - "-" "OpenAI/Python 1.51.0" "469793af-b25f-9b57-b265-f376e8d8c586" "api.openai.com" "162.159.140.245:443"
-   ==> /Users/adilhafeez/archgw_logs/access_internal.log <==
+   ==> /Users/username/plano_logs/access_internal.log <==
    [2024-10-10T03:56:03.906Z] "POST /embeddings HTTP/1.1" 200 - 52 21797 54 53 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:03.961Z] "POST /zeroshot HTTP/1.1" 200 - 106 218 87 87 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:04.050Z] "POST /v1/chat/completions HTTP/1.1" 200 - 1301 614 441 441 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:04.492Z] "POST /hallucination HTTP/1.1" 200 - 556 127 104 104 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:04.598Z] "POST /insurance_claim_details HTTP/1.1" 200 - 447 125 17 17 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "api_server" "192.168.65.254:18083"
-   ==> /Users/adilhafeez/archgw_logs/access_ingress.log <==
-   [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "arch_llm_listener" "0.0.0.0:12000"
+   ==> /Users/username/plano_logs/access_ingress.log <==
+   [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"
 Log Format
@@ -58,6 +58,6 @@ For example for following request:
 .. code-block:: console
-   [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "arch_llm_listener" "0.0.0.0:12000"
+   [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"
 Total duration was 1695ms, and the upstream service took 984ms to process the request. Bytes received and sent were 463 and 1022, respectively.
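When shipping these logs to a monitoring pipeline, it can help to split a line into named fields. Below is a minimal sketch in Python; the field order is inferred from the example line above, and the regex is illustrative rather than an official parser:

```python
import re

# Field positions assumed from the Envoy-style example line in this doc:
# [ts] "METHOD PATH PROTO" status flags bytes_rx bytes_tx duration upstream_ms
# "xff" "user-agent" "request-id" "upstream" "upstream-addr"
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\S+) (?P<flags>\S+) '
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) '
    r'(?P<duration_ms>\d+) (?P<upstream_ms>\S+) '
    r'"(?P<xff>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<request_id>[^"]*)" "(?P<upstream>[^"]*)" "(?P<upstream_addr>[^"]*)"'
)

def parse_access_log(line):
    """Return a dict of fields for one access log line, or None if it doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None

line = ('[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - '
        '463 1022 1695 984 "-" "OpenAI/Python 1.51.0" '
        '"604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"')
fields = parse_access_log(line)
```

From the parsed dict you can read off the same numbers the prose calls out: a total duration of 1695 ms, an upstream time of 984 ms, and 463/1022 bytes received/sent.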


@@ -8,11 +8,11 @@ and instrumentation for generating, collecting, processing, and exporting teleme
 metrics, and logs. Its flexible design supports a wide range of backends and seamlessly integrates with
 modern application tools.
-Arch acts a *source* for several monitoring metrics related to **prompts** and **LLMs** natively integrated
+Plano acts as a *source* for several monitoring metrics related to **agents** and **LLMs**, natively integrated
 via `OpenTelemetry <https://opentelemetry.io/>`_ to help you understand three critical aspects of your application:
 latency, token usage, and error rates by an upstream LLM provider. Latency measures the speed at which your application
 is responding to users, which includes metrics like time to first token (TFT), time per output token (TOT), and
-the total latency as perceived by users. Below are some screenshots how Arch integrates natively with tools like
+the total latency as perceived by users. Below are some screenshots of how Plano integrates natively with tools like
 `Grafana <https://grafana.com/grafana/dashboards/>`_ via `Prometheus <https://prometheus.io/>`_
@@ -32,7 +32,7 @@ Metrics Dashboard (via Grafana)
 Configure Monitoring
 ~~~~~~~~~~~~~~~~~~~~
-Arch gateway publishes stats endpoint at http://localhost:19901/stats. As noted above, Arch is a source for metrics. To view and manipulate dashbaords, you will
+Plano publishes a stats endpoint at http://localhost:19901/stats. As noted above, Plano is a source for metrics. To view and manipulate dashboards, you will
 need to configure `Prometheus <https://prometheus.io/>`_ (as a metrics store) and `Grafana <https://grafana.com/grafana/dashboards/>`_ for dashboards. Below
 are some sample configuration files for both, respectively.
@@ -51,7 +51,7 @@ are some sample configuration files for both, respectively.
       timeout: 10s
       api_version: v2
 scrape_configs:
-  - job_name: archgw
+  - job_name: plano
     honor_timestamps: true
     scrape_interval: 15s
     scrape_timeout: 10s
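As a quick sanity check before wiring up Prometheus, you can fetch the raw text from the stats endpoint (e.g. ``curl http://localhost:19901/stats``) and filter it yourself. A minimal sketch; the sample metric names below are made up for illustration:

```python
def filter_stats(stats_text, prefix):
    """Parse Prometheus text-format lines and keep metrics whose name matches prefix."""
    out = {}
    for line in stats_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name.startswith(prefix):
            out[name] = float(value)
    return out

# Hypothetical sample output; real stats come from http://localhost:19901/stats.
sample = """\
# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{cluster="llm"} 42
envoy_server_uptime 1234
"""
metrics = filter_stats(sample, "envoy_server")
```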


@@ -17,9 +17,9 @@ requests in an AI application. With tracing, you can capture a detailed view of
 through various services and components, which is crucial for **debugging**, **performance optimization**,
 and understanding complex AI agent architectures like Co-pilots.
-**Arch** propagates trace context using the W3C Trace Context standard, specifically through the
+**Plano** propagates trace context using the W3C Trace Context standard, specifically through the
 ``traceparent`` header. This allows each component in the system to record its part of the request
-flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures
+flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Plano ensures
 that developers can capture this trace data consistently and in a format compatible with various observability
 tools.
@@ -41,9 +41,9 @@ Benefits of Using ``Traceparent`` Headers
 How to Initiate A Trace
 -----------------------
-1. **Enable Tracing Configuration**: Simply set ``random_sampling`` to ``100`` in the ``tracing`` section of the :ref:`listener <arch_overview_listeners>` config.
+1. **Enable Tracing Configuration**: Simply set ``random_sampling`` to ``100`` in the ``tracing`` section of the :ref:`listener <plano_overview_listeners>` config.
-2. **Trace Context Propagation**: Arch automatically propagates the ``traceparent`` header. When a request is received, Arch will:
+2. **Trace Context Propagation**: Plano automatically propagates the ``traceparent`` header. When a request is received, Plano will:
    - Generate a new ``traceparent`` header if one is not present.
    - Extract the trace context from the ``traceparent`` header if it exists.
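Step 1 above might look roughly like the following in the listener config. This is a sketch based on the description in this doc, not a verified schema:

```yaml
tracing:
  random_sampling: 100   # sample 100% of requests
```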
@@ -57,7 +57,7 @@ How to Initiate A Trace
 Trace Propagation
 -----------------
-Arch uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
+Plano uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
 This header carries tracing information in a standardized format, enabling interoperability between different
 tracing systems.
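Concretely, a ``traceparent`` value is four hyphen-separated fields (``version-traceid-parentid-flags``) per the W3C Trace Context spec. A minimal sketch of splitting one:

```python
def parse_traceparent(header):
    """Split a W3C traceparent header into its four fields."""
    version, trace_id, parent_id, flags = header.strip().split("-")
    return {
        "version": version,      # currently "00"
        "trace_id": trace_id,    # 16-byte id, 32 hex chars
        "parent_id": parent_id,  # 8-byte id, 16 hex chars
        "flags": flags,          # "01" => sampled
    }

ctx = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
```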
@@ -77,7 +77,7 @@ Instrumentation
 ~~~~~~~~~~~~~~~
 To integrate AI tracing, your application needs to follow a few simple steps. The steps
-below are very common practice, and not unique to Arch, when reading tracing headers and exporting
+below are very common practice, and not unique to Plano, when reading tracing headers and exporting
 `spans <https://docs.lightstep.com/docs/understand-distributed-tracing>`_ for distributed tracing.
 - Read the ``traceparent`` header from incoming requests.
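The read-and-propagate step can be sketched as plain dict manipulation, independent of any web framework. The header name follows W3C Trace Context; everything else here is illustrative:

```python
import os

def propagate_trace(incoming_headers):
    """Reuse the caller's traceparent if present; otherwise start a new trace."""
    traceparent = incoming_headers.get("traceparent")
    if traceparent is None:
        trace_id = os.urandom(16).hex()   # 32 hex chars
        parent_id = os.urandom(8).hex()   # 16 hex chars
        traceparent = f"00-{trace_id}-{parent_id}-01"
    # Forward the same trace context on outbound calls.
    return {"traceparent": traceparent}

out = propagate_trace({"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"})
new = propagate_trace({})
```

A real service would also mint a fresh ``parent_id`` per outbound span; an OpenTelemetry SDK handles that bookkeeping for you.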
@@ -148,66 +148,6 @@ Handle incoming requests:
     print(f"Payment service response: {response.content}")
-AI Agent Tracing Visualization Example
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The following is an example of tracing for an AI-powered customer support system.
-A customer interacts with AI agents, which forward their requests through different
-specialized services and external systems.
-::
-
-    +--------------------------+
-    |   Customer Interaction   |
-    +--------------------------+
-                 |
-                 v
-    +--------------------------+       +--------------------------+
-    |  Agent 1 (Main - Arch)   | ----> | External Payment Service |
-    +--------------------------+       +--------------------------+
-                 |
-                 v
-    +--------------------------+       +--------------------------+
-    | Agent 2 (Support - Arch) | ----> |  Internal Tech Support   |
-    +--------------------------+       +--------------------------+
-                 |
-                 v
-    +--------------------------+       +--------------------------+
-    | Agent 3 (Orders - Arch)  | ----> |   Inventory Management   |
-    +--------------------------+       +--------------------------+
-
-Trace Breakdown:
-****************
-- Customer Interaction:
-  - Span 1: Customer initiates a request via the AI-powered chatbot for billing support (e.g., asking for payment details).
-- AI Agent 1 (Main - Arch):
-  - Span 2: AI Agent 1 (Main) processes the request and identifies it as related to billing, forwarding the request
-    to an external payment service.
-  - Span 3: AI Agent 1 determines that additional technical support is needed for processing and forwards the request
-    to AI Agent 2.
-- External Payment Service:
-  - Span 4: The external payment service processes the payment-related request (e.g., verifying payment status) and sends
-    the response back to AI Agent 1.
-- AI Agent 2 (Tech - Arch):
-  - Span 5: AI Agent 2, responsible for technical queries, processes a request forwarded from AI Agent 1 (e.g., checking for
-    any account issues).
-  - Span 6: AI Agent 2 forwards the query to Internal Tech Support for further investigation.
-- Internal Tech Support:
-  - Span 7: Internal Tech Support processes the request (e.g., resolving account access issues) and responds to AI Agent 2.
-- AI Agent 3 (Orders - Arch):
-  - Span 8: AI Agent 3 handles order-related queries. AI Agent 1 forwards the request to AI Agent 3 after payment verification
-    is completed.
-  - Span 9: AI Agent 3 forwards a request to the Inventory Management system to confirm product availability for a pending order.
-- Inventory Management:
-  - Span 10: The Inventory Management system checks stock and availability and returns the information to AI Agent 3.
 Integrating with Tracing Tools
 ------------------------------
@@ -292,11 +232,11 @@ To send tracing data to `Datadog <https://docs.datadoghq.com/getting_started/tra
 Langtrace
 ~~~~~~~~~
-Langtrace is an observability tool designed specifically for large language models (LLMs). It helps you capture, analyze, and understand how LLMs are used in your applications including those built using Arch.
+Langtrace is an observability tool designed specifically for large language models (LLMs). It helps you capture, analyze, and understand how LLMs are used in your applications, including those built using Plano.
 To send tracing data to `Langtrace <https://docs.langtrace.ai/supported-integrations/llm-tools/arch>`_:
-1. **Configure Arch**: Make sure Arch is installed and set up correctly. For more information, refer to the `installation guide <https://github.com/katanemo/archgw?tab=readme-ov-file#prerequisites>`_.
+1. **Configure Plano**: Make sure Plano is installed and set up correctly. For more information, refer to the `installation guide <https://github.com/katanemo/archgw?tab=readme-ov-file#prerequisites>`_.
 2. **Install Langtrace**: Install the Langtrace SDK:
@@ -348,7 +288,7 @@ Best Practices
 Summary
 ----------
-By leveraging the ``traceparent`` header for trace context propagation, Arch enables developers to implement
+By leveraging the ``traceparent`` header for trace context propagation, Plano enables developers to implement
 tracing efficiently. This approach simplifies the process of collecting and analyzing tracing data in common
 tools like AWS X-Ray and Datadog, enhancing observability and facilitating faster debugging and optimization.