mirror of https://github.com/katanemo/plano.git (synced 2026-04-27 01:36:33 +02:00)

Update docs to Plano (#639)

This commit is contained in: parent 15fbb6c3af, commit e224cba3e3

139 changed files with 4407 additions and 24735 deletions
@@ -3,14 +3,14 @@
 Access Logging
 ==============
 
-Access logging in Arch refers to the logging of detailed information about each request and response that flows through Arch.
-It provides visibility into the traffic passing through Arch, which is crucial for monitoring, debugging, and analyzing the
+Access logging in Plano refers to the logging of detailed information about each request and response that flows through Plano.
+It provides visibility into the traffic passing through Plano, which is crucial for monitoring, debugging, and analyzing the
 behavior of AI applications and their interactions.
 
 Key Features
 ^^^^^^^^^^^^
 * **Per-Request Logging**:
-  Each request that passes through Arch is logged. This includes important metadata such as HTTP method,
+  Each request that passes through Plano is logged. This includes important metadata such as HTTP method,
   path, response status code, request duration, upstream host, and more.
 * **Integration with Monitoring Tools**:
   Access logs can be exported to centralized logging systems (e.g., ELK stack or Fluentd) or used to feed monitoring and alerting systems.
@@ -19,24 +19,24 @@ Key Features
 How It Works
 ^^^^^^^^^^^^
 
-Arch gateway exposes access logs for every call it manages on your behalf. By default these access logs can be found under ``~/archgw_logs``. For example:
+Plano exposes access logs for every call it manages on your behalf. By default, these access logs can be found under ``~/plano_logs``. For example:
 
 .. code-block:: console
 
-   $ tail -F ~/archgw_logs/access_*.log
+   $ tail -F ~/plano_logs/access_*.log
 
-   ==> /Users/adilhafeez/archgw_logs/access_llm.log <==
+   ==> /Users/username/plano_logs/access_llm.log <==
    [2024-10-10T03:55:49.537Z] "POST /v1/chat/completions HTTP/1.1" 0 DC 0 0 770 - "-" "OpenAI/Python 1.51.0" "469793af-b25f-9b57-b265-f376e8d8c586" "api.openai.com" "162.159.140.245:443"
 
-   ==> /Users/adilhafeez/archgw_logs/access_internal.log <==
+   ==> /Users/username/plano_logs/access_internal.log <==
    [2024-10-10T03:56:03.906Z] "POST /embeddings HTTP/1.1" 200 - 52 21797 54 53 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:03.961Z] "POST /zeroshot HTTP/1.1" 200 - 106 218 87 87 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:04.050Z] "POST /v1/chat/completions HTTP/1.1" 200 - 1301 614 441 441 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:04.492Z] "POST /hallucination HTTP/1.1" 200 - 556 127 104 104 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "model_server" "192.168.65.254:51000"
    [2024-10-10T03:56:04.598Z] "POST /insurance_claim_details HTTP/1.1" 200 - 447 125 17 17 "-" "-" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "api_server" "192.168.65.254:18083"
 
-   ==> /Users/adilhafeez/archgw_logs/access_ingress.log <==
-   [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "arch_llm_listener" "0.0.0.0:12000"
+   ==> /Users/username/plano_logs/access_ingress.log <==
+   [2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"
 
 
 Log Format
|
@ -58,6 +58,6 @@ For example for following request:
|
|||
|
||||
.. code-block:: console
|
||||
|
||||
[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "arch_llm_listener" "0.0.0.0:12000"
|
||||
[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" 200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" "604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"
|
||||
|
||||
Total duration was 1695ms, and the upstream service took 984ms to process the request. Bytes received and sent were 463 and 1022 respectively.
|
||||
|
|
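The numeric fields in the access log line above (bytes received, bytes sent, total duration, upstream duration) can be extracted programmatically. The following is a minimal sketch, not a Plano utility; the field order is inferred from the example log line and its explanation:

```python
import re

# Parse one access log line into named fields. After the request line come:
# status, response flags, bytes received, bytes sent, total duration (ms),
# and upstream duration (ms) -- per the worked example in the docs.
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<proto>[^"]+)"\s+'
    r'(?P<status>\d+)\s+(?P<flags>\S+)\s+'
    r'(?P<bytes_received>\d+)\s+(?P<bytes_sent>\d+)\s+'
    r'(?P<duration_ms>\d+)\s+(?P<upstream_ms>\d+)'
)

def parse_access_log(line: str) -> dict:
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("unrecognized access log line")
    d = m.groupdict()
    for key in ("status", "bytes_received", "bytes_sent", "duration_ms", "upstream_ms"):
        d[key] = int(d[key])
    return d

line = ('[2024-10-10T03:56:03.905Z] "POST /v1/chat/completions HTTP/1.1" '
        '200 - 463 1022 1695 984 "-" "OpenAI/Python 1.51.0" '
        '"604197fe-2a5b-95a2-9367-1d6b30cfc845" "plano_llm_listener" "0.0.0.0:12000"')
fields = parse_access_log(line)
print(fields["duration_ms"], fields["upstream_ms"])  # 1695 984
```

Running this against the example line recovers exactly the numbers the prose cites: 463/1022 bytes and 1695 ms total versus 984 ms upstream.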
@@ -8,11 +8,11 @@ and instrumentation for generating, collecting, processing, and exporting teleme
 metrics, and logs. Its flexible design supports a wide range of backends and seamlessly integrates with
 modern application tools.
 
-Arch acts as a *source* for several monitoring metrics related to **prompts** and **LLMs** natively integrated
+Plano acts as a *source* for several monitoring metrics related to **agents** and **LLMs** natively integrated
 via `OpenTelemetry <https://opentelemetry.io/>`_ to help you understand three critical aspects of your application:
 latency, token usage, and error rates by an upstream LLM provider. Latency measures the speed at which your application
 is responding to users, which includes metrics like time to first token (TFT), time per output token (TOT), and
-the total latency as perceived by users. Below are some screenshots showing how Arch integrates natively with tools like
+the total latency as perceived by users. Below are some screenshots showing how Plano integrates natively with tools like
 `Grafana <https://grafana.com/grafana/dashboards/>`_ via `Prometheus <https://prometheus.io/>`_.
@@ -32,7 +32,7 @@ Metrics Dashboard (via Grafana)
 
 Configure Monitoring
 ~~~~~~~~~~~~~~~~~~~~
-Arch gateway publishes a stats endpoint at http://localhost:19901/stats. As noted above, Arch is a source for metrics. To view and manipulate dashboards, you will
+Plano publishes a stats endpoint at http://localhost:19901/stats. As noted above, Plano is a source for metrics. To view and manipulate dashboards, you will
 need to configure `Prometheus <https://prometheus.io/>`_ (as a metrics store) and `Grafana <https://grafana.com/grafana/dashboards/>`_ for dashboards. Below
 are some sample configuration files for both, respectively.
@@ -51,7 +51,7 @@ are some sample configuration files for both, respectively.
       timeout: 10s
       api_version: v2
 scrape_configs:
-  - job_name: archgw
+  - job_name: plano
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
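The stats endpoint referenced above serves plain-text stat lines of the form ``name: value``. A rough sketch of turning that payload into a dictionary follows; the sample text and stat names here are illustrative, not actual Plano metric names:

```python
def parse_stats(text: str) -> dict:
    """Parse 'name: value' stat lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(": ")
        # Skip non-counter lines (e.g. histogram summaries) that aren't ints.
        if name and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats

# Illustrative payload; in practice you would fetch
# http://localhost:19901/stats and pass the response body in.
sample = ("http.ingress.downstream_rq_total: 42\n"
          "http.ingress.downstream_rq_5xx: 1")
stats = parse_stats(sample)
print(stats["http.ingress.downstream_rq_total"])  # 42
```

For dashboards you would normally let Prometheus scrape the endpoint instead, as the sample ``scrape_configs`` above shows; this sketch is only for ad-hoc inspection.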
@@ -17,9 +17,9 @@ requests in an AI application. With tracing, you can capture a detailed view of
 through various services and components, which is crucial for **debugging**, **performance optimization**,
 and understanding complex AI agent architectures like Co-pilots.
 
-**Arch** propagates trace context using the W3C Trace Context standard, specifically through the
+**Plano** propagates trace context using the W3C Trace Context standard, specifically through the
 ``traceparent`` header. This allows each component in the system to record its part of the request
-flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Arch ensures
+flow, enabling **end-to-end tracing** across the entire application. By using OpenTelemetry, Plano ensures
 that developers can capture this trace data consistently and in a format compatible with various observability
 tools.
@@ -41,9 +41,9 @@ Benefits of Using ``Traceparent`` Headers
 How to Initiate A Trace
 -----------------------
 
-1. **Enable Tracing Configuration**: Simply set ``random_sampling`` to ``100`` in the ``tracing`` section of the :ref:`listener <arch_overview_listeners>` config.
+1. **Enable Tracing Configuration**: Simply set ``random_sampling`` to ``100`` in the ``tracing`` section of the :ref:`listener <plano_overview_listeners>` config.
 
-2. **Trace Context Propagation**: Arch automatically propagates the ``traceparent`` header. When a request is received, Arch will:
+2. **Trace Context Propagation**: Plano automatically propagates the ``traceparent`` header. When a request is received, Plano will:
 
    - Generate a new ``traceparent`` header if one is not present.
   - Extract the trace context from the ``traceparent`` header if it exists.
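Step 1 above might look like the following listener config fragment. Only ``tracing`` and ``random_sampling: 100`` come from the text; the surrounding key names are assumptions, so consult the Plano configuration reference for the exact schema:

```yaml
# Hypothetical listener config fragment -- only `tracing.random_sampling`
# is taken from the docs above; the surrounding structure is illustrative.
listeners:
  ingress_traffic:
    tracing:
      random_sampling: 100   # sample (trace) 100% of requests
```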
@@ -57,7 +57,7 @@ How to Initiate A Trace
 Trace Propagation
 -----------------
 
-Arch uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
+Plano uses the W3C Trace Context standard for trace propagation, which relies on the ``traceparent`` header.
 This header carries tracing information in a standardized format, enabling interoperability between different
 tracing systems.
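Per the W3C Trace Context standard referenced above, a ``traceparent`` value has four dash-separated fields: version, a 32-hex-digit trace-id, a 16-hex-digit parent-id (span id), and trace flags. A small sketch of splitting it apart (the example value is the one used in the W3C specification):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four fields."""
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        # Bit 0 of the flags byte is the "sampled" flag.
        "sampled": int(flags, 16) & 0x01 == 1,
    }

tp = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(tp["trace_id"], tp["sampled"])
```

Because every hop carries the same trace-id while minting a new parent-id, tracing backends can stitch the spans from each service into one end-to-end trace.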
@@ -77,7 +77,7 @@ Instrumentation
 ~~~~~~~~~~~~~~~
 
 To integrate AI tracing, your application needs to follow a few simple steps. The steps
-below are very common practice, and not unique to Arch, when reading tracing headers and exporting
+below are very common practice, and not unique to Plano, when reading tracing headers and exporting
 `spans <https://docs.lightstep.com/docs/understand-distributed-tracing>`_ for distributed tracing.
 
 - Read the ``traceparent`` header from incoming requests.
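The first instrumentation step, reading ``traceparent`` from incoming requests and forwarding it unchanged on outbound calls, can be sketched framework-agnostically as below (the helper name is hypothetical, not a Plano API):

```python
def forward_headers(incoming_headers: dict) -> dict:
    """Build outbound headers that keep the incoming trace context intact."""
    outbound = {}
    tp = incoming_headers.get("traceparent")
    if tp is not None:
        # Propagate the existing trace context so downstream spans
        # attach to the same end-to-end trace.
        outbound["traceparent"] = tp
    return outbound

out = forward_headers({"traceparent": "00-" + "ab" * 16 + "-" + "cd" * 8 + "-01"})
print(out["traceparent"])
```

In a real service you would pass these headers to your HTTP client (e.g. ``requests.post(url, headers=forward_headers(request.headers))``), letting the gateway or an OpenTelemetry SDK mint the new span id for each hop.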
@@ -148,66 +148,6 @@ Handle incoming requests:
     print(f"Payment service response: {response.content}")
 
 
-AI Agent Tracing Visualization Example
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The following is an example of tracing for an AI-powered customer support system.
-A customer interacts with AI agents, which forward their requests through different
-specialized services and external systems.
-
-::
-
-    +--------------------------+
-    | Customer Interaction     |
-    +--------------------------+
-                 |
-                 v
-    +--------------------------+        +--------------------------+
-    | Agent 1 (Main - Arch)    | ---->  | External Payment Service |
-    +--------------------------+        +--------------------------+
-                 |                                   |
-                 v                                   v
-    +--------------------------+        +--------------------------+
-    | Agent 2 (Support - Arch) | ---->  | Internal Tech Support    |
-    +--------------------------+        +--------------------------+
-                 |                                   |
-                 v                                   v
-    +--------------------------+        +--------------------------+
-    | Agent 3 (Orders - Arch)  | ---->  | Inventory Management     |
-    +--------------------------+        +--------------------------+
-
-Trace Breakdown:
-****************
-
-- Customer Interaction:
-  - Span 1: Customer initiates a request via the AI-powered chatbot for billing support (e.g., asking for payment details).
-
-- AI Agent 1 (Main - Arch):
-  - Span 2: AI Agent 1 (Main) processes the request and identifies it as related to billing, forwarding the request
-    to an external payment service.
-  - Span 3: AI Agent 1 determines that additional technical support is needed for processing and forwards the request
-    to AI Agent 2.
-
-- External Payment Service:
-  - Span 4: The external payment service processes the payment-related request (e.g., verifying payment status) and sends
-    the response back to AI Agent 1.
-
-- AI Agent 2 (Tech - Arch):
-  - Span 5: AI Agent 2, responsible for technical queries, processes a request forwarded from AI Agent 1 (e.g., checking for
-    any account issues).
-  - Span 6: AI Agent 2 forwards the query to Internal Tech Support for further investigation.
-
-- Internal Tech Support:
-  - Span 7: Internal Tech Support processes the request (e.g., resolving account access issues) and responds to AI Agent 2.
-
-- AI Agent 3 (Orders - Arch):
-  - Span 8: AI Agent 3 handles order-related queries. AI Agent 1 forwards the request to AI Agent 3 after payment verification
-    is completed.
-  - Span 9: AI Agent 3 forwards a request to the Inventory Management system to confirm product availability for a pending order.
-
-- Inventory Management:
-  - Span 10: The Inventory Management system checks stock and availability and returns the information to AI Agent 3.
-
 Integrating with Tracing Tools
 ------------------------------
@@ -292,11 +232,11 @@ To send tracing data to `Datadog <https://docs.datadoghq.com/getting_started/tra
 Langtrace
 ~~~~~~~~~
 
-Langtrace is an observability tool designed specifically for large language models (LLMs). It helps you capture, analyze, and understand how LLMs are used in your applications, including those built using Arch.
+Langtrace is an observability tool designed specifically for large language models (LLMs). It helps you capture, analyze, and understand how LLMs are used in your applications, including those built using Plano.
 
 To send tracing data to `Langtrace <https://docs.langtrace.ai/supported-integrations/llm-tools/arch>`_:
 
-1. **Configure Arch**: Make sure Arch is installed and set up correctly. For more information, refer to the `installation guide <https://github.com/katanemo/archgw?tab=readme-ov-file#prerequisites>`_.
+1. **Configure Plano**: Make sure Plano is installed and set up correctly. For more information, refer to the `installation guide <https://github.com/katanemo/archgw?tab=readme-ov-file#prerequisites>`_.
 
 2. **Install Langtrace**: Install the Langtrace SDK:
@@ -348,7 +288,7 @@ Best Practices
 Summary
 ----------
 
-By leveraging the ``traceparent`` header for trace context propagation, Arch enables developers to implement
+By leveraging the ``traceparent`` header for trace context propagation, Plano enables developers to implement
 tracing efficiently. This approach simplifies the process of collecting and analyzing tracing data in common
 tools like AWS X-Ray and Datadog, enhancing observability and facilitating faster debugging and optimization.