plano/docs/source/observability/stats.rst
Salman Paracha 7168b14ed3
Salmanap/docs v1 push (#92)
* updated model serving, updated the config references, architecture docs and added the llm_provider section

* several documentation changes to improve sections like life_of_a_request, model serving subsystem

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-09-27 15:37:49 -07:00

9 lines
No EOL
431 B
ReStructuredText

.. _monitoring:
Monitoring
==========
Arch offers several monitoring metrics that help you understand three critical aspects of your application:
latency, token usage, and error rates by an upstream LLM provider. Latency measures the speed at which your
application is responding to users, which includes metrics like time to first token (TFT), time per output
token (TOT) metrics, and the total latency as perceived by users.