.. _monitoring:

Monitoring
==========

Arch offers several monitoring metrics that help you understand three critical aspects of your application:
latency, token usage, and error rates by upstream LLM provider. Latency measures how quickly your
application responds to users, and includes time to first token (TFT), time per output
token (TOT), and the total latency as perceived by users.
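
To make the three latency figures concrete, the sketch below derives them from raw timestamps. It is illustrative only: ``summarize_latency`` and ``LatencySummary`` are hypothetical names, not part of Arch, and defining TOT as the average gap between output tokens after the first one is an assumption rather than a statement of how Arch computes it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LatencySummary:
    time_to_first_token: float    # seconds from request sent to first token
    time_per_output_token: float  # average seconds per token after the first
    total_latency: float          # seconds from request sent to last token


def summarize_latency(request_sent: float, token_times: List[float]) -> LatencySummary:
    """Derive TFT, TOT, and total latency from arrival timestamps (in seconds).

    Hypothetical helper: assumes token_times is sorted and holds one
    timestamp per streamed output token.
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")

    tft = token_times[0] - request_sent
    total = token_times[-1] - request_sent

    # Assumed definition: average the streaming time over the tokens
    # that followed the first one; a single-token response has no gap.
    remaining = len(token_times) - 1
    tot = (token_times[-1] - token_times[0]) / remaining if remaining else 0.0

    return LatencySummary(tft, tot, total)
```

For example, a request sent at ``t = 0.0`` whose tokens arrive at ``0.5``, ``0.75``, ``1.0``, and ``1.25`` seconds has a TFT of 0.5 s, a TOT of 0.25 s, and a total latency of 1.25 s.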