Changes the enforce_ratelimit function by getting token count regardless
of if there is a ratelimit or not, allowing for metric to be saved. This
essentially is the token count of what is sent to openai, but that is
not the tokens being sent by user, so rather than info about usage
statistics, it's more relavant to price or cost. Not yet sure if this is
the best way to go, but i'll use it for now.
added latency histogram and ouput sequency length histogram to the wasm
metrics. Updated stream context so that When the end_stream is recieved,
it stores the time since request was sent as well as total number of
tokens up till that point.
Changed check to check that token count is > than 0, changed debug
message to say tokens, and divided time by number of tokens received
during that time so it is actually per token
changes stats to implement debug for histogram, update filter_context to
open ttft to stats endpoint and update stream_context to get time
between both of those.
* commiting my hr_agent branch
* updating the HR agent config
* pushing to remote
* fix hr agent
* committing to merge with main
* updating to merge from main
* updating the demo and model-server-tests to pull from poetry
* updating the poetry.lock files
* updating based on feedback
* updated sysmte prompt for hr_agent
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
- add recommended extensions
- set python interpreter path for all python projects to be venv/bin/python
- update project structure in workspace
- rename project file from gatewa -> archgw
* removing unnecessar setup.py files
* updated the cli for debug and access logs
* ran the pre-commit locally to fix pull request
* fixed bug where if archgw_process is None we didn't handle it gracefully
* Apply suggestions from code review
Co-authored-by: Adil Hafeez <adil@katanemo.com>
* fixed changes based on PR
* fixed version not found message
* fixed message based on PR feedback
* adding poetry lock
* fixed pre-commit
---------
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
```
Installing dependencies from lock file
pyproject.toml changed significantly since poetry.lock was last generated. Run `poetry lock [--no-update]` to fix the lock file.
Error installing model server dependencies: Command '['poetry', 'install', '--no-cache']' returned non-zero exit status 1.
```