Commit graph

260 commits

Author SHA1 Message Date
aayushwhiz
1f9d5860b5 fix after merge 2024-11-08 18:09:37 -08:00
aayushwhiz
cb8e2a772b update stats to output input_sequence_length Histogram
Changes the enforce_ratelimit function to get the token count regardless
of whether there is a ratelimit or not, so the metric can always be saved.
This is the token count of what is sent to OpenAI, not the tokens sent by
the user, so rather than describing usage statistics, it is more relevant
to price or cost. Not yet sure if this is the best way to go, but I'll use
it for now.
2024-11-08 18:09:37 -08:00
aayushwhiz
8fb5c4eceb Add in Latency and output_sequence_length
Added latency histogram and output sequence length histogram to the wasm
metrics. Updated stream context so that when end_stream is received, it
stores the time since the request was sent as well as the total number of
tokens up to that point.
2024-11-08 18:09:37 -08:00
aayushwhiz
840b6a0e3e fix bug with checking for token count of zero
Changed the check to verify that the token count is greater than 0, changed
the debug message to say tokens, and divided the time by the number of
tokens received during that time so it is actually per token.
2024-11-08 18:09:37 -08:00
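The per-token division and the zero-token guard described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name time_per_output_token and its signature are hypothetical, and it only assumes that an elapsed time and a token count are available for the interval being measured.

```rust
use std::time::Duration;

/// Hypothetical helper (not taken from the repository): average time per
/// output token over an interval. Returns None when no tokens were
/// received, so the caller never divides by zero.
fn time_per_output_token(elapsed: Duration, tokens: u32) -> Option<Duration> {
    if tokens > 0 {
        // Duration implements Div<u32>, so this yields the per-token time.
        Some(elapsed / tokens)
    } else {
        None
    }
}

fn main() {
    // 900 ms spent receiving 3 tokens gives 300 ms per token.
    let tpot = time_per_output_token(Duration::from_millis(900), 3);
    println!("{:?}", tpot);

    // With zero tokens there is nothing to record.
    let empty = time_per_output_token(Duration::from_millis(900), 0);
    println!("{:?}", empty);
}
```

Returning an Option leaves it to the caller to skip recording the histogram sample when no tokens arrived, which is one way to express the "greater than 0" check.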
aayushwhiz
bf39fecd6d add in tpot stat
set up check for first token as well as time per token after that
2024-11-08 18:09:37 -08:00
aayushwhiz
5543aa543f add in time to first token stat
changes stats to implement debug for histogram, update filter_context to
open ttft to stats endpoint and update stream_context to get time
between both of those.
2024-11-08 18:09:37 -08:00
Salman Paracha
4b2b371876
removing dependency on mistral keys (#256) 2024-11-08 16:09:04 -08:00
Adil Hafeez
9081eb0f7f
obfuscate auth header (#254) 2024-11-08 15:17:39 -06:00
Adil Hafeez
88d0f99866
add requirements to readme (#249) 2024-11-08 10:43:18 -08:00
Adil Hafeez
6b62662e01
update docs with weather_forecast path (#253) 2024-11-08 10:00:15 -08:00
Adil Hafeez
a72bb804eb
add support for jaeger tracing (#229) 2024-11-07 22:11:00 -06:00
CTran
fb67788be0
add prefill and test (#236)
* add prefill and test

* fix stream

* fix

* feedback

* address comments

* update

* add e2e test

* fix e2e test

* update fix

* fix

* address cmt

* address cmt
2024-11-07 11:59:29 -08:00
Ikko Eltociear Ashimine
f48489f7c0
chore: update stream_context.rs (#248)
initalize -> initialize
2024-11-05 10:18:33 -08:00
Adil Hafeez
9a6ae2efee
retry embeddings fetch (#245) 2024-11-05 10:04:36 -08:00
Adil Hafeez
9a5c5cc3a3
add http files for llm and prompt gateway for local testing (#244) 2024-11-04 15:53:15 -08:00
Salman Paracha
e4d5293af4
updating README logo (#242)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-31 21:58:08 -07:00
Salman Paracha
21f4e7a5e4
fixing ports in arch_config for demos (#241)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-31 14:37:04 -07:00
Salman Paracha
3bff4a597b
fix ports and update README for paths to agent/chat (#240)
* fix ports and update README for paths to agent/chat

* minor fix

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-31 09:25:24 -07:00
Adil Hafeez
5196fb28a9
update build status badges 2024-10-30 23:27:01 -07:00
Adil Hafeez
9a30afa7e1
update status badges 2024-10-30 23:24:36 -07:00
Adil Hafeez
ad70e540b6
add status badges 2024-10-30 23:24:00 -07:00
Adil Hafeez
8c6ad87c1c
release 0.1.0 (#239)
* set version to 0.1.0

* update readme
2024-10-30 18:56:49 -07:00
Salman Paracha
dab7a44053
several fixes to demos (#238)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-30 18:38:18 -07:00
Adil Hafeez
e462e393b1
Use large github action machine to run e2e tests (#230) 2024-10-30 17:54:51 -07:00
Salman Paracha
bb882fb59b
Updated hr_agent to be full stack: gradio + fastAPI (#235)
* committing to remove

* fix

* updating hr_agent

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
2024-10-30 15:05:34 -07:00
Salman Paracha
bb9a774a72
moving chatbot-ui in demos and out of root project structure (#228)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-29 12:05:29 -07:00
Adil Hafeez
60299244b9
Improve Gradio UI and fix arch_state bug (#227) 2024-10-29 11:27:13 -07:00
José Ulises Niño Rivera
662a840ac5
Add support for streaming and fixes few issues (see description) (#202) 2024-10-28 17:05:06 -07:00
Salman Paracha
29ff8da60f
fixed typos in intro to arch docs (#225) 2024-10-26 10:41:01 -07:00
CTran
25dddcbfd9
fix model server stop process (#217)
* fix model server stop process

* replace

* replace

* add test

* add multiple pids test

* add check install for linux

* reformat
2024-10-24 19:21:47 -07:00
Salman Paracha
ff6e9bd9bd
add README for hr_agent (#224)
* add README for hr_agent

* fixed sample prompt for hr_agent in README

* added screenshot and updated README.md

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-24 18:21:52 -07:00
Salman Paracha
f88740582f
fixed typos in arch_config.yaml file based on issue #221 (#223)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-24 14:57:29 -07:00
Shuguang Chen
5f3aff4922
Update chatbot UI and update hallucination check (#218)
* update chatbot UI

* Update docker-compose for demos

* Fix bugs

* fix for metadata (#219)

* fix for metadata

* fix

* revert

* merge main

---------

Co-authored-by: CTran <cotran2@utexas.edu>
2024-10-24 14:11:53 -07:00
Azib Farooq
05f0491f76
updated key name (#211) 2024-10-23 21:02:24 -07:00
Ikko Eltociear Ashimine
87ce0b1be0
docs: update README.md (#220)
vist -> visit
2024-10-23 20:37:26 -07:00
Salman Paracha
7a5852b401
fixing discord link and moving contributing guide to root (#215)
Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
2024-10-23 15:45:49 -07:00
Salman Paracha
708fa15a9b
HR agent demo (#206)
* committing my hr_agent branch

* updating the HR agent config

* pushing to remote

* fix hr agent

* committing to merge with main

* updating to merge from main

* updating the demo and model-server-tests to pull from poetry

* updating the poetry.lock files

* updating based on feedback

* updated system prompt for hr_agent

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
2024-10-23 14:32:40 -07:00
CTran
8495f89fda
Cotran/hallucination (#208) 2024-10-22 12:52:01 -07:00
Adil Hafeez
ea76d85b43
Improve logging (#209)
* improve logging

* fix int tests

* better

* fix more logs

* fix more

* fix int
2024-10-22 12:07:40 -07:00
Adil Hafeez
2f374df034
refactor prompt gateway (#204) 2024-10-21 15:04:15 -07:00
Adil Hafeez
dced8a5708
Add separate util for hallucination and add tests for it (#203) 2024-10-18 19:34:17 -07:00
Adil Hafeez
faf64960df
update observability and dashboards (#198) 2024-10-18 15:07:49 -07:00
Adil Hafeez
f189d5703b update .dockerignore file after filter move 2024-10-18 14:44:39 -07:00
Adil Hafeez
dd1c7be706
Pass tool call and app function response back in metadata (#193) 2024-10-18 13:25:39 -07:00
José Ulises Niño Rivera
62a000036e
Update arch Dockerfile (#200) 2024-10-18 16:15:19 -04:00
Adil Hafeez
1719b7d5f8
Send back developer error correctly (#195) 2024-10-18 13:14:18 -07:00
Adil Hafeez
28421353fd
Update vscode workspace (#199)
- add recommended extensions
- set python interpreter path for all python projects to be venv/bin/python
- update project structure in workspace
- rename project file from gatewa -> archgw
2024-10-18 12:57:58 -07:00
Adil Hafeez
c6ba28dfcc
Code refactor and some improvements - see description (#194) 2024-10-18 12:53:44 -07:00
José Ulises Niño Rivera
aa30353c85
Add cargo workspace to allow rust-analyzer to work correctly (#197)
Signed-off-by: José Ulises Niño Rivera <junr03@users.noreply.github.com>
2024-10-18 15:44:52 -04:00
Salman Paracha
6fb63510b3
fix cli models and logs (#196)
* removing unnecessary setup.py files

* updated the cli for debug and access logs

* ran the pre-commit locally to fix pull request

* fixed bug where if archgw_process is None we didn't handle it gracefully

* Apply suggestions from code review

Co-authored-by: Adil Hafeez <adil@katanemo.com>

* fixed changes based on PR

* fixed version not found message

* fixed message based on PR feedback

* adding poetry lock

* fixed pre-commit

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
2024-10-18 12:09:45 -07:00