plano

mirror of https://github.com/katanemo/plano.git synced 2026-06-17 15:25:17 +02:00

Author	SHA1	Message	Date
Adil Hafeez	7dc9a7b533	refactor a bit	2024-11-12 14:46:05 -08:00
Adil Hafeez	0cf6a5677f	fix int tests	2024-11-12 14:42:23 -08:00
Adil Hafeez	4135911803	fix dashboard	2024-11-12 14:24:08 -08:00
Adil Hafeez	ee2751be68	address pr feedback	2024-11-12 13:05:03 -08:00
Adil Hafeez	d035b33faf	Merge branch 'main' into collect-stats-in-stream-context	2024-11-12 11:33:48 -08:00
Adil Hafeez	d76ca01980	fix int tests	2024-11-12 11:31:58 -08:00
Adil Hafeez	30647fd508	Add service to stream custom otel traces to otel-collector (#262 )	2024-11-12 11:09:40 -08:00
Adil Hafeez	5421953ea9	Merge branch 'main' into collect-stats-in-stream-context	2024-11-12 11:01:16 -08:00
Adil Hafeez	d87105882b	update rust toolchain to 1.82 (#255 ) * update rust to 1.82 pin it, also update envoy to 1.32 and python to 3.13 * use python:3.12	2024-11-12 10:35:14 -08:00
aayushwhiz	6fc32b0152	update weather_forecast demo to spin up grafana and prometheus when using monitoring profile; has full dashboard with total requests, time per output token, time to first token, total latency, output sequence length, and input sequence length.	2024-11-11 17:00:48 -08:00
aayushwhiz	f4e9624c03	update integration tests to expect new stats and new request for time	2024-11-08 18:09:37 -08:00
aayushwhiz	1f9d5860b5	fix after merge	2024-11-08 18:09:37 -08:00
aayushwhiz	cb8e2a772b	update stats to output input_sequence_length Histogram. Changes the enforce_ratelimit function to get the token count regardless of whether there is a ratelimit, allowing the metric to be saved. This is essentially the token count of what is sent to openai, which is not the same as the tokens sent by the user, so rather than usage statistics, it's more relevant to price or cost. Not yet sure if this is the best way to go, but I'll use it for now.	2024-11-08 18:09:37 -08:00
aayushwhiz	8fb5c4eceb	Add in Latency and output_sequence_length. Added latency histogram and output sequence length histogram to the wasm metrics. Updated stream context so that when the end_stream is received, it stores the time since the request was sent as well as the total number of tokens up to that point.	2024-11-08 18:09:37 -08:00
aayushwhiz	840b6a0e3e	fix bug with checking for token count of zero. Changed check to verify that token count is greater than 0, changed debug message to say tokens, and divided time by number of tokens received during that time so it is actually per token	2024-11-08 18:09:37 -08:00
aayushwhiz	bf39fecd6d	add in tpot stat setup check for first token as well as time per token after that	2024-11-08 18:09:37 -08:00
aayushwhiz	5543aa543f	add in time to first token stat changes stats to implement debug for histogram, update filter_context to open ttft to stats endpoint and update stream_context to get time between both of those.	2024-11-08 18:09:37 -08:00
Adil Hafeez	9081eb0f7f	obfuscate auth header (#254 )	2024-11-08 15:17:39 -06:00
Adil Hafeez	a72bb804eb	add support for jaeger tracing (#229 )	2024-11-07 22:11:00 -06:00
Ikko Eltociear Ashimine	f48489f7c0	chore: update stream_context.rs (#248 ) initalize -> initialize	2024-11-05 10:18:33 -08:00
Adil Hafeez	9a6ae2efee	retry embeddings fetch (#245 )	2024-11-05 10:04:36 -08:00
Adil Hafeez	e462e393b1	Use large github action machine to run e2e tests (#230 )	2024-10-30 17:54:51 -07:00
Salman Paracha	bb882fb59b	Updated hr_agent to be full stack: gradio + fastAPI (#235 ) * committing to remove * fix * updating hr_agent --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local> Co-authored-by: Adil Hafeez <adil@katanemo.com>	2024-10-30 15:05:34 -07:00
Adil Hafeez	60299244b9	Improve Gradio UI and fix arch_state bug (#227 )	2024-10-29 11:27:13 -07:00
José Ulises Niño Rivera	662a840ac5	Add support for streaming and fixes few issues (see description) (#202 )	2024-10-28 17:05:06 -07:00
Shuguang Chen	5f3aff4922	Update chatbot UI and update hallucination check (#218 ) * update chatbot UI * Update docker-compose for demos * Fix bugs * fix for metadata (#219) * fix for metadata * fix * revert * merge main --------- Co-authored-by: CTran <cotran2@utexas.edu>	2024-10-24 14:11:53 -07:00
Azib Farooq	05f0491f76	updated key name (#211 )	2024-10-23 21:02:24 -07:00
CTran	8495f89fda	Cotran/hallucination (#208 )	2024-10-22 12:52:01 -07:00
Adil Hafeez	ea76d85b43	Improve logging (#209 ) * improve logging * fix int tests * better * fix more logs * fix more * fix int	2024-10-22 12:07:40 -07:00
Adil Hafeez	2f374df034	refactor prompt gateway (#204 )	2024-10-21 15:04:15 -07:00
Adil Hafeez	dced8a5708	Add separate util for hallucination and add tests for it (#203 )	2024-10-18 19:34:17 -07:00
Adil Hafeez	faf64960df	update observability and dashboards (#198 )	2024-10-18 15:07:49 -07:00
Adil Hafeez	dd1c7be706	Pass tool call and app function response back in metadata (#193 )	2024-10-18 13:25:39 -07:00
Adil Hafeez	1719b7d5f8	Send back developer error correctly (#195 )	2024-10-18 13:14:18 -07:00
Adil Hafeez	c6ba28dfcc	Code refactor and some improvements - see description (#194 )	2024-10-18 12:53:44 -07:00
José Ulises Niño Rivera	aa30353c85	Add cargo workspace to allow rust-analyzer to work correctly (#197 ) Signed-off-by: José Ulises Niño Rivera <junr03@users.noreply.github.com>	2024-10-18 15:44:52 -04:00
Adil Hafeez	21e7fe2cef	Split arch wasm filter code into prompt and llm gateway filters (#190 )	2024-10-17 10:16:40 -07:00
Adil Hafeez	3bd2ffe9fb	split wasm filter (#186 ) * split wasm filter * fix int and unit tests * rename public_types => common and move common code there * rename * fix int test	2024-10-16 14:20:26 -07:00

38 commits