Changes the enforce_ratelimit function to compute the token count
whether or not a ratelimit is configured, so that the metric can always
be saved. This is the token count of what is sent to OpenAI, which is
not the same as the tokens sent by the user, so it is more relevant to
price or cost than to usage statistics. Not yet sure if this is the
best way to go, but I'll use it for now.
Added a latency histogram and an output sequence length histogram to
the wasm metrics. Updated the stream context so that when end_stream is
received, it stores the time since the request was sent as well as the
total number of tokens up to that point.
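A minimal sketch of the end_stream bookkeeping described above, not the
actual filter code. The StreamState type, its fields, and its method
names are hypothetical, and a real wasm filter would take the current
time from the host rather than from std::time::Instant.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-stream state; names are illustrative only.
struct StreamState {
    request_sent: Instant,
    output_tokens: u64,
}

impl StreamState {
    fn new(request_sent: Instant) -> Self {
        Self { request_sent, output_tokens: 0 }
    }

    /// Accumulate the tokens seen in one response chunk.
    fn on_chunk(&mut self, tokens_in_chunk: u64) {
        self.output_tokens += tokens_in_chunk;
    }

    /// At end of stream, return the two values that would be recorded:
    /// time since the request was sent and total tokens so far.
    fn on_end_stream(&self, now: Instant) -> (Duration, u64) {
        (now.duration_since(self.request_sent), self.output_tokens)
    }
}
```

The returned pair maps onto the latency histogram and the output
sequence length histogram respectively.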
Changed the check to verify that the token count is greater than 0,
changed the debug message to say tokens, and divided the time by the
number of tokens received during that time so it is actually per token.
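The per-token calculation above can be sketched as follows. This is an
illustration under assumptions: time_per_output_token is a hypothetical
helper name, and returning None for a zero count is one possible way to
express the "greater than 0" check.

```rust
use std::time::Duration;

/// Hypothetical helper: average time per token over an interval.
/// Returns None when no tokens were received, which also avoids a
/// division by zero.
fn time_per_output_token(elapsed: Duration, token_count: u32) -> Option<Duration> {
    if token_count > 0 {
        Some(elapsed / token_count)
    } else {
        None
    }
}
```

For example, 900 ms spent receiving 3 tokens gives 300 ms per token,
while a chunk with no tokens records nothing.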
Changes stats to implement Debug for histogram, updates filter_context
to expose ttft on the stats endpoint, and updates stream_context to get
the time between both of those.