Mirror of https://github.com/katanemo/plano.git
Synced 2026-05-27 14:17:15 +02:00
By default, use a cheap len/4 estimate for input token counting (used by metrics and ratelimit). When enable_token_counting is set to true in overrides, use tiktoken BPE for exact counts instead. This removes ~80ms of per-request tiktoken latency from the WASM filter while keeping metrics and ratelimit functional. Made-with: Cursor
| Name |
|---|
| api |
| traces |
| configuration.rs |
| consts.rs |
| errors.rs |
| http.rs |
| lib.rs |
| llm_providers.rs |
| path.rs |
| pii.rs |
| ratelimit.rs |
| routing.rs |
| stats.rs |
| tokenizer.rs |
| tracing.rs |
| utils.rs |
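
The commit message above describes a switch between a len/4 estimate and exact BPE counting. A minimal sketch of that idea follows. It is not the code in tokenizer.rs: the function names `estimate_tokens` and `count_input_tokens` are hypothetical, "len" is assumed to mean the UTF-8 byte length, and the exact tokenizer is passed in as a closure so the example does not depend on any particular tiktoken binding.

```rust
/// Cheap heuristic: roughly one token per four bytes of input.
/// Assumption: "len" means the UTF-8 byte length returned by `str::len`.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Hypothetical dispatcher. `exact` stands in for a BPE tokenizer call
/// (for example tiktoken) and is only invoked when the flag is enabled,
/// so the default path never pays the tokenizer's cost.
fn count_input_tokens<F>(text: &str, enable_token_counting: bool, exact: F) -> usize
where
    F: FnOnce(&str) -> usize,
{
    if enable_token_counting {
        exact(text)
    } else {
        estimate_tokens(text)
    }
}

fn main() {
    let prompt = "How do I rotate my API keys safely?";

    // Stand-in for an exact tokenizer. Whitespace splitting is NOT BPE;
    // it only makes the two code paths visibly different.
    let fake_exact = |s: &str| s.split_whitespace().count();

    println!("estimate: {}", count_input_tokens(prompt, false, fake_exact));
    println!("exact (stand-in): {}", count_input_tokens(prompt, true, fake_exact));
}
```

Passing the tokenizer as a closure keeps the expensive dependency out of the default path, which matches the latency goal stated in the commit message. The estimate is only approximate, and how far it drifts from exact counts will depend on the language and content of the input.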