Mirror of https://github.com/katanemo/plano.git, synced 2026-05-27 14:17:15 +02:00
By default, use a cheap len/4 estimate for input token counting (metrics and ratelimit). When enable_token_counting is set to true in overrides, use tiktoken BPE for exact counts. This eliminates ~80ms of per-request latency from tiktoken in the WASM filter while keeping metrics and ratelimit functional.

Made-with: Cursor

| Name |
|---|
| src |
| Cargo.toml |
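
The token-counting switch described in the commit message can be illustrated with a minimal sketch. This is not the repository's actual code: the `Overrides` struct, the `estimate_tokens` and `count_input_tokens` functions, and the closure standing in for the tiktoken BPE counter are all hypothetical names chosen for illustration. The sketch also assumes that "len" means the byte length of the input string and that the division truncates.

```rust
/// Hypothetical overrides struct; only the `enable_token_counting`
/// flag name comes from the commit message.
pub struct Overrides {
    pub enable_token_counting: bool,
}

/// Cheap estimate: byte length divided by four (integer division).
/// Assumption: "len" is the UTF-8 byte length of the input.
pub fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Pick the counting strategy based on the override flag.
/// `exact` is a stand-in for an exact BPE tokenizer such as tiktoken;
/// it is only invoked when exact counting is enabled, so the default
/// path never pays the tokenizer's cost.
pub fn count_input_tokens<F>(text: &str, overrides: &Overrides, exact: F) -> usize
where
    F: Fn(&str) -> usize,
{
    if overrides.enable_token_counting {
        exact(text)
    } else {
        estimate_tokens(text)
    }
}
```

Passing the exact counter as a closure keeps the sketch free of external crates; in a real filter the closure would wrap whatever tokenizer the project uses. The estimate is intentionally rough, which is acceptable for metrics and rate limiting when exact counts are not required.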