mirror of
https://github.com/katanemo/plano.git
synced 2026-04-27 09:46:28 +02:00
make tiktoken token counting optional via enable_token_counting override
By default, use a cheap len/4 estimate for input token counting (metrics and ratelimit). When enable_token_counting is set to true in overrides, use tiktoken BPE for exact counts. This removes ~80ms of per-request tiktoken latency from the WASM filter in the default configuration while keeping metrics and ratelimit functional.

Made-with: Cursor
parent 406fa92802
commit e5f3039924
3 changed files with 19 additions and 8 deletions
@@ -285,6 +285,9 @@ properties:
 agent_orchestration_model:
   type: string
   description: "Model name for the agent orchestrator (e.g., 'Plano-Orchestrator'). Must match a model in model_providers."
+enable_token_counting:
+  type: boolean
+  description: "Enable tiktoken-based input token counting for metrics and rate limiting. Default is false."
 system_prompt:
   type: string
 prompt_targets:
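
The behavior described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `estimate_tokens`, `count_input_tokens`, and `stand_in_exact_count` are hypothetical, the estimate is assumed to be byte length divided by four with integer division, and a whitespace splitter stands in for the tiktoken BPE counter so the example has no external dependencies.

```rust
/// Hypothetical helper: cheap estimate of roughly one token per four bytes.
/// Assumes "len/4" means byte length with integer division.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Stand-in for an exact tokenizer such as tiktoken BPE.
/// It only counts whitespace-separated words; it is NOT a real BPE count.
fn stand_in_exact_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical dispatcher: use the exact counter only when the
/// `enable_token_counting` override is true, otherwise the cheap estimate.
fn count_input_tokens<F>(text: &str, enable_token_counting: bool, exact: F) -> usize
where
    F: Fn(&str) -> usize,
{
    if enable_token_counting {
        exact(text)
    } else {
        estimate_tokens(text)
    }
}

fn main() {
    let prompt = "Summarize the quarterly report in three bullet points.";

    // Default path: no tokenizer work on the request path.
    let cheap = count_input_tokens(prompt, false, stand_in_exact_count);

    // Opt-in path: pay the tokenizer cost for a more faithful count.
    let exact = count_input_tokens(prompt, true, stand_in_exact_count);

    println!("estimate = {cheap}, exact (stand-in) = {exact}");
}
```

Passing the exact counter as a parameter keeps the default path free of any tokenizer call, which is the point of the change: the expensive work happens only when the override asks for it.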