mirror of
https://github.com/katanemo/plano.git
synced 2026-05-05 13:53:03 +02:00
By default, use cheap len/4 estimate for input token counting (metrics and ratelimit). When enable_token_counting is set to true in overrides, use tiktoken BPE for exact counts. This eliminates ~80ms of per-request latency from tiktoken in the WASM filter while keeping metrics and ratelimit functional. Made-with: Cursor
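The switch described above can be sketched roughly as follows. This is a minimal illustration, not the filter's actual code: the `Overrides` struct, the function names, and the `exact` callback (standing in for a tiktoken BPE counter) are all hypothetical, and measuring length in bytes is an assumption.

```rust
/// Hypothetical mirror of the `enable_token_counting` override.
/// `Default` yields `false`, matching the cheap-by-default behavior.
#[derive(Default)]
struct Overrides {
    enable_token_counting: bool,
}

/// Cheap estimate: about one token per four bytes of input.
/// Assumption: length is measured in bytes and integer division is used.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Pick the counting strategy based on the override.
/// `exact` is a placeholder for an exact BPE tokenizer call.
fn count_input_tokens(
    text: &str,
    overrides: &Overrides,
    exact: impl Fn(&str) -> usize,
) -> usize {
    if overrides.enable_token_counting {
        exact(text) // exact but slower path
    } else {
        estimate_tokens(text) // fast path, no tokenizer invoked
    }
}
```

Passing the exact counter as a closure keeps the sketch free of tokenizer dependencies; with the default overrides, the closure is never called.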
Envoy filter code for gateway
Add toolchain
$ rustup target add wasm32-wasip1
Building
$ cargo build --target wasm32-wasip1 --release
Testing
$ cargo test
Local development
- Build the docker image for Plano. Note that this only needs to be built once.
  $ sh build_filter_image.sh
- Build the filter binary:
  $ cargo build --target wasm32-wasip1 --release
- Start envoy with config.yaml and test:
  $ docker compose -f docker-compose.dev.yaml up plano
- The dev version of the docker-compose file uses the following files, which are mounted inside the container. This means no docker rebuild is needed if any of these files change. Just restart the container and the change will be picked up:
  - envoy.template.yaml
  - intelligent_prompt_gateway.wasm