Mirror of https://github.com/katanemo/plano.git (synced 2026-04-27 01:36:33 +02:00)
Replace per-chunk HTTP requests to output filters with a single bidirectional streaming connection per filter. This eliminates the 50-200+ round-trips per streaming LLM response.

Filters opt in via streaming: true in config. When all output filters support streaming, brightstaff opens one POST per filter with a streaming request body (Body::wrap_stream) and reads the streaming response. Filters that don't opt in fall back to the existing per-chunk behavior.

Updates the PII deanonymizer demo as the reference implementation with request.stream() + StreamingResponse support.

Made-with: Cursor

Files:
- docker-compose.dev.yaml
- env.list
- envoy.template.yaml
- plano_config_schema.yaml
- README.md
- requirements.txt
- supervisord.conf
- test_passthrough.yaml
- validate_plano_config.sh
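The commit message above describes a filter that consumes a streaming request body and produces a streaming response. The following is a minimal sketch of the core transform such a filter might apply, not the actual demo code. It assumes placeholders of the hypothetical form [NAME_0] (square brackets, no nested brackets) and a hypothetical placeholder-to-original mapping. The part that matters for streaming is that a placeholder can be split across two chunks, so an unfinished tail is held back until the next chunk arrives. In a FastAPI handler, the input would presumably be request.stream() and the generator would be wrapped in a StreamingResponse; that wiring is left out so the sketch runs with the standard library only.

```python
import asyncio
import codecs
from typing import AsyncIterator, Dict, List


async def deanonymize_stream(
    chunks: AsyncIterator[bytes], mapping: Dict[str, str]
) -> AsyncIterator[bytes]:
    """Replace placeholders in a byte stream, tolerating chunk-boundary splits.

    Assumption: every placeholder looks like "[...]" with no inner "[".
    """
    max_len = max((len(key) for key in mapping), default=0)
    # Incremental decoder so a multi-byte character split across chunks is safe.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    buffer = ""
    async for chunk in chunks:
        buffer += decoder.decode(chunk)
        for placeholder, original in mapping.items():
            buffer = buffer.replace(placeholder, original)
        # Hold back a trailing "[..." that may be an unfinished placeholder.
        cut = buffer.rfind("[")
        if cut != -1 and "]" not in buffer[cut:] and len(buffer) - cut < max_len:
            ready, buffer = buffer[:cut], buffer[cut:]
        else:
            ready, buffer = buffer, ""
        if ready:
            yield ready.encode("utf-8")
    # Flush whatever is left once the upstream body ends.
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer.encode("utf-8")


async def demo() -> List[bytes]:
    async def source() -> AsyncIterator[bytes]:
        # Both placeholders are deliberately split across chunk boundaries.
        for piece in (b"Hello [NA", b"ME_0], your email is [EMAIL", b"_0]."):
            yield piece

    # Hypothetical mapping; a real filter would load it from its own state.
    mapping = {"[NAME_0]": "Alice", "[EMAIL_0]": "alice@example.com"}
    return [out async for out in deanonymize_stream(source(), mapping)]


result = asyncio.run(demo())
print(b"".join(result).decode("utf-8"))
# prints: Hello Alice, your email is alice@example.com.
```

Holding back only the text after the last unmatched "[" keeps latency low: everything before it is forwarded immediately, and at most one partial placeholder is buffered at a time.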
Envoy filter code for gateway
Add toolchain
$ rustup target add wasm32-wasip1
Building
$ cargo build --target wasm32-wasip1 --release
Testing
$ cargo test
Local development
- Build the Docker image for Plano. This only needs to be done once.
  $ sh build_filter_image.sh
- Build the filter binary:
  $ cargo build --target wasm32-wasip1 --release
- Start Envoy with config.yaml and test:
  $ docker compose -f docker-compose.dev.yaml up plano
- The dev version of the docker-compose file mounts the following files inside the container, so no Docker rebuild is needed when any of them changes. Just restart the container and the change will be picked up:
  - envoy.template.yaml
  - intelligent_prompt_gateway.wasm