Mirror of https://github.com/katanemo/plano.git (synced 2026-05-10 16:22:42 +02:00)
Replace per-chunk HTTP requests to output filters with a single bidirectional streaming connection per filter. This eliminates the 50-200+ round-trips per streaming LLM response.

Filters opt in via `streaming: true` in config. When all output filters support streaming, brightstaff opens one POST per filter with a streaming request body (`Body::wrap_stream`) and reads the streaming response. Filters that don't opt in fall back to the existing per-chunk behavior.

Updates the PII deanonymizer demo as the reference implementation with `request.stream()` + `StreamingResponse` support.

Made-with: Cursor

| Name | | |
|---|---|---|
| .vscode | ||
| brightstaff | ||
| common | ||
| hermesllm | ||
| llm_gateway | ||
| prompt_gateway | ||
| build.sh | ||
| Cargo.lock | ||
| Cargo.toml | ||
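
The opt-in rule in the commit message above can be illustrated with a small, standard-library-only Rust sketch. It is not brightstaff's actual code: `FilterConfig`, `Mode`, `choose_mode`, and `request_count` are hypothetical names, and the sketch assumes one reading of the message, namely that the whole output-filter chain streams only when every filter sets `streaming: true` and otherwise falls back to per-chunk requests.

```rust
/// Hypothetical stand-in for one output filter's config entry.
#[derive(Debug, Clone)]
struct FilterConfig {
    name: String,
    /// Mirrors the `streaming: true` opt-in described in the commit message.
    streaming: bool,
}

/// How the gateway would talk to the output filters (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    /// One long-lived POST per filter, carrying every chunk.
    Streaming,
    /// One HTTP request per chunk per filter (the pre-existing behavior).
    PerChunk,
}

/// Assumption: streaming is used only when every filter has opted in.
fn choose_mode(filters: &[FilterConfig]) -> Mode {
    if !filters.is_empty() && filters.iter().all(|f| f.streaming) {
        Mode::Streaming
    } else {
        Mode::PerChunk
    }
}

/// Number of HTTP requests needed to push `chunks` chunks through `filters` filters.
fn request_count(mode: Mode, filters: usize, chunks: usize) -> usize {
    match mode {
        Mode::Streaming => filters,
        Mode::PerChunk => filters * chunks,
    }
}

fn main() {
    // The filter name is made up for the example.
    let filters = vec![FilterConfig {
        name: "pii_deanonymizer".to_string(),
        streaming: true,
    }];
    let mode = choose_mode(&filters);
    for f in &filters {
        println!("filter {} (streaming: {})", f.name, f.streaming);
    }
    println!(
        "{:?}: {} request(s) for 120 chunks",
        mode,
        request_count(mode, filters.len(), 120)
    );
}
```

Under these assumptions, a single opted-in filter handles a 120-chunk response with one request instead of 120, which is the round-trip saving the commit message describes.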