bidirectional streaming for output filter chains

Replace per-chunk HTTP requests to output filters with a single
bidirectional streaming connection per filter. This eliminates
the 50-200+ round-trips per streaming LLM response.

Filters opt in via streaming: true in config. When all output filters
support streaming, brightstaff opens one POST per filter with a streaming
request body (Body::wrap_stream) and reads the streaming response. Filters
that don't opt in fall back to the existing per-chunk behavior.
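
The opt-in rule above can be sketched as follows. This is a minimal
illustration, not brightstaff's actual code: FilterConfig and chain_mode
are hypothetical names, and the sketch assumes the chain-wide reading,
where the streaming path is used only if every output filter sets
streaming: true.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FilterConfig:
    """Hypothetical stand-in for one output filter entry in the config."""

    id: str
    # Mirrors the optional `streaming` flag; assumed to default to False when absent.
    streaming: bool = False


def chain_mode(filters: list[FilterConfig]) -> str:
    """Return "streaming" only when every filter in a non-empty chain opted in.

    Any filter without the flag sends the chain back to the existing
    per-chunk request behavior (assumed chain-wide fallback).
    """
    if filters and all(f.streaming for f in filters):
        return "streaming"
    return "per-chunk"
```

Under this assumption, a single filter without the flag keeps the whole
chain on the per-chunk path, so existing filters keep working unchanged.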

Updates the PII deanonymizer demo to serve as the reference implementation,
adding request.stream() + StreamingResponse support.
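
A streaming filter has to cope with placeholders that are split across
chunk boundaries. The sketch below shows one way to do that with only the
standard library. It is not the demo's actual code: deanonymize_stream,
the "[NAME_0]"-style placeholder format, and the mapping argument are
assumptions made for illustration. In a Starlette/FastAPI handler, the
generator could plausibly be fed from request.stream() and returned via
StreamingResponse.

```python
import asyncio
import codecs
from typing import AsyncIterator, Dict


async def deanonymize_stream(
    chunks: AsyncIterator[bytes], mapping: Dict[str, str]
) -> AsyncIterator[bytes]:
    """Replace placeholders in a byte stream, chunk by chunk.

    Assumes (hypothetically) that every placeholder starts with "[" and
    ends with "]", e.g. "[EMAIL_0]".
    """
    max_len = max((len(k) for k in mapping), default=0)
    # Incremental decoding keeps multi-byte UTF-8 characters intact
    # even when a chunk boundary falls in the middle of one.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pending = ""
    async for chunk in chunks:
        pending += decoder.decode(chunk)
        for placeholder, original in mapping.items():
            pending = pending.replace(placeholder, original)
        # Hold back an unterminated "[..." tail: it may be the start of
        # a placeholder whose remainder arrives in the next chunk.
        cut = pending.rfind("[")
        if cut != -1 and "]" not in pending[cut:] and len(pending) - cut < max_len:
            ready, pending = pending[:cut], pending[cut:]
        else:
            ready, pending = pending, ""
        if ready:
            yield ready.encode("utf-8")
    # End of stream: flush the decoder and whatever was held back.
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending.encode("utf-8")


async def collect(chunks: AsyncIterator[bytes], mapping: Dict[str, str]) -> bytes:
    """Drain the filter into a single bytes object (handy for tests)."""
    return b"".join([c async for c in deanonymize_stream(chunks, mapping)])


if __name__ == "__main__":
    async def source() -> AsyncIterator[bytes]:
        # The first placeholder is deliberately split across two chunks.
        for part in (b"Contact [EMA", b"IL_0] today"):
            yield part

    print(asyncio.run(collect(source(), {"[EMAIL_0]": "jane@example.com"})))
```

Holding back only the unterminated tail keeps latency low: everything
before the last open bracket is forwarded as soon as it arrives.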

Made-with: Cursor
Adil Hafeez 2026-03-19 02:27:26 -07:00
parent 1f23c573bf
commit 42d3de8906
10 changed files with 613 additions and 133 deletions

@@ -43,6 +43,8 @@ properties:
- streamable-http
tool:
type: string
streaming:
type: boolean
additionalProperties: false
required:
- id