Two cooperating modules that wrap stream_agent_events with in-stream
recovery from provider 429s:
* rate_limit_recovery: can_recover_provider_rate_limit truth-table
guard, reroute_to_next_auto_pin (selects the next eligible auto-pin
config and reloads the LLM bundle), log_rate_limit_recovered.
* stream_loop: run_stream_loop drives stream_agent_events in a
while-True loop, delegating recovery to a flow-supplied RecoverFn
callback so new_chat and resume_chat can share the same loop while
keeping their own nonlocal state.
Add-only; not yet wired into any orchestrator.
Extracts handle_terminal_exception: the shared except-branch behavior for
the chat orchestrators. Classifies the raised exception, logs the
structured chat_stream error event, and emits the terminal-error SSE
frame + done sentinel via the streaming service.
Add-only; nothing imports it yet.
Centralizes the premium-credits lifecycle for chat turns:
* needs_premium_quota: gate check (premium user + non-fallback config).
* PremiumReservation: dataclass capturing reservation state + token totals.
* reserve_premium / finalize_premium / release_premium: idempotent
reservation, commit, and rollback used by the orchestrators.
Add-only; legacy stream_new_chat.py keeps its inline quota handling
until cutover.
Six small, single-purpose modules shared by the upcoming new_chat and
resume_chat orchestrators:
* llm_bundle: dispatches negative config_id to the YAML loader and
non-negative config_id to the DB loader, returning (llm, AgentConfig).
* pre_stream_setup: builds the connector service, resolves the
Firecrawl API key, and returns the chat checkpointer.
* first_frames: iter_initial_frames + iter_final_frames emit the canonical
message-start / step-start / idle / finish / done SSE envelope.
* finalize_emit: iter_token_usage_frame emits the per-turn usage frame
from a TokenAccumulator summary.
* finally_cleanup: close_session_and_clear_ai_responding and run_gc_pass
centralize the finally-block bookkeeping.
* span: open_chat_request_span / set_agent_mode / close_chat_request_span /
record_outcome_attrs wrap the OpenTelemetry chat_request span.
Add-only; these are not yet wired into stream_new_chat.py.