refactor(chat): cut live chat streaming over to flows orchestrators (cutover)

Flip the live callers (new_chat_routes + gateway/agent_invoke) from the
legacy monolithic app.tasks.chat.stream_new_chat to the side-by-side
app.tasks.chat.streaming.flows orchestrators.

Adds a byte-for-byte differential parity test driving BOTH implementations
on identical, fully-deterministic glue inputs (frozen time/uuid, stubbed
LLM/persistence/agent seams). All glue paths are byte-identical:
  * new: auto-pin failure, LLM-load failure, persist-user fail,
    persist-assistant fail (full initial-frame ordering + handshake),
    pre-stream exception (top-level except path)
  * resume: persist-assistant fail
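
A minimal sketch of the differential-parity idea above, not the test added
by this commit. legacy_stream, flows_stream and capture are hypothetical
stand-ins, and the real streams are presumably async; synchronous
generators keep the illustration short. Both implementations run under the
same frozen time/uuid and their concatenated output is compared byte for
byte.

```python
import time
import uuid
from unittest import mock


def legacy_stream(fail: bool):
    """Hypothetical stand-in for the monolithic implementation."""
    msg_id = uuid.uuid4().hex
    yield f"start:{msg_id}:{int(time.time())}\n".encode()
    if fail:
        yield b"error:llm_load_failed\n"
        return
    yield b"finish\n"


def flows_stream(fail: bool):
    """Hypothetical stand-in for the flows orchestrator."""
    msg_id = uuid.uuid4().hex
    started_at = int(time.time())
    yield f"start:{msg_id}:{started_at}\n".encode()
    yield b"error:llm_load_failed\n" if fail else b"finish\n"


def capture(impl, fail: bool) -> bytes:
    """Drain one implementation with time and uuid frozen to fixed values."""
    with mock.patch("uuid.uuid4", return_value=uuid.UUID(int=0)), \
         mock.patch("time.time", return_value=1_700_000_000.0):
        # The generator body executes inside the patches because join()
        # consumes it before the with-block exits.
        return b"".join(impl(fail))


for fail in (False, True):
    assert capture(legacy_stream, fail) == capture(flows_stream, fail)
```

Comparing the raw bytes rather than parsed frames means ordering and
framing differences are caught as well as payload differences.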

The differential test also surfaced one INTENTIONAL divergence: on a resume
turn whose auto-pin / LLM-load fails, the monolith crashes with
UnboundLocalError because its finally block reads
_resume_premium_request_id, which is only assigned after the early return;
the flows version emits a clean terminal error instead. On every tested
path the flows output is therefore either byte-identical or strictly more
correct.
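
The failure mode described above can be reproduced in isolation. This is a
simplified sketch, not the monolith's code: legacy_resume, fixed_resume and
request_id are hypothetical names, and the guard shown in fixed_resume is
one possible remedy, not necessarily what the flows version does.

```python
def legacy_resume(fail_early: bool) -> list[str]:
    """A local is assigned only after an early return, then read in finally."""
    frames: list[str] = []
    try:
        if fail_early:
            frames.append("error")
            return frames  # finally still runs; request_id is unbound here
        request_id = "req-1"
        frames.append("ok")
        return frames
    finally:
        # Raises UnboundLocalError on the early-return path.
        frames.append(f"cleanup:{request_id}")


def fixed_resume(fail_early: bool) -> list[str]:
    """Same flow with the local bound before the try block."""
    frames: list[str] = []
    request_id = None
    try:
        if fail_early:
            frames.append("error")
            return frames
        request_id = "req-1"
        frames.append("ok")
        return frames
    finally:
        if request_id is not None:
            frames.append(f"cleanup:{request_id}")
```

Calling legacy_resume(True) raises UnboundLocalError in place of returning
the error frame, which matches the crash described for the monolith's
resume path.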

The agent-content stream itself is shared code that was never forked
(stream_output -> EventRelay), so it cannot diverge between the two paths.

Monolith + old parity test deletion follows in a separate commit.
CREDO23 2026-06-04 14:24:18 +02:00
parent 8faa03889d
commit b9937cf4b1
3 changed files with 459 additions and 2 deletions

@@ -16,7 +16,7 @@ from app.gateway.bindings import get_or_create_thread_for_binding
 from app.gateway.hitl_filter import DEFAULT_HITL_TOOL_NAMES
 from app.gateway.thread_lock import acquire_thread_lock, release_thread_lock
 from app.observability.metrics import record_gateway_turn_latency
-from app.tasks.chat.stream_new_chat import stream_new_chat
+from app.tasks.chat.streaming.flows import stream_new_chat
 logger = logging.getLogger(__name__)