Commit graph

10 commits

Author SHA1 Message Date
DESKTOP-RTLN3BA\$punk
94e834134f chore: linting 2026-05-28 19:21:29 -07:00
CREDO23
cf0085575c refactor(chat): add streaming/flows/resume_chat/orchestrator + flows public API
Slim composition root for the resume-chat streaming flow. Mirrors the
new_chat orchestrator but specialized for resumed turns:

* no fresh user turn, no title generation, no image-capability gate
* persists a fresh assistant shell for the resumed turn
* applies build_resume_routing to dispatch user decisions to the
  correct paused subagent before invoking the agent
* shares the same stream_loop + flow-local _recover closure for
  in-stream provider rate-limit recovery

Also lands flows/__init__.py, which becomes the public chat-flow API:

    from app.tasks.chat.streaming.flows import stream_new_chat, stream_resume_chat

Existing wiring (routes, contract test) still imports from the legacy
app.tasks.chat.stream_new_chat module. Cutover is the next phase.
2026-05-25 21:50:09 +02:00
CREDO23
885d4acda9 refactor(chat): add streaming/flows/resume_chat/ per-concern leaf modules
Three focused modules used by the upcoming resume-chat orchestrator:

* runtime_context: build_resume_chat_runtime_context assembles the
  SurfSenseContextSchema for a resume turn (handles empty mention
  lists, since resume requests do not carry fresh @-mentions).
* assistant_shell: persist_resume_assistant_shell writes a fresh
  assistant row for the resumed turn so the post-stream finalize
  has a target.
* resume_routing: build_resume_routing collects the pending
  interrupts across paused subagents and slices the flat list of
  ResumeDecision[] into the correct (thread, subagent) buckets so
  LangGraph routes each decision back to the right paused tool call.

Add-only; no orchestrator yet (next commit).
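
The slicing done by resume_routing can be sketched roughly as below. This is
not the module's code: slice_decisions, the (thread_id, subagent) key shape,
and the assumption that decisions arrive in the same order as the pending
interrupts are all hypothetical and only illustrate the bucketing idea.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical bucket key: (thread_id, subagent_name).
Bucket = Tuple[str, str]


def slice_decisions(
    pending: Sequence[Tuple[Bucket, int]],
    decisions: Sequence[dict],
) -> Dict[Bucket, List[dict]]:
    """Split a flat decision list into per-bucket slices.

    `pending` lists each paused bucket with the number of interrupts it is
    waiting on. Assumption: `decisions` is ordered the same way.
    """
    expected = sum(count for _, count in pending)
    if expected != len(decisions):
        raise ValueError(f"expected {expected} decisions, got {len(decisions)}")

    routed: Dict[Bucket, List[dict]] = {}
    cursor = 0
    for bucket, count in pending:
        # Each bucket takes the next `count` decisions from the flat list.
        routed[bucket] = list(decisions[cursor:cursor + count])
        cursor += count
    return routed
```

Rejecting a count mismatch up front is a design choice of this sketch; it
keeps a short or long decision list from being silently misrouted.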
2026-05-25 21:50:03 +02:00
CREDO23
b2a0888588 refactor(chat): add streaming/flows/new_chat/orchestrator.stream_new_chat
Slim composition root for the new-chat streaming flow. Sequences:

1. validate inputs and load the LLM bundle (a negative config id selects
   the YAML loader)
2. open the OTEL chat_request span; set agent_mode tag
3. spawn the four pre-stream DB writes (set-ai-responding, persist
   user turn, persist assistant shell, first-assistant probe)
4. reserve premium quota (with free-fallback retry on denial)
5. build connector + checkpointer + agent + input_state
6. emit first frames (message-start, step-start, initial thinking step)
7. spawn the background title generator
8. run the shared stream_loop with a flow-local _recover closure that
   reroutes to the next auto-pin config on provider 429s
9. finalize: emit terminal title/token frames, shielded assistant
   finalize, release-or-finalize premium quota, close session, GC,
   record OTEL outcome

Public entry-point flows/new_chat/__init__ re-exports stream_new_chat.

Existing wiring (routes, tests) still imports the legacy function from
app.tasks.chat.stream_new_chat. Cutover is a later commit.
2026-05-25 21:49:55 +02:00
CREDO23
927009745e refactor(chat): add streaming/flows/new_chat/ per-concern leaf modules
Seven focused modules that the upcoming new_chat orchestrator
composes:

* auto_pin: resolve_initial_auto_pin selects the initial config (with
  vision-capable filtering and error classification).
* llm_capability: check_image_input_capability blocks routing an
  image-bearing turn to a known text-only model.
* runtime_context: build_new_chat_runtime_context assembles the
  SurfSenseContextSchema for a new-chat turn.
* persistence_spawn: spawn_set_ai_responding_bg, spawn_persist_user_task,
  spawn_persist_assistant_shell_task, and await_persist_task run the four
  pre-stream DB writes in the background so they overlap with agent build.
* initial_thinking_step: build_initial_thinking_step +
  iter_initial_thinking_step_frame produce the very first thinking-1 SSE
  step ("Understanding your request" / "Analyzing referenced content").
* title_gen: spawn_title_task + maybe_emit_title_update +
  await_pending_title_update run the thread-title generator in the
  background and interleave its update into the stream when ready.
* input_state: build_new_chat_input_state assembles the LangGraph
  input_state (history bootstrap, mentions resolution, context blocks,
  human-message construction). This is the heaviest of the seven modules.

Add-only; no orchestrator yet (next commit).
2026-05-25 21:49:45 +02:00
CREDO23
21bddc73a7 refactor(chat): add streaming/flows/shared/assistant_finalize.py
Extracts finalize_assistant_message: the post-stream server-side write
of the final assistant message (with content parts + token usage)
guarded by asyncio.shield + shielded_async_session so a client
disconnect cannot abort the persist.

Add-only; legacy stream_new_chat.py keeps its inline finalize block
until cutover.
2026-05-25 21:49:31 +02:00
CREDO23
b54b803dc9 refactor(chat): add streaming/flows/shared/ rate-limit recovery + stream loop
Two cooperating modules that wrap stream_agent_events with in-stream
recovery from provider 429s:

* rate_limit_recovery: can_recover_provider_rate_limit truth-table
  guard, reroute_to_next_auto_pin (selects the next eligible auto-pin
  config and reloads the LLM bundle), log_rate_limit_recovered.
* stream_loop: run_stream_loop drives stream_agent_events in a
  while-True loop, delegating recovery to a flow-supplied RecoverFn
  callback so new_chat and resume_chat can share the same loop while
  keeping their own nonlocal state.

Add-only; not yet wired into any orchestrator.
2026-05-25 21:49:27 +02:00
CREDO23
2c3edb7c84 refactor(chat): add streaming/flows/shared/terminal_error.py
Extracts handle_terminal_exception: the shared except-branch behavior for
the chat orchestrators. Classifies the raised exception, logs the
structured chat_stream error event, and emits the terminal-error SSE
frame + done sentinel via the streaming service.

Add-only; nothing imports it yet.
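
A rough sketch of the classify-then-emit pattern follows. The error
categories, the payload field names, and the "[DONE]" sentinel text are
assumptions chosen for illustration; the actual frame schema is defined by
the streaming service and is not shown in this log.

```python
import json
from typing import Iterator


def classify_exception(exc: BaseException) -> str:
    """Map an exception to a coarse, hypothetical error code."""
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, PermissionError):
        return "forbidden"
    return "internal_error"


def iter_terminal_error_frames(exc: BaseException) -> Iterator[str]:
    """Yield one SSE error frame followed by a done sentinel."""
    payload = {
        "type": "error",
        "errorCode": classify_exception(exc),
        "errorText": str(exc) or exc.__class__.__name__,
    }
    # Server-sent events: a "data:" line terminated by a blank line.
    yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"
```

Emitting the sentinel from the same helper means every failure path closes
the stream the same way.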
2026-05-25 21:49:18 +02:00
CREDO23
40300d300a refactor(chat): add streaming/flows/shared/premium_quota.py
Centralizes the premium-credits lifecycle for chat turns:

* needs_premium_quota: gate check (premium user + non-fallback config).
* PremiumReservation: dataclass capturing reservation state + token totals.
* reserve_premium / finalize_premium / release_premium: idempotent
  reservation, commit, and rollback used by the orchestrators.

Add-only; legacy stream_new_chat.py keeps its inline quota handling
until cutover.
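
One way to make reserve / finalize / release idempotent is a small state
machine on the reservation object. The sketch below assumes an in-memory
Ledger and illustrative field names; it reuses the names from the list
above only for readability and is not the module's real logic.

```python
from dataclasses import dataclass


@dataclass
class PremiumReservation:
    """Hypothetical reservation state; field names are illustrative."""
    reserved_tokens: int = 0
    used_tokens: int = 0
    state: str = "none"  # none -> reserved -> finalized | released


class Ledger:
    """In-memory stand-in for the real quota store."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


def reserve_premium(ledger: Ledger, res: PremiumReservation, tokens: int) -> bool:
    if res.state != "none":
        return res.state == "reserved"  # repeated call: no second hold
    if ledger.balance < tokens:
        return False  # denied; caller may fall back to a free config
    ledger.balance -= tokens
    res.reserved_tokens = tokens
    res.state = "reserved"
    return True


def finalize_premium(ledger: Ledger, res: PremiumReservation, used: int) -> None:
    if res.state != "reserved":
        return  # already finalized or released
    res.used_tokens = min(used, res.reserved_tokens)
    ledger.balance += res.reserved_tokens - res.used_tokens  # refund unused
    res.state = "finalized"


def release_premium(ledger: Ledger, res: PremiumReservation) -> None:
    if res.state != "reserved":
        return
    ledger.balance += res.reserved_tokens  # full rollback
    res.state = "released"
```

Because every transition checks the current state first, a finally block
can call release_premium unconditionally without double-refunding.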
2026-05-25 21:49:14 +02:00
CREDO23
e9a98ecafb refactor(chat): add streaming/flows/shared/ base helpers
Six small, single-purpose modules shared by the upcoming new_chat and
resume_chat orchestrators:

* llm_bundle: dispatches negative config_id to the YAML loader and
  non-negative config_id to the DB loader, returning (llm, AgentConfig).
* pre_stream_setup: builds the connector service, resolves the
  Firecrawl API key, and returns the chat checkpointer.
* first_frames: iter_initial_frames + iter_final_frames emit the canonical
  message-start / step-start / idle / finish / done SSE envelope.
* finalize_emit: iter_token_usage_frame emits the per-turn usage frame
  from a TokenAccumulator summary.
* finally_cleanup: close_session_and_clear_ai_responding and run_gc_pass
  centralize the finally-block bookkeeping.
* span: open_chat_request_span / set_agent_mode / close_chat_request_span /
  record_outcome_attrs wrap the OpenTelemetry chat_request span.

Add-only; these are not yet wired into stream_new_chat.py.
2026-05-25 21:49:09 +02:00