SurfSense/surfsense_backend/tests
CREDO23 c3db25302b perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery
The preflight pattern probed the LLM with a 1-token ping before each
cold turn (when requested_llm_config_id==0, llm_config_id<0, and the
45s healthy TTL had expired) to detect 429s before fanning out into
planner/classifier/title-gen. To absorb its ~1-5s RTT cost we built the
agent speculatively in parallel; on 429 we discarded the build and
repinned.

Three problems with that design:

1. False security. Provider rate limits are token-bucket. A 1-token
   ping consumes ~5 tokens; the real request consumes 10-50K. The
   probe can return 200 while the real call still 429s.
2. Pure overhead in the common case. On warm-agent-cache turns the
   probe dominates wall time: ~2.5s of TTFT pure tax for ~99% of users
   who never see a 429.
3. The in-stream recovery loop (catch of _is_provider_rate_limited
   gated by not _first_event_logged) already does the right thing
   reactively: mark_runtime_cooldown -> resolve_or_get_pinned_llm_config_id
   with exclude_config_ids={previous} -> rebuild agent -> retry the
   stream. Preflight was never the only safety net; it was a redundant
   probe in front of one.

Changes:
- Delete _preflight_llm, _settle_speculative_agent_build, and the
  _PREFLIGHT_TIMEOUT_SEC / _PREFLIGHT_MAX_TOKENS constants.
- Drop the parallel agent_build_task / preflight_task plumbing in
  both stream_new_chat and stream_resume_chat; build the agent inline
  with await _build_main_agent_for_thread(...).
- Drop the unused is_recently_healthy / mark_healthy imports here
  (still exported from auto_model_pin_service since OpenRouter
  catalogue refresh and a few tests reference clear_healthy).
- Remove the obsolete preflight + settle-speculative tests from
  test_stream_new_chat_contract.py.

Net: -447 LOC. ~2.5s removed from TTFT on every cold preflight-eligible
turn. 429 recovery path is unchanged - same repin/rebuild/retry, just
not paid in advance on the healthy path.
2026-05-20 11:03:08 +02:00
..
e2e feat(tests): add mock response for file ownership in composio_module 2026-05-16 20:20:04 +05:30
fixtures remove stale ElectricSQL references from changelog and test fixtures 2026-03-24 17:07:11 +02:00
integration test: refactor Gmail indexer tests to utilize ComposioService and hybrid chunking 2026-05-16 21:26:40 +05:30
unit perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery 2026-05-20 11:03:08 +02:00
utils feat: implement task dispatcher for document processing 2026-02-26 23:55:47 +05:30
__init__.py feat: Add end-to-end tests for document upload pipeline and shared test utilities 2026-02-25 16:39:45 +05:30
conftest.py feat: add should_summarize parameter to task dispatchers 2026-02-26 19:12:37 -08:00