SurfSense/surfsense_backend
CREDO23 c3db25302b perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery
The preflight pattern probed the LLM with a 1-token ping before each
cold turn (when requested_llm_config_id==0, llm_config_id<0, and the
45s healthy TTL had expired) to detect 429s before fanning out into
planner/classifier/title-gen. To absorb its ~1-5s RTT cost we built the
agent speculatively in parallel; on 429 we discarded the build and
repinned.

Three problems with that design:

1. False security. Provider rate limits are token-bucket. A 1-token
   ping consumes ~5 tokens; the real request consumes 10-50K. The
   probe can return 200 while the real call still 429s.
2. Pure overhead in the common case. On warm-agent-cache turns the
   probe dominates wall time: ~2.5s of TTFT pure tax for ~99% of users
   who never see a 429.
3. The in-stream recovery loop (catch of _is_provider_rate_limited
   gated by not _first_event_logged) already does the right thing
   reactively: mark_runtime_cooldown -> resolve_or_get_pinned_llm_config_id
   with exclude_config_ids={previous} -> rebuild agent -> retry the
   stream. Preflight was never the only safety net; it was a redundant
   probe in front of one.

Changes:
- Delete _preflight_llm, _settle_speculative_agent_build, and the
  _PREFLIGHT_TIMEOUT_SEC / _PREFLIGHT_MAX_TOKENS constants.
- Drop the parallel agent_build_task / preflight_task plumbing in
  both stream_new_chat and stream_resume_chat; build the agent inline
  with await _build_main_agent_for_thread(...).
- Drop the unused is_recently_healthy / mark_healthy imports here
  (still exported from auto_model_pin_service since OpenRouter
  catalogue refresh and a few tests reference clear_healthy).
- Remove the obsolete preflight + settle-speculative tests from
  test_stream_new_chat_contract.py.

Net: -447 LOC. ~2.5s removed from TTFT on every cold preflight-eligible
turn. 429 recovery path is unchanged - same repin/rebuild/retry, just
not paid in advance on the healthy path.
2026-05-20 11:03:08 +02:00
..
alembic Merge remote-tracking branch 'upstream/dev' into fix/zero-cache-stale-replica-1355 2026-05-16 19:30:09 +05:30
app perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery 2026-05-20 11:03:08 +02:00
scripts chore(scripts): add MCP session lifetime probe 2026-05-19 21:30:34 +02:00
tests perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery 2026-05-20 11:03:08 +02:00
.dockerignore chore(backend): exclude tests/ from production Docker image 2026-05-06 17:16:22 +05:30
.env.example refactor(chat): stream agent events via stream_output and remove parity v2 flag 2026-05-07 19:40:10 +02:00
.gitignore chore: enhance E2E tests by adding synthetic global LLM config and updating environment variables for Google OAuth 2026-05-12 02:37:39 +05:30
.python-version feat: SurfSense v0.0.6 init 2025-03-14 18:53:14 -07:00
alembic.ini add github connector, add alembic for db migrations, fix bug updating connectors 2025-04-13 13:56:22 -07:00
celery_worker.py fix: celery_app path and gmail indexing 2025-10-21 21:11:41 -07:00
Dockerfile chore: update Docker configurations to streamline backend build and enhance E2E testing environment 2026-05-11 12:31:15 +05:30
main.py feat: added configable summary calculation and various improvements 2026-02-26 18:24:57 -08:00
pyproject.toml feat: bumped version to 0.0.23 2026-05-05 19:21:43 -07:00
uv.lock feat: bumped version to 0.0.23 2026-05-05 19:21:43 -07:00