Commit graph

6024 commits

Author SHA1 Message Date
CREDO23
2be3f04df5 chore(scripts): drop one-off MCP session lifetime probe
The probe answered its question (informing the cached_tools persistence
design). Future MCP session-pooling work, if revived, can recreate it.
2026-05-20 19:11:00 +02:00
CREDO23
704d1bf18f refactor(mcp): per-connector cache refresh on lifecycle events
Collapse the invalidate + warmup pair into a single
refresh_mcp_tools_cache_for_connector(connector_id, search_space_id)
helper and scope live discovery to the one connector that changed
instead of the whole search space.

- new mcp_tool.discover_single_mcp_connector: load one connector,
  refresh OAuth if needed, force live MCP discovery so its cached_tools
  row is rewritten; returned wrappers are discarded since the in-process
  LRU is rebuilt lazily on the next user query
- mcp_tools_cache.refresh_mcp_tools_cache_for_connector: synchronously
  evicts the per-space LRU (LRU keys cannot scope finer) and schedules
  the per-connector prefetch via loop.create_task
- routes (OAuth callback, MCP POST, MCP PUT) collapse their two
  back-to-back calls into a single refresh call; DELETE handlers keep
  using bare invalidate_mcp_tools_cache (nothing to prefetch)

No new automated tests: the new functions are I/O glue (DB + network)
where mocked unit tests would test implementation rather than behavior.
The existing 9 unit tests for the cached_tools data shape are unchanged.
2026-05-20 17:43:27 +02:00
Anish Sarkar
58a975205d chore: add workflow file to change detection filters for backend and frontend jobs 2026-05-20 20:06:38 +05:30
Anish Sarkar
a6a0f7a373 chore: streamline GitHub Actions workflow by removing change detection job and simplifying test conditions 2026-05-20 20:03:51 +05:30
Anish Sarkar
844b8ba609 chore: refactor GitHub Actions workflow to improve backend change detection and job dependencies 2026-05-20 19:42:47 +05:30
CREDO23
c0aa4261ac perf(mcp): persist list_tools discovery in connector.config.cached_tools
Skip the ~1-3s MCP initialize + list_tools handshake on every cache miss
by reading tool definitions from the connector row we already load. Lazy
populate on first miss, self-heal on corrupt cache, zero schema migration.
2026-05-20 16:11:07 +02:00
Anish Sarkar
a786574484 chore: simplify job names in GitHub Actions workflow 2026-05-20 19:29:09 +05:30
Anish Sarkar
2de8ea5501 chore: update biome version in pre-commit configuration 2026-05-20 19:25:39 +05:30
Anish Sarkar
20152b1243 chore: update Node.js version to 20 in GitHub Actions workflow 2026-05-20 18:41:05 +05:30
Anish Sarkar
ff2d621185 chore: fix the pnpm version in GitHub Actions workflow 2026-05-20 18:37:21 +05:30
Rohan Verma
0f98480096
Merge pull request #1419 from MODSetter/dev
Release v0.0.24: UI revamp, multi-agent parallelization, citations & HITL improvements
2026-05-20 03:27:24 -07:00
DESKTOP-RTLN3BA\$punk
f5f2456dfd Merge branch 'dev' of https://github.com/MODSetter/SurfSense into dev 2026-05-20 03:01:49 -07:00
DESKTOP-RTLN3BA\$punk
ed22da7b95 feat: bumped version to 0.0.24 2026-05-20 03:01:37 -07:00
Anish Sarkar
39c29d651f feat: enhance token display in MessageInfoDropdown with improved visual separation 2026-05-20 15:29:41 +05:30
CREDO23
db8bffab38 perf(prompt-cache): enable Azure prompt_cache_key routing hint
Splits the OpenAI-family gate into per-param predicates so AZURE and
AZURE_OPENAI configs now receive prompt_cache_key for backend routing
affinity (Microsoft auto-caches GPT-4o+ deployments at >=1024 tokens;
the key clusters same-prefix requests on the same GPU pool and raises
hit rate on turn 2+). prompt_cache_retention stays opted out for Azure
because litellm 1.83.14's Azure transformer would drop it silently;
revisit when Azure's supported params list is updated.
2026-05-20 11:58:15 +02:00
CREDO23
71dead0406 perf(kb-planner): route internal planner calls to dedicated small/fast LLM
Adds an optional planner LLM role wired through KnowledgePriorityMiddleware
so KB query rewriting, date extraction, and recency classification run on a
cheap model (e.g. gpt-4o-mini, Haiku, Azure nano) instead of the user's
chat LLM. Operators opt in by setting is_planner: true on exactly one
global config; without it, behavior is unchanged.
2026-05-20 11:42:52 +02:00
Anish Sarkar
8c9be9796a feat: add no-update sentinel handling to save_memory function and corresponding unit tests 2026-05-20 15:03:35 +05:30
CREDO23
c3db25302b perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery
The preflight pattern probed the LLM with a 1-token ping before each
cold turn (when requested_llm_config_id==0, llm_config_id<0, and the
45s healthy TTL had expired) to detect 429s before fanning out into
planner/classifier/title-gen. To absorb its ~1-5s RTT cost we built the
agent speculatively in parallel; on 429 we discarded the build and
repinned.

Three problems with that design:

1. False security. Provider rate limits are token-bucket. A 1-token
   ping consumes ~5 tokens; the real request consumes 10-50K. The
   probe can return 200 while the real call still 429s.
2. Pure overhead in the common case. On warm-agent-cache turns the
   probe dominates wall time: ~2.5s of TTFT pure tax for ~99% of users
   who never see a 429.
3. The in-stream recovery loop (catch of _is_provider_rate_limited
   gated by not _first_event_logged) already does the right thing
   reactively: mark_runtime_cooldown -> resolve_or_get_pinned_llm_config_id
   with exclude_config_ids={previous} -> rebuild agent -> retry the
   stream. Preflight was never the only safety net; it was a redundant
   probe in front of one.

Changes:
- Delete _preflight_llm, _settle_speculative_agent_build, and the
  _PREFLIGHT_TIMEOUT_SEC / _PREFLIGHT_MAX_TOKENS constants.
- Drop the parallel agent_build_task / preflight_task plumbing in
  both stream_new_chat and stream_resume_chat; build the agent inline
  with await _build_main_agent_for_thread(...).
- Drop the unused is_recently_healthy / mark_healthy imports here
  (still exported from auto_model_pin_service since OpenRouter
  catalogue refresh and a few tests reference clear_healthy).
- Remove the obsolete preflight + settle-speculative tests from
  test_stream_new_chat_contract.py.

Net: -447 LOC. ~2.5s removed from TTFT on every cold preflight-eligible
turn. 429 recovery path is unchanged - same repin/rebuild/retry, just
not paid in advance on the healthy path.
2026-05-20 11:03:08 +02:00
Anish Sarkar
132e7b3c44 refactor: remove memory extraction functions and related components from the new chat agent 2026-05-20 14:03:28 +05:30
Rohan Verma
4774105015
Merge pull request #1413 from mvanhorn/fix/1359-shared-isMobile-hook
fix: use shared useIsMobile (768px) in SidebarSlideOutPanel (#1359)
2026-05-20 01:32:04 -07:00
DESKTOP-RTLN3BA\$punk
b285293b4e fix: docker one click setup 2026-05-20 01:25:07 -07:00
CREDO23
1791241c0c perf(indexers): offload sync embed_text to thread across background workers
Connector kb_sync_services (gmail, onedrive, google_calendar, jira),
streaming indexers (discord, luma, teams) and the file-processor save
path all called embed_text inside async coroutines, blocking the
background worker's event loop for the duration of the embed. Wrap each
call site in asyncio.to_thread so concurrent indexing tasks stop
serialising on the embed.
2026-05-20 10:09:38 +02:00
CREDO23
a8de98895a perf(revert-service): offload sync embed_texts to thread
_restore_in_place_document and _reinsert_document_from_revision are
async helpers invoked by the synchronous-feeling POST /api/threads/.../revert
route; both ran embed_texts inline, blocking the event loop while the
HTTP client waited.
2026-05-20 10:04:26 +02:00
CREDO23
a3d6fa6196 perf(document-converters): offload sync embed_text/embed_texts to thread
generate_document_summary and create_document_chunks are async helpers
called from the chat path and from many connector indexers. Both wrapped
embed_text/embed_texts directly inside the coroutine, blocking the event
loop for the full duration of the embedding call.
2026-05-20 10:03:42 +02:00
CREDO23
52d425f170 perf(kb-persistence): offload sync embed_texts to thread
_create_document and _update_document run on the chat critical path
when the filesystem subagent writes via the user's chat turn. Both
called embed_texts synchronously inside an async coroutine, blocking
the event loop for the duration of the embed.
2026-05-20 10:03:14 +02:00
CREDO23
4fa85a9a94 perf(kb-search): offload sync embed_texts to thread
embed_texts holds a threading.Lock and runs a sync embedding call inside
search_knowledge_base, an async coroutine on the KB priority middleware
critical path. Blocking the event loop here stalls every other coroutine
on the worker (SSE keepalives, concurrent chat requests, background
tasks). Wrap in asyncio.to_thread so the embed runs on the default
executor pool while the loop keeps serving.
2026-05-20 10:02:38 +02:00
CREDO23
32f6766cb6 fix(tokens): use canonical prompt_tokens_details path for cache fields
LiteLLM normalizes every provider's cache fields onto
usage.prompt_tokens_details (cached_tokens + cache_creation_tokens).
The earlier fallback to usage.cache_read_input_tokens /
usage.cache_creation_input_tokens was wrong: Anthropic-shaped fields
only live there via a trailing setattr loop, and the canonical field
name on the wrapper is cache_creation_tokens (not _input_tokens).
2026-05-20 09:55:39 +02:00
CREDO23
6090980c5e obs(tokens): log prompt-cache read/write counts and hit ratio per LLM call 2026-05-20 09:51:44 +02:00
Anish Sarkar
a0ff86e0e8 feat: add memory document model and parsing functionality for markdown handling 2026-05-20 13:20:05 +05:30
CREDO23
0cdda14922 perf(kb subagent, desktop): cap evidence.content_excerpt to 500 chars 2026-05-20 09:43:36 +02:00
CREDO23
5edf0520c4 perf(kb subagent, cloud): cap evidence.content_excerpt to 500 chars 2026-05-20 09:43:32 +02:00
CREDO23
b554c600bb perf(research subagent): cap evidence.findings and evidence.sources to bound output 2026-05-20 09:42:57 +02:00
CREDO23
6c173dc2a7 perf(teams subagent): stop echoing raw teams/channels/messages payload into evidence.items 2026-05-20 09:42:03 +02:00
CREDO23
20f7896a99 perf(luma subagent): stop echoing raw events list into evidence.items 2026-05-20 09:41:47 +02:00
CREDO23
f4e66718be perf(discord subagent): stop echoing raw channels/messages payload into evidence.items 2026-05-20 09:41:36 +02:00
CREDO23
56d8ff89e2 perf(airtable subagent): stop echoing raw records list into evidence.items 2026-05-20 09:41:18 +02:00
CREDO23
1b2f13e25c perf(clickup subagent): stop echoing raw tasks list into evidence.items 2026-05-20 09:41:04 +02:00
CREDO23
6be1b22ef6 perf(jira subagent): stop echoing raw issues list into evidence.items 2026-05-20 09:40:48 +02:00
CREDO23
6e5dd54bbf perf(slack subagent): stop echoing raw messages list into evidence.items 2026-05-20 09:40:33 +02:00
CREDO23
d3d396a473 perf(linear subagent): stop echoing raw issues list into evidence.items 2026-05-20 09:40:18 +02:00
CREDO23
553becea28 perf(gmail subagent): stop echoing raw emails array into evidence.items 2026-05-20 09:40:00 +02:00
Anish Sarkar
fe07de3f9c chore: ran linting 2026-05-20 12:55:10 +05:30
Anish Sarkar
78a3c71bb5 feat: implement memory document fetching and saving functionality in the editor panel, and remove deprecated memory hook 2026-05-20 12:50:15 +05:30
Anish Sarkar
61234e125f refactor: remove team memory components and related settings from user and search space settings 2026-05-20 12:01:26 +05:30
Anish Sarkar
659007bc4d feat: enhance document export functionality for memory documents and update UI components 2026-05-20 11:57:31 +05:30
Anish Sarkar
43c8aaeaa7 feat: improve editor panel markdown handling and update fixed toolbar positioning 2026-05-20 11:52:41 +05:30
Varun Shukla
f2d2ac3f4a refactor(env): replace inline process.env reads with BACKEND_URL in tool-ui generators 2026-05-20 03:38:32 +05:30
Varun Shukla
fead3a64f4 refactor(env): replace inline process.env reads with BACKEND_URL in editor, chat, dashboard and settings 2026-05-20 03:34:22 +05:30
Varun Shukla
81ce9e4071
Merge branch 'dev' into fix/env-config-connector-forms 2026-05-20 03:26:12 +05:30
Anish Sarkar
73043a0756 feat: enhance memory API responses with limits and update UI components for memory limit handling 2026-05-20 03:17:05 +05:30