feat: speed up chat with lazy KB retrieval and cross-thread agent cache

- Introduced a lazy knowledge base (KB) retrieval mode: the main agent fetches KB content on demand via the `search_knowledge_base` tool, skipping the expensive per-turn pre-injection step.
- Added cross-thread agent caching, so compiled graphs are reused across a user's different chats, reducing latency for returning users.
- Updated middleware to support lazy KB loading and cross-thread caching.
- Enhanced logging to track performance of knowledge retrieval and agent interactions.
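The cross-thread cache-key change can be pictured with a minimal Python sketch. `env_flag`, `AgentCacheInputs`, and `agent_cache_key` are hypothetical names, not SurfSense's actual code; the key fields mirror the ones listed in the `.env.example` comment (user, search space, config, visibility), the flag name is taken from it, and hashing a sorted JSON payload is an assumption made for illustration.

```python
import hashlib
import json
import os
from dataclasses import dataclass


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean env var; unset means `default` (accepted spellings are assumed)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AgentCacheInputs:
    """Hypothetical inputs that identify one compiled agent graph."""

    user_id: str
    search_space_id: int
    config_fingerprint: str  # hypothetical hash of the agent/LLM config
    visibility: str
    thread_id: str


def agent_cache_key(inputs: AgentCacheInputs, cross_thread: bool) -> str:
    """Hash the identifying fields; thread_id is included only in per-thread mode."""
    parts: dict[str, object] = {
        "user_id": inputs.user_id,
        "search_space_id": inputs.search_space_id,
        "config": inputs.config_fingerprint,
        "visibility": inputs.visibility,
    }
    if not cross_thread:
        # Rollback path: every chat thread compiles and caches its own graph.
        parts["thread_id"] = inputs.thread_id
    # sort_keys makes the serialized payload (and so the hash) order-independent.
    payload = json.dumps(parts, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = AgentCacheInputs("user-1", 42, "cfg-abc", "private", thread_id="chat-1")
    second = AgentCacheInputs("user-1", 42, "cfg-abc", "private", thread_id="chat-2")
    cross = env_flag("SURFSENSE_ENABLE_CROSS_THREAD_AGENT_CACHE", default=True)
    # With the flag at its default (true), both chats map to one cache entry.
    print(agent_cache_key(first, cross) == agent_cache_key(second, cross))
```

Dropping `thread_id` from the key is only safe if nothing thread-specific is baked into the cached graph; per the `.env.example` note, the ActionLog, KB-persistence, and deliverables tools resolve the thread from the live RunnableConfig at call time for exactly that reason.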
2026-06-14 20:55:15 +02:00 · 2026-06-09 04:45:17 -07:00 · 2026-06-09 04:45:17 -07:00 · 41ff57101c
commit 41ff57101c
parent ce952d2ad1
32 changed files with 979 additions and 169 deletions
--- a/surfsense_backend/.env.example
+++ b/surfsense_backend/.env.example
@@ -362,6 +362,13 @@ LANGSMITH_PROJECT=surfsense
 # SURFSENSE_ENABLE_SPECIALIZED_SUBAGENTS=false
 # SURFSENSE_ENABLE_KB_PLANNER_RUNNABLE=false

+# KB retrieval mode (default OFF = lazy). When OFF, the main agent retrieves
+# KB content on demand via the `search_knowledge_base` tool and skips the
+# expensive per-turn pre-injection (planner LLM + embed + hybrid search,
+# ~2.3s); explicit @-mentions are still surfaced cheaply. Set to true to
+# restore the original eager `<priority_documents>` pre-injection.
+# SURFSENSE_ENABLE_KB_PRIORITY_PREINJECTION=false
+
 # Snapshot / revert
 # SURFSENSE_ENABLE_ACTION_LOG=false
 # SURFSENSE_ENABLE_REVERT_ROUTE=false        # Backend-only; flip when UI ships
@@ -382,6 +389,15 @@ LANGSMITH_PROJECT=surfsense
 # rollback if you suspect cache-related staleness.
 # SURFSENSE_ENABLE_AGENT_CACHE=true

+# Cross-thread reuse (default ON). Drops thread_id from the cache key so a
+# returning user's NEW chats (same user + search space + config + visibility)
+# hit the already-compiled graph instead of paying a fresh ~4-5s compile —
+# turning a cold first turn into a warm one. Safe because ActionLog,
+# KB-persistence, and the deliverables tools now resolve the chat thread from
+# the live RunnableConfig at call time rather than a build-time closure. Flip
+# OFF to fall back to a per-thread cache key (instant rollback).
+# SURFSENSE_ENABLE_CROSS_THREAD_AGENT_CACHE=true
+
 # Cache capacity (max number of compiled-agent entries kept in memory)
 # and TTL per entry (seconds). Working set is typically one entry per
 # active thread on this replica; tune up for very large deployments.