feat: made chat fast

- Introduced a lazy knowledge base retrieval mode: the main agent fetches KB content on demand via the `search_knowledge_base` tool and skips the expensive per-turn pre-injection (planner LLM + embed + hybrid search, ~2.3s).
- Added cross-thread caching: compiled graphs are reused across a returning user's chats (same user + search space + config + visibility), so a new chat skips the ~4-5s compile.
- Updated middleware to support the lazy retrieval mode and cross-thread caching.
- Enhanced logging for performance tracking during knowledge retrieval and agent interactions.
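
A minimal sketch of the cross-thread cache key idea, not the code from this commit. The helper names `env_flag`, `agent_cache_key`, and `current_thread_id` are hypothetical, and the `configurable.thread_id` layout is assumed to follow the usual LangChain/LangGraph `RunnableConfig` convention. It shows how dropping `thread_id` from the key lets a new chat reuse an existing entry, while tools still read the thread from the live config at call time.

```python
import os
from typing import Any, Hashable, Mapping, Tuple


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean feature flag from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def agent_cache_key(
    user_id: Hashable,
    search_space_id: Hashable,
    config_hash: Hashable,
    visibility: Hashable,
    thread_id: Hashable,
    *,
    cross_thread: bool,
) -> Tuple[Hashable, ...]:
    """Build the compiled-agent cache key (assumed shape, for illustration).

    With cross_thread=True the thread is left out, so every chat that shares
    user, search space, config, and visibility maps to the same entry.
    """
    base = (user_id, search_space_id, config_hash, visibility)
    return base if cross_thread else base + (thread_id,)


def current_thread_id(config: Mapping[str, Any]) -> Any:
    """Resolve the chat thread from the per-call config, not a build-time closure."""
    return config["configurable"]["thread_id"]


if __name__ == "__main__":
    cross_thread = env_flag("SURFSENSE_ENABLE_CROSS_THREAD_AGENT_CACHE", default=True)
    key_a = agent_cache_key("u1", 7, "cfg", "private", "thread-a", cross_thread=cross_thread)
    key_b = agent_cache_key("u1", 7, "cfg", "private", "thread-b", cross_thread=cross_thread)
    print("shared entry:", key_a == key_b)
    print("thread at call time:", current_thread_id({"configurable": {"thread_id": "thread-b"}}))
```

Because a shared graph no longer knows which thread it was built for, anything thread-specific has to be looked up per call, which is why the sketch reads `thread_id` from the config passed at invocation.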
DESKTOP-RTLN3BA\$punk 2026-06-09 04:45:17 -07:00
parent ce952d2ad1
commit 41ff57101c
32 changed files with 979 additions and 169 deletions

@@ -362,6 +362,13 @@ LANGSMITH_PROJECT=surfsense
# SURFSENSE_ENABLE_SPECIALIZED_SUBAGENTS=false
# SURFSENSE_ENABLE_KB_PLANNER_RUNNABLE=false
# KB retrieval mode (default OFF = lazy). When OFF, the main agent retrieves
# KB content on demand via the `search_knowledge_base` tool and skips the
# expensive per-turn pre-injection (planner LLM + embed + hybrid search,
# ~2.3s); explicit @-mentions are still surfaced cheaply. Set to true to
# restore the original eager `<priority_documents>` pre-injection.
# SURFSENSE_ENABLE_KB_PRIORITY_PREINJECTION=false
# Snapshot / revert
# SURFSENSE_ENABLE_ACTION_LOG=false
# SURFSENSE_ENABLE_REVERT_ROUTE=false # Backend-only; flip when UI ships
@@ -382,6 +389,15 @@ LANGSMITH_PROJECT=surfsense
# rollback if you suspect cache-related staleness.
# SURFSENSE_ENABLE_AGENT_CACHE=true
# Cross-thread reuse (default ON). Drops thread_id from the cache key so a
# returning user's NEW chats (same user + search space + config + visibility)
# hit the already-compiled graph instead of paying a fresh ~4-5s compile —
# turning a cold first turn into a warm one. Safe because ActionLog,
# KB-persistence, and the deliverables tools now resolve the chat thread from
# the live RunnableConfig at call time rather than a build-time closure. Flip
# OFF to fall back to a per-thread cache key (instant rollback).
# SURFSENSE_ENABLE_CROSS_THREAD_AGENT_CACHE=true
# Cache capacity (max number of compiled-agent entries kept in memory)
# and TTL per entry (seconds). Working set is typically one entry per
# active thread on this replica; tune up for very large deployments.