Resolves: surfsense_backend/app/agents/new_chat/middleware/memory_injection.py
- Took both imports: upstream moved MEMORY_HARD_LIMIT/SOFT_LIMIT to
app.services.memory; kept our perf-logger import for timing.
Pulls in upstream changes:
- Memory document feature (services/memory refactor, removal of
app.agents.new_chat.memory_extraction and background extraction in
stream_new_chat — agent now drives memory via update_memory tool).
- BACKEND_URL env refactor across web tool-ui/editor/chat/dashboard/lib.
- GitHub Actions backend test workflow + pre-commit biome bump.
- Token-display polish in MessageInfoDropdown; save_memory no-update
sentinel.
Verified: 1723 unit tests pass, ruff clean. No semantic regression in
stream_new_chat (their memory-extraction deletion and our preflight
removal touch different functions).
Collapse the invalidate + warmup pair into a single
refresh_mcp_tools_cache_for_connector(connector_id, search_space_id)
helper and scope live discovery to the one connector that changed
instead of the whole search space.
- new mcp_tool.discover_single_mcp_connector: load one connector,
refresh OAuth if needed, force live MCP discovery so its cached_tools
row is rewritten; returned wrappers are discarded since the in-process
LRU is rebuilt lazily on the next user query
- mcp_tools_cache.refresh_mcp_tools_cache_for_connector: synchronously
evicts the per-space LRU (LRU keys cannot scope finer) and schedules
the per-connector prefetch via loop.create_task
- routes (OAuth callback, MCP POST, MCP PUT) collapse their two
back-to-back calls into a single refresh call; DELETE handlers keep
using bare invalidate_mcp_tools_cache (nothing to prefetch)
No new automated tests: the new functions are I/O glue (DB + network)
where mocked unit tests would test implementation rather than behavior.
The existing 9 unit tests for the cached_tools data shape are unchanged.
Skip the ~1-3s MCP initialize + list_tools handshake on every cache miss
by reading tool definitions from the connector row we already load. Lazy
populate on first miss, self-heal on corrupt cache, zero schema migration.
Splits the OpenAI-family gate into per-param predicates so AZURE and
AZURE_OPENAI configs now receive prompt_cache_key for backend routing
affinity (Microsoft auto-caches GPT-4o+ deployments at >=1024 tokens;
the key clusters same-prefix requests on the same GPU pool and raises
hit rate on turn 2+). prompt_cache_retention stays opted out for Azure
because litellm 1.83.14's Azure transformer would drop it silently;
revisit when Azure's supported params list is updated.
Adds an optional planner LLM role wired through KnowledgePriorityMiddleware
so KB query rewriting, date extraction, and recency classification run on a
cheap model (e.g. gpt-4o-mini, Haiku, Azure nano) instead of the user's
chat LLM. Operators opt in by setting is_planner: true on exactly one
global config; without it, behavior is unchanged.