Collapse the invalidate + warmup pair into a single
refresh_mcp_tools_cache_for_connector(connector_id, search_space_id)
helper and scope live discovery to the one connector that changed
instead of the whole search space.
- new mcp_tool.discover_single_mcp_connector: load one connector,
refresh OAuth if needed, force live MCP discovery so its cached_tools
row is rewritten; returned wrappers are discarded since the in-process
LRU is rebuilt lazily on the next user query
- mcp_tools_cache.refresh_mcp_tools_cache_for_connector: synchronously
evicts the per-space LRU (LRU keys cannot scope finer) and schedules
the per-connector prefetch via loop.create_task
- routes (OAuth callback, MCP POST, MCP PUT) collapse their two
back-to-back calls into a single refresh call; DELETE handlers keep
using bare invalidate_mcp_tools_cache (nothing to prefetch)
No new automated tests: the new functions are I/O glue (DB + network)
where mocked unit tests would test implementation rather than behavior.
The existing 9 unit tests for the cached_tools data shape are unchanged.
Skip the ~1-3s MCP initialize + list_tools handshake on every cache miss
by reading tool definitions from the connector row we already load. Lazy
populate on first miss, self-heal on corrupt cache, zero schema migration.
Splits the OpenAI-family gate into per-param predicates so AZURE and
AZURE_OPENAI configs now receive prompt_cache_key for backend routing
affinity (Microsoft auto-caches GPT-4o+ deployments at >=1024 tokens;
the key clusters same-prefix requests on the same GPU pool and raises
hit rate on turn 2+). prompt_cache_retention stays opted out for Azure
because litellm 1.83.14's Azure transformer would drop it silently;
revisit when Azure's supported params list is updated.
Adds an optional planner LLM role wired through KnowledgePriorityMiddleware
so KB query rewriting, date extraction, and recency classification run on a
cheap model (e.g. gpt-4o-mini, Haiku, Azure nano) instead of the user's
chat LLM. Operators opt in by setting is_planner: true on exactly one
global config; without it, behavior is unchanged.
_create_document and _update_document run on the chat critical path
when the filesystem subagent writes via the user's chat turn. Both
called embed_texts synchronously inside an async coroutine, blocking
the event loop for the duration of the embed.
embed_texts holds a threading.Lock and runs a sync embedding call inside
search_knowledge_base, an async coroutine on the KB priority middleware
critical path. Blocking the event loop here stalls every other coroutine
on the worker (SSE keepalives, concurrent chat requests, background
tasks). Wrap in asyncio.to_thread so the embed runs on the default
executor pool while the loop keeps serving.
Only MCP tools have a persistence target for 'approve_always' (the
connector's trusted-tools list); for native tools the decision lives
only in the in-memory runtime ruleset. Reflect that in the wire palette
so the FE can stay a pure renderer of allowed_decisions instead of
peeking at context.mcp_connector_id to decide whether to show the
'Always Allow' button.
The backend still accepts an 'approve_always' reply for any tool kind
(in-memory promotion is harmless), it just doesn't advertise it when
there's nowhere to persist.
Renames the SurfSense HITL extension decision-type from "always" to
"approve_always" so it sits in the same verb-first family as "approve",
"reject", and "edit". The Python constant is now SURFSENSE_DECISION_APPROVE_ALWAYS;
the wire value, the permission-domain decision_type, and the FE union members
all match (no wire/internal mismatch).
Both the multi_agent_chat permission middleware and the legacy new_chat one
accept the new wire value; the FE types.ts union is updated accordingly.
The "context.always" payload key is intentionally left untouched - it's the
patterns-to-promote field, semantically distinct from the decision type.
Until now an "Always Allow" reply only updated the in-memory runtime
ruleset, evaporating after the session ended. Persist it to the
existing connector.config['trusted_tools'] list so the next session's
fetch_user_allowlist_rulesets picks it up and the user is never asked
again for the same (connector, tool) pair.
- TrustedToolSaver + make_trusted_tool_saver(user_id) in
user_tool_allowlist: opens its own session via async_session_maker
per call, logs and swallows failures (in-memory promotion is the
canonical "always" path, durable persistence is opportunistic).
- PermissionMiddleware._process is now pure: returns
(state_update, list[_AlwaysPromotion]). aafter_model awaits the
saver for each promotion; after_model discards them. Promotions are
only emitted for tools whose metadata exposes mcp_connector_id, so
native tools and KB FS ops are correctly skipped.
- main_agent factory builds the saver once per turn and stashes it in
dependencies["trusted_tool_saver"]; pack_subagent and the KB
middleware stack forward it through build_permission_mw.
- Renamed pm._process(state, None) call sites in two existing tests to
pm.after_model(state, None) so they exercise the public hook
contract instead of the now-tuple-returning private method.
The FE permission card needs mcp_connector_id, mcp_server, and
tool_description in the interrupt context to render "Always Allow"
against the right connected account. Thread the tool through the
ask pipeline:
- pack_subagent → build_permission_mw(tools=...) → PermissionMiddleware
(tools_by_name) → request_permission_decision(tool=...) →
build_permission_ask_payload(tool=...) projects card fields out of
BaseTool.
- mcp_tool.py: stdio path now stashes mcp_connector_id in metadata for
parity with the HTTP path.