refactor: remove search_surfsense_docs tool and related references

- Deleted the `search_surfsense_docs` tool and its associated files, streamlining the agent's toolset.
- Updated various components and prompts to remove references to the now-removed tool, ensuring consistency across the codebase.
- Adjusted documentation to direct users to the SurfSense documentation link for product-related queries instead.
2026-07-20 23:21:06 +02:00 · 2026-05-28 22:35:14 -07:00 · 2026-05-28 22:35:14 -07:00 · 40ca9e6ed2
commit 40ca9e6ed2
parent 9b9e6828c7
71 changed files with 232 additions and 1676 deletions
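A quick way to confirm that no file still mentions the removed tool is a plain substring scan. The sketch below is a minimal illustration, assuming a hypothetical `find_stale_references` helper that receives a mapping of file paths to their text; it is not part of the SurfSense codebase.

```python
from collections.abc import Mapping

REMOVED_TOOL = "search_surfsense_docs"


def find_stale_references(
    files: Mapping[str, str], needle: str = REMOVED_TOOL
) -> dict[str, list[int]]:
    """Map each path to the 1-based line numbers that still contain `needle`."""
    hits: dict[str, list[int]] = {}
    for path, text in files.items():
        # Plain substring match, so derived names such as the factory function are caught too.
        matches = [n for n, line in enumerate(text.splitlines(), start=1) if needle in line]
        if matches:
            hits[path] = matches
    return hits


sample = {
    "tools/index.py": '"web_search",\n"scrape_webpage",\n',
    "tools/registry.py": "from .search_surfsense_docs import create_search_surfsense_docs_tool\n",
}
print(find_stale_references(sample))  # {'tools/registry.py': [1]}
```

In a CI job the mapping could be built from files found with `pathlib.Path.rglob`; any non-empty result means a reference was missed.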
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/citations/on.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/citations/on.md
@@ -4,8 +4,8 @@ never invent ids you didn't see. Citation ids are resolved by exact-match
 lookup; a wrong id silently breaks the link, so when in doubt, omit.

 ### Channel A — chunk blocks injected this turn
-When `search_surfsense_docs` or `web_search` returns `<document>` /
-`<chunk id='…'>` blocks in this turn:
+When `web_search` returns `<document>` / `<chunk id='…'>` blocks in this
+turn:

 1. For each factual statement taken from those chunks, add
   `[citation:chunk_id]` using the **exact** id from a visible
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/dynamic_context/private.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/dynamic_context/private.md
@@ -20,8 +20,8 @@ it to resolve paths the user describes in natural language ("my Q2 roadmap",
 delegating to a specialist.

 `<document>` and `<chunk id='…'>` blocks are chunked indexed content returned
-by KB search (from `search_surfsense_docs`, or backing `<priority_documents>`).
-Each chunk carries a stable `id` attribute.
+by KB search (backing `<priority_documents>`). Each chunk carries a stable
+`id` attribute.

 If a block doesn't appear this turn, work from the conversation alone.
 </dynamic_context>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/dynamic_context/team.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/dynamic_context/team.md
@@ -20,8 +20,8 @@ week's planning notes") into concrete document references before delegating
 to a specialist.

 `<document>` and `<chunk id='…'>` blocks are chunked indexed content returned
-by KB search (from `search_surfsense_docs`, or backing `<priority_documents>`).
-Each chunk carries a stable `id` attribute.
+by KB search (backing `<priority_documents>`). Each chunk carries a stable
+`id` attribute.

 If a block doesn't appear this turn, work from the conversation alone.
 </dynamic_context>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/kb_first.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/kb_first.md
@@ -1,19 +1,21 @@
 <knowledge_base_first>
 CRITICAL — ground factual answers in what you actually receive this turn:
 - injected workspace context (see `<dynamic_context>`),
- results from your own tool calls (`search_surfsense_docs`, `web_search`,
-  `scrape_webpage`),
+- results from your own tool calls (`web_search`, `scrape_webpage`),
 - or substantive summaries returned by a `task` specialist you invoked.

 Do **not** answer factual or informational questions from general knowledge
 unless the user explicitly authorises it after you say you couldn't find
 enough in those sources. The flow when nothing is found:

-1. Say you couldn't find enough in their workspace, docs, or tool output.
+1. Say you couldn't find enough in their workspace or tool output.
 2. Ask: *"Would you like me to answer from my general knowledge instead?"*
 3. Only answer from general knowledge after a clear yes.

 This rule does NOT apply to: casual conversation · meta-questions about
 SurfSense ("what can you do?") · formatting or analysis of content already
 in chat · clear rewrite/edit instructions · lightweight web research.
+
+For "how do I use SurfSense" / product-documentation questions, point the
+user to https://www.surfsense.com/docs.
 </knowledge_base_first>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/anthropic.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/anthropic.md
@@ -5,7 +5,7 @@ Structured reasoning:
 - For non-trivial work, `<thinking>` / short `<plan>` before tool calls is fine.

 Professional objectivity:
- Accuracy over flattery; verify with **search_surfsense_docs**, **web_search**, **scrape_webpage**, or **task** when unsure — don’t invent connector access.
+- Accuracy over flattery; verify with **web_search**, **scrape_webpage**, or **task** when unsure — don’t invent connector access.

 Task management:
 - For 3+ steps, use todo tooling; update statuses promptly.
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/deepseek.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/deepseek.md
@@ -13,6 +13,6 @@ Attribution:

 Tool calls:
 - Parallelise independent calls.
- Prefer **search_surfsense_docs** for SurfSense docs/product questions before **web_search** when that fits the ask.
+- For SurfSense docs/product questions, point the user to https://www.surfsense.com/docs.
 - Don’t invent paths, chunk ids, or URLs — only values from tools or the user.
 </provider_hints>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/google.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/google.md
@@ -7,7 +7,7 @@ Output style:
 - GitHub-flavoured Markdown; monospace-friendly.

 Workflow (Understand → Plan → Act → Verify):
-1. **Understand:** parse the ask; use **search_surfsense_docs** / injected workspace context before guessing.
+1. **Understand:** parse the ask; use injected workspace context before guessing.
 2. **Plan:** for multi-step work, a short plan first.
 3. **Act:** only with tools you actually have on this agent (see `<tools>` and `<tool_routing>`). Connector work → **task**.
 4. **Verify:** re-read or re-search only when it materially reduces risk.
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/openai_classic.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/providers/openai_classic.md
@@ -15,6 +15,7 @@ Output style:

 Tool calls:
 - Parallelise independent calls in one turn.
- Prefer **search_surfsense_docs** for SurfSense-product questions, **web_search** / **scrape_webpage**
-  for fresh public facts; integrations and heavy workflows → **task**.
+- For SurfSense-product questions, point the user to https://www.surfsense.com/docs;
+  use **web_search** / **scrape_webpage** for fresh public facts; integrations and
+  heavy workflows → **task**.
 </provider_hints>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/routing.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/routing.md
@@ -3,10 +3,7 @@ You have two execution channels. Pick the one that owns the work — never
 simulate one with the other.

 ### 1. Direct tools (you call them yourself)
- `search_surfsense_docs` — SurfSense product docs (setup, configuration,
-  connector docs, feature behavior).
- `web_search` — search the public web (anything outside SurfSense docs and
-  the workspace KB).
+- `web_search` — search the public web (anything outside the workspace KB).
 - `scrape_webpage` — fetch the body of a specific public URL.
 - `update_memory` — curate persistent memory (see `<memory_protocol>`).
 - `write_todos` — maintain a structured plan when the turn series spans
@@ -14,6 +11,10 @@ simulate one with the other.
  `in_progress` **before** the `task` call that handles it, `completed`
  once the call returns. Skip for single-step requests.

+**Questions about how to use SurfSense itself** (setup, configuration,
+connectors, feature behavior) — point the user to the documentation:
+https://www.surfsense.com/docs. There is no docs-search tool; give the link.
+
 **You have NO filesystem tools.** Any read, write, edit, move, rename, or
 search inside the user's workspace goes through `task(knowledge_base, …)` —
 never via `write_file`, `ls`, or any direct file operation.
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/search_surfsense_docs/__init__.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/search_surfsense_docs/__init__.py
@@ -1 +0,0 @@
-"""``search_surfsense_docs`` — description + few-shot examples."""
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/search_surfsense_docs/description.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/search_surfsense_docs/description.md
@@ -1,10 +0,0 @@
- `search_surfsense_docs` — Search official SurfSense documentation (product
-  help).
-  - Use when the user asks how SurfSense itself works — setup, configuration,
-    connector documentation, feature behavior, anything covered in the
-    product docs.
-  - Not a substitute for `task` when the user wants actions inside a
-    connected service (Gmail, Slack, Jira, Notion, etc.).
-  - Args: `query`, `top_k` (default 10).
-  - Returns doc excerpts; chunk ids may appear for attribution — see
-    `<citations>` for the contract.
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/search_surfsense_docs/example.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/search_surfsense_docs/example.md
@@ -1,15 +0,0 @@
-<example>
-user: "How do I install SurfSense?"
-→ search_surfsense_docs(query="installation setup")
-</example>
-
-<example>
-user: "What connectors does SurfSense support?"
-→ search_surfsense_docs(query="available connectors integrations")
-</example>
-
-<example>
-user: "How do I set up the Notion connector?"
-→ search_surfsense_docs(query="Notion connector setup configuration")
-(Changing data inside Notion itself → `task(notion, …)`, not this tool.)
-</example>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/index.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/index.py
@@ -6,7 +6,6 @@ Connector integrations, MCP, deliverables, etc. are delegated via ``task`` subag
 from __future__ import annotations

 MAIN_AGENT_SURFSENSE_TOOL_NAMES_ORDERED: tuple[str, ...] = (
-    "search_surfsense_docs",
    "web_search",
    "scrape_webpage",
    "update_memory",
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/system_prompt.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/system_prompt.md
@@ -8,7 +8,6 @@ Gather and synthesize evidence using SurfSense research tools with clear citatio
 <available_tools>
 - `web_search`
 - `scrape_webpage`
- `search_surfsense_docs`
 </available_tools>

 <tool_policy>
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/tools/__init__.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/tools/__init__.py
@@ -1,11 +1,9 @@
-"""Research-stage tools: web search, scrape, and in-product doc search."""
+"""Research-stage tools: web search and scrape."""

 from .scrape_webpage import create_scrape_webpage_tool
-from .search_surfsense_docs import create_search_surfsense_docs_tool
 from .web_search import create_web_search_tool

 __all__ = [
    "create_scrape_webpage_tool",
-    "create_search_surfsense_docs_tool",
    "create_web_search_tool",
 ]
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/tools/index.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/tools/index.py
@@ -9,7 +9,6 @@ from langchain_core.tools import BaseTool
 from app.agents.new_chat.permissions import Ruleset

 from .scrape_webpage import create_scrape_webpage_tool
-from .search_surfsense_docs import create_search_surfsense_docs_tool
 from .web_search import create_web_search_tool

 NAME = "research"
@@ -27,5 +26,4 @@ def load_tools(
            available_connectors=d.get("available_connectors"),
        ),
        create_scrape_webpage_tool(firecrawl_api_key=d.get("firecrawl_api_key")),
-        create_search_surfsense_docs_tool(db_session=d["db_session"]),
    ]
--- a/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/tools/search_surfsense_docs.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/subagents/builtins/research/tools/search_surfsense_docs.py
@@ -1,145 +0,0 @@
-"""Semantic search over pre-indexed in-app documentation chunks for user how-to questions."""
-
-import asyncio
-import json
-
-from langchain_core.tools import tool
-from sqlalchemy import select
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from app.db import SurfsenseDocsChunk, SurfsenseDocsDocument
-from app.utils.document_converters import embed_text
-from app.utils.surfsense_docs import surfsense_docs_public_url
-
-
-def format_surfsense_docs_results(results: list[tuple]) -> str:
-    """Format (chunk, document) rows as XML with ``doc-`` chunk IDs for citations and UI routing."""
-    if not results:
-        return "No relevant Surfsense documentation found for your query."
-
-    # Group chunks by document
-    grouped: dict[int, dict] = {}
-    for chunk, doc in results:
-        public_url = surfsense_docs_public_url(doc.source)
-        if doc.id not in grouped:
-            grouped[doc.id] = {
-                "document_id": f"doc-{doc.id}",
-                "document_type": "SURFSENSE_DOCS",
-                "title": doc.title,
-                "url": public_url,
-                "metadata": {"source": doc.source, "public_url": public_url},
-                "chunks": [],
-            }
-        grouped[doc.id]["chunks"].append(
-            {
-                "chunk_id": f"doc-{chunk.id}",
-                "content": chunk.content,
-            }
-        )
-
-    # Render XML matching format_documents_for_context structure
-    parts: list[str] = []
-    for g in grouped.values():
-        metadata_json = json.dumps(g["metadata"], ensure_ascii=False)
-
-        parts.append("<document>")
-        parts.append("<document_metadata>")
-        parts.append(f"  <document_id>{g['document_id']}</document_id>")
-        parts.append(f"  <document_type>{g['document_type']}</document_type>")
-        parts.append(f"  <title><![CDATA[{g['title']}]]></title>")
-        parts.append(f"  <url><![CDATA[{g['url']}]]></url>")
-        parts.append(f"  <metadata_json><![CDATA[{metadata_json}]]></metadata_json>")
-        parts.append("</document_metadata>")
-        parts.append("")
-        parts.append("<document_content>")
-
-        for ch in g["chunks"]:
-            parts.append(
-                f"  <chunk id='{ch['chunk_id']}'><![CDATA[{ch['content']}]]></chunk>"
-            )
-
-        parts.append("</document_content>")
-        parts.append("</document>")
-        parts.append("")
-
-    return "\n".join(parts).strip()
-
-
-async def search_surfsense_docs_async(
-    query: str,
-    db_session: AsyncSession,
-    top_k: int = 10,
-) -> str:
-    """
-    Search Surfsense documentation using vector similarity.
-
-    Args:
-        query: The search query about Surfsense usage
-        db_session: Database session for executing queries
-        top_k: Number of results to return
-
-    Returns:
-        Formatted string with relevant documentation content
-    """
-    # Get embedding for the query
-    query_embedding = await asyncio.to_thread(embed_text, query)
-
-    # Vector similarity search on chunks, joining with documents
-    stmt = (
-        select(SurfsenseDocsChunk, SurfsenseDocsDocument)
-        .join(
-            SurfsenseDocsDocument,
-            SurfsenseDocsChunk.document_id == SurfsenseDocsDocument.id,
-        )
-        .order_by(SurfsenseDocsChunk.embedding.op("<=>")(query_embedding))
-        .limit(top_k)
-    )
-
-    result = await db_session.execute(stmt)
-    rows = result.all()
-
-    return format_surfsense_docs_results(rows)
-
-
-def create_search_surfsense_docs_tool(db_session: AsyncSession):
-    """
-    Factory function to create the search_surfsense_docs tool.
-
-    Args:
-        db_session: Database session for executing queries
-
-    Returns:
-        A configured tool function for searching Surfsense documentation
-    """
-
-    @tool
-    async def search_surfsense_docs(query: str, top_k: int = 10) -> str:
-        """
-        Search Surfsense documentation for help with using the application.
-
-        Use this tool when the user asks questions about:
-        - How to use Surfsense features
-        - Installation and setup instructions
-        - Configuration options and settings
-        - Troubleshooting common issues
-        - Available connectors and integrations
-        - Browser extension usage
-        - API documentation
-
-        This searches the official Surfsense documentation that was indexed
-        at deployment time. It does NOT search the user's personal knowledge base.
-
-        Args:
-            query: The search query about Surfsense usage or features
-            top_k: Number of documentation chunks to retrieve (default: 10)
-
-        Returns:
-            Relevant documentation content formatted with chunk IDs for citations
-        """
-        return await search_surfsense_docs_async(
-            query=query,
-            db_session=db_session,
-            top_k=top_k,
-        )
-
-    return search_surfsense_docs
--- a/surfsense_backend/app/agents/new_chat/feature_flags.py
+++ b/surfsense_backend/app/agents/new_chat/feature_flags.py
@@ -104,7 +104,7 @@ class AgentFeatureFlags:
    # ``tools/google_drive``, ``tools/dropbox``, ``tools/onedrive``,
    # ``tools/google_calendar``, ``tools/confluence``, ``tools/discord``,
    # ``tools/teams``, ``tools/luma``, ``connected_accounts``,
-    # ``update_memory``, ``search_surfsense_docs``) now acquire fresh
+    # ``update_memory``) now acquire fresh
    # short-lived ``AsyncSession`` instances per call via
    # :data:`async_session_maker`. The factory still accepts ``db_session``
    # for registry compatibility but ``del``'s it immediately — see any
--- a/surfsense_backend/app/agents/new_chat/mention_resolver.py
+++ b/surfsense_backend/app/agents/new_chat/mention_resolver.py
@@ -73,9 +73,8 @@ class ResolvedMentionSet:
    ``@Project Roadmap`` is never shadowed by a shorter prefix
    ``@Project``).

-    ``mentioned_document_ids`` collapses doc + surfsense_doc chips into
-    a single ordered, deduped list because the priority middleware
-    treats them uniformly downstream — see
+    ``mentioned_document_ids`` is an ordered, deduped list consumed by
+    the priority middleware downstream — see
    ``KnowledgePriorityMiddleware._compute_priority_paths``.
    """

@@ -103,7 +102,6 @@ async def resolve_mentions(
    search_space_id: int,
    mentioned_documents: list[MentionedDocumentInfo] | None,
    mentioned_document_ids: list[int] | None = None,
-    mentioned_surfsense_doc_ids: list[int] | None = None,
    mentioned_folder_ids: list[int] | None = None,
 ) -> ResolvedMentionSet:
    """Resolve every @-mention chip on a turn into virtual paths.
@@ -111,8 +109,7 @@ async def resolve_mentions(
    The function takes both the ``mentioned_documents`` discriminated
    list (chip metadata used for substitution + persistence) and the
    parallel id arrays (``mentioned_document_ids``,
-    ``mentioned_surfsense_doc_ids``, ``mentioned_folder_ids``) for two
-    reasons:
+    ``mentioned_folder_ids``) for two reasons:

    * Legacy clients that haven't migrated to the unified chip list
      still send the id arrays — we treat the union as authoritative.
@@ -142,7 +139,6 @@ async def resolve_mentions(
        dict.fromkeys(
            [
                *(mentioned_document_ids or []),
-                *(mentioned_surfsense_doc_ids or []),
                *chip_doc_ids,
            ]
        )
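The `dict.fromkeys` call in the last hunk merges the legacy id array with the chip ids while keeping first-seen order. A minimal sketch of that idiom, using a hypothetical `merge_mentioned_ids` helper rather than the real `resolve_mentions` code:

```python
from __future__ import annotations


def merge_mentioned_ids(*id_lists: list[int] | None) -> list[int]:
    """Union several id lists, keeping first-seen order and dropping duplicates."""
    # dict preserves insertion order (Python 3.7+), so dict.fromkeys is an
    # order-preserving de-duplication; None inputs are treated as empty.
    return list(dict.fromkeys(i for ids in id_lists for i in (ids or [])))


print(merge_mentioned_ids([3, 1], None, [1, 7, 3]))  # [3, 1, 7]
```

A `set` would also remove duplicates but would lose the order in which the user mentioned documents, which the priority middleware receives as an ordered list.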
--- a/surfsense_backend/app/agents/new_chat/prompts/base/citations_on.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/base/citations_on.md
@@ -59,14 +59,13 @@ Do NOT cite document_id. Always use the chunk id.
 - NEVER create your own citation format - use the exact chunk_id values from the documents in the [citation:chunk_id] format
 - NEVER format citations as clickable links or as markdown links like "([citation:5](https://example.com))". Always use plain square brackets only
 - NEVER make up chunk IDs if you are unsure about the chunk_id. It is better to omit the citation than to guess
- Copy the EXACT chunk id from the XML - if it says `<chunk id='doc-123'>`, use [citation:doc-123]
+- Copy the EXACT chunk id from the XML - if it says `<chunk id='5'>`, use [citation:5]
 - If the chunk id is a URL like `<chunk id='https://example.com/page'>`, use [citation:https://example.com/page]
 </citation_format>

 <citation_examples>
 CORRECT citation formats:
 - [citation:5] (numeric chunk ID from knowledge base)
- [citation:doc-123] (for Surfsense documentation chunks)
 - [citation:https://example.com/article] (URL chunk ID from web search results)
 - [citation:chunk_id1], [citation:chunk_id2], [citation:chunk_id3] (multiple citations)

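The citation rules above reduce to exact string matching between cited ids and the chunk ids visible in the turn. A minimal validator sketch, assuming a hypothetical `check_citations` helper and that ids never contain `]`; it is not SurfSense's actual resolver:

```python
import re

# Matches [citation:<id>]; the id runs up to the first "]" (assumption: ids never contain "]").
CITATION_RE = re.compile(r"\[citation:([^\]]+)\]")


def check_citations(answer: str, visible_chunk_ids: set[str]) -> tuple[list[str], list[str]]:
    """Split cited ids into (resolved, broken) using exact string matching."""
    resolved: list[str] = []
    broken: list[str] = []
    for cid in CITATION_RE.findall(answer):
        # Exact match only: no trimming, prefix stripping, or fuzzy guessing.
        if cid in visible_chunk_ids:
            resolved.append(cid)
        else:
            broken.append(cid)
    return resolved, broken


answer = (
    "Q2 goals are set [citation:5]. See also [citation:https://example.com/article] "
    "and [citation:doc-123]."
)
print(check_citations(answer, {"5", "https://example.com/article"}))
# (['5', 'https://example.com/article'], ['doc-123'])
```

A leftover `doc-` id, the prefix used by the removed documentation tool, shows up as broken because no visible chunk carries it.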
--- a/surfsense_backend/app/agents/new_chat/prompts/base/kb_only_policy_private.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/base/kb_only_policy_private.md
@@ -7,7 +7,7 @@ CRITICAL RULE — KNOWLEDGE BASE FIRST, NEVER DEFAULT TO GENERAL KNOWLEDGE:
  2. Ask the user: "Would you like me to answer from my general knowledge instead?"
  3. ONLY provide a general-knowledge answer AFTER the user explicitly says yes.
 - This policy does NOT apply to:
-  * Casual conversation, greetings, or meta-questions about SurfSense itself (e.g., "what can you do?")
+  * Casual conversation, greetings, or meta-questions about SurfSense itself (e.g., "what can you do?"). For "how do I use SurfSense" / product-documentation questions, point the user to https://www.surfsense.com/docs.
  * Formatting, summarization, or analysis of content already present in the conversation
  * Following user instructions that are clearly task-oriented (e.g., "rewrite this in bullet points")
  * Tool-usage actions like generating reports, podcasts, images, or scraping webpages
--- a/surfsense_backend/app/agents/new_chat/prompts/base/kb_only_policy_team.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/base/kb_only_policy_team.md
@@ -7,7 +7,7 @@ CRITICAL RULE — KNOWLEDGE BASE FIRST, NEVER DEFAULT TO GENERAL KNOWLEDGE:
  2. Ask: "Would you like me to answer from my general knowledge instead?"
  3. ONLY provide a general-knowledge answer AFTER a team member explicitly says yes.
 - This policy does NOT apply to:
-  * Casual conversation, greetings, or meta-questions about SurfSense itself (e.g., "what can you do?")
+  * Casual conversation, greetings, or meta-questions about SurfSense itself (e.g., "what can you do?"). For "how do I use SurfSense" / product-documentation questions, point the user to https://www.surfsense.com/docs.
  * Formatting, summarization, or analysis of content already present in the conversation
  * Following user instructions that are clearly task-oriented (e.g., "rewrite this in bullet points")
  * Tool-usage actions like generating reports, podcasts, images, or scraping webpages
--- a/surfsense_backend/app/agents/new_chat/prompts/base/tool_routing_private.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/base/tool_routing_private.md
@@ -13,6 +13,7 @@ When to use which tool:
 - Knowledge base content (Notion, GitHub, files, notes) → automatically searched
 - Real-time public web data → call web_search
 - Reading a specific webpage → call scrape_webpage
+- SurfSense product / how-to questions (setup, configuration, connectors, feature behavior) → point the user to the documentation: https://www.surfsense.com/docs

 **`task` subagents (when to delegate):**
 - **`linear_specialist`** — Linear-only investigations and tool use.
--- a/surfsense_backend/app/agents/new_chat/prompts/base/tool_routing_team.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/base/tool_routing_team.md
@@ -13,6 +13,7 @@ When to use which tool:
 - Knowledge base content (Notion, GitHub, files, notes) → automatically searched
 - Real-time public web data → call web_search
 - Reading a specific webpage → call scrape_webpage
+- SurfSense product / how-to questions (setup, configuration, connectors, feature behavior) → point the user to the documentation: https://www.surfsense.com/docs

 **`task` subagents (when to delegate):**
 - **`linear_specialist`** — Linear-only investigations and tool use.
--- a/surfsense_backend/app/agents/new_chat/prompts/composer.py
+++ b/surfsense_backend/app/agents/new_chat/prompts/composer.py
@@ -151,7 +151,6 @@ def _read_fragment(subpath: str) -> str:
 # Ordered for reading flow: fundamentals first, then artifact generators,
 # then memory at the end (mirrors the legacy ``_ALL_TOOL_NAMES_ORDERED``).
 ALL_TOOL_NAMES_ORDERED: tuple[str, ...] = (
-    "search_surfsense_docs",
    "web_search",
    "generate_podcast",
    "generate_video_presentation",
--- a/surfsense_backend/app/agents/new_chat/prompts/examples/search_surfsense_docs.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/examples/search_surfsense_docs.md
@@ -1,9 +0,0 @@
-
- User: "How do I install SurfSense?"
-  - Call: `search_surfsense_docs(query="installation setup")`
- User: "What connectors does SurfSense support?"
-  - Call: `search_surfsense_docs(query="available connectors integrations")`
- User: "How do I set up the Notion connector?"
-  - Call: `search_surfsense_docs(query="Notion connector setup configuration")`
- User: "How do I use Docker to run SurfSense?"
-  - Call: `search_surfsense_docs(query="Docker installation setup")`
--- a/surfsense_backend/app/agents/new_chat/prompts/tools/search_surfsense_docs.md
+++ b/surfsense_backend/app/agents/new_chat/prompts/tools/search_surfsense_docs.md
@@ -1,7 +0,0 @@
-
- search_surfsense_docs: Search the official SurfSense documentation.
-  - Use this tool when the user asks anything about SurfSense itself (the application they are using).
-  - Args:
-    - query: The search query about SurfSense
-    - top_k: Number of documentation chunks to retrieve (default: 10)
-  - Returns: Documentation content with chunk IDs for citations (prefixed with 'doc-', e.g., [citation:doc-123])
--- a/surfsense_backend/app/agents/new_chat/skills/builtin/email-drafting/SKILL.md
+++ b/surfsense_backend/app/agents/new_chat/skills/builtin/email-drafting/SKILL.md
@@ -1,7 +1,6 @@
 ---
 name: email-drafting
 description: Draft an email matching the user's voice, with structured intent and CTA
-allowed-tools: search_surfsense_docs
 ---

 # Email drafting
--- a/surfsense_backend/app/agents/new_chat/skills/builtin/kb-research/SKILL.md
+++ b/surfsense_backend/app/agents/new_chat/skills/builtin/kb-research/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: kb-research
 description: Structured approach to finding and synthesizing information from the user's knowledge base
-allowed-tools: search_surfsense_docs, scrape_webpage, read_file, ls_tree, grep, web_search
+allowed-tools: scrape_webpage, read_file, ls_tree, grep, web_search
 ---

 # Knowledge-base research
--- a/surfsense_backend/app/agents/new_chat/skills/builtin/meeting-prep/SKILL.md
+++ b/surfsense_backend/app/agents/new_chat/skills/builtin/meeting-prep/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: meeting-prep
 description: Pull together briefing materials before a scheduled meeting
-allowed-tools: search_surfsense_docs, web_search, scrape_webpage, read_file
+allowed-tools: web_search, scrape_webpage, read_file
 ---

 # Meeting preparation
--- a/surfsense_backend/app/agents/new_chat/skills/builtin/report-writing/SKILL.md
+++ b/surfsense_backend/app/agents/new_chat/skills/builtin/report-writing/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: report-writing
 description: How to scope, draft, and revise a Markdown report artifact via generate_report
-allowed-tools: generate_report, search_surfsense_docs, read_file
+allowed-tools: generate_report, read_file
 ---

 # Report writing
--- a/surfsense_backend/app/agents/new_chat/skills/builtin/slack-summary/SKILL.md
+++ b/surfsense_backend/app/agents/new_chat/skills/builtin/slack-summary/SKILL.md
@@ -1,7 +1,6 @@
 ---
 name: slack-summary
 description: Distill a Slack channel or thread into actionable summary
-allowed-tools: search_surfsense_docs
 ---

 # Slack summarization
--- a/surfsense_backend/app/agents/new_chat/subagents/config.py
+++ b/surfsense_backend/app/agents/new_chat/subagents/config.py
@@ -46,7 +46,6 @@ logger = logging.getLogger(__name__)
 # ``glob``, ``grep``) plus the SurfSense-side read tools.
 EXPLORE_READ_TOOLS: frozenset[str] = frozenset(
    {
-        "search_surfsense_docs",
        "web_search",
        "scrape_webpage",
        "read_file",
@@ -61,7 +60,6 @@ EXPLORE_READ_TOOLS: frozenset[str] = frozenset(
 # is needed, the parent should hand off to ``explore`` first.
 REPORT_WRITER_TOOLS: frozenset[str] = frozenset(
    {
-        "search_surfsense_docs",
        "read_file",
        "generate_report",
    }
@@ -222,7 +220,6 @@ EXPLORE_SYSTEM_PROMPT = """You are the **explore** subagent for SurfSense.
 Conduct read-only research across the user's knowledge base, the web, and any documents the parent agent has surfaced. Return a synthesized answer with explicit citations — never speculate beyond the sources you have actually inspected.

 ## Tools available
- `search_surfsense_docs` — fast hybrid search over the user's knowledge base.
 - `web_search` — only when the user's KB clearly does not contain the answer.
 - `scrape_webpage` — to read a URL the user or the search results provided.
 - `read_file`, `ls`, `glob`, `grep` — to inspect specific documents or trees the parent has flagged.
@@ -242,7 +239,7 @@ Produce a single high-quality report deliverable using `generate_report`. The pa

 ## Workflow
 1. **Outline first.** Before calling `generate_report`, write a one-paragraph outline of the sections you plan to produce. Confirm the outline reflects the parent's instructions.
-2. **Source resolution.** Decide whether to call `search_surfsense_docs` and `read_file` for any final-checks, or whether the parent's earlier tool calls already cover the source set.
+2. **Source resolution.** Decide whether to call `read_file` for any final-checks, or whether the parent's earlier tool calls already cover the source set.
 3. **One report.** Call `generate_report` exactly once with `source_strategy` chosen per the topic and chat history (see the `report-writing` skill).
 4. **Confirm.** End with a one-sentence summary in your final message — never paste the report back into chat; the artifact card renders itself.
 """
--- a/surfsense_backend/app/agents/new_chat/tools/__init__.py
+++ b/surfsense_backend/app/agents/new_chat/tools/__init__.py
@@ -5,7 +5,6 @@ This module contains all the tools available to the SurfSense agent.
 To add a new tool, see the documentation in registry.py.

 Available tools:
- search_surfsense_docs: Search Surfsense documentation for usage help
 - generate_podcast: Generate audio podcasts from content
 - generate_video_presentation: Generate video presentations with slides and narration
 - generate_image: Generate images from text descriptions using AI models
@@ -31,7 +30,6 @@ from .registry import (
    get_tool_by_name,
 )
 from .scrape_webpage import create_scrape_webpage_tool
-from .search_surfsense_docs import create_search_surfsense_docs_tool
 from .update_memory import create_update_memory_tool, create_update_team_memory_tool
 from .video_presentation import create_generate_video_presentation_tool

@ -47,7 +45,6 @@ __all__ = [
    "create_generate_podcast_tool",
    "create_generate_video_presentation_tool",
    "create_scrape_webpage_tool",
-    "create_search_surfsense_docs_tool",
    "create_update_memory_tool",
    "create_update_team_memory_tool",
    "format_documents_for_context",
--- a/surfsense_backend/app/agents/new_chat/tools/registry.py
+++ b/surfsense_backend/app/agents/new_chat/tools/registry.py
@ -101,7 +101,6 @@ from .podcast import create_generate_podcast_tool
 from .report import create_generate_report_tool
 from .resume import create_generate_resume_tool
 from .scrape_webpage import create_scrape_webpage_tool
-from .search_surfsense_docs import create_search_surfsense_docs_tool
 from .teams import (
    create_list_teams_channels_tool,
    create_read_teams_messages_tool,
@ -258,15 +257,6 @@ BUILTIN_TOOLS: list[ToolDefinition] = [
        ),
        requires=[],
    ),
-    # Surfsense documentation search tool
-    ToolDefinition(
-        name="search_surfsense_docs",
-        description="Search Surfsense documentation for help with using the application",
-        factory=lambda deps: create_search_surfsense_docs_tool(
-            db_session=deps["db_session"],
-        ),
-        requires=["db_session"],
-    ),
    # =========================================================================
    # SERVICE ACCOUNT DISCOVERY
    # Generic tool for the LLM to discover connected accounts and resolve
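Removing the `ToolDefinition` entry from `BUILTIN_TOOLS` is enough to retire the tool. Once the entry is gone, no `factory` for it can run, so the name can no longer be resolved for any agent. The sketch below is minimal and hypothetical. It assumes the registry calls each `factory` with a `deps` dict and skips entries whose `requires` keys are missing. The `ToolDefinition` fields mirror the ones visible in the hunk above, but `build_tools`, `echo`, and `needs_db` are illustrative stand-ins, not code from `registry.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical, simplified stand-in for registry.ToolDefinition.
@dataclass
class ToolDefinition:
    name: str
    description: str
    factory: Callable[[dict[str, Any]], Any]
    requires: list[str] = field(default_factory=list)


def build_tools(definitions: list[ToolDefinition], deps: dict[str, Any]) -> dict[str, Any]:
    """Instantiate every tool whose required dependencies are present in deps."""
    tools: dict[str, Any] = {}
    for definition in definitions:
        # Skip tools whose dependencies were not supplied for this agent.
        if all(key in deps for key in definition.requires):
            tools[definition.name] = definition.factory(deps)
    return tools


BUILTIN_TOOLS = [
    ToolDefinition(
        name="echo",
        description="Hypothetical tool with no dependencies",
        factory=lambda deps: "echo-tool",
        requires=[],
    ),
    ToolDefinition(
        name="needs_db",
        description="Hypothetical tool that needs a database session",
        factory=lambda deps: ("needs_db-tool", deps["db_session"]),
        requires=["db_session"],
    ),
    # No search_surfsense_docs entry: its factory can never be called.
]

print(sorted(build_tools(BUILTIN_TOOLS, {})))  # ['echo']
print(sorted(build_tools(BUILTIN_TOOLS, {"db_session": object()})))  # ['echo', 'needs_db']
```

Because the lookup is driven entirely by the list, the matching import and `__all__` entries in `__init__`/`registry.py` become dead code once the definition is removed. That is why this commit deletes them together.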
--- a/surfsense_backend/app/agents/new_chat/tools/search_surfsense_docs.py
+++ b/surfsense_backend/app/agents/new_chat/tools/search_surfsense_docs.py
@ -1,174 +0,0 @@
-"""
-Surfsense documentation search tool.
-
-This tool allows the agent to search the pre-indexed Surfsense documentation
-to help users with questions about how to use the application.
-
-The documentation is indexed at deployment time from MDX files and stored
-in dedicated tables (surfsense_docs_documents, surfsense_docs_chunks).
-"""
-
-import asyncio
-import json
-
-from langchain_core.tools import tool
-from sqlalchemy import select
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from app.db import SurfsenseDocsChunk, SurfsenseDocsDocument, async_session_maker
-from app.utils.document_converters import embed_text
-from app.utils.surfsense_docs import surfsense_docs_public_url
-
-
-def format_surfsense_docs_results(results: list[tuple]) -> str:
-    """
-    Format search results into XML structure for the LLM context.
-
-    Uses the same XML structure as format_documents_for_context from knowledge_base.py
-    but with 'doc-' prefix on chunk IDs. This allows:
-    - LLM to use consistent [citation:doc-XXX] format
-    - Frontend to detect 'doc-' prefix and route to surfsense docs endpoint
-
-    Args:
-        results: List of (chunk, document) tuples from the database query
-
-    Returns:
-        Formatted XML string with documentation content and citation-ready chunks
-    """
-    if not results:
-        return "No relevant Surfsense documentation found for your query."
-
-    # Group chunks by document
-    grouped: dict[int, dict] = {}
-    for chunk, doc in results:
-        public_url = surfsense_docs_public_url(doc.source)
-        if doc.id not in grouped:
-            grouped[doc.id] = {
-                "document_id": f"doc-{doc.id}",
-                "document_type": "SURFSENSE_DOCS",
-                "title": doc.title,
-                "url": public_url,
-                "metadata": {"source": doc.source, "public_url": public_url},
-                "chunks": [],
-            }
-        grouped[doc.id]["chunks"].append(
-            {
-                "chunk_id": f"doc-{chunk.id}",
-                "content": chunk.content,
-            }
-        )
-
-    # Render XML matching format_documents_for_context structure
-    parts: list[str] = []
-    for g in grouped.values():
-        metadata_json = json.dumps(g["metadata"], ensure_ascii=False)
-
-        parts.append("<document>")
-        parts.append("<document_metadata>")
-        parts.append(f"  <document_id>{g['document_id']}</document_id>")
-        parts.append(f"  <document_type>{g['document_type']}</document_type>")
-        parts.append(f"  <title><![CDATA[{g['title']}]]></title>")
-        parts.append(f"  <url><![CDATA[{g['url']}]]></url>")
-        parts.append(f"  <metadata_json><![CDATA[{metadata_json}]]></metadata_json>")
-        parts.append("</document_metadata>")
-        parts.append("")
-        parts.append("<document_content>")
-
-        for ch in g["chunks"]:
-            parts.append(
-                f"  <chunk id='{ch['chunk_id']}'><![CDATA[{ch['content']}]]></chunk>"
-            )
-
-        parts.append("</document_content>")
-        parts.append("</document>")
-        parts.append("")
-
-    return "\n".join(parts).strip()
-
-
-async def search_surfsense_docs_async(
-    query: str,
-    db_session: AsyncSession,
-    top_k: int = 10,
-) -> str:
-    """
-    Search Surfsense documentation using vector similarity.
-
-    Args:
-        query: The search query about Surfsense usage
-        db_session: Database session for executing queries
-        top_k: Number of results to return
-
-    Returns:
-        Formatted string with relevant documentation content
-    """
-    # Get embedding for the query
-    query_embedding = await asyncio.to_thread(embed_text, query)
-
-    # Vector similarity search on chunks, joining with documents
-    stmt = (
-        select(SurfsenseDocsChunk, SurfsenseDocsDocument)
-        .join(
-            SurfsenseDocsDocument,
-            SurfsenseDocsChunk.document_id == SurfsenseDocsDocument.id,
-        )
-        .order_by(SurfsenseDocsChunk.embedding.op("<=>")(query_embedding))
-        .limit(top_k)
-    )
-
-    result = await db_session.execute(stmt)
-    rows = result.all()
-
-    return format_surfsense_docs_results(rows)
-
-
-def create_search_surfsense_docs_tool(db_session: AsyncSession):
-    """
-    Factory function to create the search_surfsense_docs tool.
-
-    The tool acquires its own short-lived ``AsyncSession`` per call via
-    :data:`async_session_maker` so the closure is safe to share across
-    HTTP requests by the compiled-agent cache. Capturing a per-request
-    session here would surface stale/closed sessions on cache hits.
-
-    Args:
-        db_session: Reserved for registry compatibility. Per-call sessions
-            are opened via :data:`async_session_maker` inside the tool body.
-
-    Returns:
-        A configured tool function for searching Surfsense documentation
-    """
-    del db_session  # per-call session — see docstring
-
-    @tool
-    async def search_surfsense_docs(query: str, top_k: int = 10) -> str:
-        """
-        Search Surfsense documentation for help with using the application.
-
-        Use this tool when the user asks questions about:
-        - How to use Surfsense features
-        - Installation and setup instructions
-        - Configuration options and settings
-        - Troubleshooting common issues
-        - Available connectors and integrations
-        - Browser extension usage
-        - API documentation
-
-        This searches the official Surfsense documentation that was indexed
-        at deployment time. It does NOT search the user's personal knowledge base.
-
-        Args:
-            query: The search query about Surfsense usage or features
-            top_k: Number of documentation chunks to retrieve (default: 10)
-
-        Returns:
-            Relevant documentation content formatted with chunk IDs for citations
-        """
-        async with async_session_maker() as db_session:
-            return await search_surfsense_docs_async(
-                query=query,
-                db_session=db_session,
-                top_k=top_k,
-            )
-
-    return search_surfsense_docs
@ -1 +0,0 @@
-"""``search_surfsense_docs`` — description + few-shot examples."""
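Reviewers may want to confirm what the deleted `search_surfsense_docs_async` actually computed. It ordered chunks by pgvector's `<=>` operator (cosine distance, i.e. `1 - cosine similarity`) against the query embedding, ascending, and kept the first `top_k`. The sketch below reproduces that ranking in pure Python. It uses toy 2-D vectors in place of real embeddings. `cosine_distance` and `top_k_chunks` are hypothetical helpers written for this illustration, not part of the codebase.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k_chunks(
    query_embedding: Sequence[float],
    chunks: list[tuple[str, Sequence[float]]],
    top_k: int = 10,
) -> list[str]:
    """Return the ids of the top_k chunks closest to the query, nearest first."""
    # Ascending distance mirrors ORDER BY embedding <=> :query LIMIT :top_k.
    ranked = sorted(chunks, key=lambda item: cosine_distance(item[1], query_embedding))
    return [chunk_id for chunk_id, _ in ranked[:top_k]]


chunks = [
    ("doc-1", [1.0, 0.0]),
    ("doc-2", [0.0, 1.0]),
    ("doc-3", [0.7, 0.7]),
]
print(top_k_chunks([1.0, 0.1], chunks, top_k=2))  # ['doc-1', 'doc-3']
```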