mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-29 19:35:20 +02:00
refactor: remove search_surfsense_docs tool and related references
- Deleted the `search_surfsense_docs` tool and its associated files, streamlining the agent's toolset.
- Updated various components and prompts to remove references to the now-removed tool, ensuring consistency across the codebase.
- Adjusted documentation to direct users to the SurfSense documentation link for product-related queries instead.
parent 9b9e6828c7
commit 40ca9e6ed2
71 changed files with 232 additions and 1676 deletions
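A removal like this is only consistent if no prompt, loader, or import still names the deleted tool. The following is a minimal sketch of such a check, not part of the commit: the helper name `find_references` and the default set of file suffixes are assumptions made for illustration.

```python
from pathlib import Path


def find_references(
    root: str,
    needle: str,
    suffixes: tuple[str, ...] = (".py", ".md", ".txt"),
) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line under root containing needle.

    Hypothetical helper: a plain substring scan, not a syntax-aware search.
    """
    hits: list[tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the assumed suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-UTF-8 files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Report any lingering mention of the removed tool in the current tree.
    for path, number, line in find_references(".", "search_surfsense_docs"):
        print(f"{path}:{number}: {line}")
```

Run from the repository root, an empty result would suggest the references listed in this commit were the last ones; any hit points at a prompt or module that still expects the tool.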
@@ -4,8 +4,8 @@ never invent ids you didn't see. Citation ids are resolved by exact-match
 lookup; a wrong id silently breaks the link, so when in doubt, omit.
 
 ### Channel A — chunk blocks injected this turn
-When `search_surfsense_docs` or `web_search` returns `<document>` /
-`<chunk id='…'>` blocks in this turn:
+When `web_search` returns `<document>` / `<chunk id='…'>` blocks in this
+turn:
 
 1. For each factual statement taken from those chunks, add
 `[citation:chunk_id]` using the **exact** id from a visible
@@ -20,8 +20,8 @@ it to resolve paths the user describes in natural language ("my Q2 roadmap",
 delegating to a specialist.
 
 `<document>` and `<chunk id='…'>` blocks are chunked indexed content returned
-by KB search (from `search_surfsense_docs`, or backing `<priority_documents>`).
-Each chunk carries a stable `id` attribute.
+by KB search (backing `<priority_documents>`). Each chunk carries a stable
+`id` attribute.
 
 If a block doesn't appear this turn, work from the conversation alone.
 </dynamic_context>
@@ -20,8 +20,8 @@ week's planning notes") into concrete document references before delegating
 to a specialist.
 
 `<document>` and `<chunk id='…'>` blocks are chunked indexed content returned
-by KB search (from `search_surfsense_docs`, or backing `<priority_documents>`).
-Each chunk carries a stable `id` attribute.
+by KB search (backing `<priority_documents>`). Each chunk carries a stable
+`id` attribute.
 
 If a block doesn't appear this turn, work from the conversation alone.
 </dynamic_context>
@@ -1,19 +1,21 @@
 <knowledge_base_first>
 CRITICAL — ground factual answers in what you actually receive this turn:
 - injected workspace context (see `<dynamic_context>`),
-- results from your own tool calls (`search_surfsense_docs`, `web_search`,
-`scrape_webpage`),
+- results from your own tool calls (`web_search`, `scrape_webpage`),
 - or substantive summaries returned by a `task` specialist you invoked.
 
 Do **not** answer factual or informational questions from general knowledge
 unless the user explicitly authorises it after you say you couldn't find
 enough in those sources. The flow when nothing is found:
 
-1. Say you couldn't find enough in their workspace, docs, or tool output.
+1. Say you couldn't find enough in their workspace or tool output.
 2. Ask: *"Would you like me to answer from my general knowledge instead?"*
 3. Only answer from general knowledge after a clear yes.
 
 This rule does NOT apply to: casual conversation · meta-questions about
 SurfSense ("what can you do?") · formatting or analysis of content already
 in chat · clear rewrite/edit instructions · lightweight web research.
+
+For "how do I use SurfSense" / product-documentation questions, point the
+user to https://www.surfsense.com/docs.
 </knowledge_base_first>
@@ -5,7 +5,7 @@ Structured reasoning:
 - For non-trivial work, `<thinking>` / short `<plan>` before tool calls is fine.
 
 Professional objectivity:
-- Accuracy over flattery; verify with **search_surfsense_docs**, **web_search**, **scrape_webpage**, or **task** when unsure — don’t invent connector access.
+- Accuracy over flattery; verify with **web_search**, **scrape_webpage**, or **task** when unsure — don’t invent connector access.
 
 Task management:
 - For 3+ steps, use todo tooling; update statuses promptly.
@@ -13,6 +13,6 @@ Attribution:
 
 Tool calls:
 - Parallelise independent calls.
-- Prefer **search_surfsense_docs** for SurfSense docs/product questions before **web_search** when that fits the ask.
+- For SurfSense docs/product questions, point the user to https://www.surfsense.com/docs.
 - Don’t invent paths, chunk ids, or URLs — only values from tools or the user.
 </provider_hints>
@@ -7,7 +7,7 @@ Output style:
 - GitHub-flavoured Markdown; monospace-friendly.
 
 Workflow (Understand → Plan → Act → Verify):
-1. **Understand:** parse the ask; use **search_surfsense_docs** / injected workspace context before guessing.
+1. **Understand:** parse the ask; use injected workspace context before guessing.
 2. **Plan:** for multi-step work, a short plan first.
 3. **Act:** only with tools you actually have on this agent (see `<tools>` and `<tool_routing>`). Connector work → **task**.
 4. **Verify:** re-read or re-search only when it materially reduces risk.
@@ -15,6 +15,7 @@ Output style:
 
 Tool calls:
 - Parallelise independent calls in one turn.
-- Prefer **search_surfsense_docs** for SurfSense-product questions, **web_search** / **scrape_webpage**
-for fresh public facts; integrations and heavy workflows → **task**.
+- For SurfSense-product questions, point the user to https://www.surfsense.com/docs;
+use **web_search** / **scrape_webpage** for fresh public facts; integrations and
+heavy workflows → **task**.
 </provider_hints>
@@ -3,10 +3,7 @@ You have two execution channels. Pick the one that owns the work — never
 simulate one with the other.
 
 ### 1. Direct tools (you call them yourself)
-- `search_surfsense_docs` — SurfSense product docs (setup, configuration,
-connector docs, feature behavior).
-- `web_search` — search the public web (anything outside SurfSense docs and
-the workspace KB).
+- `web_search` — search the public web (anything outside the workspace KB).
 - `scrape_webpage` — fetch the body of a specific public URL.
 - `update_memory` — curate persistent memory (see `<memory_protocol>`).
 - `write_todos` — maintain a structured plan when the turn series spans
@@ -14,6 +11,10 @@ simulate one with the other.
 `in_progress` **before** the `task` call that handles it, `completed`
 once the call returns. Skip for single-step requests.
 
+**Questions about how to use SurfSense itself** (setup, configuration,
+connectors, feature behavior) — point the user to the documentation:
+https://www.surfsense.com/docs. There is no docs-search tool; give the link.
+
 **You have NO filesystem tools.** Any read, write, edit, move, rename, or
 search inside the user's workspace goes through `task(knowledge_base, …)` —
 never via `write_file`, `ls`, or any direct file operation.
@@ -1 +0,0 @@
-"""``search_surfsense_docs`` — description + few-shot examples."""
@@ -1,10 +0,0 @@
-- `search_surfsense_docs` — Search official SurfSense documentation (product
-help).
-- Use when the user asks how SurfSense itself works — setup, configuration,
-connector documentation, feature behavior, anything covered in the
-product docs.
-- Not a substitute for `task` when the user wants actions inside a
-connected service (Gmail, Slack, Jira, Notion, etc.).
-- Args: `query`, `top_k` (default 10).
-- Returns doc excerpts; chunk ids may appear for attribution — see
-`<citations>` for the contract.
@@ -1,15 +0,0 @@
-<example>
-user: "How do I install SurfSense?"
-→ search_surfsense_docs(query="installation setup")
-</example>
-
-<example>
-user: "What connectors does SurfSense support?"
-→ search_surfsense_docs(query="available connectors integrations")
-</example>
-
-<example>
-user: "How do I set up the Notion connector?"
-→ search_surfsense_docs(query="Notion connector setup configuration")
-(Changing data inside Notion itself → `task(notion, …)`, not this tool.)
-</example>
@@ -6,7 +6,6 @@ Connector integrations, MCP, deliverables, etc. are delegated via ``task`` subag
 from __future__ import annotations
 
 MAIN_AGENT_SURFSENSE_TOOL_NAMES_ORDERED: tuple[str, ...] = (
-    "search_surfsense_docs",
     "web_search",
     "scrape_webpage",
     "update_memory",
@@ -8,7 +8,6 @@ Gather and synthesize evidence using SurfSense research tools with clear citatio
 <available_tools>
 - `web_search`
 - `scrape_webpage`
-- `search_surfsense_docs`
 </available_tools>
 
 <tool_policy>
@@ -1,11 +1,9 @@
-"""Research-stage tools: web search, scrape, and in-product doc search."""
+"""Research-stage tools: web search and scrape."""
 
 from .scrape_webpage import create_scrape_webpage_tool
-from .search_surfsense_docs import create_search_surfsense_docs_tool
 from .web_search import create_web_search_tool
 
 __all__ = [
     "create_scrape_webpage_tool",
-    "create_search_surfsense_docs_tool",
     "create_web_search_tool",
 ]
@@ -9,7 +9,6 @@ from langchain_core.tools import BaseTool
 from app.agents.new_chat.permissions import Ruleset
 
 from .scrape_webpage import create_scrape_webpage_tool
-from .search_surfsense_docs import create_search_surfsense_docs_tool
 from .web_search import create_web_search_tool
 
 NAME = "research"
@@ -27,5 +26,4 @@ def load_tools(
             available_connectors=d.get("available_connectors"),
         ),
         create_scrape_webpage_tool(firecrawl_api_key=d.get("firecrawl_api_key")),
-        create_search_surfsense_docs_tool(db_session=d["db_session"]),
     ]
@@ -1,145 +0,0 @@
-"""Semantic search over pre-indexed in-app documentation chunks for user how-to questions."""
-
-import asyncio
-import json
-
-from langchain_core.tools import tool
-from sqlalchemy import select
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from app.db import SurfsenseDocsChunk, SurfsenseDocsDocument
-from app.utils.document_converters import embed_text
-from app.utils.surfsense_docs import surfsense_docs_public_url
-
-
-def format_surfsense_docs_results(results: list[tuple]) -> str:
-    """Format (chunk, document) rows as XML with ``doc-`` chunk IDs for citations and UI routing."""
-    if not results:
-        return "No relevant Surfsense documentation found for your query."
-
-    # Group chunks by document
-    grouped: dict[int, dict] = {}
-    for chunk, doc in results:
-        public_url = surfsense_docs_public_url(doc.source)
-        if doc.id not in grouped:
-            grouped[doc.id] = {
-                "document_id": f"doc-{doc.id}",
-                "document_type": "SURFSENSE_DOCS",
-                "title": doc.title,
-                "url": public_url,
-                "metadata": {"source": doc.source, "public_url": public_url},
-                "chunks": [],
-            }
-        grouped[doc.id]["chunks"].append(
-            {
-                "chunk_id": f"doc-{chunk.id}",
-                "content": chunk.content,
-            }
-        )
-
-    # Render XML matching format_documents_for_context structure
-    parts: list[str] = []
-    for g in grouped.values():
-        metadata_json = json.dumps(g["metadata"], ensure_ascii=False)
-
-        parts.append("<document>")
-        parts.append("<document_metadata>")
-        parts.append(f" <document_id>{g['document_id']}</document_id>")
-        parts.append(f" <document_type>{g['document_type']}</document_type>")
-        parts.append(f" <title><![CDATA[{g['title']}]]></title>")
-        parts.append(f" <url><![CDATA[{g['url']}]]></url>")
-        parts.append(f" <metadata_json><![CDATA[{metadata_json}]]></metadata_json>")
-        parts.append("</document_metadata>")
-        parts.append("")
-        parts.append("<document_content>")
-
-        for ch in g["chunks"]:
-            parts.append(
-                f" <chunk id='{ch['chunk_id']}'><![CDATA[{ch['content']}]]></chunk>"
-            )
-
-        parts.append("</document_content>")
-        parts.append("</document>")
-        parts.append("")
-
-    return "\n".join(parts).strip()
-
-
-async def search_surfsense_docs_async(
-    query: str,
-    db_session: AsyncSession,
-    top_k: int = 10,
-) -> str:
-    """
-    Search Surfsense documentation using vector similarity.
-
-    Args:
-        query: The search query about Surfsense usage
-        db_session: Database session for executing queries
-        top_k: Number of results to return
-
-    Returns:
-        Formatted string with relevant documentation content
-    """
-    # Get embedding for the query
-    query_embedding = await asyncio.to_thread(embed_text, query)
-
-    # Vector similarity search on chunks, joining with documents
-    stmt = (
-        select(SurfsenseDocsChunk, SurfsenseDocsDocument)
-        .join(
-            SurfsenseDocsDocument,
-            SurfsenseDocsChunk.document_id == SurfsenseDocsDocument.id,
-        )
-        .order_by(SurfsenseDocsChunk.embedding.op("<=>")(query_embedding))
-        .limit(top_k)
-    )
-
-    result = await db_session.execute(stmt)
-    rows = result.all()
-
-    return format_surfsense_docs_results(rows)
-
-
-def create_search_surfsense_docs_tool(db_session: AsyncSession):
-    """
-    Factory function to create the search_surfsense_docs tool.
-
-    Args:
-        db_session: Database session for executing queries
-
-    Returns:
-        A configured tool function for searching Surfsense documentation
-    """
-
-    @tool
-    async def search_surfsense_docs(query: str, top_k: int = 10) -> str:
-        """
-        Search Surfsense documentation for help with using the application.
-
-        Use this tool when the user asks questions about:
-        - How to use Surfsense features
-        - Installation and setup instructions
-        - Configuration options and settings
-        - Troubleshooting common issues
-        - Available connectors and integrations
-        - Browser extension usage
-        - API documentation
-
-        This searches the official Surfsense documentation that was indexed
-        at deployment time. It does NOT search the user's personal knowledge base.
-
-        Args:
-            query: The search query about Surfsense usage or features
-            top_k: Number of documentation chunks to retrieve (default: 10)
-
-        Returns:
-            Relevant documentation content formatted with chunk IDs for citations
-        """
-        return await search_surfsense_docs_async(
-            query=query,
-            db_session=db_session,
-            top_k=top_k,
-        )
-
-    return search_surfsense_docs
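For readers reviewing what the deleted module did: it ordered chunks by `embedding <=> query_embedding` and kept the first `top_k`. Assuming the `embedding` column is a pgvector column, where `<=>` denotes cosine distance, the ranking can be sketched in plain Python as below. The names `cosine_distance` and `top_k_chunks` are illustrative and do not appear in the repository.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_chunks(
    query_embedding: Sequence[float],
    chunks: Sequence[tuple[str, Sequence[float]]],
    top_k: int = 10,
) -> list[str]:
    """Return the ids of the top_k chunks closest to the query, nearest first.

    Mirrors ORDER BY distance ... LIMIT top_k over (chunk_id, embedding) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine_distance(query_embedding, c[1]))
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

A database would normally answer this with an index rather than a full sort; the sketch only shows the ordering that the removed query expressed.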