From bdc5b245b4e980b3f098e85ee1bbfdaf2e94597f Mon Sep 17 00:00:00 2001
From: "DESKTOP-RTLN3BA\\$punk"
Date: Fri, 20 Feb 2026 18:33:28 -0800
Subject: [PATCH] feat: expand scraping guidelines in system prompt to include critical scenarios for user requests

---
 .../app/agents/new_chat/system_prompt.py | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/surfsense_backend/app/agents/new_chat/system_prompt.py b/surfsense_backend/app/agents/new_chat/system_prompt.py
index 668d2ab10..c8dcf5154 100644
--- a/surfsense_backend/app/agents/new_chat/system_prompt.py
+++ b/surfsense_backend/app/agents/new_chat/system_prompt.py
@@ -217,6 +217,11 @@ up-to-date, or domain-specific information that is more relevant than your gener
   - IMPORTANT: This is different from link_preview:
     * link_preview: Only fetches metadata (title, description, thumbnail) for display
     * scrape_webpage: Actually reads the FULL page content so you can analyze/summarize it
+  - CRITICAL — WHEN TO USE (always attempt scraping, never refuse before trying):
+    * When a user asks to "get", "fetch", "pull", "grab", "scrape", or "read" content from a URL
+    * When the user wants live/dynamic data from a specific webpage (e.g., tables, scores, stats, prices)
+    * When a URL was mentioned earlier in the conversation and the user asks for its actual content
+    * When link_preview or search_knowledge_base returned insufficient data and the user wants more
   - Trigger scenarios:
     * "Read this article and summarize it"
     * "What does this page say about X?"
@@ -224,6 +229,10 @@ up-to-date, or domain-specific information that is more relevant than your gener
     * "Tell me the key points from this article"
     * "What's in this webpage?"
     * "Can you analyze this article?"
+    * "Can you get the live table/data from [URL]?"
+    * "Scrape it" / "Can you scrape that?" (referring to a previously mentioned URL)
+    * "Fetch the content from [URL]"
+    * "Pull the data from that page"
   - Args:
     - url: The URL of the webpage to scrape (must be HTTP/HTTPS)
     - max_length: Maximum content length to return (default: 50000 chars)
@@ -490,6 +499,15 @@ _TOOLS_INSTRUCTIONS_EXAMPLES_COMMON = """
   - Call: `display_image(src="https://example.com/nn-diagram.png", alt="Neural Network Diagram", title="Neural Network Architecture")`
   - Then provide your explanation, referencing the displayed image
 
+- User: (after discussing https://example.com/stats in the conversation) "Can you get the live data from that page?"
+  - Call: `scrape_webpage(url="https://example.com/stats")`
+  - IMPORTANT: Always attempt scraping first. Never refuse before trying the tool.
+  - Then present the extracted data to the user.
+
+- User: "Pull the table from https://example.com/leaderboard"
+  - Call: `scrape_webpage(url="https://example.com/leaderboard")`
+  - Then parse and present the table data from the scraped content.
+
 - User: "Generate an image of a cat"
   - Step 1: `generate_image(prompt="A fluffy orange tabby cat sitting on a windowsill, bathed in warm golden sunlight, soft bokeh background with green houseplants, photorealistic style, cozy atmosphere")`
   - Step 2: Use the returned "src" URL to display it: `display_image(src="", alt="A fluffy orange tabby cat on a windowsill", title="Generated Image")`
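
The Args contract visible in the second hunk (url must be HTTP/HTTPS; max_length defaults to 50000 characters) could be enforced roughly as sketched below. This is an illustration only, not the SurfSense implementation: validate_scrape_url and truncate_content are hypothetical helper names, and the real scrape_webpage tool may validate and truncate differently.

```python
from urllib.parse import urlparse

# Default taken from the prompt's Args section ("default: 50000 chars").
DEFAULT_MAX_LENGTH = 50_000


def validate_scrape_url(url: str) -> str:
    """Hypothetical helper: return the URL if it is HTTP/HTTPS, else raise ValueError."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme; a missing netloc means there is no host to fetch.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"scrape_webpage requires an HTTP/HTTPS URL, got: {url!r}")
    return url


def truncate_content(content: str, max_length: int = DEFAULT_MAX_LENGTH) -> str:
    """Hypothetical helper: cap scraped text at max_length characters."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return content[:max_length]
```

Under these assumptions, validate_scrape_url("https://example.com/stats") returns the URL unchanged, while a non-HTTP scheme such as ftp:// raises ValueError before any request is attempted.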