refactor: introduce write_todos tool for enhanced task management

- Added a new write_todos tool to facilitate the creation and management of planning lists within the chat interface.
- Updated system prompt with detailed instructions on using the write_todos tool, including usage patterns and restrictions.
- Enhanced the chat message handling to support the new tool, ensuring proper integration and user experience.
- Implemented UI components for displaying and interacting with the planning lists, including progress tracking and status indicators.
2026-06-22 21:28:12 +02:00 · 2025-12-26 14:37:23 +05:30 · eb70c055a4
commit eb70c055a4
parent 1dd740bb23
14 changed files with 1138 additions and 1 deletions
--- a/surfsense_backend/app/agents/new_chat/system_prompt.py
+++ b/surfsense_backend/app/agents/new_chat/system_prompt.py
@@ -109,6 +109,65 @@ You have access to the following tools:
    * This makes your response more visual and engaging.
    * Prioritize showing: diagrams, charts, infographics, key illustrations, or images that help explain the content.
    * Don't show every image - just the most relevant 1-3 images that enhance understanding.
+
+6. write_todos: Create and update a planning/todo list to break down complex tasks.
+  - Use this tool when you need to plan your approach to a complex task.
+  - This displays a visual plan with progress tracking and status indicators.
+  
+  - USAGE PATTERN:
+    * First call: Create the plan with first task as "in_progress", rest as "pending"
+    * Subsequent calls: ONLY update task statuses (mark completed/in_progress)
+    * Use the EXACT SAME title and task IDs for all updates
+  
+  - ABSOLUTELY FORBIDDEN - WILL BREAK THE SYSTEM:
+    * ONLY ONE PLAN PER CONVERSATION - NEVER call write_todos a second time to create a new plan
+    * When all tasks in your plan are "completed", your response is FINISHED - STOP
+    * NEVER restart your response after completing it
+    * NEVER generate the same explanation twice
+    * NEVER create a second introduction or overview after the first one
+    * NEVER say "Let me explain..." twice for the same topic
+    * If you've already explained something, DO NOT explain it again
+    * After your response ends, STOP - do not continue generating
+    * NEVER say you're creating a "document", "report", "roadmap", "analysis", or any artifact
+    * Do NOT use phrases like "This report is based on..." or "Based on my research..."
+    * Just answer the question directly - do not roleplay producing a deliverable
+  
+  - CORRECT BEHAVIOR:
+    * Call write_todos to update statuses as you progress
+    * Each section of your response appears EXACTLY ONCE
+    * When you finish explaining all tasks, your response is COMPLETE
+    * Do NOT generate additional content after concluding
+  
+  - CONTENT QUALITY:
+    * Provide thorough, detailed explanations for each task
+    * The restriction is on DUPLICATING content, not on depth or detail
+    * Each task deserves a complete, comprehensive explanation
+    * Be as detailed as needed - just don't repeat yourself
+  
+  - When to use:
+    * Breaking down a complex multi-step task (3-5 tasks recommended)
+    * Showing the user what steps you'll take to solve their problem
+    * Creating an implementation roadmap
+  
+  - Args:
+    - todos: List of todo items, each with:
+      * id: Unique identifier (KEEP SAME IDs across updates)
+      * content: Description of the task (KEEP SAME content across updates)
+      * status: "pending", "in_progress", or "completed"
+    - title: Title for the plan (MUST BE IDENTICAL across all updates)
+    - description: Optional context description
+  
+  - Returns: A visual plan card with progress bar and status indicators
+  
+  - CORRECT PATTERN:
+    1. Create plan with task 1 as "in_progress"
+    2. Explain task 1 content in detail
+    3. Update plan: task 1 "completed", task 2 "in_progress"
+    4. Explain task 2 content (NEW content, not repeating task 1)
+    5. Continue until all tasks are "completed"
+    6. When all tasks are "completed", your response is FINISHED
+    7. STOP IMMEDIATELY - do NOT create another plan or continue generating
+    8. ONE PLAN ONLY - never call write_todos again after completing all tasks
 </tools>
 <tool_call_examples>
 - User: "Fetch all my notes and what's in them?"
@@ -166,6 +225,22 @@ You have access to the following tools:
  - Then, if the content contains useful diagrams/images like `![Neural Network Diagram](https://example.com/nn-diagram.png)`:
    - Call: `display_image(src="https://example.com/nn-diagram.png", alt="Neural Network Diagram", title="Neural Network Architecture")`
  - Then provide your explanation, referencing the displayed image
+
+- User: "Help me implement a user authentication system"
+  - Step 1: Create plan with task 1 in_progress:
+    `write_todos(title="Auth Plan", todos=[{"id": "1", "content": "Design database schema", "status": "in_progress"}, {"id": "2", "content": "Set up password hashing", "status": "pending"}, {"id": "3", "content": "Create endpoints", "status": "pending"}])`
+  - Step 2: Provide DETAILED explanation of database schema design
+  - Step 3: Update plan (task 1 done, task 2 in_progress):
+    `write_todos(title="Auth Plan", todos=[{"id": "1", "content": "Design database schema", "status": "completed"}, {"id": "2", "content": "Set up password hashing", "status": "in_progress"}, {"id": "3", "content": "Create endpoints", "status": "pending"}])`
+  - Step 4: Provide DETAILED explanation of password hashing (NEW content only)
+  - Step 5: Update plan, explain endpoints in detail
+  - Step 6: Mark all complete, END response - DO NOT restart or regenerate
+  - FORBIDDEN: Do not go back and explain schema again after step 2
+
+- User: "How should I approach refactoring this large codebase?"
+  - Create plan, explain each step with thorough detail, update statuses as you go
+  - Each explanation is comprehensive but appears ONLY ONCE
+  - When finished with all tasks, STOP - do not continue generating
 </tool_call_examples>
 """

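Note on the prompt changes above: the usage pattern amounts to a small state machine over the todo list. The first call marks task 1 "in_progress", and every later call completes the active task and starts the next one, with IDs and content unchanged. A minimal sketch of that sequence follows. The helper names `initial_plan`, `advance_plan`, and `is_finished` are hypothetical and do not appear in this commit; only the dict shape (`id`, `content`, `status`) is taken from the prompt.

```python
from copy import deepcopy


def initial_plan(contents: list[str]) -> list[dict[str, str]]:
    """Build the first payload: task 1 is in_progress, the rest are pending."""
    return [
        {
            "id": str(i),
            "content": content,
            "status": "in_progress" if i == 1 else "pending",
        }
        for i, content in enumerate(contents, start=1)
    ]


def advance_plan(todos: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return a copy with the active task completed and the next pending one started.

    IDs and content are never modified, matching the prompt's requirement
    to keep them identical across updates.
    """
    updated = deepcopy(todos)
    for index, todo in enumerate(updated):
        if todo["status"] == "in_progress":
            todo["status"] = "completed"
            # Start the next pending task, if any remain.
            for later in updated[index + 1:]:
                if later["status"] == "pending":
                    later["status"] = "in_progress"
                    break
            break
    return updated


def is_finished(todos: list[dict[str, str]]) -> bool:
    """True once every task is completed, the point at which the response should end."""
    return all(todo["status"] == "completed" for todo in todos)


plan = initial_plan(
    ["Design database schema", "Set up password hashing", "Create endpoints"]
)
step_two = advance_plan(plan)  # task 1 completed, task 2 in_progress
```

Each returned list would be passed as the `todos` argument of a `write_todos` call with the same `title`, as in the "Auth Plan" example in the prompt.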
--- a/surfsense_backend/app/agents/new_chat/tools/registry.py
+++ b/surfsense_backend/app/agents/new_chat/tools/registry.py
@@ -48,6 +48,7 @@ from .knowledge_base import create_search_knowledge_base_tool
 from .link_preview import create_link_preview_tool
 from .podcast import create_generate_podcast_tool
 from .scrape_webpage import create_scrape_webpage_tool
+from .write_todos import create_write_todos_tool

 # =============================================================================
 # Tool Definition
@@ -125,6 +126,13 @@ BUILTIN_TOOLS: list[ToolDefinition] = [
        ),
        requires=[],  # firecrawl_api_key is optional
    ),
+    # Planning/Todo tool - creates visual todo lists
+    ToolDefinition(
+        name="write_todos",
+        description="Create a planning/todo list to break down complex tasks",
+        factory=lambda deps: create_write_todos_tool(),
+        requires=[],
+    ),
    # =========================================================================
    # ADD YOUR CUSTOM TOOLS BELOW
    # =========================================================================
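Note on the registry entry above: `write_todos` declares `requires=[]` and its factory ignores `deps`, so it should be constructible without any injected dependency. The sketch below illustrates that idea with a stand-in `ToolDefinition` that carries only the four fields visible in this diff. The `build_tools` helper, its dependency check, and the placeholder factory are assumptions made for illustration, not the project's actual registry code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolDefinition:
    """Stand-in mirroring only the fields shown in the diff (not the real class)."""

    name: str
    description: str
    factory: Callable[[dict[str, Any]], Any]
    requires: list[str] = field(default_factory=list)


def build_tools(
    definitions: list[ToolDefinition], deps: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical builder: call each factory once its required deps are present."""
    tools: dict[str, Any] = {}
    for definition in definitions:
        missing = [key for key in definition.requires if key not in deps]
        if missing:
            raise ValueError(f"{definition.name} is missing dependencies: {missing}")
        tools[definition.name] = definition.factory(deps)
    return tools


def create_write_todos_tool() -> Callable[..., dict[str, Any]]:
    """Placeholder for the real factory, which returns a LangChain tool."""

    def write_todos(todos, title="Planning Approach", description=None):
        return {"title": title, "description": description, "todos": todos}

    return write_todos


WRITE_TODOS = ToolDefinition(
    name="write_todos",
    description="Create a planning/todo list to break down complex tasks",
    factory=lambda deps: create_write_todos_tool(),  # deps is unused, as in the diff
    requires=[],
)

tools = build_tools([WRITE_TODOS], deps={})  # no dependencies needed
```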
--- a/surfsense_backend/app/agents/new_chat/tools/write_todos.py
+++ b/surfsense_backend/app/agents/new_chat/tools/write_todos.py
@@ -0,0 +1,94 @@
+"""
+Write todos tool for the SurfSense agent.
+
+This module provides a tool for creating and displaying a planning/todo list
+in the chat UI. It helps the agent break down complex tasks into steps.
+"""
+
+from typing import Any, Literal
+
+from langchain_core.tools import tool
+
+
+def create_write_todos_tool():
+    """
+    Factory function to create the write_todos tool.
+
+    Returns:
+        A configured tool function for writing todos/plans.
+    """
+
+    @tool
+    async def write_todos(
+        todos: list[dict[str, Any]],
+        title: str = "Planning Approach",
+        description: str | None = None,
+    ) -> dict[str, Any]:
+        """
+        Create a planning/todo list to break down a complex task.
+
+        Use this tool when you need to plan your approach to a complex task
+        or show the user a step-by-step breakdown of what you'll do.
+
+        This displays a visual plan with:
+        - Progress tracking (X of Y complete)
+        - Status indicators (pending, in progress, completed, cancelled)
+        - Expandable details for each step
+
+        Args:
+            todos: List of todo items. Each item should have:
+                - id: Unique identifier for the todo
+                - content: Description of the task
+                - status: One of "pending", "in_progress", "completed", "cancelled"
+            title: Title for the plan (default: "Planning Approach")
+            description: Optional description providing context
+
+        Returns:
+            A dictionary containing the plan data for the UI to render.
+
+        Example:
+            write_todos(
+                title="Implementation Plan",
+                description="Steps to add the new feature",
+                todos=[
+                    {"id": "1", "content": "Analyze requirements", "status": "completed"},
+                    {"id": "2", "content": "Design solution", "status": "in_progress"},
+                    {"id": "3", "content": "Write code", "status": "pending"},
+                    {"id": "4", "content": "Add tests", "status": "pending"},
+                ]
+            )
+        """
+        # Generate a unique plan ID
+        import uuid
+
+        plan_id = f"plan-{uuid.uuid4().hex[:8]}"
+
+        # Transform todos to the expected format for the UI
+        formatted_todos = []
+        for i, todo in enumerate(todos):
+            todo_id = todo.get("id", f"todo-{i}")
+            content = todo.get("content", "")
+            status = todo.get("status", "pending")
+
+            # Validate status
+            valid_statuses = ["pending", "in_progress", "completed", "cancelled"]
+            if status not in valid_statuses:
+                status = "pending"
+
+            formatted_todos.append(
+                {
+                    "id": todo_id,
+                    "label": content,
+                    "status": status,
+                }
+            )
+
+        return {
+            "id": plan_id,
+            "title": title,
+            "description": description,
+            "todos": formatted_todos,
+        }
+
+    return write_todos
+
--- a/surfsense_backend/app/tasks/chat/stream_new_chat.py
+++ b/surfsense_backend/app/tasks/chat/stream_new_chat.py
@@ -718,6 +718,25 @@ async def stream_new_chat(
                        status="completed",
                        items=completed_items,
                    )
+                elif tool_name == "write_todos":
+                    # Build completion items for planning
+                    if isinstance(tool_output, dict):
+                        plan_title = tool_output.get("title", "Plan")
+                        todos = tool_output.get("todos", [])
+                        todo_count = len(todos) if isinstance(todos, list) else 0
+                        completed_items = [
+                            *last_active_step_items,
+                            f"Plan: {plan_title[:50]}{'...' if len(plan_title) > 50 else ''}",
+                            f"Tasks: {todo_count} steps defined",
+                        ]
+                    else:
+                        completed_items = [*last_active_step_items, "Plan created"]
+                    yield streaming_service.format_thinking_step(
+                        step_id=original_step_id,
+                        title="Creating plan",
+                        status="completed",
+                        items=completed_items,
+                    )
                else:
                    yield streaming_service.format_thinking_step(
                        step_id=original_step_id,
@@ -854,6 +873,28 @@ async def stream_new_chat(
                    yield streaming_service.format_terminal_info(
                        "Knowledge base search completed", "success"
                    )
+                elif tool_name == "write_todos":
+                    # Stream the full write_todos result so frontend can render the Plan component
+                    yield streaming_service.format_tool_output_available(
+                        tool_call_id,
+                        tool_output
+                        if isinstance(tool_output, dict)
+                        else {"result": tool_output},
+                    )
+                    # Send terminal message with plan info
+                    if isinstance(tool_output, dict):
+                        title = tool_output.get("title", "Plan")
+                        todos = tool_output.get("todos", [])
+                        todo_count = len(todos) if isinstance(todos, list) else 0
+                        yield streaming_service.format_terminal_info(
+                            f"Plan created: {title} ({todo_count} tasks)",
+                            "success",
+                        )
+                    else:
+                        yield streaming_service.format_terminal_info(
+                            "Plan created",
+                            "success",
+                        )
                else:
                    # Default handling for other tools
                    yield streaming_service.format_tool_output_available(