feat(automations): add create_automation HITL tool (NL → draft → approve → save)

Single tool exposed to the main agent. The main agent passes a natural-language `intent`; a focused drafter sub-LLM turns it into a full AutomationCreate JSON; that JSON is surfaced via request_approval (action_type "automation_create") so the user can edit/approve it on a frontend card; on approval the tool persists via AutomationService. Three phases, one tool call.

Scope split:
- main agent sees only `intent: str` (no schema knowledge leaks into the calling graph); prompt fragments are scoped accordingly.
- drafter sub-LLM owns the schema + few-shot intent→JSON examples; these live in the generating graph's prompt (tools/automation/prompt.py).

Files:
- main_agent/tools/automation/{create.py, prompt.py, __init__.py}: new tool + drafter system prompt with two few-shot intent→JSON examples.
- system_prompt/prompts/tools/create_automation/{description.md, example.md}: intent-only guidance for the main agent.
- main_agent/tools/index.py: add create_automation to the main-agent allowlist.
- new_chat/tools/registry.py: deferred-import factory to break the multi_agent_chat ↔ registry cycle; one ToolDefinition entry.
2026-05-29 19:35:20 +02:00 · 2026-05-28 00:12:02 +02:00
commit 2b7d91aa03
parent c0232fdcfe
8 changed files with 472 additions and 0 deletions
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/create_automation/__init__.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/create_automation/__init__.py
@@ -0,0 +1 @@
 """``create_automation`` — description + few-shot examples."""
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/create_automation/description.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/create_automation/description.md
@@ -0,0 +1,31 @@
 - `create_automation` — Draft and author a new automation. You describe the
  user's intent; a focused drafter inside the tool turns it into the full
  automation JSON; the user reviews and edits it on an approval card; on
  approval it's saved. All three phases happen in a single tool call.
  - Call when the user wants SurfSense to do something on its own: anything
    recurring or scheduled ("every morning…", "each Monday…", "weekly
    recap…").
  - Args:
    - `intent` (string): restate the user's request **concretely**, in one
      paragraph. Cover three things:
      - **What** should run (the action: summarize, recap, post, draft, …).
      - **When** it should run (schedule + timezone if the user mentioned one;
        otherwise leave the timezone for the drafter to default to UTC).
      - **Static values** the automation needs (folder ids, channel names,
        project keys, parent page ids, …) — list them with their values.
        If the user did NOT supply one the automation needs, say so
        explicitly ("the Notion parent page id was not specified") so the
        drafter leaves a placeholder.
  - Do NOT prompt the user to confirm before calling — the approval card
    IS the confirmation. The user can edit any field on the card.
  - Returns:
    - `{status: "saved", automation_id, name}` — confirm briefly to the
      user ("Saved as automation #N — runs <when>."). Don't dump JSON back.
    - `{status: "rejected", message}` — the user declined on the card.
      Acknowledge once ("Understood, I didn't create it.") and stop. Do
      NOT retry or pitch variants.
    - `{status: "invalid", issues, raw?}` — drafting/validation failed
      before the card was shown. Read the issues, refine your `intent`
      with the missing details, call again.
    - `{status: "error", message}` — surface the message verbatim and
      offer to retry.
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/create_automation/example.md
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/system_prompt/prompts/tools/create_automation/example.md
@@ -0,0 +1,13 @@
 <example>
 user: "Every weekday at 9am, summarize new documents in folder 12 and post the summary to Slack channel #daily-digest."
 → create_automation(intent="Every weekday at 09:00 UTC, summarize documents added to folder_id=12 since the last run, then post the summary to Slack channel '#daily-digest'. Static inputs: folder_id=12, slack_channel='#daily-digest'.")
 tool returns: {"status": "saved", "automation_id": 42, "name": "Daily folder 12 digest"}
 (Reply briefly: "Saved as automation #42 — runs weekdays at 9am UTC.")
 </example>
 <example>
 user: "Once a week on Mondays at 7am Paris time, draft a Notion page recapping last week's Jira tickets in project CORE."
 → create_automation(intent="Every Monday at 07:00 Europe/Paris, read last week's Jira issues in project CORE, then draft a Notion page recapping them. Static inputs: jira_project_key='CORE'. The user did NOT specify which Notion page the recap should sit under — leave notion_parent_page_id as a placeholder.")
 tool returns: {"status": "saved", "automation_id": 51, "name": "Weekly CORE Jira recap"}
 (Reply: "Saved as automation #51. I left the Notion parent page id as a placeholder — set it on the automation before next Monday.")
 </example>
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/__init__.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/__init__.py
@@ -0,0 +1,7 @@
 """``create_automation`` — author + persist an automation via a HITL card."""
 from __future__ import annotations
 from .create import create_create_automation_tool
 __all__ = ["create_create_automation_tool"]
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/create.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/create.py
@@ -0,0 +1,203 @@
 """``create_automation`` — NL intent → drafted JSON → HITL approval card → persisted.
 Single tool that:
 1. Drafts a structured automation from the user's intent via a focused sub-LLM
   (system prompt in :mod:`.prompt`).
 2. Surfaces the validated draft in a HITL approval card
   (``action_type="automation_create"``).
 3. On approval, validates the (possibly edited) payload again and persists
   it via :class:`AutomationService`.
 The main agent only restates the user's request as a single ``intent`` string.
 The drafting sub-LLM owns the JSON shape; the HITL card is the user's review.
 """
 from __future__ import annotations
 import json
 import logging
 import re
 from typing import Any
 from uuid import UUID
 from fastapi import HTTPException
 from langchain_core.messages import HumanMessage
 from langchain_core.tools import tool
 from pydantic import ValidationError
 from app.agents.new_chat.tools.hitl import request_approval
 from app.automations.schemas.api import AutomationCreate
 from app.automations.services.automation import AutomationService
 from app.db import User, async_session_maker
 from app.utils.content_utils import extract_text_content
 from .prompt import build_draft_prompt
 logger = logging.getLogger(__name__)
 _JSON_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)
 def create_create_automation_tool(
    *,
    search_space_id: int,
    user_id: str | UUID,
    llm: Any,
 ):
    """Factory for the ``create_automation`` tool.
    ``search_space_id`` is injected from the chat session (the model never
    has to guess it). ``llm`` is the drafting sub-model — we reuse the main
    agent's LLM and tag the call so it's identifiable in traces. A fresh
    ``AsyncSession`` is opened per call to avoid stale sessions on
    compiled-agent cache hits (same pattern as the Notion / memory tools).
    """
    uid = UUID(user_id) if isinstance(user_id, str) else user_id
    @tool
    async def create_automation(intent: str) -> dict[str, Any]:
        """Draft + save an automation from a natural-language intent.
        Use this when the user wants SurfSense to do something on its own
        on a schedule (e.g. "every morning summarize folder 12 to Slack").
        Restate the user's request as ONE concrete ``intent`` string: what
        should run, when, and which static values (folder ids, channel
        names, …) it needs.
        The tool drafts the full automation JSON internally, shows the user
        an approval card for review, and persists on approval. Do NOT
        prompt the user to confirm before calling — the card IS the
        confirmation. The user can edit any field there.
        Args:
            intent: Concrete restatement of the user's request. Include
                the schedule (with timezone if mentioned), the action to
                take, and any static values. Example: "Every weekday at
                09:00 UTC, summarize new docs added to folder_id=12 since
                the last run, then post the summary to Slack channel
                '#daily-digest'."
        Returns:
            ``{"status": "saved", "automation_id": int, "name": str}`` on
            approval + save.
            ``{"status": "rejected", "message": "..."}`` when the user
            declines on the card.
            ``{"status": "invalid", "issues": [...], "raw": ...}`` when
            the drafter produced output that did not validate (call again
            with a more precise intent).
            ``{"status": "error", "message": "..."}`` on drafter or
            persistence failure.
            IMPORTANT: when status is ``"rejected"`` the user explicitly
            declined. Acknowledge once and stop — do NOT retry or pitch
            variants without a fresh user request.
        """
        # --- 1. Draft via sub-LLM ---
        prompt = build_draft_prompt(search_space_id=search_space_id, intent=intent)
        try:
            response = await llm.ainvoke(
                [HumanMessage(content=prompt)],
                config={"tags": ["surfsense:internal", "automation-draft"]},
            )
        except Exception as exc:
            logger.exception("create_automation drafting LLM call failed")
            return {"status": "error", "message": f"drafting failed: {exc}"}
        raw_text = extract_text_content(response.content).strip()
        draft = _extract_json(raw_text)
        if draft is None:
            return {
                "status": "invalid",
                "issues": ["model output was not parseable JSON"],
                "raw": raw_text,
            }
        # search_space_id is injected here so the sub-LLM never has to guess.
        draft["search_space_id"] = search_space_id
        try:
            validated_draft = AutomationCreate.model_validate(draft)
        except ValidationError as exc:
            return {
                "status": "invalid",
                "issues": _format_validation_issues(exc),
                "raw": draft,
            }
        # --- 2. HITL approval card ---
        try:
            card_params = validated_draft.model_dump(mode="json", by_alias=True)
            # search_space_id is session-scoped, not user-editable.
            card_params.pop("search_space_id", None)
            result = request_approval(
                action_type="automation_create",
                tool_name="create_automation",
                params=card_params,
                context={"search_space_id": search_space_id},
            )
            if result.rejected:
                return {
                    "status": "rejected",
                    "message": "User declined. Do not retry or suggest alternatives.",
                }
            # --- 3. Persist (re-validate in case the user edited) ---
            final_payload = {**result.params, "search_space_id": search_space_id}
            try:
                final_validated = AutomationCreate.model_validate(final_payload)
            except ValidationError as exc:
                return {
                    "status": "invalid",
                    "issues": _format_validation_issues(exc),
                }
            async with async_session_maker() as session:
                user = await session.get(User, uid)
                if user is None:
                    return {
                        "status": "error",
                        "message": "user not found in this session",
                    }
                service = AutomationService(session=session, user=user)
                created = await service.create(final_validated)
                return {
                    "status": "saved",
                    "automation_id": created.id,
                    "name": created.name,
                }
        except HTTPException as exc:
            return {"status": "error", "message": exc.detail}
        except Exception as exc:
            from langgraph.errors import GraphInterrupt
            if isinstance(exc, GraphInterrupt):
                raise
            logger.exception("create_automation failed")
            return {"status": "error", "message": f"persistence failed: {exc}"}
    return create_automation
 def _extract_json(text: str) -> dict[str, Any] | None:
    """Pull a JSON object out of the model response, tolerating ``` fences."""
    if not text:
        return None
    candidate = text
    fence_match = _JSON_FENCE.search(text)
    if fence_match:
        candidate = fence_match.group(1)
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
 def _format_validation_issues(exc: ValidationError) -> list[str]:
    return [
        f"{'.'.join(str(p) for p in err['loc'])}: {err['msg']}"
        for err in exc.errors()
    ]
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/prompt.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/automation/prompt.py
@@ -0,0 +1,179 @@
 """System prompt for the drafting sub-LLM inside ``create_automation``.
 Converts a natural-language ``intent`` into a structured ``AutomationCreate``
 JSON object. That object becomes the payload the HITL approval card surfaces.
 Scope split:
    Real automation JSONs live here — this is the graph that *generates*
    the JSON. The main agent's prompt fragments (``description.md`` /
    ``example.md``) only carry intent-string examples; the main agent
    never sees the schema.
 Layout:
    The prompt is concatenated from four format-safe pieces. ``_HEADER`` /
    ``_FOOTER`` carry the only ``str.format`` placeholders; ``_SCHEMA`` and
    ``_FEW_SHOTS`` are plain strings so their JSON literals (and the
    ``{{ inputs.X }}`` Jinja references in queries) can stay readable
    without doubled-brace escaping.
 Catalog handling:
    v1 hard-codes the action/trigger catalog (one action, one trigger).
    When new types ship, swap the inline lines for a render-time pull
    from ``app.automations.actions`` / ``app.automations.triggers`` via
    lazy imports inside :func:`build_draft_prompt` so this module never
    participates in the ``multi_agent_chat`` import cycle.
 """
 from __future__ import annotations
 from datetime import UTC, datetime
 _HEADER = """\
 You are the SurfSense automation drafter. Convert the user intent below
 into a SINGLE JSON object matching the AutomationCreate schema. Output
 ONLY that JSON object — no prose, no markdown fence, no commentary.
 Current UTC time (for cron context): {now}
 Target search_space_id: {search_space_id}
 """
 _SCHEMA = """
 Required JSON shape:
 {
  "name": "<1-200 char identifier>",
  "description": "<one-liner or null>",
  "definition": {
    "schema_version": "1.0",
    "name": "<same as outer name>",
    "goal": "<one sentence>",
    "plan": [
      {
        "step_id": "<slug>",
        "action": "agent_task",
        "params": {
          "query": "<Jinja string referencing {{ inputs.X }}>",
          "auto_approve_all": true
        }
      }
    ],
    "metadata": {"tags": ["..."]}
  },
  "triggers": [
    {
      "type": "schedule",
      "params": {"cron": "<5-field cron>", "timezone": "<IANA tz, default UTC>"},
      "static_inputs": {"<key>": <value>, ...},
      "enabled": true
    }
  ]
 }
 v1 catalog (only these are valid):
 - Actions: agent_task — params: query (string, Jinja), auto_approve_all (bool).
 - Triggers: schedule — params: cron (5-field), timezone (IANA, e.g. "UTC",
  "Europe/Paris"). Has static_inputs (object).
 Conventions:
 - Whatever the plan references via {{ inputs.X }} MUST appear either in a
  trigger's static_inputs OR in definition.inputs.schema_.properties so the
  executor can resolve it at fire time.
 - static_inputs carries values that stay the same across every fire
  (folder ids, channel names, project keys, parent page ids). Put them on
  the trigger that supplies them, not in the plan.
 - If the user did NOT supply a value the plan needs, put "REPLACE_ME" in
  static_inputs. Do NOT invent ids, channels, or paths.
 - Cron is 5-field (minute hour day-of-month month day-of-week). Use the
  timezone the user mentioned; default "UTC" when unspecified.
 - Templating variables available at fire time: inputs.* (merged
  static_inputs + runtime), inputs.fired_at, inputs.last_fired_at.
 """
 _FEW_SHOTS = """
 Few-shot examples (intent → JSON output):
 ### Example 1 — schedule with all static values supplied
 intent: "Every weekday at 09:00 UTC, summarize documents added to folder_id=12 since the last run, then post the summary to Slack channel '#daily-digest'. Static inputs: folder_id=12, slack_channel='#daily-digest'."
 output:
 {
  "name": "Daily folder 12 digest",
  "description": "Weekday 09:00 UTC summary of folder 12 documents posted to #daily-digest",
  "definition": {
    "schema_version": "1.0",
    "name": "Daily folder 12 digest",
    "goal": "Summarize new docs in folder 12 since the last run and post to #daily-digest",
    "plan": [
      {
        "step_id": "summarize_and_post",
        "action": "agent_task",
        "params": {
          "query": "Summarize documents added to folder {{ inputs.folder_id }} since {{ inputs.last_fired_at or 'yesterday' }}, then send the summary to Slack channel {{ inputs.slack_channel }}.",
          "auto_approve_all": true
        }
      }
    ],
    "metadata": {"tags": ["daily", "digest", "slack"]}
  },
  "triggers": [
    {
      "type": "schedule",
      "params": {"cron": "0 9 * * 1-5", "timezone": "UTC"},
      "static_inputs": {"folder_id": 12, "slack_channel": "#daily-digest"},
      "enabled": true
    }
  ]
 }
 ### Example 2 — schedule with a missing value (REPLACE_ME placeholder)
 intent: "Every Monday at 07:00 Europe/Paris, read last week's Jira issues in project CORE, then draft a Notion page recapping them. Static inputs: jira_project_key='CORE'. The user did NOT specify the Notion parent page id — leave it as a placeholder."
 output:
 {
  "name": "Weekly CORE Jira recap",
  "description": "Monday 07:00 Europe/Paris recap of last week's CORE Jira issues, drafted to Notion",
  "definition": {
    "schema_version": "1.0",
    "name": "Weekly CORE Jira recap",
    "goal": "Recap last week's CORE Jira issues into a Notion page",
    "plan": [
      {
        "step_id": "recap",
        "action": "agent_task",
        "params": {
          "query": "List Jira issues in project {{ inputs.jira_project_key }} updated in the 7 days before {{ inputs.fired_at }}. Draft a Notion page under parent id {{ inputs.notion_parent_page_id }} titled 'CORE recap — week of {{ inputs.fired_at }}'.",
          "auto_approve_all": true
        }
      }
    ],
    "metadata": {"tags": ["weekly", "recap", "jira", "notion"]}
  },
  "triggers": [
    {
      "type": "schedule",
      "params": {"cron": "0 7 * * 1", "timezone": "Europe/Paris"},
      "static_inputs": {"jira_project_key": "CORE", "notion_parent_page_id": "REPLACE_ME"},
      "enabled": true
    }
  ]
 }
 """
 _FOOTER = """
 User intent:
 {intent}
 """
 def build_draft_prompt(*, search_space_id: int, intent: str) -> str:
    """Render the drafting sub-LLM system prompt for the given intent."""
    return (
        _HEADER.format(
            now=datetime.now(UTC).isoformat(timespec="seconds"),
            search_space_id=search_space_id,
        )
        + _SCHEMA
        + _FEW_SHOTS
        + _FOOTER.format(intent=intent.strip())
    )
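
The "format-safe pieces" layout described in the module docstring can be illustrated with a much smaller sketch. The constants below are hypothetical, shortened stand-ins for `_HEADER`, `_SCHEMA`, and `_FOOTER`, and `schema_survives_format` is a helper invented for this example; the point is only to show why the JSON-bearing piece is concatenated rather than passed through `str.format`.

```python
# Hypothetical, shortened stand-ins for _HEADER / _SCHEMA / _FOOTER.
HEADER = "Target search_space_id: {search_space_id}\n"  # goes through str.format
SCHEMA = 'Shape: {"query": "Summarize folder {{ inputs.folder_id }}"}\n'  # never formatted
FOOTER = "User intent:\n{intent}\n"  # goes through str.format


def build_prompt(*, search_space_id: int, intent: str) -> str:
    """Format only the pieces that carry placeholders; append the rest verbatim."""
    return (
        HEADER.format(search_space_id=search_space_id)
        + SCHEMA
        + FOOTER.format(intent=intent.strip())
    )


def schema_survives_format() -> bool:
    """Return False if str.format rejects the unescaped JSON braces in SCHEMA."""
    try:
        SCHEMA.format()
    except (KeyError, IndexError, ValueError):
        return False
    return True


if __name__ == "__main__":
    # Braces inside the intent are safe: the intent is a substituted value,
    # not part of the template, so it is not parsed for placeholders.
    print(build_prompt(search_space_id=7, intent="  post {weird} braces to #ops  "))
    print("SCHEMA is safe to format:", schema_survives_format())
```

Keeping the JSON literals out of any formatted string means the Jinja `{{ inputs.X }}` references reach the drafter unchanged, without doubling every brace in the schema and few-shot text.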
--- a/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/index.py
+++ b/surfsense_backend/app/agents/multi_agent_chat/main_agent/tools/index.py
@@ -10,6 +10,7 @@ MAIN_AGENT_SURFSENSE_TOOL_NAMES_ORDERED: tuple[str, ...] = (
    "web_search",
    "scrape_webpage",
    "update_memory",
    "create_automation",
 )
 MAIN_AGENT_SURFSENSE_TOOL_NAMES: frozenset[str] = frozenset(
--- a/surfsense_backend/app/agents/new_chat/tools/registry.py
+++ b/surfsense_backend/app/agents/new_chat/tools/registry.py
@@ -150,6 +150,28 @@ class ToolDefinition:
    reverse: Callable[[dict[str, Any], Any], dict[str, Any]] | None = None
 # =============================================================================
 # Deferred-import factories
 # =============================================================================
 # Used for tools whose impls live under ``multi_agent_chat``. Importing those
 # at module-load time would cycle (``multi_agent_chat`` middleware imports
 # this registry). The import inside the factory runs only when
 # ``build_tools`` is called, by which point ``multi_agent_chat`` is fully
 # initialised.
 def _build_create_automation_tool(deps: dict[str, Any]) -> BaseTool:
    from app.agents.multi_agent_chat.main_agent.tools.automation import (
        create_create_automation_tool,
    )
    return create_create_automation_tool(
        search_space_id=deps["search_space_id"],
        user_id=deps["user_id"],
        llm=deps["llm"],
    )
 # =============================================================================
 # Built-in Tools Registry
 # =============================================================================
@@ -261,6 +283,21 @@ BUILTIN_TOOLS: list[ToolDefinition] = [
        requires=["db_session", "search_space_id", "user_id"],
    ),
    # =========================================================================
    # AUTOMATION AUTHORING - single HITL tool. The tool takes an NL ``intent``
    # from the main agent, drafts the full AutomationCreate JSON via a focused
    # sub-LLM, surfaces it on an approval card, and persists on approval. The
    # factory defers its import because the impl lives under ``multi_agent_chat``
    # and that package transitively pulls this registry via middleware;
    # deferring to ``build_tools`` call-time breaks the cycle without a
    # parallel registry.
    # =========================================================================
    ToolDefinition(
        name="create_automation",
        description="Draft an automation from an NL intent; user approves the card; tool saves",
        factory=_build_create_automation_tool,
        requires=["search_space_id", "user_id", "llm"],
    ),
    # =========================================================================
    # MEMORY TOOL - single update_memory, private or team by thread_visibility
    # =========================================================================
    ToolDefinition(
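
The deferred-import factory used in registry.py can be sketched in isolation. Everything below is a simplified assumption for illustration: `ToolDef`, `REGISTRY`, and this `build_tools` are hypothetical stand-ins (the real `ToolDefinition` and `build_tools` may differ, including how missing dependencies are handled), and the standard-library `json` module plays the role of the `multi_agent_chat` package whose import has to wait until call time.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical, pared-down stand-in for the registry's ToolDefinition."""
    name: str
    factory: Callable[[dict[str, Any]], Any]
    requires: tuple[str, ...] = ()


def _build_dump_tool(deps: dict[str, Any]) -> Callable[[Any], str]:
    # Deferred import: resolved when the factory is called, not when the
    # registry module is first imported. `json` stands in for a package that
    # would otherwise import this registry back and create a cycle.
    import json

    indent = deps["indent"]
    return lambda obj: json.dumps(obj, indent=indent)


REGISTRY: list[ToolDef] = [
    ToolDef(name="dump_json", factory=_build_dump_tool, requires=("indent",)),
]


def build_tools(deps: dict[str, Any]) -> dict[str, Any]:
    """Instantiate every registered tool whose dependencies are present."""
    tools: dict[str, Any] = {}
    for definition in REGISTRY:
        missing = [key for key in definition.requires if key not in deps]
        if missing:
            # Assumption: this sketch fails loudly on missing dependencies.
            raise KeyError(f"{definition.name} is missing deps: {missing}")
        tools[definition.name] = definition.factory(deps)
    return tools


if __name__ == "__main__":
    tools = build_tools({"indent": None})
    print(tools["dump_json"]({"a": 1}))
```

Because the registry holds only a reference to the factory function, importing the registry never touches the tool's implementation module; the import cost and any cycle risk are paid once, inside the first `build_tools` call.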