- Deleted the `search_surfsense_docs` tool and its associated files, streamlining the agent's toolset.
- Updated various components and prompts to remove references to the now-removed tool, ensuring consistency across the codebase.
- Adjusted documentation to direct users to the SurfSense documentation link for product-related queries instead.
- Updated the `mentionParams` function to separate document and connector mentions, improving clarity and organization of the output.
- Modified the `mentionsFromParams` function to correctly handle and categorize mentions from parameters, ensuring connectors are processed separately.
- Adjusted documentation comments for better understanding of the changes in mention handling.
- Replaced the FileJson icon with SquarePen in both AutomationsEmptyState and AutomationsHeader components.
- Updated button label from "Create via JSON" to "Create manually" for clarity in the automation creation process.
- Added support for @-mentions in agent tasks, allowing users to reference documents, folders, and connectors directly in their queries.
- Updated `run_agent_task` to resolve mentions and include them in the context passed to the agent.
- Introduced new parameters in `AgentTaskActionParams` for handling mentioned document and connector IDs.
- Refactored the automation edit and new components to utilize the new `AutomationBuilderForm` for a more streamlined user experience.
- Removed deprecated JSON forms to simplify the automation creation process.
The shared AsyncPostgresSaver caches DB connections in a module-level
pool. Cached connections are bound to the asyncio loop that opened
them, but `run_async_celery_task` discards the loop on each task's
exit — so after the first task the pool holds connections pointing
to a dead loop, and the next automation hangs 30s before failing
with `PoolTimeout: couldn't get a connection after 30.00 sec`.
Swap agent_task to `InMemorySaver`; automation runs only need state
within one Celery task, so nothing is lost. Site-local TODO tracks
the proper future fix (dispose the checkpointer pool around each
Celery task, mirroring `_dispose_shared_db_engine`).
Top-level tests that span multiple submodules:
- test_stores.py (7): the trigger + action registry contracts — register
round-trip, unknown type → None (not raise), duplicate registration
rejected, defensive snapshot from all_*.
- test_definition_types.py (2): params_schema property on both
ActionDefinition and TriggerDefinition reflects the Pydantic model.
- test_persistence_enums.py (3): exact string values + member sets of
AutomationStatus / RunStatus / TriggerType — the postgres-mirrored
contract that breaks stored rows if drifted.
- test_import_registrations.py (2): the bundled agent_task action and
schedule trigger self-register on package import (canary for the
side-effect import chain).
conftest.py adds isolated_action_registry / isolated_trigger_registry
fixtures: snapshot + restore of the module-level _REGISTRY dicts so
tests that add their own definitions don't leak across the suite.
14 tests, pure unit.
auto_decide.build_auto_decisions (3): produces one decision per
action_request entry, defaults to one decision for legacy scalar
interrupts, and skips malformed interrupts silently so a misbehaving
tool can't take down the whole agent_task step.
finalize.extract_final_assistant_message (4): string-content AIMessage
returned verbatim, list-of-parts content concatenated (skipping
non-text parts like tool_use), walks back past trailing ToolMessages
to find the last AIMessage, and returns None when no extractable text
is present (so callers can branch on silence vs. empty).
7 tests, pure unit.
render.py (4): variable substitution, StrictUndefined raises on missing
keys, evaluate_predicate coerces to bool, render_value walks dicts/lists
and renders string leaves.
filters.py (4): slugify produces URL-safe output, date formats datetime
with strftime, date(None) → "" so templates can write
{{ inputs.last_fired_at | date }} on first run, date(str) passes through.
environment.py (4): the sandbox boundary — disallowed Jinja built-ins
(e.g. pprint) raise, and the finalize hook coerces non-string outputs
to predictable wire shapes (datetime → ISO, None → "", dict → JSON).
context.py (1): build_run_context exposes {run, inputs, steps} with the
exact shape every plan template body relies on.
13 tests total, all pure unit.
execute_step (6 tests): happy path, when=falsy → skipped, unknown action
→ ActionNotFound failure, retry budget exhaustion (attempts = 1 +
max_retries), retry recovery, and template-rendering of step params
against the run context.
with_retries (3 tests): first-try success returns attempts=1, recovery
returns the actual attempt that produced the result, and exhaustion
re-raises the last exception with the handler called 1 + max_retries
times.
All tests use backoff="none" to keep wall-clock time zero; timeout
testing is intentionally skipped (would need >= 1s per the int contract,
and exhaustion already locks that any Exception triggers retry).
Cover the input-validation contract dispatch_run relies on:
- no declared schema → inputs pass through unchanged (regression site
that previously stripped runtime keys like fired_at / last_fired_at
and broke Jinja templates).
- declared schema, valid inputs → passthrough validated.
- declared schema, invalid inputs → DispatchError (uniform exception
type, not raw jsonschema.ValidationError).
Plus the DispatchError exception identity (Exception subclass, message
preserved, isinstance-friendly for the dispatch layer's consumers).
4 tests, pure unit.
Cover the cron + IANA timezone + UTC normalization contract for the
schedule trigger: next-match strictly-after, DST offset shift across
spring-forward, malformed cron / unknown timezone rejection, and the
ScheduleTriggerParams Pydantic gate that surfaces InvalidCronError as
ValidationError at the API boundary.
8 tests, pure unit (no DB, no mocks).
Recent runs card under triggers. Each row expands lazily to fetch the
full run (step results, output, artifacts, error). 20-row cap for now;
real pagination lands if usage demands it.
Closes the create loop in chat: the agent describes user intent → the
drafter sub-LLM produces an AutomationCreate JSON → this card surfaces
a structured preview → approve persists; reject cancels. Edits flow
through chat refinement (re-call with a refined intent), not in-card,
so the card stays simple and the multi-turn checkpointer carries the
context.
Tool UI (components/tool-ui/automation/):
- create-automation.tsx — entry dispatcher + ApprovalCard chrome
(pending/processing/complete/rejected via useHitlPhase) + SavedCard
(links to the detail page) + InvalidCard (lists drafter validation
issues) + ErrorCard (verbatim message). Rejection result is hidden
because the approval card itself shows the rejected phase inline.
- automation-draft-preview.tsx — structured preview body: name +
description + goal, triggers (humanised cron + tz + static-input
keys), plan steps (step_id → action), and a collapsible raw JSON
for power users.
Wiring:
- components/tool-ui/index.ts — re-export.
- features/chat-messages/timeline/tool-registry/registry.ts —
register create_automation → CreateAutomationToolUI (dynamic import,
same pattern as other connector tools).
- contracts/enums/toolIcons.tsx — Workflow icon + "Create automation"
display name so fallback chrome (and timeline headers) are honest.
Shared util:
- lib/automations/describe-cron.ts — lifted from the route slice's
lib/ folder since both the dashboard slice and the new approval card
now render schedule descriptions. Slice imports updated; the now-
empty slice lib/ folder is gone.
Backend prompt fragments:
- main_agent/system_prompt/.../create_automation/description.md and
the tool's docstring no longer promise in-card edits. They make the
refinement path explicit: if the user wants changes after seeing the
draft, they reply in chat and the agent calls the tool again with a
refined intent.
v1 deliberately excludes:
- In-card edit form / right-side edit panel — defer until we see real
demand. The chat refinement loop covers the common case.
- approve_always / persistent allow rules — automations are a single
artifact, not a repeated mutation, so the "trust this kind of call"
affordance doesn't apply.
Vertical slice at /dashboard/[id]/automations/[automation_id]. Branches
in the orchestrator are: perms loading → skeleton, no-access → access
denied panel, bad id → not-found, fetch loading → skeleton, fetch
error → not-found, loaded → header + definition + triggers.
Route:
- page.tsx — server boundary; extracts both ids.
- automation-detail-content.tsx — client orchestrator.
Header:
- automation-detail-header.tsx — back link, name, status badge,
description, pause/resume + delete actions. Delete navigates back to
the list via a new onDeleted hook on DeleteAutomationDialog so the
list page (where the row just vanishes) stays unaffected.
- automation-not-found.tsx — 404/403/NaN-id panel. We don't
distinguish missing vs. forbidden in the UI.
Definition (read-only in v1):
- automation-definition-section.tsx — wrapper Card; renders goal +
tags + execution defaults + inputs schema (if present) + plan.
- plan-step-card.tsx — one step (when, output_as, retries, timeout,
params JSON).
- execution-summary.tsx — timeout / max_retries / backoff /
concurrency + on_failure step count.
- inputs-schema-preview.tsx — formatted JSON of inputs.schema; only
rendered when the definition declares inputs.
Triggers:
- automation-triggers-section.tsx — wrapper Card, "Add via chat" CTA
(creation is intent-driven, same philosophy as automations).
- trigger-card.tsx — schedule + timezone + cron, last/next fire
hints, static_inputs JSON, enable Switch and remove button.
- delete-trigger-dialog.tsx — confirm + mutation atom.
Shared:
- lib/describe-cron.ts — moved out of automation-triggers-summary.tsx
so both list and detail can describe schedules consistently
(daily/weekdays/weekly/monthly/hourly, raw cron fallback).
Loading:
- automation-detail-loading.tsx — same shell as the loaded view so the
layout doesn't jump on data arrival.
RBAC: each interactive surface is independently gated
(canUpdate/canDelete/canCreate) so the orchestrator stays thin and the
component tree is self-documenting about what each action requires.
Out of scope (later PRs):
- Editing definition / trigger params (raw-JSON path) — PR5
- Run history — PR6
The empty-state card already hosts the primary "Create via chat" CTA;
keeping the header button on the same screen showed two identical
buttons. Adds an optional ``showCreateCta`` prop to AutomationsHeader
(default true) and turns it off only in the empty branch so the card
stays the focal point.
Adds an "Automations" nav entry rendered explicitly between Inbox and
(on mobile) Documents, mirroring how those two are pulled out of the
nav list and rendered above the chat sections. The icon is Workflow
to match settings/RBAC labelling.
LayoutDataProvider:
- Adds the entry to navItems pointing at /dashboard/[id]/automations.
- Marks isActive via pathname so the row highlights on the route.
- Tags /automations as a workspace-panel page so it renders in the
centered settings-style viewport (same chrome as Team / settings).
Sidebar:
- Pulls out automationsItem alongside inboxItem and documentsItem.
- Renders it between them.
- Excludes its URL from footerNavItems so it doesn't double-render.
Page-level RBAC still gates the actual view; the sidebar entry is
always visible (consistent with Inbox/Documents which are also not
gated at the nav layer).
Anonymous (FreeLayoutDataProvider) intentionally not touched —
automations is an authenticated feature.
Vertical slice at /dashboard/[id]/automations. The page is read-only by
default; every action gates on backend automations:* permissions via a
co-located permissions hook so adding/removing surfaces stays a
one-file change.
Route:
- page.tsx — server boundary; extracts search_space_id.
- automations-content.tsx — client orchestrator (loading / no-access /
error / empty / table branches).
Components (one concern per file):
- automations-header.tsx — title + count + "Create via chat" CTA.
- automations-table.tsx + automation-row.tsx — name/status/updated
columns; row name links to detail (PR4).
- automation-status-badge.tsx — active / paused / archived pill.
- automation-row-actions.tsx — ⋯ menu with pause/resume + delete,
gated on canUpdate / canDelete. Archived rows hide the toggle.
- delete-automation-dialog.tsx — destructive confirm; mentions FK
cascade explicitly so users know triggers/runs go too.
- automations-empty-state.tsx — zero-state pointing to chat (creation
is intent-driven via the create_automation HITL tool, not a form).
- automations-loading.tsx — skeleton rows in the same shell so the
layout doesn't shift on data arrival.
- automation-triggers-summary.tsx — small cron-describer (daily,
weekdays, weekly, monthly, hourly) + timezone for the detail page.
Kept inline since v1 only registers schedule.
Hooks:
- use-automation-permissions.ts — single source of truth for the
slice's canCreate/canRead/canUpdate/canDelete/canExecute gates,
backed by myAccessAtom.
Pause/resume and delete reuse the PR2 mutation atoms, so list +
detail caches stay coherent without bespoke invalidation.
Out of scope (later PRs):
- detail route (definition viewer + triggers manager) — PR4
- raw JSON editor — PR5
- nav entry / sidebar wiring — small follow-up PR
DELETE endpoints in the automations API return 204; calling .json() on
an empty body throws SyntaxError. Treat 204 as data=null and skip
schema validation so callers can opt out of response bodies without
errors or spurious schema-mismatch warnings.
Also drops a pre-existing 'unknown → BodyInit' type error on the
non-JSON body branch via a narrow cast (caller is responsible for
passing a real BodyInit when Content-Type isn't application/json).
Backend already defined automations:create/read/update/delete/execute and
seeded them on Owner/Editor/Viewer roles, but the Settings → Roles UI was
missing the metadata to render them properly.
- backend: add PERMISSION_DESCRIPTIONS entries for the 5 automations perms so
the role editor stops falling back to "Permission for automations:create".
- frontend: add automations to CATEGORY_CONFIG (Workflow icon, slotted between
podcasts and connectors) so the role editor groups them as a real section.
- frontend: extend the three ROLE_PRESETS — Editor and Contributor get
create/read/update/execute (mirroring backend Editor); Viewer gets read.
Prep work for the automations frontend; canPerform/usePermissionGate already
handle the runtime gating, so no new hook is needed.
Single tool exposed to the main agent. The main agent passes a natural-language
`intent`; a focused drafter sub-LLM turns it into a full AutomationCreate JSON;
that JSON is surfaced via request_approval (action_type "automation_create") so
the user can edit/approve it on a frontend card; on approval the tool persists
via AutomationService. Three phases, one tool call.
Scope split:
- main agent sees only `intent: str` (no schema knowledge leaks into the calling
graph) — prompt fragments scoped accordingly.
- drafter sub-LLM owns the schema + few-shot intent→JSON examples — lives in
the generating graph's prompt (tools/automation/prompt.py).
Files:
- main_agent/tools/automation/{create.py, prompt.py, __init__.py}: new tool
+ drafter system prompt with two few-shot intent→JSON examples.
- system_prompt/prompts/tools/create_automation/{description.md, example.md}:
intent-only guidance for the main agent.
- main_agent/tools/index.py: add create_automation to the main-agent allowlist.
- new_chat/tools/registry.py: deferred-import factory to break the
multi_agent_chat ↔ registry cycle; one ToolDefinition entry.
- Added new environment variables for controlling task execution limits, including `SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS`, `SURFSENSE_TASK_BATCH_CONCURRENCY`, and `SURFSENSE_TASK_BATCH_MAX_SIZE`.
- Updated documentation to reflect new batch processing capabilities for `task` calls, allowing for concurrent execution of multiple subagent tasks.
- Improved error handling and receipt generation for deliverables, ensuring consistent feedback on task status.
- Refactored middleware to incorporate search space ID for better task management.
Manual-as-a-standalone-trigger conflates "user clicks Run now" with the
trigger model and forces ad-hoc input plumbing on the caller. Remove the
unreachable surface so the tree reflects reality (schedule is the only
v1 trigger).
- Unregister `manual`: drop import from triggers/__init__.py
- Delete `app/automations/triggers/manual/`
- Drop `RunService.dispatch_manual` (RunService is now read-only)
- Drop `POST /automations/{id}/run` and `RunDispatched` schema
- Keep `TriggerType.MANUAL` Python + PG enum value (reserved, documented)
to avoid an Alembic round-trip when Run-now is redesigned