diff --git a/automation-design-plan.md b/automation-design-plan.md
new file mode 100644
index 000000000..072f7ad99
--- /dev/null
+++ b/automation-design-plan.md
@@ -0,0 +1,1395 @@
+# SurfSense Automation Feature — Design Plan (v2)
+
+A generic, extensible automation system for SurfSense that lets users (and
+future SurfSense features) trigger agent work on a schedule, on an external
+event, or on demand — with the ability to author automations either by hand
+or from a natural-language description that yields an editable, structured
+definition.
+
+This document supersedes the v1 draft. It folds in the design audit pass and
+the corrections from working through worked examples (notably: removing the
+connector bias, clarifying the executor's role, integrating MCP cleanly, and
+committing to JSON Schema as the single declarative language).
+
+---
+
+## 1. The load-bearing principle
+
+> **The JSON definition is the program. Everything else is interpreter.**
+
+Every decision in this document serves that principle. If we ever face a
+design choice and one option lets some behavior leak out of the definition
+into the engine, we pick the other option.
+
+Three properties follow from this principle, and they're the reason the
+system will survive feature growth:
+
+- **Reproducibility** — same definition + same inputs → same observable
+  behavior, regardless of which version of the engine runs it.
+- **Portability** — definitions can be exported, imported, version-
+  controlled, code-reviewed, and shared across SurfSense instances.
+- **LLM tractability** — the NL authoring flow works because the LLM only
+  needs to produce a self-contained JSON document that validates against a
+  schema. It doesn't need to understand the engine.
+
+---
+
+## 2. The four-layer contract
+
+The system is structured as four layers. Layers 1, 2, and 4 are defined by
+SurfSense developers (at registration time). Layer 3 is what users write
+(or the NL generator produces). The runtime reads all four to do its job.
+
+| Layer | What it is | Defined by |
+| ----- | ---------- | ---------- |
+| **1. Capability registry** | What this SurfSense instance can do | Developers, at startup |
+| **2. Action contract** | Per-action input/output schema | Developers, at startup |
+| **3. Automation definition** | One concrete saved automation | Users (or NL generator) |
+| **4. Trigger contract** | Per-trigger config and payload schemas | Developers, at startup |
+
+Each layer constrains the one above. The runtime reads all four but doesn't
+know what's in them ahead of time. That's how a new capability or trigger
+type becomes available across the engine without code changes outside its
+registration.
+
+### Schema language
+
+Every shape in every layer is described in **JSON Schema (draft 2020-12).**
+No exceptions, no parallel languages, no inline shorthand. Two documented
+extensions on top:
+
+- `default: "$some_token"` — runtime-resolved defaults. The vocabulary is
+  fixed: `$last_fired_at`, `$creator`, `$space_default`. The engine resolves
+  these to values before validation.
+- `x-surfsense-*` annotations — editor hints (widget type, autocomplete
+  source). The validator ignores them; the form editor reads them.
+
+---
+
+## 3. Capability registry (Layer 1)
+
+A `Capability` is one discrete thing the SurfSense backend exposes —
+"post a Slack message," "query the Search Space," "generate a podcast." It
+is the atomic unit of "things automations can do."
+
+```python
+@dataclass
+class Capability:
+    id: str                                # "slack.post_message"
+    name: str                              # "Post Slack message"
+    description: str                       # for the NL generator
+    input_schema: dict                     # JSON Schema
+    output_schema: dict                    # JSON Schema
+    required_credentials: list[CredSpec]   # what creds the handler needs
+    side_effects: set[SideEffect]          # READ, WRITE, EXTERNAL_WRITE,
+                                           # COST_INCURRING, USER_VISIBLE
+    expected_duration_seconds: int         # estimate or upper bound
+    cost_estimate: Callable[[dict], Decimal]  # f(input) → estimated USD
+    handler: AsyncHandler
+```
+
+### Where capabilities live: a two-tier registry
+
+The capability registry has different storage requirements for different
+kinds of capabilities. **Native capabilities and MCP capabilities have
+different lifecycles**, so they're persisted differently:
+
+| Tier | What's there | Where it lives | Lifetime |
+| --- | --- | --- | --- |
+| **Native** | Capabilities defined in SurfSense's codebase (`search_space.query`, `agent.run`, etc.) | In-memory dict, populated at startup from `automations/capabilities/native.py` | Process lifetime, identical across all workers |
+| **MCP (durable)** | The fact that this SearchSpace has connected to this MCP server, the tool list it exposes, credentials | PostgreSQL: `mcp_connections` and `mcp_tools` tables | Persistent across restarts and across time |
+| **MCP (cached)** | Handler closures wrapping `(connection_id, tool_name)` | Per-worker in-memory cache, lazily built from the database on first reference | Process lifetime, rebuilt on demand |
+
+The reason this matters: **a user connects an MCP server on Monday, writes
+an automation on Tuesday, the automation runs on Friday.** Between Monday
+and Friday, workers will restart many times. Any state that only lives in
+worker memory is gone. The closures generated at connection time would
+not survive.
+
+So we split persistence by lifecycle:
+
+- Native capability handlers live in the codebase. Always available, no
+  need for the database.
+- MCP capability metadata lives in the database, so the knowledge "this
+  SearchSpace has these capabilities" survives any restart.
+- The actual closures are built on demand from the database state. They
+  live in worker memory only until the worker dies, at which point they
+  get rebuilt by the next worker that needs them.
+
+### MCP database schema
+
+```sql
+CREATE TABLE mcp_connections (
+    id                UUID PRIMARY KEY,
+    search_space_id   INT REFERENCES search_spaces(id),
+    server_url        TEXT,
+    transport         TEXT,                  -- "http", "stdio", etc.
+    name              TEXT,                  -- "Slack (Acme workspace)"
+    access_token      BYTEA,                 -- encrypted at rest
+    refresh_token     BYTEA,                 -- encrypted at rest
+    expires_at        TIMESTAMPTZ,
+    last_harvested_at TIMESTAMPTZ,
+    created_at        TIMESTAMPTZ,
+    created_by        INT REFERENCES users(id)
+);
+
+CREATE TABLE mcp_tools (
+    id              UUID PRIMARY KEY,
+    connection_id   UUID REFERENCES mcp_connections(id) ON DELETE CASCADE,
+    name            TEXT,                    -- "post_message"
+    description     TEXT,
+    input_schema    JSONB,
+    output_schema   JSONB,
+    side_effects    TEXT[],                  -- inferred or admin-curated
+    UNIQUE (connection_id, name)
+);
+```
+
+### MCP lifecycle: connect, harvest, invoke
+
+Three phases, each with distinct concerns.
+
+**Phase 1 — Connect (one-time, on user action).** User clicks "Connect
+Slack MCP." OAuth flow completes. A row is added to `mcp_connections`
+with the encrypted tokens.
+
+**Phase 2 — Harvest (right after connect, also re-runnable).** SurfSense
+opens a temporary client to the MCP server, calls `tools/list`, and writes
+one row to `mcp_tools` per discovered tool. The temporary client is then
+discarded; only the database state persists.
+
+```python
+async def harvest_mcp_server(connection_id: UUID, ctx):
+    connection = await ctx.db.get(MCPConnection, connection_id)
+    client = build_temporary_client(connection)
+    tools = await client.list_tools()
+    
+    # Replace existing tool rows for this connection
+    await ctx.db.execute(
+        delete(MCPTool).where(MCPTool.connection_id == connection_id)
+    )
+    for tool in tools:
+        ctx.db.add(MCPTool(
+            connection_id=connection_id,
+            name=tool.name,
+            description=tool.description,
+            input_schema=tool.inputSchema,
+            output_schema=tool.outputSchema,
+            side_effects=infer_side_effects(tool),
+        ))
+    connection.last_harvested_at = now()
+    await ctx.db.commit()
+```
+
+Harvesting can be re-run on a schedule (say, daily) or on user request,
+to pick up new tools the server has added.
+
+**Phase 3 — Invoke (every time a step references an MCP capability).**
+This is where the closure gets built. The executor calls
+`ctx.get_capability("slack.post_message")`. The worker's in-memory cache is
+checked; on miss, the database is queried:
+
+```python
+async def get_capability(capability_id: str, ctx: ActionContext) -> Capability:
+    cached = _WORKER_CAPABILITY_CACHE.get((ctx.search_space.id, capability_id))
+    if cached:
+        return cached
+    
+    if is_native(capability_id):
+        capability = _NATIVE_REGISTRY[capability_id]
+    else:
+        # MCP path: look up tool metadata
+        tool_row = await ctx.db.execute(
+            select(MCPTool)
+            .join(MCPConnection)
+            .where(MCPConnection.search_space_id == ctx.search_space.id)
+            .where(tool_qualified_name(MCPTool, MCPConnection) == capability_id)
+        )
+        capability = Capability(
+            id=capability_id,
+            input_schema=tool_row.input_schema,
+            output_schema=tool_row.output_schema,
+            side_effects=set(tool_row.side_effects),
+            handler=make_mcp_handler(
+                connection_id=tool_row.connection_id,
+                tool_name=tool_row.name,
+            ),
+        )
+    
+    _WORKER_CAPABILITY_CACHE[(ctx.search_space.id, capability_id)] = capability
+    return capability
+```
+
+The closure created by `make_mcp_handler` captures only the connection ID
+and tool name. When invoked, it asks `ctx.resolve_mcp_client(connection_id)`
+to build an authenticated client from the connection record (including
+token refresh if needed). That client is also transient — built per call,
+discarded after.
+
+### Credentials: resolved at the moment of use
+
+The handler doesn't carry credentials and the closure doesn't capture them.
+When invoked, the handler asks `ActionContext` for what it needs:
+
+```python
+def make_mcp_handler(connection_id: UUID, tool_name: str):
+    async def handler(ctx: ActionContext, args: dict) -> Any:
+        # Credential resolution happens here, per call
+        client = await ctx.resolve_mcp_client(connection_id)
+        response = await client.call_tool(name=tool_name, arguments=args)
+        return response.content
+    return handler
+```
+
+`ctx.resolve_mcp_client(connection_id)`:
+1. Loads the `mcp_connections` row
+2. Decrypts the access token
+3. Refreshes the token if it's expired (using the refresh token)
+4. Constructs an `MCPClient` with the token set as a default authorization
+   header
+
+The HTTP library carries the auth header on every subsequent call the
+client makes — the handler doesn't think about it after construction.
+
+For native capabilities calling external APIs directly,
+`ctx.resolve_http_client(provider)` returns an authenticated `httpx`
+client. For LLM operations, `ctx.resolve_llm(provider)` returns a
+configured LLM client. **Three resolution methods, one pattern: the
+context returns a client already authenticated.**
+
+Three properties this gives us:
+
+- **Credentials never appear in the automation definition.** The JSON
+  contains capability references and connection IDs, never tokens.
+- **Credentials never appear in the LLM's context.** Even during
+  `agent_task`, the LLM sees tool descriptions only; the host holds
+  credentials and uses them when executing the tools the LLM requests.
+- **Credentials are loaded per-call, not pre-loaded.** The credential
+  exists in memory only during the moment a handler is making a call. No
+  long-lived secrets in worker memory.
+
+---
+
+## 4. Action contract (Layer 2)
+
+An `Action` is what a user references in a plan step. Most actions are
+thin wrappers around one capability (e.g., `slack_post` wraps
+`slack.post_message`). Some compose: `agent_task` is one action whose
+handler invokes the LangGraph runtime, which in turn can call many
+capabilities.
+
+```python
+@dataclass
+class ActionDefinition:
+    type: str                              # "agent_task", "slack_post"
+    name: str                              # for the UI
+    description: str                       # for the NL generator
+    config_schema: dict                    # JSON Schema for action.config
+    output_contract: dict | DynamicOutput  # what it produces
+    uses_capabilities: list[str]           # IDs from the registry
+    produces_artifacts: list[ArtifactSpec] # see §8
+    handler: AsyncHandler
+```
+
+### Tight vs loose actions
+
+Two patterns coexist by design:
+
+- **Tight actions** (`slack_post`, `linear_create_issue`, `send_email`):
+  config_schema is fully specified, output_contract is fixed, handler is a
+  thin wrapper. ~20 LOC each. Used when the user knows exactly what they
+  want done — no LLM tokens spent on trivial work.
+
+- **Loose actions** (`agent_task`): config_schema accepts a `prompt` and a
+  `tools` allowlist; output_contract is *dynamic* — the user declares the
+  output shape they want via `output_schema` in the step config; the
+  handler asks the LLM to return that shape and validates. Used when
+  judgment is needed.
+
+The agent's tool list is **the same capabilities** that tight actions call
+directly. One registry, two invocation modes. Adding a new MCP server gives
+both modes access to its tools automatically.
+
+### How names in the definition become function calls
+
+The definition contains strings like `"action": "slack_post"`. The string is
+just a name — it does not point to a function. At runtime, the executor
+performs a **name-based lookup** against the action registry:
+
+```python
+# step.action is a string from the JSON definition, e.g. "slack_post"
+action_def = _ACTION_REGISTRY[step.action]   # dict lookup
+handler = action_def.handler                  # Python callable
+result = await handler(ctx, resolved_config)  # invocation
+```
+
+The registry is a Python dict (or a thin wrapper around one) populated at
+process startup. Each entry in `automations/actions/*.py` calls a
+`register_action(...)` function at module import time, putting its
+`ActionDefinition` (including the handler function reference) into the
+registry.
+
+The same pattern applies to capabilities. The definition references
+capabilities by ID (`"slack.post_message"`); the capability registry maps
+the ID to a `Capability` object holding the handler. Definitions never
+reference Python code directly — they reference names that the registry
+resolves to code.
+
+This separation is what makes the contract portable. The definition is
+pure data. The registry is the engine's runtime vocabulary. They meet at
+name-based lookup; nothing else crosses the boundary.
+
+### The full expressive spectrum
+
+The contract supports a continuous spectrum from purely deterministic to
+fully agentic. Six practical shapes worth recognizing:
+
+| Shape | Example | Cost / latency profile |
+| --- | --- | --- |
+| **1. Direct call** | `slack_post` with literal channel and template | No LLM. ~200ms. Fractions of a cent. |
+| **2. Direct call with computed inputs** | `linear_create_issue` using `{{summary.title}}` from a prior step | No LLM for this step. Cheap. |
+| **3. Single-domain agent task** | `agent_task` with `tools: ["slack.*"]` only | One LLM, bounded toolset. |
+| **4. Multi-domain agent task, narrow** | `agent_task` with `tools: ["github.list_pull_requests", "linear.create_issue"]` | One LLM, named capabilities. |
+| **5. Multi-domain agent task, broad** | `agent_task` with `tools: ["slack.*", "github.*", "linear.*"]` | One LLM, large toolset, most agentic. |
+| **6. Composed plan** | `agent_task` (narrow) for thinking → `slack_post` + `linear_create_issue` for acting | Best cost-to-power ratio. |
+
+Shape 6 is the underrated one and the cost-and-speed answer. The agent
+reasons once (Shape 3 or 4) and its structured output drives several
+deterministic actions. This is roughly 5–10x cheaper and 3–4x faster than
+forcing the agent to do everything (Shape 5) and produces the same outcome.
+
+**The NL generator's job is to propose Shape 6-style plans by default.**
+The Review LLM flags proposals that use `agent_task` for steps a
+deterministic action could handle. This is the discipline that keeps
+automations cheap at scale.
+
+The user navigates the spectrum by intent (describing what they want), not
+by mechanism — the shape selection is the engine's responsibility, not the
+user's.
+
+---
+
+## 5. Automation definition (Layer 3)
+
+This is the JSON the user writes (or the NL generator produces). Stored in
+`automations.definition` as JSONB.
+
+### Top-level shape
+
+```jsonc
+{
+  "schema_version": "1.0",
+  "name": "Daily competitor digest",
+  "goal": "Summarize new competitor content and post to Slack",
+
+  "inputs": {
+    "schema": {
+      "type": "object",
+      "required": ["since"],
+      "properties": {
+        "since": { "type": "string", "format": "date-time",
+                   "default": "$last_fired_at" },
+        "tags":  { "type": "array", "items": { "type": "string" },
+                   "default": ["competitor"] }
+      }
+    }
+  },
+
+  "triggers": [
+    {
+      "type": "schedule",
+      "config": { "cron": "0 9 * * 1-5", "timezone": "Africa/Kigali" }
+    }
+  ],
+
+  "plan": [
+    {
+      "step_id": "research",
+      "action": "agent_task",
+      "config": {
+        "prompt": "Find documents tagged {{inputs.tags}} indexed since {{inputs.since}}. Return JSON with bullets and source_doc_ids.",
+        "tools": ["search_space.query", "search_space.fetch_document"],
+        "model": "anthropic/claude-sonnet-4-7",
+        "output_schema": {
+          "type": "object",
+          "required": ["bullets", "source_doc_ids"],
+          "properties": {
+            "bullets":        { "type": "array", "items": { "type": "string" } },
+            "source_doc_ids": { "type": "array", "items": { "type": "string" } }
+          }
+        }
+      },
+      "output_as": "summary"
+    },
+    {
+      "step_id": "deliver",
+      "action": "slack_post",
+      "config": {
+        "channel_id": "C0123",
+        "message_template": "*Competitor digest*\n\n{% for b in summary.bullets %}• {{b}}\n{% endfor %}"
+      }
+    }
+  ],
+
+  "execution": {
+    "timeout_seconds": 600,
+    "max_retries": 2,
+    "retry_backoff": "exponential",
+    "concurrency": "drop_if_running",
+    "budget_cap_usd": 1.50,
+    "on_failure": [ /* steps to run if main plan fails after retries */ ]
+  },
+
+  "metadata": { "tags": ["digest"], "created_from_nl": true }
+}
+```
+
+### Plan steps
+
+```jsonc
+{
+  "step_id": "...",                      // unique within plan
+  "action": "...",                       // references an ActionDefinition.type
+  "when": "{{ ... }}",                   // optional Jinja expr → bool; false = skip
+  "config": { ... },                     // validated against action's config_schema
+  "output_as": "...",                    // binds output to this name for later steps
+  "max_retries": 0,                      // optional, overrides automation default
+  "timeout_seconds": 1200                // optional, overrides automation default
+}
+```
+
+Steps run **sequentially**. No parallelism, no DAGs, no loops. If a user
+needs branching, they use `when:` on multiple steps. If they need
+parallelism or iteration, they use `agent_task` and let the agent reason
+about it, or they compose automations through events (§7.5).
+
+---
+
+## 6. Trigger contract (Layer 4)
+
+Three trigger types. That's the entire taxonomy.
+
+### `schedule`
+
+```python
+TriggerDefinition(
+    type="schedule",
+    config_schema={
+        "type": "object",
+        "required": ["cron", "timezone"],
+        "properties": {
+            "cron":     { "type": "string" },
+            "timezone": { "type": "string", "format": "iana-timezone" }
+        }
+    },
+    payload_schema={
+        "type": "object",
+        "properties": {
+            "fired_at":      { "type": "string", "format": "date-time" },
+            "scheduled_for": { "type": "string", "format": "date-time" },
+            "last_fired_at": { "type": "string", "format": "date-time" }
+        }
+    }
+)
+```
+
+Implementation: extends `app/utils/periodic_scheduler.py`, which already
+reads connector sync schedules. Adds a second source — `automation_triggers
+WHERE type='schedule'`. Same Celery Beat checker, two source tables.
+
+Minimum interval: 1 minute (the existing checker's resolution). The form
+editor warns when users set intervals under 15 minutes that they probably
+want an event trigger instead.
+
+### `webhook`
+
+```python
+TriggerDefinition(
+    type="webhook",
+    config_schema={
+        "type": "object",
+        "properties": {
+            "input_mapping": {
+                "type": "object",
+                "additionalProperties": { "type": "string" }
+                # values are JSONPath expressions
+            }
+        }
+    },
+    # payload is whatever the POST body is; user-defined shape via mapping
+)
+```
+
+Endpoint: `POST /api/v1/automations/{id}/fire`. Bearer token shown once,
+hashed at rest, rotatable, revocable. Returns `202 Accepted` with the
+created run's URL. Caller polls for status; we do not push callbacks in
+v1 (a `callback_webhook` action can be added later).
+
+Idempotency: honors `Idempotency-Key` header or `idempotency_key` in body.
+Dedups against runs in the last 24 hours.
+
+### `event`
+
+```python
+TriggerDefinition(
+    type="event",
+    config_schema={
+        "type": "object",
+        "required": ["event_type"],
+        "properties": {
+            "event_type": { "type": "string" },   # e.g. "drive.file_added"
+                                                   # or "surfsense.podcast.generated"
+            "filters":    { "$ref": "#/definitions/filter_expression" }
+        }
+    }
+    # payload shape is documented per event_type in a separate registry
+)
+```
+
+**Events absorb both connector events and internal SurfSense events.** A
+file added to Drive and a podcast finishing in SurfSense are both events
+in the same `domain_events` table, both subscribable by automations, both
+matched by the same dispatcher code. The engine doesn't distinguish.
+
+### Filter grammar
+
+Filters are JSON-structured operators, not expressions. This is the one
+place we deliberately don't use Jinja, because filters run on a hot path
+(every event matched against every subscribing trigger) and structured
+filters can be indexed and short-circuited.
+
+Vocabulary:
+- Equality: `equals`, `not_equals`
+- String: `starts_with`, `ends_with`, `contains`, `regex`
+- Numeric: `gt`, `gte`, `lt`, `lte`
+- Set: `in`, `not_in`
+- Existence: `exists`
+- Composition: `$and`, `$or`, `$not`
+
+Inspired by AWS EventBridge and MongoDB query syntax. The filter grammar
+itself is published as a JSON Schema, so users get inline error messages.
+
+---
+
+## 7. Runtime components
+
+Each component is distinct, replaceable, and has one job.
+
+### 7.1 Dispatcher
+
+What it does: matches firing triggers to automations, creates `AutomationRun`
+rows, enqueues executor tasks.
+
+For schedule triggers: Celery Beat polls the trigger table, computes due
+ones, fires.
+
+For webhook triggers: the FastAPI handler is the dispatcher entry point.
+Validates token, runs input_mapping, creates run.
+
+For event triggers: subscribes to the `domain_events` table. For each new
+event, evaluates all matching triggers' filters, fires the matches.
+
+Common path (after a trigger has fired):
+1. Resolve `inputs` from trigger payload and defaults
+2. Validate resolved inputs against the automation's input schema
+3. **Cost estimate** — sum capabilities' `cost_estimate(args)` for the plan;
+   refuse if exceeds `budget_cap_usd`
+4. **Idempotency check** — dedup against existing pending/running runs
+5. **Snapshot the resolved definition** into the run row (immutable history)
+6. Enqueue executor task on the appropriate Celery queue (per
+   `expected_duration_seconds`)
+
+### 7.2 Executor
+
+What it is: **a Celery task wrapping a single function that walks a plan
+step by step.** Not an agent, not a workflow engine, not a scheduler. A
+loop with bookkeeping. Maybe 200 lines.
+
+```python
+async def execute_run(run_id: int) -> None:
+    run = load_run(run_id); run.status = "running"; save(run)
+    context = build_run_context(run)
+    step_outputs = {}
+
+    for step in run.plan:
+        if step.when and not evaluate_predicate(step.when, context | step_outputs):
+            record_step_skipped(run, step); continue
+
+        resolved_config = render_config(step.config, context | step_outputs)
+        action = action_registry.get(step.action)
+        validate(resolved_config, action.config_schema)
+
+        try:
+            result = await with_retries(
+                action.handler,
+                ctx=build_action_context(run, action),
+                args=resolved_config,
+                policy=step.retry_policy or run.execution.retry_policy,
+            )
+            validate(result, step.output_schema)
+            if step.output_as:
+                step_outputs[step.output_as] = result
+            record_step_succeeded(run, step, result)
+        except Exception as e:
+            record_step_failed(run, step, e)
+            await run_on_failure(run, e)
+            return
+
+    run.status = "succeeded"; save(run)
+    publish_event("automation.run.succeeded", run)   # see §7.5
+```
+
+Intelligence lives **inside handlers**, not in the executor. The most
+intelligent handler is `agent_task`, which spins up a LangGraph Deep Agent
+for one step and returns when the agent finishes. The executor sees a
+validated dict come back; it doesn't know that step was "smart."
+
+### 7.3 Action handlers
+
+One handler per `ActionDefinition.type`. Receives `(ctx, args)`, returns
+a dict matching `output_contract` (or matching the user-declared
+`output_schema` for dynamic-output actions like `agent_task`).
+
+Handlers handle their own credential resolution via `ctx.resolve_credentials`.
+They do not know about retries, timeouts, or budget caps — those are the
+executor's concern.
+
+### 7.4 Template engine
+
+#### Why it exists
+
+Most fields in an automation definition contain literal strings the user
+authored once — but the actual rendered value has to change per run, because
+it includes data from the trigger payload or from prior step outputs. The
+template engine is what turns `"Daily digest for {{run.started_at}}"` into
+`"Daily digest for 2026-05-26"` at run time.
+
+Three fields use it:
+- `*_template` strings in tight action configs (Slack messages, email bodies,
+  Linear titles, etc.)
+- `prompt` in `agent_task` configs (so the agent sees resolved values, not
+  `{{...}}` placeholders)
+- `when:` step predicates (which need to evaluate to a boolean)
+
+#### Public interface
+
+Single module, ~80 lines. Three public functions — everything else in the
+engine routes through these:
+
+```python
+def render_template(template: str, context: dict) -> str: ...
+def evaluate_predicate(expression: str, context: dict) -> bool: ...
+def build_run_context(run, step_outputs) -> dict: ...
+```
+
+Backed by Jinja2's `SandboxedEnvironment`. The whole module is the seam: if
+the template language is ever swapped, only this file changes.
+
+#### Security architecture: allowlist by default
+
+`SandboxedEnvironment` starts empty. A freshly-created instance gives a
+template access to:
+- Variables in the context dict we pass in (`run`, `inputs`, prior step
+  outputs)
+- Public (non-underscore) attributes of those variables
+- Jinja's built-in control flow (`{% if %}`, `{% for %}`, `{% set %}`)
+
+Nothing else. No Python builtins, no modules, no I/O, no network, no
+filesystem. Everything beyond the above must be **explicitly registered.**
+This is the structurally important property: anything we didn't add is
+inaccessible. The risk surface equals the size of what we registered.
+
+The three sandbox rules that enforce this:
+1. **Attribute access is filtered** — names starting with underscore are
+   rejected. This blocks the entire family of `{{x.__class__.__mro__...}}`
+   Python escape paths in one rule.
+2. **Globals are allowlist-only** — `open`, `eval`, `exec`, `__import__`,
+   `getattr`, every module name, are all absent unless we register them.
+   We register zero globals.
+3. **Unsafe callables are blocked** — `str.format` and `str.format_map`
+   specifically (due to CVE-2016-10745), plus anything marked
+   `unsafe_callable`.
+
+#### What we register, exactly
+
+- **Filters: a curated 15**, no more. `join`, `length`, `default`, `upper`,
+  `lower`, `truncate`, `tojson`, `date`, `replace`, `trim`, `slugify`,
+  `first`, `last`, `sort`, `reverse`. Each one is audited for what it does
+  with its input; none of them takes a callable, runs `eval`, or reaches
+  into Python objects beyond simple data transformation.
+- **Globals: none.**
+- **Tests: only the safe built-ins** (`defined`, `none`, `number`, `string`,
+  `mapping`, `sequence`, `boolean`).
+
+Adding a new filter requires a deliberate code change and review: does this
+filter do anything dangerous with its input? If yes, don't add it. The list
+only grows by audited additions.
+
+#### Runtime limits (defense in depth)
+
+The sandbox handles the attack surface inside the template language. Three
+additional limits handle resource exhaustion that the language permits but
+the runtime shouldn't tolerate:
+
+- **Template source length capped at 8 KB.** Checked before parsing.
+- **Render time capped at 100 ms per render.** Implemented via a watchdog
+  thread; renders that exceed are killed and the step fails. Catches
+  `{% for i in range(10**9) %}` and nested loop bombs.
+- **Output size capped at 1 MB.** A small template can produce a multi-GB
+  string via `{{ 'A' * 10**8 }}`-style multiplication; this catches it.
+
+Plus `StrictUndefined`: any reference to a missing variable raises
+immediately rather than silently rendering empty, so misconfigurations
+fail fast.
+
+#### Threat model and residual risk
+
+The trust model from day one is:
+
+- Templates are generated by an LLM from a user's natural-language input
+  (see §10), or written/edited by humans in the editable form
+- A second LLM reviews the proposal and produces a plain-language summary
+  plus flagged anomalies for the user
+- The user reviews and approves before the automation runs
+- The Generator LLM's input is scoped (user prompt + schema + registry
+  only — no arbitrary document content), minimizing prompt-injection paths
+
+The sandbox + runtime limits + curated filter list protect against the
+malformed-template attack. Human review protects against the
+semantically-malicious-but-syntactically-valid attack. These are
+complementary layers, not redundant.
+
+Known residual risks, each genuinely small:
+
+- **Future Jinja CVEs.** Historical sandbox bypasses have existed and
+  been patched. This is a generic third-party-dependency risk, comparable
+  to bugs in any other library we rely on. Mitigation: subscribe to
+  security advisories, ship updates within a week of disclosure.
+- **Side channels via prompts to LLMs.** A template that renders into a
+  prompt can attempt prompt injection of the agent at run time. This is
+  not a sandbox concern but a separate concern in `agent_task`'s design.
+- **Operator deployments with long-lived secrets in worker env vars.**
+  Mitigation: credentials fetched per-handler-per-call via
+  `ActionContext.resolve_credentials`, never pre-loaded into worker
+  env vars accessible to templates.
+
+The sandbox-with-allowlist architecture means **the attack surface
+equals the set of things we registered.** With zero globals registered
+and 15 audited filters, the surface is small, bounded, and reviewable.
+This is the structural property that makes the architecture sound, and
+it doesn't depend on hypothetical assumptions about who authors templates.
+
+#### Pre-Phase-5 gate
+
+One trust-model change is documented in the roadmap: **Phase 5 introduces
+template sharing across SearchSpaces** (automation templates as
+exportable, importable artifacts). At that point, the *approver* of a
+template (the original author) is no longer the *runner* (the importer).
+The "human reviews before save" mitigation breaks down because the
+reviewer doesn't bear the risk.
+
+Before Phase 5 ships, this needs an explicit re-approval flow: importing
+a template triggers a fresh review pass by the importing user, with the
+flagged-anomalies output prominently displayed, and the import cannot
+complete without explicit per-template approval.
+
+This is a UX/flow decision, not a template-language migration. Jinja
+itself stays; what changes is the approval workflow at the import boundary.
+
+#### The `run.*` namespace exposed in every template
+
+```
+run.id, run.started_at, run.automation_id, run.automation_name,
+run.automation_version, run.trigger_type, run.trigger_id,
+run.search_space_id, run.creator_id, run.attempt,
+run.failed_step_id, run.error.*   (only in on_failure context)
+```
+
+#### Default value rendering
+
+Non-string template values render as JSON by default (via the `finalize`
+hook): lists become `["a", "b"]`, dicts become `{"k": "v"}`, datetimes
+become ISO 8601. The `| join`, `| length`, `| tojson` filters give explicit
+control. Strings render as themselves with no quoting. `None` renders as
+empty string in templates, as `null` in JSON contexts.
+
+### 7.5 Event bus
+
+`domain_events` table, polled by Celery Beat alongside the existing
+scheduler. Both connector events and internal SurfSense events publish to
+it. Both are consumed by the dispatcher's event-trigger subscriber.
+
+**Automations themselves publish events.** Successful and failed runs emit
+`automation.run.succeeded` / `automation.run.failed` events with the run
+metadata. This makes automations composable through events — chain them by
+subscribing one automation's event trigger to another's run event. No new
+mechanism; the trigger filter and event publishing already exist.
+
+Upgrade path documented: when throughput or latency demands it, replace
+PostgreSQL polling with Redis Streams. The `events.publish()` and
+`events.subscribe()` interfaces stay the same. Nothing else changes.
+
+---
+
+## 8. Cross-cutting concerns
+
+### Concurrency policy
+
+Per-automation `concurrency` field controls what happens when a new fire
+occurs while a previous run is still running:
+
+- `drop_if_running` — silently skip the new fire
+- `queue` — execute serially, in arrival order
+- `allow_parallel` — start a new run independently
+
+The dispatcher enforces this before enqueueing.
+
+### Retry policy
+
+Three fields, per-automation defaults with optional per-step overrides:
+- `max_retries`: integer, 0–10
+- `retry_backoff`: `none` | `linear` | `exponential`
+- `timeout_seconds`: integer
+
+Retries on:
+- Capability handler exceptions
+- Output schema validation failures (for dynamic-output actions, the
+  validation error is fed back to the LLM in the retry)
+
+Not retries:
+- `when:` evaluation failures (these are user errors, surface immediately)
+- Input validation failures (caught at dispatch, never reach the executor)
+
+### Budget enforcement
+
+`budget_cap_usd` is per-run. The dispatcher refuses to enqueue if estimated
+cost exceeds it. The executor kills the run if accumulated cost crosses it
+mid-flight (the LLM ops handler reports tokens consumed back to the
+executor between calls).
+
+### On-failure handlers
+
+`execution.on_failure` is a list of steps that run after the main plan has
+failed and all retries are exhausted. Same step shape as the main plan.
+Cannot have their own `on_failure`. See `run.error.*` in the run context.
+
+### Artifacts
+
+Actions that produce artifacts declare `produces_artifacts: list[ArtifactSpec]`:
+
+```python
+@dataclass
+class ArtifactSpec:
+    kind: str           # "audio", "document", "image", "data"
+    retention: str      # "transient" | "default" | "permanent"
+    visibility: str     # "private" | "search_space" | "shared"
+```
+
+The engine handles storage (writes to SurfSense's existing object storage),
+URL generation (signed, scoped to the run's permissions), and cleanup (a
+nightly Celery Beat task deletes expired artifacts).
+
+### Duration classes and queue routing
+
+Capabilities declare `expected_duration_seconds`. The dispatcher routes
+runs to Celery queues based on the longest-duration step:
+- < 10s → `automations_fast`
+- 10s – 5min → `automations_medium`
+- 5min – 1hr → `automations_long`
+
+Operators scale each queue's worker pool independently. A future "very
+long" queue is a config change, not a contract change.
+
+---
+
+## 9. Data model
+
+Six tables. All scoped by `search_space_id` for RBAC.
+
+The first four (`automations`, `automation_triggers`, `automation_runs`,
+`domain_events`) are the engine's own state. The last two
+(`mcp_connections`, `mcp_tools`) hold the durable knowledge that backs
+MCP-derived capabilities — see §3 for the lifecycle rationale.
+
+### `automations`
+
+| field             | type                                | notes                                                                      |
+| ----------------- | ----------------------------------- | -------------------------------------------------------------------------- |
+| `id`              | int PK                              |                                                                            |
+| `search_space_id` | FK → `search_spaces.id`             |                                                                            |
+| `created_by`      | FK → `users.id`                     | runs execute as this identity                                              |
+| `name`            | str                                 |                                                                            |
+| `description`     | str                                 |                                                                            |
+| `status`          | enum                                | `active`, `paused`, `archived`                                             |
+| `definition`      | jsonb                               | the editable structured spec                                               |
+| `version`         | int                                 | bumped on every edit                                                       |
+| `created_at` / `updated_at` | timestamps                |                                                                            |
+
+### `automation_triggers`
+
+| field           | type                                                                          | notes                                       |
+| --------------- | ----------------------------------------------------------------------------- | ------------------------------------------- |
+| `id`            | int PK                                                                        |                                             |
+| `automation_id` | FK                                                                            |                                             |
+| `type`          | enum: `schedule`, `webhook`, `event`                                          |                                             |
+| `config`        | jsonb                                                                         | validated against trigger's `config_schema` |
+| `enabled`       | bool                                                                          |                                             |
+| `secret_hash`   | str / null                                                                    | for webhook bearer tokens                   |
+| `last_fired_at` | timestamp                                                                     |                                             |
+
+### `automation_runs`
+
+| field             | type                                                                         | notes                                              |
+| ----------------- | ---------------------------------------------------------------------------- | -------------------------------------------------- |
+| `id`              | int PK                                                                       |                                                    |
+| `automation_id`   | FK                                                                           |                                                    |
+| `trigger_id`      | FK / null                                                                    | null = manual via UI                               |
+| `status`          | enum                                                                         | `pending`, `running`, `succeeded`, `failed`, `cancelled`, `timed_out` |
+| `definition_snapshot` | jsonb                                                                    | the definition as it was when this run fired       |
+| `trigger_payload` | jsonb                                                                        |                                                    |
+| `resolved_inputs` | jsonb                                                                        |                                                    |
+| `step_results`    | jsonb                                                                        | array of per-step results with timing              |
+| `output`          | jsonb / null                                                                 |                                                    |
+| `artifacts`       | jsonb                                                                        | references to created artifacts                    |
+| `error`           | jsonb / null                                                                 |                                                    |
+| `cost_usd`        | decimal                                                                      | accumulated cost                                   |
+| `started_at` / `finished_at` | timestamps                                                        |                                                    |
+| `agent_session_id`| str / null                                                                   | link to LangGraph trace if agent_task was used     |
+
+### `domain_events`
+
+| field             | type        | notes                                              |
+| ----------------- | ----------- | -------------------------------------------------- |
+| `id`              | UUID PK     |                                                    |
+| `search_space_id` | FK          | scoping                                            |
+| `event_type`      | varchar     | e.g. `drive.file_added`, `automation.run.succeeded` |
+| `source_id`       | varchar     | which connector/automation/etc. produced it        |
+| `payload`         | jsonb       | matches the event type's documented schema         |
+| `created_at`      | timestamp   |                                                    |
+| `consumed_by`     | jsonb       | array of consumer_ids, for tracking + replay       |
+| `expires_at`     | timestamp   | auto-cleanup after 7 days                          |
+
+### `mcp_connections`
+
+Persistent record of MCP server connections per SearchSpace.
+
+| field               | type        | notes                                              |
+| ------------------- | ----------- | -------------------------------------------------- |
+| `id`                | UUID PK     |                                                    |
+| `search_space_id`   | FK          | scoping                                            |
+| `server_url`        | text        | the MCP server's endpoint                          |
+| `transport`         | text        | `"http"`, `"stdio"`, etc.                          |
+| `name`              | text        | human-readable label (e.g., "Slack — Acme")        |
+| `access_token`      | bytea       | encrypted at rest                                  |
+| `refresh_token`     | bytea       | encrypted at rest                                  |
+| `expires_at`        | timestamp   | for OAuth tokens                                   |
+| `last_harvested_at` | timestamp   | when tool list was last refreshed                  |
+| `created_at`        | timestamp   |                                                    |
+| `created_by`        | FK → users  |                                                    |
+
+### `mcp_tools`
+
+The tool list each connected MCP server exposes. Acts as the durable
+source for MCP capabilities — definitions reference `mcp_tools` rows by
+qualified name, and worker processes lazily build handler closures from
+this state.
+
+| field           | type        | notes                                            |
+| --------------- | ----------- | ------------------------------------------------ |
+| `id`            | UUID PK     |                                                  |
+| `connection_id` | FK → `mcp_connections.id` ON DELETE CASCADE | |
+| `name`          | text        | the tool name reported by the MCP server         |
+| `description`   | text        | description for the NL generator and form editor |
+| `input_schema`  | jsonb       | JSON Schema for tool arguments                   |
+| `output_schema` | jsonb       | JSON Schema for tool results                     |
+| `side_effects`  | text[]      | inferred from MCP hints + naming + admin override |
+| UNIQUE          |             | (connection_id, name)                            |
+
+NL drafts are **not** a core table. They live in a generic short-TTL store
+(Redis or a transient table) when the NL flow is built in Phase 3.
+
+---
+
+## 10. NL authoring flow
+
+**This is how the system is intended to be used from day one, not just a
+Phase 3 addition.** The product surface is: user describes intent in natural
+language, LLM produces a structured proposal, user reviews and edits in an
+auto-generated form, then saves. Hand-authoring JSON directly is supported
+but is not the primary path.
+
+This shapes the trust model. Templates are LLM-generated from day one, not
+hand-written by power users. The mitigation is human-in-the-loop review,
+not "trusted authors only."
+
+### Pass 1: Proposal generation
+
+User provides natural-language input. The Generator LLM is given:
+- The full schema set (input schema for definition, registry of action
+  types with their config_schemas, registry of trigger types, available
+  capabilities for this SearchSpace, list of allowed Jinja filters)
+- A tool to list available connectors, channels, and other SearchSpace
+  resources, so it doesn't invent names that don't exist
+- A few-shot set of examples
+
+**Scoped input.** The Generator does *not* receive arbitrary SearchSpace
+document content. Its context is the user's prompt plus the schema and
+registry information. This minimizes the prompt-injection surface — there's
+no document text in the context for an attacker to seed instructions into.
+
+If a user wants document-aware generation later ("create an automation
+that processes documents like this one"), that's a deliberate feature
+extension with its own prompt-injection mitigations, not the default flow.
+
+Output: a structured proposal matching the automation definition schema.
+
+### Pass 2: Deterministic validation
+
+Server-side, before the proposal reaches the user:
+- Validate against JSON Schema (shape correctness)
+- Verify every capability referenced exists in the registry (resource existence)
+- Verify every connector/channel/resource referenced exists in this SearchSpace
+- Validate every template against the sandbox's allowlist (no underscore
+  attributes, no unregistered filter names, length under cap)
+
+Failures here are deterministic errors, not warnings. A proposal that
+references a non-existent capability or includes a template using
+`{{x.__class__}}` is rejected before the user sees it; the Generator is
+re-prompted with the validation error and asked to fix the proposal.
+
+### Pass 2.5: Review pass
+
+A second LLM call — the **Review LLM** — examines the validated proposal and
+produces two outputs for the user:
+
+1. **A plain-language summary** of what the automation will do, in business
+   terms. "This automation will run every weekday at 9am. It reads documents
+   in this SearchSpace tagged 'competitor' that were indexed since the last
+   run, asks an agent to summarize them as 5 bullets, and posts the summary
+   to your #engineering-standup Slack channel. Estimated cost: $0.40 per
+   run."
+
+2. **A "things worth checking" list** flagging anything unusual:
+   - Templates with unusual attribute paths or filter usage
+   - Prompts containing instructions that look more like commands than
+     descriptions ("ignore previous instructions" style)
+   - Action sequences that touch external systems without obvious benefit
+     to the user
+   - Cost estimates that seem high relative to the goal
+   - References to capabilities the user hasn't used before
+   - Schedules tighter than 15 minutes (likely should be event triggers)
+
+The Review LLM is a **UX layer** that makes review actually useful. It is
+**not a security boundary.** The deterministic controls (sandbox, runtime
+limits, schema validator) are the security boundaries. The Review LLM
+helps users catch their own intent mismatches and surfaces anomalies for
+attention, but the sandbox would block dangerous templates even if the
+Review LLM missed them.
+
+This separation is important: two probabilistic controls compounding can
+create a false sense of security. The Review LLM is explicitly framed in
+the architecture as helper, not gatekeeper.
+
+### Pass 3: Editable review
+
+The user lands on a form pre-filled with the proposal. The page shows:
+- The plain-language summary from the Review pass
+- The flagged items, prominently displayed near the relevant fields
+- The full editable form, auto-generated from the JSON Schemas
+- Cost estimate and impact summary (which external systems get touched)
+
+**Every field is editable.** Clarifications appear as required fields.
+Templates are shown in code-styled fields with syntax highlighting and the
+filter palette visible. The user can edit any field; saving re-runs Pass 2
+(deterministic validation) before persisting.
+
+Hitting **Save** promotes the proposal to an `automation` row.
+
+### Editing existing automations
+
+NL editing of an existing automation is a patch operation: the Generator
+LLM receives the current definition plus the NL instruction and produces a
+modified proposal. The same Pass 2 (validation) and Pass 2.5 (review) run
+against the modified version, and the user reviews the diff before saving.
+Existing run history is unaffected — only future runs use the new version.
+
+### Why human-in-the-loop is non-negotiable
+
+The Generator LLM, the Review LLM, and the sandbox are three layers of
+defense against malformed or malicious proposals. The human approval step
+is the fourth and most important layer. It exists because:
+
+- LLMs can be prompt-injected; humans can spot text that asks them to
+  ignore instructions
+- LLMs can produce confident-but-wrong proposals; humans can catch
+  semantic mismatches between intent and output
+- The cost of a bad automation running unattended is high; the cost of a
+  user clicking "approve" after reading is low
+
+The architecture must never offer "auto-approve" or "skip review" options
+for LLM-generated proposals. Save requires human action on the proposal,
+always.
+
+---
+
+## 11. Repository layout
+
+```
+surfsense_backend/app/
+├── automations/                       # NEW: the engine
+│   ├── __init__.py
+│   ├── models.py                      # SQLAlchemy models for 6 tables
+│   ├── schemas.py                     # Pydantic schemas (definition envelope, etc.)
+│   ├── routes.py                      # FastAPI router (/api/v1/automations)
+│   ├── service.py                     # CRUD + business logic
+│   ├── dispatcher.py                  # trigger matching, cost check, run creation
+│   ├── executor.py                    # the Celery task that runs a plan
+│   ├── templating.py                  # Jinja sandbox + filters
+│   ├── events.py                      # publish/subscribe for domain_events
+│   ├── filters.py                     # JSON filter grammar evaluator
+│   ├── actions/
+│   │   ├── registry.py
+│   │   ├── agent_task.py
+│   │   ├── transform_data.py
+│   │   ├── slack_post.py
+│   │   ├── send_email.py
+│   │   ├── notification.py
+│   │   └── (more in Phase 5: podcast_generation, report_generation, ...)
+│   ├── triggers/
+│   │   ├── registry.py
+│   │   ├── schedule.py                # Celery Beat hookup
+│   │   ├── webhook.py                 # /fire endpoint
+│   │   └── event.py                   # subscribes to domain_events
+│   ├── capabilities/
+│   │   ├── registry.py
+│   │   ├── native.py                  # native capability registrations
+│   │   ├── mcp_harvester.py           # registers MCP tools as capabilities (Phase 4)
+│   │   └── (LLM ops registered alongside)
+│   └── nl/                            # Phase 1 — primary user path
+│       ├── generator.py               # Generator LLM
+│       ├── reviewer.py                # Review LLM (summary + flagged items)
+│       ├── validator.py               # deterministic schema + resource checks
+│       └── prompts.py                 # system prompts for both LLMs
+│
+├── utils/
+│   └── periodic_scheduler.py          # EXTENDED to scan automation_triggers
+│
+└── alembic/versions/
+    └── NN_add_automation_tables.py
+
+surfsense_web/app/(routes)/
+└── automations/                       # NEW: UI
+    ├── page.tsx                       # list
+    ├── new/page.tsx                   # NL input + draft preview (Phase 1)
+    ├── [id]/page.tsx                  # editor (auto-generated forms)
+    └── [id]/runs/page.tsx             # run history, streamed via Electric SQL
+```
+
+---
+
+## 12. Phased delivery
+
+Each phase delivers something usable. Each de-risks the next. **NL authoring
+is the primary user path from Phase 1** — what evolves across phases is
+which actions and triggers are available, not whether users can describe
+automations in natural language.
+
+### Phase 1 — Engine MVP with NL authoring
+- 4 tables + Alembic migration
+- Capability registry with native capabilities (`search_space.query`,
+  `search_space.fetch_document`, `agent.run`)
+- `agent_task` action only
+- `schedule` trigger + manual "Run now" endpoint
+- Executor with retries, timeouts, budget caps
+- Template engine (Jinja sandbox + 15 filters + 4 runtime limits)
+- **NL authoring flow**: Generator LLM, deterministic validator,
+  Review LLM, editable form
+- Run history UI with Electric SQL streaming
+
+**After Phase 1**: a user can describe an automation in natural language,
+review the proposal (with summary + flagged anomalies), edit any field,
+save, and watch it run on a schedule. The Claude Routines value
+proposition, on SurfSense's data, with NL-first authoring.
+
+### Phase 2 — Webhooks and delivery
+- `webhook` trigger with per-automation bearer tokens
+- Tight actions: `slack_post`, `send_email`, `notification`
+- `transform_data` action
+- `on_failure` hooks
+- Step-level retry/timeout overrides
+- Concurrency policy enforcement
+
+**After Phase 2**: external systems can drive automations, results go
+somewhere humans see, complex pipelines have proper error handling.
+
+### Phase 3 — NL authoring polish
+- NL patch flow for editing existing automations (diff-based)
+- Conversational refinement during proposal review ("change the schedule
+  to weekdays only," "add a Slack notification on failure")
+- Improved Review LLM coverage (more anomaly patterns, cost-relative-to-
+  goal heuristics)
+- Saved prompt templates and starter examples
+
+**After Phase 3**: NL authoring is the polished primary surface; edit
+flows are conversational rather than form-only.
+
+### Phase 4 — Event triggers
+- `domain_events` table and `events.py` module
+- Indexing pipeline publishes `connector.*` events (smallest change — just
+  add publish calls to the existing flow)
+- Automations publish `automation.run.*` events on completion
+- `event` trigger with filter grammar
+- MCP capability harvester (so MCP-backed events and tools both work)
+
+**After Phase 4**: "do X when Y happens" automations work, including
+automation-chaining through events.
+
+### Phase 5 — Wrapping existing features and sharing
+- Wrap existing SurfSense capabilities as actions: `podcast_generation`,
+  `report_generation`, `indexing_sweep`
+- Artifact lifecycle implementation
+- `expected_duration_seconds` based queue routing (split `automations_long`
+  from `automations_default`)
+- **Automation templates** (shareable, exportable, importable) — with
+  the import re-approval flow that handles the approver-≠-runner trust
+  shift documented in §7.4's pre-Phase-5 gate
+- Cross-automation composition examples in the docs
+
+**After Phase 5**: every existing SurfSense capability is automatable
+without any per-feature code, and automations can be shared between
+SearchSpaces and users.
+
+---
+
+## 13. Decisions locked
+
+For reference — every decision made through the design process, in one
+place.
+
+### Foundations
+1. ✅ JSON Schema 2020-12 is the single schema language for everything
+2. ✅ Definition is the program; infrastructure is the interpreter
+3. ✅ List of steps (not single action) in the plan, with `output_as` chaining
+4. ✅ One capability registry serving native + MCP + LLM operations through the same interface
+5. ✅ Capability IDs do not leak handler kind (`slack.post_message`, not `mcp.slack.post_message`)
+6. ✅ Name-based resolution: definitions reference actions and capabilities by string ID. The registry is the runtime's vocabulary; lookup is a dict access. No code references in definitions.
+7. ✅ The expressive spectrum runs from pure direct calls to broad agent_task; the NL generator proposes the cheapest shape that meets intent (Shape 6 from §4 by default)
+
+### Trigger taxonomy
+8. ✅ Three trigger types: `schedule`, `webhook`, `event`
+9. ✅ Events absorb both connector events and internal SurfSense events
+10. ✅ Filter grammar is JSON-structured operators (not Jinja)
+
+### Templating cluster
+11. ✅ Jinja2 `SandboxedEnvironment` for templates and `when:` predicates — but with the explicit understanding that the sandbox is an allowlist-by-default architecture, not a denylist
+12. ✅ Zero globals registered. Curated 15 filters only, each audited for safe behavior with hostile input. List grows only by reviewed addition
+13. ✅ Four runtime mitigations: `StrictUndefined`, 8 KB template source cap, 100 ms render time cap (watchdog-enforced), 1 MB output size cap
+14. ✅ Non-string template values render as JSON by default
+15. ✅ Fixed `run.*` namespace, documented
+16. ⏸ **Pre-Phase-5 gate**: template sharing across SearchSpaces breaks the approver-equals-runner trust model. Mitigation is a re-approval flow at the import boundary (UX-level), not a template-language migration. Jinja itself stays.
+
+### Execution
+17. ✅ Executor is a Celery task wrapping a sequential loop — not an agent
+18. ✅ `when:` is optional per step; false = skipped (not failed)
+19. ✅ No DAGs, no parallelism, no loops — composition via agent_task or events
+20. ✅ `on_failure` part of execution policy from v1
+21. ✅ Step-level retry and timeout overrides
+22. ✅ Budget cap enforced pre-enqueue and mid-flight
+
+### Components
+23. ✅ Dispatcher / executor / handlers / registry — distinct, each replaceable
+24. ✅ Side effects are a set, including `USER_VISIBLE`
+25. ✅ `expected_duration_seconds` integer drives queue routing
+26. ✅ `produces_artifacts` is a list of `ArtifactSpec`, not a bool
+27. ✅ Output schemas recommended on `agent_task`; editor warns when missing
+
+### Event bus
+28. ✅ `domain_events` table for v1, with upgrade path to Redis Streams
+29. ✅ Automations publish run events for composability
+30. ✅ Publish/subscribe behind interface — no direct table access elsewhere
+
+### Capability storage (two-tier persistence)
+31. ✅ Native capabilities registered in-memory at startup from the codebase. Identical across all workers.
+32. ✅ MCP capability metadata persisted in `mcp_connections` and `mcp_tools` tables. Survives restarts.
+33. ✅ MCP handler closures built lazily per worker from database state. Worker-local cache, rebuilt on demand.
+34. ✅ MCP server tool list re-harvested on a schedule (default: daily) and on user request.
+35. ✅ MCP tools harvested into the capability registry at connection time
+36. ✅ Side effects inferred from MCP hints + naming + admin overrides
+37. ✅ MCP tools callable directly (no agent required) when caller knows args
+
+### Credentials
+38. ✅ Credentials never appear in the automation definition — only connection IDs do
+39. ✅ Credentials never appear in the LLM's context — the host holds them and uses them on the LLM's behalf
+40. ✅ Credentials resolved per-call by `ActionContext`, not pre-loaded into worker environment
+41. ✅ Tokens encrypted at rest in the database; refresh handled automatically by `ActionContext.resolve_*_client`
+
+### NL authoring
+42. ✅ LLM-authored templates is the primary path from day one — not a Phase 3 addition. Hand-authoring JSON is supported but secondary
+43. ✅ Generator LLM produces JSON; deterministic schema + resource validation runs before user sees the proposal
+44. ✅ Review LLM produces plain-language summary + flagged anomalies for the user — UX layer, not a security boundary
+45. ✅ Generator LLM's input is scoped (user prompt + schema + registry only); arbitrary document content is not fed in
+46. ✅ Human approval is required before save — no auto-approval option, ever
+47. ✅ Every field editable in the proposal; unresolved questions surface as clarifications
+48. ✅ NL drafts are transient storage, not a core table
+
+### Data model
+49. ✅ Six tables total — four for engine state, two for MCP persistence
+50. ✅ Run rows snapshot the definition (immutable history)
+51. ✅ All entities scoped by `search_space_id` for RBAC
+52. ✅ Editing an automation bumps `version`; existing runs unaffected
+
+---
+
+## 14. Open questions deferred to implementation
+
+None of these block design; they're decisions a developer will make in
+context, with the principle from §1 as their guide.
+
+- Exact retry backoff formulas (multipliers, jitter, ceilings)
+- Webhook signature verification standards (HMAC scheme, header naming)
+- Whether to support inline JSON Schema `$ref` to external schemas, or
+  inline everything
+- Specific CDN/storage backend choices for artifacts (probably
+  whatever SurfSense already uses for podcasts)
+- Rate limits per SearchSpace and per user
+- Audit log retention policy
+
+---
+
+## 15. Why this is ready to build
+
+This document satisfies five tests:
+
+1. **The four worked examples** (digest, CI webhook, file-added-trigger,
+   weekly podcast) all express cleanly in the contract without special
+   cases. Each one was used to find gaps before the gaps reached code.
+
+2. **The audit pass identified six refinements**, all incorporated. No
+   pending audit items.
+
+3. **Every decision points back to the principle from §1.** When a future
+   feature request lands, "does it belong in the definition or in the
+   engine?" gives a clear answer.
+
+4. **The build is staged** so Phase 1 ships in weeks, not months, and
+   each subsequent phase delivers user value while de-risking the next.
+
+5. **Existing SurfSense infrastructure is reused**, not paralleled. Celery
+   Beat, PostgreSQL/JSONB, Electric SQL, SQLAlchemy/Alembic, the existing
+   `tools/registry.py` pattern, the existing Search Space scoping — all
+   continue to do what they already do. The automation engine is a new
+   directory, not a new system.
+
+The next document a developer needs is the Pydantic models and JSON
+Schemas spelled out concretely. Those follow mechanically from this plan.
+
+---
+
+*Sources consulted: Claude Code Routines documentation; NousResearch/hermes-
+agent (cron and skills subsystems); n8n documentation on node types and
+workflow data model; the SurfSense repository and DeepWiki architecture
+notes (FastAPI + Celery Beat + Electric SQL + LangGraph Deep Agents +
+Search Space RBAC); Model Context Protocol specification for capability
+harvesting; AWS EventBridge for filter grammar; workflow-pattern
+literature (van der Aalst et al.) for the trigger / action / concurrency
+vocabulary.*