From b90bed2dbdfe7941f1a886481c45121309076501 Mon Sep 17 00:00:00 2001
From: CREDO23 <bakerathierry@gmail.com>
Date: Thu, 28 May 2026 15:38:57 +0200
Subject: [PATCH] chore: drop local design plan

---
 automation-design-plan.md | 1240 -------------------------------------
 1 file changed, 1240 deletions(-)
 delete mode 100644 automation-design-plan.md

diff --git a/automation-design-plan.md b/automation-design-plan.md
deleted file mode 100644
index db5f7a23c..000000000
--- a/automation-design-plan.md
+++ /dev/null
@@ -1,1240 +0,0 @@
-# SurfSense Automation Feature — Design Plan (v2)
-
-A generic, extensible automation system for SurfSense that lets users (and
-future SurfSense features) trigger agent work on a schedule, on an external
-event, or on demand — with the ability to author automations either by hand
-or from a natural-language description that yields an editable, structured
-definition.
-
-This document supersedes the v1 draft. It folds in the design audit pass and
-the corrections from working through worked examples (notably: removing the
-connector bias, clarifying the executor's role, integrating MCP cleanly, and
-committing to JSON Schema as the single declarative language).
-
----
-
-## 1. The load-bearing principle
-
-> **The JSON definition is the program. Everything else is interpreter.**
-
-Every decision in this document serves that principle. If we ever face a
-design choice and one option lets some behavior leak out of the definition
-into the engine, we pick the other option.
-
-Three properties follow from this principle, and they're the reason the
-system will survive feature growth:
-
-- **Reproducibility** — same definition + same inputs → same observable
-  behavior, regardless of which version of the engine runs it.
-- **Portability** — definitions can be exported, imported, version-
-  controlled, code-reviewed, and shared across SurfSense instances.
-- **LLM tractability** — the NL authoring flow works because the LLM only
-  needs to produce a self-contained JSON document that validates against a
-  schema. It doesn't need to understand the engine.
-
----
-
-## 2. The three-layer contract
-
-The system is structured as three layers. Layers 1 and 3 are defined by
-SurfSense developers (at registration time). Layer 2 is what users write
-(or the NL generator produces). The runtime reads all three to do its job.
-
-| Layer | What it is | Defined by |
-| ----- | ---------- | ---------- |
-| **1. Action contract** | Per-action params and output schema | Developers, at startup |
-| **2. Automation definition** | One concrete saved automation | Users (or NL generator) |
-| **3. Trigger contract** | Per-trigger params and payload schemas | Developers, at startup |
-
-Each layer constrains the next. The runtime reads all three but doesn't
-know what's in them ahead of time. That's how a new action or trigger
-type becomes available across the engine without code changes outside its
-registration.
-
-A unification layer below Layer 1 — one catalog of "things this SurfSense
-instance can do," shared by automations, agents, and future surfaces — was
-considered and deferred (§3). v1 actions are stand-alone.
-
-### Schema language
-
-Every shape in every layer is described in **JSON Schema (draft 2020-12).**
-No exceptions, no parallel languages, no inline shorthand. Two documented
-extensions on top:
-
-- `default: "$some_token"` — runtime-resolved defaults. The vocabulary is
-  fixed: `$last_fired_at`, `$creator`, `$space_default`. The engine resolves
-  these to values before validation.
-- `x-surfsense-*` annotations — editor hints (widget type, autocomplete
-  source). The validator ignores them; the form editor reads them.
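-
-A minimal sketch of the first extension, assuming the engine supplies token
-values as a plain dict and that only top-level `properties` carry token
-defaults. `resolve_default_tokens` and `apply_defaults` are hypothetical
-names used for illustration, not part of the engine.
-
-```python
-import copy
-
-
-def resolve_default_tokens(schema: dict, token_values: dict) -> dict:
-    """Return a copy of `schema` whose "$token" defaults are concrete values."""
-    resolved = copy.deepcopy(schema)
-    for prop in resolved.get("properties", {}).values():
-        default = prop.get("default")
-        if isinstance(default, str) and default.startswith("$"):
-            if default not in token_values:
-                raise ValueError(f"unknown default token {default!r}")
-            prop["default"] = token_values[default]
-    return resolved
-
-
-def apply_defaults(inputs: dict, schema: dict) -> dict:
-    """Fill missing inputs from the (already resolved) schema defaults."""
-    merged = dict(inputs)
-    for name, prop in schema.get("properties", {}).items():
-        if name not in merged and "default" in prop:
-            merged[name] = prop["default"]
-    return merged
-
-
-schema = {
-    "type": "object",
-    "properties": {
-        "since": {"type": "string", "format": "date-time", "default": "$last_fired_at"},
-        "tags": {"type": "array", "items": {"type": "string"}, "default": ["competitor"]},
-    },
-}
-tokens = {"$last_fired_at": "2026-05-25T09:00:00+02:00"}  # illustrative value only
-resolved_inputs = apply_defaults({}, resolve_default_tokens(schema, tokens))
-```
-
-Resolving tokens on a copy keeps the stored definition untouched, so the
-same definition can be resolved again on the next fire.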
-
----
-
-## 3. Capability unification layer — deferred to post-v1
-
-Earlier drafts introduced a `Capability` registry as Layer 1: one catalog
-of "things this SurfSense instance can do," shared by the automation
-engine (as actions), the agent (as tools), and any future HTTP surface.
-The motivation is real — one source of truth beats N parallel registries —
-but v1 has a single action (`agent_task`) and a single consumer (the
-automation engine). The five-field shape sketched earlier (`id`,
-`description`, `input_schema`, `output_schema`, `handler`) cannot safely
-host any non-trivial capability: it carries no caller identity, no
-search-space scoping, and no authorization gate on tool delegation.
-Building the abstraction with one consumer would lock in a shape that
-doesn't survive the second consumer.
-
-The unification layer returns when the second consumer lands (Phase 2
-tight actions or Phase 4 MCP), redesigned from the start with:
-
-- A `CallContext` carrying caller user id, search space id, and run id,
-  passed to every handler invocation.
-- Explicit scope declarations per capability (e.g. `reads:documents`,
-  `writes:slack`, `destructive`) for the authorization layer to read.
-- A per-user, per-search-space filter consulted at both definition save
-  time (validating `agent_task.tools`) and run time (scoping the agent's
-  tool list to what the automation creator can delegate).
-
-Until then:
-
-- v1 actions are stand-alone units (Layer 1 below); the automation engine
-  reads its own action registry, nothing else.
-- `agent_task.params.tools` is a forward-looking allowlist field with no
-  v1 semantics beyond "list of string identifiers." The handler's tool
-  resolution is opaque to the automation contract.
-
-### Credentials — deferred to Phase 2
-
-External-credential handlers (Slack, email, etc.) require per-user or
-per-connection auth. v1 actions run server-side with app-level
-configuration. When tight actions ship in Phase 2, the credential design
-lands as part of the unification redesign: connection IDs in the
-definition (never tokens); credentials loaded per-call by the handler
-context (never pre-loaded into worker memory); credentials never enter
-LLM context.
-
-### MCP — deferred to Phase 4
-
-External tool servers feeding tools into a shared registry land with the
-rest of the integration tooling in Phase 4, after the unification layer
-is in place. The two-tier registry, `mcp_connections` and `mcp_tools`
-tables, and the harvester arrive as a single coherent step then.
-
----
-
-## 4. Action contract
-
-An `Action` is what a user references in a plan step. Some actions are
-deterministic single-purpose handlers (`slack_post`, `send_email`); one
-action (`agent_task`) hosts an LLM and a tool allowlist for cases where
-judgment is needed. The contract is the same in both cases — only the
-handler differs.
-
-```python
-@dataclass(frozen=True, slots=True)
-class ActionDefinition:
-    type: str            # "agent_task", "slack_post"
-    name: str            # short UI label
-    description: str     # for the NL generator and the UI
-    params_schema: dict  # JSON Schema for step.params
-    handler: ActionHandler
-```
-
-This is the v1 shape: five fields, no handler context, no output
-contract, no artifact declaration. The deferrals are intentional:
-
-- **`output_contract`** — Phase 2. Deterministic handlers will return
-  a fixed shape; v1's only action (`agent_task`) takes an
-  `output_schema` inside `params` and validates against that instead.
-- **`produces_artifacts`** — Phase 5. Artifact lifecycle (storage,
-  signed URLs, retention) is its own design step; v1 handlers
-  persist their own outputs.
-- **Handler context** — paired with the unification redesign (§3).
-  v1 handlers receive `(args)` only; per-user / per-search-space
-  behavior is not yet a v1 concern.
-
-### Tight vs loose actions
-
-Two patterns coexist by design:
-
-- **Tight actions** (`slack_post`, `linear_create_issue`,
-  `send_email`) — deterministic single-purpose handlers. ~20 LOC
-  each. **Phase 2.**
-- **Loose actions** (`agent_task`) — params_schema accepts a `prompt`,
-  a `tools` allowlist, and an optional `output_schema` declaring what
-  the agent must return; the handler validates the agent's output
-  against it. **v1.**
-
-The agent's `tools` allowlist resolves opaquely in v1; the redesigned
-unification layer (§3) will give both invocation modes access to the
-same vocabulary, with per-user authorization gating both.
-
-### How names in the definition become function calls
-
-The definition contains strings like `"action": "agent_task"`. The
-string is just a name — it does not point to a function. At runtime,
-the executor performs a **name-based lookup** against the action
-registry:
-
-```python
-action_def = action_registry.get(step.action)   # dict lookup
-handler = action_def.handler                    # Python callable
-result = await handler(resolved_params)         # invocation
-```
-
-The registry is a Python dict populated at process startup. Each entry
-in `automations/registries/actions/*.py` calls `register_action(...)`
-at module import time, putting its `ActionDefinition` (including the
-handler function reference) into the registry.
-
-The definition is pure data. The registry is the engine's runtime
-vocabulary. They meet at name-based lookup; nothing else crosses the
-boundary.
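-
-The registration and lookup described above could be sketched as follows.
-This is an illustration under assumptions, not the engine's code: the
-module-level `_ACTIONS` dict, `get_action`, and the `echo` action are
-hypothetical, and the duplicate-name check is our own addition.
-
-```python
-import asyncio
-from dataclasses import dataclass
-from typing import Any, Awaitable, Callable
-
-ActionHandler = Callable[[dict[str, Any]], Awaitable[Any]]
-
-
-@dataclass(frozen=True, slots=True)
-class ActionDefinition:
-    type: str
-    name: str
-    description: str
-    params_schema: dict
-    handler: ActionHandler
-
-
-_ACTIONS: dict[str, ActionDefinition] = {}  # hypothetical module-level registry
-
-
-def register_action(definition: ActionDefinition) -> None:
-    """Add a definition at import time; refuse silent overwrites."""
-    if definition.type in _ACTIONS:
-        raise ValueError(f"action {definition.type!r} is already registered")
-    _ACTIONS[definition.type] = definition
-
-
-def get_action(action_type: str) -> ActionDefinition:
-    """Name-based lookup; unknown names raise a LookupError."""
-    try:
-        return _ACTIONS[action_type]
-    except KeyError:
-        raise LookupError(f"unknown action {action_type!r}") from None
-
-
-async def echo_handler(args: dict[str, Any]) -> dict[str, Any]:
-    # Stand-in handler used only for this illustration.
-    return {"echo": args}
-
-
-register_action(ActionDefinition(
-    type="echo", name="Echo", description="Returns its params unchanged.",
-    params_schema={"type": "object"}, handler=echo_handler,
-))
-
-result = asyncio.run(get_action("echo").handler({"text": "hi"}))
-```
-
-Failing loudly on a duplicate `type` means two modules cannot silently
-compete for the same name at import time.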
-
-### The full expressive spectrum
-
-The contract supports a continuous spectrum from purely deterministic to
-fully agentic. Six practical shapes worth recognizing:
-
-| Shape | Example | Cost / latency profile |
-| --- | --- | --- |
-| **1. Direct call** | `slack_post` with literal channel and template | No LLM. ~200ms. Fractions of a cent. |
-| **2. Direct call with computed inputs** | `linear_create_issue` using `{{summary.title}}` from a prior step | No LLM for this step. Cheap. |
-| **3. Single-domain agent task** | `agent_task` with `tools: ["slack.*"]` only | One LLM, bounded toolset. |
-| **4. Multi-domain agent task, narrow** | `agent_task` with `tools: ["github.list_pull_requests", "linear.create_issue"]` | One LLM, named tools. |
-| **5. Multi-domain agent task, broad** | `agent_task` with `tools: ["slack.*", "github.*", "linear.*"]` | One LLM, large toolset, most agentic. |
-| **6. Composed plan** | `agent_task` (narrow) for thinking → `slack_post` + `linear_create_issue` for acting | Best cost-to-power ratio. |
-
-Shape 6 is the underrated one, and it is the answer on both cost and speed.
-The agent reasons once (Shape 3 or 4) and its structured output drives
-several deterministic actions. This is roughly 5–10x cheaper and 3–4x faster
-than forcing the agent to do everything (Shape 5), and it produces the same
-outcome.
-
-**The NL generator's job is to propose Shape 6-style plans by default.**
-The Review LLM flags proposals that use `agent_task` for steps a
-deterministic action could handle. This is the discipline that keeps
-automations cheap at scale.
-
-The user navigates the spectrum by intent (describing what they want), not
-by mechanism — the shape selection is the engine's responsibility, not the
-user's.
-
----
-
-## 5. Automation definition
-
-This is the JSON the user writes (or the NL generator produces). Stored in
-`automations.definition` as JSONB.
-
-### Top-level shape
-
-```jsonc
-{
-  "schema_version": "1.0",
-  "name": "Daily competitor digest",
-  "goal": "Summarize new competitor content and post to Slack",
-
-  "inputs": {
-    "schema": {
-      "type": "object",
-      "required": ["since"],
-      "properties": {
-        "since": { "type": "string", "format": "date-time",
-                   "default": "$last_fired_at" },
-        "tags":  { "type": "array", "items": { "type": "string" },
-                   "default": ["competitor"] }
-      }
-    }
-  },
-
-  "triggers": [
-    {
-      "type": "schedule",
-      "params": { "cron": "0 9 * * 1-5", "timezone": "Africa/Kigali" }
-    }
-  ],
-
-  "plan": [
-    {
-      "step_id": "research",
-      "action": "agent_task",
-      "params": {
-        "prompt": "Find documents tagged {{inputs.tags}} indexed since {{inputs.since}}. Return JSON with bullets and source_doc_ids.",
-        "tools": ["search_space.query", "search_space.fetch_document"],
-        "model": "anthropic/claude-sonnet-4-7",
-        "output_schema": {
-          "type": "object",
-          "required": ["bullets", "source_doc_ids"],
-          "properties": {
-            "bullets":        { "type": "array", "items": { "type": "string" } },
-            "source_doc_ids": { "type": "array", "items": { "type": "string" } }
-          }
-        }
-      },
-      "output_as": "summary"
-    },
-    {
-      "step_id": "deliver",
-      "action": "slack_post",
-      "params": {
-        "channel_id": "C0123",
-        "message_template": "*Competitor digest*\n\n{% for b in summary.bullets %}• {{b}}\n{% endfor %}"
-      }
-    }
-  ],
-
-  "execution": {
-    "timeout_seconds": 600,
-    "max_retries": 2,
-    "retry_backoff": "exponential",
-    "concurrency": "drop_if_running",
-    "on_failure": [ /* steps to run if main plan fails after retries */ ]
-  },
-
-  "metadata": { "tags": ["digest"] }
-}
-```
-
-### Plan steps
-
-```jsonc
-{
-  "step_id": "...",                      // unique within plan
-  "action": "...",                       // references an ActionDefinition.type
-  "when": "{{ ... }}",                   // optional Jinja expr → bool; false = skip
-  "params": { ... },                     // validated against action's params_schema
-  "output_as": "...",                    // binds output to this name for later steps
-  "max_retries": 0,                      // optional, overrides automation default
-  "timeout_seconds": 1200                // optional, overrides automation default
-}
-```
-
-Steps run **sequentially**. No parallelism, no DAGs, no loops. If a user
-needs branching, they use `when:` on multiple steps. If they need
-parallelism or iteration, they use `agent_task` and let the agent reason
-about it, or they compose automations through events (§7.5).
-
----
-
-## 6. Trigger contract
-
-Three trigger types. That's the entire taxonomy.
-
-### `schedule`
-
-```python
-TriggerDefinition(
-    type="schedule",
-    params_model=ScheduleTriggerParams,  # cron + timezone
-)
-# At fire time the schedule producer emits runtime inputs
-# (fired_at, scheduled_for, last_fired_at) which are merged with the
-# trigger row's static_inputs (static wins) and validated against
-# automation.definition.inputs.schema_.
-```
-
-Implementation: extends `app/utils/periodic_scheduler.py`, which already
-reads connector sync schedules. Adds a second source — `automation_triggers
-WHERE type='schedule'`. Same Celery Beat checker, two source tables.
-
-Minimum interval: 1 minute (the existing checker's resolution). When a user
-sets an interval under 15 minutes, the form editor warns that an event
-trigger is probably the better fit.
-
-### `webhook`
-
-```python
-TriggerDefinition(
-    type="webhook",
-    params_schema={
-        "type": "object",
-        "properties": {
-            "input_mapping": {
-                "type": "object",
-                "additionalProperties": { "type": "string" }
-                # values are JSONPath expressions
-            }
-        }
-    },
-    # payload is whatever the POST body is; user-defined shape via mapping
-)
-```
-
-Endpoint: `POST /api/v1/automations/{id}/fire`. Bearer token shown once,
-hashed at rest, rotatable, revocable. Returns `202 Accepted` with the
-created run's URL. Caller polls for status; we do not push callbacks in
-v1 (a `callback_webhook` action can be added later).
-
-Idempotency: honors `Idempotency-Key` header or `idempotency_key` in body.
-Deduplicates against runs in the last 24 hours.
-
-### `event`
-
-```python
-TriggerDefinition(
-    type="event",
-    params_schema={
-        "type": "object",
-        "required": ["event_type"],
-        "properties": {
-            "event_type": { "type": "string" },   # e.g. "drive.file_added"
-                                                   # or "surfsense.podcast.generated"
-            "filters":    { "$ref": "#/$defs/filter_expression" }
-        }
-    }
-    # payload shape is documented per event_type in a separate registry
-)
-```
-
-**Events absorb both connector events and internal SurfSense events.** A
-file added to Drive and a podcast finishing in SurfSense are both events
-in the same `domain_events` table, both subscribable by automations, both
-matched by the same dispatcher code. The engine doesn't distinguish.
-
-### Filter grammar
-
-Filters are JSON-structured operators, not expressions. This is the one
-place we deliberately don't use Jinja, because filters run on a hot path
-(every event matched against every subscribing trigger) and structured
-filters can be indexed and short-circuited.
-
-Vocabulary:
-- Equality: `equals`, `not_equals`
-- String: `starts_with`, `ends_with`, `contains`, `regex`
-- Numeric: `gt`, `gte`, `lt`, `lte`
-- Set: `in`, `not_in`
-- Existence: `exists`
-- Composition: `$and`, `$or`, `$not`
-
-Inspired by AWS EventBridge and MongoDB query syntax. The filter grammar
-itself is published as a JSON Schema, so users get inline error messages.
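-
-A minimal sketch of how such a filter could be evaluated. The filter shape
-is an assumption, since this plan lists the operators but not the document
-layout: we assume MongoDB-style `{field_path: {operator: operand}}` clauses
-with dotted paths, where `$and` / `$or` take a list and `$not` takes one
-sub-filter. `matches` and its helpers are hypothetical names.
-
-```python
-import re
-from typing import Any
-
-_MISSING = object()  # sentinel for "path not present in payload"
-
-
-def _is_num(v: Any) -> bool:
-    return isinstance(v, (int, float)) and not isinstance(v, bool)
-
-
-_OPS = {
-    "equals": lambda v, arg: v == arg,
-    "not_equals": lambda v, arg: v != arg,
-    "starts_with": lambda v, arg: isinstance(v, str) and v.startswith(arg),
-    "ends_with": lambda v, arg: isinstance(v, str) and v.endswith(arg),
-    "contains": lambda v, arg: isinstance(v, str) and arg in v,
-    "regex": lambda v, arg: isinstance(v, str) and re.search(arg, v) is not None,
-    "gt": lambda v, arg: _is_num(v) and v > arg,
-    "gte": lambda v, arg: _is_num(v) and v >= arg,
-    "lt": lambda v, arg: _is_num(v) and v < arg,
-    "lte": lambda v, arg: _is_num(v) and v <= arg,
-    "in": lambda v, arg: v in arg,
-    "not_in": lambda v, arg: v not in arg,
-}
-
-
-def _lookup(payload: dict, dotted_path: str) -> Any:
-    """Walk a dotted path; return _MISSING if any segment is absent."""
-    current: Any = payload
-    for key in dotted_path.split("."):
-        if not isinstance(current, dict) or key not in current:
-            return _MISSING
-        current = current[key]
-    return current
-
-
-def _check(op: str, value: Any, arg: Any) -> bool:
-    if op == "exists":
-        return (value is not _MISSING) == bool(arg)
-    if value is _MISSING:
-        return False  # assumption: other operators never match a missing field
-    return bool(_OPS[op](value, arg))
-
-
-def matches(filter_expr: dict, payload: dict) -> bool:
-    """True when every clause holds; returns at the first failing clause."""
-    for key, clause in filter_expr.items():
-        if key == "$and":
-            ok = all(matches(sub, payload) for sub in clause)
-        elif key == "$or":
-            ok = any(matches(sub, payload) for sub in clause)
-        elif key == "$not":
-            ok = not matches(clause, payload)
-        else:
-            value = _lookup(payload, key)
-            ok = all(_check(op, value, arg) for op, arg in clause.items())
-        if not ok:
-            return False
-    return True
-
-
-event = {"source": "drive", "file": {"name": "q3-report.pdf", "size": 2048}}
-flt = {
-    "source": {"equals": "drive"},
-    "$or": [{"file.name": {"ends_with": ".pdf"}}, {"file.size": {"gt": 10_000}}],
-}
-is_match = matches(flt, event)
-```
-
-Because each clause is plain data, the dispatcher can stop at the first
-failing clause without rendering a template, which is the hot-path property
-this section relies on.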
-
----
-
-## 7. Runtime components
-
-Each component is distinct, replaceable, and has one job.
-
-### 7.1 Dispatcher
-
-What it does: matches firing triggers to automations, creates `AutomationRun`
-rows, enqueues executor tasks.
-
-For schedule triggers: Celery Beat polls the trigger table, computes due
-ones, fires.
-
-For webhook triggers: the FastAPI handler is the dispatcher entry point.
-Validates token, runs input_mapping, creates run.
-
-For event triggers: subscribes to the `domain_events` table. For each new
-event, evaluates the filters of every trigger subscribed to that event type
-and fires the matches.
-
-Common path (after a trigger has fired):
-1. Resolve `inputs` from trigger payload and defaults
-2. Validate resolved inputs against the automation's input schema
-3. **Idempotency check** — dedup against existing pending/running runs
-4. **Snapshot the resolved definition** into the run row (immutable history)
-5. Enqueue executor task on the single `automations_default` Celery queue
-
-The cost-estimate pre-check (originally step 3) is **deferred**. v1
-actions do not declare cost estimates, the run row has no `cost_usd`
-column, and no handler reports tokens used — so neither pre-flight
-prediction nor mid-flight accumulation can be enforced. `Execution`
-therefore does not expose `budget_cap_usd` in v1; it returns as a single
-field addition the day the cost ledger ships (per-action cost reporting
-+ `automation_runs.cost_usd` column + executor accumulation).
-
-Queue routing by `expected_duration_seconds` is **deferred** until load
-patterns justify a second queue. v1 uses a single queue.
-
-### 7.2 Executor
-
-What it is: **a Celery task wrapping a single function that walks a plan
-step by step.** Not an agent, not a workflow engine, not a scheduler. A
-loop with bookkeeping. Maybe 200 lines.
-
-```python
-async def execute_run(run_id: int) -> None:
-    run = load_run(run_id); run.status = "running"; save(run)
-    context = build_run_context(run)
-    step_outputs = {}
-
-    for step in run.plan:
-        if step.when and not evaluate_predicate(step.when, context | step_outputs):
-            record_step_skipped(run, step); continue
-
-        resolved_params = render_params(step.params, context | step_outputs)
-        action = action_registry.get(step.action)
-        validate(resolved_params, action.params_schema)
-
-        try:
-            result = await with_retries(
-                action.handler,
-                ctx=build_action_context(run, action),
-                args=resolved_params,
-                policy=step.retry_policy or run.execution.retry_policy,
-            )
-            validate(result, step.output_schema)
-            if step.output_as:
-                step_outputs[step.output_as] = result
-            record_step_succeeded(run, step, result)
-        except Exception as e:
-            record_step_failed(run, step, e)
-            await run_on_failure(run, e)
-            return
-
-    run.status = "succeeded"; save(run)
-    publish_event("automation.run.succeeded", run)   # see §7.5
-```
-
-Intelligence lives **inside handlers**, not in the executor. The most
-intelligent handler is `agent_task`, which spins up a LangGraph Deep Agent
-for one step and returns when the agent finishes. The executor sees a
-validated dict come back; it doesn't know that step was "smart."
-
-### 7.3 Action handlers
-
-One handler per `ActionDefinition.type`. Receives the validated `args`
-dict and returns whatever the step's output validates against (a fixed
-shape declared by tight actions, or a dynamic shape declared via
-`output_schema` in the step params for `agent_task`).
-
-Handlers do not know about retries or timeouts — those are the
-executor's concern.
-
-In v1, handlers take `(args)` only. The `ctx` argument sketched in §7.2's
-pseudo-code (the `CallContext` of §3: caller user id, search space id, run
-id, credential resolver) arrives with the unification layer redesign (§3);
-v1's single action (`agent_task`) reads what it needs from app-level
-configuration.
-
-### 7.4 Template engine
-
-#### Why it exists
-
-Most fields in an automation definition contain literal strings the user
-authored once — but the actual rendered value has to change per run, because
-it includes data from the trigger payload or from prior step outputs. The
-template engine is what turns `"Daily digest for {{run.started_at}}"` into
-`"Daily digest for 2026-05-26"` at run time.
-
-Three fields use it:
-- `*_template` strings in tight action configs (Slack messages, email bodies,
-  Linear titles, etc.)
-- `prompt` in `agent_task` configs (so the agent sees resolved values, not
-  `{{...}}` placeholders)
-- `when:` step predicates (which need to evaluate to a boolean)
-
-#### Public interface
-
-Single module, ~80 lines. Three public functions — everything else in the
-engine routes through these:
-
-```python
-def render_template(template: str, context: dict) -> str: ...
-def evaluate_predicate(expression: str, context: dict) -> bool: ...
-def build_run_context(run, step_outputs) -> dict: ...
-```
-
-Backed by Jinja2's `SandboxedEnvironment`. The whole module is the seam: if
-the template language is ever swapped, only this file changes.
-
-#### Security architecture: allowlist by default
-
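-`SandboxedEnvironment` does not start empty: like any Jinja environment it
-ships with default filters, tests, and globals (`range`, `dict`, `namespace`,
-and others). The engine's instance clears those at construction and
-re-registers only the allowlist below, so a template has access to: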
-- Variables in the context dict we pass in (`run`, `inputs`, prior step
-  outputs)
-- Public (non-underscore) attributes of those variables
-- Jinja's built-in control flow (`{% if %}`, `{% for %}`, `{% set %}`)
-
-Nothing else. No Python builtins, no modules, no I/O, no network, no
-filesystem. Everything beyond the above must be **explicitly registered.**
-This is the structurally important property: anything we didn't add is
-inaccessible. The risk surface equals the size of what we registered.
-
-The three sandbox rules that enforce this:
-1. **Attribute access is filtered** — names starting with underscore are
-   rejected. This blocks the entire family of `{{x.__class__.__mro__...}}`
-   Python escape paths in one rule.
-2. **Globals are allowlist-only** — `open`, `eval`, `exec`, `__import__`,
-   `getattr`, every module name, are all absent unless we register them.
-   We register zero globals.
-3. **Unsafe callables are blocked** — `str.format` and `str.format_map`
-   are routed through a sandboxed formatter (after CVE-2016-10745 and
-   CVE-2019-10906 respectively), and anything marked `unsafe_callable` is
-   refused.
-
-#### What we register, exactly
-
-- **Filters: a curated 15**, no more. `join`, `length`, `default`, `upper`,
-  `lower`, `truncate`, `tojson`, `date`, `replace`, `trim`, `slugify`,
-  `first`, `last`, `sort`, `reverse`. Each one is audited for what it does
-  with its input; none of them takes a callable, runs `eval`, or reaches
-  into Python objects beyond simple data transformation.
-- **Globals: none.**
-- **Tests: only the safe built-ins** (`defined`, `none`, `number`, `string`,
-  `mapping`, `sequence`, `boolean`).
-
-Adding a new filter requires a deliberate code change and review: does this
-filter do anything dangerous with its input? If yes, don't add it. The list
-only grows by audited additions.
-
-#### Runtime limits (defense in depth)
-
-The sandbox handles the attack surface inside the template language. Three
-additional limits handle resource exhaustion that the language permits but
-the runtime shouldn't tolerate:
-
-- **Template source length capped at 8 KB.** Checked before parsing.
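-- **Render time capped at 100 ms per render.** Implemented via a watchdog
-  thread; renders that exceed the cap are killed and the step fails. Catches
-  nested loop bombs over large context lists. (`range` is not registered,
-  and the sandbox's own `range` refuses more than 100,000 items anyway.)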
-- **Output size capped at 1 MB.** A small template can produce a 100 MB
-  string via `{{ 'A' * 10**8 }}`-style multiplication; this catches it.
-
-Plus `StrictUndefined`: any reference to a missing variable raises
-immediately rather than silently rendering empty, so misconfigurations
-fail fast.
-
-#### Threat model and residual risk
-
-The trust model from day one is:
-
-- Templates are generated by an LLM from a user's natural-language input
-  (see §10), or written/edited by humans in the editable form
-- A second LLM reviews the proposal and produces a plain-language summary
-  plus flagged anomalies for the user
-- The user reviews and approves before the automation runs
-- The Generator LLM's input is scoped (user prompt + schema + registry
-  only — no arbitrary document content), minimizing prompt-injection paths
-
-The sandbox + runtime limits + curated filter list protect against the
-malformed-template attack. Human review protects against the
-semantically-malicious-but-syntactically-valid attack. These are
-complementary layers, not redundant.
-
-Known residual risks, each genuinely small:
-
-- **Future Jinja CVEs.** Historical sandbox bypasses have existed and
-  been patched. This is a generic third-party-dependency risk, comparable
-  to bugs in any other library we rely on. Mitigation: subscribe to
-  security advisories, ship updates within a week of disclosure.
-- **Side channels via prompts to LLMs.** A template that renders into a
-  prompt can attempt prompt injection of the agent at run time. This is
-  not a sandbox concern but a separate concern in `agent_task`'s design.
-- **Operator deployments with long-lived secrets in worker env vars.**
-  Mitigation: credentials fetched per-handler-per-call via
-  `ActionContext.resolve_credentials`, never pre-loaded into worker
-  env vars accessible to templates.
-
-The sandbox-with-allowlist architecture means **the attack surface
-equals the set of things we registered.** With zero globals registered
-and 15 audited filters, the surface is small, bounded, and reviewable.
-This is the structural property that makes the architecture sound, and
-it doesn't depend on hypothetical assumptions about who authors templates.
-
-#### Pre-Phase-5 gate
-
-One trust-model change is documented in the roadmap: **Phase 5 introduces
-template sharing across SearchSpaces** (automation templates as
-exportable, importable artifacts). At that point, the *approver* of a
-template (the original author) is no longer the *runner* (the importer).
-The "human reviews before save" mitigation breaks down because the
-reviewer doesn't bear the risk.
-
-Before Phase 5 ships, this needs an explicit re-approval flow: importing
-a template triggers a fresh review pass by the importing user, with the
-flagged-anomalies output prominently displayed, and the import cannot
-complete without explicit per-template approval.
-
-This is a UX/flow decision, not a template-language migration. Jinja
-itself stays; what changes is the approval workflow at the import boundary.
-
-#### The `run.*` namespace exposed in every template
-
-```
-run.id, run.started_at, run.automation_id, run.automation_name,
-run.automation_version, run.trigger_type, run.trigger_id,
-run.search_space_id, run.creator_id, run.attempt,
-run.failed_step_id, run.error.*   (only in on_failure context)
-```
-
-#### Default value rendering
-
-Non-string template values render as JSON by default (via the `finalize`
-hook): lists become `["a", "b"]`, dicts become `{"k": "v"}`, datetimes
-become ISO 8601. The `| join`, `| length`, `| tojson` filters give explicit
-control. Strings render as themselves with no quoting. `None` renders as
-empty string in templates, as `null` in JSON contexts.
-
-### 7.5 Event bus
-
-`domain_events` table, polled by Celery Beat alongside the existing
-scheduler. Both connector events and internal SurfSense events publish to
-it. Both are consumed by the dispatcher's event-trigger subscriber.
-
-**Automations themselves publish events.** Successful and failed runs emit
-`automation.run.succeeded` / `automation.run.failed` events with the run
-metadata. This makes automations composable through events — chain them by
-subscribing one automation's event trigger to another's run event. No new
-mechanism; the trigger filter and event publishing already exist.
-
-Upgrade path documented: when throughput or latency demands it, replace
-PostgreSQL polling with Redis Streams. The `events.publish()` and
-`events.subscribe()` interfaces stay the same. Nothing else changes.
-
----
-
-## 8. Cross-cutting concerns
-
-### Concurrency policy
-
-Per-automation `concurrency` field controls what happens when a new fire
-occurs while a previous run is still running:
-
-- `drop_if_running` — silently skip the new fire
-- `queue` — execute serially, in arrival order
-- `allow_parallel` — start a new run independently
-
-The dispatcher enforces this before enqueueing.
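-
-The three policies reduce to a small decision function. The sketch below is
-illustrative only: `Decision` and `decide` are hypothetical names, and the
-count of active runs is assumed to come from a query on the runs table.
-
-```python
-from enum import Enum
-
-
-class Decision(Enum):
-    ENQUEUE = "enqueue"  # start the run now
-    DEFER = "defer"      # hold the fire until the active run finishes
-    DROP = "drop"        # skip the fire
-
-
-POLICIES = {"drop_if_running", "queue", "allow_parallel"}
-
-
-def decide(policy: str, active_runs: int) -> Decision:
-    """Map a concurrency policy and the count of active runs to a decision."""
-    if policy not in POLICIES:
-        raise ValueError(f"unknown concurrency policy {policy!r}")
-    if active_runs == 0 or policy == "allow_parallel":
-        return Decision.ENQUEUE
-    return Decision.DEFER if policy == "queue" else Decision.DROP
-
-
-decisions = {p: decide(p, active_runs=1) for p in sorted(POLICIES)}
-```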
-
-### Retry policy
-
-Three fields, per-automation defaults with optional per-step overrides:
-- `max_retries`: integer, 0–10
-- `retry_backoff`: `none` | `linear` | `exponential`
-- `timeout_seconds`: integer
-
-Retries on:
-- Action handler exceptions
-- Output schema validation failures (for dynamic-output actions, the
-  validation error is fed back to the LLM in the retry)
-
-Not retried:
-- `when:` evaluation failures (these are user errors, surface immediately)
-- Input validation failures (caught at dispatch, never reach the executor)
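-
-A sketch of how `retry_backoff` could translate into a delay before each
-retry. The 2-second base and 300-second cap are assumptions chosen for
-illustration; this plan does not specify either, and `backoff_delay` is a
-hypothetical name.
-
-```python
-def backoff_delay(
-    strategy: str,
-    attempt: int,
-    base_seconds: float = 2.0,   # assumed base delay
-    cap_seconds: float = 300.0,  # assumed upper bound
-) -> float:
-    """Seconds to wait before retry number `attempt` (1-based)."""
-    if attempt < 1:
-        raise ValueError("attempt is 1-based")
-    if strategy == "none":
-        delay = 0.0
-    elif strategy == "linear":
-        delay = base_seconds * attempt
-    elif strategy == "exponential":
-        delay = base_seconds * 2 ** (attempt - 1)
-    else:
-        raise ValueError(f"unknown retry_backoff {strategy!r}")
-    return min(delay, cap_seconds)
-
-
-exponential = [backoff_delay("exponential", n) for n in range(1, 4)]
-linear = [backoff_delay("linear", n) for n in range(1, 4)]
-```
-
-With these assumed values, three exponential retries wait 2, 4, and 8
-seconds, and three linear retries wait 2, 4, and 6 seconds.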
-
-### Budget enforcement *(deferred — not in v1)*
-
-Future shape: `budget_cap_usd` on `Execution`, dispatcher refuses to
-enqueue if estimated cost exceeds it, executor kills the run if
-accumulated cost crosses it mid-flight (the LLM ops handler reports
-tokens consumed back to the executor between calls).
-
-Prerequisites before this can land:
-- Each action declares cost reporting (tokens × model price, API call
-  charges) — `ActionDefinition` has no such field today.
-- `automation_runs.cost_usd` column + executor accumulates per step.
-- A historical-cost ledger so pre-flight estimation can return useful
-  numbers (otherwise the dispatcher gate is guessing).
-
-Until all three exist, v1 has no surface for budget enforcement.
-
-### On-failure handlers
-
-`execution.on_failure` is a list of steps that run after the main plan has
-failed and all retries are exhausted. Same step shape as the main plan.
-On-failure steps cannot declare their own `on_failure`; they can read the
-failure details through `run.error.*` in the run context (§7.4).
-
-### Artifacts
-
-Deferred to Phase 5 (§4). When it lands, actions that produce artifacts
-declare `produces_artifacts: list[ArtifactSpec]`:
-
-```python
-@dataclass
-class ArtifactSpec:
-    kind: str           # "audio", "document", "image", "data"
-    retention: str      # "transient" | "default" | "permanent"
-    visibility: str     # "private" | "search_space" | "shared"
-```
-
-The engine handles storage (writes to SurfSense's existing object storage),
-URL generation (signed, scoped to the run's permissions), and cleanup (a
-nightly Celery Beat task deletes expired artifacts).
-
-### Duration classes and queue routing — deferred
-
-The original design routed runs to multiple Celery queues based on each
-action's declared `expected_duration_seconds`. v1 ships with **one
-queue** (`automations_default`) and actions do not declare a duration.
-Multi-queue routing returns when burst load on a single queue actually
-justifies the operational complexity of independent worker pools.
-
-Adding the second queue is a config change plus reintroducing
-`expected_duration_seconds` on the `ActionDefinition` dataclass. Both
-changes are mechanical and additive, and neither requires a design rewrite.
-
----
-
-## 9. Data model
-
-**v1 ships three tables:** `automations`, `automation_triggers`,
-`automation_runs`. All scoped by `search_space_id` for RBAC.
-
-The other three tables described in earlier drafts are deferred:
-
-- `domain_events` → **deferred to Phase 4** (introduced with the event
-  trigger).
-- `mcp_connections`, `mcp_tools` → **deferred to Phase 4** (MCP
-  integration).
-
-The deferred tables ship as-is when their consuming feature lands;
-nothing in the v1 schema needs to change to accommodate them. The three
-v1 tables form the engine's persistent state — definitions, triggers,
-and an immutable run history.
-
-### `automations`
-
-| field             | type                                | notes                                                                      |
-| ----------------- | ----------------------------------- | -------------------------------------------------------------------------- |
-| `id`              | int PK                              |                                                                            |
-| `search_space_id` | FK → `search_spaces.id`             |                                                                            |
-| `created_by`      | FK → `users.id`                     | runs execute as this identity                                              |
-| `name`            | str                                 |                                                                            |
-| `description`     | str                                 |                                                                            |
-| `status`          | enum                                | `active`, `paused`, `archived`                                             |
-| `definition`      | jsonb                               | the editable structured spec                                               |
-| `version`         | int                                 | bumped on every edit                                                       |
-| `created_at` / `updated_at` | timestamps                |                                                                            |
-
-### `automation_triggers`
-
-| field           | type                                                                          | notes                                                       |
-| --------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------- |
-| `id`            | int PK                                                                        |                                                             |
-| `automation_id` | FK                                                                            |                                                             |
-| `type`          | enum: `schedule`, `manual` (Phase 2 adds `webhook`, Phase 4 adds `event`)     |                                                             |
-| `params`        | jsonb                                                                         | trigger-type config, validated against trigger's `params_schema` |
-| `static_inputs` | jsonb                                                                         | per-attachment domain values merged into every run (static wins on collision) |
-| `enabled`       | bool                                                                          |                                                             |
-| `last_fired_at` | timestamp                                                                     |                                                             |
-| `next_fire_at`  | timestamp / null                                                              | precomputed next fire moment for schedule triggers          |
-
-`secret_hash` (for webhook bearer tokens) is **deferred to Phase 2** with
-the webhook trigger.
-
-### `automation_runs`
-
-| field             | type                                                                         | notes                                              |
-| ----------------- | ---------------------------------------------------------------------------- | -------------------------------------------------- |
-| `id`              | int PK                                                                       |                                                    |
-| `automation_id`   | FK                                                                           |                                                    |
-| `trigger_id`      | FK / null                                                                    | null = manual via UI                               |
-| `status`          | enum                                                                         | `pending`, `running`, `succeeded`, `failed`, `cancelled`, `timed_out` |
-| `definition_snapshot` | jsonb                                                                    | the definition as it was when this run fired       |
-| `inputs`          | jsonb                                                                        | merged & validated inputs (trigger.static_inputs ∪ producer runtime data, static wins) |
-| `step_results`    | jsonb                                                                        | array of per-step results with timing              |
-| `output`          | jsonb / null                                                                 |                                                    |
-| `artifacts`       | jsonb                                                                        | references to created artifacts                    |
-| `error`           | jsonb / null                                                                 |                                                    |
-| `started_at` / `finished_at` | timestamps                                                        |                                                    |
-| `agent_session_id`| str / null                                                                   | link to LangGraph trace if agent_task was used     |
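-
-The `inputs` rule (static wins on collision) amounts to a dict merge in
-which the trigger's `static_inputs` are applied last. A minimal sketch;
-`merge_inputs` is a hypothetical helper name and the sample values are
-made up for illustration:
-
-```python
-from typing import Any
-
-
-def merge_inputs(static_inputs: dict[str, Any], runtime_data: dict[str, Any]) -> dict[str, Any]:
-    """Union of both sources; static_inputs override runtime keys on collision."""
-    return {**runtime_data, **static_inputs}
-
-
-merged = merge_inputs(
-    static_inputs={"channel": "#engineering-standup"},
-    runtime_data={"channel": "#random", "fired_at": "2026-01-05T09:00:00Z"},
-)
-print(merged)
-```
-
-Validation against the automation's input schema would run on the merged
-result, so a producer cannot override a value pinned on the trigger.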
-
-`cost_usd` (per-run accumulated cost) is **deferred** until at least one
-action records token-level cost. When reintroduced it lands as a
-column-only migration.
-
-### Deferred tables
-
-- **`domain_events`** — the event bus backing event triggers. Ships in
-  Phase 4 with the event trigger. v1 only emits `automation.run.*`
-  events into application logs; the table is added when at least one
-  consumer needs to subscribe to them.
-- **`mcp_connections`** / **`mcp_tools`** — see §3. Both ship in Phase 4
-  alongside the MCP harvester and the two-tier registry.
-
-NL drafts are **not** a core table. They live in a generic short-TTL
-store (Redis or a transient table) that ships with the NL flow in
-Phase 1 (Step 4).
-
----
-
-## 10. NL authoring flow
-
-**This is how the system is intended to be used from day one, not just a
-Phase 3 addition.** The product surface is: user describes intent in natural
-language, LLM produces a structured proposal, user reviews and edits in an
-auto-generated form, then saves. Hand-authoring JSON directly is supported
-but is not the primary path.
-
-This shapes the trust model. Templates are LLM-generated from day one, not
-hand-written by power users. The mitigation is human-in-the-loop review,
-not "trusted authors only."
-
-### Pass 1: Proposal generation
-
-User provides natural-language input. The Generator LLM is given:
-- The full schema set (input schema for definition, registry of action
-  types with their params_schemas, registry of trigger types, list of
-  allowed Jinja filters)
-- A tool to list available connectors, channels, and other SearchSpace
-  resources, so it doesn't invent names that don't exist
-- A few-shot set of examples
-
-**Scoped input.** The Generator does *not* receive arbitrary SearchSpace
-document content. Its context is the user's prompt plus the schema and
-registry information. This minimizes the prompt-injection surface — there's
-no document text in the context for an attacker to seed instructions into.
-
-If a user wants document-aware generation later ("create an automation
-that processes documents like this one"), that's a deliberate feature
-extension with its own prompt-injection mitigations, not the default flow.
-
-Output: a structured proposal matching the automation definition schema.
-
-### Pass 2: Deterministic validation
-
-Server-side, before the proposal reaches the user:
-- Validate against JSON Schema (shape correctness)
-- Verify every action and trigger type referenced exists in the registry
-- Verify every connector/channel/resource referenced exists in this SearchSpace
-- Validate every template against the sandbox's allowlist (no underscore
-  attributes, no unregistered filter names, length under cap)
-
-Failures here are deterministic errors, not warnings. A proposal that
-references a non-existent action or includes a template using
-`{{x.__class__}}` is rejected before the user sees it; the Generator is
-re-prompted with the validation error and asked to fix the proposal.
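-
-A simplified sketch of the registry and template checks, using only the
-standard library. The proposal shape (`triggers`, `plan`, `action`,
-`params`), the registry contents, and the three filter names are
-assumptions for illustration. The regex lint stands in for the real
-Jinja sandbox check and is deliberately cruder than parsing the template.
-
-```python
-import re
-
-KNOWN_ACTIONS = {"agent_task"}                     # assumed registry contents
-KNOWN_TRIGGERS = {"schedule", "manual"}            # assumed registry contents
-ALLOWED_FILTERS = {"default", "length", "tojson"}  # illustrative subset only
-MAX_TEMPLATE_BYTES = 8 * 1024                      # template source cap
-
-EXPR = re.compile(r"\{\{(.*?)\}\}", re.S)
-UNDERSCORE_ATTR = re.compile(r"\._|\[\s*['\"]_")
-FILTER_NAME = re.compile(r"\|\s*([A-Za-z_][A-Za-z0-9_]*)")
-
-
-def validate_template(src: str) -> list[str]:
-    """Return deterministic errors for one template string."""
-    errors = []
-    if len(src.encode("utf-8")) > MAX_TEMPLATE_BYTES:
-        errors.append("template exceeds length cap")
-    for expr in EXPR.findall(src):
-        if UNDERSCORE_ATTR.search(expr):
-            errors.append(f"underscore attribute access in: {expr.strip()}")
-        for name in FILTER_NAME.findall(expr):
-            if name not in ALLOWED_FILTERS:
-                errors.append(f"unregistered filter: {name}")
-    return errors
-
-
-def validate_proposal(proposal: dict) -> list[str]:
-    """Check type references against the registries, then lint templates."""
-    errors = []
-    for trigger in proposal.get("triggers", []):
-        if trigger["type"] not in KNOWN_TRIGGERS:
-            errors.append(f"unknown trigger type: {trigger['type']}")
-    for step in proposal.get("plan", []):
-        if step["action"] not in KNOWN_ACTIONS:
-            errors.append(f"unknown action type: {step['action']}")
-        for value in step.get("params", {}).values():
-            if isinstance(value, str):
-                errors.extend(validate_template(value))
-    return errors
-```
-
-An empty list means the proposal may proceed to review; a non-empty list
-is what would be fed back to the Generator on the re-prompt.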
-
-### Pass 2.5: Review pass
-
-A second LLM call — the **Review LLM** — examines the validated proposal and
-produces two outputs for the user:
-
-1. **A plain-language summary** of what the automation will do, in business
-   terms. "This automation will run every weekday at 9am. It reads documents
-   in this SearchSpace tagged 'competitor' that were indexed since the last
-   run, asks an agent to summarize them as 5 bullets, and posts the summary
-   to your #engineering-standup Slack channel. Estimated cost: $0.40 per
-   run."
-
-2. **A "things worth checking" list** flagging anything unusual:
-   - Templates with unusual attribute paths or filter usage
-   - Prompts containing instructions that look more like commands than
-     descriptions ("ignore previous instructions" style)
-   - Action sequences that touch external systems without obvious benefit
-     to the user
-   - Cost estimates that seem high relative to the goal
-   - References to actions the user hasn't used before
-   - Schedules tighter than 15 minutes (likely should be event triggers)
-
-The Review LLM is a **UX layer** that makes review actually useful. It is
-**not a security boundary.** The deterministic controls (sandbox, runtime
-limits, schema validator) are the security boundaries. The Review LLM
-helps users catch their own intent mismatches and surfaces anomalies for
-attention, but the sandbox would block dangerous templates even if the
-Review LLM missed them.
-
-This separation is important: two probabilistic controls compounding can
-create a false sense of security. The Review LLM is explicitly framed in
-the architecture as helper, not gatekeeper.
-
-### Pass 3: Editable review
-
-The user lands on a form pre-filled with the proposal. The page shows:
-- The plain-language summary from the Review pass
-- The flagged items, prominently displayed near the relevant fields
-- The full editable form, auto-generated from the JSON Schemas
-- Cost estimate and impact summary (which external systems get touched)
-
-**Every field is editable.** Clarifications appear as required fields.
-Templates are shown in code-styled fields with syntax highlighting and the
-filter palette visible. The user can edit any field; saving re-runs Pass 2
-(deterministic validation) before persisting.
-
-Hitting **Save** promotes the proposal to an `automation` row.
-
-### Editing existing automations
-
-NL editing of an existing automation is a patch operation: the Generator
-LLM receives the current definition plus the NL instruction and produces a
-modified proposal. The same Pass 2 (validation) and Pass 2.5 (review) run
-against the modified version, and the user reviews the diff before saving.
-Existing run history is unaffected — only future runs use the new version.
-
-### Why human-in-the-loop is non-negotiable
-
-The Generator LLM, the Review LLM, and the sandbox are three layers of
-defense against malformed or malicious proposals. The human approval step
-is the fourth and most important layer. It exists because:
-
-- LLMs can be prompt-injected; humans can spot text that asks them to
-  ignore instructions
-- LLMs can produce confident-but-wrong proposals; humans can catch
-  semantic mismatches between intent and output
-- The cost of a bad automation running unattended is high; the cost of a
-  user clicking "approve" after reading is low
-
-The architecture must never offer "auto-approve" or "skip review" options
-for LLM-generated proposals. Save requires human action on the proposal,
-always.
-
----
-
-## 11. Repository layout
-
-```
-surfsense_backend/app/
-├── automations/                       # NEW: the engine
-│   ├── __init__.py
-│   ├── persistence/                   # SQLAlchemy models + enums for 3 tables
-│   ├── schemas/                       # Pydantic schemas (definition envelope, etc.)
-│   ├── routes.py                      # FastAPI router (/api/v1/automations)
-│   ├── service.py                     # CRUD + business logic
-│   ├── dispatcher.py                  # trigger matching, run creation
-│   ├── executor.py                    # the Celery task that runs a plan
-│   ├── templating.py                  # Jinja sandbox + filters
-│   ├── events.py                      # publish/subscribe for domain_events
-│   ├── filters.py                     # JSON filter grammar evaluator
-│   ├── registries/                    # action and trigger registries
-│   │   ├── actions/                   # ActionDefinition + handler registration
-│   │   └── triggers/                  # TriggerDefinition
-│   └── nl/                            # Phase 1 — primary user path
-│       ├── generator.py               # Generator LLM
-│       ├── reviewer.py                # Review LLM (summary + flagged items)
-│       ├── validator.py               # deterministic schema + resource checks
-│       └── prompts.py                 # system prompts for both LLMs
-│
-├── utils/
-│   └── periodic_scheduler.py          # EXTENDED to scan automation_triggers
-│
-└── alembic/versions/
-    └── NN_add_automation_tables.py
-
-surfsense_web/app/(routes)/
-└── automations/                       # NEW: UI
-    ├── page.tsx                       # list
-    ├── new/page.tsx                   # NL input + draft preview (Phase 1)
-    ├── [id]/page.tsx                  # editor (auto-generated forms)
-    └── [id]/runs/page.tsx             # run history, streamed via Electric SQL
-```
-
----
-
-## 12. Phased delivery
-
-Each phase delivers something usable. Each de-risks the next. **NL authoring
-is the primary user path from Phase 1** — what evolves across phases is
-which actions and triggers are available, not whether users can describe
-automations in natural language.
-
-### Phase 1 — Engine MVP with NL authoring
-
-**Step 1 (current scope, this batch of commits):**
-- 3 tables (`automations`, `automation_triggers`, `automation_runs`) +
-  Alembic migration
-- Empty action and trigger registries under
-  `app/automations/registries/` (concrete entries land in later steps)
-- Pydantic schemas for the automation definition envelope, the two v1
-  trigger params shapes (`schedule`, `manual`), and the one v1 action
-  params shape (`agent_task`)
-- Module structure under `app/automations/` (persistence/, schemas/,
-  registries/), fully isolated from the existing codebase
-
-**Step 2:**
-- The `agent_task` action handler and the `schedule` / `manual` triggers
-  registered in `app/automations/registries/`. Tool resolution for
-  `agent_task.params.tools` is opaque to the contract — the handler
-  decides what string identifiers it accepts and how they resolve.
-
-**Step 3:**
-- Executor (single-queue Celery task) with retries and timeouts
-- Template engine (Jinja sandbox + the v1 filter allowlist + runtime
-  limits)
-- Manual "Run now" endpoint
-
-**Step 4:**
-- NL authoring flow: Generator LLM, deterministic validator, Review LLM,
-  editable form
-- Run history UI with Electric SQL streaming
-
-**After Phase 1**: a user can describe an automation in natural language,
-review the proposal (with summary + flagged anomalies), edit any field,
-save, and watch it run on a schedule.
-
-### Phase 2 — Webhooks and delivery
-- `webhook` trigger with per-automation bearer tokens
-- Tight actions: `slack_post`, `send_email`, `notification`
-- `transform_data` action
-- `on_failure` hooks
-- Step-level retry/timeout overrides
-- Concurrency policy enforcement
-
-**After Phase 2**: external systems can drive automations, results go
-somewhere humans see, complex pipelines have proper error handling.
-
-### Phase 3 — NL authoring polish
-- NL patch flow for editing existing automations (diff-based)
-- Conversational refinement during proposal review ("change the schedule
-  to weekdays only," "add a Slack notification on failure")
-- Improved Review LLM coverage (more anomaly patterns, cost-relative-to-
-  goal heuristics)
-- Saved prompt templates and starter examples
-
-**After Phase 3**: NL authoring is the polished primary surface; edit
-flows are conversational rather than form-only.
-
-### Phase 4 — Event triggers + integration tooling
-- `domain_events` table and `events.py` module
-- Indexing pipeline publishes `connector.*` events (smallest change — just
-  add publish calls to the existing flow)
-- Automations publish `automation.run.*` events on completion
-- `event` trigger with filter grammar
-- The unification layer redesign (see §3) — `CallContext`, scope
-  declarations, per-user authorization gating
-- MCP integration on top of the unification layer (external tool servers
-  harvested into the shared catalog)
-
-**After Phase 4**: "do X when Y happens" automations work, including
-automation-chaining through events; external MCP tools and SurfSense
-actions share one vocabulary.
-
-### Phase 5 — Wrapping existing features and sharing
-- Wrap existing SurfSense features as actions: `podcast_generation`,
-  `report_generation`, `indexing_sweep`
-- Artifact lifecycle implementation
-- Queue routing based on `expected_duration_seconds` (split
-  `automations_long` from `automations_default`)
-- **Automation templates** (shareable, exportable, importable) — with
-  the import re-approval flow that handles the approver-≠-runner trust
-  shift documented in §7.4's pre-Phase-5 gate
-- Cross-automation composition examples in the docs
-
-**After Phase 5**: every existing SurfSense feature is automatable
-without any per-feature code, and automations can be shared between
-SearchSpaces and users.
-
----
-
-## 13. Decisions locked
-
-For reference — every decision made through the design process, in one
-place.
-
-### Foundations
-1. ✅ JSON Schema (draft 2020-12) is the single schema language for everything
-2. ✅ Definition is the program; infrastructure is the interpreter
-3. ✅ List of steps (not single action) in the plan, with `output_as` chaining
-4. ⏸ Capability unification layer (one catalog shared by automations, agents, and future surfaces) — **deferred to post-v1** (see §3). v1 ships actions only.
-5. ✅ Name-based resolution: definitions reference action and trigger types by string ID. The registry is the runtime's vocabulary; lookup is a dict access. No code references in definitions.
-6. ✅ The expressive spectrum runs from pure direct calls to broad agent_task; the NL generator proposes the cheapest shape that meets intent (Shape 6 from §4 by default)
-
-### Trigger taxonomy
-8. ✅ Three trigger types: `schedule`, `webhook`, `event`
-9. ✅ Events absorb both connector events and internal SurfSense events
-10. ✅ Filter grammar is JSON-structured operators (not Jinja)
-
-### Templating cluster
-11. ✅ Jinja2 `SandboxedEnvironment` for templates and `when:` predicates — but with the explicit understanding that the sandbox is an allowlist-by-default architecture, not a denylist
-12. ✅ Zero globals registered. Curated 15 filters only, each audited for safe behavior with hostile input. List grows only by reviewed addition
-13. ✅ Four runtime mitigations: `StrictUndefined`, 8 KB template source cap, 100 ms render time cap (watchdog-enforced), 1 MB output size cap
-14. ✅ Non-string template values render as JSON by default
-15. ✅ Fixed `run.*` namespace, documented
-16. ⏸ **Pre-Phase-5 gate**: template sharing across SearchSpaces breaks the approver-equals-runner trust model. Mitigation is a re-approval flow at the import boundary (UX-level), not a template-language migration. Jinja itself stays.
-
-### Execution
-17. ✅ Executor is a Celery task wrapping a sequential loop — not an agent
-18. ✅ `when:` is optional per step; false = skipped (not failed)
-19. ✅ No DAGs, no parallelism, no loops — composition via agent_task or events
-20. ✅ `on_failure` part of execution policy from v1
-21. ✅ Step-level retry and timeout overrides
-22. ⏸ Budget cap enforced pre-enqueue and mid-flight — **deferred** until the cost ledger ships (see §8 Budget enforcement)
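-
-Decisions 17 to 19 can be pictured as the loop below. This is a sketch
-under assumptions: the step fields `id`, `action`, and `params`, the
-handler signature, and the `eval_when` callable (a stand-in for the
-sandboxed predicate evaluation) are hypothetical; `when` and `output_as`
-are the names used in this plan.
-
-```python
-from typing import Any, Callable
-
-Handler = Callable[[dict[str, Any], dict[str, Any]], Any]
-
-
-def execute(
-    plan: list[dict[str, Any]],
-    handlers: dict[str, Handler],
-    eval_when: Callable[[str, dict[str, Any]], bool],
-) -> list[dict[str, str]]:
-    """Run steps strictly in order; no branching, parallelism, or loops."""
-    ctx: dict[str, Any] = {}
-    results: list[dict[str, str]] = []
-    for step in plan:
-        predicate = step.get("when")
-        if predicate is not None and not eval_when(predicate, ctx):
-            # A false `when:` marks the step skipped, not failed.
-            results.append({"id": step["id"], "status": "skipped"})
-            continue
-        output = handlers[step["action"]](step.get("params", {}), ctx)
-        if "output_as" in step:
-            # Later steps read earlier outputs by name from the context.
-            ctx[step["output_as"]] = output
-        results.append({"id": step["id"], "status": "succeeded"})
-    return results
-```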
-
-### Components
-23. ✅ Dispatcher / executor / handlers / registry — distinct, each replaceable
-24. ⏸ Side effects are a set, including `USER_VISIBLE` — **deferred** until multi-user automation RBAC ships
-25. ⏸ `expected_duration_seconds` integer drives queue routing — **deferred** until a second Celery queue is needed
-26. ⏸ `produces_artifacts` is a list of `ArtifactSpec`, not a bool — **deferred** until artifacts beyond the deliverable handlers' own persistence are needed
-27. ✅ Output schemas recommended on `agent_task`; editor warns when missing
-
-### Event bus
-28. ✅ `domain_events` table as the first event-bus backend, with upgrade path to Redis Streams
-29. ✅ Automations publish run events for composability
-30. ✅ Publish/subscribe behind interface — no direct table access elsewhere
-
-### Capability unification — all deferred to post-v1
-31. ⏸ One shared catalog of "things this SurfSense instance can do" — **deferred**, see §3
-32. ⏸ Handler `CallContext` (caller user id, search space id, run id) — **deferred** with unification
-33. ⏸ Per-capability scope declarations driving authorization — **deferred** with unification
-34. ⏸ MCP integration on top of the unification layer (`mcp_connections`, `mcp_tools`, harvester) — **deferred to Phase 4**
-
-### Credentials — all deferred to Phase 2
-35. ⏸ Credentials never appear in the automation definition — only connection IDs do — **Phase 2**
-36. ⏸ Credentials never appear in the LLM's context — the host holds them — **Phase 2**
-37. ⏸ Credentials resolved per-call by the handler context, not pre-loaded into worker environment — **Phase 2**
-38. ⏸ Tokens encrypted at rest; refresh handled automatically by the handler context — **Phase 2**
-
-### v1-minimum
-39. ✅ v1 ships actions only — no separate capability layer. `ActionDefinition` is five fields: `type`, `name`, `description`, `params_schema`, `handler`. Additional fields are added only when a concrete consumer feature requires them.
-40. ✅ Cost is **measured** from a per-run ledger, not declared. Pre-flight cost checks return when the ledger has enough history.
-41. ✅ Single `automations_default` Celery queue in v1. Multi-queue routing returns when load justifies it.
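-
-The five-field `ActionDefinition` from decision 39, combined with the
-name-based lookup from decision 5, could look roughly like this. The
-handler signature and the helper names `register_action` and
-`resolve_action` are assumptions, not something this plan fixes.
-
-```python
-from dataclasses import dataclass
-from typing import Any, Callable
-
-
-@dataclass(frozen=True)
-class ActionDefinition:
-    type: str                                  # string ID used in definitions
-    name: str
-    description: str
-    params_schema: dict[str, Any]              # JSON Schema for the step params
-    handler: Callable[[dict[str, Any]], Any]   # assumed signature
-
-
-ACTION_REGISTRY: dict[str, ActionDefinition] = {}
-
-
-def register_action(definition: ActionDefinition) -> None:
-    """Add an action to the registry, rejecting duplicate type IDs."""
-    if definition.type in ACTION_REGISTRY:
-        raise ValueError(f"duplicate action type: {definition.type}")
-    ACTION_REGISTRY[definition.type] = definition
-
-
-def resolve_action(type_id: str) -> ActionDefinition:
-    """Name-based resolution: a plain dict access keyed by the string ID."""
-    try:
-        return ACTION_REGISTRY[type_id]
-    except KeyError:
-        raise LookupError(f"unknown action type: {type_id}") from None
-```
-
-Because definitions carry only the string ID, adding a field to
-`ActionDefinition` later does not change any stored definition.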
-
-### NL authoring
-42. ✅ LLM-authored templates are the primary path from day one, not a Phase 3 addition. Hand-authoring JSON is supported but secondary
-43. ✅ Generator LLM produces JSON; deterministic schema + resource validation runs before user sees the proposal
-44. ✅ Review LLM produces plain-language summary + flagged anomalies for the user — UX layer, not a security boundary
-45. ✅ Generator LLM's input is scoped (user prompt + schema + registry only); arbitrary document content is not fed in
-46. ✅ Human approval is required before save — no auto-approval option, ever
-47. ✅ Every field editable in the proposal; unresolved questions surface as clarifications
-48. ✅ NL drafts are transient storage, not a core table
-
-### Data model
-49. ✅ v1 ships three tables (`automations`, `automation_triggers`, `automation_runs`). `domain_events`, `mcp_connections`, and `mcp_tools` land in Phase 4.
-50. ✅ Run rows snapshot the definition (immutable history)
-51. ✅ All entities scoped by `search_space_id` for RBAC
-52. ✅ Editing an automation bumps `version`; existing runs unaffected
-
----
-
-## 14. Open questions deferred to implementation
-
-None of these block design; they're decisions a developer will make in
-context, with the principle from §1 as their guide.
-
-- Exact retry backoff formulas (multipliers, jitter, ceilings)
-- Webhook signature verification standards (HMAC scheme, header naming)
-- Whether to support inline JSON Schema `$ref` to external schemas, or
-  inline everything
-- Specific CDN/storage backend choices for artifacts (probably
-  whatever SurfSense already uses for podcasts)
-- Rate limits per SearchSpace and per user
-- Audit log retention policy
-
----
-
-## 15. Why this is ready to build
-
-This document satisfies five tests:
-
-1. **The four worked examples** (digest, CI webhook, file-added-trigger,
-   weekly podcast) all express cleanly in the contract without special
-   cases. Each one was used to find gaps before the gaps reached code.
-
-2. **The audit pass identified six refinements**, all incorporated. No
-   pending audit items.
-
-3. **Every decision points back to the principle from §1.** When a future
-   feature request lands, "does it belong in the definition or in the
-   engine?" gives a clear answer.
-
-4. **The build is staged** so Phase 1 ships in weeks, not months, and
-   each subsequent phase delivers user value while de-risking the next.
-
-5. **Existing SurfSense infrastructure is reused**, not paralleled. Celery
-   Beat, PostgreSQL/JSONB, Electric SQL, SQLAlchemy/Alembic, the existing
-   `tools/registry.py` pattern, the existing Search Space scoping — all
-   continue to do what they already do. The automation engine is a new
-   directory, not a new system.
-
-The next document a developer needs is the Pydantic models and JSON
-Schemas spelled out concretely. Those follow mechanically from this plan.
-
----
-
-*Sources consulted: Claude Code Routines documentation; NousResearch/hermes-
-agent (cron and skills subsystems); n8n documentation on node types and
-workflow data model; the SurfSense repository and DeepWiki architecture
-notes (FastAPI + Celery Beat + Electric SQL + LangGraph Deep Agents +
-Search Space RBAC); Model Context Protocol specification for external
-tool harvesting; AWS EventBridge for filter grammar; workflow-pattern
-literature (van der Aalst et al.) for the trigger / action / concurrency
-vocabulary.*