mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-29 19:35:20 +02:00
feat(automations): static_inputs on triggers + vertical-slice api/services
This commit is contained in:
parent
84d99f19a2
commit
27ab367a13
27 changed files with 915 additions and 356 deletions
|
|
@ -34,24 +34,27 @@ system will survive feature growth:
|
|||
|
||||
---
|
||||
|
||||
## 2. The four-layer contract
|
||||
## 2. The three-layer contract
|
||||
|
||||
The system is structured as four layers. Layers 1, 2, and 4 are defined by
|
||||
SurfSense developers (at registration time). Layer 3 is what users write
|
||||
(or the NL generator produces). The runtime reads all four to do its job.
|
||||
The system is structured as three layers. Layers 1 and 3 are defined by
|
||||
SurfSense developers (at registration time). Layer 2 is what users write
|
||||
(or the NL generator produces). The runtime reads all three to do its job.
|
||||
|
||||
| Layer | What it is | Defined by |
|
||||
| ----- | ---------- | ---------- |
|
||||
| **1. Capability registry** | What this SurfSense instance can do | Developers, at startup |
|
||||
| **2. Action contract** | Per-action input/output schema | Developers, at startup |
|
||||
| **3. Automation definition** | One concrete saved automation | Users (or NL generator) |
|
||||
| **4. Trigger contract** | Per-trigger config and payload schemas | Developers, at startup |
|
||||
| **1. Action contract** | Per-action params and output schema | Developers, at startup |
|
||||
| **2. Automation definition** | One concrete saved automation | Users (or NL generator) |
|
||||
| **3. Trigger contract** | Per-trigger params and payload schemas | Developers, at startup |
|
||||
|
||||
Each layer constrains the one above. The runtime reads all four but doesn't
|
||||
know what's in them ahead of time. That's how a new capability or trigger
|
||||
Each layer constrains the next. The runtime reads all three but doesn't
|
||||
know what's in them ahead of time. That's how a new action or trigger
|
||||
type becomes available across the engine without code changes outside its
|
||||
registration.
|
||||
|
||||
A unification layer below Layer 1 — one catalog of "things this SurfSense
|
||||
instance can do," shared by automations, agents, and future surfaces — was
|
||||
considered and deferred (§3). v1 actions are stand-alone.
|
||||
|
||||
### Schema language
|
||||
|
||||
Every shape in every layer is described in **JSON Schema (draft 2020-12).**
|
||||
|
|
@ -66,167 +69,126 @@ extensions on top:
|
|||
|
||||
---
|
||||
|
||||
## 3. Capability registry (Layer 1)
|
||||
## 3. Capability unification layer — deferred to post-v1
|
||||
|
||||
A `Capability` is one discrete thing the SurfSense backend exposes —
|
||||
"post a Slack message," "query the Search Space," "generate a podcast." It
|
||||
is the atomic unit of "things automations can do."
|
||||
Earlier drafts introduced a `Capability` registry as Layer 1: one catalog
|
||||
of "things this SurfSense instance can do," shared by the automation
|
||||
engine (as actions), the agent (as tools), and any future HTTP surface.
|
||||
The motivation is real — one source of truth beats N parallel registries —
|
||||
but v1 has a single action (`agent_task`) and a single consumer (the
|
||||
automation engine). The five-field shape sketched earlier (`id`,
|
||||
`description`, `input_schema`, `output_schema`, `handler`) cannot safely
|
||||
host any non-trivial capability: it carries no caller identity, no
|
||||
search-space scoping, and no authorization gate on tool delegation.
|
||||
Building the abstraction with one consumer would lock in a shape that
|
||||
doesn't survive the second consumer.
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class Capability:
|
||||
id: str # "slack.post_message"
|
||||
description: str # for the NL generator + UI label
|
||||
input_schema: dict # JSON Schema
|
||||
output_schema: dict # JSON Schema
|
||||
handler: AsyncHandler
|
||||
```
|
||||
The unification layer returns when the second consumer lands (Phase 2
|
||||
tight actions or Phase 4 MCP), redesigned from the start with:
|
||||
|
||||
### v1-minimum: five fields, nothing else
|
||||
- A `CallContext` carrying caller user id, search space id, and run id,
|
||||
passed to every handler invocation.
|
||||
- Explicit scope declarations per capability (e.g. `reads:documents`,
|
||||
`writes:slack`, `destructive`) for the authorization layer to read.
|
||||
- A per-user, per-search-space filter consulted at both definition save
|
||||
time (validating `agent_task.tools`) and run time (scoping the agent's
|
||||
tool list to what the automation creator can delegate).
|
||||
|
||||
The Capability is **deliberately five fields in v1**. Every additional field
|
||||
that earlier drafts considered (`name`, `required_credentials`,
|
||||
`side_effects`, `expected_duration_seconds`, `cost_estimate`) has been
|
||||
removed until a concrete consumer feature demands it. Authoring stays cheap
|
||||
and the registry stays trivial to introspect:
|
||||
Until then:
|
||||
|
||||
- `name` → folded into `description`. The UI can render a short label from
|
||||
the first line of `description` or fall back to `id`. No separate field
|
||||
needed in v1.
|
||||
- `required_credentials` → returns when external-credential capabilities
|
||||
ship (Phase 2). v1 capabilities run server-side with app config; nothing
|
||||
to declare.
|
||||
- `side_effects` → returns when RBAC inside automations or
|
||||
`READ_ONLY`-only agent tool gating arrives. v1 capabilities are
|
||||
hand-picked and all trusted code.
|
||||
- `expected_duration_seconds` → returns when multi-queue routing ships.
|
||||
Single Celery queue in v1.
|
||||
- `cost_estimate` → never returns as a declared field; cost is measured
|
||||
per run from a ledger, aggregated per Capability, and surfaced as a
|
||||
historical average. Pre-flight checks are deferred.
|
||||
|
||||
The runtime invariant: a Capability is **a typed, named, callable thing
|
||||
the system can do.** Every consumer (executor, agent tool layer, future
|
||||
HTTP API) sees the same five-field shape and uses it the same way.
|
||||
|
||||
### Where capabilities live (v1)
|
||||
|
||||
In v1, the capability registry is a single in-memory dict, populated at
|
||||
process startup from native registrations in
|
||||
`automations/registries/capabilities/`. Identical across all workers.
|
||||
No database persistence, no closures rebuilt per worker.
|
||||
|
||||
### MCP integration — deferred to Phase 4
|
||||
|
||||
The earlier two-tier registry (native + MCP-derived), the
|
||||
`mcp_connections` / `mcp_tools` tables, the harvester, and the lazy
|
||||
per-worker closure cache are **deferred to Phase 4** along with the
|
||||
rest of the integration-tooling surface. They are removed from v1
|
||||
because:
|
||||
|
||||
- v1 has no external connector capabilities (no Slack, Notion, Drive,
|
||||
etc.). The only capabilities that will ship are server-side helpers
|
||||
(search-space query / fetch) plus the loose `agent_task` action.
|
||||
- Without external connectors, the lifecycle mismatch that motivates
|
||||
the two-tier design (connect Monday, run Friday, workers restarted
|
||||
in between) doesn't arise. A startup-time dict is sufficient.
|
||||
- Phase 4 reintroduces this design as-is — the registry interface in
|
||||
v1 is the same callable surface a Phase-4 MCP harvester will register
|
||||
into. The deferral is additive, not a different design.
|
||||
|
||||
See archived design at `docs/automation/archived/mcp-registry.md` once
|
||||
v1 ships; for now the only consumer of the registry is the in-memory
|
||||
native path.
|
||||
- v1 actions are stand-alone units (Layer 1 below); the automation engine
|
||||
reads its own action registry, nothing else.
|
||||
- `agent_task.params.tools` is a forward-looking allowlist field with no
|
||||
v1 semantics beyond "list of string identifiers." The handler's tool
|
||||
resolution is opaque to the automation contract.
|
||||
|
||||
### Credentials — deferred to Phase 2
|
||||
|
||||
The earlier per-call credential resolution pattern (`ctx.resolve_mcp_client`,
|
||||
`ctx.resolve_http_client`, `ctx.resolve_llm`) is **deferred to Phase 2**.
|
||||
v1 capabilities run server-side using app-level configuration; none of
|
||||
the seven v1 capabilities needs per-user or per-connection auth.
|
||||
External-credential handlers (Slack, email, etc.) require per-user or
|
||||
per-connection auth. v1 actions run server-side with app-level
|
||||
configuration. When tight actions ship in Phase 2, the credential design
|
||||
lands as part of the unification redesign: connection IDs in the
|
||||
definition (never tokens); credentials loaded per-call by the handler
|
||||
context (never pre-loaded into worker memory); credentials never enter
|
||||
LLM context.
|
||||
|
||||
When Phase 2 ships external-credential capabilities (Slack, email, etc.),
|
||||
the three guarantees the original design promised are reintroduced
|
||||
unchanged:
|
||||
### MCP — deferred to Phase 4
|
||||
|
||||
- Credentials never appear in the automation definition (connection IDs
|
||||
only).
|
||||
- Credentials never appear in the LLM's context (the host holds them
|
||||
and uses them on the LLM's behalf when executing tool calls).
|
||||
- Credentials are loaded per-call, not pre-loaded into worker memory.
|
||||
|
||||
The Phase-2 design returns as-is; only the v1 surface is simplified.
|
||||
External tool servers feeding tools into a shared registry land with the
|
||||
rest of the integration tooling in Phase 4, after the unification layer
|
||||
is in place. The two-tier registry, `mcp_connections` and `mcp_tools`
|
||||
tables, and the harvester arrive as a single coherent step then.
|
||||
|
||||
---
|
||||
|
||||
## 4. Action contract (Layer 2)
|
||||
## 4. Action contract
|
||||
|
||||
An `Action` is what a user references in a plan step. Most actions are
|
||||
thin wrappers around one capability (e.g., `slack_post` wraps
|
||||
`slack.post_message`). Some compose: `agent_task` is one action whose
|
||||
handler invokes the LangGraph runtime, which in turn can call many
|
||||
capabilities.
|
||||
An `Action` is what a user references in a plan step. Some actions are
|
||||
deterministic single-purpose handlers (`slack_post`, `send_email`); one
|
||||
action (`agent_task`) hosts an LLM and a tool allowlist for cases where
|
||||
judgment is needed. The contract is the same in both cases — only the
|
||||
handler differs.
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class ActionDefinition:
|
||||
type: str # "agent_task", "slack_post"
|
||||
name: str # for the UI
|
||||
description: str # for the NL generator
|
||||
config_schema: dict # JSON Schema for action.config
|
||||
output_contract: dict | DynamicOutput # what it produces
|
||||
uses_capabilities: list[str] # IDs from the registry
|
||||
produces_artifacts: list[ArtifactSpec] # see §8
|
||||
handler: AsyncHandler
|
||||
type: str # "agent_task", "slack_post"
|
||||
name: str # short UI label
|
||||
description: str # for the NL generator and the UI
|
||||
params_schema: dict # JSON Schema for step.params
|
||||
handler: ActionHandler
|
||||
```
|
||||
|
||||
This is the v1 shape: five fields, no handler context, no output
|
||||
contract, no artifact declaration. The deferrals are intentional:
|
||||
|
||||
- **`output_contract`** — Phase 2. Deterministic handlers will return
|
||||
a fixed shape; v1's only action (`agent_task`) takes an
|
||||
`output_schema` inside `params` and validates against that instead.
|
||||
- **`produces_artifacts`** — Phase 5. Artifact lifecycle (storage,
|
||||
signed URLs, retention) is its own design step; v1 handlers
|
||||
persist their own outputs.
|
||||
- **Handler context** — paired with the unification redesign (§3).
|
||||
v1 handlers receive `(args)` only; per-user / per-search-space
|
||||
behavior is not yet a v1 concern.
|
||||
|
||||
### Tight vs loose actions
|
||||
|
||||
Two patterns coexist by design:
|
||||
|
||||
- **Tight actions** (`slack_post`, `linear_create_issue`, `send_email`):
|
||||
config_schema is fully specified, output_contract is fixed, handler is a
|
||||
thin wrapper. ~20 LOC each. Used when the user knows exactly what they
|
||||
want done — no LLM tokens spent on trivial work.
|
||||
- **Tight actions** (`slack_post`, `linear_create_issue`,
|
||||
`send_email`) — deterministic single-purpose handlers. ~20 LOC
|
||||
each. **Phase 2.**
|
||||
- **Loose actions** (`agent_task`) — params_schema accepts a `prompt`,
|
||||
a `tools` allowlist, and an optional `output_schema` declaring what
|
||||
the agent must return; the handler validates the agent's output
|
||||
against it. **v1.**
|
||||
|
||||
- **Loose actions** (`agent_task`): config_schema accepts a `prompt` and a
|
||||
`tools` allowlist; output_contract is *dynamic* — the user declares the
|
||||
output shape they want via `output_schema` in the step config; the
|
||||
handler asks the LLM to return that shape and validates. Used when
|
||||
judgment is needed.
|
||||
|
||||
The agent's tool list is **the same capabilities** that tight actions call
|
||||
directly. One registry, two invocation modes. Adding a new MCP server gives
|
||||
both modes access to its tools automatically.
|
||||
The agent's `tools` allowlist resolves opaquely in v1; the redesigned
|
||||
unification layer (§3) will give both invocation modes access to the
|
||||
same vocabulary, with per-user authorization gating both.
|
||||
|
||||
### How names in the definition become function calls
|
||||
|
||||
The definition contains strings like `"action": "slack_post"`. The string is
|
||||
just a name — it does not point to a function. At runtime, the executor
|
||||
performs a **name-based lookup** against the action registry:
|
||||
The definition contains strings like `"action": "agent_task"`. The
|
||||
string is just a name — it does not point to a function. At runtime,
|
||||
the executor performs a **name-based lookup** against the action
|
||||
registry:
|
||||
|
||||
```python
|
||||
# step.action is a string from the JSON definition, e.g. "slack_post"
|
||||
action_def = _ACTION_REGISTRY[step.action] # dict lookup
|
||||
handler = action_def.handler # Python callable
|
||||
result = await handler(ctx, resolved_config) # invocation
|
||||
action_def = action_registry.get(step.action) # dict lookup
|
||||
handler = action_def.handler # Python callable
|
||||
result = await handler(resolved_params) # invocation
|
||||
```
|
||||
|
||||
The registry is a Python dict (or a thin wrapper around one) populated at
|
||||
process startup. Each entry in `automations/actions/*.py` calls a
|
||||
`register_action(...)` function at module import time, putting its
|
||||
`ActionDefinition` (including the handler function reference) into the
|
||||
registry.
|
||||
The registry is a Python dict populated at process startup. Each entry
|
||||
in `automations/registries/actions/*.py` calls `register_action(...)`
|
||||
at module import time, putting its `ActionDefinition` (including the
|
||||
handler function reference) into the registry.
|
||||
|
||||
The same pattern applies to capabilities. The definition references
|
||||
capabilities by ID (`"slack.post_message"`); the capability registry maps
|
||||
the ID to a `Capability` object holding the handler. Definitions never
|
||||
reference Python code directly — they reference names that the registry
|
||||
resolves to code.
|
||||
|
||||
This separation is what makes the contract portable. The definition is
|
||||
pure data. The registry is the engine's runtime vocabulary. They meet at
|
||||
name-based lookup; nothing else crosses the boundary.
|
||||
The definition is pure data. The registry is the engine's runtime
|
||||
vocabulary. They meet at name-based lookup; nothing else crosses the
|
||||
boundary.
|
||||
|
||||
### The full expressive spectrum
|
||||
|
||||
|
|
@ -238,7 +200,7 @@ fully agentic. Six practical shapes worth recognizing:
|
|||
| **1. Direct call** | `slack_post` with literal channel and template | No LLM. ~200ms. Fractions of a cent. |
|
||||
| **2. Direct call with computed inputs** | `linear_create_issue` using `{{summary.title}}` from a prior step | No LLM for this step. Cheap. |
|
||||
| **3. Single-domain agent task** | `agent_task` with `tools: ["slack.*"]` only | One LLM, bounded toolset. |
|
||||
| **4. Multi-domain agent task, narrow** | `agent_task` with `tools: ["github.list_pull_requests", "linear.create_issue"]` | One LLM, named capabilities. |
|
||||
| **4. Multi-domain agent task, narrow** | `agent_task` with `tools: ["github.list_pull_requests", "linear.create_issue"]` | One LLM, named tools. |
|
||||
| **5. Multi-domain agent task, broad** | `agent_task` with `tools: ["slack.*", "github.*", "linear.*"]` | One LLM, large toolset, most agentic. |
|
||||
| **6. Composed plan** | `agent_task` (narrow) for thinking → `slack_post` + `linear_create_issue` for acting | Best cost-to-power ratio. |
|
||||
|
||||
|
|
@ -258,7 +220,7 @@ user's.
|
|||
|
||||
---
|
||||
|
||||
## 5. Automation definition (Layer 3)
|
||||
## 5. Automation definition
|
||||
|
||||
This is the JSON the user writes (or the NL generator produces). Stored in
|
||||
`automations.definition` as JSONB.
|
||||
|
|
@ -287,7 +249,7 @@ This is the JSON the user writes (or the NL generator produces). Stored in
|
|||
"triggers": [
|
||||
{
|
||||
"type": "schedule",
|
||||
"config": { "cron": "0 9 * * 1-5", "timezone": "Africa/Kigali" }
|
||||
"params": { "cron": "0 9 * * 1-5", "timezone": "Africa/Kigali" }
|
||||
}
|
||||
],
|
||||
|
||||
|
|
@ -295,7 +257,7 @@ This is the JSON the user writes (or the NL generator produces). Stored in
|
|||
{
|
||||
"step_id": "research",
|
||||
"action": "agent_task",
|
||||
"config": {
|
||||
"params": {
|
||||
"prompt": "Find documents tagged {{inputs.tags}} indexed since {{inputs.since}}. Return JSON with bullets and source_doc_ids.",
|
||||
"tools": ["search_space.query", "search_space.fetch_document"],
|
||||
"model": "anthropic/claude-sonnet-4-7",
|
||||
|
|
@ -313,7 +275,7 @@ This is the JSON the user writes (or the NL generator produces). Stored in
|
|||
{
|
||||
"step_id": "deliver",
|
||||
"action": "slack_post",
|
||||
"config": {
|
||||
"params": {
|
||||
"channel_id": "C0123",
|
||||
"message_template": "*Competitor digest*\n\n{% for b in summary.bullets %}• {{b}}\n{% endfor %}"
|
||||
}
|
||||
|
|
@ -325,11 +287,10 @@ This is the JSON the user writes (or the NL generator produces). Stored in
|
|||
"max_retries": 2,
|
||||
"retry_backoff": "exponential",
|
||||
"concurrency": "drop_if_running",
|
||||
"budget_cap_usd": 1.50,
|
||||
"on_failure": [ /* steps to run if main plan fails after retries */ ]
|
||||
},
|
||||
|
||||
"metadata": { "tags": ["digest"], "created_from_nl": true }
|
||||
"metadata": { "tags": ["digest"] }
|
||||
}
|
||||
```
|
||||
|
||||
|
|
@ -340,7 +301,7 @@ This is the JSON the user writes (or the NL generator produces). Stored in
|
|||
"step_id": "...", // unique within plan
|
||||
"action": "...", // references an ActionDefinition.type
|
||||
"when": "{{ ... }}", // optional Jinja expr → bool; false = skip
|
||||
"config": { ... }, // validated against action's config_schema
|
||||
"params": { ... }, // validated against action's params_schema
|
||||
"output_as": "...", // binds output to this name for later steps
|
||||
"max_retries": 0, // optional, overrides automation default
|
||||
"timeout_seconds": 1200 // optional, overrides automation default
|
||||
|
|
@ -354,7 +315,7 @@ about it, or they compose automations through events (§7.5).
|
|||
|
||||
---
|
||||
|
||||
## 6. Trigger contract (Layer 4)
|
||||
## 6. Trigger contract
|
||||
|
||||
Three trigger types. That's the entire taxonomy.
|
||||
|
||||
|
|
@ -363,23 +324,12 @@ Three trigger types. That's the entire taxonomy.
|
|||
```python
|
||||
TriggerDefinition(
|
||||
type="schedule",
|
||||
config_schema={
|
||||
"type": "object",
|
||||
"required": ["cron", "timezone"],
|
||||
"properties": {
|
||||
"cron": { "type": "string" },
|
||||
"timezone": { "type": "string", "format": "iana-timezone" }
|
||||
}
|
||||
},
|
||||
payload_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"fired_at": { "type": "string", "format": "date-time" },
|
||||
"scheduled_for": { "type": "string", "format": "date-time" },
|
||||
"last_fired_at": { "type": "string", "format": "date-time" }
|
||||
}
|
||||
}
|
||||
params_model=ScheduleTriggerParams, # cron + timezone
|
||||
)
|
||||
# At fire time the schedule producer emits runtime inputs
|
||||
# (fired_at, scheduled_for, last_fired_at) which are merged with the
|
||||
# trigger row's static_inputs (static wins) and validated against
|
||||
# automation.definition.inputs.schema_.
|
||||
```
|
||||
|
||||
Implementation: extends `app/utils/periodic_scheduler.py`, which already
|
||||
|
|
@ -395,7 +345,7 @@ want an event trigger instead.
|
|||
```python
|
||||
TriggerDefinition(
|
||||
type="webhook",
|
||||
config_schema={
|
||||
params_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"input_mapping": {
|
||||
|
|
@ -422,7 +372,7 @@ Dedups against runs in the last 24 hours.
|
|||
```python
|
||||
TriggerDefinition(
|
||||
type="event",
|
||||
config_schema={
|
||||
params_schema={
|
||||
"type": "object",
|
||||
"required": ["event_type"],
|
||||
"properties": {
|
||||
|
|
@ -485,11 +435,13 @@ Common path (after a trigger has fired):
|
|||
4. **Snapshot the resolved definition** into the run row (immutable history)
|
||||
5. Enqueue executor task on the single `automations_default` Celery queue
|
||||
|
||||
The cost-estimate pre-check (originally step 3) is **deferred**.
|
||||
v1 capabilities do not declare `cost_estimate`; pre-flight budgeting
|
||||
returns when a historical-cost ledger exists. The mid-flight budget
|
||||
cap (§7.2) still kills the run if accumulated cost crosses
|
||||
`budget_cap_usd`.
|
||||
The cost-estimate pre-check (originally step 3) is **deferred**. v1
|
||||
actions do not declare cost estimates, the run row has no `cost_usd`
|
||||
column, and no handler reports tokens used — so neither pre-flight
|
||||
prediction nor mid-flight accumulation can be enforced. `Execution`
|
||||
therefore does not expose `budget_cap_usd` in v1; it returns as a single
|
||||
field addition the day the cost ledger ships (per-action cost reporting
|
||||
+ `automation_runs.cost_usd` column + executor accumulation).
|
||||
|
||||
Queue routing by `expected_duration_seconds` is **deferred** until load
|
||||
patterns justify a second queue. v1 uses a single queue.
|
||||
|
|
@ -510,15 +462,15 @@ async def execute_run(run_id: int) -> None:
|
|||
if step.when and not evaluate_predicate(step.when, context | step_outputs):
|
||||
record_step_skipped(run, step); continue
|
||||
|
||||
resolved_config = render_config(step.config, context | step_outputs)
|
||||
resolved_params = render_params(step.params, context | step_outputs)
|
||||
action = action_registry.get(step.action)
|
||||
validate(resolved_config, action.config_schema)
|
||||
validate(resolved_params, action.params_schema)
|
||||
|
||||
try:
|
||||
result = await with_retries(
|
||||
action.handler,
|
||||
ctx=build_action_context(run, action),
|
||||
args=resolved_config,
|
||||
args=resolved_params,
|
||||
policy=step.retry_policy or run.execution.retry_policy,
|
||||
)
|
||||
validate(result, step.output_schema)
|
||||
|
|
@ -541,14 +493,20 @@ validated dict come back; it doesn't know that step was "smart."
|
|||
|
||||
### 7.3 Action handlers
|
||||
|
||||
One handler per `ActionDefinition.type`. Receives `(ctx, args)`, returns
|
||||
a dict matching `output_contract` (or matching the user-declared
|
||||
`output_schema` for dynamic-output actions like `agent_task`).
|
||||
One handler per `ActionDefinition.type`. Receives the validated `args`
|
||||
dict and returns whatever the step's output validates against (a fixed
|
||||
shape declared by tight actions, or a dynamic shape declared via
|
||||
`output_schema` in the step params for `agent_task`).
|
||||
|
||||
Handlers handle their own credential resolution via `ctx.resolve_credentials`.
|
||||
They do not know about retries, timeouts, or budget caps — those are the
|
||||
Handlers do not know about retries or timeouts — those are the
|
||||
executor's concern.
|
||||
|
||||
In v1, handlers take `(args)` only. The `CallContext` parameter sketched
|
||||
in §7.2's pseudo-code (caller user id, search space id, run id,
|
||||
credential resolver) arrives with the unification layer redesign (§3);
|
||||
v1's single action (`agent_task`) reads what it needs from app-level
|
||||
configuration.
|
||||
|
||||
### 7.4 Template engine
|
||||
|
||||
#### Why it exists
|
||||
|
|
@ -747,7 +705,7 @@ Three fields, per-automation defaults with optional per-step overrides:
|
|||
- `timeout_seconds`: integer
|
||||
|
||||
Retries on:
|
||||
- Capability handler exceptions
|
||||
- Action handler exceptions
|
||||
- Output schema validation failures (for dynamic-output actions, the
|
||||
validation error is fed back to the LLM in the retry)
|
||||
|
||||
|
|
@ -755,12 +713,21 @@ Not retries:
|
|||
- `when:` evaluation failures (these are user errors, surface immediately)
|
||||
- Input validation failures (caught at dispatch, never reach the executor)
|
||||
|
||||
### Budget enforcement
|
||||
### Budget enforcement *(deferred — not in v1)*
|
||||
|
||||
`budget_cap_usd` is per-run. The dispatcher refuses to enqueue if estimated
|
||||
cost exceeds it. The executor kills the run if accumulated cost crosses it
|
||||
mid-flight (the LLM ops handler reports tokens consumed back to the
|
||||
executor between calls).
|
||||
Future shape: `budget_cap_usd` on `Execution`, dispatcher refuses to
|
||||
enqueue if estimated cost exceeds it, executor kills the run if
|
||||
accumulated cost crosses it mid-flight (the LLM ops handler reports
|
||||
tokens consumed back to the executor between calls).
|
||||
|
||||
Prerequisites before this can land:
|
||||
- Each action declares cost reporting (tokens × model price, API call
|
||||
charges) — `ActionDefinition` has no such field today.
|
||||
- `automation_runs.cost_usd` column + executor accumulates per step.
|
||||
- A historical-cost ledger so pre-flight estimation can return useful
|
||||
numbers (otherwise the dispatcher gate is guessing).
|
||||
|
||||
Until all three exist, v1 has no surface for budget enforcement.
|
||||
|
||||
### On-failure handlers
|
||||
|
||||
|
|
@ -787,14 +754,13 @@ nightly Celery Beat task deletes expired artifacts).
|
|||
### Duration classes and queue routing — deferred
|
||||
|
||||
The original design routed runs to multiple Celery queues based on each
|
||||
capability's declared `expected_duration_seconds`. v1 ships with **one
|
||||
queue** (`automations_default`) and capabilities do not declare a
|
||||
duration. Multi-queue routing returns when burst load on a single queue
|
||||
actually justifies the operational complexity of independent worker
|
||||
pools.
|
||||
action's declared `expected_duration_seconds`. v1 ships with **one
|
||||
queue** (`automations_default`) and actions do not declare a duration.
|
||||
Multi-queue routing returns when burst load on a single queue actually
|
||||
justifies the operational complexity of independent worker pools.
|
||||
|
||||
Adding the second queue is a config change plus reintroducing
|
||||
`expected_duration_seconds` on the `Capability` dataclass — both
|
||||
`expected_duration_seconds` on the `ActionDefinition` dataclass — both
|
||||
mechanical, additive, and free of design rewrite.
|
||||
|
||||
---
|
||||
|
|
@ -832,14 +798,16 @@ and an immutable run history.
|
|||
|
||||
### `automation_triggers`
|
||||
|
||||
| field | type | notes |
|
||||
| --------------- | ----------------------------------------------------------------------------- | ------------------------------------------- |
|
||||
| `id` | int PK | |
|
||||
| `automation_id` | FK | |
|
||||
| `type` | enum: `schedule`, `manual` (Phase 2/3 add `webhook`, `event`) | |
|
||||
| `config` | jsonb | validated against trigger's `config_schema` |
|
||||
| `enabled` | bool | |
|
||||
| `last_fired_at` | timestamp | |
|
||||
| field | type | notes |
|
||||
| --------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------- |
|
||||
| `id` | int PK | |
|
||||
| `automation_id` | FK | |
|
||||
| `type` | enum: `schedule`, `manual` (Phase 2/3 add `webhook`, `event`) | |
|
||||
| `params` | jsonb | trigger-type config, validated against trigger's `params_schema` |
|
||||
| `static_inputs` | jsonb | per-attachment domain values merged into every run (static wins on collision) |
|
||||
| `enabled` | bool | |
|
||||
| `last_fired_at` | timestamp | |
|
||||
| `next_fire_at` | timestamp / null | precomputed next fire moment for schedule triggers |
|
||||
|
||||
`secret_hash` (for webhook bearer tokens) is **deferred to Phase 2** with
|
||||
the webhook trigger.
|
||||
|
|
@ -853,8 +821,7 @@ the webhook trigger.
|
|||
| `trigger_id` | FK / null | null = manual via UI |
|
||||
| `status` | enum | `pending`, `running`, `succeeded`, `failed`, `cancelled`, `timed_out` |
|
||||
| `definition_snapshot` | jsonb | the definition as it was when this run fired |
|
||||
| `trigger_payload` | jsonb | |
|
||||
| `resolved_inputs` | jsonb | |
|
||||
| `inputs` | jsonb | merged & validated inputs (trigger.static_inputs ∪ producer runtime data, static wins) |
|
||||
| `step_results` | jsonb | array of per-step results with timing |
|
||||
| `output` | jsonb / null | |
|
||||
| `artifacts` | jsonb | references to created artifacts |
|
||||
|
|
@ -863,7 +830,7 @@ the webhook trigger.
|
|||
| `agent_session_id`| str / null | link to LangGraph trace if agent_task was used |
|
||||
|
||||
`cost_usd` (per-run accumulated cost) is **deferred** until at least one
|
||||
v1 capability records token-level cost. When reintroduced it lands as a
|
||||
action records token-level cost. When reintroduced it lands as a
|
||||
column-only migration.
|
||||
|
||||
### Deferred tables
|
||||
|
|
@ -897,8 +864,8 @@ not "trusted authors only."
|
|||
|
||||
User provides natural-language input. The Generator LLM is given:
|
||||
- The full schema set (input schema for definition, registry of action
|
||||
types with their config_schemas, registry of trigger types, available
|
||||
capabilities for this SearchSpace, list of allowed Jinja filters)
|
||||
types with their params_schemas, registry of trigger types, list of
|
||||
allowed Jinja filters)
|
||||
- A tool to list available connectors, channels, and other SearchSpace
|
||||
resources, so it doesn't invent names that don't exist
|
||||
- A few-shot set of examples
|
||||
|
|
@ -918,13 +885,13 @@ Output: a structured proposal matching the automation definition schema.
|
|||
|
||||
Server-side, before the proposal reaches the user:
|
||||
- Validate against JSON Schema (shape correctness)
|
||||
- Verify every capability referenced exists in the registry (resource existence)
|
||||
- Verify every action and trigger type referenced exists in the registry
|
||||
- Verify every connector/channel/resource referenced exists in this SearchSpace
|
||||
- Validate every template against the sandbox's allowlist (no underscore
|
||||
attributes, no unregistered filter names, length under cap)
|
||||
|
||||
Failures here are deterministic errors, not warnings. A proposal that
|
||||
references a non-existent capability or includes a template using
|
||||
references a non-existent action or includes a template using
|
||||
`{{x.__class__}}` is rejected before the user sees it; the Generator is
|
||||
re-prompted with the validation error and asked to fix the proposal.
|
||||
|
||||
|
|
@ -947,7 +914,7 @@ produces two outputs for the user:
|
|||
- Action sequences that touch external systems without obvious benefit
|
||||
to the user
|
||||
- Cost estimates that seem high relative to the goal
|
||||
- References to capabilities the user hasn't used before
|
||||
- References to actions the user hasn't used before
|
||||
- Schedules tighter than 15 minutes (likely should be event triggers)
|
||||
|
||||
The Review LLM is a **UX layer** that makes review actually useful. It is
|
||||
|
|
@ -1009,33 +976,18 @@ always.
|
|||
surfsense_backend/app/
|
||||
├── automations/ # NEW: the engine
|
||||
│ ├── __init__.py
|
||||
│ ├── models.py # SQLAlchemy models for 6 tables
|
||||
│ ├── schemas.py # Pydantic schemas (definition envelope, etc.)
|
||||
│ ├── persistence/ # SQLAlchemy models + enums for 3 tables
|
||||
│ ├── schemas/ # Pydantic schemas (definition envelope, etc.)
|
||||
│ ├── routes.py # FastAPI router (/api/v1/automations)
|
||||
│ ├── service.py # CRUD + business logic
|
||||
│ ├── dispatcher.py # trigger matching, cost check, run creation
|
||||
│ ├── dispatcher.py # trigger matching, run creation
|
||||
│ ├── executor.py # the Celery task that runs a plan
|
||||
│ ├── templating.py # Jinja sandbox + filters
|
||||
│ ├── events.py # publish/subscribe for domain_events
|
||||
│ ├── filters.py # JSON filter grammar evaluator
|
||||
│ ├── actions/
|
||||
│ │ ├── registry.py
|
||||
│ │ ├── agent_task.py
|
||||
│ │ ├── transform_data.py
|
||||
│ │ ├── slack_post.py
|
||||
│ │ ├── send_email.py
|
||||
│ │ ├── notification.py
|
||||
│ │ └── (more in Phase 5: podcast_generation, report_generation, ...)
|
||||
│ ├── triggers/
|
||||
│ │ ├── registry.py
|
||||
│ │ ├── schedule.py # Celery Beat hookup
|
||||
│ │ ├── webhook.py # /fire endpoint
|
||||
│ │ └── event.py # subscribes to domain_events
|
||||
│ ├── capabilities/
|
||||
│ │ ├── registry.py
|
||||
│ │ ├── native.py # native capability registrations
|
||||
│ │ ├── mcp_harvester.py # registers MCP tools as capabilities (Phase 4)
|
||||
│ │ └── (LLM ops registered alongside)
|
||||
│ ├── registries/ # action and trigger registries
|
||||
│ │ ├── actions/ # ActionDefinition + handler registration
|
||||
│ │ └── triggers/ # TriggerDefinition
|
||||
│ └── nl/ # Phase 1 — primary user path
|
||||
│ ├── generator.py # Generator LLM
|
||||
│ ├── reviewer.py # Review LLM (summary + flagged items)
|
||||
|
|
@ -1070,23 +1022,22 @@ automations in natural language.
|
|||
**Step 1 (current scope, this batch of commits):**
|
||||
- 3 tables (`automations`, `automation_triggers`, `automation_runs`) +
|
||||
Alembic migration
|
||||
- Empty Capability, Action, Trigger registries (concrete entries land in
|
||||
later steps when the consuming feature lands)
|
||||
- Empty action and trigger registries under
|
||||
`app/automations/registries/` (concrete entries land in later steps)
|
||||
- Pydantic schemas for the automation definition envelope, the two v1
|
||||
trigger configs (`schedule`, `manual`), and the one v1 action config
|
||||
(`agent_task`)
|
||||
- Module structure under `app/automations/` (data/, schemas/,
|
||||
trigger params shapes (`schedule`, `manual`), and the one v1 action
|
||||
params shape (`agent_task`)
|
||||
- Module structure under `app/automations/` (persistence/, schemas/,
|
||||
registries/), fully isolated from the existing codebase
|
||||
|
||||
**Step 2:**
|
||||
- Register the `agent_task` action and the `schedule` / `manual`
|
||||
triggers in the registries
|
||||
- Capability registry populated with native deliverable-producing
|
||||
capabilities (chosen when this step starts)
|
||||
- The `agent_task` action handler and the `schedule` / `manual` triggers
|
||||
registered in `app/automations/registries/`. Tool resolution for
|
||||
`agent_task.params.tools` is opaque to the contract — the handler
|
||||
decides what string identifiers it accepts and how they resolve.
|
||||
|
||||
**Step 3:**
|
||||
- Executor (single-queue Celery task) with retries, timeouts, budget
|
||||
caps measured against `cost_usd` ledger on the run
|
||||
- Executor (single-queue Celery task) with retries and timeouts
|
||||
- Template engine (Jinja sandbox + the v1 filter allowlist + runtime
|
||||
limits)
|
||||
- Manual "Run now" endpoint
|
||||
|
|
@ -1122,19 +1073,23 @@ somewhere humans see, complex pipelines have proper error handling.
|
|||
**After Phase 3**: NL authoring is the polished primary surface; edit
|
||||
flows are conversational rather than form-only.
|
||||
|
||||
### Phase 4 — Event triggers
|
||||
### Phase 4 — Event triggers + integration tooling
|
||||
- `domain_events` table and `events.py` module
|
||||
- Indexing pipeline publishes `connector.*` events (smallest change — just
|
||||
add publish calls to the existing flow)
|
||||
- Automations publish `automation.run.*` events on completion
|
||||
- `event` trigger with filter grammar
|
||||
- MCP capability harvester (so MCP-backed events and tools both work)
|
||||
- The unification layer redesign (see §3) — `CallContext`, scope
|
||||
declarations, per-user authorization gating
|
||||
- MCP integration on top of the unification layer (external tool servers
|
||||
harvested into the shared catalog)
|
||||
|
||||
**After Phase 4**: "do X when Y happens" automations work, including
|
||||
automation-chaining through events.
|
||||
automation-chaining through events; external MCP tools and SurfSense
|
||||
actions share one vocabulary.
|
||||
|
||||
### Phase 5 — Wrapping existing features and sharing
|
||||
- Wrap existing SurfSense capabilities as actions: `podcast_generation`,
|
||||
- Wrap existing SurfSense features as actions: `podcast_generation`,
|
||||
`report_generation`, `indexing_sweep`
|
||||
- Artifact lifecycle implementation
|
||||
- `expected_duration_seconds` based queue routing (split `automations_long`
|
||||
|
|
@ -1144,7 +1099,7 @@ automation-chaining through events.
|
|||
shift documented in §7.4's pre-Phase-5 gate
|
||||
- Cross-automation composition examples in the docs
|
||||
|
||||
**After Phase 5**: every existing SurfSense capability is automatable
|
||||
**After Phase 5**: every existing SurfSense feature is automatable
|
||||
without any per-feature code, and automations can be shared between
|
||||
SearchSpaces and users.
|
||||
|
||||
|
|
@ -1156,13 +1111,12 @@ For reference — every decision made through the design process, in one
|
|||
place.
|
||||
|
||||
### Foundations
|
||||
1. ✅ JSON Schema 2020-12 is the single schema language for everything
|
||||
1. ✅ JSON Schema (draft 2020-12) is the single schema language for everything
|
||||
2. ✅ Definition is the program; infrastructure is the interpreter
|
||||
3. ✅ List of steps (not single action) in the plan, with `output_as` chaining
|
||||
4. ✅ One capability registry serving native + MCP + LLM operations through the same interface
|
||||
5. ✅ Capability IDs do not leak handler kind (`slack.post_message`, not `mcp.slack.post_message`)
|
||||
6. ✅ Name-based resolution: definitions reference actions and capabilities by string ID. The registry is the runtime's vocabulary; lookup is a dict access. No code references in definitions.
|
||||
7. ✅ The expressive spectrum runs from pure direct calls to broad agent_task; the NL generator proposes the cheapest shape that meets intent (Shape 6 from §4 by default)
|
||||
4. ⏸ Capability unification layer (one catalog shared by automations, agents, and future surfaces) — **deferred to post-v1** (see §3). v1 ships actions only.
|
||||
5. ✅ Name-based resolution: definitions reference action and trigger types by string ID. The registry is the runtime's vocabulary; lookup is a dict access. No code references in definitions.
|
||||
6. ✅ The expressive spectrum runs from pure direct calls to broad agent_task; the NL generator proposes the cheapest shape that meets intent (Shape 6 from §4 by default)
|
||||
|
||||
### Trigger taxonomy
|
||||
8. ✅ Three trigger types: `schedule`, `webhook`, `event`
|
||||
|
|
@ -1183,7 +1137,7 @@ place.
|
|||
19. ✅ No DAGs, no parallelism, no loops — composition via agent_task or events
|
||||
20. ✅ `on_failure` part of execution policy from v1
|
||||
21. ✅ Step-level retry and timeout overrides
|
||||
22. ✅ Budget cap enforced pre-enqueue and mid-flight
|
||||
22. ⏸ Budget cap enforced pre-enqueue and mid-flight — **deferred** until the cost ledger ships (see §8 Budget enforcement)
|
||||
|
||||
### Components
|
||||
23. ✅ Dispatcher / executor / handlers / registry — distinct, each replaceable
|
||||
|
|
@ -1197,25 +1151,22 @@ place.
|
|||
29. ✅ Automations publish run events for composability
|
||||
30. ✅ Publish/subscribe behind interface — no direct table access elsewhere
|
||||
|
||||
### Capability storage
|
||||
31. ✅ Native capabilities registered in-memory at startup from the codebase. Identical across all workers.
|
||||
32. ⏸ MCP capability metadata persisted in `mcp_connections` and `mcp_tools` tables — **deferred to Phase 4**
|
||||
33. ⏸ MCP handler closures built lazily per worker from database state — **deferred to Phase 4**
|
||||
34. ⏸ MCP server tool list re-harvested on a schedule — **deferred to Phase 4**
|
||||
35. ⏸ MCP tools harvested into the capability registry at connection time — **deferred to Phase 4**
|
||||
36. ⏸ Side effects inferred from MCP hints + naming + admin overrides — **deferred to Phase 4**
|
||||
37. ⏸ MCP tools callable directly (no agent required) when caller knows args — **deferred to Phase 4**
|
||||
### Capability unification — all deferred to post-v1
|
||||
31. ⏸ One shared catalog of "things this SurfSense instance can do" — **deferred**, see §3
|
||||
32. ⏸ Handler `CallContext` (caller user id, search space id, run id) — **deferred** with unification
|
||||
33. ⏸ Per-capability scope declarations driving authorization — **deferred** with unification
|
||||
34. ⏸ MCP integration on top of the unification layer (`mcp_connections`, `mcp_tools`, harvester) — **deferred to Phase 4**
|
||||
|
||||
### Credentials — all deferred to Phase 2
|
||||
38. ⏸ Credentials never appear in the automation definition — only connection IDs do — **Phase 2**
|
||||
39. ⏸ Credentials never appear in the LLM's context — the host holds them — **Phase 2**
|
||||
40. ⏸ Credentials resolved per-call by `ActionContext`, not pre-loaded into worker environment — **Phase 2**
|
||||
41. ⏸ Tokens encrypted at rest; refresh handled automatically by `ActionContext.resolve_*_client` — **Phase 2**
|
||||
35. ⏸ Credentials never appear in the automation definition — only connection IDs do — **Phase 2**
|
||||
36. ⏸ Credentials never appear in the LLM's context — the host holds them — **Phase 2**
|
||||
37. ⏸ Credentials resolved per-call by the handler context, not pre-loaded into worker environment — **Phase 2**
|
||||
38. ⏸ Tokens encrypted at rest; refresh handled automatically by the handler context — **Phase 2**
|
||||
|
||||
### v1-minimum (new lock)
|
||||
v1. ✅ `Capability` is exactly five fields: `id`, `description`, `input_schema`, `output_schema`, `handler`. Additional fields are added only when a concrete consumer feature requires them.
|
||||
v2. ✅ Cost is **measured** from a per-run ledger, not declared. Pre-flight cost checks return when the ledger has enough history.
|
||||
v3. ✅ Single `automations_default` Celery queue in v1. Multi-queue routing returns when load justifies it.
|
||||
### v1-minimum
|
||||
39. ✅ v1 ships actions only — no separate capability layer. `ActionDefinition` is five fields: `type`, `name`, `description`, `params_schema`, `handler`. Additional fields are added only when a concrete consumer feature requires them.
|
||||
40. ✅ Cost is **measured** from a per-run ledger, not declared. Pre-flight cost checks return when the ledger has enough history.
|
||||
41. ✅ Single `automations_default` Celery queue in v1. Multi-queue routing returns when load justifies it.
|
||||
|
||||
### NL authoring
|
||||
42. ✅ LLM-authored templates is the primary path from day one — not a Phase 3 addition. Hand-authoring JSON is supported but secondary
|
||||
|
|
@ -1227,7 +1178,7 @@ v3. ✅ Single `automations_default` Celery queue in v1. Multi-queue routing ret
|
|||
48. ✅ NL drafts are transient storage, not a core table
|
||||
|
||||
### Data model
|
||||
49. ✅ Six tables total — four for engine state, two for MCP persistence
|
||||
49. ✅ v1 ships three tables (`automations`, `automation_triggers`, `automation_runs`). `domain_events` lands in Phase 3; `mcp_connections` and `mcp_tools` in Phase 4.
|
||||
50. ✅ Run rows snapshot the definition (immutable history)
|
||||
51. ✅ All entities scoped by `search_space_id` for RBAC
|
||||
52. ✅ Editing an automation bumps `version`; existing runs unaffected
|
||||
|
|
@ -1283,7 +1234,7 @@ Schemas spelled out concretely. Those follow mechanically from this plan.
|
|||
agent (cron and skills subsystems); n8n documentation on node types and
|
||||
workflow data model; the SurfSense repository and DeepWiki architecture
|
||||
notes (FastAPI + Celery Beat + Electric SQL + LangGraph Deep Agents +
|
||||
Search Space RBAC); Model Context Protocol specification for capability
|
||||
harvesting; AWS EventBridge for filter grammar; workflow-pattern
|
||||
Search Space RBAC); Model Context Protocol specification for external
|
||||
tool harvesting; AWS EventBridge for filter grammar; workflow-pattern
|
||||
literature (van der Aalst et al.) for the trigger / action / concurrency
|
||||
vocabulary.*
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue