docs(automation): defer credentials, cost, queue-routing, side-effects

Update §3 (Credentials), §7.1 (Dispatcher common path), §8 (Duration
classes and queue routing), and §13 (Decisions locked) to reflect the
v1-minimum scope:

- Credentials block in §3 collapses to a deferred-to-Phase-2 note. The
  three guarantees (no creds in definition, no creds in LLM context,
  per-call resolution) return unchanged when Phase 2 ships external
  capabilities.
- Cost-estimate pre-check in the dispatcher's common path is removed.
  Mid-flight budget kill in the executor still enforces budget_cap_usd.
- Queue routing by expected_duration_seconds is deferred. Single
  automations_default queue in v1.
- Decisions 24-26, 32-37, and 38-41 marked deferred, each with an
  explicit return phase or trigger condition. Three new v1-minimum
  decisions added (5-field Capability, measured-not-declared cost,
  single queue).

All deferrals are additive: the original designs return as-is when
warranted; nothing is rewritten between phases.
CREDO23 2026-05-26 22:35:37 +02:00
parent b029c090bd
commit 144d702c35


@@ -138,47 +138,24 @@ See archived design at `docs/automation/archived/mcp-registry.md` once
v1 ships; for now the only consumer of the registry is the in-memory
native path.
### Credentials: resolved at the moment of use
### Credentials — deferred to Phase 2
The handler doesn't carry credentials and the closure doesn't capture them.
When invoked, the handler asks `ActionContext` for what it needs:
The earlier per-call credential resolution pattern (`ctx.resolve_mcp_client`,
`ctx.resolve_http_client`, `ctx.resolve_llm`) is **deferred to Phase 2**.
v1 capabilities run server-side using app-level configuration; none of
the seven v1 capabilities needs per-user or per-connection auth.
```python
def make_mcp_handler(connection_id: UUID, tool_name: str):
    async def handler(ctx: ActionContext, args: dict) -> Any:
        # Credential resolution happens here, per call
        client = await ctx.resolve_mcp_client(connection_id)
        response = await client.call_tool(name=tool_name, arguments=args)
        return response.content
    return handler
```
When Phase 2 ships external-credential capabilities (Slack, email, etc.),
the three guarantees the original design promised are reintroduced
unchanged:
`ctx.resolve_mcp_client(connection_id)`:
1. Loads the `mcp_connections` row
2. Decrypts the access token
3. Refreshes the token if it's expired (using the refresh token)
4. Constructs an `MCPClient` with the token set as a default authorization
header
- Credentials never appear in the automation definition (connection IDs
only).
- Credentials never appear in the LLM's context (the host holds them
and uses them on the LLM's behalf when executing tool calls).
- Credentials are loaded per-call, not pre-loaded into worker memory.
The HTTP library carries the auth header on every subsequent call the
client makes — the handler doesn't think about it after construction.
For native capabilities calling external APIs directly,
`ctx.resolve_http_client(provider)` returns an authenticated `httpx`
client. For LLM operations, `ctx.resolve_llm(provider)` returns a
configured LLM client. **Three resolution methods, one pattern: the
context returns an already-authenticated client.**
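The four `resolve_mcp_client` steps above (load, decrypt, refresh, construct) can be sketched as follows. This is a minimal, runnable illustration of the Phase-2 pattern, not the project's implementation: `ConnectionRow`, `encrypt`/`decrypt`, and `refresh_access_token` are hypothetical stand-ins for the `mcp_connections` row, real at-rest encryption, and the provider's token refresh, and `MCPClient` here only records its default headers. Only `ActionContext`, `resolve_mcp_client`, and `MCPClient` come from the design.

```python
import asyncio
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from uuid import UUID, uuid4


@dataclass
class ConnectionRow:
    """Hypothetical stand-in for an `mcp_connections` row."""
    id: UUID
    server_url: str
    encrypted_access_token: str
    encrypted_refresh_token: str
    expires_at: datetime


@dataclass
class MCPClient:
    """Hypothetical client: sends `default_headers` on every request it makes."""
    server_url: str
    default_headers: dict[str, str] = field(default_factory=dict)


def encrypt(plaintext: str) -> str:
    # Placeholder "encryption" so the sketch runs; NOT real at-rest encryption.
    return "enc:" + plaintext


def decrypt(ciphertext: str) -> str:
    return ciphertext.removeprefix("enc:")


async def refresh_access_token(refresh_token: str) -> tuple[str, datetime]:
    # Hypothetical stub for the provider's refresh call: (new token, new expiry).
    return f"fresh-{refresh_token}", datetime.now(timezone.utc) + timedelta(hours=1)


class ActionContext:
    def __init__(self, connections: dict[UUID, ConnectionRow]) -> None:
        self._connections = connections  # stands in for the database

    async def resolve_mcp_client(self, connection_id: UUID) -> MCPClient:
        row = self._connections[connection_id]            # 1. load the row
        token = decrypt(row.encrypted_access_token)       # 2. decrypt the access token
        if row.expires_at <= datetime.now(timezone.utc):  # 3. refresh if expired
            token, row.expires_at = await refresh_access_token(
                decrypt(row.encrypted_refresh_token)
            )
            row.encrypted_access_token = encrypt(token)   # store the new token encrypted
        # 4. build a client whose default header carries the token
        return MCPClient(row.server_url, {"Authorization": f"Bearer {token}"})


if __name__ == "__main__":
    cid = uuid4()
    stale = datetime.now(timezone.utc) - timedelta(minutes=5)
    ctx = ActionContext({cid: ConnectionRow(cid, "https://mcp.example", "enc:old", "enc:r1", stale)})
    print(asyncio.run(ctx.resolve_mcp_client(cid)).default_headers)
```

Each call builds a fresh client from the stored row, so a refreshed token is picked up on the next call without any worker-level credential cache, which is the per-call property the design describes (under the assumptions above).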
Three properties this gives us:
- **Credentials never appear in the automation definition.** The JSON
contains capability references and connection IDs, never tokens.
- **Credentials never appear in the LLM's context.** Even during
`agent_task`, the LLM sees tool descriptions only; the host holds
credentials and uses them when executing the tools the LLM requests.
- **Credentials are loaded per-call, not pre-loaded.** The credential
exists in memory only during the moment a handler is making a call. No
long-lived secrets in worker memory.
The Phase-2 design returns as-is; only the v1 surface is simplified.
---
@@ -504,12 +481,18 @@ event, evaluates all matching triggers' filters, fires the matches.
Common path (after a trigger has fired):
1. Resolve `inputs` from trigger payload and defaults
2. Validate resolved inputs against the automation's input schema
3. **Cost estimate** — sum capabilities' `cost_estimate(args)` for the plan;
refuse if exceeds `budget_cap_usd`
4. **Idempotency check** — dedup against existing pending/running runs
5. **Snapshot the resolved definition** into the run row (immutable history)
6. Enqueue executor task on the appropriate Celery queue (per
`expected_duration_seconds`)
3. **Idempotency check** — dedup against existing pending/running runs
4. **Snapshot the resolved definition** into the run row (immutable history)
5. Enqueue executor task on the single `automations_default` Celery queue
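The v1 common path (steps 1 to 5) might look roughly like this. It is a sketch under several assumptions: the trigger payload is merged over the automation's defaults, validation is reduced to a required-keys check instead of full JSON Schema, the dedup key is a hash of the automation ID plus resolved inputs, and runs live in a dict instead of the runs table. `Automation`, `Run`, `Dispatcher`, and the `enqueue` callback are hypothetical names.

```python
import copy
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable

QUEUE = "automations_default"  # v1: the single Celery queue


@dataclass
class Automation:
    id: str
    definition: dict[str, Any]
    input_defaults: dict[str, Any] = field(default_factory=dict)
    required_inputs: tuple[str, ...] = ()  # stand-in for a JSON Schema "required" list


@dataclass
class Run:
    id: str
    automation_id: str
    dedup_key: str
    status: str
    definition_snapshot: dict[str, Any]
    inputs: dict[str, Any]


class Dispatcher:
    def __init__(self, enqueue: Callable[[str, str], None]) -> None:
        self.runs: dict[str, Run] = {}  # stands in for the runs table
        # e.g. lambda run_id, queue: execute_run.apply_async(args=[run_id], queue=queue)
        # where execute_run is a hypothetical Celery task
        self._enqueue = enqueue

    def fire(self, automation: Automation, payload: dict[str, Any]) -> Run:
        # 1. Resolve inputs: assumed here that payload values override defaults.
        inputs = {**automation.input_defaults, **payload}
        # 2. Validate (simplified to required keys only).
        missing = [k for k in automation.required_inputs if k not in inputs]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        # 3. Idempotency: return an identical pending/running run instead of a new one.
        key = hashlib.sha256(
            json.dumps([automation.id, inputs], sort_keys=True, default=str).encode()
        ).hexdigest()
        for existing in self.runs.values():
            if existing.dedup_key == key and existing.status in ("pending", "running"):
                return existing
        # 4. Snapshot the definition so later edits cannot change this run's history.
        run = Run(str(uuid.uuid4()), automation.id, key, "pending",
                  copy.deepcopy(automation.definition), inputs)
        self.runs[run.id] = run
        # 5. Enqueue on the single v1 queue.
        self._enqueue(run.id, QUEUE)
        return run
```

In a real deployment `enqueue` would wrap a Celery task's `apply_async(args=[run_id], queue=queue)`; injecting it as a callback keeps the sketch testable without a broker.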
The cost-estimate pre-check (originally step 3) is **deferred**.
v1 capabilities do not declare `cost_estimate`; pre-flight budgeting
returns when a historical-cost ledger exists. The mid-flight budget
cap (§7.2) still kills the run if accumulated cost crosses
`budget_cap_usd`.
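The mid-flight cap that replaces the pre-check can be illustrated with a per-run cost ledger. This sketch assumes the executor records each step's measured USD cost after the step completes and treats spend strictly greater than `budget_cap_usd` as crossing the cap; `RunCostLedger` and `BudgetExceeded` are hypothetical names.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised to kill the run once measured spend crosses the cap."""


@dataclass
class RunCostLedger:
    budget_cap_usd: Decimal
    entries: list[tuple[str, Decimal]] = field(default_factory=list)

    @property
    def total_usd(self) -> Decimal:
        return sum((usd for _, usd in self.entries), Decimal("0"))

    def record(self, step_id: str, usd: Decimal) -> None:
        # Cost is measured after the step actually ran, not estimated up front.
        self.entries.append((step_id, usd))
        if self.total_usd > self.budget_cap_usd:
            raise BudgetExceeded(
                f"spent {self.total_usd} USD, cap is {self.budget_cap_usd} USD"
            )
```

Because the overspending entry is appended before the exception is raised, the ledger keeps a complete measured history, which is the kind of data a future pre-flight estimate could draw on.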
Queue routing by `expected_duration_seconds` is **deferred** until load
patterns justify a second queue. v1 uses a single queue.
### 7.2 Executor
@@ -801,16 +784,18 @@ The engine handles storage (writes to SurfSense's existing object storage),
URL generation (signed, scoped to the run's permissions), and cleanup (a
nightly Celery Beat task deletes expired artifacts).
### Duration classes and queue routing
### Duration classes and queue routing — deferred
Capabilities declare `expected_duration_seconds`. The dispatcher routes
runs to Celery queues based on the longest-duration step:
- < 10s → `automations_fast`
- 10s to 5min → `automations_medium`
- 5min to 1hr → `automations_long`
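For reference, the deferred routing rule above reduces to a small lookup on the longest step's declared duration. The sketch treats each range as half-open (10 s goes to `automations_medium`, 300 s to `automations_long`) and sends anything of an hour or more to `automations_long`; both are assumptions, since the list leaves the boundaries and the over-one-hour case unspecified. `ROUTES`, `OVERFLOW_QUEUE`, and `route` are hypothetical names.

```python
# Thresholds from the original (deferred) design, read as half-open intervals:
# [0, 10) s -> fast, [10, 300) s -> medium, [300, 3600) s -> long.
ROUTES: list[tuple[int, str]] = [
    (10, "automations_fast"),
    (300, "automations_medium"),
    (3600, "automations_long"),
]
OVERFLOW_QUEUE = "automations_long"  # assumption: >= 1 hr also lands on the long queue


def route(step_durations: list[int]) -> str:
    """Pick a queue from the longest declared step duration (in seconds)."""
    longest = max(step_durations, default=0)
    for upper_bound, queue in ROUTES:
        if longest < upper_bound:
            return queue
    return OVERFLOW_QUEUE
```

A future "very long" queue would be one more `ROUTES` entry plus a new `OVERFLOW_QUEUE`, consistent with the claim that it is a config change rather than a contract change.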
The original design routed runs to multiple Celery queues based on each
capability's declared `expected_duration_seconds`. v1 ships with **one
queue** (`automations_default`) and capabilities do not declare a
duration. Multi-queue routing returns when burst load on a single queue
actually justifies the operational complexity of independent worker
pools.
Operators scale each queue's worker pool independently. A future "very
long" queue is a config change, not a contract change.
Adding the second queue is a config change plus reintroducing
`expected_duration_seconds` on the `Capability` dataclass — both
mechanical, additive, and free of design rewrite.
---
@@ -1210,9 +1195,9 @@ place.
### Components
23. ✅ Dispatcher / executor / handlers / registry — distinct, each replaceable
24. Side effects are a set, including `USER_VISIBLE`
25. `expected_duration_seconds` integer drives queue routing
26. `produces_artifacts` is a list of `ArtifactSpec`, not a bool
24. Side effects are a set, including `USER_VISIBLE`: **deferred** until multi-user automation RBAC ships
25. `expected_duration_seconds` integer drives queue routing: **deferred** until a second Celery queue is needed
26. `produces_artifacts` is a list of `ArtifactSpec`, not a bool: **deferred** until artifacts beyond the deliverable handlers' own persistence are needed
27. ✅ Output schemas recommended on `agent_task`; editor warns when missing
### Event bus
@@ -1220,20 +1205,25 @@ place.
29. ✅ Automations publish run events for composability
30. ✅ Publish/subscribe behind interface — no direct table access elsewhere
### Capability storage (two-tier persistence)
### Capability storage
31. ✅ Native capabilities registered in-memory at startup from the codebase. Identical across all workers.
32. ✅ MCP capability metadata persisted in `mcp_connections` and `mcp_tools` tables. Survives restarts.
33. ✅ MCP handler closures built lazily per worker from database state. Worker-local cache, rebuilt on demand.
34. ✅ MCP server tool list re-harvested on a schedule (default: daily) and on user request.
35. MCP tools harvested into the capability registry at connection time
36. Side effects inferred from MCP hints + naming + admin overrides
37. MCP tools callable directly (no agent required) when caller knows args
32. ⏸ MCP capability metadata persisted in `mcp_connections` and `mcp_tools` tables — **deferred to Phase 4**
33. ⏸ MCP handler closures built lazily per worker from database state — **deferred to Phase 4**
34. ⏸ MCP server tool list re-harvested on a schedule — **deferred to Phase 4**
35. MCP tools harvested into the capability registry at connection time — **deferred to Phase 4**
36. Side effects inferred from MCP hints + naming + admin overrides — **deferred to Phase 4**
37. MCP tools callable directly (no agent required) when caller knows args — **deferred to Phase 4**
### Credentials
38. ✅ Credentials never appear in the automation definition — only connection IDs do
39. ✅ Credentials never appear in the LLM's context — the host holds them and uses them on the LLM's behalf
40. ✅ Credentials resolved per-call by `ActionContext`, not pre-loaded into worker environment
41. ✅ Tokens encrypted at rest in the database; refresh handled automatically by `ActionContext.resolve_*_client`
### Credentials — all deferred to Phase 2
38. ⏸ Credentials never appear in the automation definition — only connection IDs do — **Phase 2**
39. ⏸ Credentials never appear in the LLM's context — the host holds them — **Phase 2**
40. ⏸ Credentials resolved per-call by `ActionContext`, not pre-loaded into worker environment — **Phase 2**
41. ⏸ Tokens encrypted at rest; refresh handled automatically by `ActionContext.resolve_*_client` — **Phase 2**
### v1-minimum (new lock)
v1. ✅ `Capability` is exactly five fields: `id`, `description`, `input_schema`, `output_schema`, `handler`. Additional fields are added only when a concrete consumer feature requires them.
v2. ✅ Cost is **measured** from a per-run ledger, not declared. Pre-flight cost checks return when the ledger has enough history.
v3. ✅ Single `automations_default` Celery queue in v1. Multi-queue routing returns when load justifies it.
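The five-field `Capability` locked in decision v1 could be expressed as a plain dataclass. This is a sketch: the `Handler` alias follows the `handler(ctx, args)` shape of the earlier MCP handler example with `ctx` loosely typed, and the `echo` capability is a hypothetical example, not one of the seven v1 capabilities.

```python
import asyncio
from dataclasses import dataclass, fields
from typing import Any, Awaitable, Callable

# Mirrors the design's `handler(ctx, args)` shape; ctx is typed as Any because
# ActionContext is not defined in this sketch.
Handler = Callable[[Any, dict[str, Any]], Awaitable[Any]]


@dataclass
class Capability:
    id: str
    description: str
    input_schema: dict[str, Any]   # JSON Schema for the handler's args
    output_schema: dict[str, Any]  # JSON Schema for the handler's result
    handler: Handler


async def echo(ctx: Any, args: dict[str, Any]) -> dict[str, Any]:
    return {"text": args["text"]}


# Hypothetical capability, used only to show construction.
ECHO = Capability(
    id="echo",
    description="Return the input text unchanged.",
    input_schema={
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
    output_schema={"type": "object", "properties": {"text": {"type": "string"}}},
    handler=echo,
)

if __name__ == "__main__":
    # Exactly five fields, in declaration order.
    print([f.name for f in fields(Capability)])
    print(asyncio.run(ECHO.handler(None, {"text": "hi"})))  # prints {'text': 'hi'}
```

Reintroducing a deferred field such as `expected_duration_seconds` later would mean appending it with a default value, so existing five-argument constructions keep working.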
### NL authoring
42. ✅ LLM-authored templates is the primary path from day one — not a Phase 3 addition. Hand-authoring JSON is supported but secondary