docs(automation): defer credentials, cost, queue-routing, side-effects

Update §3 (Credentials), §7.1 (Dispatcher common path), §8 (Duration
classes and queue routing), and §13 (Decisions locked) to reflect the
v1-minimum scope:

- Credentials block in §3 collapses to a deferred-to-Phase-2 note. The
  three guarantees (no creds in definition, no creds in LLM context,
  per-call resolution) return unchanged when Phase 2 ships external
  capabilities.
- Cost-estimate pre-check in the dispatcher's common path is removed.
  Mid-flight budget kill in the executor still enforces budget_cap_usd.
- Queue routing by expected_duration_seconds is deferred. Single
  automations_default queue in v1.
- Decisions 24, 25, 26, 32-37, 38-41 marked deferred with explicit
  return phase. Three new v1-minimum decisions added (5-field
  Capability, measured-not-declared cost, single queue).

All deferrals are additive: the original designs return as-is when
warranted; nothing is rewritten between phases.
CREDO23 2026-05-26 22:35:37 +02:00
parent b029c090bd
commit 144d702c35


@@ -138,47 +138,24 @@ See archived design at `docs/automation/archived/mcp-registry.md` once
 v1 ships; for now the only consumer of the registry is the in-memory
 native path.
 
-### Credentials: resolved at the moment of use
+### Credentials — deferred to Phase 2
 
-The handler doesn't carry credentials and the closure doesn't capture them.
-When invoked, the handler asks `ActionContext` for what it needs:
-
-```python
-def make_mcp_handler(connection_id: UUID, tool_name: str):
-    async def handler(ctx: ActionContext, args: dict) -> Any:
-        # Credential resolution happens here, per call
-        client = await ctx.resolve_mcp_client(connection_id)
-        response = await client.call_tool(name=tool_name, arguments=args)
-        return response.content
-    return handler
-```
-
-`ctx.resolve_mcp_client(connection_id)`:
-1. Loads the `mcp_connections` row
-2. Decrypts the access token
-3. Refreshes the token if it's expired (using the refresh token)
-4. Constructs an `MCPClient` with the token set as a default authorization
-   header
-
-The HTTP library carries the auth header on every subsequent call the
-client makes — the handler doesn't think about it after construction.
-
-For native capabilities calling external APIs directly,
-`ctx.resolve_http_client(provider)` returns an authenticated `httpx`
-client. For LLM operations, `ctx.resolve_llm(provider)` returns a
-configured LLM client. **Three resolution methods, one pattern: the
-context returns a client already authenticated.**
-
-Three properties this gives us:
-
-- **Credentials never appear in the automation definition.** The JSON
-  contains capability references and connection IDs, never tokens.
-- **Credentials never appear in the LLM's context.** Even during
-  `agent_task`, the LLM sees tool descriptions only; the host holds
-  credentials and uses them when executing the tools the LLM requests.
-- **Credentials are loaded per-call, not pre-loaded.** The credential
-  exists in memory only during the moment a handler is making a call. No
-  long-lived secrets in worker memory.
+The earlier per-call credential resolution pattern (`ctx.resolve_mcp_client`,
+`ctx.resolve_http_client`, `ctx.resolve_llm`) is **deferred to Phase 2**.
+v1 capabilities run server-side using app-level configuration; none of
+the seven v1 capabilities needs per-user or per-connection auth.
+
+When Phase 2 ships external-credential capabilities (Slack, email, etc.),
+the three guarantees the original design promised are reintroduced
+unchanged:
+
+- Credentials never appear in the automation definition (connection IDs
+  only).
+- Credentials never appear in the LLM's context (the host holds them
+  and uses them on the LLM's behalf when executing tool calls).
+- Credentials are loaded per-call, not pre-loaded into worker memory.
+
+The Phase-2 design returns as-is; only the v1 surface is simplified.
 
 ---
 
@@ -504,12 +481,18 @@ event, evaluates all matching triggers' filters, fires the matches.
 Common path (after a trigger has fired):
 1. Resolve `inputs` from trigger payload and defaults
 2. Validate resolved inputs against the automation's input schema
-3. **Cost estimate** — sum capabilities' `cost_estimate(args)` for the plan;
-   refuse if exceeds `budget_cap_usd`
-4. **Idempotency check** — dedup against existing pending/running runs
-5. **Snapshot the resolved definition** into the run row (immutable history)
-6. Enqueue executor task on the appropriate Celery queue (per
-   `expected_duration_seconds`)
+3. **Idempotency check** — dedup against existing pending/running runs
+4. **Snapshot the resolved definition** into the run row (immutable history)
+5. Enqueue executor task on the single `automations_default` Celery queue
+
+The cost-estimate pre-check (originally step 3) is **deferred**.
+v1 capabilities do not declare `cost_estimate`; pre-flight budgeting
+returns when a historical-cost ledger exists. The mid-flight budget
+cap (§7.2) still kills the run if accumulated cost crosses
+`budget_cap_usd`.
+
+Queue routing by `expected_duration_seconds` is **deferred** until load
+patterns justify a second queue. v1 uses a single queue.
 
 ### 7.2 Executor
 
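
The five-step v1 common path in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the project's dispatcher: the `Run` and `Dispatcher` classes, the `input_defaults` and `required_inputs` keys, the idempotency `key` argument, and the in-memory lists standing in for the runs table and the Celery queue are all hypothetical. Only the queue name `automations_default` and the `budget_cap_usd` field come from the diff. Validation is reduced to a required-keys check for brevity.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

QUEUE = "automations_default"  # the single v1 queue named in the diff


@dataclass
class Run:
    automation_id: str
    idempotency_key: str
    definition_snapshot: Dict[str, Any]
    inputs: Dict[str, Any]
    status: str = "pending"


@dataclass
class Dispatcher:
    # Hypothetical in-memory stand-ins for the runs table and the Celery queue.
    runs: List[Run] = field(default_factory=list)
    enqueued: List[Tuple[str, Run]] = field(default_factory=list)

    def dispatch(
        self, definition: Dict[str, Any], payload: Dict[str, Any], key: str
    ) -> Optional[Run]:
        # 1. Resolve inputs: defaults first, trigger payload overrides them.
        inputs = {**definition.get("input_defaults", {}), **payload}
        # 2. Validate (simplified here to a required-keys check).
        missing = [k for k in definition.get("required_inputs", []) if k not in inputs]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        # 3. Idempotency check: skip if an equivalent run is pending or running.
        for run in self.runs:
            if run.idempotency_key == key and run.status in ("pending", "running"):
                return None
        # 4. Snapshot the definition so later edits cannot alter run history.
        run = Run(definition["id"], key, copy.deepcopy(definition), inputs)
        self.runs.append(run)
        # 5. Enqueue on the single queue; no duration-based routing in v1.
        self.enqueued.append((QUEUE, run))
        return run


def over_budget(step_costs_usd: List[float], budget_cap_usd: float) -> bool:
    """Mid-flight check: True once accumulated measured cost exceeds the cap."""
    return sum(step_costs_usd) > budget_cap_usd
```

Note that there is no cost step before enqueueing; in this sketch the budget is enforced only by `over_budget`, which an executor would call after each step with the costs measured so far.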
@@ -801,16 +784,18 @@ The engine handles storage (writes to SurfSense's existing object storage),
 URL generation (signed, scoped to the run's permissions), and cleanup (a
 nightly Celery Beat task deletes expired artifacts).
 
-### Duration classes and queue routing
+### Duration classes and queue routing — deferred
 
-Capabilities declare `expected_duration_seconds`. The dispatcher routes
-runs to Celery queues based on the longest-duration step:
-- < 10s → `automations_fast`
-- 10s – 5min → `automations_medium`
-- 5min – 1hr → `automations_long`
-
-Operators scale each queue's worker pool independently. A future "very
-long" queue is a config change, not a contract change.
+The original design routed runs to multiple Celery queues based on each
+capability's declared `expected_duration_seconds`. v1 ships with **one
+queue** (`automations_default`) and capabilities do not declare a
+duration. Multi-queue routing returns when burst load on a single queue
+actually justifies the operational complexity of independent worker
+pools.
+
+Adding the second queue is a config change plus reintroducing
+`expected_duration_seconds` on the `Capability` dataclass — both
+mechanical, additive, and free of design rewrite.
 
 ---
 
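
To make the deferral concrete, the removed routing rule and its v1 replacement might look like the sketch below. The function names are hypothetical, and the boundary handling is an assumption: the removed text gives the classes as "< 10s", "10s to 5min", and "5min to 1hr" without saying which side each boundary falls on or what happens above one hour. Here the lower bound of each class is treated as inclusive and durations above one hour are rejected.

```python
V1_QUEUE = "automations_default"  # queue name taken from the diff


def route_by_duration(longest_step_seconds: int) -> str:
    """Deferred design: pick a queue from the longest-duration step."""
    if longest_step_seconds < 10:
        return "automations_fast"
    if longest_step_seconds < 5 * 60:
        return "automations_medium"
    if longest_step_seconds <= 60 * 60:
        return "automations_long"
    # Assumption: the source defines no class above one hour.
    raise ValueError("no queue defined for steps longer than one hour")


def route_v1() -> str:
    """v1 behaviour: every run goes to the single default queue."""
    return V1_QUEUE
```

Under these assumptions, switching from `route_v1` back to `route_by_duration` touches only the dispatcher's enqueue step, which matches the diff's claim that reintroducing routing is additive.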
@@ -1210,9 +1195,9 @@ place.
 
 ### Components
 23. ✅ Dispatcher / executor / handlers / registry — distinct, each replaceable
-24. ✅ Side effects are a set, including `USER_VISIBLE`
-25. ✅ `expected_duration_seconds` integer drives queue routing
-26. ✅ `produces_artifacts` is a list of `ArtifactSpec`, not a bool
+24. ⏸ Side effects are a set, including `USER_VISIBLE` — **deferred** until multi-user automation RBAC ships
+25. ⏸ `expected_duration_seconds` integer drives queue routing — **deferred** until a second Celery queue is needed
+26. ⏸ `produces_artifacts` is a list of `ArtifactSpec`, not a bool — **deferred** until artifacts beyond the deliverable handlers' own persistence are needed
 27. ✅ Output schemas recommended on `agent_task`; editor warns when missing
 
 ### Event bus
@@ -1220,20 +1205,25 @@ place.
 29. ✅ Automations publish run events for composability
 30. ✅ Publish/subscribe behind interface — no direct table access elsewhere
 
-### Capability storage (two-tier persistence)
+### Capability storage
 31. ✅ Native capabilities registered in-memory at startup from the codebase. Identical across all workers.
-32. ✅ MCP capability metadata persisted in `mcp_connections` and `mcp_tools` tables. Survives restarts.
-33. ✅ MCP handler closures built lazily per worker from database state. Worker-local cache, rebuilt on demand.
-34. ✅ MCP server tool list re-harvested on a schedule (default: daily) and on user request.
-35. ✅ MCP tools harvested into the capability registry at connection time
-36. ✅ Side effects inferred from MCP hints + naming + admin overrides
-37. ✅ MCP tools callable directly (no agent required) when caller knows args
+32. ⏸ MCP capability metadata persisted in `mcp_connections` and `mcp_tools` tables — **deferred to Phase 4**
+33. ⏸ MCP handler closures built lazily per worker from database state — **deferred to Phase 4**
+34. ⏸ MCP server tool list re-harvested on a schedule — **deferred to Phase 4**
+35. ⏸ MCP tools harvested into the capability registry at connection time — **deferred to Phase 4**
+36. ⏸ Side effects inferred from MCP hints + naming + admin overrides — **deferred to Phase 4**
+37. ⏸ MCP tools callable directly (no agent required) when caller knows args — **deferred to Phase 4**
 
-### Credentials
-38. ✅ Credentials never appear in the automation definition — only connection IDs do
-39. ✅ Credentials never appear in the LLM's context — the host holds them and uses them on the LLM's behalf
-40. ✅ Credentials resolved per-call by `ActionContext`, not pre-loaded into worker environment
-41. ✅ Tokens encrypted at rest in the database; refresh handled automatically by `ActionContext.resolve_*_client`
+### Credentials — all deferred to Phase 2
+38. ⏸ Credentials never appear in the automation definition — only connection IDs do — **Phase 2**
+39. ⏸ Credentials never appear in the LLM's context — the host holds them — **Phase 2**
+40. ⏸ Credentials resolved per-call by `ActionContext`, not pre-loaded into worker environment — **Phase 2**
+41. ⏸ Tokens encrypted at rest; refresh handled automatically by `ActionContext.resolve_*_client` — **Phase 2**
+
+### v1-minimum (new lock)
+v1. ✅ `Capability` is exactly five fields: `id`, `description`, `input_schema`, `output_schema`, `handler`. Additional fields are added only when a concrete consumer feature requires them.
+v2. ✅ Cost is **measured** from a per-run ledger, not declared. Pre-flight cost checks return when the ledger has enough history.
+v3. ✅ Single `automations_default` Celery queue in v1. Multi-queue routing returns when load justifies it.
 
 ### NL authoring
 42. ✅ LLM-authored templates is the primary path from day one — not a Phase 3 addition. Hand-authoring JSON is supported but secondary
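
Decisions `v1.` and `v2.` in the hunk above could translate into something like the following sketch. The five field names come from the diff; everything else is assumed for illustration: the field types, the `(ctx, args)` handler signature (borrowed from the handler removed in the credentials hunk), the `example.echo` capability, and the `CostLedger` class.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field, fields
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# Assumed handler shape: async callable taking a context and an args dict.
Handler = Callable[[Any, Dict[str, Any]], Awaitable[Any]]


@dataclass(frozen=True)
class Capability:
    """Exactly the five fields listed in decision v1."""
    id: str
    description: str
    input_schema: Dict[str, Any]
    output_schema: Dict[str, Any]
    handler: Handler


async def echo_handler(ctx: Any, args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical native capability: returns its input unchanged.
    return {"echo": args["text"]}


ECHO = Capability(
    id="example.echo",
    description="Return the given text.",
    input_schema={"type": "object", "required": ["text"]},
    output_schema={"type": "object", "required": ["echo"]},
    handler=echo_handler,
)

# In-memory registry, built at startup (decision 31).
REGISTRY: Dict[str, Capability] = {ECHO.id: ECHO}


@dataclass
class CostLedger:
    """Per-run ledger: cost is recorded after each call, never declared up front."""
    entries: List[Tuple[str, float]] = field(default_factory=list)

    def record(self, capability_id: str, usd: float) -> None:
        self.entries.append((capability_id, usd))

    def total_usd(self) -> float:
        return sum(usd for _, usd in self.entries)
```

There is no `cost_estimate`, `expected_duration_seconds`, or side-effect field on `Capability`; under decision v1 each would be added back only when a consumer feature needs it.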