Re-apply the trim style after the prior refactor commit re-introduced
a multi-line docstring on AutomationRun.
- AutomationRun: drop the four-line docstring explaining where
per-step session ids live; move the note to a single-line inline
comment right above ``step_results`` where it's actionable.
- AutomationDefinition: drop the design-plan cross-reference; the
module docstring already establishes what the file is.
No behaviour change.
A run can contain zero, one, or N agent_task steps. A single
agent_session_id at the run level holds at most one of them, so the
column is the wrong shape for the data.
Per-step session ids (LangGraph thread/checkpoint reference for an
agent_task step) live inside step_results[i] alongside the rest of
the per-step bag (status, timings, output). Each agent step records
its own; non-agent steps record nothing. Run-level "primary session"
is a UI concern, not a schema concern.
Trade-off: trace -> run reverse lookup is now a JSONB query, not an
index hit. Usually traversal goes run -> trace; if the reverse
becomes hot we add a GIN index on step_results or a generated
column — both additive.
Changes:
- AutomationRun: drop the agent_session_id column; module docstring
notes where per-step session ids now live.
- Migration 144: drop the column from the CREATE TABLE; downgrade
unchanged.
Safe to edit migration 144 in place (vs. add 145 with ALTER ... DROP):
this branch has not shipped and the table has never existed in any
deployed database.
Cut the docstrings and Field(description=...) text across the entire
automations/ tree down to single-line intent statements, matching the
multi_agent_chat conciseness style:
- Module docstrings: one line stating what the file is.
- Class docstrings: deleted when the class name + module docstring
already cover intent; kept only where they add a constraint or
rationale not visible in the signature.
- Pydantic Field descriptions: short noun phrases / clauses, not
full sentences. Reasoning that belonged in the design plan moved
out of the code.
- Enum values: per-value docstrings replaced with terse inline
comments where the meaning isn't obvious from the name.
Behaviour is unchanged. The same 33 files, same public surface, same
imports — verified by re-running the 10-point registry smoke test and
the 8-point schema round-trip / constraint suite from commits 9 and
10.
LOC: 1180 → 691 (-42%).
Three registries under app/automations/registries/, each as its own
folder with the same SRP-per-file split (types.py for the dataclass,
store.py for the in-memory dict + register/get/all functions). All
three start empty; concrete entries land when the user signs off on
which capabilities / actions / triggers to include (step 2).
Capability (locked at v1-minimum five fields — see commit 2):
- id, description, input_schema, output_schema, handler
- CapabilityHandler = Callable[[dict[str, Any]], Awaitable[Any]]
- Frozen, slotted dataclass (immutable post-registration).
ActionDefinition (v1-trim of design plan §4):
- type, name, description, config_schema, handler
- Defers output_contract (handled per-step by agent_task's
config.output_schema), uses_capabilities (no static analysis
needed until >1 action ships), and produces_artifacts (deferred
alongside the artifact pipeline).
TriggerDefinition (declarative, no handler):
- type, description, config_schema, payload_schema
- No handler field — firing is a single dispatcher's
responsibility, not a per-trigger one.
store.py contract for all three:
- register_*: idempotent at process startup, raises on duplicate
- get_*: returns None on miss
- all_*: returns a defensive copy of the registry dict
Verified by an inline smoke test (10 checks): empty initial state,
registration and lookup work, duplicates raise, frozen dataclasses
reject mutation, snapshots are copies, handlers are awaitable.
Isolation invariant audit: grep across the full app/automations/
tree shows only three app.* imports, all of them
``from app.db import BaseModel, TimestampMixin`` in the model files.
No imports from app.agents.*, app.services.*, app.tasks.*,
app.routes.*, or any other business-logic module.
Three layers of Pydantic models under app/automations/schemas/, one
file per concern (SRP), matching the envelope in
automation-design-plan.md §5.
definition/ — the editable envelope persisted in
automations.definition:
- envelope.py AutomationDefinition (top-level shape)
- plan_step.py PlanStep (one step in the sequential plan)
- inputs.py InputsBlock (the inputs JSON Schema wrapper)
- execution.py ExecutionBlock (timeouts, retries, concurrency,
budget cap, on_failure plan)
- metadata.py MetadataBlock (tags + created_from_nl + extras)
- trigger_spec.py TriggerSpec (one entry in triggers[])
triggers/ — per-trigger config schemas, dispatched by registry on the
TriggerSpec.type discriminator:
- schedule.py ScheduleTriggerConfig(cron, timezone)
- manual.py ManualTriggerConfig() — empty in v1
actions/ — per-action config schemas, dispatched by registry on the
PlanStep.action discriminator:
- agent_task.py AgentTaskActionConfig(prompt, tools, model,
output_schema)
Design properties verified by an inline smoke test:
- The §5 worked example round-trips through model_validate_json /
model_dump_json byte-for-byte (InputsBlock uses
serialize_by_alias so the JSON key stays "schema" not
"schema_").
- Envelope rejects unknown top-level keys (extra="forbid").
- MetadataBlock tolerates unknown keys (extra="allow").
- ExecutionBlock defaults apply when the block is omitted.
- retry_backoff and concurrency are typed as Literal — bogus
values rejected at validation time.
- Per-type configs enforce their required fields (cron + timezone
on schedule; non-empty prompt on agent_task).
The envelope keeps trigger and action configs as untyped dicts on
purpose — per-type validation is a registry-driven dispatch (commit
10), keeping the envelope free of every-type-knows-every-type
coupling.
Migration 144 -> 143. Matches the SQLAlchemy models added in commit 7
and the v1 data model in automation-design-plan.md §9.
Up:
- CREATE TYPE automation_status / automation_trigger_type /
automation_run_status (PostgreSQL ENUMs created first because the
tables reference them).
- CREATE TABLE automations with FK to searchspaces (CASCADE) and
user (SET NULL); five indexes matching the SQLAlchemy model.
- CREATE TABLE automation_triggers with FK to automations
(CASCADE); four indexes.
- CREATE TABLE automation_runs with FK to automations (CASCADE) and
automation_triggers (SET NULL — null trigger_id == manual via UI);
four indexes.
Down: drops every index, table, and ENUM in reverse-dependency order
so the migration is reversible without ON DELETE side effects.
Verified: `alembic history` resolves 143 -> 144 (head) cleanly.
domain_events (Phase 3) and mcp_connections / mcp_tools (Phase 4) ship
in their own migrations when the consuming feature lands; this
migration only covers the three v1 tables.
Three enums (one file each) plus three models (one file each), all
under app/automations/persistence/. The module imports from app.db
only (Base/BaseModel/TimestampMixin and FK targets searchspaces.id /
user.id); no business-logic imports.
Enums:
- AutomationStatus: active | paused | archived
- RunStatus: pending | running | succeeded | failed | cancelled
| timed_out
- TriggerType: schedule | manual (Phase-2/3 add webhook | event)
Models:
- Automation: search_space-scoped, created_by_user_id (SET NULL),
name + description, status enum, definition JSONB, version int,
updated_at with onupdate.
- AutomationTrigger: FK → automations (CASCADE), type enum, config
JSONB, enabled bool, last_fired_at. Webhook secret_hash is omitted
until Phase 2.
- AutomationRun: FK → automations (CASCADE), nullable trigger_id
(SET NULL — null = manual via UI), status enum,
definition_snapshot for immutable history, trigger_payload /
resolved_inputs / step_results / output / artifacts / error JSONB
columns, started_at / finished_at timestamps, agent_session_id for
linking to the LangGraph trace. cost_usd column omitted until at
least one v1 capability records token-level cost.
Verified: Base.metadata exposes all three table names; columns and
enums introspect as documented; no linter errors.
Create app/automations/ with the SRP-per-file / grouped-folders layout
that mirrors app/agents/multi_agent_chat/. Twelve __init__.py files,
each a thin re-export with a single-line docstring describing the
subpackage's role, no exports yet (filled in subsequent commits).
Tree:
app/automations/
├── persistence/
│ ├── enums/ (status / type enums; one per file)
│ └── models/ (SQLAlchemy tables; one per file)
├── schemas/
│ ├── definition/ (the JSON envelope, broken by concern)
│ ├── triggers/ (per-trigger config schemas)
│ └── actions/ (per-action config schemas)
└── registries/
├── capabilities/ (types.py + store.py)
├── actions/ (types.py + store.py)
└── triggers/ (types.py + store.py)
The persistence/ folder is named to avoid surfsense_backend/.gitignore's
data/ ignore rule, which silently masked the original data/ name and
its contents from version control.
Isolation invariant: the module imports only from app.db (foundational
Base + FK targets, unavoidable) and stdlib / SQLAlchemy / Pydantic.
No imports from app.agents.*, app.services.*, app.tasks.*, app.routes.*
or any other business-logic module. Confirmed importable with no side
effects.
§9 (Data model): drop from six tables to three. v1 ships automations,
automation_triggers, automation_runs only. domain_events deferred to
Phase 3 (event trigger); mcp_connections/mcp_tools deferred to Phase 4
(MCP integration). Remove the table definitions for the deferred ones
and replace with a deferred-tables note pointing to the consuming
phase.
automation_triggers.type enum narrowed to schedule|manual for v1.
Webhook and event types ship with their respective phases. secret_hash
column deferred to Phase 2 alongside the webhook trigger.
automation_runs.cost_usd column deferred until at least one v1
capability records token-level cost — additive when reintroduced.
§14 (Phase 1) reorganized into four explicit steps matching the work
we're about to do: scaffolding + schemas + empty registries (step 1),
then registry population (step 2), then executor (step 3), then NL
authoring + UI (step 4). The current commit batch lands step 1 only.
Update §3 (Credentials), §7.1 (Dispatcher common path), §8 (Duration
classes and queue routing), and §13 (Decisions locked) to reflect the
v1-minimum scope:
- Credentials block in §3 collapses to a deferred-to-Phase-2 note. The
three guarantees (no creds in definition, no creds in LLM context,
per-call resolution) return unchanged when Phase 2 ships external
capabilities.
- Cost-estimate pre-check in the dispatcher's common path is removed.
Mid-flight budget kill in the executor still enforces budget_cap_usd.
- Queue routing by expected_duration_seconds is deferred. Single
automations_default queue in v1.
- Decisions 24, 25, 26, 32-37, 38-41 marked deferred with explicit
return phase. Three new v1-minimum decisions added (5-field
Capability, measured-not-declared cost, single queue).
All deferrals are additive: the original designs return as-is when
warranted; nothing is rewritten between phases.
Remove the two-tier registry, MCP database schema, harvester pseudocode,
and the lazy per-worker closure cache from §3. v1 ships with a single
in-memory native registry; the MCP design is reintroduced in Phase 4
along with the rest of the integration-tooling surface.
The deferral is additive: the v1 registry interface is the same callable
surface a Phase-4 MCP harvester will register into. No design rewrite
between phases.
Reduce the §3 Capability dataclass from ten fields to five:
id, description, input_schema, output_schema, handler. Removed
fields (name, required_credentials, side_effects,
expected_duration_seconds, cost_estimate) are reintroduced only when a
concrete consumer feature demands them. The v1 invariant is that a
Capability is a typed, named, callable unit and every consumer
(executor, agent tool layer, future HTTP API) sees the same five-field
shape.
Track the initial v2 design document for the SurfSense automation feature.
This is the baseline snapshot of the design before applying the v1-minimum
scope narrowing (capability trimming, MCP deferral, queue-routing deferral).
Subsequent commits trim this down to the v1 scope.
Adds 34 tests under tests/unit/tasks/chat/streaming/ that cover the
new flows tree against the legacy stream_new_chat.py module to gate
the upcoming cutover. Coverage:
* Public entry points: stream_new_chat and stream_resume_chat are
async generator functions whose parameter signatures (name, kind,
annotation, default) match the legacy versions one-for-one. Uses a
normalized-annotation comparison so PEP-563 vs eager-annotation
representation differences are tolerated.
* Extracted helpers: image-capability gate, runtime-context builders
for new-chat and resume-chat, LLM-bundle dispatcher, premium-quota
needs check + reservation dataclass, rate-limit recovery truth
table, persistence-spawn registration/self-unregistration, await
helpers.
* SSE frame iterators: iter_initial_frames + iter_final_frames emit
the canonical sequence; iter_token_usage_frame skips on None.
* Initial thinking step: 4 parametrized branches (text, image-only,
empty, mentioned-docs), long-query truncation, many-docs collapse.
These tests are scaffolding for the cutover and will be removed once
the legacy module is deleted.
Slim composition root for the resume-chat streaming flow. Mirrors the
new_chat orchestrator but specialized for resumed turns:
* no fresh user turn, no title generation, no image-capability gate
* persists a fresh assistant shell for the resumed turn
* applies build_resume_routing to dispatch user decisions to the
correct paused subagent before invoking the agent
* shares the same stream_loop + flow-local _recover closure for in-
stream provider rate-limit recovery
Also lands flows/__init__.py, which becomes the public chat-flow API:
from app.tasks.chat.streaming.flows import stream_new_chat, stream_resume_chat
Existing wiring (routes, contract test) still imports from the legacy
app.tasks.chat.stream_new_chat module. Cutover is the next phase.
Three focused modules used by the upcoming resume-chat orchestrator:
* runtime_context: build_resume_chat_runtime_context assembles the
SurfSenseContextSchema for a resume turn (handles empty mention
lists, since resume requests do not carry fresh @-mentions).
* assistant_shell: persist_resume_assistant_shell writes a fresh
assistant row for the resumed turn so the post-stream finalize
has a target.
* resume_routing: build_resume_routing collects the pending
interrupts across paused subagents and slices the flat list of
ResumeDecision[] into the correct (thread, subagent) buckets so
LangGraph routes each decision back to the right paused tool call.
Add-only; no orchestrator yet (next commit).
Slim composition root for the new-chat streaming flow. Sequences:
1. validate inputs and load the LLM bundle (negative id => YAML)
2. open the OTEL chat_request span; set agent_mode tag
3. spawn the four pre-stream DB writes (set-ai-responding, persist
user turn, persist assistant shell, first-assistant probe)
4. reserve premium quota (with free-fallback retry on denial)
5. build connector + checkpointer + agent + input_state
6. emit first frames (message-start, step-start, initial thinking step)
7. spawn the background title generator
8. run the shared stream_loop with a flow-local _recover closure that
reroutes to the next auto-pin config on provider 429s
9. finalize: emit terminal title/token frames, shielded assistant
finalize, release-or-finalize premium quota, close session, GC,
record OTEL outcome
Public entry-point flows/new_chat/__init__ re-exports stream_new_chat.
Existing wiring (routes, tests) still imports the legacy function from
app.tasks.chat.stream_new_chat. Cutover is a later commit.
Seven focused modules that the upcoming new_chat orchestrator
composes:
* auto_pin: resolve_initial_auto_pin selects the initial config (with
vision-capable filtering and error classification).
* llm_capability: check_image_input_capability blocks routing an
image-bearing turn to a known text-only model.
* runtime_context: build_new_chat_runtime_context assembles the
SurfSenseContextSchema for a new-chat turn.
* persistence_spawn: spawn_set_ai_responding_bg, spawn_persist_user_task,
spawn_persist_assistant_shell_task, and await_persist_task background
the four pre-stream DB writes so they overlap with agent build.
* initial_thinking_step: build_initial_thinking_step +
iter_initial_thinking_step_frame produce the very first thinking-1 SSE
step ("Understanding your request" / "Analyzing referenced content").
* title_gen: spawn_title_task + maybe_emit_title_update +
await_pending_title_update background the thread-title generator and
interleave its update into the stream when ready.
* input_state: build_new_chat_input_state assembles the LangGraph
input_state (history bootstrap, mentions resolution, context blocks,
human-message construction). The heavy one.
Add-only; no orchestrator yet (next commit).
Extracts finalize_assistant_message: the post-stream server-side write
of the final assistant message (with content parts + token usage)
guarded by asyncio.shield + shielded_async_session so a client
disconnect cannot abort the persist.
Add-only; legacy stream_new_chat.py keeps its inline finalize block
until cutover.
Two cooperating modules that wrap stream_agent_events with in-stream
recovery from provider 429s:
* rate_limit_recovery: can_recover_provider_rate_limit truth-table
guard, reroute_to_next_auto_pin (selects the next eligible auto-pin
config and reloads the LLM bundle), log_rate_limit_recovered.
* stream_loop: run_stream_loop drives stream_agent_events in a
while-True loop, delegating recovery to a flow-supplied RecoverFn
callback so new_chat and resume_chat can share the same loop while
keeping their own nonlocal state.
Add-only; not yet wired into any orchestrator.
Extracts handle_terminal_exception: the shared except-branch behavior for
the chat orchestrators. Classifies the raised exception, logs the
structured chat_stream error event, and emits the terminal-error SSE
frame + done sentinel via the streaming service.
Add-only; nothing imports it yet.
Centralizes the premium-credits lifecycle for chat turns:
* needs_premium_quota: gate check (premium user + non-fallback config).
* PremiumReservation: dataclass capturing reservation state + token totals.
* reserve_premium / finalize_premium / release_premium: idempotent
reservation, commit, and rollback used by the orchestrators.
Add-only; legacy stream_new_chat.py keeps its inline quota handling
until cutover.
Six small, single-purpose modules shared by the upcoming new_chat and
resume_chat orchestrators:
* llm_bundle: dispatches negative config_id to the YAML loader and
non-negative config_id to the DB loader, returning (llm, AgentConfig).
* pre_stream_setup: builds the connector service, resolves the
Firecrawl API key, and returns the chat checkpointer.
* first_frames: iter_initial_frames + iter_final_frames emit the canonical
message-start / step-start / idle / finish / done SSE envelope.
* finalize_emit: iter_token_usage_frame emits the per-turn usage frame
from a TokenAccumulator summary.
* finally_cleanup: close_session_and_clear_ai_responding and run_gc_pass
centralize the finally-block bookkeeping.
* span: open_chat_request_span / set_agent_mode / close_chat_request_span /
record_outcome_attrs wrap the OpenTelemetry chat_request span.
Add-only; these are not yet wired into stream_new_chat.py.
Extracts the inner agent-streaming driver previously inlined as
_stream_agent_events in stream_new_chat.py.
stream_agent_events drives graph_stream.event_stream.stream_output and,
after the agent finishes, performs the post-stream safety-net work:
* commit any pending content the agent never explicitly finished
* evaluate file-operation contract outcomes and emit the appropriate
contract verdict for desktop_local_folder turns
This unit is what flows/shared/stream_loop.py wraps in the rate-limit
recovery while-loop. Add-only; no existing wiring uses it yet.
Extracts the agent-construction wrapper that the chat streamers call to
materialize the LangGraph agent for a given thread. Centralizes how we
pass the agent factory plus checkpointer, runtime context, and the
in-memory content builder.
Add-only; pre-existing inline equivalent in stream_new_chat.py stays
until cutover.
Extracts the desktop_local_folder file-operation contract helpers:
* contract_enforcement_active: gates the contract on filesystem mode.
* evaluate_file_contract_outcome: scores tool outputs as success/no-op.
* log_file_contract: structured logging of contract verdicts.
This is the unit responsible for catching agents that claim to have
written/edited a file without actually invoking the filesystem tool.
Add-only; stream_new_chat.py keeps its inline duplicates until cutover.
Extracts two pure context helpers used during input-state assembly:
* mentioned_docs.format_mentioned_surfsense_docs_as_context: renders the
user's @-mentioned SurfSense docs into the LLM context block.
* deepagents_todos.extract_todos_from_deepagents: pulls the in-progress
todo list from a deep-agents state snapshot for the title generator.
Add-only; existing call sites in stream_new_chat.py remain untouched
until cutover.