feat: enhance task management and timeout configurations in multi-agent chat

- Added new environment variables for controlling task execution limits, including `SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS`, `SURFSENSE_TASK_BATCH_CONCURRENCY`, and `SURFSENSE_TASK_BATCH_MAX_SIZE`.
- Updated documentation to reflect new batch processing capabilities for `task` calls, allowing for concurrent execution of multiple subagent tasks.
- Improved error handling and receipt generation for deliverables, ensuring consistent feedback on task status.
- Refactored middleware to incorporate search space ID for better task management.
2026-07-16 23:01:06 +02:00 · 2026-05-27 14:58:10 -07:00 · 9d6e9b7e2d
commit 9d6e9b7e2d
parent 820f541f08
66 changed files with 2561 additions and 380 deletions
--- a/surfsense_backend/.env.example
+++ b/surfsense_backend/.env.example
@@ -357,3 +357,50 @@ LANGSMITH_PROJECT=surfsense
 # updates and deletes — the TTL only bounds staleness for bulk-import
 # paths that bypass the ORM. Set to 0 to disable the cache.
 # SURFSENSE_CONNECTOR_DISCOVERY_TTL_SECONDS=30
+
+# -----------------------------------------------------------------------------
+# `task` boundary controls (Hermes-inspired improvements)
+# -----------------------------------------------------------------------------
+# Wall-clock budget for a single ``task(subagent, ...)`` invocation in
+# seconds. Subagents that run hot (slow image vendors, sluggish embedders,
+# wedged MCP servers) would otherwise pin the orchestrator until the next
+# checkpoint heartbeat fires. On timeout the runtime cancels the underlying
+# coroutine and synthesizes a ToolMessage telling the orchestrator to treat
+# the result as ``status=error``. Set to 0 to disable the cap entirely.
+# Default: 300.0
+# SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS=300
+
+# Batch-mode (``task(tasks=[...])``) concurrency cap and max batch size.
+# Concurrency is enforced via an ``asyncio.Semaphore`` so a runaway fanout
+# cannot starve unrelated subagents (each child still owns an LLM call and
+# its own DB session). Max-size is a hard safety net for prompt-injection /
+# runaway loops; the orchestrator rarely needs more than a handful of
+# concurrent specialists. Set concurrency to 1 to effectively serialise
+# batches without changing the schema.
+# SURFSENSE_TASK_BATCH_CONCURRENCY=3
+# SURFSENSE_TASK_BATCH_MAX_SIZE=8
+
+# Soft per-turn cap on cumulative ``task(...)`` invocations across all
+# subagents. Once the sum of ``state['billable_calls']`` crosses this
+# number, the runtime appends a one-shot warning ToolMessage telling the
+# orchestrator to wrap up rather than launching more specialists. Tunable
+# so heavy-research turns (15+ legitimate specialist calls) don't trip the
+# alarm in production. Set to 0 to disable the warning entirely.
+# SURFSENSE_SUBAGENT_BILLABLE_THRESHOLD=15
+
+# Per-workspace spawn-paused kill switch — set via Redis at runtime, not
+# this env var. The env var below only disables the check itself (useful
+# for local dev without Redis). To pause a workspace in production:
+#     redis-cli SET surfsense:spawn_paused:<search_space_id> 1 EX 600
+#     redis-cli DEL surfsense:spawn_paused:<search_space_id>
+# The check is fail-open: a Redis blip never blocks ``task(...)``.
+# SURFSENSE_TASK_SPAWN_PAUSED_DISABLED=false
+
+# Note on Celery-backed deliverables (generate_podcast,
+# generate_video_presentation): these tools poll the artefact row until
+# it reaches a terminal status — they do NOT use an internal wall-clock
+# budget. The effective ceiling is SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS
+# (above, default 300s) in multi-agent mode and the chat's HTTP / process
+# lifetime in single-agent mode. If your podcasts or videos routinely
+# exceed 5 minutes, raise SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS (or
+# set it to 0 to disable that ceiling entirely).
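
The comments above describe three controls: a per-invocation wall-clock timeout, a semaphore-bounded batch, and a hard batch-size limit. The following is a minimal sketch of how they could fit together, not the code from this commit. The helper names (`read_float_env`, `invoke_with_timeout`, `run_batch`) and the plain-dict result shape are hypothetical stand-ins. Only the environment variable names and their documented semantics (0 disables the timeout, a timeout is reported as `status=error`) come from the diff.

```python
import asyncio
import os


def read_float_env(name: str, default: float) -> float:
    """Read a float setting from the environment, falling back to `default`."""
    raw = os.environ.get(name)
    try:
        return float(raw) if raw not in (None, "") else default
    except ValueError:
        return default


async def invoke_with_timeout(factory, timeout_seconds: float) -> dict:
    """Run one subagent call (hypothetical helper).

    A timeout of 0 or less disables the cap. On timeout, asyncio.wait_for
    cancels the underlying coroutine and we report an error result instead
    of raising, loosely mirroring the synthesized error message described
    in the .env comments.
    """
    try:
        if timeout_seconds <= 0:
            result = await factory()
        else:
            result = await asyncio.wait_for(factory(), timeout=timeout_seconds)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        return {"status": "error", "result": f"timed out after {timeout_seconds}s"}


async def run_batch(factories, concurrency: int, max_size: int,
                    timeout_seconds: float) -> list:
    """Run a batch of subagent calls with bounded concurrency (hypothetical helper)."""
    if len(factories) > max_size:
        raise ValueError(f"batch of {len(factories)} exceeds max size {max_size}")

    # At most `concurrency` children run at once; the rest wait on the semaphore.
    semaphore = asyncio.Semaphore(max(1, concurrency))

    async def guarded(factory):
        async with semaphore:
            return await invoke_with_timeout(factory, timeout_seconds)

    # gather preserves input order, so results line up with the submitted tasks.
    return await asyncio.gather(*(guarded(f) for f in factories))


# Defaults taken from the commented-out values in the diff above.
TIMEOUT = read_float_env("SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS", 300.0)
CONCURRENCY = int(read_float_env("SURFSENSE_TASK_BATCH_CONCURRENCY", 3))
MAX_SIZE = int(read_float_env("SURFSENSE_TASK_BATCH_MAX_SIZE", 8))
```

In this sketch, setting concurrency to 1 serialises the batch without changing the call shape, which matches the behaviour the comments describe. Each task is passed as a zero-argument coroutine factory so that no coroutine is created before its semaphore slot is acquired.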