apunkt/SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-05-06 14:22:47 +02:00

CREDO23 7b9a218d62 feat(chat): add multi-agent mode routing scaffold and telemetry.

2026-04-28 15:35:14 +02:00


Multi-Agent Architecture Phase 1 Runbook

Scope

This runbook covers mode selection and emergency rollback for:

single_agent
shadow_multi_agent_v1
multi_agent_v1

Phase 1 keeps execution behavior on the current single-agent path while mode wiring and telemetry are introduced.

Resolution Priority

Mode resolution follows this fixed order:

Global kill switch (FORCE_SINGLE_AGENT)
Request override (architecture_mode in chat payload)
System default (AGENT_ARCHITECTURE_MODE)
Safe fallback (single_agent)
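
The priority order above can be sketched as a small pure function. This is a minimal illustration, not the project's actual resolver: the function name, its parameters, and the choice to let an unrecognized request value fall through to the system default are assumptions.

```python
from typing import Optional

# The three modes named in this runbook.
VALID_MODES = ("single_agent", "shadow_multi_agent_v1", "multi_agent_v1")
SAFE_FALLBACK = "single_agent"


def resolve_architecture_mode(
    force_single_agent: bool,
    request_mode: Optional[str],
    default_mode: Optional[str],
) -> str:
    """Hypothetical resolver applying the four-step priority order."""
    # 1. The global kill switch wins over everything else.
    if force_single_agent:
        return "single_agent"
    # 2. A recognized per-request override comes next.
    #    (Assumption: unknown values are skipped rather than rejected.)
    if request_mode in VALID_MODES:
        return request_mode
    # 3. Then the system default, if it names a known mode.
    if default_mode in VALID_MODES:
        return default_mode
    # 4. Anything else resolves to the safe fallback.
    return SAFE_FALLBACK
```

Keeping the resolver free of I/O makes each step of the order easy to unit test in isolation.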

Configuration

Set these environment variables in the backend runtime:

AGENT_ARCHITECTURE_MODE=single_agent (default)
FORCE_SINGLE_AGENT=FALSE (default)

Changes require a backend restart because the configuration is loaded once at process startup.
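
To illustrate why a restart is needed, here is a rough sketch of startup-time loading. Only the two variable names and their defaults come from this runbook. The `ArchitectureConfig` class, the `load_architecture_config` helper, and the set of strings accepted as true are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ArchitectureConfig:
    default_mode: str
    force_single_agent: bool


def load_architecture_config(env: Mapping[str, str] = os.environ) -> ArchitectureConfig:
    """Hypothetical loader; reads both settings with the documented defaults."""
    default_mode = env.get("AGENT_ARCHITECTURE_MODE", "single_agent").strip()
    raw_flag = env.get("FORCE_SINGLE_AGENT", "FALSE").strip().lower()
    # Assumption: these spellings count as "true"; everything else is false.
    force_single_agent = raw_flag in {"true", "1", "yes"}
    return ArchitectureConfig(default_mode, force_single_agent)


# Evaluated once when the module is imported, so later changes to the
# environment are not seen until the process restarts.
CONFIG = load_architecture_config()
```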

Mode Switching

System default switch

Set AGENT_ARCHITECTURE_MODE to the desired value.
Keep FORCE_SINGLE_AGENT=FALSE.
Restart the backend.
Verify that logs include [architecture_telemetry] entries with the expected architecture_mode.

Per-request override

Send the optional architecture_mode field in the chat request payload, set to one of:

"single_agent"
"shadow_multi_agent_v1"
"multi_agent_v1"

If FORCE_SINGLE_AGENT=TRUE, the request override is ignored by design and the request resolves to single_agent.
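
A client-side helper for attaching the override might look like the sketch below. Only the architecture_mode field name and its three values come from this runbook. The helper name is hypothetical, and the `user_query` key in the usage note is a placeholder, not the real chat schema.

```python
from typing import Any, Dict, Optional

ALLOWED_MODES = {"single_agent", "shadow_multi_agent_v1", "multi_agent_v1"}


def with_architecture_mode(
    payload: Dict[str, Any], mode: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of a chat payload with the optional override applied."""
    result = dict(payload)
    if mode is None:
        # Omit the field so the backend uses its system default.
        result.pop("architecture_mode", None)
        return result
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown architecture_mode: {mode!r}")
    result["architecture_mode"] = mode
    return result
```

For example, `with_architecture_mode({"user_query": "hello"}, "shadow_multi_agent_v1")` returns the same payload with the override added, and leaves the input dictionary untouched.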

Emergency Rollback

Use the kill switch:

Set FORCE_SINGLE_AGENT=TRUE.
Restart backend.
Verify new requests log architecture_mode=single_agent.
Keep this state until the incident is resolved.
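
The log check in the rollback steps could be automated along these lines. This sketch assumes telemetry lines carry the [architecture_telemetry] tag followed by key=value pairs such as architecture_mode=single_agent. The exact log layout is not specified here, so adjust the pattern to the real format.

```python
import re
from typing import Iterable, List

# Assumed layout: "[architecture_telemetry] ... architecture_mode=<mode> ..."
MODE_PATTERN = re.compile(
    r"\[architecture_telemetry\].*?\barchitecture_mode=(\w+)"
)


def non_single_agent_lines(lines: Iterable[str]) -> List[str]:
    """Return telemetry lines whose resolved mode is not single_agent."""
    offenders = []
    for line in lines:
        match = MODE_PATTERN.search(line)
        if match and match.group(1) != "single_agent":
            offenders.append(line)
    return offenders
```

Run it only over log lines written after the restart. An empty result is consistent with the kill switch being in effect.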

Verification Checklist

Mode resolves according to the priority order.
Kill switch overrides all request/default values.
Streaming response schema remains unchanged.
Architecture telemetry is emitted with:
- architecture_mode
- orchestrator_used
- worker_count
- retry_count
- latency_ms
- token_total
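
As an illustration of the telemetry fields listed above, the record could be modeled as a dataclass and rendered as a single tagged log line. The field names and the [architecture_telemetry] tag come from this runbook. The field types, the key=value layout, and the helper names are assumptions.

```python
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("architecture_telemetry")


@dataclass(frozen=True)
class ArchitectureTelemetry:
    architecture_mode: str
    orchestrator_used: bool  # assumed type
    worker_count: int        # assumed type
    retry_count: int         # assumed type
    latency_ms: float        # assumed type
    token_total: int         # assumed type


def format_telemetry(record: ArchitectureTelemetry) -> str:
    """Render the record as '[architecture_telemetry] key=value ...'."""
    fields = " ".join(f"{key}={value}" for key, value in asdict(record).items())
    return f"[architecture_telemetry] {fields}"


def emit_telemetry(record: ArchitectureTelemetry) -> None:
    """Log one telemetry line at INFO level."""
    logger.info(format_telemetry(record))
```

A checklist run can then assert that all six keys appear in each emitted line.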