Commit graph

206 commits

Author SHA1 Message Date
Anish Sarkar
dbb652d4f8 feat(observability): add telemetry error and event helpers 2026-05-22 17:48:01 +05:30
Anish Sarkar
b9d76f006d feat(retriever): instrument knowledge base search 2026-05-21 23:03:31 +05:30
Anish Sarkar
ea3d0a6463 feat(agents): emit metrics for model and tool calls 2026-05-21 23:02:36 +05:30
Anish Sarkar
6095b48b5f feat(observability): add SurfSense metric helpers 2026-05-21 23:02:20 +05:30
CREDO23
49da7a57df Merge remote-tracking branch 'upstream/dev' into improvement-agent-speed
Resolves: surfsense_backend/app/agents/new_chat/middleware/memory_injection.py
- Took both imports: upstream moved MEMORY_HARD_LIMIT/SOFT_LIMIT to
  app.services.memory; kept our perf-logger import for timing.

Pulls in upstream changes:
- Memory document feature (services/memory refactor, removal of
  app.agents.new_chat.memory_extraction and background extraction in
  stream_new_chat — agent now drives memory via update_memory tool).
- BACKEND_URL env refactor across web tool-ui/editor/chat/dashboard/lib.
- GitHub Actions backend test workflow + pre-commit biome bump.
- Token-display polish in MessageInfoDropdown; save_memory no-update
  sentinel.

Verified: 1723 unit tests pass, ruff clean. No semantic regression in
stream_new_chat (their memory-extraction deletion and our preflight
removal touch different functions).
2026-05-20 21:23:48 +02:00
CREDO23
c0aa4261ac perf(mcp): persist list_tools discovery in connector.config.cached_tools
Skip the ~1-3s MCP initialize + list_tools handshake on every cache miss
by reading tool definitions from the connector row we already load. Lazy
populate on first miss, self-heal on corrupt cache, zero schema migration.
2026-05-20 16:11:07 +02:00
CREDO23
db8bffab38 perf(prompt-cache): enable Azure prompt_cache_key routing hint
Splits the OpenAI-family gate into per-param predicates so AZURE and
AZURE_OPENAI configs now receive prompt_cache_key for backend routing
affinity (Microsoft auto-caches GPT-4o+ deployments at >=1024 tokens;
the key clusters same-prefix requests on the same GPU pool and raises
hit rate on turn 2+). prompt_cache_retention stays opted out for Azure
because litellm 1.83.14's Azure transformer would drop it silently;
revisit when Azure's supported params list is updated.
2026-05-20 11:58:15 +02:00
Anish Sarkar
8c9be9796a feat: add no-update sentinel handling to save_memory function and corresponding unit tests 2026-05-20 15:03:35 +05:30
CREDO23
c3db25302b perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery
The preflight pattern probed the LLM with a 1-token ping before each
cold turn (when requested_llm_config_id==0, llm_config_id<0, and the
45s healthy TTL had expired) to detect 429s before fanning out into
planner/classifier/title-gen. To absorb its ~1-5s RTT cost we built the
agent speculatively in parallel; on 429 we discarded the build and
repinned.

Three problems with that design:

1. False security. Provider rate limits are token-bucket. A 1-token
   ping consumes ~5 tokens; the real request consumes 10-50K. The
   probe can return 200 while the real call still 429s.
2. Pure overhead in the common case. On warm-agent-cache turns the
   probe dominates wall time: ~2.5s of TTFT pure tax for ~99% of users
   who never see a 429.
3. The in-stream recovery loop (catch of _is_provider_rate_limited
   gated by not _first_event_logged) already does the right thing
   reactively: mark_runtime_cooldown -> resolve_or_get_pinned_llm_config_id
   with exclude_config_ids={previous} -> rebuild agent -> retry the
   stream. Preflight was never the only safety net; it was a redundant
   probe in front of one.

Changes:
- Delete _preflight_llm, _settle_speculative_agent_build, and the
  _PREFLIGHT_TIMEOUT_SEC / _PREFLIGHT_MAX_TOKENS constants.
- Drop the parallel agent_build_task / preflight_task plumbing in
  both stream_new_chat and stream_resume_chat; build the agent inline
  with await _build_main_agent_for_thread(...).
- Drop the unused is_recently_healthy / mark_healthy imports here
  (still exported from auto_model_pin_service since OpenRouter
  catalogue refresh and a few tests reference clear_healthy).
- Remove the obsolete preflight + settle-speculative tests from
  test_stream_new_chat_contract.py.

Net: -447 LOC. ~2.5s removed from TTFT on every cold preflight-eligible
turn. 429 recovery path is unchanged - same repin/rebuild/retry, just
not paid in advance on the healthy path.
2026-05-20 11:03:08 +02:00
Anish Sarkar
132e7b3c44 refactor: remove memory extraction functions and related components from the new chat agent 2026-05-20 14:03:28 +05:30
Anish Sarkar
a0ff86e0e8 feat: add memory document model and parsing functionality for markdown handling 2026-05-20 13:20:05 +05:30
Anish Sarkar
fe07de3f9c chore: ran linting 2026-05-20 12:55:10 +05:30
Anish Sarkar
5247dc7097 feat: refine private and team memory protocols 2026-05-20 02:02:10 +05:30
Anish Sarkar
ceedd02353 refactor: extract shared memory service 2026-05-20 02:01:36 +05:30
Anish Sarkar
8de7d86d56 Merge remote-tracking branch 'upstream/dev' into fix/backend-tests 2026-05-16 19:40:01 +05:30
DESKTOP-RTLN3BA\$punk
9fb9778bd0 test: enhance index batch parallel tests to include hybrid chunker
Updated the test for the indexing pipeline to verify that both the standard and hybrid chunkers are called via asyncio.to_thread, ensuring non-blocking behavior. This change reflects the routing of non-code documents through the hybrid chunker, maintaining the event loop contract.
2026-05-15 18:02:04 -07:00
DESKTOP-RTLN3BA\$punk
c187b04e82 chore: linting 2026-05-15 17:33:44 -07:00
CREDO23
4980f9f1ba Merge remote-tracking branch 'upstream/dev' into feature/multi-agent-with-task-parallelization 2026-05-15 16:44:22 +02:00
CREDO23
98b6977c68 permissions/ask: gate 'approve_always' palette entry on MCP-ness
Only MCP tools have a persistence target for 'approve_always' (the
connector's trusted-tools list); for native tools the decision lives
only in the in-memory runtime ruleset. Reflect that in the wire palette
so the FE can stay a pure renderer of allowed_decisions instead of
peeking at context.mcp_connector_id to decide whether to show the
'Always Allow' button.

The backend still accepts an 'approve_always' reply for any tool kind
(in-memory promotion is harmless), it just doesn't advertise it when
there's nowhere to persist.
2026-05-15 14:54:16 +02:00
CREDO23
c8b756ae8f hitl/wire: rename 'always' decision-type to 'approve_always'
Renames the SurfSense HITL extension decision-type from "always" to
"approve_always" so it sits in the same verb-first family as "approve",
"reject", and "edit". The Python constant is now SURFSENSE_DECISION_APPROVE_ALWAYS;
the wire value, the permission-domain decision_type, and the FE union members
all match (no wire/internal mismatch).

Both the multi_agent_chat permission middleware and the legacy new_chat one
accept the new wire value; the FE types.ts union is updated accordingly.

The "context.always" payload key is intentionally left untouched - it's the
patterns-to-promote field, semantically distinct from the decision type.
2026-05-15 14:47:32 +02:00
CREDO23
6671c91841 multi_agent_chat/permissions: persist 'always' decisions to trusted-tools list
Until now an "Always Allow" reply only updated the in-memory runtime
ruleset, evaporating after the session ended. Persist it to the
existing connector.config['trusted_tools'] list so the next session's
fetch_user_allowlist_rulesets picks it up and the user is never asked
again for the same (connector, tool) pair.

- TrustedToolSaver + make_trusted_tool_saver(user_id) in
  user_tool_allowlist: opens its own session via async_session_maker
  per call, logs and swallows failures (in-memory promotion is the
  canonical "always" path, durable persistence is opportunistic).

- PermissionMiddleware._process is now pure: returns
  (state_update, list[_AlwaysPromotion]). aafter_model awaits the
  saver for each promotion; after_model discards them. Promotions are
  only emitted for tools whose metadata exposes mcp_connector_id, so
  native tools and KB FS ops are correctly skipped.

- main_agent factory builds the saver once per turn and stashes it in
  dependencies["trusted_tool_saver"]; pack_subagent and the KB
  middleware stack forward it through build_permission_mw.

- Renamed pm._process(state, None) call sites in two existing tests to
  pm.after_model(state, None) so they exercise the public hook
  contract instead of the now-tuple-returning private method.
2026-05-15 14:07:08 +02:00
Rohan Verma
9475036b8a
Merge pull request #1389 from CREDO23/feature/multi-agent
[Feature] Fix multi-agent delegation: orchestrator-only main agent with knowledge_base specialist
2026-05-15 04:54:17 -07:00
CREDO23
a97d1548a6 multi_agent_chat/permissions: surface MCP tool metadata into ask interrupts
The FE permission card needs mcp_connector_id, mcp_server, and
tool_description in the interrupt context to render "Always Allow"
against the right connected account. Thread the tool through the
ask pipeline:

- pack_subagent → build_permission_mw(tools=...) → PermissionMiddleware
  (tools_by_name) → request_permission_decision(tool=...) →
  build_permission_ask_payload(tool=...) projects card fields out of
  BaseTool.

- mcp_tool.py: stdio path now stashes mcp_connector_id in metadata for
  parity with the HTTP path.
2026-05-15 11:28:06 +02:00
CREDO23
ef1152b80e multi_agent_chat/permissions: layer user allow-list into subagent compile 2026-05-14 21:57:38 +02:00
CREDO23
d45dfbfbd6 multi_agent_chat: pack_subagent owns per-subagent PermissionMiddleware via Ruleset 2026-05-14 20:09:29 +02:00
CREDO23
0723702320 multi_agent_chat: real-graph regressions for unified HITL paths + format pass 2026-05-14 17:41:24 +02:00
CREDO23
a36b15b834 multi_agent_chat/middleware: tighten parallel-keying test with heterogeneous bundles and per-slice assertions 2026-05-14 10:11:51 +02:00
CREDO23
d69d2cc1fc multi_agent_chat/middleware: tighten heterogeneous slice arithmetic to (2,3) bundles 2026-05-14 10:05:04 +02:00
CREDO23
668b89927b multi_agent_chat/middleware: real-graph regression test for partial-pause parallel routing 2026-05-14 09:47:24 +02:00
CREDO23
8e10f38f32 multi_agent_chat/middleware: real-graph regression test for all-reject parallel routing 2026-05-14 09:36:03 +02:00
CREDO23
ca57b2106e multi_agent_chat/middleware: real-graph regression test for heterogeneous parallel decisions 2026-05-14 09:26:08 +02:00
DESKTOP-RTLN3BA\$punk
3737118050 chore: evals 2026-05-13 14:02:26 -07:00
CREDO23
f2495092da chat/stream_resume: salt thinking-step prefix with turn_id to avoid duplicate React keys 2026-05-13 21:15:51 +02:00
CREDO23
0fd87ccb7f chat/stream_resume: key Command(resume=...) by Interrupt.id for parallel HITL 2026-05-13 20:59:57 +02:00
CREDO23
c06dd6e8ba chat/stream_new_chat: emit one SSE frame per pending interrupt 2026-05-13 20:59:48 +02:00
CREDO23
1001f56206 multi_agent_chat/middleware: parallel task tests and full bridge coverage 2026-05-13 19:57:57 +02:00
CREDO23
6fb011c95c multi_agent_chat/middleware: real-graph regression tests for interrupt stamping 2026-05-13 19:57:09 +02:00
CREDO23
fc2c5b6445 multi_agent_chat/middleware: per-call thread_id, tcid-keyed resume, decisions slicer 2026-05-13 19:56:51 +02:00
CREDO23
246dae40a8 Merge upstream/dev into feature/multi-agent 2026-05-12 21:23:37 +02:00
Anish Sarkar
9b926b3133 refactor: update test for index() to use chunk_text_hybrid 2026-05-13 00:22:43 +05:30
CREDO23
d843468256 multi_agent_chat/subagents: dict-keyed middleware_stack + always-on KB 2026-05-12 18:04:54 +02:00
DESKTOP-RTLN3BA\$punk
c8374e6c5b feat: improved document, folder mentions rendering
Some checks are pending
Build and Push Docker Images / tag_release (push) Waiting to run
Build and Push Docker Images / build (./surfsense_backend, ./surfsense_backend/Dockerfile, backend, surfsense-backend, ubuntu-24.04-arm, linux/arm64, arm64) (push) Blocked by required conditions
Build and Push Docker Images / build (./surfsense_backend, ./surfsense_backend/Dockerfile, backend, surfsense-backend, ubuntu-latest, linux/amd64, amd64) (push) Blocked by required conditions
Build and Push Docker Images / build (./surfsense_web, ./surfsense_web/Dockerfile, web, surfsense-web, ubuntu-24.04-arm, linux/arm64, arm64) (push) Blocked by required conditions
Build and Push Docker Images / build (./surfsense_web, ./surfsense_web/Dockerfile, web, surfsense-web, ubuntu-latest, linux/amd64, amd64) (push) Blocked by required conditions
Build and Push Docker Images / create_manifest (backend, surfsense-backend) (push) Blocked by required conditions
Build and Push Docker Images / create_manifest (web, surfsense-web) (push) Blocked by required conditions
2026-05-09 22:15:51 -07:00
Rohan Verma
28a02a9143
Merge pull request #1357 from CREDO23/feature/multi-agent
[Feature] Multi-agent chat: hierarchical timeline, live subagent streaming, and inline HITL approvals
2026-05-09 16:13:04 -07:00
Rohan Verma
fa31da9937
Merge branch 'dev' into feat/e2e-testing 2026-05-09 16:10:45 -07:00
CREDO23
2ab6b1c757 Merge upstream/dev into feature/multi-agent. 2026-05-09 23:00:56 +02:00
Anish Sarkar
de6fc80dbd chore: ran linting 2026-05-09 05:28:09 +05:30
Anish Sarkar
f7bac59a4b test(integration): enhance Drive indexer credential resolution tests for Composio and native connectors 2026-05-09 05:26:36 +05:30
Anish Sarkar
dbf575fbd0 chore: ran linting 2026-05-09 05:16:20 +05:30
CREDO23
1761b60c16 Carry thinkingStepId on tool output and extend builder and parity tests. 2026-05-08 23:17:12 +02:00
CREDO23
007a0a30ec Cover tool_activity_metadata for span-only, step-only, and combined cases. 2026-05-08 23:16:56 +02:00