Commit graph

2229 commits

Author SHA1 Message Date
CREDO23
c3db25302b perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery
The preflight pattern probed the LLM with a 1-token ping before each
cold turn (when requested_llm_config_id==0, llm_config_id<0, and the
45s healthy TTL had expired) to detect 429s before fanning out into
planner/classifier/title-gen. To absorb its ~1-5s RTT cost we built the
agent speculatively in parallel; on 429 we discarded the build and
repinned.

Three problems with that design:

1. False security. Provider rate limits are token-bucket. A 1-token
   ping consumes ~5 tokens; the real request consumes 10-50K. The
   probe can return 200 while the real call still 429s.
2. Pure overhead in the common case. On warm-agent-cache turns the
   probe dominates wall time: ~2.5s of TTFT pure tax for ~99% of users
   who never see a 429.
3. The in-stream recovery loop (catch of _is_provider_rate_limited
   gated by not _first_event_logged) already does the right thing
   reactively: mark_runtime_cooldown -> resolve_or_get_pinned_llm_config_id
   with exclude_config_ids={previous} -> rebuild agent -> retry the
   stream. Preflight was never the only safety net; it was a redundant
   probe in front of one.

Changes:
- Delete _preflight_llm, _settle_speculative_agent_build, and the
  _PREFLIGHT_TIMEOUT_SEC / _PREFLIGHT_MAX_TOKENS constants.
- Drop the parallel agent_build_task / preflight_task plumbing in
  both stream_new_chat and stream_resume_chat; build the agent inline
  with await _build_main_agent_for_thread(...).
- Drop the unused is_recently_healthy / mark_healthy imports here
  (still exported from auto_model_pin_service since OpenRouter
  catalogue refresh and a few tests reference clear_healthy).
- Remove the obsolete preflight + settle-speculative tests from
  test_stream_new_chat_contract.py.

Net: -447 LOC. ~2.5s removed from TTFT on every cold preflight-eligible
turn. 429 recovery path is unchanged - same repin/rebuild/retry, just
not paid in advance on the healthy path.
2026-05-20 11:03:08 +02:00
CREDO23
1791241c0c perf(indexers): offload sync embed_text to thread across background workers
Connector kb_sync_services (gmail, onedrive, google_calendar, jira),
streaming indexers (discord, luma, teams) and the file-processor save
path all called embed_text inside async coroutines, blocking the
background worker's event loop for the duration of the embed. Wrap each
call site in asyncio.to_thread so concurrent indexing tasks stop
serialising on the embed.
2026-05-20 10:09:38 +02:00
CREDO23
a8de98895a perf(revert-service): offload sync embed_texts to thread
_restore_in_place_document and _reinsert_document_from_revision are
async helpers invoked by the synchronous-feeling POST /api/threads/.../revert
route; both ran embed_texts inline, blocking the event loop while the
HTTP client waited.
2026-05-20 10:04:26 +02:00
CREDO23
a3d6fa6196 perf(document-converters): offload sync embed_text/embed_texts to thread
generate_document_summary and create_document_chunks are async helpers
called from the chat path and from many connector indexers. Both wrapped
embed_text/embed_texts directly inside the coroutine, blocking the event
loop for the full duration of the embedding call.
2026-05-20 10:03:42 +02:00
CREDO23
52d425f170 perf(kb-persistence): offload sync embed_texts to thread
_create_document and _update_document run on the chat critical path
when the filesystem subagent writes via the user's chat turn. Both
called embed_texts synchronously inside an async coroutine, blocking
the event loop for the duration of the embed.
2026-05-20 10:03:14 +02:00
CREDO23
4fa85a9a94 perf(kb-search): offload sync embed_texts to thread
embed_texts holds a threading.Lock and runs a sync embedding call inside
search_knowledge_base, an async coroutine on the KB priority middleware
critical path. Blocking the event loop here stalls every other coroutine
on the worker (SSE keepalives, concurrent chat requests, background
tasks). Wrap in asyncio.to_thread so the embed runs on the default
executor pool while the loop keeps serving.
2026-05-20 10:02:38 +02:00
CREDO23
32f6766cb6 fix(tokens): use canonical prompt_tokens_details path for cache fields
LiteLLM normalizes every provider's cache fields onto
usage.prompt_tokens_details (cached_tokens + cache_creation_tokens).
The earlier fallback to usage.cache_read_input_tokens /
usage.cache_creation_input_tokens was wrong: Anthropic-shaped fields
only live there via a trailing setattr loop, and the canonical field
name on the wrapper is cache_creation_tokens (not _input_tokens).
2026-05-20 09:55:39 +02:00
CREDO23
6090980c5e obs(tokens): log prompt-cache read/write counts and hit ratio per LLM call 2026-05-20 09:51:44 +02:00
CREDO23
0cdda14922 perf(kb subagent, desktop): cap evidence.content_excerpt to 500 chars 2026-05-20 09:43:36 +02:00
CREDO23
5edf0520c4 perf(kb subagent, cloud): cap evidence.content_excerpt to 500 chars 2026-05-20 09:43:32 +02:00
CREDO23
b554c600bb perf(research subagent): cap evidence.findings and evidence.sources to bound output 2026-05-20 09:42:57 +02:00
CREDO23
6c173dc2a7 perf(teams subagent): stop echoing raw teams/channels/messages payload into evidence.items 2026-05-20 09:42:03 +02:00
CREDO23
20f7896a99 perf(luma subagent): stop echoing raw events list into evidence.items 2026-05-20 09:41:47 +02:00
CREDO23
f4e66718be perf(discord subagent): stop echoing raw channels/messages payload into evidence.items 2026-05-20 09:41:36 +02:00
CREDO23
56d8ff89e2 perf(airtable subagent): stop echoing raw records list into evidence.items 2026-05-20 09:41:18 +02:00
CREDO23
1b2f13e25c perf(clickup subagent): stop echoing raw tasks list into evidence.items 2026-05-20 09:41:04 +02:00
CREDO23
6be1b22ef6 perf(jira subagent): stop echoing raw issues list into evidence.items 2026-05-20 09:40:48 +02:00
CREDO23
6e5dd54bbf perf(slack subagent): stop echoing raw messages list into evidence.items 2026-05-20 09:40:33 +02:00
CREDO23
d3d396a473 perf(linear subagent): stop echoing raw issues list into evidence.items 2026-05-20 09:40:18 +02:00
CREDO23
553becea28 perf(gmail subagent): stop echoing raw emails array into evidence.items 2026-05-20 09:40:00 +02:00
CREDO23
1481394017 chore(scripts): add MCP session lifetime probe 2026-05-19 21:30:34 +02:00
CREDO23
3a5e16e868 perf(calendar): stop echoing raw events into evidence.items 2026-05-19 21:30:28 +02:00
CREDO23
581bbfb5c1 perf(tokens): add per-call latency to capture log 2026-05-19 21:30:25 +02:00
CREDO23
b3b66e4c48 perf(new-chat): add memory_injection middleware timing log 2026-05-19 21:30:19 +02:00
CREDO23
1df40fbe31 perf(new-chat): add knowledge_tree middleware timing log 2026-05-19 21:30:14 +02:00
CREDO23
bd153d3cdb perf(multi-agent): add kb_context_projection timing log 2026-05-19 21:30:09 +02:00
CREDO23
33bfce4406 perf(subagent): add atask EXIT breakdown timing log 2026-05-19 21:30:05 +02:00
CREDO23
9e81f2a35b perf(subagent): add subagent compile timing log 2026-05-19 21:30:01 +02:00
CREDO23
9bfba34e8e perf(mcp): add per-call, discovery, and oauth-refresh timing logs 2026-05-19 21:29:56 +02:00
Rohan Verma
3c27fe688a
Merge pull request #1390 from AnishSarkar22/fix/backend-tests
fix: unit and integration tests
2026-05-17 18:15:53 -07:00
Anish Sarkar
cb9a0f327c test: refactor Gmail indexer tests to utilize ComposioService and hybrid chunking 2026-05-16 21:26:40 +05:30
Anish Sarkar
a0f2563dc3 test: update Stripe and Google Calendar integration tests to use ComposioService 2026-05-16 21:13:17 +05:30
Anish Sarkar
cc06cff4fb feat(tests): add mock response for file ownership in composio_module 2026-05-16 20:20:04 +05:30
Anish Sarkar
8de7d86d56 Merge remote-tracking branch 'upstream/dev' into fix/backend-tests 2026-05-16 19:40:01 +05:30
Anish Sarkar
af1d2fa430 Merge remote-tracking branch 'upstream/dev' into fix/zero-cache-stale-replica-1355 2026-05-16 19:30:09 +05:30
DESKTOP-RTLN3BA\$punk
9fb9778bd0 test: enhance index batch parallel tests to include hybrid chunker
Updated the test for the indexing pipeline to verify that both the standard and hybrid chunkers are called via asyncio.to_thread, ensuring non-blocking behavior. This change reflects the routing of non-code documents through the hybrid chunker, maintaining the event loop contract.
2026-05-15 18:02:04 -07:00
DESKTOP-RTLN3BA\$punk
c187b04e82 chore: linting 2026-05-15 17:33:44 -07:00
CREDO23
4980f9f1ba Merge remote-tracking branch 'upstream/dev' into feature/multi-agent-with-task-parallelization 2026-05-15 16:44:22 +02:00
CREDO23
a22e0e915f schemas/new_chat: accept 'approve_always' on the resume HTTP boundary
ResumeDecision is the Pydantic gate at the /resume HTTP route. It was
the last spot still rejecting the new wire decision-type, so the FE's
'approve_always' dispatch was being 422'd before it could reach the
permission middleware that already speaks it.
2026-05-15 15:23:39 +02:00
CREDO23
98b6977c68 permissions/ask: gate 'approve_always' palette entry on MCP-ness
Only MCP tools have a persistence target for 'approve_always' (the
connector's trusted-tools list); for native tools the decision lives
only in the in-memory runtime ruleset. Reflect that in the wire palette
so the FE can stay a pure renderer of allowed_decisions instead of
peeking at context.mcp_connector_id to decide whether to show the
'Always Allow' button.

The backend still accepts an 'approve_always' reply for any tool kind
(in-memory promotion is harmless), it just doesn't advertise it when
there's nowhere to persist.
2026-05-15 14:54:16 +02:00
CREDO23
c8b756ae8f hitl/wire: rename 'always' decision-type to 'approve_always'
Renames the SurfSense HITL extension decision-type from "always" to
"approve_always" so it sits in the same verb-first family as "approve",
"reject", and "edit". The Python constant is now SURFSENSE_DECISION_APPROVE_ALWAYS;
the wire value, the permission-domain decision_type, and the FE union members
all match (no wire/internal mismatch).

Both the multi_agent_chat permission middleware and the legacy new_chat one
accept the new wire value; the FE types.ts union is updated accordingly.

The "context.always" payload key is intentionally left untouched - it's the
patterns-to-promote field, semantically distinct from the decision type.
2026-05-15 14:47:32 +02:00
CREDO23
6671c91841 multi_agent_chat/permissions: persist 'always' decisions to trusted-tools list
Until now an "Always Allow" reply only updated the in-memory runtime
ruleset, evaporating after the session ended. Persist it to the
existing connector.config['trusted_tools'] list so the next session's
fetch_user_allowlist_rulesets picks it up and the user is never asked
again for the same (connector, tool) pair.

- TrustedToolSaver + make_trusted_tool_saver(user_id) in
  user_tool_allowlist: opens its own session via async_session_maker
  per call, logs and swallows failures (in-memory promotion is the
  canonical "always" path, durable persistence is opportunistic).

- PermissionMiddleware._process is now pure: returns
  (state_update, list[_AlwaysPromotion]). aafter_model awaits the
  saver for each promotion; after_model discards them. Promotions are
  only emitted for tools whose metadata exposes mcp_connector_id, so
  native tools and KB FS ops are correctly skipped.

- main_agent factory builds the saver once per turn and stashes it in
  dependencies["trusted_tool_saver"]; pack_subagent and the KB
  middleware stack forward it through build_permission_mw.

- Renamed pm._process(state, None) call sites in two existing tests to
  pm.after_model(state, None) so they exercise the public hook
  contract instead of the now-tuple-returning private method.
2026-05-15 14:07:08 +02:00
Rohan Verma
9475036b8a
Merge pull request #1389 from CREDO23/feature/multi-agent
[Feature] Fix multi-agent delegation: orchestrator-only main agent with knowledge_base specialist
2026-05-15 04:54:17 -07:00
Rohan Verma
4db3cf7fd5
Merge pull request #1377 from AnishSarkar22/feat/e2e-testing-ci
feat: add E2E CI and harden Docker build migrations
2026-05-15 04:47:26 -07:00
CREDO23
a97d1548a6 multi_agent_chat/permissions: surface MCP tool metadata into ask interrupts
The FE permission card needs mcp_connector_id, mcp_server, and
tool_description in the interrupt context to render "Always Allow"
against the right connected account. Thread the tool through the
ask pipeline:

- pack_subagent → build_permission_mw(tools=...) → PermissionMiddleware
  (tools_by_name) → request_permission_decision(tool=...) →
  build_permission_ask_payload(tool=...) projects card fields out of
  BaseTool.

- mcp_tool.py: stdio path now stashes mcp_connector_id in metadata for
  parity with the HTTP path.
2026-05-15 11:28:06 +02:00
CREDO23
ef1152b80e multi_agent_chat/permissions: layer user allow-list into subagent compile 2026-05-14 21:57:38 +02:00
CREDO23
e99c06c887 user_tool_allowlist: extract trust-tool storage into reusable service 2026-05-14 21:20:30 +02:00
CREDO23
31d6b43a42 multi_agent_chat/shared: drop bucket types and helpers 2026-05-14 20:10:25 +02:00
CREDO23
014801c764 multi_agent_chat/loader: MCP tools as flat list[BaseTool] per agent 2026-05-14 20:10:11 +02:00
CREDO23
5a00df8e48 multi_agent_chat/builtins: KB+deliverables+memory+research adopt RULESET + flat load_tools() 2026-05-14 20:09:55 +02:00