Commit graph

738 commits

Author SHA1 Message Date
DESKTOP-RTLN3BA\$punk
0fe650fd8e Merge commit '7ce409c580' into dev 2026-06-16 22:48:14 -07:00
DESKTOP-RTLN3BA\$punk
b9702b3245 chore: linting 2026-06-16 16:27:16 -07:00
DESKTOP-RTLN3BA\$punk
da64433439 fix(db): reap orphaned idle-in-transaction sessions on the Celery engine
The long-running ingestion/podcast/video tasks run on a separate Celery
engine (NullPool), so the web engine's idle_in_transaction_session_timeout
did not cover them — which is exactly where the original 11h zombie
(INSERT INTO chunks) came from. Apply the same protection to the Celery
engine with a generous 60-minute default so a worker that hangs/crashes
mid-transaction can't hold locks on documents/chunks indefinitely, while
never reaping a legitimate per-document embed window.

- config + .env.example: DB_CELERY_IDLE_IN_TX_TIMEOUT_MS (default 3600000).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-16 16:26:04 -07:00
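A minimal sketch of how the Celery-engine setting could be read and applied. It assumes the engine uses SQLAlchemy with the asyncpg driver, where PostgreSQL session settings go through `connect_args={"server_settings": {...}}`. The helper names are hypothetical; only `DB_CELERY_IDLE_IN_TX_TIMEOUT_MS` and its 3600000 ms (60 min) default come from the commit.

```python
import os

# Default mirrors the commit: 60 minutes expressed in milliseconds.
DEFAULT_CELERY_IDLE_IN_TX_TIMEOUT_MS = 60 * 60 * 1000  # 3_600_000


def celery_idle_in_tx_timeout_ms(environ=None) -> int:
    """Read DB_CELERY_IDLE_IN_TX_TIMEOUT_MS, falling back to the 60-minute default."""
    env = os.environ if environ is None else environ
    raw = env.get("DB_CELERY_IDLE_IN_TX_TIMEOUT_MS", "")
    return int(raw) if raw.strip() else DEFAULT_CELERY_IDLE_IN_TX_TIMEOUT_MS


def celery_connect_args(environ=None) -> dict:
    """Build asyncpg-style connect_args (assumption: asyncpg driver).

    PostgreSQL interprets a unitless idle_in_transaction_session_timeout as ms.
    """
    timeout = celery_idle_in_tx_timeout_ms(environ)
    return {"server_settings": {"idle_in_transaction_session_timeout": str(timeout)}}


# Hypothetical wiring (not executed here; needs SQLAlchemy and a database):
# from sqlalchemy.ext.asyncio import create_async_engine
# from sqlalchemy.pool import NullPool
# celery_engine = create_async_engine(
#     DATABASE_URL, poolclass=NullPool, connect_args=celery_connect_args()
# )
```

Because NullPool opens a fresh connection per checkout, setting the timeout at connect time covers every session the Celery engine creates.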
CREDO23
32a6e54ce6 Merge remote-tracking branch 'upstream/dev' into features/documents-injestion-layered-cached 2026-06-14 11:30:33 +02:00
Anish Sarkar
c7409c8995 chore: ran linting 2026-06-13 21:59:35 +05:30
Anish Sarkar
ab5423d2d2 Merge remote-tracking branch 'upstream/dev' into feat/unified-model-connections 2026-06-13 19:04:49 +05:30
Anish Sarkar
8fe9c21e76 feat(token-tracking): add model metadata registration and enhance token usage tracking 2026-06-13 03:08:35 +05:30
CREDO23
5a71769dba fix(chunks): set position on remaining chunk insert paths
document_converters, the github size-fallback chunker, revert_service
restores, and the kb-persistence middleware now write explicit positions
(the middleware read path also orders by position).
2026-06-12 18:53:08 +02:00
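A small sketch of the idea behind this fix: assign positions explicitly on every insert path and order by them on read, instead of relying on insertion or id order. `ChunkRow`, `build_chunk_rows`, and the 0-based numbering are hypothetical, not the project's actual model.

```python
from dataclasses import dataclass


@dataclass
class ChunkRow:
    document_id: int
    content: str
    position: int  # explicit 0-based order within the document (assumption)


def build_chunk_rows(document_id: int, texts: list[str]) -> list[ChunkRow]:
    """Write path: every chunk gets its position at insert time."""
    return [ChunkRow(document_id, text, pos) for pos, text in enumerate(texts)]


def read_document_text(rows: list[ChunkRow]) -> str:
    """Read path: order by position (SQL equivalent: ORDER BY position)."""
    return "\n".join(r.content for r in sorted(rows, key=lambda r: r.position))
```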
CREDO23
0fb1d3d37b feat(etl-cache): route all file-based sources through the parse cache
Every file ingestion path (Dropbox, Google Drive / Composio Drive, OneDrive,
local folder, Obsidian, and the legacy upload handlers) now parses via the
extract_with_cache facade instead of calling EtlPipelineService.extract
directly, so identical bytes are deduplicated globally regardless of source.
vision_llm is passed through, keeping the existing cacheability gate intact.
2026-06-12 14:47:25 +02:00
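A minimal in-memory sketch of global, content-addressed deduplication: the cache key is a SHA-256 of the raw bytes, so the same file reached through Dropbox, Drive, or a local folder parses once. `ParseCache` is hypothetical. The commit does not describe the real cacheability gate, so the rule below (bypass the cache when a vision LLM is passed) is only a stand-in.

```python
import hashlib
from typing import Callable, Optional


class ParseCache:
    """In-memory stand-in for a global parse cache keyed by file content."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.misses = 0

    def extract_with_cache(
        self,
        data: bytes,
        extract: Callable[[bytes], str],
        vision_llm: Optional[object] = None,
    ) -> str:
        # Stand-in cacheability gate (assumption): output produced with a
        # vision LLM may depend on the model, so it is not cached here.
        if vision_llm is not None:
            return extract(data)
        key = hashlib.sha256(data).hexdigest()  # same bytes -> same key, any source
        if key not in self._store:
            self.misses += 1
            self._store[key] = extract(data)
        return self._store[key]
```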
CREDO23
0dc2ccc003 feat(tasks): route extraction through etl cache 2026-06-12 11:23:50 +02:00
DESKTOP-RTLN3BA\$punk
c855be8ccd fix(auto_reload): update task to use a lambda for user_id in async call 2026-06-11 16:51:18 -07:00
Anish Sarkar
8e8cf96faa feat(error-handling): implement LLM error adaptation and classification for chat streaming
- Introduced LLMErrorCategory and adapt_llm_exception to normalize LLM exceptions.
- Updated llm_retryable_message and llm_permanent_message to utilize the new adaptation logic.
- Enhanced classify_stream_exception to classify provider errors and return user-friendly messages.
- Added tests for error classification and adaptation to ensure robustness.
- Updated frontend error handling to display appropriate messages based on new classifications.
2026-06-12 05:03:14 +05:30
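A sketch of the normalize-then-classify pattern this commit describes. It reuses the commit's names for readability, but the enum members, the status-code and message heuristics, and the `(category, retryable, message)` return shape are all assumptions, not the project's implementation.

```python
from enum import Enum


class LLMErrorCategory(Enum):
    # Hypothetical members; the commit names the enum but not its values.
    RATE_LIMIT = "rate_limit"
    AUTH = "auth"
    CONTEXT_LENGTH = "context_length"
    TRANSIENT = "transient"
    UNKNOWN = "unknown"


RETRYABLE = {LLMErrorCategory.RATE_LIMIT, LLMErrorCategory.TRANSIENT}

USER_MESSAGES = {
    LLMErrorCategory.RATE_LIMIT: "The model provider is rate-limiting requests. Please retry shortly.",
    LLMErrorCategory.AUTH: "The model provider rejected the credentials. Check your model connection.",
    LLMErrorCategory.CONTEXT_LENGTH: "The conversation is too long for this model.",
    LLMErrorCategory.TRANSIENT: "The model provider had a temporary problem. Please retry.",
    LLMErrorCategory.UNKNOWN: "Something went wrong while generating a response.",
}


def adapt_llm_exception(exc: BaseException) -> LLMErrorCategory:
    """Map a provider exception to a category via HTTP status or message text."""
    status = getattr(exc, "status_code", None)
    text = str(exc).lower()
    if status == 429 or "rate limit" in text:
        return LLMErrorCategory.RATE_LIMIT
    if status in (401, 403):
        return LLMErrorCategory.AUTH
    if "context length" in text or "maximum context" in text:
        return LLMErrorCategory.CONTEXT_LENGTH
    if isinstance(exc, (TimeoutError, ConnectionError)) or (status is not None and status >= 500):
        return LLMErrorCategory.TRANSIENT
    return LLMErrorCategory.UNKNOWN


def classify_stream_exception(exc: BaseException) -> tuple[LLMErrorCategory, bool, str]:
    """Return (category, retryable, user-facing message) for a streaming error."""
    category = adapt_llm_exception(exc)
    return category, category in RETRYABLE, USER_MESSAGES[category]
```

Centralizing the mapping means the retryable/permanent message helpers and the frontend only ever see a small, fixed set of categories.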
DESKTOP-RTLN3BA\$punk
05190da0a9 chore: linting 2026-06-11 15:31:43 -07:00
Anish Sarkar
908790e40f Merge remote-tracking branch 'upstream/dev' into feat/unified-model-connections 2026-06-12 03:15:28 +05:30
CREDO23
41f4a58663 Merge remote-tracking branch 'upstream/dev' into improvement-podcast-graph
# Conflicts:
#	surfsense_backend/app/tasks/celery_tasks/podcast_tasks.py
2026-06-11 23:14:49 +02:00
Anish Sarkar
3dd54230e7 fix(chat): normalize provider-safe message history 2026-06-12 02:17:37 +05:30
Anish Sarkar
5d5d574550 refactor(model-connections): move backend model connections to provider capabilities 2026-06-12 02:17:22 +05:30
Anish Sarkar
c28c4f5785 feat(chat): route models by provider capabilities 2026-06-11 18:22:23 +05:30
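A sketch of capability-based routing, assuming each model connection advertises a set of capabilities and a request declares what it needs. `ModelConnection`, `route_model`, and the capability strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelConnection:
    name: str
    provider: str
    capabilities: frozenset = field(default_factory=frozenset)


def route_model(connections, required: set[str]) -> ModelConnection:
    """Pick the first connection whose capabilities cover the request."""
    for conn in connections:
        if required <= conn.capabilities:  # subset check
            return conn
    raise LookupError(f"no model connection supports {sorted(required)}")
```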
CREDO23
eb56acc407 refactor(podcasts): regenerate via brief gate, render brief inline in chat 2026-06-11 11:45:17 +02:00
DESKTOP-RTLN3BA\$punk
a7407502d3 feat(refactor): refactor payment system to implement unified credit wallet.
- Updated environment variables and configurations for credit purchases via Stripe, replacing the legacy page-pack system.
- Introduced auto-reload feature for credit top-ups and modified database models to track credit transactions.
- Updated notification system to handle insufficient credits and auto-reload failures.
- Adjusted API routes and schemas to reflect changes in credit management.
2026-06-10 16:49:03 -07:00
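A minimal sketch of a credit wallet with auto-reload and a transaction ledger. The field names, the threshold rule, and the `charge_card` callback (standing in for a Stripe charge) are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CreditWallet:
    balance: int
    auto_reload_enabled: bool = False
    reload_threshold: int = 0  # reload when balance would drop below this
    reload_amount: int = 0     # credits purchased per auto-reload
    transactions: list = field(default_factory=list)  # (kind, delta) ledger

    def _record(self, kind: str, delta: int) -> None:
        self.balance += delta
        self.transactions.append((kind, delta))

    def debit(self, amount: int, charge_card=lambda credits: True) -> bool:
        """Spend credits, auto-reloading first if enabled and needed.

        charge_card is a hypothetical payment callback returning True on
        success. Returns False when credits are insufficient.
        """
        if self.auto_reload_enabled and self.balance - amount < self.reload_threshold:
            if charge_card(self.reload_amount):
                self._record("auto_reload", self.reload_amount)
            # On failure, the caller would send an "auto-reload failed" notification.
        if amount > self.balance:
            return False  # caller would send an "insufficient credits" notification
        self._record("usage", -amount)
        return True
```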
CREDO23
97ab7a88fd refactor(podcasts): remove legacy podcaster agent, task, and schema 2026-06-10 21:45:04 +02:00
CREDO23
3eb7cdb2d8 refactor(podcasts): gate chat-triggered podcast on brief review 2026-06-10 21:44:50 +02:00
CREDO23
ba687813c1 fix(elasticsearch): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
c26181d086 fix(airtable): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
e3afe9d7c7 fix(luma): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
8191118eb4 fix(bookstack): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
45438249b6 fix(clickup): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
f5dd8f3985 fix(github): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
f085ac59e5 fix(teams): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
791b0afe16 fix(discord): commit failed status immediately 2026-06-10 00:10:52 +02:00
CREDO23
be8a3bcd00 fix(slack): commit failed status immediately 2026-06-10 00:10:52 +02:00
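Why committing the failed status right away matters, shown with the standard-library `sqlite3` module standing in for the project's database session (an assumption; the connectors presumably use an ORM session). Once the status is committed in its own transaction, a later rollback in the same session cannot revert it and leave the document stuck.

```python
import sqlite3


def mark_failed_and_commit(conn: sqlite3.Connection, doc_id: int, reason: str) -> None:
    """Persist the failed status in its own commit so later rollbacks can't undo it."""
    conn.execute(
        "UPDATE documents SET status = 'failed', reason = ? WHERE id = ?",
        (reason, doc_id),
    )
    conn.commit()


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, status TEXT, reason TEXT)")
    conn.execute("INSERT INTO documents VALUES (1, 'processing', NULL)")
    conn.commit()

    mark_failed_and_commit(conn, 1, "ETL failed")
    # Later work in the same session fails and is rolled back...
    conn.execute("UPDATE documents SET reason = 'partial write' WHERE id = 1")
    conn.rollback()
    # ...but the failed status committed above survives.
    return conn.execute("SELECT status, reason FROM documents WHERE id = 1").fetchone()
```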
CREDO23
c47949791b fix(confluence): fail skipped placeholders so they don't stay pending 2026-06-10 00:10:42 +02:00
CREDO23
d70d01f331 fix(linear): fail skipped placeholders so they don't stay pending 2026-06-10 00:10:42 +02:00
CREDO23
1b0912aaa3 fix(calendar): fail skipped placeholders so they don't stay pending 2026-06-10 00:10:42 +02:00
CREDO23
b2c2fc9c2e fix(gmail): fail skipped placeholders so they don't stay pending 2026-06-10 00:10:42 +02:00
CREDO23
90b32a8880 fix(notion): fail skipped placeholders so they don't stay pending 2026-06-10 00:10:42 +02:00
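A sketch of the skip-path pattern these fixes target, assuming each item gets a placeholder in `pending` before processing. In this sketch, a bare `continue` on the skip branch would leave the placeholder pending forever; marking it failed first closes that gap. `index_items` and `is_supported` are hypothetical.

```python
def index_items(items, placeholders: dict[str, str], is_supported) -> None:
    """Process connector items; skipped items must not leave a pending placeholder."""
    for item_id in items:
        if not is_supported(item_id):
            placeholders[item_id] = "failed"  # finalize before skipping
            continue
        placeholders[item_id] = "ready"  # stand-in for real indexing
```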
CREDO23
33300e4faa fix(dropbox): sanitize ETL reason and retry stuck pending/processing files 2026-06-10 00:10:25 +02:00
CREDO23
464e7d4554 fix(onedrive): sanitize ETL reason and retry stuck pending/processing files 2026-06-10 00:10:25 +02:00
CREDO23
c0c5f3414e fix(google-drive): sanitize ETL reason and retry stuck pending/processing files 2026-06-10 00:10:25 +02:00
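A sketch of the two behaviours named in these commits. The commits do not say what "sanitize" covers, so this version only strips control characters, collapses whitespace, and truncates to a hypothetical cap. It also treats files left in `pending`/`processing` by a crashed run as retry candidates.

```python
import re

MAX_REASON_LEN = 500  # hypothetical cap


def sanitize_etl_reason(raw: str, max_len: int = MAX_REASON_LEN) -> str:
    """Strip control characters, collapse whitespace, and truncate for display."""
    text = re.sub(r"[\x00-\x1f\x7f]+", " ", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text if len(text) <= max_len else text[: max_len - 1] + "…"


def needs_retry(status: str) -> bool:
    """Files stuck in pending/processing are retried on the next sync."""
    return status in {"pending", "processing"}
```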
CREDO23
e45e8389dc fix(dropbox): mark documents failed on ETL failure 2026-06-09 23:39:25 +02:00
CREDO23
82aaaa5a9f fix(onedrive): mark documents failed on ETL failure 2026-06-09 23:39:25 +02:00
CREDO23
6fd95f82b4 fix(google-drive): mark placeholders failed on ETL failure 2026-06-09 23:39:25 +02:00
CREDO23
cb10882dc8 feat(indexers): add mark_connector_documents_failed helper 2026-06-09 23:39:25 +02:00
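A hypothetical shape for a shared "mark failed" helper; the real signature of `mark_connector_documents_failed` is not shown in the log. This sketch updates a batch in one pass, leaves documents that already reached a terminal state alone, and reports how many rows changed.

```python
from typing import Iterable

TERMINAL = {"ready", "failed"}  # assumed terminal statuses


def mark_connector_documents_failed(docs: dict[int, dict], doc_ids: Iterable[int], reason: str) -> int:
    """Mark non-terminal documents failed; returns how many were changed."""
    changed = 0
    for doc_id in doc_ids:
        doc = docs.get(doc_id)
        if doc is None or doc["status"] in TERMINAL:
            continue
        doc["status"] = "failed"
        doc["reason"] = reason
        changed += 1
    return changed
```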
DESKTOP-RTLN3BA\$punk
ce952d2ad1 chore: linting 2026-06-09 00:42:26 -07:00
DESKTOP-RTLN3BA\$punk
640ef5f15d feat(proxy): integrate Scrapling for enhanced web scraping capabilities
- Replaced Playwright with Scrapling's fetchers in the web crawling and YouTube processing modules for improved performance and flexibility.
- Updated proxy configuration to support dynamic proxy selection via environment variables.
- Enhanced logging to track performance metrics during web scraping operations.
- Refactored related modules to utilize the new proxy utilities and streamline the scraping process.
2026-06-09 00:15:10 -07:00
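A sketch of environment-driven proxy selection, deliberately independent of Scrapling's API. The variable name `SCRAPER_PROXY_URLS`, the comma-separated format, and round-robin rotation are assumptions; the selected URL would be handed to whichever proxy option the fetcher accepts.

```python
import itertools
import os
from typing import Iterator, Optional


def proxy_pool_from_env(environ=None, var: str = "SCRAPER_PROXY_URLS") -> Optional[Iterator[str]]:
    """Parse a comma-separated proxy list; return a round-robin iterator, or None."""
    env = os.environ if environ is None else environ
    proxies = [p.strip() for p in env.get(var, "").split(",") if p.strip()]
    return itertools.cycle(proxies) if proxies else None
```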
CREDO23
8bdfd00a15 Merge upstream/dev 2026-06-05 19:18:12 +02:00
CREDO23
0081b627e9 refactor(agents): move kb_persistence middleware into main_agent (owner)
The KB-persistence impl lived in shared/middleware/ but no subagent uses it --
consumers are the main_agent builder and the boundary event loop. Colocate with
its owner using the folder-per-middleware shape; __init__ re-exports the public
surface. Tests that reached module internals now alias the .middleware submodule.

  main_agent/middleware/kb_persistence.py -> kb_persistence/builder.py
  shared/middleware/kb_persistence.py     -> kb_persistence/middleware.py
2026-06-05 14:11:55 +02:00
CREDO23
a7a642fedc refactor(agents): move busy_mutex middleware into main_agent (owner)
The busy-mutex impl (BusyMutexMiddleware + cancel/turn-lifecycle primitives)
lived in shared/middleware/ but no subagent uses it -- consumers are the
main_agent builder and the boundary (turn lifecycle). Colocate with its owner
using the folder-per-middleware shape; __init__ re-exports the public surface so
boundary import sites only change package path:

  main_agent/middleware/busy_mutex.py    -> busy_mutex/builder.py
  shared/middleware/busy_mutex.py        -> busy_mutex/middleware.py
2026-06-05 14:08:45 +02:00
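A self-contained illustration of the folder-per-middleware shape: `middleware.py` holds the implementation, `builder.py` constructs it, and `__init__.py` re-exports the public surface so callers import from the package path only. The file contents and `build_busy_mutex_middleware` are hypothetical; the demo writes the package to a temp directory and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents mirroring the layout in the commit message.
FILES = {
    "busy_mutex/__init__.py": (
        "from .builder import build_busy_mutex_middleware\n"
        "from .middleware import BusyMutexMiddleware\n"
        "__all__ = ['BusyMutexMiddleware', 'build_busy_mutex_middleware']\n"
    ),
    "busy_mutex/middleware.py": "class BusyMutexMiddleware:\n    pass\n",
    "busy_mutex/builder.py": (
        "from .middleware import BusyMutexMiddleware\n\n"
        "def build_busy_mutex_middleware():\n"
        "    return BusyMutexMiddleware()\n"
    ),
}


def load_demo_package():
    """Write the package to a temp dir and import it via its public path."""
    root = Path(tempfile.mkdtemp())
    for rel, src in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("busy_mutex")
```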
CREDO23
f2a61bc0ef refactor(agents): consolidate chat runtime infra under chat/runtime
Move the lower-level runtime/infra modules out of multi_agent_chat/shared/
(they were never used by subagents, so they failed the shared-by-all-siblings
rule) and unify them with the already-relocated checkpointer:

  agents/runtime/                      -> agents/chat/runtime/
  mac/shared/errors.py                 -> chat/runtime/errors.py
  mac/shared/llm_config.py             -> chat/runtime/llm_config.py
  mac/shared/prompt_caching.py         -> chat/runtime/prompt_caching.py
  mac/shared/mention_resolver.py       -> chat/runtime/mention_resolver.py
  mac/shared/path_resolver.py          -> chat/runtime/path_resolver.py

These sit below the agent packages: the boundary + agent factory + shared
middleware depend on them, and they import no agent code (acyclic).
2026-06-05 13:19:24 +02:00
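The "acyclic" claim can be checked mechanically. Below is a depth-first cycle check over a hypothetical module import graph modeled on the commit's description (edges point from importer to imported; `mac` abbreviates `multi_agent_chat` as in the commit). The graph is illustrative, not extracted from the codebase.

```python
from typing import Dict, List, Optional, Set

LAYERS: Dict[str, Set[str]] = {
    "chat.boundary": {"chat.agent_factory", "chat.runtime"},
    "chat.agent_factory": {"mac.shared.middleware", "chat.runtime"},
    "mac.shared.middleware": {"chat.runtime"},
    "chat.runtime": set(),  # imports no agent code
}


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search; returns one import cycle as a path, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY  # on the current DFS path
        stack.append(node)
        for dep in sorted(graph.get(node, ())):
            if color.get(dep, WHITE) == GRAY:  # back edge -> cycle
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK  # fully explored
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```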
CREDO23
7d866a2279 refactor(agents): sink sandbox.py into filesystem subsystem
shared/sandbox.py was used only by the filesystem middleware/tools (and the
boundary) -- never by main_agent or subagents as shared code. Move it next to
its only agent-side consumer:

  multi_agent_chat/shared/sandbox.py
  -> multi_agent_chat/shared/middleware/filesystem/sandbox.py
2026-06-05 13:15:57 +02:00