Commit graph

175 commits

Author SHA1 Message Date
Rohan Verma
b4af67f77d
Merge pull request #1540 from DhruvTilva/fix/table-element-text-as-html-keyerror
fix: handle missing text_as_html metadata for Unstructured table elements
2026-06-25 13:35:41 -07:00
DhruvTilva
e16e4e2c5c fix: guard missing text_as_html in Table element markdown conversion
When the Unstructured API returns a Table element without text_as_html
in its metadata (e.g. local install or free-tier API), the lambda was
raising KeyError: 'text_as_html', crashing the entire document
indexing pipeline for any file containing tables.

Guard the key access with .get() and fall back to the plain extracted
text content (x) so the pipeline continues and the table content is
still indexed, just without HTML formatting.
2026-06-25 23:52:15 +05:30
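A minimal sketch of the guard described above. It assumes an Unstructured-style element represented as a dict with "type", "text", and an optional "metadata" mapping. The function name `table_to_markdown_input` is hypothetical, and the pipeline's actual lambda may be shaped differently.

```python
from typing import Any, Mapping


def table_to_markdown_input(element: Mapping[str, Any]) -> str:
    """Return HTML for a Table element when present, else its plain text.

    Hypothetical helper: assumes an Unstructured-style element dict with a
    "text" key and an optional "metadata" mapping.
    """
    text = element.get("text", "")
    if element.get("type") != "Table":
        return text
    metadata = element.get("metadata") or {}
    # .get() avoids KeyError when text_as_html is absent (e.g. local install);
    # an empty or missing value falls back to the plain extracted text.
    return metadata.get("text_as_html") or text
```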
Anish Sarkar
2e33ba7723 chore: fix linting 2026-06-25 04:31:36 +05:30
Anish Sarkar
d6bffa6f07 chore: fix linting 2026-06-25 04:31:22 +05:30
Anish Sarkar
7d4c994900 refactor(blocknote): enhance inline content rendering by incorporating inherited styles 2026-06-25 04:19:21 +05:30
Anish Sarkar
bf8772e312 Merge remote-tracking branch 'upstream/dev' into feat/auth-revamp 2026-06-23 15:52:11 +05:30
Anish Sarkar
3695e1d5c5 Merge remote-tracking branch 'upstream/dev' into feat/api-key 2026-06-23 13:09:53 +05:30
Anish Sarkar
08c1d12eb1 fix(authz): publish zero parent tables 2026-06-23 12:53:36 +05:30
Anish Sarkar
5ba940f905 fix(auth): harden refresh token schema 2026-06-23 12:49:46 +05:30
DESKTOP-RTLN3BA\$punk
a08de01cc7 Revert "Merge pull request #1523 from CREDO23/fix/chat-citations"
This reverts commit cd2242147a, reversing
changes made to a4bb0a5253.
2026-06-22 22:55:29 -07:00
Anish Sarkar
fd31ac34fd Merge remote-tracking branch 'upstream/dev' into feat/api-key 2026-06-20 10:50:03 +05:30
Anish Sarkar
7e8d26fa81 refactor: route authorization through auth context 2026-06-19 20:27:28 +05:30
Anish Sarkar
cddfb3660b feat: resolve auth context from sessions and PATs 2026-06-19 20:26:46 +05:30
CREDO23
0f32b35d3e feat: add char-span to line-range helper 2026-06-19 14:53:49 +02:00
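The commit subject does not show the helper's signature. The following hypothetical sketch shows one way to map a half-open character span to 1-based inclusive line numbers by bisecting over the offsets where each line starts. `char_span_to_line_range` is an illustrative name, not the project's.

```python
import bisect


def char_span_to_line_range(text: str, start: int, end: int) -> tuple[int, int]:
    """Map the half-open character span [start, end) to 1-based inclusive lines.

    Hypothetical helper for illustration only.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("span out of bounds")
    # Offset at which each line begins: line 1 at 0, then one past every "\n".
    line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
    first = bisect.bisect_right(line_starts, start)
    # Use the last character actually covered; an empty span maps to its start line.
    last_char = max(start, end - 1)
    last = bisect.bisect_right(line_starts, last_char)
    return first, last
```

A trailing newline inside the span stays on the line it terminates, so the span for "ab\n" in "ab\ncd" maps to line 1 only.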
CREDO23
32a6e54ce6 Merge remote-tracking branch 'upstream/dev' into features/documents-injestion-layered-cached 2026-06-14 11:30:33 +02:00
CREDO23
5a71769dba fix(chunks): set position on remaining chunk insert paths
document_converters, the github size-fallback chunker, revert_service
restores, and the kb-persistence middleware now write explicit positions
(the middleware read path also orders by position).
2026-06-12 18:53:08 +02:00
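The following sketch shows the explicit-position pattern, using an in-memory SQLite table as a stand-in. The `chunks` schema and both helper names are assumptions, not the project's models. The point is that every write path stores a position, and the read path sorts by it rather than by insertion order.

```python
import sqlite3


def insert_chunks(conn: sqlite3.Connection, document_id: int, chunks: list[str]) -> None:
    # Write an explicit 0-based position for every chunk instead of relying
    # on insertion order or autoincrement ids.
    conn.executemany(
        "INSERT INTO chunks (document_id, position, content) VALUES (?, ?, ?)",
        [(document_id, pos, text) for pos, text in enumerate(chunks)],
    )


def read_chunks(conn: sqlite3.Connection, document_id: int) -> list[str]:
    # Order by position so reassembly is deterministic even after restores.
    rows = conn.execute(
        "SELECT content FROM chunks WHERE document_id = ? ORDER BY position",
        (document_id,),
    )
    return [content for (content,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER, "
    "position INTEGER, content TEXT)"
)
insert_chunks(conn, 1, ["intro", "body", "conclusion"])
print(read_chunks(conn, 1))
```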
Anish Sarkar
3dd54230e7 fix(chat): normalize provider-safe message history 2026-06-12 02:17:37 +05:30
DESKTOP-RTLN3BA\$punk
ce952d2ad1 chore: linting 2026-06-09 00:42:26 -07:00
DESKTOP-RTLN3BA\$punk
640ef5f15d feat(proxy): integrate Scrapling for enhanced web scraping capabilities
- Replaced Playwright with Scrapling's fetchers in the web crawling and YouTube processing modules for improved performance and flexibility.
- Updated proxy configuration to support dynamic proxy selection via environment variables.
- Enhanced logging to track performance metrics during web scraping operations.
- Refactored related modules to utilize the new proxy utilities and streamline the scraping process.
2026-06-09 00:15:10 -07:00
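The message does not name the environment variables involved. Assuming a hypothetical comma-separated `PROXY_URLS` variable, per-request proxy selection could look like the sketch below. It is independent of Scrapling's own API; how the chosen URL is passed to a fetcher is not shown here.

```python
import os
import random
from collections.abc import Mapping
from typing import Optional


def select_proxy(
    env: Optional[Mapping[str, str]] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return one proxy URL chosen from PROXY_URLS, or None to connect directly.

    PROXY_URLS (comma-separated) is a hypothetical variable name for illustration.
    """
    source = os.environ if env is None else env
    raw = source.get("PROXY_URLS", "")
    proxies = [p.strip() for p in raw.split(",") if p.strip()]
    if not proxies:
        return None
    # Pick per call so successive requests can rotate across proxies.
    return (rng or random).choice(proxies)
```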
Anish Sarkar
81fa219b30 feat(backend): Remove LLM summaries from document indexing 2026-06-04 00:50:19 +05:30
DESKTOP-RTLN3BA\$punk
40ca9e6ed2 refactor: remove search_surfsense_docs tool and related references
- Deleted the `search_surfsense_docs` tool and its associated files, streamlining the agent's toolset.
- Updated various components and prompts to remove references to the now-removed tool, ensuring consistency across the codebase.
- Adjusted documentation to direct users to the SurfSense documentation link for product-related queries instead.
2026-05-28 22:35:14 -07:00
DESKTOP-RTLN3BA\$punk
9d6e9b7e2d feat: enhance task management and timeout configurations in multi-agent chat
- Added new environment variables for controlling task execution limits, including `SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS`, `SURFSENSE_TASK_BATCH_CONCURRENCY`, and `SURFSENSE_TASK_BATCH_MAX_SIZE`.
- Updated documentation to reflect new batch processing capabilities for `task` calls, allowing for concurrent execution of multiple subagent tasks.
- Improved error handling and receipt generation for deliverables, ensuring consistent feedback on task status.
- Refactored middleware to incorporate search space ID for better task management.
2026-05-27 14:58:10 -07:00
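The sketch below shows how the three variables could bound a batch of `task` calls with asyncio. It uses a size check, a semaphore to cap concurrency, and a per-call timeout, and it returns one receipt per task in input order. The variable names come from the commit; the default values, `run_task_batch`, and the receipt shape are assumptions.

```python
import asyncio
import os
from typing import Awaitable, Callable, Sequence

# Variable names come from the commit; the defaults here are assumptions.
INVOKE_TIMEOUT = float(os.environ.get("SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS", "120"))
BATCH_CONCURRENCY = int(os.environ.get("SURFSENSE_TASK_BATCH_CONCURRENCY", "4"))
BATCH_MAX_SIZE = int(os.environ.get("SURFSENSE_TASK_BATCH_MAX_SIZE", "8"))


async def run_task_batch(
    tasks: Sequence[Callable[[], Awaitable[str]]],
    timeout: float = INVOKE_TIMEOUT,
    concurrency: int = BATCH_CONCURRENCY,
    max_size: int = BATCH_MAX_SIZE,
) -> list[dict]:
    """Run subagent calls concurrently; return one receipt per task, in order."""
    if len(tasks) > max_size:
        raise ValueError(f"batch of {len(tasks)} exceeds max size {max_size}")
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(index: int, task: Callable[[], Awaitable[str]]) -> dict:
        async with semaphore:  # at most `concurrency` calls in flight
            try:
                result = await asyncio.wait_for(task(), timeout=timeout)
                return {"index": index, "status": "ok", "result": result}
            except asyncio.TimeoutError:
                return {"index": index, "status": "timeout", "result": None}
            except Exception as exc:  # report failures as receipts, not crashes
                return {"index": index, "status": "error", "result": str(exc)}

    return list(await asyncio.gather(*(run_one(i, t) for i, t in enumerate(tasks))))
```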
Anish Sarkar
6095b48b5f feat(observability): add SurfSense metric helpers 2026-05-21 23:02:20 +05:30
CREDO23
d5ee8cc4cd Merge remote-tracking branch 'upstream/dev' into improvement-agent-speed 2026-05-20 19:22:49 +02:00
CREDO23
a3d6fa6196 perf(document-converters): offload sync embed_text/embed_texts to thread
generate_document_summary and create_document_chunks are async helpers
called from the chat path and from many connector indexers. Both called
embed_text/embed_texts directly inside the coroutine, blocking the event
loop for the full duration of the embedding call. Those calls now run in
a worker thread instead.
2026-05-20 10:03:42 +02:00
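The following minimal sketch shows the fix named in the subject line, using `asyncio.to_thread` (Python 3.9+). Here `embed_texts` is a blocking stand-in, and `create_document_chunks_sketch` is hypothetical; the project's helper takes different arguments.

```python
import asyncio
import time


def embed_texts(texts: list[str]) -> list[list[float]]:
    # Hypothetical stand-in for a blocking, model-backed embedding call.
    time.sleep(0.05)
    return [[float(len(t))] for t in texts]


async def create_document_chunks_sketch(chunks: list[str]) -> list[tuple[str, list[float]]]:
    # Run the synchronous embedder in a worker thread so the event loop can
    # keep serving other coroutines while the embedding call is in progress.
    vectors = await asyncio.to_thread(embed_texts, chunks)
    return list(zip(chunks, vectors))
```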
Anish Sarkar
01d7379914 refactor: add public URL handling for SurfSense documents across various components and schemas 2026-05-15 02:05:11 +05:30
DESKTOP-RTLN3BA\$punk
ca9bbee06d chore: linting 2026-04-28 21:37:51 -07:00
DESKTOP-RTLN3BA\$punk
e6433f78c4 Merge commit '61f4d05cd1' into dev_mod 2026-04-28 09:25:41 -07:00
DESKTOP-RTLN3BA\$punk
31a372bb84 feat: updated agent harness 2026-04-28 09:22:19 -07:00
DESKTOP-RTLN3BA\$punk
8d50f90060 chore: linting
2026-04-27 14:04:50 -07:00
CREDO23
2d962f6dd2 Merge upstream/dev 2026-04-27 22:44:40 +02:00
CREDO23
d1080b1298 Extend new chat streaming for multimodal user turns 2026-04-24 18:48:02 +02:00
Anish Sarkar
9b1b9a90c0 Merge remote-tracking branch 'upstream/dev' into feat/obsidian-plugin 2026-04-24 21:34:55 +05:30
CREDO23
0eae96bffb fix: harden MCP OAuth and connector edge cases 2026-04-22 20:54:42 +02:00
CREDO23
328219e46f disable first-run indexing for live connectors 2026-04-21 21:52:17 +02:00
Anish Sarkar
99623a85d5 refactor: remove legacy Obsidian connector support 2026-04-22 00:10:24 +05:30
CREDO23
45acf9de15 add async retry utility with tenacity 2026-04-21 20:28:36 +02:00
DESKTOP-RTLN3BA\$punk
4a51ccdc2c cloud: add OpenRouter integration with global configs 2026-04-15 23:46:29 -07:00
CREDO23
7e90a8ed3c Route uploaded images to vision LLM with document-parser fallback 2026-04-09 14:33:33 +02:00
Anish Sarkar
20fa93f0ba refactor: make Azure Document Intelligence an internal LLAMACLOUD accelerator instead of a standalone ETL service 2026-04-08 03:26:24 +05:30
Anish Sarkar
1fa8d1220b feat: add support for Azure Document Intelligence in ETL pipeline 2026-04-08 00:59:12 +05:30
Anish Sarkar
0a26a6c5bb chore: ran linting 2026-04-07 05:55:39 +05:30
Anish Sarkar
e7beeb2a36 refactor: unify file skipping logic across Dropbox, Google Drive, and OneDrive connectors by replacing classification checks with a centralized service-based approach, enhancing maintainability and consistency in file handling 2026-04-07 02:19:31 +05:30
Anish Sarkar
0fb92b7c56 refactor: streamline file skipping logic in Dropbox indexer by removing redundant checks, improving code clarity 2026-04-06 22:17:50 +05:30
Anish Sarkar
63a75052ca Merge remote-tracking branch 'upstream/dev' into feat/unified-etl-pipeline 2026-04-06 22:04:51 +05:30
Anish Sarkar
dc7047f64d refactor: implement file type classification for supported extensions across Dropbox, Google Drive, and OneDrive connectors, enhancing file handling and error management 2026-04-06 22:03:47 +05:30
Anish Sarkar
e814540727 refactor: move PKCE pair generation for Airtable
- Removed the `generate_pkce_pair` function from `airtable_add_connector_route.py` and relocated it to `oauth_security.py` for better organization.
- Updated imports in `airtable_add_connector_route.py` to reflect the new location of the PKCE generation function.
2026-04-04 03:36:54 +05:30
Anish Sarkar
8e6b1c77ea feat: implement PKCE support in native Google OAuth flows
- Added `generate_code_verifier` function to create a PKCE code verifier for enhanced security.
- Updated Google Calendar, Drive, and Gmail connector routes to utilize the PKCE code verifier during OAuth authorization.
- Modified state management to include the code verifier for secure state generation and validation.
2026-04-04 03:35:34 +05:30
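For reference, the standard library can produce a PKCE code verifier and its S256 challenge as defined in RFC 7636. The helper names below are hypothetical. The project's `generate_code_verifier` and `generate_pkce_pair` may use a different length or signature.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    # 32 random bytes encode to 43 base64url characters, inside RFC 7636's
    # allowed 43-128 range, using only unreserved characters.
    return secrets.token_urlsafe(num_bytes)


def code_challenge_s256(verifier: str) -> str:
    # S256 method: BASE64URL(SHA256(ASCII(code_verifier))) without '=' padding.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and keeps the verifier (for example in the signed OAuth state). It presents the verifier only at the token exchange.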
Anish Sarkar
746c730b2e chore: ran linting 2026-04-03 13:14:40 +05:30
Anish Sarkar
96a58d0d30 feat: implement local folder indexing and document versioning capabilities 2026-04-02 11:11:57 +05:30