SurfSense/surfsense_backend/app
guangyang1206 2f3a33c9d5 feat(chunker): add table-aware chunk_text_hybrid to prevent mid-row table splits
The document chunker (document_chunker.py) currently splits Markdown tables
mid-row when a table is larger than a single chunk window, producing garbled
rows that are useless for RAG retrieval (issue #1334).

Changes:
- document_chunker.py: add chunk_text_hybrid(), which detects Markdown table
  blocks with a regex, emits each table as a single, indivisible chunk, and
  feeds the surrounding prose through the normal chunk_text() chunker.
- indexing_pipeline_service.py: route normal (non-code) documents through
  chunk_text_hybrid instead of chunk_text so tables are protected by default.
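A minimal Python sketch of the table-aware approach described above. It assumes pipe-delimited GFM tables (header row, delimiter row such as `| --- | :-: |`, then body rows). The real `chunk_text()` signature is not shown here, so the `chunk_text` below is a hypothetical paragraph-packing stand-in, and `TABLE_RE` and the `max_chars` parameter are likewise assumptions rather than the code in document_chunker.py.

```python
import re

# Assumed table shape (hypothetical pattern): a pipe-delimited header row,
# a GFM delimiter row, then zero or more pipe-delimited body rows.
TABLE_RE = re.compile(
    r"^[ \t]*\|.*\|[ \t]*\n"                                          # header row
    r"[ \t]*\|[ \t]*:?-+:?[ \t]*(?:\|[ \t]*:?-+:?[ \t]*)*\|[ \t]*\n"  # delimiter row
    r"(?:[ \t]*\|.*\|[ \t]*(?:\n|$))*",                               # body rows
    re.MULTILINE,
)


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Hypothetical stand-in for the existing prose chunker: packs
    blank-line-separated paragraphs into windows of at most max_chars,
    hard-splitting any paragraph longer than one window."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        # Hard-split oversized paragraphs; this is where a table gets cut mid-row.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def chunk_text_hybrid(text: str, max_chars: int = 1000) -> list[str]:
    """Emit each Markdown table as one chunk; chunk the prose between tables normally."""
    chunks: list[str] = []
    cursor = 0
    for match in TABLE_RE.finditer(text):
        prose = text[cursor:match.start()]
        if prose.strip():
            chunks.extend(chunk_text(prose, max_chars))
        # The table is kept whole, even when it is longer than max_chars.
        chunks.append(match.group(0).rstrip("\n"))
        cursor = match.end()
    tail = text[cursor:]
    if tail.strip():
        chunks.extend(chunk_text(tail, max_chars))
    return chunks


doc = (
    "Intro paragraph.\n\n"
    "| Name | Score |\n"
    "| --- | --- |\n"
    "| alice | 1 |\n"
    "| bob | 2 |\n"
    "\n"
    "Closing paragraph."
)
print(chunk_text_hybrid(doc, max_chars=20))
```

With `max_chars=20`, the hybrid chunker returns three chunks (the intro paragraph, the whole table, the closing paragraph), while the stand-in `chunk_text()` alone hard-splits the table into 20-character pieces that cut through rows. Keeping an oversized table whole trades the strict window limit for row integrity. The routing of non-code documents in indexing_pipeline_service.py is not sketched here.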

Fixes #1334
2026-05-05 12:48:04 +08:00
agents feat: implement agent caches and fix invalid prompt cache configs 2026-05-03 06:03:40 -07:00
config feat: fixed vision/image provider specific errors and fixed podcast/video streaming 2026-05-02 19:18:53 -07:00
connectors chore: linting 2026-04-27 14:04:50 -07:00
etl_pipeline feat: unified credits and its cost calculations 2026-05-02 14:34:23 -07:00
indexing_pipeline feat(chunker): add table-aware chunk_text_hybrid to prevent mid-row table splits 2026-05-05 12:48:04 +08:00
observability chore: cleaned comments slop 2026-04-28 23:52:37 -07:00
prompts feat: add PDF preview and export functionality for Typst-based reports, enhance report content handling 2026-04-15 21:11:27 +05:30
retriever feat: made agent file system optimized 2026-03-28 16:39:46 -07:00
routes feat: moved chat persistence to Server Side 2026-05-04 03:06:15 -07:00
schemas feat: moved chat persistence to Server Side 2026-05-04 03:06:15 -07:00
services feat: implement agent caches and fix invalid prompt cache configs 2026-05-03 06:03:40 -07:00
tasks feat: moved chat persistence to Server Side 2026-05-04 03:06:15 -07:00
templates feat: update report generation and export capabilities to support multiple formats (PDF, DOCX, HTML, LaTeX, EPUB, ODT, plain text) across documentation and backend 2026-03-09 18:41:21 -07:00
utils chore: linting 2026-04-28 21:37:51 -07:00
__init__.py feat: SurfSense v0.0.6 init 2025-03-14 18:53:14 -07:00
app.py feat: add CORS preflight response caching for 24 hours 2026-05-04 19:55:19 -07:00
celery_app.py feat: unified credits and its cost calculations 2026-05-02 14:34:23 -07:00
db.py feat: moved chat persistence to Server Side 2026-05-04 03:06:15 -07:00
exceptions.py feat: add processing mode support for document uploads and ETL pipeline, improved error handling ux 2026-04-14 21:26:00 -07:00
rate_limiter.py try: ip fix for Cloudflare 2026-04-16 02:13:52 -07:00
users.py Seed default prompts on registration and for existing users 2026-03-31 18:12:09 +02:00