SurfSense/surfsense_backend
guangyang1206 2f3a33c9d5 feat(chunker): add table-aware chunk_text_hybrid to prevent mid-row table splits
The document chunker currently splits Markdown tables mid-row when the table is
larger than a single chunk window, producing garbled rows that are useless
for RAG retrieval (issue #1334).

Changes:
- document_chunker.py: add chunk_text_hybrid() that detects Markdown table
  blocks with a regex, emits each table as an indivisible single chunk, and
  feeds the surrounding prose through the normal chunk_text() chunker.
- indexing_pipeline_service.py: route normal (non-code) documents through
  chunk_text_hybrid instead of chunk_text so tables are protected by default.
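
The approach described in the changes above can be sketched roughly as follows. This is a minimal illustration and not the code from document_chunker.py: the regex `TABLE_RE`, the stand-in prose splitter `chunk_prose`, its `max_chars` parameter, and the name `chunk_text_hybrid_sketch` are all assumptions made for this example, and the real chunk_text() signature may differ.

```python
import re
from typing import Callable

# Assumed pattern for a Markdown table block: a header row, a delimiter row
# such as |---|:--:|, then zero or more body rows. The commit's regex may differ.
TABLE_RE = re.compile(
    r"^[ \t]*\|.*\|[ \t]*\n"                         # header row
    r"[ \t]*\|[ \t:|-]*-[ \t:|-]*\|[ \t]*(?:\n|$)"   # delimiter row
    r"(?:[ \t]*\|.*\|[ \t]*(?:\n|$))*",              # body rows
    re.MULTILINE,
)


def chunk_prose(text: str, max_chars: int = 200) -> list[str]:
    """Hypothetical stand-in for chunk_text(): naive fixed-width splitting."""
    text = text.strip()
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


def chunk_text_hybrid_sketch(
    text: str, chunk_prose_fn: Callable[[str], list[str]] = chunk_prose
) -> list[str]:
    """Emit each table as one indivisible chunk; split only the prose around it."""
    chunks: list[str] = []
    cursor = 0
    for match in TABLE_RE.finditer(text):
        # Prose between the previous table (or document start) and this table.
        chunks.extend(chunk_prose_fn(text[cursor : match.start()]))
        # The whole table becomes a single chunk, regardless of its size.
        chunks.append(match.group(0).strip())
        cursor = match.end()
    # Trailing prose after the last table.
    chunks.extend(chunk_prose_fn(text[cursor:]))
    return chunks


doc = (
    "Intro paragraph.\n\n"
    "| id | name |\n"
    "|----|------|\n"
    "| 1 | Ada |\n"
    "| 2 | Bob |\n\n"
    "Closing note."
)
chunks = chunk_text_hybrid_sketch(doc)
# Even with a very small prose window, the table is never split.
tiny = chunk_text_hybrid_sketch(doc, lambda t: chunk_prose(t, max_chars=5))
```

One trade-off of this design is that a table chunk can exceed the normal chunk window, so a very large table may still need separate handling before embedding.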

Fixes #1334
2026-05-05 12:48:04 +08:00
| Name | Last commit | Date |
|------|-------------|------|
| alembic | feat: moved chat persistance to Server Side | 2026-05-04 03:06:15 -07:00 |
| app | feat(chunker): add table-aware chunk_text_hybrid to prevent mid-row table splits | 2026-05-05 12:48:04 +08:00 |
| scripts | feat: fixed vision/image provider specific errors and fixed podcast/video streaming | 2026-05-02 19:18:53 -07:00 |
| tests | feat: moved chat persistance to Server Side | 2026-05-04 03:06:15 -07:00 |
| .dockerignore | feat: Added Docker Support and missing dependencies. | 2025-03-20 18:52:06 -07:00 |
| .env.example | feat: implement agent caches and fix invalid prompt cache configs | 2026-05-03 06:03:40 -07:00 |
| .gitignore | feat: init video presentation agent | 2026-03-21 22:13:41 -07:00 |
| .python-version | feat: SurfSense v0.0.6 init | 2025-03-14 18:53:14 -07:00 |
| alembic.ini | add github connector, add alembic for db migrations, fix bug updating connectors | 2025-04-13 13:56:22 -07:00 |
| celery_worker.py | fix: celery_app path and gmail indexing | 2025-10-21 21:11:41 -07:00 |
| Dockerfile | fix: docker issues | 2026-05-03 00:39:27 -07:00 |
| main.py | feat: added configable summary calculation and various improvements | 2026-02-26 18:24:57 -08:00 |
| pyproject.toml | feat: version bump | 2026-05-04 03:16:15 -07:00 |
| uv.lock | feat: version bump | 2026-05-04 03:16:15 -07:00 |