Commit graph

6848 commits

Author SHA1 Message Date
CREDO23
a0046483a9 test: assert chunker routing via use_code_chunker flag 2026-06-18 20:06:33 +02:00
CREDO23
03012c3077 test: span-aware paragraph chunker fixture 2026-06-18 20:06:33 +02:00
CREDO23
a7cf9bd946 test: mock span chunker in reindex test 2026-06-18 20:06:33 +02:00
CREDO23
12e948cad1 test: mock span chunker in integration fixtures 2026-06-18 20:06:33 +02:00
CREDO23
60fff66ee0 test: verify chunk span persistence on index 2026-06-18 20:06:33 +02:00
CREDO23
94229213f4 test: cover span chunker invariants 2026-06-18 20:06:33 +02:00
CREDO23
65b7d1b01a chore: bump embedding cache chunker version to 2 2026-06-18 20:06:26 +02:00
CREDO23
c57ee978e6 feat: persist and refresh chunk char spans on index 2026-06-18 20:06:26 +02:00
CREDO23
1e33c28c24 feat: carry char spans on existing chunks 2026-06-18 20:06:26 +02:00
CREDO23
55491fef9d refactor: make embedding cache span-aware 2026-06-18 20:06:26 +02:00
CREDO23
0ab773cbcd feat: add lossless span-aware chunk_markdown_with_spans 2026-06-18 20:06:26 +02:00
CREDO23
1048490ba8 feat: migrate chunks with start_char/end_char columns 2026-06-18 20:06:26 +02:00
CREDO23
b89f242a89 feat: add start_char/end_char span columns to chunk model 2026-06-18 20:06:26 +02:00
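The span commits above attach `start_char`/`end_char` offsets to each chunk. A minimal sketch of what a lossless, span-aware chunker could look like, assuming "lossless" means the chunks tile the source so that concatenating them reproduces it exactly. `SpanChunk` and `chunk_with_spans` are hypothetical names for illustration, not the repository's `chunk_markdown_with_spans`, and the blank-line splitting rule is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanChunk:
    text: str
    start_char: int  # inclusive offset into the source
    end_char: int    # exclusive offset into the source


def chunk_with_spans(source: str) -> list[SpanChunk]:
    """Split on blank-line runs, keeping each separator attached to the
    preceding chunk so the chunks cover the source with no gaps."""
    chunks: list[SpanChunk] = []
    start = 0
    # Each match is a run of two or more newlines, i.e. a paragraph separator.
    for match in re.finditer(r"\n{2,}", source):
        end = match.end()
        chunks.append(SpanChunk(source[start:end], start, end))
        start = end
    # Trailing text after the last separator, if any.
    if start < len(source):
        chunks.append(SpanChunk(source[start:], start, len(source)))
    return chunks
```

Because every chunk satisfies `source[start_char:end_char] == text`, a reader can map a chunk back to its exact location in the stored markdown without rebuilding the body from chunks.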
CREDO23
b446897638 test: editor read paths never reconstruct body from chunks 2026-06-18 19:23:49 +02:00
CREDO23
b0a0eb7f9c fix: editor routes serve source_markdown only, never rebuild from chunks 2026-06-18 19:23:49 +02:00
DESKTOP-RTLN3BA\$punk
0c50161e92 Merge commit 'c941907448' into dev 2026-06-18 09:15:12 -07:00
DESKTOP-RTLN3BA\$punk
1d101f5bec feat: add position column to chunks for explicit document order
- Introduced a new `position` column in the `chunks` table to maintain explicit document order during re-indexing.
- Updated migration to add the column without backfilling historical rows to avoid performance issues on large tables.
- Adjusted the `Chunk` model to reflect the new column without indexing, as ordering reads are document-scoped.
2026-06-18 08:55:47 -07:00
Rohan Verma
c941907448
Merge pull request #1509 from MODSetter/dev
feat(release: 0.0.29): ETL/embedding caches, unified model connections, reverse-proxy support, podcast & indexing improvements
2026-06-17 23:46:24 -07:00
DESKTOP-RTLN3BA\$punk
0729e5a915 chore: linting 2026-06-17 23:40:53 -07:00
DESKTOP-RTLN3BA\$punk
c9afeb2817 feat: fix onboarding trigger
- Introduced a new endpoint to check the existence of a global LLM configuration file.
- Updated the frontend to utilize this status, affecting onboarding flow and user experience.
- Added necessary atoms and types for managing global LLM config status in the application state.
- Refactored navigation to ensure proper routing based on the global config status.
2026-06-17 23:30:56 -07:00
DESKTOP-RTLN3BA\$punk
55f91a29d5 chore: linting 2026-06-17 22:31:36 -07:00
DESKTOP-RTLN3BA\$punk
c6d42fc7c8 feat: bumped version to 0.0.29 2026-06-17 22:29:50 -07:00
DESKTOP-RTLN3BA\$punk
4e5c13f60a feat: re-add Google sign-in and add global announcement feature
- Updated .env.example to include new environment variables for Google authentication and global announcement settings.
- Integrated Google sign-in functionality in SignInButton and HeroSection components, allowing users to log in with their Google accounts.
- Added GlobalAnnouncement component to display maintenance notices or announcements on the homepage layout.
- Enhanced styling for Google sign-in buttons to improve user experience.
2026-06-17 21:29:14 -07:00
DESKTOP-RTLN3BA\$punk
b89866541e chore: linting 2026-06-17 20:50:07 -07:00
DESKTOP-RTLN3BA\$punk
4b8a2f9726 Merge commit '77688ac80c' into dev 2026-06-17 20:47:02 -07:00
Rohan Verma
6a45f24f98
Merge pull request #1508 from CREDO23/fix/indexing-batch-chunk-insert
fix(indexing): batch chunk inserts and truncate notification titles
2026-06-17 15:26:02 -07:00
CREDO23
2db1615195 test long filename document processing notifications 2026-06-17 15:06:05 +02:00
CREDO23
79f9bd182b test truncated document processing titles 2026-06-17 15:06:05 +02:00
CREDO23
e195fb77c5 test format_title helper 2026-06-17 15:06:05 +02:00
CREDO23
6d1879ffcb continue indexing when notification creation fails 2026-06-17 15:06:05 +02:00
CREDO23
e37b9b5e31 use started_title in document processing handler 2026-06-17 15:06:05 +02:00
CREDO23
5d3079c2e6 truncate document processing notification titles 2026-06-17 15:06:05 +02:00
CREDO23
a987ef81b2 add format_title helper for notification titles 2026-06-17 15:06:05 +02:00
CREDO23
5d20cf7c03 add notification TITLE_MAX_LENGTH constant 2026-06-17 15:06:05 +02:00
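The title-truncation commits above can be illustrated with a small sketch. The value of `TITLE_MAX_LENGTH` and the signature of `format_title` are not shown in the log, so both are assumptions here, as is the choice of a three-dot ellipsis.

```python
TITLE_MAX_LENGTH = 200  # assumed value; the real constant is not shown in the log


def format_title(title: str, max_length: int = TITLE_MAX_LENGTH) -> str:
    """Return the title unchanged if it fits, otherwise cut it so the
    result, including a trailing ellipsis, is exactly max_length long."""
    if len(title) <= max_length:
        return title
    if max_length <= 3:
        # No room for an ellipsis; fall back to a hard cut.
        return title[:max_length]
    return title[: max_length - 3] + "..."
```

Truncating before the notification is created keeps a very long filename from exceeding whatever length limit the title column enforces.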
CREDO23
aee0f1ef7d add persist_scratch_index unit tests 2026-06-17 14:59:24 +02:00
CREDO23
a8a1f01945 update index batch parallel tests 2026-06-17 14:59:24 +02:00
CREDO23
aca23b4731 wire persist_scratch_index into scratch reindex 2026-06-17 14:59:24 +02:00
CREDO23
34de6c6f87 batch chunk inserts in persist_scratch_index 2026-06-17 14:59:24 +02:00
CREDO23
220d9c4fbb add INDEXING_CHUNK_INSERT_BATCH_SIZE config 2026-06-17 14:59:19 +02:00
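The batching commits above suggest that chunk rows are written in fixed-size groups rather than in one statement. A minimal sketch of that pattern, assuming a default batch size of 500 and a hypothetical `insert_many` callable that writes one batch; the real `persist_scratch_index` is not shown in the log.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

INDEXING_CHUNK_INSERT_BATCH_SIZE = 500  # assumed default, not taken from the repo


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def persist_in_batches(
    rows: Iterable[T],
    insert_many: Callable[[list[T]], None],  # hypothetical writer for one batch
    size: int = INDEXING_CHUNK_INSERT_BATCH_SIZE,
) -> int:
    """Write rows batch by batch and return how many were written."""
    written = 0
    for batch in batched(rows, size):
        insert_many(batch)
        written += len(batch)
    return written
```

Bounding the batch size keeps each INSERT statement and its parameter list small, which limits memory use and statement size for documents with many chunks.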
Rohan Verma
77688ac80c
Merge pull request #1507 from AnishSarkar22/fix/documents-editor
feat(editor): update editor limits and add error boundary
2026-06-17 00:35:25 -07:00
Anish Sarkar
4658130bb8 feat(editor): update editor limits and add error boundary
- Reduced maximum document size for the editor from 5MB to 1MB.
- Introduced a new line limit of 5000 for documents in the editor.
- Implemented a PlateErrorBoundary component to handle rendering errors gracefully in the editor panel.
- Updated logic in the editor panel to check both size and line count for document limits.
2026-06-17 12:11:31 +05:30
DESKTOP-RTLN3BA\$punk
0fe650fd8e Merge commit '7ce409c580' into dev 2026-06-16 22:48:14 -07:00
Rohan Verma
f75878f907
Merge pull request #1506 from okxint/fix/xinference-relative-image-url
fix(image-gen): resolve relative URLs returned by Xinference and compatible backends
2026-06-16 22:41:52 -07:00
okxint
a12cd21f2f fix(image-gen): resolve relative URLs returned by Xinference and compatible backends
Some OpenAI-compatible image backends (e.g. Xinference) return a relative
URL like /files/image.png in data[0].url instead of an absolute one.
Browsers cannot resolve these, causing images to fail to load.

Track the provider's api_base after resolving model config via to_litellm().
When the returned URL starts with "/", extract the origin (scheme + host + port)
from api_base and prepend it to produce a full absolute URL.

No behaviour change for providers that return absolute URLs (OpenAI, Azure, etc).

Closes #1496
2026-06-17 10:57:39 +05:30
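The fix described above (take the origin of `api_base` and prepend it to a URL that starts with "/") can be sketched with the standard library. `resolve_image_url` is a hypothetical name, and the guard for protocol-relative `//host/...` URLs is an addition of this sketch, not something the commit message mentions.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_image_url(url: str, api_base: Optional[str]) -> str:
    """Turn a backend-relative URL into an absolute one using api_base's origin."""
    # Absolute URLs, and cases with no api_base to resolve against, pass through.
    if not url.startswith("/") or not api_base:
        return url
    # Extra guard (assumption): leave protocol-relative URLs untouched.
    if url.startswith("//"):
        return url
    parts = urlsplit(api_base)
    if not parts.scheme or not parts.netloc:
        return url
    # netloc carries host and port, so scheme + netloc is the origin.
    return f"{parts.scheme}://{parts.netloc}{url}"
```

For example, `resolve_image_url("/files/image.png", "http://localhost:9997/v1")` drops the `/v1` path of the API base and returns a URL rooted at `http://localhost:9997`.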
Rohan Verma
a49103870b
Merge pull request #1503 from dmitrymaranik/fix/connector-index-cross-tenant-authz
fix(connectors): scope index endpoint authorization to the connector's own search space
2026-06-16 17:01:13 -07:00
Rohan Verma
7ce409c580
Merge pull request #1502 from MODSetter/fix/db-startup-index-lock-hang
hotpatch: Fix/db startup index lock hang
2026-06-16 16:28:38 -07:00
DESKTOP-RTLN3BA\$punk
b9702b3245 chore: linting 2026-06-16 16:27:16 -07:00
DESKTOP-RTLN3BA\$punk
da64433439 fix(db): reap orphaned idle-in-transaction sessions on the Celery engine
The long-running ingestion/podcast/video tasks run on a separate Celery
engine (NullPool), so the web engine's idle_in_transaction_session_timeout
did not cover them — which is exactly where the original 11h zombie
(INSERT INTO chunks) came from. Apply the same protection to the Celery
engine with a generous 60-minute default so a worker that hangs/crashes
mid-transaction can't hold locks on documents/chunks indefinitely, while
never reaping a legitimate per-document embed window.

- config + .env.example: DB_CELERY_IDLE_IN_TX_TIMEOUT_MS (default 3600000).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-16 16:26:04 -07:00
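One way to apply the timeout described above is through the connection's server settings. A minimal sketch, assuming the Celery engine uses the asyncpg driver, as the related boot-time fix mentions for the web engine; `celery_connect_args` is a hypothetical helper, and only the environment variable name and its default come from the commit message.

```python
import os
from typing import Mapping


def celery_connect_args(environ: Mapping[str, str] = os.environ) -> dict:
    """Build connect_args that set idle_in_transaction_session_timeout.

    PostgreSQL reads a unitless value as milliseconds; 0 disables the timeout.
    """
    timeout_ms = int(environ.get("DB_CELERY_IDLE_IN_TX_TIMEOUT_MS", "3600000"))
    if timeout_ms < 0:
        raise ValueError("DB_CELERY_IDLE_IN_TX_TIMEOUT_MS must be >= 0")
    # asyncpg expects server_settings values as strings.
    return {
        "server_settings": {
            "idle_in_transaction_session_timeout": str(timeout_ms),
        }
    }
```

With SQLAlchemy, the result would typically be passed as `create_async_engine(url, poolclass=NullPool, connect_args=celery_connect_args())`, so every connection a worker opens carries the setting.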
DESKTOP-RTLN3BA\$punk
89cc3b37ee fix(db): prevent boot-time index DDL from hanging FastAPI startup
A single abandoned "idle in transaction" session held locks on the
documents table, which blocked the non-concurrent CREATE INDEX (hnsw)
run inside the FastAPI lifespan. Each API restart queued another
CREATE INDEX behind an advisory lock, leaving the server stuck at
"Waiting for application startup." indefinitely and freezing ingestion
writes.

Changes:
- setup_indexes(): build every index with CREATE INDEX CONCURRENTLY
  (non-blocking ShareUpdateExclusiveLock) under a per-session
  lock_timeout, and make each statement non-fatal so a contended/slow
  build is retried next boot instead of wedging startup. Drop leftover
  invalid indexes before rebuilding.
- create_db_and_tables(): apply lock_timeout to extension/create_all
  DDL and gate the whole bootstrap behind DB_BOOTSTRAP_ON_STARTUP.
- engine: set idle_in_transaction_session_timeout (asyncpg) so an
  abandoned transaction is reaped automatically.
- config + .env.example: DB_BOOTSTRAP_ON_STARTUP, DB_DDL_LOCK_TIMEOUT_MS,
  DB_IDLE_IN_TX_TIMEOUT_MS.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-16 16:18:49 -07:00
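The `setup_indexes()` change described above can be sketched as statement generation plus a non-fatal runner. The function names and the `execute` callable are hypothetical, and the real implementation is not shown in the log. Note that PostgreSQL refuses `CREATE INDEX CONCURRENTLY` inside a transaction block, so `execute` is assumed to run on an autocommit connection.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Catalog query for indexes left invalid by an interrupted concurrent build.
INVALID_INDEXES_SQL = """
SELECT c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid
"""


def drop_invalid_index_statement(index_name: str) -> str:
    """Statement that removes a leftover invalid index before a rebuild."""
    return f"DROP INDEX CONCURRENTLY IF EXISTS {index_name}"


def build_index_statements(
    index_name: str, table: str, definition: str, lock_timeout_ms: int
) -> list[str]:
    """Session lock_timeout followed by a non-blocking index build."""
    return [
        f"SET lock_timeout = {int(lock_timeout_ms)}",
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} ON {table} {definition}",
    ]


def run_non_fatal(execute: Callable[[str], None], statements: list[str]) -> list[str]:
    """Run each statement; log failures instead of raising so startup continues."""
    failed = []
    for sql in statements:
        try:
            execute(sql)
        except Exception:
            logger.warning("index DDL failed, retry on next boot: %s", sql, exc_info=True)
            failed.append(sql)
    return failed
```

Returning the failed statements instead of raising lets the application finish startup and attempt the same build again on the next boot.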
Dmitry Maranik
81fc467187 test(connectors): regression tests for cross-search-space index authorization
Two integration tests pinning the connector index endpoint's authorization:

- cross-space index (attacker owns space B, connector lives in victim's
  space A, request passes search_space_id=B) is rejected with 404 at the
  search-space reconciliation, before the permission check (which would
  otherwise pass for the attacker's own space).
- same-space index authorizes check_permission against the connector's
  own search space, not the caller-supplied query param.

Mirrors the existing tests/integration harness (direct handler calls with
the savepoint-rolled-back db_session; check_permission patched so the test
needs no real RBAC wiring).
2026-06-16 16:18:40 -07:00
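The two behaviours these tests pin can be sketched as a single authorization helper. All names here (`Connector`, `NotFound`, `Forbidden`, `authorize_index`) are hypothetical stand-ins; only the order of the checks and the 404 on a mismatched search space come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable


class NotFound(Exception):
    """Stands in for an HTTP 404 response."""


class Forbidden(Exception):
    """Stands in for an HTTP 403 response."""


@dataclass(frozen=True)
class Connector:
    id: int
    search_space_id: int


def authorize_index(
    connector: Connector,
    requested_space_id: int,
    user_id: int,
    check_permission: Callable[[int, int], bool],  # (user_id, search_space_id)
) -> int:
    """Return the search space to index into, or raise."""
    # Reconcile first: a caller-supplied space that does not match the
    # connector's own space is rejected before any permission lookup.
    if connector.search_space_id != requested_space_id:
        raise NotFound(f"connector {connector.id} not found")
    # Authorize against the connector's own space, never the query parameter.
    if not check_permission(user_id, connector.search_space_id):
        raise Forbidden("insufficient permissions")
    return connector.search_space_id
```

Running the reconciliation before the permission check matters because the attacker does hold permissions in their own space, so checking the caller-supplied ID alone would pass.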