SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-06-18 21:15:16 +02:00

Author	SHA1	Message	Date
okxint	a12cd21f2f	fix(image-gen): resolve relative URLs returned by Xinference and compatible backends Some OpenAI-compatible image backends (e.g. Xinference) return a relative URL like /files/image.png in data[0].url instead of an absolute one. Browsers cannot resolve these, causing images to fail to load. Track the provider's api_base after resolving model config via to_litellm(). When the returned URL starts with "/", extract the origin (scheme + host + port) from api_base and prepend it to produce a full absolute URL. No behaviour change for providers that return absolute URLs (OpenAI, Azure, etc). Closes #1496	2026-06-17 10:57:39 +05:30
Dmitry Maranik	e1ea82d7cf	fix(connectors): scope index endpoint authorization to the connector's own search space The POST /search-source-connectors/{connector_id}/index endpoint loaded the connector by id and then called check_permission() against the client-supplied search_space_id query parameter (the caller's own space) rather than the connector's own search_space_id, and never verified that the two matched. A user could therefore index another user's connector by passing their own search_space_id: the indexer ran with the victim connector's stored credentials and wrote the fetched content into the attacker's search space. The read/update/delete handlers already authorize against connector.search_space_id; this brings the index handler in line. Reject a connector that does not belong to the requested search space (404, to avoid disclosing connectors in other spaces) and authorize the permission check against connector.search_space_id.	2026-06-16 15:58:30 -07:00
Anish Sarkar	9b7e278114	refactor(config): update GATEWAY_ENABLED variable to FALSE and adjust related configurations for improved messaging gateway handling	2026-06-16 23:49:26 +05:30
Anish Sarkar	2a840fcc10	refactor(backend): derive frontend and backend urls from SURFSENSE_PUBLIC_URL	2026-06-16 02:10:50 +05:30
Rohan Verma	69bdcf5946	Merge pull request #1491 from AnishSarkar22/feat/unified-model-connections feat: Fix model attribution for prefix-stripped token usage callbacks	2026-06-14 17:50:48 -07:00
CREDO23	32a6e54ce6	Merge remote-tracking branch 'upstream/dev' into features/documents-injestion-layered-cached	2026-06-14 11:30:33 +02:00
Anish Sarkar	d9a4f14f99	feat(token-tracking): enhance model metadata reconciliation by adding bare model name handling	2026-06-14 12:18:22 +05:30
Anish Sarkar	7926814070	refactor(model-connections): remove unused fields and update verification logic	2026-06-14 02:46:19 +05:30
Anish Sarkar	c7409c8995	chore: ran linting	2026-06-13 21:59:35 +05:30
Anish Sarkar	ceace003aa	feat(local-models): add documentation for connecting local model providers	2026-06-13 21:52:45 +05:30
Anish Sarkar	ab5423d2d2	Merge remote-tracking branch 'upstream/dev' into feat/unified-model-connections	2026-06-13 19:04:49 +05:30
Anish Sarkar	76843f42f1	refactor(anonymous-models): remove description field from anonymous model responses and update related UI components	2026-06-13 16:30:26 +05:30
Anish Sarkar	576c56628a	chore(config): update global LLM configuration example with improved setup instructions, parameter naming, and enhanced comments for clarity	2026-06-13 14:57:14 +05:30
Anish Sarkar	4a6a282a46	feat(runtime-cooldown): implement Redis-based shared cooldown management for model selection	2026-06-13 13:53:01 +05:30
Anish Sarkar	bd4a04f2e7	feat(database-migrations): add migration to remove legacy model config tables and remove stale model connection code	2026-06-13 12:45:43 +05:30
Anish Sarkar	8fe9c21e76	feat(token-tracking): add model metadata registration and enhance token usage tracking	2026-06-13 03:08:35 +05:30
Anish Sarkar	5e86885a03	feat(model-connections): integrate model provider connections panel and connection card components	2026-06-13 02:40:22 +05:30
Anish Sarkar	15d9983669	feat(model-connections): enhance model selection facts and auto pinning logic	2026-06-13 02:19:27 +05:30
Anish Sarkar	45d27ba879	feat(model-connections): enhance auto mode with auto pinning	2026-06-13 01:39:26 +05:30
Anish Sarkar	9f6210ad08	feat(model-connections): add test preview functionality for model connections	2026-06-13 00:12:04 +05:30
CREDO23	dcebfc4756	Merge remote-tracking branch 'upstream/dev' into features/documents-injestion-layered-cached	2026-06-12 19:35:34 +02:00
Anish Sarkar	55f004e1da	feat(model-connections): improve model discovery error handling and enhance UI components	2026-06-12 22:50:50 +05:30
Anish Sarkar	407f2a9612	feat(model-connections): enhance model connection functionality with preview and selection features	2026-06-12 22:41:21 +05:30
CREDO23	052e9ef4d1	refactor(chunks): order chunk reads by (document_id, position) Presentation and citation ordering moves off Chunk.id/created_at to the explicit position column (id kept as tiebreaker). Vector and ts_rank ranking order_by clauses are untouched.	2026-06-12 18:53:21 +02:00
CREDO23	5a71769dba	fix(chunks): set position on remaining chunk insert paths document_converters, the github size-fallback chunker, revert_service restores, and the kb-persistence middleware now write explicit positions (the middleware read path also orders by position).	2026-06-12 18:53:08 +02:00
CREDO23	7d55aaf2c1	feat(indexing): reconcile chunks incrementally on re-index index() now loads existing rows and applies a content diff instead of delete-all/reinsert-all: unchanged chunks keep their rows and embeddings (zero HNSW/GIN churn), moved chunks get a position-only UPDATE, and only new texts are embedded, batched with the summary embedding. First index keeps the cache-aware build_chunk_embeddings path.	2026-06-12 18:53:08 +02:00
CREDO23	fd495e1b2f	feat(observability): add chunk reconcile metric and kill-switch flag surfsense.indexing.reconcile.chunks counts reused/embedded/deleted chunks per re-index. CHUNK_RECONCILE_ENABLED (default on) falls back to delete-all + full re-embed if the diff path ever misbehaves.	2026-06-12 18:52:57 +02:00
CREDO23	8d413ea5c2	refactor(indexing): expose chunk_markdown and embed_batch helpers Split _compute so the incremental edit path can reuse the exact same chunker selection and embedding entry points (and their test patch targets) without going through the doc-level cache.	2026-06-12 18:52:57 +02:00
CREDO23	f82dedf712	feat(indexing): add pure chunk reconciler for content-addressed diffs Greedy multiset match on chunk text decides which rows keep their embeddings, which texts need embedding, and which rows are deleted. No DB, no embeddings; fully unit-tested (reuse, head insert, middle edit, deletion, duplicates, reorder, full rewrite).	2026-06-12 18:52:46 +02:00
CREDO23	c6e71c851c	feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one.	2026-06-12 18:52:45 +02:00
CREDO23	91d947ff79	refactor(embedding-cache): rename index cache to embedding cache The cached payload is the indexing pipeline's embeddings (markdown is chunked then embedded), so "embedding cache" names the expensive output directly and removes the "index" ambiguity (DB index vs vector index vs indexing phase). Renames the service, settings, eligibility, eviction task, metrics, config flags (INDEX_CACHE_* -> EMBEDDING_CACHE_*), object prefix, and the table (index_cache_embedding_sets -> embedding_cache_sets) with its constraint and indexes. Migration 161 renamed accordingly.	2026-06-12 17:00:01 +02:00
CREDO23	4e4f7f34fa	feat(index-cache): add TTL/size eviction task and daily schedule	2026-06-12 16:48:18 +02:00
CREDO23	019aa7bf76	feat(index-cache): serve chunk embeddings from cache during indexing	2026-06-12 16:48:18 +02:00
CREDO23	e8938c119b	feat(index-cache): add recall/remember service	2026-06-12 16:48:10 +02:00
CREDO23	4d6378e031	feat(observability): add index cache hit/miss and eviction metrics	2026-06-12 16:48:10 +02:00
CREDO23	daccd304ee	feat(index-cache): add settings, eligibility, and config flags	2026-06-12 16:48:10 +02:00
CREDO23	ad6da7c6af	feat(index-cache): add embedding blob store sharing the cache backend	2026-06-12 16:48:01 +02:00
CREDO23	f541114544	feat(index-cache): add cached embedding set table and repository	2026-06-12 16:48:01 +02:00
CREDO23	59fa4c38c3	feat(index-cache): add pickle-free blob serialization	2026-06-12 16:48:01 +02:00
CREDO23	cf208365b4	feat(index-cache): add embedding set value objects	2026-06-12 16:48:01 +02:00
CREDO23	0fb1d3d37b	feat(etl-cache): route all file-based sources through the parse cache Every file ingestion path (Dropbox, Google Drive / Composio Drive, OneDrive, local folder, Obsidian, and the legacy upload handlers) now parses via the extract_with_cache facade instead of calling EtlPipelineService.extract directly, so identical bytes are deduplicated globally regardless of source. vision_llm is passed through, keeping the existing cacheability gate intact.	2026-06-12 14:47:25 +02:00
CREDO23	0808fbcdee	feat(etl-cache): emit hit/miss and eviction metrics	2026-06-12 11:57:03 +02:00
CREDO23	9efe24879d	feat(observability): add etl cache lookup and eviction metrics	2026-06-12 11:57:03 +02:00
CREDO23	ce1e90386f	refactor(etl-cache): extract pure cacheability gate	2026-06-12 11:50:51 +02:00
CREDO23	0dc2ccc003	feat(tasks): route extraction through etl cache	2026-06-12 11:23:50 +02:00
CREDO23	1c05980ffb	feat(celery): schedule etl cache eviction	2026-06-12 11:23:50 +02:00
CREDO23	9f29a885b1	feat(db): register CachedParse model	2026-06-12 11:23:50 +02:00
CREDO23	5c4eec26cc	feat(config): add ETL_CACHE_* settings	2026-06-12 11:23:50 +02:00
CREDO23	324ba141a6	feat(etl-cache): add eviction task and public API	2026-06-12 11:23:40 +02:00
CREDO23	7ad39fd995	feat(etl-cache): add eviction policy	2026-06-12 11:23:40 +02:00
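
Note on a12cd21f2f (relative image URLs): the commit message describes prepending the origin (scheme + host + port) of the provider's api_base when the returned URL starts with "/". The following is a minimal sketch of that idea, not the code from the commit. The function name resolve_image_url is hypothetical, and leaving protocol-relative URLs ("//host/...") untouched is an assumption added here that the commit message does not mention.

```python
from urllib.parse import urlsplit


def resolve_image_url(url: str, api_base: str) -> str:
    """Return an absolute URL, using the origin of api_base for relative paths.

    Hypothetical helper: illustrates the approach described in the commit
    message, not the actual SurfSense implementation.
    """
    # Absolute URLs (OpenAI, Azure, etc.) pass through unchanged.
    # Assumption: protocol-relative URLs ("//cdn/...") are also left alone.
    if not url.startswith("/") or url.startswith("//"):
        return url

    parts = urlsplit(api_base)
    # Without a usable origin there is nothing safe to prepend.
    if not parts.scheme or not parts.netloc:
        return url

    # netloc already contains host and port, so the path of api_base
    # (for example "/v1") is dropped and only the origin is kept.
    return f"{parts.scheme}://{parts.netloc}{url}"
```

Only the origin of api_base is used, so a base such as http://localhost:9997/v1 combined with /files/image.png resolves under the server root rather than under /v1.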


2305 commits
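
Note on f82dedf712 (chunk reconciler): the commit message describes a greedy multiset match on chunk text that decides which rows keep their embeddings, which texts need embedding, and which rows are deleted. The sketch below shows one way such a pure function could look. ReconcilePlan, reconcile, and the (row_id, text) input shape are all hypothetical names chosen for illustration; the real reconciler's interface is not shown in this log.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ReconcilePlan:
    """Hypothetical result type: what to keep, embed, and delete."""
    keep: list = field(default_factory=list)    # (row_id, new_position)
    embed: list = field(default_factory=list)   # (new_position, text)
    delete: list = field(default_factory=list)  # row_ids with no match


def reconcile(existing, new_texts):
    """Greedy multiset match of existing (row_id, text) rows to new_texts.

    Pure function: no database access and no embedding calls.
    """
    # Group existing rows by text; duplicates queue up in original order.
    pool = defaultdict(deque)
    for row_id, text in existing:
        pool[text].append(row_id)

    plan = ReconcilePlan()
    for position, text in enumerate(new_texts):
        candidates = pool.get(text)
        if candidates:
            # Identical text: reuse the row and its embedding, update position.
            plan.keep.append((candidates.popleft(), position))
        else:
            # Unseen text (or duplicates exhausted): needs a new embedding.
            plan.embed.append((position, text))

    # Whatever is left in the pool no longer appears in the document.
    plan.delete = [row_id for ids in pool.values() for row_id in ids]
    return plan


plan = reconcile([(1, "a"), (2, "b"), (3, "a")], ["x", "a", "a", "c"])
```

In this example both "a" rows are reused at their new positions, "x" and "c" are the only texts that would be embedded, and row 2 is marked for deletion.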