Commit graph

2252 commits

Author SHA1 Message Date
CREDO23
8d413ea5c2 refactor(indexing): expose chunk_markdown and embed_batch helpers
Split _compute so the incremental edit path can reuse the exact same chunker
selection and embedding entry points (and their test patch targets) without
going through the doc-level cache.
2026-06-12 18:52:57 +02:00
CREDO23
f82dedf712 feat(indexing): add pure chunk reconciler for content-addressed diffs
Greedy multiset match on chunk text decides which rows keep their embeddings,
which texts need embedding, and which rows are deleted. No DB, no embeddings;
fully unit-tested (reuse, head insert, middle edit, deletion, duplicates,
reorder, full rewrite).
2026-06-12 18:52:46 +02:00
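The reconciler commit above describes a greedy multiset match. Here is a minimal sketch of that idea. It assumes chunks are compared by exact text and that duplicate texts are consumed in their stored order. `ExistingChunk`, `ReconcilePlan`, and `reconcile_chunks` are hypothetical names for illustration, not the repository's API.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExistingChunk:
    """Hypothetical stand-in for a stored chunk row."""
    row_id: int
    text: str


@dataclass
class ReconcilePlan:
    # (new position, row_id): rows whose stored embedding is kept.
    reused: list[tuple[int, int]] = field(default_factory=list)
    # (new position, text): chunk texts that still need embedding.
    to_embed: list[tuple[int, str]] = field(default_factory=list)
    # Row ids whose text no longer appears in the new document.
    to_delete: list[int] = field(default_factory=list)


def reconcile_chunks(existing: list[ExistingChunk], new_texts: list[str]) -> ReconcilePlan:
    """Greedily match new chunk texts against existing rows as a multiset."""
    # Group existing row ids by exact text; duplicates queue up in stored order.
    pool: dict[str, deque[int]] = defaultdict(deque)
    for chunk in existing:
        pool[chunk.text].append(chunk.row_id)

    plan = ReconcilePlan()
    for position, text in enumerate(new_texts):
        rows = pool.get(text)  # .get() does not insert into the defaultdict
        if rows:
            # Consume one copy so each stored duplicate is reused at most once.
            plan.reused.append((position, rows.popleft()))
        else:
            plan.to_embed.append((position, text))

    # Any row never consumed has no counterpart in the new text.
    plan.to_delete = sorted(row_id for rows in pool.values() for row_id in rows)
    return plan
```

Matching on text alone lets this plan be computed without a database or an embedding call, which fits the commit's "No DB, no embeddings" note. A head insert or a middle edit costs one embedding per new or changed chunk. Every untouched chunk keeps its row.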
CREDO23
c6e71c851c feat(chunks): add explicit position column with backfill migration
Chunk ids stop reflecting document order once incremental re-indexing keeps
unchanged rows across edits. Backfill preserves the historical id ordering
so behavior is identical on day one.
2026-06-12 18:52:45 +02:00
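The backfill rule in the position-column commit ranks each chunk by id within its document. A rough Python sketch of that rule follows. It assumes each chunk row carries a document id and that positions are zero-based, and `backfill_positions` is a hypothetical helper. The real migration may do this in SQL instead, for example with `ROW_NUMBER() OVER (PARTITION BY document_id ORDER BY id) - 1`.

```python
from collections import defaultdict
from collections.abc import Iterable


def backfill_positions(rows: Iterable[tuple[int, str]]) -> dict[int, int]:
    """Assign each chunk a zero-based position by ascending id within its document.

    rows: (chunk_id, document_id) pairs (hypothetical shape).
    """
    ids_by_document: dict[str, list[int]] = defaultdict(list)
    for chunk_id, document_id in rows:
        ids_by_document[document_id].append(chunk_id)

    positions: dict[int, int] = {}
    for chunk_ids in ids_by_document.values():
        # Ids were historically assigned in document order, so sorting by id
        # reproduces the current reading order.
        for position, chunk_id in enumerate(sorted(chunk_ids)):
            positions[chunk_id] = position
    return positions
```

Right after the backfill, ordering by position matches ordering by id. Later incremental edits can then reorder rows through the position column without relying on id order.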
CREDO23
91d947ff79 refactor(embedding-cache): rename index cache to embedding cache
The cached payload is the indexing pipeline's embeddings (markdown is
chunked then embedded), so "embedding cache" names the expensive output
directly and removes the "index" ambiguity (DB index vs vector index vs
indexing phase). Renames the service, settings, eligibility, eviction
task, metrics, config flags (INDEX_CACHE_* -> EMBEDDING_CACHE_*), object
prefix, and the table (index_cache_embedding_sets -> embedding_cache_sets)
with its constraint and indexes. Migration 161 renamed accordingly.
2026-06-12 17:00:01 +02:00
CREDO23
4e4f7f34fa feat(index-cache): add TTL/size eviction task and daily schedule 2026-06-12 16:48:18 +02:00
CREDO23
019aa7bf76 feat(index-cache): serve chunk embeddings from cache during indexing 2026-06-12 16:48:18 +02:00
CREDO23
e8938c119b feat(index-cache): add recall/remember service 2026-06-12 16:48:10 +02:00
CREDO23
4d6378e031 feat(observability): add index cache hit/miss and eviction metrics 2026-06-12 16:48:10 +02:00
CREDO23
daccd304ee feat(index-cache): add settings, eligibility, and config flags 2026-06-12 16:48:10 +02:00
CREDO23
ad6da7c6af feat(index-cache): add embedding blob store sharing the cache backend 2026-06-12 16:48:01 +02:00
CREDO23
f541114544 feat(index-cache): add cached embedding set table and repository 2026-06-12 16:48:01 +02:00
CREDO23
59fa4c38c3 feat(index-cache): add pickle-free blob serialization 2026-06-12 16:48:01 +02:00
CREDO23
cf208365b4 feat(index-cache): add embedding set value objects 2026-06-12 16:48:01 +02:00
CREDO23
0fb1d3d37b feat(etl-cache): route all file-based sources through the parse cache
Every file ingestion path (Dropbox, Google Drive / Composio Drive, OneDrive,
local folder, Obsidian, and the legacy upload handlers) now parses via the
extract_with_cache facade instead of calling EtlPipelineService.extract
directly, so identical bytes are deduplicated globally regardless of source.
vision_llm is passed through, keeping the existing cacheability gate intact.
2026-06-12 14:47:25 +02:00
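The ETL-cache commit says identical bytes are deduplicated globally, whatever their source. That property comes from keying the cache on a hash of the content rather than on a path or connector. Below is a toy in-memory sketch. It assumes a SHA-256 content key scoped by a parser identifier. `ContentKey`, `make_key`, and `InMemoryParseCache` are hypothetical and do not reflect `ParseKey` or `extract_with_cache`. The sketch also leaves out the cacheability gate and `vision_llm`.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ContentKey:
    """Hypothetical cache identity: derived from bytes, not from where they came from."""
    sha256: str
    parser: str  # assumption: the parser identity also scopes the key


def make_key(data: bytes, parser: str) -> ContentKey:
    return ContentKey(hashlib.sha256(data).hexdigest(), parser)


class InMemoryParseCache:
    """Toy stand-in for a persistent parse cache (not the real store)."""

    def __init__(self) -> None:
        self._store: dict[ContentKey, str] = {}
        self.hits = 0
        self.misses = 0

    def extract(self, data: bytes, parser: str, parse: Callable[[bytes], str]) -> str:
        key = make_key(data, parser)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: run the expensive parse once and remember its markdown output.
        self.misses += 1
        markdown = parse(data)
        self._store[key] = markdown
        return markdown
```

Take one file synced from Dropbox and the same file synced from Google Drive. Both produce the same key, so the second ingestion is served from the cache instead of being parsed again.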
CREDO23
0808fbcdee feat(etl-cache): emit hit/miss and eviction metrics 2026-06-12 11:57:03 +02:00
CREDO23
9efe24879d feat(observability): add etl cache lookup and eviction metrics 2026-06-12 11:57:03 +02:00
CREDO23
ce1e90386f refactor(etl-cache): extract pure cacheability gate 2026-06-12 11:50:51 +02:00
CREDO23
0dc2ccc003 feat(tasks): route extraction through etl cache 2026-06-12 11:23:50 +02:00
CREDO23
1c05980ffb feat(celery): schedule etl cache eviction 2026-06-12 11:23:50 +02:00
CREDO23
9f29a885b1 feat(db): register CachedParse model 2026-06-12 11:23:50 +02:00
CREDO23
5c4eec26cc feat(config): add ETL_CACHE_* settings 2026-06-12 11:23:50 +02:00
CREDO23
324ba141a6 feat(etl-cache): add eviction task and public API 2026-06-12 11:23:40 +02:00
CREDO23
7ad39fd995 feat(etl-cache): add eviction policy 2026-06-12 11:23:40 +02:00
CREDO23
758da06c4f feat(etl-cache): add extract_with_cache 2026-06-12 11:23:40 +02:00
CREDO23
41dea96af4 feat(etl-cache): add EtlCacheService 2026-06-12 11:23:40 +02:00
CREDO23
87fdb37fa3 feat(etl-cache): expose storage layer 2026-06-12 11:23:40 +02:00
CREDO23
a6f2457c7c feat(etl-cache): add MarkdownCacheStore for cache blobs 2026-06-12 11:22:57 +02:00
CREDO23
217d040e9e feat(etl-cache): resolve cache blob storage backend 2026-06-12 11:22:57 +02:00
CREDO23
d9b1b491e9 feat(etl-cache): add cache blob object-key builder 2026-06-12 11:22:57 +02:00
CREDO23
8d3238bcd1 feat(etl-cache): expose cache persistence layer 2026-06-12 11:22:57 +02:00
CREDO23
ea10127979 feat(etl-cache): add CachedParseRepository data access 2026-06-12 11:22:57 +02:00
CREDO23
c624235780 feat(etl-cache): add CachedParse table model 2026-06-12 11:22:48 +02:00
CREDO23
205a63b9bc feat(etl-cache): add EtlCacheSettings resolved from config 2026-06-12 11:22:48 +02:00
CREDO23
b84debd999 feat(etl-cache): expose cache schema value objects 2026-06-12 11:22:48 +02:00
CREDO23
3c9ea0011d feat(etl-cache): add EvictionCandidate value object 2026-06-12 11:22:48 +02:00
CREDO23
24f824b597 feat(etl-cache): add ParseKey cache identity value object 2026-06-12 11:22:48 +02:00
DESKTOP-RTLN3BA\$punk
c855be8ccd fix(auto_reload): update task to use a lambda for user_id in async call 2026-06-11 16:51:18 -07:00
DESKTOP-RTLN3BA\$punk
cff721aa42 feat(migration): evolve podcast lifecycle by detaching from zero_publication and updating column handling 2026-06-11 16:17:14 -07:00
DESKTOP-RTLN3BA\$punk
05190da0a9 chore: linting 2026-06-11 15:31:43 -07:00
CREDO23
7b30a76856 fix(gitignore): anchor data/ rule; track podcast voice catalogs 2026-06-12 00:06:37 +02:00
CREDO23
41f4a58663 Merge remote-tracking branch 'upstream/dev' into improvement-podcast-graph
# Conflicts:
#	surfsense_backend/app/tasks/celery_tasks/podcast_tasks.py
2026-06-11 23:14:49 +02:00
DESKTOP-RTLN3BA\$punk
c3695e7837 feat: update auto-reload settings and enhance payment session creation
- Added currency parameter to the Stripe checkout session for auto-reload setup.
- Integrated AutoReloadSettings component into the BuyMorePage for improved user experience.
- Removed deprecated AutoReloadSettings component from user settings directory.
- Updated import paths for AutoReloadSettings in purchases page to reflect new structure.
2026-06-11 13:29:40 -07:00
CREDO23
ca9b157676 fix(podcasts): keep legacy episodes readable and guard regenerate 2026-06-11 12:43:07 +02:00
CREDO23
aa7f14d94f feat(podcasts): add revert-regeneration and surface cancel on the live card 2026-06-11 12:31:42 +02:00
CREDO23
f0fc660d70 feat(podcasts): constrain monologue briefs to a single speaker 2026-06-11 11:56:57 +02:00
CREDO23
eb56acc407 refactor(podcasts): regenerate via brief gate, render brief inline in chat 2026-06-11 11:45:17 +02:00
CREDO23
11a6b178a0 refactor(podcasts): drop transcript gate, add regenerate-from-ready and voice previews 2026-06-11 10:42:13 +02:00
DESKTOP-RTLN3BA\$punk
65e511f77b feat: enhance credit management and user experience
- Updated database queries to check for column existence with schema context.
- Modified credit purchase quantity limits to allow up to 10,000 credits.
- Improved user interface for credit purchases, enabling custom amounts and clamping input values.
- Adjusted FAQ content to clarify credit purchasing process.
2026-06-10 22:52:27 -07:00
DESKTOP-RTLN3BA\$punk
a7407502d3 feat(refactor): refactor payment system to implement unified credit wallet.
- Updated environment variables and configurations for credit purchases via Stripe, replacing the legacy page pack system.
- Introduced auto-reload feature for credit top-ups and modified database models to track credit transactions.
- Updated notification system to handle insufficient credits and auto-reload failures.
- Adjusted API routes and schemas to reflect changes in credit management.
2026-06-10 16:49:03 -07:00
CREDO23
97ab7a88fd refactor(podcasts): remove legacy podcaster agent, task, and schema 2026-06-10 21:45:04 +02:00