Commit graph

463 commits

Author SHA1 Message Date
CREDO23
dcebfc4756 Merge remote-tracking branch 'upstream/dev' into features/documents-injestion-layered-cached 2026-06-12 19:35:34 +02:00
CREDO23
311570b4f0 test(indexing): cover the edit path and make integration caches hermetic
Real-DB tests assert unchanged chunk rows survive edits, only new text is
embedded, removed rows are deleted with positions compacted, and the kill
switch restores full-replace. An autouse fixture disables the ETL/embedding
caches so a developer's .env can't leak cache hits into unrelated tests.
2026-06-12 18:53:21 +02:00
CREDO23
f82dedf712 feat(indexing): add pure chunk reconciler for content-addressed diffs
Greedy multiset match on chunk text decides which rows keep their embeddings,
which texts need embedding, and which rows are deleted. No DB, no embeddings;
fully unit-tested (reuse, head insert, middle edit, deletion, duplicates,
reorder, full rewrite).
2026-06-12 18:52:46 +02:00
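The reconciler described in the commit above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the code from the commit: `ExistingChunk`, `Plan`, and `reconcile` are hypothetical names, and the real module may order or represent its results differently. It shows a greedy multiset match on chunk text with no database and no embedding calls.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistingChunk:
    """Hypothetical stand-in for a stored chunk row (names are assumptions)."""
    row_id: int
    text: str


@dataclass(frozen=True)
class Plan:
    keep: list    # (row_id, new_position) pairs whose embeddings are reused
    embed: list   # (new_position, text) pairs that still need embedding
    delete: list  # row_ids with no counterpart in the new chunk list


def reconcile(existing, new_texts):
    """Greedy multiset match: each new text claims one unused row with equal text."""
    # Bucket row ids by text. A deque per text keeps duplicates distinct,
    # so two identical chunks can each claim their own row.
    available = defaultdict(deque)
    for chunk in existing:
        available[chunk.text].append(chunk.row_id)

    keep, embed = [], []
    for position, text in enumerate(new_texts):
        bucket = available.get(text)
        if bucket:
            # Reuse the oldest unclaimed row that holds exactly this text.
            keep.append((bucket.popleft(), position))
        else:
            # No unclaimed row has this text, so it must be embedded.
            embed.append((position, text))

    # Rows never claimed by any new text are removed.
    delete = sorted(row_id for bucket in available.values() for row_id in bucket)
    return Plan(keep=keep, embed=embed, delete=delete)
```

With existing rows `(1, "a"), (2, "b"), (3, "a"), (4, "c")` and new texts `["x", "a", "b", "a", "a"]`, the sketch keeps rows 1, 2, and 3 at positions 1, 2, and 3, marks `"x"` at position 0 and the third `"a"` at position 4 for embedding, and deletes row 4.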
CREDO23
412493ae08 test(embedding-cache): add integration tests for service, repository, and store
Covers the public cache surface against real Postgres and a real local file
backend (no mocks): recall miss, remember->recall vector/text/order round-trip,
the dimension-mismatch refusal, the repository SQL behind eviction and dedup
(size sum, coldest ordering, TTL cutoff, duplicate-key no-op, reuse counter),
and the blob store save/load round-trip and delete.
2026-06-12 17:33:21 +02:00
CREDO23
91d947ff79 refactor(embedding-cache): rename index cache to embedding cache
The cached payload is the indexing pipeline's embeddings (markdown is
chunked then embedded), so "embedding cache" names the expensive output
directly and removes the "index" ambiguity (DB index vs vector index vs
indexing phase). Renames the service, settings, eligibility, eviction
task, metrics, config flags (INDEX_CACHE_* -> EMBEDDING_CACHE_*), object
prefix, and the table (index_cache_embedding_sets -> embedding_cache_sets)
with its constraint and indexes. Migration 161 renamed accordingly.
2026-06-12 17:00:01 +02:00
CREDO23
8cf578d965 test(index-cache): add unit tests and repoint embed/chunk patch targets 2026-06-12 16:48:18 +02:00
CREDO23
99cf212c31 test: fix auth-mode mismatch and stale QuotaInsufficientError kwargs
Pin AUTH_TYPE=LOCAL (and REGISTRATION_ENABLED=TRUE) in the test bootstrap so
the email/password auth routers mount during integration tests even when a
developer's .env sets AUTH_TYPE=GOOGLE; without this, the upload tests 404 on
registration.
Also update three tests to the current QuotaInsufficientError signature
(balance_micros) after used_micros/limit_micros were removed.
2026-06-12 12:19:49 +02:00
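The environment pinning described in the commit above might look roughly like the sketch below. The helper name `pin_test_auth_env` is hypothetical, and where the real bootstrap lives and how the application reads its settings are assumptions; only the two variable names and values come from the commit message.

```python
import os
from collections.abc import MutableMapping


def pin_test_auth_env(environ: MutableMapping = os.environ) -> None:
    """Force the auth settings the integration tests rely on.

    Plain assignment (not setdefault) is used so that a value already
    exported from a developer's shell or .env, such as AUTH_TYPE=GOOGLE,
    is overwritten rather than respected.
    """
    environ["AUTH_TYPE"] = "LOCAL"
    environ["REGISTRATION_ENABLED"] = "TRUE"
```

In this sketch the helper would be called at the top of the test bootstrap, before the application module is imported, on the assumption that the routers are selected when settings are first read.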
CREDO23
d5e0280097 test(etl-cache): cover two-phase eviction task on real infra 2026-06-12 11:54:36 +02:00
CREDO23
1460173dad test(etl-cache): cover extract_with_cache end-to-end 2026-06-12 11:50:57 +02:00
CREDO23
c49a0f1233 test(etl-cache): cover store, service, and repository on real infra 2026-06-12 11:50:57 +02:00
CREDO23
3dec3231d0 test(etl-cache): cover over-budget eviction selection 2026-06-12 11:50:52 +02:00
CREDO23
a3e7047c35 test(etl-cache): cover cacheability gate rules 2026-06-12 11:50:52 +02:00
CREDO23
dddacbe762 test(etl-cache): cover content-addressing dedup and key shape 2026-06-12 11:50:52 +02:00
Rohan Verma
4c28ba5295 Merge pull request #1487 from CREDO23/improvement-podcast-graph
[Feat] Podcast: Backend-owned language offering for the brief form
2026-06-12 00:58:02 -07:00
CREDO23
0c7e5dee8b test(podcast): align quota error kwargs with wallet refactor 2026-06-12 07:38:38 +02:00
CREDO23
402ae6befe test(podcast): languages endpoint 2026-06-12 07:38:38 +02:00
CREDO23
a19b7dd8e0 test(podcast): offerable languages catalog rules 2026-06-12 07:38:38 +02:00
DESKTOP-RTLN3BA\$punk
05190da0a9 chore: linting 2026-06-11 15:31:43 -07:00
CREDO23
41f4a58663 Merge remote-tracking branch 'upstream/dev' into improvement-podcast-graph
# Conflicts:
#	surfsense_backend/app/tasks/celery_tasks/podcast_tasks.py
2026-06-11 23:14:49 +02:00
CREDO23
ca9b157676 fix(podcasts): keep legacy episodes readable and guard regenerate 2026-06-11 12:43:07 +02:00
CREDO23
aa7f14d94f feat(podcasts): add revert-regeneration and surface cancel on the live card 2026-06-11 12:31:42 +02:00
CREDO23
f0fc660d70 feat(podcasts): constrain monologue briefs to a single speaker 2026-06-11 11:56:57 +02:00
CREDO23
eb56acc407 refactor(podcasts): regenerate via brief gate, render brief inline in chat 2026-06-11 11:45:17 +02:00
CREDO23
11a6b178a0 refactor(podcasts): drop transcript gate, add regenerate-from-ready and voice previews 2026-06-11 10:42:13 +02:00
CREDO23
c84525897b test(podcasts): relocate stateful tests to integration
Move the lifecycle service, Celery task bodies, and mark_failed coverage out of
DB-faking unit tests and into integration tests against a real Postgres, faking
only true externals (broker, object store, TTS, ffmpeg, billing, LLM). Add HTTP
slices for cancel, voices, scoping, and public-chat streaming. The unit tier is
now fake-free pure logic with no session doubles.
2026-06-11 06:27:00 +02:00
DESKTOP-RTLN3BA\$punk
a7407502d3 feat(refactor): refactor payment system to implement unified credit wallet.
- Updated environment variables and configurations for credit purchases via Stripe, replacing legacy page pack system.
- Introduced auto-reload feature for credit top-ups and modified database models to track credit transactions.
- Updated notification system to handle insufficient credits and auto-reload failures.
- Adjusted API routes and schemas to reflect changes in credit management.
2026-06-10 16:49:03 -07:00
CREDO23
8f38737ad9 test(podcasts): retarget celery and observability tests to new tasks 2026-06-10 21:45:04 +02:00
CREDO23
aa7aa81c16 refactor(podcasts): drop language detection from brief 2026-06-10 20:51:38 +02:00
CREDO23
15e44616f3 test(podcasts): cover drafting billing gate 2026-06-10 18:44:26 +02:00
CREDO23
0bed4a0d38 test(podcasts): cover failure recording 2026-06-10 18:44:25 +02:00
CREDO23
0c7987cd9e test(podcasts): cover api read model 2026-06-10 18:44:25 +02:00
CREDO23
fa7ab8a06d test(podcasts): cover renderer validation 2026-06-10 18:44:25 +02:00
CREDO23
36c201f9e2 test(podcasts): cover structured json parsing 2026-06-10 18:44:25 +02:00
CREDO23
0c92ee963e test(podcasts): cover voice catalog 2026-06-10 18:44:25 +02:00
CREDO23
e926990d8e test(podcasts): cover language and voice resolution 2026-06-10 18:44:25 +02:00
CREDO23
aaa9f01087 test(podcasts): cover brief and transcript contracts 2026-06-10 18:44:25 +02:00
CREDO23
9d8e4e4f9d test(podcasts): cover lifecycle state machine 2026-06-10 18:44:25 +02:00
CREDO23
f61e8af8c0 test(podcasts): add shared test fixtures 2026-06-10 18:44:25 +02:00
CREDO23
59c1cf14c7 test(indexers): cover mark_connector_documents_failed behavior 2026-06-10 00:11:00 +02:00
CREDO23
77544ab768 test(google-drive): assert stuck pending/processing docs retry 2026-06-10 00:11:00 +02:00
CREDO23
9f76daec8f test(indexers): update download mock return shape 2026-06-09 23:39:25 +02:00
CREDO23
bdd3728c5b test(dropbox): update download failure return shape 2026-06-09 23:39:25 +02:00
CREDO23
b5aa41beb6 test(onedrive): update download failure return shape 2026-06-09 23:39:25 +02:00
CREDO23
5f59ad3ad3 test(google-drive): update download failure return shape 2026-06-09 23:39:25 +02:00
DESKTOP-RTLN3BA\$punk
ce952d2ad1 chore: linting 2026-06-09 00:42:26 -07:00
CREDO23
53a3920a82 fix(e2e): load .env after harness env defaults 2026-06-05 19:24:26 +02:00
CREDO23
8bdfd00a15 Merge upstream/dev 2026-06-05 19:18:12 +02:00
CREDO23
52ff304d64 fix(e2e): delegate connector work via task in fake LLM 2026-06-05 18:49:57 +02:00
CREDO23
bfadde93b7 fix(e2e): call .unique() when minting test token
The User mapper eager-loads the oauth_accounts collection via joined load
under AUTH_TYPE=GOOGLE, so the mint endpoint's query must call .unique()
before scalar_one_or_none() to avoid InvalidRequestError (500).
2026-06-05 18:17:11 +02:00
CREDO23
88fe213176 refactor(agents): extract subagent-invocation contract to subagents/shared
The knowledge_base subagent imported subagent_invoke_config + EXCLUDED_STATE_KEYS
from main_agent's checkpointed_subagent_middleware -- a subagent reaching into
main-agent internals. Both symbols (plus the recursion-limit constant they need)
are a subagent-invocation contract shared by the orchestrator's task middleware
and any nested-invoking subagent. Move them to subagents/shared/invocation.py;
config.py keeps the HITL resume side-channel and constants.py keeps the
main-agent tuning knobs. All consumers (task_tool, kb tool, tests) repointed.
2026-06-05 14:18:44 +02:00