The pull-based KB design (on-demand search_knowledge_base tool + pre-injected
workspace tree) fully replaced the old eager retrieval path. Remove its last
remnants:
- Delete KnowledgePriorityMiddleware (knowledge_search.py) and its tests.
- Drop the kb_priority state field + reducer default; trim
KbContextProjectionMiddleware to project only workspace_tree_text.
- Remove the now-dead feature flags enable_kb_priority_preinjection and
enable_kb_planner_runnable across backend (flags, route schema, tests,
env examples) and frontend (settings toggle, zod schema).
- Scrub <priority_documents> and stale KnowledgePriorityMiddleware references
from prompts, docstrings, and the ADR.
No functional change: nothing wrote kb_priority and neither flag gated live
behavior after the cutover. Full backend suite green (pre-existing unrelated
failures aside).
Add the checkpointed CitationRegistry (load/merge helpers + state field)
and a lightweight CitationStateMiddleware so subagents can register into
the same conversation registry. Resolve [n] -> [citation:<payload>] at
stream finalize from the registry, polymorphically by source type.
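A minimal sketch of the finalize-time resolution described above, assuming a registry that maps each `[n]` marker to a typed entry. `CitationEntry`, `render_payload`, `resolve_citations`, and the payload formats are hypothetical and only illustrate the idea; the real CitationRegistry fields and payload encoding are not shown in this commit message.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CitationEntry:
    # Hypothetical shape: a source type plus an opaque reference.
    source_type: str
    ref: str


@dataclass
class CitationRegistry:
    # Maps the [n] number used in model output to its entry.
    entries: dict = field(default_factory=dict)

    def register(self, entry: CitationEntry) -> int:
        # Reuse the existing number if the same source was already registered,
        # so a subagent and the main agent agree on numbering.
        for n, existing in self.entries.items():
            if existing == entry:
                return n
        n = len(self.entries) + 1
        self.entries[n] = entry
        return n


def render_payload(entry: CitationEntry) -> str:
    # Dispatch on source type; these payload formats are assumptions.
    if entry.source_type == "document":
        return f"doc:{entry.ref}"
    if entry.source_type == "web":
        return f"url:{entry.ref}"
    return f"{entry.source_type}:{entry.ref}"


_MARKER = re.compile(r"\[(\d+)\]")


def resolve_citations(text: str, registry: CitationRegistry) -> str:
    def _sub(match: re.Match) -> str:
        entry = registry.entries.get(int(match.group(1)))
        if entry is None:
            # Unknown numbers are left untouched rather than guessed.
            return match.group(0)
        return f"[citation:{render_payload(entry)}]"

    return _MARKER.sub(_sub, text)
```

Leaving unregistered markers as-is is a design choice of this sketch; the actual implementation may drop or flag them instead.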
Thread mentioned_thread_ids from the route through the orchestrator into
input-state assembly, resolve them for the requesting user, and append
the rendered referenced-chat block to the agent's query context.
- Updated the LLM bundle construction to include a streaming option for both DB-backed and global models.
- Set the streaming parameter to True in `litellm_kwargs` for chat streaming flows.
The long-running ingestion/podcast/video tasks run on a separate Celery
engine (NullPool), so the web engine's idle_in_transaction_session_timeout
did not cover them — which is exactly where the original 11h zombie
(INSERT INTO chunks) came from. Apply the same protection to the Celery
engine with a generous 60-minute default so a worker that hangs/crashes
mid-transaction can't hold locks on documents/chunks indefinitely, while
never reaping a legitimate per-document embed window.
- config + .env.example: DB_CELERY_IDLE_IN_TX_TIMEOUT_MS (default 3600000).
Co-authored-by: Cursor <cursoragent@cursor.com>
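A rough sketch of how the Celery-side timeout could be wired, assuming a libpq-based driver (for example psycopg2) that accepts the `options` connection parameter. The helper name `celery_connect_args` is hypothetical; only the variable name DB_CELERY_IDLE_IN_TX_TIMEOUT_MS and its 3600000 default come from the commit. An asyncpg-based engine would pass the setting differently.

```python
import os

# 60 minutes in milliseconds, matching the default named in the commit.
DEFAULT_CELERY_IDLE_IN_TX_TIMEOUT_MS = 3_600_000


def celery_connect_args(env=None) -> dict:
    """Build driver connect args for the Celery engine (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("DB_CELERY_IDLE_IN_TX_TIMEOUT_MS", "")
    timeout_ms = int(raw) if raw.strip() else DEFAULT_CELERY_IDLE_IN_TX_TIMEOUT_MS
    if timeout_ms <= 0:
        # PostgreSQL treats 0 as "disabled"; this sketch then sends no option.
        return {}
    # libpq forwards "-c name=value" as a per-session server setting.
    return {"options": f"-c idle_in_transaction_session_timeout={timeout_ms}"}


# The result would be handed to the engine factory, e.g.
# create_engine(url, poolclass=NullPool, connect_args=celery_connect_args()).
```

Setting the value per session keeps the web engine's shorter timeout independent of the Celery one.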
document_converters, the GitHub size-fallback chunker, revert_service
restores, and the kb-persistence middleware now write explicit positions
(the middleware read path also orders by position).
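The write and read sides of explicit positions can be pictured with the sketch below. The row shape, the 0-based numbering, and the helper names `with_positions` and `ordered_contents` are assumptions made for illustration; the actual chunk model is not shown here.

```python
from typing import Dict, List


def with_positions(chunks: List[str]) -> List[Dict[str, object]]:
    # Writer side: stamp each chunk with its index instead of relying on
    # insertion order or primary-key order.
    return [{"position": i, "content": text} for i, text in enumerate(chunks)]


def ordered_contents(rows: List[Dict[str, object]]) -> List[str]:
    # Reader side: always sort by position before reassembling the document.
    return [row["content"] for row in sorted(rows, key=lambda r: r["position"])]
```

With both sides in place, rows returned in arbitrary order still reassemble into the original sequence.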
Every file ingestion path (Dropbox, Google Drive / Composio Drive, OneDrive,
local folder, Obsidian, and the legacy upload handlers) now parses via the
extract_with_cache facade instead of calling EtlPipelineService.extract
directly, so identical bytes are deduplicated globally regardless of source.
vision_llm is passed through, keeping the existing cacheability gate intact.
- Introduced LLMErrorCategory and adapt_llm_exception to normalize LLM exceptions.
- Updated llm_retryable_message and llm_permanent_message to utilize the new adaptation logic.
- Enhanced classify_stream_exception to classify provider errors and return user-friendly messages.
- Added tests for error classification and adaptation to ensure robustness.
- Updated frontend error handling to display appropriate messages based on new classifications.
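The normalization described above might look roughly like the following. Only the names LLMErrorCategory and adapt_llm_exception come from the commit; the enum members, the `AdaptedLLMError` result type, the `status_code` attribute lookup, and the messages are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LLMErrorCategory(Enum):
    # Member names are illustrative, not the project's actual set.
    RATE_LIMITED = "rate_limited"
    AUTH = "auth"
    TIMEOUT = "timeout"
    PROVIDER_UNAVAILABLE = "provider_unavailable"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class AdaptedLLMError:
    category: LLMErrorCategory
    retryable: bool
    user_message: str


def adapt_llm_exception(exc: BaseException) -> AdaptedLLMError:
    # Assumes provider exceptions expose an HTTP-like status_code attribute.
    status: Optional[int] = getattr(exc, "status_code", None)
    if isinstance(exc, TimeoutError):
        return AdaptedLLMError(LLMErrorCategory.TIMEOUT, True,
                               "The model took too long to respond. Please retry.")
    if status == 429:
        return AdaptedLLMError(LLMErrorCategory.RATE_LIMITED, True,
                               "The model is rate limited. Please retry shortly.")
    if status in (401, 403):
        return AdaptedLLMError(LLMErrorCategory.AUTH, False,
                               "The model credentials were rejected.")
    if status is not None and status >= 500:
        return AdaptedLLMError(LLMErrorCategory.PROVIDER_UNAVAILABLE, True,
                               "The model provider is unavailable. Please retry.")
    return AdaptedLLMError(LLMErrorCategory.UNKNOWN, False,
                           "The model request failed.")
```

Returning a single value object lets retryable and permanent message helpers share one classification step instead of inspecting the exception separately.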
- Updated environment variables and configurations for credit purchases via Stripe, replacing the legacy page pack system.
- Introduced auto-reload feature for credit top-ups and modified database models to track credit transactions.
- Updated notification system to handle insufficient credits and auto-reload failures.
- Adjusted API routes and schemas to reflect changes in credit management.