The long-running ingestion/podcast/video tasks run on a separate Celery
engine (NullPool), so the web engine's idle_in_transaction_session_timeout
did not cover them — which is exactly where the original 11h zombie
(INSERT INTO chunks) came from. Apply the same protection to the Celery
engine with a generous 60-minute default so a worker that hangs/crashes
mid-transaction can't hold locks on documents/chunks indefinitely, while
never reaping a legitimate per-document embed window.
- config + .env.example: DB_CELERY_IDLE_IN_TX_TIMEOUT_MS (default 3600000).
Co-authored-by: Cursor <cursoragent@cursor.com>
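A minimal sketch of how the Celery engine could pick up this timeout, assuming
it uses the asyncpg driver (as the web engine does) and asyncpg's
`server_settings` connection argument. `celery_connect_args` is a hypothetical
helper name; only DB_CELERY_IDLE_IN_TX_TIMEOUT_MS and its default come from
this change.

```python
import os
from typing import Mapping, Optional


def celery_connect_args(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build asyncpg connect_args carrying the idle-in-transaction timeout."""
    env = os.environ if env is None else env
    timeout_ms = int(env.get("DB_CELERY_IDLE_IN_TX_TIMEOUT_MS", "3600000"))
    # asyncpg expects server_settings values as strings; PostgreSQL reads a
    # bare integer for this setting as milliseconds.
    return {
        "server_settings": {
            "idle_in_transaction_session_timeout": str(timeout_ms),
        }
    }


# Assumed wiring (not executed here):
# engine = create_async_engine(url, poolclass=NullPool,
#                              connect_args=celery_connect_args())
```

Because NullPool opens a fresh connection per checkout, the setting would be
applied on every new connection rather than once per pooled connection.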
A single abandoned "idle in transaction" session held locks on the
documents table, which blocked the non-concurrent CREATE INDEX (hnsw)
run inside the FastAPI lifespan. Each API restart queued another
CREATE INDEX behind an advisory lock, leaving the server stuck at
"Waiting for application startup." indefinitely and freezing ingestion
writes.
Changes:
- setup_indexes(): build every index with CREATE INDEX CONCURRENTLY
(non-blocking ShareUpdateExclusiveLock) under a per-session
lock_timeout, and make each statement non-fatal so a contended/slow
build is retried next boot instead of wedging startup. Drop leftover
invalid indexes before rebuilding.
- create_db_and_tables(): apply lock_timeout to extension/create_all
DDL and gate the whole bootstrap behind DB_BOOTSTRAP_ON_STARTUP.
- engine: set idle_in_transaction_session_timeout (asyncpg) so an
abandoned transaction is reaped automatically.
- config + .env.example: DB_BOOTSTRAP_ON_STARTUP, DB_DDL_LOCK_TIMEOUT_MS,
DB_IDLE_IN_TX_TIMEOUT_MS.
Co-authored-by: Cursor <cursoragent@cursor.com>
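The setup_indexes() behaviour described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the actual
implementation: `build_indexes`, the `execute` callable, and the catalog query
are hypothetical, and `execute` is assumed to run each statement in autocommit
mode, since CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

```python
from typing import Callable, Iterable, List

# Indexes left behind by an interrupted CONCURRENTLY build are marked invalid.
INVALID_INDEX_SQL = (
    "SELECT c.relname FROM pg_index i "
    "JOIN pg_class c ON c.oid = i.indexrelid "
    "WHERE NOT i.indisvalid"
)


def build_indexes(
    execute: Callable[[str], Iterable[tuple]],
    statements: Iterable[str],
    lock_timeout_ms: int = 5000,
) -> List[str]:
    """Run index DDL non-fatally; return the statements that failed."""
    failed: List[str] = []
    # Per-session lock_timeout so a contended build gives up instead of queueing.
    execute(f"SET lock_timeout = {int(lock_timeout_ms)}")

    # Drop leftover invalid indexes so IF NOT EXISTS does not skip the rebuild.
    for (name,) in list(execute(INVALID_INDEX_SQL)):
        quoted = '"' + name.replace('"', '""') + '"'
        try:
            execute(f"DROP INDEX CONCURRENTLY IF EXISTS {quoted}")
        except Exception:
            pass  # non-fatal: try again on the next boot

    for stmt in statements:
        try:
            execute(stmt)
        except Exception:
            failed.append(stmt)  # non-fatal: retried on the next boot
    return failed
```

Returning the failed statements instead of raising is what keeps a slow or
contended build from blocking application startup.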
surfsense.indexing.reconcile.chunks counts reused/embedded/deleted chunks per
re-index. CHUNK_RECONCILE_ENABLED (default on) falls back to delete-all +
full re-embed if the diff path ever misbehaves.
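A rough sketch of the reused/embedded/deleted accounting, assuming chunks are
matched by a hash of their text (the matching key is an assumption; the commit
does not say how chunks are compared). `reconcile_chunks` and `chunk_key` are
hypothetical names.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def chunk_key(text: str) -> str:
    """Stable content key for a chunk (assumed: SHA-256 of the text)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reconcile_chunks(
    existing: List[str], incoming: List[str], enabled: bool = True
) -> Dict[str, int]:
    """Count chunks that can be reused, must be embedded, or must be deleted."""
    if not enabled:
        # Fallback path: delete everything and re-embed every new chunk.
        return {"reused": 0, "embedded": len(incoming), "deleted": len(existing)}

    remaining = Counter(chunk_key(t) for t in existing)
    reused = embedded = 0
    for text in incoming:
        key = chunk_key(text)
        if remaining[key] > 0:
            remaining[key] -= 1  # unchanged chunk keeps its embedding
            reused += 1
        else:
            embedded += 1  # new or edited chunk needs an embedding
    deleted = sum(remaining.values())  # old chunks with no match
    return {"reused": reused, "embedded": embedded, "deleted": deleted}
```

With `enabled=False` the counts match the delete-all + full re-embed fallback,
so the same counters stay meaningful on either path.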
The cached payload is the indexing pipeline's embeddings (markdown is
chunked then embedded), so "embedding cache" names the expensive output
directly and removes the "index" ambiguity (DB index vs vector index vs
indexing phase). Renames the service, settings, eligibility, eviction
task, metrics, config flags (INDEX_CACHE_* -> EMBEDDING_CACHE_*), object
prefix, and the table (index_cache_embedding_sets -> embedding_cache_sets)
with its constraint and indexes. Migration 161 is renamed accordingly.
- Updated environment variables and configurations for credit purchases via Stripe, replacing the legacy page pack system.
- Introduced auto-reload feature for credit top-ups and modified database models to track credit transactions.
- Updated notification system to handle insufficient credits and auto-reload failures.
- Adjusted API routes and schemas to reflect changes in credit management.
- Replaced Playwright with Scrapling's fetchers in the web crawling and YouTube processing modules for improved performance and flexibility.
- Updated proxy configuration to support dynamic proxy selection via environment variables.
- Enhanced logging to track performance metrics during web scraping operations.
- Refactored related modules to utilize the new proxy utilities and streamline the scraping process.
- Consolidated Redis configuration by introducing a single `REDIS_URL` variable for Celery broker, result backend, and app cache.
- Removed deprecated variables related to Firecrawl and Stripe token limits from `.env.example` files.
- Updated documentation to reflect changes in environment variable usage for improved clarity and maintainability.
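As an illustration of the consolidation, one `REDIS_URL` value could feed all
three consumers. This is a sketch only: `redis_settings`, the `cache_url` key,
and the localhost default are assumptions; `broker_url` and `result_backend`
are the standard Celery setting names.

```python
import os
from typing import Dict, Mapping, Optional


def redis_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Derive broker, result backend, and cache URLs from one REDIS_URL."""
    env = os.environ if env is None else env
    # Default is an assumption for local development, not taken from the repo.
    url = env.get("REDIS_URL", "redis://localhost:6379/0")
    return {
        "broker_url": url,      # Celery broker
        "result_backend": url,  # Celery result backend
        "cache_url": url,       # app cache (hypothetical key name)
    }
```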
- Replaced environment variable usage with a centralized configuration system in multiple modules, including `celery_app`, `agent_cache_store`, `sandbox`, `file_storage`, and `connector_service`.
- Enhanced maintainability and readability by sourcing configuration values from the `config` module instead of directly from environment variables.
- Updated relevant settings to ensure consistent access to configuration values across the application.
- Added a global switch `GATEWAY_ENABLED` to control the activation of all messaging gateway channels (Telegram, WhatsApp, Slack, Discord).
- Updated relevant routes and workers to check the `GATEWAY_ENABLED` flag, returning 404 for HTTP routes when disabled.
- Enhanced documentation in the `.env.example` file to reflect the new configuration option.
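A framework-free sketch of the `GATEWAY_ENABLED` check. The helper names, the
accepted truthy strings, and the default of enabled are assumptions; in the
real routes the 404 would presumably be raised through the web framework
rather than returned as a tuple.

```python
import os
from typing import Mapping, Optional, Tuple

_TRUTHY = {"1", "true", "yes", "on"}


def gateway_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the global switch; the default of 'true' is an assumption."""
    env = os.environ if env is None else env
    return env.get("GATEWAY_ENABLED", "true").strip().lower() in _TRUTHY


def handle_webhook(
    channel: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[int, str]:
    """Hypothetical HTTP handler: answer 404 for every channel when disabled."""
    if not gateway_enabled(env):
        return 404, "Not Found"
    return 200, f"{channel} accepted"
```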
- Add MiniMax-M3 to the model selection list (set as the new default)
- Add MiniMax-M2.7 and MiniMax-M2.7-highspeed as alternatives
- Remove deprecated MiniMax-M2.5 / M2.5-highspeed entries
- Update example config and Chinese setup docs to reference M3 (512K context)
Adds an optional planner LLM role wired through KnowledgePriorityMiddleware
so KB query rewriting, date extraction, and recency classification run on a
cheap model (e.g. gpt-4o-mini, Haiku, Azure nano) instead of the user's
chat LLM. Operators opt in by setting is_planner: true on exactly one
global config; without it, behavior is unchanged.
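The opt-in rule can be sketched as below. `pick_planner` is a hypothetical
helper, configs are assumed to be plain mappings, and raising when more than
one config is flagged is an assumption about how the "exactly one" requirement
might be enforced.

```python
from typing import Any, Mapping, Sequence


def pick_planner(global_configs: Sequence[Mapping[str, Any]], chat_llm: Any) -> Any:
    """Return the config flagged is_planner: true, else the user's chat LLM."""
    flagged = [c for c in global_configs if c.get("is_planner") is True]
    if len(flagged) > 1:
        # Assumed handling: refuse ambiguous setups instead of guessing.
        raise ValueError("is_planner must be set on exactly one global config")
    # No flagged config means behavior is unchanged: use the chat LLM.
    return flagged[0] if flagged else chat_llm
```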