fix(db): prevent boot-time index DDL from hanging FastAPI startup

A single abandoned "idle in transaction" session held locks on the
documents table, which blocked the non-concurrent CREATE INDEX (hnsw)
run inside the FastAPI lifespan. Each API restart queued another
CREATE INDEX behind an advisory lock, leaving the server stuck at
"Waiting for application startup." indefinitely and freezing ingestion
writes.

Changes:
- setup_indexes(): build every index with CREATE INDEX CONCURRENTLY
  (takes ShareUpdateExclusiveLock, which does not block reads or
  writes) under a per-session lock_timeout, and make each statement
  non-fatal so a contended or slow build is retried on the next boot
  instead of wedging startup. Drop leftover invalid indexes before
  rebuilding.
- create_db_and_tables(): apply lock_timeout to extension/create_all
  DDL and gate the whole bootstrap behind DB_BOOTSTRAP_ON_STARTUP.
- engine: set idle_in_transaction_session_timeout (asyncpg) so an
  abandoned transaction is reaped automatically.
- config + .env.example: DB_BOOTSTRAP_ON_STARTUP, DB_DDL_LOCK_TIMEOUT_MS,
  DB_IDLE_IN_TX_TIMEOUT_MS.
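The setup_indexes() flow described above could look roughly like the sketch below. It is not the code from this commit: the `Execute` callable, the function signatures, and the idea of passing invalid index names in from the caller are assumptions made for illustration. The executor is assumed to run on an autocommit connection, since CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and index/table names are assumed to be trusted constants.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

# Hypothetical executor type: a coroutine that runs one SQL statement on an
# autocommit connection.
Execute = Callable[[str], Awaitable[None]]

# Catalog query a caller could use to find indexes left invalid by an
# interrupted concurrent build.
INVALID_INDEXES_SQL = (
    "SELECT c.relname FROM pg_index i "
    "JOIN pg_class c ON c.oid = i.indexrelid "
    "WHERE NOT i.indisvalid"
)


def concurrent_index_sql(name: str, table: str, method: str, expr: str) -> str:
    """Build an idempotent, concurrent CREATE INDEX statement."""
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} "
        f"ON {table} USING {method} ({expr})"
    )


async def setup_indexes(
    execute: Execute,
    invalid_names: Iterable[str],
    statements: Iterable[str],
    lock_timeout_ms: int,
) -> List[str]:
    """Drop invalid indexes, then build; return the statements that failed."""
    # Session-level setting: later statements on this connection fail fast
    # instead of waiting indefinitely on a contended lock.
    await execute(f"SET lock_timeout = '{int(lock_timeout_ms)}ms'")

    # IF NOT EXISTS would skip an invalid leftover, so drop those first.
    drops = [f"DROP INDEX CONCURRENTLY IF EXISTS {n}" for n in invalid_names]

    failed: List[str] = []
    for stmt in drops + list(statements):
        try:
            await execute(stmt)
        except Exception:
            # Non-fatal: record the failure and let startup continue; the
            # statement is attempted again on the next boot.
            failed.append(stmt)
    return failed
```

Returning the failed statements instead of raising keeps the lifespan handler free to log them and carry on, which matches the "retried next boot" behaviour described in the change list.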

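For the engine change, one way the idle-in-transaction timeout might be passed to asyncpg is through its `server_settings` connection parameter. The helper below is a hypothetical sketch, not the commit's code: the function name and the URL-scheme check are assumptions. It only builds the arguments, so it can be read and tested without a database.

```python
from typing import Any, Dict


def asyncpg_connect_args(database_url: str, idle_in_tx_timeout_ms: int) -> Dict[str, Any]:
    """Return connect_args that set idle_in_transaction_session_timeout.

    Hypothetical helper: returns an empty dict when the timeout is disabled
    (0 or negative) or when the URL does not use the asyncpg driver.
    """
    scheme = database_url.split("://", 1)[0]
    if idle_in_tx_timeout_ms <= 0 or "+asyncpg" not in scheme:
        return {}
    return {
        "server_settings": {
            # asyncpg expects string values; PostgreSQL reads a bare number
            # for this setting as milliseconds.
            "idle_in_transaction_session_timeout": str(idle_in_tx_timeout_ms),
        }
    }
```

With SQLAlchemy, the result could be handed to `create_async_engine(url, connect_args=...)`, so every pooled connection starts with the setting applied.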
Co-authored-by: Cursor <cursoragent@cursor.com>
DESKTOP-RTLN3BA\$punk 2026-06-16 16:18:49 -07:00
parent 284df841ef
commit 89cc3b37ee
3 changed files with 158 additions and 43 deletions

@@ -541,6 +541,21 @@ class Config:
    # Database
    DATABASE_URL = os.getenv("DATABASE_URL")
    # When TRUE (default) the app ensures extensions/tables/indexes exist on
    # startup. Set FALSE in environments where schema is owned exclusively by
    # Alembic migrations to skip all boot-time DDL.
    DB_BOOTSTRAP_ON_STARTUP = (
        os.getenv("DB_BOOTSTRAP_ON_STARTUP", "TRUE").upper() == "TRUE"
    )
    # Per-session lock_timeout (ms) applied to boot-time DDL so a contended
    # CREATE INDEX / CREATE TABLE fails fast instead of hanging the FastAPI
    # lifespan forever behind another transaction's lock.
    DB_DDL_LOCK_TIMEOUT_MS = int(os.getenv("DB_DDL_LOCK_TIMEOUT_MS", "5000"))
    # Global idle_in_transaction_session_timeout (ms) applied to every pooled
    # connection so an abandoned "idle in transaction" session can't wedge the
    # database indefinitely. 0 disables. Only applied to asyncpg connections.
    DB_IDLE_IN_TX_TIMEOUT_MS = int(os.getenv("DB_IDLE_IN_TX_TIMEOUT_MS", "900000"))
    # Celery / Redis
    # Redis (single endpoint for Celery broker, result backend, and app cache).
    # Legacy CELERY_BROKER_URL / CELERY_RESULT_BACKEND / REDIS_APP_URL still