SurfSense/surfsense_backend/alembic/versions/165_add_chunk_position.py

"""add chunks.position for explicit document order

Incremental re-indexing keeps unchanged chunk rows, so auto-increment ids no
longer reflect document order. The ``position`` column makes that order
explicit and is written by the indexing pipeline for every new or re-indexed
document.

This migration intentionally does NOT backfill historical rows. On a large,
heavily-indexed table (notably a multi-hundred-GB HNSW embedding index) a bulk
UPDATE of every chunk becomes a non-HOT update that rewrites every secondary
index per row -- turning a one-off migration into a multi-day operation.
Instead, existing rows keep ``position = 0`` and therefore order by the
``Chunk.id`` tiebreaker (identical to the pre-feature behavior); new and
re-indexed documents get correct positions from application code, and any
document needing exact order can simply be re-indexed on demand.

Revision ID: 165
Revises: 164
"""

from collections.abc import Sequence

from alembic import op

revision: str = "165"
down_revision: str | None = "164"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None

# Leftover UNLOGGED scratch table from earlier backfill attempts; dropped here
# so re-running this migration converges the schema regardless of past state.
SCRATCH_TABLE = "_chunk_position_backfill"


def upgrade() -> None:
    # Adding a NOT NULL column with a constant default is metadata-only on
    # PostgreSQL 11+, so this is fast even on very large tables. IF NOT EXISTS
    # makes it a no-op where the column already exists.
    op.execute(
        "ALTER TABLE chunks ADD COLUMN IF NOT EXISTS position INTEGER NOT NULL DEFAULT 0;"
    )

    # Clean up the scratch table left behind by the abandoned backfill approach.
    op.execute(f"DROP TABLE IF EXISTS {SCRATCH_TABLE};")


def downgrade() -> None:
    op.execute(f"DROP TABLE IF EXISTS {SCRATCH_TABLE};")
    op.execute("ALTER TABLE chunks DROP COLUMN IF EXISTS position;")
feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00			`"""add chunks.position for explicit document order`

			`Incremental re-indexing keeps unchanged chunk rows, so auto-increment ids no`
$DESKTOP-RTLN3BA\$punk$ feat: add position column to chunks for explicit document order - Introduced a new `position` column in the `chunks` table to maintain explicit document order during re-indexing. - Updated migration to add the column without backfilling historical rows to avoid performance issues on large tables. - Adjusted the `Chunk` model to reflect the new column without indexing, as ordering reads are document-scoped. 2026-06-18 08:55:47 -07:00			longer reflect document order. The ``position`` column makes that order
			`explicit and is written by the indexing pipeline for every new or re-indexed`
			`document.`

			`This migration intentionally does NOT backfill historical rows. On a large,`
			`heavily-indexed table (notably a multi-hundred-GB HNSW embedding index) a bulk`
			`UPDATE of every chunk becomes a non-HOT update that rewrites every secondary`
			`index per row -- turning a one-off migration into a multi-day operation.`
			Instead, existing rows keep ``position = 0`` and therefore order by the
			``Chunk.id`` tiebreaker (identical to the pre-feature behavior); new and
			`re-indexed documents get correct positions from application code, and any`
			`document needing exact order can simply be re-indexed on demand.`
$DESKTOP-RTLN3BA\$punk$ feat(migration): implement chunk position backfill with batched updates and indexing for improved performance 2026-06-16 15:19:56 -07:00
$DESKTOP-RTLN3BA\$punk$ chore(migration): added dead users cleanup 2026-06-16 15:48:17 -07:00			`Revision ID: 165`
			`Revises: 164`
feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00			`"""`

			`from collections.abc import Sequence`

			`from alembic import op`

$DESKTOP-RTLN3BA\$punk$ chore(migration): added dead users cleanup 2026-06-16 15:48:17 -07:00			`revision: str = "165"`
			`down_revision: str \| None = "164"`
feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00			`branch_labels: str \| Sequence[str] \| None = None`
			`depends_on: str \| Sequence[str] \| None = None`

$DESKTOP-RTLN3BA\$punk$ feat: add position column to chunks for explicit document order - Introduced a new `position` column in the `chunks` table to maintain explicit document order during re-indexing. - Updated migration to add the column without backfilling historical rows to avoid performance issues on large tables. - Adjusted the `Chunk` model to reflect the new column without indexing, as ordering reads are document-scoped. 2026-06-18 08:55:47 -07:00			`# Leftover UNLOGGED scratch table from earlier backfill attempts; dropped here`
			`# so re-running this migration converges the schema regardless of past state.`
$DESKTOP-RTLN3BA\$punk$ feat(migration): implement chunk position backfill with batched updates and indexing for improved performance 2026-06-16 15:19:56 -07:00			`SCRATCH_TABLE = "_chunk_position_backfill"`

feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00
			`def upgrade() -> None:`
$DESKTOP-RTLN3BA\$punk$ feat(migration): implement chunk position backfill with batched updates and indexing for improved performance 2026-06-16 15:19:56 -07:00			`# Adding a NOT NULL column with a constant default is metadata-only on`
$DESKTOP-RTLN3BA\$punk$ feat: add position column to chunks for explicit document order - Introduced a new `position` column in the `chunks` table to maintain explicit document order during re-indexing. - Updated migration to add the column without backfilling historical rows to avoid performance issues on large tables. - Adjusted the `Chunk` model to reflect the new column without indexing, as ordering reads are document-scoped. 2026-06-18 08:55:47 -07:00			`# PostgreSQL 11+, so this is fast even on very large tables. IF NOT EXISTS`
			`# makes it a no-op where the column already exists.`
feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00			`op.execute(`
			`"ALTER TABLE chunks ADD COLUMN IF NOT EXISTS position INTEGER NOT NULL DEFAULT 0;"`
			`)`

$DESKTOP-RTLN3BA\$punk$ feat: add position column to chunks for explicit document order - Introduced a new `position` column in the `chunks` table to maintain explicit document order during re-indexing. - Updated migration to add the column without backfilling historical rows to avoid performance issues on large tables. - Adjusted the `Chunk` model to reflect the new column without indexing, as ordering reads are document-scoped. 2026-06-18 08:55:47 -07:00			`# Clean up the scratch table left behind by the abandoned backfill approach.`
			`op.execute(f"DROP TABLE IF EXISTS {SCRATCH_TABLE};")`
feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00

			`def downgrade() -> None:`
$DESKTOP-RTLN3BA\$punk$ feat(migration): implement chunk position backfill with batched updates and indexing for improved performance 2026-06-16 15:19:56 -07:00			`op.execute(f"DROP TABLE IF EXISTS {SCRATCH_TABLE};")`
feat(chunks): add explicit position column with backfill migration Chunk ids stop reflecting document order once incremental re-indexing keeps unchanged rows across edits. Backfill preserves the historical id ordering so behavior is identical on day one. 2026-06-12 18:52:45 +02:00			`op.execute("ALTER TABLE chunks DROP COLUMN IF EXISTS position;")`