rename test_index.py to test_index_document.py

This commit is contained in:
CREDO23 2026-02-25 01:43:21 +02:00
parent 61e50834e6
commit 46c7ccd70b
9 changed files with 4071 additions and 3233 deletions

112
.cursor/skills/tdd/SKILL.md Normal file
View file

@ -0,0 +1,112 @@
---
name: tdd
description: Strict Python TDD workflow using pytest (red-green-refactor). Use when the user wants to build features or fix bugs test-first, mentions "red-green-refactor", wants integration tests, or asks for test-first development.
---
# Test-Driven Development
## Philosophy
**Core principle**: Tests should verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't.
**Good tests** are integration-style: they exercise real code paths through public APIs. They describe _what_ the system does, not _how_ it does it. A good test reads like a specification - "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.
**Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed. If you rename an internal function and tests fail, those tests were testing implementation, not behavior.
See [tests.md](tests.md) for examples and [mocking.md](mocking.md) for mocking guidelines.
## Anti-Pattern: Horizontal Slices
**DO NOT write all tests first, then all implementation.** This is "horizontal slicing" - treating RED as "write all tests" and GREEN as "write all code."
This produces **crap tests**:
- Tests written in bulk test _imagined_ behavior, not _actual_ behavior
- You end up testing the _shape_ of things (data structures, function signatures) rather than user-facing behavior
- Tests become insensitive to real changes - they pass when behavior breaks, fail when behavior is fine
- You outrun your headlights, committing to test structure before understanding the implementation
**Correct approach**: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle. Because you just wrote the code, you know exactly what behavior matters and how to verify it.
```
WRONG (horizontal):
RED: test1, test2, test3, test4, test5
GREEN: impl1, impl2, impl3, impl4, impl5
RIGHT (vertical):
RED→GREEN: test1→impl1
RED→GREEN: test2→impl2
RED→GREEN: test3→impl3
...
```
## Workflow
### 1. Planning
Before writing any code:
- [ ] Confirm with user what interface changes are needed
- [ ] Confirm with user which behaviors to test (prioritize)
- [ ] Identify opportunities for [deep modules](deep-modules.md) (small interface, deep implementation)
- [ ] Design interfaces for [testability](interface-design.md)
- [ ] List the behaviors to test (not implementation steps)
- [ ] Get user approval on the plan
Ask: "What should the public interface look like? Which behaviors are most important to test?"
**You can't test everything.** Confirm with the user exactly which behaviors matter most. Focus testing effort on critical paths and complex logic, not every possible edge case.
### 2. Tracer Bullet
Write ONE test that confirms ONE thing about the system:
```
RED: Write test for first behavior → test fails
GREEN: Write minimal code to pass → test passes
```
This is your tracer bullet - proves the path works end-to-end.
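A minimal sketch of one such cycle (the `slugify` feature and all names are hypothetical):

```python
# RED: write one failing test for the first behavior
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

# GREEN: the minimal implementation that makes exactly this test pass
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")
```

The test exercises only the public function and describes a capability, so it survives any internal rewrite of `slugify`.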
### 3. Incremental Loop
For each remaining behavior:
```
RED: Write next test → fails
GREEN: Minimal code to pass → passes
```
Rules:
- One test at a time
- Only enough code to pass current test
- Don't anticipate future tests
- Keep tests focused on observable behavior
### 4. Refactor
After all tests pass, look for [refactor candidates](refactoring.md):
- [ ] Extract duplication
- [ ] Deepen modules (move complexity behind simple interfaces)
- [ ] Apply SOLID principles where natural
- [ ] Consider what new code reveals about existing code
- [ ] Run tests after each refactor step
**Never refactor while RED.** Get to GREEN first.
## Checklist Per Cycle
```
[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive internal refactor
[ ] Code is minimal for this test
[ ] No speculative features added
```

View file
.cursor/skills/tdd/deep-modules.md Normal file

@ -0,0 +1,33 @@
# Deep Modules
From "A Philosophy of Software Design":
**Deep module** = small interface + lots of implementation
```
┌─────────────────────┐
│ Small Interface │ ← Few methods, simple params
├─────────────────────┤
│ │
│ │
│ Deep Implementation│ ← Complex logic hidden
│ │
│ │
└─────────────────────┘
```
**Shallow module** = large interface + little implementation (avoid)
```
┌─────────────────────────────────┐
│ Large Interface │ ← Many methods, complex params
├─────────────────────────────────┤
│ Thin Implementation │ ← Just passes through
└─────────────────────────────────┘
```
When designing interfaces, ask:
- Can I reduce the number of methods?
- Can I simplify the parameters?
- Can I hide more complexity inside?
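As an illustration, a deep module pairs a tiny interface with non-trivial hidden logic. A sketch (the `SettingsStore` class is hypothetical):

```python
import json
import os
import tempfile

class SettingsStore:
    """Deep module sketch: a two-method interface (get/set) hides file IO,
    JSON serialization, and atomic-write details from every caller."""

    def __init__(self, path: str):
        self._path = path

    def get(self, key: str, default=None):
        if not os.path.exists(self._path):
            return default
        with open(self._path) as f:
            return json.load(f).get(key, default)

    def set(self, key: str, value) -> None:
        data = {}
        if os.path.exists(self._path):
            with open(self._path) as f:
                data = json.load(f)
        data[key] = value
        # atomic write: dump to a temp file, then rename over the original
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self._path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self._path)
```

Callers see two methods with simple parameters; the persistence strategy can change entirely without touching the interface.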

View file
.cursor/skills/tdd/interface-design.md Normal file

@ -0,0 +1,33 @@
# Interface Design for Testability
Good interfaces make testing natural:
1. **Accept dependencies, don't create them**
```python
# Testable: the gateway can be a fake in tests
def process_order(order, payment_gateway):
    return payment_gateway.charge(order.total)

# Hard to test: constructs a real dependency internally
def process_order(order):
    gateway = StripeGateway()
    return gateway.charge(order.total)
```
2. **Return results, don't produce side effects**
```python
# Testable: returns a value the test can assert on directly
def calculate_discount(cart) -> float:
    return cart.total * 0.1

# Hard to test: mutates state instead of returning a result
def apply_discount(cart) -> None:
    cart.total -= cart.total * 0.1
```
3. **Small surface area**
* Fewer methods = fewer tests needed
* Fewer params = simpler test setup

View file
.cursor/skills/tdd/mocking.md Normal file

@ -0,0 +1,69 @@
# When to Mock
Mock at **system boundaries** only:
* External APIs (payment, email, etc.)
* Databases (sometimes - prefer test DB)
* Time/randomness
* File system (sometimes)
Don't mock:
* Your own classes/modules
* Internal collaborators
* Anything you control
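For time, the usual approach is to inject the clock at the boundary rather than patching it globally. A sketch (all names hypothetical):

```python
from datetime import datetime, timedelta, timezone

# `now` is injected, so tests never need to patch datetime
def make_session_token(user_id: str, now: datetime, ttl_minutes: int = 30) -> dict:
    return {"user": user_id, "expires_at": now + timedelta(minutes=ttl_minutes)}

def test_token_expiry_is_deterministic():
    fixed = datetime(2026, 1, 1, tzinfo=timezone.utc)
    token = make_session_token("alice", now=fixed)
    assert token["expires_at"] == fixed + timedelta(minutes=30)
```

Production code passes `datetime.now(timezone.utc)`; tests pass a fixed instant and stay deterministic.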
## Designing for Mockability
At system boundaries, design interfaces that are easy to mock:
**1. Use dependency injection**
Pass external dependencies in rather than creating them internally:
```python
import os
# Easy to mock
def process_payment(order, payment_client):
return payment_client.charge(order.total)
# Hard to mock
def process_payment(order):
client = StripeClient(os.getenv("STRIPE_KEY"))
return client.charge(order.total)
```
**2. Prefer SDK-style interfaces over generic fetchers**
Create specific functions for each external operation instead of one generic function with conditional logic:
```python
import requests
# GOOD: Each function is independently mockable
class UserAPI:
def get_user(self, user_id):
return requests.get(f"/users/{user_id}")
def get_orders(self, user_id):
return requests.get(f"/users/{user_id}/orders")
def create_order(self, data):
return requests.post("/orders", json=data)
# BAD: Mocking requires conditional logic inside the mock
class GenericAPI:
def fetch(self, endpoint, method="GET", data=None):
return requests.request(method, endpoint, json=data)
```
The SDK approach means:
* Each mock returns one specific shape
* No conditional logic in test setup
* Easier to see which endpoints a test exercises
* Type safety per endpoint
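With the SDK style, each mock in a test is a one-line stub with no branching. A sketch using `unittest.mock` (the function under test is hypothetical):

```python
from unittest.mock import MagicMock

def user_order_count(api, user_id):
    # depends only on the SDK-style interface, not on HTTP details
    return len(api.get_orders(user_id))

def test_user_order_count_uses_orders_endpoint():
    api = MagicMock()
    api.get_orders.return_value = [{"id": 1}, {"id": 2}]  # one specific shape
    assert user_order_count(api, "u1") == 2
```

Compare stubbing `GenericAPI.fetch`: the mock would need conditionals on `endpoint` and `method` to serve the same test.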

View file
.cursor/skills/tdd/refactoring.md Normal file

@ -0,0 +1,10 @@
# Refactor Candidates
After TDD cycle, look for:
- **Duplication** → Extract function/class
- **Long methods** → Break into private helpers (keep tests on public interface)
- **Shallow modules** → Combine or deepen
- **Feature envy** → Move logic to where data lives
- **Primitive obsession** → Introduce value objects
- **Existing code** the new code reveals as problematic

View file
.cursor/skills/tdd/tests.md Normal file

@ -0,0 +1,60 @@
# Good and Bad Tests
## Good Tests
**Integration-style**: Test through real interfaces, not mocks of internal parts.
```python
# GOOD: Tests observable behavior
def test_user_can_checkout_with_valid_cart():
cart = create_cart()
cart.add(product)
result = checkout(cart, payment_method)
assert result.status == "confirmed"
```
Characteristics:
* Tests behavior users/callers care about
* Uses public API only
* Survives internal refactors
* Describes WHAT, not HOW
* One logical assertion per test
## Bad Tests
**Implementation-detail tests**: Coupled to internal structure.
```python
# BAD: Tests implementation details
def test_checkout_calls_payment_service_process():
mock_payment = MagicMock()
checkout(cart, mock_payment)
mock_payment.process.assert_called_with(cart.total)
```
Red flags:
* Mocking internal collaborators
* Testing private methods
* Asserting on call counts/order
* Test breaks when refactoring without behavior change
* Test name describes HOW not WHAT
* Verifying through external means instead of interface
```python
# BAD: Bypasses interface to verify
def test_create_user_saves_to_database():
create_user({"name": "Alice"})
row = db.query("SELECT * FROM users WHERE name = ?", ["Alice"])
assert row is not None
# GOOD: Verifies through interface
def test_create_user_makes_user_retrievable():
user = create_user({"name": "Alice"})
retrieved = get_user(user.id)
assert retrieved.name == "Alice"
```

View file

@ -0,0 +1,458 @@
# Indexing Pipeline Refactor — Plan
## Context
The current system has 20+ connector indexers each implementing the same pipeline
independently. This causes:
- `safe_set_chunks`, `get_current_timestamp`, `check_document_by_unique_identifier`
defined twice across `connector_indexers/base.py` and `document_processors/base.py`
- `config.embedding_model_instance.embed()` called in 20+ places directly
- `document.content` stores different things per connector (summary vs raw text)
- `source_markdown` column always NULL — never written by any connector
- GitHub uses custom character chunking, bypassing the shared chunker
- Obsidian calls LLM then discards the result, embeds raw text instead
- `LinearKBSyncService` and `NotionKBSyncService` both say they
"mirror Phase 2 logic exactly" — a third copy of the same pipeline
- No tests anywhere
---
## What We Are Building
### 1. `ConnectorDocument`
The canonical data model. Every adapter produces one. The pipeline consumes one.
Nothing else crosses this boundary.
**File:** `app/indexing_pipeline/connector_document.py`
```
title: str
source_markdown: str # raw content, always set, validated not empty
unique_id: str # source-native key: page_id, issue_id, ts...
document_type: DocumentType
should_summarize: bool = True # drives SUMMARIZE step
content_type: "text" | "code" # drives CHUNK step (chunker vs code_chunker)
metadata: dict = {}
connector_id: int | None = None
search_space_id: int = 0
created_by_id: str = ""
```
### 2. `IndexingPipelineService`
The single pipeline. All connectors use it. Nothing changes inside it
when a new connector is added.
**File:** `app/indexing_pipeline/indexing_pipeline_service.py`
**Public interface:**
```
prepare_for_indexing(connector_documents) → list[(Document, ConnectorDocument)]
index(document, connector_doc, llm) → None
```
**Private steps (one responsibility each):**
```
_compute_unique_identifier_hash(connector_doc) → str
_compute_content_hash(connector_doc) → str
_find_existing_by_identifier(hash) → Document | None # always selectinload chunks
_find_existing_by_content(hash) → Document | None
_create(connector_doc, hash) → Document # creates new DB row
_mark_processing(document) → None # status transition
_mark_ready(document, connector_doc, ...) → None # status transition + full data
_mark_failed(document, error) → None # status transition + error
_summarize(connector_doc, llm) → str
_chunk(connector_doc) → list[str]
_embed_document(summary_content) → list[float]
_embed_chunks(segments) → list[list[float]]
```
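A plausible sketch of the two hash helpers (exactly which fields feed each hash is an assumption; the unit tests in the test plan pin the real contract):

```python
import hashlib

def compute_unique_identifier_hash(unique_id: str, search_space_id: int,
                                   document_type: str) -> str:
    # identity hash: changes if any identity field changes
    key = f"{unique_id}:{search_space_id}:{document_type}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def compute_content_hash(source_markdown: str, search_space_id: int) -> str:
    # space-scoped, so identical content in different spaces never dedups
    key = f"{search_space_id}:{source_markdown}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```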
**What `prepare_for_indexing` orchestrates:**
```
for each ConnectorDocument:
_compute_unique_identifier_hash
_compute_content_hash
_find_existing_by_identifier
→ not found: check content duplicate
→ duplicate: skip
→ new: _create
→ found, content_hash same, title same: skip (unchanged)
→ found, content_hash same, title different: update title inline
→ found, content_hash different: queue for full reprocess
session.commit() once
```
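The branching above can be sketched as a pure decision function (a simplification: the real method also hashes, queries, and commits):

```python
def classify_document(existing, content_hash_same: bool, title_same: bool) -> str:
    """Decide what prepare_for_indexing does with one ConnectorDocument.
    `existing` is the Document found by identifier hash, or None."""
    if existing is None:
        return "create_or_dedup"   # check content duplicate, then _create or skip
    if content_hash_same and title_same:
        return "skip"              # unchanged
    if content_hash_same:
        return "update_title"      # title-only change, updated inline
    return "reprocess"             # content changed, returned for index()
```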
**What `index` orchestrates:**
```
try:
_mark_processing(document), commit
_summarize
_chunk
_embed_document
_embed_chunks
_mark_ready(document, connector_doc, ...)
except Exception as e:
rollback()
session.refresh(document) ← required: rollback expires all attributes
_mark_failed(document, e)
commit()
```
### 3. Adapters
One per connector. Pure static converter. No IO, no DB, no hashing.
**File:** `app/indexing_pipeline/adapters/{connector_name}_adapter.py`
```python
class ClickUpAdapter:
@staticmethod
def to_connector_document(task: dict, connector_id, search_space_id, user_id) -> ConnectorDocument:
...
```
---
## What Moves Into The Service
| Currently in | Moves to |
|---|---|
| `connector_indexers/base.py` | `IndexingPipelineService` |
| `document_processors/base.py` | `IndexingPipelineService` (duplicates removed) |
| `document_converters.py` | `IndexingPipelineService` |
| `config.embedding_model_instance.embed()` (20 places) | `_embed_document`, `_embed_chunks` only |
| `config.chunker_instance.chunk()` (all connectors) | `_chunk` only |
| `config.code_chunker_instance.chunk()` (unused/GitHub) | `_chunk` only |
**Stays outside (adapter concern):**
- `build_document_metadata_string()`
- `build_document_metadata_markdown()`
- `convert_element_to_markdown()`
- `convert_document_to_markdown()`
---
## Implementation Order
### Step 1 — Create `ConnectorDocument`
New file. Nothing references it yet. Zero risk.
### Step 2 — Create `IndexingPipelineService`
New file. Nothing calls it yet. Zero risk.
### Step 3 — Write tests (see test plan below)
Tests call the new service directly. No connector changed yet.
### Step 4 — Migrate one connector (ClickUp — simplest)
- Write `ClickUpAdapter`
- Update `clickup_indexer.py` to use the service
- Keep old code until verified working in testing
### Step 5 — Migrate remaining connectors one by one
Each connector migrated = old Phase 2 code deleted from that indexer.
### Step 6 — Update `LinearKBSyncService` and `NotionKBSyncService`
Replace their Phase 2 reimplementation with a call to `service.index()`.
### Step 7 — Update `composio_indexer.py`
Standalone indexer for Composio connectors (`composio_google_drive_connector.py`,
`composio_google_calendar_connector.py`, `composio_gmail_connector.py`).
Same migration pattern as standard connectors.
### Step 8 — Delete duplicate base functions
Only after all connectors are migrated.
`connector_indexers/base.py` and `document_processors/base.py` — remove duplicates,
keep only connector helpers (`get_connector_by_id`, `calculate_date_range`, etc.)
---
## Session Management Rules
These rules apply to the pipeline implementation. They replace the inconsistent patterns
found across the 20+ existing connectors.
### No `session.no_autoflush`
All connectors used `with session.no_autoflush` before `_find_existing_by_content`.
This is cargo-cult. `session.add(document)` happens AFTER the content_hash check,
so no staged documents exist in the session at check time. Autoflush is harmless.
For same-content same-batch items, autoflush correctly finds the first staged document
and deduplicates. Without `no_autoflush`, same-batch dedup works better, not worse.
### Error handling in `index()`
After a rollback, SQLAlchemy expires all object attributes. Accessing any attribute
on `document` after rollback raises `MissingGreenlet` in async mode.
The correct pattern:
```python
except Exception as e:
await self.session.rollback()
await self.session.refresh(document) # reload from DB before touching attributes
    await self._mark_failed(document, str(e))
await self.session.commit()
```
### `safe_set_chunks` requires `selectinload`
`safe_set_chunks` uses `set_committed_value` which marks old chunks as orphans for
cascade deletion. This only works if the old chunks were loaded into the session first.
`_find_existing_by_identifier` MUST always use `selectinload(Document.chunks)`.
Any call to `_mark_ready` on a document not loaded with `selectinload` will silently
accumulate chunks instead of replacing them.
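A sketch of the required query, assuming SQLAlchemy 2.0 async style (the `Document` model and the `unique_identifier_hash` column name are assumptions from context):

```python
from sqlalchemy import select
from sqlalchemy.orm import selectinload

async def find_existing_by_identifier(session, identifier_hash: str):
    stmt = (
        select(Document)  # Document model assumed from the surrounding codebase
        # eager-load chunks so safe_set_chunks can mark old ones as orphans
        .options(selectinload(Document.chunks))
        .where(Document.unique_identifier_hash == identifier_hash)
    )
    return (await session.execute(stmt)).scalar_one_or_none()
```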
### Phase 1 batch size
Not implemented upfront. If a connector proves to need it during migration, add a
`batch_size` parameter to `prepare_for_indexing` at that point.
### `session.refresh()` is not needed after Phase 1 commit
After Phase 1's commit, `index()` only SETs attributes on the document — it never
READs them. SQLAlchemy write operations do not trigger lazy loading. No refresh needed.
---
## Connector Deviations — Handled at Migration Time
| Connector | Deviation | How handled |
|---|---|---|
| Webcrawler | Content unknown at Phase 1 — fetched during Phase 2 | Call `prepare_for_indexing` per URL after crawling, not in bulk |
| Slack, Discord | No LLM summarization | `should_summarize = False` in `ConnectorDocument` |
| GitHub | Custom 4000-char character chunking | `content_type = "code"` → pipeline routes to `code_chunker_instance` |
| Obsidian | LLM called but result discarded, raw content embedded | `should_summarize = True` → pipeline fixes this correctly |
---
## Testing Philosophy
Two rules drive every decision below:
**1. Test behavior through public interfaces, not implementation details.**
`IndexingPipelineService` exposes two public methods: `prepare_for_indexing` and
`index`. Private methods (`_create`, `_summarize`, `_mark_ready`, etc.) are
implementation details. Testing them directly couples tests to internal structure —
renaming a private method breaks tests even though behavior hasn't changed.
**2. Mock only true external boundaries.**
Mocking internal collaborators (session, chunker) tests your assumptions about
those collaborators, not the actual code. A mock never fails because of a bug in
the real system. The DB is not mocked — a real test DB is used. Only the LLM and
embedding model are mocked: they are genuine external services (cost, latency,
non-determinism, network flakiness).
**Consequence:** most tests are integration tests. Unit tests cover only pure
functions and value objects that have no dependencies.
---
## Test Plan
### Unit tests — pure functions and value objects, no mocks
#### `ConnectorDocument`
| Test | Checks |
|---|---|
| `source_markdown` empty → validation error | rejection of invalid input |
| `source_markdown` whitespace only → validation error | rejection of invalid input |
| Valid document created with defaults | all default values correct |
| `should_summarize=False` accepted | optional flag works |
| `content_type="code"` accepted | optional flag works |
#### `_compute_unique_identifier_hash` + `_compute_content_hash`
| Test | Checks |
|---|---|
| Same input → same hash (both functions) | determinism |
| Different `unique_id` → different hash | isolation by field |
| Different `search_space_id` → different hash | isolation by space |
| Different `document_type` → different hash | isolation by type |
| Same content, different space → different hash | space-scoped content dedup |
| Different content → different hash | content sensitivity |
#### Adapters
Each adapter is a pure static converter. Zero mocks needed.
| Test | Checks |
|---|---|
| `to_connector_document` returns valid `ConnectorDocument` | output type |
| `unique_id` maps to correct source field | field mapping |
| `document_type` is correct enum value | enum correctness |
| `source_markdown` contains expected content | content mapping |
| `should_summarize` set correctly for this connector | flag correctness |
| `content_type` set correctly (code vs text) | flag correctness |
| Empty/missing fields handled gracefully | robustness |
---
### Integration tests — real DB, LLM and embedding model mocked
#### `prepare_for_indexing`
| Test | Checks |
|---|---|
| New document → status=pending in DB, returned | `_create` + hash logic |
| Same `content_hash`, same title → skipped, not returned | unchanged dedup |
| Same `content_hash`, different title → title updated, not returned | title-only update |
| Changed `content_hash` → existing doc returned for reprocessing | update detection |
| Duplicate content hash within batch → skipped | same-batch dedup |
| Multiple documents → one commit for all | batch commit |
#### `index`
| Test | Mocks | Checks |
|---|---|---|
| `document.status = ready` after call | llm, embedding | happy path |
| `document.content = summary` when `should_summarize=True` | llm, embedding | summarization used |
| `document.content = source_markdown` when `should_summarize=False` | embedding | summarization skipped |
| `document.source_markdown` always set | llm, embedding | field always written |
| Chunks written to DB | llm, embedding | chunking + persistence |
| `document.status = processing` set before heavy work | llm (slow), embedding | status progression |
| LLM raises → `document.status = failed` | llm (raises) | error handling |
| After LLM error → not stuck in `processing` | llm (raises) | recovery correctness |
| After LLM error → chunks NOT written | llm (raises) | no partial data |
---
## What Does NOT Need Tests
| Component | Reason |
|---|---|
| Private methods of `IndexingPipelineService` | covered by integration tests on public interface |
| Celery task wiring | orchestration only, no logic |
| External API calls (Notion, Slack, etc.) | external dependency, not our logic |
| `get_user_long_context_llm` | existing service, not our code |
| `calculate_date_range` | already has a `# FIX:` comment — needs its own separate tests |
| Full end-to-end indexer run | too many moving parts, not unit or integration |
---
## Files To Create
```
app/indexing_pipeline/
    connector_document.py            ← ConnectorDocument model
    indexing_pipeline_service.py     ← IndexingPipelineService
    adapters/
        clickup_adapter.py           ← first adapter (pilot)
        notion_adapter.py
        slack_adapter.py
        ...
        google/                      ← created when 2nd Google connector is migrated
            shared.py                ← shared pure helpers (metadata, markdown conversion)
            google_drive_adapter.py
            google_docs_adapter.py
app/clients/                         ← created when 2nd connector of a family is migrated
    google_client.py                 ← shared OAuth + API client for all Google connectors
tests/
    unit/
        indexing_pipeline/
            test_connector_document.py
            test_compute_hashes.py
        adapters/
            test_clickup_adapter.py
            test_notion_adapter.py
            test_slack_adapter.py
    integration/
        indexing_pipeline/
            test_prepare_for_indexing.py
            test_index.py
```
## TDD Build Order
One test → one implementation → repeat. Never write multiple tests before making
the previous one green.
---
### Phase 1 — `ConnectorDocument` (unit)
**File:** `tests/unit/indexing_pipeline/test_connector_document.py`
**Implements:** `app/indexing_pipeline/connector_document.py`
- `test_empty_source_markdown_raises`
- `test_whitespace_only_source_markdown_raises`
- `test_valid_document_created_with_defaults`
- `test_should_summarize_false_accepted`
- `test_content_type_code_accepted`
→ After this phase: add `make_connector_document` factory fixture to `tests/conftest.py`
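That fixture might look like this (a sketch: it assumes pytest, that `ConnectorDocument` is importable in `conftest.py`, and that the default values shown are placeholders):

```python
import pytest

@pytest.fixture
def make_connector_document():
    """Factory fixture: builds valid ConnectorDocuments, letting each test
    override only the fields it cares about."""
    def _make(**overrides):
        defaults = dict(
            title="Test Doc",
            source_markdown="# body",
            unique_id="page-1",
            document_type="NOTION",   # placeholder enum value
            search_space_id=1,
            created_by_id="user-1",
        )
        defaults.update(overrides)
        return ConnectorDocument(**defaults)  # assumed importable
    return _make
```

A test then reads `doc = make_connector_document(should_summarize=False)` and stays focused on the one field under test.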
---
### Phase 2 — Hash functions (unit)
**File:** `tests/unit/indexing_pipeline/test_compute_hashes.py`
**Implements:** `_compute_unique_identifier_hash`, `_compute_content_hash` on `IndexingPipelineService`
- `test_same_input_same_hash`
- `test_different_unique_id_different_hash`
- `test_different_search_space_id_different_hash`
- `test_different_document_type_different_hash`
- `test_same_content_same_space_same_hash`
- `test_same_content_different_space_different_hash`
- `test_different_content_different_hash`
---
### Phase 3 — `prepare_for_indexing` (integration)
**File:** `tests/integration/indexing_pipeline/test_prepare_for_indexing.py`
**Implements:** `prepare_for_indexing` + all private DB methods it uses
- `test_new_document_written_as_pending`
- `test_unchanged_document_skipped`
- `test_title_only_change_updates_title`
- `test_changed_content_queued_for_reprocessing`
- `test_duplicate_content_in_batch_skipped`
- `test_multiple_documents_one_commit`
---
### Phase 4 — `index` (integration)
**File:** `tests/integration/indexing_pipeline/test_index.py`
**Implements:** `index` + all private processing methods it uses
- `test_sets_status_ready`
- `test_content_is_summary_when_should_summarize_true`
- `test_content_is_source_markdown_when_should_summarize_false`
- `test_source_markdown_always_set`
- `test_chunks_written_to_db`
- `test_status_processing_set_before_heavy_work`
- `test_llm_error_sets_status_failed`
- `test_llm_error_document_not_stuck_in_processing`
- `test_llm_error_chunks_not_written`
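The error-path tests can drive the failure through the mocked boundary. A sketch (the `ainvoke` method name and the service signature are assumptions):

```python
from unittest.mock import AsyncMock

async def run_llm_error_case(service, document, connector_doc):
    failing_llm = AsyncMock()
    # awaiting any method on the mock now raises, simulating an LLM outage
    failing_llm.ainvoke.side_effect = RuntimeError("LLM unavailable")
    await service.index(document, connector_doc, failing_llm)  # must not raise
    assert document.status == "failed"   # and never stuck in "processing"
```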
---
### Phase 5 — Adapters (unit, one file per connector)
Start with ClickUp — simplest connector, no special deviations.
**File:** `tests/unit/adapters/test_clickup_adapter.py`
**Implements:** `app/indexing_pipeline/adapters/clickup_adapter.py`
Then repeat for each connector as it is migrated.
---
**Total: 12 unit tests + 15 integration tests = 27 tests** (plus per-adapter unit tests added as each connector is migrated in Phase 5).
---
## Out of Scope (for now)
Document processors (`youtube_processor`, `file_processors`, `markdown_processor`,
`extension_processor`, `circleback_processor`) follow the same 2-phase pattern and
will eventually migrate to the pipeline. Excluded from this refactor — they handle
user uploads, not connector syncs, and their migration is a separate task.

6529
surfsense_backend/uv.lock generated

File diff suppressed because it is too large