docs: remove ADR 0001 (RAG/citation architecture shipped)

The RAG/citation/context redesign in ADR 0001 is implemented and validated (KB + web on the unified [n] citation spine, pull-based retrieval, eager path retired). Drop the ADR and the one stale docstring reference to it.
2026-06-26 21:39:43 +02:00 · 2026-06-25 18:50:58 +02:00 · 232cc937c5
commit 232cc937c5
parent 2beafbdec8
2 changed files with 1 addition and 635 deletions
--- a/docs/adr/0001-rag-citation-and-context-architecture.md
+++ b/docs/adr/0001-rag-citation-and-context-architecture.md
@@ -1,634 +0,0 @@
-# ADR 0001 — RAG, Citation, and Context Architecture
-
- **Status:** Proposed
- **Date:** 2026-06-24
- **Owners:** SurfSense core
- **Supersedes:** the pre-agent KB priority/planner injection path
-
---
-
-## 1. Context & problem
-
-SurfSense answers questions over a user's indexed knowledge base (documents,
-chats, connectors, web results). The current pipeline causes the model to
-**hallucinate citations and answers**. Root causes identified during review:
-
- **Content/ID split.** The model is asked to author or copy complex identifiers
-  (`chunk_id`, raw URLs, free-text titles) that sit far from the content they
-  label. LLMs reliably corrupt nearby digits — so citations point at the wrong
-  source or at nothing.
- **Pre-agent work.** A planner LLM call + embedding + hybrid search runs in
-  `before_agent` on every turn (`KnowledgePriorityMiddleware`), plus an eager
-  `fetch_mentioned_documents` whose chunks are then **discarded**. This adds
-  latency and context noise before the agent even reasons.
- **Mentions are mismanaged.** An `@document` mention forces a wasted full-chunk
-  fetch, points at the doc **twice** (inline backtick path + `<priority_documents>`
-  entry), and still requires a read round-trip — then dumps the **whole** doc
-  regardless of the question.
- **Retrieval quality.** Search retrieves on chunks but collapses to documents,
-  chunks have **no overlap**, and the reranker exists (`RerankerService`) but is
-  **not wired** into the agent path.
- **Context bloat.** The workspace tree (up to 4000 tokens) and priority lists are
-  injected into the durable `messages` list every turn, causing context
-  distraction/confusion.
-
-This ADR defines the target architecture. It is the **single source of truth**;
-implementation issues should reference section numbers here.
-
---
-
-## 2. Principles
-
-1. **The model cites tiny numbers `[n]`, never identifiers.** The server owns the
-   mapping from `[n]` to a real source. There is nothing for the model to invent.
-2. **Retrieval is pull-based, behind tools.** Nothing retrieves before the agent
-   runs. The agent calls a tool when it needs information.
-3. **A mention is scope, not a retrieval trigger.** Mentioning a thing tells the
-   model the thing exists and gives it a filter it *may* apply — it does not fetch.
-4. **Ambient context is not conversation.** Transient per-turn context (tree,
-   mention scope, memory) is rendered via the system prompt, not appended to the
-   durable `messages` trajectory.
-5. **All complexity lives server-side** (resolver, retriever), so the model's job
-   stays trivial: read passages, echo the number next to the one you used.
-
---
-
-## 3. Citation architecture (the spine)
-
-Everything hangs off this. Build it first.
-
-### 3.1 What is citable
-
-Anything that is *information retrieved from a source*. Each source type has a
-natural **citable unit**:
-
-| Source | Citable unit | Entry locator | Enters context via |
-|---|---|---|---|
-| `kb_chunk` | chunk | `document_id` + `chunk_id` | `search_knowledge_base` |
-| `kb_document` | document | `document_id` | `read` (whole doc) |
-| `connector_item` | item | `connector_id` + `external_id` | connector tool |
-| `web_result` | url | `url` | web search / crawl |
-| `chat_turn` | turn | `thread_id` + `message_id` | `@chat` / referenced chat |
-| `anon_chunk` | chunk | `session/doc` + `chunk_id` | uploaded anonymous doc |
-
-**Not citable** (control/pointer — never gets a number): workspace tree, mention
-scope notes, `report_context`, the priority/registry listing itself.
-
-### 3.2 The citation entry (the truth)
-
-A registered entry is the durable identity of a citable unit:
-
-```python
-class CitationEntry(TypedDict):
-    n: int                      # the tiny label shown to the model
-    source_type: str            # "kb_chunk" | "kb_document" | "connector_item"
-                                # | "web_result" | "chat_turn" | "anon_chunk"
-    locator: dict[str, Any]     # source-specific identity (see table 3.1)
-    display: dict[str, Any]     # title, source label, url, date — for the UI pill
-```
-
-### 3.3 The registry (the bookkeeping)
-
-Lives in agent **state** so it survives across turns and across orchestrator +
-subagents.
-
-```python
-class CitationRegistry(TypedDict):
-    by_n: dict[int, CitationEntry]      # n -> entry  (resolve direction)
-    by_key: dict[str, int]              # source_key -> n  (dedup / find-or-create)
-    next_n: int                         # monotonic counter
-```
-
- **`source_key`** is a stable string derived from `(source_type, locator)`, e.g.
-  `"kb_chunk:42:880"`, `"web_result:https://…"`, `"chat_turn:7:1190"`.
- **Numbering is per-conversation and monotonic.** A given `[n]` never changes
-  meaning within a conversation.
- **Dedup:** registering an already-seen unit returns its existing `n`.
-
-### 3.4 The two operations
-
-```python
-def register(registry, source_type, locator, display) -> int:
-    """Find-or-create. Returns the [n] for this unit."""
-    key = make_key(source_type, locator)
-    if key in registry["by_key"]:
-        return registry["by_key"][key]
-    n = registry["next_n"]
-    registry["next_n"] += 1
-    registry["by_n"][n] = {"n": n, "source_type": source_type,
-                           "locator": locator, "display": display}
-    registry["by_key"][key] = n
-    return n
-
-def resolve(registry, n) -> CitationEntry | None:
-    """Map a model-emitted [n] back to its source. Unknown n -> None (drop)."""
-    return registry["by_n"].get(n)
-```
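
The two operations in §3.4 can be exercised end to end with a small, self-contained sketch. The ADR does not define `make_key`, so the version here is hypothetical: it joins the source type with a fixed, per-type field order (`KEY_FIELDS`, also an assumption) so that the keys match the ADR's examples such as `"kb_chunk:42:880"`. Starting the counter at 1 is likewise assumed; the ADR only requires that it be monotonic.

```python
from typing import Any

# Hypothetical per-type field order, chosen to reproduce the ADR's example keys.
KEY_FIELDS: dict[str, tuple[str, ...]] = {
    "kb_chunk": ("document_id", "chunk_id"),
    "web_result": ("url",),
    "chat_turn": ("thread_id", "message_id"),
}


def make_key(source_type: str, locator: dict[str, Any]) -> str:
    """Assumed helper: join the source type with its locator fields."""
    fields = KEY_FIELDS[source_type]
    return ":".join([source_type, *(str(locator[f]) for f in fields)])


def new_registry() -> dict[str, Any]:
    # Starting at 1 is an assumption; only monotonicity is specified.
    return {"by_n": {}, "by_key": {}, "next_n": 1}


def register(registry, source_type, locator, display) -> int:
    """Find-or-create, as in section 3.4: returns the [n] for this unit."""
    key = make_key(source_type, locator)
    if key in registry["by_key"]:
        return registry["by_key"][key]
    n = registry["next_n"]
    registry["next_n"] += 1
    registry["by_n"][n] = {"n": n, "source_type": source_type,
                           "locator": locator, "display": display}
    registry["by_key"][key] = n
    return n


def resolve(registry, n):
    """Unknown n resolves to None, so the citation is dropped."""
    return registry["by_n"].get(n)


registry = new_registry()
chunk = {"document_id": 42, "chunk_id": 880}
first = register(registry, "kb_chunk", chunk, {"title": "Q3 Launch Notes"})
web = register(registry, "web_result", {"url": "https://example.com/a"},
               {"title": "Example"})
# Registering the same unit again reuses its number instead of minting a new one.
repeat = register(registry, "kb_chunk", dict(chunk), {"title": "Q3 Launch Notes"})
```

Because the second registration of the same chunk returns the existing number, a given `[n]` keeps its meaning for the whole conversation, and a number the model invents has no entry to resolve to.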
-
-### 3.5 Lifecycle
-
-```
-source yields item
-   → register(entry)            # source_type + locator + display  → assign/reuse [n]
-   → render passage with [n]    # the number sits INLINE next to the content
-   → model writes "...March 10 [n]"
-   → resolver: [n] → entry      # server-side, on the streamed answer
-   → frontend renders citation pill
-```
-
-The model only ever **echoes** a number that was printed next to the content it
-used. Unknown/garbled numbers resolve to nothing and are dropped (abstention by
-construction).
-
-### 3.6 Presentation format (`<retrieved_context>`)
-
-`[n]` must be the **only** citable integer adjacent to each passage. No
-`chunk 4 of 19`, no raw ids near the text. Grouping by document is allowed; the
-`[n]` is per passage.
-
-```
-<retrieved_context>
-Excerpts retrieved from the user's knowledge base for this query.
-Cite a passage with its [n].
-
-Document: "Q3 Launch Notes" (Slack · #launch · 2026-03-02)
-  [1] We agreed to push launch to March 10.
-  [2] Marketing will be notified next week.
-Document: "Timeline" (Notion · 2026-02-28)
-  [3] Dates floated were Mar 10 and Mar 17.
-</retrieved_context>
-```
-
-### 3.7 Reconciliation with the existing token format
-
-The frontend and evals already parse **`[citation:ID]`**
-(`surfsense_web/lib/citations/citation-parser.ts`,
-`surfsense_evals/src/surfsense_evals/core/parse/citations.py`).
-
-**Decision:** keep the wire token `[citation:ID]` where `ID = n`. The model is
-instructed to emit `[n]`; a thin normalization step rewrites `[n]` →
-`[citation:n]` on the streamed output before it reaches the existing parser, OR
-the model is instructed to emit `[citation:n]` directly. Either way `ID` is now a
-**small ordinal from the registry**, not a `chunk_id`/url/title. The resolver maps
-`n` → `CitationEntry` → the frontend citation object the UI already expects.
-
-> **Decided (§8.8):** the model emits `[n]` (smallest surface for the model to
-> get right); the server normalizes `[n]` → `[citation:n]` before the existing
-> parser.
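
The normalization step decided in §3.7 can be sketched as a single regular-expression pass. This is an illustration only, not the shipped normalizer: the function name `normalize_citations` and the `known` set (the numbers currently in the registry) are assumptions. It rewrites bare `[n]` markers, drops numbers the registry has never issued, and leaves existing `[citation:...]` tokens alone because they do not match the bare-digit pattern.

```python
import re

# Matches only a bare "[digits]" marker; "[citation:...]" tokens do not match.
BARE_MARKER = re.compile(r"\[(\d+)\]")


def normalize_citations(text: str, known: set[int]) -> str:
    """Rewrite [n] to [citation:n]; drop any n that is not registered."""

    def rewrite(match: re.Match[str]) -> str:
        n = int(match.group(1))
        return f"[citation:{n}]" if n in known else ""

    return BARE_MARKER.sub(rewrite, text)


answer = normalize_citations("Launch moved to March 10 [1][2]", {1, 2})
```

One caveat of so simple a pattern: bracketed integers that are not citations (for example an index expression like `arr[0]` quoted in an answer) would be treated as markers too, so a real normalizer would likely need to skip code spans.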
-
---
-
-## 4. Retrieval architecture (pull-based)
-
-### 4.0 Execution channels (verified against the codebase)
-
-The orchestrator (main agent) does **not** own the virtual filesystem. It has a
-small fixed toolset; everything else is delegated via `task(<specialist>, …)`.
-Verified in `main_agent/tools/index.py` and `subagents/builtins/knowledge_base`.
-
-| Capability | Owner | Reached via |
-|---|---|---|
-| `search_knowledge_base(query, scope?)` — semantic/hybrid **RAG retrieval**, read-only | **orchestrator** | direct call |
-| `web_search`, `scrape_webpage` | **orchestrator** | direct call |
-| `update_memory`, `create_automation`, `write_todos`, `task` | **orchestrator** | direct call |
-| virtual filesystem: `read_file`, `write_file`, `edit_file`, `ls`, `glob`, `grep`, `list_tree`, `rm`, `rmdir`, `move_file` | **knowledge_base subagent** | `task(knowledge_base, …)` |
-| connector ops (gmail/slack/jira/…) | **connector subagents** | `task(<connector>, …)` |
-
-Consequences for citations:
-
- The **dominant RAG path is orchestrator-direct** (`search_knowledge_base`), so
-  it registers `[n]` exactly where the answer is composed — **no relay**.
- The **shared registry** (§8.9) is load-bearing only for the **delegated** lanes
-  (whole-doc reads via `knowledge_base`, connector reads): the subagent registers
-  into the shared registry and relays `[n]` upward.
- `search_knowledge_base` is **semantic RAG**, distinct from filesystem search
-  (`grep`/`glob`), which belongs to the subagent. `routing.md` conflates these and
-  omits `search_knowledge_base` from its direct-tools list — that prompt is stale
-  and must be corrected (see §7).
-
-### 4.1 The two retrieval operations
-
-| Operation | Tool | Owner | For |
-|---|---|---|---|
-| **search** | `search_knowledge_base(query, scope?)` → chunks, each registered → `[n]` | orchestrator (direct) | "related / scoped question" — RAG |
-| **read** | `read_file(path)` (whole object) | knowledge_base subagent (`task`) | "summarize / translate / rewrite / navigate this" |
-
-The agent chooses based on the query. No server-side intent classifier; the query
-semantics decide (summarize ⇒ delegate a `read`; related ⇒ direct `search`).
-
-### 4.2 `scope` — the mention→retrieval bridge
-
-`scope` is an **optional typed filter** restricting the search haystack:
-
-```python
-scope = {
-    "document_ids": [42],
-    "folder_ids": [],
-    "connector_ids": [],
-}
-```
-
- Becomes `WHERE` constraints on the chunk search (`document_id IN (...)`, etc.).
- **Agent-controlled, not automatic.** "in this doc" → agent passes scope; "related"
-  → agent omits it.
- Spans only **KB-indexed** references (doc/folder/connector). Chats are **not**
-  KB-indexed (no `CHAT` document type; they live in `NewChatThread` /
-  `NewChatMessage`, not `Document`/`Chunk`), so `@chat` never appears in `scope` —
-  it uses the separate read channel in §5.
- **How it reaches the retriever depends on the channel:**
-  - direct `search_knowledge_base` → `scope` is a **structured tool arg** the
-    orchestrator passes (new arg to add — current tool has no `scope`).
-  - delegated `read` / browse → the orchestrator expresses scope in the **task
-    prompt** (path + ids); the subagent translates it into its filesystem calls.
-
-**Decision:** even when `scope` pins a single doc, `search_knowledge_base` still
-runs full hybrid ranking *within* that doc (a large doc still needs its relevant
-passages surfaced) — it does not return raw chunk order.
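
As a rough sketch of how `scope` might become `WHERE` constraints, the helper below turns the typed filter into a SQL fragment plus positional parameters. Several things here are assumptions rather than anything the ADR specifies: the function name `scope_to_where`, the `folder_id` and `connector_id` column names, the `%s` placeholder style, and the choice to combine the fields with `OR` (treating scope as the union of everything mentioned).

```python
from typing import Optional

# Assumed mapping from scope fields to chunk-search columns.
SCOPE_COLUMNS = {
    "document_ids": "document_id",
    "folder_ids": "folder_id",
    "connector_ids": "connector_id",
}


def scope_to_where(scope: Optional[dict[str, list[int]]]) -> tuple[str, list[int]]:
    """Build a WHERE fragment and its parameters from an optional scope."""
    clauses: list[str] = []
    params: list[int] = []
    for field, column in SCOPE_COLUMNS.items():
        ids = [int(i) for i in (scope or {}).get(field, [])]
        if not ids:
            continue  # an empty list means "no constraint from this field"
        placeholders = ", ".join(["%s"] * len(ids))
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(ids)
    if not clauses:
        return "TRUE", []  # omitted scope: search the whole haystack
    return "(" + " OR ".join(clauses) + ")", params


where, params = scope_to_where(
    {"document_ids": [42], "folder_ids": [], "connector_ids": []}
)
```

The fragment only narrows the candidate set; ranking within it is still the job of the hybrid search and reranker, in line with the decision above.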
-
-### 4.3 Retrieval quality fixes (folded into this work)
-
- Return at **chunk granularity** with stable `chunk_id` (no collapse-to-document
-  that loses the citable unit).
- **Wire the reranker** (`RerankerService`) into the `search_knowledge_base` path.
- **Chunk overlap** in the indexing pipeline (config in `app/config/__init__.py`,
-  `RecursiveChunker` currently has no overlap).
- Add the `scope` arg to `search_knowledge_base`.
-
-### 4.4 End-to-end pipeline
-
-```mermaid
-flowchart TD
-    U["User turn + @mentions"] --> AMB["Mentions → ambient scope note (no fetch)"]
-    AMB --> ORCH{"ORCHESTRATOR reasons"}
-
-    ORCH -- "scoped/related question" --> SKB["search_knowledge_base(query, scope?)<br/>DIRECT · hybrid + rerank"]
-    ORCH -- "public web" --> WEB["web_search / scrape_webpage<br/>DIRECT"]
-    ORCH -- "summarize/read/navigate/mutate" --> TKB["task(knowledge_base, …)<br/>DELEGATE"]
-    ORCH -- "connector op" --> TCN["task(gmail/slack/…)<br/>DELEGATE"]
-
-    SKB --> REGD["register kb_chunk → [n]"]
-    WEB --> REGD2["register web_result → [n]"]
-
-    subgraph SUB["SUBAGENTS (filesystem / connector tools)"]
-        FS["read_file/ls/glob/grep/…"]
-        CN["connector ops"]
-        FS --> REGS["register → [n] (SHARED registry)"]
-        CN --> REGS
-        REGS --> SYN["synthesize + relay [n] up"]
-    end
-
-    TKB --> FS
-    TCN --> CN
-
-    REGD --> COMPOSE["Orchestrator composes answer with [n]"]
-    REGD2 --> COMPOSE
-    SYN --> COMPOSE
-    COMPOSE --> NORM["[n] → [citation:n]"] --> RESOLVE["resolve via shared registry<br/>(unknown → dropped)"] --> UI["Citation pills"]
-```
-
-### 4.5 Tradeoffs: pull vs push (and perceived latency)
-
-We chose **pull** (the agent reads/searches via tools when needed) over **push**
-(eagerly injecting referenced content into context). Rationale and costs:
-
-**Why pull is the default**
-
- Token efficiency — fetch only what the query needs, not whole docs.
- Scales to many/large mentions, folders, connectors — push cannot.
- Intent-adaptive granularity — passages for scoped Qs, whole doc for summaries.
- Context hygiene — content arrives as *evidence* (`[n]`), not ambient noise.
- Uniform across all mention types.
-
-**Costs (and why they're acceptable)**
-
- **Perceived latency (TTFT).** Pull adds a tool round-trip before answer tokens.
-  This is the only place push clearly wins. The mitigation is **progress
-  streaming** (time-to-first-*signal*, not first-*token*): stream "Reading
-  *Q3 Launch Notes*…" / "Searching your knowledge base…" so the wait feels
-  productive — the pattern used by Perplexity, Claude, and Cursor.
-  > **Out of scope for this ADR's rollout.** Progress streaming is a separate
-  > workstream — it touches the streaming subsystem, not the retrieval/citation
-  > path. Tracked as an **after-plan follow-up**. Today intermediate/subagent
-  > steps are largely suppressed (`surfsense:internal`), which is what makes pull
-  > *feel* slow; the follow-up promotes a curated subset of tool/subagent events
-  > to user-visible progress.
- **"Cite-without-read" risk** — neutralized structurally: ambient pointers carry
-  **no `[n]`**; `[n]` exists only after a tool returns evidence; invented `[n]`
-  resolves to nothing and is dropped. The worst residual case degrades from a
-  confident wrong citation to an uncited claim (further guarded by content-free
-  pointers + a "read before you answer" policy line).
- **Delegation synthesis loss** — whole-doc reads go through the KB subagent,
-  which summarizes back; mitigate by instructing it to return quotes + `[n]`.
-
-**Conditional hybrid.** A bounded eager fast-path (inject content only when a
-single *small* doc is mentioned) may be added **later, only if** latency telemetry
-justifies it — not built speculatively.
-
---
-
-## 5. Mention architecture (scope, not trigger)
-
-When the user mentions anything:
-
-1. It is recorded as **ambient scope** in the system prompt (via `dynamic_prompt`
-   + `runtime.context`), e.g.:
-   > Referenced this turn: doc 42 (`/documents/Launch/Q3.xml`), folder 7
-   > (`/documents/Specs/`). For a scoped question call
-   > `search_knowledge_base(query, scope={document_ids:[42]})`; to load the whole
-   > thing delegate `task(knowledge_base, "read /documents/Launch/Q3.xml …")`.
-2. **No fetch, no RAG, no `<priority_documents>` pre-injection.**
-3. The agent decides: direct `search_knowledge_base(query, scope)` (scoped
-   question) or delegated `task(knowledge_base, …)` read (whole-object intent).
-
-References split into **two kinds** by whether the source is searchable:
-
- **Searchable references** (`@document`, `@folder`, `@connector`, anon upload) — the
-  source is KB-indexed, so they become `scope` and are pulled via
-  `search_knowledge_base` / delegated read. Pointer + pull.
- **Read references** (`@chat`) — the source is **not** KB-indexed, so there is
-  nothing to "search". The thread is a finite, user-selected artifact; its turns are
-  loaded directly (access-checked) and citable as `chat_turn`. Pointer + read.
-
-Per mention type (note the channel — direct vs delegated):
-
-| Mention | Ambient note | Retrieval behavior | Citation kind on use |
-|---|---|---|---|
-| `@document` | doc id + path | direct `search_knowledge_base(scope={document_ids:[id]})`, or delegated `task(knowledge_base, read …)` | `kb_chunk` / `kb_document` |
-| `@folder` | folder id + path | direct `search_knowledge_base(scope={folder_ids:[id]})`, or delegated browse | `kb_chunk` |
-| `@connector account` | connector_id + account | `task(<connector>, "… connector_id=id")` | `connector_item` |
-| `@chat` | thread id + title | **on-demand read** (not `scope`): pointer only; model calls `read_chat(thread_id)` when it needs the conversation, reusing the access-checked `referenced_chat_context` resolver | `chat_turn` |
-| anonymous upload | session doc ref | direct `search_knowledge_base(scope=anon)` / delegated read | `anon_chunk` |
-
---
-
-## 6. Context plane separation
-
-| Plane | Carries | Mechanism | Lifetime |
-|---|---|---|---|
-| **Ambient** | workspace tree, mention scope, memory, instructions | system prompt via `dynamic_prompt` + `runtime.context` | per-turn, not persisted in messages |
-| **Evidence** | retrieved passages with `[n]` | tool results / `<retrieved_context>` | enters trajectory when a tool runs |
-| **Trajectory** | user/assistant turns, tool calls | `messages` | durable, checkpointed |
-
-The workspace tree and priority/registry listings move **out** of `messages` into
-the ambient plane.
-
---
-
-## 7. Cleanup (what gets removed/changed)
-
-Remove from the hot path:
-
- `KnowledgePriorityMiddleware` search branch (planner LLM, embedding, hybrid
-  search in `before_agent`). ✅ **Done** — the whole `knowledge_search.py`
-  module is deleted.
- `fetch_mentioned_documents` eager chunk pull.
- `<priority_documents>` pre-injection and `KbContextProjectionMiddleware`
-  priority projection. ✅ **Done** — `<priority_documents>` is no longer
-  produced anywhere; `KbContextProjectionMiddleware` is trimmed to a pure
-  `<workspace_tree>` projector. The `enable_kb_priority_preinjection` flag and
-  every `<priority_documents>` prompt reference are removed.
- `kb_priority` state plumbing (deleted per §8.10; add a dedicated
-  `citation_registry` field instead). ✅ **Done** — `kb_priority` /
-  `KbPriorityEntry` are removed from state + reducers. `kb_matched_chunk_ids`
-  is already gone (build-order Step 5).
-
-Keep / add:
-
- `search_knowledge_base(query, scope?)` (orchestrator-direct) as the **only** RAG
-  entry point, returning registered chunks with `[n]`. Add the `scope` arg.
- `read_file` (knowledge_base subagent, via `task`) for whole-object ops; cited
-  reads register a `kb_document` / `kb_chunk` entry into the shared registry.
- The **citation registry** in state (shared across orchestrator + subagents).
- Reranker wired into `search_knowledge_base`; chunk overlap in indexing.
- Ambient mention note via `dynamic_prompt`.
- **Fix `routing.md`:** add `search_knowledge_base` to the orchestrator's
-  direct-tools list, and clarify that "search inside the workspace goes through
-  `task(knowledge_base)`" refers to **filesystem** search (`grep`/`glob`), not the
-  semantic `search_knowledge_base` tool.
-
---
-
-## 8. Locked decisions
-
-1. Model cites `[n]`; server owns `[n] → source` via a registry. ✅
-2. Numbering is **per-conversation, monotonic, dedup'd** (find-or-create). ✅
-3. Retrieval is pull-based: orchestrator-direct `search_knowledge_base` (RAG) +
-   delegated `read_file` (knowledge_base subagent); no pre-agent retrieval. ✅
-4. Mention = ambient scope; `scope` is an agent-controlled `search_knowledge_base`
-   filter. ✅
-5. Scoped search still runs full hybrid ranking within scope. ✅
-6. Ambient context (tree, mention scope) lives in the system prompt, not `messages`. ✅
-7. Wire token stays `[citation:ID]` with `ID = n`. ✅
-8. **Model emits `[n]`; the server normalizes `[n]` → `[citation:n]`** on the
-   streamed output before the existing parser. The model's surface stays minimal. ✅
-9. **Subagent retrievals register into the same conversation `citation_registry`**,
-   so `[n]` is globally consistent across orchestrator + subagents. This replaces
-   the Channel A/B relay entirely. ✅
-10. **Delete the legacy `kb_priority` / `kb_matched_chunk_ids` plumbing**; add a
-    dedicated `citation_registry` field to state rather than overloading old
-    fields. ✅
-11. **`@chat` is a non-indexed read reference** (chats aren't in `Document`/`Chunk`):
-    pointer only, loaded **on demand** via a `read_chat(thread_id)` tool that reuses
-    the access-checked `referenced_chat_context` resolver and registers each surfaced
-    turn as `chat_turn`. ✅
-12. **One document render for both surfaces.** RAG excerpts
-    (`search_knowledge_base`) and full reads (`read_file`) render through a *single*
-    document renderer — same envelope, same `[n]` contract. Completeness is carried
-    by `view="excerpt"` vs `view="full"`, **not** an `is_complete` boolean and **not**
-    a numeric coverage count: `view="excerpt"` alone tells the model it saw a slice.
-    (A `chunks_shown`/`total_chunks` count was considered and dropped — it never had a
-    total to show for search excerpts, and full reads already say `view="full"`.) Raw
-    ids and `metadata_json` are dropped from the model's view.
-    **No `<chunk_index>` seek table** — a full read returns the whole document as one
-    numbered document block (an index keyed by internal ids gives the agent no actionable
-    signal, and any `[n]`-keyed/preview index adds cognitive load that risks
-    degrading the primary answer). Supersedes the standalone `<retrieved_context>`
-    shape and the removed `is_complete`. See §12. (planned)
-
-## 9. Open items
-
-_All decisions locked (§8). Decision #12 is locked but **not yet built** — see the
-§12 schema and the rollout follow-ups._
-
-## 10. Rollout
-
-### Already built in parallel (committed, not yet wired)
-
-`shared/citations/` (registry, markers, normalizer), `shared/retrieved_context/`
-(renderer), `shared/retrieval/` (hybrid search + rerank + service), hybrid-search
-behavior tests, and the on-contract prompt `base/citation_contract.md`
-(`[n]` / `[1][2]`).
-
-### Two findings that shape the cutover
-
- **The agent is already pull-based by default.** `enable_kb_priority_preinjection`
-  is `False` and `KnowledgePriorityMiddleware` runs `mentions_only=True`; an
-  on-demand `search_knowledge_base` tool already exists. So the cutover *upgrades
-  the existing pull tool to the citation spine* — it does not remove eager RAG
-  (already gated off).
- **The production citation prompt is local to the agent**, at
-  `main_agent/system_prompt/prompts/citations/on.md` (two-channel
-  `[citation:chunk_id]`). The composer's `base/citations_on.md` only serves the
-  anonymous/automation path. Both must learn the `[n]` contract.
-
-### Phased cutover
-
-0. **Registry on state.** Add `citation_registry: CitationRegistry` to
-   `SurfSenseFilesystemState` with a replace reducer; confirm checkpointer
-   round-trip.
-1. **Swap the KB tool.** Rewrite `search_knowledge_base` to call
-   `search_knowledge_base_context` (renders `<retrieved_context>` with `[n]`,
-   mutates the registry) and persist the registry via `Command(update=...)`.
-2. **Normalize `[n]` → `[citation:<payload>]`.** Finalize-time first (rewrite the
-   completed assistant text from the checkpointed registry before DB persist);
-   buffered live-stream normalization is a follow-up. Bare-`[n]` only, so
-   web_search `[citation:url]` markers are untouched.
-3. **Prompt contract (both surfaces).** Update `main_agent/.../citations/on.md`
-   (production) to teach the `[n]` channel alongside the existing web_search/`task`
-   channels; reconcile the composer path by folding `citation_contract.md` into
-   `base/citations_on.md` (then delete `citation_contract.md`). `citations_off.md`
-   stays.
-4. **Mentions → scope.** Map `@document`/`@folder` mentions to
-   `SearchScope(document_ids=…)` for the tool; retire `kb_priority` mention
-   surfacing.
-5. **Remove the old eager path.** ✅ **Done** — `KnowledgePriorityMiddleware`
-   and the old `search_knowledge_base` hybrid helper in `knowledge_search.py`
-   are deleted (the whole module is gone); `kb_context_projection` is trimmed to
-   a tree-only projector (kept because it still projects `<workspace_tree>` to
-   subagents); `kb_priority` state + the `enable_kb_priority_preinjection` flag +
-   all `<priority_documents>` prompt references are removed. Still pending:
-   `ChucksHybridSearchRetriever` (after migrating `ConnectorService`). Migrate
-   `web_search` to register `WEB_RESULT` so all citations unify on `[n]` —
-   **done**, see §12 build-order Step 6.
-
---
-
-## 11. After-plan follow-ups (separate workstreams)
-
-Not part of the §10 rollout — different subsystems, tracked here so they aren't
-lost:
-
- **Progress streaming** (streaming subsystem). Promote a curated subset of
-  tool/subagent events to user-visible progress ("Reading…", "Searching…") to
-  collapse *perceived* latency from pull-based retrieval. See §4.5. This is the
-  mitigation for pull's only real cost, but it touches the streaming pipeline, not
-  the retrieval/citation path — so it ships independently.
-
---
-
-## 12. Unified document render (search + read)
-
-The model meets a knowledge-base document in two moments: as **excerpts** from a
-search, and as a **full read** of one object. Today these use two unrelated
-shapes (compact text for search; `<document_metadata>` + `<chunk_index>` +
-`<chunk id>` XML for reads), with two different citation tokens. That doubles the
-schema the model must learn and is a hallucination surface. We collapse both onto
-**one renderer**.
-
-### Principles
-
- **One envelope, two views.** The same renderer renders a document whether it
-  arrives partial (search) or complete (read). Only the `view` and the set of
-  passages shown differ.
- **`[n]` is the only citable token**, in both views, assigned by the shared
-  registry (find-or-create). A chunk first seen in search keeps its `[n]` when the
-  same doc is later read in full.
- **Completeness is the `view` word, nothing more.** A search result is inherently
-  excerpts; a read is inherently the whole object. No `is_complete` flag, no numeric
-  coverage count. `view="excerpt"` tells the model it saw a slice (so it should read
-  the doc before claiming the doc "only" says X); `view="full"` says it has the whole
-  object. A `chunks_shown`/`total_chunks` count was considered and rejected: search
-  excerpts have no total on hand (and we won't add a count query for it), and full
-  reads are already self-evident from `view`.
- **Drop noise.** Raw `document_id` / `chunk_id` and the `metadata_json` blob
-  leave the model's view (they stay server-side as registry keys). The model
-  sees `title`, `source`, and `[n]` passages.
- **No seek table.** A full read returns the whole document as one numbered
-  document block; the `<chunk_index>` line-range map is dropped. It was keyed by internal
-  `chunk_id` (which the model never sees), so it gave the agent nothing actionable
-  to seek by. Re-keying it to `[n]` or adding chunk previews would only add cognitive
-  load the agent must reconcile against the actual content — a hallucination/quality
-  risk that outweighs the token savings on the rare genuinely-large read. Simpler:
-  hand over the document, numbered, and let the model read it.
-
-### Shape
-
-Excerpt (from `search_knowledge_base`):
-
-```xml
-<document title="Q3 Launch Notes" source="Slack · #launch · 2026-03-02" view="excerpt">
-  [3] We agreed to push launch to March 10.
-  [4] Marketing will be notified next week.
-</document>
-```
-
-Full (from a read):
-
-```xml
-<document title="Q3 Launch Notes" source="Slack · #launch · 2026-03-02" view="full">
-  [3] We agreed to push launch to March 10.
-  [4] Marketing will be notified next week.
-  [7] …
-  …(all chunks, numbered)
-</document>
-```
-
-`<retrieved_context>` becomes simply "N documents in excerpt view"; a read is
-"one document in full view". This supersedes the standalone `<retrieved_context>`
-renderer decision and confirms the earlier removal of `is_complete`.
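
The "one envelope, two views" shape can be illustrated with a minimal renderer. This is a sketch under stated assumptions, not the project's renderer: `render_document` and its signature are hypothetical, passages are passed as `(n, text)` pairs already numbered by the registry, and passage text is emitted unescaped to mirror the examples above.

```python
from xml.sax.saxutils import quoteattr


def render_document(title: str, source: str, view: str,
                    passages: list[tuple[int, str]]) -> str:
    """Render one document envelope; only `view` and the passages vary."""
    if view not in ("excerpt", "full"):
        raise ValueError(f"unknown view: {view!r}")
    # quoteattr escapes &, <, > and picks safe surrounding quotes.
    head = (f"<document title={quoteattr(title)} "
            f"source={quoteattr(source)} view={quoteattr(view)}>")
    body = [f"  [{n}] {text}" for n, text in passages]
    return "\n".join([head, *body, "</document>"])


excerpt = render_document(
    "Q3 Launch Notes",
    "Slack · #launch · 2026-03-02",
    "excerpt",
    [(3, "We agreed to push launch to March 10."),
     (4, "Marketing will be notified next week.")],
)
```

A full read would call the same function with `view="full"` and every chunk of the document, so a chunk first seen in a search keeps its `[n]` when the document is later read in full.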
-
-### Build order (one step at a time)
-
-1. **Registry merge reducer** — `citation_registry` merges (find-or-create union,
-   re-mint on collision) instead of replacing, so parent/subagent (and parallel)
-   registrations stay globally consistent. Pure; independently testable. ✅
-2. **One document renderer** with a `view` parameter; point `search_knowledge_base`
-   at it (excerpt view), replacing today's `retrieved_context` renderer. ✅
-3. **Register-on-read + full view** — the KB read path registers its chunks and
-   renders through the same renderer (full view); the whole document is returned
-   numbered, with **no `<chunk_index>`**. The `read_file` tool loads the document
-   via `KBPostgresBackend.aload_document`, renders it against the conversation
-   registry, and persists `citation_registry`; `build_document_xml` is deleted. ✅
-4. **Retire Channel C** — now that KB reads emit `[n]` (Step 3), the
-   knowledge_base read/specialist path cites bare `[n]` instead of
-   `[citation:chunk_id]`. The KB subagent prompts (cloud/desktop, full/read-only)
-   and `description_readonly.md` were rewritten to the `<document view="full">`
-   `[n]` format, the `evidence.chunk_ids` field became `evidence.citations`, and
-   `citations/on.md` folds the KB relay into Channel A (preserve `[n]` from a
-   specialist verbatim). Channel C is **narrowed, not deleted**: it still covers
-   `task` specialists that emit `[citation:id]` — today only the deliverables
-   `knowledge_base` tool, which builds its own `<chunk id>` XML and is not yet on
-   the registry/`[n]` spine. Migrating that tool (and then fully deleting
-   Channel C) is a follow-up. ✅
-5. **Delete `kb_matched_chunk_ids`** — with no seek table and no `matched` flag, the
-   search→read highlighting hand-off has no consumer. Removed: the state field
-   (`filesystem_state.py`) and its reducer default (`reducers.py`); the
-   `search_knowledge_base` tool's `_matched_chunk_ids` writer; the dead
-   `KnowledgePriorityMiddleware` writes plus the `matched_chunk_ids` return of
-   `_materialize_priority` (`knowledge_search.py`); and the stale
-   `<chunk_index>` / `matched="true"` / `<chunk id>` rendering prose in the cloud
-   filesystem prompt (`cloud.py`), rewritten to the `<document view="full">` `[n]`
-   read format. The `resolver.py` docstring reference was dropped and the two
-   integration assertions that read the field now assert scope confinement via the
-   rendered `<retrieved_context>` titles. (The retriever-layer `matched_chunk_ids`
-   in `chunks_hybrid_search.py` is a separate output shape and is untouched.) ✅
-6. **Web onto the registry (Channel B → A)** — `web_search` now registers each
-   result as a `WEB_RESULT` (locator `{url}`) and renders a `<web_results>` block
-   of `<document view="excerpt">` blocks with `[n]` labels, returning a
-   `Command(update={messages, citation_registry})` like `search_knowledge_base`.
-   `markers.py` already maps `WEB_RESULT → url`, so `[n]` resolves end-to-end with
-   no frontend change. To enable this, the renderer was generalized: a
-   `RenderablePassage` now carries a generic `locator: dict` (KB fills
-   `{document_id, chunk_id}`; web fills `{url}`) instead of fixed KB fields, and a
-   dedicated **citation-state middleware** declares the `citation_registry` channel
-   for the `research` subagent (which doesn't use the filesystem state). The two
-   duplicate `web_search` implementations were collapsed into the shared
-   `app/agents/chat/shared/tools/web_search.py`; the `research` copy was deleted.
-   Prompts updated: `citations/on.md` drops the web channel (web is now Channel A
-   `[n]`; only the legacy `[citation:id]` specialist relay remains, relabelled
-   Channel B), the research subagent prompt cites `[n]`, the main `web_search`
-   description teaches `<web_results>`/`[n]`, `off.md` suppresses `[n]` too, and
-   stale `<chunk_index>`/`[citation:chunk_id]` references in `dynamic_context` and
-   the grok/openai_codex provider hints were corrected to `[n]`. `scrape_webpage`
-   stays uncited (raw page text, no `[n]`) — a fact from a scrape reports its URL
-   instead. Connectors and chat turns remain unmigrated (future workstreams). ✅
--- a/surfsense_backend/tests/unit/agents/multi_agent_chat/shared/citations/test_registry.py
+++ b/surfsense_backend/tests/unit/agents/multi_agent_chat/shared/citations/test_registry.py
@@ -1,4 +1,4 @@
-"""Unit tests for the citation registry spine (ADR 0001 §3)."""
+"""Unit tests for the citation registry spine."""

 from __future__ import annotations