docs: remove ADR 0001 (RAG/citation architecture shipped)

The RAG/citation/context redesign in ADR 0001 is implemented and validated
(KB + web on the unified [n] citation spine, pull-based retrieval, eager path
retired). Drop the ADR and the one stale docstring reference to it.
This commit is contained in:
CREDO23 2026-06-25 18:50:58 +02:00
parent 2beafbdec8
commit 232cc937c5
2 changed files with 1 additions and 635 deletions

View file

@ -1,634 +0,0 @@
# ADR 0001 — RAG, Citation, and Context Architecture
- **Status:** Proposed
- **Date:** 2026-06-24
- **Owners:** SurfSense core
- **Supersedes:** the pre-agent KB priority/planner injection path
---
## 1. Context & problem
SurfSense answers questions over a user's indexed knowledge base (documents,
chats, connectors, web results). The current pipeline causes the model to
**hallucinate citations and answers**. Root causes identified during review:
- **Content/ID split.** The model is asked to author or copy complex identifiers
(`chunk_id`, raw URLs, free-text titles) that sit far from the content they
label. LLMs reliably corrupt nearby digits — so citations point at the wrong
source or at nothing.
- **Pre-agent work.** A planner LLM call + embedding + hybrid search runs in
`before_agent` on every turn (`KnowledgePriorityMiddleware`), plus an eager
`fetch_mentioned_documents` whose chunks are then **discarded**. This adds
latency and context noise before the agent even reasons.
- **Mentions are mismanaged.** An `@document` mention forces a wasted full-chunk
fetch, points at the doc **twice** (inline backtick path + `<priority_documents>`
entry), and still requires a read round-trip — then dumps the **whole** doc
regardless of the question.
- **Retrieval quality.** Search retrieves on chunks but collapses to documents,
chunks have **no overlap**, and the reranker exists (`RerankerService`) but is
**not wired** into the agent path.
- **Context bloat.** The workspace tree (up to 4000 tokens) and priority lists are
injected into the durable `messages` list every turn, causing context
distraction/confusion.
This ADR defines the target architecture. It is the **single source of truth**;
implementation issues should reference section numbers here.
---
## 2. Principles
1. **The model cites tiny numbers `[n]`, never identifiers.** The server owns the
mapping from `[n]` to a real source. There is nothing for the model to invent.
2. **Retrieval is pull-based, behind tools.** Nothing retrieves before the agent
runs. The agent calls a tool when it needs information.
3. **A mention is scope, not a retrieval trigger.** Mentioning a thing tells the
model the thing exists and gives it a filter it *may* apply — it does not fetch.
4. **Ambient context is not conversation.** Transient per-turn context (tree,
mention scope, memory) is rendered via the system prompt, not appended to the
durable `messages` trajectory.
5. **All complexity lives server-side** (resolver, retriever), so the model's job
stays trivial: read passages, echo the number next to the one you used.
---
## 3. Citation architecture (the spine)
Everything hangs off this. Build it first.
### 3.1 What is citable
Anything that is *information retrieved from a source*. Each source type has a
natural **citable unit**:
| Source | Citable unit | Entry locator | Enters context via |
|---|---|---|---|
| `kb_chunk` | chunk | `document_id` + `chunk_id` | `search_knowledge_base` |
| `kb_document` | document | `document_id` | `read` (whole doc) |
| `connector_item` | item | `connector_id` + `external_id` | connector tool |
| `web_result` | url | `url` | web search / crawl |
| `chat_turn` | turn | `thread_id` + `message_id` | `@chat` / referenced chat |
| `anon_chunk` | chunk | `session/doc` + `chunk_id` | uploaded anonymous doc |
**Not citable** (control/pointer — never gets a number): workspace tree, mention
scope notes, `report_context`, the priority/registry listing itself.
### 3.2 The citation entry (the truth)
A registered entry is the durable identity of a citable unit:
```python
class CitationEntry(TypedDict):
n: int # the tiny label shown to the model
source_type: str # "kb_chunk" | "kb_document" | "connector_item"
# | "web_result" | "chat_turn" | "anon_chunk"
locator: dict[str, Any] # source-specific identity (see table 3.1)
display: dict[str, Any] # title, source label, url, date — for the UI pill
```
### 3.3 The registry (the bookkeeping)
Lives in agent **state** so it survives across turns and across orchestrator +
subagents.
```python
class CitationRegistry(TypedDict):
by_n: dict[int, CitationEntry] # n -> entry (resolve direction)
by_key: dict[str, int] # source_key -> n (dedup / find-or-create)
next_n: int # monotonic counter
```
- **`source_key`** is a stable string derived from `(source_type, locator)`, e.g.
`"kb_chunk:42:880"`, `"web_result:https://…"`, `"chat_turn:7:1190"`.
- **Numbering is per-conversation and monotonic.** A given `[n]` never changes
meaning within a conversation.
- **Dedup:** registering an already-seen unit returns its existing `n`.
### 3.4 The two operations
```python
def register(registry, source_type, locator, display) -> int:
"""Find-or-create. Returns the [n] for this unit."""
key = make_key(source_type, locator)
if key in registry["by_key"]:
return registry["by_key"][key]
n = registry["next_n"]
registry["next_n"] += 1
registry["by_n"][n] = {"n": n, "source_type": source_type,
"locator": locator, "display": display}
registry["by_key"][key] = n
return n
def resolve(registry, n) -> CitationEntry | None:
"""Map a model-emitted [n] back to its source. Unknown n -> None (drop)."""
return registry["by_n"].get(n)
```
### 3.5 Lifecycle
```
source yields item
→ register(entry) # source_type + locator + display → assign/reuse [n]
→ render passage with [n] # the number sits INLINE next to the content
→ model writes "...March 10 [n]"
→ resolver: [n] → entry # server-side, on the streamed answer
→ frontend renders citation pill
```
The model only ever **echoes** a number that was printed next to the content it
used. Unknown/garbled numbers resolve to nothing and are dropped (abstention by
construction).
### 3.6 Presentation format (`<retrieved_context>`)
`[n]` must be the **only** citable integer adjacent to each passage. No
`chunk 4 of 19`, no raw ids near the text. Grouping by document is allowed; the
`[n]` is per passage.
```
<retrieved_context>
Excerpts retrieved from the user's knowledge base for this query.
Cite a passage with its [n].
Document: "Q3 Launch Notes" (Slack · #launch · 2026-03-02)
[1] We agreed to push launch to March 10.
[2] Marketing will be notified next week.
Document: "Timeline" (Notion · 2026-02-28)
[3] Dates floated were Mar 10 and Mar 17.
</retrieved_context>
```
### 3.7 Reconciliation with the existing token format
The frontend and evals already parse **`[citation:ID]`**
(`surfsense_web/lib/citations/citation-parser.ts`,
`surfsense_evals/src/surfsense_evals/core/parse/citations.py`).
**Decision:** keep the wire token `[citation:ID]` where `ID = n`. The model is
instructed to emit `[n]`; a thin normalization step rewrites `[n]`
`[citation:n]` on the streamed output before it reaches the existing parser, OR
the model is instructed to emit `[citation:n]` directly. Either way `ID` is now a
**small ordinal from the registry**, not a `chunk_id`/url/title. The resolver maps
`n``CitationEntry` → the frontend citation object the UI already expects.
> **Decided (§8.8):** the model emits `[n]` (smallest surface for the model to
> get right); the server normalizes `[n]``[citation:n]` before the existing
> parser.
---
## 4. Retrieval architecture (pull-based)
### 4.0 Execution channels (verified against the codebase)
The orchestrator (main agent) does **not** own the virtual filesystem. It has a
small fixed toolset; everything else is delegated via `task(<specialist>, …)`.
Verified in `main_agent/tools/index.py` and `subagents/builtins/knowledge_base`.
| Capability | Owner | Reached via |
|---|---|---|
| `search_knowledge_base(query, scope?)` — semantic/hybrid **RAG retrieval**, read-only | **orchestrator** | direct call |
| `web_search`, `scrape_webpage` | **orchestrator** | direct call |
| `update_memory`, `create_automation`, `write_todos`, `task` | **orchestrator** | direct call |
| virtual filesystem: `read_file`, `write_file`, `edit_file`, `ls`, `glob`, `grep`, `list_tree`, `rm`, `rmdir`, `move_file` | **knowledge_base subagent** | `task(knowledge_base, …)` |
| connector ops (gmail/slack/jira/…) | **connector subagents** | `task(<connector>, …)` |
Consequences for citations:
- The **dominant RAG path is orchestrator-direct** (`search_knowledge_base`), so
it registers `[n]` exactly where the answer is composed — **no relay**.
- The **shared registry** (§8.9) is load-bearing only for the **delegated** lanes
(whole-doc reads via `knowledge_base`, connector reads): the subagent registers
into the shared registry and relays `[n]` upward.
- `search_knowledge_base` is **semantic RAG**, distinct from filesystem search
(`grep`/`glob`), which belongs to the subagent. `routing.md` conflates these and
omits `search_knowledge_base` from its direct-tools list — that prompt is stale
and must be corrected (see §7).
### 4.1 The two retrieval operations
| Operation | Tool | Owner | For |
|---|---|---|---|
| **search** | `search_knowledge_base(query, scope?)` → chunks, each registered → `[n]` | orchestrator (direct) | "related / scoped question" — RAG |
| **read** | `read_file(path)` (whole object) | knowledge_base subagent (`task`) | "summarize / translate / rewrite / navigate this" |
The agent chooses based on the query. No server-side intent classifier; the query
semantics decide (summarize ⇒ delegate a `read`; related ⇒ direct `search`).
### 4.2 `scope` — the mention→retrieval bridge
`scope` is an **optional typed filter** restricting the search haystack:
```python
scope = {
"document_ids": [42],
"folder_ids": [],
"connector_ids": [],
}
```
- Becomes `WHERE` constraints on the chunk search (`document_id IN (...)`, etc.).
- **Agent-controlled, not automatic.** "in this doc" → agent passes scope; "related"
→ agent omits it.
- Spans only **KB-indexed** references (doc/folder/connector). Chats are **not**
KB-indexed (no `CHAT` document type; they live in `NewChatThread` /
`NewChatMessage`, not `Document`/`Chunk`), so `@chat` never appears in `scope`
it uses the separate read channel in §5.
- **How it reaches the retriever depends on the channel:**
- direct `search_knowledge_base``scope` is a **structured tool arg** the
orchestrator passes (new arg to add — current tool has no `scope`).
- delegated `read` / browse → the orchestrator expresses scope in the **task
prompt** (path + ids); the subagent translates it into its filesystem calls.
**Decision:** even when `scope` pins a single doc, `search_knowledge_base` still
runs full hybrid ranking *within* that doc (a large doc still needs its relevant
passages surfaced) — it does not return raw chunk order.
### 4.3 Retrieval quality fixes (folded into this work)
- Return at **chunk granularity** with stable `chunk_id` (no collapse-to-document
that loses the citable unit).
- **Wire the reranker** (`RerankerService`) into the `search_knowledge_base` path.
- **Chunk overlap** in the indexing pipeline (config in `app/config/__init__.py`,
`RecursiveChunker` currently has no overlap).
- Add the `scope` arg to `search_knowledge_base`.
### 4.4 End-to-end pipeline
```mermaid
flowchart TD
U["User turn + @mentions"] --> AMB["Mentions → ambient scope note (no fetch)"]
AMB --> ORCH{"ORCHESTRATOR reasons"}
ORCH -- "scoped/related question" --> SKB["search_knowledge_base(query, scope?)<br/>DIRECT · hybrid + rerank"]
ORCH -- "public web" --> WEB["web_search / scrape_webpage<br/>DIRECT"]
ORCH -- "summarize/read/navigate/mutate" --> TKB["task(knowledge_base, …)<br/>DELEGATE"]
ORCH -- "connector op" --> TCN["task(gmail/slack/…)<br/>DELEGATE"]
SKB --> REGD["register kb_chunk → [n]"]
WEB --> REGD2["register web_result → [n]"]
subgraph SUB["SUBAGENTS (filesystem / connector tools)"]
FS["read_file/ls/glob/grep/…"]
CN["connector ops"]
FS --> REGS["register → [n] (SHARED registry)"]
CN --> REGS
REGS --> SYN["synthesize + relay [n] up"]
end
TKB --> FS
TCN --> CN
REGD --> COMPOSE["Orchestrator composes answer with [n]"]
REGD2 --> COMPOSE
SYN --> COMPOSE
COMPOSE --> NORM["[n] → [citation:n]"] --> RESOLVE["resolve via shared registry<br/>(unknown → dropped)"] --> UI["Citation pills"]
```
### 4.5 Tradeoffs: pull vs push (and perceived latency)
We chose **pull** (the agent reads/searches via tools when needed) over **push**
(eagerly injecting referenced content into context). Rationale and costs:
**Why pull is the default**
- Token efficiency — fetch only what the query needs, not whole docs.
- Scales to many/large mentions, folders, connectors — push cannot.
- Intent-adaptive granularity — passages for scoped Qs, whole doc for summaries.
- Context hygiene — content arrives as *evidence* (`[n]`), not ambient noise.
- Uniform across all mention types.
**Costs (and why they're acceptable)**
- **Perceived latency (TTFT).** Pull adds a tool round-trip before answer tokens.
This is the only place push clearly wins. The mitigation is **progress
streaming** (time-to-first-*signal*, not first-*token*): stream "Reading
*Q3 Launch Notes*…" / "Searching your knowledge base…" so the wait feels
productive — the pattern used by Perplexity, Claude, and Cursor.
> **Out of scope for this ADR's rollout.** Progress streaming is a separate
> workstream — it touches the streaming subsystem, not the retrieval/citation
> path. Tracked as an **after-plan follow-up**. Today intermediate/subagent
> steps are largely suppressed (`surfsense:internal`), which is what makes pull
> *feel* slow; the follow-up promotes a curated subset of tool/subagent events
> to user-visible progress.
- **"Cite-without-read" risk** — neutralized structurally: ambient pointers carry
**no `[n]`**; `[n]` exists only after a tool returns evidence; invented `[n]`
resolves to nothing and is dropped. The worst residual case degrades from a
confident wrong citation to an uncited claim (further guarded by content-free
pointers + a "read before you answer" policy line).
- **Delegation synthesis loss** — whole-doc reads go through the KB subagent,
which summarizes back; mitigate by instructing it to return quotes + `[n]`.
**Conditional hybrid.** A bounded eager fast-path (inject content only when a
single *small* doc is mentioned) may be added **later, only if** latency telemetry
justifies it — not built speculatively.
---
## 5. Mention architecture (scope, not trigger)
When the user mentions anything:
1. It is recorded as **ambient scope** in the system prompt (via `dynamic_prompt`
+ `runtime.context`), e.g.:
> Referenced this turn: doc 42 (`/documents/Launch/Q3.xml`), folder 7
> (`/documents/Specs/`). For a scoped question call
> `search_knowledge_base(query, scope={document_ids:[42]})`; to load the whole
> thing delegate `task(knowledge_base, "read /documents/Launch/Q3.xml …")`.
2. **No fetch, no RAG, no `<priority_documents>` pre-injection.**
3. The agent decides: direct `search_knowledge_base(query, scope)` (scoped
question) or delegated `task(knowledge_base, …)` read (whole-object intent).
References split into **two kinds** by whether the source is searchable:
- **Searchable references** (`@document`, `@folder`, `@connector`, anon upload) — the
source is KB-indexed, so they become `scope` and are pulled via
`search_knowledge_base` / delegated read. Pointer + pull.
- **Read references** (`@chat`) — the source is **not** KB-indexed, so there is
nothing to "search". The thread is a finite, user-selected artifact; its turns are
loaded directly (access-checked) and citable as `chat_turn`. Pointer + read.
Per mention type (note the channel — direct vs delegated):
| Mention | Ambient note | Retrieval behavior | Citation kind on use |
|---|---|---|---|
| `@document` | doc id + path | direct `search_knowledge_base(scope={document_ids:[id]})`, or delegated `task(knowledge_base, read …)` | `kb_chunk` / `kb_document` |
| `@folder` | folder id + path | direct `search_knowledge_base(scope={folder_ids:[id]})`, or delegated browse | `kb_chunk` |
| `@connector account` | connector_id + account | `task(<connector>, "… connector_id=id")` | `connector_item` |
| `@chat` | thread id + title | **on-demand read** (not `scope`): pointer only; model calls `read_chat(thread_id)` when it needs the conversation, reusing the access-checked `referenced_chat_context` resolver | `chat_turn` |
| anonymous upload | session doc ref | direct `search_knowledge_base(scope=anon)` / delegated read | `anon_chunk` |
---
## 6. Context plane separation
| Plane | Carries | Mechanism | Lifetime |
|---|---|---|---|
| **Ambient** | workspace tree, mention scope, memory, instructions | system prompt via `dynamic_prompt` + `runtime.context` | per-turn, not persisted in messages |
| **Evidence** | retrieved passages with `[n]` | tool results / `<retrieved_context>` | enters trajectory when a tool runs |
| **Trajectory** | user/assistant turns, tool calls | `messages` | durable, checkpointed |
The workspace tree and priority/registry listings move **out** of `messages` into
the ambient plane.
---
## 7. Cleanup (what gets removed/changed)
Remove from the hot path:
- `KnowledgePriorityMiddleware` search branch (planner LLM, embedding, hybrid
search in `before_agent`). ✅ **Done** — the whole `knowledge_search.py`
module is deleted.
- `fetch_mentioned_documents` eager chunk pull.
- `<priority_documents>` pre-injection and `KbContextProjectionMiddleware`
priority projection. ✅ **Done**`<priority_documents>` is no longer
produced anywhere; `KbContextProjectionMiddleware` is trimmed to a pure
`<workspace_tree>` projector. The `enable_kb_priority_preinjection` flag and
every `<priority_documents>` prompt reference are removed.
- `kb_priority` state plumbing (deleted per §8.10; add a dedicated
`citation_registry` field instead). ✅ **Done**`kb_priority` /
`KbPriorityEntry` are removed from state + reducers. `kb_matched_chunk_ids`
is already gone (build-order Step 5).
Keep / add:
- `search_knowledge_base(query, scope?)` (orchestrator-direct) as the **only** RAG
entry point, returning registered chunks with `[n]`. Add the `scope` arg.
- `read_file` (knowledge_base subagent, via `task`) for whole-object ops; cited
reads register a `kb_document` / `kb_chunk` entry into the shared registry.
- The **citation registry** in state (shared across orchestrator + subagents).
- Reranker wired into `search_knowledge_base`; chunk overlap in indexing.
- Ambient mention note via `dynamic_prompt`.
- **Fix `routing.md`:** add `search_knowledge_base` to the orchestrator's
direct-tools list, and clarify that "search inside the workspace goes through
`task(knowledge_base)`" refers to **filesystem** search (`grep`/`glob`), not the
semantic `search_knowledge_base` tool.
---
## 8. Locked decisions
1. Model cites `[n]`; server owns `[n] → source` via a registry. ✅
2. Numbering is **per-conversation, monotonic, dedup'd** (find-or-create). ✅
3. Retrieval is pull-based: orchestrator-direct `search_knowledge_base` (RAG) +
delegated `read_file` (knowledge_base subagent); no pre-agent retrieval. ✅
4. Mention = ambient scope; `scope` is an agent-controlled `search_knowledge_base`
filter. ✅
5. Scoped search still runs full hybrid ranking within scope. ✅
6. Ambient context (tree, mention scope) lives in the system prompt, not `messages`. ✅
7. Wire token stays `[citation:ID]` with `ID = n`. ✅
8. **Model emits `[n]`; the server normalizes `[n]` → `[citation:n]`** on the
streamed output before the existing parser. The model's surface stays minimal. ✅
9. **Subagent retrievals register into the same conversation `citation_registry`**,
so `[n]` is globally consistent across orchestrator + subagents. This replaces
the Channel A/B relay entirely. ✅
10. **Delete the legacy `kb_priority` / `kb_matched_chunk_ids` plumbing**; add a
dedicated `citation_registry` field to state rather than overloading old
fields. ✅
11. **`@chat` is a non-indexed read reference** (chats aren't in `Document`/`Chunk`):
pointer only, loaded **on demand** via a `read_chat(thread_id)` tool that reuses
the access-checked `referenced_chat_context` resolver and registers each surfaced
turn as `chat_turn`. ✅
12. **One document render for both surfaces.** RAG excerpts
(`search_knowledge_base`) and full reads (`read_file`) render through a *single*
document renderer — same envelope, same `[n]` contract. Completeness is carried
by `view="excerpt"` vs `view="full"`, **not** an `is_complete` boolean and **not**
a numeric coverage count: `view="excerpt"` alone tells the model it saw a slice.
(A `chunks_shown`/`total_chunks` count was considered and dropped — it never had a
total to show for search excerpts, and full reads already say `view="full"`.) Raw
ids and `metadata_json` are dropped from the model's view.
**No `<chunk_index>` seek table** — a full read returns the whole document as one
numbered document block (an index keyed by internal ids gives the agent no actionable
signal, and any `[n]`-keyed/preview index adds cognitive load that risks
degrading the primary answer). Supersedes the standalone `<retrieved_context>`
shape and the removed `is_complete`. See §12. (planned)
## 9. Open items
_All decisions locked (§8). Decision #12 is locked but **not yet built** — see the
§12 schema and the rollout follow-ups._
## 10. Rollout
### Already built in parallel (committed, not yet wired)
`shared/citations/` (registry, markers, normalizer), `shared/retrieved_context/`
(renderer), `shared/retrieval/` (hybrid search + rerank + service), hybrid-search
behavior tests, and the on-contract prompt `base/citation_contract.md`
(`[n]` / `[1][2]`).
### Two findings that shape the cutover
- **The agent is already pull-based by default.** `enable_kb_priority_preinjection`
is `False` and `KnowledgePriorityMiddleware` runs `mentions_only=True`; an
on-demand `search_knowledge_base` tool already exists. So the cutover *upgrades
the existing pull tool to the citation spine* — it does not remove eager RAG
(already gated off).
- **The production citation prompt is local to the agent**, at
`main_agent/system_prompt/prompts/citations/on.md` (two-channel
`[citation:chunk_id]`). The composer's `base/citations_on.md` only serves the
anonymous/automation path. Both must learn the `[n]` contract.
### Phased cutover
0. **Registry on state.** Add `citation_registry: CitationRegistry` to
`SurfSenseFilesystemState` with a replace reducer; confirm checkpointer
round-trip.
1. **Swap the KB tool.** Rewrite `search_knowledge_base` to call
`search_knowledge_base_context` (renders `<retrieved_context>` with `[n]`,
mutates the registry) and persist the registry via `Command(update=...)`.
2. **Normalize `[n]` → `[citation:<payload>]`.** Finalize-time first (rewrite the
completed assistant text from the checkpointed registry before DB persist);
buffered live-stream normalization is a follow-up. Bare-`[n]` only, so
web_search `[citation:url]` markers are untouched.
3. **Prompt contract (both surfaces).** Update `main_agent/.../citations/on.md`
(production) to teach the `[n]` channel alongside the existing web_search/`task`
channels; reconcile the composer path by folding `citation_contract.md` into
`base/citations_on.md` (then delete `citation_contract.md`). `citations_off.md`
stays.
4. **Mentions → scope.** Map `@document`/`@folder` mentions to
`SearchScope(document_ids=…)` for the tool; retire `kb_priority` mention
surfacing.
5. **Remove the old eager path.****Done**`KnowledgePriorityMiddleware`
and the old `search_knowledge_base` hybrid helper in `knowledge_search.py`
are deleted (the whole module is gone); `kb_context_projection` is trimmed to
a tree-only projector (kept because it still projects `<workspace_tree>` to
subagents); `kb_priority` state + the `enable_kb_priority_preinjection` flag +
all `<priority_documents>` prompt references are removed. Still pending:
`ChucksHybridSearchRetriever` (after migrating `ConnectorService`). Migrate
`web_search` to register `WEB_RESULT` so all citations unify on `[n]`
**done**, see §12 build-order Step 6.
---
## 11. After-plan follow-ups (separate workstreams)
Not part of the §10 rollout — different subsystems, tracked here so they aren't
lost:
- **Progress streaming** (streaming subsystem). Promote a curated subset of
tool/subagent events to user-visible progress ("Reading…", "Searching…") to
collapse *perceived* latency from pull-based retrieval. See §4.5. This is the
mitigation for pull's only real cost, but it touches the streaming pipeline, not
the retrieval/citation path — so it ships independently.
---
## 12. Unified document render (search + read)
The model meets a knowledge-base document in two moments: as **excerpts** from a
search, and as a **full read** of one object. Today these use two unrelated
shapes (compact text for search; `<document_metadata>` + `<chunk_index>` +
`<chunk id>` XML for reads), with two different citation tokens. That doubles the
schema the model must learn and is a hallucination surface. We collapse both onto
**one renderer**.
### Principles
- **One envelope, two views.** The same renderer renders a document whether it
arrives partial (search) or complete (read). Only the `view` and the set of
passages shown differ.
- **`[n]` is the only citable token**, in both views, assigned by the shared
registry (find-or-create). A chunk first seen in search keeps its `[n]` when the
same doc is later read in full.
- **Completeness is the `view` word, nothing more.** A search result is inherently
excerpts; a read is inherently the whole object. No `is_complete` flag, no numeric
coverage count. `view="excerpt"` tells the model it saw a slice (so it should read
the doc before claiming the doc "only" says X); `view="full"` says it has the whole
object. A `chunks_shown`/`total_chunks` count was considered and rejected: search
excerpts have no total on hand (and we won't add a count query for it), and full
reads are already self-evident from `view`.
- **Drop noise.** Raw `document_id` / `chunk_id` and the `metadata_json` blob
leave the model's view (they stay server-side as registry keys). The model
sees `title`, `source`, and `[n]` passages.
- **No seek table.** A full read returns the whole document as one numbered
document block; the `<chunk_index>` line-range map is dropped. It was keyed by internal
`chunk_id` (which the model never sees), so it gave the agent nothing actionable
to seek by. Re-keying it to `[n]` or adding chunk previews would only add cognitive
load the agent must reconcile against the actual content — a hallucination/quality
risk that outweighs the token savings on the rare genuinely-large read. Simpler:
hand over the document, numbered, and let the model read it.
### Shape
Excerpt (from `search_knowledge_base`):
```xml
<document title="Q3 Launch Notes" source="Slack · #launch · 2026-03-02" view="excerpt">
[3] We agreed to push launch to March 10.
[4] Marketing will be notified next week.
</document>
```
Full (from a read):
```xml
<document title="Q3 Launch Notes" source="Slack · #launch · 2026-03-02" view="full">
[3] We agreed to push launch to March 10.
[4] Marketing will be notified next week.
[7] …
…(all chunks, numbered)
</document>
```
`<retrieved_context>` becomes simply "N documents in excerpt view"; a read is
"one document in full view". This supersedes the standalone `<retrieved_context>`
renderer decision and confirms the earlier removal of `is_complete`.
### Build order (one step at a time)
1. **Registry merge reducer**`citation_registry` merges (find-or-create union,
re-mint on collision) instead of replacing, so parent/subagent (and parallel)
registrations stay globally consistent. Pure; independently testable. ✅
2. **One document renderer** with a `view` parameter; point `search_knowledge_base`
at it (excerpt view), replacing today's `retrieved_context` renderer. ✅
3. **Register-on-read + full view** — the KB read path registers its chunks and
renders through the same renderer (full view); the whole document is returned
numbered, with **no `<chunk_index>`**. The `read_file` tool loads the document
via `KBPostgresBackend.aload_document`, renders it against the conversation
registry, and persists `citation_registry`; `build_document_xml` is deleted. ✅
4. **Retire Channel C** — now that KB reads emit `[n]` (Step 3), the
knowledge_base read/specialist path cites bare `[n]` instead of
`[citation:chunk_id]`. The KB subagent prompts (cloud/desktop, full/read-only)
and `description_readonly.md` were rewritten to the `<document view="full">`
`[n]` format, the `evidence.chunk_ids` field became `evidence.citations`, and
`citations/on.md` folds the KB relay into Channel A (preserve `[n]` from a
specialist verbatim). Channel C is **narrowed, not deleted**: it still covers
`task` specialists that emit `[citation:id]` — today only the deliverables
`knowledge_base` tool, which builds its own `<chunk id>` XML and is not yet on
the registry/`[n]` spine. Migrating that tool (and then fully deleting
Channel C) is a follow-up. ✅
5. **Delete `kb_matched_chunk_ids`** — with no seek table and no `matched` flag, the
search→read highlighting hand-off has no consumer. Removed: the state field
(`filesystem_state.py`) and its reducer default (`reducers.py`); the
`search_knowledge_base` tool's `_matched_chunk_ids` writer; the dead
`KnowledgePriorityMiddleware` writes plus the `matched_chunk_ids` return of
`_materialize_priority` (`knowledge_search.py`); and the stale
`<chunk_index>` / `matched="true"` / `<chunk id>` rendering prose in the cloud
filesystem prompt (`cloud.py`), rewritten to the `<document view="full">` `[n]`
read format. The `resolver.py` docstring reference was dropped and the two
integration assertions that read the field now assert scope confinement via the
rendered `<retrieved_context>` titles. (The retriever-layer `matched_chunk_ids`
in `chunks_hybrid_search.py` is a separate output shape and is untouched.) ✅
6. **Web onto the registry (Channel B → A)**`web_search` now registers each
result as a `WEB_RESULT` (locator `{url}`) and renders a `<web_results>` block
of `<document view="excerpt">` blocks with `[n]` labels, returning a
`Command(update={messages, citation_registry})` like `search_knowledge_base`.
`markers.py` already maps `WEB_RESULT → url`, so `[n]` resolves end-to-end with
no frontend change. To enable this, the renderer was generalized: a
`RenderablePassage` now carries a generic `locator: dict` (KB fills
`{document_id, chunk_id}`; web fills `{url}`) instead of fixed KB fields, and a
dedicated **citation-state middleware** declares the `citation_registry` channel
for the `research` subagent (which doesn't use the filesystem state). The two
duplicate `web_search` implementations were collapsed into the shared
`app/agents/chat/shared/tools/web_search.py`; the `research` copy was deleted.
Prompts updated: `citations/on.md` drops the web channel (web is now Channel A
`[n]`; only the legacy `[citation:id]` specialist relay remains, relabelled
Channel B), the research subagent prompt cites `[n]`, the main `web_search`
description teaches `<web_results>`/`[n]`, `off.md` suppresses `[n]` too, and
stale `<chunk_index>`/`[citation:chunk_id]` references in `dynamic_context` and
the grok/openai_codex provider hints were corrected to `[n]`. `scrape_webpage`
stays uncited (raw page text, no `[n]`) — a fact from a scrape reports its URL
instead. Connectors and chat turns remain unmigrated (future workstreams). ✅

View file

@ -1,4 +1,4 @@
"""Unit tests for the citation registry spine (ADR 0001 §3)."""
"""Unit tests for the citation registry spine."""
from __future__ import annotations