The pull-based KB design (on-demand search_knowledge_base tool + pre-injected workspace tree) fully replaced the old eager retrieval path. Remove its last remnants: - Delete KnowledgePriorityMiddleware (knowledge_search.py) and its tests. - Drop the kb_priority state field + reducer default; trim KbContextProjectionMiddleware to project only workspace_tree_text. - Remove the now-dead feature flags enable_kb_priority_preinjection and enable_kb_planner_runnable across backend (flags, route schema, tests, env examples) and frontend (settings toggle, zod schema). - Scrub <priority_documents> and stale KnowledgePriorityMiddleware references from prompts, docstrings, and the ADR. No functional change: nothing wrote kb_priority and neither flag gated live behavior after the cutover. Full backend suite green (pre-existing unrelated failures aside).
32 KiB
ADR 0001 — RAG, Citation, and Context Architecture
- Status: Proposed
- Date: 2026-06-24
- Owners: SurfSense core
- Supersedes: the pre-agent KB priority/planner injection path
1. Context & problem
SurfSense answers questions over a user's indexed knowledge base (documents, chats, connectors, web results). The current pipeline causes the model to hallucinate citations and answers. Root causes identified during review:
- Content/ID split. The model is asked to author or copy complex identifiers
(
chunk_id, raw URLs, free-text titles) that sit far from the content they label. LLMs reliably corrupt nearby digits — so citations point at the wrong source or at nothing. - Pre-agent work. A planner LLM call + embedding + hybrid search runs in
before_agenton every turn (KnowledgePriorityMiddleware), plus an eagerfetch_mentioned_documentswhose chunks are then discarded. This adds latency and context noise before the agent even reasons. - Mentions are mismanaged. An
@documentmention forces a wasted full-chunk fetch, points at the doc twice (inline backtick path +<priority_documents>entry), and still requires a read round-trip — then dumps the whole doc regardless of the question. - Retrieval quality. Search retrieves on chunks but collapses to documents,
chunks have no overlap, and the reranker exists (
RerankerService) but is not wired into the agent path. - Context bloat. The workspace tree (up to 4000 tokens) and priority lists are
injected into the durable
messageslist every turn, causing context distraction/confusion.
This ADR defines the target architecture. It is the single source of truth; implementation issues should reference section numbers here.
2. Principles
- The model cites tiny numbers
[n], never identifiers. The server owns the mapping from[n]to a real source. There is nothing for the model to invent. - Retrieval is pull-based, behind tools. Nothing retrieves before the agent runs. The agent calls a tool when it needs information.
- A mention is scope, not a retrieval trigger. Mentioning a thing tells the model the thing exists and gives it a filter it may apply — it does not fetch.
- Ambient context is not conversation. Transient per-turn context (tree,
mention scope, memory) is rendered via the system prompt, not appended to the
durable
messagestrajectory. - All complexity lives server-side (resolver, retriever), so the model's job stays trivial: read passages, echo the number next to the one you used.
3. Citation architecture (the spine)
Everything hangs off this. Build it first.
3.1 What is citable
Anything that is information retrieved from a source. Each source type has a natural citable unit:
| Source | Citable unit | Entry locator | Enters context via |
|---|---|---|---|
kb_chunk |
chunk | document_id + chunk_id |
search_knowledge_base |
kb_document |
document | document_id |
read (whole doc) |
connector_item |
item | connector_id + external_id |
connector tool |
web_result |
url | url |
web search / crawl |
chat_turn |
turn | thread_id + message_id |
@chat / referenced chat |
anon_chunk |
chunk | session/doc + chunk_id |
uploaded anonymous doc |
Not citable (control/pointer — never gets a number): workspace tree, mention
scope notes, report_context, the priority/registry listing itself.
3.2 The citation entry (the truth)
A registered entry is the durable identity of a citable unit:
class CitationEntry(TypedDict):
n: int # the tiny label shown to the model
source_type: str # "kb_chunk" | "kb_document" | "connector_item"
# | "web_result" | "chat_turn" | "anon_chunk"
locator: dict[str, Any] # source-specific identity (see table 3.1)
display: dict[str, Any] # title, source label, url, date — for the UI pill
3.3 The registry (the bookkeeping)
Lives in agent state so it survives across turns and across orchestrator + subagents.
class CitationRegistry(TypedDict):
by_n: dict[int, CitationEntry] # n -> entry (resolve direction)
by_key: dict[str, int] # source_key -> n (dedup / find-or-create)
next_n: int # monotonic counter
source_keyis a stable string derived from(source_type, locator), e.g."kb_chunk:42:880","web_result:https://…","chat_turn:7:1190".- Numbering is per-conversation and monotonic. A given
[n]never changes meaning within a conversation. - Dedup: registering an already-seen unit returns its existing
n.
3.4 The two operations
def register(registry, source_type, locator, display) -> int:
"""Find-or-create. Returns the [n] for this unit."""
key = make_key(source_type, locator)
if key in registry["by_key"]:
return registry["by_key"][key]
n = registry["next_n"]
registry["next_n"] += 1
registry["by_n"][n] = {"n": n, "source_type": source_type,
"locator": locator, "display": display}
registry["by_key"][key] = n
return n
def resolve(registry, n) -> CitationEntry | None:
"""Map a model-emitted [n] back to its source. Unknown n -> None (drop)."""
return registry["by_n"].get(n)
3.5 Lifecycle
source yields item
→ register(entry) # source_type + locator + display → assign/reuse [n]
→ render passage with [n] # the number sits INLINE next to the content
→ model writes "...March 10 [n]"
→ resolver: [n] → entry # server-side, on the streamed answer
→ frontend renders citation pill
The model only ever echoes a number that was printed next to the content it used. Unknown/garbled numbers resolve to nothing and are dropped (abstention by construction).
3.6 Presentation format (<retrieved_context>)
[n] must be the only citable integer adjacent to each passage. No
chunk 4 of 19, no raw ids near the text. Grouping by document is allowed; the
[n] is per passage.
<retrieved_context>
Excerpts retrieved from the user's knowledge base for this query.
Cite a passage with its [n].
Document: "Q3 Launch Notes" (Slack · #launch · 2026-03-02)
[1] We agreed to push launch to March 10.
[2] Marketing will be notified next week.
Document: "Timeline" (Notion · 2026-02-28)
[3] Dates floated were Mar 10 and Mar 17.
</retrieved_context>
3.7 Reconciliation with the existing token format
The frontend and evals already parse [citation:ID]
(surfsense_web/lib/citations/citation-parser.ts,
surfsense_evals/src/surfsense_evals/core/parse/citations.py).
Decision: keep the wire token [citation:ID] where ID = n. The model is
instructed to emit [n]; a thin normalization step rewrites [n] →
[citation:n] on the streamed output before it reaches the existing parser, OR
the model is instructed to emit [citation:n] directly. Either way ID is now a
small ordinal from the registry, not a chunk_id/url/title. The resolver maps
n → CitationEntry → the frontend citation object the UI already expects.
Decided (§8.8): the model emits
[n](smallest surface for the model to get right); the server normalizes[n]→[citation:n]before the existing parser.
4. Retrieval architecture (pull-based)
4.0 Execution channels (verified against the codebase)
The orchestrator (main agent) does not own the virtual filesystem. It has a
small fixed toolset; everything else is delegated via task(<specialist>, …).
Verified in main_agent/tools/index.py and subagents/builtins/knowledge_base.
| Capability | Owner | Reached via |
|---|---|---|
search_knowledge_base(query, scope?) — semantic/hybrid RAG retrieval, read-only |
orchestrator | direct call |
web_search, scrape_webpage |
orchestrator | direct call |
update_memory, create_automation, write_todos, task |
orchestrator | direct call |
virtual filesystem: read_file, write_file, edit_file, ls, glob, grep, list_tree, rm, rmdir, move_file |
knowledge_base subagent | task(knowledge_base, …) |
| connector ops (gmail/slack/jira/…) | connector subagents | task(<connector>, …) |
Consequences for citations:
- The dominant RAG path is orchestrator-direct (
search_knowledge_base), so it registers[n]exactly where the answer is composed — no relay. - The shared registry (§8.9) is load-bearing only for the delegated lanes
(whole-doc reads via
knowledge_base, connector reads): the subagent registers into the shared registry and relays[n]upward. search_knowledge_baseis semantic RAG, distinct from filesystem search (grep/glob), which belongs to the subagent.routing.mdconflates these and omitssearch_knowledge_basefrom its direct-tools list — that prompt is stale and must be corrected (see §7).
4.1 The two retrieval operations
| Operation | Tool | Owner | For |
|---|---|---|---|
| search | search_knowledge_base(query, scope?) → chunks, each registered → [n] |
orchestrator (direct) | "related / scoped question" — RAG |
| read | read_file(path) (whole object) |
knowledge_base subagent (task) |
"summarize / translate / rewrite / navigate this" |
The agent chooses based on the query. No server-side intent classifier; the query
semantics decide (summarize ⇒ delegate a read; related ⇒ direct search).
4.2 scope — the mention→retrieval bridge
scope is an optional typed filter restricting the search haystack:
scope = {
"document_ids": [42],
"folder_ids": [],
"connector_ids": [],
}
- Becomes
WHEREconstraints on the chunk search (document_id IN (...), etc.). - Agent-controlled, not automatic. "in this doc" → agent passes scope; "related" → agent omits it.
- Spans only KB-indexed references (doc/folder/connector). Chats are not
KB-indexed (no
CHATdocument type; they live inNewChatThread/NewChatMessage, notDocument/Chunk), so@chatnever appears inscope— it uses the separate read channel in §5. - How it reaches the retriever depends on the channel:
- direct
search_knowledge_base→scopeis a structured tool arg the orchestrator passes (new arg to add — current tool has noscope). - delegated
read/ browse → the orchestrator expresses scope in the task prompt (path + ids); the subagent translates it into its filesystem calls.
- direct
Decision: even when scope pins a single doc, search_knowledge_base still
runs full hybrid ranking within that doc (a large doc still needs its relevant
passages surfaced) — it does not return raw chunk order.
4.3 Retrieval quality fixes (folded into this work)
- Return at chunk granularity with stable
chunk_id(no collapse-to-document that loses the citable unit). - Wire the reranker (
RerankerService) into thesearch_knowledge_basepath. - Chunk overlap in the indexing pipeline (config in
app/config/__init__.py,RecursiveChunkercurrently has no overlap). - Add the
scopearg tosearch_knowledge_base.
4.4 End-to-end pipeline
flowchart TD
U["User turn + @mentions"] --> AMB["Mentions → ambient scope note (no fetch)"]
AMB --> ORCH{"ORCHESTRATOR reasons"}
ORCH -- "scoped/related question" --> SKB["search_knowledge_base(query, scope?)<br/>DIRECT · hybrid + rerank"]
ORCH -- "public web" --> WEB["web_search / scrape_webpage<br/>DIRECT"]
ORCH -- "summarize/read/navigate/mutate" --> TKB["task(knowledge_base, …)<br/>DELEGATE"]
ORCH -- "connector op" --> TCN["task(gmail/slack/…)<br/>DELEGATE"]
SKB --> REGD["register kb_chunk → [n]"]
WEB --> REGD2["register web_result → [n]"]
subgraph SUB["SUBAGENTS (filesystem / connector tools)"]
FS["read_file/ls/glob/grep/…"]
CN["connector ops"]
FS --> REGS["register → [n] (SHARED registry)"]
CN --> REGS
REGS --> SYN["synthesize + relay [n] up"]
end
TKB --> FS
TCN --> CN
REGD --> COMPOSE["Orchestrator composes answer with [n]"]
REGD2 --> COMPOSE
SYN --> COMPOSE
COMPOSE --> NORM["[n] → [citation:n]"] --> RESOLVE["resolve via shared registry<br/>(unknown → dropped)"] --> UI["Citation pills"]
4.5 Tradeoffs: pull vs push (and perceived latency)
We chose pull (the agent reads/searches via tools when needed) over push (eagerly injecting referenced content into context). Rationale and costs:
Why pull is the default
- Token efficiency — fetch only what the query needs, not whole docs.
- Scales to many/large mentions, folders, connectors — push cannot.
- Intent-adaptive granularity — passages for scoped Qs, whole doc for summaries.
- Context hygiene — content arrives as evidence (
[n]), not ambient noise. - Uniform across all mention types.
Costs (and why they're acceptable)
- Perceived latency (TTFT). Pull adds a tool round-trip before answer tokens.
This is the only place push clearly wins. The mitigation is progress
streaming (time-to-first-signal, not first-token): stream "Reading
Q3 Launch Notes…" / "Searching your knowledge base…" so the wait feels
productive — the pattern used by Perplexity, Claude, and Cursor.
Out of scope for this ADR's rollout. Progress streaming is a separate workstream — it touches the streaming subsystem, not the retrieval/citation path. Tracked as an after-plan follow-up. Today intermediate/subagent steps are largely suppressed (
surfsense:internal), which is what makes pull feel slow; the follow-up promotes a curated subset of tool/subagent events to user-visible progress. - "Cite-without-read" risk — neutralized structurally: ambient pointers carry
no
[n];[n]exists only after a tool returns evidence; invented[n]resolves to nothing and is dropped. The worst residual case degrades from a confident wrong citation to an uncited claim (further guarded by content-free pointers + a "read before you answer" policy line). - Delegation synthesis loss — whole-doc reads go through the KB subagent,
which summarizes back; mitigate by instructing it to return quotes +
[n].
Conditional hybrid. A bounded eager fast-path (inject content only when a single small doc is mentioned) may be added later, only if latency telemetry justifies it — not built speculatively.
5. Mention architecture (scope, not trigger)
When the user mentions anything:
- It is recorded as ambient scope in the system prompt (via
dynamic_promptruntime.context), e.g.:
Referenced this turn: doc 42 (
/documents/Launch/Q3.xml), folder 7 (/documents/Specs/). For a scoped question callsearch_knowledge_base(query, scope={document_ids:[42]}); to load the whole thing delegatetask(knowledge_base, "read /documents/Launch/Q3.xml …"). - No fetch, no RAG, no
<priority_documents>pre-injection. - The agent decides: direct
search_knowledge_base(query, scope)(scoped question) or delegatedtask(knowledge_base, …)read (whole-object intent).
References split into two kinds by whether the source is searchable:
- Searchable references (
@document,@folder,@connector, anon upload) — the source is KB-indexed, so they becomescopeand are pulled viasearch_knowledge_base/ delegated read. Pointer + pull. - Read references (
@chat) — the source is not KB-indexed, so there is nothing to "search". The thread is a finite, user-selected artifact; its turns are loaded directly (access-checked) and citable aschat_turn. Pointer + read.
Per mention type (note the channel — direct vs delegated):
| Mention | Ambient note | Retrieval behavior | Citation kind on use |
|---|---|---|---|
@document |
doc id + path | direct search_knowledge_base(scope={document_ids:[id]}), or delegated task(knowledge_base, read …) |
kb_chunk / kb_document |
@folder |
folder id + path | direct search_knowledge_base(scope={folder_ids:[id]}), or delegated browse |
kb_chunk |
@connector account |
connector_id + account | task(<connector>, "… connector_id=id") |
connector_item |
@chat |
thread id + title | on-demand read (not scope): pointer only; model calls read_chat(thread_id) when it needs the conversation, reusing the access-checked referenced_chat_context resolver |
chat_turn |
| anonymous upload | session doc ref | direct search_knowledge_base(scope=anon) / delegated read |
anon_chunk |
6. Context plane separation
| Plane | Carries | Mechanism | Lifetime |
|---|---|---|---|
| Ambient | workspace tree, mention scope, memory, instructions | system prompt via dynamic_prompt + runtime.context |
per-turn, not persisted in messages |
| Evidence | retrieved passages with [n] |
tool results / <retrieved_context> |
enters trajectory when a tool runs |
| Trajectory | user/assistant turns, tool calls | messages |
durable, checkpointed |
The workspace tree and priority/registry listings move out of messages into
the ambient plane.
7. Cleanup (what gets removed/changed)
Remove from the hot path:
KnowledgePriorityMiddlewaresearch branch (planner LLM, embedding, hybrid search inbefore_agent). ✅ Done — the wholeknowledge_search.pymodule is deleted.fetch_mentioned_documentseager chunk pull.<priority_documents>pre-injection andKbContextProjectionMiddlewarepriority projection. ✅ Done —<priority_documents>is no longer produced anywhere;KbContextProjectionMiddlewareis trimmed to a pure<workspace_tree>projector. Theenable_kb_priority_preinjectionflag and every<priority_documents>prompt reference are removed.kb_prioritystate plumbing (deleted per §8.10; add a dedicatedcitation_registryfield instead). ✅ Done —kb_priority/KbPriorityEntryare removed from state + reducers.kb_matched_chunk_idsis already gone (build-order Step 5).
Keep / add:
search_knowledge_base(query, scope?)(orchestrator-direct) as the only RAG entry point, returning registered chunks with[n]. Add thescopearg.read_file(knowledge_base subagent, viatask) for whole-object ops; cited reads register akb_document/kb_chunkentry into the shared registry.- The citation registry in state (shared across orchestrator + subagents).
- Reranker wired into
search_knowledge_base; chunk overlap in indexing. - Ambient mention note via
dynamic_prompt. - Fix
routing.md: addsearch_knowledge_baseto the orchestrator's direct-tools list, and clarify that "search inside the workspace goes throughtask(knowledge_base)" refers to filesystem search (grep/glob), not the semanticsearch_knowledge_basetool.
8. Locked decisions
- Model cites
[n]; server owns[n] → sourcevia a registry. ✅ - Numbering is per-conversation, monotonic, dedup'd (find-or-create). ✅
- Retrieval is pull-based: orchestrator-direct
search_knowledge_base(RAG) + delegatedread_file(knowledge_base subagent); no pre-agent retrieval. ✅ - Mention = ambient scope;
scopeis an agent-controlledsearch_knowledge_basefilter. ✅ - Scoped search still runs full hybrid ranking within scope. ✅
- Ambient context (tree, mention scope) lives in the system prompt, not
messages. ✅ - Wire token stays
[citation:ID]withID = n. ✅ - Model emits
[n]; the server normalizes[n]→[citation:n]on the streamed output before the existing parser. The model's surface stays minimal. ✅ - Subagent retrievals register into the same conversation
citation_registry, so[n]is globally consistent across orchestrator + subagents. This replaces the Channel A/B relay entirely. ✅ - Delete the legacy
kb_priority/kb_matched_chunk_idsplumbing; add a dedicatedcitation_registryfield to state rather than overloading old fields. ✅ @chatis a non-indexed read reference (chats aren't inDocument/Chunk): pointer only, loaded on demand via aread_chat(thread_id)tool that reuses the access-checkedreferenced_chat_contextresolver and registers each surfaced turn aschat_turn. ✅- One document render for both surfaces. RAG excerpts
(
search_knowledge_base) and full reads (read_file) render through a single document renderer — same envelope, same[n]contract. Completeness is carried byview="excerpt"vsview="full", not anis_completeboolean and not a numeric coverage count:view="excerpt"alone tells the model it saw a slice. (Achunks_shown/total_chunkscount was considered and dropped — it never had a total to show for search excerpts, and full reads already sayview="full".) Raw ids andmetadata_jsonare dropped from the model's view. No<chunk_index>seek table — a full read returns the whole document as one numbered document block (an index keyed by internal ids gives the agent no actionable signal, and any[n]-keyed/preview index adds cognitive load that risks degrading the primary answer). Supersedes the standalone<retrieved_context>shape and the removedis_complete. See §12. (planned)
9. Open items
All decisions locked (§8). Decision #12 is locked but not yet built — see the §12 schema and the rollout follow-ups.
10. Rollout
Already built in parallel (committed, not yet wired)
shared/citations/ (registry, markers, normalizer), shared/retrieved_context/
(renderer), shared/retrieval/ (hybrid search + rerank + service), hybrid-search
behavior tests, and the on-contract prompt base/citation_contract.md
([n] / [1][2]).
Two findings that shape the cutover
- The agent is already pull-based by default.
enable_kb_priority_preinjectionisFalseandKnowledgePriorityMiddlewarerunsmentions_only=True; an on-demandsearch_knowledge_basetool already exists. So the cutover upgrades the existing pull tool to the citation spine — it does not remove eager RAG (already gated off). - The production citation prompt is local to the agent, at
main_agent/system_prompt/prompts/citations/on.md(two-channel[citation:chunk_id]). The composer'sbase/citations_on.mdonly serves the anonymous/automation path. Both must learn the[n]contract.
Phased cutover
- Registry on state. Add
citation_registry: CitationRegistrytoSurfSenseFilesystemStatewith a replace reducer; confirm checkpointer round-trip. - Swap the KB tool. Rewrite
search_knowledge_baseto callsearch_knowledge_base_context(renders<retrieved_context>with[n], mutates the registry) and persist the registry viaCommand(update=...). - Normalize
[n]→[citation:<payload>]. Finalize-time first (rewrite the completed assistant text from the checkpointed registry before DB persist); buffered live-stream normalization is a follow-up. Bare-[n]only, so web_search[citation:url]markers are untouched. - Prompt contract (both surfaces). Update
main_agent/.../citations/on.md(production) to teach the[n]channel alongside the existing web_search/taskchannels; reconcile the composer path by foldingcitation_contract.mdintobase/citations_on.md(then deletecitation_contract.md).citations_off.mdstays. - Mentions → scope. Map
@document/@foldermentions toSearchScope(document_ids=…)for the tool; retirekb_prioritymention surfacing. - Remove the old eager path. ✅ Done —
KnowledgePriorityMiddlewareand the oldsearch_knowledge_basehybrid helper inknowledge_search.pyare deleted (the whole module is gone);kb_context_projectionis trimmed to a tree-only projector (kept because it still projects<workspace_tree>to subagents);kb_prioritystate + theenable_kb_priority_preinjectionflag + all<priority_documents>prompt references are removed. Still pending:ChucksHybridSearchRetriever(after migratingConnectorService). Migrateweb_searchto registerWEB_RESULTso all citations unify on[n]— done, see §12 build-order Step 6.
11. After-plan follow-ups (separate workstreams)
Not part of the §10 rollout — different subsystems, tracked here so they aren't lost:
- Progress streaming (streaming subsystem). Promote a curated subset of tool/subagent events to user-visible progress ("Reading…", "Searching…") to collapse perceived latency from pull-based retrieval. See §4.5. This is the mitigation for pull's only real cost, but it touches the streaming pipeline, not the retrieval/citation path — so it ships independently.
12. Unified document render (search + read)
The model meets a knowledge-base document in two moments: as excerpts from a
search, and as a full read of one object. Today these use two unrelated
shapes (compact text for search; <document_metadata> + <chunk_index> +
<chunk id> XML for reads), with two different citation tokens. That doubles the
schema the model must learn and is a hallucination surface. We collapse both onto
one renderer.
Principles
- One envelope, two views. The same renderer renders a document whether it
arrives partial (search) or complete (read). Only the
viewand the set of passages shown differ. [n]is the only citable token, in both views, assigned by the shared registry (find-or-create). A chunk first seen in search keeps its[n]when the same doc is later read in full.- Completeness is the
viewword, nothing more. A search result is inherently excerpts; a read is inherently the whole object. Nois_completeflag, no numeric coverage count.view="excerpt"tells the model it saw a slice (so it should read the doc before claiming the doc "only" says X);view="full"says it has the whole object. Achunks_shown/total_chunkscount was considered and rejected: search excerpts have no total on hand (and we won't add a count query for it), and full reads are already self-evident fromview. - Drop noise. Raw
document_id/chunk_idand themetadata_jsonblob leave the model's view (they stay server-side as registry keys). The model seestitle,source, and[n]passages. - No seek table. A full read returns the whole document as one numbered
document block; the
<chunk_index>line-range map is dropped. It was keyed by internalchunk_id(which the model never sees), so it gave the agent nothing actionable to seek by. Re-keying it to[n]or adding chunk previews would only add cognitive load the agent must reconcile against the actual content — a hallucination/quality risk that outweighs the token savings on the rare genuinely-large read. Simpler: hand over the document, numbered, and let the model read it.
Shape
Excerpt (from search_knowledge_base):
<document title="Q3 Launch Notes" source="Slack · #launch · 2026-03-02" view="excerpt">
[3] We agreed to push launch to March 10.
[4] Marketing will be notified next week.
</document>
Full (from a read):
<document title="Q3 Launch Notes" source="Slack · #launch · 2026-03-02" view="full">
[3] We agreed to push launch to March 10.
[4] Marketing will be notified next week.
[7] …
…(all chunks, numbered)
</document>
<retrieved_context> becomes simply "N documents in excerpt view"; a read is
"one document in full view". This supersedes the standalone <retrieved_context>
renderer decision and confirms the earlier removal of is_complete.
Build order (one step at a time)
- Registry merge reducer —
citation_registrymerges (find-or-create union, re-mint on collision) instead of replacing, so parent/subagent (and parallel) registrations stay globally consistent. Pure; independently testable. ✅ - One document renderer with a
viewparameter; pointsearch_knowledge_baseat it (excerpt view), replacing today'sretrieved_contextrenderer. ✅ - Register-on-read + full view — the KB read path registers its chunks and
renders through the same renderer (full view); the whole document is returned
numbered, with no
<chunk_index>. Theread_filetool loads the document viaKBPostgresBackend.aload_document, renders it against the conversation registry, and persistscitation_registry;build_document_xmlis deleted. ✅ - Retire Channel C — now that KB reads emit
[n](Step 3), the knowledge_base read/specialist path cites bare[n]instead of[citation:chunk_id]. The KB subagent prompts (cloud/desktop, full/read-only) anddescription_readonly.mdwere rewritten to the<document view="full">[n]format, theevidence.chunk_idsfield becameevidence.citations, andcitations/on.mdfolds the KB relay into Channel A (preserve[n]from a specialist verbatim). Channel C is narrowed, not deleted: it still coverstaskspecialists that emit[citation:id]— today only the deliverablesknowledge_basetool, which builds its own<chunk id>XML and is not yet on the registry/[n]spine. Migrating that tool (and then fully deleting Channel C) is a follow-up. ✅ - Delete
kb_matched_chunk_ids— with no seek table and nomatchedflag, the search→read highlighting hand-off has no consumer. Removed: the state field (filesystem_state.py) and its reducer default (reducers.py); thesearch_knowledge_basetool's_matched_chunk_idswriter; the deadKnowledgePriorityMiddlewarewrites plus thematched_chunk_idsreturn of_materialize_priority(knowledge_search.py); and the stale<chunk_index>/matched="true"/<chunk id>rendering prose in the cloud filesystem prompt (cloud.py), rewritten to the<document view="full">[n]read format. Theresolver.pydocstring reference was dropped and the two integration assertions that read the field now assert scope confinement via the rendered<retrieved_context>titles. (The retriever-layermatched_chunk_idsinchunks_hybrid_search.pyis a separate output shape and is untouched.) ✅ - Web onto the registry (Channel B → A) —
web_searchnow registers each result as aWEB_RESULT(locator{url}) and renders a<web_results>block of<document view="excerpt">blocks with[n]labels, returning aCommand(update={messages, citation_registry})likesearch_knowledge_base.markers.pyalready mapsWEB_RESULT → url, so[n]resolves end-to-end with no frontend change. To enable this, the renderer was generalized: aRenderablePassagenow carries a genericlocator: dict(KB fills{document_id, chunk_id}; web fills{url}) instead of fixed KB fields, and a dedicated citation-state middleware declares thecitation_registrychannel for theresearchsubagent (which doesn't use the filesystem state). The two duplicateweb_searchimplementations were collapsed into the sharedapp/agents/chat/shared/tools/web_search.py; theresearchcopy was deleted. Prompts updated:citations/on.mddrops the web channel (web is now Channel A[n]; only the legacy[citation:id]specialist relay remains, relabelled Channel B), the research subagent prompt cites[n], the mainweb_searchdescription teaches<web_results>/[n],off.mdsuppresses[n]too, and stale<chunk_index>/[citation:chunk_id]references indynamic_contextand the grok/openai_codex provider hints were corrected to[n].scrape_webpagestays uncited (raw page text, no[n]) — a fact from a scrape reports its URL instead. Connectors and chat turns remain unmigrated (future workstreams). ✅