web_search now registers each result as a WEB_RESULT (locator {url}) and
renders a <web_results> block of <document view="excerpt"> [n] passages,
returning Command(update={messages, citation_registry}) like
search_knowledge_base. Collapse the duplicate research-subagent web_search
into the shared tool and teach the prompts to cite web hits with [n].
The main agent's search_knowledge_base tool runs the hybrid spine, renders
a <retrieved_context> of numbered [n] passages, and persists the registry.
KB subagent prompts teach citing [n] from <document view="full"> reads
(evidence.chunk_ids -> evidence.citations). Delete the now-unused
search->read highlighting hand-off: the kb_matched_chunk_ids state field,
its reducer default, the tool's _matched_chunk_ids writer, and the dead
KnowledgePriorityMiddleware writes.
Add the checkpointed CitationRegistry (load/merge helpers + state field)
and a lightweight CitationStateMiddleware so subagents can register into
the same conversation registry. Resolve [n] -> [citation:<payload>] at
stream finalize from the registry, polymorphically by source type.
Add a shared document_render package that renders sources as
<document view="excerpt|full"> blocks with server-assigned [n] passage
labels (KB locator {document_id, chunk_id}, web locator {url}). Wire the
KB read backend (kb_postgres) and read_file to the new renderer and drop
the legacy per-document XML renderer (document_xml, retrieved_context) and
the old chunk_index / matched="true" / <chunk id> read format.
Mirror search_threads visibility in the referenced-chat resolver: a
search-space owner can now @-mention legacy threads that predate creator
tracking (null created_by_id), instead of those being silently dropped.
Add unit tests for role-specific turn extraction in the resolver and for
the transcript renderer: full rendering within budget, dropping oldest
turns with a marker, partial-tail fill of an overflowing turn, and
multi-chat tagging.
Thread mentioned_thread_ids from the route through the orchestrator into
input-state assembly, resolve them for the requesting user, and append
the rendered referenced-chat block to the agent's query context.
Add the referenced_chat_context slice: models for the data shapes, a
fail-closed resolver that fetches mentioned threads and their visible
turns under the same access rules as thread search, and a transcript
renderer that emits a budgeted <referenced_chat_context> block. When a
chat exceeds the per-reference character budget, recent turns are kept
and any leftover budget is filled with the overflowing turn's tail, with
truncation markers signalling the cut.
Extend the chat contract so a turn can reference other conversations:
add the "thread" kind to MentionedDocumentInfo and a mentioned_thread_ids
field on NewChatRequest and RegenerateRequest.