The pull-based KB design (an on-demand search_knowledge_base tool plus a
pre-injected workspace tree) has fully replaced the old eager retrieval
path. Remove the eager path's last remnants:
- Delete KnowledgePriorityMiddleware (knowledge_search.py) and its tests.
- Drop the kb_priority state field and its reducer default; trim
  KbContextProjectionMiddleware to project only workspace_tree_text.
- Remove the now-dead feature flags enable_kb_priority_preinjection and
  enable_kb_planner_runnable from the backend (flags, route schema, tests,
  env examples) and the frontend (settings toggle, zod schema).
- Scrub <priority_documents> and stale KnowledgePriorityMiddleware
  references from prompts, docstrings, and the ADR.

No functional change: nothing wrote kb_priority, and neither flag gated
live behavior after the cutover. The full backend suite is green, apart
from pre-existing, unrelated failures.

The main agent's search_knowledge_base tool runs the hybrid spine, renders
a <retrieved_context> block of numbered [n] passages, and persists the
registry. KB subagent prompts teach citing [n] markers from
<document view="full"> reads (evidence.chunk_ids -> evidence.citations).
Delete the now-unused search->read highlighting hand-off: the
kb_matched_chunk_ids state field, its reducer default, the tool's
_matched_chunk_ids writer, and the dead KnowledgePriorityMiddleware
writes.
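
A minimal Python sketch of the [n] numbering contract follows. It numbers the passages from 1, wraps them in a <retrieved_context> block, and keeps an n -> chunk_id registry so that a cited [n] can be mapped back to a chunk. Passage, render_retrieved_context, and the registry shape are all hypothetical. The real tool's hit type, rendering details, and persisted registry format are not part of this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    # Hypothetical hit shape; the real hybrid-search result type is not shown.
    chunk_id: str
    title: str
    text: str


def render_retrieved_context(passages: list[Passage]) -> tuple[str, dict[int, str]]:
    """Number passages [1]..[n]; return the block and an n -> chunk_id registry."""
    registry: dict[int, str] = {}
    lines = ["<retrieved_context>"]
    for n, passage in enumerate(passages, start=1):
        registry[n] = passage.chunk_id  # what a persisted registry might hold
        lines.append(f"[{n}] {passage.title}")
        lines.append(passage.text)
    lines.append("</retrieved_context>")
    return "\n".join(lines), registry
```

Keeping the registry separate from the rendered text means a model only has to emit [2], and the chunk lookup happens afterwards.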

Mirror search_threads visibility in the referenced-chat resolver: a
search-space owner can now @-mention legacy threads that predate creator
tracking (null created_by_id); previously these were silently dropped.
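
The visibility rule can be pictured as a predicate. This sketch rests on one stated assumption: the rule reduces to "a creator sees their own threads, and the search-space owner also sees threads that have no creator." Thread, can_reference, and owned_space_ids are hypothetical names. The real search_threads rule may cover more cases, such as shared threads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thread:
    # Hypothetical minimal thread record.
    id: int
    search_space_id: int
    created_by_id: Optional[int]  # None for legacy threads without creator tracking


def can_reference(thread: Thread, user_id: int, owned_space_ids: set[int]) -> bool:
    """Assumed mirror of the search_threads rule; anything else is denied (fail-closed)."""
    if thread.created_by_id == user_id:
        return True
    # Legacy threads predate creator tracking; only the search-space owner may see them.
    return thread.created_by_id is None and thread.search_space_id in owned_space_ids
```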

Add the referenced_chat_context slice: models for the data shapes, a
fail-closed resolver that fetches mentioned threads and their visible
turns under the same access rules as thread search, and a transcript
renderer that emits a budgeted <referenced_chat_context> block. When a
chat exceeds the per-reference character budget, the most recent turns
are kept whole, any leftover budget is filled with the tail of the turn
that overflowed, and truncation markers signal the cut.
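
The truncation policy can be sketched as follows. The code walks the turns from newest to oldest and keeps each whole turn while it fits. It gives any leftover budget to the tail of the first turn that does not fit, and it marks the cut. Turn, render_turns, render_referenced_chat, and the marker text are hypothetical. The sketch also assumes that the budget counts only turn text, not role labels or markers.

```python
from dataclasses import dataclass

TRUNCATION_MARKER = "[...truncated...]"  # hypothetical marker text


@dataclass(frozen=True)
class Turn:
    role: str
    text: str


def render_turns(turns: list[Turn], budget: int) -> list[str]:
    """Keep the newest whole turns within `budget` chars, then a tail of the overflow.

    Assumes `budget` counts only turn text, not role labels or markers.
    """
    kept: list[str] = []
    remaining = budget
    for turn in reversed(turns):  # newest first
        if len(turn.text) <= remaining:
            kept.append(f"{turn.role}: {turn.text}")
            remaining -= len(turn.text)
            continue
        # First turn that does not fit: spend leftover budget on its tail.
        if remaining > 0:
            kept.append(f"{turn.role}: {TRUNCATION_MARKER}{turn.text[-remaining:]}")
        else:
            kept.append(TRUNCATION_MARKER)  # no budget left; just signal the cut
        break
    kept.reverse()  # restore chronological order
    return kept


def render_referenced_chat(turns: list[Turn], budget: int) -> str:
    body = "\n".join(render_turns(turns, budget))
    return f"<referenced_chat_context>\n{body}\n</referenced_chat_context>"
```

Walking newest-first spends the budget on the turns most likely to be relevant to the mention. The tail fill uses up whatever budget would otherwise be wasted.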

Recursive pass over the agents module to make docstrings and inline
comments concise and intent-oriented: drop narration that merely restates
the code, condense verbose module and function docstrings, and keep only
the non-obvious "why" notes. No functional code changed.

Move the lower-level runtime/infra modules out of multi_agent_chat/shared/
(mac/shared/ below). No subagent ever used them, so they failed the
shared-by-all-siblings rule. Unify them with the already-relocated
checkpointer:
  agents/runtime/                -> agents/chat/runtime/
  mac/shared/errors.py           -> chat/runtime/errors.py
  mac/shared/llm_config.py       -> chat/runtime/llm_config.py
  mac/shared/prompt_caching.py   -> chat/runtime/prompt_caching.py
  mac/shared/mention_resolver.py -> chat/runtime/mention_resolver.py
  mac/shared/path_resolver.py    -> chat/runtime/path_resolver.py
These sit below the agent packages. The boundary, the agent factory, and
the shared middleware depend on them, while they import no agent code, so
the dependency graph stays acyclic.
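
A small AST check over each runtime module can stop the "imports no agent code" property from regressing. The forbidden prefixes below are hypothetical placeholders for the real agent package paths. Relative imports are skipped for brevity; a real check would resolve them first.

```python
import ast

# Hypothetical dotted prefixes of agent packages that runtime modules must not import.
FORBIDDEN_PREFIXES = ("agents.multi_agent_chat", "agents.subagents")


def agent_imports(source: str, forbidden: tuple[str, ...] = FORBIDDEN_PREFIXES) -> list[str]:
    """Return absolute imports in `source` that fall under a forbidden package prefix."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports (level > 0) are skipped
        else:
            continue
        for name in names:
            # Match the package itself or any submodule, not mere name prefixes.
            if any(name == p or name.startswith(p + ".") for p in forbidden):
                hits.append(name)
    return hits
```

To apply it, run agent_imports(path.read_text()) for each path in Path("agents/chat/runtime").rglob("*.py"). That directory comes from the mapping above, but the checkout root is an assumption. Fail the check if any call returns a non-empty list.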