The RAG/citation/context redesign in ADR 0001 is implemented and validated
(KB + web on the unified [n] citation spine, pull-based retrieval, eager path
retired). Drop the ADR and the one stale docstring reference to it.
The pull-based KB design (on-demand search_knowledge_base tool + pre-injected
workspace tree) fully replaced the old eager retrieval path. Remove its last
remnants:
- Delete KnowledgePriorityMiddleware (knowledge_search.py) and its tests.
- Drop the kb_priority state field + reducer default; trim
KbContextProjectionMiddleware to project only workspace_tree_text.
- Remove the now-dead feature flags enable_kb_priority_preinjection and
enable_kb_planner_runnable across backend (flags, route schema, tests,
env examples) and frontend (settings toggle, zod schema).
- Scrub <priority_documents> and stale KnowledgePriorityMiddleware references
from prompts, docstrings, and the ADR.
No functional change: nothing wrote kb_priority and neither flag gated live
behavior after the cutover. Full backend suite green (pre-existing unrelated
failures aside).
The legacy system_prompt_composer fragments and its default_system_instructions
wrapper were no longer referenced by any live code path (the main-agent prompt
builder owns composition now). Delete the whole orphaned tree and its test.
The model frequently writes citations glued to the preceding word
(docs[17]); the (?<!\w) lookbehind (added to dodge arr[1] array indexing)
silently skipped these, leaving raw [n] that fails to render and reads
like array access. Drop the lookbehind so glued citations resolve; genuine
code/array syntax stays protected by the existing code-region carve-out and
unresolved ordinals still drop harmlessly.
Rewrite the main-agent citation contract to a single [n] channel and sync
the orphaned system_prompt_composer surface to match; drop stale
[citation:chunk_id] / <chunk_index> references from dynamic_context and
provider hints. Reuse the shared hybrid search in the deliverables report
(citations omitted for now) and delete the orphaned report KB helper.
Remove the dead eager KnowledgePriorityMiddleware wiring (knowledge_priority
+ stack) and its legacy browse test. Update ADR 0001 to reflect the cutover.
web_search now registers each result as a WEB_RESULT (locator {url}) and
renders a <web_results> block of <document view="excerpt"> [n] passages,
returning Command(update={messages, citation_registry}) like
search_knowledge_base. Collapse the duplicate research-subagent web_search
into the shared tool and teach the prompts to cite web hits with [n].
The main agent's search_knowledge_base tool runs the hybrid spine, renders
a <retrieved_context> of numbered [n] passages, and persists the registry.
KB subagent prompts teach citing [n] from <document view="full"> reads
(evidence.chunk_ids -> evidence.citations). Delete the now-unused
search->read highlighting hand-off: the kb_matched_chunk_ids state field,
its reducer default, the tool's _matched_chunk_ids writer, and the dead
KnowledgePriorityMiddleware writes.
Add the checkpointed CitationRegistry (load/merge helpers + state field)
and a lightweight CitationStateMiddleware so subagents can register into
the same conversation registry. Resolve [n] -> [citation:<payload>] at
stream finalize from the registry, polymorphically by source type.
Add a shared document_render package that renders sources as
<document view="excerpt|full"> blocks with server-assigned [n] passage
labels (KB locator {document_id, chunk_id}, web locator {url}). Wire the
KB read backend (kb_postgres) and read_file to the new renderer and drop
the legacy per-document XML renderer (document_xml, retrieved_context) and
the old chunk_index / matched="true" / <chunk id> read format.
Add unit tests for role-specific turn extraction in the resolver and for
the transcript renderer: full rendering within budget, dropping oldest
turns with a marker, partial-tail fill of an overflowing turn, and
multi-chat tagging.
The knowledge_base subagent imported subagent_invoke_config + EXCLUDED_STATE_KEYS
from main_agent's checkpointed_subagent_middleware -- a subagent reaching into
main-agent internals. Both symbols (plus the recursion-limit constant they need)
are a subagent-invocation contract shared by the orchestrator's task middleware
and any nested-invoking subagent. Move them to subagents/shared/invocation.py;
config.py keeps the HITL resume side-channel and constants.py keeps the
main-agent tuning knobs. All consumers (task_tool, kb tool, tests) repointed.
The busy-mutex impl (BusyMutexMiddleware + cancel/turn-lifecycle primitives)
lived in shared/middleware/ but no subagent uses it -- consumers are the
main_agent builder and the boundary (turn lifecycle). Colocate with its owner
using the folder-per-middleware shape; __init__ re-exports the public surface so
boundary import sites only change package path:
main_agent/middleware/busy_mutex.py -> busy_mutex/builder.py
shared/middleware/busy_mutex.py -> busy_mutex/middleware.py
Per-file verification of the slice-3 candidates showed receipts/ and
date_filters.py are shared contracts (consumed by shared/state + shared
middleware + subagents), so they correctly stay put.
permissions was the real misfit: the rule *model* lived at shared/permissions.py
while its enforcement lived at shared/middleware/permissions/. Unify them into a
single self-contained subsystem:
shared/permissions.py -> shared/permissions/model.py
shared/middleware/permissions/{deny,ask,middleware}
-> shared/permissions/{deny,ask,middleware}
The package __init__ re-exports the model API + build_permission_mw, so the 32
external model consumers keep importing `from ...shared.permissions import Rule`
unchanged; only the 8 internal files redirect to `.model` (cycle-safe, model
loaded before middleware).
Move the lower-level runtime/infra modules out of multi_agent_chat/shared/
(they were never used by subagents, so they failed the shared-by-all-siblings
rule) and unify them with the already-relocated checkpointer:
agents/runtime/ -> agents/chat/runtime/
mac/shared/errors.py -> chat/runtime/errors.py
mac/shared/llm_config.py -> chat/runtime/llm_config.py
mac/shared/prompt_caching.py -> chat/runtime/prompt_caching.py
mac/shared/mention_resolver.py -> chat/runtime/mention_resolver.py
mac/shared/path_resolver.py -> chat/runtime/path_resolver.py
These sit below the agent packages: the boundary + agent factory + shared
middleware depend on them, and they import no agent code (acyclic).
Recursive shared-folder rule: a shared/ must be shared by ALL siblings at its
level. The kernel (context, compaction, retry_after, web_search) was shared by
only 2 of the agents -- anonymous_chat + multi_agent_chat -- never by podcaster
or video_presentation. Those 2 are the "chat" category, so their shared code
belongs in that category's shared/, not the top-level one.
app/agents/anonymous_chat/ -> app/agents/chat/anonymous_chat/
app/agents/multi_agent_chat/ -> app/agents/chat/multi_agent_chat/
app/agents/shared/ -> app/agents/chat/shared/ (anon<->mac kernel)
Top-level app/agents/shared/ is gone: nothing was shared across all three
categories (chat / podcaster / video_presentation).
~289 import sites rewritten (app.agents.{anonymous_chat,multi_agent_chat,shared}
-> app.agents.chat.*); all moves are git renames (history preserved).
app/agents/ now: chat/, podcaster/, video_presentation/, runtime/.
These were never shared with anonymous_chat (nor podcaster/video_presentation)
-- only multi_agent_chat (subagents/main agent) and the boundary use them:
shared/tools/mcp/ -> multi_agent_chat/shared/tools/mcp/
shared/tools/hitl.py -> multi_agent_chat/shared/tools/hitl.py
shared/tools/catalog.py -> multi_agent_chat/shared/tools/catalog.py
shared/middleware/dedup_tool_calls.py
-> multi_agent_chat/shared/middleware/dedup_tool_calls.py
app/agents/shared/ now holds only the genuine anon<->mac kernel:
context, middleware/{compaction,retry_after}, tools/web_search.
Neither module is imported by any sibling agent package, so neither belongs in
the cross-agent shared kernel:
- checkpointer.py -> app/agents/runtime/checkpointer.py
LangGraph Postgres checkpoint saver. It's cross-agent *runtime infra* wired by
the boundary (app lifespan + anonymous_chat & multi_agent_chat flows), not
agent code. New app/agents/runtime/ layer holds boundary-wired agent infra.
- shared/system_prompt.py + shared/prompts/ -> app/prompts/
The legacy single-agent prompt composer. The live agents don't use it
(main_agent has its own system_prompt/ builder; anonymous_chat builds inline);
its only consumer is new_llm_config_routes for displaying default instructions.
Moved to the existing non-agent prompt domain:
system_prompt.py -> app/prompts/default_system_instructions.py
prompts/ -> app/prompts/system_prompt_composer/
app/agents/shared/ now contains only genuinely cross-agent code: context,
middleware/{compaction,retry_after,dedup_tool_calls}, tools/.
NOTE: get_default_system_instructions() (LLM-config UI) composes from the legacy
library, which differs from what the live agents actually run -- pre-existing
latent staleness, not changed here.
app/agents/shared/ is a sibling of anonymous_chat/podcaster/multi_agent_chat/
video_presentation, so it should only hold code shared across 2+ of those
agents. In practice podcaster and video_presentation import nothing from it,
and anonymous_chat needs only context + compaction + retry_after + web_search.
Everything else was multi_agent_chat-only (the boundary just passes through).
Move the multi_agent_chat-only cluster into multi_agent_chat/shared/ (files
moved verbatim via git rename; ~116 import sites rewritten):
errors, feature_flags, filesystem_selection, path_resolver, prompt_caching,
sandbox, llm_config, mention_resolver
middleware/busy_mutex, middleware/kb_persistence
busy_mutex/llm_config/mention_resolver are boundary-only but import the moved
modules, so they were folded in to avoid a backwards shared -> multi_agent_chat
dependency. main_agent builders now import the impls directly; the shared
middleware barrel keeps only the genuinely-shared compaction + retry_after.
Also delete the dead leftover shared/plugins and shared/skills dirs (live
copies already live under main_agent/).
Remaining in app/agents/shared/: context, system_prompt(+prompts), checkpointer,
middleware/{compaction,retry_after,dedup_tool_calls}, tools/. checkpointer and
system_prompt are boundary-only infra pending a dedicated home decision.
app/agents/shared/middleware/permission.py was an older, monolithic
PermissionMiddleware superseded by the modular permissions/ package under
multi_agent_chat/shared/middleware/ (core + evaluation + ask/ + factory).
Production wires only the package (main_agent stack + every subagent
builder); the kernel file was reachable only through the shared barrel
re-export (itself unused) and two tests pinned to its dead internals
(_raise_interrupt, _normalize_permission_decision, old after_model shape).
- delete app/agents/shared/middleware/permission.py
- drop PermissionMiddleware from the shared middleware barrel
- delete test_permission_middleware.py (covered the dead impl only; live
behavior is covered by tests/.../middleware/shared/permissions/*)
- test_desktop_safety_rules.py: keep the ruleset-level regression tests,
drop the dead import + TestPermissionMiddlewareIntegration class
knowledge_search, memory_injection and scoped_model_fallback no longer
belong in the cross-agent kernel (app/agents/shared/middleware): they are
consumed only inside multi_agent_chat. Relocate each impl next to the
builder that uses it:
- knowledge_search.py -> multi_agent_chat/shared/middleware/ (genuinely
shared: its _render_priority_message feeds kb_context_projection, used by
both the main agent and the KB subagent)
- memory_injection.py -> multi_agent_chat/shared/middleware/ (beside its
memory.py builder)
- scoped_model_fallback.py -> multi_agent_chat/shared/middleware/resilience/
(beside fallback.py/bundle.py)
Impls moved verbatim (git rename). Builders/consumers now import the local
sibling; main_agent knowledge_priority imports the new shared path; shared
middleware barrel trimmed.
Tests: repoint imports; convert the knowledge_search monkeypatch targets
from brittle dotted-string form to object-based patching (monkeypatch.setattr
on the imported module), which is robust to import ordering. No behavior
change.
Each main-agent-only middleware now lives in its own folder under
main_agent/middleware/<concept>/ with builder.py (flag-gated construction)
+ middleware.py (the impl), re-exported via __init__.py. This kills the
cross-folder hop into agents/shared/middleware and keeps each middleware's
two responsibilities (build vs behavior) as colocated siblings.
Moved (impl from shared/middleware, builder from main_agent/middleware):
action_log, anonymous_document, context_editing, doom_loop, knowledge_tree,
noop_injection, otel_span, tool_call_repair.
Impls moved verbatim (git rename, no body edits) so behavior is unchanged.
Builders now import from the local .middleware sibling. stack.py import
paths updated for the 3 renamed folders; shared middleware barrel trimmed;
tests repointed (imports + patch targets).
DedupHITLToolCallsMiddleware is only wired by the main_agent stack, but
its module also exports dedup-key resolvers consumed by the shared MCP
tool layer. Splitting keeps the resolvers (dedup_key_full_args,
wrap_dedup_key_by_arg_name, DedupResolver) in shared and moves the
middleware class verbatim into main_agent/middleware/dedup_hitl.py
(merged with its builder), eliminating the shared->main_agent dependency
that a flat move would create. No behavior change.
file_intent (FileIntentMiddleware) and flatten_system
(FlattenSystemMessageMiddleware) were only ever instantiated in the
single-agent chat_deepagent stack, which was removed in 14bbea085. They
have no production consumer in multi_agent_chat. Delete both modules and
their unit tests.
Also drop the vestigial KnowledgeBaseSearchMiddleware alias (= the live
KnowledgePriorityMiddleware); its tests now target the real class so the
behavior coverage is preserved. Trim the three barrel/__all__ entries and
strip the now-dead class names from comments.
permissions.py (authorization Rule/Ruleset model) is consumed across all
MAC subagents + the permissions middleware, with a single external
consumer (user_tool_allowlist service) -> move to
multi_agent_chat/shared/permissions.py and repoint all 42 sites.
deliverable_wait.py (wait_for_deliverable) is used only by the podcast and
video_presentation deliverable tools -> colocate into
subagents/builtins/deliverables/.
No behavior change; import-all + permission/allowlist/deliverable unit
tests stay green.
filesystem_state.py (the multi-agent graph state) and state_reducers.py
(its merge reducers) are consumed only by multi_agent_chat (filesystem
tools/middleware, kb projection, and the MAC-only shared middleware) plus
two unit tests -- no external app code. Relocate them into a dedicated
multi_agent_chat/shared/state/ package (filesystem_state.py + reducers.py)
and repoint every importer.
No behavior change; import-all + the full unit/middleware + unit/agents
suites (1066 tests) stay green.
Move modules out of agents/shared/ that are consumed by a single package
(main_agent), placing each next to its only consumer instead of in a
"shared" grab-bag:
- agent_cache.py -> main_agent/runtime/agent_cache_store.py
- connector_searchable_types.py -> main_agent/runtime/
- plugin_loader.py + plugins/ -> main_agent/plugins/
- skills/ + skills_backends.py -> main_agent/skills/
- tools/invalid_tool.py -> main_agent/tools/
Drop the skills_backends re-export from the shared middleware barrel and
repoint all consumers + tests. No behavior change; import-all,
error-contract, and the moved tests stay green.
The three MCP siblings (mcp_client/mcp_tool/mcp_tools_cache) served one
objective but sat loose at the top of shared/tools. Grouped them into an
mcp/ package and dropped the redundant prefix: client.py, tool.py, cache.py.
Updated all importers (routes, mcp_tools subagent, e2e fake patch targets,
unit test) to the new paths.
The deliverables subagent runs its own generate_image/podcast/report/resume/
video_presentation (via tools/index.py); the shared/tools copies had zero
production importers — classic dead twins. Removed them so deliverable tools
live only in their vertical slice.
While repointing the 2 stranded unit tests at the LIVE deliverables modules,
found the OpenRouter empty-api_base defense (resolve_api_base) existed ONLY in
the dead shared generate_image, never propagated to the live multi-agent copy.
Ported the fix into deliverables/tools/generate_image.py (both the global-config
and user-DB-config branches) so an empty api_base no longer falls through to
LiteLLM's global api_base (Azure) and 404s.
Tests now exercise the live Command/receipt-returning tools (invoke the raw
coroutine with a hand-built ToolRuntime; resume progress events neutralized).
Replace the connector-coupled BUILTIN_TOOLS registry with a pure-data
catalog so shared/tools no longer imports any connector module, making the
connector packages independently deletable.
- add shared/tools/catalog.py (ToolMetadata + TOOL_CATALOG, 41 tools, no imports)
- point GET /agent/tools (the only live consumer) at the catalog
- relocate ToolDefinition into action_log middleware (its sole consumer);
drop the inert tool_definitions wiring (no tool defines reverse)
- delete shared/tools/registry.py: connector imports, dead factories,
dead get_connector_gated_tools, and BUILTIN_TOOLS
- drop stale dedup-propagation test (path removed in C1) + refresh docstrings
import-all guardrail + agents unit suite green (987 passed).
Eliminate the top-level multi_agent_chat/middleware/ package so each slice
owns its middleware (vertical-slice colocation):
- middleware/shared/ -> shared/middleware/ (cross-slice middleware)
- middleware/subagent/ -> subagents/shared/middleware/ (subagent stack)
- main_agent/middleware/ already colocated in Slice A
The moved shared/ subtree is internally consistent (all relative imports
stay within it), so only external absolute refs were rewritten. The
subagent stack's ..shared.* relatives were promoted to absolute paths to
the new shared/middleware/ location.
multi_agent_chat/ root is now: main_agent/, shared/, subagents/.
Verified: 2430 unit tests pass, 1 skipped (baseline unchanged).
Vertical-slice colocation: all main-agent code should live under
main_agent/ instead of being split across a parallel middleware/main_agent
tree. Move multi_agent_chat/middleware/main_agent/ -> main_agent/middleware/
and its assembler middleware/stack.py -> main_agent/middleware/stack.py, so
the main-agent slice is self-contained (graph, runtime, system_prompt, tools,
middleware).
Genuinely cross-slice middleware (middleware/shared/, middleware/subagent/)
stays under multi_agent_chat/middleware/ for a later slice; the moved builders
now reference it via absolute imports.
Pure move + import rewrite (git-tracked renames). Verified: full unit suite
green (2430 passed, 1 skipped), including test_import_all and the
checkpointed-subagent middleware suite.
The single-agent-era filesystem middleware (app/agents/shared/middleware/
filesystem.py, ~2000 lines) was never instantiated in production, yet three
unit suites validated it — an illusory guardrail while the live decomposed
middleware (multi_agent_chat/middleware/shared/filesystem) was unguarded.
Close the gap before reorganizing the agents module:
- Add 14 integration tests driving live B's tools in desktop mode (real
on-disk effects) and cloud mode (in-state staging, namespace policy).
- Port all high-value dead-twin assertions onto the live path: cloud rm/rmdir
staging + guard rails, KBPostgresBackend delete-view filter, mode-scoped
system prompt, cwd/relative/namespace resolution, multi-root mount
normalization.
- Delete dead twin filesystem.py, drop its __init__ re-export, and retire its
3 dead-twin tests.
Verified: test_import_all + middleware unit + FS integration all green.
With multi-agent the only live factory (B1), the single-agent stack is dead.
Remove app/agents/new_chat/ entirely: chat_deepagent.py, subagents/, and all
re-export shims (errors/context/llm_config/permissions/tools/middleware/...) that
existed only to serve frozen single-agent code. Live code already imports the
shared kernel (app.agents.shared.*) directly.
Tests: delete single-agent-only suites (test_resolve_prompt_model_name,
test_specialized_subagents) and the chat_deepagent source-shape contract assertion;
repoint test_scoped_model_fallback to the shared middleware path. Suite green
(2710 passed).
Three live shared leaves discovered while taking stock after slice 7 (all are
consumed by the multi-agent stack and/or live routes, not single-agent-only):
- connector_searchable_types -> shared + shim (multi-agent factory uses it)
- agent_cache -> shared + shim (multi-agent runtime/agent_cache uses it)
- system_prompt + prompts/ (42 .md fragments) -> shared together + shim.
Repointed composer's _PROMPTS_PACKAGE to app.agents.shared.prompts so
importlib.resources fragment loading keeps working; system_prompt's relative
".prompts.composer" import is preserved by moving both as a unit.
Each keeps a re-export shim for the frozen chat_deepagent. After this slice,
new_chat/ holds only the frozen single-agent stack (chat_deepagent, subagents/,
__init__) plus shims.