rowboat/apps/x/packages/shared/src/runs.ts

224 lines
6.2 KiB
TypeScript
import { LlmStepStreamEvent } from "./llm-step-events.js";
import { Message, ToolCallPart } from "./message.js";
import { CodeRunEvent as CodeRunEventSchema, PermissionAsk } from "./code-mode.js";
import z from "zod";
const BaseRunEvent = z.object({
    runId: z.string(),
    ts: z.iso.datetime().optional(),
    subflow: z.array(z.string()),
});
export const RunProcessingStartEvent = BaseRunEvent.extend({
    type: z.literal("run-processing-start"),
});
export const RunProcessingEndEvent = BaseRunEvent.extend({
    type: z.literal("run-processing-end"),
});
export const StartEvent = BaseRunEvent.extend({
    type: z.literal("start"),
    agentName: z.string(),
    model: z.string(),
    provider: z.string(),
    permissionMode: z.enum(["manual", "auto"]).optional(),
    // useCase/subUseCase tag the run for analytics. Optional on read so legacy
    // run files written before these fields existed still parse cleanly.
    useCase: z.enum([
        "copilot_chat",
        "live_note_agent",
        "background_task_agent",
        "meeting_note",
        "knowledge_sync",
    ]).optional(),
    subUseCase: z.string().optional(),
});
export const SpawnSubFlowEvent = BaseRunEvent.extend({
    type: z.literal("spawn-subflow"),
    agentName: z.string(),
    toolCallId: z.string(),
});
export const LlmStreamEvent = BaseRunEvent.extend({
    type: z.literal("llm-stream-event"),
    event: LlmStepStreamEvent,
});
export const MessageEvent = BaseRunEvent.extend({
    type: z.literal("message"),
    messageId: z.string(),
    message: Message,
});
export const ToolInvocationEvent = BaseRunEvent.extend({
    type: z.literal("tool-invocation"),
    toolCallId: z.string().optional(),
    toolName: z.string(),
    input: z.string(),
});
export const ToolResultEvent = BaseRunEvent.extend({
    type: z.literal("tool-result"),
    toolCallId: z.string().optional(),
    toolName: z.string(),
    result: z.any(),
});
export const ToolOutputStreamEvent = BaseRunEvent.extend({
    type: z.literal("tool-output-stream"),
    toolCallId: z.string(),
    toolName: z.string(),
    output: z.string(),
});
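// Illustration only (not part of this module): one way a consumer might fold
// tool-output-stream events into a per-call buffer. ToolOutputChunk and
// appendToolOutput are hypothetical names, and the sketch assumes each event
// carries only newly produced text rather than the full output so far; the
// schema above does not say which.

```typescript
// Hypothetical plain-TypeScript mirror of the tool-output-stream payload fields.
type ToolOutputChunk = {
    toolCallId: string;
    toolName: string;
    output: string;
};

// Appends a chunk to the buffer for its tool call, creating the buffer on first use.
// Assumption: chunks are incremental, so concatenation rebuilds the full output.
function appendToolOutput(buffers: Map<string, string>, chunk: ToolOutputChunk): void {
    const previous = buffers.get(chunk.toolCallId) ?? "";
    buffers.set(chunk.toolCallId, previous + chunk.output);
}

const outputBuffers = new Map<string, string>();
appendToolOutput(outputBuffers, { toolCallId: "call-1", toolName: "executeCommand", output: "hello " });
appendToolOutput(outputBuffers, { toolCallId: "call-1", toolName: "executeCommand", output: "world" });
appendToolOutput(outputBuffers, { toolCallId: "call-2", toolName: "executeCommand", output: "other" });
```

// Keying by toolCallId keeps interleaved output from concurrent tool calls apart.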
export const AskHumanRequestEvent = BaseRunEvent.extend({
    type: z.literal("ask-human-request"),
    toolCallId: z.string(),
    query: z.string(),
    options: z.array(z.string()).optional(),
});
export const AskHumanResponseEvent = BaseRunEvent.extend({
    type: z.literal("ask-human-response"),
    toolCallId: z.string(),
    response: z.string(),
});
export const ToolPermissionMetadata = z.discriminatedUnion("kind", [
    z.object({
        kind: z.literal("command"),
        commandNames: z.array(z.string()),
    }),
    z.object({
        kind: z.literal("file"),
        operation: z.enum(["read", "list", "search", "write", "delete"]),
        paths: z.array(z.string()),
        pathPrefix: z.string(),
    }),
]);
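// Illustration only (not part of this module): a sketch of how a consumer
// could narrow on the `kind` discriminator once a value has been parsed.
// ToolPermissionMetadataShape and describePermission are hypothetical names;
// the type is a hand-written mirror of the two variants above, not the
// inferred zod type.

```typescript
// Hand-written mirror of the "command" and "file" variants (hypothetical).
type ToolPermissionMetadataShape =
    | { kind: "command"; commandNames: string[] }
    | {
          kind: "file";
          operation: "read" | "list" | "search" | "write" | "delete";
          paths: string[];
          pathPrefix: string;
      };

// Builds a one-line label for a permission prompt. Checking `kind` narrows
// the union, so each branch only sees the fields of its own variant.
function describePermission(meta: ToolPermissionMetadataShape): string {
    if (meta.kind === "command") {
        return `run command(s): ${meta.commandNames.join(", ")}`;
    }
    return `${meta.operation} ${meta.paths.length} path(s) under ${meta.pathPrefix}`;
}

const commandLabel = describePermission({ kind: "command", commandNames: ["git", "ls"] });
const fileLabel = describePermission({
    kind: "file",
    operation: "write",
    paths: ["/tmp/a.txt", "/tmp/b.txt"],
    pathPrefix: "/tmp",
});
```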
export const ToolPermissionRequestEvent = BaseRunEvent.extend({
    type: z.literal("tool-permission-request"),
    toolCall: ToolCallPart,
    permission: ToolPermissionMetadata.optional(),
});
export const ToolPermissionResponseEvent = BaseRunEvent.extend({
    type: z.literal("tool-permission-response"),
    toolCallId: z.string(),
    response: z.enum(["approve", "deny"]),
    scope: z.enum(["once", "session", "always"]).optional(),
});
// A structured item from a code_agent_run coding turn (tool call, diff, plan,
// message chunk, resolved permission). Fire-and-forget — rendered live.
export const CodeRunStreamEvent = BaseRunEvent.extend({
    type: z.literal("code-run-event"),
    toolCallId: z.string(),
    event: CodeRunEventSchema,
});
// The coding agent is asking for permission mid-turn and the run is BLOCKED until
// the user answers via `codeRun:resolvePermission` (keyed by requestId).
export const CodeRunPermissionRequestEvent = BaseRunEvent.extend({
    type: z.literal("code-run-permission-request"),
    toolCallId: z.string(),
    requestId: z.string(),
    ask: PermissionAsk,
});
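// Illustration only (not part of this module): a minimal sketch of how a
// blocked request keyed by requestId could be modelled with a pending-promise
// map. PendingPermissions and the Decision type are hypothetical; the real
// answer shape is defined elsewhere and is not shown in this file.

```typescript
// Hypothetical answer type, assumed for this sketch only.
type Decision = "allow" | "deny";

class PendingPermissions {
    // requestId -> resolver of the promise the blocked run is awaiting.
    private waiters = new Map<string, (decision: Decision) => void>();

    // Called when the agent asks: returns a promise that settles on the user's answer.
    ask(requestId: string): Promise<Decision> {
        return new Promise<Decision>((resolve) => {
            this.waiters.set(requestId, resolve);
        });
    }

    // Called when the user answers: returns false if the id is unknown or already answered.
    resolve(requestId: string, decision: Decision): boolean {
        const waiter = this.waiters.get(requestId);
        if (!waiter) return false;
        this.waiters.delete(requestId);
        waiter(decision);
        return true;
    }

    // Number of requests still waiting for an answer.
    pendingCount(): number {
        return this.waiters.size;
    }
}

const pendingPermissions = new PendingPermissions();
void pendingPermissions.ask("req-1");
const pendingBefore = pendingPermissions.pendingCount();
const firstAnswer = pendingPermissions.resolve("req-1", "allow");
const repeatAnswer = pendingPermissions.resolve("req-1", "allow");
```

// Deleting the waiter before calling it makes a repeated answer for the same
// requestId a no-op instead of a double resolution.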
export const ToolPermissionAutoDecisionEvent = BaseRunEvent.extend({
    type: z.literal("tool-permission-auto-decision"),
    toolCallId: z.string(),
    toolCall: ToolCallPart,
    permission: ToolPermissionMetadata.optional(),
    decision: z.enum(["allow", "deny"]),
    reason: z.string(),
});
export const RunErrorEvent = BaseRunEvent.extend({
    type: z.literal("error"),
    error: z.string(),
});
export const RunStoppedEvent = BaseRunEvent.extend({
    type: z.literal("run-stopped"),
    reason: z.enum(["user-requested", "force-stopped"]).optional(),
});
export const RunEvent = z.union([
    RunProcessingStartEvent,
    RunProcessingEndEvent,
    StartEvent,
    SpawnSubFlowEvent,
    LlmStreamEvent,
    MessageEvent,
    ToolInvocationEvent,
    ToolResultEvent,
    ToolOutputStreamEvent,
    AskHumanRequestEvent,
    AskHumanResponseEvent,
    ToolPermissionRequestEvent,
    ToolPermissionResponseEvent,
2026-06-05 14:45:08 +05:30
CodeRunStreamEvent,
CodeRunPermissionRequestEvent,
ToolPermissionAutoDecisionEvent,
RunErrorEvent,
RunStoppedEvent,
]);
export const ToolPermissionAuthorizePayload = ToolPermissionResponseEvent.pick({
subflow: true,
toolCallId: true,
response: true,
scope: true,
});
export const AskHumanResponsePayload = AskHumanResponseEvent.pick({
subflow: true,
toolCallId: true,
response: true,
});
export const UseCase = z.enum([
"copilot_chat",
feat: live notes, single objective per note replaces multi-track model

Folds the multi-`track:`-array model into one `live:` block per note: a single persistent objective the live-note agent maintains, plus an optional triggers object (`cronExpr` / `windows` / `eventMatchCriteria`, each independently optional). A note is now passive or live: no per-track scopes, no section ownership contract, no `once` trigger. The agent owns the whole body and makes patch-style incremental edits per run.

Highlights:
- Schema: `track:` array → single `live:` object (`packages/shared/src/live-note.ts`).
- Runtime: scheduler / event processor / runner under `core/knowledge/live-note/`, with split `lastAttemptAt` (every run, drives 5-min backoff) vs `lastRunAt` (success only, anchors cycles). `throwOnError` on agent runs surfaces LLM / billing failures into `lastRunError`.
- Today.md: regenerated by template v2 (single objective covering overview / calendar / emails / what-you-missed / priorities; existing files renamed to `Today.md.bkp.<stamp>`).
- Renderer: `LiveNoteSidebar` mounts inside the editor row (no chat overlap, auto-closes on note switch); toolbar Radio button becomes a status pill; `LiveNotesView` replaces background-agents view.
- Copilot: new `live-note` skill with act-first stance, default folder/cadence pickers, and a non-negotiable rule to extend an existing objective rather than add a second one. Shared `KNOWLEDGE_NOTE_STYLE_GUIDE` enforces terse-and-scannable writing across `doc-collab` and the live-note agent.
- Analytics: `track_block` use-case → `live_note_agent`; trigger (`manual` / `cron` / `window` / `event`) becomes the Pass-2 sub-use-case, alongside `routing` for Pass 1. Legacy run files with the old value are read-mapped via `LegacyStartEvent` so they stay openable in the runs list.

Hard cutover: no back-compat shims for legacy `track:` frontmatter arrays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 00:26:46 +05:30
"live_note_agent",
feat: background tasks

Adds Background Tasks: recurring background agents the user can set up to either keep a digest current (daily email summary, top HN stories, weather brief) or perform a recurring action (draft a reply, post to Slack, call an API). Each task is a persistent set of instructions plus optional triggers (schedule, time-of-day window, or matching incoming Gmail / calendar event). The agent reads the verbs in the instructions on every run and picks the right mode automatically.

User-facing surfaces:
- New "Background tasks" entry in the sidebar, with a table listing every task, its schedule, last run, and an active toggle.
- A detail page per task with a max-width reader showing the task's current output and a control sidebar for editing instructions, triggers, and reviewing run history.
- "New task" can open in a free-form box where the user describes what they want and Copilot sets it up end-to-end, or in a structured form for manual setup.
- "Edit with Copilot" hand-off from the detail view, pre-seeded with the task's context.

Under the hood:
- The event pipeline that previously powered live-notes is now a generic consumer registry. Live-notes and background tasks both subscribe; incoming events are routed to candidates from both concurrently.
- Schedule helpers and the agent-message trigger block are factored out of live-notes into shared modules. Both features use the same building blocks now.
- Copilot's proactive routing is reframed: anything recurring (cadence words, watch / monitor verbs, action verbs, event-conditional asks) now flows to background tasks. Live-notes load only on explicit mention.
- A small reliability fix for the run-creation fallback chain: an empty-string model/provider passed by an LLM tool call now correctly falls through to the default instead of being persisted as a real value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:43:25 +05:30
"background_task_agent",
"meeting_note",
"knowledge_sync",
]);
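The live-notes commit message says legacy run files carrying the old `track_block` use-case are read-mapped so they stay openable. A minimal sketch of that idea follows, using plain TypeScript rather than zod. The names `USE_CASES`, `LEGACY_USE_CASES`, and `normalizeUseCase` are hypothetical and are not the repo's `LegacyStartEvent` implementation.

```typescript
// Current use-case names, copied from the UseCase enum above.
const USE_CASES = [
  "copilot_chat",
  "live_note_agent",
  "background_task_agent",
  "meeting_note",
  "knowledge_sync",
] as const;

type UseCaseName = (typeof USE_CASES)[number];

// Assumption: the only legacy value is `track_block`, as named in the commit message.
const LEGACY_USE_CASES = new Map<string, UseCaseName>([
  ["track_block", "live_note_agent"],
]);

// Hypothetical helper: map a stored value to a current use-case name,
// or return undefined when the value is not recognised.
function normalizeUseCase(raw: string): UseCaseName | undefined {
  const mapped: string = LEGACY_USE_CASES.get(raw) ?? raw;
  return (USE_CASES as readonly string[]).includes(mapped)
    ? (mapped as UseCaseName)
    : undefined;
}
```

Mapping on read, instead of rewriting files, leaves old run logs untouched on disk while callers only ever see current names.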
export const Run = z.object({
id: z.string(),
2026-01-20 16:36:36 +05:30
title: z.string().optional(),
createdAt: z.iso.datetime(),
agentId: z.string(),
freeze model + provider per run at creation time

The model dropdown was broken in two ways: it wrote to ~/.rowboat/config/models.json (the BYOK creds file, stamped with a fake `flavor: 'openrouter'` to satisfy zod when signed in), and the runtime ignored that write entirely for signed-in users because `streamAgent` hard-coded `gpt-5.4`. Model selection was also globally scoped, so every chat shared one brain.

This change moves model + provider out of the global config and onto the run itself, resolved once at runs:create and frozen for the run's lifetime.

## Resolution

`runsCore.createRun` resolves per-field, falling through:

    run.model = opts.model ?? agent.model ?? defaults.model
    run.provider = opts.provider ?? agent.provider ?? defaults.provider

A new `core/models/defaults.ts` is the only place in the codebase that branches on signed-in state. `getDefaultModelAndProvider()` returns name strings; `resolveProviderConfig(name)` does the name → full LlmProvider lookup at runtime. `createProvider` learns about `flavor: 'rowboat'` so the gateway is just another flavor.

`provider` is stored as a name (e.g. `"rowboat"`, `"openai"`), not a full LlmProvider object. API keys never get written into the JSONL log; rotating a key in models.json applies to existing runs without re-creation. Cost: deleting a provider from settings breaks runs that referenced it (clear error surfaced via `resolveProviderConfig`).

## Runtime

`streamAgent` no longer resolves anything: it reads `state.runModel` / `state.runProvider`, looks up the provider config, instantiates. Subflows inherit the parent run's pair, so KG / inline-task subagents run on whatever the main run resolved to at creation. The `knowledgeGraphAgents` array, `isKgAgent`, and the per-agent default constants are gone.

KG / inline-task / pre-built agents declare their preferred model in YAML frontmatter (claude-haiku-4.5 / claude-sonnet-4.6), used at resolution time when those agents are themselves the top-level agent of a run (background triggers, scheduled tasks, etc.).

## Standalone callers

Non-run LLM call sites (summarize_meeting, track/routing, builtin-tools parseFile) and `agent-schedule/runner` were branching on signed-in independently. They all route through `getDefaultModelAndProvider` + `resolveProviderConfig` + `createProvider` now; `agent-schedule/runner` switched from raw `runsRepo.create` to `runsCore.createRun` so resolution applies to scheduled-agent runs too.

## UI

`chat-input-with-mentions` stops calling `models:saveConfig`. The dropdown notifies the parent via `onSelectedModelChange` ({provider, model} as names); App.tsx stashes selection per-tab and passes it to the next `runs:create`. When a run already exists, the input fetches it and renders a static label, since the model can't change mid-run.

## Legacy runs

A lenient zod schema in `repo.ts` (`StartEvent.extend(...optional)` plus `RunEvent.or(LegacyStartEvent)`) parses pre-existing runs. `repo.fetch` fills missing model/provider from current defaults and returns the strict canonical `Run` type. No file-rewriting migration; no impact on the canonical schema in `@x/shared`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:26:01 +05:30
model: z.string(),
provider: z.string(),
permissionMode: z.enum(["manual", "auto"]).optional(),
useCase: UseCase.optional(),
subUseCase: z.string().optional(),
log: z.array(RunEvent),
});
export const ListRunsResponse = z.object({
runs: z.array(Run.pick({
id: true,
title: true,
createdAt: true,
agentId: true,
})),
nextCursor: z.string().optional(),
});
export const CreateRunOptions = z.object({
agentId: z.string(),
model: z.string().optional(),
provider: z.string().optional(),
permissionMode: z.enum(["manual", "auto"]).optional(),
useCase: UseCase.optional(),
subUseCase: z.string().optional(),
});
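Because `model` and `provider` are optional in `CreateRunOptions` but required on `Run`, something has to fill them in at creation time. The sketch below illustrates the per-field fallthrough described in the "freeze model + provider" commit message, together with the empty-string fix mentioned in the background-tasks commit. It is an illustration under assumptions, not the actual `runsCore.createRun` code: the types `ModelChoice` and `ResolvedChoice` and the helpers `firstSet` and `resolveModelAndProvider` are hypothetical.

```typescript
// Hypothetical shapes; the real option, agent, and default types live elsewhere in the repo.
interface ModelChoice {
  model?: string;
  provider?: string;
}

interface ResolvedChoice {
  model: string;
  provider: string;
}

// Treat undefined and empty (or whitespace-only) strings as "not set",
// so an empty value passed by a tool call falls through to the next candidate.
function firstSet(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => c !== undefined && c.trim() !== "");
}

// Resolve each field independently: options, then agent, then defaults.
function resolveModelAndProvider(
  opts: ModelChoice,
  agent: ModelChoice,
  defaults: ResolvedChoice,
): ResolvedChoice {
  return {
    model: firstSet(opts.model, agent.model) ?? defaults.model,
    provider: firstSet(opts.provider, agent.provider) ?? defaults.provider,
  };
}
```

Resolving per field means a run can take its model from the agent while still taking its provider from the defaults. A plain `??` chain would keep an empty string, which is why the sketch filters empty values first.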