rowboat/apps/x/CODE_MODE_ENGINES_PLAN.md
gagan 2ddec07712
Code mode: make packaged builds work via managed engine provisioning (#625)
* fix(code-mode): make packaged code mode work via on-demand engine provisioning

Packaged builds could never run code mode: the Claude/Codex ACP adapters are
spawned as separate `node <entry>` processes resolved at runtime, but esbuild
can't inline a dynamic spawn target and Forge strips the workspace node_modules,
so every release threw `Cannot find module '@agentclientprotocol/...'`. Dev
worked only because of the pnpm symlink.

Rather than bundle the ~400 MB of native engines (one claude + one codex binary
per OS), provision them on demand:

- forge.config.cjs: stage the two ACP adapters + their JS dependency closure into
  .package/acp/node_modules (npm-style nested layout, native engines skipped),
  exempt .package from the node_modules ignore rule, and only sign/notarize when
  APPLE_ID is set so unsigned local/CI builds can package.
- agents.ts: resolve the adapter from the staged location first (node_modules
  fallback in dev); provision the pinned engine and point the adapter at it via
  CLAUDE_CODE_EXECUTABLE / CODEX_PATH. No dependence on a user's global install.
- engine-provisioner.ts: ensureEngine() downloads the per-platform engine package
  from npm AT THE EXACT VERSION THE ADAPTER WAS BUILT AGAINST, verifies its sha512
  integrity, extracts atomically into ~/.rowboat/engines/<agent>/<version>/, and
  caches it. Version-pinning keeps the ACP handshake compatible.
- engine-manifest.ts + scripts/gen-engine-manifest.mjs: committed manifest of
  tarball URLs + integrity for all platforms, regenerated from the adapters'
  pinned versions on a bump.

Verified on macOS arm64: both engines provision and run, and both adapters
complete the ACP initialize handshake from the packaged .app against the
provisioned engines. Installer drops from ~790 MB to 390 MB.

* feat(code-mode): explicit per-agent Enable in Settings; no silent chat download

Code mode now requires the user to explicitly enable an agent before use, instead
of silently downloading a ~200 MB engine on the first chat message.

- Settings → Code Mode: each agent shows "Not enabled" + an Enable button that
  downloads its engine with a live progress indicator (download % → verify →
  install), then flips to "Engine ready". Driven by a new codeMode:provisionEngine
  IPC call + a codeMode:engineProgress push channel. The section now states the
  prerequisite explicitly: the agent must be installed (Enable) and logged in
  (claude login / codex login — code mode reuses that saved credential).
- Chat path no longer auto-downloads: getProvisionedEnginePath() returns the
  enabled engine or throws a clear "enable it in Settings → Code Mode" error, so
  there's never a surprise mid-conversation download. getAgentLaunchSpec is sync
  again.
- Agent status: `installed` now means "engine provisioned" (downloaded), driving
  the Enable/Ready state; the new-session dialog shows "Enable in Settings" and
  disables un-enabled agents. Dropped the dead PATH-probing for a global CLI.

Verified: empty cache -> status installed=false and the chat path throws the
enable-in-Settings error (no download); core, renderer, and main typecheck/build;
no new lint errors.

* fix(code-mode): show only percentage during engine download in Settings

* feat(code-mode): prune superseded engine versions after install

After a successful provision, remove any other version dirs (and their .meta) for
that agent so old ~200 MB engines don't accumulate across version bumps. Best-effort;
never fails a good install. Verified: a planted stale version dir + meta are both
removed after provisioning the current version.

* fix(code-mode): keep showing engine download % after reopening Settings

Provisioning state lived in the row component, which unmounts when the Settings
dialog closes — so reopening mid-download showed the Enable button again even though
the download was still running in the main process. Move provisioning state to a
module-level store with one persistent listener on codeMode:engineProgress, so a row
remounting (dialog reopened) reflects the live % and resolves to Ready on completion.

* fix(code-mode): flip Enable row straight to Ready after install (no Enable flash)

On successful provision the in-flight flag was cleared before the async status
refresh completed, so the row briefly (or until reopen) showed the Enable button
again. Await the status refresh before clearing the flag so it transitions directly
to Ready.

* fix(code-mode): optimistically show Ready right after Enable completes

Awaiting the status refresh wasn't enough — setStatus re-renders the parent
separately from the row, leaving a window where the in-flight flag was cleared but
the status prop was stale, so the row flashed/stuck on the Enable button until
reopen. Track just-enabled agents in a module-level set and treat them as installed
immediately; loadStatus still syncs the real status in the background.

* fix(code-mode): graft login-shell PATH + add startup deadline

#1 (the gh/git "command not found" in packaged builds): GUI/Finder launches inherit
launchd's stripped PATH (/usr/bin:/bin:...), so tools the engine spawns — gh, git,
rg, bash — fail even though they work from a terminal (e.g. Homebrew's
/opt/homebrew/bin/gh). Probe the user's login-shell PATH and graft it onto the
engine's env before spawn (shell-env.ts; no-op on Windows / probe failure).

#2: add a 60s startup deadline (initialize / session create+load) so a wedged engine
fails with a clear, stderr-enriched error instead of an infinite "(pending...)".
Overridable via ROWBOAT_ACP_STARTUP_TIMEOUT_MS. Manager now disposes the client on
startup failure so the spawned adapter doesn't leak.

Verified: getAgentLaunchSpec's env.PATH now includes /opt/homebrew/bin (where gh
lives); core builds; no new lint errors.

* chore(code-mode): comment out signing/notarization for local builds

Revert to the explicit comment-out approach for osxSign/osxNotarize: uncomment them
(with APPLE_ID/APPLE_PASSWORD/APPLE_TEAM_ID) for a signed release build.

* chore(code-mode): keep signing/notarization active in committed config

The repo's forge.config ships with osxSign/osxNotarize enabled (release-ready).
Developers comment them out locally for unsigned test builds and don't commit that.

* chore: approve workspace build scripts so packaging runs non-interactively

The allowBuilds entries were left as "set this to true or false" placeholders, so
`pnpm install` / the pre-build deps check aborted with ERR_PNPM_IGNORED_BUILDS and
`npm run package` failed. Set them to true (and add node-pty, used by the code-mode
embedded terminal) so build scripts are approved and packaging works without a manual
`pnpm approve-builds`.
2026-06-17 21:53:15 +05:30

16 KiB

Code Mode — Managed Engine Provisioning Plan

Branch: feat/code-mode-managed-engines (off dev @ 8ce24ebb)

1. Problem & Goal

Code mode runs two coding agents — Claude Code and Codex — by spawning their ACP adapters, which in turn spawn a heavy native engine binary (~205 MB claude, ~194 MB codex). We need code mode to work in packaged releases with ~99% reliability for both agents, without shipping a ~400 MB installer.

Current state (HEAD = revert of #614)

  • Packaged builds do not stage the ACP adapters, and forge.config.cjs ignore: /^\/node_modules\// strips them. At runtime agents.ts resolves the adapter via require.resolve(...) then spawns it — which throws Cannot find module '@agentclientprotocol/...'.
  • Net: packaged code mode is broken in every release. It only works in dev because pnpm symlinks exist. There is no 400 MB bloat today — but no function either.

Why the two prior approaches are insufficient

  • Bundle engines (A): +~400 MB per installer (one claude + one codex native binary per OS/arch). Works offline, but installer is huge.
  • Drive from user's local install (B) — what #614 settled on and was reverted: requires the user to have both CLIs installed and logged in, correct version, on the right PATH. Depends on the user's machine → structurally cannot hit 99% (version skew, GUI-launch PATH stripping, missing installs). This is why #614 was reverted.

2. Chosen Architecture — Managed Engine Provisioning (validated against Conductor)

Split the problem into engine vs auth, treat them differently — exactly what Conductor (conductor.build) does:

  1. Engine = owned by the app. We provision version-pinned engine binaries into app-support (~/.rowboat/engines/<agent>/<version>/), download-on-demand on first use, sha256-verified, symlink/path-pinned. Not the user's global npm, not their PATH. → no version skew, no PATH quirks → this is what delivers the 99%.
  2. Auth = reused from the user. The engines read existing credentials (~/.claude API key / Pro / Max, ~/.codex auth.json). No second login. status.ts already inspects these.

Empirical proof from Conductor on this machine

  • DMG is 123 MB — far smaller than the ~400 MB of engines → engines are not in the installer; they're downloaded after install.
  • Layout observed at ~/Library/Application Support/com.conductor.app/:
    agent-binaries/claude/2.1.170/claude   (222 MB, single Mach-O arm64)
    agent-binaries/codex/0.138.0/codex     (205 MB, single Mach-O arm64)
    bin/claude -> agent-binaries/claude/2.1.170/claude   (symlink to active version)
    bin/codex  -> agent-binaries/codex/0.138.0/codex
    agent-binaries/.meta/claude-2.1.170.json  = {sha256, size, downloaded_at_unix_ms, ...}
    
  • So: versioned dirs + stable symlink + sha256 .meta ledger + download. We mirror this.

3. Concrete facts that make this clean (verified)

Adapters honor an external engine via env var (no code change in adapters needed)

  • Claude@agentclientprotocol/claude-agent-acp@0.39.0, dist/acp-agent.js:39: if (process.env.CLAUDE_CODE_EXECUTABLE) return it; and line 1552 pathToClaudeCodeExecutable: process.env.CLAUDE_CODE_EXECUTABLE ?? .... If unset and no bundled native dep → it throws "set CLAUDE_CODE_EXECUTABLE".
  • Codex@agentclientprotocol/codex-acp@0.0.44, dist/index.js:20900: const codexPath = process.env["CODEX_PATH"] ?? "codex";spawn(codexPath, ["app-server"]).
  • Implication: provisioning + setting these two env vars is the entire engine story.

Our engine packages are already single self-contained binaries

  • @anthropic-ai/claude-agent-sdk-darwin-arm64@0.3.156 → contains one claude executable (+ LICENSE/README).
  • @openai/codex@0.128.0-darwin-arm64vendor/<target>/codex/codex native binary plus a bundled rg (ripgrep) at vendor/<target>/path/rg — see Risk R1.

Pinned versions + platform package names (read from the installed adapter trees)

  • Claude: adapter claude-agent-acp@0.39.0 → engine @anthropic-ai/claude-agent-sdk@0.3.156. Platform optional deps @anthropic-ai/claude-agent-sdk-<platform>@0.3.156: darwin-arm64, darwin-x64, linux-x64, linux-arm64, linux-x64-musl, linux-arm64-musl, win32-x64, win32-arm64.
  • Codex: adapter codex-acp@0.0.44@openai/codex@^0.128.0 (pnpm-patched, see R2). Platform deps aliased @openai/codex-<platform>npm:@openai/codex@0.128.0-<platform>: darwin-arm64, darwin-x64, linux-x64, linux-arm64, win32-x64, win32-arm64.

4. Distribution source — npm platform packages, adapter-pinned (DECIDED)

We fetch the per-platform engine packages from the npm registry at the exact versions our ACP adapters depend on, extract the native binary, and provision it into ~/.rowboat/engines/.... No self-host, no curl installer, no fallback.

  • Tarball URL: https://registry.npmjs.org/<pkg>/-/<file>-<version>.tgz
    • claude: @anthropic-ai/claude-agent-sdk-<platform>@0.3.156 → extract the claude binary.
    • codex: @openai/codex@0.128.0-<platform> → extract vendor/<target>/codex/codex and keep vendor/<target>/path/rg (see R1).
  • Why adapter-pinned npm (not curl installer, not release bucket, not self-host):
    • The binary is the exact version our adapter was built/tested against → ACP handshake guaranteed → the key to ~99% reliability.
    • npm registry is highly available, immutable per-version, and the packument provides dist.integrity (sha512) + dist.shasum (sha1) → integrity verification with zero infra.
    • The official curl | bash installer is for end-users: it installs globally to ~/.local/bin and auto-updates in the background — exactly what we must avoid. We want an isolated, pinned, app-managed copy that never moves under the adapter. (Conductor likewise fetches the raw binary rather than running the installer.)
  • Versions are read from our lockfile at build time and embedded in engine-manifest.json, so the manifest can never drift from the adapters we ship.

Reference: how this relates to Conductor / official installers

  • Conductor self-hosts the same kind of native binaries on storage.conductor.build (raw/.gz/.zst + {url,gzipUrl,zstdUrl,sha256} manifest), pairing the adapters with recent standalone CLI versions (claude 2.1.x, codex 0.138). That proves recent versions work — but we deliberately pin to the adapter's own engine version for determinism.
  • Official Claude releases also publish a GPG-signed manifest.json (SHA256/platform) at downloads.claude.ai/claude-code-releases/<ver>/. We don't use it now (npm is simpler and adapter-matched), but it's the upgrade path if we ever want the latest CLI line.
  • If we later want control/availability/compression, we can mirror these npm tarballs to our own bucket — runtime stays identical, only manifest URLs change.

Locked decisions

  • Source: npm platform packages, adapter-pinned versions (user).
  • No self-host (user).
  • Always provision; no fallback to a user's pre-installed claude/codex (user). Reverted path B is fully retired.
  • Codex pnpm patch — IRRELEVANT (user): it targets the JS launcher; we point CODEX_PATH at the native binary, so it does not apply. (Risk R2 removed.)

5. Design

5.1 Build-time: generate an engine manifest + stage adapter JS

(a) Engine manifest (engine-manifest.json, embedded in the app). A build script reads the installed adapter dependency trees and emits, per agent:

{
  "claude": {
    "version": "0.3.156",
    "platforms": {
      "darwin-arm64": { "pkg": "@anthropic-ai/claude-agent-sdk-darwin-arm64",
                        "tarball": "https://registry.npmjs.org/.../-/...-0.3.156.tgz",
                        "integrity": "sha512-...",
                        "binRelPath": "claude" },
      "...": {}
    }
  },
  "codex": {
    "version": "0.128.0",
    "platforms": {
      "darwin-arm64": { "pkg": "@openai/codex-darwin-arm64",
                        "tarball": "https://registry.npmjs.org/...",
                        "integrity": "sha512-...",
                        "binRelPath": "vendor/aarch64-apple-darwin/codex/codex",
                        "extraPaths": ["vendor/aarch64-apple-darwin/path/rg"] }
    }
  }
}
  • Versions + tarball URLs + integrity are pulled from the lockfile / npm packument so the manifest is always in sync with the adapters we ship. (Generating it at build time sidesteps hardcoding fragile package names.)
  • binRelPath / extraPaths capture where the executable (and codex's rg) live inside each tarball.

(b) Stage the ACP adapter JS into the package (the part of #614 we KEEP). The adapters themselves (tiny, ~15 MB total incl. their non-native JS deps) must exist on disk in packaged builds so agents.ts can resolve + spawn them. In forge.config.cjs generateAssets, stage the two adapters and their non-native production dependency closure into .package/acp/node_modules (npm-style nested layout), and exempt .package from the node_modules ignore rule. We drop #614's native-engine staging entirely — engines come from provisioning, not the bundle.

5.2 Runtime: engine provisioner (new module in packages/core)

New file: packages/core/src/code-mode/acp/engine-provisioner.ts.

ensureEngine(agent): Promise<{ executablePath: string }>
  1. Read manifest entry for (agent, currentPlatform). Error clearly if unsupported.
  2. dir = ~/.rowboat/engines/<agent>/<version>/
  3. If dir exists AND .meta/<agent>-<version>.json sha256 matches → return binPath.
  4. Else acquire a cross-process lock (avoid double download), then:
     a. Download tarball to a temp file, streaming, with progress events.
     b. Verify integrity (sha512/sha256 from manifest).
     c. Extract into a temp dir, then atomic rename into <version>/ (tar gzip).
     d. chmod +x the binary (and codex's rg) on unix.
     e. Write .meta/<agent>-<version>.json {sha256, size, downloaded_at_unix_ms}.
  5. Return absolute path to the engine executable (binRelPath joined to dir).
  • Progress + cancellation surfaced over IPC for the first-run "Downloading engine…" UI.
  • Offline / failure → typed error with a clear, actionable message (not a hang).
  • Resumability / atomicity: download to temp, verify, then rename; never leave a half-extracted version dir that passes the existence check.

5.3 Wire provisioning into launch

In agents.ts getAgentLaunchSpec() (currently sets only CLAUDE_CODE_EXECUTABLE):

  • Make it (or its caller in client.ts / manager.ts) await ensureEngine(agent) first, then set:
    • claude → env.CLAUDE_CODE_EXECUTABLE = <provisioned claude>
    • codex → env.CODEX_PATH = <provisioned codex> (and ensure its rg sibling resolves; keep the vendor dir layout intact so codex finds ../path/rg).
  • Keep ELECTRON_RUN_AS_NODE=1 for the adapter spawn (unchanged).
  • The provisioner replaces both the bundled-engine dependency and the reverted "resolve user's local install" path. (Optionally: if a healthy provisioned engine is absent but a compatible local install exists, we may fall back to it — but the default and reliable path is the provisioned engine. Decide in review; default = provisioned only.)

5.4 Status & UX (status.ts + renderer)

  • checkCodeModeAgentStatus() becomes: engine = provisioned? (instead of "installed on PATH?"); auth = existing credentials present? (unchanged logic).
  • First-run flow: user picks agent → if engine missing, show "Downloading engine (~200 MB), one time…" with progress; on completion, proceed. Subsequent uses: instant.
  • Clear error states: download failed / offline / unsupported platform / auth missing.

6. File-by-file change plan

File Change
apps/main/forge.config.cjs Stage adapter JS closure into .package/acp/node_modules; exempt .package from node_modules ignore; generate + copy engine-manifest.json. (No native engines staged.)
apps/main/scripts/gen-engine-manifest.mjs (new) Build-time: read lockfile/packuments → emit engine-manifest.json (versions, tarball URLs, integrity, bin paths).
packages/core/src/code-mode/acp/engine-provisioner.ts (new) ensureEngine() — download, verify, extract, lock, progress, .meta ledger.
packages/core/src/code-mode/acp/agents.ts Resolve adapter from staged .package/acp first (fallback to node_modules in dev); await ensureEngine; set CLAUDE_CODE_EXECUTABLE/CODEX_PATH to provisioned binaries.
packages/core/src/code-mode/acp/client.ts / manager.ts Await provisioning before spawn; surface provisioning progress/errors; keep startup deadline.
packages/core/src/code-mode/status.ts Engine status = provisioned (not PATH); keep auth checks.
apps/main/src/ipc.ts + preload + renderer IPC for provisioning progress + first-run download UI + error states.
CI (optional) Smoke: packaged app boots each adapter, provisions a (small fake) engine, sets env var, answers ACP initialize; offline → clear error not a hang.

7. Edge cases & risks

  • R1 — Codex needs rg. The codex platform package bundles ripgrep at vendor/<target>/path/rg. We must extract/keep the vendor layout so codex finds it (don't extract the bare binary). Verify codex app-server boots from the provisioned dir.
  • R2 — Codex pnpm patch. RESOLVED (irrelevant). The patch targets the JS launcher; we point CODEX_PATH at the native binary, so it does not apply.
  • R3 — Platform/arch matrix. Manifest must cover darwin x64/arm64, linux x64/arm64 (+musl for claude), win32 x64/arm64. Windows engine is claude.exe/codex.exe.
  • R4 — Integrity & supply chain. Always verify the manifest integrity hash before chmod/exec. Treat a hash mismatch as a hard failure.
  • R5 — Disk + upgrades. Versioned dirs accumulate. Add a cleanup that keeps only the active pinned version per agent.
  • R6 — First-run network. Required once per agent; cached forever after. Must be a clear, cancellable UX, never a silent hang (reuse the #614 startup-deadline lesson).
  • R7 — code signing / Gatekeeper (macOS). Downloaded native binaries aren't covered by our app's signature. Verify they run under Gatekeeper (they're already signed/notarized by Anthropic/OpenAI; quarantine attr may need clearing). Conductor runs them fine from app-support — confirm we do too.
  • R8 — engine-manifest staleness. Manifest must regenerate whenever the adapter/engine versions change; tie generation into the build so it can't drift.

8. Phasing

  1. P1 — Packaging fix (unblocks dev→packaged parity): stage adapter JS into .package, resolver checks staged path first. Verify packaged code mode can at least spawn the adapter. (Independent of provisioning; the part of #614 worth keeping.)
  2. P2 — Provisioner (core): ensureEngine() + manifest + wire env vars. Verify both engines provision + boot from ~/.rowboat/engines on macOS arm64.
  3. P3 — UX + status: first-run download UI, status panel, error states.
  4. P4 — Cross-platform + CI smoke: matrix manifest, mac/linux/win smoke.
  5. P5 — Polish: version cleanup, cancellation, offline messaging, optional local-install fallback.

9. Decisions & remaining questions

Resolved (locked)

  1. Source = npm platform packages, adapter-pinned versions (not self-host, not curl installer, not release bucket).
  2. Always provision; no local-install fallback.
  3. Codex pnpm patch irrelevant.

Still open (can decide during implementation)

  • Provision timing: on first code-mode use (lazy) vs first app launch (eager, background download). Plan assumes lazy + cached.
  • Version-dir cleanup: keep only the active pinned version per agent (R5).
  • Verify R1 (codex rg) and R7 (Gatekeeper on downloaded macOS binaries) during P2.