Skills move out of packages/core/src/application/assistant/skills/*/skill.ts
(TS string constants) into apps/skills/<id>/SKILL.md (Agent Skills spec format
— YAML frontmatter + markdown body). One directory, one loader, one place to
look at every skill the agent can load.
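The SKILL.md layout described above (YAML frontmatter, then a markdown body) can be split with a few lines of code. This is a minimal sketch, not the actual skill-md-parser: the names `parseSkillMd` and `ParsedSkill` are hypothetical, and it only handles flat `key: value` pairs, whereas a real parser would presumably use a YAML library.

```typescript
// Hypothetical result shape; the real schemas live in packages/shared/src/skill.ts.
interface ParsedSkill {
  frontmatter: Record<string, string>;
  body: string;
}

// Split a SKILL.md source into its frontmatter block and markdown body.
// Assumption: frontmatter holds only flat `key: value` lines.
function parseSkillMd(source: string): ParsedSkill {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    throw new Error("SKILL.md must start with a frontmatter block delimited by ---");
  }
  const frontmatter: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip blank lines and anything that is not key: value
    frontmatter[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { frontmatter, body: match[2] };
}
```

Throwing on a missing frontmatter block keeps malformed skills out of the catalog instead of loading them with empty metadata.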
Key change vs the old dev system: a `{{include:<skill-id>}}` directive lets one
skill transclude another. This removes the parallel TS constant for the
knowledge-note style guide — it now lives at apps/skills/knowledge-note-style/
(hidden from catalog) and is pulled into doc-collab + the live-note and
background-task agents via the resolver instead of via a TS import.
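As an illustration of the `{{include:<skill-id>}}` directive, here is a minimal sketch of recursive expansion with cycle detection. It assumes skill bodies are already loaded into an in-memory map; `expandIncludes`, `SkillBodies`, the id character set, and the error messages are all hypothetical and may not match the actual SkillResolver.

```typescript
// Hypothetical in-memory store: skill id -> markdown body.
type SkillBodies = Map<string, string>;

// Expand every {{include:<id>}} directive in skill `id`, recursively.
// `stack` holds the ids currently being expanded; seeing one again means a cycle.
function expandIncludes(id: string, bodies: SkillBodies, stack: string[] = []): string {
  if (stack.includes(id)) {
    throw new Error(`Include cycle: ${[...stack, id].join(" -> ")}`);
  }
  const body = bodies.get(id);
  if (body === undefined) {
    throw new Error(`Unknown skill: ${id}`);
  }
  // Assumption: skill ids consist of word characters and hyphens.
  const includeRe = /\{\{include:([\w-]+)\}\}/g;
  return body.replace(includeRe, (_match: string, includedId: string) =>
    expandIncludes(includedId, bodies, [...stack, id]),
  );
}

// Usage: doc-collab pulls in the hidden style guide.
const exampleBodies: SkillBodies = new Map([
  ["knowledge-note-style", "Use short headings."],
  ["doc-collab", "Doc rules.\n{{include:knowledge-note-style}}"],
]);
const expandedDocCollab = expandIncludes("doc-collab", exampleBodies);
// expandedDocCollab === "Doc rules.\nUse short headings."
```

Tracking the current expansion path, rather than a global "seen" set, lets the same skill be included from two different places without being mistaken for a cycle.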
Infrastructure:
- packages/core/src/skills/ — types, skill-md-parser, FS-backed official repo,
SkillResolver with recursive {{include:<id>}} expansion + cycle detection
- packages/shared/src/skill.ts — SkillFrontmatter, SkillCatalogEntry,
ResolvedSkill schemas
- DI: officialSkillsRepo + skillResolver registered; registerSkillsDir helper
wires the path before any consumer resolves
- IPC: skills:list / skills:get (read-only) for the Settings UI
- Main: resolveSkillsDir picks Resources/skills (packaged) or repo apps/skills
(dev). forge.config.cjs ships apps/skills/ as extraResource.
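The packaged-versus-dev choice made by resolveSkillsDir could look roughly like the sketch below. The input shape is hypothetical: in an Electron main process the values would typically come from `app.isPackaged`, `process.resourcesPath`, and the repository root, and real code would more likely join paths with `path.join` than with string templates.

```typescript
// Hypothetical inputs, passed in explicitly so the function stays pure and testable.
interface SkillsDirInputs {
  isPackaged: boolean;   // e.g. Electron's app.isPackaged
  resourcesPath: string; // e.g. Electron's process.resourcesPath
  repoRoot: string;      // assumed path to the repository checkout in dev
}

// Packaged builds read skills shipped as an extra resource; dev reads the repo copy.
function resolveSkillsDir({ isPackaged, resourcesPath, repoRoot }: SkillsDirInputs): string {
  return isPackaged ? `${resourcesPath}/skills` : `${repoRoot}/apps/skills`;
}
```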
Consumer refactor:
- buildCopilotInstructions: catalog markdown built from resolver.getCatalog()
- builtin-tools: loadSkill uses resolver, new listSkills tool
- background-tasks/agent + live-note/agent: now async builders that load
the knowledge-note-style skill content via resolver
- runtime.loadAgent: awaits the now-async builders
- Deleted: assistant/skills/ directory, knowledge-note-style.ts
UI:
- New SkillsSettings component (read-only list + detail view) wired into
Settings dialog as the "Skills" tab.
| name | description | metadata |
|---|---|---|
| browser-control | Control the embedded browser pane - open sites, inspect page state, and interact with indexed page elements. | |
Browser Control Skill
You have access to the browser-control tool, which controls Rowboat's embedded browser pane directly.
Use this skill when the user asks you to open a website, browse in-app, search the web in the browser pane, click something on a page, fill a form, or otherwise interact with a live webpage inside Rowboat.
Core Workflow
- Start with `browser-control({ action: "open" })` if the browser pane may not already be open.
- Use `browser-control({ action: "read-page" })` to inspect the current page.
- The tool returns:
  - `snapshotId`
  - page `url` and `title`
  - visible page text
  - interactable elements with numbered `index` values
  - `suggestedSkills`: site-specific and interaction-specific skill hints for the current page
- Always inspect `suggestedSkills` before acting. If any skill in the list matches what the user asked for (site or task), call `load-browser-skill({ id: "" })` first, read it in full, then plan your actions. These skills encode selectors, timing, and gotchas that would otherwise cost you several failed attempts to rediscover. If no skill matches, proceed, but do not skip this check.
- Prefer acting on those numbered indices with `click` / `type` / `press`.
- After each action, read the returned page snapshot before deciding the next step, including re-checking `suggestedSkills` if the navigation landed you on a new domain.
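The workflow above can be sketched as a single helper. This is an illustration under stated assumptions, not Rowboat's actual tool contract: the `BrowserTools` interface, the `label` field on elements, and the `pickSkill` callback are hypothetical names introduced here.

```typescript
// Hypothetical shapes inferred from the skill text; the real tool contracts may differ.
interface PageElement {
  index: number;
  label: string; // assumed field, used here only to find a target
}

interface PageSnapshot {
  snapshotId: string;
  url: string;
  title: string;
  elements: PageElement[];
  suggestedSkills: string[];
}

interface BrowserTools {
  browserControl(args: Record<string, unknown>): Promise<PageSnapshot>;
  loadBrowserSkill(args: { id: string }): Promise<string>;
}

// Open the pane, read the page, load a matching skill if one is suggested,
// then click the element by index and return the fresh snapshot.
async function readThenClick(
  tools: BrowserTools,
  label: string,
  pickSkill: (ids: string[]) => string | undefined,
): Promise<PageSnapshot> {
  await tools.browserControl({ action: "open" });
  const page = await tools.browserControl({ action: "read-page" });

  const skillId = pickSkill(page.suggestedSkills);
  if (skillId !== undefined) {
    await tools.loadBrowserSkill({ id: skillId }); // read the skill before acting
  }

  const target = page.elements.find((el) => el.label === label);
  if (!target) {
    throw new Error(`No element labelled "${label}" in snapshot ${page.snapshotId}`);
  }
  return tools.browserControl({
    action: "click",
    index: target.index,
    snapshotId: page.snapshotId,
  });
}
```

The caller is expected to inspect the returned snapshot, including its `suggestedSkills`, before choosing the next action.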
Actions
open
Open the browser pane and ensure an active tab exists.
get-state
Return the current browser tabs and active tab id.
new-tab
Open a new browser tab.
Parameters:
- `target` (optional): URL or plain-language search query
switch-tab
Switch to a tab by `tabId`.
close-tab
Close a tab by `tabId`.
navigate
Navigate the active tab.
Parameters:
- `target`: URL or plain-language search query
Plain-language targets are converted into a search automatically.
back / forward / reload
Standard browser navigation controls.
read-page
Read the current page and return a compact snapshot.
Parameters:
- `maxElements` (optional)
- `maxTextLength` (optional)
click
Click an element.
Prefer:
- `index`: element index from `read-page`
Optional:
- `snapshotId`: include it when acting on a recent snapshot
- `selector`: fallback only when no usable index exists
type
Type into an input, textarea, or contenteditable element.
Parameters:
- `text`: text to enter
- plus the same target fields as `click`
press
Send a key press such as `Enter`, `Tab`, `Escape`, or arrow keys.
Parameters:
- `key`
- optional target fields if you need to focus a specific element first
scroll
Scroll the current page.
Parameters:
- `direction`: `"up"` or `"down"` (optional; defaults down)
- `amount`: pixel distance (optional)
wait
Wait for the page to settle, useful after async UI changes.
Parameters:
- `ms`: milliseconds to wait (optional)
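For readers who prefer types to prose, the action list above can be summarised as a discriminated union. This is a hypothetical TypeScript view, not the tool's actual schema: the type names, the `buildClick` helper, and the choice of `string` for `tabId` are assumptions.

```typescript
// Target fields shared by click, type, and press.
type Target = { index?: number; snapshotId?: string; selector?: string };

// Hypothetical union covering the actions and parameters listed above.
type BrowserAction =
  | { action: "open" | "get-state" | "back" | "forward" | "reload" }
  | { action: "new-tab"; target?: string }
  | { action: "switch-tab" | "close-tab"; tabId: string } // tabId type is assumed
  | { action: "navigate"; target: string }
  | { action: "read-page"; maxElements?: number; maxTextLength?: number }
  | ({ action: "click" } & Target)
  | ({ action: "type"; text: string } & Target)
  | ({ action: "press"; key: string } & Target)
  | { action: "scroll"; direction?: "up" | "down"; amount?: number }
  | { action: "wait"; ms?: number };

// Build a click that prefers the element index and only falls back to a selector.
function buildClick(snapshotId: string, index?: number, selector?: string): BrowserAction {
  if (index !== undefined) {
    return { action: "click", index, snapshotId };
  }
  if (selector) {
    return { action: "click", selector, snapshotId };
  }
  throw new Error("click needs an element index or a selector");
}
```

Encoding the index-first preference in a helper keeps selector-based clicks as an explicit fallback rather than a default.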
Companion Tools
load-browser-skill
Rowboat caches a library of browser skills (from `browser-use/browser-harness`) indexed by both domain (github, linkedin, amazon, booking, …) and interaction type within a domain (e.g. `github/repo-actions`, `github/scraping`, `arxiv-bulk/*`). Whenever `browser-control` returns a `suggestedSkills` array (which it does on `navigate`, `new-tab`, and `read-page`), treat it as a required reading step, not optional. Pick the entry that matches the current task (domain match first, then the interaction-specific variant if one exists) and call `load-browser-skill({ id: "" })` before attempting the action.
You can also proactively call `load-browser-skill({ action: "list", site: "" })` when you know you're about to work on a site, to see what skills exist even if `suggestedSkills` is empty (e.g. before navigating).
These skills are written against a Python harness, so treat them as reference knowledge. Reuse the selectors, timing, and sequencing, but adapt them to Rowboat's structured browser actions. Do not look for or call `http-fetch`. If a browser-harness recipe suggests `js(...)` or `http_get(...)` style shortcuts, treat those as non-portable and fall back to reading and interacting with the page itself.
Important Rules
- Prefer `read-page` before interacting.
- Prefer element `index` over CSS selectors.
- If the tool says the snapshot is stale, call `read-page` again.
- After navigation, clicking, typing, pressing, or scrolling, use the returned page snapshot instead of assuming the page state.
- Always check `suggestedSkills` after `navigate`, `new-tab`, or `read-page`, and load the matching domain or interaction skill before acting. Skipping this step is the single most common way to waste a dozen failed clicks on a site whose quirks are already documented. If the array is empty, proceed normally, but don't skip the check.
- Do not try to use `http-fetch`. If a browser-harness recipe mentions `http_get(...)` or a public API shortcut, adapt it to DOM-based browsing instead.
- Use Rowboat's browser for live interaction. Use web search tools for research where a live session is unnecessary.
- Do not wrap browser URLs or browser pages in `` ```filepath `` blocks. Filepath cards are only for real files on disk, not web pages or browser tabs.
- If you mention a page the browser opened, use plain text for the URL/title instead of trying to create a clickable file card.