mirror of
https://github.com/rowboatlabs/rowboat.git
synced 2026-06-24 20:28:16 +02:00
feat(skills): single-source skill system with markdown SKILL.md + include directive
Skills move out of packages/core/src/application/assistant/skills/*/skill.ts
(TS string constants) into apps/skills/<id>/SKILL.md (Agent Skills spec format
— YAML frontmatter + markdown body). One directory, one loader, one place to
look at every skill the agent can load.
Key change vs the old dev system: a `{{include:<skill-id>}}` directive lets one
skill transclude another. This removes the parallel TS constant for the
knowledge-note style guide — it now lives at apps/skills/knowledge-note-style/
(hidden from catalog) and is pulled into doc-collab + the live-note and
background-task agents via the resolver instead of via a TS import.
Infrastructure:
- packages/core/src/skills/ — types, skill-md-parser, FS-backed official repo,
SkillResolver with recursive {{include:<id>}} expansion + cycle detection
- packages/shared/src/skill.ts — SkillFrontmatter, SkillCatalogEntry,
ResolvedSkill schemas
- DI: officialSkillsRepo + skillResolver registered; registerSkillsDir helper
wires the path before any consumer resolves
- IPC: skills:list / skills:get (read-only) for the Settings UI
- Main: resolveSkillsDir picks Resources/skills (packaged) or repo apps/skills
(dev). forge.config.cjs ships apps/skills/ as extraResource.
Consumer refactor:
- buildCopilotInstructions: catalog markdown built from resolver.getCatalog()
- builtin-tools: loadSkill uses resolver, new listSkills tool
- background-tasks/agent + live-note/agent: now async builders that load
the knowledge-note-style skill content via resolver
- runtime.loadAgent: awaits the now-async builders
- Deleted: assistant/skills/ directory, knowledge-note-style.ts
UI:
- New SkillsSettings component (read-only list + detail view) wired into
Settings dialog as the "Skills" tab.
This commit is contained in:
parent
b01af12148
commit
9a308cb7a9
38 changed files with 1217 additions and 1204 deletions
123
apps/skills/browser-control/SKILL.md
Normal file
123
apps/skills/browser-control/SKILL.md
Normal file
|
|
@ -0,0 +1,123 @@
|
|||
---
|
||||
name: browser-control
|
||||
description: >-
|
||||
Control the embedded browser pane - open sites, inspect page state, and interact with indexed page elements.
|
||||
metadata:
|
||||
title: "Browser Control"
|
||||
---
|
||||
|
||||
# Browser Control Skill
|
||||
|
||||
You have access to the **browser-control** tool, which controls Rowboat's embedded browser pane directly.
|
||||
|
||||
Use this skill when the user asks you to open a website, browse in-app, search the web in the browser pane, click something on a page, fill a form, or otherwise interact with a live webpage inside Rowboat.
|
||||
|
||||
## Core Workflow
|
||||
|
||||
1. Start with ` + "`browser-control({ action: \"open\" })`" + ` if the browser pane may not already be open.
|
||||
2. Use ` + "`browser-control({ action: \"read-page\" })`" + ` to inspect the current page.
|
||||
3. The tool returns:
|
||||
- ` + "`snapshotId`" + `
|
||||
- page ` + "`url`" + ` and ` + "`title`" + `
|
||||
- visible page text
|
||||
- interactable elements with numbered ` + "`index`" + ` values
|
||||
- ` + "`suggestedSkills`" + ` — site-specific and interaction-specific skill hints for the current page
|
||||
4. **Always inspect ` + "`suggestedSkills`" + ` before acting.** If any skill in the list matches what the user asked for (site or task), call ` + "`load-browser-skill({ id: \"<id>\" })`" + ` *first*, read it in full, then plan your actions. These skills encode selectors, timing, and gotchas that would otherwise cost you several failed attempts to rediscover. If no skill matches, proceed — but do not skip this check.
|
||||
5. Prefer acting on those numbered indices with ` + "`click`" + ` / ` + "`type`" + ` / ` + "`press`" + `.
|
||||
6. After each action, read the returned page snapshot before deciding the next step — including re-checking ` + "`suggestedSkills`" + ` if the navigation landed you on a new domain.
|
||||
|
||||
## Actions
|
||||
|
||||
### open
|
||||
Open the browser pane and ensure an active tab exists.
|
||||
|
||||
### get-state
|
||||
Return the current browser tabs and active tab id.
|
||||
|
||||
### new-tab
|
||||
Open a new browser tab.
|
||||
|
||||
Parameters:
|
||||
- ` + "`target`" + ` (optional): URL or plain-language search query
|
||||
|
||||
### switch-tab
|
||||
Switch to a tab by ` + "`tabId`" + `.
|
||||
|
||||
### close-tab
|
||||
Close a tab by ` + "`tabId`" + `.
|
||||
|
||||
### navigate
|
||||
Navigate the active tab.
|
||||
|
||||
Parameters:
|
||||
- ` + "`target`" + `: URL or plain-language search query
|
||||
|
||||
Plain-language targets are converted into a search automatically.
|
||||
|
||||
### back / forward / reload
|
||||
Standard browser navigation controls.
|
||||
|
||||
### read-page
|
||||
Read the current page and return a compact snapshot.
|
||||
|
||||
Parameters:
|
||||
- ` + "`maxElements`" + ` (optional)
|
||||
- ` + "`maxTextLength`" + ` (optional)
|
||||
|
||||
### click
|
||||
Click an element.
|
||||
|
||||
Prefer:
|
||||
- ` + "`index`" + `: element index from ` + "`read-page`" + `
|
||||
|
||||
Optional:
|
||||
- ` + "`snapshotId`" + `: include it when acting on a recent snapshot
|
||||
- ` + "`selector`" + `: fallback only when no usable index exists
|
||||
|
||||
### type
|
||||
Type into an input, textarea, or contenteditable element.
|
||||
|
||||
Parameters:
|
||||
- ` + "`text`" + `: text to enter
|
||||
- plus the same target fields as ` + "`click`" + `
|
||||
|
||||
### press
|
||||
Send a key press such as ` + "`Enter`" + `, ` + "`Tab`" + `, ` + "`Escape`" + `, or arrow keys.
|
||||
|
||||
Parameters:
|
||||
- ` + "`key`" + `
|
||||
- optional target fields if you need to focus a specific element first
|
||||
|
||||
### scroll
|
||||
Scroll the current page.
|
||||
|
||||
Parameters:
|
||||
- ` + "`direction`" + `: ` + "`\"up\"`" + ` or ` + "`\"down\"`" + ` (optional; defaults down)
|
||||
- ` + "`amount`" + `: pixel distance (optional)
|
||||
|
||||
### wait
|
||||
Wait for the page to settle, useful after async UI changes.
|
||||
|
||||
Parameters:
|
||||
- ` + "`ms`" + `: milliseconds to wait (optional)
|
||||
|
||||
## Companion Tools
|
||||
|
||||
### load-browser-skill
|
||||
Rowboat caches a library of browser skills (from ` + "`browser-use/browser-harness`" + `) indexed by both **domain** (github, linkedin, amazon, booking, …) and **interaction type** within a domain (e.g. ` + "`github/repo-actions`" + `, ` + "`github/scraping`" + `, ` + "`arxiv-bulk/*`" + `). Whenever ` + "`browser-control`" + ` returns a ` + "`suggestedSkills`" + ` array — which it does on ` + "`navigate`" + `, ` + "`new-tab`" + `, and ` + "`read-page`" + ` — treat it as a required reading step, not optional. Pick the entry that matches the current task (domain match first, then the interaction-specific variant if one exists) and call ` + "`load-browser-skill({ id: \"<id>\" })`" + ` before attempting the action.
|
||||
|
||||
You can also proactively call ` + "`load-browser-skill({ action: \"list\", site: \"<site>\" })`" + ` when you know you're about to work on a site, to see what skills exist even if ` + "`suggestedSkills`" + ` is empty (e.g. before navigating).
|
||||
|
||||
These skills are written against a Python harness, so treat them as **reference knowledge**. Reuse the selectors, timing, and sequencing, but adapt them to Rowboat's structured browser actions. **Do not look for or call ` + "`http-fetch`" + `.** If a browser-harness recipe suggests ` + "`js(...)`" + ` or ` + "`http_get(...)`" + ` style shortcuts, treat those as non-portable and fall back to reading and interacting with the page itself.
|
||||
|
||||
## Important Rules
|
||||
|
||||
- Prefer ` + "`read-page`" + ` before interacting.
|
||||
- Prefer element ` + "`index`" + ` over CSS selectors.
|
||||
- If the tool says the snapshot is stale, call ` + "`read-page`" + ` again.
|
||||
- After navigation, clicking, typing, pressing, or scrolling, use the returned page snapshot instead of assuming the page state.
|
||||
- **Always check ` + "`suggestedSkills`" + ` after ` + "`navigate`" + `, ` + "`new-tab`" + `, or ` + "`read-page`" + `, and load the matching domain or interaction skill before acting.** Skipping this step is the single most common way to waste a dozen failed clicks on a site whose quirks are already documented. If the array is empty, proceed normally — but don't skip the check.
|
||||
- Do not try to use ` + "`http-fetch`" + `. If a browser-harness recipe mentions ` + "`http_get(...)`" + ` or a public API shortcut, adapt it to DOM-based browsing instead.
|
||||
- Use Rowboat's browser for live interaction. Use web search tools for research where a live session is unnecessary.
|
||||
- Do not wrap browser URLs or browser pages in ` + "```filepath" + ` blocks. Filepath cards are only for real files on disk, not web pages or browser tabs.
|
||||
- If you mention a page the browser opened, use plain text for the URL/title instead of trying to create a clickable file card.
|
||||
Loading…
Add table
Add a link
Reference in a new issue