Add tabbed embedded browser and assistant browser control
This commit is contained in:
arkml 2026-04-15 13:21:09 +05:30 committed by GitHub
parent e2c13f0f6f
commit 7dbfcb72f4
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
23 changed files with 2893 additions and 59 deletions

View file

@ -71,6 +71,7 @@ Rowboat is an agentic assistant for everyday work - emails, meetings, projects,
**App Control:** When users ask you to open notes, show the bases or graph view, filter or search notes, or manage saved views, load the \`app-navigation\` skill first. It provides structured guidance for navigating the app UI and controlling the knowledge base view.
**Tracks (Auto-Updating Note Blocks):** When users ask you to **track**, **monitor**, **watch**, or **keep an eye on** something in a note or say things like "every morning tell me X", "show the current Y in this note", "pin live updates of Z here" load the \`tracks\` skill first. Also load it when a user presses Cmd+K with a note open and requests auto-refreshing content at the cursor. Track blocks are YAML-fenced scheduled blocks whose output is rewritten on each run — useful for weather, news, prices, status pages, and personal dashboards.
**Browser Control:** When users ask you to open a website, browse in-app, search the web in the embedded browser, or interact with a live webpage inside Rowboat, load the \`browser-control\` skill first. It explains the \`read-page -> indexed action -> refreshed page\` workflow for the browser pane.
## Learning About the User (save-to-memory)
@ -243,6 +244,7 @@ ${runtimeContextPrompt}
- \`slack-checkConnection\`, \`slack-listAvailableTools\`, \`slack-executeAction\` - Slack integration (requires Slack to be connected via Composio). Use \`slack-listAvailableTools\` first to discover available tool slugs, then \`slack-executeAction\` to execute them.
- \`web-search\` - Search the web. Returns rich results with full text, highlights, and metadata. The \`category\` parameter defaults to \`general\` (full web search) — only use a specific category like \`news\`, \`company\`, \`research paper\` etc. when the query is clearly about that type. For everyday queries (weather, restaurants, prices, how-to), use \`general\`.
- \`app-navigation\` - Control the app UI: open notes, switch views, filter/search the knowledge base, manage saved views. **Load the \`app-navigation\` skill before using this tool.**
- \`browser-control\` - Control the embedded browser pane: open sites, inspect the live page, switch tabs, and interact with indexed page elements. **Load the \`browser-control\` skill before using this tool.**
- \`save-to-memory\` - Save observations about the user to the agent memory system. Use this proactively during conversations.
- \`composio-list-toolkits\`, \`composio-search-tools\`, \`composio-execute-tool\`, \`composio-connect-toolkit\` — Composio integration tools. Load the \`composio-integration\` skill for usage guidance.
@ -281,6 +283,13 @@ This renders as an interactive card in the UI that the user can click to open th
- Files on the user's machine (~/Desktop/..., /Users/..., etc.)
- Audio files, images, documents, or any file reference
Do NOT use filepath blocks for:
- Website URLs or browser pages (\`https://...\`, \`http://...\`)
- Anything currently open in the embedded browser
- Browser tabs or browser tab ids
For browser pages, mention the URL in plain text or use the browser-control tool. Do not try to turn browser pages into clickable file cards.
**IMPORTANT:** Only use filepath blocks for files that already exist. The card is clickable and opens the file, so it must point to a real file. If you are proposing a path for a file that hasn't been created yet (e.g., "Shall I save it at ~/Documents/report.pdf?"), use inline code (\`~/Documents/report.pdf\`) instead of a filepath block. Use the filepath block only after the file has been written/created successfully.
Never output raw file paths in plain text when they could be wrapped in a filepath block unless the file does not exist yet.`;

View file

@ -0,0 +1,106 @@
export const skill = String.raw`
# Browser Control Skill
You have access to the **browser-control** tool, which controls Rowboat's embedded browser pane directly.
Use this skill when the user asks you to open a website, browse in-app, search the web in the browser pane, click something on a page, fill a form, or otherwise interact with a live webpage inside Rowboat.
## Core Workflow
1. Start with ` + "`browser-control({ action: \"open\" })`" + ` if the browser pane may not already be open.
2. Use ` + "`browser-control({ action: \"read-page\" })`" + ` to inspect the current page.
3. The tool returns:
- ` + "`snapshotId`" + `
- page ` + "`url`" + ` and ` + "`title`" + `
- visible page text
- interactable elements with numbered ` + "`index`" + ` values
4. Prefer acting on those numbered indices with ` + "`click`" + ` / ` + "`type`" + ` / ` + "`press`" + `.
5. After each action, read the returned page snapshot before deciding the next step.
## Actions
### open
Open the browser pane and ensure an active tab exists.
### get-state
Return the current browser tabs and active tab id.
### new-tab
Open a new browser tab.
Parameters:
- ` + "`target`" + ` (optional): URL or plain-language search query
### switch-tab
Switch to a tab by ` + "`tabId`" + `.
### close-tab
Close a tab by ` + "`tabId`" + `.
### navigate
Navigate the active tab.
Parameters:
- ` + "`target`" + `: URL or plain-language search query
Plain-language targets are converted into a search automatically.
### back / forward / reload
Standard browser navigation controls.
### read-page
Read the current page and return a compact snapshot.
Parameters:
- ` + "`maxElements`" + ` (optional)
- ` + "`maxTextLength`" + ` (optional)
### click
Click an element.
Prefer:
- ` + "`index`" + `: element index from ` + "`read-page`" + `
Optional:
- ` + "`snapshotId`" + `: include it when acting on a recent snapshot
- ` + "`selector`" + `: fallback only when no usable index exists
### type
Type into an input, textarea, or contenteditable element.
Parameters:
- ` + "`text`" + `: text to enter
- plus the same target fields as ` + "`click`" + `
### press
Send a key press such as ` + "`Enter`" + `, ` + "`Tab`" + `, ` + "`Escape`" + `, or arrow keys.
Parameters:
- ` + "`key`" + `
- optional target fields if you need to focus a specific element first
### scroll
Scroll the current page.
Parameters:
- ` + "`direction`" + `: ` + "`\"up\"`" + ` or ` + "`\"down\"`" + ` (optional; defaults down)
- ` + "`amount`" + `: pixel distance (optional)
### wait
Wait for the page to settle, useful after async UI changes.
Parameters:
- ` + "`ms`" + `: milliseconds to wait (optional)
## Important Rules
- Prefer ` + "`read-page`" + ` before interacting.
- Prefer element ` + "`index`" + ` over CSS selectors.
- If the tool says the snapshot is stale, call ` + "`read-page`" + ` again.
- After navigation, clicking, typing, pressing, or scrolling, use the returned page snapshot instead of assuming the page state.
- Use Rowboat's browser for live interaction. Use web search tools for research where a live session is unnecessary.
- Do not wrap browser URLs or browser pages in ` + "```filepath" + ` blocks. Filepath cards are only for real files on disk, not web pages or browser tabs.
- If you mention a page the browser opened, use plain text for the URL/title instead of trying to create a clickable file card.
`;
export default skill;

View file

@ -11,6 +11,7 @@ import backgroundAgentsSkill from "./background-agents/skill.js";
import createPresentationsSkill from "./create-presentations/skill.js";
import appNavigationSkill from "./app-navigation/skill.js";
import browserControlSkill from "./browser-control/skill.js";
import composioIntegrationSkill from "./composio-integration/skill.js";
import tracksSkill from "./tracks/skill.js";
@ -105,6 +106,12 @@ const definitions: SkillDefinition[] = [
summary: "Create and manage track blocks — YAML-scheduled auto-updating content blocks in notes (weather, news, prices, status, dashboards). Insert at cursor (Cmd+K) or append to notes.",
content: tracksSkill,
},
{
id: "browser-control",
title: "Browser Control",
summary: "Control the embedded browser pane - open sites, inspect page state, and interact with indexed page elements.",
content: browserControlSkill,
},
];
const skillEntries = definitions.map((definition) => ({

View file

@ -0,0 +1,8 @@
import type { BrowserControlInput, BrowserControlResult } from '@x/shared/dist/browser-control.js';
export interface IBrowserControlService {
execute(
input: BrowserControlInput,
ctx?: { signal?: AbortSignal },
): Promise<BrowserControlResult>;
}

View file

@ -17,6 +17,7 @@ import { WorkDir } from "../../config/config.js";
import { composioAccountsRepo } from "../../composio/repo.js";
import { executeAction as executeComposioAction, isConfigured as isComposioConfigured, searchTools as searchComposioTools } from "../../composio/client.js";
import { CURATED_TOOLKITS, CURATED_TOOLKIT_SLUGS } from "@x/shared/dist/composio.js";
import { BrowserControlInputSchema, type BrowserControlInput } from "@x/shared/dist/browser-control.js";
import type { ToolContext } from "./exec-tool.js";
import { generateText } from "ai";
import { createProvider } from "../../models/models.js";
@ -26,6 +27,7 @@ import { getGatewayProvider } from "../../models/gateway.js";
import { getAccessToken } from "../../auth/tokens.js";
import { API_URL } from "../../config/env.js";
import { updateContent, updateTrackBlock } from "../../knowledge/track/fileops.js";
import type { IBrowserControlService } from "../browser-control/service.js";
// Parser libraries are loaded dynamically inside parseFile.execute()
// to avoid pulling pdfjs-dist's DOM polyfills into the main bundle.
// Import paths are computed so esbuild cannot statically resolve them.
@ -562,7 +564,7 @@ export const BuiltinTools: z.infer<typeof BuiltinToolsSchema> = {
count: matches.length,
tool: 'ripgrep',
};
} catch (rgError) {
} catch {
// Fallback to basic grep if ripgrep not available or failed
const grepArgs = [
'-rn',
@ -997,6 +999,39 @@ export const BuiltinTools: z.infer<typeof BuiltinToolsSchema> = {
},
},
// ============================================================================
// Browser Control
// ============================================================================
'browser-control': {
description: 'Control the embedded browser pane. Read the current page, inspect indexed interactable elements, and navigate/click/type/press keys in the active browser tab.',
inputSchema: BrowserControlInputSchema,
isAvailable: async () => {
try {
container.resolve<IBrowserControlService>('browserControlService');
return true;
} catch {
return false;
}
},
execute: async (input: BrowserControlInput, ctx?: ToolContext) => {
try {
const browserControlService = container.resolve<IBrowserControlService>('browserControlService');
return await browserControlService.execute(input, { signal: ctx?.signal });
} catch (error) {
return {
success: false,
action: input.action,
error: error instanceof Error ? error.message : 'Browser control is unavailable.',
browser: {
activeTabId: null,
tabs: [],
},
};
}
},
},
// ============================================================================
// App Navigation
// ============================================================================

View file

@ -1,4 +1,4 @@
import { asClass, createContainer, InjectionMode } from "awilix";
import { asClass, asValue, createContainer, InjectionMode } from "awilix";
import { FSModelConfigRepo, IModelConfigRepo } from "../models/repo.js";
import { FSMcpConfigRepo, IMcpConfigRepo } from "../mcp/repo.js";
import { FSAgentsRepo, IAgentsRepo } from "../agents/repo.js";
@ -15,6 +15,7 @@ import { IAbortRegistry, InMemoryAbortRegistry } from "../runs/abort-registry.js
import { FSAgentScheduleRepo, IAgentScheduleRepo } from "../agent-schedule/repo.js";
import { FSAgentScheduleStateRepo, IAgentScheduleStateRepo } from "../agent-schedule/state-repo.js";
import { FSSlackConfigRepo, ISlackConfigRepo } from "../slack/repo.js";
import type { IBrowserControlService } from "../application/browser-control/service.js";
const container = createContainer({
injectionMode: InjectionMode.PROXY,
@ -41,4 +42,10 @@ container.register({
slackConfigRepo: asClass<ISlackConfigRepo>(FSSlackConfigRepo).singleton(),
});
export default container;
export default container;
export function registerBrowserControlService(service: IBrowserControlService): void {
container.register({
browserControlService: asValue(service),
});
}

View file

@ -0,0 +1,134 @@
import { z } from 'zod';
export const BrowserTabStateSchema = z.object({
id: z.string(),
url: z.string(),
title: z.string(),
canGoBack: z.boolean(),
canGoForward: z.boolean(),
loading: z.boolean(),
});
export const BrowserStateSchema = z.object({
activeTabId: z.string().nullable(),
tabs: z.array(BrowserTabStateSchema),
});
export const BrowserPageElementSchema = z.object({
index: z.number().int().positive(),
tagName: z.string(),
role: z.string().nullable(),
type: z.string().nullable(),
label: z.string().nullable(),
text: z.string().nullable(),
placeholder: z.string().nullable(),
href: z.string().nullable(),
disabled: z.boolean(),
});
export const BrowserPageSnapshotSchema = z.object({
snapshotId: z.string(),
url: z.string(),
title: z.string(),
loading: z.boolean(),
text: z.string(),
elements: z.array(BrowserPageElementSchema),
});
export const BrowserControlActionSchema = z.enum([
'open',
'get-state',
'new-tab',
'switch-tab',
'close-tab',
'navigate',
'back',
'forward',
'reload',
'read-page',
'click',
'type',
'press',
'scroll',
'wait',
]);
const BrowserElementTargetFields = {
index: z.number().int().positive().optional(),
selector: z.string().min(1).optional(),
snapshotId: z.string().optional(),
} as const;
export const BrowserControlInputSchema = z.object({
action: BrowserControlActionSchema,
target: z.string().min(1).optional(),
tabId: z.string().min(1).optional(),
text: z.string().optional(),
key: z.string().min(1).optional(),
direction: z.enum(['up', 'down']).optional(),
amount: z.number().int().positive().max(5000).optional(),
ms: z.number().int().positive().max(30000).optional(),
maxElements: z.number().int().positive().max(100).optional(),
maxTextLength: z.number().int().positive().max(20000).optional(),
...BrowserElementTargetFields,
}).strict().superRefine((value, ctx) => {
const needsElementTarget = value.action === 'click' || value.action === 'type';
const hasElementTarget = value.index !== undefined || value.selector !== undefined;
if ((value.action === 'switch-tab' || value.action === 'close-tab') && !value.tabId) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ['tabId'],
message: 'tabId is required for this action.',
});
}
if ((value.action === 'navigate') && !value.target) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ['target'],
message: 'target is required for navigate.',
});
}
if (value.action === 'type' && value.text === undefined) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ['text'],
message: 'text is required for type.',
});
}
if (value.action === 'press' && !value.key) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ['key'],
message: 'key is required for press.',
});
}
if (needsElementTarget && !hasElementTarget) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ['index'],
message: 'Provide an element index or selector.',
});
}
});
export const BrowserControlResultSchema = z.object({
success: z.boolean(),
action: BrowserControlActionSchema,
message: z.string().optional(),
error: z.string().optional(),
browser: BrowserStateSchema,
page: BrowserPageSnapshotSchema.optional(),
});
export type BrowserTabState = z.infer<typeof BrowserTabStateSchema>;
export type BrowserState = z.infer<typeof BrowserStateSchema>;
export type BrowserPageElement = z.infer<typeof BrowserPageElementSchema>;
export type BrowserPageSnapshot = z.infer<typeof BrowserPageSnapshotSchema>;
export type BrowserControlAction = z.infer<typeof BrowserControlActionSchema>;
export type BrowserControlInput = z.infer<typeof BrowserControlInputSchema>;
export type BrowserControlResult = z.infer<typeof BrowserControlResultSchema>;

View file

@ -12,4 +12,5 @@ export * as blocks from './blocks.js';
export * as trackBlock from './track-block.js';
export * as frontmatter from './frontmatter.js';
export * as bases from './bases.js';
export * as browserControl from './browser-control.js';
export { PrefixLogger };

View file

@ -10,6 +10,7 @@ import { TrackEvent } from './track-block.js';
import { UserMessageContent } from './message.js';
import { RowboatApiConfig } from './rowboat-account.js';
import { ZListToolkitsResponse } from './composio.js';
import { BrowserStateSchema } from './browser-control.js';
// ============================================================================
// Runtime Validation Schemas (Single Source of Truth)
@ -626,6 +627,87 @@ const ipcSchemas = {
error: z.string().optional(),
}),
},
// Embedded browser (WebContentsView) channels
'browser:setBounds': {
req: z.object({
x: z.number().int(),
y: z.number().int(),
width: z.number().int().nonnegative(),
height: z.number().int().nonnegative(),
}),
res: z.object({ ok: z.literal(true) }),
},
'browser:setVisible': {
req: z.object({ visible: z.boolean() }),
res: z.object({ ok: z.literal(true) }),
},
'browser:newTab': {
req: z.object({
url: z.string().min(1).refine(
(u) => {
const lower = u.trim().toLowerCase();
if (lower.startsWith('javascript:')) return false;
if (lower.startsWith('file://')) return false;
if (lower.startsWith('chrome://')) return false;
if (lower.startsWith('chrome-extension://')) return false;
return true;
},
{ message: 'Unsafe URL scheme' },
).optional(),
}),
res: z.object({
ok: z.boolean(),
tabId: z.string().optional(),
error: z.string().optional(),
}),
},
'browser:switchTab': {
req: z.object({ tabId: z.string().min(1) }),
res: z.object({ ok: z.boolean() }),
},
'browser:closeTab': {
req: z.object({ tabId: z.string().min(1) }),
res: z.object({ ok: z.boolean() }),
},
'browser:navigate': {
req: z.object({
url: z.string().min(1).refine(
(u) => {
const lower = u.trim().toLowerCase();
if (lower.startsWith('javascript:')) return false;
if (lower.startsWith('file://')) return false;
if (lower.startsWith('chrome://')) return false;
if (lower.startsWith('chrome-extension://')) return false;
return true;
},
{ message: 'Unsafe URL scheme' },
),
}),
res: z.object({
ok: z.boolean(),
error: z.string().optional(),
}),
},
'browser:back': {
req: z.null(),
res: z.object({ ok: z.boolean() }),
},
'browser:forward': {
req: z.null(),
res: z.object({ ok: z.boolean() }),
},
'browser:reload': {
req: z.null(),
res: z.object({ ok: z.literal(true) }),
},
'browser:getState': {
req: z.null(),
res: BrowserStateSchema,
},
'browser:didUpdateState': {
req: BrowserStateSchema,
res: z.null(),
},
// Billing channels
'billing:getInfo': {
req: z.null(),