Browser Automation
The browser tool: navigate, interact with, and extract content from web pages.
Overview
The browser tool gives your agent web browsing capabilities: it can navigate to URLs, interact with page elements, extract content, and take screenshots. It is powered by a headless browser, driven by Playwright, running on the host system.
Capabilities
| Action | Description |
|---|---|
| navigate | Open a URL in the browser |
| click | Click on an element |
| type | Type text into an input field |
| extract | Extract text content from the page |
| screenshot | Take a screenshot of the page |
| scroll | Scroll the page |
| wait | Wait for an element or condition |
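As a minimal sketch of how these actions map onto tool calls, the helper below builds the `{"tool": ..., "input": ...}` envelope used in the usage examples and rejects action names that are not in the table. The function name `browser_call` is hypothetical, and the parameters each action accepts beyond the documented `url`, `selector`, and `text` are not specified in this page, so the helper passes them through unchecked.

```python
import json

# Action names taken from the capabilities table.
ACTIONS = {"navigate", "click", "type", "extract", "screenshot", "scroll", "wait"}


def browser_call(action: str, **params) -> dict:
    """Build a browser tool call; reject actions not listed in the table.

    Hypothetical helper: it only assembles the JSON envelope and does not
    validate per-action parameters, which this page does not fully document.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown browser action: {action!r}")
    return {"tool": "browser", "input": {"action": action, **params}}


if __name__ == "__main__":
    call = browser_call("navigate", url="https://news.ycombinator.com")
    print(json.dumps(call, indent=2))
```

Catching a misspelled action name before the call is sent is cheaper than waiting for the tool to reject it.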
Usage Examples
Navigating and Extracting
The agent can browse a website and extract relevant information:
{
  "tool": "browser",
  "input": {
    "action": "navigate",
    "url": "https://news.ycombinator.com"
  }
}
Followed by:
{
  "tool": "browser",
  "input": {
    "action": "extract",
    "selector": ".titleline"
  }
}
Form Interaction
{
  "tool": "browser",
  "input": {
    "action": "type",
    "selector": "#search-input",
    "text": "ArgentOS documentation"
  }
}
Configuration
{
  "agents": {
    "defaults": {
      "browser": {
        "headless": true,
        "timeout": 30000,
        "viewport": { "width": 1280, "height": 720 }
      }
    }
  }
}
Prerequisites
The browser tool requires Playwright to be installed:
npx playwright install chromium
Security Considerations
- The browser runs with the same permissions as the ArgentOS process
- Be cautious with agents that have both the browser and exec tools: content fetched from an untrusted page could influence commands that are then run on the host
- Consider restricting allowed domains in production environments
- Cookies and session data persist between browser actions within a session
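One way to approach the domain restriction suggested above is to check each URL against an allowlist before a `navigate` call is forwarded to the browser. The sketch below is an assumption about how such a check could look. This page does not say whether ArgentOS has a built-in allowlist setting, and both `ALLOWED_DOMAINS` and `is_allowed` are hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the domains your agents actually need.
ALLOWED_DOMAINS = frozenset({"news.ycombinator.com", "example.com"})


def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if url is http(s) and its host is an allowed domain
    or a subdomain of one."""
    parts = urlsplit(url)
    # Reject file:, data:, javascript: and other non-web schemes outright.
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Match the exact domain or a dot-separated subdomain, so that
    # "example.com.evil.test" and "notexample.com" do not slip through.
    return any(host == d or host.endswith("." + d) for d in allowed)


if __name__ == "__main__":
    for candidate in ("https://news.ycombinator.com/newest", "file:///etc/passwd"):
        print(candidate, is_allowed(candidate))
```

Matching on the parsed hostname rather than on a substring of the URL avoids lookalike hosts that merely contain an allowed name.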
Limitations
- Browser state does not persist across gateway restarts
- JavaScript-heavy SPAs may need an explicit wait action before their content can be extracted
- File downloads through the browser require separate handling
- The browser runs headless by default; there is no visual UI to observe
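For the SPA limitation above, a common pattern is to put a `wait` call between `navigate` and `extract` so that client-rendered content has time to appear. The sketch below only assembles that sequence of tool calls. It assumes the `wait` action accepts a `selector` field, which this page does not document, and the function name and example URL are hypothetical.

```python
import json


def spa_extract_plan(url: str, selector: str) -> list:
    """Return navigate -> wait -> extract calls for a client-rendered page."""

    def call(action, **params):
        return {"tool": "browser", "input": {"action": action, **params}}

    return [
        call("navigate", url=url),
        # ASSUMPTION: "wait" takes a "selector" field; its parameters are
        # not documented on this page.
        call("wait", selector=selector),
        call("extract", selector=selector),
    ]


if __name__ == "__main__":
    # Hypothetical URL and selector, for illustration only.
    for step in spa_extract_plan("https://example.com/app", ".result-row"):
        print(json.dumps(step))
```

Waiting on the same selector that is later extracted keeps the two steps in sync if the selector changes.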