Browser Automation

Use the browser tool to navigate, interact with, and extract content from web pages.

Overview

The browser tool gives your agent web browsing capabilities: it can navigate to URLs, interact with page elements, extract content, and take screenshots. It is powered by Playwright, which drives a headless Chromium browser on the host system.

Capabilities

Action	Description
`navigate`	Open a URL in the browser
`click`	Click on an element
`type`	Type text into an input field
`extract`	Extract text content from the page
`screenshot`	Take a screenshot of the page
`scroll`	Scroll the page
`wait`	Wait for an element or condition
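
As a minimal sketch of how a caller might assemble these requests, the hypothetical helper below (`make_browser_call` is not part of ArgentOS) checks the action name against the table above and wraps the parameters in the `tool`/`input` envelope used in this page's examples. Only `url`, `selector`, and `text` appear as parameters on this page, so the helper passes any parameters through unchecked.

```python
import json

# Action names taken from the capabilities table.
BROWSER_ACTIONS = {"navigate", "click", "type", "extract", "screenshot", "scroll", "wait"}


def make_browser_call(action: str, **params) -> dict:
    """Build a browser tool call; hypothetical helper, not an ArgentOS API."""
    if action not in BROWSER_ACTIONS:
        raise ValueError(f"unknown browser action: {action!r}")
    return {"tool": "browser", "input": {"action": action, **params}}


call = make_browser_call("navigate", url="https://news.ycombinator.com")
print(json.dumps(call, indent=2))
```

Rejecting unknown action names early keeps a typo such as `navigat` from reaching the tool at all.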

Usage Examples

Navigating and Extracting

The agent can browse a website and extract relevant information:

{
  "tool": "browser",
  "input": {
    "action": "navigate",
    "url": "https://news.ycombinator.com"
  }
}

Followed by:

{
  "tool": "browser",
  "input": {
    "action": "extract",
    "selector": ".titleline"
  }
}
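
The two calls form a small plan that has to run in order, because `extract` reads whatever page the preceding `navigate` loaded. A sketch of that sequencing follows. `run_steps` and the `send` callback are hypothetical stand-ins for however your agent runtime dispatches tool calls, and the fake dispatcher exists only so the example runs without a browser.

```python
from typing import Callable

# The two calls from the example above, in execution order.
STEPS = [
    {"tool": "browser", "input": {"action": "navigate", "url": "https://news.ycombinator.com"}},
    {"tool": "browser", "input": {"action": "extract", "selector": ".titleline"}},
]


def run_steps(steps: list[dict], send: Callable[[dict], object]) -> list:
    """Dispatch each call in order and collect the results."""
    return [send(step) for step in steps]


def fake_send(call: dict) -> str:
    # Stand-in dispatcher: echoes the action instead of driving a browser.
    return f"ran {call['input']['action']}"


print(run_steps(STEPS, fake_send))
```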

Form Interaction

{
  "tool": "browser",
  "input": {
    "action": "type",
    "selector": "#search-input",
    "text": "ArgentOS documentation"
  }
}

Configuration

Browser defaults live under `agents.defaults.browser`:

{
  "agents": {
    "defaults": {
      "browser": {
        "headless": true,
        "timeout": 30000,
        "viewport": { "width": 1280, "height": 720 }
      }
    }
  }
}
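
One way to think about `agents.defaults.browser` is as a base layer that more specific settings override. The sketch below assumes that layering (this page does not say how overrides are resolved) and assumes `timeout` is in milliseconds. `resolve_browser_config` is a hypothetical helper, not an ArgentOS function.

```python
import copy

# Defaults copied from the configuration example above.
DEFAULTS = {
    "headless": True,
    "timeout": 30000,  # assumed to be milliseconds
    "viewport": {"width": 1280, "height": 720},
}


def resolve_browser_config(overrides: dict) -> dict:
    """Overlay overrides on DEFAULTS, merging nested dicts such as viewport."""
    config = copy.deepcopy(DEFAULTS)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)
        else:
            config[key] = value
    if config["timeout"] <= 0:
        raise ValueError("timeout must be positive")
    return config


print(resolve_browser_config({"timeout": 60000, "viewport": {"width": 1920}}))
```

Merging `viewport` key by key means an override that sets only `width` keeps the default `height`.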

Prerequisites

The browser tool requires Playwright to be installed:

npx playwright install chromium

Security Considerations

The browser runs with the same permissions as the ArgentOS process
Be cautious with agents that have both the browser and exec tools: in combination, content from an untrusted page could influence commands run on the host
Consider restricting allowed domains in production environments
Cookie and session data persists between browser actions within a session
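
This page does not describe a built-in allowlist setting, so the following is only a sketch of a check you might run before issuing a `navigate` call. `ALLOWED_DOMAINS` and `is_allowed` are hypothetical names, and the domains listed are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains your agents need.
ALLOWED_DOMAINS = {"news.ycombinator.com", "example.com"}


def is_allowed(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if url is http(s) and its host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_allowed("https://news.ycombinator.com/item?id=1"))
print(is_allowed("https://example.com.evil.test/"))
```

Matching on the exact host or a dot-prefixed suffix, rather than a plain substring, keeps lookalike hosts such as `example.com.evil.test` from passing the check.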

Limitations

Browser state does not persist across gateway restarts
JavaScript-heavy SPAs may require explicit wait conditions
File downloads through the browser require separate handling
The browser runs headless by default; there is no visual UI to observe
