Browser-Use Tutorial: AI-Powered Browser Automation with LLM Agents
Introduction
Browser-Use is an open-source Python library that enables Large Language Models (LLMs) to autonomously control web browsers. With Browser-Use, you can build AI agents capable of navigating web pages, filling forms, extracting data, and executing complex browser tasks just like a human would.
This library connects LLM reasoning to real-world interaction through the browser. Unlike traditional web scraping, which depends on fragile CSS selectors or XPath expressions, Browser-Use uses the vision and reasoning capabilities of LLMs to interpret web pages both visually and semantically.
Popular use cases for Browser-Use include:
- Web Research Agent: Automatically search and gather information from multiple sources
- Form Automation: Fill web forms automatically
- Testing Agent: Perform automated UI testing
- Data Extraction: Extract structured data from web pages
- Workflow Automation: Automate multi-step workflows involving browser interactions
In this tutorial, we'll cover installation, basic usage, advanced techniques, and best practices for building reliable AI browser agents with Browser-Use.
Installation
Prerequisites
Before installing Browser-Use, make sure you have:
- Python 3.11 or newer
- pip or uv as your package manager
- An API key from an LLM provider (OpenAI, Anthropic, or others)
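The Python requirement above can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical and not part of Browser-Use.

```python
import sys

# Minimum version taken from the prerequisites list above
MINIMUM_PYTHON = (3, 11)


def meets_minimum(version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) part of version_info is at least minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum(sys.version_info):
        print("Python version is sufficient")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python 3.11 or newer is required, found {found}")
```

Comparing tuples rather than version strings avoids the classic mistake of treating "3.9" as newer than "3.11".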
Installation with pip
pip install browser-use
Installation with uv (Recommended)
uv pip install browser-use
Install Playwright Browser
Browser-Use uses Playwright as its browser engine. After installation, run:
playwright install chromium
Setup Environment Variables
Create a .env file in your project root:
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
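Before running an agent, it can help to confirm that the key for your chosen provider is actually set. The sketch below assumes the conventional variable names OPENAI_API_KEY and ANTHROPIC_API_KEY, and that the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv). The helper missing_keys is hypothetical and not part of Browser-Use.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Check only the provider you plan to use; OpenAI is shown as an example
    missing = missing_keys(["OPENAI_API_KEY"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("API key found")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.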
Verify Installation
from importlib.metadata import version

import browser_use  # Raises ImportError if the package is not installed

print(f"Browser-Use version: {version('browser-use')}")
Basic Usage
Your First Agent
Here's the simplest example to create a browser agent:
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # An agent pairs a natural-language task with the LLM that drives the browser
    agent = Agent(
        task="Search for today's Bitcoin price on Google and provide the result",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()  # Run the agent until it finishes the task
    print(result)


asyncio.run(main())
The agent will open a browser, navigate to Google, search for the Bitcoin price, and return the result.
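An agent run can take a while, or stall on a difficult page, so it may be worth bounding it with a timeout. The sketch below uses only asyncio. The helper run_with_timeout is hypothetical, and fake_agent_run is a stand-in for agent.run() so the example runs without a browser or an API key.

```python
import asyncio


async def run_with_timeout(coro, seconds):
    """Await coro, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None


async def fake_agent_run(delay, answer):
    """Stand-in for agent.run(): waits `delay` seconds, then returns `answer`."""
    await asyncio.sleep(delay)
    return answer


async def demo():
    # The first call finishes in time; the second exceeds its limit
    fast = await run_with_timeout(fake_agent_run(0.01, "done"), seconds=1)
    slow = await run_with_timeout(fake_agent_run(1, "done"), seconds=0.05)
    return fast, slow


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a real agent, you would pass agent.run() in place of fake_agent_run(...) and pick a limit that suits the task.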
Using Anthropic Claude
Browser-Use supports various LLM providers. Here's an example using Claude:
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic


async def main():
    # Only the llm argument changes when switching providers
    agent = Agent(
        task="Open Wikipedia and search for information about Machine Learning",
        llm=ChatAnthropic(model="claude-sonnet-4-20250514"),
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
Running with Visible Browser
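Whether the browser runs headless by default can depend on the Browser-Use version and your environment. To make sure you can see what the agent is doing, set headless=False explicitly: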
import asyncio

from browser_use import Agent, Browser, BrowserConfig
from langchain_openai import ChatOpenAI


async def main():
    browser = Browser(
        config=BrowserConfig(
            headless=False,  # Browser is visible
        )
    )
    agent = Agent(
        task="Navigate to GitHub and search for the browser-use repository",
        llm=ChatOpenAI(model="gpt-4o"),
        browser=browser,
    )
    result = await agent.run()
    print(result)
    await browser.close()  # Release the browser once the task is done


asyncio.run(main())
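In the example above, browser.close() is skipped if agent.run() raises an exception. One way to guard against that is a try/finally block. The sketch below shows the pattern with stand-in classes: FakeBrowser and FakeAgent are hypothetical and exist only so the example runs without Browser-Use installed.

```python
import asyncio


class FakeBrowser:
    """Stand-in for a browser object with an async close() method."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class FakeAgent:
    """Stand-in for an agent whose run() fails partway through."""

    async def run(self):
        raise RuntimeError("simulated failure during the task")


async def run_and_close(agent, browser):
    """Run the agent and always close the browser afterwards."""
    try:
        return await agent.run()
    finally:
        await browser.close()


async def demo():
    browser = FakeBrowser()
    try:
        await run_and_close(FakeAgent(), browser)
    except RuntimeError as exc:
        print(f"Agent failed: {exc}")
    # True here means the browser was closed despite the failure
    return browser.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The same run_and_close shape should carry over to a real Agent and Browser pair, since it relies only on an awaitable run() and close().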
Extracting Structured Data
Use Pydantic models to get structured output:
import asyncio
from pydantic import BaseModel