--- title: "Browser Automation — Agent Skill for Codex & OpenClaw" description: "Use when the user asks to automate browser tasks, scrape websites, fill forms, capture screenshots, extract structured data from web pages, or build. Agent skill for Claude Code, Codex CLI, Gemini CLI, OpenClaw." --- # Browser Automation
:material-rocket-launch: Engineering - POWERFUL :material-identifier: `browser-automation` :material-github: Source
Install: `claude /plugin install engineering-advanced-skills`
## Overview

The Browser Automation skill provides comprehensive tools and knowledge for building production-grade web automation workflows using Playwright. This skill covers data extraction, form filling, screenshot capture, session management, and anti-detection patterns for reliable browser automation at scale.

**When to use this skill:**

- Scraping structured data from websites (tables, listings, search results)
- Automating multi-step browser workflows (login, fill forms, download files)
- Capturing screenshots or PDFs of web pages
- Extracting data from SPAs and JavaScript-heavy sites
- Building repeatable browser-based data pipelines

**When NOT to use this skill:**

- Writing browser tests or E2E test suites — use **playwright-pro** instead
- Testing API endpoints — use **api-test-suite-builder** instead
- Load testing or performance benchmarking — use **performance-profiler** instead

**Why Playwright over Selenium or Puppeteer:**

- **Auto-wait built in** — no explicit `sleep()` or `waitForElement()` needed for most actions
- **Multi-browser from one API** — Chromium, Firefox, WebKit with zero config changes
- **Network interception** — block ads, mock responses, capture API calls natively
- **Browser contexts** — isolated sessions without spinning up new browser instances
- **Codegen** — `playwright codegen` records your actions and generates scripts
- **Async-first** — Python async/await for high-throughput scraping

## Core Competencies

### 1. Web Scraping Patterns

#### DOM Extraction with CSS Selectors

CSS selectors are the primary tool for element targeting. Prefer them over XPath for readability and performance.

**Selector priority (most to least reliable):**

1. `data-testid`, `data-id`, or custom data attributes — stable across redesigns
2. `#id` selectors — unique but may change between deploys
3. Semantic selectors: `article`, `nav`, `main`, `section` — resilient to CSS changes
4. Class-based: `.product-card`, `.price` — brittle if classes are generated (e.g., CSS modules)
5. Positional: `nth-child()`, `nth-of-type()` — last resort, breaks on layout changes

**Compound selectors for precision:**

```python
# Product cards within a specific container
page.query_selector_all("div.search-results > article.product-card")

# Price inside a product card (scoped)
card.query_selector("span[data-field='price']")

# Links with specific text content
page.locator("a", has_text="Next Page")
```

#### XPath for Complex Traversal

Use XPath only when CSS cannot express the relationship:

```python
# Find element by text content (XPath strength)
page.locator("//td[contains(text(), 'Total')]/following-sibling::td[1]")

# Navigate up the DOM tree
page.locator("//span[@class='price']/ancestor::div[@class='product']")
```

#### Pagination Patterns

- **Next-button pagination**: Click "Next" until disabled or absent
- **URL-based pagination**: Increment `?page=N` or `&offset=N` in the URL
- **Infinite scroll**: Scroll to bottom, wait for new content, repeat until no change
- **Load-more button**: Click the button, wait for DOM mutation, repeat

#### Infinite Scroll Handling

```python
async def scroll_to_bottom(page, max_scrolls=50, pause_ms=1500):
    """Scroll until the page height stops growing; return scrolls performed."""
    previous_height = 0
    for i in range(max_scrolls):
        current_height = await page.evaluate("document.body.scrollHeight")
        if current_height == previous_height:
            return i  # no new content loaded since the last scroll
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(pause_ms)
        previous_height = current_height
    return max_scrolls
```

### 2. Form Filling & Multi-Step Workflows

#### Login Flows

```python
async def login(page, url, username, password):
    await page.goto(url)
    await page.fill("input[name='username']", username)
    await page.fill("input[name='password']", password)
    await page.click("button[type='submit']")
    # Wait for navigation to complete (post-login redirect)
    await page.wait_for_url("**/dashboard**")
```

#### Multi-Page Forms

Break multi-step forms into discrete functions, one per step. Each function:

1. Fills the fields for that step
2. Clicks the "Next" or "Continue" button
3. Waits for the next step to load (URL change or DOM element)

```python
async def fill_step_1(page, data):
    await page.fill("#first-name", data["first_name"])
    await page.fill("#last-name", data["last_name"])
    await page.select_option("#country", data["country"])
    await page.click("button:has-text('Continue')")
    await page.wait_for_selector("#step-2-form")

async def fill_step_2(page, data):
    await page.fill("#address", data["address"])
    await page.fill("#city", data["city"])
    await page.click("button:has-text('Continue')")
    await page.wait_for_selector("#step-3-form")
```

#### File Uploads

```python
# Single file
await page.set_input_files("input[type='file']", "/path/to/file.pdf")

# Multiple files
await page.set_input_files("input[type='file']", [
    "/path/to/file1.pdf",
    "/path/to/file2.pdf",
])

# Drag-and-drop upload zones (no visible input element)
async with page.expect_file_chooser() as fc_info:
    await page.click("div.upload-zone")
file_chooser = await fc_info.value
await file_chooser.set_files("/path/to/file.pdf")
```

#### Dropdown and Select Handling

```python
# Native <select> elements
await page.select_option("select#country", "US")            # by option value
await page.select_option("select#country", label="Canada")  # by visible label
```
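The URL-based pagination pattern described earlier needs no clicking at all: precompute the page URLs up front and `goto` each one in turn. A minimal sketch of such a URL builder, using only the standard library (the function name and parameters are illustrative, not part of this skill's API):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def paginated_urls(base_url, pages, param="page", start=1):
    """Yield base_url with the page parameter set to start, start+1, ..."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    for n in range(start, start + pages):
        query[param] = [str(n)]
        yield urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# Each yielded URL can then be fetched with `await page.goto(url)` in a loop,
# extracting rows from every page before moving to the next.
```

Because the query string is rebuilt rather than string-concatenated, existing parameters (search terms, filters) survive, and `param="offset"` covers `&offset=N`-style sites as well.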