Playwright Browser API Reference (Automation Focus)
This reference covers Playwright's Python async API for browser automation tasks — NOT testing. For test-specific APIs (assertions, fixtures, test runners), see playwright-pro.
Browser Launch & Context
Launching the Browser
from playwright.async_api import async_playwright
async with async_playwright() as p:
    # Chromium (recommended for most automation)
    browser = await p.chromium.launch(headless=True)
    # Firefox (better for some anti-detection scenarios)
    browser = await p.firefox.launch(headless=True)
    # WebKit (Safari engine, useful for Apple-specific sites)
    browser = await p.webkit.launch(headless=True)
Launch options:
| Option | Type | Default | Purpose |
|---|---|---|---|
| `headless` | bool | True | Run without visible window |
| `slow_mo` | int | 0 | Milliseconds to slow each operation (debugging) |
| `proxy` | dict | None | Proxy server configuration |
| `args` | list | [] | Additional Chromium flags |
| `downloads_path` | str | None | Directory for downloads |
| `channel` | str | None | Browser channel: "chrome", "msedge" |
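The launch options in the table are passed to `launch()` as keyword arguments. A minimal sketch of assembling them in one place; `build_launch_options` and its parameter names are hypothetical, and the `proxy` dict is assumed to use the `server`, `username`, and `password` keys:

```python
from __future__ import annotations

from typing import Any


def build_launch_options(
    debug: bool = False,
    proxy_server: str | None = None,
    proxy_username: str | None = None,
    proxy_password: str | None = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for p.chromium.launch() (hypothetical helper)."""
    options: dict[str, Any] = {
        # Show the window and slow each action down only when debugging.
        "headless": not debug,
        "slow_mo": 250 if debug else 0,
    }
    if proxy_server:
        proxy: dict[str, str] = {"server": proxy_server}
        # Credentials are only attached when both parts are present.
        if proxy_username and proxy_password:
            proxy["username"] = proxy_username
            proxy["password"] = proxy_password
        options["proxy"] = proxy
    return options


# Usage inside an async function:
#     browser = await p.chromium.launch(**build_launch_options(debug=True))
```

Keeping the options in a plain dict makes it easy to log or unit-test the configuration without starting a browser.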
Browser Contexts (Session Isolation)
Browser contexts are isolated environments within a single browser instance. Each context has its own cookies, localStorage, and cache. Use them instead of launching multiple browsers.
# Create isolated context
context = await browser.new_context(
    viewport={"width": 1920, "height": 1080},
    user_agent="Mozilla/5.0 ...",
    locale="en-US",
    timezone_id="America/New_York",
    geolocation={"latitude": 40.7128, "longitude": -74.0060},
    permissions=["geolocation"],
)
# Multiple contexts share one browser (resource efficient)
context_a = await browser.new_context() # User A session
context_b = await browser.new_context() # User B session
Storage State (Session Persistence)
# Save state after login (cookies + localStorage)
await context.storage_state(path="auth_state.json")
# Restore state in new context
context = await browser.new_context(storage_state="auth_state.json")
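Saving and restoring state usually go together: reuse the file when it exists, otherwise start a clean context and log in. A minimal sketch under that assumption; `context_with_saved_state` is a hypothetical helper, and the Playwright types are imported for type hints only:

```python
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for annotations only, not at runtime
    from playwright.async_api import Browser, BrowserContext


async def context_with_saved_state(
    browser: Browser, state_path: str = "auth_state.json"
) -> BrowserContext:
    """Reuse a saved session if the state file exists, else start clean."""
    if Path(state_path).is_file():
        return await browser.new_context(storage_state=state_path)
    # No saved state yet: the caller logs in, then persists the session with
    # `await context.storage_state(path=state_path)`.
    return await browser.new_context()
```

Saved state can expire on the server side, so a real script would likely still check that the restored session is logged in before relying on it.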
Page Navigation
Basic Navigation
page = await context.new_page()
# Navigate with different wait strategies
await page.goto("https://example.com") # Default: "load"
await page.goto("https://example.com", wait_until="domcontentloaded") # Faster
await page.goto("https://example.com", wait_until="networkidle") # Wait for network quiet
await page.goto("https://example.com", timeout=30000) # Custom timeout (ms)
wait_until options:
- "load": wait for the load event (all resources loaded)
- "domcontentloaded": DOM is ready, images/styles may still load
- "networkidle": no network requests for 500ms (best for SPAs)
- "commit": response received, before any rendering
Wait Strategies
# Wait for a specific element to appear
await page.wait_for_selector("div.content", state="visible")
await page.wait_for_selector("div.loading", state="hidden") # Wait for loading to finish
await page.wait_for_selector("table tbody tr", state="attached") # In DOM but maybe not visible
# Wait for URL change (the regex form requires `import re`)
await page.wait_for_url("**/dashboard**")
await page.wait_for_url(re.compile(r"/dashboard/\d+"))
# Wait for specific network response
async with page.expect_response("**/api/data*") as resp_info:
    await page.click("button.load")
response = await resp_info.value
json_data = await response.json()
# Wait for page load state
await page.wait_for_load_state("networkidle")
# Fixed wait (use sparingly — prefer the methods above)
await page.wait_for_timeout(1000) # milliseconds
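When a condition cannot be expressed with the built-in waits, bounded polling is usually safer than a single fixed sleep. A minimal sketch in plain asyncio; `poll_until` is a hypothetical helper, not part of Playwright:

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def poll_until(
    probe: Callable[[], Awaitable[T]],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> T:
    """Call `probe` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = await probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        await asyncio.sleep(interval)


# Usage inside an async function (returns the row count once it is non-zero):
#     rows = await poll_until(lambda: page.locator("table tbody tr").count())
```

The probe is always tried at least once, and the timeout turns a silent hang into an explicit error.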
Navigation History
await page.go_back()
await page.go_forward()
await page.reload()
Element Interaction
Finding Elements
# Single element (returns first match)
element = await page.query_selector("css=div.product")
element = await page.query_selector("xpath=//div[@class='product']")
# Multiple elements
elements = await page.query_selector_all("div.product")
# Locator API (recommended — auto-waits, re-queries on each action)
locator = page.locator("div.product")
count = await locator.count()
first = locator.first
nth = locator.nth(2)
Locator vs query_selector:
- query_selector: returns an ElementHandle at a point in time. Can go stale if the DOM changes.
- locator: returns a Locator that re-queries each time you interact with it. Preferred for reliability.
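Because a locator is resolved when it is used, bulk reads can go through it without holding element handles. A minimal sketch; `product_names` is a hypothetical helper and `div.product` is the example selector from this section:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for annotations only, not at runtime
    from playwright.async_api import Page


async def product_names(page: Page, selector: str = "div.product") -> list[str]:
    """Return the trimmed, non-empty text of every element matching `selector`."""
    locator = page.locator(selector)
    # all_inner_texts() resolves the locator at call time, so it reflects the
    # current DOM rather than a handle captured earlier.
    texts = await locator.all_inner_texts()
    return [text.strip() for text in texts if text.strip()]
```

As far as I know, `all_inner_texts()` reads whatever matches at that moment and does not wait for elements to appear, so it is worth waiting for the list to render before calling it.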
Clicking
await page.click("button.submit")
await page.click("a:has-text('Next')")
await page.dblclick("div.editable")
await page.click("button", position={"x": 10, "y": 10}) # Click at offset
await page.click("button", force=True) # Skip actionability checks
await page.click("button", modifiers=["Shift"]) # With modifier key
Text Input
# Fill (clears existing content first)
await page.fill("input#email", "user@example.com")
# Type (simulates keystroke-by-keystroke input — slower, more realistic)
await page.type("input#search", "query text", delay=50) # 50ms between keys
# Press specific keys
await page.press("input#search", "Enter")
await page.press("body", "Control+a")
Dropdowns & Select
# Native <select> element
await page.select_option("select#country", value="US")
await page.select_option("select#country", label="United States")
await page.select_option("select#tags", value=["tag1", "tag2"]) # Multi-select
# Custom dropdown (non-native)
await page.click("div.dropdown-trigger")
await page.click("li.option:has-text('United States')")
Checkboxes & Radio Buttons
await page.check("input#agree")
await page.uncheck("input#newsletter")
is_checked = await page.is_checked("input#agree")
File Upload
# Standard file input
await page.set_input_files("input[type='file']", "/path/to/file.pdf")
await page.set_input_files("input[type='file']", ["/path/a.pdf", "/path/b.pdf"])
# Clear file selection
await page.set_input_files("input[type='file']", [])
# Non-standard upload (drag-and-drop zones)
async with page.expect_file_chooser() as fc_info:
    await page.click("div.upload-zone")
file_chooser = await fc_info.value
await file_chooser.set_files("/path/to/file.pdf")
Hover & Focus
await page.hover("div.menu-item")
await page.focus("input#search")
Data Extraction
Text Content
# Get text content of an element
text = await page.text_content("h1.title")
inner_text = await page.inner_text("div.description") # Visible text only
inner_html = await page.inner_html("div.content") # HTML markup
# Get attribute
href = await page.get_attribute("a.link", "href")
src = await page.get_attribute("img.photo", "src")
JavaScript Evaluation
# Evaluate in page context
title = await page.evaluate("document.title")
scroll_height = await page.evaluate("document.body.scrollHeight")
# Evaluate on a specific element
text = await page.eval_on_selector("h1", "el => el.textContent")
texts = await page.eval_on_selector_all("li", "els => els.map(e => e.textContent.trim())")
# Complex extraction
data = await page.evaluate("""
    () => {
        const rows = document.querySelectorAll('table tbody tr');
        return Array.from(rows).map(row => {
            const cells = row.querySelectorAll('td');
            return {
                name: cells[0]?.textContent.trim(),
                value: cells[1]?.textContent.trim(),
            };
        });
    }
""")
Screenshots & PDF
# Full page screenshot
await page.screenshot(path="page.png", full_page=True)
# Viewport screenshot
await page.screenshot(path="viewport.png")
# Element screenshot
await page.locator("div.chart").screenshot(path="chart.png")
# PDF (Chromium only)
await page.pdf(path="page.pdf", format="A4", print_background=True)
# Screenshot as bytes (for processing without saving)
buffer = await page.screenshot()
Network Interception
Monitoring Requests
# Listen for all responses
page.on("response", lambda response: print(f"{response.status} {response.url}"))
# Wait for a specific API call
async with page.expect_response("**/api/products*") as resp:
    await page.click("button.load")
response = await resp.value
data = await response.json()
Blocking Resources (Speed Up Scraping)
# Block images, fonts, and CSS to speed up scraping
await page.route("**/*.{png,jpg,jpeg,gif,svg,woff,woff2,ttf}", lambda route: route.abort())
await page.route("**/*.css", lambda route: route.abort())
# Block specific domains (ads, analytics); the leading * also matches subdomains such as www.
await page.route("**/*google-analytics.com/**", lambda route: route.abort())
await page.route("**/*facebook.com/**", lambda route: route.abort())
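Extension-based patterns miss assets served without a file extension. A hedged alternative is to decide per request based on its resource type; `block_heavy_resources` and the chosen set of types are illustrative assumptions:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for annotations only, not at runtime
    from playwright.async_api import Route

# Resource types that rarely matter for data extraction (an assumption;
# adjust for sites whose layout or content depends on CSS).
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font", "stylesheet"})


async def block_heavy_resources(route: Route) -> None:
    """Abort requests for heavy assets and let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        await route.abort()
    else:
        await route.continue_()


# Usage inside an async function:
#     await page.route("**/*", block_heavy_resources)
```

A single catch-all route keeps the blocking policy in one place, at the cost of running the handler for every request.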
Modifying Requests
# Add custom headers
await page.route("**/*", lambda route: route.continue_(headers={
    **route.request.headers,
    "X-Custom-Header": "value"
}))
# Mock API responses (requires `import json`)
await page.route("**/api/data", lambda route: route.fulfill(
    status=200,
    content_type="application/json",
    body=json.dumps({"items": []}),
))
Dialog Handling
# Auto-accept all dialogs
page.on("dialog", lambda dialog: dialog.accept())
# Handle specific dialog types
async def handle_dialog(dialog):
    if dialog.type == "confirm":
        await dialog.accept()
    elif dialog.type == "prompt":
        await dialog.accept("my input")
    elif dialog.type == "alert":
        await dialog.dismiss()
page.on("dialog", handle_dialog)
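In scraping runs it can help to keep a record of what each dialog said, since the message is gone once the dialog is handled. A minimal sketch; `make_dialog_recorder` is a hypothetical factory that builds a handler for `page.on("dialog", ...)`:

```python
from __future__ import annotations

from typing import Any, Awaitable, Callable


def make_dialog_recorder(
    messages: list[str], prompt_text: str = ""
) -> Callable[[Any], Awaitable[None]]:
    """Build a dialog handler that logs each dialog and then accepts it."""

    async def handle(dialog: Any) -> None:
        # Record the type and message before the dialog is closed.
        messages.append(f"{dialog.type}: {dialog.message}")
        if dialog.type == "prompt":
            await dialog.accept(prompt_text)
        else:
            await dialog.accept()

    return handle


# Usage inside an async function:
#     seen: list[str] = []
#     page.on("dialog", make_dialog_recorder(seen, prompt_text="my input"))
```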
File Downloads
# Wait for download to start
async with page.expect_download() as dl_info:
    await page.click("a.download-link")
download = await dl_info.value
# Save to specific path
await download.save_as("/path/to/downloads/" + download.suggested_filename)
# Get the path of the temporary file the download was saved to
path = await download.path()
# Set download behavior at context level
context = await browser.new_context(accept_downloads=True)
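The suggested filename comes from the remote server, so it is worth reducing it to a bare file name before joining it to a local directory. A minimal sketch; `save_download` and the `download.bin` fallback name are assumptions for illustration:

```python
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for annotations only, not at runtime
    from playwright.async_api import Download


async def save_download(download: Download, target_dir: str | Path) -> Path:
    """Save a finished download under `target_dir` and return its final path."""
    directory = Path(target_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Keep only the last path component so a name like "../../x" cannot
    # escape the target directory; fall back to a fixed name if it is empty.
    safe_name = Path(download.suggested_filename).name or "download.bin"
    destination = directory / safe_name
    await download.save_as(destination)
    return destination
```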
Frames & Iframes
# Access iframe by selector
frame = page.frame_locator("iframe#content")
await frame.locator("button.submit").click()
# Access frame by name
frame = page.frame(name="editor")
# Access all frames
for frame in page.frames:
    print(frame.url)
Cookie Management
# Get all cookies
cookies = await context.cookies()
# Get cookies for specific URL
cookies = await context.cookies(["https://example.com"])
# Add cookies
await context.add_cookies([{
    "name": "session",
    "value": "abc123",
    "domain": "example.com",
    "path": "/",
    "httpOnly": True,
    "secure": True,
}])
# Clear cookies
await context.clear_cookies()
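The cookie dicts returned by `context.cookies()` can be handed to a plain HTTP client once the browser has done the login. A minimal sketch that builds a `Cookie` header for one host; `cookie_header` is a hypothetical helper, and it deliberately simplifies matching (it ignores path, expiry, and the secure flag, and treats every cookie domain as covering its subdomains):

```python
from __future__ import annotations

from typing import Any


def cookie_header(cookies: list[dict[str, Any]], host: str) -> str:
    """Join the cookies that apply to `host` into a Cookie header value."""
    pairs = []
    for cookie in cookies:
        cookie_domain = cookie.get("domain", "").lstrip(".")
        # Match the exact host or any subdomain of the cookie's domain.
        if host == cookie_domain or host.endswith("." + cookie_domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Usage inside an async function:
#     header = cookie_header(await context.cookies(), "www.example.com")
#     headers = {"Cookie": header}  # pass to an HTTP client of your choice
```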
Concurrency Patterns
Multiple Pages in One Context
# Open multiple tabs in the same session
pages = []
for url in urls:
    page = await context.new_page()
    await page.goto(url)
    pages.append(page)
# Process all pages
for page in pages:
    data = await extract_data(page)
    await page.close()
Multiple Contexts for Parallel Sessions
import asyncio
import random
# USER_AGENTS (a list of user-agent strings) and extract_data() are assumed to be defined elsewhere
async def scrape_with_context(browser, url):
    context = await browser.new_context(user_agent=random.choice(USER_AGENTS))
    try:
        page = await context.new_page()
        await page.goto(url)
        return await extract_data(page)
    finally:
        # Close the context even if navigation or extraction fails
        await context.close()
# Run 5 concurrent scraping tasks
tasks = [scrape_with_context(browser, url) for url in urls[:5]]
results = await asyncio.gather(*tasks)
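Slicing the URL list caps how many tasks start, but it also drops the rest of the list. A semaphore is one way to process every URL while bounding how many contexts are open at once. A minimal sketch; `scrape_all` is a hypothetical helper, and `extract` stands in for whatever extraction coroutine the script defines:

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def scrape_all(
    browser: Any,
    urls: Iterable[str],
    extract: Callable[[Any], Awaitable[Any]],
    max_concurrency: int = 5,
) -> list[Any]:
    """Scrape every URL with at most `max_concurrency` contexts open at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url: str) -> Any:
        async with semaphore:
            context = await browser.new_context()
            try:
                page = await context.new_page()
                await page.goto(url)
                return await extract(page)
            finally:
                # Always release the context, even when extraction fails.
                await context.close()

    # return_exceptions=True keeps one failing URL from cancelling the rest;
    # failures appear as exception objects in the result list, in URL order.
    return await asyncio.gather(
        *(scrape_one(url) for url in urls), return_exceptions=True
    )
```

Callers need to check each result with `isinstance(result, Exception)` before using it.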
Init Scripts (Stealth)
Init scripts registered on a context run in every page and frame of that context, after the document is created but before any of the page's own scripts.
# Remove webdriver flag
await context.add_init_script("""
    Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
""")
# Override plugins (headless Chrome has empty plugins)
await context.add_init_script("""
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5],
    });
""")
# Override languages
await context.add_init_script("""
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en'],
    });
""")
# From file
await context.add_init_script(path="stealth.js")
Common Automation Patterns
Scrolling
# Scroll to bottom
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
# Scroll element into view
await page.locator("div.target").scroll_into_view_if_needed()
# Smooth scroll simulation
await page.evaluate("""
    async () => {
        const delay = ms => new Promise(r => setTimeout(r, ms));
        for (let i = 0; i < document.body.scrollHeight; i += 300) {
            window.scrollTo(0, i);
            await delay(100);
        }
    }
""")
Clipboard Operations
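# Copy text (may require a secure HTTPS page and the "clipboard-read"/"clipboard-write" permissions granted on the context)
await page.evaluate("navigator.clipboard.writeText('hello')")
# Paste via keyboard (use "Meta+v" on macOS)
await page.keyboard.press("Control+v")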
Shadow DOM
# CSS and text selectors pierce open shadow roots by default (XPath does not)
await page.locator("my-component .inner-button").click()
# The >> operator chains selectors; each part is resolved relative to the previous match
await page.locator("css=host-element >> css=.shadow-child").click()