# Anti-Detection Patterns for Browser Automation
This reference covers techniques to make Playwright automation less detectable by anti-bot services. These are defense-in-depth measures — no single technique is sufficient, but combining them significantly reduces detection risk.
## Detection Vectors
Anti-bot systems detect automation through multiple signals. Understanding what they check helps you counter effectively.
### Tier 1: Trivial Detection (Every Site Checks These)
- `navigator.webdriver` — Set to `true` by default when the browser is driven by Playwright, Puppeteer, or Selenium
- User-Agent string — Default headless UA contains "HeadlessChrome"
- WebGL renderer — Headless Chrome reports "SwiftShader" or "Google SwiftShader"
### Tier 2: Common Detection (Most Anti-Bot Services)
- Viewport/screen dimensions — Unusual sizes flag automation
- Plugins array — Empty in headless mode, populated in real browsers
- Languages — Missing or mismatched locale
- Request timing — Machine-speed interactions
- Mouse movement — No mouse events between clicks
### Tier 3: Advanced Detection (Cloudflare, DataDome, PerimeterX)
- Canvas fingerprint — Headless renders differently
- WebGL fingerprint — GPU-specific rendering variations
- Audio fingerprint — AudioContext processing differences
- Font enumeration — Different available fonts in headless
- Behavioral analysis — Scroll patterns, click patterns, reading time
## Stealth Techniques
### 1. WebDriver Flag Removal
The most critical fix: `navigator.webdriver` is the cheapest signal to check, so nearly every anti-bot script looks at it first.
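```python
await page.add_init_script("""
    // Report webdriver as false, the value a normal (non-automated) Chrome returns;
    // `undefined` does not match a real browser either
    Object.defineProperty(navigator, 'webdriver', {
        get: () => false,
    });

    // Remove Playwright-specific properties
    delete window.__playwright;
    delete window.__pw_manual;
""")
```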
### 2. User Agent Configuration
Match the user agent to the browser you are launching. A Chrome UA with Firefox-specific headers is a red flag.
```python
# Chrome 120 on Windows 10 (most common configuration globally)
CHROME_WIN = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Chrome 120 on macOS
CHROME_MAC = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Chrome 120 on Linux
CHROME_LINUX = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Firefox 121 on Windows
FIREFOX_WIN = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
```
Rules:
- Update UAs every 2-3 months as browser versions increment
- Match the UA platform to the `navigator.platform` override
- If using Chromium, use Chrome UAs. If Firefox, use Firefox UAs.
- Never use obviously fake or ancient UAs
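One way to keep the UA and the `navigator.platform` override from drifting apart is to derive the platform from the UA string rather than maintaining the two by hand. The sketch below assumes only the four desktop UA families listed above; `platform_for_ua` and `PLATFORM_BY_UA_TOKEN` are hypothetical names, not part of Playwright.

```python
# Hypothetical mapping from a UA's OS token to the navigator.platform value
# that a real desktop browser on that OS reports.
PLATFORM_BY_UA_TOKEN = [
    ("Windows NT", "Win32"),          # 64-bit Windows still reports "Win32"
    ("Macintosh", "MacIntel"),        # Intel and Apple Silicon Macs both report "MacIntel"
    ("X11; Linux x86_64", "Linux x86_64"),
]

CHROME_WIN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
CHROME_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
CHROME_LINUX = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
FIREFOX_WIN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
    "Gecko/20100101 Firefox/121.0"
)


def platform_for_ua(user_agent: str) -> str:
    """Return the navigator.platform value consistent with user_agent."""
    for token, platform in PLATFORM_BY_UA_TOKEN:
        if token in user_agent:
            return platform
    raise ValueError(f"No platform mapping for UA: {user_agent!r}")


for ua in (CHROME_WIN, CHROME_MAC, CHROME_LINUX, FIREFOX_WIN):
    print(platform_for_ua(ua))
```

The UA itself goes to `browser.new_context(user_agent=...)`, and the returned platform string can be substituted into the `navigator.platform` override of the init script, so both come from a single source.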
### 3. Viewport and Screen Properties
Common real-world screen resolutions (from analytics data):
| Resolution | Market Share | Use For |
|---|---|---|
| 1920x1080 | ~23% | Default choice |
| 1366x768 | ~14% | Laptop simulation |
| 1536x864 | ~9% | Scaled laptop |
| 1440x900 | ~7% | MacBook |
| 2560x1440 | ~5% | High-end desktop |
```python
import random

VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
    {"width": 1536, "height": 864},
    {"width": 1440, "height": 900},
]

viewport = random.choice(VIEWPORTS)
context = await browser.new_context(
    viewport=viewport,
    screen=viewport,  # screen should match viewport
)
```
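`random.choice` picks every entry with equal probability, so 1440x900 shows up as often as 1920x1080 even though the table says it is about three times rarer. A sketch that instead weights each resolution by the approximate shares from the table (the percentages are the table's rough figures, and `pick_viewport` is a hypothetical helper):

```python
import random
from typing import Optional

# (width, height, approximate market share in percent), from the table above
RESOLUTIONS = [
    (1920, 1080, 23),
    (1366, 768, 14),
    (1536, 864, 9),
    (1440, 900, 7),
    (2560, 1440, 5),
]


def pick_viewport(rng: Optional[random.Random] = None) -> dict:
    """Pick a resolution with probability proportional to its share."""
    rng = rng if rng is not None else random.Random()
    # random.choices accepts relative weights; they need not sum to 100
    weights = [share for _, _, share in RESOLUTIONS]
    width, height, _ = rng.choices(RESOLUTIONS, weights=weights, k=1)[0]
    return {"width": width, "height": height}


print(pick_viewport(random.Random(0)))
```

The result plugs into the same call as before, e.g. `vp = pick_viewport()` followed by `browser.new_context(viewport=vp, screen=vp)`. Passing a seeded `random.Random` makes runs reproducible for debugging.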
### 4. Navigator Properties Hardening
```python
STEALTH_INIT = """
    // Plugins (headless Chrome has 0 plugins, real Chrome has 3-5).
    // This is a plain array, not a real PluginArray, so an
    // `instanceof PluginArray` check will still detect the override.
    const fakePlugins = [
        { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
        { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
        { name: 'Native Client', filename: 'internal-nacl-plugin' },
    ];
    Object.defineProperty(navigator, 'plugins', {
        get: () => fakePlugins,  // same object on every access, like the real property
    });

    // Languages (keep in sync with new_context(locale=...) and Accept-Language)
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en'],
    });

    // Platform (match to user agent)
    Object.defineProperty(navigator, 'platform', {
        get: () => 'Win32',  // or 'MacIntel' for macOS UA
    });

    // Hardware concurrency (real browsers report CPU cores)
    Object.defineProperty(navigator, 'hardwareConcurrency', {
        get: () => 8,
    });

    // Device memory (Chrome-specific)
    Object.defineProperty(navigator, 'deviceMemory', {
        get: () => 8,
    });

    // Connection info
    Object.defineProperty(navigator, 'connection', {
        get: () => ({
            effectiveType: '4g',
            rtt: 50,
            downlink: 10,
            saveData: false,
        }),
    });
"""

await context.add_init_script(STEALTH_INIT)
```
### 5. WebGL Fingerprint Evasion
Headless Chrome uses SwiftShader for WebGL, which anti-bot services detect.
```python
# Option A: Launch with a real GPU (headed mode on a machine with GPU)
browser = await p.chromium.launch(headless=False)

# Option B: Override WebGL renderer info.
# 37445 / 37446 are UNMASKED_VENDOR_WEBGL / UNMASKED_RENDERER_WEBGL from the
# WEBGL_debug_renderer_info extension. WebGL2RenderingContext has its own
# getParameter and is not covered by this patch.
await page.add_init_script("""
    const getParameter = WebGLRenderingContext.prototype.getParameter;
    WebGLRenderingContext.prototype.getParameter = function(parameter) {
        if (parameter === 37445) {
            return 'Intel Inc.';  // UNMASKED_VENDOR_WEBGL
        }
        if (parameter === 37446) {
            return 'Intel(R) Iris(TM) Plus Graphics 640';  // UNMASKED_RENDERER_WEBGL
        }
        return getParameter.call(this, parameter);
    };
""")
```
### 6. Canvas Fingerprint Noise
Anti-bot services render text/shapes to a canvas and hash the output. Headless Chrome produces a different hash.
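```python
await page.add_init_script("""
    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type, ...rest) {
        if ((type === 'image/png' || type === undefined) && this.width > 0 && this.height > 0) {
            // Add noise to a copy so the page's own canvas is never modified
            // (XOR-ing the original in place would undo itself on every second call)
            const copy = document.createElement('canvas');
            copy.width = this.width;
            copy.height = this.height;
            const ctx = copy.getContext('2d');
            ctx.drawImage(this, 0, 0);
            const imageData = ctx.getImageData(0, 0, copy.width, copy.height);
            for (let i = 0; i < imageData.data.length; i += 4) {
                // Flip the lowest bit of the red channel (+/- 1, imperceptible)
                imageData.data[i] ^= 1;
            }
            ctx.putImageData(imageData, 0, 0);
            return originalToDataURL.call(copy, type, ...rest);
        }
        return originalToDataURL.call(this, type, ...rest);
    };
""")
```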
## Request Throttling Patterns
### Human-Like Delays
Real users do not click at machine speed. Add realistic delays between actions.
```python
import asyncio
import random


async def human_delay(action_type="browse"):
    """Add realistic delay based on action type."""
    delays = {
        "browse": (1.0, 3.0),  # Browsing between pages
        "read": (2.0, 8.0),    # Reading content
        "fill": (0.3, 0.8),    # Between form fields
        "click": (0.1, 0.5),   # Before clicking
        "scroll": (0.5, 1.5),  # Between scroll actions
    }
    min_s, max_s = delays.get(action_type, (0.5, 2.0))
    await asyncio.sleep(random.uniform(min_s, max_s))
```
### Request Rate Limiting
```python
import asyncio
import time


class RateLimiter:
    """Enforce a minimum delay between requests."""

    def __init__(self, min_interval_seconds=1.0):
        self.min_interval = min_interval_seconds
        self.last_request_time = None  # no request made yet

    async def wait(self):
        # time.monotonic() is unaffected by system clock changes, unlike time.time()
        if self.last_request_time is not None:
            elapsed = time.monotonic() - self.last_request_time
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
        self.last_request_time = time.monotonic()


# Usage (inside an async function)
limiter = RateLimiter(min_interval_seconds=2.0)
for url in urls:
    await limiter.wait()
    await page.goto(url)
```
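`RateLimiter` assumes a single sequential loop. If several tasks share one instance (for example, pages fetched through `asyncio.gather`), they can all read the same `last_request_time` and fire at once. A sketch of a variant that reserves each time slot under an `asyncio.Lock`; `LockedRateLimiter` and `run_concurrently` are hypothetical names, and the timestamp append stands in for `await page.goto(url)`.

```python
import asyncio
import time


class LockedRateLimiter:
    """Minimum-interval limiter that stays correct when shared by many tasks.

    Each caller reserves the next free slot while holding the lock, then
    sleeps outside the lock until that slot arrives.
    """

    def __init__(self, min_interval_seconds: float = 1.0):
        self.min_interval = min_interval_seconds
        self._next_allowed = float("-inf")  # earliest monotonic time for the next request
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            start = max(now, self._next_allowed)
            self._next_allowed = start + self.min_interval  # reserve this slot
        delay = start - now
        if delay > 0:
            await asyncio.sleep(delay)


async def run_concurrently(n: int, interval: float) -> list:
    """Start n tasks sharing one limiter; return their sorted start times."""
    limiter = LockedRateLimiter(min_interval_seconds=interval)
    starts = []

    async def worker():
        await limiter.wait()
        starts.append(time.monotonic())  # stand-in for `await page.goto(url)`

    await asyncio.gather(*(worker() for _ in range(n)))
    return sorted(starts)


starts = asyncio.run(run_concurrently(n=5, interval=0.1))
print([round(b - a, 2) for a, b in zip(starts, starts[1:])])
```

The limiter is constructed inside the running coroutine so that the lock belongs to the event loop that uses it. For a plain `for url in urls` loop the simpler class is sufficient.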
### Exponential Backoff on Errors
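```python
import asyncio
import random


async def with_backoff(coro_factory, max_retries=5, base_delay=1.0):
    """Retry coro_factory() with exponential backoff plus random jitter."""
    for attempt in range(max_retries):
        try:
            # Call the factory on every attempt: a coroutine object can only be awaited once
            return await coro_factory()
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s...")
            await asyncio.sleep(delay)
```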
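Catching every `Exception` also retries programming errors such as a misspelled selector in extraction code. A sketch of a variant that retries only the exception types passed in `retry_on` (for Playwright, `playwright.async_api.TimeoutError` is a natural candidate); `TransientError` and `make_flaky` are hypothetical helpers used only to exercise the retry logic, and the jitter here scales with `base_delay` rather than the fixed 0-1 s used above.

```python
import asyncio
import random


async def with_backoff(coro_factory, max_retries=5, base_delay=1.0,
                       retry_on=(Exception,)):
    """Retry coro_factory() with exponential backoff plus jitter.

    Only exception types listed in retry_on are retried; any other error
    propagates immediately.
    """
    for attempt in range(max_retries):
        try:
            return await coro_factory()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or an HTTP 429 response."""


def make_flaky(failures):
    """Hypothetical factory whose coroutine fails `failures` times, then returns "ok"."""
    calls = {"n": 0}

    async def attempt():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise TransientError(f"failure {calls['n']}")
        return "ok"

    return attempt, calls


factory, calls = make_flaky(failures=2)
result = asyncio.run(with_backoff(factory, base_delay=0.01, retry_on=(TransientError,)))
print(result, calls["n"])  # ok 3
```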
## Proxy Rotation Strategies
### Single Proxy
```python
browser = await p.chromium.launch(
    proxy={"server": "http://proxy.example.com:8080"}
)
```
### Authenticated Proxy
```python
context = await browser.new_context(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    }
)
```
### Rotating Proxy Pool
```python
import random

PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]


async def create_context_with_proxy(browser):
    proxy = random.choice(PROXIES)
    return await browser.new_context(
        proxy={"server": proxy}
    )
```
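`random.choice` can hand out the same proxy several times in a row and keeps returning a proxy after it starts failing. A round-robin alternative that skips proxies the caller has marked as bad; `ProxyRotator` is a hypothetical helper, not part of Playwright, and its return value has the shape of the `proxy` option, so it can be passed as `browser.new_context(proxy=rotator.next_proxy())`.

```python
import itertools

PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]


class ProxyRotator:
    """Hand out proxies round-robin, skipping any marked as bad."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._bad = set()
        self._cycle = itertools.cycle(self._proxies)

    def mark_bad(self, proxy: str) -> None:
        """Exclude a proxy, e.g. after repeated connection errors or blocks."""
        self._bad.add(proxy)

    def next_proxy(self) -> dict:
        """Return the next healthy proxy as a Playwright-style proxy dict."""
        # At most one full pass over the pool before giving up
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._bad:
                return {"server": proxy}
        raise RuntimeError("All proxies are marked bad")


rotator = ProxyRotator(PROXIES)
print([rotator.next_proxy()["server"] for _ in range(4)])
```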
### Per-Request Proxy (via Context Rotation)
Playwright does not support per-request proxy switching. Achieve it by creating a new context for each request or batch:
```python
async def scrape_url(browser, url, proxy):
    context = await browser.new_context(proxy={"server": proxy})
    page = await context.new_page()
    try:
        await page.goto(url)
        data = await extract_data(page)  # extract_data: your own extraction coroutine
        return data
    finally:
        await context.close()
```
### SOCKS5 Proxy
```python
browser = await p.chromium.launch(
    proxy={"server": "socks5://proxy.example.com:1080"}
)
```
## Headless Detection Avoidance
### Running Chrome Channel Instead of Chromium
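The bundled Chromium binary has different properties than a real Chrome install (for example, it lacks proprietary media codecs such as H.264 and AAC). Using the Chrome channel removes those build differences, although headless mode itself can still be detected through the signals listed above.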
```python
# Use installed Chrome instead of bundled Chromium
browser = await p.chromium.launch(channel="chrome", headless=True)
```
Requirements: Google Chrome must be installed on the system (for example with `playwright install chrome`).
### New Headless Mode (Chrome 112+)
Chrome's "new headless" mode is harder to detect than the old one:
```python
browser = await p.chromium.launch(
    args=["--headless=new"],
)
```
### Avoiding Common Flags
Do NOT pass these flags — they are headless-detection signals:
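- `--disable-gpu` (old headless workaround, not needed)
- `--no-sandbox` (security risk, detectable)
- `--disable-setuid-sandbox` (same as above)

Note that Playwright adds `--no-sandbox` itself unless Chromium is launched with `chromium_sandbox=True`.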
## Behavioral Evasion
### Mouse Movement Simulation
Anti-bot services track mouse events. A click without preceding mouse movement is suspicious.
```python
import asyncio
import random


async def human_click(page, selector):
    """Click with preceding mouse movement."""
    element = await page.query_selector(selector)
    if element is None:
        raise ValueError(f"No element matches {selector!r}")
    box = await element.bounding_box()
    if box is None:
        raise ValueError(f"Element {selector!r} is not visible")
    # Move to the element's center with a slight random offset
    x = box["x"] + box["width"] / 2 + random.uniform(-5, 5)
    y = box["y"] + box["height"] / 2 + random.uniform(-5, 5)
    await page.mouse.move(x, y, steps=random.randint(5, 15))
    await asyncio.sleep(random.uniform(0.05, 0.2))
    await page.mouse.click(x, y)
```
### Typing Speed Variation
```python
import asyncio
import random


async def human_type(page, selector, text):
    """Type with variable speed like a human."""
    await page.click(selector)
    for char in text:
        await page.keyboard.type(char)
        # Faster for common keys, slower for all other characters
        if char in "aeiou tnrs":
            await asyncio.sleep(random.uniform(0.03, 0.08))
        else:
            await asyncio.sleep(random.uniform(0.08, 0.20))
```
### Scroll Behavior
Real users scroll gradually, not in instant jumps.
```python
import asyncio
import random


async def human_scroll(page, distance=None):
    """Scroll down gradually like a human."""
    if distance is None:
        distance = random.randint(300, 800)
    current = 0
    while current < distance:
        # Cap the last step so the total never overshoots `distance`
        step = min(random.randint(50, 150), distance - current)
        await page.mouse.wheel(0, step)
        current += step
        await asyncio.sleep(random.uniform(0.05, 0.15))
```
## Detection Testing
### Self-Check Script
Navigate to these URLs to test your stealth configuration:
- https://bot.sannysoft.com/ — Comprehensive bot detection test
- https://abrahamjuliot.github.io/creepjs/ — Advanced fingerprint analysis
- https://browserleaks.com/webgl — WebGL fingerprint details
- https://browserleaks.com/canvas — Canvas fingerprint details
### Quick Test Pattern
```python
async def test_stealth(page):
    """Navigate to detection test page and report results."""
    await page.goto("https://bot.sannysoft.com/")
    await page.wait_for_timeout(3000)

    # Check for failed tests
    failed = await page.eval_on_selector_all(
        "td.failed",
        "els => els.map(e => e.parentElement.querySelector('td').textContent)",
    )
    if failed:
        print(f"FAILED checks: {failed}")
    else:
        print("All checks passed.")

    await page.screenshot(path="stealth_test.png", full_page=True)
```
## Recommended Stealth Stack
For most automation tasks, apply these in order of priority:
- WebDriver flag removal — Critical, takes 2 lines
- Custom user agent — Critical, takes 1 line
- Viewport configuration — High priority, takes 1 line
- Request delays — High priority, add random.uniform() calls
- Navigator properties — Medium priority, init script block
- Chrome channel — Medium priority, one launch option
- WebGL override — Low priority unless hitting advanced anti-bot
- Canvas noise — Low priority unless hitting advanced anti-bot
- Proxy rotation — Only for high-volume or repeated scraping
- Behavioral simulation — Only for sites with behavioral analysis