# Anti-Detection Patterns for Browser Automation

This reference covers techniques to make Playwright automation less detectable by anti-bot services. These are defense-in-depth measures — no single technique is sufficient, but combining them significantly reduces detection risk.

## Detection Vectors

Anti-bot systems detect automation through multiple signals. Understanding what they check helps you counter effectively.

### Tier 1: Trivial Detection (Every Site Checks These)
1. **navigator.webdriver** — Set to `true` by all automation frameworks
2. **User-Agent string** — Default headless UA contains "HeadlessChrome"
3. **WebGL renderer** — Headless Chrome reports "SwiftShader" or "Google SwiftShader"

### Tier 2: Common Detection (Most Anti-Bot Services)
4. **Viewport/screen dimensions** — Unusual sizes flag automation
5. **Plugins array** — Empty in headless mode, populated in real browsers
6. **Languages** — Missing or mismatched locale
7. **Request timing** — Machine-speed interactions
8. **Mouse movement** — No mouse events between clicks

### Tier 3: Advanced Detection (Cloudflare, DataDome, PerimeterX)
9. **Canvas fingerprint** — Headless renders differently
10. **WebGL fingerprint** — GPU-specific rendering variations
11. **Audio fingerprint** — AudioContext processing differences
12. **Font enumeration** — Different available fonts in headless
13. **Behavioral analysis** — Scroll patterns, click patterns, reading time

## Stealth Techniques

### 1. WebDriver Flag Removal

The most critical fix. Every anti-bot check starts here.

```python
await page.add_init_script("""
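    // Report webdriver as false, which is what a regular Chrome session returns.
    // Defining it on the prototype avoids leaving an own property on navigator.
    Object.defineProperty(Navigator.prototype, 'webdriver', {
        get: () => false,
    });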

    // Remove Playwright-specific properties
    delete window.__playwright;
    delete window.__pw_manual;
""")
```

### 2. User Agent Configuration

Match the user agent to the browser you are launching. A Chrome UA with Firefox-specific headers is a red flag.

```python
# Chrome 120 on Windows 10 (most common configuration globally)
CHROME_WIN = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Chrome 120 on macOS
CHROME_MAC = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Chrome 120 on Linux
CHROME_LINUX = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Firefox 121 on Windows
FIREFOX_WIN = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
```

**Rules:**
- Update UAs every 2-3 months as browser versions increment
- Match UA platform to `navigator.platform` override
- If using Chromium, use Chrome UAs. If Firefox, use Firefox UAs.
- Never use obviously fake or ancient UAs
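
The matching rules above can be checked mechanically before a context is created. This is a minimal sketch with two hypothetical helpers, `expected_platform` and `engine_matches`. The platform strings are the values desktop browsers commonly report for `navigator.platform`; verify them against the browser you actually launch.

```python
def expected_platform(user_agent):
    """Map a desktop user agent to the navigator.platform value it implies."""
    if "Windows NT" in user_agent:
        return "Win32"
    if "Macintosh" in user_agent:
        return "MacIntel"
    if "Linux" in user_agent:
        return "Linux x86_64"
    return None  # unknown UA: decide manually


def engine_matches(user_agent, browser_name):
    """Check that the UA family matches the browser being launched.

    `browser_name` is "chromium" or "firefox", as in p.chromium / p.firefox.
    """
    is_firefox_ua = "Firefox/" in user_agent
    if browser_name == "firefox":
        return is_firefox_ua
    if browser_name == "chromium":
        return "Chrome/" in user_agent and not is_firefox_ua
    return False


CHROME_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
FIREFOX_WIN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
    "Gecko/20100101 Firefox/121.0"
)
print(expected_platform(CHROME_MAC), engine_matches(CHROME_MAC, "chromium"))
print(expected_platform(FIREFOX_WIN), engine_matches(FIREFOX_WIN, "chromium"))
```

Running a check like this at startup catches the most common mistake: a UA copied from one platform paired with a `navigator.platform` override left over from another.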

### 3. Viewport and Screen Properties

Common real-world screen resolutions (from analytics data):

| Resolution | Market Share | Use For |
|-----------|-------------|---------|
| 1920x1080 | ~23% | Default choice |
| 1366x768 | ~14% | Laptop simulation |
| 1536x864 | ~9% | Scaled laptop |
| 1440x900 | ~7% | MacBook |
| 2560x1440 | ~5% | High-end desktop |

```python
import random

VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
    {"width": 1536, "height": 864},
    {"width": 1440, "height": 900},
]

viewport = random.choice(VIEWPORTS)
context = await browser.new_context(
    viewport=viewport,
    screen=viewport,  # screen should match viewport
)
```
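
A uniform `random.choice` treats every resolution as equally likely, which does not match the shares in the table. As a possible refinement, the sketch below weights the pick by those approximate shares. `VIEWPORT_SHARES` and `pick_viewport` are names invented for this example.

```python
import random

# (width, height, approximate share in percent), copied from the table above
VIEWPORT_SHARES = [
    (1920, 1080, 23),
    (1366, 768, 14),
    (1536, 864, 9),
    (1440, 900, 7),
    (2560, 1440, 5),
]


def pick_viewport(rng=random):
    """Pick a viewport dict, weighted by its approximate market share."""
    sizes = [{"width": w, "height": h} for w, h, _ in VIEWPORT_SHARES]
    weights = [share for _, _, share in VIEWPORT_SHARES]
    # random.choices normalizes the weights, so they need not sum to 100
    return rng.choices(sizes, weights=weights, k=1)[0]


viewport = pick_viewport()
print(viewport)
```

The returned dict has the same shape as the entries in `VIEWPORTS`, so it can be passed to `browser.new_context(viewport=viewport, screen=viewport)` unchanged.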

### 4. Navigator Properties Hardening

```python
STEALTH_INIT = """
    // Plugins (headless Chrome has 0 plugins, real Chrome has 3-5)
    Object.defineProperty(navigator, 'plugins', {
        get: () => {
            const plugins = [
                { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
                { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
                { name: 'Native Client', filename: 'internal-nacl-plugin' },
            ];
            plugins.length = 3;
            return plugins;
        },
    });

    // Languages
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en'],
    });

    // Platform (match to user agent)
    Object.defineProperty(navigator, 'platform', {
        get: () => 'Win32',  // or 'MacIntel' for macOS UA
    });

    // Hardware concurrency (real browsers report CPU cores)
    Object.defineProperty(navigator, 'hardwareConcurrency', {
        get: () => 8,
    });

    // Device memory (Chrome-specific)
    Object.defineProperty(navigator, 'deviceMemory', {
        get: () => 8,
    });

    // Connection info
    Object.defineProperty(navigator, 'connection', {
        get: () => ({
            effectiveType: '4g',
            rtt: 50,
            downlink: 10,
            saveData: false,
        }),
    });
"""

await context.add_init_script(STEALTH_INIT)
```
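
Hard-coding values in the script makes it easy for `navigator.platform` to drift away from the user agent. One way to keep them together is to render the simple overrides from a single profile dict. The sketch below assumes a hypothetical `build_navigator_script` helper and only covers JSON-serializable values; it does not cover `plugins` or `connection`.

```python
import json


def build_navigator_script(profile):
    """Render an init script that overrides navigator getters from `profile`."""
    lines = []
    for prop, value in profile.items():
        # json.dumps produces valid JavaScript literals for strings,
        # numbers, booleans, and lists
        lines.append(
            f"Object.defineProperty(navigator, {json.dumps(prop)}, "
            f"{{ get: () => ({json.dumps(value)}) }});"
        )
    return "\n".join(lines)


# Hypothetical profile matching a Windows Chrome user agent
WINDOWS_PROFILE = {
    "platform": "Win32",
    "languages": ["en-US", "en"],
    "hardwareConcurrency": 8,
    "deviceMemory": 8,
}

script = build_navigator_script(WINDOWS_PROFILE)
print(script)
```

The resulting string can be passed to `await context.add_init_script(script)` in the same way as `STEALTH_INIT`.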

### 5. WebGL Fingerprint Evasion

Headless Chrome uses SwiftShader for WebGL, which anti-bot services detect.

```python
# Option A: Launch with a real GPU (headed mode on a machine with GPU)
browser = await p.chromium.launch(headless=False)

# Option B: Override WebGL renderer info
await page.add_init_script("""
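    // Patch both WebGL1 and WebGL2 contexts; fingerprinting scripts may use either
    for (const proto of [
        WebGLRenderingContext.prototype,
        window.WebGL2RenderingContext && WebGL2RenderingContext.prototype,
    ]) {
        if (!proto) continue;
        const getParameter = proto.getParameter;
        proto.getParameter = function(parameter) {
            if (parameter === 37445) {
                return 'Intel Inc.';  // UNMASKED_VENDOR_WEBGL
            }
            if (parameter === 37446) {
                return 'Intel(R) Iris(TM) Plus Graphics 640';  // UNMASKED_RENDERER_WEBGL
            }
            return getParameter.call(this, parameter);
        };
    }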
""")
```

### 6. Canvas Fingerprint Noise

Anti-bot services render text/shapes to a canvas and hash the output. Headless Chrome produces a different hash.

```python
await page.add_init_script("""
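    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type) {
        const isPng = type === 'image/png' || type === undefined;
        const ctx = isPng ? this.getContext('2d') : null;
        if (!ctx || !this.width || !this.height) {
            return originalToDataURL.apply(this, arguments);
        }
        // Add the noise to a copy so the page's canvas is never modified
        // and repeated calls return the same result
        const copy = document.createElement('canvas');
        copy.width = this.width;
        copy.height = this.height;
        const imageData = ctx.getImageData(0, 0, this.width, this.height);
        for (let i = 0; i < imageData.data.length; i += 4) {
            // Flip the lowest bit of the red channel (a +/- 1 change, imperceptible)
            imageData.data[i] = imageData.data[i] ^ 1;
        }
        copy.getContext('2d').putImageData(imageData, 0, 0);
        return originalToDataURL.apply(copy, arguments);
    };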
""")
```

## Request Throttling Patterns

### Human-Like Delays

Real users do not click at machine speed. Add realistic delays between actions.

```python
import random
import asyncio

async def human_delay(action_type="browse"):
    """Add realistic delay based on action type."""
    delays = {
        "browse": (1.0, 3.0),      # Browsing between pages
        "read": (2.0, 8.0),        # Reading content
        "fill": (0.3, 0.8),        # Between form fields
        "click": (0.1, 0.5),       # Before clicking
        "scroll": (0.5, 1.5),      # Between scroll actions
    }
    min_s, max_s = delays.get(action_type, (0.5, 2.0))
    await asyncio.sleep(random.uniform(min_s, max_s))
```

### Request Rate Limiting

```python
import asyncio
import time

class RateLimiter:
    """Enforce minimum delay between requests."""

    def __init__(self, min_interval_seconds=1.0):
        self.min_interval = min_interval_seconds
        self.last_request_time = None  # no request made yet

    async def wait(self):
        # time.monotonic() is unaffected by system clock adjustments
        if self.last_request_time is not None:
            elapsed = time.monotonic() - self.last_request_time
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
        self.last_request_time = time.monotonic()

# Usage
limiter = RateLimiter(min_interval_seconds=2.0)
for url in urls:
    await limiter.wait()
    await page.goto(url)
```

### Exponential Backoff on Errors

```python
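import asyncio
import random

async def with_backoff(coro_factory, max_retries=5, base_delay=1.0):
    """Retry `coro_factory()` with exponential backoff plus up to 1s of jitter."""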
    for attempt in range(max_retries):
        try:
            return await coro_factory()
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s...")
            await asyncio.sleep(delay)
```
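
To see what `base_delay * (2 ** attempt)` means in practice, the following sketch computes the delay schedule without the random jitter. `backoff_schedule` is a hypothetical helper, and the optional `max_delay` cap is an addition of this example, not part of `with_backoff` above.

```python
def backoff_schedule(max_retries=5, base_delay=1.0, max_delay=None):
    """Delays in seconds (without jitter) slept before each retry."""
    delays = []
    # There is no sleep after the final attempt: it re-raises instead
    for attempt in range(max_retries - 1):
        delay = base_delay * (2 ** attempt)
        if max_delay is not None:
            delay = min(delay, max_delay)
        delays.append(delay)
    return delays


print(backoff_schedule())                # [1.0, 2.0, 4.0, 8.0]
print(backoff_schedule(max_delay=5.0))   # [1.0, 2.0, 4.0, 5.0]
```

With the defaults, a request that fails every time waits roughly 15 seconds in total plus jitter before the last exception propagates. A cap is worth considering when `max_retries` is raised, because the delay doubles on every attempt.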

## Proxy Rotation Strategies

### Single Proxy

```python
browser = await p.chromium.launch(
    proxy={"server": "http://proxy.example.com:8080"}
)
```

### Authenticated Proxy

```python
context = await browser.new_context(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    }
)
```

### Rotating Proxy Pool

```python
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

async def create_context_with_proxy(browser):
    proxy = random.choice(PROXIES)
    return await browser.new_context(
        proxy={"server": proxy}
    )
```
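
`random.choice` can pick the same proxy several times in a row and keeps selecting proxies that have stopped working. A round-robin pool that skips failed entries is one alternative. The `ProxyPool` class below is a sketch for illustration; its name and methods are assumptions, not a Playwright feature.

```python
import itertools


class ProxyPool:
    """Round-robin over proxies, skipping ones marked as failed."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._failed = set()
        self._cycle = itertools.cycle(self._proxies)

    def mark_failed(self, proxy):
        """Exclude a proxy from future rotation."""
        self._failed.add(proxy)

    def next_proxy(self):
        """Return the next healthy proxy, or raise if none are left."""
        # One full lap over the pool is enough to find a healthy proxy
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._failed:
                return proxy
        raise RuntimeError("no healthy proxies left")


pool = ProxyPool([
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
])
print(pool.next_proxy())
```

Each returned value can be passed as `proxy={"server": pool.next_proxy()}` to `browser.new_context`. Call `mark_failed` when a navigation through that proxy times out or is blocked.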

### Per-Request Proxy (via Context Rotation)

Playwright does not support per-request proxy switching. Achieve it by creating a new context for each request or batch:

```python
async def scrape_url(browser, url, proxy):
    context = await browser.new_context(proxy={"server": proxy})
    page = await context.new_page()
    try:
        await page.goto(url)
        data = await extract_data(page)  # extract_data: your own scraping function
        return data
    finally:
        await context.close()
```

### SOCKS5 Proxy

```python
browser = await p.chromium.launch(
    proxy={"server": "socks5://proxy.example.com:1080"}
)
```

## Headless Detection Avoidance

### Running Chrome Channel Instead of Chromium

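The bundled Chromium binary has different properties than a real Chrome install. Using the Chrome channel launches the same binary that regular users run, which removes those build-level differences. Automation signals such as `navigator.webdriver` are still present and need the fixes described above.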

```python
# Use installed Chrome instead of bundled Chromium
browser = await p.chromium.launch(channel="chrome", headless=True)
```

**Requirements:** Chrome must be installed on the system.

### New Headless Mode (Chrome 112+)

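Chrome's "new headless" mode runs the same browser code as headed Chrome, which makes it harder to detect than the old headless implementation. How to enable it depends on the Playwright version, so treat the launch options below as a starting point and check the documentation for the version you run:

```python
# headless=False stops Playwright from adding its own headless flag,
# so --headless=new is the only headless argument Chrome receives.
browser = await p.chromium.launch(
    headless=False,
    args=["--headless=new"],
)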
```

### Avoiding Common Flags

Do NOT pass these flags unless your environment requires them. They are typical of automated setups and can act as headless-detection signals:
- `--disable-gpu` (old headless workaround, not needed)
- `--no-sandbox` (security risk, detectable)
- `--disable-setuid-sandbox` (same as above)
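
Launch arguments tend to be copied between projects, so these flags can slip in unnoticed. A small check run before launch can catch them. `find_discouraged_flags` is a hypothetical helper written for this example.

```python
# Flags from the list above
DISCOURAGED_FLAGS = ("--disable-gpu", "--no-sandbox", "--disable-setuid-sandbox")


def find_discouraged_flags(args):
    """Return the launch arguments that appear in DISCOURAGED_FLAGS."""
    found = []
    for arg in args:
        # Compare only the flag name, so "--flag=value" forms also match
        name = arg.split("=", 1)[0]
        if name in DISCOURAGED_FLAGS:
            found.append(arg)
    return found


launch_args = ["--no-sandbox", "--window-size=1920,1080", "--disable-gpu"]
print(find_discouraged_flags(launch_args))
```

Some containers cannot run Chrome with the sandbox enabled. In that case the flag may be unavoidable, and the check is better used as a warning than as a hard failure.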

## Behavioral Evasion

### Mouse Movement Simulation

Anti-bot services track mouse events. A click without preceding mouse movement is suspicious.

```python
async def human_click(page, selector):
    """Click with preceding mouse movement."""
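    element = await page.query_selector(selector)
    if element is None:
        # query_selector returns None when nothing matches
        raise ValueError(f"No element matches selector: {selector}")
    box = await element.bounding_box()
    if box:  # bounding_box is None for elements that are not visible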
        # Move to element with slight offset
        x = box["x"] + box["width"] / 2 + random.uniform(-5, 5)
        y = box["y"] + box["height"] / 2 + random.uniform(-5, 5)
        await page.mouse.move(x, y, steps=random.randint(5, 15))
        await asyncio.sleep(random.uniform(0.05, 0.2))
        await page.mouse.click(x, y)
```
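
`page.mouse.move(x, y, steps=...)` interpolates along a straight line, while a real cursor usually follows a slight arc. One way to approximate that is to generate the points yourself along a quadratic Bezier curve. The `curved_path` helper below is a sketch; its name and the `bend` parameter are assumptions of this example.

```python
import random


def curved_path(start, end, steps=20, bend=0.2, rng=random):
    """Return `steps` points along a quadratic Bezier curve from start to end."""
    (x0, y0), (x2, y2) = start, end
    # Control point: the midpoint pushed sideways, perpendicular to the
    # straight line, by a random fraction of the line's length
    mx, my = (x0 + x2) / 2, (y0 + y2) / 2
    dx, dy = x2 - x0, y2 - y0
    offset = rng.uniform(-bend, bend)
    cx, cy = mx - dy * offset, my + dx * offset

    points = []
    for i in range(1, steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((x, y))
    return points


path = curved_path((100, 100), (400, 300))
print(len(path), path[-1])
```

Inside an async function, the points can be replayed with `for x, y in path: await page.mouse.move(x, y)`, optionally with a short `asyncio.sleep` between moves. The path ends exactly on the target, so a following `page.mouse.click` lands where intended.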

### Typing Speed Variation

```python
async def human_type(page, selector, text):
    """Type with variable speed like a human."""
    await page.click(selector)
    for char in text:
        await page.keyboard.type(char)
        # Faster for common keys, slower for special characters
        if char in "aeiou tnrs":
            await asyncio.sleep(random.uniform(0.03, 0.08))
        else:
            await asyncio.sleep(random.uniform(0.08, 0.20))
```

### Scroll Behavior

Real users scroll gradually, not in instant jumps.

```python
async def human_scroll(page, distance=None):
    """Scroll down gradually like a human."""
    if distance is None:
        distance = random.randint(300, 800)

    current = 0
    while current < distance:
        step = random.randint(50, 150)
        await page.mouse.wheel(0, step)
        current += step
        await asyncio.sleep(random.uniform(0.05, 0.15))
```

## Detection Testing

### Self-Check Script

Navigate to these URLs to test your stealth configuration:

- `https://bot.sannysoft.com/` — Comprehensive bot detection test
- `https://abrahamjuliot.github.io/creepjs/` — Advanced fingerprint analysis
- `https://browserleaks.com/webgl` — WebGL fingerprint details
- `https://browserleaks.com/canvas` — Canvas fingerprint details

### Quick Test Pattern

```python
async def test_stealth(page):
    """Navigate to detection test page and report results."""
    await page.goto("https://bot.sannysoft.com/")
    await page.wait_for_timeout(3000)

    # Check for failed tests
    failed = await page.eval_on_selector_all(
        "td.failed",
        "els => els.map(e => e.parentElement.querySelector('td').textContent)"
    )

    if failed:
        print(f"FAILED checks: {failed}")
    else:
        print("All checks passed.")

    await page.screenshot(path="stealth_test.png", full_page=True)
```

## Recommended Stealth Stack

For most automation tasks, apply these in order of priority:

1. **WebDriver flag removal** — Critical, takes 2 lines
2. **Custom user agent** — Critical, takes 1 line
3. **Viewport configuration** — High priority, takes 1 line
4. **Request delays** — High priority, add random.uniform() calls
5. **Navigator properties** — Medium priority, init script block
6. **Chrome channel** — Medium priority, one launch option
7. **WebGL override** — Low priority unless hitting advanced anti-bot
8. **Canvas noise** — Low priority unless hitting advanced anti-bot
9. **Proxy rotation** — Only for high-volume or repeated scraping
10. **Behavioral simulation** — Only for sites with behavioral analysis
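
Items 2 and 3 of the list are context options, so they can be assembled in one place and reused for every context. The sketch below builds the keyword arguments as a plain dict. `build_context_options` is a hypothetical helper, and the `locale` default is an assumption chosen to match the `['en-US', 'en']` languages override used earlier.

```python
def build_context_options(user_agent, viewport, locale="en-US"):
    """Collect keyword arguments for browser.new_context()."""
    return {
        "user_agent": user_agent,
        "viewport": dict(viewport),
        "screen": dict(viewport),  # keep window.screen in line with the viewport
        "locale": locale,
    }


options = build_context_options(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    {"width": 1920, "height": 1080},
)
print(sorted(options))
```

The dict is passed with `context = await browser.new_context(**options)`, after which the init scripts for item 1 and item 5 are added with `context.add_init_script`.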