firefrost-gaming/claude-skills-reference


Reza Rezvani 97952ccbee feat(engineering): add browser-automation and spec-driven-workflow skills

browser-automation (564-line SKILL.md, 3 scripts, 3 references):
- Web scraping, form filling, screenshot capture, data extraction
- Anti-detection patterns, cookie/session management, dynamic content
- scraping_toolkit.py, form_automation_builder.py, anti_detection_checker.py
- NOT testing (that's playwright-pro) — this is automation & scraping

spec-driven-workflow (586-line SKILL.md, 3 scripts, 3 references):
- Spec-first development: write spec BEFORE code
- Bounded autonomy rules, 6-phase workflow, self-review checklist
- spec_generator.py, spec_validator.py, test_extractor.py
- Pairs with tdd-guide for red-green-refactor after spec

Updated engineering plugin.json (31 → 33 skills).
Added both to mkdocs.yml nav and generated docs pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-25 12:57:18 +01:00

14 KiB


Anti-Detection Patterns for Browser Automation

This reference covers techniques to make Playwright automation less detectable by anti-bot services. These are defense-in-depth measures — no single technique is sufficient, but combining them significantly reduces detection risk.

Detection Vectors

Anti-bot systems detect automation through multiple signals. Understanding what they check helps you counter effectively.

Tier 1: Trivial Detection (Every Site Checks These)

navigator.webdriver — Set to true by all automation frameworks
User-Agent string — Default headless UA contains "HeadlessChrome"
WebGL renderer — Headless Chrome reports "SwiftShader" or "Google SwiftShader"
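To audit your own setup against these three signals, you can gather them from the page (for example with Playwright's `page.evaluate`) and check them in Python. Below is a minimal sketch. It assumes a hypothetical `signals` dict with the keys `webdriver`, `user_agent` and `webgl_renderer`, which you have already collected yourself.

```python
def tier1_red_flags(signals: dict) -> list:
    """Return the Tier 1 signals that would give the automation away.

    `signals` is a hypothetical dict collected from the page beforehand,
    e.g. navigator.webdriver, navigator.userAgent and the WebGL renderer.
    """
    flags = []
    # A non-automated browser reports navigator.webdriver as false
    if signals.get("webdriver"):
        flags.append("navigator.webdriver is true")
    # The default headless user agent advertises itself
    if "HeadlessChrome" in signals.get("user_agent", ""):
        flags.append("user agent contains HeadlessChrome")
    # Software WebGL renderer used by headless Chrome
    if "SwiftShader" in signals.get("webgl_renderer", ""):
        flags.append("WebGL renderer is SwiftShader")
    return flags


print(tier1_red_flags({
    "webdriver": True,
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36",
    "webgl_renderer": "Google SwiftShader",
}))
```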

Tier 2: Common Detection (Most Anti-Bot Services)

Viewport/screen dimensions — Unusual sizes flag automation
Plugins array — Empty in headless mode, populated in real browsers
Languages — Missing or mismatched locale
Request timing — Machine-speed interactions
Mouse movement — No mouse events between clicks

Tier 3: Advanced Detection (Cloudflare, DataDome, PerimeterX)

Canvas fingerprint — Headless renders differently
WebGL fingerprint — GPU-specific rendering variations
Audio fingerprint — AudioContext processing differences
Font enumeration — Different available fonts in headless
Behavioral analysis — Scroll patterns, click patterns, reading time

Stealth Techniques

1. WebDriver Flag Removal

This is the most critical fix. navigator.webdriver is the first signal that nearly every anti-bot script checks.

await page.add_init_script("""
    // Remove webdriver flag
    Object.defineProperty(navigator, 'webdriver', {
        get: () => undefined,
    });

    // Remove Playwright-specific properties
    delete window.__playwright;
    delete window.__pw_manual;
""")

2. User Agent Configuration

Match the user agent to the browser you are launching. A Chrome UA with Firefox-specific headers is a red flag.

# Chrome 120 on Windows 10 (most common configuration globally)
CHROME_WIN = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Chrome 120 on macOS
CHROME_MAC = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Chrome 120 on Linux
CHROME_LINUX = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

# Firefox 121 on Windows
FIREFOX_WIN = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"

Rules:

Update UAs every 2-3 months as browser versions increment
Match UA platform to navigator.platform override
If using Chromium, use Chrome UAs. If Firefox, use Firefox UAs.
Never use obviously fake or ancient UAs
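You can check the first two rules mechanically before a run. Below is a minimal sketch with some assumptions:

- The `PLATFORM_BY_UA_TOKEN` mapping and the helper functions are hypothetical.
- The mapping only covers the three desktop platforms used in the UA strings above.

```python
import re
from typing import Optional

# Assumed navigator.platform value for each desktop UA platform token
PLATFORM_BY_UA_TOKEN = {
    "Windows NT": "Win32",
    "Macintosh": "MacIntel",
    "X11; Linux": "Linux x86_64",
}


def check_ua_consistency(user_agent: str, platform: str, engine: str) -> list:
    """List mismatches between the UA, navigator.platform and the engine.

    engine is the Playwright browser type in use: "chromium" or "firefox".
    """
    problems = []

    expected = next(
        (value for token, value in PLATFORM_BY_UA_TOKEN.items() if token in user_agent),
        None,
    )
    if expected is None:
        problems.append("unrecognized UA platform")
    elif platform != expected:
        problems.append(f"navigator.platform {platform!r} != expected {expected!r}")

    is_firefox_ua = "Firefox/" in user_agent
    if engine == "chromium" and (is_firefox_ua or "Chrome/" not in user_agent):
        problems.append("Chromium engine with a non-Chrome UA")
    if engine == "firefox" and not is_firefox_ua:
        problems.append("Firefox engine with a non-Firefox UA")
    return problems


def chrome_major_version(user_agent: str) -> Optional[int]:
    """Extract the Chrome major version, handy for spotting stale UAs."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None
```

For example, pairing the macOS Chrome UA with a `'Win32'` platform override yields one mismatch. Comparing `chrome_major_version()` against the current stable release tells you when the UA list is due for an update.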

3. Viewport and Screen Properties

Common real-world desktop screen resolutions (approximate shares from public web analytics; the figures drift over time):

Resolution	Market Share	Use For
1920x1080	~23%	Default choice
1366x768	~14%	Laptop simulation
1536x864	~9%	Scaled laptop
1440x900	~7%	MacBook
2560x1440	~5%	High-end desktop

import random

VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
    {"width": 1536, "height": 864},
    {"width": 1440, "height": 900},
]

viewport = random.choice(VIEWPORTS)
context = await browser.new_context(
    viewport=viewport,
    screen=viewport,  # screen should match viewport
)
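`random.choice` picks each size equally often, so the rarer resolutions come up more often than they do in real traffic. Below is a sketch that weights the pick by the approximate shares in the table instead. `VIEWPORT_SHARES` and `pick_viewport` are hypothetical names. The weights are relative, so they do not need to sum to 100.

```python
import random

# (width, height, approximate share %) from the resolution table above
VIEWPORT_SHARES = [
    (1920, 1080, 23),
    (1366, 768, 14),
    (1536, 864, 9),
    (1440, 900, 7),
    (2560, 1440, 5),
]


def pick_viewport(rng=None):
    """Pick a viewport with probability proportional to its market share."""
    rng = rng or random  # accept a seeded random.Random for reproducible runs
    sizes = [(w, h) for w, h, _ in VIEWPORT_SHARES]
    weights = [share for _, _, share in VIEWPORT_SHARES]
    width, height = rng.choices(sizes, weights=weights, k=1)[0]
    return {"width": width, "height": height}
```

The result drops into the same `browser.new_context(viewport=..., screen=...)` call shown above.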

4. Navigator Properties Hardening

STEALTH_INIT = """
    // Plugins (headless Chrome has 0 plugins, real Chrome has 3-5)
    Object.defineProperty(navigator, 'plugins', {
        get: () => {
            const plugins = [
                { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
                { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
                { name: 'Native Client', filename: 'internal-nacl-plugin' },
            ];
            plugins.length = 3;
            return plugins;
        },
    });

    // Languages
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en'],
    });

    // Platform (match to user agent)
    Object.defineProperty(navigator, 'platform', {
        get: () => 'Win32',  // or 'MacIntel' for macOS UA
    });

    // Hardware concurrency (real browsers report CPU cores)
    Object.defineProperty(navigator, 'hardwareConcurrency', {
        get: () => 8,
    });

    // Device memory (Chrome-specific)
    Object.defineProperty(navigator, 'deviceMemory', {
        get: () => 8,
    });

    // Connection info
    Object.defineProperty(navigator, 'connection', {
        get: () => ({
            effectiveType: '4g',
            rtt: 50,
            downlink: 10,
            saveData: false,
        }),
    });
"""

await context.add_init_script(STEALTH_INIT)

5. WebGL Fingerprint Evasion

Headless Chrome uses SwiftShader for WebGL, which anti-bot services detect.

# Option A: Launch with a real GPU (headed mode on a machine with GPU)
browser = await p.chromium.launch(headless=False)

# Option B: Override WebGL renderer info
await page.add_init_script("""
    const getParameter = WebGLRenderingContext.prototype.getParameter;
    WebGLRenderingContext.prototype.getParameter = function(parameter) {
        if (parameter === 37445) {
            return 'Intel Inc.';  // UNMASKED_VENDOR_WEBGL
        }
        if (parameter === 37446) {
            return 'Intel(R) Iris(TM) Plus Graphics 640';  // UNMASKED_RENDERER_WEBGL
        }
        return getParameter.call(this, parameter);
    };
""")

6. Canvas Fingerprint Noise

Anti-bot services render text/shapes to a canvas and hash the output. Headless Chrome produces a different hash.

await page.add_init_script("""
    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type) {
        if (type === 'image/png' || type === undefined) {
            // Add minimal noise to the canvas to change fingerprint
            const ctx = this.getContext('2d');
            if (ctx) {
                const imageData = ctx.getImageData(0, 0, this.width, this.height);
                for (let i = 0; i < imageData.data.length; i += 4) {
                    // Shift one channel by +/- 1 (imperceptible)
                    imageData.data[i] = imageData.data[i] ^ 1;
                }
                ctx.putImageData(imageData, 0, 0);
            }
        }
        return originalToDataURL.apply(this, arguments);
    };
""")

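The "+/- 1" comment in the script above holds because XOR with 1 flips only the lowest bit, so each red value moves by exactly one step. Any change to the pixel bytes still changes the hash. Below is a pure-Python sketch of the same loop. It runs on a hypothetical two-pixel RGBA buffer and touches every fourth byte (the red channel).

```python
import hashlib


def add_lsb_noise(rgba: bytes) -> bytes:
    """Flip the lowest bit of the red channel of every RGBA pixel."""
    data = bytearray(rgba)
    for i in range(0, len(data), 4):
        data[i] ^= 1  # e.g. 200 -> 201 and 13 -> 12: always a change of exactly 1
    return bytes(data)


# Two RGBA pixels with hypothetical values
original = bytes([200, 100, 50, 255, 13, 37, 0, 255])
noisy = add_lsb_noise(original)

print(max(abs(a - b) for a, b in zip(original, noisy)))  # prints 1
print(hashlib.sha256(original).digest() == hashlib.sha256(noisy).digest())  # prints False
```

Two consequences follow:

- The noise is deterministic, so the altered hash is stable rather than random per call.
- XOR is its own inverse, and the init script writes the noise back into the canvas with `putImageData`. A second `toDataURL()` call on the same canvas therefore flips the bits back and returns the original pixels, which is worth knowing if a check compares repeated calls.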
Request Throttling Patterns

Human-Like Delays

Real users do not click at machine speed. Add realistic delays between actions.

import random
import asyncio

async def human_delay(action_type="browse"):
    """Add realistic delay based on action type."""
    delays = {
        "browse": (1.0, 3.0),      # Browsing between pages
        "read": (2.0, 8.0),        # Reading content
        "fill": (0.3, 0.8),        # Between form fields
        "click": (0.1, 0.5),       # Before clicking
        "scroll": (0.5, 1.5),      # Between scroll actions
    }
    min_s, max_s = delays.get(action_type, (0.5, 2.0))
    await asyncio.sleep(random.uniform(min_s, max_s))

Request Rate Limiting

import asyncio
import time

class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_seconds=1.0):
        self.min_interval = min_interval_seconds
        self.last_request_time = None  # no request made yet

    async def wait(self):
        # time.monotonic() is unaffected by system clock changes, unlike time.time()
        if self.last_request_time is not None:
            elapsed = time.monotonic() - self.last_request_time
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
        self.last_request_time = time.monotonic()

# Usage
limiter = RateLimiter(min_interval_seconds=2.0)
for url in urls:
    await limiter.wait()
    await page.goto(url)
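With a single global interval, a crawl that spans several sites moves at the pace of the strictest site. Below is a sketch of a hypothetical per-host variant. It keys the interval on the URL's host and uses one `asyncio.Lock` per host, so concurrent tasks hitting the same host queue behind each other.

```python
import asyncio
import time
from urllib.parse import urlsplit


class PerHostRateLimiter:
    """Enforce a minimum interval between requests to the same host.

    Requests to different hosts do not wait on each other.
    """

    def __init__(self, min_interval_seconds=1.0):
        self.min_interval = min_interval_seconds
        self._last = {}   # host -> time.monotonic() of the last request
        self._locks = {}  # host -> asyncio.Lock serializing that host

    async def wait(self, url):
        host = urlsplit(url).netloc
        if host not in self._locks:
            self._locks[host] = asyncio.Lock()
        async with self._locks[host]:
            last = self._last.get(host)
            if last is not None:
                remaining = self.min_interval - (time.monotonic() - last)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last[host] = time.monotonic()


async def demo():
    limiter = PerHostRateLimiter(min_interval_seconds=0.5)
    start = time.monotonic()
    for url in ["https://a.example/1", "https://a.example/2", "https://b.example/1"]:
        await limiter.wait(url)
        # a.example/2 waits about 0.5 s; b.example/1 does not wait at all
        print(f"{url} at {time.monotonic() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(demo())
```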

Exponential Backoff on Errors

async def with_backoff(coro_factory, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return await coro_factory()
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s...")
            await asyncio.sleep(delay)
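The version above retries every exception and lets the delay grow without bound. Below is a sketch of a variant with three changes, where `with_capped_backoff` and its parameters are hypothetical:

- It caps the delay at `max_delay`.
- It retries only the exception types listed in `retry_on`.
- It uses "full jitter", which sleeps a random amount between zero and the capped exponential delay. This is a different jitter scheme from the additive `random.uniform(0, 1)` above.

```python
import asyncio
import random


async def with_capped_backoff(coro_factory, max_retries=5, base_delay=1.0,
                              max_delay=30.0, retry_on=(Exception,)):
    """Retry coro_factory() with capped exponential backoff and full jitter.

    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(max_retries):
        try:
            return await coro_factory()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, cap))


async def _flaky_demo():
    attempts = {"n": 0}

    async def flaky():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    result = await with_capped_backoff(flaky, base_delay=0.1, retry_on=(ConnectionError,))
    print(result, "after", attempts["n"], "attempts")  # prints: ok after 3 attempts


if __name__ == "__main__":
    asyncio.run(_flaky_demo())
```

With Playwright, `retry_on` might be `(PlaywrightTimeoutError,)`, where `PlaywrightTimeoutError` is `playwright.async_api.TimeoutError`. You would then call `await with_capped_backoff(lambda: page.goto(url), retry_on=...)`. The lambda creates a fresh coroutine for each attempt.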

Proxy Rotation Strategies

Single Proxy

browser = await p.chromium.launch(
    proxy={"server": "http://proxy.example.com:8080"}
)

Authenticated Proxy

context = await browser.new_context(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    }
)

Rotating Proxy Pool

PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

async def create_context_with_proxy(browser):
    proxy = random.choice(PROXIES)
    return await browser.new_context(
        proxy={"server": proxy}
    )
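`random.choice` can hand out the same proxy several times in a row, and it keeps handing out proxies that have stopped working. Below is a sketch of a hypothetical round-robin pool that skips proxies you have marked as failed. It returns the `{"server": ...}` dict that `browser.new_context(proxy=...)` expects.

```python
import itertools


class ProxyPool:
    """Round-robin over proxy URLs, skipping ones marked as bad."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self.bad = set()
        self._cycle = itertools.cycle(self.proxies)

    def next_proxy(self):
        # Try at most one full lap; if every proxy is bad, give up
        for _ in range(len(self.proxies)):
            server = next(self._cycle)
            if server not in self.bad:
                return {"server": server}
        raise RuntimeError("all proxies are marked bad")

    def mark_bad(self, server):
        self.bad.add(server)
```

To use the pool, call `context = await browser.new_context(proxy=pool.next_proxy())`. When requests through that context keep failing, call `pool.mark_bad(...)` with its server URL.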

Per-Request Proxy (via Context Rotation)

Playwright does not support switching proxies per request. A proxy is fixed for a browser at launch or for a context at creation, so to rotate, create a new context for each request or batch:

async def scrape_url(browser, url, proxy):
    context = await browser.new_context(proxy={"server": proxy})
    page = await context.new_page()
    try:
        await page.goto(url)
        data = await extract_data(page)
        return data
    finally:
        await context.close()
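Creating one context per batch, rather than one per URL, spreads the context setup cost over several pages. Below is a sketch of the planning step using a hypothetical `plan_batches` helper. Each resulting pair would then be handled by a variant of `scrape_url` that opens one context with `proxy={"server": proxy}` and loops over the batch inside it.

```python
def plan_batches(urls, proxies, batch_size):
    """Split urls into batches and assign proxies round-robin.

    Returns a list of (proxy, batch) pairs, one browser context per pair.
    """
    if not proxies:
        raise ValueError("need at least one proxy")
    batches = [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
    return [(proxies[n % len(proxies)], batch) for n, batch in enumerate(batches)]


print(plan_batches(["u0", "u1", "u2", "u3", "u4"], ["p1", "p2"], 2))
```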

SOCKS5 Proxy

browser = await p.chromium.launch(
    proxy={"server": "socks5://proxy.example.com:1080"}
)

Headless Detection Avoidance

Running Chrome Channel Instead of Chromium

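The bundled Chromium binary differs from a real Chrome install. For example, branded Chrome ships proprietary media codecs that Chromium builds lack. Using the Chrome channel removes those differences, but on its own it does not hide automation signals such as navigator.webdriver.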

# Use installed Chrome instead of bundled Chromium
browser = await p.chromium.launch(channel="chrome", headless=True)

Requirements: Chrome must be installed on the system.

New Headless Mode (Chrome 112+)

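Chrome's "new headless" mode runs the same browser as headed Chrome instead of a separate headless implementation, so it is harder to detect than the old mode. If your Playwright version does not already use it, request it explicitly: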

browser = await p.chromium.launch(
    args=["--headless=new"],
)

Avoiding Common Flags

Do NOT pass these flags — they are headless-detection signals:

--disable-gpu (old headless workaround, not needed)
--no-sandbox (security risk, detectable)
--disable-setuid-sandbox (same as above)
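When launch arguments come from shared config, you can strip these flags before calling `launch(args=...)`. Below is a sketch; `DISCOURAGED_FLAGS` and `sanitize_launch_args` are hypothetical names.

```python
# Flags the list above recommends against passing
DISCOURAGED_FLAGS = ("--disable-gpu", "--no-sandbox", "--disable-setuid-sandbox")


def sanitize_launch_args(args):
    """Split launch args into (kept, dropped), removing discouraged flags.

    A flag matches whether it is passed bare or with a "=value" suffix.
    """
    kept, dropped = [], []
    for arg in args:
        name = arg.split("=", 1)[0]
        (dropped if name in DISCOURAGED_FLAGS else kept).append(arg)
    return kept, dropped


print(sanitize_launch_args(["--no-sandbox", "--headless=new", "--disable-gpu"]))
```

This only filters the flags you pass yourself. Playwright adds its own default arguments, including `--no-sandbox` unless you launch with `chromium_sandbox=True`. You can exclude other defaults with the `ignore_default_args` launch option.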

Behavioral Evasion

Mouse Movement Simulation

Anti-bot services track mouse events. A click without preceding mouse movement is suspicious.

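async def human_click(page, selector):
    """Click with preceding mouse movement."""
    element = await page.query_selector(selector)
    if element is None:
        raise ValueError(f"No element matches selector {selector!r}")
    box = await element.bounding_box()
    if box is None:
        # bounding_box() returns None for elements that are not visible
        raise ValueError(f"Element {selector!r} is not visible")
    # Aim near the element's center with a small random offset
    x = box["x"] + box["width"] / 2 + random.uniform(-5, 5)
    y = box["y"] + box["height"] / 2 + random.uniform(-5, 5)
    await page.mouse.move(x, y, steps=random.randint(5, 15))
    await asyncio.sleep(random.uniform(0.05, 0.2))
    await page.mouse.click(x, y)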
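`page.mouse.move(x, y, steps=n)` interpolates in a straight line from the previous pointer position. For a slightly curved approach, you can generate intermediate points yourself along a quadratic Bezier curve and move through them one at a time. Below is a sketch; `bezier_path` is a hypothetical helper. Playwright's public API does not report the current pointer position, so the sketch assumes you track the start point yourself.

```python
import random


def bezier_path(start, end, steps=20, rng=None):
    """Points along a quadratic Bezier curve from start (exclusive) to end.

    The control point is the segment midpoint shifted by a random offset,
    so the path bows a little instead of running in a straight line.
    """
    rng = rng or random
    (x0, y0), (x2, y2) = start, end
    x1 = (x0 + x2) / 2 + rng.uniform(-100, 100)
    y1 = (y0 + y2) / 2 + rng.uniform(-100, 100)
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        # B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * x1 + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * y1 + t ** 2 * y2
        points.append((x, y))
    return points
```

In `human_click`, the single `page.mouse.move(x, y, steps=...)` call could become a loop of `await page.mouse.move(px, py)` over `bezier_path(last_position, (x, y))`. Adding a short sleep between moves keeps the pace irregular.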

Typing Speed Variation

async def human_type(page, selector, text):
    """Type with variable speed like a human."""
    await page.click(selector)
    for char in text:
        await page.keyboard.type(char)
        # Faster for common keys, slower for special characters
        if char in "aeiou tnrs":
            await asyncio.sleep(random.uniform(0.03, 0.08))
        else:
            await asyncio.sleep(random.uniform(0.08, 0.20))

Scroll Behavior

Real users scroll gradually, not in instant jumps.

async def human_scroll(page, distance=None):
    """Scroll down gradually like a human."""
    if distance is None:
        distance = random.randint(300, 800)

    current = 0
    while current < distance:
        # Clamp the final step so the total scroll equals distance exactly
        step = min(random.randint(50, 150), distance - current)
        await page.mouse.wheel(0, step)
        current += step
        await asyncio.sleep(random.uniform(0.05, 0.15))

Detection Testing

Self-Check Sites

Navigate to these URLs to test your stealth configuration:

https://bot.sannysoft.com/ — Comprehensive bot detection test
https://abrahamjuliot.github.io/creepjs/ — Advanced fingerprint analysis
https://browserleaks.com/webgl — WebGL fingerprint details
https://browserleaks.com/canvas — Canvas fingerprint details

Quick Test Pattern

async def test_stealth(page):
    """Navigate to detection test page and report results."""
    await page.goto("https://bot.sannysoft.com/")
    await page.wait_for_timeout(3000)

    # Check for failed tests
    failed = await page.eval_on_selector_all(
        "td.failed",
        "els => els.map(e => e.parentElement.querySelector('td').textContent)"
    )

    if failed:
        print(f"FAILED checks: {failed}")
    else:
        print("All checks passed.")

    await page.screenshot(path="stealth_test.png", full_page=True)

Recommended Stealth Stack

For most automation tasks, apply these in order of priority:

WebDriver flag removal — Critical, takes 2 lines
Custom user agent — Critical, takes 1 line
Viewport configuration — High priority, takes 1 line
Request delays — High priority, add random.uniform() calls
Navigator properties — Medium priority, init script block
Chrome channel — Medium priority, one launch option
WebGL override — Low priority unless hitting advanced anti-bot
Canvas noise — Low priority unless hitting advanced anti-bot
Proxy rotation — Only for high-volume or repeated scraping
Behavioral simulation — Only for sites with behavioral analysis
