feat: add headless browser rendering for JavaScript SPA sites (#321)
New BrowserRenderer class uses Playwright to render JavaScript-heavy documentation sites (React, Vue SPAs) that return empty HTML shells with requests.get(). Activated via the --browser flag on web scraping.

- browser_renderer.py: Playwright wrapper with lazy browser launch, auto-install of Chromium on first use, and context manager support
- doc_scraper.py: browser_mode config, _render_with_browser() helper, integrated into scrape_page() and scrape_page_async()
- SPA detection warnings now suggest the --browser flag
- Optional dep: pip install "skill-seekers[browser]"
- 14 real e2e tests (actual Chromium, no mocks)
- UML updated: Scrapers class diagram (BrowserRenderer + dependency), Parsers (DoctorParser), Utilities (Doctor), Components, and new Browser Rendering sequence diagram (#20)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
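
For illustration only, the lazy-launch and context-manager behaviour described for browser_renderer.py could look roughly like the sketch below. This is not the code from this commit: the class name LazyBrowserRenderer, its timeout_ms parameter, and the networkidle wait strategy are assumptions, and the Chromium auto-install step is left out. Only the Playwright sync API calls (sync_playwright, chromium.launch, new_page, goto, content) are real.

```python
from typing import Any, Optional


class LazyBrowserRenderer:
    """Hypothetical sketch: Chromium is launched on the first render(), not in __init__."""

    def __init__(self, timeout_ms: int = 30_000) -> None:
        self.timeout_ms = timeout_ms
        self._playwright: Optional[Any] = None
        self._browser: Optional[Any] = None

    def _ensure_browser(self) -> Any:
        # Launch once and reuse the same browser for later pages.
        if self._browser is None:
            # Imported here so this module still loads when the optional
            # Playwright dependency is not installed.
            from playwright.sync_api import sync_playwright

            self._playwright = sync_playwright().start()
            self._browser = self._playwright.chromium.launch(headless=True)
        return self._browser

    def render(self, url: str) -> str:
        """Return the page HTML after client-side JavaScript has run."""
        page = self._ensure_browser().new_page()
        try:
            # Assumption: waiting for network idle is enough for the SPA to hydrate.
            page.goto(url, wait_until="networkidle", timeout=self.timeout_ms)
            return page.content()
        finally:
            page.close()

    def close(self) -> None:
        # Safe to call repeatedly, and safe if the browser was never launched.
        if self._browser is not None:
            self._browser.close()
            self._browser = None
        if self._playwright is not None:
            self._playwright.stop()
            self._playwright = None

    def __enter__(self) -> "LazyBrowserRenderer":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()
```

Under these assumptions, a caller would write `with LazyBrowserRenderer() as renderer: html = renderer.render(url)`. Constructing the object stays cheap, and Chromium starts only if a page is actually rendered.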
Binary images changed:
- Before: 486 KiB, After: 539 KiB
- Before: 219 KiB, After: 286 KiB
- Before: 223 KiB, After: 268 KiB
- Before: 82 KiB, After: 89 KiB

New binary file: docs/UML/exports/20_browser_rendering_sequence.png (100 KiB)