feat: add headless browser rendering for JavaScript SPA sites (#321)
New BrowserRenderer class uses Playwright to render JavaScript-heavy documentation sites (React, Vue SPAs) that return empty HTML shells with requests.get(). Activated via --browser flag on web scraping. - browser_renderer.py: Playwright wrapper with lazy browser launch, auto-install Chromium on first use, context manager support - doc_scraper.py: browser_mode config, _render_with_browser() helper, integrated into scrape_page() and scrape_page_async() - SPA detection warnings now suggest --browser flag - Optional dep: pip install "skill-seekers[browser]" - 14 real e2e tests (actual Chromium, no mocks) - UML updated: Scrapers class diagram (BrowserRenderer + dependency), Parsers (DoctorParser), Utilities (Doctor), Components, and new Browser Rendering sequence diagram (#20) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Before Width: | Height: | Size: 486 KiB After Width: | Height: | Size: 539 KiB |
|
Before Width: | Height: | Size: 219 KiB After Width: | Height: | Size: 286 KiB |
|
Before Width: | Height: | Size: 223 KiB After Width: | Height: | Size: 268 KiB |
|
Before Width: | Height: | Size: 82 KiB After Width: | Height: | Size: 89 KiB |
BIN
docs/UML/exports/20_browser_rendering_sequence.png
Normal file
|
After Width: | Height: | Size: 100 KiB |
@@ -137,7 +137,12 @@ MCP Client (Claude Code/Cursor) → FastMCPServer (stdio/HTTP) with two invocati
|
||||
### Runtime Components
|
||||

|
||||
|
||||
Component diagram with corrected runtime dependencies. Key flows: `CLI Core` dispatches to `Scrapers` (via `scraper.main(argv)`) and to `Adaptors` (via package/upload commands). `Scrapers` call `Codebase Analysis` via `analyze_codebase(enhance_level)`. `Codebase Analysis` uses `C3.x Classes` internally and `Enhancement` when level ≥ 2. `MCP Server` reaches `Scrapers` via subprocess and `Adaptors` via direct import.
|
||||
Component diagram with corrected runtime dependencies. Key flows: `CLI Core` dispatches to `Scrapers` (via `scraper.main(argv)`) and to `Adaptors` (via package/upload commands). `Scrapers` call `Codebase Analysis` via `analyze_codebase(enhance_level)`. `Codebase Analysis` uses `C3.x Classes` internally and `Enhancement` when level ≥ 2. `MCP Server` reaches `Scrapers` via subprocess and `Adaptors` via direct import. `Scrapers` optionally use `Browser Renderer (Playwright)` via `render_page()` when `--browser` flag is set for JavaScript SPA sites.
|
||||
|
||||
### Browser Rendering Flow
|
||||

|
||||
|
||||
When `--browser` flag is set, `DocScraper.scrape_page()` delegates to `BrowserRenderer.render_page(url)` instead of `requests.get()`. The renderer auto-installs Chromium on first use, navigates with `wait_until='networkidle'` to let JavaScript execute, then returns the fully-rendered HTML. The rest of the pipeline (BeautifulSoup → `extract_content()` → `save_page()`) remains unchanged. Optional dependency: `pip install "skill-seekers[browser]"`.
|
||||
|
||||
## File Locations
|
||||
|
||||
|
||||