Commit Graph

2 Commits

Author SHA1 Message Date
yusyus
00c72ea4a3 fix: resolve CI failures across all GitHub Actions workflows
- Fix ruff format issue in doc_scraper.py
- Add pytest skip markers for browser renderer tests when Playwright is
  not installed in CI
- Replace broken Python heredocs in 4 workflow YAML files
  (scheduled-updates, vector-db-export, quality-metrics, test-vector-dbs)
  with python3 -c calls to fix YAML parsing errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 20:40:45 +03:00
yusyus
ea4fed0be4 feat: add headless browser rendering for JavaScript SPA sites (#321)
New BrowserRenderer class uses Playwright to render JavaScript-heavy
documentation sites (React, Vue SPAs) that return empty HTML shells
with requests.get(). Activated via --browser flag on web scraping.

- browser_renderer.py: Playwright wrapper with lazy browser launch,
  auto-install Chromium on first use, context manager support
- doc_scraper.py: browser_mode config, _render_with_browser() helper,
  integrated into scrape_page() and scrape_page_async()
- SPA detection warnings now suggest --browser flag
- Optional dep: pip install "skill-seekers[browser]"
- 14 real e2e tests (actual Chromium, no mocks)
- UML updated: Scrapers class diagram (BrowserRenderer + dependency),
  Parsers (DoctorParser), Utilities (Doctor), Components, and new
  Browser Rendering sequence diagram (#20)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:06:14 +03:00