skill-seekers-reference

firefrost-gaming/skill-seekers-reference


Commit Graph

yusyus

00c72ea4a3

fix: resolve CI failures across all GitHub Actions workflows

- Fix ruff format issue in doc_scraper.py
- Add pytest skip markers for browser renderer tests when Playwright is
  not installed in CI
- Replace broken Python heredocs in 4 workflow YAML files
  (scheduled-updates, vector-db-export, quality-metrics, test-vector-dbs)
  with python3 -c calls to fix YAML parsing errors
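The skip-marker fix can be pictured as one reusable pytest marker keyed on whether Playwright is importable. This is a sketch only: the names HAS_PLAYWRIGHT, requires_playwright and the test function are hypothetical, and the project's real test file may use pytest.importorskip("playwright") instead.

```python
import importlib.util

import pytest

# True only if the optional Playwright package can be found on sys.path.
HAS_PLAYWRIGHT = importlib.util.find_spec("playwright") is not None

# Reusable marker: skip browser tests instead of failing when Playwright is absent,
# e.g. in a CI job that only installs the base package.
requires_playwright = pytest.mark.skipif(
    not HAS_PLAYWRIGHT,
    reason="Playwright not installed (pip install 'skill-seekers[browser]')",
)


@requires_playwright
def test_browser_stack_importable():
    # Hypothetical test body; it is collected but skipped without Playwright.
    from playwright.sync_api import sync_playwright

    assert callable(sync_playwright)
```

Defining the marker once keeps every browser test's skip condition and reason consistent.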

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-29 20:40:45 +03:00

yusyus

ea4fed0be4

feat: add headless browser rendering for JavaScript SPA sites (#321)

New BrowserRenderer class uses Playwright to render JavaScript-heavy
documentation sites (React, Vue SPAs) that return only an empty HTML shell
when fetched with requests.get(). Enabled with the --browser flag for web
scraping.
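An "empty HTML shell" is a server response that carries scripts and a mount point such as <div id="root"> but almost no readable text, because the content is built client-side. A hypothetical heuristic for spotting one, the kind of check an SPA-detection warning could rest on, is sketched below with the standard-library HTMLParser; the 200-character threshold and all function names are assumptions, not skill-seekers' actual detection logic.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_spa_shell(html: str, min_text_chars: int = 200) -> bool:
    """Hypothetical heuristic: very little visible text suggests JS renders the page."""
    collector = _TextCollector()
    collector.feed(html)
    return len(" ".join(collector.chunks)) < min_text_chars


def warn_if_spa(html: str, url: str) -> None:
    # Mirrors the idea of pointing users at the --browser flag.
    if looks_like_spa_shell(html):
        print(f"Warning: {url} looks like a JavaScript SPA; try again with --browser")
```

A text-length threshold is crude (short but legitimate pages can trip it), so it is better suited to a warning than to silently switching renderers.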

- browser_renderer.py: Playwright wrapper with lazy browser launch,
  auto-install Chromium on first use, context manager support
- doc_scraper.py: browser_mode config, _render_with_browser() helper,
  integrated into scrape_page() and scrape_page_async()
- SPA detection warnings now suggest --browser flag
- Optional dep: pip install "skill-seekers[browser]"
- 14 real e2e tests (actual Chromium, no mocks)
- UML updated: Scrapers class diagram (BrowserRenderer + dependency),
  Parsers (DoctorParser), Utilities (Doctor), Components, and new
  Browser Rendering sequence diagram (#20)
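The bullets above describe BrowserRenderer as a Playwright wrapper with lazy browser launch, Chromium auto-install on first use, and context-manager support. A minimal sketch of that shape using Playwright's sync API follows; the class layout, the timeout_ms parameter, waiting for "networkidle", and treating any launch error as a missing Chromium download are all assumptions, not the project's actual code.

```python
import subprocess
import sys


class BrowserRenderer:
    """Hypothetical sketch: render JS-heavy pages to HTML with Playwright.

    Playwright is imported and Chromium launched lazily, on the first
    render() call, so constructing the object needs no browser at all.
    """

    def __init__(self, timeout_ms: int = 30_000):
        self.timeout_ms = timeout_ms
        self._playwright = None
        self._browser = None

    def _ensure_browser(self):
        if self._browser is not None:
            return self._browser
        # Deferred import keeps the "browser" extra optional.
        from playwright.sync_api import Error, sync_playwright

        self._playwright = sync_playwright().start()
        try:
            self._browser = self._playwright.chromium.launch(headless=True)
        except Error:
            # Assumption: the launch failed because Chromium is not downloaded
            # yet, so install it once via the Playwright CLI and retry.
            subprocess.run(
                [sys.executable, "-m", "playwright", "install", "chromium"],
                check=True,
            )
            self._browser = self._playwright.chromium.launch(headless=True)
        return self._browser

    def render(self, url: str) -> str:
        """Return the page HTML after client-side JavaScript has run."""
        page = self._ensure_browser().new_page()
        try:
            page.goto(url, wait_until="networkidle", timeout=self.timeout_ms)
            return page.content()
        finally:
            page.close()

    def close(self) -> None:
        # Safe to call even if no browser was ever launched.
        if self._browser is not None:
            self._browser.close()
            self._browser = None
        if self._playwright is not None:
            self._playwright.stop()
            self._playwright = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False
```

Usage: with BrowserRenderer() as renderer: html = renderer.render(url). Because nothing touches Playwright until render() is called, scrapes that run without --browser never pay for the optional dependency.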

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-28 22:06:14 +03:00

2 Commits