yusyus
d76ab1d9a4
fix: report accurate saved/skipped page counts and detect SPA sites (#320, #321)
The scraper previously reported len(visited_urls) as "Scraped N pages"
even when save_page() silently skipped pages with empty content (<50
chars). For JavaScript SPA sites this meant "Scraped 190 pages" followed
by "No scraped data found!" with no explanation.
Changes:
- Added pages_saved/pages_skipped counters to DocToSkillConverter
- save_page() now increments pages_skipped on skip, pages_saved on save
- New _log_scrape_completion() reports "(N saved, M skipped)" breakdown
- SPA detection warns when all/most pages have empty content
- build_skill() error now explains empty content cause when pages skipped
- Updated both sync and async scrape completion paths
- 14 new tests across 4 test classes (counting, messages, SPA, build)
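
The counting and SPA warning described above can be sketched roughly as follows. This is an illustration, not the code from the commit: `ScrapeCounters`, `completion_message()`, `looks_like_spa()`, and the 80% threshold for "most pages" are hypothetical names and values chosen for the example, and stripping whitespace before measuring length is also an assumption. Only the 50-character cutoff and the `pages_saved`/`pages_skipped` counter names come from the message itself.

```python
from dataclasses import dataclass

MIN_CONTENT_CHARS = 50  # from the commit message: pages under 50 chars are skipped
SPA_SKIP_RATIO = 0.8    # assumption: "most pages" taken to mean at least 80% skipped


@dataclass
class ScrapeCounters:
    """Hypothetical stand-in for the counters added to DocToSkillConverter."""

    pages_saved: int = 0
    pages_skipped: int = 0

    def save_page(self, content: str) -> bool:
        """Count a page as saved or skipped; return True if it was saved."""
        # Assumption: whitespace is stripped before the length check.
        if len(content.strip()) < MIN_CONTENT_CHARS:
            self.pages_skipped += 1
            return False
        self.pages_saved += 1
        return True

    def completion_message(self) -> str:
        """Build the "(N saved, M skipped)" style summary line."""
        total = self.pages_saved + self.pages_skipped
        return (
            f"Scraped {total} pages "
            f"({self.pages_saved} saved, {self.pages_skipped} skipped)"
        )

    def looks_like_spa(self) -> bool:
        """Flag a probable JavaScript SPA when all or most pages were skipped."""
        total = self.pages_saved + self.pages_skipped
        return total > 0 and self.pages_skipped / total >= SPA_SKIP_RATIO


if __name__ == "__main__":
    counters = ScrapeCounters()
    for body in ["", "<div id='root'></div>", "x" * 200]:
        counters.save_page(body)
    print(counters.completion_message())
    if counters.looks_like_spa():
        print("Warning: most pages had no content; the site may be a JavaScript SPA.")
```

Keeping the two counters separate from the visited-URL set is what lets the completion line distinguish "visited" from "actually saved", which is the gap the original message left open.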
Fixes #320
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 22:26:35 +03:00