yusyus
|
80382551b1
|
Fix Issue #7: Fix all broken configs and add Laravel support
Tested and fixed all 11 production configs - now 100% working!
Fixed Configs:
1. Django (configs/django.json)
- ❌ Was using: div.document (selector doesn't exist)
- ✅ Now using: article (1,688 chars of content)
- Verified on: https://docs.djangoproject.com/en/stable/
2. Astro (configs/astro.json)
- ❌ Was using: homepage URL (no article element)
- ✅ Now using: /en/getting-started/ with article selector
- Added: start_urls, categories, improved URL patterns
- Increased max_pages from 15 to 100
3. Tailwind (configs/tailwind.json)
- ❌ Was using: article (selector doesn't exist)
- ✅ Now using: div.prose (195 chars of content)
- Verified on: https://tailwindcss.com/docs
New Config:
4. Laravel (configs/laravel.json) - NEW!
- Created complete Laravel 9.x config
- Selector: #main-content (16,131 chars of content)
- Base URL: https://laravel.com/docs/9.x/
- Includes: 8 start_urls covering installation, routing,
controllers, views, Blade, Eloquent, migrations, auth
- Categories: getting_started, routing, views, models,
authentication, api
- max_pages: 500
Test Results:
✅ 11/11 configs tested and verified (100%)
✅ All selectors extract content properly
✅ All base URLs accessible
Working Configs:
- ✅ astro.json
- ✅ django.json
- ✅ fastapi.json
- ✅ godot.json
- ✅ godot-large-example.json
- ✅ kubernetes.json
- ✅ laravel.json (NEW)
- ✅ react.json
- ✅ steam-economy-complete.json
- ✅ tailwind.json
- ✅ vue.json
How I Tested:
1. Created test_selectors.py to find correct CSS selectors
2. Tested each config's base_url + selector combination
3. Verified content extraction (not just "found" but actual text)
4. Ensured meaningful content length (50+ chars minimum)
Fixes Issue #7 - Laravel scraping not working
Fixes #7
|
2025-10-21 00:16:39 +03:00 |
|