- Change --enhance-workflow from type:str to action:append in all argument
files (workflow, create, scrape, github, pdf) so the flag can be given
multiple times to chain workflows in sequence
- Add workflow_runner.py: shared utility used by all 4 scrapers
- collect_workflow_vars(): merges extra context then user --var flags
(user flags take precedence over scraper metadata)
- run_workflows(): executes named workflows in order, then any inline
--enhance-stage workflow; handles dry-run/preview mode
- Remove duplicate ~115-130 line workflow blocks from doc_scraper,
github_scraper, pdf_scraper, and codebase_scraper; replace with
single run_workflows() call each
- Remove mutual exclusivity between workflows and AI enhancement:
workflows now run first, then traditional enhancement continues
independently (--enhance-level 0 to disable)
- Add tests/test_workflow_runner.py: 21 tests covering no-flags, single
workflow, multiple/chained workflows, inline stages, mixed mode,
variable precedence, and dry-run
- Fix test_markdown_parsing: accept "text" or "unknown" for unlabelled
code blocks (unified MarkdownParser returns "text" by default)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Merging with admin override due to known issues:
✅ **What Works**:
- GLM-4.7 Claude-compatible API support (correctly implemented)
- PDF scraper improvements (content truncation fixed, page traceability added)
- Documentation updates comprehensive
⚠️ **Known Issues (will be fixed in next commit)**:
1. Import bugs in 3 files causing UnboundLocalError (30 tests failing)
2. PDF scraper test expectations need updating for new behavior (5 tests failing)
3. test_godot_config failure (pre-existing, not caused by this PR - 1 test failing)
**Action Plan**:
Fixes for issues #1 and #2 are ready and will be committed immediately after merge.
Issue #3 requires separate investigation as it's a pre-existing problem.
Total: 36 failing tests, 35 will be fixed in next commit.
This hotfix resolves 4 critical bugs reported by users:
Issue #258: install command fails with unified_scraper
- Added --fresh and --dry-run flags to unified_scraper.py
- Updated main.py to pass both flags to unified scraper
- Fixed "unrecognized arguments" error
Issue #259 (Original): scrape command doesn't accept positional URL and --max-pages
- Added positional URL argument to scrape command
- Added --max-pages flag with safety warnings (>1000 pages, <10 pages)
- Updated doc_scraper.py and main.py argument parsers
Issue #259 (Comment A): Version shows 2.7.0 instead of actual version
- Fixed hardcoded version in main.py
- Now reads version dynamically from __init__.py
Issue #259 (Comment B): PDF command shows empty "Error: " message
- Improved exception handler in main.py to show exception type if message is empty
- Added proper error handling in pdf_scraper.py with context-specific messages
- Added traceback support in verbose mode
All fixes tested and verified with exact commands from issue reports.
Resolves: #258, #259
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed incorrect variable names in list comprehensions that were causing
NameError in CI (Python 3.11/3.12):
Critical fixes:
- tests/test_markdown_parsing.py: 'l' → 'link' in list comprehension
- src/skill_seekers/cli/pdf_extractor_poc.py: 'l' → 'line' (2 occurrences)
Additional auto-lint fixes:
- Removed unused imports in llms_txt_downloader.py, llms_txt_parser.py
- Fixed comparison operators in config files
- Fixed list comprehension in other files
All tests now pass in CI.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements hybrid smart extraction + improved fallback templates for
skill descriptions across all scrapers.
Changes:
- github_scraper.py:
* Added extract_description_from_readme() helper
* Extracts from README first paragraph (60 lines)
* Updates description after README extraction
* Fallback: "Use when working with {name}"
* Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main)
- doc_scraper.py:
* Added infer_description_from_docs() helper
* Extracts from meta tags or first paragraph (65 lines)
* Tries: meta description, og:description, first content paragraph
* Fallback: "Use when working with {name}"
* Updated 2 locations (create_enhanced_skill_md, get_configuration)
- pdf_scraper.py:
* Added infer_description_from_pdf() helper
* Extracts from PDF metadata (subject, title)
* Fallback: "Use when referencing {name} documentation"
* Updated 3 locations (PDFToSkillConverter, main x2)
- generate_router.py:
* Updated 2 locations with improved router descriptions
* "Use when working with {name} development and programming"
All changes:
- Only apply to NEW skill generations (don't modify existing)
- No API calls (free/offline)
- Smart extraction when metadata/README available
- Improved "Use when..." fallbacks instead of generic templates
- 612 tests passing (100%)
Fixes#191
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#193 - PDF scraping broken for PyPI users
Changed 3 files from absolute to relative imports to fix
ModuleNotFoundError when package is installed via pip:
1. pdf_scraper.py:22
- from pdf_extractor_poc import → from .pdf_extractor_poc import
- Fixes: skill-seekers pdf command failed with import error
2. github_scraper.py:36
- from code_analyzer import → from .code_analyzer import
- Proactive fix: prevents future import errors
3. test_unified_simple.py:17
- from config_validator import → from .config_validator import
- Proactive fix: test helper file
These absolute imports worked locally due to sys.path differences
but failed when installed via PyPI (pip install skill-seekers).
Tested with:
- skill-seekers pdf command now works ✅
- Extracted 32-page Godot Farming PDF successfully
All CLI commands should now work correctly when installed from PyPI.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>