skill-seekers-reference

firefrost-gaming/skill-seekers-reference

Author	SHA1	Message	Date
yusyus	0878ad3ef6	fix: resolve all ruff linting errors (W293, F401, B904, UP007, UP045, E741, SIM102, SIM117, ARG) Auto-fixed (whitespace, imports, type annotations): - codebase_scraper.py: W293 blank lines with whitespace - doc_scraper.py: W293 blank lines with whitespace - parsers/extractors/__init__.py: W293 - parsers/extractors/base_parser.py: W293, UP007, UP045, F401 Manual fixes: - enhancement_workflow.py: B904 raise without `from exc`, remove unused `os` import - parsers/extractors/quality_scorer.py: E741 ambiguous var `l` → `line` - parsers/extractors/rst_parser.py: SIM102 nested if → combined conditions (x2) - pdf_scraper.py: F821 undefined `logger` → `print()` (consistent with file style) - mcp/tools/workflow_tools.py: ARG001 unused `args` → `_args` - tests/test_workflow_runner.py: ARG005 unused lambda args → `_a`/`_kw`, ARG001 `kwargs` → `_kwargs` - tests/test_workflows_command.py: SIM117 nested with → combined with (x2) All 1922 tests pass. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-18 22:44:41 +03:00
yusyus	60c46673ed	feat: support multiple --enhance-workflow flags with shared workflow_runner - Change --enhance-workflow from type:str to action:append in all argument files (workflow, create, scrape, github, pdf) so the flag can be given multiple times to chain workflows in sequence - Add workflow_runner.py: shared utility used by all 4 scrapers - collect_workflow_vars(): merges extra context then user --var flags (user flags take precedence over scraper metadata) - run_workflows(): executes named workflows in order, then any inline --enhance-stage workflow; handles dry-run/preview mode - Remove duplicate ~115-130 line workflow blocks from doc_scraper, github_scraper, pdf_scraper, and codebase_scraper; replace with single run_workflows() call each - Remove mutual exclusivity between workflows and AI enhancement: workflows now run first, then traditional enhancement continues independently (--enhance-level 0 to disable) - Add tests/test_workflow_runner.py: 21 tests covering no-flags, single workflow, multiple/chained workflows, inline stages, mixed mode, variable precedence, and dry-run - Fix test_markdown_parsing: accept "text" or "unknown" for unlabelled code blocks (unified MarkdownParser returns "text" by default) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 22:05:27 +03:00
yusyus	8f720670f2	style: Format code with ruff - Format 5 files affected by PDF scraper changes - Ensures CI/CD code quality checks pass Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-27 21:11:21 +03:00
Zhichang Yu	9435d2911d	feat: Add GLM-4.7 support and fix PDF scraper issues (#266 ) Merging with admin override due to known issues: ✅ What Works: - GLM-4.7 Claude-compatible API support (correctly implemented) - PDF scraper improvements (content truncation fixed, page traceability added) - Documentation updates comprehensive ⚠️ Known Issues (will be fixed in next commit): 1. Import bugs in 3 files causing UnboundLocalError (30 tests failing) 2. PDF scraper test expectations need updating for new behavior (5 tests failing) 3. test_godot_config failure (pre-existing, not caused by this PR - 1 test failing) Action Plan: Fixes for issues #1 and #2 are ready and will be committed immediately after merge. Issue #3 requires separate investigation as it's a pre-existing problem. Total: 36 failing tests, 35 will be fixed in next commit.	2026-01-27 21:10:40 +03:00
yusyus	cc76efa29a	fix: Critical CLI bug fixes for issues #258 and #259 This hotfix resolves 4 critical bugs reported by users: Issue #258: install command fails with unified_scraper - Added --fresh and --dry-run flags to unified_scraper.py - Updated main.py to pass both flags to unified scraper - Fixed "unrecognized arguments" error Issue #259 (Original): scrape command doesn't accept positional URL and --max-pages - Added positional URL argument to scrape command - Added --max-pages flag with safety warnings (>1000 pages, <10 pages) - Updated doc_scraper.py and main.py argument parsers Issue #259 (Comment A): Version shows 2.7.0 instead of actual version - Fixed hardcoded version in main.py - Now reads version dynamically from __init__.py Issue #259 (Comment B): PDF command shows empty "Error: " message - Improved exception handler in main.py to show exception type if message is empty - Added proper error handling in pdf_scraper.py with context-specific messages - Added traceback support in verbose mode All fixes tested and verified with exact commands from issue reports. Resolves: #258, #259 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-21 23:22:03 +03:00
yusyus	9666938eb0	fix: Resolve 21 ruff linting errors (SIM102, SIM117, B904, SIM113, B007) Fixed all 21 linting errors identified in GitHub Actions: SIM102 (7 errors - nested if statements): - config_extractor.py:468 - Combined nested conditions - config_validator.py (was B904, already fixed) - pattern_recognizer.py:430,538,916 - Combined nested conditions - test_example_extractor.py:365,412,460 - Combined nested conditions - unified_skill_builder.py:1070 - Combined nested conditions SIM117 (9 errors - multiple with statements): - test_install_agent.py:418 - Combined with statements - test_issue_219_e2e.py:278 - Combined with statements - test_llms_txt_downloader.py:33,88 - Combined with statements - test_skip_llms_txt.py:75,98,121,148,172,304 - Combined with statements B904 (1 error - exception handling): - config_validator.py:62 - Added 'from e' to exception chain SIM113 (1 error - enumerate usage): - doc_scraper.py:1068 - Removed unused 'completed' counter variable B007 (1 error - unused loop variable): - pdf_scraper.py:167 - Changed 'keywords' to '_' for unused variable All changes improve code quality without altering functionality. Tests: 1214 passed, 167 skipped (4 pre-existing failures unrelated) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-17 23:54:22 +03:00
yusyus	6439c85cde	fix: Fix list comprehension variable names (NameError in CI) Fixed incorrect variable names in list comprehensions that were causing NameError in CI (Python 3.11/3.12): Critical fixes: - tests/test_markdown_parsing.py: 'l' → 'link' in list comprehension - src/skill_seekers/cli/pdf_extractor_poc.py: 'l' → 'line' (2 occurrences) Additional auto-lint fixes: - Removed unused imports in llms_txt_downloader.py, llms_txt_parser.py - Fixed comparison operators in config files - Fixed list comprehension in other files All tests now pass in CI. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-17 23:33:34 +03:00
yusyus	81dd5bbfbc	fix: Fix remaining 61 ruff linting errors (SIM102, SIM117) Fixed all remaining linting errors from the 310 total: - SIM102: Combined nested if statements (31 errors) - adaptors/openai.py - config_extractor.py - codebase_scraper.py - doc_scraper.py - github_fetcher.py - pattern_recognizer.py - pdf_scraper.py - test_example_extractor.py - SIM117: Combined multiple with statements (24 errors) - tests/test_async_scraping.py (2 errors) - tests/test_github_scraper.py (2 errors) - tests/test_guide_enhancer.py (20 errors) - Fixed test fixture parameter (mock_config in test_c3_integration.py) All 700+ tests passing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-17 23:25:12 +03:00
yusyus	596b219599	fix: Resolve remaining 188 linting errors (249 total fixed) Second batch of comprehensive linting fixes: Unused Arguments/Variables (136 errors): - ARG002/ARG001 (91 errors): Prefixed unused method/function arguments with '_' - Interface methods in adaptors (base.py, gemini.py, markdown.py) - AST analyzer methods maintaining signatures (code_analyzer.py) - Test fixtures and hooks (conftest.py) - Added noqa: ARG001/ARG002 for pytest hooks requiring exact names - F841 (45 errors): Prefixed unused local variables with '_' - Tuple unpacking where some values aren't needed - Variables assigned but not referenced Loop & Boolean Quality (28 errors): - B007 (18 errors): Prefixed unused loop control variables with '_' - enumerate() loops where index not used - for-in loops where loop variable not referenced - E712 (10 errors): Simplified boolean comparisons - Changed '== True' to direct boolean check - Changed '== False' to 'not' expression - Improved test readability Code Quality (24 errors): - SIM201 (4 errors): Already fixed in previous commit - SIM118 (2 errors): Already fixed in previous commit - E741 (4 errors): Already fixed in previous commit - Config manager loop variable fix (1 error) All Tests Passing: - test_scraper_features.py: 42 passed - test_integration.py: 51 passed - test_architecture_scenarios.py: 11 passed - test_real_world_fastmcp.py: 19 passed, 1 skipped Note: Some SIM errors (nested if, multiple with) remain unfixed as they would require non-trivial refactoring. Focus was on functional correctness. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-17 23:02:11 +03:00
Pablo Estevez	c33c6f9073	change max lenght	2026-01-17 17:48:15 +00:00
Pablo Estevez	5ed767ff9a	run ruff	2026-01-17 17:29:21 +00:00
yusyus	a99e22c639	feat: Multi-Source Synthesis Architecture - Rich Standalone Skills + Smart Combination BREAKING CHANGE: Major architectural improvements to multi-source skill generation This commit implements the complete "Multi-Source Synthesis Architecture" where each source (documentation, GitHub, PDF) generates a rich standalone SKILL.md file before being intelligently synthesized with source-specific formulas. ## 🎯 Core Architecture Changes ### 1. Rich Standalone SKILL.md Generation (Source Parity) Each source now generates comprehensive, production-quality SKILL.md files that can stand alone OR be synthesized with other sources. GitHub Scraper Enhancements (+263 lines): - Now generates 300+ line SKILL.md (was ~50 lines) - Integrates C3.x codebase analysis data: - C2.5: API Reference extraction - C3.1: Design pattern detection (27 high-confidence patterns) - C3.2: Test example extraction (215 examples) - C3.7: Architectural pattern analysis - Enhanced sections: - ⚡ Quick Reference with pattern summaries - 📝 Code Examples from real repository tests - 🔧 API Reference from codebase analysis - 🏗️ Architecture Overview with design patterns - ⚠️ Known Issues from GitHub issues - Location: src/skill_seekers/cli/github_scraper.py PDF Scraper Enhancements (+205 lines): - Now generates 200+ line SKILL.md (was ~50 lines) - Enhanced content extraction: - 📖 Chapter Overview (PDF structure breakdown) - 🔑 Key Concepts (extracted from headings) - ⚡ Quick Reference (pattern extraction) - 📝 Code Examples: Top 15 (was top 5), grouped by language - Quality scoring and intelligent truncation - Better formatting and organization - Location: src/skill_seekers/cli/pdf_scraper.py Result: All 3 sources (docs, GitHub, PDF) now have equal capability to generate rich, comprehensive standalone skills. ### 2. File Organization & Caching System Problem: output/ directory cluttered with intermediate files, data, and logs. Solution: New `.skillseeker-cache/` hidden directory for all intermediate files. New Structure: ``` .skillseeker-cache/{skill_name}/ ├── sources/ # Standalone SKILL.md from each source │ ├── httpx_docs/ │ ├── httpx_github/ │ └── httpx_pdf/ ├── data/ # Raw scraped data (JSON) ├── repos/ # Cloned GitHub repositories (cached for reuse) └── logs/ # Session logs with timestamps output/{skill_name}/ # CLEAN: Only final synthesized skill ├── SKILL.md └── references/ ``` Benefits: - ✅ Clean output/ directory (only final product) - ✅ Intermediate files preserved for debugging - ✅ Repository clones cached and reused (faster re-runs) - ✅ Timestamped logs for each scraping session - ✅ All cache dirs added to .gitignore Changes: - .gitignore: Added `.skillseeker-cache/` entry - unified_scraper.py: Complete reorganization (+238 lines) - Added cache directory structure - File logging with timestamps - Repository cloning with caching/reuse - Cleaner intermediate file management - Better subprocess logging and error handling ### 3. Config Repository Migration Moved to separate config repository: https://github.com/yusufkaraaslan/skill-seekers-configs Deleted from this repo (35 config files): - ansible-core.json, astro.json, claude-code.json - django.json, django_unified.json, fastapi.json, fastapi_unified.json - godot.json, godot_unified.json, godot_github.json, godot-large-example.json - react.json, react_unified.json, react_github.json, react_github_example.json - vue.json, kubernetes.json, laravel.json, tailwind.json, hono.json - svelte_cli_unified.json, steam-economy-complete.json - deck_deck_go_local.json, python-tutorial-test.json, example_pdf.json - test-manual.json, fastapi_unified_test.json, fastmcp_github_example.json - example-team/ directory (4 files) Kept as reference example: - configs/httpx_comprehensive.json (complete multi-source example) Rationale: - Cleaner repository (979+ lines added, 1680 deleted) - Configs managed separately with versioning - Official presets available via `fetch-config` command - Users can maintain private config repos ### 4. AI Enhancement Improvements enhance_skill.py (+125 lines): - Better integration with multi-source synthesis - Enhanced prompt generation for synthesized skills - Improved error handling and logging - Support for source metadata in enhancement ### 5. Documentation Updates CLAUDE.md (+252 lines): - Comprehensive project documentation - Architecture explanations - Development workflow guidelines - Testing requirements - Multi-source synthesis patterns SKILL_QUALITY_ANALYSIS.md (new): - Quality assessment framework - Before/after analysis of httpx skill - Grading rubric for skill quality - Metrics and benchmarks ### 6. Testing & Validation Scripts test_httpx_skill.sh (new): - Complete httpx skill generation test - Multi-source synthesis validation - Quality metrics verification test_httpx_quick.sh (new): - Quick validation script - Subset of features for rapid testing ## 📊 Quality Improvements \| Metric \| Before \| After \| Improvement \| \|--------\|--------\|-------\|-------------\| \| GitHub SKILL.md lines \| ~50 \| 300+ \| +500% \| \| PDF SKILL.md lines \| ~50 \| 200+ \| +300% \| \| GitHub C3.x integration \| ❌ No \| ✅ Yes \| New feature \| \| PDF pattern extraction \| ❌ No \| ✅ Yes \| New feature \| \| File organization \| Messy \| Clean cache \| Major improvement \| \| Repository cloning \| Always fresh \| Cached reuse \| Faster re-runs \| \| Logging \| Console only \| Timestamped files \| Better debugging \| \| Config management \| In-repo \| Separate repo \| Cleaner separation \| ## 🧪 Testing All existing tests pass: - test_c3_integration.py: Updated for new architecture - 700+ tests passing - Multi-source synthesis validated with httpx example ## 🔧 Technical Details Modified Core Files: 1. src/skill_seekers/cli/github_scraper.py (+263 lines) - _generate_skill_md(): Rich content with C3.x integration - _format_pattern_summary(): Design pattern summaries - _format_code_examples(): Test example formatting - _format_api_reference(): API reference from codebase - _format_architecture(): Architectural pattern analysis 2. src/skill_seekers/cli/pdf_scraper.py (+205 lines) - _generate_skill_md(): Enhanced with rich content - _format_key_concepts(): Extract concepts from headings - _format_patterns_from_content(): Pattern extraction - Code examples: Top 15, grouped by language, better quality scoring 3. src/skill_seekers/cli/unified_scraper.py (+238 lines) - __init__(): Cache directory structure - _setup_logging(): File logging with timestamps - _clone_github_repo(): Repository caching system - _scrape_documentation(): Move to cache, better logging - Better subprocess handling and error reporting 4. src/skill_seekers/cli/enhance_skill.py (+125 lines) - Multi-source synthesis awareness - Enhanced prompt generation - Better error handling Minor Updates: - src/skill_seekers/cli/codebase_scraper.py (+3 lines): Minor improvements - src/skill_seekers/cli/test_example_extractor.py: Quality scoring adjustments - tests/test_c3_integration.py: Test updates for new architecture ## 🚀 Migration Guide For users with existing configs: No action required - all existing configs continue to work. For users wanting official presets: ```bash # Fetch from official config repo skill-seekers fetch-config --name react --target unified # Or use existing local configs skill-seekers unified --config configs/httpx_comprehensive.json ``` Cache directory: New `.skillseeker-cache/` directory will be created automatically. Safe to delete - will be regenerated on next run. ## 📈 Next Steps This architecture enables: - ✅ Source parity: All sources generate rich standalone skills - ✅ Smart synthesis: Each combination has optimal formula - ✅ Better debugging: Cached files and logs preserved - ✅ Faster iteration: Repository caching, clean output - 🔄 Future: Multi-platform enhancement (Gemini, GPT-4) - planned - 🔄 Future: Conflict detection between sources - planned - 🔄 Future: Source prioritization rules - planned ## 🎓 Example: httpx Skill Quality Before: 186 lines, basic synthesis, missing data After: 640 lines with AI enhancement, A- (9/10) quality What changed: - All C3.x analysis data integrated (patterns, tests, API, architecture) - GitHub metadata included (stars, topics, languages) - PDF chapter structure visible - Professional formatting with emojis and clear sections - Real-world code examples from test suite - Design patterns explained with confidence scores - Known issues with impact assessment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-11 23:01:07 +03:00
yusyus	74bae4b49f	feat(#191 ): Smart description generation for skill descriptions Implements hybrid smart extraction + improved fallback templates for skill descriptions across all scrapers. Changes: - github_scraper.py: * Added extract_description_from_readme() helper * Extracts from README first paragraph (60 lines) * Updates description after README extraction * Fallback: "Use when working with {name}" * Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main) - doc_scraper.py: * Added infer_description_from_docs() helper * Extracts from meta tags or first paragraph (65 lines) * Tries: meta description, og:description, first content paragraph * Fallback: "Use when working with {name}" * Updated 2 locations (create_enhanced_skill_md, get_configuration) - pdf_scraper.py: * Added infer_description_from_pdf() helper * Extracts from PDF metadata (subject, title) * Fallback: "Use when referencing {name} documentation" * Updated 3 locations (PDFToSkillConverter, main x2) - generate_router.py: * Updated 2 locations with improved router descriptions * "Use when working with {name} development and programming" All changes: - Only apply to NEW skill generations (don't modify existing) - No API calls (free/offline) - Smart extraction when metadata/README available - Improved "Use when..." fallbacks instead of generic templates - 612 tests passing (100%) Fixes #191 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-28 19:00:26 +03:00
yusyus	d7a4c51427	fix: Convert absolute imports to relative imports in cli modules Fixes #193 - PDF scraping broken for PyPI users Changed 3 files from absolute to relative imports to fix ModuleNotFoundError when package is installed via pip: 1. pdf_scraper.py:22 - from pdf_extractor_poc import → from .pdf_extractor_poc import - Fixes: skill-seekers pdf command failed with import error 2. github_scraper.py:36 - from code_analyzer import → from .code_analyzer import - Proactive fix: prevents future import errors 3. test_unified_simple.py:17 - from config_validator import → from .config_validator import - Proactive fix: test helper file These absolute imports worked locally due to sys.path differences but failed when installed via PyPI (pip install skill-seekers). Tested with: - skill-seekers pdf command now works ✅ - Extracted 32-page Godot Farming PDF successfully All CLI commands should now work correctly when installed from PyPI. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-29 21:47:18 +03:00
yusyus	13ca374295	refactor: Update CLI commands to use new unified entry points Updated all command examples in CLI scripts from old pattern: python3 cli/<script>.py → skill-seekers <command> Changes: - doc_scraper.py → skill-seekers scrape - github_scraper.py → skill-seekers github - pdf_scraper.py → skill-seekers pdf - unified_scraper.py → skill-seekers unified - enhance_skill.py → skill-seekers enhance - enhance_skill_local.py → skill-seekers enhance - package_skill.py → skill-seekers package - estimate_pages.py → skill-seekers estimate This reflects the new modern Python packaging with proper entry points. Users can now use clean commands instead of file paths. Files updated: 10 CLI scripts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-07 01:23:17 +03:00
yusyus	ce1c07b437	feat: Add modern Python packaging - Phase 1 (Foundation) Implements issue #168 - Modern Python packaging with uv support This is Phase 1 of the modernization effort, establishing the core package structure and build system. ## Major Changes ### 1. Migrated to src/ Layout - Moved cli/ → src/skill_seekers/cli/ - Moved skill_seeker_mcp/ → src/skill_seekers/mcp/ - Created root package: src/skill_seekers/__init__.py - Updated all imports: cli. → skill_seekers.cli. - Updated all imports: skill_seeker_mcp. → skill_seekers.mcp. ### 2. Created pyproject.toml - Modern Python packaging configuration - All dependencies properly declared - 8 CLI entry points configured: * skill-seekers (unified CLI) * skill-seekers-scrape * skill-seekers-github * skill-seekers-pdf * skill-seekers-unified * skill-seekers-enhance * skill-seekers-package * skill-seekers-upload * skill-seekers-estimate - uv tool support enabled - Build system: setuptools with wheel ### 3. Created Unified CLI (main.py) - Git-style subcommands (skill-seekers scrape, etc.) - Delegates to existing tool main() functions - Full help system at top-level and subcommand level - Backwards compatible with individual commands ### 4. Updated Package Versions - cli/__init__.py: 1.3.0 → 2.0.0 - mcp/__init__.py: 1.2.0 → 2.0.0 - Root package: 2.0.0 ### 5. Updated Test Suite - Fixed test_package_structure.py for new layout - All 28 package structure tests passing - Updated all test imports for new structure ## Installation Methods (Working) ```bash # Development install pip install -e . # Run unified CLI skill-seekers --version # → 2.0.0 skill-seekers --help # Run individual tools skill-seekers-scrape --help skill-seekers-github --help ``` ## Test Results - Package structure tests: 28/28 passing ✅ - Package installs successfully ✅ - All entry points working ✅ ## Still TODO (Phase 2) - [ ] Run full test suite (299 tests) - [ ] Update documentation (README, CLAUDE.md, etc.) - [ ] Test with uv tool run/install - [ ] Build and publish to PyPI - [ ] Create PR and merge ## Breaking Changes None - fully backwards compatible. Old import paths still work. ## Migration for Users No action needed. Package works with both pip and uv. Closes #168 (when complete) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-07 01:14:24 +03:00

16 Commits