Commit Graph

11 Commits

Author SHA1 Message Date
yusyus
74bae4b49f feat(#191): Smart description generation for skill descriptions
Implements hybrid smart extraction + improved fallback templates for
skill descriptions across all scrapers.

Changes:
- github_scraper.py:
  * Added extract_description_from_readme() helper
  * Extracts from README first paragraph (60 lines)
  * Updates description after README extraction
  * Fallback: "Use when working with {name}"
  * Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main)

- doc_scraper.py:
  * Added infer_description_from_docs() helper
  * Extracts from meta tags or first paragraph (65 lines)
  * Tries: meta description, og:description, first content paragraph
  * Fallback: "Use when working with {name}"
  * Updated 2 locations (create_enhanced_skill_md, get_configuration)

- pdf_scraper.py:
  * Added infer_description_from_pdf() helper
  * Extracts from PDF metadata (subject, title)
  * Fallback: "Use when referencing {name} documentation"
  * Updated 3 locations (PDFToSkillConverter, main x2)

- generate_router.py:
  * Updated 2 locations with improved router descriptions
  * "Use when working with {name} development and programming"

All changes:
- Only apply to NEW skill generations (don't modify existing)
- No API calls (free/offline)
- Smart extraction when metadata/README available
- Improved "Use when..." fallbacks instead of generic templates
- 612 tests passing (100%)
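
The extract-then-fallback behaviour described above might look roughly like the sketch below. The function name, the skip rules for headings and badges, and the `max_len` cap are assumptions for illustration; this is not the helper added in github_scraper.py.

```python
def sketch_description_from_readme(readme_text: str, name: str, max_len: int = 200) -> str:
    """Return the first prose paragraph of a README, or a fallback template.

    Hypothetical helper: the skip rules and max_len are illustrative assumptions.
    """
    paragraph: list[str] = []
    for line in readme_text.splitlines():
        stripped = line.strip()
        # Headings, badges/images, and raw HTML are not description text.
        is_markup = stripped.startswith(("#", "!", "[!", "<"))
        if not stripped or is_markup:
            if paragraph:
                break  # the first paragraph has ended
            continue
        paragraph.append(stripped)

    text = " ".join(paragraph)
    if not text:
        # Fallback template quoted in the commit message.
        return f"Use when working with {name}"
    return text[:max_len].rstrip()
```

Called with a README that contains only a heading, the sketch returns the "Use when working with {name}" fallback.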

Fixes #191

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 19:00:26 +03:00
yusyus
c411eb24ec fix: Add UTF-8 encoding to all file operations for Windows compatibility
Fixes #209 - UnicodeDecodeError on Windows with non-ASCII characters

**Problem:**
Windows users with non-English locales (Chinese, Japanese, Korean, etc.)
experienced GBK/SHIFT-JIS codec errors when the system default encoding
is not UTF-8.

Error: 'gbk' codec can't decode byte 0xac in position 206: illegal
multibyte sequence

**Root Cause:**
File operations using open() without explicit encoding parameter use
the system default encoding, which on Windows Chinese edition is GBK.
JSON files contain UTF-8 encoded characters that fail to decode with GBK.

**Solution:**
Added encoding='utf-8' to ALL file operations across:
- doc_scraper.py (4 instances):
  * load_config() - line 1310
  * check_existing_data() - line 1416
  * save_checkpoint() - line 173
  * load_checkpoint() - line 186

- github_scraper.py (1 instance):
  * main() config loading - line 922

- unified_scraper.py (10 instances):
  * All JSON read/write operations - lines 134, 153, 205, 239, 275,
    278, 325, 328, 342, 364
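
A minimal sketch of the pattern, assuming plain JSON config files. The helper names here are hypothetical and are not the functions listed above; the point is only that `encoding="utf-8"` is passed on every `open()` call.

```python
import json
import tempfile
from pathlib import Path


def write_json_utf8(path: Path, data: dict) -> None:
    # Explicit encoding: do not rely on the platform default (e.g. GBK on Windows).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def read_json_utf8(path: Path) -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def round_trip(data: dict) -> dict:
    """Write and re-read a config in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        write_json_utf8(path, data)
        return read_json_utf8(path)
```

With the encoding fixed on both the write and the read side, non-ASCII values such as `{"name": "文档"}` round-trip regardless of the system locale.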

**Test Results:**
- All 612 tests passing (100% pass rate)
- Backward compatible (UTF-8 is standard on Linux/macOS)
- Fixes Windows locale issues

**Impact:**
- Works on ALL Windows locales (Chinese, Japanese, Korean, etc.)
- Maintains compatibility with Linux/macOS
- Prevents future encoding issues

**Thanks to:** @my5icol for the detailed bug report and fix suggestion!
2025-12-28 18:27:50 +03:00
yusyus
eb3b9d9175 fix: Add robust CHANGELOG encoding handling and enhancement flags
Fixes #219 - Two issues resolved:

1. **Encoding Error Fix:**
   - Added graceful error handling for CHANGELOG extraction
   - Handles 'unsupported encoding: none' error from GitHub API
   - Falls back to latin-1 encoding if UTF-8 fails
   - Logs warnings instead of crashing
   - Continues processing even if CHANGELOG has encoding issues

2. **Enhancement Flags Added:**
   - Added --enhance-local flag to github command
   - Added --enhance flag for API-based enhancement
   - Added --api-key flag for API authentication
   - Auto-enhancement after skill building when flags used
   - Matches doc_scraper.py functionality
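
The encoding fallback in item 1 can be sketched as follows, assuming the CHANGELOG content is available as raw bytes. The function name and log message are hypothetical, not taken from github_scraper.py.

```python
import logging

logger = logging.getLogger(__name__)


def decode_with_fallback(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin-1 instead of raising."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        logger.warning("CHANGELOG is not valid UTF-8; falling back to latin-1")
        # latin-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode("latin-1")
```

Because latin-1 never fails, the fallback may produce wrong characters for other encodings, but processing continues, which matches the "log warnings instead of crashing" goal.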

**Test Results:**
- All 612 tests passing (100% pass rate)
- All 22 github_scraper tests passing
- Backward compatible

**Usage:**
```bash
# Local enhancement (no API key needed)
skill-seekers github --repo ccxt/ccxt --name ccxtSkills --enhance-local

# API-based enhancement
skill-seekers github --repo owner/repo --enhance --api-key sk-ant-...
```
2025-12-28 18:21:03 +03:00
yusyus
65ded6c07c fix: Fix local repo extraction limitations (code analyzer, exclusions, enhancement)
This commit fixes three critical limitations discovered during local repository skill extraction testing:

**Fix 1: Code Analyzer Import Issue**
- Changed unified_scraper.py to use fully qualified absolute imports instead of bare module imports
- Fixed: `from github_scraper import` → `from skill_seekers.cli.github_scraper import`
- Fixed: `from pdf_scraper import` → `from skill_seekers.cli.pdf_scraper import`
- Result: CodeAnalyzer now available during extraction, deep analysis works

**Fix 2: Unity Library Exclusions**
- Updated should_exclude_dir() to accept and check full directory paths
- Updated _extract_file_tree_local() to pass both dir name and full path
- Added exclusion config passing from unified_scraper to github_scraper
- Result: exclude_dirs_additional now works (297 files excluded in test)

**Fix 3: AI Enhancement for Single Sources**
- Changed read_reference_files() to use rglob() for recursive search
- Now finds reference files in subdirectories (e.g., references/github/README.md)
- Result: AI enhancement works with unified skills that have nested references
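
For Fix 3, the difference between a top-level search and a recursive one can be sketched like this. The function name and the `*.md` pattern are assumptions; the actual read_reference_files() in utils.py may differ.

```python
import tempfile
from pathlib import Path


def find_reference_files(skill_dir: Path) -> list[Path]:
    """Collect Markdown reference files at any depth under references/."""
    references = skill_dir / "references"
    if not references.is_dir():
        return []
    # rglob descends into subdirectories; glob("*.md") would only see the top level.
    return sorted(references.rglob("*.md"))


def demo() -> list[str]:
    """Build a small nested layout and list what the recursive search finds."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp)
        nested = skill_dir / "references" / "github"
        nested.mkdir(parents=True)
        (nested / "README.md").write_text("# nested\n", encoding="utf-8")
        (skill_dir / "references" / "index.md").write_text("# top\n", encoding="utf-8")
        return [p.relative_to(skill_dir).as_posix() for p in find_reference_files(skill_dir)]
```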

**Test Results:**
- Code Analyzer: Working (deep analysis running)
- Unity Exclusions: Working (297 files excluded from 679)
- AI Enhancement: Working (finds and reads nested references)

**Files Changed:**
- src/skill_seekers/cli/unified_scraper.py (Fix 1 & 2)
- src/skill_seekers/cli/github_scraper.py (Fix 2)
- src/skill_seekers/cli/utils.py (Fix 3)

**Test Artifacts:**
- configs/deck_deck_go_local.json (test configuration)
- docs/LOCAL_REPO_TEST_RESULTS.md (comprehensive test report)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 22:24:38 +03:00
yusyus
ea289cebe1 feat: Make EXCLUDED_DIRS configurable for local repository analysis
Closes #203

Adds configuration options to customize directory exclusions during local
repository analysis, while maintaining backward compatibility with smart
defaults.

**New Config Options:**

1. `exclude_dirs_additional` - Extend defaults (most common)
   - Adds custom directories to default exclusions
   - Example: ["proprietary", "legacy", "third_party"]
   - Total exclusions = defaults + additional

2. `exclude_dirs` - Replace defaults (advanced users)
   - Completely overrides default exclusions
   - Example: ["node_modules", ".git", "custom_vendor"]
   - Gives full control over exclusions

**Implementation:**

- Modified GitHubScraper.__init__() to parse exclude_dirs config
- Changed should_exclude_dir() to use instance variable instead of global
- Added logging for custom exclusions (INFO for extend, WARNING for replace)
- Maintains backward compatibility (no config = use defaults)
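
A rough sketch of how the two options could be resolved into one exclusion set. The default set below is illustrative only, and letting `exclude_dirs` take precedence over `exclude_dirs_additional` is an assumption; the commit does not state which option wins.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative defaults only; the project's real default set is not shown here.
DEFAULT_EXCLUDED_DIRS = {".git", "node_modules", "venv", "__pycache__"}


def resolve_excluded_dirs(config: dict) -> set[str]:
    """Return the directory names to skip for a given config (hypothetical helper)."""
    if "exclude_dirs" in config:
        # Replace mode: defaults are dropped entirely (assumed to take precedence).
        logger.warning("Replacing default directory exclusions")
        return set(config["exclude_dirs"])
    additional = config.get("exclude_dirs_additional")
    if additional:
        # Extend mode: defaults plus the user's additions.
        logger.info("Extending default directory exclusions with %s", additional)
        return DEFAULT_EXCLUDED_DIRS | set(additional)
    # No config: keep the defaults unchanged (backward compatible).
    return set(DEFAULT_EXCLUDED_DIRS)
```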

**Testing:**

- Added 12 comprehensive tests in test_excluded_dirs_config.py
  - 3 tests for defaults (backward compatibility)
  - 3 tests for extend mode
  - 3 tests for replace mode
  - 1 test for precedence
  - 2 tests for edge cases
- All 12 new tests passing
- All 22 existing github_scraper tests passing

**Documentation:**

- Updated CLAUDE.md config parameters section
- Added detailed "Configurable Directory Exclusions" feature section
- Included examples for both modes
- Listed common use cases (monorepos, enterprise, legacy codebases)

**Use Cases:**

- Monorepos with custom directory structures
- Enterprise projects with non-standard naming conventions
- Including unusual directories for analysis
- Minimal exclusions for small/simple projects

**Backward Compatibility:**

- Fully backward compatible: existing configs work unchanged
- Smart defaults maintained when no config provided
- All existing tests pass

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 23:53:27 +03:00
yusyus
58ec69eb52 feat: Add unlimited local repository analysis with bug fixes (PR #195)
Merges PR #195 by @jimmy058910 with conflict resolution.

**New Features:**
- Local repository analysis via `local_repo_path` configuration
- Bypass GitHub API rate limits (50 → unlimited files)
- Auto-exclusion of virtual environments and build artifacts
- Support for analyzing large codebases (323 files vs 50 before)

**Improvements:**
- Code analysis coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST parsing errors: 95 → 0 (-100%)

**Conflict Resolution:**
- Preserved logger initialization fix from development (Issue #190)
- Kept relative imports from development (Task 1.2 fix)
- Integrated EXCLUDED_DIRS and local repo features from PR
- Combined best of both implementations

**Testing:**
- All 22 GitHub scraper tests passing
- Syntax validation passed
- Local repo analysis feature intact
- Bug fixes from development preserved

Original implementation by @jimmy058910 in PR #195.
Conflict resolution preserves all bug fixes while adding local repo feature.

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 22:46:31 +03:00
yusyus
414519b3c7 fix: Initialize logger before use in github_scraper.py
Fixes Issue #190 - "name 'logger' is not defined" error

**Problem:**
- Logger was used at line 40 (in code_analyzer import exception)
- Logger was defined at line 47
- Caused runtime error when code_analyzer import failed

**Solution:**
- Moved logging.basicConfig() and logger initialization to lines 34-39
- Now logger is defined BEFORE the code_analyzer import block
- Warning message now works correctly when code_analyzer is missing
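
The ordering fix can be illustrated with a short sketch. The module name below is a deliberately nonexistent stand-in for the optional code_analyzer import, and the warning text is hypothetical.

```python
import logging

# Configure logging first so the except block below can safely use `logger`.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    # Stand-in for the optional dependency; this module is assumed not to exist.
    import _hypothetical_code_analyzer_stub  # noqa: F401
    ANALYZER_AVAILABLE = True
except ImportError:
    ANALYZER_AVAILABLE = False
    # Works only because `logger` was defined above the try block.
    logger.warning("Code analyzer not available; deep analysis disabled")
```

If the two logging lines were moved below the `try` block, the `except` branch would raise `NameError: name 'logger' is not defined`, which is the error reported in the issue.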

**Testing:**
- All 22 GitHub scraper tests pass
- Logger warning appears correctly when code_analyzer missing
- No similar issues found in other CLI files

Closes #190
2025-11-29 22:01:38 +03:00
yusyus
d7a4c51427 fix: Convert absolute imports to relative imports in cli modules
Fixes #193 - PDF scraping broken for PyPI users

Changed 3 files from absolute to relative imports to fix
ModuleNotFoundError when package is installed via pip:

1. pdf_scraper.py:22
   - from pdf_extractor_poc import → from .pdf_extractor_poc import
   - Fixes: skill-seekers pdf command failed with import error

2. github_scraper.py:36
   - from code_analyzer import → from .code_analyzer import
   - Proactive fix: prevents future import errors

3. test_unified_simple.py:17
   - from config_validator import → from .config_validator import
   - Proactive fix: test helper file

These absolute imports worked locally due to sys.path differences
but failed when installed via PyPI (pip install skill-seekers).
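
To illustrate why the relative form works once the code is imported as a package, the sketch below builds a throwaway package on disk and imports it. The package and module names (`demo_pkg`, `helper`, `tool`) are hypothetical and unrelated to the real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a tiny package whose module uses a relative import."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "helper.py").write_text("VALUE = 42\n", encoding="utf-8")
    # `from .helper import` resolves inside demo_pkg, wherever the package is installed.
    (pkg / "tool.py").write_text("from .helper import VALUE\n", encoding="utf-8")


def load_value() -> int:
    """Import demo_pkg.tool from a temporary directory and return its VALUE."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            module = importlib.import_module("demo_pkg.tool")
            return module.VALUE
        finally:
            sys.path.remove(tmp)
            # Drop cached modules so repeated calls start clean.
            for name in [n for n in sys.modules if n.startswith("demo_pkg")]:
                del sys.modules[name]
```

Only the directory that contains `demo_pkg` is on `sys.path` here, as with a pip install. A bare `from helper import VALUE` in tool.py would need the package directory itself on `sys.path`, which is typically true only when the script is run directly from that directory.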

Tested with:
- skill-seekers pdf command now works
- Extracted 32-page Godot Farming PDF successfully

All CLI commands should now work correctly when installed from PyPI.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 21:47:18 +03:00
Jimmy Moceri
0b2a0d121e feat: Add unlimited local repository analysis and fix 10 critical bugs
Features:
- Add local_repo_path config parameter for unlimited file analysis
- Auto-exclude virtual environments and build artifacts (95% noise reduction)
- Enable comprehensive codebase analysis (50 → 323 files, 546% increase)

Bug Fixes:
- Fix logger initialization error (Issue #190)
- Fix NoneType subscriptable errors in release tag parsing (3 instances)
- Fix relative import paths causing ModuleNotFoundError
- Fix hardcoded 50-file analysis limit
- Fix GitHub API file tree limitation (140 → 345 files discovered)
- Fix AST parser 'not iterable' errors (95 → 0 parsing failures)
- Fix virtual environment file pollution (23,341 → 1,109 file tree items)
- Fix force_rescrape flag not checked before interactive prompt

Impact:
- Code coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST errors: 95 → 0 (-100%)

Tested on JMo Security repository with 345 Python files.
2025-11-16 22:35:23 -05:00
yusyus
13ca374295 refactor: Update CLI commands to use new unified entry points
Updated all command examples in CLI scripts from old pattern:
  python3 cli/<script>.py → skill-seekers <command>

Changes:
- doc_scraper.py → skill-seekers scrape
- github_scraper.py → skill-seekers github
- pdf_scraper.py → skill-seekers pdf
- unified_scraper.py → skill-seekers unified
- enhance_skill.py → skill-seekers enhance
- enhance_skill_local.py → skill-seekers enhance
- package_skill.py → skill-seekers package
- estimate_pages.py → skill-seekers estimate

This reflects the new modern Python packaging with proper entry
points. Users can now use clean commands instead of file paths.

Files updated: 10 CLI scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:23:17 +03:00
yusyus
ce1c07b437 feat: Add modern Python packaging - Phase 1 (Foundation)
Implements issue #168 - Modern Python packaging with uv support

This is Phase 1 of the modernization effort, establishing the core
package structure and build system.

## Major Changes

### 1. Migrated to src/ Layout
- Moved cli/ → src/skill_seekers/cli/
- Moved skill_seeker_mcp/ → src/skill_seekers/mcp/
- Created root package: src/skill_seekers/__init__.py
- Updated all imports: cli. → skill_seekers.cli.
- Updated all imports: skill_seeker_mcp. → skill_seekers.mcp.

### 2. Created pyproject.toml
- Modern Python packaging configuration
- All dependencies properly declared
- 9 CLI entry points configured:
  * skill-seekers (unified CLI)
  * skill-seekers-scrape
  * skill-seekers-github
  * skill-seekers-pdf
  * skill-seekers-unified
  * skill-seekers-enhance
  * skill-seekers-package
  * skill-seekers-upload
  * skill-seekers-estimate
- uv tool support enabled
- Build system: setuptools with wheel

### 3. Created Unified CLI (main.py)
- Git-style subcommands (skill-seekers scrape, etc.)
- Delegates to existing tool main() functions
- Full help system at top-level and subcommand level
- Backwards compatible with individual commands
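
A minimal sketch of git-style subcommand dispatch with `argparse`, assuming each subcommand delegates to a handler function. The handlers here are placeholders that return strings, and the `--config` flag is hypothetical; this is not the contents of main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skill-seekers")
    subparsers = parser.add_subparsers(dest="command", required=True)

    scrape = subparsers.add_parser("scrape", help="Scrape documentation")
    scrape.add_argument("--config")  # hypothetical flag, for illustration only

    github = subparsers.add_parser("github", help="Scrape a GitHub repository")
    github.add_argument("--repo")
    return parser


def dispatch(argv: list[str]) -> str:
    """Parse argv and delegate to the matching placeholder handler."""
    args = build_parser().parse_args(argv)
    # Placeholder handlers; a real CLI would call each tool's main() instead.
    handlers = {
        "scrape": lambda a: f"scrape config={a.config}",
        "github": lambda a: f"github repo={a.repo}",
    }
    return handlers[args.command](args)
```

Each subparser gets its own `--help` output automatically, which is one way to provide help at both the top level and the subcommand level.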

### 4. Updated Package Versions
- cli/__init__.py: 1.3.0 → 2.0.0
- mcp/__init__.py: 1.2.0 → 2.0.0
- Root package: 2.0.0

### 5. Updated Test Suite
- Fixed test_package_structure.py for new layout
- All 28 package structure tests passing
- Updated all test imports for new structure

## Installation Methods (Working)

```bash
# Development install
pip install -e .

# Run unified CLI
skill-seekers --version  # → 2.0.0
skill-seekers --help

# Run individual tools
skill-seekers-scrape --help
skill-seekers-github --help
```

## Test Results
- Package structure tests: 28/28 passing
- Package installs successfully
- All entry points working

## Still TODO (Phase 2)
- [ ] Run full test suite (299 tests)
- [ ] Update documentation (README, CLAUDE.md, etc.)
- [ ] Test with uv tool run/install
- [ ] Build and publish to PyPI
- [ ] Create PR and merge

## Breaking Changes
None - fully backwards compatible. Old import paths still work.

## Migration for Users
No action needed. Package works with both pip and uv.

Closes #168 (when complete)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:14:24 +03:00