Commit Graph

18 Commits

Author SHA1 Message Date
Pablo Estevez
c33c6f9073 change max lenght 2026-01-17 17:48:15 +00:00
Pablo Estevez
5ed767ff9a run ruff 2026-01-17 17:29:21 +00:00
yusyus
c89f059712 feat(v2.7.0): Smart Rate Limit Management & Multi-Token Configuration
Major Features:
- Multi-profile GitHub token system with secure storage
- Smart rate limit handler with 4 strategies (prompt/wait/switch/fail)
- Interactive configuration wizard with browser integration
- Configurable timeout (default 30 min) per profile
- Automatic profile switching on rate limits
- Live countdown timers with real-time progress
- Non-interactive mode for CI/CD (--non-interactive flag)
- Progress tracking and resume capability (skeleton)
- Comprehensive test suite (16 tests, all passing)

Solves:
- Indefinite waiting on GitHub rate limits
- Confusing GitHub token setup

Files Added:
- src/skill_seekers/cli/config_manager.py (~490 lines)
- src/skill_seekers/cli/config_command.py (~400 lines)
- src/skill_seekers/cli/rate_limit_handler.py (~450 lines)
- src/skill_seekers/cli/resume_command.py (~150 lines)
- tests/test_rate_limit_handler.py (16 tests)

Files Modified:
- src/skill_seekers/cli/github_fetcher.py (rate limit integration)
- src/skill_seekers/cli/github_scraper.py (--non-interactive, --profile flags)
- src/skill_seekers/cli/main.py (config, resume subcommands)
- pyproject.toml (version 2.7.0)
- CHANGELOG.md, README.md, CLAUDE.md (documentation)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-17 18:38:31 +03:00
yusyus
a99e22c639 feat: Multi-Source Synthesis Architecture - Rich Standalone Skills + Smart Combination
BREAKING CHANGE: Major architectural improvements to multi-source skill generation

This commit implements the complete "Multi-Source Synthesis Architecture" where
each source (documentation, GitHub, PDF) generates a rich standalone SKILL.md
file before being intelligently synthesized with source-specific formulas.

## 🎯 Core Architecture Changes

### 1. Rich Standalone SKILL.md Generation (Source Parity)

Each source now generates comprehensive, production-quality SKILL.md files that
can stand alone OR be synthesized with other sources.

**GitHub Scraper Enhancements** (+263 lines):
- Now generates 300+ line SKILL.md (was ~50 lines)
- Integrates C3.x codebase analysis data:
  - C2.5: API Reference extraction
  - C3.1: Design pattern detection (27 high-confidence patterns)
  - C3.2: Test example extraction (215 examples)
  - C3.7: Architectural pattern analysis
- Enhanced sections:
  -  Quick Reference with pattern summaries
  - 📝 Code Examples from real repository tests
  - 🔧 API Reference from codebase analysis
  - 🏗️ Architecture Overview with design patterns
  - ⚠️ Known Issues from GitHub issues
- Location: src/skill_seekers/cli/github_scraper.py

**PDF Scraper Enhancements** (+205 lines):
- Now generates 200+ line SKILL.md (was ~50 lines)
- Enhanced content extraction:
  - 📖 Chapter Overview (PDF structure breakdown)
  - 🔑 Key Concepts (extracted from headings)
  -  Quick Reference (pattern extraction)
  - 📝 Code Examples: Top 15 (was top 5), grouped by language
  - Quality scoring and intelligent truncation
- Better formatting and organization
- Location: src/skill_seekers/cli/pdf_scraper.py

**Result**: All 3 sources (docs, GitHub, PDF) now have equal capability to
generate rich, comprehensive standalone skills.

### 2. File Organization & Caching System

**Problem**: output/ directory cluttered with intermediate files, data, and logs.

**Solution**: New `.skillseeker-cache/` hidden directory for all intermediate files.

**New Structure**:
```
.skillseeker-cache/{skill_name}/
├── sources/          # Standalone SKILL.md from each source
│   ├── httpx_docs/
│   ├── httpx_github/
│   └── httpx_pdf/
├── data/             # Raw scraped data (JSON)
├── repos/            # Cloned GitHub repositories (cached for reuse)
└── logs/             # Session logs with timestamps

output/{skill_name}/  # CLEAN: Only final synthesized skill
├── SKILL.md
└── references/
```

**Benefits**:
-  Clean output/ directory (only final product)
-  Intermediate files preserved for debugging
-  Repository clones cached and reused (faster re-runs)
-  Timestamped logs for each scraping session
-  All cache dirs added to .gitignore

**Changes**:
- .gitignore: Added `.skillseeker-cache/` entry
- unified_scraper.py: Complete reorganization (+238 lines)
  - Added cache directory structure
  - File logging with timestamps
  - Repository cloning with caching/reuse
  - Cleaner intermediate file management
  - Better subprocess logging and error handling

### 3. Config Repository Migration

**Moved to separate config repository**: https://github.com/yusufkaraaslan/skill-seekers-configs

**Deleted from this repo** (35 config files):
- ansible-core.json, astro.json, claude-code.json
- django.json, django_unified.json, fastapi.json, fastapi_unified.json
- godot.json, godot_unified.json, godot_github.json, godot-large-example.json
- react.json, react_unified.json, react_github.json, react_github_example.json
- vue.json, kubernetes.json, laravel.json, tailwind.json, hono.json
- svelte_cli_unified.json, steam-economy-complete.json
- deck_deck_go_local.json, python-tutorial-test.json, example_pdf.json
- test-manual.json, fastapi_unified_test.json, fastmcp_github_example.json
- example-team/ directory (4 files)

**Kept as reference example**:
- configs/httpx_comprehensive.json (complete multi-source example)

**Rationale**:
- Cleaner repository (979+ lines added, 1680 deleted)
- Configs managed separately with versioning
- Official presets available via `fetch-config` command
- Users can maintain private config repos

### 4. AI Enhancement Improvements

**enhance_skill.py** (+125 lines):
- Better integration with multi-source synthesis
- Enhanced prompt generation for synthesized skills
- Improved error handling and logging
- Support for source metadata in enhancement

### 5. Documentation Updates

**CLAUDE.md** (+252 lines):
- Comprehensive project documentation
- Architecture explanations
- Development workflow guidelines
- Testing requirements
- Multi-source synthesis patterns

**SKILL_QUALITY_ANALYSIS.md** (new):
- Quality assessment framework
- Before/after analysis of httpx skill
- Grading rubric for skill quality
- Metrics and benchmarks

### 6. Testing & Validation Scripts

**test_httpx_skill.sh** (new):
- Complete httpx skill generation test
- Multi-source synthesis validation
- Quality metrics verification

**test_httpx_quick.sh** (new):
- Quick validation script
- Subset of features for rapid testing

## 📊 Quality Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| GitHub SKILL.md lines | ~50 | 300+ | +500% |
| PDF SKILL.md lines | ~50 | 200+ | +300% |
| GitHub C3.x integration |  No |  Yes | New feature |
| PDF pattern extraction |  No |  Yes | New feature |
| File organization | Messy | Clean cache | Major improvement |
| Repository cloning | Always fresh | Cached reuse | Faster re-runs |
| Logging | Console only | Timestamped files | Better debugging |
| Config management | In-repo | Separate repo | Cleaner separation |

## 🧪 Testing

All existing tests pass:
- test_c3_integration.py: Updated for new architecture
- 700+ tests passing
- Multi-source synthesis validated with httpx example

## 🔧 Technical Details

**Modified Core Files**:
1. src/skill_seekers/cli/github_scraper.py (+263 lines)
   - _generate_skill_md(): Rich content with C3.x integration
   - _format_pattern_summary(): Design pattern summaries
   - _format_code_examples(): Test example formatting
   - _format_api_reference(): API reference from codebase
   - _format_architecture(): Architectural pattern analysis

2. src/skill_seekers/cli/pdf_scraper.py (+205 lines)
   - _generate_skill_md(): Enhanced with rich content
   - _format_key_concepts(): Extract concepts from headings
   - _format_patterns_from_content(): Pattern extraction
   - Code examples: Top 15, grouped by language, better quality scoring

3. src/skill_seekers/cli/unified_scraper.py (+238 lines)
   - __init__(): Cache directory structure
   - _setup_logging(): File logging with timestamps
   - _clone_github_repo(): Repository caching system
   - _scrape_documentation(): Move to cache, better logging
   - Better subprocess handling and error reporting

4. src/skill_seekers/cli/enhance_skill.py (+125 lines)
   - Multi-source synthesis awareness
   - Enhanced prompt generation
   - Better error handling

**Minor Updates**:
- src/skill_seekers/cli/codebase_scraper.py (+3 lines): Minor improvements
- src/skill_seekers/cli/test_example_extractor.py: Quality scoring adjustments
- tests/test_c3_integration.py: Test updates for new architecture

## 🚀 Migration Guide

**For users with existing configs**:
No action required - all existing configs continue to work.

**For users wanting official presets**:
```bash
# Fetch from official config repo
skill-seekers fetch-config --name react --target unified

# Or use existing local configs
skill-seekers unified --config configs/httpx_comprehensive.json
```

**Cache directory**:
New `.skillseeker-cache/` directory will be created automatically.
Safe to delete - will be regenerated on next run.

## 📈 Next Steps

This architecture enables:
-  Source parity: All sources generate rich standalone skills
-  Smart synthesis: Each combination has optimal formula
-  Better debugging: Cached files and logs preserved
-  Faster iteration: Repository caching, clean output
- 🔄 Future: Multi-platform enhancement (Gemini, GPT-4) - planned
- 🔄 Future: Conflict detection between sources - planned
- 🔄 Future: Source prioritization rules - planned

## 🎓 Example: httpx Skill Quality

**Before**: 186 lines, basic synthesis, missing data
**After**: 640 lines with AI enhancement, A- (9/10) quality

**What changed**:
- All C3.x analysis data integrated (patterns, tests, API, architecture)
- GitHub metadata included (stars, topics, languages)
- PDF chapter structure visible
- Professional formatting with emojis and clear sections
- Real-world code examples from test suite
- Design patterns explained with confidence scores
- Known issues with impact assessment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 23:01:07 +03:00
yusyus
eac1f4ef8e feat(C2.1): Add .gitignore support to github_scraper for local repos
- Add pathspec import with graceful fallback
- Add gitignore_spec attribute to GitHubScraper class
- Implement _load_gitignore() method to parse .gitignore files
- Update should_exclude_dir() to check .gitignore rules
- Load .gitignore automatically in local repository mode
- Handle directory patterns with and without trailing slash
- Add 4 comprehensive tests for .gitignore functionality

Closes #63 - C2.1 File Tree Walker with .gitignore support complete

Features:
- Loads .gitignore from local repository root
- Respects .gitignore patterns for directory exclusion
- Falls back gracefully when pathspec not installed
- Works alongside existing hard-coded exclusions
- Only active in local_repo_path mode (not GitHub API mode)

Test coverage:
- test_load_gitignore_exists: .gitignore parsing
- test_load_gitignore_missing: Missing .gitignore handling
- test_should_exclude_dir_with_gitignore: .gitignore exclusion
- test_should_exclude_dir_default_exclusions: Existing exclusions still work

Integration:
- github_scraper.py now has same .gitignore support as codebase_scraper.py
- Both tools use pathspec library for consistent behavior
- Enables proper repository analysis respecting project .gitignore rules
2026-01-01 23:21:12 +03:00
yusyus
f2faebb8d5 fix: Complete fix for Issue #219 - All three problems resolved
**Problem #1: Large File Encoding Error**  FIXED
- Add large file download support via download_url
- Detect encoding='none' for files >1MB
- Download via GitHub raw URL instead of API
- Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully

**Problem #2: Missing CLI Enhancement Flags**  FIXED
- Add --enhance, --enhance-local, --api-key to main.py github_parser
- Add flag forwarding in CLI dispatcher
- Fixes 'unrecognized arguments' error
- Users can now use: skill-seekers github --repo owner/repo --enhance-local

**Problem #3: Custom API Endpoint Support**  FIXED
- Support ANTHROPIC_BASE_URL environment variable
- Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY)
- Fix ThinkingBlock.text error with newer Anthropic SDK
- Find TextBlock in response content array (handles thinking blocks)

**Changes**:
- src/skill_seekers/cli/enhance_skill.py:
  - Support custom base_url parameter
  - Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN
  - Iterate through content blocks to find text (handles ThinkingBlock)

- src/skill_seekers/cli/main.py:
  - Add --enhance, --enhance-local, --api-key to github_parser
  - Forward flags to github_scraper.py in dispatcher

- src/skill_seekers/cli/github_scraper.py:
  - Add large file detection (encoding=None/"none")
  - Download via download_url with requests
  - Log file size and download progress

- tests/test_github_scraper.py:
  - Add test_get_file_content_large_file
  - Add test_extract_changelog_large_file
  - All 31 tests passing 

**Credits**:
- Thanks to @XGCoder for detailed bug report
- Thanks to @gorquan for local fixes and guidance

Fixes #219

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 20:57:03 +03:00
yusyus
58286f454a fix: Handle symlinked README.md and CHANGELOG.md in GitHub scraper
- Add _get_file_content() helper method to detect and follow symlinks
- Update _extract_readme() to use new helper
- Update _extract_changelog() to use new helper
- Add 7 comprehensive tests for symlink handling
- All 29 GitHub scraper tests passing

Fixes #225

When README.md or CHANGELOG.md are symlinks (like in vercel/ai repo),
PyGithub returns ContentFile with type='symlink' and encoding=None.
Direct access to decoded_content throws AssertionError.

Solution: Detect symlink type, follow target path, then decode actual file.
Handles edge cases: broken symlinks, missing targets, encoding errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 20:41:28 +03:00
yusyus
74bae4b49f feat(#191): Smart description generation for skill descriptions
Implements hybrid smart extraction + improved fallback templates for
skill descriptions across all scrapers.

Changes:
- github_scraper.py:
  * Added extract_description_from_readme() helper
  * Extracts from README first paragraph (60 lines)
  * Updates description after README extraction
  * Fallback: "Use when working with {name}"
  * Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main)

- doc_scraper.py:
  * Added infer_description_from_docs() helper
  * Extracts from meta tags or first paragraph (65 lines)
  * Tries: meta description, og:description, first content paragraph
  * Fallback: "Use when working with {name}"
  * Updated 2 locations (create_enhanced_skill_md, get_configuration)

- pdf_scraper.py:
  * Added infer_description_from_pdf() helper
  * Extracts from PDF metadata (subject, title)
  * Fallback: "Use when referencing {name} documentation"
  * Updated 3 locations (PDFToSkillConverter, main x2)

- generate_router.py:
  * Updated 2 locations with improved router descriptions
  * "Use when working with {name} development and programming"

All changes:
- Only apply to NEW skill generations (don't modify existing)
- No API calls (free/offline)
- Smart extraction when metadata/README available
- Improved "Use when..." fallbacks instead of generic templates
- 612 tests passing (100%)

Fixes #191

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 19:00:26 +03:00
yusyus
c411eb24ec fix: Add UTF-8 encoding to all file operations for Windows compatibility
Fixes #209 - UnicodeDecodeError on Windows with non-ASCII characters

**Problem:**
Windows users with non-English locales (Chinese, Japanese, Korean, etc.)
experienced GBK/SHIFT-JIS codec errors when the system default encoding
is not UTF-8.

Error: 'gbk' codec can't decode byte 0xac in position 206: illegal
multibyte sequence

**Root Cause:**
File operations using open() without explicit encoding parameter use
the system default encoding, which on Windows Chinese edition is GBK.
JSON files contain UTF-8 encoded characters that fail to decode with GBK.

**Solution:**
Added encoding='utf-8' to ALL file operations across:
- doc_scraper.py (4 instances):
  * load_config() - line 1310
  * check_existing_data() - line 1416
  * save_checkpoint() - line 173
  * load_checkpoint() - line 186

- github_scraper.py (1 instance):
  * main() config loading - line 922

- unified_scraper.py (10 instances):
  * All JSON read/write operations - lines 134, 153, 205, 239, 275,
    278, 325, 328, 342, 364

**Test Results:**
-  All 612 tests passing (100% pass rate)
-  Backward compatible (UTF-8 is standard on Linux/macOS)
-  Fixes Windows locale issues

**Impact:**
-  Works on ALL Windows locales (Chinese, Japanese, Korean, etc.)
-  Maintains compatibility with Linux/macOS
-  Prevents future encoding issues

**Thanks to:** @my5icol for the detailed bug report and fix suggestion!
2025-12-28 18:27:50 +03:00
yusyus
eb3b9d9175 fix: Add robust CHANGELOG encoding handling and enhancement flags
Fixes #219 - Two issues resolved:

1. **Encoding Error Fix:**
   - Added graceful error handling for CHANGELOG extraction
   - Handles 'unsupported encoding: none' error from GitHub API
   - Falls back to latin-1 encoding if UTF-8 fails
   - Logs warnings instead of crashing
   - Continues processing even if CHANGELOG has encoding issues

2. **Enhancement Flags Added:**
   - Added --enhance-local flag to github command
   - Added --enhance flag for API-based enhancement
   - Added --api-key flag for API authentication
   - Auto-enhancement after skill building when flags used
   - Matches doc_scraper.py functionality

**Test Results:**
-  All 612 tests passing (100% pass rate)
-  All 22 github_scraper tests passing
-  Backward compatible

**Usage:**
```bash
# Local enhancement (no API key needed)
skill-seekers github --repo ccxt/ccxt --name ccxtSkills --enhance-local

# API-based enhancement
skill-seekers github --repo owner/repo --enhance --api-key sk-ant-...
```
2025-12-28 18:21:03 +03:00
yusyus
65ded6c07c fix: Fix local repo extraction limitations (code analyzer, exclusions, enhancement)
This commit fixes three critical limitations discovered during local repository skill extraction testing:

**Fix 1: Code Analyzer Import Issue**
- Changed unified_scraper.py to use absolute imports instead of relative imports
- Fixed: `from github_scraper import` → `from skill_seekers.cli.github_scraper import`
- Fixed: `from pdf_scraper import` → `from skill_seekers.cli.pdf_scraper import`
- Result: CodeAnalyzer now available during extraction, deep analysis works

**Fix 2: Unity Library Exclusions**
- Updated should_exclude_dir() to accept and check full directory paths
- Updated _extract_file_tree_local() to pass both dir name and full path
- Added exclusion config passing from unified_scraper to github_scraper
- Result: exclude_dirs_additional now works (297 files excluded in test)

**Fix 3: AI Enhancement for Single Sources**
- Changed read_reference_files() to use rglob() for recursive search
- Now finds reference files in subdirectories (e.g., references/github/README.md)
- Result: AI enhancement works with unified skills that have nested references

**Test Results:**
- Code Analyzer:  Working (deep analysis running)
- Unity Exclusions:  Working (297 files excluded from 679)
- AI Enhancement:  Working (finds and reads nested references)

**Files Changed:**
- src/skill_seekers/cli/unified_scraper.py (Fix 1 & 2)
- src/skill_seekers/cli/github_scraper.py (Fix 2)
- src/skill_seekers/cli/utils.py (Fix 3)

**Test Artifacts:**
- configs/deck_deck_go_local.json (test configuration)
- docs/LOCAL_REPO_TEST_RESULTS.md (comprehensive test report)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 22:24:38 +03:00
yusyus
ea289cebe1 feat: Make EXCLUDED_DIRS configurable for local repository analysis
Closes #203

Adds configuration options to customize directory exclusions during local
repository analysis, while maintaining backward compatibility with smart
defaults.

**New Config Options:**

1. `exclude_dirs_additional` - Extend defaults (most common)
   - Adds custom directories to default exclusions
   - Example: ["proprietary", "legacy", "third_party"]
   - Total exclusions = defaults + additional

2. `exclude_dirs` - Replace defaults (advanced users)
   - Completely overrides default exclusions
   - Example: ["node_modules", ".git", "custom_vendor"]
   - Gives full control over exclusions

**Implementation:**

- Modified GitHubScraper.__init__() to parse exclude_dirs config
- Changed should_exclude_dir() to use instance variable instead of global
- Added logging for custom exclusions (INFO for extend, WARNING for replace)
- Maintains backward compatibility (no config = use defaults)

**Testing:**

- Added 12 comprehensive tests in test_excluded_dirs_config.py
  - 3 tests for defaults (backward compatibility)
  - 3 tests for extend mode
  - 3 tests for replace mode
  - 1 test for precedence
  - 2 tests for edge cases
- All 12 new tests passing 
- All 22 existing github_scraper tests passing 

**Documentation:**

- Updated CLAUDE.md config parameters section
- Added detailed "Configurable Directory Exclusions" feature section
- Included examples for both modes
- Listed common use cases (monorepos, enterprise, legacy codebases)

**Use Cases:**

- Monorepos with custom directory structures
- Enterprise projects with non-standard naming conventions
- Including unusual directories for analysis
- Minimal exclusions for small/simple projects

**Backward Compatibility:**

 Fully backward compatible - existing configs work unchanged
 Smart defaults maintained when no config provided
 All existing tests pass

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 23:53:27 +03:00
yusyus
58ec69eb52 feat: Add unlimited local repository analysis with bug fixes (PR #195)
Merges PR #195 by @jimmy058910 with conflict resolution.

**New Features:**
- Local repository analysis via `local_repo_path` configuration
- Bypass GitHub API rate limits (50 → unlimited files)
- Auto-exclusion of virtual environments and build artifacts
- Support for analyzing large codebases (323 files vs 50 before)

**Improvements:**
- Code analysis coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST parsing errors: 95 → 0 (-100%)

**Conflict Resolution:**
- Preserved logger initialization fix from development (Issue #190)
- Kept relative imports from development (Task 1.2 fix)
- Integrated EXCLUDED_DIRS and local repo features from PR
- Combined best of both implementations

**Testing:**
-  All 22 GitHub scraper tests passing
-  Syntax validation passed
-  Local repo analysis feature intact
-  Bug fixes from development preserved

Original implementation by @jimmy058910 in PR #195.
Conflict resolution preserves all bug fixes while adding local repo feature.

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 22:46:31 +03:00
yusyus
414519b3c7 fix: Initialize logger before use in github_scraper.py
Fixes Issue #190 - "name 'logger' is not defined" error

**Problem:**
- Logger was used at line 40 (in code_analyzer import exception)
- Logger was defined at line 47
- Caused runtime error when code_analyzer import failed

**Solution:**
- Moved logging.basicConfig() and logger initialization to lines 34-39
- Now logger is defined BEFORE the code_analyzer import block
- Warning message now works correctly when code_analyzer is missing

**Testing:**
-  All 22 GitHub scraper tests pass
-  Logger warning appears correctly when code_analyzer missing
-  No similar issues found in other CLI files

Closes #190
2025-11-29 22:01:38 +03:00
yusyus
d7a4c51427 fix: Convert absolute imports to relative imports in cli modules
Fixes #193 - PDF scraping broken for PyPI users

Changed 3 files from absolute to relative imports to fix
ModuleNotFoundError when package is installed via pip:

1. pdf_scraper.py:22
   - from pdf_extractor_poc import → from .pdf_extractor_poc import
   - Fixes: skill-seekers pdf command failed with import error

2. github_scraper.py:36
   - from code_analyzer import → from .code_analyzer import
   - Proactive fix: prevents future import errors

3. test_unified_simple.py:17
   - from config_validator import → from .config_validator import
   - Proactive fix: test helper file

These absolute imports worked locally due to sys.path differences
but failed when installed via PyPI (pip install skill-seekers).

Tested with:
- skill-seekers pdf command now works 
- Extracted 32-page Godot Farming PDF successfully

All CLI commands should now work correctly when installed from PyPI.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 21:47:18 +03:00
Jimmy Moceri
0b2a0d121e feat: Add unlimited local repository analysis and fix 10 critical bugs
Features:
- Add local_repo_path config parameter for unlimited file analysis
- Auto-exclude virtual environments and build artifacts (95% noise reduction)
- Enable comprehensive codebase analysis (50 → 323 files, 546% increase)

Bug Fixes:
- Fix logger initialization error (Issue #190)
- Fix NoneType subscriptable errors in release tag parsing (3 instances)
- Fix relative import paths causing ModuleNotFoundError
- Fix hardcoded 50-file analysis limit
- Fix GitHub API file tree limitation (140 → 345 files discovered)
- Fix AST parser 'not iterable' errors (95 → 0 parsing failures)
- Fix virtual environment file pollution (23,341 → 1,109 file tree items)
- Fix force_rescrape flag not checked before interactive prompt

Impact:
- Code coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST errors: 95 → 0 (-100%)

Tested on JMo Security repository with 345 Python files.
2025-11-16 22:35:23 -05:00
yusyus
13ca374295 refactor: Update CLI commands to use new unified entry points
Updated all command examples in CLI scripts from old pattern:
  python3 cli/<script>.py → skill-seekers <command>

Changes:
- doc_scraper.py → skill-seekers scrape
- github_scraper.py → skill-seekers github
- pdf_scraper.py → skill-seekers pdf
- unified_scraper.py → skill-seekers unified
- enhance_skill.py → skill-seekers enhance
- enhance_skill_local.py → skill-seekers enhance
- package_skill.py → skill-seekers package
- estimate_pages.py → skill-seekers estimate

This reflects the new modern Python packaging with proper entry
points. Users can now use clean commands instead of file paths.

Files updated: 10 CLI scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:23:17 +03:00
yusyus
ce1c07b437 feat: Add modern Python packaging - Phase 1 (Foundation)
Implements issue #168 - Modern Python packaging with uv support

This is Phase 1 of the modernization effort, establishing the core
package structure and build system.

## Major Changes

### 1. Migrated to src/ Layout
- Moved cli/ → src/skill_seekers/cli/
- Moved skill_seeker_mcp/ → src/skill_seekers/mcp/
- Created root package: src/skill_seekers/__init__.py
- Updated all imports: cli. → skill_seekers.cli.
- Updated all imports: skill_seeker_mcp. → skill_seekers.mcp.

### 2. Created pyproject.toml
- Modern Python packaging configuration
- All dependencies properly declared
- 8 CLI entry points configured:
  * skill-seekers (unified CLI)
  * skill-seekers-scrape
  * skill-seekers-github
  * skill-seekers-pdf
  * skill-seekers-unified
  * skill-seekers-enhance
  * skill-seekers-package
  * skill-seekers-upload
  * skill-seekers-estimate
- uv tool support enabled
- Build system: setuptools with wheel

### 3. Created Unified CLI (main.py)
- Git-style subcommands (skill-seekers scrape, etc.)
- Delegates to existing tool main() functions
- Full help system at top-level and subcommand level
- Backwards compatible with individual commands

### 4. Updated Package Versions
- cli/__init__.py: 1.3.0 → 2.0.0
- mcp/__init__.py: 1.2.0 → 2.0.0
- Root package: 2.0.0

### 5. Updated Test Suite
- Fixed test_package_structure.py for new layout
- All 28 package structure tests passing
- Updated all test imports for new structure

## Installation Methods (Working)

```bash
# Development install
pip install -e .

# Run unified CLI
skill-seekers --version  # → 2.0.0
skill-seekers --help

# Run individual tools
skill-seekers-scrape --help
skill-seekers-github --help
```

## Test Results
- Package structure tests: 28/28 passing 
- Package installs successfully 
- All entry points working 

## Still TODO (Phase 2)
- [ ] Run full test suite (299 tests)
- [ ] Update documentation (README, CLAUDE.md, etc.)
- [ ] Test with uv tool run/install
- [ ] Build and publish to PyPI
- [ ] Create PR and merge

## Breaking Changes
None - fully backwards compatible. Old import paths still work.

## Migration for Users
No action needed. Package works with both pip and uv.

Closes #168 (when complete)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:14:24 +03:00