115 Commits

Author SHA1 Message Date
yusyus
cbacdb0e66 release: v2.1.1 - GitHub Repository Analysis Enhancements
Major improvements:
- Configurable directory exclusions (Issue #203)
- Unlimited local repository analysis
- Skip llms.txt option (PR #198)
- 10+ bug fixes for GitHub scraper
- Test suite expanded to 427 tests

See CHANGELOG.md for full details.
2025-11-30 12:22:28 +03:00
yusyus
ea289cebe1 feat: Make EXCLUDED_DIRS configurable for local repository analysis
Closes #203

Adds configuration options to customize directory exclusions during local
repository analysis, while maintaining backward compatibility with smart
defaults.

**New Config Options:**

1. `exclude_dirs_additional` - Extend defaults (most common)
   - Adds custom directories to default exclusions
   - Example: ["proprietary", "legacy", "third_party"]
   - Total exclusions = defaults + additional

2. `exclude_dirs` - Replace defaults (advanced users)
   - Completely overrides default exclusions
   - Example: ["node_modules", ".git", "custom_vendor"]
   - Gives full control over exclusions

**Implementation:**

- Modified GitHubScraper.__init__() to parse exclude_dirs config
- Changed should_exclude_dir() to use instance variable instead of global
- Added logging for custom exclusions (INFO for extend, WARNING for replace)
- Maintains backward compatibility (no config = use defaults)
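
A minimal sketch of how the two options could resolve into one exclusion set. The function name `resolve_excluded_dirs` and the default list are hypothetical, and letting `exclude_dirs` win when both keys are present is an assumption about the precedence rule, not a statement of what GitHubScraper.__init__() actually does.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical defaults for illustration; the project's real list may differ.
DEFAULT_EXCLUDED_DIRS = frozenset({"node_modules", ".git", "venv", "__pycache__"})


def resolve_excluded_dirs(config: dict) -> set:
    """Return the effective set of directory names to skip."""
    if "exclude_dirs" in config:
        # Replace mode: the configured list fully overrides the defaults.
        # Assumption: this key takes precedence over exclude_dirs_additional.
        excluded = set(config["exclude_dirs"])
        logger.warning("Replacing default exclusions with %d custom entries", len(excluded))
        return excluded
    if "exclude_dirs_additional" in config:
        # Extend mode: defaults plus the extra directories.
        extra = set(config["exclude_dirs_additional"])
        logger.info("Adding %d custom exclusions to the defaults", len(extra))
        return set(DEFAULT_EXCLUDED_DIRS) | extra
    # No config: fall back to the defaults (backward compatible).
    return set(DEFAULT_EXCLUDED_DIRS)
```

Returning a fresh set per call keeps the result instance-local, which matches the move away from a shared global described above.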

**Testing:**

- Added 12 comprehensive tests in test_excluded_dirs_config.py
  - 3 tests for defaults (backward compatibility)
  - 3 tests for extend mode
  - 3 tests for replace mode
  - 1 test for precedence
  - 2 tests for edge cases
- All 12 new tests passing
- All 22 existing github_scraper tests passing

**Documentation:**

- Updated CLAUDE.md config parameters section
- Added detailed "Configurable Directory Exclusions" feature section
- Included examples for both modes
- Listed common use cases (monorepos, enterprise, legacy codebases)

**Use Cases:**

- Monorepos with custom directory structures
- Enterprise projects with non-standard naming conventions
- Including unusual directories for analysis
- Minimal exclusions for small/simple projects

**Backward Compatibility:**

- Fully backward compatible - existing configs work unchanged
- Smart defaults maintained when no config provided
- All existing tests pass

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 23:53:27 +03:00
yusyus
bd20b32470 Merge PR #198: Skip llms.txt Config Option
Merges feat/add-skip-llm-to-config by @sogoiii.

This PR adds a configuration option to explicitly skip llms.txt
detection, useful when a site's llms.txt is incomplete or incorrect, or
when HTML scraping is specifically needed.

Key features:
- New 'skip_llms_txt' config option (default: false, backward compatible)
- Boolean type validation with warning for invalid values
- Support in both sync and async scraping modes
- 17 comprehensive tests (15 feature tests + 2 config validation tests)
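
The validation rule above (accept only booleans, warn and fall back to false otherwise) can be sketched as follows. The helper name `read_skip_llms_txt` is hypothetical; only the config key and its default come from this PR.

```python
import logging

logger = logging.getLogger(__name__)


def read_skip_llms_txt(config: dict) -> bool:
    """Return the skip_llms_txt setting, defaulting to False."""
    value = config.get("skip_llms_txt", False)
    if not isinstance(value, bool):
        # Strings such as "true" and integers such as 1 are rejected,
        # so a typo in the config cannot silently change behavior.
        logger.warning("Invalid skip_llms_txt value %r; defaulting to False", value)
        return False
    return value
```

Checking with `isinstance(value, bool)` rather than truthiness is the point of the sketch: `"false"` is a non-empty string and would otherwise count as true.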

All tests passing after fixing import paths to use proper package names.

Test results: 17/17 tests passing
Full test suite: 391 tests passing

Co-authored-by: sogoiii <sogoiii@users.noreply.github.com>
2025-11-29 22:56:46 +03:00
yusyus
58ec69eb52 feat: Add unlimited local repository analysis with bug fixes (PR #195)
Merges PR #195 by @jimmy058910 with conflict resolution.

**New Features:**
- Local repository analysis via `local_repo_path` configuration
- Bypass GitHub API rate limits (50 → unlimited files)
- Auto-exclusion of virtual environments and build artifacts
- Support for analyzing large codebases (323 files vs 50 before)

**Improvements:**
- Code analysis coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST parsing errors: 95 → 0 (-100%)

**Conflict Resolution:**
- Preserved logger initialization fix from development (Issue #190)
- Kept relative imports from development (Task 1.2 fix)
- Integrated EXCLUDED_DIRS and local repo features from PR
- Combined best of both implementations

**Testing:**
- All 22 GitHub scraper tests passing
- Syntax validation passed
- Local repo analysis feature intact
- Bug fixes from development preserved

Original implementation by @jimmy058910 in PR #195.
Conflict resolution preserves all bug fixes while adding local repo feature.

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 22:46:31 +03:00
yusyus
414519b3c7 fix: Initialize logger before use in github_scraper.py
Fixes Issue #190 - "name 'logger' is not defined" error

**Problem:**
- Logger was used at line 40 (in code_analyzer import exception)
- Logger was defined at line 47
- Caused runtime error when code_analyzer import failed

**Solution:**
- Moved logging.basicConfig() and logger initialization to lines 34-39
- Now logger is defined BEFORE the code_analyzer import block
- Warning message now works correctly when code_analyzer is missing
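
The ordering described above can be sketched as below. This is an illustration, not the contents of github_scraper.py: the flag `CODE_ANALYZER_AVAILABLE` is a hypothetical name, and the plain `import code_analyzer` stands in for the module's real import line.

```python
import logging

# Configure logging and create the module logger first...
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ...so the fallback branch of the optional import can log safely.
try:
    import code_analyzer  # optional dependency; may be missing
    CODE_ANALYZER_AVAILABLE = True
except ImportError:
    code_analyzer = None
    CODE_ANALYZER_AVAILABLE = False
    # With the old ordering this line raised NameError, because
    # `logger` had not been assigned yet.
    logger.warning("code_analyzer not available; code analysis disabled")
```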

**Testing:**
- All 22 GitHub scraper tests pass
- Logger warning appears correctly when code_analyzer missing
- No similar issues found in other CLI files

Closes #190
2025-11-29 22:01:38 +03:00
yusyus
d7a4c51427 fix: Convert absolute imports to relative imports in cli modules
Fixes #193 - PDF scraping broken for PyPI users

Changed 3 files from absolute to relative imports to fix
ModuleNotFoundError when package is installed via pip:

1. pdf_scraper.py:22
   - from pdf_extractor_poc import → from .pdf_extractor_poc import
   - Fixes: skill-seekers pdf command failed with import error

2. github_scraper.py:36
   - from code_analyzer import → from .code_analyzer import
   - Proactive fix: prevents future import errors

3. test_unified_simple.py:17
   - from config_validator import → from .config_validator import
   - Proactive fix: test helper file

These absolute imports worked locally due to sys.path differences
but failed when installed via PyPI (pip install skill-seekers).

Tested with:
- skill-seekers pdf command now works
- Extracted 32-page Godot Farming PDF successfully

All CLI commands should now work correctly when installed from PyPI.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 21:47:18 +03:00
sogoiii
a0b1c2f42f feat: add skip_llms_txt config option to bypass llms.txt detection
- Add skip_llms_txt config option (default: False)
- Validate value is boolean, warn and default to False if not
- Support in both sync and async scraping modes
- Add 17 tests for config, behavior, and edge cases
2025-11-20 13:55:46 -08:00
Jimmy Moceri
0b2a0d121e feat: Add unlimited local repository analysis and fix 10 critical bugs
Features:
- Add local_repo_path config parameter for unlimited file analysis
- Auto-exclude virtual environments and build artifacts (95% noise reduction)
- Enable comprehensive codebase analysis (50 → 323 files, 546% increase)

Bug Fixes:
- Fix logger initialization error (Issue #190)
- Fix NoneType subscriptable errors in release tag parsing (3 instances)
- Fix relative import paths causing ModuleNotFoundError
- Fix hardcoded 50-file analysis limit
- Fix GitHub API file tree limitation (140 → 345 files discovered)
- Fix AST parser 'not iterable' errors (95 → 0 parsing failures)
- Fix virtual environment file pollution (23,341 → 1,109 file tree items)
- Fix force_rescrape flag not checked before interactive prompt

Impact:
- Code coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST errors: 95 → 0 (-100%)

Tested on JMo Security repository with 345 Python files.
2025-11-16 22:35:23 -05:00
yusyus
befcb898e3 fix: Skip quality checks in MCP context to prevent stdin errors
The MCP server's package_skill_tool was failing in CI because the quality
checker was prompting for user input, which doesn't exist in CI/MCP contexts.

Fix:
- Add --skip-quality-check flag to package_skill command in MCP server
- This prevents interactive prompts that cause EOFError in CI
- MCP tools should skip interactive checks since they run in background
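
A rough sketch of why the flag matters. The function `confirm_packaging` and its parameters are hypothetical and only mirror the behavior described here: with the skip flag set, the prompt is never reached, so a missing stdin cannot raise EOFError.

```python
def confirm_packaging(has_issues: bool, skip_quality_check: bool = False, ask=input) -> bool:
    """Decide whether packaging should proceed."""
    if skip_quality_check or not has_issues:
        # Non-interactive path: never touches stdin.
        return True
    # Interactive path: input() raises EOFError when stdin is closed,
    # which is what happened under CI and the MCP server.
    answer = ask("Quality issues found. Continue packaging? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

The `ask` parameter exists only so the prompt can be replaced in tests; by default it is the built-in `input`.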

Impact:
- All 25 MCP server tests now pass
- All 391 tests passing
- CI builds will succeed

Context:
- Quality checks are interactive by default for CLI users
- MCP server runs commands programmatically without user input
- This is the correct behavior: interactive for CLI, non-interactive for MCP

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 23:16:28 +03:00
yusyus
3272f9c59d feat: Add comprehensive quality checker for skills
Phase 2 & 3: Quality assurance before packaging

New module: quality_checker.py
- Enhancement verification (checks for template text, code examples, sections)
- Structure validation (SKILL.md, references/ directory)
- Content quality checks (YAML frontmatter, language tags, "When to Use" section)
- Link validation (internal markdown links)
- Quality scoring system (0-100 score + A-F grade)
- Detailed reporting with errors, warnings, and info messages
- CLI with --verbose and --strict modes

Integration in package_skill.py:
- Automatic quality checks before packaging
- Display quality report with score and grade
- Ask user to confirm if warnings/errors found
- Add --skip-quality-check flag to bypass checks
- Updated help examples

Benefits:
- Catch quality issues before packaging
- Ensure SKILL.md is properly enhanced
- Validate all links work
- Give users confidence in skill quality
- Comprehensive quality reports

Addresses user request: "check some sort of quality check at the end
like is links working, skill is good etc and give report the user"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 23:01:28 +03:00
yusyus
e279ed6ca8 fix: Enhancement race condition - add headless mode that waits
Phase 1: Fix race condition where main console exits before enhancement completes

Changes to enhance_skill_local.py:
- Add headless mode (default) using subprocess.run() which WAITS for completion
- Add timeout protection (default 10 minutes, configurable)
- Verify SKILL.md was actually updated (check mtime and size)
- Add --interactive-enhancement flag to use old terminal mode
- Detailed progress messages and error handling
- Clean up temp files after completion
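
The headless flow above (block until the child exits, enforce a timeout, then confirm the file really changed) might look roughly like this. The function name `run_headless` and its signature are assumptions for illustration; the 600-second default mirrors the 10-minute timeout mentioned above.

```python
import subprocess
import sys
from pathlib import Path


def run_headless(cmd: list, skill_md: Path, timeout_s: float = 600) -> bool:
    """Run cmd to completion and report whether skill_md was updated."""
    before = skill_md.stat() if skill_md.exists() else None
    try:
        # subprocess.run() blocks until the child exits or the timeout hits.
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        return False
    if result.returncode != 0 or not skill_md.exists():
        return False
    after = skill_md.stat()
    if before is None:
        return after.st_size > 0
    # Treat the run as successful only if mtime or size actually changed.
    return (after.st_mtime_ns, after.st_size) != (before.st_mtime_ns, before.st_size)
```

For example, `run_headless([sys.executable, "-c", "pass"], Path("SKILL.md"))` returns False for an existing file, because the child exits cleanly but leaves SKILL.md untouched.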

Changes to doc_scraper.py:
- Use skill-seekers-enhance entry point instead of direct python path
- Pass --interactive-enhancement flag through if requested
- Update help text to reflect new headless default behavior
- Show proper status messages (HEADLESS vs INTERACTIVE)

Benefits:
- Main console now waits for enhancement to complete
- No more "Package your skill" message while enhancement is running
- Timeout prevents infinite hangs
- Terminal mode still available for users who want it
- Better error messages and progress tracking

Fixes user request: "make sure 1. console wait for it to finish"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 22:53:01 +03:00
yusyus
4cbd0a0a3c fix: Add anthropic-beta header and correct field name for skill uploads
Fixes #182

Changes:
- Add 'anthropic-beta: skills-2025-10-02' header (required by Anthropic Skills API)
- Change multipart field name from 'skill' to 'files[]' (correct API format)

Without these fixes, all upload attempts returned 404 errors.
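
A small sketch of the two corrected request parts, with no network call. The helper `build_upload_parts` is hypothetical, the tuple layout follows the convention accepted by the `requests` library's `files=` parameter (an assumption about the HTTP client in use), and the `application/zip` content type is also assumed. Authentication headers and the endpoint are deliberately left out.

```python
from pathlib import Path


def build_upload_parts(zip_path: Path):
    """Return (headers, files) for a skill upload request."""
    # Beta header named in this fix.
    headers = {"anthropic-beta": "skills-2025-10-02"}
    # Multipart field must be 'files[]', not 'skill'.
    files = [("files[]", (zip_path.name, zip_path.read_bytes(), "application/zip"))]
    return headers, files
```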

Verified:
- All 379 tests passing (100%)
- No regressions in test suite
- Upload functionality corrected per API requirements

Co-authored-by: Straughter "BatmanOsama" Guthrie <straughterguthrie@gmail.com>
Original PR: #183
2025-11-11 23:39:56 +03:00
yusyus
530a68d1dc fix: Update test imports and merge_sources for v2.0.0 release
- Fix conflict_detector import in merge_sources.py (use relative import)
- Update test_mcp_server.py to use skill_seekers.mcp.server imports
- Fix @patch decorators to reference full module path
- Add MCP_AVAILABLE guards to test_unified_mcp_integration.py
- Add proper skipif decorators for MCP tests
- All 379 tests now passing (0 failures)

Resolves import errors that occurred during PyPI package testing.
2025-11-11 22:26:52 +03:00
yusyus
13ca374295 refactor: Update CLI commands to use new unified entry points
Updated all command examples in CLI scripts from old pattern:
  python3 cli/<script>.py → skill-seekers <command>

Changes:
- doc_scraper.py → skill-seekers scrape
- github_scraper.py → skill-seekers github
- pdf_scraper.py → skill-seekers pdf
- unified_scraper.py → skill-seekers unified
- enhance_skill.py → skill-seekers enhance
- enhance_skill_local.py → skill-seekers enhance
- package_skill.py → skill-seekers package
- estimate_pages.py → skill-seekers estimate

This reflects the new modern Python packaging with proper entry
points. Users can now use clean commands instead of file paths.

Files updated: 10 CLI scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:23:17 +03:00
yusyus
ce1c07b437 feat: Add modern Python packaging - Phase 1 (Foundation)
Implements issue #168 - Modern Python packaging with uv support

This is Phase 1 of the modernization effort, establishing the core
package structure and build system.

## Major Changes

### 1. Migrated to src/ Layout
- Moved cli/ → src/skill_seekers/cli/
- Moved skill_seeker_mcp/ → src/skill_seekers/mcp/
- Created root package: src/skill_seekers/__init__.py
- Updated all imports: cli. → skill_seekers.cli.
- Updated all imports: skill_seeker_mcp. → skill_seekers.mcp.

### 2. Created pyproject.toml
- Modern Python packaging configuration
- All dependencies properly declared
- 9 CLI entry points configured:
  * skill-seekers (unified CLI)
  * skill-seekers-scrape
  * skill-seekers-github
  * skill-seekers-pdf
  * skill-seekers-unified
  * skill-seekers-enhance
  * skill-seekers-package
  * skill-seekers-upload
  * skill-seekers-estimate
- uv tool support enabled
- Build system: setuptools with wheel

### 3. Created Unified CLI (main.py)
- Git-style subcommands (skill-seekers scrape, etc.)
- Delegates to existing tool main() functions
- Full help system at top-level and subcommand level
- Backwards compatible with individual commands
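
A minimal sketch of the git-style dispatch described above, using argparse subparsers. The stand-in functions `scrape_main` and `github_main` and the `COMMANDS` table are hypothetical; the real main.py delegates to the existing tools' main() functions and may be structured differently.

```python
import argparse


def scrape_main(argv):
    # Stand-in for the real scraper entry point.
    return ("scrape", argv)


def github_main(argv):
    # Stand-in for the real GitHub scraper entry point.
    return ("github", argv)


COMMANDS = {"scrape": scrape_main, "github": github_main}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="skill-seekers")
    parser.add_argument("--version", action="version", version="2.0.0")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        subparsers.add_parser(name, help=f"run the {name} tool")
    # Unknown arguments are collected and handed to the tool unchanged.
    args, rest = parser.parse_known_args(argv)
    return COMMANDS[args.command](rest)
```

Here `main(["scrape", "--config", "x.json"])` picks `scrape_main` and passes it `["--config", "x.json"]`, so each tool keeps parsing its own options.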

### 4. Updated Package Versions
- cli/__init__.py: 1.3.0 → 2.0.0
- mcp/__init__.py: 1.2.0 → 2.0.0
- Root package: 2.0.0

### 5. Updated Test Suite
- Fixed test_package_structure.py for new layout
- All 28 package structure tests passing
- Updated all test imports for new structure

## Installation Methods (Working)

```bash
# Development install
pip install -e .

# Run unified CLI
skill-seekers --version  # → 2.0.0
skill-seekers --help

# Run individual tools
skill-seekers-scrape --help
skill-seekers-github --help
```

## Test Results
- Package structure tests: 28/28 passing
- Package installs successfully
- All entry points working

## Still TODO (Phase 2)
- [ ] Run full test suite (299 tests)
- [ ] Update documentation (README, CLAUDE.md, etc.)
- [ ] Test with uv tool run/install
- [ ] Build and publish to PyPI
- [ ] Create PR and merge

## Breaking Changes
None - fully backwards compatible. Old import paths still work.

## Migration for Users
No action needed. Package works with both pip and uv.

Closes #168 (when complete)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:14:24 +03:00