yusyus
cee3fcf025
fix(A1.3): Add comprehensive validation to submit_config MCP tool
Issue: #11 (A1.3 - Add MCP tool to submit custom configs)
## Summary
Fixed submit_config MCP tool to use ConfigValidator for comprehensive validation
instead of basic 3-field checks. Now supports both legacy and unified config
formats with detailed error messages and validation warnings.
## Critical Gaps Fixed (6 total)
1. ✅ Missing comprehensive validation (HIGH) - Only checked 3 fields
2. ✅ No unified config support (HIGH) - Couldn't handle multi-source configs
3. ✅ No test coverage (MEDIUM) - Zero tests for submit_config_tool
4. ✅ No URL format validation (MEDIUM) - Accepted malformed URLs
5. ✅ No warnings for unlimited scraping (LOW) - Silent config issues
6. ✅ No url_patterns validation (MEDIUM) - No selector structure checks
## Changes Made
### Phase 1: Validation Logic (server.py lines 1224-1380)
- Added ConfigValidator import with graceful degradation
- Replaced basic validation (3 fields) with comprehensive ConfigValidator.validate()
- Enhanced category detection for unified multi-source configs
- Added validation warnings collection (unlimited scraping, missing max_pages)
- Updated GitHub issue template with:
  * Config format type (Unified vs Legacy)
  * Validation warnings section
  * Updated documentation URL handling for unified configs
  * Checklist showing "Config validated with ConfigValidator"
### Phase 2: Test Coverage (test_mcp_server.py lines 617-769)
Added 8 comprehensive test cases:
1. test_submit_config_requires_token - GitHub token requirement
2. test_submit_config_validates_required_fields - Required field validation
3. test_submit_config_validates_name_format - Name format validation
4. test_submit_config_validates_url_format - URL format validation
5. test_submit_config_accepts_legacy_format - Legacy config acceptance
6. test_submit_config_accepts_unified_format - Unified config acceptance
7. test_submit_config_from_file_path - File path input support
8. test_submit_config_detects_category - Category auto-detection
### Phase 3: Documentation Updates
- Updated Issue #11 with completion notes
- Updated tool description to mention format support
- Updated CHANGELOG.md with fix details
- Added EVOLUTION_ANALYSIS.md for deep architecture analysis
## Validation Improvements
### Before:
```python
required_fields = ["name", "description", "base_url"]
missing_fields = [field for field in required_fields if field not in config_data]
if missing_fields:
    return error
```
### After:
```python
validator = ConfigValidator(config_data)
validator.validate() # Comprehensive validation:
# - Name format (alphanumeric, hyphens, underscores only)
# - URL formats (must start with http:// or https://)
# - Selectors structure (dict with proper keys)
# - Rate limits (non-negative numbers)
# - Max pages (positive integer or -1)
# - Supports both legacy AND unified formats
# - Provides detailed error messages with examples
```
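As a rough illustration of the rules listed in the comments above, here is a minimal standalone sketch. It is not the actual ConfigValidator; the function name `check_config` and the key names `selectors` and `rate_limit` are assumptions made for the example.

```python
import re

# Assumed rule: names may contain only letters, digits, hyphens, and underscores.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def check_config(config: dict) -> list[str]:
    """Return a list of error messages; an empty list means the config passed."""
    errors = []

    name = config.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.match(name):
        errors.append("name: use only letters, digits, hyphens, and underscores")

    url = config.get("base_url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        errors.append("base_url: must start with http:// or https://")

    # Hypothetical key: selectors, when present, must be a dict.
    selectors = config.get("selectors")
    if selectors is not None and not isinstance(selectors, dict):
        errors.append("selectors: must be a dict")

    # Hypothetical key: rate_limit, when present, must be a non-negative number.
    rate_limit = config.get("rate_limit")
    if rate_limit is not None:
        is_number = isinstance(rate_limit, (int, float)) and not isinstance(rate_limit, bool)
        if not is_number or rate_limit < 0:
            errors.append("rate_limit: must be a non-negative number")

    # max_pages, when present, must be a positive integer or -1 (unlimited).
    max_pages = config.get("max_pages")
    if max_pages is not None:
        is_int = isinstance(max_pages, int) and not isinstance(max_pages, bool)
        if not is_int or not (max_pages > 0 or max_pages == -1):
            errors.append("max_pages: must be a positive integer or -1")

    return errors
```

Collecting every error before returning, rather than stopping at the first one, is what lets a submitter fix a config in a single pass.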
## Test Results
✅ All 427 tests passing (no regressions)
✅ 8 new tests for submit_config_tool
✅ No breaking changes
## Files Modified
- src/skill_seekers/mcp/server.py (157 lines changed)
- tests/test_mcp_server.py (157 lines added)
- CHANGELOG.md (12 lines added)
- EVOLUTION_ANALYSIS.md (500+ lines, new file)
## Issue Resolution
Closes #11 - A1.3 now fully implemented with comprehensive validation,
test coverage, and support for both config formats.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 18:32:20 +03:00
yusyus
cbacdb0e66
release: v2.1.1 - GitHub Repository Analysis Enhancements
Major improvements:
- Configurable directory exclusions (Issue #203)
- Unlimited local repository analysis
- Skip llms.txt option (PR #198)
- 10+ bug fixes for GitHub scraper
- Test suite expanded to 427 tests
See CHANGELOG.md for full details.
2025-11-30 12:22:28 +03:00
yusyus
bd2b201aa5
docs: Update all documentation for v2.1.0 release
Updates across all major documentation files to reflect v2.1.0 release
status and recent completions.
Changes:
- CLAUDE.md:
  * Updated version from v2.0.0 to v2.1.0
  * Updated date to November 29, 2025
  * Updated test count from 391 to 427
  * Moved completed PRs (#195, #198) and Issue #203 to "Completed" section
  * Updated "Next Up" priorities
- README.md:
  * Updated version badge from 2.0.0 to 2.1.0
  * Updated test badge from 379 to 427 passing
- CHANGELOG.md:
  * Added Issue #203 (Configurable EXCLUDED_DIRS) to Unreleased section
  * Documented 19 comprehensive tests for exclude_dirs feature
  * Listed both extend and replace modes
- FUTURE_RELEASES.md:
  * Marked v2.1.0 as "Released" (November 29, 2025)
  * Moved "Fix 12 unified tests" to completed
  * Updated release schedule table
- FLEXIBLE_ROADMAP.md:
  * Updated current status from v1.0.0 to v2.1.0
  * Added latest release date
  * Expanded "What Works" section with new features
  * Updated test count to 427
All documentation now accurately reflects:
- v2.1.0 release status ✅
- 427 tests passing (up from 391) ✅
- Issue #203 completion ✅
- PR #195 and #198 merged status ✅
Related: #203
2025-11-30 01:06:21 +03:00
yusyus
58ec69eb52
feat: Add unlimited local repository analysis with bug fixes (PR #195)
Merges PR #195 by @jimmy058910 with conflict resolution.
**New Features:**
- Local repository analysis via `local_repo_path` configuration
- Bypass GitHub API rate limits (50 → unlimited files)
- Auto-exclusion of virtual environments and build artifacts
- Support for analyzing large codebases (323 files vs 50 before)
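The auto-exclusion described above can be sketched as a pruned directory walk. This is only an illustration under stated assumptions: the set contents and the names `EXCLUDED_DIRS_SKETCH` and `iter_python_files` are hypothetical and do not come from the scraper's actual code.

```python
import os
from pathlib import Path

# Assumed directory names; the project's real EXCLUDED_DIRS may differ.
EXCLUDED_DIRS_SKETCH = {"venv", ".venv", "node_modules", "__pycache__", "build", "dist", ".git"}


def iter_python_files(root, excluded=EXCLUDED_DIRS_SKETCH):
    """Yield .py files under root, never descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for filename in filenames:
            if filename.endswith(".py"):
                yield Path(dirpath) / filename
```

Pruning during the walk, instead of filtering paths afterwards, avoids visiting the thousands of files a virtual environment can contain.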
**Improvements:**
- Code analysis coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST parsing errors: 95 → 0 (-100%)
**Conflict Resolution:**
- Preserved logger initialization fix from development (Issue #190 )
- Kept relative imports from development (Task 1.2 fix)
- Integrated EXCLUDED_DIRS and local repo features from PR
- Combined best of both implementations
**Testing:**
- ✅ All 22 GitHub scraper tests passing
- ✅ Syntax validation passed
- ✅ Local repo analysis feature intact
- ✅ Bug fixes from development preserved
Original implementation by @jimmy058910 in PR #195.
Conflict resolution preserves all bug fixes while adding local repo feature.
Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 22:46:31 +03:00
yusyus
998be0d2dd
fix: Update setup_mcp.sh for v2.0.0 src/ layout + test fixes (#201)
Merges setup_mcp.sh fix for v2.0.0 src/ layout + test updates.
Original fix by @501981732 in PR #197 .
Test updates to make CI pass.
Closes #192
2025-11-29 21:34:51 +03:00
Jimmy Moceri
0b2a0d121e
feat: Add unlimited local repository analysis and fix 10 critical bugs
Features:
- Add local_repo_path config parameter for unlimited file analysis
- Auto-exclude virtual environments and build artifacts (95% noise reduction)
- Enable comprehensive codebase analysis (50 → 323 files, 546% increase)
Bug Fixes:
- Fix logger initialization error (Issue #190 )
- Fix NoneType subscriptable errors in release tag parsing (3 instances)
- Fix relative import paths causing ModuleNotFoundError
- Fix hardcoded 50-file analysis limit
- Fix GitHub API file tree limitation (140 → 345 files discovered)
- Fix AST parser 'not iterable' errors (95 → 0 parsing failures)
- Fix virtual environment file pollution (23,341 → 1,109 file tree items)
- Fix force_rescrape flag not checked before interactive prompt
Impact:
- Code coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST errors: 95 → 0 (-100%)
Tested on JMo Security repository with 345 Python files.
2025-11-16 22:35:23 -05:00
yusyus
88dce89adf
docs: Update documentation for v2.0.0 PyPI release
README.md:
- Add PyPI badges (version, downloads, python version)
- Update test count from 299 to 379 passing tests
- Add prominent 'Now Available on PyPI!' callout section
- Reorder installation options (pip as Option 1, uv as Option 2)
- Add links to Quick Start and Bulletproof guides
- Emphasize PyPI as the recommended installation method
CHANGELOG.md:
- Add comprehensive v2.0.0 release entry (dated 2025-11-11)
- Document PyPI publication as major milestone
- Detail modern Python packaging changes
- Include unified CLI interface documentation
- Add migration guide for users and developers
- List all breaking changes and deprecations
- Document 379 passing tests and import fixes
FUTURE_RELEASES.md (NEW):
- Create roadmap document for upcoming releases
- Plan v2.1.0 (Dec 2025): Test coverage & quality improvements
- Plan v2.2.0 (Q1 2026): Web presence & community growth
- Plan v2.3.0 (Q2 2026): Developer experience & integrations
- Long-term vision for v3.0+
- Community contribution guidelines
- Release schedule and priority system
🚀 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 22:27:04 +03:00
yusyus
66b7f9c4f6
chore: Bump version to v1.3.0
Update version numbers across project for v1.3.0 release:
- CHANGELOG.md: Move [Unreleased] → [1.3.0] - 2025-10-26
- README.md: Update version badge 1.2.0 → 1.3.0
- cli/__init__.py: Update __version__ = "1.3.0"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 13:16:54 +03:00
yusyus
319331f5a6
feat: Complete refactoring with async support, type safety, and package structure
This comprehensive refactoring improves code quality, performance, and maintainability
while maintaining 100% backwards compatibility.
## Major Features Added
### 🚀 Async/Await Support (2-3x Performance Boost)
- Added `--async` flag for parallel scraping using asyncio
- Implemented `scrape_page_async()` with httpx.AsyncClient
- Implemented `scrape_all_async()` with asyncio.gather()
- Connection pooling for better resource management
- Performance: 18 pg/s → 55 pg/s (3x faster)
- Memory: 120 MB → 40 MB (66% reduction)
- Full documentation in ASYNC_SUPPORT.md
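The `asyncio.gather()` pattern named above can be sketched without a network dependency. This is a hedged illustration, not the project's `scrape_all_async()`: `fake_fetch` stands in for an HTTP call (the real code is described as using httpx.AsyncClient), and `scrape_all_sketch` and `max_concurrency` are hypothetical names.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request; sleeps briefly to simulate I/O."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def scrape_all_sketch(urls, fetch=fake_fetch, max_concurrency=5):
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


pages = asyncio.run(
    scrape_all_sketch(["https://example.com/a", "https://example.com/b"])
)
```

The semaphore is one simple way to cap concurrency; connection pooling in an HTTP client serves a similar purpose at the transport level.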
### 📦 Python Package Structure (Phase 0 Complete)
- Created cli/__init__.py for clean imports
- Created skill_seeker_mcp/__init__.py (renamed from mcp/)
- Created skill_seeker_mcp/tools/__init__.py
- Proper package imports: `from cli import constants`
- Better IDE support and autocomplete
### ⚙️ Centralized Configuration
- Created cli/constants.py with 18 configuration constants
- DEFAULT_ASYNC_MODE, DEFAULT_RATE_LIMIT, DEFAULT_MAX_PAGES
- Enhancement limits, categorization scores, file limits
- All magic numbers now centralized and configurable
### 🔧 Code Quality Improvements
- Converted 71 print() statements to proper logging
- Added type hints to all DocToSkillConverter methods
- Fixed all mypy type checking issues
- Installed types-requests for better type safety
- Code quality: 5.5/10 → 6.5/10
## Testing
- Test count: 207 → 299 tests (92 new tests)
- 11 comprehensive async tests (all passing)
- 16 constants tests (all passing)
- Fixed test isolation issues
- 100% pass rate maintained (299/299 passing)
## Documentation
- Updated README.md with async examples and test count
- Updated CLAUDE.md with async usage guide
- Created ASYNC_SUPPORT.md (292 lines)
- Updated CHANGELOG.md with all changes
- Cleaned up temporary refactoring documents
## Cleanup
- Removed temporary planning/status documents
- Moved test_pr144_concerns.py to tests/ folder
- Updated .gitignore for test artifacts
- Better repository organization
## Breaking Changes
None - all changes are backwards compatible.
Async mode is opt-in via --async flag.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 13:05:39 +03:00
Edgar I.
0e3f0c6375
docs: update status for Phase 1 completion
2025-10-24 18:28:30 +04:00
yusyus
394eab218e
Add PDF Advanced Features (v1.2.0)
Priority 2 & 3 Features Implemented:
- OCR support for scanned PDFs (pytesseract + Pillow)
- Password-protected PDF support
- Complex table extraction
- Parallel page processing (3x faster)
- Intelligent caching (50% faster re-runs)
Testing:
- New test file: test_pdf_advanced_features.py (26 tests)
- Updated test_pdf_extractor.py (23 tests)
- Updated test_pdf_scraper.py (18 tests)
- Total: 49/49 PDF tests passing (100%)
- Overall: 142/142 tests passing (100%)
Documentation:
- Added docs/PDF_ADVANCED_FEATURES.md (580 lines)
- Updated CHANGELOG.md with v1.1.0 and v1.2.0
- Updated README.md version badges and features
- Updated docs/TESTING.md with new test counts
Dependencies:
- Added Pillow==11.0.0
- Added pytesseract==0.3.13
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 21:43:05 +03:00
yusyus
517ed46338
Add project infrastructure and documentation
Infrastructure:
- Add GitHub Actions workflows (tests.yml, release.yml)
- Add CHANGELOG.md with full version history
- Add CONTRIBUTING.md with contribution guidelines
- Add RELEASE_NOTES_v1.0.0.md for v1.0.0 release
Documentation:
- Update README.md with version badge (v1.0.0)
- Update test count badge (14 tests)
- Add links to new documentation files
Features:
- CI/CD pipeline with automated testing
- Multi-OS testing (Ubuntu, macOS)
- Multi-Python version testing (3.7-3.11)
- Automated release creation on tag push
- Code coverage reporting
This completes the v1.0.0 production release setup.
2025-10-19 22:37:55 +03:00