Commit Graph

13 Commits

Author SHA1 Message Date
yusyus
70ca1d9ba6 docs(A1.9): Add comprehensive git source documentation and example repository
Phase 4 Complete:
- Updated README.md with git source usage examples and use cases
- Created docs/GIT_CONFIG_SOURCES.md (800+ lines comprehensive guide)
- Updated CHANGELOG.md with v2.2.0 release notes
- Added configs/example-team/ example repository with E2E test

Documentation covers:
- Quick start and architecture
- MCP tools reference (4 tools with examples)
- Authentication for GitHub, GitLab, Bitbucket
- Use cases (small teams, enterprise, open source)
- Best practices, troubleshooting, advanced topics
- Complete API reference

Example repository includes:
- 3 example configs (react-custom, vue-internal, company-api)
- README with usage guide
- E2E test script (7 steps, 100% passing)

🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 19:38:26 +03:00
yusyus
cee3fcf025 fix(A1.3): Add comprehensive validation to submit_config MCP tool
Issue: #11 (A1.3 - Add MCP tool to submit custom configs)

## Summary
Fixed submit_config MCP tool to use ConfigValidator for comprehensive validation
instead of basic 3-field checks. Now supports both legacy and unified config
formats with detailed error messages and validation warnings.

## Critical Gaps Fixed (6 total)
1.  Missing comprehensive validation (HIGH) - Only checked 3 fields
2.  No unified config support (HIGH) - Couldn't handle multi-source configs
3.  No test coverage (MEDIUM) - Zero tests for submit_config_tool
4.  No URL format validation (MEDIUM) - Accepted malformed URLs
5.  No warnings for unlimited scraping (LOW) - Silent config issues
6.  No url_patterns validation (MEDIUM) - No selector structure checks

## Changes Made

### Phase 1: Validation Logic (server.py lines 1224-1380)
- Added ConfigValidator import with graceful degradation
- Replaced basic validation (3 fields) with comprehensive ConfigValidator.validate()
- Enhanced category detection for unified multi-source configs
- Added validation warnings collection (unlimited scraping, missing max_pages)
- Updated GitHub issue template with:
  * Config format type (Unified vs Legacy)
  * Validation warnings section
  * Updated documentation URL handling for unified configs
  * Checklist showing "Config validated with ConfigValidator"

### Phase 2: Test Coverage (test_mcp_server.py lines 617-769)
Added 8 comprehensive test cases:
1. test_submit_config_requires_token - GitHub token requirement
2. test_submit_config_validates_required_fields - Required field validation
3. test_submit_config_validates_name_format - Name format validation
4. test_submit_config_validates_url_format - URL format validation
5. test_submit_config_accepts_legacy_format - Legacy config acceptance
6. test_submit_config_accepts_unified_format - Unified config acceptance
7. test_submit_config_from_file_path - File path input support
8. test_submit_config_detects_category - Category auto-detection

### Phase 3: Documentation Updates
- Updated Issue #11 with completion notes
- Updated tool description to mention format support
- Updated CHANGELOG.md with fix details
- Added EVOLUTION_ANALYSIS.md for deep architecture analysis

## Validation Improvements

### Before:
```python
required_fields = ["name", "description", "base_url"]
missing_fields = [field for field in required_fields if field not in config_data]
if missing_fields:
    return error
```

### After:
```python
validator = ConfigValidator(config_data)
validator.validate()  # Comprehensive validation:
  # - Name format (alphanumeric, hyphens, underscores only)
  # - URL formats (must start with http:// or https://)
  # - Selectors structure (dict with proper keys)
  # - Rate limits (non-negative numbers)
  # - Max pages (positive integer or -1)
  # - Supports both legacy AND unified formats
  # - Provides detailed error messages with examples
```

## Test Results
 All 427 tests passing (no regressions)
 8 new tests for submit_config_tool
 No breaking changes

## Files Modified
- src/skill_seekers/mcp/server.py (157 lines changed)
- tests/test_mcp_server.py (157 lines added)
- CHANGELOG.md (12 lines added)
- EVOLUTION_ANALYSIS.md (500+ lines, new file)

## Issue Resolution
Closes #11 - A1.3 now fully implemented with comprehensive validation,
test coverage, and support for both config formats.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 18:32:20 +03:00
yusyus
cbacdb0e66 release: v2.1.1 - GitHub Repository Analysis Enhancements
Major improvements:
- Configurable directory exclusions (Issue #203)
- Unlimited local repository analysis
- Skip llms.txt option (PR #198)
- 10+ bug fixes for GitHub scraper
- Test suite expanded to 427 tests

See CHANGELOG.md for full details.
2025-11-30 12:22:28 +03:00
yusyus
bd2b201aa5 docs: Update all documentation for v2.1.0 release
Updates across all major documentation files to reflect v2.1.0 release
status and recent completions.

Changes:
- CLAUDE.md:
  * Updated version from v2.0.0 to v2.1.0
  * Updated date to November 29, 2025
  * Updated test count from 391 to 427
  * Moved completed PRs (#195, #198) and Issue #203 to "Completed" section
  * Updated "Next Up" priorities

- README.md:
  * Updated version badge from 2.0.0 to 2.1.0
  * Updated test badge from 379 to 427 passing

- CHANGELOG.md:
  * Added Issue #203 (Configurable EXCLUDED_DIRS) to Unreleased section
  * Documented 19 comprehensive tests for exclude_dirs feature
  * Listed both extend and replace modes

- FUTURE_RELEASES.md:
  * Marked v2.1.0 as "Released" (November 29, 2025)
  * Moved "Fix 12 unified tests" to completed
  * Updated release schedule table

- FLEXIBLE_ROADMAP.md:
  * Updated current status from v1.0.0 to v2.1.0
  * Added latest release date
  * Expanded "What Works" section with new features
  * Updated test count to 427

All documentation now accurately reflects:
- v2.1.0 release status 
- 427 tests passing (up from 391)   - Issue #203 completion 
- PR #195 and #198 merged status 

Related: #203
2025-11-30 01:06:21 +03:00
yusyus
58ec69eb52 feat: Add unlimited local repository analysis with bug fixes (PR #195)
Merges PR #195 by @jimmy058910 with conflict resolution.

**New Features:**
- Local repository analysis via `local_repo_path` configuration
- Bypass GitHub API rate limits (50 → unlimited files)
- Auto-exclusion of virtual environments and build artifacts
- Support for analyzing large codebases (323 files vs 50 before)

**Improvements:**
- Code analysis coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST parsing errors: 95 → 0 (-100%)

**Conflict Resolution:**
- Preserved logger initialization fix from development (Issue #190)
- Kept relative imports from development (Task 1.2 fix)
- Integrated EXCLUDED_DIRS and local repo features from PR
- Combined best of both implementations

**Testing:**
-  All 22 GitHub scraper tests passing
-  Syntax validation passed
-  Local repo analysis feature intact
-  Bug fixes from development preserved

Original implementation by @jimmy058910 in PR #195.
Conflict resolution preserves all bug fixes while adding local repo feature.

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
2025-11-29 22:46:31 +03:00
yusyus
998be0d2dd fix: Update setup_mcp.sh for v2.0.0 src/ layout + test fixes (#201)
Merges setup_mcp.sh fix for v2.0.0 src/ layout + test updates.

Original fix by @501981732 in PR #197.
Test updates to make CI pass.

Closes #192
2025-11-29 21:34:51 +03:00
Jimmy Moceri
0b2a0d121e feat: Add unlimited local repository analysis and fix 10 critical bugs
Features:
- Add local_repo_path config parameter for unlimited file analysis
- Auto-exclude virtual environments and build artifacts (95% noise reduction)
- Enable comprehensive codebase analysis (50 → 323 files, 546% increase)

Bug Fixes:
- Fix logger initialization error (Issue #190)
- Fix NoneType subscriptable errors in release tag parsing (3 instances)
- Fix relative import paths causing ModuleNotFoundError
- Fix hardcoded 50-file analysis limit
- Fix GitHub API file tree limitation (140 → 345 files discovered)
- Fix AST parser 'not iterable' errors (95 → 0 parsing failures)
- Fix virtual environment file pollution (23,341 → 1,109 file tree items)
- Fix force_rescrape flag not checked before interactive prompt

Impact:
- Code coverage: 14% → 93.6% (+79.6pp)
- Files analyzed: 50 → 323 (+546%)
- Classes extracted: 55 → 585 (+964%)
- Functions extracted: 512 → 2,784 (+444%)
- AST errors: 95 → 0 (-100%)

Tested on JMo Security repository with 345 Python files.
2025-11-16 22:35:23 -05:00
yusyus
88dce89adf docs: Update documentation for v2.0.0 PyPI release
README.md:
- Add PyPI badges (version, downloads, python version)
- Update test count from 299 to 379 passing tests
- Add prominent 'Now Available on PyPI!' callout section
- Reorder installation options (pip as Option 1, uv as Option 2)
- Add links to Quick Start and Bulletproof guides
- Emphasize PyPI as the recommended installation method

CHANGELOG.md:
- Add comprehensive v2.0.0 release entry (dated 2025-11-11)
- Document PyPI publication as major milestone
- Detail modern Python packaging changes
- Include unified CLI interface documentation
- Add migration guide for users and developers
- List all breaking changes and deprecations
- Document 379 passing tests and import fixes

FUTURE_RELEASES.md (NEW):
- Create roadmap document for upcoming releases
- Plan v2.1.0 (Dec 2025): Test coverage & quality improvements
- Plan v2.2.0 (Q1 2026): Web presence & community growth
- Plan v2.3.0 (Q2 2026): Developer experience & integrations
- Long-term vision for v3.0+
- Community contribution guidelines
- Release schedule and priority system

🚀 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 22:27:04 +03:00
yusyus
66b7f9c4f6 chore: Bump version to v1.3.0
Update version numbers across project for v1.3.0 release:
- CHANGELOG.md: Move [Unreleased] → [1.3.0] - 2025-10-26
- README.md: Update version badge 1.2.0 → 1.3.0
- cli/__init__.py: Update __version__ = "1.3.0"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 13:16:54 +03:00
yusyus
319331f5a6 feat: Complete refactoring with async support, type safety, and package structure
This comprehensive refactoring improves code quality, performance, and maintainability
while maintaining 100% backwards compatibility.

## Major Features Added

### 🚀 Async/Await Support (2-3x Performance Boost)
- Added `--async` flag for parallel scraping using asyncio
- Implemented `scrape_page_async()` with httpx.AsyncClient
- Implemented `scrape_all_async()` with asyncio.gather()
- Connection pooling for better resource management
- Performance: 18 pg/s → 55 pg/s (3x faster)
- Memory: 120 MB → 40 MB (66% reduction)
- Full documentation in ASYNC_SUPPORT.md

### 📦 Python Package Structure (Phase 0 Complete)
- Created cli/__init__.py for clean imports
- Created skill_seeker_mcp/__init__.py (renamed from mcp/)
- Created skill_seeker_mcp/tools/__init__.py
- Proper package imports: `from cli import constants`
- Better IDE support and autocomplete

### ⚙️ Centralized Configuration
- Created cli/constants.py with 18 configuration constants
- DEFAULT_ASYNC_MODE, DEFAULT_RATE_LIMIT, DEFAULT_MAX_PAGES
- Enhancement limits, categorization scores, file limits
- All magic numbers now centralized and configurable

### 🔧 Code Quality Improvements
- Converted 71 print() statements to proper logging
- Added type hints to all DocToSkillConverter methods
- Fixed all mypy type checking issues
- Installed types-requests for better type safety
- Code quality: 5.5/10 → 6.5/10

## Testing
- Test count: 207 → 299 tests (92 new tests)
- 11 comprehensive async tests (all passing)
- 16 constants tests (all passing)
- Fixed test isolation issues
- 100% pass rate maintained (299/299 passing)

## Documentation
- Updated README.md with async examples and test count
- Updated CLAUDE.md with async usage guide
- Created ASYNC_SUPPORT.md (292 lines)
- Updated CHANGELOG.md with all changes
- Cleaned up temporary refactoring documents

## Cleanup
- Removed temporary planning/status documents
- Moved test_pr144_concerns.py to tests/ folder
- Updated .gitignore for test artifacts
- Better repository organization

## Breaking Changes
None - all changes are backwards compatible.
Async mode is opt-in via --async flag.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 13:05:39 +03:00
Edgar I.
0e3f0c6375 docs: update status for Phase 1 completion 2025-10-24 18:28:30 +04:00
yusyus
394eab218e Add PDF Advanced Features (v1.2.0)
Priority 2 & 3 Features Implemented:
- OCR support for scanned PDFs (pytesseract + Pillow)
- Password-protected PDF support
- Complex table extraction
- Parallel page processing (3x faster)
- Intelligent caching (50% faster re-runs)

Testing:
- New test file: test_pdf_advanced_features.py (26 tests)
- Updated test_pdf_extractor.py (23 tests)
- Updated test_pdf_scraper.py (18 tests)
- Total: 49/49 PDF tests passing (100%)
- Overall: 142/142 tests passing (100%)

Documentation:
- Added docs/PDF_ADVANCED_FEATURES.md (580 lines)
- Updated CHANGELOG.md with v1.1.0 and v1.2.0
- Updated README.md version badges and features
- Updated docs/TESTING.md with new test counts

Dependencies:
- Added Pillow==11.0.0
- Added pytesseract==0.3.13

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-23 21:43:05 +03:00
yusyus
517ed46338 Add project infrastructure and documentation
Infrastructure:
- Add GitHub Actions workflows (tests.yml, release.yml)
- Add CHANGELOG.md with full version history
- Add CONTRIBUTING.md with contribution guidelines
- Add RELEASE_NOTES_v1.0.0.md for v1.0.0 release

Documentation:
- Update README.md with version badge (v1.0.0)
- Update test count badge (14 tests)
- Add links to new documentation files

Features:
- CI/CD pipeline with automated testing
- Multi-OS testing (Ubuntu, macOS)
- Multi-Python version testing (3.7-3.11)
- Automated release creation on tag push
- Code coverage reporting

This completes the v1.0.0 production release setup.
2025-10-19 22:37:55 +03:00