- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if entire document is smaller than target size
- All 15 tests passing (100% pass rate)
Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were
being created despite min_chunk_size=100 setting.
Test: pytest tests/test_rag_chunker.py -v
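A minimal sketch of the filtering rule above. It assumes chunks have already been split and approximates token counts with whitespace-separated words. `filter_small_chunks` and `count_tokens` are hypothetical names, not the chunker's actual API.

```python
def count_tokens(text: str) -> int:
    # Hypothetical approximation: one token per whitespace-separated word.
    return len(text.split())


def filter_small_chunks(chunks: list[str], document: str,
                        target_size: int, min_chunk_size: int = 100) -> list[str]:
    # Exception: a document smaller than the target chunk size cannot
    # produce a full-sized chunk, so every chunk is kept as-is.
    if count_tokens(document) < target_size:
        return list(chunks)
    # Otherwise drop fragments below the minimum size (e.g. "Short.").
    return [c for c in chunks if count_tokens(c) >= min_chunk_size]
```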
## Problem
Framework detection was broken because files with only imports (no
classes/functions) were excluded from analysis. The architectural pattern
detector received empty file lists, resulting in 0 frameworks detected.
## Root Cause
In codebase_scraper.py:873-881, the has_content check filtered out files
that didn't have classes, functions, or other structural elements. This
excluded simple __init__.py files that only contained import statements,
which are critical for framework detection.
## Solution (3 parts)
1. **Extract imports from Python files** (code_analyzer.py:140-178)
- Added import extraction using AST (ast.Import, ast.ImportFrom)
- Returns imports list in analysis results
- Now captures: "from flask import Flask" → ["flask"]
2. **Include import-only files** (codebase_scraper.py:873-881)
- Updated has_content check to include files with imports
- Files with imports are now included in analysis results
- Comment added: "IMPORTANT: Include files with imports for framework
detection (fixes #239)"
3. **Enhance framework detection** (architectural_pattern_detector.py:195-240)
- Extract imports from all Python files in analysis
- Check imports in addition to file paths and directory structure
- Prioritize import-based detection (high confidence)
- Require 2+ matches for path-based detection (avoid false positives)
- Added debug logging: "Collected N imports for framework detection"
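The three parts above can be sketched with the standard `ast` module. This is a minimal illustration, not the code in code_analyzer.py or architectural_pattern_detector.py. The `FRAMEWORK_IMPORTS` and `FRAMEWORK_PATHS` tables, the function names, and the confidence labels are all hypothetical.

```python
import ast


def extract_imports(source: str) -> list[str]:
    """Return top-level module names imported by a Python source file."""
    modules: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import flask.views" -> "flask"
            modules.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from flask import Flask" -> "flask"; relative imports are skipped
            modules.append(node.module.split(".")[0])
    return sorted(set(modules))


def has_content(analysis: dict) -> bool:
    # Import-only files (e.g. a package __init__.py) now count as content.
    return bool(analysis.get("classes") or analysis.get("functions")
                or analysis.get("imports"))


# Hypothetical lookup tables for illustration only.
FRAMEWORK_IMPORTS = {"Flask": {"flask"}, "Django": {"django"}}
FRAMEWORK_PATHS = {"Django": {"manage.py", "settings.py", "urls.py"}}


def detect_frameworks(imports: set[str], paths: list[str]) -> dict[str, str]:
    detected: dict[str, str] = {}
    # Import-based detection takes priority and is treated as high confidence.
    for name, modules in FRAMEWORK_IMPORTS.items():
        if imports & modules:
            detected[name] = "high"
    # Path-based detection needs 2+ matching markers to avoid false positives.
    basenames = {p.rsplit("/", 1)[-1] for p in paths}
    for name, markers in FRAMEWORK_PATHS.items():
        if name not in detected and len(basenames & markers) >= 2:
            detected[name] = "medium"
    return detected
```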
## Results
**Before fix:**
- Test Flask project: 0 files analyzed, 0 frameworks detected
- Files with imports: excluded from analysis
- Framework detection: completely broken
**After fix:**
- Test Flask project: 3 files analyzed, Flask detected ✅
- Files with imports: included in analysis
- Framework detection: working correctly
- No false-positive detections (e.g., ASP.NET, Rails)
## Testing
Added comprehensive test suite (tests/test_framework_detection.py):
- ✅ test_flask_framework_detection_from_imports
- ✅ test_files_with_imports_are_included
- ✅ test_no_false_positive_frameworks
All tests pass:
- ✅ 38 tests in test_codebase_scraper.py
- ✅ 54 tests in test_code_analyzer.py
- ✅ 3 new tests in test_framework_detection.py
## Impact
- Fixes issue #239 completely
- Framework detection now works for Python projects
- Import-only files (common in Python packages) are properly analyzed
- Negligible performance impact expected (import extraction is a lightweight AST pass)
- No breaking changes to existing functionality
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
The analyze command created duplicate documentation directories:
- output/skill-seekers/documentation/ (1.5MB) - Not referenced
- output/skill-seekers/references/documentation/ (1.5MB) - Referenced
This wasted 1.5MB per skill (50% duplication).
Root Cause:
_generate_references() copied directories to references/ but never
cleaned up the source directories.
Solution:
After copying each directory to references/, immediately remove the
source directory using shutil.rmtree(). SKILL.md only references
references/{target}, making the source directories redundant.
Changes:
- Add cleanup in _generate_references() after each copytree operation
- Add 2 comprehensive tests to verify no duplicate directories
- Test results: 38/38 tests passing in test_codebase_scraper.py
Impact:
- Saves about 1.5MB per skill in the reported case (actual savings depend on documentation size)
- Prevents 50% duplication of all analysis output directories
- Clean, efficient disk usage
Tests Added:
- test_no_duplicate_directories_created: Verifies source cleanup
- test_no_disk_space_wasted: Verifies single copy in references/
Reported by: @yangshare via Issue #279
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive integration tests using the exact MikroORM URLs that
caused 404 errors in the original bug report.
Test Coverage (6 integration tests):
1. test_mikro_orm_urls_from_issue_277
- Tests exact URLs from the bug report
- Verifies no malformed anchor fragments in results
- Validates deduplication and correct URL transformation
2. test_no_404_causing_urls_generated
- Verifies no URLs matching the 404 error pattern are generated
- Tests all problematic patterns from the issue
3. test_deduplication_prevents_multiple_requests
- Validates that multiple anchors on same page deduplicate correctly
- Avoids redundant requests for the same page
4. test_md_files_with_anchors_preserved
- Tests .md files with anchors are handled correctly
- Verifies anchor stripping on .md URLs
5. test_real_scraping_scenario_no_404s
- Integration test simulating full llms.txt parsing flow
- Validates URL structure with regex patterns
6. test_issue_277_error_message_urls
- Tests the exact malformed URLs from error output
- Verifies correct URLs are generated instead
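The behavior these tests check (anchors stripped before fetching, one request per page) can be sketched with `urllib.parse.urldefrag`. The sketch assumes the fix amounts to removing the `#fragment` and deduplicating in order. `normalize_urls` is a hypothetical name, and the example URLs are illustrative, not the exact ones from issue #277.

```python
from urllib.parse import urldefrag


def normalize_urls(urls: list[str]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        # Drop "#section" so every anchor on a page maps to the same URL.
        base, _fragment = urldefrag(url)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result
```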
Results:
- 18/18 tests passing (12 unit + 6 integration)
- All MikroORM URLs from issue #277 handled correctly
- No 404-causing patterns generated
Related: #277
Thank you @PaawanBarach for this excellent contribution! 🎉
Adds pattern-based language detection for 7 new programming languages with comprehensive test coverage.
✅ 70 regex patterns with smart weight distribution
✅ Framework-specific patterns (Flutter, case classes, mixins)
✅ 7 new tests, all passing (30/30 total)
✅ No regressions, backward compatible
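Weighted pattern-based detection can be sketched as a score per language: each matching regex adds its weight, and the highest total wins if it clears a threshold. The two languages and their patterns below are hypothetical examples echoing the Flutter and case-class cues mentioned above. They are not the 70 patterns that were added.

```python
import re
from typing import Optional

# Hypothetical (pattern, weight) tables; framework-specific cues weigh more.
PATTERNS: dict[str, list[tuple[str, int]]] = {
    "dart": [
        (r"\bextends\s+StatelessWidget\b", 5),  # Flutter widget
        (r"\bwith\s+\w+Mixin\b", 3),  # mixin application
        (r"\bvoid\s+main\s*\(\s*\)", 1),
    ],
    "scala": [
        (r"\bcase\s+class\s+\w+", 5),
        (r"\bobject\s+\w+\s+extends\s+App\b", 3),
        (r"\bdef\s+\w+\s*\(.*\)\s*:\s*\w+", 2),
    ],
}


def detect_language(code: str, min_score: int = 3) -> Optional[str]:
    scores = {
        lang: sum(weight for pattern, weight in rules if re.search(pattern, code))
        for lang, rules in PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    # A minimum score keeps weak, generic matches from deciding the result.
    return best if scores[best] >= min_score else None
```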
This resolves #165 and significantly expands our language support!
- Update version checks in test_package_structure.py from 2.8.0 to 2.9.0
- Update version check in test_cli_paths.py from 2.8.0 to 2.9.0
- Remove trailing whitespace from blank lines in code_analyzer.py (lines 1436-1504)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Thanks @franklegolasyoung for the excellent work on the core fixes for issues #267, #242, and #260! 🙏
Your comprehensive approach to fixing PDF processing, expanding workflow detection, and improving the Chinese README documentation is much appreciated. I've added code quality fixes and comprehensive tests to ensure everything passes CI.
All 1266+ tests are now passing, and the issues are resolved! 🎉
- Remove SPYKE-related client documentation files
- Fix critical ruff linter errors:
- Remove unused 'os' import in test_analyze_e2e.py
- Remove unused 'setups' variable in test_test_example_extractor.py
    - Prefix unused output_dir parameter with an underscore in codebase_scraper.py
- Fix import sorting in test_integration.py
- Update CHANGELOG.md with comprehensive PR #272 feature documentation
These changes were part of PR #272 cleanup but didn't make it into the squash merge.
Complete implementation of C3.9, granular AI enhancement control, performance optimizations, and bug fixes.
Features:
- C3.9 Project Documentation Extraction (markdown files)
- Granular AI enhancement control (--enhance-level 0-3)
- C# test extraction support
- 6-12x faster LOCAL mode with parallel execution
- Auto-enhancement UX improvements
- LOCAL mode fallback for all AI enhancements
Bug Fixes:
- C# language support
- Config type field compatibility
- LocalSkillEnhancer import
Documentation:
- Updated CHANGELOG.md
- Updated CLAUDE.md
- Removed client-specific files
Tests: All 1,257 tests passing
Critical linter errors: Fixed
- Scan ALL .md files in project (README, docs/, etc.)
- Smart categorization by folder/filename (overview, architecture, guides, etc.)
- Processing depth: surface=raw copy, deep=parse+summarize, full=AI-enhanced
- AI enhancement at level 2+ adds topic extraction and cross-references
- New "Project Documentation" section in SKILL.md with summaries
- Output to references/documentation/ organized by category
- Default ON, use --skip-docs to disable
- Add skip_docs parameter to MCP scrape_codebase_tool
- Add 15 new tests for markdown documentation features
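Categorization by folder and filename might look like the sketch below. The category names come from the list above; the keyword rules in `CATEGORY_RULES` and the `categorize_markdown` helper are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical keyword rules, checked in order against folder and file names.
CATEGORY_RULES: list[tuple[str, tuple[str, ...]]] = [
    ("overview", ("readme", "overview", "introduction")),
    ("architecture", ("architecture", "design")),
    ("guides", ("guide", "tutorial", "how-to", "howto")),
]


def categorize_markdown(path: str) -> str:
    p = PurePosixPath(path.lower())
    # Compare every folder component plus the file stem against the keywords.
    parts = [*p.parent.parts, p.stem]
    for category, keywords in CATEGORY_RULES:
        if any(k in part for part in parts for k in keywords):
            return category
    return "other"
```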
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bug fixes:
- Fix KeyError in config_enhancer.py where "config_type" was expected but
config_extractor saves as "type". Now supports both field names for
backward compatibility.
- Fix settings "value_type" vs "type" mismatch in the same file.
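The backward-compatible lookup can be sketched as a helper that returns the first key present. `get_field` and the shape of the example `config` dict are hypothetical; the key names come from the fixes above.

```python
from typing import Any


def get_field(data: dict[str, Any], *keys: str, default: Any = None) -> Any:
    # Return the value of the first key that is present.
    for key in keys:
        if key in data:
            return data[key]
    return default


# Hypothetical config as saved by config_extractor (uses "type").
config = {"type": "json", "settings": [{"key": "port", "type": "int"}]}
config_type = get_field(config, "config_type", "type", default="unknown")
value_types = [get_field(s, "value_type", "type") for s in config["settings"]]
```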
New features:
- Add C# support for regex-based test example extraction
- Add language alias mapping (C# -> csharp, C++ -> cpp)
- Enhanced C# patterns for NUnit, xUnit, MSTest test frameworks
- Support for mock patterns (NSubstitute, Moq)
- Support for Zenject dependency injection patterns
- Support for setup/teardown method extraction
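The alias mapping and test-framework detection could be as simple as a normalization table plus an attribute regex. The two aliases come from the list above; `normalize_language`, `CSHARP_TEST_ATTR`, and the lowercase fallback are assumptions. The attribute names are the standard ones for NUnit (`[Test]`, `[TestCase]`), xUnit (`[Fact]`, `[Theory]`), and MSTest (`[TestMethod]`).

```python
import re

LANGUAGE_ALIASES = {"c#": "csharp", "c++": "cpp"}

# Matches NUnit, xUnit, and MSTest test-method attributes.
CSHARP_TEST_ATTR = re.compile(r"\[\s*(Test|TestCase|Fact|Theory|TestMethod)\b")


def normalize_language(name: str) -> str:
    key = name.strip().lower()
    # Unknown names fall through unchanged (lowercased).
    return LANGUAGE_ALIASES.get(key, key)
```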
Tests:
- Add 2 new C# test extraction tests (NUnit tests, mock patterns)
- All 1257 tests pass (165 skipped)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update Chinese README (README.zh-CN.md) with new preset flags
- Update docs/features/*.md (PATTERN_DETECTION, HOW_TO_GUIDES, BOOTSTRAP_SKILL_TECHNICAL)
- Update scripts/bootstrap_skill.sh to use 'skill-seekers analyze'
- Update scripts/skill_header.md command examples
- Update tests/test_bootstrap_skill.py assertions
- Fix CHANGELOG.md historical entry with correct command name
All references to 'skill-seekers-codebase' updated to 'skill-seekers analyze'
except where needed for backward compatibility (pyproject.toml, E2E tests).
Related to Phase 1 implementation from previous commits.
Fixes 2 failing integration tests to match current validation behavior:
1. test_load_config_with_validation_errors:
- Legacy validator is intentionally lenient for backward compatibility
- Only validates presence of fields, not format
- Updated test to use a config that is truly invalid (missing all type fields)
2. test_godot_config:
- godot.json uses unified format (sources array), not legacy format
- Old validate_config() expects legacy format with top-level base_url
- Updated to use ConfigValidator which supports both formats
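The distinction the second fix relies on can be sketched as a format check: a unified config carries a `sources` array, while a legacy config has a top-level `base_url`. `detect_config_format` is a hypothetical helper for illustration; the tests themselves use `ConfigValidator`.

```python
def detect_config_format(config: dict) -> str:
    # Unified configs list one or more sources; legacy configs describe one site.
    if isinstance(config.get("sources"), list):
        return "unified"
    if "base_url" in config:
        return "legacy"
    raise ValueError("config has neither 'sources' nor 'base_url'")
```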
Changes:
- Import ConfigValidator for unified format validation
- Fix test_load_config_with_validation_errors to trigger actual validation error
- Fix test_godot_config to use ConfigValidator instead of old validate_config
Test Results:
- Both previously failing tests now PASS ✅
- All 71 related tests PASS ✅
- No regressions introduced
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements Phase 1 of the codebase analysis UX improvement plan, making the
command discoverable and adding intuitive preset flags while maintaining 100%
backward compatibility.
New Features:
- Add 'analyze' subcommand to main CLI (skill-seekers analyze)
- Add --quick preset: Fast analysis (1-2 min, basic features only)
- Add --comprehensive preset: Full analysis (20-60 min, all features + AI)
- Add --enhance flag: Simple AI enhancement with auto-detection
- Improve help text with timing estimates and mode descriptions
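The preset flags can be sketched with `argparse` as mutually exclusive options that expand into lower-level settings. Only the flag names come from this commit; the `--depth` and `--ai-mode` values each preset selects, and the `build_parser` / `apply_presets` helpers, are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skill-seekers analyze")
    parser.add_argument("--directory", default=".")
    presets = parser.add_mutually_exclusive_group()
    presets.add_argument("--quick", action="store_true",
                         help="Fast analysis (1-2 min, basic features only)")
    presets.add_argument("--comprehensive", action="store_true",
                         help="Full analysis (20-60 min, all features + AI)")
    parser.add_argument("--enhance", action="store_true",
                        help="AI enhancement with auto-detection")
    # Hypothetical choices for the lower-level options.
    parser.add_argument("--depth", choices=["surface", "deep", "full"], default="deep")
    parser.add_argument("--ai-mode", choices=["auto", "none"], default="none")
    return parser


def apply_presets(args: argparse.Namespace) -> argparse.Namespace:
    # Hypothetical mapping from presets to lower-level options.
    if args.quick:
        args.depth, args.ai_mode = "surface", "none"
    elif args.comprehensive:
        args.depth, args.ai_mode = "full", "auto"
    if args.enhance:
        args.ai_mode = "auto"
    return args
```

In this sketch a preset overrides an explicit `--depth`; a real implementation might prefer the explicit flag instead.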
Files Modified:
- src/skill_seekers/cli/main.py: Add analyze subcommand (lines 15, 273-311, 542-589)
- src/skill_seekers/cli/codebase_scraper.py: Add preset logic and improve help text
- tests/test_analyze_command.py: NEW - 20 comprehensive tests
- tests/test_cli_paths.py: Fix version check (2.7.0 -> 2.7.2)
- tests/test_package_structure.py: Fix 4 version checks (2.7.0 -> 2.7.2)
- README.md: Update examples to use 'analyze' command
- CLAUDE.md: Update examples to use 'analyze' command
Test Results:
- 81 tests related to Phase 1: ALL PASSING ✅
- 20 new tests for analyze command: ALL PASSING ✅
- Zero regressions introduced
- 100% backward compatibility maintained
Backward Compatibility:
- Old 'skill-seekers-codebase' command still works
- All existing flags (--depth, --ai-mode, --skip-*) still functional
- No breaking changes
Usage Examples:
skill-seekers analyze --directory . --quick
skill-seekers analyze --directory . --comprehensive
skill-seekers analyze --directory . --enhance
Fixes #262 (codebase UX issues)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #264
Users reported that preset configs (react.json, godot.json, etc.) were not
found after installing via pip/uv, causing immediate failure on first use.
Solution: Instead of bundling configs in the package, the CLI now automatically
fetches missing configs from the SkillSeekersWeb.com API.
Changes:
- Created config_fetcher.py with smart config resolution:
1. Check local path (backward compatible)
2. Check with configs/ prefix
3. Auto-fetch from SkillSeekersWeb.com API (new!)
- Updated doc_scraper.py to use ConfigValidator (supports unified configs)
- Added 15 comprehensive tests for auto-fetch functionality
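The three-step resolution order can be sketched as follows. The remote step is injected as a `fetch` callable because the SkillSeekersWeb.com endpoint and response format are not described here. `resolve_config`, the `fetch` parameter, and writing the cache into `configs/` are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def resolve_config(name: str, fetch: Callable[[str], Optional[dict]],
                   configs_dir: Path = Path("configs")) -> Path:
    # 1. Local path as given (backward compatible).
    candidate = Path(name)
    if candidate.exists():
        return candidate
    # 2. Same file name under the configs/ directory.
    prefixed = configs_dir / candidate.name
    if prefixed.exists():
        return prefixed
    # 3. Fall back to the remote API and cache the result locally.
    data = fetch(candidate.stem)
    if data is None:
        raise FileNotFoundError(f"config {name!r} not found locally or remotely")
    configs_dir.mkdir(parents=True, exist_ok=True)
    prefixed.write_text(json.dumps(data, indent=2))
    return prefixed
```

Because step 3 writes into `configs/`, a second call for the same preset resolves at step 2 without another request.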
User Experience:
- Zero configuration needed - presets work immediately after install
- Better error messages showing available configs from API
- Downloaded configs are cached locally for future use
- Fully backward compatible with existing local configs
Testing:
- 15 new unit tests (all passing)
- 2 integration tests with real API
- Full test suite: 1387 tests passing
- No breaking changes
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Critical Bugs Fixed
### 1. UnboundLocalError in AI Enhancement Modules (BLOCKING)
**Issue**: Duplicate `import os` statements inside conditional blocks caused
UnboundLocalError when accessing os.environ before the import was reached.
**Files Fixed**:
- src/skill_seekers/cli/guide_enhancer.py (lines 92, 112)
- src/skill_seekers/cli/ai_enhancer.py (line 77)
- src/skill_seekers/cli/config_enhancer.py (line 82)
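**Root Cause**: `os` was already imported at the top of each file, but the
re-import inside a conditional block made `os` a local name for the entire
enclosing function, so any use of `os` before that import line executed
raised UnboundLocalError.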
**Solution**: Removed duplicate import statements - os is already available
from the top-level import.
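The failure mode is easy to reproduce in isolation. This minimal example is not code from the affected modules, and the environment variable name is illustrative. It shows that an `import os` anywhere in a function body makes `os` local to the whole function, so reading `os.environ` before that line runs raises UnboundLocalError even though `os` is imported at module level.

```python
import os


def broken_lookup():
    # `os` is local to this function because of the import below,
    # so this line raises UnboundLocalError.
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if api_key is None:
        import os  # noqa: F811  (the redundant re-import that triggers the bug)
    return api_key


def fixed_lookup():
    # With the inner import removed, `os` resolves to the module-level import.
    return os.environ.get("ANTHROPIC_API_KEY")
```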
**Impact**: Fixed 30 failing guide_enhancer tests
### 2. PDF Scraper Test Expectations (BREAKING CHANGE)
**Issue**: Tests expected the old keyword-based categorization behavior, but the PR
introduced a new single-file strategy for single PDF sources.
**Files Fixed**:
- tests/test_pdf_scraper.py (5 tests updated)
**Tests Updated**:
1. test_categorize_by_keywords
2. test_build_skill_creates_reference_files
3. test_code_blocks_included_in_references
4. test_high_quality_code_preferred
5. test_image_references_in_markdown
**Solution**: Updated test expectations to match new single-file strategy
behavior (single PDF → single category named after PDF basename).
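The single-file strategy the updated tests expect can be sketched as: with exactly one PDF source, all pages go into one category named after the file (assumed here to be the basename without the `.pdf` extension); otherwise keyword-based categorization applies. `categorize_pages` and the `keyword_categorize` fallback are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def categorize_pages(
    pdf_paths: list[str],
    pages: list[dict],
    keyword_categorize: Callable[[list[dict]], dict[str, list[dict]]],
) -> dict[str, list[dict]]:
    if len(pdf_paths) == 1:
        # Single PDF -> single category, e.g. "docs/manual.pdf" -> "manual".
        return {Path(pdf_paths[0]).stem: pages}
    # Multiple sources keep the keyword-based behavior.
    return keyword_categorize(pages)
```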
**Impact**: Fixed 5 failing PDF scraper tests
## Test Results
**Before Fixes**: 35 tests failing
**After Fixes**: 130 tests passing, 5 skipped ✅
### Tested Modules:
- ✅ PDF scraper (18 tests)
- ✅ Guide enhancer (30 tests)
- ✅ All adaptors (82 tests)
## Verification
```bash
pytest tests/test_pdf_scraper.py tests/test_guide_enhancer.py tests/test_adaptors/ -v
# Result: 130 passed, 5 skipped in 1.11s
```
## Notes
The original PR features (GLM-4.7 support + PDF scraper improvements) are
excellent and working correctly. These fixes only address the import scoping
bug introduced during implementation and update tests for the new behavior.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This PR modernizes the MCP setup with comprehensive improvements:
**Key Improvements:**
✅ Virtual environment auto-detection (venv, .venv, $VIRTUAL_ENV)
✅ Module-based imports (python -m skill_seekers.mcp.server_fastmcp)
✅ Eliminates 'module not found' errors from missing dependencies
✅ No need for --break-system-packages or global installs
✅ Clean project isolation with venv
✅ Prepares for v3.0.0, in which server.py will be removed
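Virtual environment auto-detection followed by a module-based launch command might look like the sketch below. Only the three locations and the module path come from this description; the lookup order ($VIRTUAL_ENV first, then venv/ and .venv/) and the `find_venv_python` / `mcp_server_command` helpers are assumptions.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def find_venv_python(project_dir: Path,
                     environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    candidates = []
    if environ.get("VIRTUAL_ENV"):
        candidates.append(Path(environ["VIRTUAL_ENV"]))
    candidates += [project_dir / "venv", project_dir / ".venv"]
    for venv in candidates:
        # POSIX layout uses bin/python; Windows uses Scripts/python.exe.
        for rel in ("bin/python", "Scripts/python.exe"):
            python = venv / rel
            if python.exists():
                return python
    return None


def mcp_server_command(project_dir: Path,
                       environ: Mapping[str, str] = os.environ) -> list[str]:
    python = find_venv_python(project_dir, environ) or Path(sys.executable)
    # Launching by module name avoids depending on the server file's path.
    return [str(python), "-m", "skill_seekers.mcp.server_fastmcp"]
```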
**Bug Fixes:**
🐛 Fixed 41 instances of server_fastmcp_fastmcp → server_fastmcp typo
🐛 Updated tests to accept -e ".[mcp]" format
🐛 Updated tests for module reference format
**Files Changed:** 13 files (+312/-154 lines)
**Testing:** All 1386 tests passing (verified)
Co-Authored-By: MiaoDX <miaodx@hotmail.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed 2 test assertions to match PR #252 improvements:
1. test_requirements_txt_path:
- Now accepts '-e ".[mcp]"' format with MCP extra dependencies
- Previously only accepted '-e .' format
2. test_json_config_path_format:
- Now checks for module reference 'skill_seekers.mcp.server_fastmcp'
- Previously checked for file path 'server_fastmcp.py'
These changes align tests with the modern module import approach
introduced in PR #252 for better venv compatibility.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The anthropic import exists only to check whether the package is installed; the
module is never used directly. Added # noqa: F401 comment to suppress 'imported but unused' warning.
Fixes GitHub Actions ruff linting failure.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add ANTHROPIC_AVAILABLE check at module level
- Skip TestIssue219Problem3CustomAPIEndpoints when anthropic not installed
- Skip TestIssue219IntegrationAll when anthropic not installed
This fixes 4 test failures when the optional anthropic package is not installed.
The tests now properly skip instead of failing with SystemExit.
Fixes pre-existing test failures unrelated to documentation work.
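The two commits above follow a common optional-dependency pattern: probe the import once at module level (suppressing F401, since the name is otherwise unused), then skip dependent tests when it is missing. A minimal sketch using `unittest.skipUnless`, which pytest also honors; the test class and method are hypothetical placeholders, not the real TestIssue219 classes.

```python
import unittest

try:
    import anthropic  # noqa: F401  (imported only to check availability)

    ANTHROPIC_AVAILABLE = True
except ImportError:
    ANTHROPIC_AVAILABLE = False


@unittest.skipUnless(ANTHROPIC_AVAILABLE, "anthropic package not installed")
class TestCustomAPIEndpoint(unittest.TestCase):
    def test_client_class_exists(self):
        # Hypothetical placeholder; runs only when anthropic is importable.
        self.assertTrue(hasattr(anthropic, "Anthropic"))
```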
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removed unused tmp_path fixture parameter to fix ruff ARG002 error:
- Line 54: test_bootstrap_script_runs now only takes project_root
The test doesn't use tmp_path - it runs bootstrap in project_root
and checks output/skill-seekers/ directory.
Fixes ruff error:
ARG002 Unused method argument: `tmp_path`
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed _tmp_path to tmp_path to fix a pytest fixture error (pytest looks up
fixtures by parameter name, so `_tmp_path` matched no fixture):
- Line 54: test_bootstrap_script_runs fixture parameter
Error was:
fixture '_tmp_path' not found
available fixtures: ..., tmp_path, ...
This was causing 1 ERROR in CI test runs across all Python versions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed incorrect variable names in list comprehensions that were causing
NameError in CI (Python 3.11/3.12):
Critical fixes:
- tests/test_markdown_parsing.py: 'l' → 'link' in list comprehension
- src/skill_seekers/cli/pdf_extractor_poc.py: 'l' → 'line' (2 occurrences)
Additional auto-lint fixes:
- Removed unused imports in llms_txt_downloader.py, llms_txt_parser.py
- Fixed comparison operators in config files
- Fixed list comprehension in other files
All tests now pass in CI.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes 5 additional failing tests in test_real_world_fastmcp.py with the
same stdin reading issue.
All tests now use interactive=False when creating GitHubThreeStreamFetcher
or calling UnifiedCodebaseAnalyzer.analyze() to prevent stdin prompts
during test execution.
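The `interactive` flag guards any prompt that would otherwise block on stdin. A minimal sketch of the pattern, using a hypothetical `confirm` helper rather than the actual GitHubThreeStreamFetcher or UnifiedCodebaseAnalyzer code:

```python
def confirm(prompt: str, interactive: bool = True, default: bool = False) -> bool:
    # Non-interactive callers (tests, CI) never read from stdin.
    if not interactive:
        return default
    answer = input(f"{prompt} [y/N] ").strip().lower()
    return answer in ("y", "yes")
```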
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>