- API mode: For pattern/example enhancement (batch processing)
- LOCAL mode: For SKILL.md enhancement (opens Claude Code terminal)
- Both modes still available, serve different purposes
- Updated CHANGELOG to explain when to use each mode
BREAKING CHANGE: All codebase analysis features are now enabled by default
This improves user experience by maximizing value out-of-the-box. Users
now get all analysis features (API reference, dependency graph, pattern
detection, test example extraction) without needing to know about flags.
Changes:
- Changed flag pattern from --build-* to --skip-* for better discoverability
- Updated function signature: all analysis features default to True
- Inverted boolean logic: --skip-* flags disable features
- Added backward compatibility warnings for deprecated --build-* flags
- Updated help text and usage examples
Migration:
- Remove old --build-* flags from your scripts (features now ON by default)
- Use new --skip-* flags to disable specific features if needed
Old (DEPRECATED):
codebase-scraper --directory . --build-api-reference --build-dependency-graph
New:
codebase-scraper --directory . # All features enabled by default
codebase-scraper --directory . --skip-patterns # Disable specific features
Rationale:
- Users should get maximum value by default
- Explicit opt-out is better than hidden opt-in
- Improves feature discoverability
- Aligns with user expectations from C2 and C3 features
Testing:
- All 107 codebase analysis tests passing
- Backward compatibility warnings working correctly
- Help text updated correctly
🚨 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add build_dependency_graph parameter to scrape_codebase MCP tool
- Update tool documentation with new parameter
- Pass --build-dependency-graph flag to CLI command
- Update FastMCP server function signature
Usage via MCP:
scrape_codebase(
directory="/path/to/repo",
build_dependency_graph=True
)
This completes the C2.6 feature set by exposing dependency graph
generation through the MCP interface, making it available to all
MCP clients (Claude Code, Cursor, etc.).
- Add pathspec import with graceful fallback
- Add gitignore_spec attribute to GitHubScraper class
- Implement _load_gitignore() method to parse .gitignore files
- Update should_exclude_dir() to check .gitignore rules
- Load .gitignore automatically in local repository mode
- Handle directory patterns with and without trailing slash
- Add 4 comprehensive tests for .gitignore functionality
Closes#63 - C2.1 File Tree Walker with .gitignore support complete
Features:
- Loads .gitignore from local repository root
- Respects .gitignore patterns for directory exclusion
- Falls back gracefully when pathspec not installed
- Works alongside existing hard-coded exclusions
- Only active in local_repo_path mode (not GitHub API mode)
Test coverage:
- test_load_gitignore_exists: .gitignore parsing
- test_load_gitignore_missing: Missing .gitignore handling
- test_should_exclude_dir_with_gitignore: .gitignore exclusion
- test_should_exclude_dir_default_exclusions: Existing exclusions still work
Integration:
- github_scraper.py now has same .gitignore support as codebase_scraper.py
- Both tools use pathspec library for consistent behavior
- Enables proper repository analysis respecting project .gitignore rules
- Created src/skill_seekers/cli/api_reference_builder.py (330 lines)
- Generates markdown API documentation from code analysis results
- Supports Python, JavaScript/TypeScript, and C++ code signatures
Features:
- Class documentation with inheritance and methods
- Function/method signatures with parameters and return types
- Parameter tables with types and defaults
- Async function indicators
- Decorators display (for Python)
- Standalone CLI tool for generating API docs from JSON
Tests:
- Created tests/test_api_reference_builder.py with 7 tests
- All tests passing ✅
- Test coverage: Class formatting, function formatting, parameter tables,
markdown structure, code analyzer integration, async indicators
Output Format:
- One .md file per analyzed source file
- Organized: Classes → Methods, then standalone Functions
- Professional markdown tables for parameters
CLI Usage:
python -m skill_seekers.cli.api_reference_builder \
code_analysis.json output/api_reference/
Related Issues:
- Closes#66 (C2.4 Build API reference from code)
- Part of C2 Local Codebase Scraping roadmap (TIER 3)
**Problem #1: Large File Encoding Error** ✅ FIXED
- Add large file download support via download_url
- Detect encoding='none' for files >1MB
- Download via GitHub raw URL instead of API
- Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully
**Problem #2: Missing CLI Enhancement Flags** ✅ FIXED
- Add --enhance, --enhance-local, --api-key to main.py github_parser
- Add flag forwarding in CLI dispatcher
- Fixes 'unrecognized arguments' error
- Users can now use: skill-seekers github --repo owner/repo --enhance-local
**Problem #3: Custom API Endpoint Support** ✅ FIXED
- Support ANTHROPIC_BASE_URL environment variable
- Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY)
- Fix ThinkingBlock.text error with newer Anthropic SDK
- Find TextBlock in response content array (handles thinking blocks)
**Changes**:
- src/skill_seekers/cli/enhance_skill.py:
- Support custom base_url parameter
- Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN
- Iterate through content blocks to find text (handles ThinkingBlock)
- src/skill_seekers/cli/main.py:
- Add --enhance, --enhance-local, --api-key to github_parser
- Forward flags to github_scraper.py in dispatcher
- src/skill_seekers/cli/github_scraper.py:
- Add large file detection (encoding=None/"none")
- Download via download_url with requests
- Log file size and download progress
- tests/test_github_scraper.py:
- Add test_get_file_content_large_file
- Add test_extract_changelog_large_file
- All 31 tests passing ✅
**Credits**:
- Thanks to @XGCoder for detailed bug report
- Thanks to @gorquan for local fixes and guidance
Fixes#219🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add _get_file_content() helper method to detect and follow symlinks
- Update _extract_readme() to use new helper
- Update _extract_changelog() to use new helper
- Add 7 comprehensive tests for symlink handling
- All 29 GitHub scraper tests passing
Fixes#225
When README.md or CHANGELOG.md are symlinks (like in vercel/ai repo),
PyGithub returns ContentFile with type='symlink' and encoding=None.
Direct access to decoded_content throws AssertionError.
Solution: Detect symlink type, follow target path, then decode actual file.
Handles edge cases: broken symlinks, missing targets, encoding errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add _check_skill_completeness() method to quality checker that validates:
- Prerequisites/verification sections (helps Claude check conditions first)
- Error handling/troubleshooting guidance (common issues and solutions)
- Workflow steps (sequential instructions using first/then/next/finally)
This addresses G2.3 and G2.4 from the roadmap:
- G2.3: Add readability scoring (via workflow step detection)
- G2.4: Add completeness checker
New checks use info-level messages (not warnings) to avoid affecting
quality scores for existing skills while still providing helpful guidance.
Includes 4 new unit tests for completeness checks.
Contributed by the AI Writing Guide project.
- Switch from manual package listing to automatic discovery
- Improves maintainability and prevents missing module bugs
- All tests passing (700+ tests)
- Package contents verified identical to v2.5.1
Fixes#226
Merges #227
Thanks to @iamKhan79690 for the contribution!
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Anas Ur Rehman (@iamKhan79690) <noreply@github.com>
Added empty py.typed marker file to enable type checkers (mypy, pyright,
pylance) to use inline type hints from the package.
This file was declared in pyproject.toml package_data but was missing,
causing build warnings.
Benefits:
- Enables type checkers to use inline type hints
- Follows Python typing best practices (PEP 561)
- Improves IDE autocomplete/intellisense
Fixes#222🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Replace TextContent = None with proper fallback class in all MCP tool modules
- Fixes TypeError when MCP library is not fully initialized in test environment
- Ensures all 700 tests pass (was 699 passing, 1 failing)
- Affected files:
* packaging_tools.py
* config_tools.py
* scraping_tools.py
* source_tools.py
* splitting_tools.py
The fallback class maintains the same interface as mcp.types.TextContent,
allowing tests to run successfully even when the MCP library import fails.
Test results: ✅ 700 passed, 157 skipped, 2 warnings
- Add MarkdownAdaptor for universal markdown export
- Pure markdown format (no platform-specific features)
- ZIP packaging with README.md, references/, DOCUMENTATION.md
- No upload capability (manual use only)
- No AI enhancement support
- Combines all references into single DOCUMENTATION.md
- Add 12 unit tests (all passing)
Test Results:
- 12 MarkdownAdaptor tests passing
- 45 total adaptor tests passing (4 skipped)
Phase 4 Complete ✅
Related to #179
Implements hybrid smart extraction + improved fallback templates for
skill descriptions across all scrapers.
Changes:
- github_scraper.py:
* Added extract_description_from_readme() helper
* Extracts from README first paragraph (60 lines)
* Updates description after README extraction
* Fallback: "Use when working with {name}"
* Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main)
- doc_scraper.py:
* Added infer_description_from_docs() helper
* Extracts from meta tags or first paragraph (65 lines)
* Tries: meta description, og:description, first content paragraph
* Fallback: "Use when working with {name}"
* Updated 2 locations (create_enhanced_skill_md, get_configuration)
- pdf_scraper.py:
* Added infer_description_from_pdf() helper
* Extracts from PDF metadata (subject, title)
* Fallback: "Use when referencing {name} documentation"
* Updated 3 locations (PDFToSkillConverter, main x2)
- generate_router.py:
* Updated 2 locations with improved router descriptions
* "Use when working with {name} development and programming"
All changes:
- Only apply to NEW skill generations (don't modify existing)
- No API calls (free/offline)
- Smart extraction when metadata/README available
- Improved "Use when..." fallbacks instead of generic templates
- 612 tests passing (100%)
Fixes#191
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#209 - UnicodeDecodeError on Windows with non-ASCII characters
**Problem:**
Windows users with non-English locales (Chinese, Japanese, Korean, etc.)
experienced GBK/SHIFT-JIS codec errors when the system default encoding
is not UTF-8.
Error: 'gbk' codec can't decode byte 0xac in position 206: illegal
multibyte sequence
**Root Cause:**
File operations using open() without explicit encoding parameter use
the system default encoding, which on Windows Chinese edition is GBK.
JSON files contain UTF-8 encoded characters that fail to decode with GBK.
**Solution:**
Added encoding='utf-8' to ALL file operations across:
- doc_scraper.py (4 instances):
* load_config() - line 1310
* check_existing_data() - line 1416
* save_checkpoint() - line 173
* load_checkpoint() - line 186
- github_scraper.py (1 instance):
* main() config loading - line 922
- unified_scraper.py (10 instances):
* All JSON read/write operations - lines 134, 153, 205, 239, 275,
278, 325, 328, 342, 364
**Test Results:**
- ✅ All 612 tests passing (100% pass rate)
- ✅ Backward compatible (UTF-8 is standard on Linux/macOS)
- ✅ Fixes Windows locale issues
**Impact:**
- ✅ Works on ALL Windows locales (Chinese, Japanese, Korean, etc.)
- ✅ Maintains compatibility with Linux/macOS
- ✅ Prevents future encoding issues
**Thanks to:** @my5icol for the detailed bug report and fix suggestion!
Fixes#214 - Local enhancement now handles large skills automatically
**Problem:**
- Claude CLI has undocumented ~30-40K character limit
- Large skills (>30K chars) fail silently during local enhancement
- Users experience "Claude finished but SKILL.md was not updated" error
**Solution:**
- Auto-detect large skills (>30K chars)
- Apply intelligent summarization to reduce content size
- Preserve critical content:
* First 20% (introduction/overview)
* Up to 5 best code blocks
* Up to 10 section headings with context
- Target ~30% of original size
- Show clear warnings when summarization is applied
**Implementation:**
- Added `summarize_reference()` method to LocalSkillEnhancer
- Modified `create_enhancement_prompt()` to accept summarization parameters
- Updated `run()` method to auto-enable summarization for large skills
- Added comprehensive test suite (6 tests)
**Test Results:**
- ✅ All 612 tests passing (100% pass rate)
- ✅ 6 new smart summarization tests
- ✅ E2E test: 60K skill → 17K prompt (within limits)
- ✅ Code block preservation verified
**User Experience:**
When enhancement is triggered on a large skill:
```
⚠️ LARGE SKILL DETECTED
📊 Reference content: 60,072 characters
💡 Claude CLI limit: ~30,000-40,000 characters
🔧 Applying smart summarization to ensure success...
• Keeping introductions and overviews
• Extracting best code examples
• Preserving key concepts and headings
• Target: ~30% of original size
✓ Reduced from 60,072 to 15,685 chars (26%)
✓ Prompt created and optimized (17,804 characters)
✓ Ready for Claude CLI (within safe limits)
```
**Backward Compatibility:**
- No breaking changes
- Works with existing skills
- Falls back gracefully for normal-sized skills
Add automatic skill installation to 10+ AI coding agents with a single command.
New Features:
- New install-agent command for installing skills to any AI agent
- Support for 10+ agents: Claude Code, Cursor, VS Code, Amp, Goose, OpenCode, Letta, Aide, Windsurf
- Smart path resolution (global ~/.agent vs project-relative .agent/)
- Fuzzy agent name matching with suggestions
- --agent all flag to install to all agents at once
- --force flag to overwrite existing installations
- --dry-run flag to preview installations
- Comprehensive error handling and user feedback
Implementation:
- Created install_agent.py (379 lines) with core installation logic
- Updated main.py with install-agent subcommand
- Updated pyproject.toml with entry point
- Added 32 comprehensive tests (all passing, 603 total)
- No regressions in existing functionality
Documentation:
- Updated README.md with multi-agent installation guide
- Updated CLAUDE.md with install-agent examples
- Updated CHANGELOG.md with v2.3.0 release notes
- Added agent compatibility table
Technical Details:
- 100% own implementation (no external dependencies)
- Pure Python using stdlib (shutil, pathlib, argparse)
- Compatible with Agent Skills open standard (agentskills.io)
- Works offline
Closes#210🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Created LanguageDetector class supporting 20+ programming languages
- Confidence-based detection with customizable thresholds (min_confidence parameter)
- Replaces duplicate language detection code in doc_scraper and pdf_extractor
- Comprehensive test suite with 100+ test cases
Changes:
- NEW: src/skill_seekers/cli/language_detector.py (17 KB)
- Unified detector with pattern matching for 20+ languages
- Confidence scoring (0.0-1.0 scale)
- Supports: Python, JavaScript, TypeScript, Java, C++, C#, Go, Rust, PHP, Ruby, Swift, Kotlin, Shell, SQL, HTML, CSS, JSON, YAML, XML, and more
- NEW: tests/test_language_detector.py (20 KB)
- 100+ test cases covering all supported languages
- Edge case testing (mixed code, low confidence, etc.)
- MODIFIED: src/skill_seekers/cli/doc_scraper.py
- Removed 80+ lines of duplicate detection code
- Now uses shared LanguageDetector instance
- MODIFIED: src/skill_seekers/cli/pdf_extractor_poc.py
- Removed 130+ lines of duplicate detection code
- Now uses shared LanguageDetector instance
- MODIFIED: tests/test_pdf_extractor.py
- Fixed imports to use proper package paths
- Added manual detector initialization in test setup
Benefits:
- DRY: Single source of truth for language detection
- Maintainability: Add new languages in one place
- Consistency: Same detection logic across all scrapers
- Testability: Comprehensive test coverage
- Extensibility: Easy to add new languages or improve patterns
Addresses technical debt from having duplicate detection logic in multiple files.
Add retry_with_backoff() and retry_with_backoff_async() for network operations.
Features:
- Configurable max attempts (default: 3)
- Exponential backoff with configurable base delay
- Operation name for meaningful log messages
- Both sync and async versions
Addresses E2.6: Add retry logic for network failures
Co-authored-by: Joseph Magly <1159087+jmagly@users.noreply.github.com>
This commit fixes three critical limitations discovered during local repository skill extraction testing:
**Fix 1: Code Analyzer Import Issue**
- Changed unified_scraper.py to use absolute imports instead of relative imports
- Fixed: `from github_scraper import` → `from skill_seekers.cli.github_scraper import`
- Fixed: `from pdf_scraper import` → `from skill_seekers.cli.pdf_scraper import`
- Result: CodeAnalyzer now available during extraction, deep analysis works
**Fix 2: Unity Library Exclusions**
- Updated should_exclude_dir() to accept and check full directory paths
- Updated _extract_file_tree_local() to pass both dir name and full path
- Added exclusion config passing from unified_scraper to github_scraper
- Result: exclude_dirs_additional now works (297 files excluded in test)
**Fix 3: AI Enhancement for Single Sources**
- Changed read_reference_files() to use rglob() for recursive search
- Now finds reference files in subdirectories (e.g., references/github/README.md)
- Result: AI enhancement works with unified skills that have nested references
**Test Results:**
- Code Analyzer: ✅ Working (deep analysis running)
- Unity Exclusions: ✅ Working (297 files excluded from 679)
- AI Enhancement: ✅ Working (finds and reads nested references)
**Files Changed:**
- src/skill_seekers/cli/unified_scraper.py (Fix 1 & 2)
- src/skill_seekers/cli/github_scraper.py (Fix 2)
- src/skill_seekers/cli/utils.py (Fix 3)
**Test Artifacts:**
- configs/deck_deck_go_local.json (test configuration)
- docs/LOCAL_REPO_TEST_RESULTS.md (comprehensive test report)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Issue: #11 (A1.3 test failures)
## Problem
3/8 tests were failing because ConfigValidator only validates structure
and required fields, NOT format validation (names, URLs, etc.).
## Root Cause
ConfigValidator checks:
- Required fields (name, description, sources/base_url)
- Source types validity
- Field types (arrays, integers)
ConfigValidator does NOT check:
- Name format (alphanumeric, hyphens, underscores)
- URL format (http:// or https://)
## Solution
Added additional format validation in submit_config_tool after ConfigValidator:
1. Name format validation using regex: `^[a-zA-Z0-9_-]+$`
2. URL format validation (must start with http:// or https://)
3. Validates both legacy (base_url) and unified (sources.base_url) formats
## Test Results
Before: 5/8 tests passing, 3 failing
After: 8/8 tests passing ✅
Full suite: 427 tests passing, 40 skipped ✅
## Changes Made
- src/skill_seekers/mcp/server.py:
* Added `import re` at top of file
* Added name format validation (line 1280-1281)
* Added URL format validation for legacy configs (line 1285-1289)
* Added URL format validation for unified configs (line 1291-1296)
- tests/test_mcp_server.py:
* Updated test_submit_config_validates_required_fields to accept
ConfigValidator's correct error message ("cannot detect" instead of "description")
## Validation Examples
Invalid name: "React@2024!" → ❌ "Invalid name format"
Invalid URL: "not-a-url" → ❌ "Invalid base_url format"
Valid name: "react-docs" → ✅
Valid URL: "https://react.dev/" → ✅🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements A1.2 - Add MCP tool to download configs from API
Features:
- Download config files from api.skillseekersweb.com
- List all available configs (24 configs)
- Filter configs by category
- Download specific config by name
- Save to local configs directory
- Display config metadata (category, tags, type, source, last_updated)
- Error handling for 404 and network errors
Usage:
- List configs: fetch_config with list_available=true
- Filter by category: fetch_config with list_available=true, category='web-frameworks'
- Download config: fetch_config with config_name='react'
- Custom destination: fetch_config with config_name='react', destination='my_configs/'
Technical:
- Uses httpx AsyncClient for HTTP requests
- Connects to https://api.skillseekersweb.com
- Returns formatted TextContent responses
- Supports GET /api/configs and GET /api/download endpoints
- Proper error handling for HTTP and JSON errors
Tests:
- ✅ List all configs (24 total)
- ✅ List by category filter (12 web-frameworks)
- ✅ Download specific config (react.json)
- ✅ Handle nonexistent config (404 error)
Issue: N/A (from roadmap task A1.2)
Major improvements:
- Configurable directory exclusions (Issue #203)
- Unlimited local repository analysis
- Skip llms.txt option (PR #198)
- 10+ bug fixes for GitHub scraper
- Test suite expanded to 427 tests
See CHANGELOG.md for full details.
Merges feat/add-skip-llm-to-config by @sogoiii.
This PR adds a valuable configuration option to explicitly skip llms.txt
detection, useful when a site's llms.txt is incomplete, incorrect, or when
specific HTML scraping is needed.
Key features:
- New 'skip_llms_txt' config option (default: false, backward compatible)
- Boolean type validation with warning for invalid values
- Support in both sync and async scraping modes
- 17 comprehensive tests (15 feature tests + 2 config validation tests)
All tests passing after fixing import paths to use proper package names.
Test results: ✅ 17/17 tests passing
Full test suite: ✅ 391 tests passing
Co-authored-by: sogoiii <sogoiii@users.noreply.github.com>