- Add pathspec import with graceful fallback
- Add gitignore_spec attribute to GitHubScraper class
- Implement _load_gitignore() method to parse .gitignore files
- Update should_exclude_dir() to check .gitignore rules
- Load .gitignore automatically in local repository mode
- Handle directory patterns with and without trailing slash
- Add 4 comprehensive tests for .gitignore functionality
Closes #63 - C2.1 File Tree Walker with .gitignore support complete
Features:
- Loads .gitignore from local repository root
- Respects .gitignore patterns for directory exclusion
- Falls back gracefully when pathspec not installed
- Works alongside existing hard-coded exclusions
- Only active in local_repo_path mode (not GitHub API mode)
Test coverage:
- test_load_gitignore_exists: .gitignore parsing
- test_load_gitignore_missing: Missing .gitignore handling
- test_should_exclude_dir_with_gitignore: .gitignore exclusion
- test_should_exclude_dir_default_exclusions: Existing exclusions still work
Integration:
- github_scraper.py now has same .gitignore support as codebase_scraper.py
- Both tools use pathspec library for consistent behavior
- Enables proper repository analysis respecting project .gitignore rules
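The optional-import fallback and the directory check described above can be sketched roughly as follows. This is a minimal illustration, not the code from github_scraper.py: the function names, the `DEFAULT_EXCLUDED_DIRS` set, and the choice to test a directory both with and without a trailing slash are assumptions made for the example.

```python
import os

try:
    import pathspec  # optional dependency
except ImportError:
    pathspec = None  # .gitignore support is silently disabled

# Hypothetical stand-in for the scraper's existing hard-coded exclusions.
DEFAULT_EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", "venv"}


def build_spec(lines):
    """Compile .gitignore lines into a PathSpec, or None without pathspec."""
    if pathspec is None:
        return None
    return pathspec.PathSpec.from_lines("gitwildmatch", lines)


def load_gitignore(repo_root):
    """Return a spec for <repo_root>/.gitignore, or None if unavailable."""
    gitignore_path = os.path.join(repo_root, ".gitignore")
    if not os.path.isfile(gitignore_path):
        return None
    with open(gitignore_path, encoding="utf-8") as f:
        return build_spec(f.read().splitlines())


def should_exclude_dir(dir_path, spec=None):
    """Check hard-coded exclusions first, then .gitignore rules if loaded."""
    name = os.path.basename(dir_path.rstrip("/"))
    if name in DEFAULT_EXCLUDED_DIRS:
        return True
    if spec is None:
        return False
    bare = dir_path.rstrip("/")
    # Try both forms so patterns written as "build" and "build/" both apply.
    return bool(spec.match_file(bare) or spec.match_file(bare + "/"))
```

When pathspec is not installed, `build_spec` returns None and only the hard-coded exclusions apply, which matches the fallback behavior listed under Features.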
- Created src/skill_seekers/cli/api_reference_builder.py (330 lines)
- Generates markdown API documentation from code analysis results
- Supports Python, JavaScript/TypeScript, and C++ code signatures
Features:
- Class documentation with inheritance and methods
- Function/method signatures with parameters and return types
- Parameter tables with types and defaults
- Async function indicators
- Decorators display (for Python)
- Standalone CLI tool for generating API docs from JSON
Tests:
- Created tests/test_api_reference_builder.py with 7 tests
- All tests passing ✅
- Test coverage: Class formatting, function formatting, parameter tables,
markdown structure, code analyzer integration, async indicators
Output Format:
- One .md file per analyzed source file
- Organized: Classes → Methods, then standalone Functions
- Professional markdown tables for parameters
CLI Usage:
python -m skill_seekers.cli.api_reference_builder \
    code_analysis.json output/api_reference/
Related Issues:
- Closes #66 (C2.4 Build API reference from code)
- Part of C2 Local Codebase Scraping roadmap (TIER 3)
Increased timeout from 5 to 15 seconds for github command E2E test.
The test was flaky on macOS CI due to network latency when
checking non-existent GitHub repos.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Complete fix for Issue #219 - All three problems resolved
✅ Problem #1: Large file download via download_url
✅ Problem #2: CLI enhancement flags working
✅ Problem #3: Custom API endpoint support
Tests: 40/40 passing (31 unit + 9 E2E)
Fixes #219
**Problem #1: Large File Encoding Error** ✅ FIXED
- Add large file download support via download_url
- Detect encoding='none' for files >1MB
- Download via GitHub raw URL instead of API
- Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully
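A rough sketch of the large-file path, assuming a content object that exposes `encoding`, `content`, and `download_url` attributes as PyGithub's ContentFile does. The function name and the injectable `fetch` callable are hypothetical, introduced here so the example runs without network access.

```python
import base64
from types import SimpleNamespace


def read_possibly_large_file(content_file, fetch):
    """Return file text, using download_url when the API omits the content.

    `fetch` is any callable mapping a URL to bytes. With requests it could
    be `lambda url: requests.get(url, timeout=30).content`.
    """
    encoding = getattr(content_file, "encoding", None)
    if encoding in (None, "none"):
        # Large files (>1MB) come back without inline content.
        url = getattr(content_file, "download_url", None)
        if not url:
            return None
        return fetch(url).decode("utf-8", errors="replace")
    if encoding == "base64":
        raw = base64.b64decode(content_file.content)
        return raw.decode("utf-8", errors="replace")
    return None  # unknown encoding: leave it to the caller


# Illustrative objects only; real ones come from the GitHub API.
small_file = SimpleNamespace(
    encoding="base64",
    content=base64.b64encode(b"small file").decode("ascii"),
    download_url=None,
)
large_file = SimpleNamespace(
    encoding="none",
    content="",
    download_url="https://example.invalid/CHANGELOG.md",
)
```

Passing the fetcher in as a parameter is a choice made for this sketch; it keeps the decision logic testable with a fake downloader.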
**Problem #2: Missing CLI Enhancement Flags** ✅ FIXED
- Add --enhance, --enhance-local, --api-key to main.py github_parser
- Add flag forwarding in CLI dispatcher
- Fixes 'unrecognized arguments' error
- Users can now use: skill-seekers github --repo owner/repo --enhance-local
**Problem #3: Custom API Endpoint Support** ✅ FIXED
- Support ANTHROPIC_BASE_URL environment variable
- Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY)
- Fix ThinkingBlock.text error with newer Anthropic SDK
- Find TextBlock in response content array (handles thinking blocks)
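The two ideas in Problem #3 can be illustrated with a small sketch. The function names and the returned dictionary shape are hypothetical, and the precedence of ANTHROPIC_API_KEY over ANTHROPIC_AUTH_TOKEN is an assumption. The block objects are simple stand-ins with a `type` attribute, mirroring how SDK content blocks identify themselves.

```python
import os
from types import SimpleNamespace


def resolve_client_settings(env=None):
    """Collect the credential and optional base URL from the environment."""
    env = os.environ if env is None else env
    # Assumed precedence: API key first, auth token as the alternative.
    credential = env.get("ANTHROPIC_API_KEY") or env.get("ANTHROPIC_AUTH_TOKEN")
    settings = {"credential": credential}
    base_url = env.get("ANTHROPIC_BASE_URL")
    if base_url:
        settings["base_url"] = base_url
    return settings


def extract_text(content_blocks):
    """Return the text of the first text block, skipping thinking blocks."""
    for block in content_blocks:
        if getattr(block, "type", None) == "text":
            return block.text
    return None


# Stand-in response content: a thinking block followed by a text block.
example_blocks = [
    SimpleNamespace(type="thinking", thinking="internal reasoning"),
    SimpleNamespace(type="text", text="Enhanced SKILL.md content"),
]
```

Reading `content[0].text` directly fails when the first block is a thinking block, which has no `text` attribute; scanning for the first block of type "text" avoids that.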
**Changes**:
- src/skill_seekers/cli/enhance_skill.py:
- Support custom base_url parameter
- Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN
- Iterate through content blocks to find text (handles ThinkingBlock)
- src/skill_seekers/cli/main.py:
- Add --enhance, --enhance-local, --api-key to github_parser
- Forward flags to github_scraper.py in dispatcher
- src/skill_seekers/cli/github_scraper.py:
- Add large file detection (encoding=None/"none")
- Download via download_url with requests
- Log file size and download progress
- tests/test_github_scraper.py:
- Add test_get_file_content_large_file
- Add test_extract_changelog_large_file
- All 31 tests passing ✅
**Credits**:
- Thanks to @XGCoder for detailed bug report
- Thanks to @gorquan for local fixes and guidance
Fixes #219
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add _get_file_content() helper method to detect and follow symlinks
- Update _extract_readme() to use new helper
- Update _extract_changelog() to use new helper
- Add 7 comprehensive tests for symlink handling
- All 29 GitHub scraper tests passing
Fixes #225
When README.md or CHANGELOG.md is a symlink (as in the vercel/ai repo),
PyGithub returns a ContentFile with type='symlink' and encoding=None.
Accessing decoded_content directly raises an AssertionError.
Solution: Detect symlink type, follow target path, then decode actual file.
Handles edge cases: broken symlinks, missing targets, encoding errors.
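The detect-follow-decode flow can be sketched as below. This is an illustration under assumptions, not the actual `_get_file_content()` helper: the function name, the `max_hops` guard, and reading the link destination from a `target` attribute are choices made for the example, and `FakeRepo` stands in for a PyGithub repository object.

```python
import posixpath
from types import SimpleNamespace


def read_following_symlink(repo, path, max_hops=5):
    """Return the text at `path`, following symlinks; None on any failure."""
    for _ in range(max_hops):
        try:
            content = repo.get_contents(path)
        except Exception:
            return None  # missing file or broken symlink target
        if getattr(content, "type", None) != "symlink":
            try:
                return content.decoded_content.decode("utf-8")
            except (AssertionError, AttributeError, UnicodeDecodeError):
                return None  # undecodable or empty content
        target = getattr(content, "target", None)
        if not target:
            return None
        # Resolve the target relative to the directory holding the symlink.
        path = posixpath.normpath(
            posixpath.join(posixpath.dirname(path), target)
        )
    return None  # too many hops, probably a symlink loop


class FakeRepo:
    """Hypothetical in-memory repository used only for this example."""

    def __init__(self, files):
        self.files = files

    def get_contents(self, path):
        return self.files[path]  # raises KeyError when the path is missing


example_repo = FakeRepo({
    "README.md": SimpleNamespace(type="symlink", target="docs/README.md"),
    "docs/README.md": SimpleNamespace(type="file", decoded_content=b"# Hello"),
    "CHANGELOG.md": SimpleNamespace(type="symlink", target="missing.md"),
})
```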
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add _check_skill_completeness() method to quality checker that validates:
- Prerequisites/verification sections (helps Claude check conditions first)
- Error handling/troubleshooting guidance (common issues and solutions)
- Workflow steps (sequential instructions using first/then/next/finally)
This addresses G2.3 and G2.4 from the roadmap:
- G2.3: Add readability scoring (via workflow step detection)
- G2.4: Add completeness checker
New checks use info-level messages (not warnings) to avoid affecting
quality scores for existing skills while still providing helpful guidance.
Includes 4 new unit tests for completeness checks.
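A minimal sketch of what such checks might look like. The regular expressions, the message wording, and the threshold of two distinct sequencing words are all assumptions for illustration; the actual `_check_skill_completeness()` method may use different rules.

```python
import re

PREREQ_RE = re.compile(
    r"\b(prerequisites?|requirements|before you (?:begin|start)|verif(?:y|ication))\b",
    re.IGNORECASE,
)
ERROR_RE = re.compile(
    r"\b(troubleshooting|error handling|common (?:issues|errors|problems))\b",
    re.IGNORECASE,
)
STEP_WORD_RE = re.compile(r"\b(first|then|next|finally)\b", re.IGNORECASE)


def check_skill_completeness(skill_md):
    """Return info-level suggestions; an empty list means nothing to add."""
    info = []
    if not PREREQ_RE.search(skill_md):
        info.append("Consider adding a prerequisites or verification section")
    if not ERROR_RE.search(skill_md):
        info.append("Consider adding error handling or troubleshooting guidance")
    # Assumed heuristic: at least two distinct sequencing words suggest steps.
    step_words = {m.group(1).lower() for m in STEP_WORD_RE.finditer(skill_md)}
    if len(step_words) < 2:
        info.append("Consider describing the workflow as sequential steps")
    return info
```

Returning plain suggestions rather than warnings mirrors the info-level design above, so existing skills keep their quality scores.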
Contributed by the AI Writing Guide project.
Fixes #226
Changed from manual package listing to automatic discovery:
- Replaced explicit packages list with [tool.setuptools.packages.find]
- Set package-dir = {"" = "src"} for src layout
- Added include = ["skill_seekers*"] to catch all submodules
- Set namespaces = false for proper package structure
This ensures the adaptors/ directory and all other submodules
are properly included when building the PyPI package.
The manual packages list was missing newly added modules and
didn't auto-discover the src/skill_seekers/cli/adaptors directory.
- Reduced from 1116 to 526 lines (53% reduction)
- Focused on architecture and testing requirements
- Removed redundant user-facing documentation
- Added critical development notes and workflows
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Switch from manual package listing to automatic discovery
- Improves maintainability and prevents missing module bugs
- All tests passing (700+ tests)
- Package contents verified identical to v2.5.1
Fixes #226
Merges #227
Thanks to @iamKhan79690 for the contribution!
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Anas Ur Rehman (@iamKhan79690) <noreply@github.com>
Added empty py.typed marker file to enable type checkers (mypy, pyright,
pylance) to use inline type hints from the package.
This file was declared in pyproject.toml package_data but was missing,
causing build warnings.
Benefits:
- Enables type checkers to use inline type hints
- Follows Python typing best practices (PEP 561)
- Improves IDE autocomplete/intellisense
Fixes #222
This merge brings the critical PyPI bug fix and all v2.5.1 updates to main:
Critical Fixes:
- PR #221: Fixed missing skill_seekers.cli.adaptors package (its absence broke v2.5.0 on PyPI)
- All version strings updated to 2.5.1
- All test assertions fixed for v2.5.1
Changes Merged:
- Version bump to 2.5.1 across all modules
- CHANGELOG updated with v2.5.1 release notes
- CLAUDE.md updated with v2.5.0 multi-platform architecture
- All 857 tests passing
Commits included:
- 3e36571: test: Update all version assertions to 2.5.1
- 58d37c9: test: Update version assertions to 2.5.1
- 5e166c4: chore: Bump version to v2.5.1 - Critical PyPI Bug Fix
- b912331: chore: Bump version to v2.5.0 - Multi-Platform Feature Parity
- c8ca1e1: fix: Add missing adaptors module to pyproject.toml packages
Ready for v2.5.1 release to PyPI.
The skill_seekers.cli.adaptors module was missing from the packages list in pyproject.toml, causing a ModuleNotFoundError when running the package_skill command from the PyPI-installed package (v2.5.0).
This module provides multi-LLM platform support:
- base.py - Base adaptor class
- claude.py - Claude AI adaptor
- gemini.py - Google Gemini adaptor
- openai.py - OpenAI ChatGPT adaptor
- markdown.py - Generic markdown export
Co-authored-by: MiaoDX <miaodongxu@xiaomi.com>
Updated test expectations from old versions to 2.5.0:
- tests/test_package_structure.py: 4 assertions (cli, mcp, mcp.tools, root)
- tests/test_cli_paths.py: CLI version output
- tests/test_adaptors/test_base.py: Metadata test data
All tests now expect 2.5.0 instead of 2.0.0/2.4.0.
Merge development branch for v2.5.0 release.
This release adds complete multi-platform support for Claude AI,
Google Gemini, OpenAI ChatGPT, and Generic Markdown with full
feature parity across all platforms and skill modes.
Major Features:
- 4 LLM platforms supported
- Platform-specific adaptors architecture
- 18 MCP tools with multi-platform support
- Complete feature parity implementation
- Comprehensive platform documentation
- 700 tests passing
See CHANGELOG.md for detailed release notes.
- Replace TextContent = None with proper fallback class in all MCP tool modules
- Fixes TypeError when MCP library is not fully initialized in test environment
- Ensures all 700 tests pass (was 699 passing, 1 failing)
- Affected files:
* packaging_tools.py
* config_tools.py
* scraping_tools.py
* source_tools.py
* splitting_tools.py
The fallback class maintains the same interface as mcp.types.TextContent,
allowing tests to run successfully even when the MCP library import fails.
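The fallback pattern can be sketched as follows. The stand-in class and the `make_result` helper are illustrative and carry only the `type` and `text` fields; the class used in the tool modules may define more.

```python
try:
    from mcp.types import TextContent
except ImportError:

    class TextContent:
        """Minimal stand-in used when the MCP library is unavailable."""

        def __init__(self, type, text):
            self.type = type
            self.text = text


def make_result(message):
    """Build a tool result the same way with or without the real class."""
    return [TextContent(type="text", text=message)]
```

With `TextContent = None`, a call such as `TextContent(type="text", text=...)` raises a TypeError because None is not callable. A small class with the same keyword arguments avoids that.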
Test results: ✅ 700 passed, 157 skipped, 2 warnings
Add three detailed platform guides:
1. **MULTI_LLM_SUPPORT.md** - Complete multi-platform overview
- Supported platforms comparison table
- Quick start for all platforms
- Installation options
- Complete workflow examples
- Advanced usage and troubleshooting
- Programmatic API usage examples
2. **GEMINI_INTEGRATION.md** - Google Gemini integration guide
- Setup and API key configuration
- Complete workflow with tar.gz packaging
- Gemini-specific format differences
- Files API + grounding usage
- Cost estimation and best practices
- Troubleshooting common issues
3. **OPENAI_INTEGRATION.md** - OpenAI ChatGPT integration guide
- Setup and API key configuration
- Complete workflow with Assistants API
- Vector Store + file_search integration
- Assistant instructions format
- Cost estimation and best practices
- Troubleshooting common issues
All guides include:
- Code examples for CLI and Python API
- Platform-specific features and differences
- Real-world usage patterns
- Troubleshooting sections
- Best practices
Related to #179
Add comprehensive multi-LLM support section featuring:
- 4 supported platforms (Claude, Gemini, OpenAI, Markdown)
- Comparison table showing format, upload, enhancement, API keys
- Example commands for each platform
- Installation instructions for optional dependencies
- 100% backward compatibility guarantee
Highlights:
- Claude remains default (no changes needed)
- Optional dependencies: [gemini], [openai], [all-llms]
- Universal scraping works for all platforms
- Platform-specific packaging and upload
Related to #179
Add optional dependency groups for LLM platforms:
- [gemini]: google-generativeai>=0.8.0
- [openai]: openai>=1.0.0
- [all-llms]: All LLM platform dependencies combined
- Updated [all] group to include all LLM dependencies
Users can now install with:
- pip install skill-seekers[gemini]
- pip install skill-seekers[openai]
- pip install skill-seekers[all-llms]
Core functionality remains unchanged (no breaking changes)
Related to #179
- Add MarkdownAdaptor for universal markdown export
- Pure markdown format (no platform-specific features)
- ZIP packaging with README.md, references/, DOCUMENTATION.md
- No upload capability (manual use only)
- No AI enhancement support
- Combines all references into single DOCUMENTATION.md
- Add 12 unit tests (all passing)
Test Results:
- 12 MarkdownAdaptor tests passing
- 45 total adaptor tests passing (4 skipped)
Phase 4 Complete ✅
Related to #179
Implements hybrid smart extraction + improved fallback templates for
skill descriptions across all scrapers.
Changes:
- github_scraper.py:
* Added extract_description_from_readme() helper
* Extracts from README first paragraph (60 lines)
* Updates description after README extraction
* Fallback: "Use when working with {name}"
* Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main)
- doc_scraper.py:
* Added infer_description_from_docs() helper
* Extracts from meta tags or first paragraph (65 lines)
* Tries: meta description, og:description, first content paragraph
* Fallback: "Use when working with {name}"
* Updated 2 locations (create_enhanced_skill_md, get_configuration)
- pdf_scraper.py:
* Added infer_description_from_pdf() helper
* Extracts from PDF metadata (subject, title)
* Fallback: "Use when referencing {name} documentation"
* Updated 3 locations (PDFToSkillConverter, main x2)
- generate_router.py:
* Updated 2 locations with improved router descriptions
* "Use when working with {name} development and programming"
All changes:
- Only apply to NEW skill generations (existing skills are not modified)
- No API calls (free/offline)
- Smart extraction when metadata/README available
- Improved "Use when..." fallbacks instead of generic templates
- 612 tests passing (100%)
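The extract-then-fall-back approach for README text might look roughly like this. The skip prefixes, the 20-character minimum, and the 150-character cap are assumptions chosen for the example, and the real `extract_description_from_readme()` may format its result differently.

```python
import re

# Lines starting with these are treated as headings, badges, HTML, or tables.
SKIP_PREFIXES = ("#", "!", "[", "<", "|", "```")


def extract_description_from_readme(readme_text, name, max_length=150):
    """Return the first prose paragraph, or the "Use when..." fallback."""
    fallback = f"Use when working with {name}"
    if not readme_text:
        return fallback
    for paragraph in re.split(r"\n\s*\n", readme_text):
        lines = [line.strip() for line in paragraph.splitlines()]
        prose = [ln for ln in lines if ln and not ln.startswith(SKIP_PREFIXES)]
        text = " ".join(prose)
        if len(text) < 20:
            continue  # too short to be a useful description
        if len(text) > max_length:
            # Cut at a word boundary and mark the truncation.
            text = text[:max_length].rsplit(" ", 1)[0] + "..."
        return text
    return fallback
```

This sketch makes no network or API calls, in line with the "free/offline" constraint above. It does not track fenced code blocks, so a README that opens with a long code sample would need extra handling.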
Fixes #191
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #209 - UnicodeDecodeError on Windows with non-ASCII characters
**Problem:**
Windows users with non-English locales (Chinese, Japanese, Korean, etc.)
experienced GBK/SHIFT-JIS codec errors when the system default encoding
is not UTF-8.
Error: 'gbk' codec can't decode byte 0xac in position 206: illegal
multibyte sequence
**Root Cause:**
File operations that call open() without an explicit encoding parameter use
the system default encoding, which is GBK on Chinese-locale Windows.
JSON files contain UTF-8 encoded characters that fail to decode with GBK.
**Solution:**
Added encoding='utf-8' to ALL file operations across:
- doc_scraper.py (4 instances):
* load_config() - line 1310
* check_existing_data() - line 1416
* save_checkpoint() - line 173
* load_checkpoint() - line 186
- github_scraper.py (1 instance):
* main() config loading - line 922
- unified_scraper.py (10 instances):
* All JSON read/write operations - lines 134, 153, 205, 239, 275,
278, 325, 328, 342, 364
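The pattern applied at each of these call sites is the same. A small sketch with hypothetical helper names shows it: both the write and the read pass `encoding="utf-8"`, so the result no longer depends on the platform's default codec.

```python
import json
import os
import tempfile


def save_config(path, data):
    """Write JSON as UTF-8 regardless of the system locale."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_config(path):
    """Read JSON as UTF-8 regardless of the system locale."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def roundtrip(data):
    """Save and reload `data` through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        save_config(path, data)
        return load_config(path)
```

Without the explicit encoding, the same file written on Linux could fail to load on a Windows machine whose default codec is GBK or Shift-JIS.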
**Test Results:**
- ✅ All 612 tests passing (100% pass rate)
- ✅ Backward compatible (UTF-8 is standard on Linux/macOS)
- ✅ Fixes Windows locale issues
**Impact:**
- ✅ Works on ALL Windows locales (Chinese, Japanese, Korean, etc.)
- ✅ Maintains compatibility with Linux/macOS
- ✅ Prevents future encoding issues
**Thanks to:** @my5icol for the detailed bug report and fix suggestion!
Fixes #214 - Local enhancement now handles large skills automatically
**Problem:**
- Claude CLI has an undocumented ~30-40K character limit
- Large skills (>30K chars) fail silently during local enhancement
- Users experience "Claude finished but SKILL.md was not updated" error
**Solution:**
- Auto-detect large skills (>30K chars)
- Apply intelligent summarization to reduce content size
- Preserve critical content:
* First 20% (introduction/overview)
* Up to 5 best code blocks
* Up to 10 section headings with context
- Target ~30% of original size
- Show clear warnings when summarization is applied
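The preservation rules above can be approximated with a sketch like the following. It is not the actual `summarize_reference()` method: "best" code blocks are taken to mean the first ones in document order, heading context is capped at an arbitrary 200 characters, and the ~30% target is not enforced. All of these are assumptions for the example.

```python
import re

CODE_BLOCK_RE = re.compile(r"```.*?```", re.DOTALL)
HEADING_RE = re.compile(r"^#{1,6} .+$", re.MULTILINE)


def summarize_reference(content, intro_ratio=0.2, max_code_blocks=5,
                        max_headings=10, context_chars=200):
    """Keep the intro, a few code blocks, and some headings with context."""
    blocks = list(CODE_BLOCK_RE.finditer(content))
    intro_end = int(len(content) * intro_ratio)
    for block in blocks:
        # Never cut the intro in the middle of a fenced code block.
        if block.start() < intro_end < block.end():
            intro_end = block.end()
    parts = [content[:intro_end].rstrip()]

    later_blocks = [b for b in blocks if b.start() >= intro_end]
    parts.extend(b.group(0) for b in later_blocks[:max_code_blocks])

    kept = 0
    for heading in HEADING_RE.finditer(content, intro_end):
        pos = heading.start()
        if any(b.start() <= pos < b.end() for b in blocks):
            continue  # a "# comment" inside code is not a heading
        tail = content[heading.end():heading.end() + context_chars].lstrip()
        # Context stops at the first blank line or code fence.
        context = re.split(r"\n\n|```", tail, maxsplit=1)[0].strip()
        parts.append(heading.group(0) + ("\n" + context if context else ""))
        kept += 1
        if kept >= max_headings:
            break
    return "\n\n".join(parts)
```

A caller would apply this only when the reference content exceeds the size threshold, then build the enhancement prompt from the shortened text.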
**Implementation:**
- Added `summarize_reference()` method to LocalSkillEnhancer
- Modified `create_enhancement_prompt()` to accept summarization parameters
- Updated `run()` method to auto-enable summarization for large skills
- Added comprehensive test suite (6 tests)
**Test Results:**
- ✅ All 612 tests passing (100% pass rate)
- ✅ 6 new smart summarization tests
- ✅ E2E test: 60K skill → 17K prompt (within limits)
- ✅ Code block preservation verified
**User Experience:**
When enhancement is triggered on a large skill:
```
⚠️ LARGE SKILL DETECTED
📊 Reference content: 60,072 characters
💡 Claude CLI limit: ~30,000-40,000 characters
🔧 Applying smart summarization to ensure success...
• Keeping introductions and overviews
• Extracting best code examples
• Preserving key concepts and headings
• Target: ~30% of original size
✓ Reduced from 60,072 to 15,685 chars (26%)
✓ Prompt created and optimized (17,804 characters)
✓ Ready for Claude CLI (within safe limits)
```
**Backward Compatibility:**
- No breaking changes
- Works with existing skills
- Falls back gracefully for normal-sized skills