**Problem #1: Large File Encoding Error** ✅ FIXED
- Add large file download support via download_url
- Detect encoding='none' for files >1MB
- Download via GitHub raw URL instead of API
- Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully
**Problem #2: Missing CLI Enhancement Flags** ✅ FIXED
- Add --enhance, --enhance-local, --api-key to main.py github_parser
- Add flag forwarding in CLI dispatcher
- Fixes 'unrecognized arguments' error
- Users can now use: skill-seekers github --repo owner/repo --enhance-local
**Problem #3: Custom API Endpoint Support** ✅ FIXED
- Support ANTHROPIC_BASE_URL environment variable
- Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY)
- Fix ThinkingBlock.text error with newer Anthropic SDK
- Find TextBlock in response content array (handles thinking blocks)
**Changes**:
- src/skill_seekers/cli/enhance_skill.py:
- Support custom base_url parameter
- Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN
- Iterate through content blocks to find text (handles ThinkingBlock)
- src/skill_seekers/cli/main.py:
- Add --enhance, --enhance-local, --api-key to github_parser
- Forward flags to github_scraper.py in dispatcher
- src/skill_seekers/cli/github_scraper.py:
- Add large file detection (encoding=None/"none")
- Download via download_url with requests
- Log file size and download progress
- tests/test_github_scraper.py:
- Add test_get_file_content_large_file
- Add test_extract_changelog_large_file
- All 31 tests passing ✅
**Credits**:
- Thanks to @XGCoder for detailed bug report
- Thanks to @gorquan for local fixes and guidance
Fixes#219🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add _get_file_content() helper method to detect and follow symlinks
- Update _extract_readme() to use new helper
- Update _extract_changelog() to use new helper
- Add 7 comprehensive tests for symlink handling
- All 29 GitHub scraper tests passing
Fixes#225
When README.md or CHANGELOG.md are symlinks (like in vercel/ai repo),
PyGithub returns ContentFile with type='symlink' and encoding=None.
Direct access to decoded_content throws AssertionError.
Solution: Detect symlink type, follow target path, then decode actual file.
Handles edge cases: broken symlinks, missing targets, encoding errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add _check_skill_completeness() method to quality checker that validates:
- Prerequisites/verification sections (helps Claude check conditions first)
- Error handling/troubleshooting guidance (common issues and solutions)
- Workflow steps (sequential instructions using first/then/next/finally)
This addresses G2.3 and G2.4 from the roadmap:
- G2.3: Add readability scoring (via workflow step detection)
- G2.4: Add completeness checker
New checks use info-level messages (not warnings) to avoid affecting
quality scores for existing skills while still providing helpful guidance.
Includes 4 new unit tests for completeness checks.
Contributed by the AI Writing Guide project.
Fixes#226
Changed from manual package listing to automatic discovery:
- Replaced explicit packages list with [tool.setuptools.packages.find]
- Set package-dir = {"" = "src"} for src layout
- Added include = ["skill_seekers*"] to catch all submodules
- Set namespaces = false for proper package structure
This ensures the adaptors/ directory and all other submodules
are properly included when building the PyPI package.
The manual packages list was missing newly added modules and
didn't auto-discover the src/skill_seekers/cli/adaptors directory.
- Reduced from 1116 to 526 lines (53% reduction)
- Focused on architecture and testing requirements
- Removed redundant user-facing documentation
- Added critical development notes and workflows
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Switch from manual package listing to automatic discovery
- Improves maintainability and prevents missing module bugs
- All tests passing (700+ tests)
- Package contents verified identical to v2.5.1
Fixes#226
Merges #227
Thanks to @iamKhan79690 for the contribution!
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Anas Ur Rehman (@iamKhan79690) <noreply@github.com>
Added empty py.typed marker file to enable type checkers (mypy, pyright,
pylance) to use inline type hints from the package.
This file was declared in pyproject.toml package_data but was missing,
causing build warnings.
Benefits:
- Enables type checkers to use inline type hints
- Follows Python typing best practices (PEP 561)
- Improves IDE autocomplete/intellisense
Fixes#222🤖 Generated with [Claude Code](https://claude.com/claude-code)
Added empty py.typed marker file to enable type checkers (mypy, pyright,
pylance) to use inline type hints from the package.
This file was declared in pyproject.toml package_data but was missing,
causing build warnings.
Benefits:
- Enables type checkers to use inline type hints
- Follows Python typing best practices (PEP 561)
- Improves IDE autocomplete/intellisense
Fixes#222🤖 Generated with [Claude Code](https://claude.com/claude-code)
This merge brings critical PyPI bug fix and all v2.5.1 updates to main:
Critical Fixes:
- PR #221: Fixed missing skill_seekers.cli.adaptors package (breaks v2.5.0 on PyPI)
- All version strings updated to 2.5.1
- All test assertions fixed for v2.5.1
Changes Merged:
- Version bump to 2.5.1 across all modules
- CHANGELOG updated with v2.5.1 release notes
- CLAUDE.md updated with v2.5.0 multi-platform architecture
- All 857 tests passing
Commits included:
- 3e36571: test: Update all version assertions to 2.5.1
- 58d37c9: test: Update version assertions to 2.5.1
- 5e166c4: chore: Bump version to v2.5.1 - Critical PyPI Bug Fix
- b912331: chore: Bump version to v2.5.0 - Multi-Platform Feature Parity
- c8ca1e1: fix: Add missing adaptors module to pyproject.toml packages
Ready for v2.5.1 release to PyPI.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
The skill_seekers.cli.adaptors module was missing from the packages list in pyproject.toml, causing ModuleNotFoundError when using the package_skill command with PyPI-installed package (v2.5.0).
This module provides multi-LLM platform support:
- base.py - Base adaptor class
- claude.py - Claude AI adaptor
- gemini.py - Google Gemini adaptor
- openai.py - OpenAI ChatGPT adaptor
- markdown.py - Generic markdown export
Co-authored-by: MiaoDX <miaodongxu@xiaomi.com>
Updated test expectations from old versions to 2.5.0:
- tests/test_package_structure.py: 4 assertions (cli, mcp, mcp.tools, root)
- tests/test_cli_paths.py: CLI version output
- tests/test_adaptors/test_base.py: Metadata test data
All tests now expect 2.5.0 instead of 2.0.0/2.4.0.
Merge development branch for v2.5.0 release.
This release adds complete multi-platform support for Claude AI,
Google Gemini, OpenAI ChatGPT, and Generic Markdown with full
feature parity across all platforms and skill modes.
Major Features:
- 4 LLM platforms supported
- Platform-specific adaptors architecture
- 18 MCP tools with multi-platform support
- Complete feature parity implementation
- Comprehensive platform documentation
- 700 tests passing
See CHANGELOG.md for detailed release notes.
- Replace TextContent = None with proper fallback class in all MCP tool modules
- Fixes TypeError when MCP library is not fully initialized in test environment
- Ensures all 700 tests pass (was 699 passing, 1 failing)
- Affected files:
* packaging_tools.py
* config_tools.py
* scraping_tools.py
* source_tools.py
* splitting_tools.py
The fallback class maintains the same interface as mcp.types.TextContent,
allowing tests to run successfully even when the MCP library import fails.
Test results: ✅ 700 passed, 157 skipped, 2 warnings
Add three detailed platform guides:
1. **MULTI_LLM_SUPPORT.md** - Complete multi-platform overview
- Supported platforms comparison table
- Quick start for all platforms
- Installation options
- Complete workflow examples
- Advanced usage and troubleshooting
- Programmatic API usage examples
2. **GEMINI_INTEGRATION.md** - Google Gemini integration guide
- Setup and API key configuration
- Complete workflow with tar.gz packaging
- Gemini-specific format differences
- Files API + grounding usage
- Cost estimation and best practices
- Troubleshooting common issues
3. **OPENAI_INTEGRATION.md** - OpenAI ChatGPT integration guide
- Setup and API key configuration
- Complete workflow with Assistants API
- Vector Store + file_search integration
- Assistant instructions format
- Cost estimation and best practices
- Troubleshooting common issues
All guides include:
- Code examples for CLI and Python API
- Platform-specific features and differences
- Real-world usage patterns
- Troubleshooting sections
- Best practices
Related to #179
Add comprehensive multi-LLM support section featuring:
- 4 supported platforms (Claude, Gemini, OpenAI, Markdown)
- Comparison table showing format, upload, enhancement, API keys
- Example commands for each platform
- Installation instructions for optional dependencies
- 100% backward compatibility guarantee
Highlights:
- Claude remains default (no changes needed)
- Optional dependencies: [gemini], [openai], [all-llms]
- Universal scraping works for all platforms
- Platform-specific packaging and upload
Related to #179
Add optional dependency groups for LLM platforms:
- [gemini]: google-generativeai>=0.8.0
- [openai]: openai>=1.0.0
- [all-llms]: All LLM platform dependencies combined
- Updated [all] group to include all LLM dependencies
Users can now install with:
- pip install skill-seekers[gemini]
- pip install skill-seekers[openai]
- pip install skill-seekers[all-llms]
Core functionality remains unchanged (no breaking changes)
Related to #179
- Add MarkdownAdaptor for universal markdown export
- Pure markdown format (no platform-specific features)
- ZIP packaging with README.md, references/, DOCUMENTATION.md
- No upload capability (manual use only)
- No AI enhancement support
- Combines all references into single DOCUMENTATION.md
- Add 12 unit tests (all passing)
Test Results:
- 12 MarkdownAdaptor tests passing
- 45 total adaptor tests passing (4 skipped)
Phase 4 Complete ✅
Related to #179
Implements hybrid smart extraction + improved fallback templates for
skill descriptions across all scrapers.
Changes:
- github_scraper.py:
* Added extract_description_from_readme() helper
* Extracts from README first paragraph (60 lines)
* Updates description after README extraction
* Fallback: "Use when working with {name}"
* Updated 3 locations (GitHubScraper, GitHubToSkillConverter, main)
- doc_scraper.py:
* Added infer_description_from_docs() helper
* Extracts from meta tags or first paragraph (65 lines)
* Tries: meta description, og:description, first content paragraph
* Fallback: "Use when working with {name}"
* Updated 2 locations (create_enhanced_skill_md, get_configuration)
- pdf_scraper.py:
* Added infer_description_from_pdf() helper
* Extracts from PDF metadata (subject, title)
* Fallback: "Use when referencing {name} documentation"
* Updated 3 locations (PDFToSkillConverter, main x2)
- generate_router.py:
* Updated 2 locations with improved router descriptions
* "Use when working with {name} development and programming"
All changes:
- Only apply to NEW skill generations (don't modify existing)
- No API calls (free/offline)
- Smart extraction when metadata/README available
- Improved "Use when..." fallbacks instead of generic templates
- 612 tests passing (100%)
Fixes#191
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#209 - UnicodeDecodeError on Windows with non-ASCII characters
**Problem:**
Windows users with non-English locales (Chinese, Japanese, Korean, etc.)
experienced GBK/SHIFT-JIS codec errors when the system default encoding
is not UTF-8.
Error: 'gbk' codec can't decode byte 0xac in position 206: illegal
multibyte sequence
**Root Cause:**
File operations using open() without explicit encoding parameter use
the system default encoding, which on Windows Chinese edition is GBK.
JSON files contain UTF-8 encoded characters that fail to decode with GBK.
**Solution:**
Added encoding='utf-8' to ALL file operations across:
- doc_scraper.py (4 instances):
* load_config() - line 1310
* check_existing_data() - line 1416
* save_checkpoint() - line 173
* load_checkpoint() - line 186
- github_scraper.py (1 instance):
* main() config loading - line 922
- unified_scraper.py (10 instances):
* All JSON read/write operations - lines 134, 153, 205, 239, 275,
278, 325, 328, 342, 364
**Test Results:**
- ✅ All 612 tests passing (100% pass rate)
- ✅ Backward compatible (UTF-8 is standard on Linux/macOS)
- ✅ Fixes Windows locale issues
**Impact:**
- ✅ Works on ALL Windows locales (Chinese, Japanese, Korean, etc.)
- ✅ Maintains compatibility with Linux/macOS
- ✅ Prevents future encoding issues
**Thanks to:** @my5icol for the detailed bug report and fix suggestion!
Fixes#214 - Local enhancement now handles large skills automatically
**Problem:**
- Claude CLI has undocumented ~30-40K character limit
- Large skills (>30K chars) fail silently during local enhancement
- Users experience "Claude finished but SKILL.md was not updated" error
**Solution:**
- Auto-detect large skills (>30K chars)
- Apply intelligent summarization to reduce content size
- Preserve critical content:
* First 20% (introduction/overview)
* Up to 5 best code blocks
* Up to 10 section headings with context
- Target ~30% of original size
- Show clear warnings when summarization is applied
**Implementation:**
- Added `summarize_reference()` method to LocalSkillEnhancer
- Modified `create_enhancement_prompt()` to accept summarization parameters
- Updated `run()` method to auto-enable summarization for large skills
- Added comprehensive test suite (6 tests)
**Test Results:**
- ✅ All 612 tests passing (100% pass rate)
- ✅ 6 new smart summarization tests
- ✅ E2E test: 60K skill → 17K prompt (within limits)
- ✅ Code block preservation verified
**User Experience:**
When enhancement is triggered on a large skill:
```
⚠️ LARGE SKILL DETECTED
📊 Reference content: 60,072 characters
💡 Claude CLI limit: ~30,000-40,000 characters
🔧 Applying smart summarization to ensure success...
• Keeping introductions and overviews
• Extracting best code examples
• Preserving key concepts and headings
• Target: ~30% of original size
✓ Reduced from 60,072 to 15,685 chars (26%)
✓ Prompt created and optimized (17,804 characters)
✓ Ready for Claude CLI (within safe limits)
```
**Backward Compatibility:**
- No breaking changes
- Works with existing skills
- Falls back gracefully for normal-sized skills
Add automatic skill installation to 10+ AI coding agents with a single command.
New Features:
- New install-agent command for installing skills to any AI agent
- Support for 10+ agents: Claude Code, Cursor, VS Code, Amp, Goose, OpenCode, Letta, Aide, Windsurf
- Smart path resolution (global ~/.agent vs project-relative .agent/)
- Fuzzy agent name matching with suggestions
- --agent all flag to install to all agents at once
- --force flag to overwrite existing installations
- --dry-run flag to preview installations
- Comprehensive error handling and user feedback
Implementation:
- Created install_agent.py (379 lines) with core installation logic
- Updated main.py with install-agent subcommand
- Updated pyproject.toml with entry point
- Added 32 comprehensive tests (all passing, 603 total)
- No regressions in existing functionality
Documentation:
- Updated README.md with multi-agent installation guide
- Updated CLAUDE.md with install-agent examples
- Updated CHANGELOG.md with v2.3.0 release notes
- Added agent compatibility table
Technical Details:
- 100% own implementation (no external dependencies)
- Pure Python using stdlib (shutil, pathlib, argparse)
- Compatible with Agent Skills open standard (agentskills.io)
- Works offline
Closes#210🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Brings release commits back to development:
- Version bump to 2.2.0
- CLI version string update
- Test fix for version check
- CHANGELOG [Unreleased] section
- All CI tests passing
The exponential_backoff_timing test was flaky on CI due to strict timing assertions. On busy CI systems (especially macOS runners), CPU scheduling and execution time variance can cause measured delays to deviate from expected values.
Changes:
- Simplified test to check total elapsed time instead of individual delay comparisons
- Changed threshold from 1.5x comparison to lenient 0.25s total time minimum
- Expected delays: 0.1s + 0.2s = 0.3s minimum, using 0.25s threshold for variance
- Test now verifies behavior (delays applied) without strict timing requirements
This makes the test reliable across different CI environments while still validating retry logic.
Fixes CI failure on macOS runner (Python 3.12):
AssertionError: 0.249 not greater than 0.250 * 1.5
- Created LanguageDetector class supporting 20+ programming languages
- Confidence-based detection with customizable thresholds (min_confidence parameter)
- Replaces duplicate language detection code in doc_scraper and pdf_extractor
- Comprehensive test suite with 100+ test cases
Changes:
- NEW: src/skill_seekers/cli/language_detector.py (17 KB)
- Unified detector with pattern matching for 20+ languages
- Confidence scoring (0.0-1.0 scale)
- Supports: Python, JavaScript, TypeScript, Java, C++, C#, Go, Rust, PHP, Ruby, Swift, Kotlin, Shell, SQL, HTML, CSS, JSON, YAML, XML, and more
- NEW: tests/test_language_detector.py (20 KB)
- 100+ test cases covering all supported languages
- Edge case testing (mixed code, low confidence, etc.)
- MODIFIED: src/skill_seekers/cli/doc_scraper.py
- Removed 80+ lines of duplicate detection code
- Now uses shared LanguageDetector instance
- MODIFIED: src/skill_seekers/cli/pdf_extractor_poc.py
- Removed 130+ lines of duplicate detection code
- Now uses shared LanguageDetector instance
- MODIFIED: tests/test_pdf_extractor.py
- Fixed imports to use proper package paths
- Added manual detector initialization in test setup
Benefits:
- DRY: Single source of truth for language detection
- Maintainability: Add new languages in one place
- Consistency: Same detection logic across all scrapers
- Testability: Comprehensive test coverage
- Extensibility: Easy to add new languages or improve patterns
Addresses technical debt from having duplicate detection logic in multiple files.