Commit Graph

4 Commits

yusyus
f2faebb8d5 fix: Complete fix for Issue #219 - All three problems resolved
**Problem #1: Large File Encoding Error**: FIXED
- Add large file download support via download_url
- Detect encoding='none' for files >1MB
- Download via GitHub raw URL instead of API
- Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully
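Here is a minimal sketch of the large-file fallback. It assumes the GitHub contents-API response is available as a plain dict with `encoding`, `content`, and `download_url` keys. `get_file_text` and the injected `fetch` callable are hypothetical names and are not the actual github_scraper.py code. In practice `fetch` would perform an HTTP GET on `download_url`, for example with `requests`.

```python
import base64
from typing import Callable, Optional


def get_file_text(content: dict, fetch: Callable[[str], bytes]) -> Optional[str]:
    """Return decoded file text from a GitHub contents-API response (sketch).

    Small files arrive inline as base64. For files over 1 MB the API
    reports encoding "none" (PyGithub may surface it as None) and the
    inline content is empty, so the raw file is fetched from download_url.
    """
    encoding = content.get("encoding")
    if encoding == "base64":
        # b64decode discards the newlines GitHub inserts into base64 content.
        raw = base64.b64decode(content["content"])
    elif encoding in (None, "none"):
        url = content.get("download_url")
        if not url:
            return None  # nothing to download
        raw = fetch(url)  # hypothetical; e.g. requests.get(url, timeout=30).content
    else:
        return None  # unknown encoding: skip rather than guess
    return raw.decode("utf-8", errors="replace")
```

Because `fetch` is injected, the decoding logic can be tested without network access.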

**Problem #2: Missing CLI Enhancement Flags**: FIXED
- Add --enhance, --enhance-local, --api-key to main.py github_parser
- Add flag forwarding in CLI dispatcher
- Fixes 'unrecognized arguments' error
- Users can now use: skill-seekers github --repo owner/repo --enhance-local
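Here is a sketch of the parser and dispatcher change. It assumes main.py uses argparse subparsers. Only the flag names and the `github` subcommand come from this commit. `build_parser` and `forward_github_args` are hypothetical helpers, and the real dispatcher may pass arguments to github_scraper.py differently.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a `github` subcommand that declares the enhancement flags."""
    parser = argparse.ArgumentParser(prog="skill-seekers")
    subparsers = parser.add_subparsers(dest="command", required=True)

    github_parser = subparsers.add_parser("github")
    github_parser.add_argument("--repo", required=True, help="owner/repo")
    # Without these declarations, parse_args() exits with
    # "unrecognized arguments: --enhance-local".
    github_parser.add_argument("--enhance", action="store_true")
    github_parser.add_argument("--enhance-local", action="store_true")
    github_parser.add_argument("--api-key")
    return parser


def forward_github_args(args: argparse.Namespace) -> List[str]:
    """Rebuild the argv list handed to the scraper (hypothetical helper)."""
    argv = ["--repo", args.repo]
    if args.enhance:
        argv.append("--enhance")
    if args.enhance_local:  # argparse maps --enhance-local to enhance_local
        argv.append("--enhance-local")
    if args.api_key:
        argv.extend(["--api-key", args.api_key])
    return argv
```

Both halves are needed. Declaring the flags stops the parse error, and forwarding them ensures the scraper actually receives them.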

**Problem #3: Custom API Endpoint Support**: FIXED
- Support ANTHROPIC_BASE_URL environment variable
- Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY)
- Fix AttributeError when reading .text from a ThinkingBlock (newer Anthropic SDK)
- Find TextBlock in response content array (handles thinking blocks)
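Here is a minimal sketch of both changes. The helper names are hypothetical. The sketch makes two assumptions: ANTHROPIC_API_KEY takes precedence when both variables are set, and the token is passed as the client's API key. The resulting dict could be passed as keyword arguments to `anthropic.Anthropic(...)`, which accepts `api_key` and `base_url`. The block scan relies only on each content block's `type` field.

```python
import os
from typing import Any, Iterable, Mapping, Optional


def resolve_anthropic_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Pick credentials and endpoint from the environment (hypothetical helper).

    Assumption: ANTHROPIC_API_KEY wins, and ANTHROPIC_AUTH_TOKEN is the fallback.
    ANTHROPIC_BASE_URL, when set, points the client at a custom endpoint.
    """
    api_key = env.get("ANTHROPIC_API_KEY") or env.get("ANTHROPIC_AUTH_TOKEN")
    settings = {"api_key": api_key}
    base_url = env.get("ANTHROPIC_BASE_URL")
    if base_url:
        settings["base_url"] = base_url
    return settings


def first_text(content: Iterable[Any]) -> Optional[str]:
    """Return the text of the first block whose type is "text".

    A response may start with a thinking block, which has no .text
    attribute, so response.content[0].text is not safe to assume.
    """
    for block in content:
        if getattr(block, "type", None) == "text":
            return block.text
    return None
```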

**Changes**:
- src/skill_seekers/cli/enhance_skill.py:
  - Support custom base_url parameter
  - Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN
  - Iterate through content blocks to find text (handles ThinkingBlock)

- src/skill_seekers/cli/main.py:
  - Add --enhance, --enhance-local, --api-key to github_parser
  - Forward flags to github_scraper.py in dispatcher

- src/skill_seekers/cli/github_scraper.py:
  - Add large file detection (encoding=None/"none")
  - Download via download_url with requests
  - Log file size and download progress

- tests/test_github_scraper.py:
  - Add test_get_file_content_large_file
  - Add test_extract_changelog_large_file
  - All 31 tests passing

**Credits**:
- Thanks to @XGCoder for detailed bug report
- Thanks to @gorquan for local fixes and guidance

Fixes #219

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 20:57:03 +03:00
yusyus
58286f454a fix: Handle symlinked README.md and CHANGELOG.md in GitHub scraper
- Add _get_file_content() helper method to detect and follow symlinks
- Update _extract_readme() to use new helper
- Update _extract_changelog() to use new helper
- Add 7 comprehensive tests for symlink handling
- All 29 GitHub scraper tests passing

Fixes #225

When README.md or CHANGELOG.md is a symlink (as in the vercel/ai repo),
PyGithub returns a ContentFile with type='symlink' and encoding=None.
Accessing decoded_content directly then raises an AssertionError.

Solution: Detect symlink type, follow target path, then decode actual file.
Handles edge cases: broken symlinks, missing targets, encoding errors.
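Here is a sketch of the symlink-following helper. It assumes that `repo.get_contents(path)` returns an object with `type`, `encoding`, base64 `content`, and, for symlinks, a `target` path. It also assumes the call raises when the path is missing. The GitHub contents API includes `target` for symlinks, but whether a given PyGithub version exposes it as an attribute is an assumption here. The hop limit used to stop symlink loops is also an assumption.

```python
import base64
import posixpath
from typing import Any, Optional


def get_file_content(repo: Any, path: str, max_hops: int = 5) -> Optional[str]:
    """Fetch and decode a file, following symlinks (hypothetical sketch)."""
    for _ in range(max_hops):
        try:
            item = repo.get_contents(path)
        except Exception:
            return None  # missing file or broken symlink target
        if item.type == "symlink":
            target = getattr(item, "target", None)
            if not target:
                return None  # symlink without a usable target
            # Relative targets resolve against the directory holding the link.
            path = posixpath.normpath(posixpath.join(posixpath.dirname(path), target))
            continue
        if item.encoding != "base64" or item.content is None:
            return None  # e.g. encoding None: nothing to decode inline
        try:
            return base64.b64decode(item.content).decode("utf-8")
        except ValueError:  # binascii.Error and UnicodeDecodeError subclass ValueError
            return None
    return None  # too many hops: likely a symlink loop
```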

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 20:41:28 +03:00
yusyus
50e0bfd19b fix: Update test file imports to use proper package paths
Fixed import errors in test_pdf_scraper.py and test_github_scraper.py:
- Replaced bare top-level module imports with fully qualified package imports
- Changed 'from pdf_scraper import' to 'from skill_seekers.cli.pdf_scraper import'
- Changed 'from github_scraper import' to 'from skill_seekers.cli.github_scraper import'
- Updated all @patch() decorators to use full module paths
- Removed sys.path manipulation workarounds

This completes the fix for import issues discovered during Task 1.2 (Issue #193).

Test Results:
- test_pdf_scraper.py: 18/18 passed
- test_github_scraper.py: 22/22 passed
2025-11-29 21:55:46 +03:00
yusyus
53d01910f9 test: Add comprehensive test suite for GitHub scraper (22 tests)
Tests cover all C1 tasks:
- GitHubScraper initialization and authentication (5 tests)
- README extraction (C1.2) (3 tests)
- Language detection (C1.4) (2 tests)
- GitHub Issues extraction (C1.7) (3 tests)
- CHANGELOG extraction (C1.8) (3 tests)
- GitHub Releases extraction (C1.9) (2 tests)
- GitHubToSkillConverter and skill building (C1.10) (2 tests)
- Error handling and edge cases (2 tests)

All tests passing: 22/22

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 14:30:57 +03:00