Commit Graph

8 Commits

Author SHA1 Message Date
yusyus
ec3e0bf491 fix: Resolve 61 critical linting errors
Fixed priority linting errors to improve code quality:

Critical Fixes:
- F821 (2 errors): Fixed undefined name 'original_result' in config_enhancer.py
- UP035 (2 errors): Removed deprecated typing.Dict and typing.Type imports
- F401 (27 errors): Removed unused imports and added noqa for availability checks
- E722 (19 errors): Replaced bare 'except:' with 'except Exception:'

Code Quality Improvements:
- SIM201 (4 errors): Simplified 'not x == y' to 'x != y'
- SIM118 (2 errors): Removed unnecessary .keys() in dict iterations
- E741 (4 errors): Renamed ambiguous variable 'l' to 'line'
- I001 (1 error): Sorted imports in test_bootstrap_skill.py

All modified areas tested and passing:
- test_scraper_features.py: 42 passed
- test_integration.py: 51 passed
- test_architecture_scenarios.py: 11 passed
- test_real_world_fastmcp.py: 19 passed (1 skipped)

Remaining linting errors: 249 (mostly code style suggestions like ARG002, F841, SIM102)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-17 22:54:40 +03:00
Pablo Estevez
c33c6f9073 change max lenght 2026-01-17 17:48:15 +00:00
Pablo Estevez
5ed767ff9a run ruff 2026-01-17 17:29:21 +00:00
yusyus
eac1f4ef8e feat(C2.1): Add .gitignore support to github_scraper for local repos
- Add pathspec import with graceful fallback
- Add gitignore_spec attribute to GitHubScraper class
- Implement _load_gitignore() method to parse .gitignore files
- Update should_exclude_dir() to check .gitignore rules
- Load .gitignore automatically in local repository mode
- Handle directory patterns with and without trailing slash
- Add 4 comprehensive tests for .gitignore functionality

Closes #63 - C2.1 File Tree Walker with .gitignore support complete

Features:
- Loads .gitignore from local repository root
- Respects .gitignore patterns for directory exclusion
- Falls back gracefully when pathspec not installed
- Works alongside existing hard-coded exclusions
- Only active in local_repo_path mode (not GitHub API mode)

Test coverage:
- test_load_gitignore_exists: .gitignore parsing
- test_load_gitignore_missing: Missing .gitignore handling
- test_should_exclude_dir_with_gitignore: .gitignore exclusion
- test_should_exclude_dir_default_exclusions: Existing exclusions still work

Integration:
- github_scraper.py now has same .gitignore support as codebase_scraper.py
- Both tools use pathspec library for consistent behavior
- Enables proper repository analysis respecting project .gitignore rules
2026-01-01 23:21:12 +03:00
yusyus
f2faebb8d5 fix: Complete fix for Issue #219 - All three problems resolved
**Problem #1: Large File Encoding Error**  FIXED
- Add large file download support via download_url
- Detect encoding='none' for files >1MB
- Download via GitHub raw URL instead of API
- Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully

**Problem #2: Missing CLI Enhancement Flags**  FIXED
- Add --enhance, --enhance-local, --api-key to main.py github_parser
- Add flag forwarding in CLI dispatcher
- Fixes 'unrecognized arguments' error
- Users can now use: skill-seekers github --repo owner/repo --enhance-local

**Problem #3: Custom API Endpoint Support**  FIXED
- Support ANTHROPIC_BASE_URL environment variable
- Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY)
- Fix ThinkingBlock.text error with newer Anthropic SDK
- Find TextBlock in response content array (handles thinking blocks)

**Changes**:
- src/skill_seekers/cli/enhance_skill.py:
  - Support custom base_url parameter
  - Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN
  - Iterate through content blocks to find text (handles ThinkingBlock)

- src/skill_seekers/cli/main.py:
  - Add --enhance, --enhance-local, --api-key to github_parser
  - Forward flags to github_scraper.py in dispatcher

- src/skill_seekers/cli/github_scraper.py:
  - Add large file detection (encoding=None/"none")
  - Download via download_url with requests
  - Log file size and download progress

- tests/test_github_scraper.py:
  - Add test_get_file_content_large_file
  - Add test_extract_changelog_large_file
  - All 31 tests passing 

**Credits**:
- Thanks to @XGCoder for detailed bug report
- Thanks to @gorquan for local fixes and guidance

Fixes #219

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 20:57:03 +03:00
yusyus
58286f454a fix: Handle symlinked README.md and CHANGELOG.md in GitHub scraper
- Add _get_file_content() helper method to detect and follow symlinks
- Update _extract_readme() to use new helper
- Update _extract_changelog() to use new helper
- Add 7 comprehensive tests for symlink handling
- All 29 GitHub scraper tests passing

Fixes #225

When README.md or CHANGELOG.md are symlinks (like in vercel/ai repo),
PyGithub returns ContentFile with type='symlink' and encoding=None.
Direct access to decoded_content throws AssertionError.

Solution: Detect symlink type, follow target path, then decode actual file.
Handles edge cases: broken symlinks, missing targets, encoding errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 20:41:28 +03:00
yusyus
50e0bfd19b fix: Update test file imports to use proper package paths
Fixed import errors in test_pdf_scraper.py and test_github_scraper.py:
- Replaced absolute imports with proper package imports
- Changed 'from pdf_scraper import' to 'from skill_seekers.cli.pdf_scraper import'
- Changed 'from github_scraper import' to 'from skill_seekers.cli.github_scraper import'
- Updated all @patch() decorators to use full module paths
- Removed sys.path manipulation workarounds

This completes the fix for import issues discovered during Task 1.2 (Issue #193).

Test Results:
- test_pdf_scraper.py: 18/18 passed 
- test_github_scraper.py: 22/22 passed 
2025-11-29 21:55:46 +03:00
yusyus
53d01910f9 test: Add comprehensive test suite for GitHub scraper (22 tests)
Tests cover all C1 tasks:
- GitHubScraper initialization and authentication (5 tests)
- README extraction (C1.2) (3 tests)
- Language detection (C1.4) (2 tests)
- GitHub Issues extraction (C1.7) (3 tests)
- CHANGELOG extraction (C1.8) (3 tests)
- GitHub Releases extraction (C1.9) (2 tests)
- GitHubToSkillConverter and skill building (C1.10) (2 tests)
- Error handling and edge cases (2 tests)

All tests passing: 22/22 

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 14:30:57 +03:00