skill-seekers-reference

firefrost-gaming/skill-seekers-reference

Author	SHA1	Message	Date
yusyus	0fa99641aa	style: fix pre-existing ruff format issues in 5 files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 21:24:21 +03:00
yusyus	2ef6e59d06	fix: stop blindly appending /index.html.md to non-.md URLs (#277 ) The previous fix (`a82cf69`) only addressed anchor fragment stripping but left the fundamental problem: _convert_to_md_urls() blindly appended /index.html.md to ALL non-.md URLs from llms.txt. This only works for Docusaurus sites — for sites like Discord docs it generates mass 404s. Changes: - _convert_to_md_urls() now strips anchors and deduplicates only, preserving original URLs as-is instead of appending /index.html.md - New _has_md_extension() helper uses urlparse().path.endswith(".md") instead of error-prone ".md" in url substring matching - Fixed ".md" in url checks at 4 locations (lines 465, 554, 716, 775) - Removed 24 lines of dead commented-out code - Added real-world e2e test against docs.discord.com (no mocks) - Updated unit tests for new behavior (32 tests) Fixes #277 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 23:44:35 +03:00
yusyus	0265de5816	style: Format all Python files with ruff - Formatted 103 files to comply with ruff format requirements - No code logic changes, only formatting/whitespace - Fixes CI formatting check failures	2026-02-08 14:42:27 +03:00
yusyus	51787e57bc	style: Fix 411 ruff lint issues (Kimi's issue #4 ) Auto-fixed lint issues with ruff --fix and --unsafe-fixes: Issue #4: Ruff Lint Issues - Before: 447 errors (originally reported as ~5,500) - After: 55 errors remaining - Fixed: 411 errors (92% reduction) Auto-fixes applied: - 156 UP006: List/Dict → list/dict (PEP 585) - 63 UP045: Optional[X] → X \| None (PEP 604) - 52 F401: Removed unused imports - 52 UP035: Fixed deprecated imports - 34 E712: True/False comparisons → not/bool() - 17 F841: Removed unused variables - Plus 37 other auto-fixable issues Remaining 55 errors (non-critical): - 39 B904: Exception chaining (best practice) - 5 F401: Unused imports (edge cases) - 3 SIM105: Could use contextlib.suppress - 8 other minor style issues These remaining issues are code quality improvements, not critical bugs. Result: Code quality significantly improved (92% of linting issues resolved) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-08 12:46:38 +03:00
yusyus	a82cf6967a	fix: Strip anchor fragments in URL conversion to prevent 404 errors (fixes #277 ) Critical bug fix for llms.txt URL parsing: Problem: - URLs with anchor fragments (e.g., #synchronous-initialization) were malformed when converting to .md format - Example: https://example.com/api#method → https://example.com/api#method/index.html.md ❌ - Caused 404 errors and duplicate requests for same page with different anchors Solution: 1. Parse URLs with urllib.parse.urlparse() to extract fragments 2. Strip anchor fragments before appending /index.html.md 3. Deduplicate base URLs (multiple anchors → single request) 4. Fix .md detection: '.md' in url → url.endswith('.md') - Prevents false matches on URLs like /cmd-line or /AMD-processors Changes: - src/skill_seekers/cli/doc_scraper.py (_convert_to_md_urls) - Added URL parsing to remove fragments - Added deduplication with seen_base_urls set - Fixed .md extension detection - Updated log message to show deduplicated count - tests/test_url_conversion.py (NEW) - 12 comprehensive tests covering all edge cases - Real-world MikroORM case validation - 54/54 tests passing (42 existing + 12 new) - CHANGELOG.md - Documented bug fix and solution Reported-by: @devjones <https://github.com/yusufkaraaslan/Skill_Seekers/issues/277>	2026-02-04 21:16:13 +03:00

5 Commits