skill-seekers-reference

firefrost-gaming/skill-seekers-reference

Author	SHA1	Message	Date
yusyus	1d3d7389d7	fix: sanitize_url crashes on Python 3.14 strict urlparse (#284 ) Python 3.14's urlparse() raises ValueError on URLs with unencoded brackets that look like malformed IPv6 (e.g. http://[fdaa:x:x:x::x from docs.openclaw.ai llms-full.txt). sanitize_url() called urlparse() BEFORE encoding brackets, so it crashed before it could fix them. Fix: catch ValueError from urlparse, encode ALL brackets, then retry. This is safe because if urlparse rejected the brackets, they are NOT valid IPv6 host literals and should be encoded anyway. Also fixed Discord e2e tests to skip gracefully on network issues. Fixes #284 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 00:30:48 +03:00
yusyus	b25a6f7f53	fix: centralize bracket-encoding to prevent 'Invalid IPv6 URL' on all code paths (#284 ) The original fix (`741daf1`) only patched LlmsTxtParser._clean_url(), which covers URLs extracted directly from llms.txt content. But URLs discovered from .md files during BFS crawl (_extract_markdown_content) and from HTML pages (extract_content) bypass _clean_url() entirely. When those pages contain links with square brackets (e.g. /api/[v1]/users), httpx raises 'Invalid IPv6 URL' on fetch. Fix: add a shared sanitize_url() utility in cli/utils.py that percent-encodes [ and ] in path/query components, and apply it at every URL ingestion point: - _enqueue_url(): main chokepoint — all discovered URLs pass through - scrape_page(): safety net for start_urls that skip _enqueue_url - scrape_page_async(): same for async mode - dry-run sync/async paths: direct fetches that also bypass _enqueue_url LlmsTxtParser._clean_url() now delegates bracket-encoding to the shared sanitize_url() (DRY), keeping only its malformed-anchor stripping logic. Added 16 tests: sanitize_url unit tests, _clean_url bracket tests, _enqueue_url sanitization tests, and integration test verifying markdown content with bracket URLs is handled safely. Fixes #284	2026-03-14 23:53:47 +03:00
yusyus	60c46673ed	feat: support multiple --enhance-workflow flags with shared workflow_runner - Change --enhance-workflow from type:str to action:append in all argument files (workflow, create, scrape, github, pdf) so the flag can be given multiple times to chain workflows in sequence - Add workflow_runner.py: shared utility used by all 4 scrapers - collect_workflow_vars(): merges extra context then user --var flags (user flags take precedence over scraper metadata) - run_workflows(): executes named workflows in order, then any inline --enhance-stage workflow; handles dry-run/preview mode - Remove duplicate ~115-130 line workflow blocks from doc_scraper, github_scraper, pdf_scraper, and codebase_scraper; replace with single run_workflows() call each - Remove mutual exclusivity between workflows and AI enhancement: workflows now run first, then traditional enhancement continues independently (--enhance-level 0 to disable) - Add tests/test_workflow_runner.py: 21 tests covering no-flags, single workflow, multiple/chained workflows, inline stages, mixed mode, variable precedence, and dry-run - Fix test_markdown_parsing: accept "text" or "unknown" for unlabelled code blocks (unified MarkdownParser returns "text" by default) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 22:05:27 +03:00
yusyus	6439c85cde	fix: Fix list comprehension variable names (NameError in CI) Fixed incorrect variable names in list comprehensions that were causing NameError in CI (Python 3.11/3.12): Critical fixes: - tests/test_markdown_parsing.py: 'l' → 'link' in list comprehension - src/skill_seekers/cli/pdf_extractor_poc.py: 'l' → 'line' (2 occurrences) Additional auto-lint fixes: - Removed unused imports in llms_txt_downloader.py, llms_txt_parser.py - Fixed comparison operators in config files - Fixed list comprehension in other files All tests now pass in CI. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-17 23:33:34 +03:00
yusyus	ec3e0bf491	fix: Resolve 61 critical linting errors Fixed priority linting errors to improve code quality: Critical Fixes: - F821 (2 errors): Fixed undefined name 'original_result' in config_enhancer.py - UP035 (2 errors): Removed deprecated typing.Dict and typing.Type imports - F401 (27 errors): Removed unused imports and added noqa for availability checks - E722 (19 errors): Replaced bare 'except:' with 'except Exception:' Code Quality Improvements: - SIM201 (4 errors): Simplified 'not x == y' to 'x != y' - SIM118 (2 errors): Removed unnecessary .keys() in dict iterations - E741 (4 errors): Renamed ambiguous variable 'l' to 'line' - I001 (1 error): Sorted imports in test_bootstrap_skill.py All modified areas tested and passing: - test_scraper_features.py: 42 passed - test_integration.py: 51 passed - test_architecture_scenarios.py: 11 passed - test_real_world_fastmcp.py: 19 passed (1 skipped) Remaining linting errors: 249 (mostly code style suggestions like ARG002, F841, SIM102) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-17 22:54:40 +03:00
Pablo Estevez	c33c6f9073	change max lenght	2026-01-17 17:48:15 +00:00
Pablo Estevez	5ed767ff9a	run ruff	2026-01-17 17:29:21 +00:00
tsyhahaha	4b764ed1c5	test: add unit tests for markdown parsing and multi-source features - Add test_markdown_parsing.py with 20 tests covering: - Markdown content extraction (titles, headings, code blocks, links) - HTML fallback when .md URL returns HTML - llms.txt URL extraction and cleaning - Empty/short content filtering - Add test_multi_source.py with 12 tests covering: - List-based scraped_data structure - Per-source subdirectory generation for docs/github/pdf - Index file generation for each source type 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2026-01-05 22:13:19 +08:00

8 Commits