firefrost-gaming/skill-seekers-reference

Author	SHA1	Message	Date
yusyus	00c72ea4a3	fix: resolve CI failures across all GitHub Actions workflows - Fix ruff format issue in doc_scraper.py - Add pytest skip markers for browser renderer tests when Playwright is not installed in CI - Replace broken Python heredocs in 4 workflow YAML files (scheduled-updates, vector-db-export, quality-metrics, test-vector-dbs) with python3 -c calls to fix YAML parsing errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 20:40:45 +03:00
yusyus	6fded977dd	feat: add Kotlin language support for codebase analysis (#287 ) Adds full C3.x pipeline support for Kotlin (.kt, .kts): - Language detection patterns (40+ weighted patterns for data/sealed classes, coroutines, companion objects, KMP, etc.) - AST regex parser in code_analyzer.py (classes, objects, functions, extension functions, suspend functions) - Dependency extraction for Kotlin import statements (with alias support) - Design pattern adaptations (object→Singleton, companion→Factory, sealed→Strategy, data→Builder, Flow→Observer) - Test example extraction for JUnit 4/5, Kotest, MockK, Spek - Config detection for build.gradle.kts / settings.gradle.kts - Extension maps registered in codebase_scraper, unified_codebase_analyzer, github_scraper, generate_router Also fixes pre-existing parser count tests (35→36 for doctor command added in previous commit). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 23:25:12 +03:00
yusyus	ea4fed0be4	feat: add headless browser rendering for JavaScript SPA sites (#321 ) New BrowserRenderer class uses Playwright to render JavaScript-heavy documentation sites (React, Vue SPAs) that return empty HTML shells with requests.get(). Activated via --browser flag on web scraping. - browser_renderer.py: Playwright wrapper with lazy browser launch, auto-install Chromium on first use, context manager support - doc_scraper.py: browser_mode config, _render_with_browser() helper, integrated into scrape_page() and scrape_page_async() - SPA detection warnings now suggest --browser flag - Optional dep: pip install "skill-seekers[browser]" - 14 real e2e tests (actual Chromium, no mocks) - UML updated: Scrapers class diagram (BrowserRenderer + dependency), Parsers (DoctorParser), Utilities (Doctor), Components, and new Browser Rendering sequence diagram (#20) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 22:06:14 +03:00
yusyus	006cccabae	feat: add skill-seekers doctor health check command (#316 ) 8 diagnostic checks: Python version (3.10+), package install, git, 14 core deps, 10 optional deps, API keys, MCP server, output dir. Each check reports pass/warn/fail with --verbose for extra detail. Exit code 0 if no critical failures, 1 otherwise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:27:17 +03:00
yusyus	43bdabb84f	feat: add prompt injection check workflow for content security (#324 ) New bundled workflow `prompt-injection-check` scans scraped content for prompt injection patterns (role assumption, instruction overrides, delimiter injection, hidden instructions, encoded payloads) using AI. Flags suspicious content without removing it — preserves documentation accuracy while warning about adversarial content. Added as first stage in both `default` and `security-focus` workflows so it runs automatically with --enhance-level >= 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:17:57 +03:00
yusyus	6beff3d52f	fix: add path traversal protection to get_workflow_tool + tests (#325 ) PR #326 added _validate_name() to create/update/delete but missed get_workflow_tool, which would raise an unhandled ValueError instead of returning a user-friendly error. Added try/except handling and 6 tests covering all 4 tool functions with malicious names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 20:56:52 +03:00
Spidershield-contrib	a12743769e	fix: prevent path traversal in workflow name parameter (CWE-22) (#326 ) Co-authored-by: spidershield-contrib <spidershield-contrib@users.noreply.github.com>	2026-03-28 20:55:13 +03:00
yusyus	c6c17ada95	docs: add 6 behavioral UML diagrams verified against codebase 3 sequence diagrams (create command dispatch, GitHub+C3.x pipeline with all 5 stages, MCP dual-path invocation), 2 activity diagrams (source detection in correct code order, enhancement level flag mapping), and 1 component diagram with corrected runtime dependency arrows. All diagrams cross-referenced against source code for accuracy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 20:45:30 +03:00
yusyus	d381315340	fix: pass enhance_level instead of removed enhance_with_ai/ai_mode to analyze_codebase (#323 ) Two call sites (_run_c3_analysis in unified_scraper.py and _analyze_c3x in unified_codebase_analyzer.py) still passed the old enhance_with_ai and ai_mode kwargs which were replaced by enhance_level. This caused a TypeError when running C3.x codebase analysis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 22:14:51 +03:00
yusyus	31a57c448b	style: apply ruff formatting to github_scraper.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 23:46:42 +03:00
yusyus	d71c1d3aa3	fix: filter non-integer metadata from GitHub languages API response (#322 ) PyGithub's get_languages() returns raw API JSON which in some environments includes non-integer metadata keys (e.g., "url"), causing a TypeError in sum(). Now filters to integer values only before calculating percentages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 23:44:52 +03:00
yusyus	336ab6aaac	Merge development into main: release v3.4.0 8 new LLM platform adaptors, 7 new CLI agents, OpenCode skill tools, 8 bug fixes including SPA site detection, UML architecture docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:20:38 +03:00
yusyus	5a93003da4	chore: bump version to 3.4.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 21:58:57 +03:00
yusyus	d76ab1d9a4	fix: report accurate saved/skipped page counts and detect SPA sites (#320 , #321 ) The scraper previously reported len(visited_urls) as "Scraped N pages" even when save_page() silently skipped pages with empty content (<50 chars). For JavaScript SPA sites this meant "Scraped 190 pages" followed by "No scraped data found!" with no explanation. Changes: - Added pages_saved/pages_skipped counters to DocToSkillConverter - save_page() now increments pages_skipped on skip, pages_saved on save - New _log_scrape_completion() reports "(N saved, M skipped)" breakdown - SPA detection warns when all/most pages have empty content - build_skill() error now explains empty content cause when pages skipped - Updated both sync and async scrape completion paths - 14 new tests across 4 test classes (counting, messages, SPA, build) Fixes #320 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 22:26:35 +03:00
yusyus	8152045e38	chore: consolidate Docs/ into docs/ (single documentation directory) Move UML/ directory and Architecture.md from Docs/ to docs/. Rename Architecture.md to UML_ARCHITECTURE.md to avoid collision with existing docs/ARCHITECTURE.md (docs organization file). Update all references in README.md, CONTRIBUTING.md, CLAUDE.md, and the architecture file itself. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 20:02:53 +03:00
yusyus	a1934905f6	docs: remove awesome-mcp-servers from ecosystem tables Not a Skill Seekers-specific repo — better suited for MCP docs section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 19:24:37 +03:00
yusyus	a1eab63daf	docs: add ecosystem section linking all Skill Seekers repos Add cross-repo discoverability for the 6 related repositories (website, configs, GitHub Action, plugin, Homebrew tap, MCP servers). - README.md: ecosystem table, Trendshift badge, pepy.tech downloads badge - All 11 translated READMEs: translated ecosystem sections - CONTRIBUTING.md: related repositories table for contributors - pyproject.toml: ecosystem URLs in [project.urls] for PyPI sidebar Addresses contributor feedback about difficulty finding the website repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 19:22:16 +03:00
yusyus	073e6b5a54	docs: add architecture references to README.md and CONTRIBUTING.md - README: Add Architecture section with package overview diagram, module table, and links to UML docs - README: Add Architecture subsection to Documentation with links to diagrams, HTML API reference, and StarUML project - CONTRIBUTING: Add UML Architecture subsection with design patterns documented and guidance to keep UML in sync with code changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 12:33:56 +03:00
yusyus	40603a3cf6	docs: remove stale UNIFIED_PARSERS.md superseded by UML architecture The parsers architecture is now fully documented in the StarUML project (Docs/UML/skill_seekers.mdj) with the Parsers class diagram showing all 28 SubcommandParser subclasses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 12:31:17 +03:00
yusyus	6b54988db5	docs: add StarUML HTML API reference documentation export 1,758 HTML files generated from StarUML project_export_doc containing full API reference for all ~200 classes, operations, attributes, and documentation across all 13 modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 12:29:14 +03:00
yusyus	30b877274b	docs: add full UML architecture with 14 class diagrams synced from source code - 14 StarUML diagrams covering all 13 modules (8 core + 5 utility) - ~200 classes with operations, attributes, and documentation from actual source - Package overview with 25 verified inter-module dependencies - Exported PNG diagrams in Docs/UML/exports/ - Architecture.md with embedded diagram descriptions - CLAUDE.md updated with architecture reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 12:24:43 +03:00
yusyus	d0d7d5a939	chore: remove stale root-level test scripts and junk files Remove files that should never have been committed: - test_api.py, test_httpx_quick.sh, test_httpx_skill.sh (ad-hoc test scripts) - test_week2_features.py (one-off validation script) - test_results.log (log file) - =0.24.0 (accidental pip error output) - demo_conflicts.py (demo script) - ruff_errors.txt (stale lint output) - TESTING_GAP_REPORT.md (stale one-time report) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 21:39:22 +03:00
yusyus	0fa99641aa	style: fix pre-existing ruff format issues in 5 files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 21:24:21 +03:00
yusyus	eb13f96ece	docs: update remaining docs for 12 LLM platforms Update platform counts (4→12) in: - docs/reference/CLAUDE_INTEGRATION.md (EN + zh-CN) - docs/guides/MCP_SETUP.md, UPLOAD_GUIDE.md, MIGRATION_GUIDE.md - docs/strategy/INTEGRATION_STRATEGY.md, DEEPWIKI_ANALYSIS.md, KIMI_ANALYSIS_COMPARISON.md - docs/archive/historical/HTTPX_SKILL_GRADING.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 20:50:50 +03:00
yusyus	6bb7078fbc	docs: update all documentation for 12 LLM platforms and 18 agents - README.md + 11 i18n READMEs: 5→12 LLM platforms, 11→18 agents, new platform/agent tables - CLAUDE.md: updated --target list, adaptor directory tree - CHANGELOG.md: added v3.4.0 entry with all Phase 1-4 changes - docs/reference/CLI_REFERENCE.md: new --target and --agent options - docs/reference/FEATURE_MATRIX.md: updated all platform counts and tables - docs/user-guide/04-packaging.md: new platform and agent rows - docs/FAQ.md: expanded platform/agent answers - docs/zh-CN/*: synchronized Chinese documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 20:42:31 +03:00
yusyus	cd7b322b5e	feat: expand platform coverage with 8 new adaptors, 7 new CLI agents, and OpenCode skill tools Phase 1 - OpenCode Integration: - Add OpenCodeAdaptor with directory-based packaging and dual-format YAML frontmatter - Kebab-case name validation matching OpenCode's regex spec Phase 2 - OpenAI-Compatible LLM Platforms: - Extract OpenAICompatibleAdaptor base class from MiniMax (shared format/package/upload/enhance) - Refactor MiniMax to ~20 lines of constants inheriting from base - Add 6 new LLM adaptors: Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI - All use OpenAI-compatible API with platform-specific constants Phase 3 - CLI Agent Expansion: - Add 7 new install-agent paths: roo, cline, aider, bolt, kilo, continue, kimi-code - Total agents: 11 -> 18 Phase 4 - Advanced Features: - OpenCode skill splitter (auto-split large docs into focused sub-skills with router) - Bi-directional skill format converter (import/export between OpenCode and any platform) - GitHub Actions template for automated skill updates Totals: 12 --target platforms, 18 --agent paths, 2915 tests passing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 20:31:51 +03:00
yusyus	1d3d7389d7	fix: sanitize_url crashes on Python 3.14 strict urlparse (#284 ) Python 3.14's urlparse() raises ValueError on URLs with unencoded brackets that look like malformed IPv6 (e.g. http://[fdaa:x:x:x::x from docs.openclaw.ai llms-full.txt). sanitize_url() called urlparse() BEFORE encoding brackets, so it crashed before it could fix them. Fix: catch ValueError from urlparse, encode ALL brackets, then retry. This is safe because if urlparse rejected the brackets, they are NOT valid IPv6 host literals and should be encoded anyway. Also fixed Discord e2e tests to skip gracefully on network issues. Fixes #284 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 00:30:48 +03:00
yusyus	2ef6e59d06	fix: stop blindly appending /index.html.md to non-.md URLs (#277 ) The previous fix (`a82cf69`) only addressed anchor fragment stripping but left the fundamental problem: _convert_to_md_urls() blindly appended /index.html.md to ALL non-.md URLs from llms.txt. This only works for Docusaurus sites — for sites like Discord docs it generates mass 404s. Changes: - _convert_to_md_urls() now strips anchors and deduplicates only, preserving original URLs as-is instead of appending /index.html.md - New _has_md_extension() helper uses urlparse().path.endswith(".md") instead of error-prone ".md" in url substring matching - Fixed ".md" in url checks at 4 locations (lines 465, 554, 716, 775) - Removed 24 lines of dead commented-out code - Added real-world e2e test against docs.discord.com (no mocks) - Updated unit tests for new behavior (32 tests) Fixes #277 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 23:44:35 +03:00
yusyus	f6131c6798	fix: unified scraper temp config uses unified format for doc_scraper (#317 ) The unified scraper's _scrape_documentation() was creating temp configs in flat/legacy format (no "sources" key), causing doc_scraper's ConfigValidator to reject them. Wrap the temp config in unified format with a "sources" array. Also remove dead code branches and fix a pre-existing test that didn't clear GITHUB_TOKEN from env. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:35:12 +03:00
yusyus	4f87de6b56	fix: improve MiniMax adaptor from PR #318 review (#319 ) * feat: add MiniMax AI as LLM platform adaptor Original implementation by octo-patch in PR #318. This commit includes comprehensive improvements and documentation. Code Improvements: - Fix API key validation to properly check JWT format (eyJ prefix) - Add specific exception handling for timeout and connection errors - Remove unused variable in upload method Dependencies: - Add MiniMax to [all-llms] extra group in pyproject.toml Tests: - Remove duplicate setUp method in integration test class - Add 4 new test methods: * test_package_excludes_backup_files * test_upload_success_mocked (with OpenAI mocking) * test_upload_network_error * test_upload_connection_error * test_validate_api_key_jwt_format - Update test_validate_api_key_valid to use JWT format keys - Fix test assertions for error message matching Documentation: - Create comprehensive MINIMAX_INTEGRATION.md guide (380+ lines) - Update MULTI_LLM_SUPPORT.md with MiniMax platform entry - Update 01-installation.md extras table - Update INTEGRATIONS.md AI platforms table - Update AGENTS.md adaptor import pattern example - Fix README.md platform count from 4 to 5 All tests pass (33 passed, 3 skipped) Lint checks pass Co-authored-by: octo-patch <octo-patch@users.noreply.github.com> * fix: improve MiniMax adaptor — typed exceptions, key validation, tests, docs - Remove invalid "minimax" self-reference from all-llms dependency group - Use typed OpenAI exceptions (APITimeoutError, APIConnectionError) instead of string-matching on generic Exception - Replace incorrect JWT assumption in validate_api_key with length check - Use DEFAULT_API_ENDPOINT constant instead of hardcoded URLs (3 sites) - Add Path() cast for output_path before .is_dir() call - Add sys.modules mock to test_enhance_missing_library - Add mocked test_enhance_success with backup/content verification - Update test assertions for new exception types and key validation - Add MiniMax to __init__.py docstrings (module, get_adaptor, list_platforms) - Add MiniMax sections to MULTI_LLM_SUPPORT.md (install, format, API key, workflow example, export-to-all) Follows up on PR #318 by @octo-patch (feat: add MiniMax AI as LLM platform adaptor). Co-Authored-By: Octopus <octo-patch@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: octo-patch <octo-patch@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 22:12:23 +03:00
yusyus	37a23e6c6d	fix: replace unicode arrows in CLI help text for Windows cp1252 compatibility Replace → (U+2192) with -> in argparse help strings. Windows cmd uses cp1252 encoding which cannot render unicode arrows, causing --help to crash with UnicodeEncodeError.	2026-03-19 00:10:50 +03:00
yusyus	26c2d0bd5c	fix: correct CLI flags in plugin slash commands (create uses --preset, package uses --target)	2026-03-17 22:03:20 +03:00
yusyus	5e4932e8b1	feat: add distribution files for Smithery, GitHub Action, and Claude Code Plugin - Add Claude Code Plugin: plugin.json, .mcp.json, 3 slash commands, skill-builder agent skill - Add GitHub Action: composite action.yml with 6 inputs/2 outputs, comprehensive README - Add Smithery: publishing guide with namespace yusufkaraaslan/skill-seekers created - Add render-mcp.yaml for MCP server deployment on Render - Fix Dockerfile.mcp: --transport flag (nonexistent) → --http, add dynamic PORT support - Update AGENTS.md to v3.3.0 with corrected test count and expanded CI section - Allow distribution/claude-plugin/.mcp.json in .gitignore	2026-03-16 23:29:50 +03:00
yusyus	2b725aa8f7	fix: update version strings and test expectations from 3.2.0 to 3.3.0 Fix CI failures: version hardcoded in _version.py fallbacks and test assertions (test_package_structure, test_cli_paths) still referenced 3.2.0 after the version bump.	2026-03-16 00:53:35 +03:00
yusyus	ca0890ba6f	chore: bump version to 3.3.0 and finalize changelog - Bump version in pyproject.toml: 3.2.0 -> 3.3.0 - Rename [Unreleased] to [3.3.0] - 2026-03-16 with theme line - Add Supported Source Types (17) reference table - Add 12 missing changelog entries: - feat: sync-config command (#306) - feat: best practices guide (#206) - docs: 32 files updated for 17 source types - docs: README translations for 10 languages - perf: pre-compiled regex, bisect line indexing, O(1) dedup (#309) - fix: Invalid IPv6 URL on bracket URLs (#284) - fix: GitHub scraper PaginatedList crash (#269) - fix: release workflow version mismatch and 3.10 compat - fix: infer_categories key mismatch - fix: flaky benchmark test - fix: CI branch protection pending	2026-03-16 00:23:48 +03:00
yusyus	9e405df9d0	docs: add README translations for 10 languages (12 total) Add machine-translated README files for Japanese, Korean, Spanish, French, German, Portuguese (BR), Turkish, Arabic, Hindi, and Russian. Update language selector in English and Chinese READMEs to link all 12 versions. New files: README.{ja,ko,es,fr,de,pt-BR,tr,ar,hi,ru}.md Modified: README.md, README.zh-CN.md (language selector bar)	2026-03-15 16:27:05 +03:00
yusyus	37cb307455	docs: update all documentation for 17 source types Update 32 documentation files across English and Chinese (zh-CN) docs to reflect the 10 new source types added in the previous commit. Updated files: - README.md, README.zh-CN.md — taglines, feature lists, examples, install extras - docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE - docs/features/ — UNIFIED_SCRAPING with generic merge docs - docs/advanced/ — multi-source guide, MCP server guide - docs/getting-started/ — installation extras, quick-start examples - docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge) - docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README - Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP - docs/zh-CN/ — Chinese translations for all of the above 32 files changed, +3,016 lines, -245 lines	2026-03-15 15:56:04 +03:00
yusyus	53b911b697	feat: add 10 new skill source types (17 total) with full pipeline integration Add Jupyter Notebook, Local HTML, OpenAPI/Swagger, AsciiDoc, PowerPoint, RSS/Atom, Man Pages, Confluence, Notion, and Slack/Discord Chat as new skill source types. Each type is fully integrated across: - Standalone CLI commands (skill-seekers <type>) - Auto-detection via 'skill-seekers create' (file extension + content sniffing) - Unified multi-source configs (scraped_data, dispatch, config validation) - Unified skill builder (generic merge + source-attributed synthesis) - MCP server (scrape_generic tool with per-type flag mapping) - pyproject.toml (entry points, optional deps, [all] group) Also fixes: EPUB unified pipeline gap, missing word/video config validators, OpenAPI yaml import guard, MCP flag mismatch for all 10 types, stale docstrings, and adds 77 integration tests + complex-merge workflow. 50 files changed, +20,201 lines	2026-03-15 15:30:15 +03:00
yusyus	64403a3686	docs: add best practices guide for high-quality skills (#206 ) Adds docs/BEST_PRACTICES.md — a comprehensive guide for creating high-quality Claude skills. Covers SKILL.md structure, code examples, prerequisites, troubleshooting, quality targets, and a real-world before/after example (Grade F to Grade A). Addresses roadmap item I2.2. Based on PR #206 by @jmagly from the AI Writing Guide project. Fixes applied: updated outdated CLI command, fixed broken doc links. Co-authored-by: jmagly <jmagly@users.noreply.github.com>	2026-03-15 02:51:02 +03:00
yusyus	7185531f94	fix: replace PaginatedList slicing with itertools.islice in _extract_issues PyGithub's PaginatedList slicing (issues[:max_issues]) may fail with 'list index out of range' on some PyGithub versions or when repos have no issues. Replace with itertools.islice() which works reliably with any iterable, including PaginatedList. Bug reported by @dream0438-cmd in PR #269. Closes #269	2026-03-15 02:44:06 +03:00
yusyus	2e30970dfb	feat: add EPUB input support (#310 ) Adds EPUB as a first-class input source for skill generation. - EpubToSkillConverter (epub_scraper.py, ~1200 lines) following PDF scraper pattern - Dublin Core metadata, spine items, code blocks, tables, images extraction - DRM detection (Adobe ADEPT, Apple FairPlay, Readium LCP) with fail-fast - EPUB 3 NCX TOC bug workaround (ignore_ncx=True) - ebooklib as optional dep: pip install skill-seekers[epub] - Wired into create command with .epub auto-detection - 104 tests, all passing Review fixes: removed 3 empty test stubs, fixed SVG double-counting in _extract_images(), added logger.debug to bare except pass. Based on PR #310 by @christianbaumann. Co-authored-by: Christian Baumann <mail@chriss-baumann.de>	2026-03-15 02:34:41 +03:00
yusyus	83b9a695ba	feat: add sync-config command to detect and update config start_urls (#306 ) ## Summary Add `skill-seekers sync-config` subcommand that crawls a docs site's navigation, diffs discovered URLs against a config's start_urls, and optionally writes the updated list back with --apply. - BFS link discovery with configurable depth (default 2), max-pages, rate-limit - Respects url_patterns.include/exclude from config - Supports optional nav_seed_urls config field - Handles both unified (sources array) and legacy flat config formats - MCP tool sync_config included - 57 tests (39 unit + 18 E2E with local HTTP server) - Fixed CI: renamed summary job to "Tests" to match branch protection rule Closes #306	2026-03-15 02:16:32 +03:00
yusyus	0c9504c944	fix(ci): rename summary job to 'Tests' to match branch protection rule The branch protection requires a status check named 'Tests', but GitHub reports checks using job names, not the workflow name. The summary job was named 'All Checks Complete' which never satisfied the required check, leaving PRs permanently stuck as 'Expected — Waiting for status to be reported'. Fix: rename the summary job from 'All Checks Complete' to 'Tests' so it matches the required status check exactly.	2026-03-15 01:39:58 +03:00
yusyus	b25a6f7f53	fix: centralize bracket-encoding to prevent 'Invalid IPv6 URL' on all code paths (#284 ) The original fix (`741daf1`) only patched LlmsTxtParser._clean_url(), which covers URLs extracted directly from llms.txt content. But URLs discovered from .md files during BFS crawl (_extract_markdown_content) and from HTML pages (extract_content) bypass _clean_url() entirely. When those pages contain links with square brackets (e.g. /api/[v1]/users), httpx raises 'Invalid IPv6 URL' on fetch. Fix: add a shared sanitize_url() utility in cli/utils.py that percent-encodes [ and ] in path/query components, and apply it at every URL ingestion point: - _enqueue_url(): main chokepoint — all discovered URLs pass through - scrape_page(): safety net for start_urls that skip _enqueue_url - scrape_page_async(): same for async mode - dry-run sync/async paths: direct fetches that also bypass _enqueue_url LlmsTxtParser._clean_url() now delegates bracket-encoding to the shared sanitize_url() (DRY), keeping only its malformed-anchor stripping logic. Added 16 tests: sanitize_url unit tests, _clean_url bracket tests, _enqueue_url sanitization tests, and integration test verifying markdown content with bracket URLs is handled safely. Fixes #284	2026-03-14 23:53:47 +03:00
yusyus	f214976ccd	fix: apply review fixes from PR #309 and stabilize flaky benchmark test Follow-up to PR #309 (perf: optimize with caching, pre-compiled regex, O(1) lookups, and bisect line indexing). These fixes were committed to the PR branch but missed the squash merge. Review fixes (credit: PR #309 by copperlang2007): 1. Rename _pending_set -> _enqueued_urls to accurately reflect that the set tracks all ever-enqueued URLs, not just currently pending ones 2. Extract duplicated _build_line_index()/_offset_to_line() into shared build_line_index()/offset_to_line() in cli/utils.py (DRY) 3. Fix pre-existing bug: infer_categories() guard checked 'tutorial' but wrote to 'tutorials' key, risking silent overwrites 4. Remove unnecessary _store_results() closure in scrape_page() 5. Simplify parser pre-import in codebase_scraper.py Benchmark stabilization: - test_benchmark_metadata_overhead was flaky on CI (106.7% overhead observed, threshold 50%) because 5 iterations with mean averaging can't reliably measure microsecond-level differences - Fix: 20 iterations, warm-up run, median instead of mean, threshold raised to 200% (guards catastrophic regression, not noise) Ref: https://github.com/yusufkaraaslan/Skill_Seekers/pull/309	2026-03-14 23:39:23 +03:00
copperlang2007	89f5e6fe5f	perf: optimize with caching, pre-compiled regex, O(1) lookups, and bisect line indexing (#309 ) ## Summary Performance optimizations across core scraping and analysis modules: - doc_scraper.py: Pre-compiled regex at module level, O(1) URL dedup via _enqueued_urls set, cached URL patterns, _enqueue_url() helper (DRY), seen_links set for link extraction, pre-lowercased category keywords, async error logging (bug fix), summary I/O error handling - code_analyzer.py: O(log n) bisect-based line lookups replacing O(n) count("\n") across all 10 language analyzers; O(n) parent class map replacing O(n^2) AST walks for Python method detection - dependency_analyzer.py: Same bisect line-index optimization for all import extractors - codebase_scraper.py: Module-level import re, pre-imported parser classes outside loop - github_scraper.py: deque.popleft() for O(1) tree traversal, module-level import fnmatch - utils.py: Shared build_line_index() / offset_to_line() utilities (DRY) - test_adaptor_benchmarks.py: Stabilized flaky test_benchmark_metadata_overhead (median, warm-up, more iterations) Review fixes applied on top of original PR: 1. Renamed misleading _pending_set to _enqueued_urls 2. Extracted duplicated line-index code into shared cli/utils.py 3. Fixed pre-existing "tutorial" vs "tutorials" key mismatch bug in infer_categories() 4. Removed unnecessary _store_results() closure 5. Simplified parser pre-import pattern	2026-03-14 23:35:39 +03:00
yusyus	0ca271cdcb	fix: use grep instead of tomllib for version check in release workflow tomllib is only available in Python 3.11+, but the release workflow runs on Python 3.10. Replace with grep/sed which works everywhere. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 21:14:31 +03:00
Claude	1254f0e1ac	chore: update uv.lock https://claude.ai/code/session_015hYfpKhFH3GSMVSKgA4JVd	2026-03-02 07:11:34 +00:00
Claude	fca89e6ee1	fix: add explicit release name and version consistency checks to release workflow The GitHub release was showing v3.1.3 instead of v3.2.0 because: 1. No explicit `name` was set on the GitHub release action, relying on defaults that could be unreliable 2. The sed command for extracting release notes used unescaped dots in the version regex, which could match wrong versions 3. No fallback if release notes extraction produced an empty file Changes: - Add explicit `name` and `tag_name` to softprops/action-gh-release - Add version consistency check (tag vs pyproject.toml vs package) - Escape dots in sed regex for exact version matching - Add fallback when release notes extraction produces empty output https://claude.ai/code/session_015hYfpKhFH3GSMVSKgA4JVd	2026-03-02 07:10:59 +00:00
yusyus	73349c616b	fix: update hardcoded version strings in tests to 3.2.0 Tests had hardcoded "3.1.3" version checks that broke after the version bump to 3.2.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 22:48:12 +03:00
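
Note on the URL fixes in 1d3d7389d7, b25a6f7f53 and 2ef6e59d06: the commit messages describe the approach but do not show code. The sketch below is one possible reading of those descriptions, not the repository's actual implementation. The function names follow the commit messages (sanitize_url, _has_md_extension, written here without the leading underscore); the helper _encode_brackets, the function bodies and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlparse, urlunparse


def _encode_brackets(text: str) -> str:
    """Percent-encode square brackets (hypothetical helper, not from the repo)."""
    return text.replace("[", "%5B").replace("]", "%5D")


def sanitize_url(url: str) -> str:
    """Encode [ and ] in the path and query so HTTP clients do not
    mistake them for an IPv6 host literal.

    Assumed behaviour, reconstructed from the commit messages:
    - normal case: only path and query are touched, so a real IPv6
      host such as http://[::1]/ survives unchanged;
    - if urlparse() itself rejects the URL with ValueError, the
      brackets cannot be a valid host literal, so all of them are
      encoded.
    """
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse refused the URL (e.g. an unclosed "[" in the host
        # position); encode every bracket in the raw string instead.
        return _encode_brackets(url)
    cleaned = parts._replace(
        path=_encode_brackets(parts.path),
        query=_encode_brackets(parts.query),
    )
    return urlunparse(cleaned)


def has_md_extension(url: str) -> bool:
    """Check the parsed path rather than searching the whole URL for
    '.md', which would also match directory names or fragments."""
    return urlparse(url).path.endswith(".md")


if __name__ == "__main__":
    print(sanitize_url("https://example.com/api/[v1]/users"))
    print(sanitize_url("http://[fdaa:x:x"))
    print(has_md_extension("https://example.com/docs/page.md#section"))
    print(has_md_extension("https://example.com/.md-archive/page"))
```

The path-based check is the design point of 2ef6e59d06: a substring test such as ".md" in url returns True for the last example above, whereas the parsed-path test does not.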


682 Commits