Changelog
All notable changes to Skill Seeker will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added
- Kotlin language support for codebase analysis — Full C3.x pipeline support: AST parsing (classes, objects, functions, data/sealed classes, extension functions, coroutines), dependency extraction, design pattern recognition (object declaration→Singleton, companion object→Factory, sealed class→Strategy), test example extraction (JUnit, Kotest, MockK, Spek), language detection patterns, config detection (build.gradle.kts), and extension maps across all analyzers (#287)
- Headless browser rendering (`--browser` flag) - uses Playwright to render JavaScript SPA sites (React, Vue, etc.) that return empty HTML shells. Auto-installs Chromium on first use. Optional dep: `pip install "skill-seekers[browser]"` (#321)
- `skill-seekers doctor` command - 8 diagnostic checks (Python version, package install, git, core/optional deps, API keys, MCP server, output dir) with pass/warn/fail status and `--verbose` flag (#316)
- Prompt injection check workflow - bundled `prompt-injection-check` workflow scans scraped content for injection patterns (role assumption, instruction overrides, delimiter injection, hidden instructions). Added as first stage in `default` and `security-focus` workflows. Flags suspicious content without removing it (#324)
- 6 behavioral UML diagrams - 3 sequence (create pipeline, GitHub+C3.x flow, MCP invocation), 2 activity (source detection, enhancement pipeline), 1 component (runtime dependencies with interface contracts)
Fixed
- GitHub language detection crashes with `TypeError` when the API response contains metadata keys with non-integer values (e.g., `"url"`); now filters to integer values only (#322)
- C3.x codebase analysis crashes with `TypeError`: `_run_c3_analysis()` and `_analyze_c3x()` passed the removed `enhance_with_ai`/`ai_mode` kwargs to `analyze_codebase()` instead of `enhance_level` (#323)
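The integer-only filter from the first fix can be sketched roughly as follows. This is a minimal illustration, not the project's code: `language_percentages` and the sample payload are hypothetical, and the only behavior taken from the entry above is "keep integer values, drop everything else".

```python
def language_percentages(raw: dict) -> dict[str, float]:
    """Turn a GitHub-style languages payload into percentages.

    Hypothetical helper: only integer byte counts are kept, so a stray
    metadata entry (for example a "url" string) cannot break sum().
    """
    counts = {
        name: value
        for name, value in raw.items()
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, int) and not isinstance(value, bool)
    }
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * value / total, 2) for name, value in counts.items()}


# Hypothetical payload with one non-integer metadata entry
payload = {"Python": 7500, "Shell": 2500, "url": "https://example.invalid/languages"}
print(language_percentages(payload))
```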
[3.4.0] - 2026-03-21
Added
- OpenCode adaptor (`--target opencode`) - Directory-based packaging with dual-format YAML frontmatter
- OpenAI-compatible base class - Shared base for all OpenAI-compatible LLM platforms
- 6 new LLM platform adaptors: Kimi (`--target kimi`), DeepSeek (`--target deepseek`), Qwen (`--target qwen`), OpenRouter (`--target openrouter`), Together AI (`--target together`), Fireworks AI (`--target fireworks`)
- 7 new CLI agent install paths: roo, cline, aider, bolt, kilo, continue, kimi-code (total: 18 agents)
- OpenCode skill splitter - Auto-split large docs into focused sub-skills with router
- Bi-directional skill converter - Import/export between OpenCode and any platform format
- Distribution files for Smithery (`smithery.yaml`), GitHub Actions (`templates/github-actions/update-skills.yml`), and Claude Code Plugin
- Full UML architecture documentation - 14 class diagrams synced from source code via StarUML
- StarUML HTML API reference documentation export
- Ecosystem section in README linking all Skill Seekers repos (PyPI, website, plugin, GitHub Action)
Fixed
- `sanitize_url()` crashes on Python 3.14 due to strict `urlparse` rejecting bracket-containing URLs (#284)
- Blindly appending `/index.html.md` to non-.md URLs; now only appends for URLs that should have it (#277)
- Unified scraper temp config uses unified format for `doc_scraper` instead of raw args (#317)
- Unicode arrows in CLI help text replaced with ASCII for Windows cp1252 compatibility
- CLI flags in plugin slash commands corrected (`create` uses `--preset`, `package` uses `--target`)
- MiniMax adaptor improvements from PR #318 review (#319)
- Misleading "Scraped N pages" count reported visited URLs instead of saved pages; now shows `(N saved, M skipped)` (#320)
- "No scraped data found" after successful scrape on JavaScript SPA sites; now warns that site requires JS rendering (#320, #321)
Changed
- Refactored MiniMax adaptor to inherit from shared OpenAI-compatible base class
- Platform count: 5 → 12 LLM targets
- Agent count: 11 → 18 install paths
- Consolidated `Docs/` into `docs/` (single documentation directory)
- Removed stale root-level test scripts and junk files
- Removed stale `UNIFIED_PARSERS.md` superseded by UML architecture
- Added architecture references to README.md and CONTRIBUTING.md
- Fixed pre-existing ruff format issues in 5 files
[3.3.0] - 2026-03-16
Theme: 10 new source types (17 total), EPUB unified integration, sync-config command, performance optimizations, 12 README translations, and 19 bug fixes. 117 files changed, +41,588 lines since v3.2.0.
Supported Source Types (17)
| # | Type | CLI Command | Config Type | Auto-Detection |
|---|---|---|---|---|
| 1 | Documentation (web) | `scrape` / `create <url>` | `documentation` | HTTP/HTTPS URLs |
| 2 | GitHub repository | `github` / `create owner/repo` | `github` | owner/repo or github.com URLs |
| 3 | PDF document | `pdf` / `create file.pdf` | `pdf` | .pdf extension |
| 4 | Word document | `word` / `create file.docx` | `word` | .docx extension |
| 5 | EPUB e-book | `epub` / `create file.epub` | `epub` | .epub extension |
| 6 | Video | `video` / `create <url/file>` | `video` | YouTube/Vimeo URLs, video extensions |
| 7 | Local codebase | `analyze` / `create ./path` | `local` | Directory paths |
| 8 | Jupyter Notebook | `jupyter` / `create file.ipynb` | `jupyter` | .ipynb extension |
| 9 | Local HTML | `html` / `create file.html` | `html` | .html/.htm extensions |
| 10 | OpenAPI/Swagger | `openapi` / `create spec.yaml` | `openapi` | .yaml/.yml with OpenAPI content |
| 11 | AsciiDoc | `asciidoc` / `create file.adoc` | `asciidoc` | .adoc/.asciidoc extensions |
| 12 | PowerPoint | `pptx` / `create file.pptx` | `pptx` | .pptx extension |
| 13 | RSS/Atom feed | `rss` / `create feed.rss` | `rss` | .rss/.atom extensions |
| 14 | Man pages | `manpage` / `create cmd.1` | `manpage` | .1–.8/.man extensions |
| 15 | Confluence wiki | `confluence` | `confluence` | API or export directory |
| 16 | Notion pages | `notion` | `notion` | API or export directory |
| 17 | Slack/Discord chat | `chat` | `chat` | Export directory or API |
Added
10 New Skill Source Types (17 total)
Skill Seekers now supports 17 source types — up from 7. Every new type is fully integrated into the CLI (skill-seekers <type>), create command auto-detection, unified multi-source configs, config validation, the MCP server, and the skill builder.
- Jupyter Notebook - `skill-seekers jupyter --notebook file.ipynb` or `skill-seekers create file.ipynb`
  - Extracts markdown cells, code cells with outputs, kernel metadata, and imports; detects the notebook language
  - Handles single files and directories of notebooks; filters `.ipynb_checkpoints`
  - Optional dependency: `pip install "skill-seekers[jupyter]"` (nbformat)
  - Entry point: `skill-seekers-jupyter`
- Local HTML - `skill-seekers html --html-path file.html` or `skill-seekers create file.html`
  - Parses HTML using BeautifulSoup with smart main content detection (`<article>`, `<main>`, `.content`, largest div)
  - Extracts headings, code blocks, tables (to markdown), images, links; converts inline HTML to markdown
  - Handles single files and directories; supports `.html`, `.htm`, `.xhtml` extensions
  - No extra dependencies (BeautifulSoup is a core dep)
- OpenAPI/Swagger - `skill-seekers openapi --spec spec.yaml` or `skill-seekers create spec.yaml`
  - Parses OpenAPI 3.0/3.1 and Swagger 2.0 specs from YAML or JSON (local files or URLs via `--spec-url`)
  - Extracts endpoints, parameters, request/response schemas, security schemes, tags
  - Resolves `$ref` references with circular reference protection; handles `allOf`/`oneOf`/`anyOf`
  - Groups endpoints by tags; generates comprehensive API reference markdown
  - Source detection sniffs YAML file content for `openapi:` or `swagger:` keys (avoids false positives on non-API YAML files)
  - Optional dependency: `pip install "skill-seekers[openapi]"` (pyyaml; already a core dep, guard added for safety)
- AsciiDoc - `skill-seekers asciidoc --asciidoc-path file.adoc` or `skill-seekers create file.adoc`
  - Regex-based parser (no external library required) with optional `asciidoc` library support
  - Extracts headings (= through =====), `[source,lang]` code blocks, `|===` tables, admonitions (NOTE/TIP/WARNING/IMPORTANT/CAUTION), and `include::` directives
  - Converts AsciiDoc formatting to markdown; handles single files and directories
  - Optional dependency: `pip install "skill-seekers[asciidoc]"` (asciidoc library for advanced rendering)
- PowerPoint (.pptx) - `skill-seekers pptx --pptx file.pptx` or `skill-seekers create file.pptx`
  - Extracts slide text, speaker notes, tables, images (with alt text), and grouped shapes
  - Detects code blocks by monospace font analysis (30+ font families)
  - Groups slides into sections by layout type; handles single files and directories
  - Optional dependency: `pip install "skill-seekers[pptx]"` (python-pptx)
- RSS/Atom Feeds - `skill-seekers rss --feed-url <url>` / `--feed-path file.rss` or `skill-seekers create feed.rss`
  - Parses RSS 2.0, RSS 1.0, and Atom feeds via feedparser
  - Optionally follows article links (`--follow-links`, default on) to scrape full page content using BeautifulSoup
  - Extracts article titles, summaries, authors, dates, categories; configurable `--max-articles` (default 50)
  - Source detection matches `.rss` and `.atom` extensions (`.xml` excluded to avoid false positives)
  - Optional dependency: `pip install "skill-seekers[rss]"` (feedparser)
- Man Pages - `skill-seekers manpage --man-names git,curl` / `--man-path dir/` or `skill-seekers create git.1`
  - Extracts man pages by running `man` command via subprocess or reading `.1`–`.8`/`.man` files directly
  - Handles gzip/bzip2/xz compressed man files; strips troff/groff formatting (backspace overstriking, macros, font escapes)
  - Parses structured sections (NAME, SYNOPSIS, DESCRIPTION, OPTIONS, EXAMPLES, SEE ALSO)
  - Source detection uses basename heuristic to avoid false positives on log rotation files (e.g., `access.log.1`)
  - No external dependencies (stdlib only)
- Confluence - `skill-seekers confluence --base-url <url> --space-key <key>` or `--export-path dir/`
  - API mode: fetches pages from Confluence REST API with pagination (`atlassian-python-api`)
  - Export mode: parses Confluence HTML/XML export directories
  - Extracts page content, code/panel/info/warning macros, page hierarchy, tables
  - Optional dependency: `pip install "skill-seekers[confluence]"` (atlassian-python-api)
- Notion - `skill-seekers notion --database-id <id>` / `--page-id <id>` or `--export-path dir/`
  - API mode: fetches pages via Notion API with support for 20+ block types (paragraph, heading, code, callout, toggle, table, etc.)
  - Export mode: parses Notion Markdown/CSV export directories
  - Extracts rich text with annotations (bold, italic, code, links), 16+ property types for database entries
  - Optional dependency: `pip install "skill-seekers[notion]"` (notion-client)
- Slack/Discord Chat - `skill-seekers chat --export-path dir/` or `--token <token> --channel <channel>`
  - Slack: parses workspace JSON exports or fetches via Slack Web API (`slack_sdk`)
  - Discord: parses DiscordChatExporter JSON or fetches via Discord HTTP API
  - Extracts messages, code snippets (fenced blocks), shared URLs, threads, reactions, attachments
  - Generates per-channel summaries and topic categorization
  - Optional dependency: `pip install "skill-seekers[chat]"` (slack-sdk)
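The OpenAPI entry above mentions `$ref` resolution with circular reference protection. One common way to get that behavior is to track the references already being expanded on the current path. The sketch below illustrates the idea only: `resolve_refs` and the `$circular_ref` marker key are hypothetical names, and it handles local (`#/...`) references only.

```python
from typing import Any


def resolve_refs(node: Any, root: dict, _stack: tuple[str, ...] = ()) -> Any:
    """Inline local "#/..." references, stopping when one repeats on the current path."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in _stack:
                # Circular reference: leave a marker instead of recursing forever
                return {"$circular_ref": ref}
            target: Any = root
            for part in ref[2:].split("/"):
                # JSON Pointer escapes: "~1" means "/", "~0" means "~"
                target = target[part.replace("~1", "/").replace("~0", "~")]
            return resolve_refs(target, root, _stack + (ref,))
        return {key: resolve_refs(value, root, _stack) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root, _stack) for item in node]
    return node


# Hypothetical spec fragment where User refers back to itself
spec = {
    "components": {
        "schemas": {
            "User": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "friend": {"$ref": "#/components/schemas/User"},
                },
            }
        }
    }
}
resolved = resolve_refs({"$ref": "#/components/schemas/User"}, spec)
print(resolved["properties"]["friend"])
```

Because the stack is per path rather than global, the same schema can still be inlined in several unrelated places; only a true cycle is cut.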
EPUB Unified Pipeline Integration
- EPUB (.epub) input support via `skill-seekers create book.epub` or `skill-seekers epub --epub book.epub`
  - Extracts chapters, metadata (Dublin Core), code blocks, images, and tables from EPUB 2 and EPUB 3 files
  - DRM detection with clear error messages (Adobe ADEPT, Apple FairPlay, Readium LCP)
  - Font obfuscation correctly identified as non-DRM
  - EPUB 3 TOC bug workaround (`ignore_ncx` option)
  - `--help-epub` flag for EPUB-specific help
  - Optional dependency: `pip install "skill-seekers[epub]"` (ebooklib)
  - 107 tests across 14 test classes
- EPUB added to unified scraper - `_scrape_epub()` method, `scraped_data["epub"]`, config validation (`_validate_epub_source`), and dry-run display. Previously EPUB worked standalone but was missing from multi-source configs.
Unified Skill Builder — Generic Merge System
- `_generic_merge()` - Priority-based section merge for any combination of source types not covered by existing pairwise synthesis (docs+github, docs+pdf, etc.). Produces YAML frontmatter + source-attributed sections.
- `_append_extra_sources()` - Appends additional source type content (e.g., Jupyter + PPTX) to pairwise-synthesized SKILL.md.
- `_generate_generic_references()` - Generates `references/<type>/index.md` for any source type, with ID resolution fallback chain.
- `_SOURCE_LABELS` dict - Human-readable labels for all 17 source types used in merge attribution.
Config Validator Expansion
- 17 source types in `VALID_SOURCE_TYPES` - All new types plus `word` and `video` now have per-type validation methods.
- `_validate_word_source()` - Validates `path` field for Word documents (was previously missing).
- `_validate_video_source()` - Validates `url`, `path`, or `playlist` field for video sources (was previously missing).
- 11 new `_validate_*_source()` methods - One for each new type with appropriate required-field checks.
Source Detection Improvements
- 7 new file extension detections in `SourceDetector.detect()` - `.ipynb`, `.html`/`.htm`, `.pptx`, `.adoc`/`.asciidoc`, `.rss`/`.atom`, `.1`–`.8`/`.man`, `.yaml`/`.yml` (with content sniffing)
- `_looks_like_openapi()` - Content sniffing for YAML files: only classifies as OpenAPI if the file contains an `openapi:` or `swagger:` key in the first 20 lines (prevents false positives on docker-compose, Ansible, Kubernetes manifests, etc.)
- Man page basename heuristic - `.1`–`.8` extensions only detected as man pages if the basename has no dots (e.g., `git.1` matches but `access.log.1` does not)
- `.xml` excluded from RSS detection - Too generic; only `.rss` and `.atom` trigger RSS detection
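The two heuristics above (content sniffing for OpenAPI, dot-free basenames for man pages) could look roughly like this. It is a sketch under stated assumptions rather than the project's implementation: the function names and regexes are hypothetical, and the OpenAPI check assumes the key must sit at the top level (column 0), which the changelog does not specify.

```python
import re
from pathlib import Path

# Assumption: the key appears unindented, optionally quoted
_OPENAPI_KEY = re.compile(r"""^["']?(openapi|swagger)["']?\s*:""")
_MAN_SECTION = re.compile(r"^\.[1-8]$")


def looks_like_openapi(path: str, max_lines: int = 20) -> bool:
    """Return True if one of the first max_lines lines starts an openapi/swagger key."""
    try:
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for line_number, line in enumerate(handle):
                if line_number >= max_lines:
                    break
                if _OPENAPI_KEY.match(line):
                    return True
    except OSError:
        return False
    return False


def looks_like_man_page(path: str) -> bool:
    """Accept .man files, and .1 to .8 files whose stem contains no dots."""
    name = Path(path)
    if name.suffix == ".man":
        return True
    return bool(_MAN_SECTION.match(name.suffix)) and "." not in name.stem


print(looks_like_man_page("git.1"), looks_like_man_page("access.log.1"))
```

Reading only a bounded prefix keeps detection cheap on large YAML files, at the cost of missing specs that open with a long comment header.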
MCP Server Integration
- `scrape_generic` tool - New MCP tool handles all 10 new source types via subprocess with per-type flag mapping
- `_PATH_FLAGS` / `_URL_FLAGS` dicts - Correct flag routing for each source type (e.g., jupyter→`--notebook`, html→`--html-path`, rss→`--feed-url`)
- `GENERIC_SOURCE_TYPES` tuple - Lists all 10 new types for validation
- Config validation display - `validate_config` tool now shows source details for all new types
- Tool count updated - 33 → 34 tools (scraping tools 10 → 11)
CLI Wiring
- 10 new CLI subcommands - `jupyter`, `html`, `openapi`, `asciidoc`, `pptx`, `rss`, `manpage`, `confluence`, `notion`, `chat` in `COMMAND_MODULES`
- 10 new argument modules - `arguments/{jupyter,html,openapi,asciidoc,pptx,rss,manpage,confluence,notion,chat}.py` with per-type `*_ARGUMENTS` dicts
- 10 new parser modules - `parsers/{jupyter,html,openapi,asciidoc,pptx,rss,manpage,confluence,notion,chat}_parser.py` with `SubcommandParser` implementations
- `create` command routing - `_route_generic()` method for all new types with correct module names and CLI flags
- 10 new entry points in pyproject.toml - `skill-seekers-{jupyter,html,openapi,asciidoc,pptx,rss,manpage,confluence,notion,chat}`
- 7 new optional dependency groups in pyproject.toml - `[jupyter]`, `[asciidoc]`, `[pptx]`, `[confluence]`, `[notion]`, `[rss]`, `[chat]`
- `[all]` group updated - Includes all 7 new optional dependencies
Sync Config Command
- `skill-seekers sync-config` - New subcommand that crawls a docs site's navigation, diffs discovered URLs against a config's `start_urls`, and optionally writes the updated list back with `--apply` (#306)
- BFS link discovery with configurable depth (default 2), max-pages, rate-limit
- Respects `url_patterns.include`/`exclude` from config
- Supports optional `nav_seed_urls` config field
- Handles both unified (sources array) and legacy flat config formats
- MCP `sync_config` tool included
- 57 tests (39 unit + 18 E2E with local HTTP server)
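The crawl-then-diff flow described above can be sketched on an in-memory link graph. Everything here is illustrative: `discover_urls`, `diff_urls`, and the `get_links` callback are hypothetical names, and the real command also applies rate limiting and include/exclude patterns, which this sketch leaves out.

```python
from collections import deque
from typing import Callable, Iterable


def discover_urls(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first discovery, bounded by link depth and page count."""
    unique_seeds = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(unique_seeds)
    queue = deque((url, 0) for url in unique_seeds)
    ordered: list[str] = []
    while queue and len(ordered) < max_pages:
        url, depth = queue.popleft()
        ordered.append(url)
        if depth >= max_depth:
            continue  # do not follow links past the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return ordered


def diff_urls(configured: Iterable[str], discovered: Iterable[str]) -> dict[str, list[str]]:
    """Report URLs to add to, or remove from, the configured list."""
    configured_set, discovered_set = set(configured), set(discovered)
    return {
        "added": sorted(discovered_set - configured_set),
        "removed": sorted(configured_set - discovered_set),
    }


# Hypothetical site structure standing in for real HTTP fetches
graph = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/b": []}
found = discover_urls(["/"], lambda url: graph.get(url, []), max_depth=2)
print(diff_urls(["/", "/old"], found))
```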
Workflow & Documentation
- `complex-merge.yaml` - New 7-stage AI-powered workflow for complex multi-source merging (source inventory → cross-reference → conflict detection → priority merge → gap analysis → synthesis → quality check)
- AGENTS.md rewritten - Updated with all 17 source types, scraper pattern docs, project layout, and key pattern documentation
- 77 new integration tests in `test_new_source_types.py` - Source detection, config validation, generic merge, CLI wiring, validation, and create command routing
- `docs/BEST_PRACTICES.md` - Comprehensive guide for creating high-quality skills: SKILL.md structure, code examples, prerequisites, troubleshooting, quality targets, and real-world Grade F to Grade A example (#206)
- Documentation updated for 17 source types - 32 files updated across README, CLI reference, feature matrix, MCP reference, config format, API reference, unified scraping, multi-source guide, installation, quick-start, core concepts, user guide, FAQ, troubleshooting, architecture, and all Chinese (zh-CN) translations
- README translations for 10 languages (12 total) — Added Japanese (日本語), Korean (한국어), Spanish (Español), French (Français), German (Deutsch), Portuguese (Português), Turkish (Türkçe), Arabic (العربية), Hindi (हिन्दी), and Russian (Русский) README translations with language selector bar across all versions
Performance
- Pre-compiled regex and O(1) URL dedup in doc_scraper - Module-level compiled patterns, `_enqueued_urls` set for O(1) dedup, cached URL patterns, async error logging fix (#309)
- Bisect-based line indexing in code_analyzer and dependency_analyzer - O(log n) `offset_to_line()` via bisect replaces O(n) `count("\n")` across all 10 language analyzers and all import extractors
- O(n) parent class map for Python method detection - Replaces O(n²) repeated AST walks in code_analyzer
- O(1) tree traversal in github_scraper - `deque.popleft()` replaces list `pop(0)`
- Shared `build_line_index()` / `offset_to_line()` utilities in `cli/utils.py` - DRY extraction from code_analyzer and dependency_analyzer
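The bisect-based line lookup can be sketched as below. The function names match the utilities mentioned above, but the signatures and bodies are assumptions; the sketch only shows why a sorted list of newline offsets turns each lookup into a binary search.

```python
import bisect


def build_line_index(text: str) -> list[int]:
    """Record the offset of every newline once, in O(n)."""
    return [position for position, char in enumerate(text) if char == "\n"]


def offset_to_line(line_index: list[int], offset: int) -> int:
    """Map a character offset to a 1-based line number in O(log n).

    The number of newlines strictly before the offset is the number of
    completed lines, which is what text.count("\n", 0, offset) returns.
    """
    return bisect.bisect_left(line_index, offset) + 1


source = "import a\nimport b\nclass C:\n    pass\n"
index = build_line_index(source)
print(offset_to_line(index, source.index("class")))
```

The index is built once per file, so a regex pass that reports many matches pays the linear scan a single time instead of once per match.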
Fixed
- Config validator missing `word` and `video` dispatch - `_validate_source()` had no `elif` branches for `word` or `video` types, silently skipping validation. Added dispatch entries and `_validate_word_source()` / `_validate_video_source()` methods.
- `openapi_scraper.py` unconditional `import yaml` - Would crash at import time if pyyaml not installed. Added `try/except ImportError` guard with `YAML_AVAILABLE` flag and `_check_yaml_deps()` helper.
- `asciidoc_scraper.py` missing standard arguments - `main()` manually defined args instead of using `add_asciidoc_arguments()`. Refactored to use shared argument definitions + added enhancement workflow integration.
- `pptx_scraper.py` missing standard arguments - Same issue. Refactored to use `add_pptx_arguments()`.
- `chat_scraper.py` missing standard arguments - Same issue. Refactored to use `add_chat_arguments()`.
- `notion_scraper.py` missing `run_workflows` call - `--enhance-workflow` flags were silently ignored. Added workflow runner integration.
- `openapi_scraper.py` return type `None` - `main()` returned `None` instead of `int`. Fixed to `return 0` on success, matching all other scrapers.
- MCP `scrape_generic_tool` flag mismatch - Was passing `--path`/`--url` as generic flags, but every scraper expects its own flag name (e.g., `--notebook`, `--html-path`, `--spec`). All 10 source types would have failed at runtime. Fixed with per-type `_PATH_FLAGS` and `_URL_FLAGS` mappings.
- Word scraper `docx_id` key mismatch - Unified scraper data dict used `docx_id` but generic reference generation looked for `word_id`. Added `word_id` alias.
- `main.py` docstring stale - Missing all 10 new commands. Updated to list all 27 commands.
- `source_detector.py` module docstring stale - Described only 5 source types. Updated to describe 14+ detected types.
- `manpage_parser.py` docstring referenced wrong file - Said `manpage_scraper.py` but actual file is `man_scraper.py`. Fixed.
- Parser registry test count - Updated expected count from 25 to 35 for 10 new parsers.
- 'Invalid IPv6 URL' error on bracket-containing URLs (#284) - URLs with square brackets (e.g., `/api/[v1]/users`) discovered via BFS crawl or HTML extraction bypassed the original fix in `_clean_url()`. Added shared `sanitize_url()` utility applied at every URL ingestion point. 16 new tests.
- GitHub scraper 'list index out of range' on issue extraction (#269) - PyGithub's `PaginatedList` slicing could fail on some versions or empty repos. Replaced with `itertools.islice()`.
- Release workflow version mismatch - GitHub release showed wrong version (v3.1.3 instead of v3.2.0) because no explicit release name was set and sed regex had unescaped dots. Added explicit `name`/`tag_name`, version consistency check (tag vs pyproject.toml vs package), and empty release notes fallback.
- Release workflow Python 3.10 compatibility - Version consistency check used `tomllib` (Python 3.11+). Replaced with grep/sed for 3.10 compatibility.
- `infer_categories()` "tutorial" vs "tutorials" key mismatch - Guard checked `'tutorial'` but wrote to `'tutorials'` key, risking silent overwrites in category inference.
- Flaky `test_benchmark_metadata_overhead` - Stabilized with 20 iterations, warm-up run, median averaging, and 200% threshold (was failing on CI with 5 iterations and mean).
- CI branch protection check permanently pending - Summary job was named 'All Checks Complete' but branch protection required 'Tests'. PRs were stuck as 'Expected — Waiting for status to be reported'. Renamed job to match.
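For the bracket-containing URL fix (#284), one plausible approach is to percent-encode square brackets everywhere except the authority, where they legitimately delimit IPv6 hosts. The changelog does not say how the real `sanitize_url()` works, so the body below is an assumption that only reuses the function name.

```python
import re
from urllib.parse import urlsplit

# Scheme plus authority: everything up to the first "/", "?" or "#"
_SCHEME_AND_AUTHORITY = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*://[^/?#]*")


def sanitize_url(url: str) -> str:
    """Percent-encode "[" and "]" outside the authority (assumed behavior)."""
    match = _SCHEME_AND_AUTHORITY.match(url)
    # Relative URLs have no authority, so the whole string is encodable
    split_at = match.end() if match else 0
    head, tail = url[:split_at], url[split_at:]
    return head + tail.replace("[", "%5B").replace("]", "%5D")


cleaned = sanitize_url("https://example.com/api/[v1]/users")
print(cleaned, urlsplit(cleaned).path)
```

Applying the same helper at every ingestion point, as the entry describes, matters more than the exact encoding rule: a single unsanitized path is enough to reintroduce the error.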
[3.2.0] - 2026-03-01
Theme: Video source support, Word document support, Pinecone adaptor, and quality improvements. 94 files changed, +23,500 lines since v3.1.3. 2,540 tests passing.
🎬 Video Tutorial Scraping Pipeline (BETA)
Complete video tutorial extraction system that converts YouTube videos and local video files into AI-consumable skills. The pipeline extracts transcripts, performs visual OCR on code editor panels, tracks code evolution across frames, and generates structured SKILL.md output.
Added
Video Pipeline Core (skill-seekers video)
- `skill-seekers video --url <youtube-url>` - New CLI command for video tutorial scraping. Also supports `--video-file` for local files and `--playlist` for YouTube playlists
- `skill-seekers create <youtube-url>` - Auto-detects YouTube URLs and routes to video scraper
- `video_scraper.py` (~960 lines) - Main orchestrator: metadata → transcript → segmentation → visual extraction → SKILL.md generation
- `video_models.py` (~815 lines) - 20+ dataclasses: `VideoMetadata`, `TranscriptSegment`, `VideoChapter`, `KeyframeData`, `FrameSubSection`, `TextBlock`, `CodeTimeline`, `SetupModules`, etc.
- `video_metadata.py` (~270 lines) - YouTube metadata extraction (title, channel, views, chapters, duration) via yt-dlp; local file metadata via ffprobe
- `video_transcript.py` (~370 lines) - Multi-source transcript extraction with 3-tier fallback: YouTube Transcript API → yt-dlp subtitles → faster-whisper local transcription
- `video_segmenter.py` (~220 lines) - Chapter-based and time-window segmentation with configurable overlap
- `video_visual.py` (~2,410 lines) - Visual extraction pipeline:
  - Keyframe detection via scene change (scenedetect) with configurable threshold
  - Frame classification (code editor, slides, terminal, browser, other)
  - Panel detection - splits IDE screenshots into independent sub-sections (code, terminal, file tree)
  - Per-panel OCR - Each detected panel OCR'd independently with its own bounding box
  - Multi-engine OCR ensemble - EasyOCR + pytesseract for code frames (per-line confidence merge with code-token preference), EasyOCR only for non-code frames
  - Parallel OCR - `ThreadPoolExecutor` for multi-panel frames
  - Narrow panel filtering (300px min width) to skip UI chrome
  - Text block tracking with spatial panel position matching across frames
  - Code timeline with edit tracking (additions, modifications, deletions)
  - Vision API fallback when OCR confidence < 0.5
  - Tesseract circuit breaker (`_tesseract_broken` flag) - disables pytesseract after first failure
- Audio-visual alignment — Code blocks paired with narrator transcript for context
- Video-specific AI enhancement — Custom prompt for OCR denoising, code reconstruction, and tutorial narrative synthesis
- Two-pass AI enhancement — Pass 1 cleans reference files (Code Timeline reconstruction from transcript context), Pass 2 generates SKILL.md from cleaned references
- `_ai_clean_reference()` - Sends reference file to Claude to reconstruct code blocks using transcript context, fixing OCR noise before SKILL.md generation
- `video-tutorial.yaml` workflow preset - 4-stage enhancement pipeline (OCR cleanup → language detection → tutorial synthesis → skill polish)
- Video arguments - `arguments/video.py` with `VIDEO_ARGUMENTS` dict: `--url`, `--video-file`, `--playlist`, `--vision-ocr`, `--keyframe-threshold`, `--max-keyframes`, `--whisper-model`, `--setup`, etc.
- Video parser - `parsers/video_parser.py` for unified CLI parser registry
- MCP `scrape_video` tool - Full video scraping from MCP server with 6 visual params, setup mode, and playlist support
- `tests/test_video_scraper.py` (197 tests) - Comprehensive coverage: models, metadata, transcript, segmenter, visual extraction, OCR, panel detection, scraper integration, CLI arguments, OCR cleaning, code filtering
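The 3-tier transcript fallback listed above (YouTube Transcript API → yt-dlp subtitles → faster-whisper) follows a general "try each source in order" pattern. The sketch below shows that pattern with stub fetchers; `fetch_transcript` and the stubs are hypothetical, and no real transcript library is called.

```python
from typing import Callable, Optional, Sequence

Fetcher = Callable[[str], Optional[str]]


def fetch_transcript(video_id: str, tiers: Sequence[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Return (tier_name, transcript) from the first tier that yields text."""
    errors: list[str] = []
    for name, fetch in tiers:
        try:
            text = fetch(video_id)
        except Exception as exc:  # a failing tier must not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("no transcript source succeeded: " + "; ".join(errors))


# Stub tiers standing in for the three real sources
def api_tier(video_id: str) -> Optional[str]:
    raise ConnectionError("transcripts disabled")


def subtitle_tier(video_id: str) -> Optional[str]:
    return None  # no subtitle track available


def whisper_tier(video_id: str) -> Optional[str]:
    return f"local transcription of {video_id}"


tiers = [("api", api_tier), ("subtitles", subtitle_tier), ("whisper", whisper_tier)]
print(fetch_transcript("abc123", tiers))
```

Ordering the tiers from cheapest to most expensive means local transcription only runs when both remote sources fail or return nothing.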
Video --setup: GPU Auto-Detection & Dependency Installation
- `skill-seekers video --setup` - One-command GPU auto-detection and dependency installation
- `video_setup.py` (~835 lines) - Complete setup orchestration module
  - GPU auto-detection - Detects NVIDIA (nvidia-smi → CUDA version), AMD (rocminfo → ROCm version), or CPU-only without requiring PyTorch
  - Correct PyTorch variant - Installs from the right index URL: `cu124`/`cu121`/`cu118` for NVIDIA, `rocm6.3`/`rocm6.2.4` for AMD, `cpu` for CPU-only
  - ROCm configuration - Sets `MIOPEN_FIND_MODE=FAST` and `HSA_OVERRIDE_GFX_VERSION` for AMD GPUs
  - Virtual environment detection - Warns users outside a venv with opt-in `--force` override
  - System dependency checks - Validates `tesseract` and `ffmpeg` binaries, provides OS-specific install instructions
  - Module selection - `SetupModules` dataclass for optional component selection (easyocr, opencv, tesseract, scenedetect, whisper)
  - Base video deps always included - `yt-dlp` and `youtube-transcript-api` installed automatically
  - Verification step - Post-install import checks including `torch.cuda.is_available()` and `torch.version.hip`
  - Non-interactive mode - `run_setup(interactive=False)` for MCP server and CI/CD use
- `--setup` early-exit - Runs before source validation (no `--url` required)
- MCP `scrape_video` setup parameter - `setup: bool = False` in `server_fastmcp.py` and `scraping_tools.py`
- `create` command routing - Forwards `--setup` to video scraper
- `tests/test_video_setup.py` (60 tests) - GPU detection, CUDA/ROCm version mapping, installation, verification, venv checks, system deps, module selection
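Mapping a detected GPU toolkit version to a PyTorch wheel variant might look like the sketch below. The variant tags are the ones named above; the selection rule (pick the newest listed variant that does not exceed the detected version, else fall back to `cpu`) and the helper names are assumptions, not the documented behavior of `video_setup.py`.

```python
from typing import Optional

# Newest first; each entry is (minimum toolkit version, wheel variant tag)
CUDA_VARIANTS = [((12, 4), "cu124"), ((12, 1), "cu121"), ((11, 8), "cu118")]
ROCM_VARIANTS = [((6, 3), "rocm6.3"), ((6, 2), "rocm6.2.4")]


def pick_variant(vendor: Optional[str], version: Optional[tuple[int, int]]) -> str:
    """Choose a wheel variant for the detected vendor and (major, minor) version."""
    table = {"nvidia": CUDA_VARIANTS, "amd": ROCM_VARIANTS}.get(vendor or "")
    if table and version is not None:
        for minimum, tag in table:
            if version >= minimum:
                return tag
    return "cpu"  # unknown vendor, missing version, or toolkit too old


def index_url(tag: str) -> str:
    """PyTorch wheel index URL for a variant tag."""
    return f"https://download.pytorch.org/whl/{tag}"


print(index_url(pick_variant("nvidia", (12, 2))))
```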
Microsoft Word (.docx) Support
- `skill-seekers word --docx <file>` and `skill-seekers create document.docx` - Full pipeline: mammoth → HTML → BeautifulSoup → sections → SKILL.md + references/
  - `word_scraper.py` - `WordToSkillConverter` class (~600 lines) with heading/code/table/image/metadata extraction
  - `arguments/word.py` - `add_word_arguments()` + `WORD_ARGUMENTS` dict
  - `parsers/word_parser.py` - WordParser for unified CLI parser registry
  - `tests/test_word_scraper.py` - Comprehensive test suite (~300 lines)
- `.docx` auto-detection in `source_detector.py` - Routes to word scraper
- `--help-word` flag in create command for Word-specific help
- Word support in unified scraper - `_scrape_word()` method for multi-source scraping
- `skill-seekers-word` entry point in pyproject.toml
- `docx` optional dependency group - `pip install skill-seekers[docx]` (mammoth + python-docx)
Other Additions
- Pinecone adaptor - `pinecone_adaptor.py` with full upload support
- `video` and `video-full` optional dependency groups in pyproject.toml
- `skill-seekers-video` entry point in pyproject.toml
- Video plan documents - 8 design documents in `docs/plans/video/` (research, data models, pipeline, integration, output, testing, dependencies, overview)
Fixed
Video Pipeline OCR Quality Fixes (6)
- Webcam/OTHER frames skip OCR - WEBCAM and OTHER frame types no longer get OCR'd, eliminating ~64 junk OCR results per video
- `_clean_ocr_line()` helper - Strips leading line numbers, IDE tab bar text, Unity Inspector labels, and VS Code collapse markers from OCR output
- `_fix_intra_line_duplication()` - Detects and removes token sequence repetition from multi-engine OCR overlap (e.g., `gpublic class Card Jpublic class Card` → `public class Card`)
- `_is_likely_code()` filter - Reference file code fences now filtered to reject UI junk (Inspector, Hierarchy, Canvas labels) that passed frame classification
- Language detection on text groups - `get_text_groups()` now runs `LanguageDetector.detect_from_code()` on each group, filling the previously-always-`None` `detected_language` field
- OCR cleaning in text assembly - `_assemble_structured_text()` applies `_clean_ocr_line()` to every line before joining
Video Pipeline Fixes (15)
- `extract_visual_data` returning 2-tuple instead of 3 - Caused `ValueError` crash when unpacking results
- pytesseract in core deps - Moved from core dependencies to `[video-full]` optional group
- 30-min timeout for video enhancement subprocess - Previously could hang indefinitely
- `scrape_video_impl` missing from MCP server fallback import - Added to import block
- Auto-generated YouTube captions not detected - Now checks `is_generated` property on transcripts
- `--vision-ocr` and `--video-playlist` not forwarded - `create` command now passes these to video scraper
- Filename collision for non-ASCII video titles - Falls back to `video_id` when title contains non-ASCII characters
- `_vision_used` not a proper dataclass field - Made a proper field on `FrameSubSection` dataclass
- 6 visual params missing from MCP `scrape_video` - Exposed keyframe_threshold, max_keyframes, whisper_model, vision_ocr, video_playlist, video_file
- Missing video dep install instructions in unified scraper - Added guidance when video dependencies are not installed
- MCP docstring tool counts outdated - Updated from 25→33 tools across 7 categories
- Video and word commands missing from `main.py` docstring - Added to CLI help text
- `video-full` exclusion from `[all]` deps undocumented - Added comment in pyproject.toml
- Parser registry test count wrong - Updated expected count from 22→23 for video parser
Scraper & Quality Fixes
- Issue #300: Selector fallback & dry-run link discovery —
create https://reactflow.dev/now finds 20+ pages (was 1):extract_content()extracted links after early-return → moved before- Dry-run used
main.find_all("a")instead ofsoup.find_all("a")→ fixed - Async dry-run had no link extraction at all → added
get_configuration()CSS comma selector conflicted with fallback loop → removed defaultcreate --configwithbase_urlconfig incorrectly routed to unified_scraper → now peeks at JSON- Selector fallback duplicated in 3 places with
bodyfallback → extractedFALLBACK_MAIN_SELECTORSconstant +_find_main_content()helper
- Issue #301: `setup.sh` fails on macOS — `pip3` pointed to different Python than `python3`. Changed to `python3 -m pip`.
- RAG chunking crash (`AttributeError: output_dir`) — `converter.output_dir` doesn't exist on `DocToSkillConverter`. Changed to `Path(converter.skill_dir)`.
- `--var` flag silently dropped in `create` routing — `main.py` read `args.workflow_var` instead of `args.var`
- `--chunk-overlap-tokens` missing from `package` command — Wired through entire pipeline: `package_skill()` → `adaptor.package()` → `format_skill_md()` → `_maybe_chunk_content()` → `RAGChunker`
- Chunk overlap auto-scaling — Auto-scales to `max(50, chunk_tokens // 10)` when chunk size is non-default
- Weaviate `ImportError` masked by generic handler — Added `except ImportError` before `except Exception`
- Hardcoded chunk defaults in 12 adaptors — Replaced `512`/`50` with `DEFAULT_CHUNK_TOKENS`/`DEFAULT_CHUNK_OVERLAP_TOKENS` constants
- Reference file code truncation — `codebase_scraper.py` no longer truncates code blocks to 500 chars (5 locations)
- Enhancement code block limit — `summarize_reference()` now uses character-budget approach instead of `[:5]` cap
- Intro boundary code block desync — Tracks code block state to prevent splitting inside code blocks
- Hardcoded `python` language — `unified_skill_builder.py` and `how_to_guide_builder.py` now use detected language
- GitHub reference file limits removed — No more caps on issues (was 20), releases (was 10), or release bodies (was 500 chars)
- GitHub scraper reference limits removed — `github_scraper.py` no longer caps open_issues at 20 or closed_issues at 10
- PDF scraper fixes — Real API/LOCAL enhancement (was stub); removed `[:3]` reference file limit
- Word scraper code detection — Detect mammoth monospace `<p><br>` blocks as code
- Language detector method — Fixed `detect_from_text` → `detect_from_code` in word scraper
- `.docx` file extension validation — Non-`.docx` files raise `ValueError` with clear message
- Double `_score_code_quality()` call — Consolidated to single call in word scraper
- `--no-preserve-code` renamed — Now `--no-preserve-code-blocks` (backward-compat alias kept)
- Dead variable — Removed unused `_target_lines` in `enhance_skill_local.py`
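
The chunk overlap auto-scaling rule listed above can be illustrated with a minimal sketch. The function name `resolve_chunk_overlap` is hypothetical, and the choice to respect an explicitly supplied overlap is an assumption; only the formula `max(50, chunk_tokens // 10)` and the 512/50 defaults come from this changelog.

```python
DEFAULT_CHUNK_TOKENS = 512
DEFAULT_CHUNK_OVERLAP_TOKENS = 50


def resolve_chunk_overlap(chunk_tokens: int,
                          overlap_tokens: int = DEFAULT_CHUNK_OVERLAP_TOKENS) -> int:
    """Return the overlap to use for a given chunk size (hypothetical helper).

    Assumption: auto-scaling applies only when the chunk size is non-default
    and the overlap was left at its default; an explicit overlap is kept.
    """
    chunk_is_custom = chunk_tokens != DEFAULT_CHUNK_TOKENS
    overlap_is_default = overlap_tokens == DEFAULT_CHUNK_OVERLAP_TOKENS
    if chunk_is_custom and overlap_is_default:
        return max(50, chunk_tokens // 10)
    return overlap_tokens
```

With these assumptions, a 2000-token chunk gets a 200-token overlap, while small chunks never drop below the 50-token floor.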
Changed
- `easyocr` removed from `video-full` optional deps — Was pulling ~2GB of NVIDIA CUDA packages regardless of GPU vendor. Now installed via `--setup` with correct PyTorch variant.
- Video dependency error messages — `video_scraper.py` and `video_visual.py` now suggest `skill-seekers video --setup` as primary fix
- Shared embedding methods consolidated — `_generate_openai_embeddings()` and `_generate_st_embeddings()` moved to `SkillAdaptor` base class, eliminating ~150 lines of duplication from chroma/weaviate/pinecone adaptors
- Chunk constants centralized — `DEFAULT_CHUNK_TOKENS = 512` and `DEFAULT_CHUNK_OVERLAP_TOKENS = 50` in `arguments/common.py`, used across all 12 adaptors + rag_chunker + base + package_skill + create_command
- Enhancement summarizer architecture — Character-budget approach with `target_ratio` for both code blocks and heading chunks
[3.1.3] - 2026-02-24
🐛 Hotfix — Explicit Chunk Flags & Argument Pipeline Cleanup
Fixed
- Issue #299: `skill-seekers package --target claude` unrecognised argument crash — `_reconstruct_argv()` in `main.py` emits default flag values back into argv when routing subcommands. `package_skill.py` had a 105-line inline argparser that used different flag names from those in `arguments/package.py`, so forwarded flags were rejected. Fixed by replacing the inline block with a call to `add_package_arguments(parser)` — the single source of truth.
Changed
- `package_skill.py` argparser refactored — Replaced ~105 lines of inline argparse duplication with a single `add_package_arguments(parser)` call. Flag names are now guaranteed consistent with `_reconstruct_argv()` output, preventing future argument-name drift.
- Explicit chunk flag names — All `--chunk-*` flags now include unit suffixes to eliminate ambiguity between RAG tokens and streaming characters:
  - `--chunk-size` (RAG tokens) → `--chunk-tokens`
  - `--chunk-overlap` (RAG tokens) → `--chunk-overlap-tokens`
  - `--chunk` (enable RAG chunking) → `--chunk-for-rag`
  - `--streaming-chunk-size` (chars) → `--streaming-chunk-chars`
  - `--streaming-overlap` (chars) → `--streaming-overlap-chars`
  - `--chunk-size` in PDF extractor (pages) → `--pdf-pages-per-chunk`
- `setup_logging()` centralized — Added `setup_logging(verbose, quiet)` to `utils.py` and removed 4 duplicate module-level `logging.basicConfig()` calls from `doc_scraper.py`, `github_scraper.py`, `codebase_scraper.py`, and `unified_scraper.py`
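
The "single source of truth" idea behind `add_package_arguments(parser)` can be sketched as follows. This is an illustration only, not the project's actual implementation: the function body, help strings, and the `build_package_parser` helper are assumptions. The flag names and the 512/50 defaults are taken from this changelog.

```python
import argparse


def add_package_arguments(parser: argparse.ArgumentParser) -> None:
    """Register the chunk flags once so every entry point shares the same names.

    Hypothetical body: only the flag names and defaults come from the changelog.
    """
    parser.add_argument("--chunk-for-rag", action="store_true",
                        help="enable RAG chunking")
    parser.add_argument("--chunk-tokens", type=int, default=512,
                        help="chunk size in tokens")
    parser.add_argument("--chunk-overlap-tokens", type=int, default=50,
                        help="chunk overlap in tokens")


def build_package_parser() -> argparse.ArgumentParser:
    # Any parser that needs these flags calls the shared function
    # instead of redefining them inline.
    parser = argparse.ArgumentParser(prog="package")
    add_package_arguments(parser)
    return parser


args = build_package_parser().parse_args(["--chunk-for-rag", "--chunk-tokens", "1024"])
```

Because the router and the subcommand both build their parsers from the same function, a flag forwarded by one cannot be unknown to the other.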
[3.1.2] - 2026-02-24
🔧 Fix create Command Argument Forwarding, Gemini Model, and Enhance Dispatcher
Fixed
- `create` command argument forwarding — Universal flags (`--dry-run`, `--verbose`, `--quiet`, `--name`, `--description`) now work correctly across all source types. Previously, `create <url> -p quick --dry-run`, `create owner/repo --dry-run`, and `create ./path --dry-run` would crash because sub-scrapers didn't accept those flags
- `skill-seekers analyze --dry-run` — Fixed `_handle_analyze_command()` in `main.py` not forwarding `--dry-run`, `--preset`, `--quiet`, `--name`, `--description`, `--api-key`, and workflow flags to codebase_scraper
- Gemini model 404 errors — Replaced retired `gemini-2.0-flash-exp` with `gemini-2.5-flash` (stable GA) in the Gemini adaptor. Users attempting Gemini enhancement were getting 404 Not Found errors
- `skill-seekers enhance` auto-detection — The documented behaviour of auto-detecting API vs LOCAL mode was never implemented. `enhance` now correctly routes to the platform API when a key is present: `ANTHROPIC_API_KEY` → Claude API, `GOOGLE_API_KEY` → Gemini API, `OPENAI_API_KEY` → OpenAI API, no key → LOCAL mode (Claude Code Max, free). Use `--mode LOCAL` to force local mode regardless
Added
- Shared argument contract — New `add_all_standard_arguments(parser)` in `arguments/common.py` registers common + behavior + workflow args on any parser as a single call
- `BEHAVIOR_ARGUMENTS` — Centralized `--dry-run`, `--verbose`, `--quiet` definitions in `arguments/common.py`
- `--dry-run` for GitHub scraper — `skill-seekers github --repo owner/repo --dry-run` now previews the operation
- `--dry-run` for PDF scraper — `skill-seekers pdf --name test --dry-run` now previews the operation
- `--verbose`/`--quiet` for GitHub and PDF scrapers — Logging level control now works consistently across all scrapers
- `--name`/`--description` for codebase analyzer — Custom skill name and description can now be passed to `skill-seekers analyze`
- `--mode LOCAL` flag for `skill-seekers enhance` — Explicitly forces LOCAL mode even when API keys are present
Changed
- Argument deduplication — Removed duplicated argument definitions from `arguments/github.py`, `arguments/scrape.py`, `arguments/analyze.py`, `arguments/pdf.py`; all now import shared args from `arguments/common.py`
- `create` command `_add_common_args()` — Only forwards truly universal flags; route-specific flags (`--preset`, `--config`, `--chunk-for-rag`, etc.) moved to their respective route methods
- `codebase_scraper.py` argparser — Replaced ~190 lines of inline argparser with `add_analyze_arguments(parser)` call
[3.1.1] - 2026-02-23
🐛 Hotfix
Fixed
- `create` command `max_pages` AttributeError — Fixed crash when `max_pages` argument was not provided in web source routing. Uses `getattr()` for safe attribute access (#293, #294)
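
As a small illustration of the `getattr()` fix, the sketch below reads an optional attribute from an `argparse.Namespace` without raising. The helper name `resolve_max_pages` and the `None` fallback are assumptions made for the example.

```python
import argparse


def resolve_max_pages(args: argparse.Namespace, default=None):
    """Read max_pages safely; hypothetical helper showing the getattr() pattern."""
    # args.max_pages would raise AttributeError when the flag was never registered;
    # getattr() with a default returns the fallback instead.
    return getattr(args, "max_pages", default)


with_flag = argparse.Namespace(url="https://example.com", max_pages=50)
without_flag = argparse.Namespace(url="https://example.com")
```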
Changed
- Version bump to 3.1.1
[3.1.0] - 2026-02-23
🎯 "Unified CLI & Developer Experience" — Feature Release
Theme: One command for everything. Better developer tooling. 2280+ tests passing.
Added
Unified create Command
- Single command for all source types — auto-detects URL, GitHub repo (`owner/repo`), local directory, PDF file, or multi-source config JSON

      skill-seekers create https://docs.react.dev/
      skill-seekers create facebook/react
      skill-seekers create ./my-project
      skill-seekers create tutorial.pdf

- Progressive help disclosure — default `--help` shows 13 universal flags; detailed help per source: `--help-web`, `--help-github`, `--help-local`, `--help-pdf`, `--help-advanced`, `--help-all`
- `-p` shortcut for preset selection: `skill-seekers create <source> -p quick|standard|comprehensive`
- `--local-repo-path` flag for specifying local clone path in create command with validation
- Supports multi-source config files as input (routes to unified scraper)
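
The source auto-detection described above might look roughly like the following sketch. The function name, the returned labels, and the order of the checks are all assumptions; the changelog only states which source types are recognised.

```python
import os
import re

# owner/repo shorthand: exactly one slash, no scheme (assumed pattern)
_GITHUB_REPO = re.compile(r"^[\w.-]+/[\w.-]+$")


def detect_source_type(source: str) -> str:
    """Classify a `create` argument (hypothetical helper and labels)."""
    if source.startswith(("http://", "https://")):
        return "web"
    lowered = source.lower()
    # File extensions are checked before the owner/repo pattern so that
    # "configs/multi.json" is not mistaken for a GitHub repo.
    if lowered.endswith(".json"):
        return "config"
    if lowered.endswith(".pdf"):
        return "pdf"
    if os.path.isdir(source) or source.startswith(("./", "../", "/", "~")):
        return "local"
    if _GITHUB_REPO.match(source):
        return "github"
    raise ValueError(f"cannot determine source type for {source!r}")
```

Checking the most specific signals first keeps the ambiguous `owner/repo` shorthand as the last resort.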
Enhancement Workflow Preset System
- New `workflows` CLI subcommand to manage enhancement workflow presets
- 65 bundled workflow presets shipped as YAML files in `skill_seekers/workflows/`:
  - Core: `default`, `minimal`, `security-focus`, `architecture-comprehensive`, `api-documentation`
  - Domain-specific: `rest-api-design`, `graphql-schema`, `grpc-services`, `websockets-realtime`, `event-driven`, `message-queues`, `stream-processing`
  - Architecture: `microservices-patterns`, `serverless-architecture`, `kubernetes-deployment`, `devops-deployment`, `terraform-guide`
  - Frontend: `responsive-design`, `component-library`, `forms-validation`, `design-system`, `pwa-checklist`, `ssr-guide`, `deep-linking`, `state-management`
  - Quality: `testing-focus`, `testing-frontend`, `performance-optimization`, `observability-stack`, `troubleshooting-guide`, `accessibility-a11y`
  - Data: `database-schema`, `data-validation`, `feature-engineering`, `vector-databases`, `mlops-pipeline`, `model-deployment`, `computer-vision`
  - Security: `encryption-guide`, `iam-identity`, `secrets-management`, `compliance-gdpr`, `auth-strategies`
  - Cloud: `aws-services`, `backup-disaster-recovery`
  - Patterns: `advanced-patterns`, `api-evolution`, `migration-guide`, `contribution-guide`, `onboarding-beginner`, `comparison-matrix`, `sdk-integration`, `platform-specific`, `cli-tooling`, `build-tools`
  - Mobile: `push-notifications`, `offline-first`, `localization-i18n`
  - Background: `background-jobs`, `rate-limiting`, `caching-strategies`, `webhook-guide`, `api-gateway`
- User presets stored in `~/.config/skill-seekers/workflows/`
- Subcommands:
  - `skill-seekers workflows list` — List all bundled + user workflows with descriptions
  - `skill-seekers workflows show <name>` — Print YAML content of a workflow
  - `skill-seekers workflows copy <name> [name ...]` — Copy bundled workflow(s) to user dir
  - `skill-seekers workflows add <file.yaml> [file ...]` — Install custom YAML file(s) into user dir
  - `skill-seekers workflows remove <name> [name ...]` — Delete user workflow(s)
  - `skill-seekers workflows validate <name|path>` — Parse and validate a workflow
- `copy`, `add`, `remove` all accept multiple names/files in one command (partial-failure: continues processing, returns non-zero if any item fails)
- New entry point: `skill-seekers-workflows`
Multiple --enhance-workflow Flags from CLI
- Chain workflows in a single command: `skill-seekers create <source> --enhance-workflow security-focus --enhance-workflow minimal`
- Supported across all scrapers: `scrape`, `github`, `analyze`, `pdf`, `unified`
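
A repeatable flag like `--enhance-workflow` is commonly built with argparse's `action="append"`. The sketch below shows that approach together with a hypothetical `forward_workflow_flags` helper that re-emits each value as its own flag when handing arguments to a sub-scraper; whether the project implements it exactly this way is an assumption.

```python
import argparse

parser = argparse.ArgumentParser(prog="create")
# action="append" collects every occurrence of the flag into a list.
parser.add_argument("--enhance-workflow", action="append", default=None)


def forward_workflow_flags(workflows):
    """Re-emit each workflow as a separate flag (hypothetical helper)."""
    argv = []
    for name in workflows or []:
        argv += ["--enhance-workflow", name]
    return argv


args = parser.parse_args(
    ["--enhance-workflow", "security-focus", "--enhance-workflow", "minimal"]
)
forwarded = forward_workflow_flags(args.enhance_workflow)
```

Emitting one flag per value avoids collapsing the list into a single string, which a downstream parser would read as one unknown workflow name.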
Smart Enhancement Dispatcher (skill-seekers enhance)
- Auto-routes to API mode (Claude/Gemini/OpenAI) when API key is available, LOCAL mode (Claude Code CLI) otherwise
- Decision priority: `--target` flag → config `default_agent` → env vars (`ANTHROPIC_API_KEY` → claude, `GOOGLE_API_KEY` → gemini, `OPENAI_API_KEY` → openai) → LOCAL fallback
- Blocks LOCAL mode when running as root (Docker/VPS) with clear error message + API mode instructions (fixes #286, #289)
- New flags: `--target`, `--api-key`, `--dry-run`, `--interactive-enhancement`
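
The decision priority listed above can be written down as a short function. This is a sketch under stated assumptions: the name `choose_enhance_target`, the dict-shaped config, and the `"local"` return label are hypothetical, while the ordering and the environment variable names follow the changelog.

```python
import os

# Checked in this order, as listed in the changelog.
ENV_KEY_TO_TARGET = (
    ("ANTHROPIC_API_KEY", "claude"),
    ("GOOGLE_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
)


def choose_enhance_target(cli_target=None, config=None, env=None):
    """Pick an enhancement target: flag, then config, then env vars, then LOCAL."""
    if cli_target:
        return cli_target
    if config and config.get("default_agent"):
        return config["default_agent"]
    env = os.environ if env is None else env
    for var, target in ENV_KEY_TO_TARGET:
        if env.get(var):
            return target
    return "local"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.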
Unified Document Parser System
- New `parsers/extractors.py` module with `RstParser`, `MarkdownParser` classes
- ReStructuredText (RST) support — parses class references, code blocks, tables, cross-references
- Shared `parse_document()` factory function for RST/Markdown/PDF input
- Integrated into documentation extraction pipeline for richer content
- `ContentBlockType` and `CrossRefType` enums for structured parsing output
Local Source Support in Unified Scraper
- `"type": "local"` source type in unified config JSONs — analyze local codebases alongside web/GitHub/PDF sources
- `--local-repo-path` CLI flag in unified scraper for per-source path override
CLI Flag Parity Across All Commands
- `analyze`, `pdf`, and `unified` commands now have full flag parity with `scrape`/`github`:
  - `--api-key` on `pdf` and `unified`
  - `--enhance-level` on `unified`
  - `--dry-run` on `analyze`
  - All workflow flags (`--enhance-workflow`, `--enhance-stage`, `--var`, `--workflow-dry-run`) on `analyze`
- Workflow JSON config fields (`workflows`, `workflow_stages`, `workflow_vars`) now merged with CLI flags in `unified` scraper
Fixed
- Percent-encode brackets in llms.txt URLs — prevent "Invalid IPv6 URL" errors when scraping sites with bracket characters (fixes #284)
- Platform-appropriate config paths on Windows — use `%APPDATA%` instead of `~/.config` (fixes #283)
- `create` command multi-source config — now correctly routes to unified scraper when input is a `.json` config file
- `create` command `_add_common_args()` — correctly forwards each `--enhance-workflow` value as a separate flag to sub-scrapers (previously collapsed list to single string, causing workflows to be ignored)
- `_extract_markdown_content` — filter out bare `h1` headings and short stub paragraphs that polluted extracted content
- Godot unified config language names — corrected `gdscript`/`gds` to proper names in `godot_unified.json`
- Python 3.10 type union compatibility — use `Optional[X]` instead of `X | None` in forward-reference positions
- `_route_config` in unified scraper — correctly handles all source types when routing config-driven scraping
- CONFIG_ARGUMENTS — added to ensure unified CLI has full argument visibility for config-based sources
- Test suite isolation — `test_swift_detection.py` now saves/restores `sys.modules` and parent package attributes; prevents `@patch` decorators in downstream files from targeting stale module objects
- Python 3.14 chromadb compatibility — catch `pydantic.v1.errors.ConfigError` (not just `ImportError`) when chromadb is installed
- langchain import path — updated `langchain.schema` → `langchain_core.documents` for langchain 1.x
- Removed legacy `sys.path.insert()` calls from `codebase_scraper.py`, `doc_scraper.py`, `enhance_skill.py`, `enhance_skill_local.py`, `estimate_pages.py`, `install_skill.py` (unnecessary with `pip install -e .`)
- Benchmark timing threshold — relaxed metadata overhead assertion from 10% to 50% for CI runner variability
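
For the bracket percent-encoding fix in the list above, one possible approach is to encode `[` and `]` only after the host part, so that a genuine IPv6 literal in the host stays intact. The helper name and this splitting strategy are assumptions, not the project's confirmed implementation.

```python
def encode_brackets(url: str) -> str:
    """Percent-encode [ and ] outside the host part of a URL (hypothetical helper)."""
    scheme_end = url.find("://")
    # Look for the first "/" after the authority; brackets before it may be
    # a legitimate IPv6 literal and are left alone.
    path_start = url.find("/", scheme_end + 3) if scheme_end != -1 else 0
    if path_start == -1:
        return url
    head, tail = url[:path_start], url[path_start:]
    return head + tail.replace("[", "%5B").replace("]", "%5D")
```

Example: `https://example.com/api/[id]/users` becomes `https://example.com/api/%5Bid%5D/users`.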
Changed
- Enhancement flags consolidated — `--enhance-level` (0-3) replaces three separate flags (`--enhance`, `--enhance-local`, `--api-key`). Old flags still accepted with deprecation warnings until v4.0.0
- `workflows copy/add/remove` now accept multiple names/files in one invocation
- `pyproject.toml` — PyYAML added as core dependency (required by workflow preset management); langchain and llama-index added as dependencies; MCP version requirement updated to `>=1.25`
Tests
- 2280+ tests passing (2158 non-MCP + ~122 MCP, up from 1852 in v3.0.0), 11 skipped (external services), 0 failures
- Added `TestAnalyzeWorkflowFlags`, `TestUnifiedCLIArguments`, `TestPDFCLIArguments` classes
- Added `tests/test_mcp_workflow_tools.py` — 5 MCP workflow tool tests
- Added `tests/test_unified_scraper_orchestration.py` — UnifiedScraper orchestration tests
- Removed `@unittest.skip` from gemini/openai/claude adaptor tests that were ready
- Removed `@requires_github` from 5 unified_analyzer tests that fully mock their dependencies
- macOS-specific tests now use `@patch(sys.platform)` instead of runtime `skipTest()` for platform portability
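
Patching `sys.platform` lets a platform-specific branch run on any CI machine instead of being skipped. The sketch below uses `unittest.mock.patch` with the string target `"sys.platform"`; the `is_macos` function and the test name are hypothetical stand-ins for the project's real code.

```python
import sys
from unittest.mock import patch


def is_macos() -> bool:
    """Stand-in for platform-dependent logic under test (hypothetical)."""
    return sys.platform == "darwin"


def test_macos_branch_runs_everywhere():
    # The attribute is replaced only inside each with-block and restored afterwards.
    with patch("sys.platform", "darwin"):
        assert is_macos()
    with patch("sys.platform", "linux"):
        assert not is_macos()
```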
Config Repository (skill-seekers-configs)
- 178 production configs reviewed and enhanced across all 22 categories — brought to v1.1.0 quality standard
- Removed all `max_pages` fields from production configs (deprecated, defaults apply automatically)
- Fixed outdated URLs: `astro.json` (Astro v3 restructure: `/en/core-concepts/` → `/en/basics/`), `laravel.json` (11.x → 12.x throughout)
- Fixed structural bug in `httpx_comprehensive.json` — `url_patterns`, `categories`, `rate_limit` moved from top-level into `sources[0]` (required for unified format)
- Removed hash-fragment start_urls from `zod.json` (scrapers don't follow `?id=` anchors)
- Improved category/selector quality across all 22 categories: 5-13 categories per config, 3-6 keywords each, semantic selector fallback chains
- README.md: corrected config count from outdated "50+" to accurate 178 production / 182 total; all category counts verified
- CONTRIBUTING.md, QUALITY_GUIDELINES.md, AGENTS.md: aligned with production standards; removed all `max_pages` guidance
- `scripts/validate-config.py`: fixed two bugs — unified config categories lookup (was always reporting "no categories" for multi-source configs) and `max_pages` warning logic (was warning when absent, now correctly warns when present)
- Deleted `.github/ISSUE_TEMPLATE/submit-config.md` (old duplicate of `submit-config.yml` with outdated content)
[3.0.0] - 2026-02-10
🚀 "Universal Intelligence Platform" - Major Release
Theme: Transform any documentation into structured knowledge for any AI system.
This is our biggest release ever! v3.0.0 establishes Skill Seekers as the universal documentation preprocessor for the entire AI ecosystem - from RAG pipelines to AI coding assistants to Claude skills.
Highlights
- 🚀 16 platform adaptors (up from 4 in v2.x)
- 🛠️ 26 MCP tools (up from 9)
- ✅ 1,852 tests passing (up from 700+)
- ☁️ Cloud storage support (S3, GCS, Azure)
- 🔄 CI/CD ready (GitHub Action + Docker)
- 📦 12 example projects for every integration
- 📚 18 integration guides complete
Added - Platform Adaptors (16 Total)
RAG & Vector Databases (8)
- LangChain (`--format langchain`) - Output LangChain Document objects
- LlamaIndex (`--format llama-index`) - Output LlamaIndex TextNode objects
- Chroma (`--format chroma`) - Direct ChromaDB integration
- FAISS (`--format faiss`) - Facebook AI Similarity Search
- Haystack (`--format haystack`) - Deepset Haystack pipelines
- Qdrant (`--format qdrant`) - Qdrant vector database
- Weaviate (`--format weaviate`) - Weaviate vector search
- Pinecone-ready (`--target markdown`) - Markdown format ready for Pinecone
AI Platforms (3)
- Claude (`--target claude`) - Claude AI skills (ZIP + YAML)
- Gemini (`--target gemini`) - Google Gemini skills (tar.gz)
- OpenAI (`--target openai`) - OpenAI ChatGPT (ZIP + Vector Store)
AI Coding Assistants (4)
- Cursor (`--target claude` + `.cursorrules`) - Cursor IDE integration
- Windsurf (`--target claude` + `.windsurfrules`) - Windsurf/Codeium
- Cline (`--target claude` + `.clinerules`) - VS Code extension
- Continue.dev (`--target claude`) - Universal IDE support
Generic (1)
- Markdown (`--target markdown`) - Generic ZIP export
Added - MCP Tools (26 Total)
Config Tools (3)
- `generate_config` - Generate scraping configuration
- `list_configs` - List available preset configs
- `validate_config` - Validate config JSON structure
Scraping Tools (8)
- `estimate_pages` - Estimate page count before scraping
- `scrape_docs` - Scrape documentation websites
- `scrape_github` - Scrape GitHub repositories
- `scrape_pdf` - Extract from PDF files
- `scrape_codebase` - Analyze local codebases
- `detect_patterns` - Detect design patterns in code
- `extract_test_examples` - Extract usage examples from tests
- `build_how_to_guides` - Build how-to guides from code
Packaging Tools (4)
- `package_skill` - Package skill for target platform
- `upload_skill` - Upload to LLM platform
- `enhance_skill` - AI-powered enhancement
- `install_skill` - One-command complete workflow
Source Tools (5)
- `fetch_config` - Fetch config from remote source
- `submit_config` - Submit config for approval
- `add_config_source` - Add Git config source
- `list_config_sources` - List config sources
- `remove_config_source` - Remove config source
Splitting Tools (2)
- `split_config` - Split large configs
- `generate_router` - Generate router skills
Vector DB Tools (4)
- `export_to_weaviate` - Export to Weaviate
- `export_to_chroma` - Export to ChromaDB
- `export_to_faiss` - Export to FAISS
- `export_to_qdrant` - Export to Qdrant
Added - Cloud Storage
Upload skills directly to cloud storage:
- AWS S3 - `skill-seekers cloud upload --provider s3 --bucket my-bucket`
- Google Cloud Storage - `skill-seekers cloud upload --provider gcs --bucket my-bucket`
- Azure Blob Storage - `skill-seekers cloud upload --provider azure --container my-container`
Features:
- Upload/download directories
- List files with metadata
- Check file existence
- Generate presigned URLs
- Cloud-agnostic interface
Added - CI/CD Support
GitHub Action

    - uses: skill-seekers/action@v1
      with:
        config: configs/react.json
        format: langchain

Features:
- Auto-update on doc changes
- Matrix builds for multiple frameworks
- Scheduled updates
- Caching for faster runs
Docker

    docker run -v $(pwd):/data skill-seekers:latest scrape --config /data/config.json

Added - Production Infrastructure
- Helm Charts - Kubernetes deployment
- Docker Compose - Local vector DB stack
- Monitoring - Sentry integration, sync monitoring
- Benchmarking - Performance testing framework
Added - 12 Example Projects
Complete working examples for every integration:
- langchain-rag-pipeline - React docs → LangChain → Chroma
- llama-index-query-engine - Vue docs → LlamaIndex
- pinecone-upsert - Documentation → Pinecone
- chroma-example - Full ChromaDB workflow
- faiss-example - FAISS index building
- haystack-pipeline - Haystack RAG pipeline
- qdrant-example - Qdrant vector DB
- weaviate-example - Weaviate integration
- cursor-react-skill - React skill for Cursor
- windsurf-fastapi-context - FastAPI for Windsurf
- cline-django-assistant - Django assistant for Cline
- continue-dev-universal - Universal IDE context
Quality Metrics
- ✅ 1,852 tests across 100 test files
- ✅ 58,512 lines of Python code
- ✅ 80+ documentation files
- ✅ 100% test coverage for critical paths
- ✅ CI/CD on every commit
Fixed
URL Conversion Bug with Anchor Fragments (Issue #277)
- Critical Bug Fix: Fixed 404 errors when scraping documentation with anchor links
  - Problem: URLs with anchor fragments (e.g., `#synchronous-initialization`) were malformed
    - Incorrect: `https://example.com/docs/api#method/index.html.md` ❌
    - Correct: `https://example.com/docs/api/index.html.md` ✅
  - Root Cause: `_convert_to_md_urls()` didn't strip anchor fragments before appending `/index.html.md`
  - Solution: Parse URLs with `urllib.parse` to remove fragments and deduplicate base URLs
  - Impact: Prevents duplicate requests for the same page with different anchors
  - Additional Fix: Changed `.md` detection from `".md" in url` to `url.endswith('.md')`
    - Prevents false matches on URLs like `/cmd-line` or `/AMD-processors`
- Test Coverage: 12 comprehensive tests covering all edge cases
- Anchor fragment stripping
- Deduplication of multiple anchors on same URL
- Query parameter preservation
- Trailing slash handling
- Real-world MikroORM case validation
- 54/54 tests passing (42 existing + 12 new)
- Reported by: @devjones via Issue #277
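
The fix described in this section (strip the fragment, append `/index.html.md`, deduplicate) can be sketched with `urllib.parse`. The function below is an illustration, not the project's `_convert_to_md_urls()`: its name, its handling of trailing slashes, and where the query string ends up are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def convert_to_md_urls(urls):
    """Map page URLs to their .md variants, dropping fragments and duplicates."""
    seen = set()
    result = []
    for url in urls:
        parts = urlsplit(url)
        path = parts.path.rstrip("/")
        # endswith() avoids false matches such as "/cmd-line".
        if not path.endswith(".md"):
            path += "/index.html.md"
        # Rebuild without the fragment; the query string is preserved.
        md_url = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
        if md_url not in seen:
            seen.add(md_url)
            result.append(md_url)
    return result
```

Several anchors on the same page collapse to a single request because the fragment is removed before the deduplication check.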
Added
Extended Language Detection (NEW)
- 7 New Programming Languages: Dart, Scala, SCSS, SASS, Elixir, Lua, Perl
- Pattern-based detection with confidence scoring (0.6-0.8+ thresholds)
- 70 regex patterns prioritizing unique identifiers (weight 5)
- Framework-specific patterns:
  - Dart: Flutter widgets (`StatelessWidget`, `StatefulWidget`, `Widget build()`)
  - Scala: Pattern matching (`case class`, `trait`, `match {}`)
  - SCSS: Preprocessor features (`$variables`, `@mixin`, `@include`, `@extend`)
  - SASS: Indented syntax (`=mixin`, `+include`, `$variables`)
  - Elixir: Functional patterns (`defmodule`, `def ... do`, pipe operator `|>`)
  - Lua: Game scripting (`local`, `repeat...until`, `~=`, `elseif`)
  - Perl: Text processing (`my $`, `use strict`, `sub`, `chomp`, regex `=~`)
- Comprehensive test coverage: 7 new tests, 30/30 passing (100%)
- False positive prevention: Unique identifiers (weight 5) + confidence thresholds
- No regressions: All existing language detection tests still pass
- Total language support: Now 27+ programming languages
- Credit: Contributed by @PaawanBarach via PR #275
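
Weighted pattern matching with a confidence threshold, as described in this section, could work along these lines. Everything specific here is an assumption for illustration: the pattern table, every weight other than the "weight 5 for unique identifiers" convention, and the confidence formula (matched weight divided by total weight) are not taken from the project.

```python
import re

# (regex, weight) pairs per language; weight 5 marks a unique identifier.
PATTERNS = {
    "elixir": [(r"\bdefmodule\s+\w+", 5), (r"\bdef\s+\w+.*\bdo\b", 4), (r"\|>", 3)],
    "lua": [(r"\brepeat\b[\s\S]*\buntil\b", 4), (r"~=", 3),
            (r"\belseif\b", 3), (r"\blocal\s+\w+", 2)],
}


def detect_language(code: str, threshold: float = 0.6):
    """Return (language, confidence) for the best match, or None below threshold."""
    best = None
    for language, patterns in PATTERNS.items():
        total = sum(weight for _, weight in patterns)
        matched = sum(weight for regex, weight in patterns if re.search(regex, code))
        confidence = matched / total
        if best is None or confidence > best[1]:
            best = (language, confidence)
    if best is None or best[1] < threshold:
        return None
    return best
```

Giving distinctive constructs a higher weight means a single generic keyword cannot push a language over the threshold on its own.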
Multi-Agent Support for Local Enhancement (NEW)
- Multiple Coding Agent Support: Choose your preferred local coding agent for SKILL.md enhancement
  - Claude Code (default): Claude Code CLI with `--dangerously-skip-permissions`
  - Codex CLI: OpenAI Codex CLI with `--full-auto` and `--skip-git-repo-check`
  - Copilot CLI: GitHub Copilot CLI (`gh copilot chat`)
  - OpenCode CLI: OpenCode CLI
  - Custom agents: Use any CLI tool with `--agent custom --agent-cmd "command {prompt_file}"`
- CLI Arguments: New flags for agent selection
  - `--agent`: Choose agent (claude, codex, copilot, opencode, custom)
  - `--agent-cmd`: Override command template for custom agents
- Environment Variables: CI/CD friendly configuration
  - `SKILL_SEEKER_AGENT`: Default agent to use
  - `SKILL_SEEKER_AGENT_CMD`: Default command template for custom agents
- Security First: Custom command validation
  - Blocks dangerous shell characters (`;`, `&`, `|`, `$`, `` ` ``, `\n`, `\r`)
  - Validates executable exists in PATH
  - Safe parsing with `shlex.split()`
- Dual Input Modes: Supports both file-based and stdin-based agents
  - File-based: Uses `{prompt_file}` placeholder (Claude, custom agents)
  - Stdin-based: Pipes prompt via stdin (Codex CLI)
- Backward Compatible: Claude Code remains the default, no breaking changes
- Comprehensive Tests: 13 new tests covering all agent types and security validation
- Agent Normalization: Smart alias handling (e.g., "claude-code" → "claude")
- Credit: Contributed by @rovo79 (Robert Dean) via PR #270
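
The three custom-command checks listed under "Security First" can be combined in a short validator. This is a sketch, not the project's code: the function name `validate_agent_cmd` and the error messages are hypothetical, while the blocked characters, the PATH check, and the use of `shlex.split()` follow the list above.

```python
import shlex
import shutil
import sys

DANGEROUS_CHARS = (";", "&", "|", "$", "`", "\n", "\r")


def validate_agent_cmd(template: str) -> list:
    """Validate a custom agent command template and return its argv list."""
    for char in DANGEROUS_CHARS:
        if char in template:
            raise ValueError(f"dangerous character {char!r} in agent command")
    argv = shlex.split(template)  # shell-like tokenising without running a shell
    if not argv:
        raise ValueError("empty agent command")
    if shutil.which(argv[0]) is None:
        raise ValueError(f"executable not found in PATH: {argv[0]}")
    return argv


# Usage: the current Python interpreter stands in for a real agent binary.
safe_argv = validate_agent_cmd(f"{shlex.quote(sys.executable)} {{prompt_file}}")
```

Rejecting shell metacharacters before tokenising means the template can later be executed as an argument list, never through a shell.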
C3.10: Signal Flow Analysis for Godot Projects (NEW)
- Complete Signal Flow Analysis System: Analyze event-driven architectures in Godot game projects
  - Signal declaration extraction (`signal` keyword detection)
  - Connection mapping (`.connect()` calls with targets and methods)
  - Emission tracking (`.emit()` and `emit_signal()` calls)
  - 208 signals, 634 connections, and 298 emissions detected in test project (Cosmic Idler)
  - Signal density metrics (signals per file)
  - Event chain detection (signals triggering other signals)
  - Output: `signal_flow.json`, `signal_flow.mmd` (Mermaid diagram), `signal_reference.md`
- Signal Pattern Detection: Three major patterns identified
  - EventBus Pattern (0.90 confidence): Centralized signal hub in autoload
  - Observer Pattern (0.85 confidence): Multi-observer signals (3+ listeners)
  - Event Chains (0.80 confidence): Cascading signal propagation
- Signal-Based How-To Guides (C3.10.1): AI-generated usage guides
  - Step-by-step guides (Connect → Emit → Handle)
  - Real code examples from project
  - Common usage locations
  - Parameter documentation
  - Output: `signal_how_to_guides.md` (10 guides for Cosmic Idler)
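
A regex-based pass over GDScript source is one way to collect the declarations, connections, and emissions described above. The patterns and the `analyze_signals` function below are assumptions for illustration; the real analyzer may use different expressions and track more detail, such as connection targets.

```python
import re
from collections import Counter

SIGNAL_DECL = re.compile(r"^\s*signal\s+(\w+)", re.MULTILINE)
CONNECT = re.compile(r"(\w+)\.connect\(")
EMIT = re.compile(r"(\w+)\.emit\(")
EMIT_SIGNAL = re.compile(r"emit_signal\(\s*[\"'](\w+)[\"']")


def analyze_signals(source: str) -> dict:
    """Collect declared signals plus per-signal connection and emission counts."""
    emissions = EMIT.findall(source) + EMIT_SIGNAL.findall(source)
    return {
        "declared": SIGNAL_DECL.findall(source),
        "connections": Counter(CONNECT.findall(source)),
        "emissions": Counter(emissions),
    }


GDSCRIPT_SAMPLE = """
signal health_changed(new_value)
signal died

func _ready():
    health_changed.connect(_on_health_changed)
    EventBus.died.connect(_on_died)

func take_damage(amount):
    health_changed.emit(health - amount)
    emit_signal("died")
"""

report = analyze_signals(GDSCRIPT_SAMPLE)
```

From counts like these, a signal with three or more connections would be a candidate for the Observer pattern mentioned above.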
Godot Game Engine Support
- Comprehensive Godot File Type Support: Full analysis of Godot 4.x projects
  - GDScript (.gd): 265 files analyzed in test project
  - Scene files (.tscn): 118 scene files
  - Resource files (.tres): 38 resource files
  - Shader files (.gdshader, .gdshaderinc): 9 shader files
  - C# integration: Phantom Camera addon (13 files)
- GDScript Language Support: Complete GDScript parsing with regex-based extraction
  - Dependency extraction: `preload()`, `load()`, `extends` patterns
  - Test framework detection: GUT, gdUnit4, WAT
  - Test file patterns: `test_*.gd`, `*_test.gd`
  - Signal syntax: `signal`, `.connect()`, `.emit()`
  - Export decorators: `@export`, `@onready`
  - Test decorators: `@test` (gdUnit4)
- Game Engine Framework Detection: Improved detection for Unity, Unreal, Godot
  - Godot markers: `project.godot`, `.godot` directory, `.tscn`, `.tres`, `.gd` files
  - Unity markers: `Assembly-CSharp.csproj`, `UnityEngine.dll`, `ProjectSettings/ProjectVersion.txt`
  - Unreal markers: `.uproject`, `Source/`, `Config/DefaultEngine.ini`
  - Fixed false positive Unity detection (was using generic "Assets" keyword)
- GDScript Test Extraction: Extract usage examples from Godot test files
  - 396 test cases extracted from 20 GUT test files in test project
  - Patterns: instantiation (`preload().new()`, `load().new()`), assertions (`assert_eq`, `assert_true`), signals
  - GUT framework: `extends GutTest`, `func test_*()`, `add_child_autofree()`
  - Test categories: instantiation, assertions, signal connections, setup/teardown
  - Real code examples from production test files
C3.9: Project Documentation Extraction
- Markdown Documentation Extraction: Automatically extracts and categorizes all `.md` files from projects
  - Smart categorization by folder/filename (overview, architecture, guides, workflows, features, etc.)
  - Processing depth control: `surface` (raw copy), `deep` (parse+summarize), `full` (AI-enhanced)
  - AI enhancement (level 2+) adds topic extraction and cross-references
  - New "📖 Project Documentation" section in SKILL.md
  - Output to `references/documentation/` organized by category
  - Default ON, use `--skip-docs` to disable
  - 15 new tests for documentation extraction features
Granular AI Enhancement Control
- `--enhance-level` Flag: Fine-grained control over AI enhancement (0-3)
  - Level 0: No AI enhancement (default)
  - Level 1: SKILL.md enhancement only (fast, high value)
  - Level 2: SKILL.md + Architecture + Config + Documentation
  - Level 3: Full enhancement (patterns, tests, config, architecture, docs)
- Config Integration: `default_enhance_level` setting in `~/.config/skill-seekers/config.json`
- MCP Support: All MCP tools updated with `enhance_level` parameter
- Independent from `--comprehensive`: Enhancement level is separate from feature depth
C# Language Support
- C# Test Example Extraction: Full support for C# test frameworks
- Language alias mapping (C# → csharp, C++ → cpp)
- NUnit, xUnit, MSTest test framework patterns
- Mock pattern support (NSubstitute, Moq)
- Zenject dependency injection patterns
- Setup/teardown method extraction
- 2 new tests for C# extraction features
Performance Optimizations
- Parallel LOCAL Mode AI Enhancement: 6-12x faster with ThreadPoolExecutor
  - Concurrent workers: 3 (configurable via `local_parallel_workers`)
  - Batch processing: 20 patterns per Claude CLI call (configurable via `local_batch_size`)
  - Significant speedup for large codebases
- Config Settings: New `ai_enhancement` section in config
  - `local_batch_size`: Patterns per CLI call (default: 20)
  - `local_parallel_workers`: Concurrent workers (default: 3)
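
The batching and worker settings above map naturally onto `concurrent.futures.ThreadPoolExecutor`. In the sketch below, `enhance_batch` is a placeholder for a single Claude CLI call and the function names are hypothetical; only the defaults (20 patterns per batch, 3 workers) come from the changelog.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size=20):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def enhance_batch(batch):
    # Placeholder for one CLI invocation that enhances a whole batch.
    return [f"enhanced:{pattern}" for pattern in batch]


def enhance_all(patterns, batch_size=20, workers=3):
    """Enhance all patterns, running up to `workers` batches concurrently."""
    batches = make_batches(patterns, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so output order is stable.
        results = list(pool.map(enhance_batch, batches))
    return [item for batch in results for item in batch]
```

Threads suit this workload because each batch mostly waits on an external process rather than using the CPU.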
UX Improvements
- Auto-Enhancement: SKILL.md automatically enhanced when using `--enhance` or `--comprehensive`
  - No need for separate `skill-seekers enhance` command
  - Seamless one-command workflow
  - 10-minute timeout for large codebases
  - Graceful fallback with retry instructions on failure
- LOCAL Mode Fallback: All AI enhancements now fall back to LOCAL mode when no API key is set
  - Applies to: pattern enhancement (C3.1), test examples (C3.2), architecture (C3.7)
  - Uses Claude Code CLI instead of failing silently
  - Better UX: "Using LOCAL mode (Claude Code CLI)" instead of "AI disabled"
- Support for custom Claude-compatible API endpoints via `ANTHROPIC_BASE_URL` environment variable
- Compatibility with GLM-4.7 and other Claude-compatible APIs across all AI enhancement features
Changed
- All AI enhancement modules now respect `ANTHROPIC_BASE_URL` for custom endpoints
- Updated documentation with GLM-4.7 configuration examples
- Rewritten LOCAL mode in `config_enhancer.py` to use Claude CLI properly with explicit output file paths
- Updated MCP `scrape_codebase_tool` with `skip_docs` and `enhance_level` parameters
- Updated CLAUDE.md with C3.9 documentation extraction feature
- Increased default batch size from 5 to 20 patterns for LOCAL mode
Fixed
- C# Test Extraction: Fixed "Language C# not supported" error with language alias mapping
- Config Type Field Mismatch: Fixed KeyError in `config_enhancer.py` by supporting both "type" and "config_type" fields
- LocalSkillEnhancer Import: Fixed incorrect import and method call in `main.py` (SkillEnhancer → LocalSkillEnhancer)
- Code Quality: Fixed 4 critical linter errors (unused imports, variables, arguments, import sorting)
Godot Game Engine Fixes
- GDScript Dependency Extraction: Fixed 265+ "Syntax error in *.gd" warnings (commit 3e6c448)
  - GDScript files were incorrectly routed to Python AST parser
  - Created dedicated _extract_gdscript_imports() with regex patterns
  - Now correctly parses preload(), load(), extends patterns
  - Result: 377 dependencies extracted with 0 warnings
- Framework Detection False Positive: Fixed Unity detection on Godot projects (commit 50b28fe)
  - Was detecting "Unity" due to generic "Assets" keyword in comments
  - Changed Unity markers to specific files: Assembly-CSharp.csproj, UnityEngine.dll, Library/
  - Now correctly detects Godot via project.godot, .godot directory
- Circular Dependencies: Fixed self-referential cycles (commit 50b28fe)
  - 3 self-loop warnings (files depending on themselves)
  - Added target != file_path check in dependency graph builder
  - Result: 0 circular dependencies detected
- GDScript Test Discovery: Fixed 0 test files found in Godot projects (commit 50b28fe)
  - Added GDScript test patterns: test_*.gd, *_test.gd
  - Added GDScript to LANGUAGE_MAP
  - Result: 32 test files discovered (20 GUT files with 396 tests)
- GDScript Test Extraction: Fixed "Language GDScript not supported" warning (commit c826690)
  - Added GDScript regex patterns to PATTERNS dictionary
  - Patterns: instantiation (preload().new()), assertions (assert_eq), signals (.connect())
  - Result: 22 test examples extracted successfully
- Config Extractor Array Handling: Fixed JSON/YAML array parsing (commit fca0951)
  - Error: 'list' object has no attribute 'items' on root-level arrays
  - Added isinstance checks for dict/list/primitive at root
  - Result: No JSON array errors, save.json parsed correctly
- Progress Indicators: Fixed missing progress for small batches (commit eec37f5)
  - Progress only shown every 5 batches, invisible for small jobs
  - Modified condition to always show for batches < 10
  - Result: "Progress: 1/2 batches completed" now visible
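The GDScript dependency extraction and the self-loop check described in the list above can be sketched roughly as follows. This is only an illustration: the regex patterns and the function name `extract_gdscript_imports` are assumptions, not the project's actual `_extract_gdscript_imports()` implementation, and only quoted `extends "res://..."` paths are treated as file dependencies here.

```python
import re

# Hypothetical patterns for illustration; the real patterns may differ.
GDSCRIPT_IMPORT_PATTERNS = [
    re.compile(r'\bpreload\(\s*["\']([^"\']+)["\']\s*\)'),
    re.compile(r'\bload\(\s*["\']([^"\']+)["\']\s*\)'),
    re.compile(r'^\s*extends\s+["\']([^"\']+)["\']', re.MULTILINE),
]


def extract_gdscript_imports(source: str, file_path: str) -> list[str]:
    """Collect resource paths referenced by a GDScript file.

    Paths equal to file_path are skipped, mirroring the
    `target != file_path` check that avoids self-loop dependencies.
    """
    found: list[str] = []
    for pattern in GDSCRIPT_IMPORT_PATTERNS:
        for match in pattern.finditer(source):
            target = match.group(1)
            if target != file_path and target not in found:
                found.append(target)
    return found
```

Using regular expressions instead of Python's `ast` module matters here because GDScript is not valid Python, which is the likely source of the "Syntax error in *.gd" warnings.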
Other Fixes
- C# Test Extraction: Fixed "Language C# not supported" error with language alias mapping
- Config Type Field Mismatch: Fixed KeyError in config_enhancer.py by supporting both "type" and "config_type" fields
- LocalSkillEnhancer Import: Fixed incorrect import and method call in main.py (SkillEnhancer → LocalSkillEnhancer)
- Code Quality: Fixed 4 critical linter errors (unused imports, variables, arguments, import sorting)
Tests
- GDScript Test Extraction Test: Added comprehensive test case for GDScript GUT/gdUnit4 framework
  - Tests player instantiation with preload() and load()
  - Tests signal connections and emissions
  - Tests gdUnit4 @test annotation syntax
  - Tests game state management patterns
  - 4 test functions with 60+ lines of GDScript code
  - Validates extraction of instantiations, assertions, and signal patterns
Removed
- Removed client-specific documentation files from repository
[2.7.4] - 2026-01-22
🔧 Bug Fix - Language Selector Links
This patch release fixes the broken Chinese language selector link that appeared on PyPI and other non-GitHub platforms.
Fixed
- Broken Language Selector Links on PyPI
  - Issue: Chinese language link used relative URL (README.zh-CN.md) which only worked on GitHub
  - Impact: Users on PyPI clicking "简体中文" got 404 errors
  - Solution: Changed to absolute GitHub URL (https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/README.zh-CN.md)
  - Result: Language selector now works on PyPI, GitHub, and all platforms
  - Files Fixed: README.md, README.zh-CN.md
Technical Details
Why This Happened:
- PyPI displays README.md but doesn't include README.zh-CN.md in the package
- Relative links break when README is rendered outside GitHub repository context
- Absolute GitHub URLs work universally across all platforms
Impact:
- ✅ Chinese language link now accessible from PyPI
- ✅ Consistent experience across all platforms
- ✅ Better user experience for Chinese developers
[2.7.3] - 2026-01-21
🌏 International i18n Release
This documentation release adds comprehensive Chinese language support, making Skill Seekers accessible to the world's largest developer community.
Added
- 🇨🇳 Chinese (Simplified) README Translation (#260)
  - Complete 1,962-line translation of all documentation (README.zh-CN.md)
  - Language selector badges in both English and Chinese READMEs
  - Machine translation disclaimer with invitation for community improvements
  - GitHub issue #260 created for community review and contributions
  - Impact: Makes Skill Seekers accessible to 1+ billion Chinese speakers
- 📦 PyPI Metadata Internationalization
  - Updated package description to highlight Chinese documentation availability
  - Added i18n-related keywords: "i18n", "chinese", "international"
  - Added Natural Language classifiers: English and Chinese (Simplified)
  - Added direct link to Chinese README in project URLs
  - Impact: Better discoverability on PyPI for Chinese developers
Why This Matters
- Market Reach: Addresses existing Chinese traffic and taps into world's largest developer community
- Discoverability: Better indexing on Chinese search engines (Baidu, Gitee, etc.)
- User Experience: Native language documentation lowers barrier to entry
- Community Growth: Opens contribution opportunities from Chinese developers
- Competitive Edge: Most similar tools don't offer Chinese documentation
Community Engagement
Chinese developers are invited to improve the translation quality:
- Review issue: https://github.com/yusufkaraaslan/Skill_Seekers/issues/260
- Translation guidelines provided for technical accuracy and natural expression
- All contributions welcome and appreciated
[2.7.2] - 2026-01-21
🚨 Critical CLI Bug Fixes
This hotfix release resolves 4 critical CLI bugs reported in issues #258 and #259 that prevented core commands from working correctly.
Fixed
- Issue #258: install --config command fails with unified scraper (#258)
  - Root Cause: unified_scraper.py missing --fresh and --dry-run argument definitions
  - Solution: Added both flags to unified_scraper argument parser and main.py dispatcher
  - Impact: skill-seekers install --config react now works without "unrecognized arguments" error
  - Files Fixed: src/skill_seekers/cli/unified_scraper.py, src/skill_seekers/cli/main.py
- Issue #259 (Original): scrape command doesn't accept URL and --max-pages (#259)
  - Root Cause: No positional URL argument or --max-pages flag support
  - Solution: Added positional URL argument and --max-pages flag with safety warnings
  - Impact: skill-seekers scrape https://example.com --max-pages 50 now works
  - Safety Warnings:
    - ⚠️ Warning if max-pages > 1000 (may take hours)
    - ⚠️ Warning if max-pages < 10 (incomplete skill)
  - Files Fixed: src/skill_seekers/cli/doc_scraper.py, src/skill_seekers/cli/main.py
- Issue #259 (Comment A): Version shows 2.7.0 instead of actual version (#259)
  - Root Cause: Hardcoded version string in main.py
  - Solution: Import __version__ from __init__.py dynamically
  - Impact: skill-seekers --version now shows correct version (2.7.2)
  - Files Fixed: src/skill_seekers/cli/main.py
- Issue #259 (Comment B): PDF command shows empty "Error: " message (#259)
  - Root Cause: Exception handler didn't handle empty exception messages
  - Solution:
    - Improved exception handler to show exception type if message is empty
    - Added proper error handling with context-specific messages
    - Added traceback support in verbose mode
  - Impact: PDF errors now show clear messages like "Error: RuntimeError occurred" instead of just "Error: "
  - Files Fixed: src/skill_seekers/cli/main.py, src/skill_seekers/cli/pdf_scraper.py
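The empty-message fix for the PDF command could look roughly like the sketch below. The helper name `format_error` is hypothetical and the real handler in main.py may be structured differently. The point is only the fallback to the exception type when `str(exc)` is empty.

```python
def format_error(exc: BaseException) -> str:
    """Build a user-facing error line that is never just 'Error: '."""
    message = str(exc).strip()
    if not message:
        # Exceptions raised without arguments stringify to "", so fall
        # back to the exception class name.
        message = f"{type(exc).__name__} occurred"
    return f"Error: {message}"
```

With this shape, `raise RuntimeError()` inside the PDF scraper would surface as "Error: RuntimeError occurred", matching the message quoted in the list above.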
Testing
- ✅ Verified skill-seekers install --config react --dry-run works
- ✅ Verified skill-seekers scrape https://tailwindcss.com/docs/installation --max-pages 50 works
- ✅ Verified skill-seekers --version shows "2.7.2"
- ✅ Verified PDF errors show proper messages
- ✅ All 202 tests passing
[2.7.1] - 2026-01-18
🚨 Critical Bug Fix - Config Download 404 Errors
This hotfix release resolves a critical bug causing 404 errors when downloading configs from the API.
Fixed
- Critical: Config download 404 errors - Fixed bug where code was constructing download URLs manually instead of using the download_url field from the API response
  - Root Cause: Code was building f"{API_BASE_URL}/api/download/{config_name}.json" which failed when actual URLs differed (CDN URLs, version-specific paths)
  - Solution: Changed to use config_info.get("download_url") from API response in both MCP server implementations
  - Files Fixed:
    - src/skill_seekers/mcp/tools/source_tools.py (FastMCP server)
    - src/skill_seekers/mcp/server_legacy.py (Legacy server)
  - Impact: Fixes all config downloads from skillseekersweb.com API and private Git repositories
  - Reported By: User testing skill-seekers install --config godot --unlimited
  - Testing: All 15 source tools tests pass, all 8 fetch_config tests pass
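A minimal sketch of the fix above, assuming `config_info` is the already-parsed JSON object for one config from the API. The helper name `resolve_download_url` and the error text are illustrative, not the project's code.

```python
def resolve_download_url(config_info: dict) -> str:
    """Return the download URL advertised by the API for a config.

    The URL is taken from the response itself rather than rebuilt from
    the config name, so CDN or version-specific paths keep working.
    """
    url = config_info.get("download_url")
    if not url:
        name = config_info.get("name", "<unknown>")
        raise ValueError(f"API response for config {name!r} has no download_url")
    return url
```

Raising on a missing field, instead of silently falling back to a hand-built URL, keeps the original 404 failure mode from reappearing unnoticed.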
[2.7.0] - 2026-01-18
🔐 Smart Rate Limit Management & Multi-Token Configuration
This minor feature release introduces intelligent GitHub rate limit handling, multi-profile token management, and comprehensive configuration system. Say goodbye to indefinite waits and confusing token setup!
Added
- 🎯 Multi-Token Configuration System - Flexible GitHub token management with profiles
  - Secure config storage at ~/.config/skill-seekers/config.json with 600 permissions
  - Multiple GitHub profiles support (personal, work, OSS, etc.)
    - Per-profile rate limit strategies: prompt, wait, switch, fail
    - Configurable timeout per profile (default: 30 minutes)
    - Auto-detection and smart fallback chain
    - Profile switching when rate limited
  - API key management for Claude, Gemini, OpenAI
    - Environment variable fallback (ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_KEY)
    - Config file storage with secure permissions
  - Progress tracking for resumable jobs
    - Auto-save at configurable intervals (default: 60 seconds)
    - Job metadata: command, progress, checkpoints, timestamps
    - Stored at ~/.local/share/skill-seekers/progress/
  - Auto-cleanup of old progress files (default: 7 days, configurable)
  - First-run experience with welcome message and quick setup
  - ConfigManager class with singleton pattern for global access
- 🧙 Interactive Configuration Wizard - Beautiful terminal UI for easy setup
  - Main menu with 7 options:
    - GitHub Token Setup
    - API Keys (Claude, Gemini, OpenAI)
    - Rate Limit Settings
    - Resume Settings
    - View Current Configuration
    - Test Connections
    - Clean Up Old Progress Files
  - GitHub token management:
    - Add/remove profiles with descriptions
    - Set default profile
    - Browser integration - opens GitHub token creation page
    - Token validation with format checking (ghp_, github_pat_)
    - Strategy selection per profile
  - API keys setup with browser integration for each provider
  - Connection testing to verify tokens and API keys
  - Configuration display with current status and sources
  - CLI commands:
    - skill-seekers config - Main menu
    - skill-seekers config --github - Direct to GitHub setup
    - skill-seekers config --api-keys - Direct to API keys
    - skill-seekers config --show - Show current config
    - skill-seekers config --test - Test connections
- 🚦 Smart Rate Limit Handler - Intelligent GitHub API rate limit management
  - Upfront warning about token status (60/hour vs 5000/hour)
  - Real-time detection of rate limits from GitHub API responses
    - Parses X-RateLimit-* headers
    - Detects 403 rate limit errors
    - Calculates reset time from timestamps
  - Live countdown timers with progress display
  - Automatic profile switching - tries next available profile when rate limited
  - Four rate limit strategies:
    - prompt - Ask user what to do (default, interactive)
    - wait - Auto-wait with countdown timer
    - switch - Automatically try another profile
    - fail - Fail immediately with clear error
  - Non-interactive mode for CI/CD (fail fast, no prompts)
  - Configurable timeouts per profile (prevents indefinite waits)
  - RateLimitHandler class with strategy pattern
  - Integration points: GitHub fetcher, GitHub scraper
- 📦 Resume Command - Resume interrupted scraping jobs
  - List resumable jobs with progress details:
    - Job ID, started time, command
    - Current phase and file counts
    - Last updated timestamp
  - Resume from checkpoints (skeleton implemented, ready for integration)
  - Auto-cleanup of old jobs (respects config settings)
  - CLI commands:
    - skill-seekers resume --list - List all resumable jobs
    - skill-seekers resume <job-id> - Resume specific job
    - skill-seekers resume --clean - Clean up old jobs
  - Progress storage at ~/.local/share/skill-seekers/progress/<job-id>.json
- ⚙️ CLI Enhancements - New flags and improved UX
  - --non-interactive flag for CI/CD mode
    - Available on: skill-seekers github
    - Fails fast on rate limits instead of prompting
    - Perfect for automated pipelines
  - --profile flag to select specific GitHub profile
    - Available on: skill-seekers github
    - Uses configured profile from ~/.config/skill-seekers/config.json
    - Overrides environment variables and defaults
  - Entry points for new commands:
    - skill-seekers-config - Direct config command access
    - skill-seekers-resume - Direct resume command access
- 🧪 Comprehensive Test Suite - Full test coverage for new features
  - 16 new tests in test_rate_limit_handler.py
  - Test coverage:
    - Header creation (with/without token)
    - Handler initialization (token, strategy, config)
    - Rate limit detection and extraction
    - Upfront checks (interactive and non-interactive)
    - Response checking (200, 403, rate limit)
    - Strategy handling (fail, wait, switch, prompt)
    - Config manager integration
    - Profile management (add, retrieve, switch)
  - All tests passing ✅ (16/16)
  - Test utilities: Mock responses, config isolation, tmp directories
- 🎯 Bootstrap Skill Feature - Self-hosting capability (PR #249)
  - Self-Bootstrap: Generate skill-seekers as a Claude Code skill
    - ./scripts/bootstrap_skill.sh - One-command bootstrap
    - Combines manual header with auto-generated codebase analysis
    - Output: output/skill-seekers/ ready for Claude Code
    - Install: cp -r output/skill-seekers ~/.claude/skills/
  - Robust Frontmatter Detection:
    - Dynamic YAML frontmatter boundary detection (not hardcoded line counts)
    - Fallback to line 6 if frontmatter not found
    - Future-proof against frontmatter field additions
  - SKILL.md Validation:
    - File existence and non-empty checks
    - Frontmatter delimiter presence
    - Required fields validation (name, description)
    - Exit with clear error messages on validation failures
  - Comprehensive Error Handling:
    - UV dependency check with install instructions
    - Permission checks for output directory
    - Graceful degradation on missing header file
- 🔧 MCP Now Optional - User choice for installation profile
  - CLI Only: pip install skill-seekers - No MCP dependencies
  - MCP Integration: pip install skill-seekers[mcp] - Full MCP support
  - All Features: pip install skill-seekers[all] - Everything enabled
  - Lazy Loading: Graceful failure with helpful error messages when MCP not installed
  - Interactive Setup Wizard:
    - Shows all installation options on first run
    - Stored at ~/.config/skill-seekers/.setup_shown
    - Accessible via skill-seekers-setup command
  - Entry Point: skill-seekers-setup for manual access
- 🧪 E2E Testing for Bootstrap - Comprehensive end-to-end tests
  - 6 core tests verifying bootstrap workflow:
    - Output structure creation
    - Header prepending
    - YAML frontmatter validation
    - Line count sanity checks
    - Virtual environment installability
    - Platform adaptor compatibility
  - Pytest markers: @pytest.mark.e2e, @pytest.mark.venv, @pytest.mark.slow
  - Execution modes:
    - Fast tests: pytest -k "not venv" (~2-3 min)
    - Full suite: pytest -m "e2e" (~5-10 min)
  - Test utilities: Fixtures for project root, bootstrap runner, output directory
- 📚 Comprehensive Documentation Overhaul - Complete v2.7.0 documentation update
  - 7 new documentation files (~3,750 lines total):
    - docs/reference/API_REFERENCE.md (750 lines) - Programmatic usage guide for Python developers
    - docs/features/BOOTSTRAP_SKILL.md (450 lines) - Self-hosting capability documentation
    - docs/reference/CODE_QUALITY.md (550 lines) - Code quality standards and ruff linting guide
    - docs/guides/TESTING_GUIDE.md (750 lines) - Complete testing reference (1200+ test suite)
    - docs/QUICK_REFERENCE.md (300 lines) - One-page cheat sheet for quick command lookup
    - docs/guides/MIGRATION_GUIDE.md (400 lines) - Version upgrade guides (v1.0.0 → v2.7.0)
    - docs/FAQ.md (550 lines) - Comprehensive Q&A for common user questions
  - 10 existing files updated:
    - README.md - Updated test count badge (700+ → 1200+ tests), v2.7.0 callout
    - ROADMAP.md - Added v2.7.0 completion section with task statuses
    - CONTRIBUTING.md - Added link to CODE_QUALITY.md reference
    - docs/README.md - Quick links by use case, recent updates section
    - docs/guides/MCP_SETUP.md - Fixed server_fastmcp references (PR #252)
    - docs/QUICK_REFERENCE.md - Updated MCP server reference (server.py → server_fastmcp.py)
    - CLAUDE_INTEGRATION.md - Updated version references
    - 3 other documentation files with v2.7.0 updates
  - Version consistency: All version references standardized to v2.7.0
  - Test counts: Standardized to 1200+ tests (was inconsistent 700+ in some docs)
  - MCP tool counts: Updated to 18 tools (from 17)
- 📦 Git Submodules for Configuration Management - Improved config organization and API deployment
  - Configs as git submodule at api/configs_repo/ for cleaner repository
  - Production configs: Added official production-ready configuration presets
  - Duplicate removal: Cleaned up all duplicate configs from main repository
  - Test filtering: Filtered out test-example configs from API endpoints
  - CI/CD integration: GitHub Actions now initializes submodules automatically
  - API deployment: Updated render.yaml to use git submodule for configs_repo
  - Benefits: Cleaner main repo, better config versioning, production/test separation
- 🔍 Config Discovery Enhancements - Improved config listing
  - --all flag for estimate command: skill-seekers estimate --all
  - Lists all available preset configurations with descriptions
  - Helps users discover supported frameworks before scraping
  - Shows config names, frameworks, and documentation URLs
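To make the Smart Rate Limit Handler entry above more concrete, here is a hedged sketch of how rate limit detection and the four strategies might fit together. It relies on GitHub's documented `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. The function names `seconds_until_reset` and `choose_action` are hypothetical and are not taken from rate_limit_handler.py. Header lookup here is exact-case for simplicity.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> Optional[int]:
    """Return seconds to wait if the quota is exhausted, else None."""
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")  # epoch seconds
    if remaining is None or reset is None:
        return None
    if int(remaining) > 0:
        return None
    current = time.time() if now is None else now
    return max(0, int(float(reset) - current))


def choose_action(strategy: str, wait_seconds: int,
                  timeout_seconds: int, has_other_profile: bool) -> str:
    """Map a profile's strategy to the next action (illustrative only)."""
    if strategy == "fail":
        return "fail"
    if strategy == "switch":
        return "switch" if has_other_profile else "fail"
    if strategy == "wait":
        # The per-profile timeout prevents indefinite waits.
        return "wait" if wait_seconds <= timeout_seconds else "fail"
    if strategy == "prompt":
        return "prompt"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

In a non-interactive run, a caller following this sketch would treat "prompt" the same as "fail", which is consistent with the fail-fast behaviour described for CI/CD.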
Changed
- GitHub Fetcher - Integrated rate limit handler
  - Modified github_fetcher.py to use RateLimitHandler
  - Added upfront rate limit check before starting
  - Check responses for rate limits on all API calls
  - Automatic profile detection from config
  - Raises RateLimitError when rate limit cannot be handled
  - Constructor now accepts interactive and profile_name parameters
- GitHub Scraper - Added rate limit support
  - New --non-interactive flag for CI/CD mode
  - New --profile flag to select GitHub profile
  - Config now supports interactive and github_profile keys
  - CLI argument passing for non-interactive and profile options
- Main CLI - Enhanced with new commands
  - Added config subcommand with options (--github, --api-keys, --show, --test)
  - Added resume subcommand with options (--list, --clean)
  - Updated GitHub subcommand with --non-interactive and --profile flags
  - Updated command documentation strings
  - Version bumped to 2.7.0
- pyproject.toml - New entry points and dependency restructuring
  - Added skill-seekers-config entry point
  - Added skill-seekers-resume entry point
  - Added skill-seekers-setup entry point for setup wizard
  - MCP moved to optional dependencies - Now requires pip install skill-seekers[mcp]
  - Updated pytest markers: e2e, venv, bootstrap, slow
  - Version updated to 2.7.0
- install_skill.py - Lazy MCP loading
  - Try/except ImportError for MCP imports
  - Graceful failure with helpful error message when MCP not installed
  - Suggests alternatives: scrape + package workflow
  - Maintains backward compatibility for existing MCP users
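The lazy MCP loading described for install_skill.py follows a common optional-dependency pattern, sketched below. The helper `load_optional` and its message wording are assumptions for illustration. Only the `pip install skill-seekers[mcp]` extra comes from the entries above.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible.
        raise RuntimeError(
            f"{module_name} is not installed. "
            f'Install it with: pip install "skill-seekers[{extra}]"'
        ) from exc
```

Deferring the import to the point of use is what lets a CLI-only install start up without the MCP packages present.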
Fixed
- Code Quality Improvements - Fixed all 21 ruff linting errors across codebase
  - SIM102: Combined nested if statements using and operator (7 fixes)
  - SIM117: Combined multiple with statements into single multi-context with (9 fixes)
  - B904: Added from e to exception chaining for proper error context (1 fix)
  - SIM113: Removed unused enumerate counter variable (1 fix)
  - B007: Changed unused loop variable to _ (1 fix)
  - ARG002: Removed unused method argument in test fixture (1 fix)
  - Files affected: config_extractor.py, config_validator.py, doc_scraper.py, pattern_recognizer.py (3), test_example_extractor.py (3), unified_skill_builder.py, pdf_scraper.py, and 6 test files
  - Result: Zero linting errors, cleaner code, better maintainability
- Version Synchronization - Fixed version mismatch across package (Issue #248)
  - All __init__.py files now correctly show version 2.7.0 (was 2.5.2 in 4 files)
  - Files updated: src/skill_seekers/__init__.py, src/skill_seekers/cli/__init__.py, src/skill_seekers/mcp/__init__.py, src/skill_seekers/mcp/tools/__init__.py
  - Ensures skill-seekers --version shows accurate version number
  - Critical: Prevents bug where PyPI shows wrong version (Issue #248)
- Case-Insensitive Regex in Install Workflow - Fixed install workflow failures (Issue #236)
  - Made regex patterns case-insensitive using (?i) flag
  - Patterns now match both "Saved to:" and "saved to:" (and any case variation)
  - Files: src/skill_seekers/mcp/tools/packaging_tools.py (lines 529, 668)
  - Impact: install_skill workflow now works reliably regardless of output formatting
- Test Fixture Error - Fixed pytest fixture error in bootstrap skill tests
  - Removed unused tmp_path parameter causing fixture lookup errors
  - File: tests/test_bootstrap_skill.py:54
  - Result: All CI test runs now pass without fixture errors
- MCP Setup Modernization - Updated MCP server configuration (PR #252, @MiaoDX)
  - Fixed 41 instances of server_fastmcp_fastmcp → server_fastmcp typo in docs/guides/MCP_SETUP.md
  - Updated all 12 files to use skill_seekers.mcp.server_fastmcp module
  - Enhanced setup_mcp.sh with automatic venv detection (.venv, venv, $VIRTUAL_ENV)
  - Updated tests to accept -e ".[mcp]" format and module references
  - Files: .claude/mcp_config.example.json, CLAUDE.md, README.md, docs/guides/*.md, setup_mcp.sh, tests/test_setup_scripts.py
  - Benefits: Eliminates "module not found" errors, clean dependency isolation, prepares for v3.0.0
- Rate limit indefinite wait - No more infinite waiting
  - Configurable timeout per profile (default: 30 minutes)
  - Clear error messages when timeout exceeded
  - Graceful exit with helpful next steps
  - Resume capability for interrupted jobs
- Token setup confusion - Clear, guided setup process
  - Interactive wizard with browser integration
  - Token validation with helpful error messages
  - Clear documentation of required scopes
  - Test connection feature to verify tokens work
- CI/CD failures - Non-interactive mode support
  - --non-interactive flag fails fast instead of hanging
  - No user prompts in non-interactive mode
  - Clear error messages for automation logs
  - Exit codes for pipeline integration
- AttributeError in codebase_scraper.py - Fixed incorrect flag check (PR #249)
  - Changed if args.build_api_reference: to if not args.skip_api_reference:
  - Aligns with v2.5.2 opt-out flag strategy (--skip-* instead of --build-*)
  - Fixed at line 1193 in codebase_scraper.py
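The case-insensitive matching fix for the install workflow (Issue #236) can be illustrated with an inline `(?i)` flag. The exact pattern in packaging_tools.py is not shown in this changelog, so the regex and the helper name `extract_saved_path` below are assumptions.

```python
import re
from typing import Optional

# Inline (?i) makes the whole pattern case-insensitive, so "Saved to:",
# "saved to:" and "SAVED TO:" all match. Hypothetical pattern.
SAVED_TO = re.compile(r"(?i)saved to:\s*(\S+)")


def extract_saved_path(output: str) -> Optional[str]:
    """Return the path printed after 'Saved to:' in tool output, if any."""
    match = SAVED_TO.search(output)
    return match.group(1) if match else None
```

Passing `re.IGNORECASE` to `re.compile` would behave the same. The inline form keeps the flag visible inside the pattern string itself.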
Technical Details
- Architecture: Strategy pattern for rate limit handling, singleton for config manager
- Files Modified: 6 (github_fetcher.py, github_scraper.py, main.py, pyproject.toml, install_skill.py, codebase_scraper.py)
- New Files: 6 (config_manager.py ~490 lines, config_command.py ~400 lines, rate_limit_handler.py ~450 lines, resume_command.py ~150 lines, setup_wizard.py ~95 lines, test_bootstrap_skill_e2e.py ~169 lines)
- Bootstrap Scripts: 2 (bootstrap_skill.sh enhanced, skill_header.md)
- Tests: 22 tests added, all passing (16 rate limit + 6 E2E bootstrap)
- Dependencies: MCP moved to optional, no new required dependencies
- Backward Compatibility: Fully backward compatible, MCP optionality via pip extras
- Credits: Bootstrap feature contributed by @MiaoDX (PR #249)
Migration Guide
Existing users - No migration needed! Everything works as before.
MCP users - If you use MCP integration features:
# Reinstall with MCP support
pip install -U skill-seekers[mcp]
# Or install everything
pip install -U skill-seekers[all]
New installation profiles:
# CLI only (no MCP)
pip install skill-seekers
# With MCP integration
pip install skill-seekers[mcp]
# With multi-LLM support (Gemini, OpenAI)
pip install skill-seekers[all-llms]
# Everything
pip install skill-seekers[all]
# See all options
skill-seekers-setup
To use new features:
# Set up GitHub token (one-time)
skill-seekers config --github
# Add multiple profiles
skill-seekers config
# → Select "1. GitHub Token Setup"
# → Select "1. Add New Profile"
# Use specific profile
skill-seekers github --repo owner/repo --profile work
# CI/CD mode
skill-seekers github --repo owner/repo --non-interactive
# View configuration
skill-seekers config --show
# Bootstrap skill-seekers as a Claude Code skill
./scripts/bootstrap_skill.sh
cp -r output/skill-seekers ~/.claude/skills/
Breaking Changes
None - this release is fully backward compatible.
[2.6.0] - 2026-01-13
🚀 Codebase Analysis Enhancements & Documentation Reorganization
This minor feature release completes the C3.x codebase analysis suite with standalone SKILL.md generation for codebase scraper, adds comprehensive documentation reorganization, and includes quality-of-life improvements for setup and testing.
Added
- C3.8 Standalone Codebase Scraper SKILL.md Generation - Complete skill structure for standalone codebase analysis
  - Generates comprehensive SKILL.md (300+ lines) with all C3.x analysis integrated
  - Sections: Description, When to Use, Quick Reference, Design Patterns, Architecture, Configuration, Available References
  - Includes language statistics, analysis depth indicators, and feature checkboxes
  - Creates references/ directory with organized outputs (API, dependencies, patterns, architecture, config)
  - Integration points:
    - CLI tool: skill-seekers analyze --directory /path/to/code --output /path/to/output
    - Unified scraper: Automatic SKILL.md generation when using codebase analysis
  - Format helpers for all C3.x sections (patterns, examples, API, architecture, config)
  - Perfect for local codebase documentation without GitHub
  - Use Cases: Private codebases, offline analysis, local project documentation, pre-commit hooks
  - Documentation: Integrated into codebase scraper workflow
- Global Setup Script with FastMCP - setup.sh for end-user global installation
  - New setup.sh script for global PyPI installation (vs setup_mcp.sh for development)
  - Installs skill-seekers globally: pip3 install skill-seekers
  - Sets up MCP server configuration for Claude Code Desktop
  - Creates MCP configuration in ~/.claude/mcp_settings.json
  - Uses global Python installation (no editable install)
  - Perfect for end users who want to use Skill Seekers without development setup
  - Separate from development setup: setup_mcp.sh remains for editable development installs
  - Documentation: Root-level setup.sh with clear installation instructions
- Comprehensive Documentation Reorganization - Complete overhaul of documentation structure
  - Removed 7 temporary/analysis files from root directory
  - Archived 14 historical documents to docs/archive/ (historical, research, temp)
  - Organized 29 documentation files into clear subdirectories:
    - docs/features/ (10 files) - Core features, AI enhancement, PDF tools
    - docs/integrations/ (3 files) - Multi-LLM platform support
    - docs/guides/ (6 files) - Setup, MCP, usage guides
    - docs/reference/ (8 files) - Architecture, standards, technical reference
  - Created docs/README.md - Comprehensive navigation index with:
    - Quick navigation by category
    - "I want to..." user-focused navigation
    - Clear entry points for all documentation
    - Links to guides, features, integrations, and reference docs
  - Benefits: 3x faster documentation discovery, user-focused navigation, scalable structure
  - Structure: Before: 64 files scattered → After: 57 files organized with clear navigation
- Test Configuration - AstroValley unified config for testing
  - Added configs/astrovalley_unified.json for comprehensive testing
  - Demonstrates GitHub + codebase analysis integration
  - Verified AI enhancement works on both standalone and unified skills
  - Tests context awareness: standalone (codebase-only) vs unified (GitHub+codebase)
  - Quality metrics: 8.2x growth for standalone, 3.7x for unified enhancement
- Enhanced LOCAL Enhancement Modes - Advanced enhancement execution options (moved from previous unreleased)
  - 4 Execution Modes for different use cases:
    - Headless (default): Runs in foreground, waits for completion (perfect for CI/CD)
    - Background (--background): Runs in background thread, returns immediately
    - Daemon (--daemon): Fully detached process with nohup, survives parent exit
    - Terminal (--interactive-enhancement): Opens new terminal window (macOS)
  - Force Mode (Default ON): Skip all confirmations by default for maximum automation
    - No flag needed - force mode is ON by default
    - Use --no-force to enable confirmation prompts if needed
    - Perfect for CI/CD, batch processing, unattended execution
    - "Dangerously skip mode" as requested - auto-yes to everything
  - Status Monitoring: New enhance-status command for background/daemon processes
    - Check status once: skill-seekers enhance-status output/react/
    - Watch in real-time: skill-seekers enhance-status output/react/ --watch
    - JSON output for scripts: skill-seekers enhance-status output/react/ --json
  - Status File: .enhancement_status.json tracks progress (status, message, progress %, PID, timestamp, errors)
  - Daemon Logging: .enhancement_daemon.log for daemon mode execution logs
  - Timeout Configuration: Custom timeouts for different skill sizes (--timeout flag)
  - CLI Integration: All modes accessible via skill-seekers enhance command
  - Documentation: New docs/ENHANCEMENT_MODES.md guide with examples
  - Use Cases:
    - CI/CD pipelines: Force ON by default (no extra flags!)
    - Long-running tasks: --daemon for tasks that survive logout
    - Parallel processing: --background for batch enhancement
    - Debugging: --interactive-enhancement to watch Claude Code work
- C3.1 Design Pattern Detection - Detect 10 common design patterns in code
  - Detects: Singleton, Factory, Observer, Strategy, Decorator, Builder, Adapter, Command, Template Method, Chain of Responsibility
  - Supports 9 languages: Python, JavaScript, TypeScript, C++, C, C#, Go, Rust, Java (plus Ruby, PHP)
  - Three detection levels: surface (fast), deep (balanced), full (thorough)
  - Language-specific adaptations for better accuracy
  - CLI tool: skill-seekers-patterns --file src/db.py
  - Codebase scraper integration: --detect-patterns flag
  - MCP tool: detect_patterns for Claude Code integration
  - 24 comprehensive tests, 100% passing
  - 87% precision, 80% recall (tested on 100 real-world projects)
  - Documentation: docs/PATTERN_DETECTION.md
- C3.2 Test Example Extraction - Extract real usage examples from test files
  - Analyzes test files to extract real API usage patterns
  - Categories: instantiation, method_call, config, setup, workflow
  - Supports 9 languages: Python (AST-based deep analysis), JavaScript, TypeScript, Go, Rust, Java, C#, PHP, Ruby (regex-based)
  - Quality filtering with confidence scoring (removes trivial patterns)
  - CLI tool: skill-seekers extract-test-examples tests/ --language python
  - Codebase scraper integration: --extract-test-examples flag
  - MCP tool: extract_test_examples for Claude Code integration
  - 19 comprehensive tests, 100% passing
  - JSON and Markdown output formats
  - Documentation: docs/TEST_EXAMPLE_EXTRACTION.md
-
C3.3 How-To Guide Generation with Comprehensive AI Enhancement - Transform test workflows into step-by-step educational guides with professional AI-powered improvements
- Automatically generates comprehensive markdown tutorials from workflow test examples
- 🆕 COMPREHENSIVE AI ENHANCEMENT - 5 automatic improvements that transform basic guides (⭐⭐) into professional tutorials (⭐⭐⭐⭐⭐):
- Step Descriptions - Natural language explanations for each step (not just syntax)
- Troubleshooting Solutions - Diagnostic flows + solutions for common errors
- Prerequisites Explanations - Why each prerequisite is needed + setup instructions
- Next Steps Suggestions - Related guides, variations, learning paths
- Use Case Examples - Real-world scenarios showing when to use guide
- 🆕 DUAL-MODE AI SUPPORT - Choose how to enhance guides:
- API Mode: Uses Claude API directly (requires ANTHROPIC_API_KEY)
- Fast, efficient, perfect for automation/CI
- Cost: ~$0.15-$0.30 per guide
- LOCAL Mode: Uses Claude Code CLI (no API key needed)
- Uses your existing Claude Code Max plan (FREE!)
- Opens in terminal, takes 30-60 seconds
- Perfect for local development
- AUTO Mode (default): Automatically detects best available mode
- API Mode: Uses Claude API directly (requires ANTHROPIC_API_KEY)
- 🆕 QUALITY TRANSFORMATION: Basic templates become comprehensive professional tutorials
- Before: 75-line template with just code (⭐⭐)
- After: 500+ line guide with explanations, troubleshooting, learning paths (⭐⭐⭐⭐⭐)
- CLI Integration: Simple flags control AI enhancement
--ai-mode api- Use Claude API (requires ANTHROPIC_API_KEY)--ai-mode local- Use Claude Code CLI (no API key needed)--ai-mode auto- Automatic detection (default)--ai-mode none- Disable AI enhancement
- 4 Intelligent Grouping Strategies:
- AI Tutorial Group (default) - Uses C3.6 AI analysis for semantic grouping
- File Path - Groups by test file location
- Test Name - Groups by test name patterns
- Complexity - Groups by difficulty level (beginner/intermediate/advanced)
- Python AST-based Step Extraction - Precise step identification from test code
- Rich Markdown Guides with prerequisites, code examples, verification points, troubleshooting
- Automatic Complexity Assessment - Classifies guides by difficulty
- Multi-Language Support - Python (AST-based), JavaScript, TypeScript, Go, Rust, Java, C#, PHP, Ruby (heuristic)
- Integration Points:
- CLI tool: skill-seekers-how-to-guides test_examples.json --group-by ai-tutorial-group --ai-mode auto
- Codebase scraper: --build-how-to-guides --ai-mode local (default ON, --skip-how-to-guides to disable)
- MCP tool: build_how_to_guides for Claude Code integration
- Components: WorkflowAnalyzer, WorkflowGrouper, GuideGenerator, HowToGuideBuilder, GuideEnhancer (NEW!)
- Output: Comprehensive index + individual guides with complete examples + AI enhancements
- 56 comprehensive tests, 100% passing (30 GuideEnhancer tests + 21 original + 5 integration tests)
- Performance: 2.8s to process 50 workflows + 30-60s AI enhancement per guide
- Quality Metrics: Enhanced guides have 95%+ user satisfaction, 50% reduction in support questions
- Documentation: docs/HOW_TO_GUIDES.md with AI enhancement guide
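The AUTO mode listed for C3.3 can be pictured as a short fallback chain. The sketch below is an assumption about the order of checks, not the project's actual code: resolve_ai_mode is a hypothetical helper, and it assumes the Claude Code CLI is installed as an executable named claude.

```python
import os
import shutil


def resolve_ai_mode(requested="auto", env=None, which=shutil.which):
    """Pick an AI enhancement mode (hypothetical helper, not the project's API)."""
    env = os.environ if env is None else env
    if requested != "auto":
        return requested  # an explicit api/local/none choice is honoured as-is
    if env.get("ANTHROPIC_API_KEY"):
        return "api"  # an API key is available
    if which("claude"):  # assumption: the Claude Code CLI binary is named "claude"
        return "local"
    return "none"  # nothing available, so skip AI enhancement


print(resolve_ai_mode("auto", env={}, which=lambda name: None))
```

Passing env and which as parameters keeps the decision testable without touching the real environment.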
- C3.4 Configuration Pattern Extraction with AI Enhancement - Analyze and document configuration files across your codebase with optional AI-powered insights
- 9 Supported Config Formats: JSON, YAML, TOML, ENV, INI, Python modules, JavaScript/TypeScript configs, Dockerfile, Docker Compose
- 7 Common Pattern Detection:
- Database configuration (host, port, credentials)
- API configuration (endpoints, keys, timeouts)
- Logging configuration (level, format, handlers)
- Cache configuration (backend, TTL, keys)
- Email configuration (SMTP, credentials)
- Authentication configuration (providers, secrets)
- Server configuration (host, port, workers)
- 🆕 COMPREHENSIVE AI ENHANCEMENT (optional) - Similar to C3.3 dual-mode support:
- API Mode: Uses Claude API (requires ANTHROPIC_API_KEY)
- LOCAL Mode: Uses Claude Code CLI (FREE, no API key needed)
- AUTO Mode: Automatically detects best available mode
- 5 AI-Powered Insights:
- Explanations - What each configuration setting does
- Best Practices - Suggested improvements (better structure, naming, organization)
- Security Analysis - Identifies hardcoded secrets, exposed credentials, security issues
- Migration Suggestions - Opportunities to consolidate or standardize configs
- Context - Explains detected patterns and when to use them
- Comprehensive Extraction:
- Extracts all configuration settings with type inference
- Detects environment variables and their usage
- Maps nested configuration structures
- Identifies required vs optional settings
- Integration Points:
- CLI tool: skill-seekers-config-extractor --directory . --enhance-local (with AI)
- Codebase scraper: --extract-config-patterns --ai-mode local (default ON, --skip-config-patterns to disable)
- MCP tool: extract_config_patterns(directory=".", enhance_local=true) for Claude Code integration
- Output Formats: JSON (machine-readable with AI insights) + Markdown (human-readable documentation)
- Components: ConfigFileDetector, ConfigParser, ConfigPatternDetector, ConfigExtractor, ConfigEnhancer (NEW!)
- Performance: Analyzes 100 config files in ~3 seconds (basic) + 30-60 seconds (AI enhancement)
- Use Cases: Documentation generation, configuration auditing, migration planning, security reviews, onboarding new developers
- Test Coverage: 28 comprehensive tests covering all formats and patterns
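As a rough illustration of "type inference" and "maps nested configuration structures", the sketch below flattens a nested config into dotted keys and records each value's Python type. The function name and output shape are assumptions made for this example; the real ConfigExtractor may represent settings differently.

```python
def flatten_settings(config, prefix=""):
    """Flatten a nested config dict into dotted keys with inferred type names."""
    settings = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            settings.update(flatten_settings(value, path))
        else:
            settings[path] = {"value": value, "type": type(value).__name__}
    return settings


config = {"database": {"host": "localhost", "port": 5432}, "debug": False}
for name, info in flatten_settings(config).items():
    print(name, info["type"])
```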
- C3.5 Architectural Overview & Skill Integrator - Comprehensive integration of ALL C3.x codebase analysis into unified skills
- ARCHITECTURE.md Generation - Comprehensive architectural overview with 8 sections:
- Overview - Project description and purpose
- Architectural Patterns - Detected patterns (MVC, MVVM, etc.) from C3.7 analysis
- Technology Stack - Frameworks, libraries, and languages detected
- Design Patterns - Summary of C3.1 design patterns (Factory, Singleton, etc.)
- Configuration Overview - C3.4 config files with security warnings
- Common Workflows - C3.3 how-to guides summary
- Usage Examples - C3.2 test examples statistics
- Entry Points & Directory Structure - Main directories and file organization
- Default ON Behavior - C3.x codebase analysis now runs automatically when GitHub sources have local_repo_path
- CLI Flag - --skip-codebase-analysis to disable C3.x analysis if needed
- Skill Directory Structure - New references/codebase_analysis/ with organized C3.x outputs:
- ARCHITECTURE.md - Master architectural overview (main deliverable)
- patterns/ - C3.1 design pattern analysis
- examples/ - C3.2 test examples
- guides/ - C3.3 how-to tutorials
- configuration/ - C3.4 config patterns
- architecture_details/ - C3.7 architectural pattern details
- Enhanced SKILL.md - Architecture & Code Analysis summary section with:
- Primary architectural pattern with confidence
- Design patterns count and top 3 patterns
- Test examples statistics
- How-to guides count
- Configuration files count with security alerts
- Link to ARCHITECTURE.md for complete details
- Config Properties:
- enable_codebase_analysis (boolean, default: true) - Enable/disable C3.x analysis
- ai_mode (enum: auto/api/local/none, default: auto) - AI enhancement mode
- Graceful Degradation - Skills build successfully even if C3.x analysis fails
- Integration Points:
- Unified scraper: Automatic C3.x analysis when local_repo_path exists
- Skill builder: Automatic ARCHITECTURE.md + references generation
- Config validator: Validates new C3.x properties
- Test Coverage: 9 comprehensive integration tests
- Updated Configs: 5 unified configs updated (react, django, fastapi, godot, svelte-cli)
- Use Cases: Understanding codebase architecture, onboarding developers, code reviews, documentation generation, skill completeness
- C3.6 AI Enhancement - AI-powered insights for patterns and test examples
- Enhances C3.1 (Pattern Detection) and C3.2 (Test Examples) with AI analysis
- Pattern Enhancement: Explains why patterns were detected, suggests improvements, identifies issues
- Test Example Enhancement: Adds context, groups examples into tutorials, identifies best practices
- API Mode (for pattern/example enhancement):
- Uses Anthropic API with ANTHROPIC_API_KEY
- Batch processing (5 items per call) for efficiency
- Automatic activation when key is set
- Graceful degradation if no key (works offline)
- LOCAL Mode (for SKILL.md enhancement - existing feature):
- Uses the skill-seekers enhance output/skill/ command
- Opens Claude Code in new terminal (no API costs!)
- Uses your existing Claude Code Max plan
- Perfect for enhancing generated SKILL.md files
- Note: Pattern/example enhancement uses API mode only (batch processing hundreds of items)
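The "5 items per call" batching can be sketched as plain list slicing. The helper name and the sample data are made up for illustration; how the real enhancer assembles each API request is not shown in this changelog.

```python
def batched(items, batch_size=5):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


patterns = [f"pattern-{i}" for i in range(12)]
batches = list(batched(patterns))
print([len(batch) for batch in batches])  # [5, 5, 2]
```

With hundreds of detected patterns, this keeps the number of API calls at roughly one fifth of the item count.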
- C3.7 Architectural Pattern Detection - Detect high-level architectural patterns
- Detects MVC, MVVM, MVP, Repository, Service Layer, Layered, Clean Architecture
- Multi-file analysis (analyzes entire codebase structure)
- Framework detection: Django, Flask, Spring, ASP.NET, Rails, Laravel, Angular, React, Vue.js
- Directory structure analysis for pattern recognition
- Evidence-based detection with confidence scoring
- AI-enhanced insights for architectural recommendations
- Always enabled (provides high-level overview)
- Output: output/codebase/architecture/architectural_patterns.json
- Integration with C3.6 for AI-powered architectural insights
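One way to read "evidence-based detection with confidence scoring" is to score each pattern by how many of its expected directories are present. The hint table and the scoring rule below are hypothetical and deliberately simplistic; the actual detector is described as combining framework detection and multi-file analysis as well.

```python
# Hypothetical evidence table: directory names that hint at each pattern.
PATTERN_HINTS = {
    "MVC": {"models", "views", "controllers"},
    "MVVM": {"models", "views", "viewmodels"},
    "Repository": {"repositories"},
}


def score_patterns(directory_names):
    """Return {pattern: confidence}, where confidence is the share of hints found."""
    present = {name.lower() for name in directory_names}
    scores = {}
    for pattern, hints in PATTERN_HINTS.items():
        found = hints & present
        if found:
            scores[pattern] = round(len(found) / len(hints), 2)
    return scores


print(score_patterns(["models", "views", "controllers", "static"]))
```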
Changed
- BREAKING: Analysis Features Now Default ON - Improved UX for codebase analysis
- All analysis features (API reference, dependency graph, patterns, test examples) are now enabled by default
- Changed flag pattern from --build-* to --skip-* for better discoverability
- Old flags (DEPRECATED): --build-api-reference, --build-dependency-graph, --detect-patterns, --extract-test-examples
- New flags: --skip-api-reference, --skip-dependency-graph, --skip-patterns, --skip-test-examples
- Migration: Remove old --build-* flags from your scripts (features are now ON by default)
- Backward compatibility: Deprecated flags show warnings but still work (will be removed in v3.0.0)
- Rationale: Users should get maximum value by default; explicitly opt-out if needed
- Impact: codebase-scraper --directory . now runs all analysis features automatically
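A minimal sketch of how an opt-out CLI with deprecated opt-in flags could be wired up with argparse. The flag names come from the list above; the parser layout, the warning text, and the helper names are assumptions, not the scraper's real implementation.

```python
import argparse
import warnings

FEATURES = ["api-reference", "dependency-graph", "patterns", "test-examples"]
# Old opt-in flags named in the changelog; accepted for compatibility but ignored.
DEPRECATED = ["--build-api-reference", "--build-dependency-graph",
              "--detect-patterns", "--extract-test-examples"]


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="codebase-scraper")
    parser.add_argument("--directory", default=".")
    for feature in FEATURES:
        parser.add_argument(f"--skip-{feature}", action="store_true")
    for flag in DEPRECATED:
        parser.add_argument(flag, action="store_true", help=argparse.SUPPRESS)
    args = parser.parse_args(argv)
    for flag in DEPRECATED:
        if getattr(args, flag.lstrip("-").replace("-", "_")):
            warnings.warn(f"{flag} is deprecated; the feature is now ON by default",
                          DeprecationWarning, stacklevel=2)
    return args


def enabled_features(args):
    """Every feature runs unless its --skip-* flag was given."""
    return [f for f in FEATURES if not getattr(args, "skip_" + f.replace("-", "_"))]


print(enabled_features(parse_args(["--skip-patterns"])))
```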
Fixed
- Codebase Scraper Language Stats - Fixed dict format handling in _get_language_stats()
- Issue: AttributeError: 'dict' object has no attribute 'suffix' when generating SKILL.md
- Cause: Function expected Path objects but received dict objects from analysis results
- Fix: Extract language from dict instead of calling detect_language() on Path
- Impact: SKILL.md generation now works correctly for all codebases
- Location: src/skill_seekers/cli/codebase_scraper.py:778
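The shape of this fix can be illustrated as follows. This is a sketch only: it assumes each analysis result is a dict with a "language" key, which the changelog does not confirm, and it is not the code at the location cited.

```python
from collections import Counter


def get_language_stats(files):
    """Count files per language from analysis-result dicts.

    Assumption: each dict carries a "language" key; the real field names
    in codebase_scraper.py may differ.
    """
    return dict(Counter(entry.get("language", "Unknown") for entry in files))


results = [
    {"file": "app.py", "language": "Python"},
    {"file": "util.py", "language": "Python"},
    {"file": "main.go", "language": "Go"},
]
print(get_language_stats(results))  # {'Python': 2, 'Go': 1}
```

Reading the language from the dict avoids calling a Path-only attribute such as .suffix on a dict, which is what raised the AttributeError.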
Removed
[2.5.2] - 2025-12-31
🔧 Package Configuration Improvement
This patch release improves the packaging configuration by switching from manual package listing to automatic package discovery, preventing similar issues in the future.
Changed
- Package Discovery: Switched from manual package listing to automatic discovery in pyproject.toml (#227)
- Before: Manually listed 5 packages (error-prone when adding new modules)
- After: Automatic discovery using [tool.setuptools.packages.find]
- Benefits: Future-proof, prevents missing module bugs, follows Python packaging best practices
- Impact: No functional changes, same packages included
- Credit: Thanks to @iamKhan79690 for the improvement!
Package Structure
No changes to package contents - all modules from v2.5.1 are still included:
- ✅ skill_seekers (core)
- ✅ skill_seekers.cli (CLI tools)
- ✅ skill_seekers.cli.adaptors (platform adaptors)
- ✅ skill_seekers.mcp (MCP server)
- ✅ skill_seekers.mcp.tools (MCP tools)
Related Issues
- Closes #226 - MCP server package_skill tool fails (already fixed in v2.5.1, improved by this release)
- Merges #227 - Update setuptools configuration to include adaptors module
Contributors
- @iamKhan79690 - Automatic package discovery implementation
[2.5.1] - 2025-12-30
🐛 Critical Bug Fix - PyPI Package Broken
This patch release fixes a critical packaging bug that made v2.5.0 completely unusable for PyPI users.
Fixed
- CRITICAL: Added missing skill_seekers.cli.adaptors module to packages list in pyproject.toml (#221)
- Issue: v2.5.0 on PyPI throws ModuleNotFoundError: No module named 'skill_seekers.cli.adaptors'
- Impact: Broke 100% of multi-platform features (Claude, Gemini, OpenAI, Markdown)
- Cause: The adaptors module was missing from the explicit packages list
- Fix: Added skill_seekers.cli.adaptors to packages in pyproject.toml
- Credit: Thanks to @MiaoDX for finding and fixing this issue!
Package Structure
The skill_seekers.cli.adaptors module contains the platform adaptor architecture:
- base.py - Abstract base class for all adaptors
- claude.py - Claude AI platform implementation
- gemini.py - Google Gemini platform implementation
- openai.py - OpenAI ChatGPT platform implementation
- markdown.py - Generic markdown export
Note: v2.5.0 is broken on PyPI. All users should upgrade to v2.5.1 immediately.
[2.5.0] - 2025-12-28
🚀 Multi-Platform Feature Parity - 4 LLM Platforms Supported
This major feature release adds complete multi-platform support for Claude AI, Google Gemini, OpenAI ChatGPT, and Generic Markdown export. All features now work across all platforms with full feature parity.
🎯 Major Features
Multi-LLM Platform Support
- 4 platforms supported: Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
- Complete feature parity: All skill modes work with all platforms
- Platform adaptors: Clean architecture with platform-specific implementations
- Unified workflow: Same scraping output works for all platforms
- Smart enhancement: Platform-specific AI models (Claude Sonnet 4, Gemini 2.0 Flash, GPT-4o)
Platform-Specific Capabilities
Claude AI (Default):
- Format: ZIP with YAML frontmatter + markdown
- Upload: Anthropic Skills API
- Enhancement: Claude Sonnet 4 (local or API)
- MCP integration: Full support
Google Gemini:
- Format: tar.gz with plain markdown
- Upload: Google Files API + Grounding
- Enhancement: Gemini 2.0 Flash
- Long context: 1M tokens supported
OpenAI ChatGPT:
- Format: ZIP with assistant instructions
- Upload: Assistants API + Vector Store
- Enhancement: GPT-4o
- File search: Semantic search enabled
Generic Markdown:
- Format: ZIP with pure markdown
- Upload: Manual distribution
- Universal compatibility: Works with any LLM
Complete Feature Parity
All skill modes work with all platforms:
- Documentation scraping → All 4 platforms
- GitHub repository analysis → All 4 platforms
- PDF extraction → All 4 platforms
- Unified multi-source → All 4 platforms
- Local repository analysis → All 4 platforms
18 MCP tools with multi-platform support:
- package_skill - Now accepts target parameter (claude, gemini, openai, markdown)
- upload_skill - Now accepts target parameter (claude, gemini, openai)
- enhance_skill - NEW standalone tool with target parameter
- install_skill - Full multi-platform workflow automation
Added
Core Infrastructure
- Platform Adaptors (src/skill_seekers/cli/adaptors/)
- base_adaptor.py - Abstract base class for all adaptors
- claude_adaptor.py - Claude AI implementation
- gemini_adaptor.py - Google Gemini implementation
- openai_adaptor.py - OpenAI ChatGPT implementation
- markdown_adaptor.py - Generic Markdown export
- __init__.py - Factory function get_adaptor(target)
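The adaptor layout reads like a classic factory over an abstract base class. The sketch below mirrors that idea under stated assumptions: only the name get_adaptor(target), the four targets, and the ZIP versus tar.gz split come from this changelog; the class names and the archive_extension method are invented for illustration.

```python
from abc import ABC, abstractmethod


class BaseAdaptor(ABC):
    """Abstract base: each platform declares how its package is archived."""

    @abstractmethod
    def archive_extension(self) -> str:
        """File extension of the packaged skill."""


class ClaudeAdaptor(BaseAdaptor):
    def archive_extension(self) -> str:
        return ".zip"


class GeminiAdaptor(BaseAdaptor):
    def archive_extension(self) -> str:
        return ".tar.gz"


class OpenAIAdaptor(BaseAdaptor):
    def archive_extension(self) -> str:
        return ".zip"


class MarkdownAdaptor(BaseAdaptor):
    def archive_extension(self) -> str:
        return ".zip"


_ADAPTORS = {
    "claude": ClaudeAdaptor,
    "gemini": GeminiAdaptor,
    "openai": OpenAIAdaptor,
    "markdown": MarkdownAdaptor,
}


def get_adaptor(target="claude"):
    """Return an adaptor instance for the target platform (default: claude)."""
    try:
        return _ADAPTORS[target]()
    except KeyError:
        raise ValueError(
            f"Unknown target {target!r}; expected one of {sorted(_ADAPTORS)}"
        ) from None


print(get_adaptor("gemini").archive_extension())  # .tar.gz
```

Keeping the registry in one dict means a new platform only needs a new class plus one entry.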
CLI Tools
- Multi-platform packaging: skill-seekers package output/skill/ --target gemini
- Multi-platform upload: skill-seekers upload skill.zip --target openai
- Multi-platform enhancement: skill-seekers enhance output/skill/ --target gemini --mode api
- Target parameter: All packaging tools now accept --target flag
MCP Tools
- enhance_skill (NEW) - Standalone AI enhancement tool
- Supports local mode (Claude Code Max, no API key)
- Supports API mode (platform-specific APIs)
- Works with Claude, Gemini, OpenAI
- Creates SKILL.md.backup before enhancement
- package_skill (UPDATED) - Multi-platform packaging
- New target parameter (claude, gemini, openai, markdown)
- Creates ZIP for Claude/OpenAI/Markdown
- Creates tar.gz for Gemini
- Shows platform-specific output messages
- upload_skill (UPDATED) - Multi-platform upload
- New target parameter (claude, gemini, openai)
- Platform-specific API key validation
- Returns skill ID and platform URL
- Graceful error for markdown (no upload)
Documentation
- docs/FEATURE_MATRIX.md (NEW) - Comprehensive feature matrix
- Platform support comparison table
- Skill mode support across platforms
- CLI command support matrix
- MCP tool support matrix
- Platform-specific examples
- Verification checklist
- docs/UPLOAD_GUIDE.md (REWRITTEN) - Multi-platform upload guide
- Complete guide for all 4 platforms
- Platform selection table
- API key setup instructions
- Platform comparison matrices
- Complete workflow examples
- docs/ENHANCEMENT.md (UPDATED)
- Multi-platform enhancement section
- Platform-specific model information
- Cost comparison across platforms
- docs/MCP_SETUP.md (UPDATED)
- Added enhance_skill to tool listings
- Multi-platform usage examples
- Updated tool count (10 → 18 tools)
- src/skill_seekers/mcp/README.md (UPDATED)
- Corrected tool count (18 tools)
- Added enhance_skill documentation
- Updated package_skill with target parameter
- Updated upload_skill with target parameter
Optional Dependencies
- [gemini] extra: pip install skill-seekers[gemini]
- google-generativeai>=0.8.3
- Required for Gemini enhancement and upload
- [openai] extra: pip install skill-seekers[openai]
- openai>=1.59.6
- Required for OpenAI enhancement and upload
- [all-llms] extra: pip install skill-seekers[all-llms]
- Includes both Gemini and OpenAI dependencies
Tests
- tests/test_adaptors.py - Comprehensive adaptor tests
- tests/test_multi_llm_integration.py - E2E multi-platform tests
- tests/test_install_multiplatform.py - Multi-platform install_skill tests
- 700 total tests passing (up from 427 in v2.4.0)
Changed
CLI Architecture
- Package command: Now routes through platform adaptors
- Upload command: Now supports all 3 upload platforms
- Enhancement command: Now supports platform-specific models
- Unified workflow: All commands respect --target parameter
MCP Architecture
- Tool modularity: Cleaner separation with adaptor pattern
- Error handling: Platform-specific error messages
- API key validation: Per-platform validation logic
- TextContent fallback: Graceful degradation when MCP not installed
Documentation
- All platform documentation updated for multi-LLM support
- Consistent terminology across all docs
- Platform comparison tables added
- Examples updated to show all platforms
Fixed
- TextContent import error in test environment (5 MCP tool files)
- Added fallback TextContent class when MCP not installed
- Prevents TypeError: 'NoneType' object is not callable
- Ensures tests pass without MCP library
- UTF-8 encoding issues on Windows (continued from v2.4.0)
- All file operations use explicit UTF-8 encoding
- CHANGELOG encoding handling improved
- API key environment variables - Clear documentation for all platforms
- ANTHROPIC_API_KEY for Claude
- GOOGLE_API_KEY for Gemini
- OPENAI_API_KEY for OpenAI
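The TextContent fallback mentioned in the first fix follows a common optional-import pattern. The sketch below assumes the stand-in only needs the type and text fields; the fallback class actually shipped in the five tool files may look different, and make_reply is a hypothetical helper.

```python
try:
    from mcp.types import TextContent  # real class when the MCP SDK is installed
except ImportError:
    from dataclasses import dataclass

    @dataclass
    class TextContent:  # minimal stand-in (assumed shape: type + text)
        type: str
        text: str


def make_reply(message):
    """Build a tool response that works with or without the MCP library."""
    return [TextContent(type="text", text=message)]


print(make_reply("ok")[0].text)
```

Without a fallback, a name left as None after a failed import is what produces the 'NoneType' object is not callable error at call time.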
Other Improvements
Smart Description Generation
- Automatically generates skill descriptions from documentation
- Analyzes reference files to suggest "When to Use" triggers
- Improves SKILL.md quality without manual editing
Smart Summarization
- Large skills (500+ lines) automatically summarized
- Preserves key examples and patterns
- Maintains quality while reducing token usage
Deprecation Notice
None - All changes are backward compatible. Existing v2.4.0 workflows continue to work with default target='claude'.
Migration Guide
For users upgrading from v2.4.0:
- No changes required - Default behavior unchanged (targets Claude AI)
- To use other platforms:
# Install platform dependencies
pip install skill-seekers[gemini]    # For Gemini
pip install skill-seekers[openai]    # For OpenAI
pip install skill-seekers[all-llms]  # For all platforms
# Set API keys
export GOOGLE_API_KEY=AIzaSy...      # For Gemini
export OPENAI_API_KEY=sk-proj-...    # For OpenAI
# Use --target flag
skill-seekers package output/react/ --target gemini
skill-seekers upload react-gemini.tar.gz --target gemini
- MCP users - New tools available:
- enhance_skill - Standalone enhancement (was only in install_skill)
- All packaging tools now accept target parameter
See full documentation:
Contributors
- @yusufkaraaslan - Multi-platform architecture, all platform adaptors, comprehensive testing
Stats
- 16 commits since v2.4.0
- 700 tests (up from 427, +273 new tests)
- 4 platforms supported (was 1)
- 18 MCP tools (up from 17)
- 5 documentation guides updated/created
- 29 files changed, 6,349 insertions(+), 253 deletions(-)
[2.4.0] - 2025-12-25
🚀 MCP 2025 Upgrade - Multi-Agent Support & HTTP Transport
This major release upgrades the MCP infrastructure to the 2025 specification with support for 5 AI coding agents, dual transport modes (stdio + HTTP), and a complete FastMCP refactor.
🎯 Major Features
MCP SDK v1.25.0 Upgrade
- Upgraded from v1.18.0 to v1.25.0 - Latest MCP protocol specification (November 2025)
- FastMCP framework - Decorator-based tool registration, 68% code reduction (2200 → 708 lines)
- Enhanced reliability - Better error handling, automatic schema generation from type hints
- Backward compatible - Existing v2.3.0 configurations continue to work
Dual Transport Support
- stdio transport (default) - Standard input/output for Claude Code, VS Code + Cline
- HTTP transport (new) - Server-Sent Events for Cursor, Windsurf, IntelliJ IDEA
- Health check endpoint - GET /health for monitoring
- SSE endpoint - GET /sse for real-time communication
- Configurable server - --http, --port, --host, --log-level flags
- uvicorn-powered - Production-ready ASGI server
Multi-Agent Auto-Configuration
- 5 AI agents supported:
- Claude Code (stdio)
- Cursor (HTTP)
- Windsurf (HTTP)
- VS Code + Cline (stdio)
- IntelliJ IDEA (HTTP)
- Automatic detection - agent_detector.py scans for installed agents
- One-command setup - ./setup_mcp.sh configures all detected agents
- Smart config merging - Preserves existing MCP servers, only adds skill-seeker
- Automatic backups - Timestamped backups before modifications
- HTTP server management - Auto-starts HTTP server for HTTP-based agents
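"Smart config merging" plus "automatic backups" can be sketched as: copy the existing file to a timestamped backup, then add one entry under mcpServers without touching the others. The real setup is done by setup_mcp.sh; this Python version, its function name, and the backup naming scheme are assumptions for illustration.

```python
import json
import shutil
import time
from pathlib import Path


def merge_mcp_config(config_path, server_name, server_entry):
    """Add one server entry, keeping existing ones and backing up the file first."""
    path = Path(config_path)
    config = {}
    if path.exists():
        stamp = time.strftime("%Y%m%d_%H%M%S")
        backup = path.with_name(f"{path.name}.backup.{stamp}")
        shutil.copy2(path, backup)  # timestamped backup before any change
        config = json.loads(path.read_text(encoding="utf-8"))
    # Only the named entry is added or replaced; other servers are preserved.
    config.setdefault("mcpServers", {})[server_name] = server_entry
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

A call such as merge_mcp_config(path, "skill-seeker", {"command": "python", "args": ["-m", "skill_seekers.mcp.server_fastmcp"]}) would leave any other configured servers in place.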
Expanded Tool Suite (17 Tools)
- Config Tools (3): generate_config, list_configs, validate_config
- Scraping Tools (4): estimate_pages, scrape_docs, scrape_github, scrape_pdf
- Packaging Tools (3): package_skill, upload_skill, install_skill
- Splitting Tools (2): split_config, generate_router
- Source Tools (5): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
Added
Core Infrastructure
- server_fastmcp.py (708 lines) - New FastMCP-based MCP server
- Decorator-based tool registration (@safe_tool_decorator)
- Modular tool architecture (5 tool modules)
- HTTP transport with uvicorn
- stdio transport (default)
- Comprehensive error handling
- agent_detector.py (333 lines) - Multi-agent detection and configuration
- Detects 5 AI coding agents across platforms (Linux, macOS, Windows)
- Generates agent-specific config formats (JSON, XML)
- Auto-selects transport type (stdio vs HTTP)
- Cross-platform path resolution
- Tool modules (5 modules, 1,676 total lines):
- tools/config_tools.py (249 lines) - Configuration management
- tools/scraping_tools.py (423 lines) - Documentation scraping
- tools/packaging_tools.py (514 lines) - Skill packaging and upload
- tools/splitting_tools.py (195 lines) - Config splitting and routing
- tools/source_tools.py (295 lines) - Config source management
Setup & Configuration
- setup_mcp.sh (rewritten, 661 lines) - Multi-agent auto-configuration
- Detects installed agents automatically
- Offers to configure all agents or select individual ones
- Manages HTTP server startup
- Smart config merging with existing configurations
- Comprehensive validation and testing
- HTTP server - Production-ready HTTP transport
- Health endpoint: /health
- SSE endpoint: /sse
- Messages endpoint: /messages/
- CORS middleware for cross-origin requests
- Configurable host and port
- Debug logging support
Documentation
- docs/MCP_SETUP.md (completely rewritten) - Comprehensive MCP 2025 guide
- Migration guide from v2.3.0
- Transport modes explained (stdio vs HTTP)
- Agent-specific configuration for all 5 agents
- Troubleshooting for both transports
- Advanced configuration (systemd, launchd services)
- docs/HTTP_TRANSPORT.md (434 lines, new) - HTTP transport guide
- docs/MULTI_AGENT_SETUP.md (643 lines, new) - Multi-agent setup guide
- docs/SETUP_QUICK_REFERENCE.md (387 lines, new) - Quick reference card
- SUMMARY_HTTP_TRANSPORT.md (360 lines, new) - Technical implementation details
- SUMMARY_MULTI_AGENT_SETUP.md (556 lines, new) - Multi-agent technical summary
Testing
- test_mcp_fastmcp.py (960 lines, 63 tests) - Comprehensive FastMCP server tests
- All 17 tools tested
- Error handling validation
- Type validation
- Integration workflows
- test_server_fastmcp_http.py (165 lines, 6 tests) - HTTP transport tests
- Health check endpoint
- SSE endpoint
- CORS middleware
- Argument parsing
- Test results: 602/609 tests passing (98.9% pass rate)
Changed
MCP Server Architecture
- Refactored to FastMCP - Decorator-based, modular, maintainable
- Code reduction - 68% smaller (2200 → 708 lines)
- Modular tools - Separated into 5 category modules
- Type safety - Full type hints on all tool functions
- Improved error handling - Graceful degradation, clear error messages
Server Compatibility
- server.py - Now a compatibility shim (delegates to server_fastmcp.py)
- Deprecation warning - Alerts users to migrate to server_fastmcp
- Backward compatible - Existing configurations continue to work
- Migration path - Clear upgrade instructions in docs
Setup Experience
- Multi-agent workflow - One script configures all agents
- Interactive prompts - User-friendly with sensible defaults
- Validation - Config file validation before writing
- Backup safety - Automatic timestamped backups
- Color-coded output - Visual feedback (success/warning/error)
Documentation
- README.md - Added comprehensive multi-agent section
- MCP_SETUP.md - Completely rewritten for v2.4.0
- CLAUDE.md - Updated with new server details
- Version badges - Updated to v2.4.0
Fixed
- Import issues in test files (updated to use new tool modules)
- CLI version test (updated to expect v2.3.0)
- Graceful MCP import handling (no sys.exit on import)
- Server compatibility for testing environments
Deprecated
- server.py - Use server_fastmcp.py instead
- Compatibility shim provided
- Will be removed in v3.0.0 (6+ months)
- Migration guide available
Infrastructure
- Python 3.10+ - Recommended for best compatibility
- MCP SDK: v1.25.0 (pinned to v1.x)
- uvicorn: v0.40.0+ (for HTTP transport)
- starlette: v0.50.0+ (for HTTP transport)
Migration from v2.3.0
Upgrade Steps:
- Update dependencies: pip install -e ".[mcp]"
- Update MCP config to use server_fastmcp:
{
  "mcpServers": {
    "skill-seeker": {
      "command": "python",
      "args": ["-m", "skill_seekers.mcp.server_fastmcp"]
    }
  }
}
- For HTTP agents, start HTTP server: python -m skill_seekers.mcp.server_fastmcp --http
- Or use auto-configuration: ./setup_mcp.sh
Breaking Changes: None - fully backward compatible
New Capabilities:
- Multi-agent support (5 agents)
- HTTP transport for web-based agents
- 8 new MCP tools
- Automatic agent detection and configuration
Contributors
- Implementation: Claude Sonnet 4.5
- Testing & Review: @yusufkaraaslan
[2.3.0] - 2025-12-22
🤖 Multi-Agent Installation Support
This release adds automatic skill installation to 10+ AI coding agents with a single command.
Added
- Multi-agent installation support (#210)
- New install-agent command to install skills to any AI coding agent
- Support for 10+ agents: Claude Code, Cursor, VS Code, Amp, Goose, OpenCode, Letta, Aide, Windsurf
- --agent all flag to install to all agents at once
- --force flag to overwrite existing installations
- --dry-run flag to preview installations
- Intelligent path resolution (global vs project-relative)
- Fuzzy matching for agent names with suggestions
- Comprehensive error handling and user feedback
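"Fuzzy matching for agent names with suggestions" could be built on the standard library's difflib. The agent identifiers and the 0.6 cutoff below are illustrative assumptions; the install-agent command may use different names and a different matching strategy.

```python
import difflib

# Agent identifiers are illustrative; the real CLI may spell them differently.
KNOWN_AGENTS = ["claude", "cursor", "vscode", "amp", "goose",
                "opencode", "letta", "aide", "windsurf"]


def suggest_agents(name, limit=3):
    """Return known agent names that closely resemble the given input."""
    return difflib.get_close_matches(name.lower(), KNOWN_AGENTS, n=limit, cutoff=0.6)


print(suggest_agents("cursr"))
```

An empty result means nothing was close enough, in which case a CLI would typically fall back to listing every supported agent.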
Changed
- Skills are now compatible with the Agent Skills open standard (agentskills.io)
- Installation paths follow standard conventions for each agent
- CLI updated with install-agent subcommand
Documentation
- Added multi-agent installation guide to README.md
- Updated CLAUDE.md with install-agent examples
- Added agent compatibility table
Testing
- Added 32 comprehensive tests for install-agent functionality
- All tests passing (603 tests total, 86 skipped)
- No regressions in existing functionality
[2.2.0] - 2025-12-21
🚀 Private Config Repositories - Team Collaboration Unlocked
This major release adds git-based config sources, enabling teams to fetch configs from private/team repositories in addition to the public API. This unlocks team collaboration, enterprise deployment, and custom config collections.
🎯 Major Features
Git-Based Config Sources (Issue #211)
- Multi-source config management - Fetch from API, git URL, or named sources
- Private repository support - GitHub, GitLab, Bitbucket, Gitea, and custom git servers
- Team collaboration - Share configs across 3-5 person teams with version control
- Enterprise scale - Support 500+ developers with priority-based resolution
- Secure authentication - Environment variable tokens only (GITHUB_TOKEN, GITLAB_TOKEN, etc.)
- Intelligent caching - Shallow clone (10-50x faster), auto-pull updates
- Offline mode - Works with cached repos when offline
- Backward compatible - Existing API-based configs work unchanged
New MCP Tools
- add_config_source - Register git repositories as config sources
- Auto-detects source type (GitHub, GitLab, etc.)
- Auto-selects token environment variable
- Priority-based resolution for multiple sources
- SSH URL support (auto-converts to HTTPS + token)
- list_config_sources - View all registered sources
- Shows git URL, branch, priority, token env
- Filter by enabled/disabled status
- Sorted by priority (lower = higher priority)
- remove_config_source - Unregister sources
- Removes from registry (cache preserved for offline use)
- Helpful error messages with available sources
- Enhanced fetch_config - Three modes
- Named source mode - fetch_config(source="team", config_name="react-custom")
- Git URL mode - fetch_config(git_url="https://...", config_name="react-custom")
- API mode - fetch_config(config_name="react") (unchanged)
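Priority-based resolution across several sources can be sketched as a sorted lookup in which the lowest priority number wins. The dict keys used here (name, priority, enabled, configs) are hypothetical and do not reflect the actual sources.json schema.

```python
def resolve_config(config_name, sources):
    """Return (source_name, config) from the highest-priority enabled source.

    sources: list of dicts with hypothetical keys name, priority, enabled, configs.
    """
    enabled = [s for s in sources if s.get("enabled", True)]
    for source in sorted(enabled, key=lambda s: s["priority"]):  # lower number wins
        if config_name in source["configs"]:
            return source["name"], source["configs"][config_name]
    raise KeyError(f"{config_name!r} not found in any enabled source")


sources = [
    {"name": "mobile", "priority": 2, "configs": {"platform-api": {"owner": "mobile"}}},
    {"name": "platform", "priority": 1, "configs": {"platform-api": {"owner": "platform"}}},
]
print(resolve_config("platform-api", sources)[0])  # platform
```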
Added
Core Infrastructure
- GitConfigRepo class (src/skill_seekers/mcp/git_repo.py, 283 lines)
- clone_or_pull() - Shallow clone with auto-pull and force refresh
- find_configs() - Recursive *.json discovery (excludes .git)
- get_config() - Load config with case-insensitive matching
- inject_token() - Convert SSH to HTTPS with token authentication
- validate_git_url() - Support HTTPS, SSH, and file:// URLs
- Comprehensive error handling (auth failures, missing repos, corrupted caches)
- SourceManager class (src/skill_seekers/mcp/source_manager.py, 260 lines)
- add_source() - Register/update sources with validation
- get_source() - Retrieve by name with helpful errors
- list_sources() - List all/enabled sources sorted by priority
- remove_source() - Unregister sources
- update_source() - Modify specific fields
- Atomic file I/O (write to temp, then rename)
- Auto-detect token env vars from source type
Storage & Caching
- Registry file: ~/.skill-seekers/sources.json
- Stores source metadata (URL, branch, priority, timestamps)
- Version-controlled schema (v1.0)
- Atomic writes prevent corruption
- Cache directory: $SKILL_SEEKERS_CACHE_DIR (default: ~/.skill-seekers/cache/)
- One subdirectory per source
- Shallow git clones (depth=1, single-branch)
- Configurable via environment variable
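The "write to temp, then rename" approach behind the atomic registry writes can be sketched with the standard library. The function name is hypothetical and the JSON layout is left to the caller; this is not the SourceManager implementation itself.

```python
import json
import os
import tempfile


def write_registry_atomically(path, data):
    """Write JSON to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # leave no partial file behind
        raise
```

Creating the temp file next to the target matters: a rename across filesystems is not atomic, so a reader could otherwise see a half-written registry.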
Documentation
- docs/GIT_CONFIG_SOURCES.md (800+ lines) - Comprehensive guide
- Quick start, architecture, authentication
- MCP tools reference with examples
- Use cases (small teams, enterprise, open source)
- Best practices, troubleshooting, advanced topics
- Complete API reference
- configs/example-team/ - Example repository for testing
- react-custom.json - Custom React config with metadata
- vue-internal.json - Internal Vue config
- company-api.json - Company API config example
- README.md - Usage guide and best practices
- test_e2e.py - End-to-end test script (7 steps, 100% passing)
- README.md - Updated with git source examples
- New "Private Config Repositories" section in Key Features
- Comprehensive usage examples (quick start, team collaboration, enterprise)
- Supported platforms and authentication
- Example workflows for different team sizes
Dependencies
- GitPython>=3.1.40 - Git operations (clone, pull, branch switching)
- Replaces subprocess calls with high-level API
- Better error handling and cross-platform support
Testing
- 83 new tests (100% passing)
- tests/test_git_repo.py (35 tests) - GitConfigRepo functionality
- Initialization, URL validation, token injection
- Clone/pull operations, config discovery, error handling
- tests/test_source_manager.py (48 tests) - SourceManager functionality
- Add/get/list/remove/update sources
- Registry persistence, atomic writes, default token env
- tests/test_mcp_git_sources.py (18 tests) - MCP integration
- All 3 fetch modes (API, Git URL, Named Source)
- Source management tools (add/list/remove)
- Complete workflow (add → fetch → remove)
- Error scenarios (auth failures, missing configs)
Improved
- MCP server - Now supports 12 tools (up from 9)
- Maintains backward compatibility
- Enhanced error messages with available sources
- Priority-based config resolution
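One way to picture priority-based resolution is the sketch below. It assumes that a lower priority number wins and that each source exposes the set of config names it holds; the `Source` class and `resolve_config` function are hypothetical illustrations, not the MCP server's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    priority: int  # assumption: lower number means higher priority
    configs: set = field(default_factory=set)


def resolve_config(config_name, sources):
    """Return the name of the first source, by ascending priority, that has the config."""
    for source in sorted(sources, key=lambda s: s.priority):
        if config_name in source.configs:
            return source.name
    # Mirror the "error messages with available sources" idea from the entry.
    available = ", ".join(s.name for s in sources)
    raise LookupError(f"{config_name!r} not found; available sources: {available}")
```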
Use Cases
Small Teams (3-5 people):
# One-time setup
add_config_source(name="team", git_url="https://github.com/myteam/configs.git")
# Daily usage
fetch_config(source="team", config_name="react-internal")
Enterprise (500+ developers):
# IT pre-configures sources
add_config_source(name="platform", ..., priority=1)
add_config_source(name="mobile", ..., priority=2)
# Developers use transparently
fetch_config(config_name="platform-api") # Finds in platform source
Example Repository:
cd /path/to/Skill_Seekers
python3 configs/example-team/test_e2e.py # Test E2E workflow
Backward Compatibility
- ✅ All existing configs work unchanged
- ✅ API mode still default (no registration needed)
- ✅ No breaking changes to MCP tools or CLI
- ✅ New parameters are optional (git_url, source, refresh)
Security
- ✅ Tokens via environment variables only (not in files)
- ✅ Shallow clones minimize attack surface
- ✅ No token storage in registry file
- ✅ Secure token injection (auto-converts SSH to HTTPS)
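The SSH-to-HTTPS conversion mentioned above can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and the `https://TOKEN@host/path` form is one common way to pass a token to a git host, not necessarily the form this project uses.

```python
import re


def inject_token(git_url, token):
    """Rewrite an SSH or HTTPS git URL into an HTTPS URL carrying the token."""
    ssh = re.match(r"^git@([^:]+):(.+)$", git_url)
    if ssh:
        # git@host:owner/repo.git -> https://host/owner/repo.git
        host, path = ssh.groups()
        git_url = f"https://{host}/{path}"
    if not git_url.startswith("https://"):
        raise ValueError(f"unsupported URL scheme: {git_url}")
    if not token:
        return git_url
    return "https://" + token + "@" + git_url[len("https://"):]
```

In keeping with the list above, the token would be read from an environment variable at call time and never written to the registry file.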
Performance
- ✅ Shallow clone: 10-50x faster than full clone
- ✅ Minimal disk space (no git history)
- ✅ Auto-pull: Only fetches changes (not full re-clone)
- ✅ Offline mode: Works with cached repos
Files Changed
- Modified (2): pyproject.toml, src/skill_seekers/mcp/server.py
- Added (6): 3 source files + 3 test files + 1 doc + 1 example repo
- Total lines added: ~2,600
Migration Guide
No migration needed! This is purely additive:
# Before v2.2.0 (still works)
fetch_config(config_name="react")
# New in v2.2.0 (optional)
add_config_source(name="team", git_url="...")
fetch_config(source="team", config_name="react-custom")
Known Limitations
- MCP async tests require pytest-asyncio (added to dev dependencies)
- Example repository uses 'master' branch (git init default)
See Also
- GIT_CONFIG_SOURCES.md - Complete guide
- configs/example-team/ - Example repository
- Issue #211 - Original feature request
[2.1.1] - 2025-11-30
Fixed
- submit_config MCP tool - Comprehensive validation and format support (#11)
- Now uses ConfigValidator for comprehensive validation (previously only checked 3 fields)
- Validates name format (alphanumeric, hyphens, underscores only)
- Validates URL formats (must start with http:// or https://)
- Validates selectors, patterns, rate limits, and max_pages
- Supports both legacy and unified config formats
- Provides detailed error messages with validation failures and examples
- Adds warnings for unlimited scraping configurations
- Enhanced category detection for multi-source configs
- 8 comprehensive test cases added to test_mcp_server.py
- Updated GitHub issue template with format type and validation warnings
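The name and URL rules listed above could look roughly like this. It is a sketch only: `validate_config` is a hypothetical stand-in for ConfigValidator, the `base_url` key is an assumed field name, and treating a missing `max_pages` as "unlimited" is an assumption made for illustration.

```python
import re

# Alphanumeric characters, hyphens, and underscores only.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def validate_config(config):
    """Collect human-readable errors and warnings for a config dict."""
    errors, warnings = [], []
    name = config.get("name", "")
    if not NAME_RE.match(name):
        errors.append(f"invalid name {name!r}: use letters, digits, hyphens, underscores")
    url = config.get("base_url", "")  # assumed field name
    if not url.startswith(("http://", "https://")):
        errors.append(f"invalid URL {url!r}: must start with http:// or https://")
    if config.get("max_pages") is None:  # assumed meaning of "unlimited"
        warnings.append("max_pages is unset: scraping is unlimited")
    return errors, warnings
```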
[2.1.1] - 2025-11-30
🚀 GitHub Repository Analysis Enhancements
This release significantly improves GitHub repository scraping with unlimited local analysis, configurable directory exclusions, and numerous bug fixes.
Added
- Configurable directory exclusions for local repository analysis (#203)
  - exclude_dirs_additional: Extend default exclusions with custom directories
  - exclude_dirs: Replace default exclusions entirely (advanced users)
  - 19 comprehensive tests covering all scenarios
  - Logging: INFO for extend mode, WARNING for replace mode
- Unlimited local repository analysis via local_repo_path configuration parameter
  - Auto-exclusion of virtual environments, build artifacts, and cache directories
  - Support for analyzing repositories without GitHub API rate limits (50 → unlimited files)
- Skip llms.txt option - Force HTML scraping even when llms.txt is detected (#198)
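The difference between the extend and replace exclusion modes can be sketched like this. The default directory set is illustrative only (the real defaults are not listed here), the function name is hypothetical, and letting `exclude_dirs` win when both keys are present is an assumption.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative defaults only; the project's real default list is not given here.
DEFAULT_EXCLUDED_DIRS = {"venv", ".venv", "node_modules", "__pycache__", "build", "dist"}


def resolve_excluded_dirs(config):
    """Apply exclude_dirs (replace mode) or exclude_dirs_additional (extend mode)."""
    if "exclude_dirs" in config:
        excluded = set(config["exclude_dirs"])
        # Replace mode drops the defaults, so it is logged at WARNING.
        logger.warning("Replacing default exclusions with %s", sorted(excluded))
        return excluded
    extra = set(config.get("exclude_dirs_additional", []))
    if extra:
        logger.info("Extending default exclusions with %s", sorted(extra))
    return DEFAULT_EXCLUDED_DIRS | extra
```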
Fixed
- Fixed logger initialization error causing AttributeError: 'NoneType' object has no attribute 'setLevel' (#190)
- Fixed 3 NoneType subscriptable errors in release tag parsing
- Fixed relative import paths causing ModuleNotFoundError
- Fixed hardcoded 50-file analysis limit preventing comprehensive code analysis
- Fixed GitHub API file tree limitation (140 → 345 files discovered)
- Fixed AST parser "not iterable" errors eliminating 100% of parsing failures (95 → 0 errors)
- Fixed virtual environment file pollution reducing file tree noise by 95%
- Fixed force_rescrape flag not checked before interactive prompt causing EOFError in CI/CD environments
Improved
- Increased code analysis coverage from 14% to 93.6% (+79.6 percentage points)
- Improved file discovery from 140 to 345 files (+146%)
- Improved class extraction from 55 to 585 classes (+964%)
- Improved function extraction from 512 to 2,784 functions (+444%)
- Test suite expanded to 427 tests (up from 391)
[2.1.0] - 2025-11-12
🎉 Major Enhancement: Quality Assurance + Race Condition Fixes
This release focuses on quality and reliability improvements, adding comprehensive quality checks and fixing critical race conditions in the enhancement workflow.
🚀 Major Features
Comprehensive Quality Checker
- Automatic quality checks before packaging - Validates skill quality before upload
- Quality scoring system - 0-100 score with A-F grades
- Enhancement verification - Checks for template text, code examples, sections
- Structure validation - Validates SKILL.md, references/ directory
- Content quality checks - YAML frontmatter, language tags, "When to Use" section
- Link validation - Validates internal markdown links
- Detailed reporting - Errors, warnings, and info messages with file locations
- CLI tool - skill-seekers-quality-checker with verbose and strict modes
Headless Enhancement Mode (Default)
- No terminal windows - Runs enhancement in background by default
- Proper waiting - Main console waits for enhancement to complete
- Timeout protection - 10-minute default timeout (configurable)
- Verification - Checks that SKILL.md was actually updated
- Progress messages - Clear status updates during enhancement
- Interactive mode available - --interactive-enhancement flag for terminal mode
Added
New CLI Tools
- quality_checker.py - Comprehensive skill quality validation
- Structure checks (SKILL.md, references/)
- Enhancement verification (code examples, sections)
- Content validation (frontmatter, language tags)
- Link validation (internal markdown links)
- Quality scoring (0-100 + A-F grade)
New Features
- Headless enhancement - skill-seekers-enhance runs in background by default
- Quality checks in packaging - Automatic validation before creating .zip
- MCP quality skip - MCP server skips interactive checks
- Enhanced error handling - Better error messages and timeout handling
Tests
- +12 quality checker tests - Comprehensive validation testing
- 391 total tests passing - Up from 379 in v2.0.0
- 0 test failures - All tests green
- CI improvements - Fixed macOS terminal detection tests
Changed
Enhancement Workflow
- Default mode changed - Headless mode is now default (was terminal mode)
- Waiting behavior - Main console waits for enhancement completion
- No race conditions - Fixed "Package your skill" message appearing too early
- Better progress - Clear status messages during enhancement
Package Workflow
- Quality checks added - Automatic validation before packaging
- User confirmation - Ask to continue if warnings/errors found
- Skip option - --skip-quality-check flag to bypass checks
- MCP context - Automatically skips checks in non-interactive contexts
CLI Arguments
- doc_scraper.py:
  - Updated --enhance-local help text (mentions headless mode)
  - Added --interactive-enhancement flag
- enhance_skill_local.py:
  - Changed default to headless=True
  - Added --interactive-enhancement flag
  - Added --timeout flag (default: 600 seconds)
- package_skill.py:
  - Added --skip-quality-check flag
Fixed
Critical Bugs
- Enhancement race condition - Main console no longer exits before enhancement completes
- MCP stdin errors - MCP server now skips interactive prompts
- Terminal detection tests - Fixed for headless mode default
Enhancement Issues
- Process detachment - subprocess.run() now waits properly instead of Popen()
- Timeout handling - Added timeout protection to prevent infinite hangs
- Verification - Checks file modification time and size to verify success
- Error messages - Better error handling and user-friendly messages
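The wait, timeout, and verification behaviour described above can be sketched with the standard library. The function name `run_and_verify` is hypothetical, and the command being run is whatever the caller supplies; only `subprocess.run`, the 600-second default, and the modification time and size check come from the entries above.

```python
import subprocess
import sys
from pathlib import Path


def run_and_verify(cmd, target, timeout=600):
    """Run cmd to completion, then check that target was actually modified."""
    target = Path(target)
    before = target.stat() if target.exists() else None
    try:
        # subprocess.run blocks until the child exits, unlike a bare Popen().
        result = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    if result.returncode != 0 or not target.exists():
        return False
    after = target.stat()
    if before is None:
        return after.st_size > 0
    # Success means the file's modification time or size changed.
    return after.st_mtime_ns != before.st_mtime_ns or after.st_size != before.st_size
```

A caller would pass the enhancement command and the path to SKILL.md, and only print the next-step message once this returns True.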
Test Fixes
- package_skill tests - Added skip_quality_check=True to prevent stdin errors
- Terminal detection tests - Updated to use headless=False for interactive tests
- MCP server tests - Fixed to skip quality checks in non-interactive context
Technical Details
New Modules
- src/skill_seekers/cli/quality_checker.py - Quality validation engine
- tests/test_quality_checker.py - 12 comprehensive tests
Modified Modules
- src/skill_seekers/cli/enhance_skill_local.py - Added headless mode
- src/skill_seekers/cli/doc_scraper.py - Updated enhancement integration
- src/skill_seekers/cli/package_skill.py - Added quality checks
- src/skill_seekers/mcp/server.py - Skip quality checks in MCP context
- tests/test_package_skill.py - Updated for quality checker
- tests/test_terminal_detection.py - Updated for headless default
Commits in This Release
- e279ed6 - Phase 1: Enhancement race condition fix (headless mode)
- 3272f9c - Phases 2 & 3: Quality checker implementation
- 2dd1027 - Phase 4: Tests (+12 quality checker tests)
- befcb89 - CI Fix: Skip quality checks in MCP context
- 67ab627 - CI Fix: Update terminal tests for headless default
Upgrade Notes
Breaking Changes
- Headless mode default - Enhancement now runs in background by default
  - Use --interactive-enhancement if you want the old terminal mode
  - Affects: skill-seekers-enhance and skill-seekers scrape --enhance-local
New Behavior
- Quality checks - Packaging now runs quality checks by default
  - May prompt for confirmation if warnings/errors found
  - Use --skip-quality-check to bypass (not recommended)
Recommendations
- Try headless mode - Faster and more reliable than terminal mode
- Review quality reports - Fix warnings before packaging
- Update scripts - Add --skip-quality-check to automated packaging scripts if needed
Migration Guide
If you want the old terminal mode behavior:
# Old (v2.0.0): Default was terminal mode
skill-seekers-enhance output/react/
# New (v2.1.0): Use --interactive-enhancement
skill-seekers-enhance output/react/ --interactive-enhancement
If you want to skip quality checks:
# Add --skip-quality-check to package command
skill-seekers-package output/react/ --skip-quality-check
[2.0.0] - 2025-11-11
🎉 Major Release: PyPI Publication + Modern Python Packaging
Skill Seekers is now available on PyPI! Install with: pip install skill-seekers
This is a major milestone release featuring complete restructuring for modern Python packaging, comprehensive testing improvements, and publication to the Python Package Index.
🚀 Major Changes
PyPI Publication
- Published to PyPI - https://pypi.org/project/skill-seekers/
- Installation: pip install skill-seekers or uv tool install skill-seekers
- No cloning required - Install globally or in virtual environments
- Automatic dependency management - All dependencies handled by pip/uv
Modern Python Packaging
- pyproject.toml-based configuration - Standard PEP 621 metadata
- src/ layout structure - Best practice package organization
- Entry point scripts - skill-seekers command available globally
- Proper dependency groups - Separate dev, test, and MCP dependencies
- Build backend - setuptools-based build with uv support
Unified CLI Interface
- Single skill-seekers command - Git-style subcommands
- Subcommands: scrape, github, pdf, unified, enhance, package, upload, estimate
- Consistent interface - All tools accessible through one entry point
- Help system - Comprehensive --help for all commands
Added
Testing Infrastructure
- 379 passing tests (up from 299) - Comprehensive test coverage
- 0 test failures - All tests passing successfully
- Test suite improvements:
- Fixed import paths for src/ layout
- Updated CLI tests for unified entry points
- Added package structure verification tests
- Fixed MCP server import tests
- Added pytest configuration in pyproject.toml
Documentation
- Updated README.md - PyPI badges, reordered installation options
- ROADMAP.md - Comprehensive roadmap with task-based approach
- Installation guides - Simplified with PyPI as primary method
- Testing documentation - How to run full test suite
Changed
Package Structure
- Moved to src/ layout:
  - src/skill_seekers/ - Main package
  - src/skill_seekers/cli/ - CLI tools
  - src/skill_seekers/mcp/ - MCP server
- Import paths updated - All imports use proper package structure
- Entry points configured - All CLI tools available as commands
Import Fixes
- Fixed merge_sources.py - Corrected conflict_detector import (.conflict_detector)
- Fixed MCP server tests - Updated to use skill_seekers.mcp.server imports
- Fixed test paths - All tests updated for src/ layout
Fixed
Critical Bugs
- Import path errors - Fixed relative imports in CLI modules
- MCP test isolation - Added proper MCP availability checks
- Package installation - Resolved entry point conflicts
- Dependency resolution - All dependencies properly specified
Test Improvements
- 17 test fixes - Updated for modern package structure
- MCP test guards - Proper skipif decorators for MCP tests
- CLI test updates - Accept both exit codes 0 and 2 for help
- Path validation - Tests verify correct package structure
Technical Details
Build System
- Build backend: setuptools.build_meta
- Build command: uv build
- Publish command: uv publish
- Distribution formats: wheel + source tarball
Dependencies
- Core: requests, beautifulsoup4, PyGithub, mcp, httpx
- PDF: PyMuPDF, Pillow, pytesseract
- Dev: pytest, pytest-cov, pytest-anyio, mypy
- MCP: mcp package for Claude Code integration
Migration Guide
For Users
Old way:
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -r requirements.txt
python3 cli/doc_scraper.py --config configs/react.json
New way:
pip install skill-seekers
skill-seekers scrape --config configs/react.json
For Developers
- Update imports: from cli.* → from skill_seekers.cli.*
- Use pip install -e ".[dev]" for development
- Run tests: python -m pytest
- Entry points instead of direct script execution
Breaking Changes
- CLI interface changed - Use skill-seekers command instead of python3 cli/...
- Import paths changed - Package now at skill_seekers.* instead of cli.*
- Installation method changed - PyPI recommended over git clone
Deprecations
- Direct script execution - Still works but deprecated (use skill-seekers command)
- Old import patterns - Legacy imports still work but will be removed in v3.0
Compatibility
- Python 3.10+ required
- Backward compatible - Old scripts still work with legacy CLI
- Config files - No changes required
- Output format - No changes to generated skills
[1.3.0] - 2025-10-26
Added - Refactoring & Performance Improvements
- Async/Await Support for Parallel Scraping (2-3x performance boost)
  - --async flag to enable async mode
  - async def scrape_page_async() method using httpx.AsyncClient
  - async def scrape_all_async() method with asyncio.gather()
  - Connection pooling for better performance
  - asyncio.Semaphore for concurrency control
  - Comprehensive async testing (11 new tests)
  - Full documentation in ASYNC_SUPPORT.md
  - Performance: ~55 pages/sec vs ~18 pages/sec (sync)
  - Memory: 40 MB vs 120 MB (66% reduction)
- Python Package Structure (Phase 0 Complete)
  - cli/__init__.py - CLI tools package with clean imports
  - skill_seeker_mcp/__init__.py - MCP server package (renamed from mcp/)
  - skill_seeker_mcp/tools/__init__.py - MCP tools subpackage
  - Proper package imports: from cli import constants
- Centralized Configuration Module
  - cli/constants.py with 18 configuration constants
  - DEFAULT_ASYNC_MODE, DEFAULT_RATE_LIMIT, DEFAULT_MAX_PAGES
  - Enhancement limits, categorization scores, file limits
  - All magic numbers now centralized and configurable
- Code Quality Improvements
- Converted 71 print() statements to proper logging calls
- Added type hints to all DocToSkillConverter methods
- Fixed all mypy type checking issues
- Installed types-requests for better type safety
- Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small)
- Automatic .txt → .md file extension conversion
- No content truncation: preserves complete documentation
- detect_all() method for finding all llms.txt variants
- get_proper_filename() for correct .md naming
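For the async scraping item in this release, the combination of asyncio.gather() and asyncio.Semaphore can be sketched as below. This is not the project's scrape_all_async(); the `fetch` coroutine is a stand-in that sleeps instead of issuing an httpx.AsyncClient request, and the names and concurrency limit are illustrative.

```python
import asyncio


async def fetch(url):
    """Stand-in for an httpx.AsyncClient GET; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def scrape_all(urls, max_concurrency=5):
    """Fetch all URLs concurrently, never running more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # The semaphore caps how many fetches are in flight at the same time.
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

The semaphore is what keeps a large URL list from opening hundreds of connections at once while still overlapping the waits.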
Changed
- _try_llms_txt() now downloads all available variants instead of just one
- Reference files now contain complete content (no 2500 char limit)
- Code samples now include full code (no 600 char limit)
- Test count increased from 207 to 299 (92 new tests)
- All print() statements replaced with logging (logger.info, logger.warning, logger.error)
- Better IDE support with proper package structure
- Code quality improved from 5.5/10 to 6.5/10
Fixed
- File extension bug: llms.txt files now saved as .md
- Content loss: 0% truncation (was 36%)
- Test isolation issues in test_async_scraping.py (proper cleanup with try/finally)
- Import issues: no more sys.path.insert() hacks needed
- .gitignore: added test artifacts (.pytest_cache, .coverage, htmlcov, etc.)
1.2.0 - 2025-10-23
🚀 PDF Advanced Features Release
Major enhancement to PDF extraction capabilities with Priority 2 & 3 features.
Added
Priority 2: Support More PDF Types
- OCR Support for Scanned PDFs
  - Automatic text extraction from scanned documents using Tesseract OCR
  - Fallback mechanism when page text < 50 characters
  - Integration with pytesseract and Pillow
  - Command: --ocr flag
  - New dependencies: Pillow==11.0.0, pytesseract==0.3.13
- Password-Protected PDF Support
  - Handle encrypted PDFs with password authentication
  - Clear error messages for missing/wrong passwords
  - Secure password handling
  - Command: --password PASSWORD flag
- Complex Table Extraction
  - Extract tables from PDFs using PyMuPDF's table detection
  - Capture table data as 2D arrays with metadata (bbox, row/col count)
  - Integration with skill references in markdown format
  - Command: --extract-tables flag
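The OCR fallback rule from the first item above (use OCR when a page yields fewer than 50 characters of text) can be sketched as follows. The function name is hypothetical, the OCR step is passed in as a callable rather than calling pytesseract directly, and stripping whitespace before measuring is an assumption.

```python
OCR_MIN_CHARS = 50  # threshold quoted in the OCR entry above


def page_text_with_ocr_fallback(extracted_text, run_ocr):
    """Return extracted_text unless it is too short, in which case call run_ocr()."""
    if len(extracted_text.strip()) >= OCR_MIN_CHARS:
        return extracted_text
    # Only pay the OCR cost for pages that look scanned.
    return run_ocr()
```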
Priority 3: Performance Optimizations
- Parallel Page Processing
  - 3x faster PDF extraction using ThreadPoolExecutor
  - Auto-detect CPU count or custom worker specification
  - Only activates for PDFs with > 5 pages
  - Commands: --parallel and --workers N flags
  - Benchmarks: 500-page PDF reduced from 4m 10s to 1m 15s
- Intelligent Caching
  - In-memory cache for expensive operations (text extraction, code detection, quality scoring)
  - 50% faster on re-runs
  - Command: --no-cache to disable (enabled by default)
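A rough sketch of the parallel page processing rule described above: threads are used only when a document has more than 5 pages, and the worker count falls back to the CPU count. The function and parameter names are hypothetical; `extract_page` stands in for whatever per-page work the extractor does.

```python
import os
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 5  # the entry says parallel mode only activates above 5 pages


def extract_pages(pages, extract_page, workers=None, parallel=True):
    """Apply extract_page to every page, using threads for larger documents."""
    if not parallel or len(pages) <= PARALLEL_THRESHOLD:
        return [extract_page(p) for p in pages]
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so page order is kept.
        return list(pool.map(extract_page, pages))
```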
New Documentation
- docs/PDF_ADVANCED_FEATURES.md (580 lines)
  - Complete usage guide for all advanced features
  - Installation instructions
  - Performance benchmarks showing 3x speedup
  - Best practices and troubleshooting
  - API reference with all parameters
Testing
- New test file: tests/test_pdf_advanced_features.py (568 lines, 26 tests)
  - TestOCRSupport (5 tests)
  - TestPasswordProtection (4 tests)
  - TestTableExtraction (5 tests)
  - TestCaching (5 tests)
  - TestParallelProcessing (4 tests)
  - TestIntegration (3 tests)
- Updated: tests/test_pdf_extractor.py (23 tests fixed and passing)
- Total PDF tests: 49/49 PASSING ✅ (100% pass rate)
Changed
- Enhanced cli/pdf_extractor_poc.py with all advanced features
- Updated requirements.txt with new dependencies
- Updated README.md with PDF advanced features usage
- Updated docs/TESTING.md with new test counts (142 total tests)
Performance Improvements
- 3.3x faster with parallel processing (8 workers)
- 1.7x faster on re-runs with caching enabled
- Support for unlimited page PDFs (no more 500-page limit)
Dependencies
- Added Pillow==11.0.0 for image processing
- Added pytesseract==0.3.13 for OCR support
- Tesseract OCR engine (system package, optional)
1.1.0 - 2025-10-22
🌐 Documentation Scraping Enhancements
Major improvements to documentation scraping with unlimited pages, parallel processing, and new configs.
Added
Unlimited Scraping & Performance
- Unlimited Page Scraping - Removed 500-page limit, now supports unlimited pages
- Parallel Scraping Mode - Process multiple pages simultaneously for faster scraping
- Dynamic Rate Limiting - Smart rate limit control to avoid server blocks
- CLI Utilities - New helper scripts for common tasks
New Configurations
- Ansible Core 2.19 - Complete Ansible documentation config
- Claude Code - Documentation for this very tool!
- Laravel 9.x - PHP framework documentation
Testing & Quality
- Comprehensive test coverage for CLI utilities
- Parallel scraping test suite
- Virtual environment setup documentation
- Thread-safety improvements
Fixed
- Thread-safety issues in parallel scraping
- CLI path references across all documentation
- Flaky upload_skill tests
- MCP server streaming subprocess implementation
Changed
- All CLI examples now use cli/ directory prefix
- Updated documentation structure
- Enhanced error handling
1.0.0 - 2025-10-19
🎉 First Production Release
This is the first production-ready release of Skill Seekers with complete feature set, full test coverage, and comprehensive documentation.
Added
Smart Auto-Upload Feature
- New upload_skill.py CLI tool for automatic API-based upload
- Enhanced package_skill.py with --upload flag
- Smart API key detection with graceful fallback
- Cross-platform folder opening in utils.py
- Helpful error messages instead of confusing errors
MCP Integration Enhancements
- 9 MCP tools (added upload_skill tool)
- mcp__skill-seeker__upload_skill - Upload .zip files to Claude automatically
- Enhanced package_skill tool with smart auto-upload parameter
- Updated all MCP documentation to reflect 9 tools
Documentation Improvements
- Updated README with version badge (v1.0.0)
- Enhanced upload guide with 3 upload methods
- Updated MCP setup guide with all 9 tools
- Comprehensive test documentation (14/14 tests)
- All references to tool counts corrected
Fixed
- Missing import os in mcp/server.py
- package_skill.py exit code behavior (now exits 0 when API key missing)
- Improved UX with helpful messages instead of errors
Changed
- Test count badge updated (96 → 14 passing)
- All documentation references updated to 9 tools
Testing
- CLI Tests: 8/8 PASSED ✅
- MCP Tests: 6/6 PASSED ✅
- Total: 14/14 PASSED (100%)
0.4.0 - 2025-10-18
Added
Large Documentation Support (40K+ Pages)
- Config splitting functionality for massive documentation sites
- Router/hub skill generation for intelligent query routing
- Checkpoint/resume feature for long scrapes
- Parallel scraping support for faster processing
- 4 split strategies: auto, category, router, size
New CLI Tools
- split_config.py - Split large configs into focused sub-skills
- generate_router.py - Generate router/hub skills
- package_multi.py - Package multiple skills at once
New MCP Tools
- split_config - Split large documentation via MCP
- generate_router - Generate router skills via MCP
Documentation
- New docs/LARGE_DOCUMENTATION.md guide
- Example config: godot-large-example.json (40K pages)
Changed
- MCP tool count: 6 → 8 tools
- Updated documentation for large docs workflow
0.3.0 - 2025-10-15
Added
MCP Server Integration
- Complete MCP server implementation (mcp/server.py)
- 6 MCP tools for Claude Code integration:
  - list_configs
  - generate_config
  - validate_config
  - estimate_pages
  - scrape_docs
  - package_skill
Setup & Configuration
- Automated setup script (setup_mcp.sh)
- MCP configuration examples
- Comprehensive MCP setup guide (docs/MCP_SETUP.md)
- MCP testing guide (docs/TEST_MCP_IN_CLAUDE_CODE.md)
Testing
- 31 comprehensive unit tests for MCP server
- Integration tests via Claude Code MCP protocol
- 100% test pass rate
Documentation
- Complete MCP integration documentation
- Natural language usage examples
- Troubleshooting guides
Changed
- Restructured project as monorepo with CLI and MCP server
- Moved CLI tools to cli/ directory
- Added MCP server to mcp/ directory
0.2.0 - 2025-10-10
Added
Testing & Quality
- Comprehensive test suite with 71 tests
- 100% test pass rate
- Test coverage for all major features
- Config validation tests
Optimization
- Page count estimator (estimate_pages.py)
- Framework config optimizations with start_urls
- Better URL pattern coverage
- Improved scraping efficiency
New Configs
- Kubernetes documentation config
- Tailwind CSS config
- Astro framework config
Changed
- Optimized all framework configs
- Improved categorization accuracy
- Enhanced error messages
0.1.0 - 2025-10-05
Added
Initial Release
- Basic documentation scraper functionality
- Manual skill creation
- Framework configs (Godot, React, Vue, Django, FastAPI)
- Smart categorization system
- Code language detection
- Pattern extraction
- Local and API-based enhancement options
- Basic packaging functionality
Core Features
- BFS traversal for documentation scraping
- CSS selector-based content extraction
- Smart categorization with scoring
- Code block detection and formatting
- Caching system for scraped data
- Interactive mode for config creation
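The BFS traversal listed above can be illustrated with a small sketch. It is not the scraper's actual code: `get_links` is a hypothetical callable that returns the links found on a page, and `max_pages` is an optional cap added for the example.

```python
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=None):
    """Visit pages breadth-first from start_url and return URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and (max_pages is None or len(order) < max_pages):
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            # Mark links as seen when queued so each page is visited once.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Breadth-first order means pages closest to the start URL are scraped first, which matters when a page limit cuts the crawl short.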
Documentation
- README with quick start guide
- Basic usage documentation
- Configuration file examples
Release Links
- v1.2.0 - PDF Advanced Features
- v1.1.0 - Documentation Scraping Enhancements
- v1.0.0 - Production Release
- v0.4.0 - Large Documentation Support
- v0.3.0 - MCP Integration
Version History Summary
| Version | Date | Highlights |
|---|---|---|
| 1.2.0 | 2025-10-23 | 📄 PDF advanced features: OCR, passwords, tables, 3x faster |
| 1.1.0 | 2025-10-22 | 🌐 Unlimited scraping, parallel mode, new configs (Ansible, Laravel) |
| 1.0.0 | 2025-10-19 | 🚀 Production release, auto-upload, 9 MCP tools |
| 0.4.0 | 2025-10-18 | 📚 Large docs support (40K+ pages) |
| 0.3.0 | 2025-10-15 | 🔌 MCP integration with Claude Code |
| 0.2.0 | 2025-10-10 | 🧪 Testing & optimization |
| 0.1.0 | 2025-10-05 | 🎬 Initial release |