DocToSkillConverter has self.skill_dir (string), not self.output_dir.
The --chunk-for-rag flag on the scrape command crashed with AttributeError
because it read the nonexistent converter.output_dir.
Changed to Path(converter.skill_dir).
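A minimal sketch of the fix, using a hypothetical stand-in class rather than the real DocToSkillConverter. Only the skill_dir attribute and the Path(...) wrapping come from this change; the class name FakeConverter, the helper rag_output_dir, and the output file name are illustrative assumptions.

```python
from pathlib import Path


class FakeConverter:
    """Hypothetical stand-in: exposes skill_dir as a plain string."""

    def __init__(self, skill_dir: str) -> None:
        self.skill_dir = skill_dir


def rag_output_dir(converter) -> Path:
    # Wrap the string in Path so path operations (/, .parent, .name) work.
    # Reading converter.output_dir here would raise AttributeError.
    return Path(converter.skill_dir)


converter = FakeConverter("output/react")
chunk_file = rag_output_dir(converter) / "rag_chunks.json"  # file name is illustrative
```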
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bare `pip3` can point to a different Python installation than `python3`.
On the reporter's macOS, python3 was 3.14 but pip3 was linked to 3.9,
causing "no matching distribution" since skill-seekers requires >=3.10.
Using `python3 -m pip` guarantees the same interpreter that passed the
version check is the one performing the install.
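The same idea can be sketched from Python, assuming an installer script that wants to reuse the interpreter it is already running under. The helper name build_install_command is hypothetical; sys.executable and the `-m pip` invocation are standard.

```python
import sys


def build_install_command(package: str) -> list[str]:
    # sys.executable is the interpreter running this script, so
    # "-m pip" installs into that same interpreter's environment.
    return [sys.executable, "-m", "pip", "install", package]


command = build_install_command("skill-seekers")
# Pass `command` to subprocess.run(command, check=True) to perform the install.
```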
Closes #301
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add FALLBACK_MAIN_SELECTORS constant and _find_main_content() helper to
eliminate 3 duplicated fallback loops in doc_scraper.py
- Move link extraction before early return in extract_content() so links
are always discovered from the full page, not just main content
- Fix single-threaded dry-run to extract links from soup (full page)
instead of main element only — fixes reactflow.dev finding only 1 page
- Add link extraction to async dry-run path (was completely missing)
- Remove main_content from get_configuration() defaults so fallback logic
kicks in instead of a broad CSS comma selector matching body
- Smart create --config routing: peek at JSON to determine unified
(sources array → unified_scraper) vs simple (base_url → doc_scraper)
- Update docs/user-guide/02-scraping.md and docs/reference/CONFIG_FORMAT.md
to use unified config format (legacy format rejected since v2.11.0)
- Fix test_auto_fetch_enabled and test_mcp_validate_legacy_config
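The config routing described above could look roughly like the sketch below. The function name pick_scraper and the error message are assumptions; only the two keys (sources, base_url) and the two scraper names come from this change.

```python
import json


def pick_scraper(config_text: str) -> str:
    """Return the name of the scraper a config would be routed to."""
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    if isinstance(config.get("sources"), list):
        return "unified_scraper"  # unified format: a sources array
    if "base_url" in config:
        return "doc_scraper"  # simple format: a single base_url
    raise ValueError("config has neither a 'sources' array nor a 'base_url'")


unified = pick_scraper('{"name": "demo", "sources": [{"type": "documentation"}]}')
simple = pick_scraper('{"name": "demo", "base_url": "https://example.com/docs/"}')
```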
Closes #300
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stage 1 quality improvements from the Arbitrary Limits & Dead Code audit:
Reference file truncation removed:
- codebase_scraper.py: remove code[:500] truncation at 5 locations — reference
files now contain complete code blocks for copy-paste usability
- unified_skill_builder.py: remove issues[:20], releases[:10], body[:500],
and code_snippet[:300] caps in reference files — full content preserved
Enhancement summarizer rewrite:
- enhance_skill_local.py: replace arbitrary [:5] code block cap with
character-budget approach using target_ratio * content_chars
- Fix intro boundary bug: track code block state so intro never ends
inside a code block, which was desynchronizing the parser
- Remove dead _target_lines variable (assigned but never used)
- Heading chunks now also respect the character budget
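A minimal sketch of a character-budget selection, assuming the budget is target_ratio * content_chars as stated above. The function name and the stop-at-first-overflow policy are assumptions, not the actual enhance_skill_local.py logic.

```python
def select_within_budget(
    chunks: list[str], content_chars: int, target_ratio: float
) -> list[str]:
    """Keep chunks in order until the character budget is spent."""
    budget = int(target_ratio * content_chars)
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break  # hypothetical policy: stop at the first chunk that does not fit
        kept.append(chunk)
        used += len(chunk)
    return kept


blocks = ["a" * 40, "b" * 40, "c" * 40]
total = sum(len(block) for block in blocks)  # 120 characters
kept_two = select_within_budget(blocks, total, target_ratio=0.75)  # budget of 90
kept_none = select_within_budget(blocks, total, target_ratio=0.25)  # budget of 30
```

Unlike a fixed [:5] cap, the number of blocks kept here scales with the size of the content.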
Hardcoded language fixes:
- unified_skill_builder.py: test examples use ex["language"] instead of
always "python" for syntax highlighting
- how_to_guide_builder.py: add language field to HowToGuide dataclass,
set from workflow at creation, used in AI enhancement prompt
Test fixes:
- test_enhance_skill_local.py: rename test to test_code_blocks_not_arbitrarily_capped,
fix assertion to count actual blocks (```count // 2), use target_ratio=0.9
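The block-counting assertion can be illustrated as follows. This is a sketch with a hypothetical helper name, not the actual test; it assumes every block has both an opening and a closing fence.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def count_code_blocks(markdown: str) -> int:
    # Each complete block contributes an opening and a closing fence.
    return markdown.count(FENCE) // 2


sample = "\n".join(
    [
        "Intro text",
        FENCE + "python",
        "print('one')",
        FENCE,
        "More text",
        FENCE,
        "plain block",
        FENCE,
    ]
)
block_count = count_code_blocks(sample)
```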
Documentation:
- Add Stage 1 plan, implementation summary, review, and corrected docs
- Update CHANGELOG.md with all changes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix #299: rename --chunk-size/--chunk-overlap to --streaming-chunk-size/
--streaming-overlap in arguments/package.py to avoid collision with the
RAG --chunk-size flag from arguments/common.py
- Phase 1a: make package_skill.py import args via add_package_arguments()
instead of a 105-line inline duplicate argparse block; fixes the root
cause of _reconstruct_argv() passing unrecognised flag names
- Phase 1b: centralise setup_logging() into utils.py and remove 4
duplicate module-level logging.basicConfig() calls from doc_scraper.py,
github_scraper.py, codebase_scraper.py, and unified_scraper.py
- Fix test_package_structure.py / test_cli_paths.py version strings
(3.1.1 → 3.1.2)
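The flag collision can be reproduced with a small argparse sketch. The parser name and the default values are illustrative assumptions; the flag names are the ones from this change.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="package-demo")  # hypothetical prog name
    # Shared RAG flag (described above as coming from arguments/common.py).
    parser.add_argument("--chunk-size", type=int, default=512)  # illustrative default
    # Renamed streaming flags no longer reuse the same option string.
    parser.add_argument("--streaming-chunk-size", type=int, default=4000)
    parser.add_argument("--streaming-overlap", type=int, default=200)
    return parser


args = build_parser().parse_args(
    ["--chunk-size", "256", "--streaming-chunk-size", "1000"]
)

# Registering the same option string twice is what argparse rejects.
collided = False
try:
    build_parser().add_argument("--chunk-size", type=int)
except argparse.ArgumentError:
    collided = True
```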
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix 1 — gemini.py: replace deprecated gemini-2.0-flash-exp (404 errors)
with gemini-2.5-flash (stable, GA, Google's recommended replacement).
Closes #290.
Fix 2 — enhance dispatcher: implement the documented auto-detection that
was missing from the code. skill-seekers enhance now correctly routes:
- ANTHROPIC_API_KEY set → Claude API mode (enhance_skill.py)
- GOOGLE_API_KEY set → Gemini API mode
- OPENAI_API_KEY set → OpenAI API mode
- No API keys → LOCAL mode (Claude Code Max, free)
Use --mode LOCAL to force local mode even when an API key is present.
9 new tests cover _detect_api_target() priority logic and main()
routing (API delegation, --mode LOCAL override, no-key fallback).
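The routing above can be sketched as a pure function over an environment mapping. Treating the listed order as the priority order is an assumption, as are the function names and the returned target labels; this is not the actual _detect_api_target() implementation.

```python
from typing import Mapping, Optional

# Checked in the order listed above; that order as priority is an assumption.
KEY_TO_TARGET = (
    ("ANTHROPIC_API_KEY", "claude"),
    ("GOOGLE_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
)


def detect_api_target(env: Mapping[str, str]) -> Optional[str]:
    """Return the first target whose key is set and non-empty, else None."""
    for key, target in KEY_TO_TARGET:
        if env.get(key):
            return target
    return None


def choose_mode(env: Mapping[str, str], forced_mode: Optional[str] = None) -> str:
    if forced_mode == "LOCAL":
        return "LOCAL"  # --mode LOCAL wins even when a key is present
    return detect_api_target(env) or "LOCAL"
```

Passing the environment in as a mapping (for example os.environ) keeps the priority logic testable without touching real environment variables.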
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All scrapers (scrape, github, analyze, pdf) now share a common argument
contract via add_all_standard_arguments() in arguments/common.py.
Universal flags (--dry-run, --verbose, --quiet, --name, --description,
workflow args) work consistently across all source types.
Previously, `create <url> --dry-run`, `create owner/repo --dry-run`,
and `create ./path --dry-run` would crash because sub-scrapers didn't
accept those flags. Also fixes main.py _handle_analyze_command() not
forwarding --dry-run, --preset, --quiet, --name, --description to
codebase_scraper.
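A reduced sketch of a shared argument contract, assuming one helper that every subcommand parser calls. The helper name add_standard_arguments and the prog name are hypothetical, and the real add_all_standard_arguments() also covers workflow args that are omitted here.

```python
import argparse

COMMANDS = ("scrape", "github", "analyze", "pdf")


def add_standard_arguments(parser: argparse.ArgumentParser) -> None:
    """Hypothetical reduced version of a shared argument contract."""
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("--name")
    parser.add_argument("--description")


root = argparse.ArgumentParser(prog="demo")
subparsers = root.add_subparsers(dest="command", required=True)
for command in COMMANDS:
    add_standard_arguments(subparsers.add_parser(command))

# Every subcommand now accepts the same universal flags.
parsed = {
    command: root.parse_args([command, "--dry-run", "--name", "demo"])
    for command in COMMANDS
}
```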
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The create command crashed with 'Namespace' object has no attribute
'max_pages' because it accessed args.max_pages directly instead of
using getattr() like all other source-specific attributes in the
same method.
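The difference between the two access styles, shown on a bare argparse Namespace (the attribute values are illustrative):

```python
from argparse import Namespace

# A namespace produced for a source type that never defines max_pages.
args = Namespace(name="demo")

# Direct access (args.max_pages) would raise AttributeError here;
# getattr with a default returns None instead.
max_pages = getattr(args, "max_pages", None)

# When the attribute exists, getattr returns it unchanged.
with_value = getattr(Namespace(max_pages=50), "max_pages", None)
```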
Closes #293
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README, CONTRIBUTING, QUALITY_GUIDELINES, AGENTS.md all aligned with
production best practices (accurate counts, no max_pages, unified format)
- validate-config.py: fix two bugs (unified config categories lookup,
max_pages warning logic)
- Delete old submit-config.md (duplicate of submit-config.yml with
outdated content)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review and update all 7 configs in build-tools:
esbuild, rollup, storybook, swc, turborepo, vite, webpack — all v1.1.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review and update both configs in api-tech:
- graphql.json: add mutations/subscriptions/variables categories,
more start_urls, v1.1.0
- trpc.json: update for tRPC v11, TanStack Query, more start_urls,
data_transformers category, v1.1.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Points submodule to merged main commit (bf9b0ff) after ai-ml
category review and enhancement was merged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review and update all 34 configs in the ai-ml category:
- Remove max_pages from all configs
- Rewrite anthropic, openai-api, langchain, ollama for current state
- Fix URL patterns in chroma, seaborn, nltk, keras, deepspeed
- All configs pass dry-run validation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The api/configs_repo git submodule was pinned to commit d4c0710, which had
only 14 configs. Updated it to latest main (4275d6f), which has 178 configs across
21 categories (web-frameworks, ai-ml, game-engines, databases, devops, etc.)
Also fixed ConfigAnalyzer._categorize_config() to use directory structure
(official/{category}/{name}.json) as authoritative category instead of
keyword matching, which was classifying most new configs as "uncategorized".
Result: API /api/configs now returns 178 configs (was 14).
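Directory-based categorization can be sketched as below, assuming paths relative to the configs repo root in the official/{category}/{name}.json layout. The function name and the "uncategorized" fallback for other layouts are assumptions, not the actual ConfigAnalyzer code.

```python
from pathlib import PurePosixPath


def categorize_config(relative_path: str) -> str:
    """Read the category from an official/{category}/{name}.json path."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) == 3 and parts[0] == "official" and parts[2].endswith(".json"):
        return parts[1]
    return "uncategorized"  # fallback for paths outside the expected layout


category = categorize_config("official/ai-ml/langchain.json")
fallback = categorize_config("langchain.json")
```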
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>