Replace → (U+2192) with -> in argparse help strings. Windows cmd uses
cp1252 encoding which cannot render unicode arrows, causing --help to
crash with UnicodeEncodeError.
Auto-detects NVIDIA (CUDA), AMD (ROCm), or CPU-only GPU and installs the
correct PyTorch variant + easyocr + all visual extraction dependencies.
Removes easyocr from video-full pip extras to avoid pulling ~2GB of wrong
CUDA packages on non-NVIDIA systems.
New files:
- video_setup.py (835 lines): GPU detection, PyTorch install, ROCm config,
venv checks, system dep validation, module selection, verification
- test_video_setup.py (60 tests): Full coverage of detection, install, verify
Updated docs: CHANGELOG, AGENTS.md, CLAUDE.md, README.md, CLI_REFERENCE,
FAQ, TROUBLESHOOTING, installation guide, video dependency plan
All 2523 tests passing (15 skipped).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix extract_visual_data returning 2-tuple instead of 3 (ValueError crash)
- Move pytesseract from core deps to [video-full] optional group
- Add 30-min timeout + user feedback to video enhancement subprocess
- Add scrape_video_impl to MCP server fallback import block
- Detect auto-generated YouTube captions via is_generated property
- Forward --vision-ocr and --video-playlist through create command
- Fix filename collision for non-ASCII video titles (fallback to video_id)
- Make _vision_used a proper dataclass field on FrameSubSection
- Expose 6 visual params in MCP scrape_video tool
- Add install instructions on missing video deps in unified scraper
- Update MCP docstring tool counts (25→33, 7 categories)
- Add video and word commands to main.py docstring
- Document video-full exclusion from [all] deps in pyproject.toml
- Update parser registry test count (22→23 for video parser)
All 2437 tests passing, 0 failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sync with latest development changes including ruff formatting,
bug fixes, and pinecone adaptor additions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add FALLBACK_MAIN_SELECTORS constant and _find_main_content() helper to
eliminate 3 duplicated fallback loops in doc_scraper.py
- Move link extraction before early return in extract_content() so links
are always discovered from the full page, not just main content
- Fix single-threaded dry-run to extract links from soup (full page)
instead of main element only — fixes reactflow.dev finding only 1 page
- Add link extraction to async dry-run path (was completely missing)
- Remove main_content from get_configuration() defaults so fallback logic
kicks in instead of a broad CSS comma selector matching body
- Smart create --config routing: peek at JSON to determine unified
(sources array → unified_scraper) vs simple (base_url → doc_scraper)
- Update docs/user-guide/02-scraping.md and docs/reference/CONFIG_FORMAT.md
to use unified config format (legacy format rejected since v2.11.0)
- Fix test_auto_fetch_enabled and test_mcp_validate_legacy_config
Closes#300
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All scrapers (scrape, github, analyze, pdf) now share a common argument
contract via add_all_standard_arguments() in arguments/common.py.
Universal flags (--dry-run, --verbose, --quiet, --name, --description,
workflow args) work consistently across all source types.
Previously, `create <url> --dry-run`, `create owner/repo --dry-run`,
and `create ./path --dry-run` would crash because sub-scrapers didn't
accept those flags. Also fixes main.py _handle_analyze_command() not
forwarding --dry-run, --preset, --quiet, --name, --description to
codebase_scraper.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The create command crashed with 'Namespace' object has no attribute
'max_pages' because it accessed args.max_pages directly instead of
using getattr() like all other source-specific attributes in the
same method.
Closes#293
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously _route_config only forwarded --dry-run, silently dropping
all enhancement workflows, --merge-mode, and --skip-codebase-analysis.
Changes:
- arguments/create.py: add CONFIG_ARGUMENTS dict with merge_mode and
skip_codebase_analysis; wire into get_source_specific_arguments(),
get_compatible_arguments(), and add_create_arguments(mode='config')
- create_command.py: fix _route_config to forward --fresh, --merge-mode,
--skip-codebase-analysis, and all 4 workflow flags; add --help-config
handler (skill-seekers create --help-config) matching other help modes
- parsers/create_parser.py: add --help-config flag for unified CLI parity
- tests/test_create_arguments.py: import CONFIG_ARGUMENTS; update config
source tests to assert correct content instead of empty dict
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add --local-repo-path to UNIVERSAL_ARGUMENTS in create.py so it is
registered in the actual parser (not just help display)
- Add --local-repo-path to GITHUB_ARGUMENTS in arguments/github.py for
the standalone github subcommand
- Forward --local-repo-path through create_command._route_github() to
github_scraper
- Add local_repo_path to the config dict built from CLI args in
github_scraper.main()
- Add early validation in GitHubScraper.__init__(): warn and reset to
None if path does not exist, triggering a real GitHub API fallback
instead of silently operating with an empty file tree (fixes#281)
- Update test_create_arguments.py count/names assertions (17 -> 18)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add YAML-based enhancement workflow presets shipped inside the package
(default, minimal, security-focus, architecture-comprehensive, api-documentation)
- Add `skill-seekers workflows` subcommand: list, show, copy, add, remove, validate
- copy/add/remove all accept multiple names/files in one invocation with partial-failure behaviour
- `add --name` override restricted to single-file operations
- Add 5 MCP tools: list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow
- Fix: create command _add_common_args() now correctly forwards each --enhance-workflow
as a separate flag instead of passing the whole list as a single argument
- Update README: reposition as "data layer for AI systems" with AI Skills front and centre
- Update CHANGELOG, QUICK_REFERENCE, CLAUDE.md with workflow preset details
- 1,880+ tests passing
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- enhancement_workflow.py: WorkflowEngine class for multi-stage AI
enhancement workflows with preset support (security-focus,
architecture-comprehensive, api-documentation, minimal, default)
- unified_enhancer.py: unified enhancement orchestrator integrating
workflow execution with traditional enhance-level based enhancement
- create_command.py: wire workflow args into the unified create command
- AGENTS.md: update agent capability documentation
- configs/godot_unified.json: add unified Godot documentation config
- ENHANCEMENT_WORKFLOW_SYSTEM.md: documentation for the workflow system
- WORKFLOW_ENHANCEMENT_SEQUENTIAL_EXECUTION.md: docs explaining
sequential execution of workflows followed by AI enhancement
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Use underscore prefix for help flag destinations (_help_web, etc.)
- Handle help flags in main.py argv reconstruction
- Ensures progressive disclosure works through unified CLI
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>