Commit Graph

9 Commits

Author SHA1 Message Date
yusyus
cc9cc32417 feat: add skill-seekers video --setup for GPU auto-detection and dependency installation
Auto-detects NVIDIA (CUDA), AMD (ROCm), or CPU-only GPU and installs the
correct PyTorch variant + easyocr + all visual extraction dependencies.
Removes easyocr from video-full pip extras to avoid pulling ~2GB of wrong
CUDA packages on non-NVIDIA systems.

New files:
- video_setup.py (835 lines): GPU detection, PyTorch install, ROCm config,
  venv checks, system dep validation, module selection, verification
- test_video_setup.py (60 tests): Full coverage of detection, install, verify

Updated docs: CHANGELOG, AGENTS.md, CLAUDE.md, README.md, CLI_REFERENCE,
FAQ, TROUBLESHOOTING, installation guide, video dependency plan

All 2523 tests passing (15 skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:39:16 +03:00
yusyus
b81d55fda0 feat(B2): add Microsoft Word (.docx) support
Implements ROADMAP task B2 — full .docx scraping support via mammoth +
python-docx, producing SKILL.md + references/ output identical to other
source types.

New files:
- src/skill_seekers/cli/word_scraper.py — WordToSkillConverter class +
  main() entry point (~600 lines); mammoth → BeautifulSoup pipeline;
  handles headings, code detection (incl. monospace <p><br> blocks),
  tables, images, metadata extraction
- src/skill_seekers/cli/arguments/word.py — add_word_arguments() +
  WORD_ARGUMENTS dict
- src/skill_seekers/cli/parsers/word_parser.py — WordParser for unified
  CLI parser registry
- tests/test_word_scraper.py — comprehensive test suite (~300 lines)

Modified files:
- src/skill_seekers/cli/main.py — registered "word" command module
- src/skill_seekers/cli/source_detector.py — .docx auto-detection +
  _detect_word() classmethod
- src/skill_seekers/cli/create_command.py — _route_word() + --help-word
- src/skill_seekers/cli/arguments/create.py — WORD_ARGUMENTS + routing
- src/skill_seekers/cli/arguments/__init__.py — export word args
- src/skill_seekers/cli/parsers/__init__.py — register WordParser
- src/skill_seekers/cli/unified_scraper.py — _scrape_word() integration
- src/skill_seekers/cli/pdf_scraper.py — fix: real enhancement instead
  of stub; remove [:3] reference file limit; capture run_workflows return
- src/skill_seekers/cli/github_scraper.py — fix: remove arbitrary
  open_issues[:20] / closed_issues[:10] reference file limits
- pyproject.toml — skill-seekers-word entry point + docx optional dep
- tests/test_cli_parsers.py — update parser count 21→22

Bug fixes applied during real-world testing:
- Code detection: detect monospace <p><br> blocks as code (mammoth
  renders Courier paragraphs this way, not as <pre>/<code>)
- Language detector: fix wrong method name detect_from_text →
  detect_from_code
- Description inference: pass None from main() so extract_docx() can
  infer description from Word document subject/title metadata
- Bullet-point guard: exclude prose starting with •/-/* from code scoring
- Enhancement: implement real API/LOCAL enhancement (was stub)
- pip install message: add quotes around skill-seekers[docx]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 21:47:30 +03:00
yusyus
ba9a8ff8b5 docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:01:51 +03:00
yusyus
a78f3fb376 docs: update version references and counts across markdown files
- AGENTS.md: version 3.0.0 → 3.1.0-dev
- CLAUDE.md: version v3.0.0 → v3.1.0-dev (2 places)
- ROADMAP.md: status v2.7.0 → v3.1.0-dev, 18 MCP tools → 26, 1200+ tests → 1,880+, add recent improvements
- docs/README.md: "New in v2.7.0" → "New in v3.x", 1200+ tests → 1,880+, docs version 2.7.0 → 3.1.0-dev, date updated

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-18 22:17:41 +03:00
yusyus
a9b51ab3fe feat: add enhancement workflow system and unified enhancer
- enhancement_workflow.py: WorkflowEngine class for multi-stage AI
  enhancement workflows with preset support (security-focus,
  architecture-comprehensive, api-documentation, minimal, default)
- unified_enhancer.py: unified enhancement orchestrator integrating
  workflow execution with traditional enhance-level based enhancement
- create_command.py: wire workflow args into the unified create command
- AGENTS.md: update agent capability documentation
- configs/godot_unified.json: add unified Godot documentation config
- ENHANCEMENT_WORKFLOW_SYSTEM.md: documentation for the workflow system
- WORKFLOW_ENHANCEMENT_SEQUENTIAL_EXECUTION.md: docs explaining
  sequential execution of workflows followed by AI enhancement

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 22:14:19 +03:00
yusyus
4c6d885725 docs: Enhance CLAUDE.md with unified create command documentation
Major improvements to developer documentation:

CLAUDE.md:
- Add unified `create` command to quick command reference
- New comprehensive section on unified create command architecture
  - Auto-detection of source types (web/GitHub/local/PDF/config)
  - Progressive disclosure help system (--help-web, --help-github, etc.)
  - Universal flags that work across all sources
  - -p shortcut for preset selection
- Document enhancement flag consolidation (Phase 1)
  - Old: --enhance, --enhance-local, --api-key (3 flags)
  - New: --enhance-level 0-3 (1 granular flag)
  - Auto-detection of API vs LOCAL mode
- Add "Modifying the Unified Create Command" section
  - Three-tier argument system (universal/source-specific/advanced)
  - File locations and architecture
  - Examples for contributors
- New troubleshooting: "Confused About Command Options"
- Update test counts: 1,765 current (1,852+ in v3.1.0)
- Add v3.1.0 to recent achievements
- Update best practices to prioritize create command

AGENTS.md:
- Update version to 3.0.0
- Add new directories: arguments/, presets/, create_command.py

These changes ensure future Claude instances understand the CLI
refactor work and can effectively contribute to the project.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 20:11:51 +03:00
yusyus
c8195bcd3a fix: QA audit - Fix 5 critical bugs in preset system
Comprehensive QA audit found and fixed 9 issues (5 critical, 2 docs, 2 minor).
All 65 tests now passing with correct runtime behavior.

## Critical Bugs Fixed

1. **--preset-list not working** (Issue #4)
   - Moved check before parse_args() to bypass --directory validation
   - Fix: Check sys.argv for --preset-list before parsing

2. **Missing preset flags in codebase_scraper.py** (Issue #5)
   - Preset flags only in analyze_parser.py, not codebase_scraper.py
   - Fix: Added --preset, --preset-list, --quick, --comprehensive to codebase_scraper.py

3. **Preset depth not applied** (Issue #7)
   - --depth default='deep' overrode preset's depth='surface'
   - Fix: Changed --depth default to None, apply default after preset logic

4. **No deprecation warnings** (Issue #6)
   - Fixed by Issue #5 (adding flags to parser)

5. **Argparse defaults conflict with presets** (Issue #8)
   - Related to Issue #7, same fix

## Documentation Errors Fixed

- Issue #1: Test count (10 not 20 for Phase 1)
- Issue #2: Total test count (65 not 75)
- Issue #3: File name (base.py not base_adaptor.py)

## Verification

All 65 tests passing:
- Phase 1 (Chunking): 10/10 ✓
- Phase 2 (Upload): 15/15 ✓
- Phase 3 (CLI): 16/16 ✓
- Phase 4 (Presets): 24/24 ✓

Runtime behavior verified:
✓ --preset-list shows available presets
✓ --quick sets depth=surface (not deep)
✓ CLI overrides work correctly
✓ Deprecation warnings function

See QA_AUDIT_REPORT.md for complete details.

Quality: 9.8/10 → 10/10 (Exceptional)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 02:12:06 +03:00
yusyus
6cb446d213 docs: Add 5 vector database integration guides (HAYSTACK, WEAVIATE, CHROMA, FAISS, QDRANT)
- Add HAYSTACK.md (700+ lines): Enterprise RAG framework with BM25 + hybrid search
- Add WEAVIATE.md (867 lines): Multi-tenancy, GraphQL, hybrid search, generative search
- Add CHROMA.md (832 lines): Local-first with free embeddings, persistent storage
- Add FAISS.md (785 lines): Billion-scale with GPU acceleration and product quantization
- Add QDRANT.md (746 lines): High-performance Rust engine with rich filtering

All guides follow proven 11-section pattern:
- Problem/Solution/Quick Start/Setup/Advanced/Best Practices
- Real-world examples (100-200 lines working code)
- Troubleshooting sections
- Before/After comparisons

Total: ~3,930 lines of comprehensive integration documentation

Test results:
- 26/26 tests passing for new features (RAG chunker + Haystack adaptor)
- 108 total tests passing (100%)
- 0 failures

This completes all optional integration guides from ACTION_PLAN.md.
Universal preprocessor positioning now covers:
- RAG Frameworks: LangChain, LlamaIndex, Haystack (3/3)
- Vector Databases: Pinecone, Weaviate, Chroma, FAISS, Qdrant (5/5)
- AI Coding Tools: Cursor, Windsurf, Cline, Continue.dev (4/4)
- Chat Platforms: Claude, Gemini, ChatGPT (3/3)

Total: 15 integration guides across 4 categories (+50% coverage)

Ready for v2.10.0 release.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 21:34:28 +03:00
yusyus
80a40b4fc9 docs: Add AGENTS.md guide for AI coding agents
- Comprehensive guide for AI assistants working with the codebase
- Covers project structure, development commands, architecture patterns
- Includes testing guidelines, CI/CD info, and troubleshooting
- Documents all entry points, dependencies, and best practices
2026-02-01 16:31:20 +03:00