Commit Graph

10 Commits

Author SHA1 Message Date
yusyus
6c31990941 style: fix ruff lint and formatting errors
- E741: rename ambiguous variable `l` → `line_text` in enhance_skill_local.py
- ARG001: suppress unused `doc` param in word_scraper _build_section()
- SIM108: use ternary for code_text assignment in word_scraper
- F841: remove unused `metadata` variable in test_chunking_integration
- F401: remove unused imports in test_pinecone_adaptor
- ARG001: rename unused `docs` → `_docs` in test_pinecone_adaptor
- Format 20 files to match ruff formatting rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 10:54:32 +03:00
yusyus
064405c052 fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
  (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
  constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
  base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:57:59 +03:00
yusyus
7a2ffb286c refactor: rename all chunk flags to include explicit units
Replace ambiguous --chunk-size / --chunk-overlap names that meant different
things in different contexts (tokens vs characters) with fully explicit names:

- --chunk-size (RAG tokens)     → --chunk-tokens
- --chunk-overlap (RAG tokens)  → --chunk-overlap-tokens
- --chunk (enable RAG chunking) → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars)    → --streaming-overlap-chars
- --chunk-size (PDF pages)       → --pdf-pages-per-chunk (poc file)

Also aligns stream_parser.py help with streaming_ingest.py standalone parser.
All 2167 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:07:56 +03:00
yusyus
47226340ac feat: add CONFIG_ARGUMENTS and fix _route_config for unified scraper parity
Previously _route_config only forwarded --dry-run, silently dropping
all enhancement workflows, --merge-mode, and --skip-codebase-analysis.

Changes:
- arguments/create.py: add CONFIG_ARGUMENTS dict with merge_mode and
  skip_codebase_analysis; wire into get_source_specific_arguments(),
  get_compatible_arguments(), and add_create_arguments(mode='config')
- create_command.py: fix _route_config to forward --fresh, --merge-mode,
  --skip-codebase-analysis, and all 4 workflow flags; add --help-config
  handler (skill-seekers create --help-config) matching other help modes
- parsers/create_parser.py: add --help-config flag for unified CLI parity
- tests/test_create_arguments.py: import CONFIG_ARGUMENTS; update config
  source tests to assert correct content instead of empty dict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:51:04 +03:00
yusyus
c996e88dac feat: wire --local-repo-path into create command and add validation
- Add --local-repo-path to UNIVERSAL_ARGUMENTS in create.py so it is
  registered in the actual parser (not just help display)
- Add --local-repo-path to GITHUB_ARGUMENTS in arguments/github.py for
  the standalone github subcommand
- Forward --local-repo-path through create_command._route_github() to
  github_scraper
- Add local_repo_path to the config dict built from CLI args in
  github_scraper.main()
- Add early validation in GitHubScraper.__init__(): warn and reset to
  None if path does not exist, triggering a real GitHub API fallback
  instead of silently operating with an empty file tree (fixes #281)
- Update test_create_arguments.py count/names assertions (17 -> 18)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 07:28:49 +03:00
yusyus
60c46673ed feat: support multiple --enhance-workflow flags with shared workflow_runner
- Change --enhance-workflow from type:str to action:append in all argument
  files (workflow, create, scrape, github, pdf) so the flag can be given
  multiple times to chain workflows in sequence
- Add workflow_runner.py: shared utility used by all 4 scrapers
  - collect_workflow_vars(): merges extra context then user --var flags
    (user flags take precedence over scraper metadata)
  - run_workflows(): executes named workflows in order, then any inline
    --enhance-stage workflow; handles dry-run/preview mode
- Remove duplicate ~115-130 line workflow blocks from doc_scraper,
  github_scraper, pdf_scraper, and codebase_scraper; replace with
  single run_workflows() call each
- Remove mutual exclusivity between workflows and AI enhancement:
  workflows now run first, then traditional enhancement continues
  independently (--enhance-level 0 to disable)
- Add tests/test_workflow_runner.py: 21 tests covering no-flags, single
  workflow, multiple/chained workflows, inline stages, mixed mode,
  variable precedence, and dry-run
- Fix test_markdown_parsing: accept "text" or "unknown" for unlabelled
  code blocks (unified MarkdownParser returns "text" by default)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 22:05:27 +03:00
yusyus
57061b7daf style: Auto-format 48 files with ruff format
- Fixed formatting to comply with ruff standards
- No functional changes, only formatting/style
- Completes CI/CD pipeline formatting requirements
2026-02-15 20:24:32 +03:00
yusyus
83b03d9f9f fix: Resolve all linting errors from ruff
Fix 145 linting errors across CLI refactor code:

Type annotation modernization (Python 3.9+):
- Replace typing.Dict with dict
- Replace typing.List with list
- Replace typing.Set with set
- Replace Optional[X] with X | None

Code quality improvements:
- Remove trailing whitespace (W291)
- Remove whitespace from blank lines (W293)
- Remove unused imports (F401)
- Use dictionary lookup instead of if-elif chains (SIM116)
- Combine nested if statements (SIM102)

Files fixed (45 files):
- src/skill_seekers/cli/arguments/*.py (10 files)
- src/skill_seekers/cli/parsers/*.py (24 files)
- src/skill_seekers/cli/presets/*.py (4 files)
- src/skill_seekers/cli/create_command.py
- src/skill_seekers/cli/source_detector.py
- src/skill_seekers/cli/github_scraper.py
- tests/test_*.py (5 test files)

All files now pass ruff linting checks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 20:20:55 +03:00
yusyus
13838cb5a9 feat(cli): Phase 2 - Organize RAG arguments into common.py (DRY principle)
Changes:
- Added RAG_ARGUMENTS dict to common.py with 3 flags:
  - --chunk-for-rag (enable semantic chunking)
  - --chunk-size (default: 512 tokens)
  - --chunk-overlap (default: 50 tokens)
- Removed duplicate RAG arguments from create.py and scrape.py
- Used .update() pattern to merge RAG_ARGUMENTS into UNIVERSAL_ARGUMENTS and SCRAPE_ARGUMENTS
- Added helper functions: add_rag_arguments(), get_rag_argument_names()
- Updated tests to reflect new argument count (15 → 13 universal arguments)
- Fixed test expectations for boolean_args (removed 'enhance', 'enhance_local')

Result:
- Single source of truth for RAG arguments in common.py
- DRY principle maintained across all commands
- All 88 key tests passing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:41:04 +03:00
yusyus
ba1670a220 feat: Unified create command + consolidated enhancement flags
This commit includes two major improvements:

## 1. Unified Create Command (v3.0.0 feature)
- Auto-detects source type (web, GitHub, local, PDF, config)
- Three-tier argument organization (universal, source-specific, advanced)
- Routes to existing scrapers (100% backward compatible)
- Progressive disclosure: 15 universal flags in default help

**New files:**
- src/skill_seekers/cli/source_detector.py - Auto-detection logic
- src/skill_seekers/cli/arguments/create.py - Argument definitions
- src/skill_seekers/cli/create_command.py - Main orchestrator
- src/skill_seekers/cli/parsers/create_parser.py - Parser integration

**Tests:**
- tests/test_source_detector.py (35 tests)
- tests/test_create_arguments.py (30 tests)
- tests/test_create_integration_basic.py (10 tests)

## 2. Enhanced Flag Consolidation (Phase 1)
- Consolidated 3 flags (--enhance, --enhance-local, --enhance-level) → 1 flag
- --enhance-level 0-3 with auto-detection of API vs LOCAL mode
- Default: --enhance-level 2 (balanced enhancement)

**Modified files:**
- arguments/{common,create,scrape,github,analyze}.py - Added enhance_level
- {doc_scraper,github_scraper,config_extractor,main}.py - Updated logic
- create_command.py - Uses consolidated flag

**Auto-detection:**
- If ANTHROPIC_API_KEY set → API mode
- Else → LOCAL mode (Claude Code)

## 3. PresetManager Bug Fix
- Fixed module naming conflict (presets.py vs presets/ directory)
- Moved presets.py → presets/manager.py
- Updated __init__.py exports

**Test Results:**
- All 160+ tests passing
- Zero regressions
- 100% backward compatible

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:29:19 +03:00