Commit Graph

6 Commits

Author SHA1 Message Date
yusyus
064405c052 fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
  (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
  constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
  base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:57:59 +03:00
yusyus
7a2ffb286c refactor: rename all chunk flags to include explicit units
Replace ambiguous --chunk-size / --chunk-overlap names that meant different
things in different contexts (tokens vs characters) with fully explicit names:

- --chunk-size (RAG tokens)     → --chunk-tokens
- --chunk-overlap (RAG tokens)  → --chunk-overlap-tokens
- --chunk (enable RAG chunking) → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars)    → --streaming-overlap-chars
- --chunk-size (PDF pages)       → --pdf-pages-per-chunk (poc file)

Also aligns stream_parser.py help with streaming_ingest.py standalone parser.
All 2167 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:07:56 +03:00
yusyus
b636a0a292 fix: resolve issue #299 and Phase 1 cleanup
- Fix #299: rename --chunk-size/--chunk-overlap to --streaming-chunk-size/
  --streaming-overlap in arguments/package.py to avoid collision with the
  RAG --chunk-size flag from arguments/common.py
- Phase 1a: make package_skill.py import args via add_package_arguments()
  instead of a 105-line inline duplicate argparse block; fixes the root
  cause of _reconstruct_argv() passing unrecognised flag names
- Phase 1b: centralise setup_logging() into utils.py and remove 4
  duplicate module-level logging.basicConfig() calls from doc_scraper.py,
  github_scraper.py, codebase_scraper.py, and unified_scraper.py
- Fix test_package_structure.py / test_cli_paths.py version strings
  (3.1.1 → 3.1.2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 21:22:05 +03:00
yusyus
57061b7daf style: Auto-format 48 files with ruff format
- Fixed formatting to comply with ruff standards
- No functional changes, only formatting/style
- Completes CI/CD pipeline formatting requirements
2026-02-15 20:24:32 +03:00
yusyus
83b03d9f9f fix: Resolve all linting errors from ruff
Fix 145 linting errors across CLI refactor code:

Type annotation modernization (Python 3.9+):
- Replace typing.Dict with dict
- Replace typing.List with list
- Replace typing.Set with set
- Replace Optional[X] with X | None

Code quality improvements:
- Remove trailing whitespace (W291)
- Remove whitespace from blank lines (W293)
- Remove unused imports (F401)
- Use dictionary lookup instead of if-elif chains (SIM116)
- Combine nested if statements (SIM102)

Files fixed (45 files):
- src/skill_seekers/cli/arguments/*.py (10 files)
- src/skill_seekers/cli/parsers/*.py (24 files)
- src/skill_seekers/cli/presets/*.py (4 files)
- src/skill_seekers/cli/create_command.py
- src/skill_seekers/cli/source_detector.py
- src/skill_seekers/cli/github_scraper.py
- tests/test_*.py (5 test files)

All files now pass ruff linting checks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 20:20:55 +03:00
yusyus
ba1670a220 feat: Unified create command + consolidated enhancement flags
This commit includes two major improvements:

## 1. Unified Create Command (v3.0.0 feature)
- Auto-detects source type (web, GitHub, local, PDF, config)
- Three-tier argument organization (universal, source-specific, advanced)
- Routes to existing scrapers (100% backward compatible)
- Progressive disclosure: 15 universal flags in default help

**New files:**
- src/skill_seekers/cli/source_detector.py - Auto-detection logic
- src/skill_seekers/cli/arguments/create.py - Argument definitions
- src/skill_seekers/cli/create_command.py - Main orchestrator
- src/skill_seekers/cli/parsers/create_parser.py - Parser integration

**Tests:**
- tests/test_source_detector.py (35 tests)
- tests/test_create_arguments.py (30 tests)
- tests/test_create_integration_basic.py (10 tests)

## 2. Enhanced Flag Consolidation (Phase 1)
- Consolidated 3 flags (--enhance, --enhance-local, --enhance-level) → 1 flag
- --enhance-level 0-3 with auto-detection of API vs LOCAL mode
- Default: --enhance-level 2 (balanced enhancement)

**Modified files:**
- arguments/{common,create,scrape,github,analyze}.py - Added enhance_level
- {doc_scraper,github_scraper,config_extractor,main}.py - Updated logic
- create_command.py - Uses consolidated flag

**Auto-detection:**
- If ANTHROPIC_API_KEY set → API mode
- Else → LOCAL mode (Claude Code)

## 3. PresetManager Bug Fix
- Fixed module naming conflict (presets.py vs presets/ directory)
- Moved presets.py → presets/manager.py
- Updated __init__.py exports

**Test Results:**
- All 160+ tests passing
- Zero regressions
- 100% backward compatible

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:29:19 +03:00