5 Commits

Author SHA1 Message Date
yusyus
6c31990941 style: fix ruff lint and formatting errors
- E741: rename ambiguous variable `l` → `line_text` in enhance_skill_local.py
- ARG001: suppress unused `doc` param in word_scraper _build_section()
- SIM108: use ternary for code_text assignment in word_scraper
- F841: remove unused `metadata` variable in test_chunking_integration
- F401: remove unused imports in test_pinecone_adaptor
- ARG001: rename unused `docs` → `_docs` in test_pinecone_adaptor
- Format 20 files to match ruff formatting rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 10:54:32 +03:00
yusyus
064405c052 fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir

Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
  (package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
  constants across all 12 concrete adaptors, rag_chunker, base, and package_skill

Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
  base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)

Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases

Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)

All 2283 tests passing, 8 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:57:59 +03:00
yusyus
0265de5816 style: Format all Python files with ruff
- Formatted 103 files to comply with ruff format requirements
- No code logic changes, only formatting/whitespace
- Fixes CI formatting check failures
2026-02-08 14:42:27 +03:00
yusyus
51787e57bc style: Fix 411 ruff lint issues (Kimi's issue #4)
Auto-fixed lint issues with ruff --fix and --unsafe-fixes:

Issue #4: Ruff Lint Issues
- Before: 447 errors (originally reported as ~5,500)
- After: 55 errors remaining
- Fixed: 411 errors (92% reduction)

Auto-fixes applied:
- 156 UP006: List/Dict → list/dict (PEP 585)
- 63 UP045: Optional[X] → X | None (PEP 604)
- 52 F401: Removed unused imports
- 52 UP035: Fixed deprecated imports
- 34 E712: True/False comparisons → not/bool()
- 17 F841: Removed unused variables
- Plus 37 other auto-fixable issues

Remaining 55 errors (non-critical):
- 39 B904: Exception chaining (best practice)
- 5 F401: Unused imports (edge cases)
- 3 SIM105: Could use contextlib.suppress
- 8 other minor style issues

These remaining issues are code quality improvements, not critical bugs.

Result: Code quality significantly improved (92% of linting issues resolved)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 12:46:38 +03:00
yusyus
4f9a5a553b feat: Phase 2 - Real upload capabilities for ChromaDB and Weaviate
Implemented complete upload functionality for vector databases, replacing
stub implementations with real upload capabilities including embedding
generation, multiple connection modes, and comprehensive error handling.

## ChromaDB Upload (chroma.py)
-  Multiple connection modes (PersistentClient, HttpClient)
-  3 embedding strategies (OpenAI, sentence-transformers, default)
-  Batch processing (100 docs per batch)
-  Progress tracking for large uploads
-  Collection management (create if not exists)

## Weaviate Upload (weaviate.py)
-  Local and cloud connections
-  Schema management (auto-create)
-  Batch upload with progress tracking
-  OpenAI embedding support

## Upload Command (upload_skill.py)
-  Added 8 new CLI arguments for vector DBs
-  Platform-specific kwargs handling
-  Enhanced output formatting (collection/class names)
-  Backward compatibility (LLM platforms unchanged)

## Dependencies (pyproject.toml)
-  Added 4 optional dependency groups:
  - chroma = ["chromadb>=0.4.0"]
  - weaviate = ["weaviate-client>=3.25.0"]
  - sentence-transformers = ["sentence-transformers>=2.2.0"]
  - rag-upload = [all vector DB deps]

## Testing (test_upload_integration.py)
-  15 new tests across 4 test classes
-  Works without optional dependencies installed
-  Error handling tests (missing files, invalid JSON)
-  Fixed 2 existing tests (chroma/weaviate adaptors)
-  37/37 tests passing

## User-Facing Examples

Local ChromaDB:
  skill-seekers upload output/react-chroma.json --target chroma \
    --persist-directory ./chroma_db

Weaviate Cloud:
  skill-seekers upload output/react-weaviate.json --target weaviate \
    --use-cloud --cluster-url https://xxx.weaviate.network

With OpenAI embeddings:
  skill-seekers upload output/react-chroma.json --target chroma \
    --embedding-function openai --openai-api-key $OPENAI_API_KEY

## Files Changed
- src/skill_seekers/cli/adaptors/chroma.py (250 lines)
- src/skill_seekers/cli/adaptors/weaviate.py (200 lines)
- src/skill_seekers/cli/upload_skill.py (50 lines)
- pyproject.toml (15 lines)
- tests/test_upload_integration.py (NEW - 293 lines)
- tests/test_adaptors/test_chroma_adaptor.py (1 line)
- tests/test_adaptors/test_weaviate_adaptor.py (1 line)

Total: 7 files, ~810 lines added/modified

See PHASE2_COMPLETION_SUMMARY.md for detailed documentation.

Time: ~7 hours (estimated 6-8h)
Status:  COMPLETE - Ready for Phase 3

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 01:30:04 +03:00