yusyus
|
064405c052
|
fix: resolve 18 bugs and code quality issues across adaptors, CLI, and chunking pipeline
Bug fixes:
- Fix --var flag silently dropped in create routing (args.workflow_var → args.var)
- Fix double _score_code_quality() call in word scraper
- Add .docx file extension validation in WordToSkillConverter
- Fix weaviate ImportError masked by generic Exception handler
- Fix RAG chunking crash using non-existent converter.output_dir
Chunking pipeline improvements:
- Wire --chunk-overlap-tokens through entire package pipeline
(package_skill → adaptor.package → format_skill_md → _maybe_chunk_content → RAGChunker)
- Add auto-scaling overlap: max(50, chunk_tokens//10) when chunk size is non-default
- Rename --no-preserve-code to --no-preserve-code-blocks (backward-compat alias kept)
- Replace hardcoded 512/50 chunk defaults with DEFAULT_CHUNK_TOKENS/DEFAULT_CHUNK_OVERLAP_TOKENS
constants across all 12 concrete adaptors, rag_chunker, base, and package_skill
Code quality:
- Extract shared _generate_openai_embeddings() and _generate_st_embeddings() to SkillAdaptor
base class, removing ~150 lines of duplication from chroma/weaviate/pinecone
- Add Pinecone adaptor with full upload support (pinecone_adaptor.py)
Tests (14 new):
- chunk_overlap_tokens parameter wiring, auto-scaling overlap, preserve_code_blocks flag
- .docx/.doc/no-extension file validation, --var flag routing E2E
- Embedding method inheritance verification, backward-compatible flag aliases
Docs:
- Update CHANGELOG, CLI_REFERENCE, API_REFERENCE, packaging guide (EN+ZH)
- Update README test count badge (1880+ → 2283+)
All 2283 tests passing, 8 skipped, 0 failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-28 21:57:59 +03:00 |
|
yusyus
|
51787e57bc
|
style: Fix 411 ruff lint issues (Kimi's issue #4)
Auto-fixed lint issues with ruff --fix and --unsafe-fixes:
Issue #4: Ruff Lint Issues
- Before: 447 errors (originally reported as ~5,500)
- After: 55 errors remaining
- Fixed: 411 errors (92% reduction)
Auto-fixes applied:
- 156 UP006: List/Dict → list/dict (PEP 585)
- 63 UP045: Optional[X] → X | None (PEP 604)
- 52 F401: Removed unused imports
- 52 UP035: Fixed deprecated imports
- 34 E712: True/False comparisons → not/bool()
- 17 F841: Removed unused variables
- Plus 37 other auto-fixable issues
Remaining 55 errors (non-critical):
- 39 B904: Exception chaining (best practice)
- 5 F401: Unused imports (edge cases)
- 3 SIM105: Could use contextlib.suppress
- 8 other minor style issues
These remaining issues are code quality improvements, not critical bugs.
Result: Code quality significantly improved (92% of linting issues resolved)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-02-08 12:46:38 +03:00 |
|
yusyus
|
4f9a5a553b
|
feat: Phase 2 - Real upload capabilities for ChromaDB and Weaviate
Implemented complete upload functionality for vector databases, replacing
stub implementations with real upload capabilities including embedding
generation, multiple connection modes, and comprehensive error handling.
## ChromaDB Upload (chroma.py)
- ✅ Multiple connection modes (PersistentClient, HttpClient)
- ✅ 3 embedding strategies (OpenAI, sentence-transformers, default)
- ✅ Batch processing (100 docs per batch)
- ✅ Progress tracking for large uploads
- ✅ Collection management (create if not exists)
## Weaviate Upload (weaviate.py)
- ✅ Local and cloud connections
- ✅ Schema management (auto-create)
- ✅ Batch upload with progress tracking
- ✅ OpenAI embedding support
## Upload Command (upload_skill.py)
- ✅ Added 8 new CLI arguments for vector DBs
- ✅ Platform-specific kwargs handling
- ✅ Enhanced output formatting (collection/class names)
- ✅ Backward compatibility (LLM platforms unchanged)
## Dependencies (pyproject.toml)
- ✅ Added 4 optional dependency groups:
- chroma = ["chromadb>=0.4.0"]
- weaviate = ["weaviate-client>=3.25.0"]
- sentence-transformers = ["sentence-transformers>=2.2.0"]
- rag-upload = [all vector DB deps]
## Testing (test_upload_integration.py)
- ✅ 15 new tests across 4 test classes
- ✅ Works without optional dependencies installed
- ✅ Error handling tests (missing files, invalid JSON)
- ✅ Fixed 2 existing tests (chroma/weaviate adaptors)
- ✅ 37/37 tests passing
## User-Facing Examples
Local ChromaDB:
skill-seekers upload output/react-chroma.json --target chroma \
--persist-directory ./chroma_db
Weaviate Cloud:
skill-seekers upload output/react-weaviate.json --target weaviate \
--use-cloud --cluster-url https://xxx.weaviate.network
With OpenAI embeddings:
skill-seekers upload output/react-chroma.json --target chroma \
--embedding-function openai --openai-api-key $OPENAI_API_KEY
## Files Changed
- src/skill_seekers/cli/adaptors/chroma.py (250 lines)
- src/skill_seekers/cli/adaptors/weaviate.py (200 lines)
- src/skill_seekers/cli/upload_skill.py (50 lines)
- pyproject.toml (15 lines)
- tests/test_upload_integration.py (NEW - 293 lines)
- tests/test_adaptors/test_chroma_adaptor.py (1 line)
- tests/test_adaptors/test_weaviate_adaptor.py (1 line)
Total: 7 files, ~810 lines added/modified
See PHASE2_COMPLETION_SUMMARY.md for detailed documentation.
Time: ~7 hours (estimated 6-8h)
Status: ✅ COMPLETE - Ready for Phase 3
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-02-08 01:30:04 +03:00 |
|