feat: Phase 2 - Real upload capabilities for ChromaDB and Weaviate

Implemented complete upload functionality for vector databases, replacing
stub implementations with real upload capabilities including embedding
generation, multiple connection modes, and comprehensive error handling.

## ChromaDB Upload (chroma.py)
-  Multiple connection modes (PersistentClient, HttpClient)
-  3 embedding strategies (OpenAI, sentence-transformers, default)
-  Batch processing (100 docs per batch)
-  Progress tracking for large uploads
-  Collection management (create if not exists)

## Weaviate Upload (weaviate.py)
-  Local and cloud connections
-  Schema management (auto-create)
-  Batch upload with progress tracking
-  OpenAI embedding support

## Upload Command (upload_skill.py)
-  Added 8 new CLI arguments for vector DBs
-  Platform-specific kwargs handling
-  Enhanced output formatting (collection/class names)
-  Backward compatibility (LLM platforms unchanged)

## Dependencies (pyproject.toml)
-  Added 4 optional dependency groups:
  - chroma = ["chromadb>=0.4.0"]
  - weaviate = ["weaviate-client>=3.25.0"]
  - sentence-transformers = ["sentence-transformers>=2.2.0"]
  - rag-upload = [all vector DB deps]

## Testing (test_upload_integration.py)
-  15 new tests across 4 test classes
-  Works without optional dependencies installed
-  Error handling tests (missing files, invalid JSON)
-  Fixed 2 existing tests (chroma/weaviate adaptors)
-  37/37 tests passing

## User-Facing Examples

Local ChromaDB:
  skill-seekers upload output/react-chroma.json --target chroma \
    --persist-directory ./chroma_db

Weaviate Cloud:
  skill-seekers upload output/react-weaviate.json --target weaviate \
    --use-cloud --cluster-url https://xxx.weaviate.network

With OpenAI embeddings:
  skill-seekers upload output/react-chroma.json --target chroma \
    --embedding-function openai --openai-api-key $OPENAI_API_KEY

## Files Changed
- src/skill_seekers/cli/adaptors/chroma.py (250 lines)
- src/skill_seekers/cli/adaptors/weaviate.py (200 lines)
- src/skill_seekers/cli/upload_skill.py (50 lines)
- pyproject.toml (15 lines)
- tests/test_upload_integration.py (NEW - 293 lines)
- tests/test_adaptors/test_chroma_adaptor.py (1 line)
- tests/test_adaptors/test_weaviate_adaptor.py (1 line)

Total: 7 files, ~810 lines added/modified

See PHASE2_COMPLETION_SUMMARY.md for detailed documentation.

Time: ~7 hours (estimated 6-8h)
Status:  COMPLETE - Ready for Phase 3

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
yusyus
2026-02-08 01:30:04 +03:00
parent 59e77f42b3
commit 4f9a5a553b
7 changed files with 782 additions and 243 deletions

View File

@@ -106,6 +106,25 @@ azure = [
"azure-storage-blob>=12.19.0",
]
# RAG vector database upload support
chroma = [
"chromadb>=0.4.0",
]
weaviate = [
"weaviate-client>=3.25.0",
]
sentence-transformers = [
"sentence-transformers>=2.2.0",
]
rag-upload = [
"chromadb>=0.4.0",
"weaviate-client>=3.25.0",
"sentence-transformers>=2.2.0",
]
# All cloud storage providers combined
all-cloud = [
"boto3>=1.34.0",
@@ -135,6 +154,8 @@ all = [
"boto3>=1.34.0",
"google-cloud-storage>=2.10.0",
"azure-storage-blob>=12.19.0",
"chromadb>=0.4.0",
"weaviate-client>=3.25.0",
"fastapi>=0.109.0",
"sentence-transformers>=2.3.0",
"numpy>=1.24.0",