yusyus
|
b475b51ad1
|
feat: Add custom embedding pipeline (Task #17)
- Multi-provider support (OpenAI, Local)
- Batch processing with configurable batch size
- Memory and disk caching for efficiency
- Cost tracking and estimation
- Dimension validation
- 18 tests passing (100%)
Files:
- embedding_pipeline.py: Core pipeline engine
- test_embedding_pipeline.py: Comprehensive tests
Features:
- EmbeddingProvider abstraction
- OpenAIEmbeddingProvider with pricing
- LocalEmbeddingProvider (simulated)
- EmbeddingCache (memory + disk)
- CostTracker for API usage
- Batch processing optimization
Supported Models:
- text-embedding-ada-002 (1536d, $0.10/1M tokens)
- text-embedding-3-small (1536d, $0.02/1M tokens)
- text-embedding-3-large (3072d, $0.13/1M tokens)
- Local models (any dimension, free)
Week 2: 8/9 tasks complete (89%)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-02-07 13:48:05 +03:00 |
|