yusyus
|
ff4196897b
|
feat: Add FAISS similarity search adaptor (Task #12)
🎯 What's New
- FAISS adaptor for efficient similarity search
- JSON-based metadata management (secure & portable)
- Comprehensive usage examples with 3 index types
- Supports dynamic document addition and filtered search
📦 Implementation Details
FAISS (Facebook AI Similarity Search) is a library for efficient similarity
search but requires separate metadata management. Unlike Weaviate/Chroma,
FAISS doesn't have built-in metadata support, so we store it separately as JSON.
**Key Components:**
- src/skill_seekers/cli/adaptors/faiss_helpers.py (399 lines)
  - FAISSHelpers class inheriting from SkillAdaptor
  - _generate_id(): Deterministic ID from content hash (MD5)
  - format_skill_md(): Converts docs to FAISS-compatible JSON
  - package(): Creates JSON with documents, metadatas, ids, config
  - upload(): Provides comprehensive example code (370 lines)
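A minimal sketch of the deterministic-ID idea behind _generate_id(). It assumes the ID is the MD5 hex digest of the UTF-8 document text alone; the real method may mix in other fields, and the function name and signature here are illustrative only.

```python
import hashlib


def generate_id(content: str) -> str:
    """Return a stable 32-character hex ID for a document.

    Assumption: only the document text feeds the hash. MD5 serves as a
    fingerprint here, not as a security primitive.
    """
    return hashlib.md5(content.encode("utf-8")).hexdigest()


# The same text always yields the same ID, so re-packaging a skill
# produces identical ids for unchanged documents.
doc_id = generate_id("hello")
```

Because the ID depends only on content, two documents with identical text would collide; a packager relying on this scheme would need to deduplicate or add a distinguishing field.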
**Output Format:**
{
  "documents": ["doc1", "doc2", ...],
  "metadatas": [{"source": "...", "category": "..."}, ...],
  "ids": ["hash1", "hash2", ...],
  "config": {
    "index_type": "IndexFlatL2",
    "dimension": 1536,
    "metric": "L2"
  }
}
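A sketch of a structural check for the format above. It assumes that documents, metadatas and ids are parallel arrays, so that position i in each list describes the same document (and, for a flat index filled in order, row i of the FAISS index). The function name and the exact rules are hypothetical, not taken from faiss_helpers.py.

```python
import json

# Keys shown in the "config" object of the output format.
REQUIRED_CONFIG_KEYS = {"index_type", "dimension", "metric"}


def validate_package(raw: str) -> dict:
    """Parse a package JSON string and check its parallel-array structure."""
    package = json.loads(raw)
    n = len(package["documents"])
    for key in ("metadatas", "ids"):
        if len(package[key]) != n:
            raise ValueError(f"{key} has {len(package[key])} entries, expected {n}")
    if len(set(package["ids"])) != n:
        raise ValueError("ids must be unique")
    missing = REQUIRED_CONFIG_KEYS - set(package["config"])
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    return package
```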
**Security Consideration:**
- Uses JSON instead of pickle for metadata storage
- Avoids arbitrary code execution risk
- More portable and human-readable
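A sketch of the JSON sidecar approach, assuming the metadata lives in a single file next to the saved index. The helper names and file layout are hypothetical. The relevant property is that parsing JSON only ever builds plain data (dicts, lists, strings, numbers, booleans, None), whereas unpickling untrusted data can execute code.

```python
import json
from pathlib import Path


def save_metadata(path, documents, metadatas, ids):
    """Write the metadata sidecar as human-readable UTF-8 JSON."""
    payload = {"documents": documents, "metadatas": metadatas, "ids": ids}
    text = json.dumps(payload, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_metadata(path):
    """Read the sidecar back; the result contains plain data types only."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```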
**Example Code Includes:**
1. Loading JSON data and generating embeddings (OpenAI text-embedding-ada-002, 1536 dimensions)
2. Creating a FAISS index with 3 options:
   - IndexFlatL2 (exact search, suited to <1M vectors)
   - IndexIVFFlat (fast approximate search, suited to >100k vectors)
   - IndexHNSWFlat (graph-based approximate search, very fast)
3. Saving the index and the JSON metadata separately
4. Searching with metadata filtering (applied to the results as post-processing)
5. Loading a saved index for reuse
6. Adding new documents dynamically
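A sketch of step 4, search with metadata filtering as post-processing. To stay dependency-free, a brute-force pure-Python search stands in for the FAISS index: like IndexFlatL2 it returns exact neighbours ranked by squared L2 distance, identified by row position. The function names, the `where` equality filter and the `overfetch` factor are assumptions made for illustration, not the example code shipped in upload().

```python
import heapq


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(vectors, query, k):
    """Exact k-nearest search; returns (distance, row) pairs, nearest first."""
    scored = ((l2_squared(v, query), row) for row, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


def filtered_search(vectors, metadatas, query, k, where, overfetch=4):
    """Fetch k * overfetch candidates, then keep those whose metadata matches.

    The index itself knows nothing about metadata, so the filter runs on
    the returned row positions, which index into the parallel metadatas list.
    """
    candidates = search(vectors, query, k * overfetch)
    hits = [
        (dist, row)
        for dist, row in candidates
        if all(metadatas[row].get(key) == value for key, value in where.items())
    ]
    return hits[:k]
```

Post-filtering can return fewer than k results when few candidates match the filter; raising the over-fetch factor trades extra search work for better recall.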
🔧 Files Changed
- src/skill_seekers/cli/adaptors/__init__.py
  - Added FAISSHelpers import
  - Registered 'faiss' in ADAPTORS dict
- src/skill_seekers/cli/package_skill.py
  - Added 'faiss' to --target choices
- src/skill_seekers/cli/main.py
  - Added 'faiss' to unified CLI --target choices
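A sketch of how a registry dict and the --target choices can fit together. The class bodies, the shape of ADAPTORS (name mapped to class) and the output file naming are stand-ins chosen for illustration; the project's actual code may differ.

```python
import argparse


class SkillAdaptor:
    """Stand-in for the project's base class (hypothetical body)."""

    def package(self, skill_dir: str) -> str:
        raise NotImplementedError


class FAISSHelpers(SkillAdaptor):
    """Stand-in adaptor: only computes an output file name."""

    def package(self, skill_dir: str) -> str:
        skill_name = skill_dir.rstrip("/").split("/")[-1]
        return f"{skill_name}-faiss.json"


# Registry: one entry per supported --target value.
ADAPTORS = {"faiss": FAISSHelpers}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skill-seekers-package")
    parser.add_argument("skill_dir")
    # Deriving choices from the registry keeps the CLI and the dict in sync.
    parser.add_argument("--target", choices=sorted(ADAPTORS), required=True)
    return parser


def run(argv):
    """Parse arguments, look up the adaptor and return the output name."""
    args = build_parser().parse_args(argv)
    return ADAPTORS[args.target]().package(args.skill_dir)
```

Deriving the choices from the registry is one design option; keeping a separate hard-coded list in each CLI module, as the file list above suggests, means each new adaptor must be added in several places.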
✅ Testing
- Tested with ansible skill: skill-seekers-package output/ansible --target faiss
- Verified JSON structure with jq
- Output: ansible-faiss.json (1 document, 9,717 bytes, about 9.5 KiB)
📊 Week 2 Progress: 3/9 tasks complete
Task #12 Complete ✅
- Weaviate (Task #10) ✅
- Chroma (Task #11) ✅
- FAISS (Task #12) ✅ ← Just completed
Next: Task #13 (Qdrant adaptor)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-02-05 23:47:42 +03:00 |
|