feat: Add FAISS similarity search adaptor (Task #12)
🎯 What's New

- FAISS adaptor for efficient similarity search
- JSON-based metadata management (secure & portable)
- Comprehensive usage examples with 3 index types
- Supports dynamic document addition and filtered search

📦 Implementation Details

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search, but it requires separate metadata management. Unlike Weaviate and Chroma, FAISS has no built-in metadata support (an index holds only vectors keyed by integer IDs), so we store metadata separately as JSON.

**Key Components:**

- src/skill_seekers/cli/adaptors/faiss_helpers.py (399 lines)
  - FAISSHelpers class inheriting from SkillAdaptor
  - _generate_id(): deterministic ID from a content hash (MD5)
  - format_skill_md(): converts docs to FAISS-compatible JSON
  - package(): creates JSON with documents, metadatas, ids, and config
  - upload(): provides comprehensive example code (370 lines)

**Output Format:**

    {
      "documents": ["doc1", "doc2", ...],
      "metadatas": [{"source": "...", "category": "..."}, ...],
      "ids": ["hash1", "hash2", ...],
      "config": {
        "index_type": "IndexFlatL2",
        "dimension": 1536,
        "metric": "L2"
      }
    }

**Security Consideration:**

- Uses JSON instead of pickle for metadata storage
- Avoids the arbitrary-code-execution risk of unpickling untrusted files
- More portable and human-readable

**Example Code Includes:**

1. Loading JSON data and generating embeddings (OpenAI text-embedding-ada-002)
2. Creating a FAISS index with 3 options:
   - IndexFlatL2 (exact search, <1M vectors)
   - IndexIVFFlat (fast approximate, >100k vectors)
   - IndexHNSWFlat (graph-based, very fast)
3. Saving the index and JSON metadata separately
4. Search with metadata filtering (post-processing)
5. Loading a saved index for reuse
6. Adding new documents dynamically

🔧 Files Changed

- src/skill_seekers/cli/adaptors/__init__.py
  - Added FAISSHelpers import
  - Registered 'faiss' in ADAPTORS dict
- src/skill_seekers/cli/package_skill.py
  - Added 'faiss' to --target choices
- src/skill_seekers/cli/main.py
  - Added 'faiss' to unified CLI --target choices

✅ Testing

- Tested with the ansible skill: skill-seekers-package output/ansible --target faiss
- Verified JSON structure with jq
- Output: ansible-faiss.json (1 document, 9,717 bytes ≈ 9.5 KiB)

📊 Week 2 Progress: 3/9 tasks complete

Task #12 Complete ✅
- Weaviate (Task #10) ✅
- Chroma (Task #11) ✅
- FAISS (Task #12) ✅ ← Just completed

Next: Task #13 (Qdrant adaptor)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
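The JSON layout and the post-processing metadata filter described above can be sketched in plain Python. This is a minimal sketch, not the adaptor's code. `generate_id`, `build_package` and `search` are hypothetical names. It assumes the ID is the MD5 hex digest of the document text alone. Toy 3-dimensional vectors stand in for 1536-dimensional ada-002 embeddings. A brute-force scan stands in for `faiss.IndexFlatL2`, which likewise reports squared L2 distances and returns row positions rather than the string IDs. That is why the JSON lists must stay in the same order the vectors were added.

```python
import hashlib
import json


# Hypothetical stand-in for _generate_id(): assumes the ID is the MD5 hex digest
# of the document text only (the real adaptor may hash other fields too).
def generate_id(content: str) -> str:
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def build_package(docs: list, dimension: int = 1536) -> dict:
    """Lay documents out as the parallel lists shown in the output format."""
    return {
        "documents": [d["content"] for d in docs],
        "metadatas": [d["metadata"] for d in docs],
        "ids": [generate_id(d["content"]) for d in docs],
        "config": {"index_type": "IndexFlatL2", "dimension": dimension, "metric": "L2"},
    }


def squared_l2(a: list, b: list) -> float:
    # IndexFlatL2 also reports squared Euclidean distances.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(package: dict, vectors: list, query: list, k: int, where=None, fetch_k=None) -> list:
    """Exact nearest-neighbour scan, then metadata filtering as post-processing.

    vectors[i] must be the embedding of package["documents"][i]: an index
    returns row positions, and the JSON lists map positions back to text/metadata.
    """
    fetch_k = fetch_k or len(vectors)
    ranked = sorted(range(len(vectors)), key=lambda i: squared_l2(vectors[i], query))[:fetch_k]
    hits = []
    for i in ranked:
        meta = package["metadatas"][i]
        # Post-filter: skip candidates whose metadata does not match every key in `where`.
        if where and any(meta.get(key) != value for key, value in where.items()):
            continue
        hits.append({
            "id": package["ids"][i],
            "document": package["documents"][i],
            "metadata": meta,
            "distance": squared_l2(vectors[i], query),
        })
        if len(hits) == k:
            break
    return hits


# Toy data: 3-d vectors stand in for 1536-d ada-002 embeddings.
docs = [
    {"content": "How to install the tool.", "metadata": {"source": "toy", "category": "setup"}},
    {"content": "How to write a playbook.", "metadata": {"source": "toy", "category": "guides"}},
    {"content": "How to list hosts.", "metadata": {"source": "toy", "category": "reference"}},
]
vectors = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

package = build_package(docs, dimension=3)
saved = json.dumps(package, indent=2)  # the JSON that would sit next to the saved index
loaded = json.loads(saved)

top2 = search(loaded, vectors, [0.1, 0.9, 0.0], k=2)
reference_only = search(loaded, vectors, [0.1, 0.9, 0.0], k=2, where={"category": "reference"})
```

Because filtering happens after ranking, a small `fetch_k` can return fewer than `k` hits when most nearby rows fail the filter. Requesting more neighbours than `k` before filtering is the usual workaround.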
@@ -155,7 +155,7 @@ Examples:
 parser.add_argument(
     "--target",
-    choices=["claude", "gemini", "openai", "markdown", "langchain", "llama-index", "weaviate", "chroma"],
+    choices=["claude", "gemini", "openai", "markdown", "langchain", "llama-index", "weaviate", "chroma", "faiss"],
     default="claude",
     help="Target LLM platform (default: claude)",
 )
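This diff spells out the full target list by hand. The commit makes the same kind of edit in main.py and registers 'faiss' in the ADAPTORS dict. One way to keep these in step is to derive `choices` from the registry. The sketch below makes several assumptions. `ADAPTORS` here is a hypothetical stand-in with placeholder string values. `build_parser`, the `skill_dir` positional and the `package-skill-sketch` prog name are illustrative. This commit does not confirm whether the real CLI can import its registry this way.

```python
import argparse

# Hypothetical registry modelled on the ADAPTORS dict in adaptors/__init__.py.
# Values are placeholder strings, not the real adaptor classes (only FAISSHelpers
# is named in this commit).
ADAPTORS = {
    "claude": "<claude adaptor>",
    "gemini": "<gemini adaptor>",
    "openai": "<openai adaptor>",
    "markdown": "<markdown adaptor>",
    "langchain": "<langchain adaptor>",
    "llama-index": "<llama-index adaptor>",
    "weaviate": "<weaviate adaptor>",
    "chroma": "<chroma adaptor>",
    "faiss": "FAISSHelpers",
}


def build_parser(adaptors: dict) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="package-skill-sketch")
    parser.add_argument("skill_dir", help="Path to the skill directory")
    parser.add_argument(
        "--target",
        choices=list(adaptors),  # derived from the registry instead of a hand-written list
        default="claude",
        help="Target LLM platform (default: claude)",
    )
    return parser


parser = build_parser(ADAPTORS)
args = parser.parse_args(["output/ansible", "--target", "faiss"])
adaptor = ADAPTORS[args.target]  # look up whatever is registered for this target
```

With this layout, adding a new target such as the planned Qdrant adaptor would mean one registry entry rather than three separate list edits. argparse still rejects unknown targets with a usage error.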