# FAISS Vector Database Example

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search of dense vectors, well suited to large-scale semantic search.

## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Generate skill
python 1_generate_skill.py

# 3. Build FAISS index (requires OpenAI API key)
export OPENAI_API_KEY=sk-...
python 2_build_faiss_index.py

# 4. Query the index
python 3_query_example.py
```

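## What's Different About FAISS?

- **No database server**: FAISS runs in-process as a library (C++ core with Python bindings), so there is no server to deploy or connect to
- **Blazing fast**: optimized C++ implementation, with optional GPU support
- **Scales to billions of vectors**: approximate and compressed index types (e.g. IVF, product quantization) keep search fast and memory use manageable on massive datasets
- **Requires embeddings**: you must generate the vectors yourself (this example uses OpenAI)
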
## Key Features

### Generate Embeddings

FAISS doesn't generate embeddings; you must provide them. This example uses the OpenAI embeddings API:

```python
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Generate an embedding for a single piece of text
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Your text here",
)
embedding = response.data[0].embedding  # list of 1536 floats
```
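
The embeddings endpoint also accepts a list of strings as `input`, so a whole document set can be embedded in a few requests instead of one request per chunk. The following is a minimal sketch built around a hypothetical helper, `embed_in_batches`. It takes any `embed_fn` (a callable that maps a list of strings to a list of vectors) and stacks the results into the float32 matrix FAISS expects. The default batch size of 100 is an arbitrary choice, not a documented API limit.

```python
from typing import Callable, List, Sequence

import numpy as np

# Hypothetical type: a function that embeds a batch of texts, one vector per text
EmbedFn = Callable[[List[str]], List[List[float]]]


def embed_in_batches(
    texts: Sequence[str], embed_fn: EmbedFn, batch_size: int = 100
) -> np.ndarray:
    """Embed `texts` in batches and return a float32 array of shape (n, d)."""
    if not texts:
        raise ValueError("texts must not be empty")
    rows: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors = embed_fn(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed_fn must return one vector per input text")
        rows.extend(vectors)
    # FAISS only accepts float32 input
    return np.asarray(rows, dtype="float32")
```

With the client above, `embed_fn` could be `lambda batch: [d.embedding for d in client.embeddings.create(model="text-embedding-ada-002", input=batch).data]`. Because the callable is injected, the batching logic can also be exercised with a fake embedder, without any API calls.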

### Build Index

```python
import faiss
import numpy as np

# Create an exact (brute-force) index using L2 distance
dimension = 1536  # OpenAI text-embedding-ada-002
index = faiss.IndexFlatL2(dimension)

# Add vectors: `embeddings` is the list of document embeddings from the
# previous step; FAISS requires a float32 array of shape (n, dimension)
vectors = np.array(embeddings, dtype="float32")
index.add(vectors)

# Save to disk
faiss.write_index(index, "skill.index")
```
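
`IndexFlatL2` performs exact, brute-force search: every query is compared against every stored vector, and the distances it reports are *squared* Euclidean distances. The NumPy sketch below reproduces that behaviour on small data, which can help when sanity-checking results. `flat_l2_search` is a hypothetical helper, not part of FAISS. FAISS computes the same result with optimized kernels, and its approximate indexes (such as `IndexIVFFlat`) avoid scanning every vector.

```python
import numpy as np


def flat_l2_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Mirrors the (distances, indices) shape of IndexFlatL2.search: both arrays
    are (n_queries, k). Unlike FAISS, which pads missing results with -1,
    this sketch simply caps k at the number of stored vectors.
    """
    database = np.asarray(database, dtype="float32")
    queries = np.asarray(queries, dtype="float32")
    if queries.ndim == 1:
        queries = queries[None, :]  # promote a single vector to shape (1, d)
    k = min(k, database.shape[0])
    # Squared distance between every query and every stored vector: (nq, n)
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    # Positions of the k smallest distances per query, nearest first
    indices = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(sq_dists, indices, axis=1)
    return distances, indices
```

OpenAI embeddings are normalized to unit length. For unit vectors, squared L2 distance equals 2 minus twice the cosine similarity. Ranking by L2 therefore gives the same order as ranking by cosine similarity, which makes `IndexFlatL2` a reasonable default here.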

### Search

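```python
import faiss
import numpy as np

# Load index
index = faiss.read_index("skill.index")

# FAISS expects a 2-D float32 array of shape (n_queries, dimension);
# query_embedding is the query text embedded with the same model as the documents
query_vector = np.array([query_embedding], dtype="float32")

# Query (returns distances + indices, each of shape (n_queries, k))
distances, indices = index.search(query_vector, k=5)
```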
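
FAISS stores only vectors, not the text or metadata behind them. With an `IndexFlatL2` built as above, each returned index is the vector's position in insertion order, and result slots that could not be filled are marked with `-1`. The following is a minimal sketch that assumes the document chunks are kept in a Python list in the same order their vectors were added. `lookup_results` is a hypothetical helper, not a FAISS API.

```python
from typing import List, Sequence, Tuple

import numpy as np


def lookup_results(
    distances: np.ndarray,
    indices: np.ndarray,
    documents: Sequence[str],
) -> List[List[Tuple[str, float]]]:
    """Map FAISS search output back to the stored document texts.

    `documents[i]` must be the text whose vector was the i-th one added to
    the index. Entries with index -1 (no result) are skipped.
    """
    results = []
    for row_dists, row_ids in zip(distances, indices):
        hits = [
            (documents[int(i)], float(d))
            for d, i in zip(row_dists, row_ids)
            if i != -1
        ]
        results.append(hits)
    return results
```

Called as `lookup_results(distances, indices, chunks)`, this returns one list of `(text, distance)` pairs per query, nearest first.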

## Cost Estimate

OpenAI `text-embedding-ada-002` embeddings: ~$0.10 per 1M tokens

- 20 documents (~10K tokens): ~$0.001
- 1000 documents (~500K tokens): ~$0.05
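
Each estimate above is simply the token count multiplied by the per-token price. The following is a minimal sketch built around a hypothetical helper, `embedding_cost_usd`, which uses the ~$0.10 per 1M tokens figure quoted here as its default. Pass the current rate if pricing has changed. Token counts for `text-embedding-ada-002` can be measured with the `tiktoken` package's `cl100k_base` encoding.

```python
def embedding_cost_usd(num_tokens: int, usd_per_million_tokens: float = 0.10) -> float:
    """Estimated cost in USD of embedding `num_tokens` tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * usd_per_million_tokens
```

For example, `embedding_cost_usd(500_000)` gives roughly 0.05, matching the 1000-document estimate.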

## File Structure

- `1_generate_skill.py` - Generates the skill package for FAISS
- `2_build_faiss_index.py` - Generates embeddings and builds the index
- `3_query_example.py` - Runs example search queries

## Resources

- **FAISS GitHub**: https://github.com/facebookresearch/faiss
- **FAISS Wiki**: https://github.com/facebookresearch/faiss/wiki
- **OpenAI Embeddings**: https://platform.openai.com/docs/guides/embeddings

---

**Note**: FAISS is best suited to advanced users who need maximum performance at scale. For simpler use cases, try ChromaDB or Weaviate.