yusyus 53d37e61dd docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)
Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 22:38:15 +03:00


# FAISS Vector Database Example
Facebook AI Similarity Search (FAISS) is a library for efficient similarity search of dense vectors. Perfect for large-scale semantic search.
## Quick Start
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Generate skill
python 1_generate_skill.py
# 3. Build FAISS index (requires OpenAI API key)
export OPENAI_API_KEY=sk-...
python 2_build_faiss_index.py
# 4. Query the index
python 3_query_example.py
```
## What's Different About FAISS?
- **No database server**: Runs in-process as a library; nothing to deploy or operate
- **Blazing fast**: Optimized C++ implementation
- **Scales to billions**: Efficient for massive datasets
- **Requires embeddings**: You must generate vectors (we use OpenAI)
## Key Features
### Generate Embeddings
FAISS doesn't generate embeddings - you must provide them:
```python
from openai import OpenAI

client = OpenAI()

# Generate an embedding for one text
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Your text here",
)
embedding = response.data[0].embedding  # 1536-dim vector
```
### Build Index
```python
import faiss
import numpy as np

# Create a flat index using L2 (Euclidean) distance
dimension = 1536  # OpenAI ada-002 embedding size
index = faiss.IndexFlatL2(dimension)

# Add vectors (FAISS requires float32)
vectors = np.array(embeddings).astype("float32")
index.add(vectors)

# Save to disk
faiss.write_index(index, "skill.index")
```
### Search
```python
import faiss
import numpy as np

# Load index
index = faiss.read_index("skill.index")

# The query must be a 2D float32 array of shape (n_queries, dimension)
query_vector = np.array([embedding], dtype="float32")

# Query (returns distances + indices of the 5 nearest vectors)
distances, indices = index.search(query_vector, 5)
```
## Cost Estimate
OpenAI embeddings: ~$0.10 per 1M tokens
- 20 documents (~10K tokens): < $0.001
- 1000 documents (~500K tokens): ~$0.05
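The arithmetic behind these figures is linear in token count, so it is easy to estimate for your own corpus (assuming the ~$0.10 per 1M tokens rate above; check current OpenAI pricing before relying on it):

```python
# Rough embedding-cost estimate at ~$0.10 per 1M tokens (ada-002 rate above)
PRICE_PER_MILLION_TOKENS = 0.10

def embedding_cost(total_tokens: int) -> float:
    """Return the estimated cost in USD for embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(embedding_cost(10_000))   # 20 docs, ~10K tokens  -> 0.001
print(embedding_cost(500_000))  # 1000 docs, ~500K tokens -> 0.05
```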
## Files Structure
- `1_generate_skill.py` - Generate the skill content to index
- `2_build_faiss_index.py` - Generate embeddings & build index
- `3_query_example.py` - Search queries
## Resources
- **FAISS GitHub**: https://github.com/facebookresearch/faiss
- **FAISS Wiki**: https://github.com/facebookresearch/faiss/wiki
- **OpenAI Embeddings**: https://platform.openai.com/docs/guides/embeddings
---
**Note**: FAISS is best for advanced users who need maximum performance at scale. For simpler use cases, try ChromaDB or Weaviate.