yusyus 53d37e61dd docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)
Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 22:38:15 +03:00

FAISS Vector Database Example

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors, making it well suited to large-scale semantic search.

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Generate skill
python 1_generate_skill.py

# 3. Build FAISS index (requires OpenAI API key)
export OPENAI_API_KEY=sk-...
python 2_build_faiss_index.py

# 4. Query the index
python 3_query_example.py

What's Different About FAISS?

  • No database server: Pure Python library
  • Blazing fast: Optimized C++ implementation
  • Scales to billions: Efficient for massive datasets
  • Requires embeddings: You must generate vectors (we use OpenAI)

Key Features

Generate Embeddings

FAISS doesn't generate embeddings; you must provide them (this example uses OpenAI):

from openai import OpenAI
client = OpenAI()

# Generate embedding
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Your text here"
)
embedding = response.data[0].embedding  # 1536-dim vector

Build Index

import faiss
import numpy as np

# Create index (L2 distance)
dimension = 1536  # OpenAI ada-002
index = faiss.IndexFlatL2(dimension)

# Add vectors
vectors = np.array(embeddings).astype('float32')
index.add(vectors)

# Save to disk
faiss.write_index(index, "skill.index")
# Load index
index = faiss.read_index("skill.index")

# Query (returns distances + indices)
# The query must be a 2D float32 array, shape (n_queries, dimension)
query_vector = np.array([query_embedding], dtype='float32')  # shape (1, 1536)
distances, indices = index.search(query_vector, k=5)

Cost Estimate

OpenAI embeddings (text-embedding-ada-002): ~$0.10 per 1M tokens

  • 20 documents (~10K tokens): < $0.001
  • 1000 documents (~500K tokens): ~$0.05

Files Structure

  • 1_generate_skill.py - Generate the skill package for FAISS
  • 2_build_faiss_index.py - Generate embeddings and build the index
  • 3_query_example.py - Run search queries against the index
  • requirements.txt - Python dependencies

Note: FAISS is best for advanced users who need maximum performance at scale. For simpler use cases, try ChromaDB or Weaviate.