docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)

Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
yusyus
2026-02-07 22:38:15 +03:00
parent d84e5878a1
commit 53d37e61dd
21 changed files with 2506 additions and 0 deletions

View File

@@ -0,0 +1,26 @@
#!/usr/bin/env python3
"""Generate skill for FAISS (same as other examples)"""
import subprocess, sys
from pathlib import Path
print("=" * 60)
print("Step 1: Generating Skill for FAISS")
print("=" * 60)
# Scrape
subprocess.run([
"skill-seekers", "scrape",
"--config", "configs/flask.json",
"--max-pages", "20"
], check=True)
# Package
subprocess.run([
"skill-seekers", "package",
"output/flask",
"--target", "faiss"
], check=True)
output = Path("output/flask-faiss.json")
print(f"\n✅ Ready: {output} ({output.stat().st_size/1024:.1f} KB)")
print("Next: python 2_build_faiss_index.py (requires OPENAI_API_KEY)")

View File

@@ -0,0 +1,72 @@
#!/usr/bin/env python3
"""Build FAISS index with OpenAI embeddings"""
import json, sys, os
import numpy as np
from pathlib import Path
try:
import faiss
from openai import OpenAI
from rich.console import Console
except ImportError:
print("❌ Missing dependencies! Run: pip install -r requirements.txt")
sys.exit(1)
console = Console()
# Check API key
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
console.print("[red]❌ OPENAI_API_KEY not set![/red]")
console.print("Set it with: export OPENAI_API_KEY=sk-...")
sys.exit(1)
# Load data
console.print("📥 Loading skill data...")
with open("output/flask-faiss.json") as f:
data = json.load(f)
documents = data["documents"]
metadatas = data["metadatas"]
ids = data["ids"]
console.print(f"✅ Loaded {len(documents)} documents")
# Generate embeddings
console.print("\n🔄 Generating embeddings (this may take 30-60 seconds)...")
console.print(f" Cost: ~$0.001 for {len(documents)} documents")
client = OpenAI(api_key=api_key)
embeddings = []
for i, doc in enumerate(documents):
response = client.embeddings.create(
model="text-embedding-ada-002",
input=doc[:8000] # Truncate to max length
)
embeddings.append(response.data[0].embedding)
if (i + 1) % 5 == 0:
console.print(f" Progress: {i+1}/{len(documents)}")
console.print("✅ Embeddings generated!")
# Build FAISS index
console.print("\n🏗️ Building FAISS index...")
dimension = len(embeddings[0]) # 1536 for ada-002
vectors = np.array(embeddings).astype('float32')
# Create index (L2 distance)
index = faiss.IndexFlatL2(dimension)
index.add(vectors)
# Save everything
faiss.write_index(index, "flask.index")
with open("flask_metadata.json", "w") as f:
json.dump({"documents": documents, "metadatas": metadatas, "ids": ids}, f)
console.print(f"✅ Index saved: flask.index")
console.print(f"✅ Metadata saved: flask_metadata.json")
console.print(f"\n💡 Total vectors: {index.ntotal}")
console.print(f"💡 Dimension: {dimension}")
console.print("\n➡️ Next: python 3_query_example.py")

View File

@@ -0,0 +1,72 @@
#!/usr/bin/env python3
"""Query FAISS index"""
import json, sys, os
import numpy as np
try:
import faiss
from openai import OpenAI
from rich.console import Console
from rich.table import Table
except ImportError:
print("❌ Run: pip install -r requirements.txt")
sys.exit(1)
console = Console()
# Load index and metadata
console.print("📥 Loading FAISS index...")
index = faiss.read_index("flask.index")
with open("flask_metadata.json") as f:
data = json.load(f)
console.print(f"✅ Loaded {index.ntotal} vectors")
# Initialize OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
def search(query_text: str, k: int = 5):
"""Search FAISS index"""
console.print(f"\n[yellow]Query:[/yellow] {query_text}")
# Generate query embedding
response = client.embeddings.create(
model="text-embedding-ada-002",
input=query_text
)
query_vector = np.array([response.data[0].embedding]).astype('float32')
# Search
distances, indices = index.search(query_vector, k)
# Display results
table = Table(show_header=True, header_style="bold magenta")
table.add_column("#", width=3)
table.add_column("Distance", width=10)
table.add_column("Category", width=12)
table.add_column("Content Preview")
for i, (dist, idx) in enumerate(zip(distances[0], indices[0]), 1):
doc = data["documents"][idx]
meta = data["metadatas"][idx]
preview = doc[:80] + "..." if len(doc) > 80 else doc
table.add_row(
str(i),
f"{dist:.2f}",
meta.get("category", "N/A"),
preview
)
console.print(table)
console.print("[dim]💡 Distance: Lower = more similar[/dim]")
# Example queries
console.print("[bold green]FAISS Query Examples[/bold green]\n")
search("How do I create a Flask route?", k=3)
search("database models and ORM", k=3)
search("authentication and security", k=3)
console.print("\n✅ All examples completed!")

View File

@@ -0,0 +1,95 @@
# FAISS Vector Database Example
Facebook AI Similarity Search (FAISS) is a library for efficient similarity search of dense vectors. Perfect for large-scale semantic search.
## Quick Start
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Generate skill
python 1_generate_skill.py
# 3. Build FAISS index (requires OpenAI API key)
export OPENAI_API_KEY=sk-...
python 2_build_faiss_index.py
# 4. Query the index
python 3_query_example.py
```
## What's Different About FAISS?
- **No database server**: Pure Python library
- **Blazing fast**: Optimized C++ implementation
- **Scales to billions**: Efficient for massive datasets
- **Requires embeddings**: You must generate vectors (we use OpenAI)
## Key Features
### Generate Embeddings
FAISS doesn't generate embeddings - you must provide them:
```python
from openai import OpenAI
client = OpenAI()
# Generate embedding
response = client.embeddings.create(
model="text-embedding-ada-002",
input="Your text here"
)
embedding = response.data[0].embedding # 1536-dim vector
```
### Build Index
```python
import faiss
import numpy as np
# Create index (L2 distance)
dimension = 1536 # OpenAI ada-002
index = faiss.IndexFlatL2(dimension)
# Add vectors
vectors = np.array(embeddings).astype('float32')
index.add(vectors)
# Save to disk
faiss.write_index(index, "skill.index")
```
### Search
```python
# Load index
index = faiss.read_index("skill.index")
# Query (returns distances + indices)
distances, indices = index.search(query_vector, k=5)
```
## Cost Estimate
OpenAI embeddings: ~$0.10 per 1M tokens
- 20 documents (~10K tokens): < $0.001
- 1000 documents (~500K tokens): ~$0.05
## Files Structure
- `1_generate_skill.py` - Package for FAISS
- `2_build_faiss_index.py` - Generate embeddings & build index
- `3_query_example.py` - Search queries
## Resources
- **FAISS GitHub**: https://github.com/facebookresearch/faiss
- **FAISS Wiki**: https://github.com/facebookresearch/faiss/wiki
- **OpenAI Embeddings**: https://platform.openai.com/docs/guides/embeddings
---
**Note**: FAISS is best for advanced users who need maximum performance at scale. For simpler use cases, try ChromaDB or Weaviate.

View File

@@ -0,0 +1,6 @@
# FAISS Example Dependencies
skill-seekers>=2.10.0
faiss-cpu>=1.7.4 # or faiss-gpu for GPU support
openai>=1.0.0
numpy>=1.24.0
rich>=13.0.0