docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)

Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 22:38:15 +03:00
parent d84e5878a1
commit 53d37e61dd
21 changed files with 2506 additions and 0 deletions
--- a/examples/faiss-example/README.md
+++ b/examples/faiss-example/README.md
@@ -0,0 +1,95 @@
+# FAISS Vector Database Example
+
+Facebook AI Similarity Search (FAISS) is a library for efficient similarity search over dense vectors. It is well suited to large-scale semantic search.
+
+## Quick Start
+
+```bash
+# 1. Install dependencies
+pip install -r requirements.txt
+
+# 2. Generate skill
+python 1_generate_skill.py
+
+# 3. Build FAISS index (requires OpenAI API key)
+export OPENAI_API_KEY=sk-...
+python 2_build_faiss_index.py
+
+# 4. Query the index
+python 3_query_example.py
+```
+
+## What's Different About FAISS?
+
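+- **No database server**: Runs in-process as a library (C++ core with Python bindings)
+- **Fast**: Optimized C++ implementation
+- **Scales to billions**: Approximate index types handle massive datasets; the `IndexFlatL2` used here performs exact brute-force search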
+- **Requires embeddings**: You must generate vectors (we use OpenAI)
+
+## Key Features
+
+### Generate Embeddings
+
+FAISS doesn't generate embeddings - you must provide them:
+
+```python
+from openai import OpenAI
+client = OpenAI()
+
+# Generate embedding
+response = client.embeddings.create(
+    model="text-embedding-ada-002",
+    input="Your text here"
+)
+embedding = response.data[0].embedding  # 1536-dim vector
+```
+
+### Build Index
+
+```python
+import faiss
+import numpy as np
+
+# Create index (L2 distance)
+dimension = 1536  # OpenAI ada-002
+index = faiss.IndexFlatL2(dimension)
+
+# Add vectors (`embeddings` is the list of vectors from the previous step; FAISS requires float32)
+vectors = np.array(embeddings).astype('float32')
+index.add(vectors)
+
+# Save to disk
+faiss.write_index(index, "skill.index")
+```
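+
+If cosine similarity is preferred over L2 distance, one common approach is to normalize the vectors and use an inner-product index. The sketch below is an illustration rather than part of the example scripts: the random vectors are a hypothetical stand-in for real embeddings, and the 8-dimensional size is chosen only to keep it small.
+
+```python
+import faiss
+import numpy as np
+
+# Hypothetical stand-in for real embeddings: 4 random 8-dim float32 vectors.
+rng = np.random.default_rng(0)
+vectors = rng.random((4, 8), dtype=np.float32)
+
+# Normalize rows in place so that inner product equals cosine similarity.
+faiss.normalize_L2(vectors)
+
+# Inner-product index instead of IndexFlatL2.
+index = faiss.IndexFlatIP(vectors.shape[1])
+index.add(vectors)
+
+# Query with the first stored vector; it should be its own best match.
+scores, ids = index.search(vectors[:1], 2)
+best_id = int(ids[0][0])
+best_score = float(scores[0][0])
+```
+
+With an inner-product index, higher scores mean closer matches, which is the opposite of the L2 index, where lower distances are better. Query vectors need the same normalization before searching.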
+
+### Search
+
+```python
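+import faiss
+import numpy as np
+
+# Load index
+index = faiss.read_index("skill.index")
+
+# Query: FAISS expects a float32 array of shape (n_queries, dimension).
+# `query_embedding` is the embedding of the query text (a list of 1536 floats).
+query_vector = np.array([query_embedding], dtype="float32")
+distances, indices = index.search(query_vector, k=5)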
+```
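+
+FAISS stores only vectors, so the returned indices have to be mapped back to the source text by the caller. The following is a minimal sketch of that step, assuming a `documents` list kept in the same order in which the vectors were added; `collect_hits` and the sample values are hypothetical and not taken from `3_query_example.py`.
+
+```python
+def collect_hits(distances, indices, documents):
+    """Pair each returned index with its document text and distance."""
+    hits = []
+    # Row 0 holds the results for the first (and here only) query.
+    for dist, idx in zip(distances[0], indices[0]):
+        if idx == -1:  # FAISS pads with -1 when fewer than k results exist
+            continue
+        hits.append({"text": documents[idx], "distance": float(dist)})
+    return hits
+
+
+# Hypothetical search output for one query with k=3 over a 2-document index.
+documents = ["Install with pip", "Build the index"]
+hits = collect_hits([[0.12, 0.48, 3.4e38]], [[1, 0, -1]], documents)
+```
+
+For `IndexFlatL2` the reported distances are squared L2 distances, so smaller values indicate closer matches.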
+
+## Cost Estimate
+
+OpenAI embeddings: ~$0.10 per 1M tokens
+- 20 documents (~10K tokens): ~$0.001
+- 1000 documents (~500K tokens): ~$0.05
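+
+The arithmetic behind these figures can be sketched as below. The helper name `embedding_cost_usd` is hypothetical, and the default rate is simply the ~$0.10 per 1M tokens quoted above, so check current pricing before relying on it.
+
+```python
+def embedding_cost_usd(total_tokens, usd_per_million_tokens=0.10):
+    """Estimate embedding cost from a token count and a per-million-token rate."""
+    return total_tokens / 1_000_000 * usd_per_million_tokens
+
+
+small = embedding_cost_usd(10_000)    # 20 documents, ~10K tokens
+large = embedding_cost_usd(500_000)   # 1000 documents, ~500K tokens
+```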
+
+## File Structure
+
+- `1_generate_skill.py` - Generate the skill and package it for FAISS
+- `2_build_faiss_index.py` - Generate embeddings & build index
+- `3_query_example.py` - Search queries
+
+## Resources
+
+- **FAISS GitHub**: https://github.com/facebookresearch/faiss
+- **FAISS Wiki**: https://github.com/facebookresearch/faiss/wiki
+- **OpenAI Embeddings**: https://platform.openai.com/docs/guides/embeddings
+
+---
+
+**Note**: FAISS is best for advanced users who need maximum performance at scale. For simpler use cases, try ChromaDB or Weaviate.