# Weaviate Vector Database Example

This example demonstrates how to use Skill Seekers with Weaviate, a powerful vector database with hybrid search capabilities (keyword + semantic).

## What You'll Learn

- How to generate skills in Weaviate format
- How to create a Weaviate schema and upload data
- How to perform hybrid searches (keyword + vector)
- How to filter by metadata categories

## Prerequisites

### 1. Weaviate Instance

**Option A: Weaviate Cloud (Recommended for production)**
- Sign up at https://console.weaviate.cloud/
- Create a free sandbox cluster
- Get your cluster URL and API key

**Option B: Local Docker (Recommended for development)**
```bash
docker run -d \
  --name weaviate \
  -p 8080:8080 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  semitechnologies/weaviate:latest
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```

## Step-by-Step Guide

### Step 1: Generate Skill from Documentation

First, we'll scrape React documentation and package it for Weaviate:

```bash
python 1_generate_skill.py
```

This script will:
1. Scrape React docs (limited to 20 pages for demo)
2. Package the skill in Weaviate format (JSON with schema + objects)
3. Save to `sample_output/react-weaviate.json`

**Expected Output:**
```
✅ Weaviate data packaged successfully!
📦 Output: output/react-weaviate.json
📊 Total objects: 21
📂 Categories: overview (1), guides (8), api (12)
```

**What's in the JSON?**
```json
{
  "schema": {
    "class": "React",
    "description": "React documentation skill",
    "properties": [
      {"name": "content", "dataType": ["text"]},
      {"name": "source", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]},
      ...
    ]
  },
  "objects": [
    {
      "id": "uuid-here",
      "properties": {
        "content": "React is a JavaScript library...",
        "source": "react",
        "category": "overview",
        ...
      }
    }
  ],
  "class_name": "React"
}
```

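To sanity-check a packaged file before uploading, something like the following sketch can help. It relies only on the keys shown above (`schema`, `objects`, `class_name`); the helper name `summarize_package` and the tiny inline sample are illustrative assumptions, not part of Skill Seekers.

```python
import json
from collections import Counter


def summarize_package(package: dict) -> dict:
    """Return the class name, property names, and per-category object counts."""
    categories = Counter(
        obj["properties"].get("category", "unknown") for obj in package["objects"]
    )
    return {
        "class_name": package["class_name"],
        "properties": [prop["name"] for prop in package["schema"]["properties"]],
        "total_objects": len(package["objects"]),
        "categories": dict(categories),
    }


# Inline stand-in for the real file (hypothetical data). In practice, load it with:
#   with open("sample_output/react-weaviate.json") as f:
#       package = json.load(f)
sample_package = {
    "schema": {
        "class": "React",
        "properties": [
            {"name": "content", "dataType": ["text"]},
            {"name": "category", "dataType": ["text"]},
        ],
    },
    "objects": [
        {"id": "id-1", "properties": {"content": "Overview text", "category": "overview"}},
        {"id": "id-2", "properties": {"content": "API text", "category": "api"}},
    ],
    "class_name": "React",
}

summary = summarize_package(sample_package)
print(json.dumps(summary, indent=2))
```

For the real React package, the category counts should line up with the "Categories" line printed by `1_generate_skill.py`.
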
### Step 2: Upload to Weaviate

Now we'll create the schema and upload all objects to Weaviate:

```bash
python 2_upload_to_weaviate.py
```

**For local Docker:**
```bash
python 2_upload_to_weaviate.py --url http://localhost:8080
```

**For Weaviate Cloud:**
```bash
python 2_upload_to_weaviate.py \
  --url https://your-cluster.weaviate.network \
  --api-key YOUR_API_KEY
```

This script will:
1. Connect to your Weaviate instance
2. Create the schema (class + properties)
3. Batch upload all objects
4. Verify the upload was successful

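The four steps above could be sketched roughly as follows. This is not the contents of `2_upload_to_weaviate.py`; it is a minimal illustration that assumes the v3 `weaviate-client` API (the same style as the `client.schema` and `client.query` calls used elsewhere in this README). The function name `upload_package` and the `batch_size=100` default are assumptions.

```python
def upload_package(client, package: dict, batch_size: int = 100) -> int:
    """Create the class, batch-upload all objects, and return the stored count.

    `client` is assumed to be a connected v3 client, for example:
        import weaviate
        client = weaviate.Client("http://localhost:8080")
    """
    class_name = package["class_name"]

    # Step 2: create the schema (class + properties) from the packaged definition.
    client.schema.create_class(package["schema"])

    # Step 3: batch upload; the context manager flushes any remainder on exit.
    client.batch.configure(batch_size=batch_size)
    with client.batch as batch:
        for obj in package["objects"]:
            batch.add_data_object(
                data_object=obj["properties"],
                class_name=class_name,
                uuid=obj.get("id"),
            )

    # Step 4: verify by asking Weaviate how many objects the class now holds.
    result = client.query.aggregate(class_name).with_meta_count().do()
    return result["data"]["Aggregate"][class_name][0]["meta"]["count"]
```

Comparing the returned count with `len(package["objects"])` is a cheap way to confirm nothing was dropped during the batch.
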
**Expected Output:**
```
🔗 Connecting to Weaviate at http://localhost:8080...
✅ Weaviate is ready!

📊 Creating schema: React
✅ Schema created successfully!

📤 Uploading 21 objects in batches...
✅ Batch 1/1 uploaded (21 objects)

✅ Successfully uploaded 21 documents to Weaviate
🔍 Class 'React' now contains 21 objects
```

### Step 3: Query and Search

Now the fun part: querying your knowledge base!

```bash
python 3_query_example.py
```

**For local Docker:**
```bash
python 3_query_example.py --url http://localhost:8080
```

**For Weaviate Cloud:**
```bash
python 3_query_example.py \
  --url https://your-cluster.weaviate.network \
  --api-key YOUR_API_KEY
```

This script demonstrates:
1. **Keyword Search**: Traditional text search
2. **Hybrid Search**: Combines keyword + vector similarity
3. **Metadata Filtering**: Filter by category
4. **Limit and Offset**: Pagination

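As a rough sketch of how these four features fit into a single query, the helper below chains them with the v3 `weaviate-client` query builder. It is an illustration rather than the code in `3_query_example.py`; the function name `search_docs`, its defaults, and the returned property list are assumptions.

```python
def search_docs(client, query, alpha=0.5, category=None, limit=5, offset=0,
                class_name="React"):
    """Run a hybrid search with optional category filter and pagination.

    `client` is assumed to be a connected v3 `weaviate.Client`.
    Setting alpha=0 makes this a keyword-only (BM25) search.
    """
    builder = (
        client.query.get(class_name, ["content", "category", "source"])
        .with_hybrid(query=query, alpha=alpha)  # keyword + vector blend
        .with_limit(limit)                      # page size
        .with_offset(offset)                    # skip earlier pages
    )
    if category is not None:
        # Metadata filter on the `category` property.
        builder = builder.with_where({
            "path": ["category"],
            "operator": "Equal",
            "valueText": category,
        })
    response = builder.do()
    return response["data"]["Get"][class_name]


# Example usage (requires a running Weaviate instance with the React class):
#   import weaviate
#   client = weaviate.Client("http://localhost:8080")
#   for hit in search_docs(client, "How do I use React hooks?", category="api"):
#       print(hit["category"], hit["content"][:80])
```
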
**Example Queries:**

**Query 1: Hybrid Search**
```
Query: "How do I use React hooks?"
Alpha: 0.5 (50% keyword, 50% vector)

Results:
1. Category: api
   Snippet: Hooks are functions that let you "hook into" React state and lifecycle...

2. Category: guides
   Snippet: To use a Hook, you need to call it at the top level of your component...
```

**Query 2: Filter by Category**
```
Query: API reference
Category: api

Results:
1. useState Hook - Manage component state
2. useEffect Hook - Perform side effects
3. useContext Hook - Access context values
```

## Understanding Weaviate Features

### Hybrid Search (`alpha` parameter)

Weaviate's killer feature is hybrid search, which combines:
- **Keyword Search (BM25)**: Traditional text matching
- **Vector Search (ANN)**: Semantic similarity

Control the balance with `alpha`:
- `alpha=0`: Pure keyword search (BM25 only)
- `alpha=0.5`: Balanced (the value used in this example, recommended as a starting point)
- `alpha=1`: Pure vector search (semantic only)

**When to use what:**
- **Exact terms** (API names, error messages): `alpha=0` to `alpha=0.3`
- **Concepts** (how to do X, why does Y): `alpha=0.7` to `alpha=1`
- **General queries**: `alpha=0.5` (balanced)

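To build intuition for what `alpha` does, here is a deliberately simplified sketch that blends one keyword score and one vector score for a single document. It assumes both scores are already normalized to the 0 to 1 range; Weaviate's actual fusion operates on ranked or normalized result sets and differs in detail, so treat `blend_scores` as a hypothetical illustration only.

```python
def blend_scores(keyword_score: float, vector_score: float, alpha: float) -> float:
    """Weighted blend: alpha=0 keeps only the keyword score, alpha=1 only the vector score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * keyword_score + alpha * vector_score


# A document that matches the exact query terms well but is semantically weaker
# (both scores are made-up numbers for illustration).
for alpha in (0.0, 0.5, 1.0):
    print(alpha, blend_scores(keyword_score=0.9, vector_score=0.3, alpha=alpha))
```

Raising `alpha` lowers this document's combined score, which is why exact-term lookups tend to work better with low `alpha` values.
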
### Metadata Filtering

Filter results by any property:
```python
# Chained onto a query built with client.query.get(...)
.with_where({
    "path": ["category"],
    "operator": "Equal",
    "valueText": "api"
})
```

Commonly used operators:
- `Equal`, `NotEqual`
- `GreaterThan`, `GreaterThanEqual`, `LessThan`, `LessThanEqual`
- `Like`
- `And`, `Or` (combine several conditions via `operands`)

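Filter dictionaries get verbose once conditions are combined, so small builder helpers can keep them readable. The sketch below produces plain dictionaries in the same shape as the `with_where` examples in this README; the helper names `equal`, `all_of`, and `any_of` are hypothetical and not part of the Weaviate client.

```python
def equal(prop: str, value: str) -> dict:
    """Match objects whose text property `prop` equals `value`."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions so that every one of them must match."""
    return {"operator": "And", "operands": list(conditions)}


def any_of(*conditions: dict) -> dict:
    """Combine conditions so that at least one of them must match."""
    return {"operator": "Or", "operands": list(conditions)}


# API reference pages only: category == "api" AND type == "reference".
where = all_of(equal("category", "api"), equal("type", "reference"))
print(where)
```

The resulting `where` dictionary can be passed directly to `.with_where(where)`.
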
### Schema Design

Our schema includes:
- **content**: The actual documentation text (vectorized)
- **source**: Skill name (e.g., "react")
- **category**: Document category (e.g., "api", "guides")
- **file**: Source file name
- **type**: Document type ("overview" or "reference")
- **version**: Skill version

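If you generate or edit packages by hand, a quick check that every object carries the six properties listed above can catch mistakes before upload. This is a minimal sketch; the helper name `missing_properties` and the sample object are assumptions for illustration.

```python
EXPECTED_PROPERTIES = ("content", "source", "category", "file", "type", "version")


def missing_properties(obj: dict) -> list:
    """Return the expected property names that are absent from one packaged object."""
    present = obj.get("properties", {})
    return [name for name in EXPECTED_PROPERTIES if name not in present]


# Hypothetical object that is missing `file` and `version`.
incomplete = {
    "id": "id-1",
    "properties": {
        "content": "React is a JavaScript library...",
        "source": "react",
        "category": "overview",
        "type": "overview",
    },
}
print(missing_properties(incomplete))
```
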
## Customization

### Generate Your Own Skill

Want to use a different documentation source? Easy:

```python
# 1_generate_skill.py (modify line 10)
"--config", "configs/vue.json",  # Change to your config
```

Or scrape from scratch:
```bash
skill-seekers scrape --config configs/your_framework.json
skill-seekers package output/your_framework --target weaviate
```

### Adjust Search Parameters

In `3_query_example.py`, modify:
```python
# Adjust hybrid search balance
alpha=0.7  # More semantic, less keyword

# Adjust result count
.with_limit(10)  # Get more results

# Add more filters
.with_where({
    "operator": "And",
    "operands": [
        {"path": ["category"], "operator": "Equal", "valueText": "api"},
        {"path": ["type"], "operator": "Equal", "valueText": "reference"}
    ]
})
```

## Troubleshooting

### Connection Refused
```
Error: Connection refused to http://localhost:8080
```

**Solution:** Ensure Weaviate is running:
```bash
docker ps | grep weaviate
# If not running, start it:
docker start weaviate
```

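You can also probe the instance from Python before running the upload or query scripts. The sketch below uses only the standard library and calls Weaviate's readiness endpoint (`/v1/.well-known/ready`); the function name `weaviate_is_ready` and the 3-second timeout are assumptions.

```python
import urllib.error
import urllib.request


def weaviate_is_ready(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200, else False."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


print(weaviate_is_ready("http://localhost:8080"))
```

A `False` result right after `docker start weaviate` may simply mean the container is still booting, so it can be worth retrying after a few seconds.
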
### Schema Already Exists
```
Error: Class 'React' already exists
```

**Solution:** Delete the existing class:
```python
# From Python, with a connected client (or use the Weaviate API directly)
client.schema.delete_class("React")
```

Or use the example's built-in reset:
```bash
python 2_upload_to_weaviate.py --reset
```

### Empty Results
```
Query returned 0 results
```

**Possible causes:**
1. **No embeddings**: Weaviate needs a vectorizer configured (this example uses the default)
2. **Wrong class name**: Check that the class name in your query matches the uploaded class
3. **Data not uploaded**: Verify with `client.query.aggregate("React").with_meta_count().do()`

**Solution:** Check object count:
```python
result = client.query.aggregate("React").with_meta_count().do()
print(result)  # Should show {"data": {"Aggregate": {"React": [{"meta": {"count": 21}}]}}}
```

## Next Steps

1. **Try other skills**: Generate skills for your favorite frameworks
2. **Production deployment**: Use Weaviate Cloud for scalability
3. **Add custom vectorizers**: Use OpenAI, Cohere, or local models
4. **Build RAG apps**: Integrate with LangChain or LlamaIndex

## Resources

- **Weaviate Docs**: https://weaviate.io/developers/weaviate
- **Hybrid Search**: https://weaviate.io/developers/weaviate/search/hybrid
- **Python Client**: https://weaviate.io/developers/weaviate/client-libraries/python
- **Skill Seekers Docs**: https://github.com/yourusername/skill-seekers

## File Structure

```
weaviate-example/
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── 1_generate_skill.py        # Generate Weaviate-format skill
├── 2_upload_to_weaviate.py    # Upload to Weaviate instance
├── 3_query_example.py         # Query demonstrations
└── sample_output/             # Example outputs
    ├── react-weaviate.json    # Generated skill (21 objects)
    └── query_results.txt      # Sample query results
```

---

**Last Updated:** February 2026
**Tested With:** Weaviate v1.25.0, Python 3.10+, skill-seekers v2.10.0