# Weaviate Vector Database Example

This example demonstrates how to use Skill Seekers with Weaviate, a powerful vector database with hybrid search capabilities (keyword + semantic).

## What You'll Learn

- How to generate skills in Weaviate format
- How to create a Weaviate schema and upload data
- How to perform hybrid searches (keyword + vector)
- How to filter by metadata categories

## Prerequisites

### 1. Weaviate Instance

**Option A: Weaviate Cloud (Recommended for production)**

- Sign up at https://console.weaviate.cloud/
- Create a free sandbox cluster
- Get your cluster URL and API key

**Option B: Local Docker (Recommended for development)**

Pin the image to the version this example was tested with rather than relying on a moving `latest` tag:

```bash
docker run -d \
  --name weaviate \
  -p 8080:8080 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  semitechnologies/weaviate:1.25.0
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```

## Step-by-Step Guide

### Step 1: Generate Skill from Documentation

First, we'll scrape React documentation and package it for Weaviate:

```bash
python 1_generate_skill.py
```

This script will:

1. Scrape React docs (limited to 20 pages for demo)
2. Package the skill in Weaviate format (JSON with schema + objects)
3. Save to `output/react-weaviate.json` (a pre-generated copy is included in `sample_output/`)

**Expected Output:**

```
✅ Weaviate data packaged successfully!
📦 Output: output/react-weaviate.json
📊 Total objects: 21
📂 Categories: overview (1), guides (8), api (12)
```

**What's in the JSON?**

```json
{
  "schema": {
    "class": "React",
    "description": "React documentation skill",
    "properties": [
      {"name": "content", "dataType": ["text"]},
      {"name": "source", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]},
      ...
    ]
  },
  "objects": [
    {
      "id": "uuid-here",
      "properties": {
        "content": "React is a JavaScript library...",
        "source": "react",
        "category": "overview",
        ...
      }
    }
  ],
  "class_name": "React"
}
```

### Step 2: Upload to Weaviate

Now we'll create the schema and upload all objects to Weaviate:

```bash
python 2_upload_to_weaviate.py
```

**For local Docker:**

```bash
python 2_upload_to_weaviate.py --url http://localhost:8080
```

**For Weaviate Cloud:**

```bash
python 2_upload_to_weaviate.py \
  --url https://your-cluster.weaviate.network \
  --api-key YOUR_API_KEY
```

This script will:

1. Connect to your Weaviate instance
2. Create the schema (class + properties)
3. Batch upload all objects
4. Verify the upload was successful

**Expected Output:**

```
🔗 Connecting to Weaviate at http://localhost:8080...
✅ Weaviate is ready!
📊 Creating schema: React
✅ Schema created successfully!
📤 Uploading 21 objects in batches...
✅ Batch 1/1 uploaded (21 objects)
✅ Successfully uploaded 21 documents to Weaviate
🔍 Class 'React' now contains 21 objects
```

### Step 3: Query and Search

Now the fun part: querying your knowledge base!

```bash
python 3_query_example.py
```

**For local Docker:**

```bash
python 3_query_example.py --url http://localhost:8080
```

**For Weaviate Cloud:**

```bash
python 3_query_example.py \
  --url https://your-cluster.weaviate.network \
  --api-key YOUR_API_KEY
```

This script demonstrates:

1. **Keyword Search**: Traditional text search
2. **Hybrid Search**: Combines keyword + vector similarity
3. **Metadata Filtering**: Filter by category
4. **Limit and Offset**: Pagination

**Example Queries:**

**Query 1: Hybrid Search**

```
Query: "How do I use React hooks?"
Alpha: 0.5 (50% keyword, 50% vector)

Results:
1. Category: api
   Snippet: Hooks are functions that let you "hook into" React state and lifecycle...

2. Category: guides
   Snippet: To use a Hook, you need to call it at the top level of your component...
```

**Query 2: Filter by Category**

```
Query: API reference
Category: api

Results:
1. useState Hook - Manage component state
2. useEffect Hook - Perform side effects
3. useContext Hook - Access context values
```

## Understanding Weaviate Features

### Hybrid Search (`alpha` parameter)

Weaviate's killer feature is hybrid search, which combines:

- **Keyword Search (BM25)**: Traditional text matching
- **Vector Search (ANN)**: Semantic similarity

Control the balance with `alpha`:

- `alpha=0`: Pure keyword search (BM25 only)
- `alpha=0.5`: Equal weight on keyword and vector scores (used in this example; a sensible starting point)
- `alpha=1`: Pure vector search (semantic only)

**When to use what:**

- **Exact terms** (API names, error messages): `alpha=0` to `alpha=0.3`
- **Concepts** (how to do X, why does Y): `alpha=0.7` to `alpha=1`
- **General queries**: `alpha=0.5` (balanced)

### Metadata Filtering

Filter results by any property:

```python
.with_where({
    "path": ["category"],
    "operator": "Equal",
    "valueText": "api"
})
```

Supported operators include:

- `Equal`, `NotEqual`
- `GreaterThan`, `GreaterThanEqual`, `LessThan`, `LessThanEqual`
- `Like` (wildcard text matching)
- `And`, `Or` (combine several conditions listed under `operands`)

### Schema Design

Our schema includes:

- **content**: The actual documentation text (vectorized)
- **source**: Skill name (e.g., "react")
- **category**: Document category (e.g., "api", "guides")
- **file**: Source file name
- **type**: Document type ("overview" or "reference")
- **version**: Skill version

## Customization

### Generate Your Own Skill

Want to use a different documentation source? Each source is described by a config file under `configs/` (for example, `configs/vue.json`), so switching sources means pointing the generator at a different config.
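Whichever source you choose, the packaged file is expected to follow the same layout as the React sample in Step 1: a `schema`, a list of `objects`, and a top-level `class_name`. Before uploading a newly generated skill, a quick local check can catch an empty or mislabeled package. The following is a minimal sketch that assumes exactly that layout; `summarize_package` is a hypothetical helper, not part of Skill Seekers or the example scripts.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_package(path):
    """Summarize a Weaviate-format skill package (hypothetical helper).

    Assumes the layout shown in Step 1: a top-level "schema" with a "class",
    a list of "objects" (each with a "properties" dict), and a "class_name".
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))

    class_name = data["class_name"]
    schema_class = data["schema"]["class"]
    # The class created in Weaviate and the class the objects target should agree.
    if schema_class != class_name:
        raise ValueError(f"schema class {schema_class!r} != class_name {class_name!r}")

    objects = data["objects"]
    # Count objects per category; objects without one are grouped under "<none>".
    categories = Counter(
        obj.get("properties", {}).get("category", "<none>") for obj in objects
    )
    # Objects with missing or empty text give search nothing to match on.
    empty_content = sum(
        1 for obj in objects if not obj.get("properties", {}).get("content")
    )

    return {
        "class_name": class_name,
        "total_objects": len(objects),
        "categories": dict(categories),
        "empty_content": empty_content,
    }
```

For the React run, calling `summarize_package("output/react-weaviate.json")` should report the same totals the packaging step printed (21 objects across `overview`, `guides`, and `api`); a mismatch or a non-zero `empty_content` is worth investigating before running `2_upload_to_weaviate.py`.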
In `1_generate_skill.py`, change the config path on line 10:

```python
# 1_generate_skill.py (modify line 10)
"--config", "configs/vue.json",  # Change to your config
```

Or scrape from scratch:

```bash
skill-seekers scrape --config configs/your_framework.json
skill-seekers package output/your_framework --target weaviate
```

### Adjust Search Parameters

In `3_query_example.py`, adjust the query chain. The snippet below assumes the v3 Python client's query builder (the same `.with_where` style used above) and an already-connected `client`:

```python
response = (
    client.query
    .get("React", ["content", "category", "source"])
    # Adjust hybrid search balance: more semantic, less keyword
    .with_hybrid(query="How do I use React hooks?", alpha=0.7)
    # Combine filters: only API reference documents
    .with_where({
        "operator": "And",
        "operands": [
            {"path": ["category"], "operator": "Equal", "valueText": "api"},
            {"path": ["type"], "operator": "Equal", "valueText": "reference"},
        ],
    })
    # Adjust result count: get more results
    .with_limit(10)
    .do()
)
```

## Troubleshooting

### Connection Refused

```
Error: Connection refused to http://localhost:8080
```

**Solution:** Ensure Weaviate is running:

```bash
docker ps | grep weaviate

# If not running, start it:
docker start weaviate
```

### Schema Already Exists

```
Error: Class 'React' already exists
```

**Solution:** Delete the existing class (this also deletes all of its objects), for example with the v3 Python client:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")
client.schema.delete_class("React")
```

Or use the example's built-in reset:

```bash
python 2_upload_to_weaviate.py --reset
```

### Empty Results

```
Query returned 0 results
```

**Possible causes:**

1. **No embeddings**: Vector and hybrid search need vectors, so Weaviate must have a vectorizer module configured (or the objects must be uploaded with their own vectors)
2. **Wrong class name**: Check that the class name in your query matches the one in the schema (`React` here)
3. **Data not uploaded**: Verify with `client.query.aggregate("React").with_meta_count().do()`

**Solution:** Check the object count:

```python
result = client.query.aggregate("React").with_meta_count().do()
print(result)
# Should show {"data": {"Aggregate": {"React": [{"meta": {"count": 21}}]}}}
```

## Next Steps

1. **Try other skills**: Generate skills for your favorite frameworks
2. **Production deployment**: Use Weaviate Cloud for scalability
3. **Add custom vectorizers**: Use OpenAI, Cohere, or local models
4. **Build RAG apps**: Integrate with LangChain or LlamaIndex

## Resources

- **Weaviate Docs**: https://weaviate.io/developers/weaviate
- **Hybrid Search**: https://weaviate.io/developers/weaviate/search/hybrid
- **Python Client**: https://weaviate.io/developers/weaviate/client-libraries/python
- **Skill Seekers Docs**: https://github.com/yourusername/skill-seekers

## File Structure

```
weaviate-example/
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── 1_generate_skill.py        # Generate Weaviate-format skill
├── 2_upload_to_weaviate.py    # Upload to Weaviate instance
├── 3_query_example.py         # Query demonstrations
└── sample_output/             # Example outputs
    ├── react-weaviate.json    # Generated skill (21 objects)
    └── query_results.txt      # Sample query results
```

---

**Last Updated:** February 2026

**Tested With:** Weaviate v1.25.0, Python 3.10+, skill-seekers v2.10.0