yusyus 53d37e61dd docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)

Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-07 22:38:15 +03:00


Weaviate Vector Database Example

This example demonstrates how to use Skill Seekers with Weaviate, a powerful vector database with hybrid search capabilities (keyword + semantic).

What You'll Learn

How to generate skills in Weaviate format
How to create a Weaviate schema and upload data
How to perform hybrid searches (keyword + vector)
How to filter by metadata categories

Prerequisites

1. Weaviate Instance

Option A: Weaviate Cloud (Recommended for production)

Sign up at https://console.weaviate.cloud/
Create a free sandbox cluster
Get your cluster URL and API key

Option B: Local Docker (Recommended for development)

docker run -d \
  --name weaviate \
  -p 8080:8080 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  semitechnologies/weaviate:latest

2. Python Dependencies

pip install -r requirements.txt

Step-by-Step Guide

Step 1: Generate Skill from Documentation

First, we'll scrape React documentation and package it for Weaviate:

python 1_generate_skill.py

This script will:

Scrape React docs (limited to 20 pages for demo)
Package the skill in Weaviate format (JSON with schema + objects)
Save to sample_output/react-weaviate.json

Expected Output:

✅ Weaviate data packaged successfully!
📦 Output: output/react-weaviate.json
📊 Total objects: 21
📂 Categories: overview (1), guides (8), api (12)

What's in the JSON?

{
  "schema": {
    "class": "React",
    "description": "React documentation skill",
    "properties": [
      {"name": "content", "dataType": ["text"]},
      {"name": "source", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]},
      ...
    ]
  },
  "objects": [
    {
      "id": "uuid-here",
      "properties": {
        "content": "React is a JavaScript library...",
        "source": "react",
        "category": "overview",
        ...
      }
    }
  ],
  "class_name": "React"
}
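
To get a feel for the packaged file, the structure above can be inspected with a few lines of Python. This is only a minimal sketch: `summarize` is a hypothetical helper, and the inline `sample` is made-up data that stands in for the generated JSON, using only the keys shown in the excerpt.

```python
import json
from collections import Counter


def summarize(data):
    """Return (class name, object count, per-category counts) for a packaged skill."""
    categories = Counter(
        obj["properties"].get("category", "unknown") for obj in data["objects"]
    )
    return data["class_name"], len(data["objects"]), dict(categories)


# Made-up stand-in for the generated file; in practice you would json.load() it.
sample = json.loads("""
{
  "schema": {"class": "React", "properties": [{"name": "content", "dataType": ["text"]}]},
  "objects": [
    {"id": "uuid-1", "properties": {"content": "first document", "category": "overview"}},
    {"id": "uuid-2", "properties": {"content": "second document", "category": "api"}},
    {"id": "uuid-3", "properties": {"content": "third document", "category": "api"}}
  ],
  "class_name": "React"
}
""")

print(summarize(sample))
```

Run against the real file, the per-category counts should line up with the "Categories" line in the expected output above.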

Step 2: Upload to Weaviate

Now we'll create the schema and upload all objects to Weaviate:

python 2_upload_to_weaviate.py

For local Docker:

python 2_upload_to_weaviate.py --url http://localhost:8080

For Weaviate Cloud:

python 2_upload_to_weaviate.py \
  --url https://your-cluster.weaviate.network \
  --api-key YOUR_API_KEY

This script will:

Connect to your Weaviate instance
Create the schema (class + properties)
Batch upload all objects
Verify the upload was successful
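
The first three steps can be sketched roughly as follows. This is not the bundled script: `upload_skill` is a hypothetical helper, and it assumes `client` is a weaviate-client v3 `weaviate.Client` (the same client generation as the `client.schema` and `client.query` calls used elsewhere in this README) and that `data` is the JSON from Step 1.

```python
def upload_skill(client, data, batch_size=100):
    """Create the class, batch-upload every object, and return how many were sent.

    Assumes a weaviate-client v3 style client (is_ready, schema, batch).
    """
    if not client.is_ready():
        raise RuntimeError("Weaviate is not ready")

    # data["schema"] already has the class-definition shape (class + properties).
    client.schema.create_class(data["schema"])

    client.batch.configure(batch_size=batch_size)
    sent = 0
    # Leaving the context manager flushes any objects still queued.
    with client.batch as batch:
        for obj in data["objects"]:
            batch.add_data_object(
                data_object=obj["properties"],
                class_name=data["class_name"],
                uuid=obj["id"],
            )
            sent += 1
    return sent
```

With the v3 client installed, this could be called as `upload_skill(weaviate.Client("http://localhost:8080"), data)`. Verification is left to the aggregate count query shown under Troubleshooting.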

Expected Output:

🔗 Connecting to Weaviate at http://localhost:8080...
✅ Weaviate is ready!

📊 Creating schema: React
✅ Schema created successfully!

📤 Uploading 21 objects in batches...
✅ Batch 1/1 uploaded (21 objects)

✅ Successfully uploaded 21 documents to Weaviate
🔍 Class 'React' now contains 21 objects

Step 3: Query and Search

Now the fun part: querying your knowledge base!

python 3_query_example.py

For local Docker:

python 3_query_example.py --url http://localhost:8080

For Weaviate Cloud:

python 3_query_example.py \
  --url https://your-cluster.weaviate.network \
  --api-key YOUR_API_KEY

This script demonstrates:

Keyword Search: Traditional text search
Hybrid Search: Combines keyword + vector similarity
Metadata Filtering: Filter by category
Limit and Offset: Pagination
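
A query combining these features might look like the sketch below. It assumes a weaviate-client v3 `weaviate.Client` and the `content` and `category` properties from the schema; `hybrid_search` is a hypothetical wrapper, not a function from 3_query_example.py.

```python
def hybrid_search(client, class_name, query, alpha=0.5, category=None, limit=3, offset=0):
    """Run a hybrid query, optionally restricted to one category, and return the hits."""
    builder = (
        client.query.get(class_name, ["content", "category"])
        .with_hybrid(query=query, alpha=alpha)  # keyword + vector, weighted by alpha
        .with_limit(limit)                      # page size
        .with_offset(offset)                    # number of hits to skip
    )
    if category is not None:
        builder = builder.with_where(
            {"path": ["category"], "operator": "Equal", "valueText": category}
        )

    response = builder.do()
    if "errors" in response:
        raise RuntimeError(response["errors"])
    # GraphQL responses nest the hits under data -> Get -> <class name>.
    return response["data"]["Get"][class_name]
```

For example, `hybrid_search(client, "React", "How do I use React hooks?", category="api")` would return up to three hits from the api category.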

Example Queries:

Query 1: Hybrid Search

Query: "How do I use React hooks?"
Alpha: 0.5 (50% keyword, 50% vector)

Results:
1. Category: api
   Snippet: Hooks are functions that let you "hook into" React state and lifecycle...

2. Category: guides
   Snippet: To use a Hook, you need to call it at the top level of your component...

Query 2: Filter by Category

Query: API reference
Category: api

Results:
1. useState Hook - Manage component state
2. useEffect Hook - Perform side effects
3. useContext Hook - Access context values

Understanding Weaviate Features

Hybrid Search (`alpha` parameter)

Weaviate's killer feature is hybrid search, which combines:

Keyword Search (BM25): Traditional text matching
Vector Search (ANN): Semantic similarity

Control the balance with alpha:

alpha=0: Pure keyword search (BM25 only)
alpha=0.5: Balanced (used in this example and a reasonable starting point; note that Weaviate itself defaults to 0.75 when alpha is omitted)
alpha=1: Pure vector search (semantic only)

When to use what:

Exact terms (API names, error messages): alpha=0 to alpha=0.3
Concepts (how to do X, why does Y): alpha=0.7 to alpha=1
General queries: alpha=0.5 (balanced)
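
As a rough mental model of what alpha does, the sketch below blends two already-normalized scores linearly. It is a simplified illustration rather than Weaviate's actual fusion code: the `blend` helper, the document names, and the scores are all made up for the example.

```python
def blend(keyword_score, vector_score, alpha):
    """Weighted mix of two scores in [0, 1]: alpha=0 is keyword only, alpha=1 is vector only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1 - alpha) * keyword_score + alpha * vector_score


# Hypothetical normalized scores per document: (keyword, vector).
docs = {
    "useState API reference": (0.9, 0.4),   # strong exact-term match
    "Thinking in React guide": (0.2, 0.8),  # strong conceptual match
}

for alpha in (0.0, 0.5, 1.0):
    best = max(docs, key=lambda name: blend(*docs[name], alpha))
    print(f"alpha={alpha}: top result is {best!r}")
```

With these made-up numbers, the exact-term match wins at low alpha and the conceptual match wins at alpha=1, which is the behaviour the guidance above relies on.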

Metadata Filtering

Filter results by any property:

.with_where({
    "path": ["category"],
    "operator": "Equal",
    "valueText": "api"
})

Supported operators:

Equal, NotEqual
GreaterThan, GreaterThanEqual, LessThan, LessThanEqual
Like
And, Or (combine several conditions via operands)
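
Filters are plain dictionaries, so small builder functions can keep them readable. The helpers below (`equal` and `combine`) are hypothetical names for this sketch, and they cover only text properties and the And/Or combinators.

```python
def equal(prop, value):
    """Filter matching objects whose text property `prop` equals `value`."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def combine(operator, *operands):
    """Join several filters with "And" or "Or"."""
    if operator not in ("And", "Or"):
        raise ValueError("operator must be 'And' or 'Or'")
    if len(operands) < 2:
        raise ValueError("need at least two filters to combine")
    return {"operator": operator, "operands": list(operands)}


# API reference pages only: category == "api" AND type == "reference".
where = combine("And", equal("category", "api"), equal("type", "reference"))
print(where)
```

The resulting dictionary can be passed straight to `.with_where(...)`.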

Schema Design

Our schema includes:

content: The actual documentation text (vectorized)
source: Skill name (e.g., "react")
category: Document category (e.g., "api", "guides")
file: Source file name
type: Document type ("overview" or "reference")
version: Skill version
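
Before uploading, it can be useful to confirm that each object carries all six properties. A minimal sketch, assuming objects shaped like those in the Step 1 JSON; `missing_properties` is a hypothetical helper.

```python
# Property names taken from the schema list above.
EXPECTED_PROPERTIES = {"content", "source", "category", "file", "type", "version"}


def missing_properties(obj):
    """Return the expected property names that `obj` lacks, sorted alphabetically."""
    present = set(obj.get("properties", {}))
    return sorted(EXPECTED_PROPERTIES - present)


# Example object with only three of the six properties filled in.
partial = {
    "id": "uuid-here",
    "properties": {"content": "example text", "source": "react", "category": "api"},
}
print(missing_properties(partial))
```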

Customization

Generate Your Own Skill

Want to use a different documentation source? Easy:

# 1_generate_skill.py (modify line 10)
"--config", "configs/vue.json",  # Change to your config

Or scrape from scratch:

skill-seekers scrape --config configs/your_framework.json
skill-seekers package output/your_framework --target weaviate

Adjust Search Parameters

In 3_query_example.py, modify:

# Adjust hybrid search balance
alpha=0.7  # More semantic, less keyword

# Adjust result count
.with_limit(10)  # Get more results

# Add more filters
.with_where({
    "operator": "And",
    "operands": [
        {"path": ["category"], "operator": "Equal", "valueText": "api"},
        {"path": ["type"], "operator": "Equal", "valueText": "reference"}
    ]
})

Troubleshooting

Connection Refused

Error: Connection refused to http://localhost:8080

Solution: Ensure Weaviate is running:

docker ps | grep weaviate
# If not running, start it:
docker start weaviate
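
A container that is running may still be starting up. One way to check from Python, using only the standard library, is to poll Weaviate's readiness endpoint (`/v1/.well-known/ready`). `weaviate_ready` is a hypothetical helper, and the default URL assumes the local Docker setup from the Prerequisites.

```python
import urllib.error
import urllib.request


def weaviate_ready(base_url="http://localhost:8080", timeout=3.0):
    """Return True if the instance answers its readiness probe with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False
```

Calling `weaviate_ready()` before the upload step turns a raw connection error into a simple True/False check.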

Schema Already Exists

Error: Class 'React' already exists

Solution: Delete the existing class:

# In Python or using Weaviate API
client.schema.delete_class("React")

Or use the example's built-in reset:

python 2_upload_to_weaviate.py --reset

Empty Results

Query returned 0 results

Possible causes:

No embeddings: Vector and hybrid search need vectors, so the class must have a vectorizer configured (this example relies on the instance's default vectorizer)
Wrong class name: Check the class name matches
Data not uploaded: Verify with client.query.aggregate("React").with_meta_count().do()

Solution: Check object count:

result = client.query.aggregate("React").with_meta_count().do()
print(result)  # Should show {"data": {"Aggregate": {"React": [{"meta": {"count": 21}}]}}}
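
To use that count in a script rather than reading it by eye, the nested response can be unpacked defensively. A small sketch, assuming the response shape shown in the comment above; `object_count` is a hypothetical helper.

```python
def object_count(result, class_name):
    """Pull the meta count out of an Aggregate response; return 0 if it is absent."""
    try:
        return result["data"]["Aggregate"][class_name][0]["meta"]["count"]
    except (KeyError, IndexError, TypeError):
        return 0


# Response shape copied from the comment above.
sample_result = {"data": {"Aggregate": {"React": [{"meta": {"count": 21}}]}}}
print(object_count(sample_result, "React"))  # 21
```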

Next Steps

Try other skills: Generate skills for your favorite frameworks
Production deployment: Use Weaviate Cloud for scalability
Add custom vectorizers: Use OpenAI, Cohere, or local models
Build RAG apps: Integrate with LangChain or LlamaIndex

Resources

Weaviate Docs: https://weaviate.io/developers/weaviate
Hybrid Search: https://weaviate.io/developers/weaviate/search/hybrid
Python Client: https://weaviate.io/developers/weaviate/client-libraries/python
Skill Seekers Docs: https://github.com/yourusername/skill-seekers

File Structure

weaviate-example/
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── 1_generate_skill.py            # Generate Weaviate-format skill
├── 2_upload_to_weaviate.py        # Upload to Weaviate instance
├── 3_query_example.py             # Query demonstrations
└── sample_output/                 # Example outputs
    ├── react-weaviate.json        # Generated skill (21 objects)
    └── query_results.txt          # Sample query results

Last Updated: February 2026
Tested With: Weaviate v1.25.0, Python 3.10+, skill-seekers v2.10.0
