firefrost-gaming/skill-seekers-reference

yusyus 53d37e61dd docs: Add 4 comprehensive vector database examples (Weaviate, Chroma, FAISS, Qdrant)

Created complete working examples for all 4 vector databases with RAG adaptors:

Weaviate Example:
- Comprehensive README with hybrid search guide
- 3 Python scripts (generate, upload, query)
- Sample outputs and query results
- Covers hybrid search, filtering, schema design

Chroma Example:
- Simple, local-first approach
- In-memory and persistent storage options
- Semantic search and metadata filtering
- Comparison with Weaviate

FAISS Example:
- Facebook AI Similarity Search integration
- OpenAI embeddings generation
- Index building and persistence
- Performance-focused for scale

Qdrant Example:
- Advanced filtering capabilities
- Production-ready features
- Complex query patterns
- Rust-based performance

Each example includes:
- Detailed README with setup and troubleshooting
- requirements.txt with dependencies
- 3 working Python scripts
- Sample outputs directory

Total files: 20 (4 examples × 5 files each)
Documentation: 4 comprehensive READMEs (~800 lines total)

Phase 2 of optional enhancements complete.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-07 22:38:15 +03:00


ChromaDB Vector Database Example

This example demonstrates how to use Skill Seekers with ChromaDB, the AI-native open-source embedding database. Chroma is designed to be simple, fast, and easy to use locally.

What You'll Learn

How to generate skills in ChromaDB format
How to create local Chroma collections
How to perform semantic searches
How to filter by metadata categories

Why ChromaDB?

No Server Required: Works entirely in-process (perfect for development)
Simple API: Clean Python interface, no complex setup
Fast: Uses an HNSW index for approximate nearest-neighbor search
Open Source: Apache 2.0 licensed, community-driven

Prerequisites

Python Dependencies

pip install -r requirements.txt

That's it! No Docker, no server setup. Chroma runs entirely in your Python process.

Step-by-Step Guide

Step 1: Generate Skill from Documentation

First, we'll scrape Vue documentation and package it for ChromaDB:

python 1_generate_skill.py

This script will:

Scrape Vue docs (limited to 20 pages for demo)
Package the skill in ChromaDB format (JSON with documents + metadata + IDs)
Save to output/vue-chroma.json

Expected Output:

✅ ChromaDB data packaged successfully!
📦 Output: output/vue-chroma.json
📊 Total documents: 21
📂 Categories: overview (1), guides (8), api (12)

What's in the JSON?

{
  "documents": [
    "Vue is a progressive JavaScript framework...",
    "Components are the building blocks..."
  ],
  "metadatas": [
    {
      "source": "vue",
      "category": "overview",
      "file": "SKILL.md",
      "type": "documentation",
      "version": "1.0.0"
    }
  ],
  "ids": [
    "a1b2c3d4e5f6...",
    "b2c3d4e5f6g7..."
  ],
  "collection_name": "vue"
}
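The sample above is abbreviated; in a real file the three lists are parallel, so documents[i], metadatas[i] and ids[i] all describe the same entry. A minimal sketch of a sanity check you could run before uploading follows. The helper names validate_packaged_skill, load_packaged_skill and category_counts are hypothetical and not part of Skill Seekers.

```python
import json
from collections import Counter
from pathlib import Path

REQUIRED_KEYS = ("documents", "metadatas", "ids", "collection_name")


def validate_packaged_skill(data):
    """Check that a packaged skill dict has the layout shown in the sample."""
    for key in REQUIRED_KEYS:
        if key not in data:
            raise ValueError(f"missing key: {key}")

    # The three lists are parallel: entry i in each describes one document.
    n_docs = len(data["documents"])
    if len(data["metadatas"]) != n_docs or len(data["ids"]) != n_docs:
        raise ValueError("documents, metadatas and ids must have equal length")

    # IDs identify documents inside the collection, so they must be unique.
    if len(set(data["ids"])) != n_docs:
        raise ValueError("ids must be unique")
    return data


def load_packaged_skill(path):
    """Read the JSON file from disk and validate it."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return validate_packaged_skill(data)


def category_counts(data):
    """Count documents per metadata category (missing category -> 'unknown')."""
    return dict(Counter(m.get("category", "unknown") for m in data["metadatas"]))
```

For example, category_counts(load_packaged_skill("output/vue-chroma.json")) should reproduce the per-category totals printed by the generate script.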

Step 2: Create Collection and Upload

Now we'll create a ChromaDB collection and load all documents:

python 2_upload_to_chroma.py

This script will:

Create an in-memory Chroma client (or persistent with --persist)
Create a collection with the skill name
Add all documents with metadata and IDs
Verify the upload was successful

Expected Output:

📊 Creating ChromaDB client...
✅ Client created (in-memory)

📦 Creating collection: vue
✅ Collection created!

📤 Adding 21 documents to collection...
✅ Successfully added 21 documents to ChromaDB

🔍 Collection 'vue' now contains 21 documents

Persistent Storage:

# Save to disk for later use
python 2_upload_to_chroma.py --persist ./chroma_db
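The upload step can be sketched roughly as below. This is an illustration of the flow described above, not the contents of 2_upload_to_chroma.py: the function names make_client and upload are hypothetical, and the sketch uses get_or_create_collection so that re-running it does not fail on an existing collection, which may differ from what the script does.

```python
def make_client(persist_path=None):
    """Return an in-memory client, or an on-disk one when a path is given."""
    import chromadb  # imported here so upload() can be exercised without it

    if persist_path:
        # Data is written under persist_path and survives restarts.
        return chromadb.PersistentClient(path=persist_path)
    # Data lives only for the lifetime of this Python process.
    return chromadb.Client()


def upload(client, data):
    """Add every packaged document to the collection named in the data."""
    collection = client.get_or_create_collection(name=data["collection_name"])
    collection.add(
        documents=data["documents"],
        metadatas=data["metadatas"],
        ids=data["ids"],
    )
    # Verify: the collection should hold at least as many entries as were sent.
    if collection.count() < len(data["ids"]):
        raise RuntimeError("upload incomplete")
    return collection
```

Typical use would be upload(make_client("./chroma_db"), data), where data is the dict loaded from output/vue-chroma.json.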

Step 3: Query and Search

Now search your knowledge base!

python 3_query_example.py

With persistent storage:

python 3_query_example.py --persist ./chroma_db

This script demonstrates:

Semantic Search: Natural language queries
Metadata Filtering: Filter by category
Top-K Results: Get most relevant documents
Distance Scoring: See how relevant each result is

Example Queries:

Query 1: Semantic Search

Query: "How do I create a Vue component?"
Top 3 results:

1. [Distance: 0.234] guides/components.md
   Components are reusable Vue instances with a name. You can use them as custom
   elements inside a root Vue instance...

2. [Distance: 0.298] api/component_api.md
   The component API reference describes all available options for defining
   components using the Options API...

3. [Distance: 0.312] guides/single_file_components.md
   Single-File Components (SFCs) allow you to define templates, logic, and
   styling in a single .vue file...

Query 2: Filtered Search

Query: "reactivity"
Filter: category = "api"

Results:
1. ref() - Create reactive references
2. reactive() - Create reactive proxies
3. computed() - Create computed properties
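Output like the above can be produced from the dictionary that collection.query returns. Each field in that dictionary is a list of lists, with one inner list per query text, which is easy to trip over. The sketch below shows one way to unpack it; search and format_hits are hypothetical helper names, and the "file" and "category" metadata keys are taken from the sample JSON in Step 1.

```python
def format_hits(results, query_index=0):
    """Flatten one query's slice of a Chroma query result into a list of dicts."""
    rows = zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["metadatas"][query_index],
        results["distances"][query_index],
    )
    hits = []
    for doc_id, text, meta, distance in rows:
        meta = meta or {}  # metadata can be missing for a document
        hits.append(
            {
                "id": doc_id,
                "file": meta.get("file"),
                "category": meta.get("category"),
                "distance": distance,
                "preview": text[:80],
            }
        )
    return hits


def search(collection, question, k=3, category=None):
    """Run one semantic query, optionally restricted to a metadata category."""
    results = collection.query(
        query_texts=[question],
        n_results=k,
        where={"category": category} if category else None,
        include=["documents", "metadatas", "distances"],
    )
    return format_hits(results)
```

Results come back ordered by ascending distance, so the first hit is the closest match.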

Understanding ChromaDB Features

Semantic Search

Chroma automatically:

Generates embeddings for your documents (using the default model, all-MiniLM-L6-v2, unless you supply your own embedding function)
Indexes them for fast similarity search
Finds semantically similar content

Distance Scores (rough guide only; the scale depends on the distance metric and the embedding model):

Lower = more similar
0.0 = identical
< 0.5 = very relevant
0.5-1.0 = somewhat relevant
> 1.0 = less relevant
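Why the thresholds depend on the metric becomes clearer when the three options for "hnsw:space" are written out. The sketch below reimplements them in plain Python for illustration, based on how Chroma's documentation defines them: "l2" (the default) is the squared Euclidean distance, "cosine" is one minus the cosine similarity, and "ip" is one minus the inner product. The function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Squared Euclidean distance ("l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a, b):
    """One minus the inner product ("ip")."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity ("cosine")."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


same = ([1.0, 0.0], [1.0, 0.0])
orthogonal = ([1.0, 0.0], [0.0, 1.0])

print(l2_distance(*same), cosine_distance(*same))              # 0.0 0.0
print(l2_distance(*orthogonal), cosine_distance(*orthogonal))  # 2.0 1.0
```

For unit-length vectors the squared L2 distance is exactly twice the cosine distance, so a cutoff of 0.5 under one metric corresponds to a different cutoff under the other.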

Metadata Filtering

Restrict the search to documents whose metadata matches a where clause; only those documents are ranked by similarity:

collection.query(
    query_texts=["your query"],
    n_results=5,
    where={"category": "api"}
)

Supported operators:

$eq: Equal to
$ne: Not equal to
$gt, $gte: Greater than (or equal)
$lt, $lte: Less than (or equal)
$in: In list
$nin: Not in list

Complex filters:

where={
    "$and": [
        {"category": {"$eq": "api"}},
        {"type": {"$eq": "reference"}}
    ]
}
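Writing nested filters by hand gets noisy, so a small builder can help. The sketch below is a hypothetical helper, not a Chroma API: a plain value is treated as an equality test, and a dict such as {"$in": [...]} is passed through unchanged. It emits "$and" only when there are two or more conditions, on the understanding that Chroma expects at least two expressions inside "$and".

```python
def build_where(**conditions):
    """Build a Chroma `where` filter from keyword conditions."""
    clauses = []
    for field, value in conditions.items():
        # Plain values become {"$eq": value}; operator dicts are kept as given.
        operator = value if isinstance(value, dict) else {"$eq": value}
        clauses.append({field: operator})

    if not clauses:
        return None  # no filtering at all
    if len(clauses) == 1:
        return clauses[0]  # a single condition needs no "$and" wrapper
    return {"$and": clauses}


# Reproduces the complex filter shown above.
print(build_where(category="api", type="reference"))
```

The result can be passed straight to collection.query(..., where=build_where(category="api")).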

Collection Management

# List all collections
client.list_collections()

# Get collection
collection = client.get_collection("vue")

# Get count
collection.count()

# Delete collection
client.delete_collection("vue")

Customization

Use Your Own Embeddings

Chroma supports custom embedding functions:

from chromadb.utils import embedding_functions

# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-ada-002"
)

collection = client.create_collection(
    name="your_skill",
    embedding_function=openai_ef
)

Supported embedding functions:

OpenAI: text-embedding-ada-002 (hosted, requires an API key)
Cohere: embed-english-v2.0
HuggingFace: Various models (local, no API key)
Sentence Transformers: Local models

Generate Different Skills

# Change the config in 1_generate_skill.py
"--config", "configs/django.json",  # Your framework

# Or use CLI directly
skill-seekers scrape --config configs/flask.json
skill-seekers package output/flask --target chroma

Adjust Query Parameters

In 3_query_example.py:

# Get more results
n_results=10  # Default is 5

# Include more metadata
include=["documents", "metadatas", "distances"]

# Different distance metrics
# (configure when creating collection)
metadata={"hnsw:space": "cosine"}  # or "l2", "ip"

Performance Tips

Batch Operations: Add documents in batches for better performance

collection.add(
    documents=batch_docs,
    metadatas=batch_metadata,
    ids=batch_ids
)

Persistent Storage: Use --persist for production

python 2_upload_to_chroma.py --persist ./prod_db

Custom Embeddings: A hosted model such as OpenAI's may improve result quality, at a per-request cost
Index Tuning: Adjust HNSW parameters for speed vs accuracy
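The batch tip above can be sketched as a small loop that slices the three parallel lists in step. The helper name add_in_batches and the default batch size of 100 are assumptions chosen for illustration; tune the size to your data.

```python
def add_in_batches(collection, documents, metadatas, ids, batch_size=100):
    """Add documents to a collection in fixed-size chunks; return the count sent."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    sent = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        # Slice all three lists with the same bounds so entries stay aligned.
        collection.add(
            documents=documents[start:end],
            metadatas=metadatas[start:end],
            ids=ids[start:end],
        )
        sent += len(ids[start:end])
    return sent
```

Smaller batches keep memory use flat while embeddings are generated, at the cost of more calls.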

Troubleshooting

Import Error

ModuleNotFoundError: No module named 'chromadb'

Solution:

pip install chromadb

Collection Already Exists

Error: Collection 'vue' already exists

Solution:

# Delete existing collection
client.delete_collection("vue")

# Or use --reset flag
python 2_upload_to_chroma.py --reset
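In your own code, one way to avoid this error is to open the collection with get_or_create_collection instead of create_collection. The sketch below also shows a reset path; it is an assumption about what a reset could look like, not the implementation behind the script's --reset flag, and open_collection is a hypothetical name.

```python
def open_collection(client, name, reset=False):
    """Return the named collection, creating it if needed.

    With reset=True the collection is dropped and recreated empty.
    """
    # Succeeds whether or not the collection already exists.
    collection = client.get_or_create_collection(name=name)
    if reset:
        # The collection is known to exist here, so the delete cannot miss.
        client.delete_collection(name=name)
        collection = client.create_collection(name=name)
    return collection
```

Calling open_collection(client, "vue") repeatedly is safe, while open_collection(client, "vue", reset=True) discards everything stored so far.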

Empty Results

Query returned empty results

Possible causes:

Collection empty: Check collection.count()
Query too specific: Try broader queries
Wrong collection name: Verify collection exists

Debug:

# Check collection contents
collection.get()  # Get all documents

# Check embedding function
collection._embedding_function  # Should not be None
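The checks above can be folded into one diagnostic helper. This is a minimal sketch with a hypothetical name, diagnose_empty; it relies only on collection.count() and collection.get(where=..., limit=...), and returns a short message naming the most likely cause.

```python
def diagnose_empty(collection, where=None):
    """Suggest why a query against this collection returned nothing."""
    if collection.count() == 0:
        return "collection is empty: run the upload step first"

    if where is not None:
        # Fetch at most one document matching the filter, without any query text.
        matched = collection.get(where=where, limit=1)
        if not matched["ids"]:
            return "no documents match the metadata filter"

    return "documents exist: try a broader query"
```

If the first message appears while using persistent storage, also confirm that the query script points at the same --persist directory as the upload script.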

Performance Issues

Query is slow

Solutions:

Use persistent storage so documents are embedded and indexed once instead of on every run
Reduce n_results (fewer results = faster)
Add metadata filters to narrow search space
Choose the embedding model for result quality, not speed: a hosted model such as OpenAI adds a network round trip to every query

Next Steps

Try other skills: Package your favorite documentation
Build a chatbot: Integrate with LangChain or LlamaIndex
Production deployment: Use persistent storage + API wrapper
Custom embeddings: Experiment with different models

Resources

ChromaDB Docs: https://docs.trychroma.com/
GitHub: https://github.com/chroma-core/chroma
Discord: https://discord.gg/MMeYNTmh3x
Skill Seekers: https://github.com/yourusername/skill-seekers

File Structure

chroma-example/
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── 1_generate_skill.py            # Generate ChromaDB-format skill
├── 2_upload_to_chroma.py          # Create collection and upload
├── 3_query_example.py             # Query demonstrations
└── sample_output/                 # Example outputs
    ├── vue-chroma.json            # Generated skill (21 docs)
    └── query_results.txt          # Sample query results

Comparison: Chroma vs Weaviate

Feature	ChromaDB	Weaviate
Setup	✅ No server needed	⚠️ Docker/Cloud required
API	✅ Very simple	⚠️ More complex
Performance	✅ Fast for < 1M docs	✅ Scales to billions
Hybrid Search	❌ Semantic only	✅ Keyword + semantic
Production	✅ Good for small-medium	✅ Built for scale

Use Chroma for: Development, prototypes, small-medium datasets (< 1M docs)
Use Weaviate for: Production, large datasets (> 1M docs), hybrid search

Last Updated: February 2026
Tested With: ChromaDB v0.4.22, Python 3.10+, skill-seekers v2.10.0