ChromaDB Vector Database Example
This example demonstrates how to use Skill Seekers with ChromaDB, the AI-native open-source embedding database. Chroma is designed to be simple, fast, and easy to use locally.
What You'll Learn
- How to generate skills in ChromaDB format
- How to create local Chroma collections
- How to perform semantic searches
- How to filter by metadata categories
Why ChromaDB?
- No Server Required: Works entirely in-process (perfect for development)
- Simple API: Clean Python interface, no complex setup
- Fast: Built for speed, with an HNSW index for approximate nearest-neighbor search
- Open Source: Apache 2.0 licensed, community-driven
Prerequisites
Python Dependencies
pip install -r requirements.txt
That's it! No Docker, no server setup. Chroma runs entirely in your Python process.
Step-by-Step Guide
Step 1: Generate Skill from Documentation
First, we'll scrape Vue documentation and package it for ChromaDB:
python 1_generate_skill.py
This script will:
- Scrape Vue docs (limited to 20 pages for demo)
- Package the skill in ChromaDB format (JSON with documents + metadata + IDs)
- Save to output/vue-chroma.json
Expected Output:
✅ ChromaDB data packaged successfully!
📦 Output: output/vue-chroma.json
📊 Total documents: 21
📂 Categories: overview (1), guides (8), api (12)
What's in the JSON?
{
"documents": [
"Vue is a progressive JavaScript framework...",
"Components are the building blocks..."
],
"metadatas": [
{
"source": "vue",
"category": "overview",
"file": "SKILL.md",
"type": "documentation",
"version": "1.0.0"
}
],
"ids": [
"a1b2c3d4e5f6...",
"b2c3d4e5f6g7..."
],
"collection_name": "vue"
}
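Before uploading, it can help to sanity-check the packaged file. The sketch below assumes the layout shown above (parallel documents, metadatas, and ids lists plus a collection_name). The excerpt above is abbreviated, so the real file is assumed to carry one metadata entry and one ID per document. validate_package and load_package are hypothetical helpers for illustration, not part of Skill Seekers.

```python
import json
from pathlib import Path


def validate_package(data: dict) -> int:
    """Check a packaged skill dict and return the number of documents.

    Hypothetical helper: assumes the keys shown in the sample JSON.
    """
    for key in ("documents", "metadatas", "ids", "collection_name"):
        if key not in data:
            raise ValueError(f"missing key: {key}")

    docs, metas, ids = data["documents"], data["metadatas"], data["ids"]
    # Chroma expects one metadata dict and one ID per document.
    if not (len(docs) == len(metas) == len(ids)):
        raise ValueError(
            f"length mismatch: {len(docs)} documents, "
            f"{len(metas)} metadatas, {len(ids)} ids"
        )
    # IDs must be unique within a collection.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate ids found")
    return len(docs)


def load_package(path: str) -> dict:
    """Read the packaged JSON from disk and validate it."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    validate_package(data)
    return data
```

For this example you would call load_package("output/vue-chroma.json") after Step 1 has finished.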
Step 2: Create Collection and Upload
Now we'll create a ChromaDB collection and load all documents:
python 2_upload_to_chroma.py
This script will:
- Create an in-memory Chroma client (or a persistent one with --persist)
- Create a collection with the skill name
- Add all documents with metadata and IDs
- Verify the upload was successful
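The four steps above can be sketched roughly as follows. This is not the source of 2_upload_to_chroma.py, only a minimal illustration of the same flow. upload_skill is a hypothetical name, and the client argument is assumed to be a chromadb client that you create yourself.

```python
def upload_skill(client, data: dict) -> int:
    """Create a collection from packaged skill data and return its size.

    `client` is assumed to be a chromadb client, for example
    chromadb.Client() for in-memory use or
    chromadb.PersistentClient(path="./chroma_db") for storage on disk.
    `data` is the dict loaded from the packaged JSON file.
    """
    # Create a collection named after the skill.
    collection = client.create_collection(name=data["collection_name"])

    # Add all documents together with their metadata and IDs.
    collection.add(
        documents=data["documents"],
        metadatas=data["metadatas"],
        ids=data["ids"],
    )

    # Verify that every document arrived.
    count = collection.count()
    expected = len(data["ids"])
    if count != expected:
        raise RuntimeError(f"expected {expected} documents, found {count}")
    return count
```

Taking the client as a parameter keeps the same function usable for both the in-memory and the persistent case.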
Expected Output:
📊 Creating ChromaDB client...
✅ Client created (in-memory)
📦 Creating collection: vue
✅ Collection created!
📤 Adding 21 documents to collection...
✅ Successfully added 21 documents to ChromaDB
🔍 Collection 'vue' now contains 21 documents
Persistent Storage:
# Save to disk for later use
python 2_upload_to_chroma.py --persist ./chroma_db
Step 3: Query and Search
Now search your knowledge base!
python 3_query_example.py
With persistent storage:
python 3_query_example.py --persist ./chroma_db
This script demonstrates:
- Semantic Search: Natural language queries
- Metadata Filtering: Filter by category
- Top-K Results: Get most relevant documents
- Distance Scoring: See how relevant each result is
Example Queries:
Query 1: Semantic Search
Query: "How do I create a Vue component?"
Top 3 results:
1. [Distance: 0.234] guides/components.md
Components are reusable Vue instances with a name. You can use them as custom
elements inside a root Vue instance...
2. [Distance: 0.298] api/component_api.md
The component API reference describes all available options for defining
components using the Options API...
3. [Distance: 0.312] guides/single_file_components.md
Single-File Components (SFCs) allow you to define templates, logic, and
styling in a single .vue file...
Query 2: Filtered Search
Query: "reactivity"
Filter: category = "api"
Results:
1. ref() - Create reactive references
2. reactive() - Create reactive proxies
3. computed() - Create computed properties
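To show how output like the above can be produced, here is a small sketch that formats the dictionary returned by collection.query(). It assumes results were requested with documents, metadatas, and distances included, and that each metadata entry has the category and file keys from the sample JSON. format_results is a hypothetical helper, not the code in 3_query_example.py.

```python
def format_results(results: dict, max_chars: int = 80) -> list[str]:
    """Turn the first query's results into printable lines.

    Assumes the dict shape returned by collection.query(): each value is
    a list holding one inner list per query text.
    """
    lines = []
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    for rank, (doc, meta, dist) in enumerate(zip(docs, metas, dists), start=1):
        # "category" and "file" are the metadata keys from the sample JSON.
        source = f"{meta.get('category', '?')}/{meta.get('file', '?')}"
        snippet = doc[:max_chars] + ("..." if len(doc) > max_chars else "")
        lines.append(f"{rank}. [Distance: {dist:.3f}] {source}")
        lines.append(f"   {snippet}")
    return lines
```

Print the result with print("\n".join(format_results(results))).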
Understanding ChromaDB Features
Semantic Search
Chroma automatically:
- Generates embeddings for your documents (using default model)
- Indexes them for fast similarity search
- Finds semantically similar content
Distance Scores:
- Lower = more similar
- 0.0 = identical
- < 0.5 = very relevant
- 0.5-1.0 = somewhat relevant
- > 1.0 = less relevant
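If you want to show these bands next to each result, a tiny helper is enough. The thresholds below are simply the rule-of-thumb values listed above. Treat them as an assumption, since actual distances depend on the embedding model and the distance metric. relevance_label is a hypothetical name.

```python
def relevance_label(distance: float) -> str:
    """Map a distance score to the rough relevance bands listed above."""
    if distance == 0.0:
        return "identical"
    if distance < 0.5:
        return "very relevant"
    if distance <= 1.0:
        return "somewhat relevant"
    return "less relevant"
```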
Metadata Filtering
Filter results before semantic search:
collection.query(
query_texts=["your query"],
n_results=5,
where={"category": "api"}
)
Supported operators:
- $eq: Equal to
- $ne: Not equal to
- $gt, $gte: Greater than (or equal)
- $lt, $lte: Less than (or equal)
- $in: In list
- $nin: Not in list
Complex filters:
where={
"$and": [
{"category": {"$eq": "api"}},
{"type": {"$eq": "reference"}}
]
}
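Writing nested filters by hand gets tedious, so here is a sketch of a helper that builds them from keyword arguments. build_where is a hypothetical name. The sketch only covers $eq, $in, and $and, and it assumes that $and needs at least two sub-filters (which is how Chroma 0.4.x appears to validate it), so a single condition is returned without the wrapper.

```python
from typing import Optional


def build_where(**conditions) -> Optional[dict]:
    """Build a Chroma `where` filter from field=value pairs.

    A list or tuple value becomes an `$in` condition; any other value
    becomes an `$eq` condition. Multiple conditions are joined by `$and`.
    """
    clauses = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            clauses.append({field: {"$in": list(value)}})
        else:
            clauses.append({field: {"$eq": value}})

    if not clauses:
        return None        # no filter at all
    if len(clauses) == 1:
        return clauses[0]  # a lone condition needs no "$and" wrapper
    return {"$and": clauses}
```

For example, build_where(category="api", type="reference") produces the complex filter shown above, and the result can be passed as where= to collection.query().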
Collection Management
# List all collections
client.list_collections()
# Get collection
collection = client.get_collection("vue")
# Get count
collection.count()
# Delete collection
client.delete_collection("vue")
Customization
Use Your Own Embeddings
Chroma supports custom embedding functions:
from chromadb.utils import embedding_functions
# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="your-key",
model_name="text-embedding-ada-002"
)
collection = client.create_collection(
name="your_skill",
embedding_function=openai_ef
)
Supported embedding functions:
- OpenAI: text-embedding-ada-002 (hosted API, requires a key)
- Cohere: embed-english-v2.0 (hosted API, requires a key)
- HuggingFace: Various models via the Hugging Face Inference API (requires a key)
- Sentence Transformers: Local models (no API key)
Generate Different Skills
# Change the config in 1_generate_skill.py
"--config", "configs/django.json", # Your framework
# Or use CLI directly
skill-seekers scrape --config configs/flask.json
skill-seekers package output/flask --target chroma
Adjust Query Parameters
In 3_query_example.py:
# Get more results
n_results=10 # Default is 5
# Include more metadata
include=["documents", "metadatas", "distances"]
# Different distance metrics
# (configure when creating collection)
metadata={"hnsw:space": "cosine"} # or "l2", "ip"
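To build some intuition for the three hnsw:space options, the plain-Python sketch below computes each distance the way the Chroma documentation describes it: squared L2 for "l2", one minus the dot product for "ip", and one minus cosine similarity for "cosine". It is only an illustration for small vectors, not Chroma's implementation, and the function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Squared Euclidean distance ("l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a, b):
    """Inner-product distance ("ip"): 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance ("cosine"): 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Note that cosine distance ignores vector length, while "l2" and "ip" do not, so the same pair of documents can rank differently under each metric.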
Performance Tips
- Batch Operations: Add documents in batches for better performance
  collection.add(documents=batch_docs, metadatas=batch_metadata, ids=batch_ids)
- Persistent Storage: Use --persist for production
  python 2_upload_to_chroma.py --persist ./prod_db
- Custom Embeddings: Use OpenAI for best quality (costs $)
- Index Tuning: Adjust HNSW parameters for speed vs accuracy
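The batching tip can be sketched like this. The helper names and the default batch size of 100 are assumptions chosen for illustration. The collection argument is assumed to be a Chroma collection, and the right batch size depends on your documents and embedding function.

```python
def iter_batches(documents, metadatas, ids, batch_size=100):
    """Yield aligned (documents, metadatas, ids) slices of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield documents[start:end], metadatas[start:end], ids[start:end]


def add_in_batches(collection, documents, metadatas, ids, batch_size=100):
    """Add documents to a collection batch by batch; return how many were added."""
    added = 0
    for batch_docs, batch_metadata, batch_ids in iter_batches(
        documents, metadatas, ids, batch_size
    ):
        collection.add(
            documents=batch_docs,
            metadatas=batch_metadata,
            ids=batch_ids,
        )
        added += len(batch_ids)
    return added
```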
Troubleshooting
Import Error
ModuleNotFoundError: No module named 'chromadb'
Solution:
pip install chromadb
Collection Already Exists
Error: Collection 'vue' already exists
Solution:
# Delete existing collection
client.delete_collection("vue")
# Or use --reset flag
python 2_upload_to_chroma.py --reset
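If you call Chroma from your own code, one way to avoid this error is get_or_create_collection, optionally deleting the old collection first. The sketch below is one possible approach and is not necessarily how the --reset flag is implemented. open_collection is a hypothetical name, and client is assumed to be a chromadb client.

```python
def open_collection(client, name: str, reset: bool = False):
    """Return a collection by name, optionally starting from scratch.

    `client` is assumed to be a chromadb client.
    """
    if reset:
        # Depending on the Chroma version, list_collections() returns
        # collection objects or plain names, so handle both.
        existing = {getattr(c, "name", c) for c in client.list_collections()}
        if name in existing:
            client.delete_collection(name)
    # Reuses the collection if it exists, creates it otherwise.
    return client.get_or_create_collection(name=name)
```

Without reset=True the existing documents are kept, so adding the same IDs again will not duplicate them but may raise or be skipped depending on the Chroma version.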
Empty Results
Query returned empty results
Possible causes:
- Collection empty: Check collection.count()
- Query too specific: Try broader queries
- Wrong collection name: Verify collection exists
Debug:
# Check collection contents
collection.get() # Get all documents
# Check embedding function
collection._embedding_function # Should not be None
Performance Issues
Query is slow
Solutions:
- Use persistent storage so documents are not re-embedded and re-indexed on every run
- Reduce n_results (fewer results = faster)
- Add metadata filters to narrow search space
- Check the embedding function: the query text is embedded on every query, so a slow model or a remote API adds latency
Next Steps
- Try other skills: Package your favorite documentation
- Build a chatbot: Integrate with LangChain or LlamaIndex
- Production deployment: Use persistent storage + API wrapper
- Custom embeddings: Experiment with different models
Resources
- ChromaDB Docs: https://docs.trychroma.com/
- GitHub: https://github.com/chroma-core/chroma
- Discord: https://discord.gg/MMeYNTmh3x
- Skill Seekers: https://github.com/yourusername/skill-seekers
File Structure
chroma-example/
├── README.md # This file
├── requirements.txt # Python dependencies
├── 1_generate_skill.py # Generate ChromaDB-format skill
├── 2_upload_to_chroma.py # Create collection and upload
├── 3_query_example.py # Query demonstrations
└── sample_output/ # Example outputs
├── vue-chroma.json # Generated skill (21 docs)
└── query_results.txt # Sample query results
Comparison: Chroma vs Weaviate
| Feature | ChromaDB | Weaviate |
|---|---|---|
| Setup | ✅ No server needed | ⚠️ Docker/Cloud required |
| API | ✅ Very simple | ⚠️ More complex |
| Performance | ✅ Fast for < 1M docs | ✅ Scales to billions |
| Hybrid Search | ❌ Semantic only | ✅ Keyword + semantic |
| Production | ✅ Good for small-medium | ✅ Built for scale |
Use Chroma for: Development, prototypes, small-medium datasets (< 1M docs)
Use Weaviate for: Production, large datasets (> 1M docs), hybrid search
Last Updated: February 2026
Tested With: ChromaDB v0.4.22, Python 3.10+, skill-seekers v2.10.0