- Forked yusufkaraaslan/Skill_Seekers to Gitea (MIT License) - Added Qdrant integration guide for Task #93 - Tool converts docs/repos/PDFs to RAG-ready format - Directly applicable to Trinity Codex knowledge base Chronicler #73
# Skill Seekers + Qdrant Integration

**Source:** https://github.com/yusufkaraaslan/Skill_Seekers

**License:** MIT

**Gitea Fork:** https://git.firefrostgaming.com/firefrost-gaming/skill-seekers-reference

## Overview

Skill Seekers converts documentation sites, GitHub repositories, PDFs, and other sources (17+ source types in all) into structured knowledge assets ready for RAG pipelines. This makes it directly applicable to the Trinity Codex knowledge base.

## Installation

```bash
pip install skill-seekers
```

## Quick Start

```bash
# Convert docs to skill
skill-seekers create https://docs.example.com/

# Package for Qdrant
skill-seekers package output/example --target qdrant
```

## Supported Sources (17 types)

Supported types include:

- Documentation websites
- GitHub repositories
- PDF documents
- Word documents (.docx)
- EPUB e-books
- Jupyter Notebooks
- OpenAPI specs
- PowerPoint presentations
- AsciiDoc documents
- HTML files
- RSS/Atom feeds
- Man pages
- YouTube videos (requires the `video` extra: `pip install "skill-seekers[video]"`)

## Qdrant Pipeline

### Step 1: Generate Skill

```python
#!/usr/bin/env python3
import subprocess
from pathlib import Path

# Scrape documentation using a scraper config
subprocess.run([
    "skill-seekers", "scrape",
    "--config", "configs/your-config.json",
    "--max-pages", "20",
], check=True)

# Package the scraped skill for Qdrant
subprocess.run([
    "skill-seekers", "package",
    "output/your-skill",
    "--target", "qdrant",
], check=True)

# Confirm the packaged file exists before moving on to the upload step
output = Path("output/your-skill-qdrant.json")
if not output.exists():
    raise SystemExit(f"Packaged file not found: {output}")
print(f"Ready: {output} ({output.stat().st_size / 1024:.1f} KB)")
```

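Before moving on, it can help to confirm that the packaged JSON has the fields the Step 2 upload script reads. Those are `collection_name`, `config.vector_size`, and a `points` list whose entries each carry an `id` and a `payload`. The schema checked below is inferred only from what that script accesses, not from Skill Seekers documentation. `check_package` and `check_package_file` are hypothetical helpers:

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def check_package(data: dict[str, Any]) -> list[str]:
    """List problems that would break the Step 2 upload script.

    Assumption: only these keys matter; the real Skill Seekers output
    may contain additional fields, which are ignored here.
    """
    problems: list[str] = []
    if not isinstance(data.get("collection_name"), str):
        problems.append("missing or non-string 'collection_name'")

    config = data.get("config")
    size = config.get("vector_size") if isinstance(config, dict) else None
    if not isinstance(size, int) or size <= 0:
        problems.append("missing or invalid 'config.vector_size'")

    points = data.get("points")
    if not isinstance(points, list):
        problems.append("missing 'points' list")
        return problems
    for i, point in enumerate(points):
        if not isinstance(point, dict):
            problems.append(f"point {i} is not an object")
            continue
        if "id" not in point:
            problems.append(f"point {i} has no 'id'")
        if not isinstance(point.get("payload"), dict):
            problems.append(f"point {i} has no 'payload' object")
    return problems


def check_package_file(path: Path) -> list[str]:
    """Load a packaged JSON file from disk and check it."""
    with path.open(encoding="utf-8") as f:
        return check_package(json.load(f))
```

Calling `check_package_file(Path("output/your-skill-qdrant.json"))` should return an empty list when the file matches what the upload script expects.
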
### Step 2: Upload to Qdrant

```python
#!/usr/bin/env python3
import json

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Connect to Qdrant (our instance will be on TX1)
client = QdrantClient(url="http://localhost:6333")

# Load the packaged data produced in Step 1
with open("output/your-skill-qdrant.json", encoding="utf-8") as f:
    data = json.load(f)

collection_name = data["collection_name"]
config = data["config"]

# Create the collection on first run; create_collection fails if it already exists
if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(
            size=config["vector_size"],
            distance=Distance.COSINE,
        ),
    )

# Build points. The zero vectors are placeholders only: replace them with real
# embeddings of each payload's text, or similarity search will be meaningless.
points = [
    PointStruct(
        id=point["id"],
        vector=[0.0] * config["vector_size"],  # Replace with real embeddings
        payload=point["payload"],
    )
    for point in data["points"]
]

client.upsert(collection_name=collection_name, points=points)
print(f"Uploaded {len(points)} points to {collection_name}")
```

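The script above sends every point in a single `upsert` call. For larger skills it may be safer to upload in batches, because one oversized request can exceed the server's request-size limit.

The helper below is a minimal sketch that uses only the standard library. The batch size of 256 is an arbitrary example, not a Qdrant or Skill Seekers recommendation. Python 3.12 ships a similar `itertools.batched`, but it yields tuples rather than lists:

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    # islice pulls up to `size` items; an empty list means the input is exhausted
    while batch := list(islice(it, size)):
        yield batch


# Usage with the `client`, `collection_name`, and `points` from the upload script:
# for batch in batched(points, 256):  # 256 is an arbitrary example size
#     client.upsert(collection_name=collection_name, points=batch)
```

Recent qdrant-client releases also provide `upload_points`, which batches internally. Check that it exists in the version you install before relying on it.
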
### Step 3: Query

```python
#!/usr/bin/env python3
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")
collection_name = "your-collection"

# Filter by category; scroll() returns a (points, next_page_offset) tuple
points, _next_offset = client.scroll(
    collection_name=collection_name,
    scroll_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="api"),
            )
        ]
    ),
    limit=5,
)

for point in points:
    print(f"- {point.payload['file']}: {point.payload['content'][:100]}...")
```

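`scroll` narrows results by payload but does not rank them by similarity. Once real embeddings replace the zero-vector placeholders, a vector search ranks points by the distance the collection was created with (`Distance.COSINE` in Step 2).

The pure-Python sketch below shows what that ranking computes for a handful of in-memory points, with the same kind of `category` filter applied first. It is illustrative only, since Qdrant performs this server-side using an index. The `vector`/`payload` dict layout and the `top_k` helper are assumptions made for the example. Note that in this sketch a zero vector always scores 0.0, which is why the placeholders carry no meaning:

```python
from __future__ import annotations

import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], points: list[dict[str, Any]], k: int,
          category: str | None = None) -> list[tuple[float, dict[str, Any]]]:
    """Return the k best (score, payload) pairs, optionally filtered by category."""
    # Apply the payload filter first, as a filtered vector search would
    candidates = [
        p for p in points
        if category is None or p["payload"].get("category") == category
    ]
    scored = [(cosine(query, p["vector"]), p["payload"]) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```
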
## Trinity Codex Application

### Phase 1: Documentation Ingestion

Convert key Firefrost documentation sources:

```bash
# Pterodactyl docs
skill-seekers create https://pterodactyl.io/project/introduction.html
skill-seekers package output/pterodactyl --target qdrant

# Minecraft Wiki (modding); package its output directory with --target qdrant as above
skill-seekers create https://minecraft.wiki/w/Mods

# Operations Manual (local)
skill-seekers create ./docs/
skill-seekers package output/docs --target qdrant
```

### Phase 2: Vector Database Setup

Qdrant will run on TX1 (38.68.14.26) alongside Dify:

```bash
# Docker deployment (REST API on port 6333, data persisted to /opt/qdrant/storage)
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v /opt/qdrant/storage:/qdrant/storage \
  qdrant/qdrant:latest
```

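After the container starts, you can confirm that Qdrant is reachable by calling its REST endpoint `GET /collections`. The same call later shows whether the uploads created their collections. It is served on port 6333, as mapped in the command above.

The following is a standard-library-only sketch. `list_collections` and `parse_collection_names` are hypothetical helper names, and the default `localhost` URL assumes the script runs on the same host as the container:

```python
from __future__ import annotations

import json
import urllib.request


def parse_collection_names(body: bytes) -> list[str]:
    """Extract collection names from a Qdrant `GET /collections` JSON response."""
    data = json.loads(body)
    return [c["name"] for c in data["result"]["collections"]]


def list_collections(base_url: str = "http://localhost:6333",
                     timeout: float = 3.0) -> list[str] | None:
    """Return collection names, or None if Qdrant could not be reached."""
    url = base_url.rstrip("/") + "/collections"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_collection_names(resp.read())
    except OSError:
        # URLError, connection refused, and timeouts are all OSError subclasses
        return None
```

From another machine, `list_collections("http://38.68.14.26:6333")` would work only if port 6333 on TX1 is reachable from that machine.
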
### Phase 3: Dify Integration

Dify connects to Qdrant for RAG queries. See the Dify documentation for knowledge base configuration.

## Key Features for Firefrost

| Feature | Benefit |
|---------|---------|
| Multi-source ingestion | Combine wiki, docs, and PDFs into one knowledge base |
| Qdrant-native output | Direct integration with our planned stack |
| Smart chunking | Preserves code blocks and context |
| Metadata preservation | Category, file, and type fields for filtering |
| 500+ line SKILL.md | High-quality Claude skills from any source |

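The "Smart chunking" row refers to splitting text without breaking code blocks apart. Skill Seekers' own chunker is not shown in this guide. The following is a hypothetical sketch of the general idea for Markdown input, not the tool's implementation.

It splits on blank lines but treats everything between a pair of backtick fences as one unbreakable unit. It then packs those units into chunks up to a character budget. The 1,000-character default is an arbitrary assumption, and tilde (`~~~`) fences are not handled:

```python
FENCE = "`" * 3  # a Markdown code fence, built this way so it cannot end this block


def split_units(markdown: str) -> list[str]:
    """Split Markdown on blank lines, keeping fenced code blocks whole."""
    units: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A blank line ends a unit only when we are outside a code fence
        if not in_fence and line.strip() == "":
            if current:
                units.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        units.append("\n".join(current))
    return units


def pack_chunks(units: list[str], max_chars: int = 1000) -> list[str]:
    """Greedily join units into chunks of at most max_chars characters.

    A single unit longer than max_chars (e.g. a large code block) becomes
    its own oversized chunk instead of being split.
    """
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current}\n\n{unit}" if current else unit
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks
```

Keeping a code block whole even when it exceeds the budget trades uneven chunk sizes for retrieved chunks that never contain half a code example.
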
## Resources

- **Full repo:** https://git.firefrostgaming.com/firefrost-gaming/skill-seekers-reference
- **Original:** https://github.com/yusufkaraaslan/Skill_Seekers
- **Website:** https://skillseekersweb.com/
- **Qdrant Docs:** https://qdrant.tech/documentation/

---

*Added by Chronicler #73 on 2026-04-09*