* Add script to generate skills index from markdown files This script generates an index of skills from markdown files in a specified directory, inferring categories and extracting metadata. * chore: sync generated registry files [ci skip] * Add unit tests for category inference functions * chore: sync generated registry files [ci skip] * Add Smart Auto-Categorization Guide Added comprehensive guide for smart auto-categorization of skills, detailing the process, current status, category distribution, and usage instructions. * chore: sync generated registry files [ci skip] * chore: update star history chart --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: sck_0 <samujackson1337@gmail.com>
6.2 KiB
Smart Auto-Categorization Guide
Overview
The skill collection now uses intelligent auto-categorization to eliminate "uncategorized" and organize skills into meaningful categories based on their content.
tools/scripts/generate_index.py now classifies skills at index-build time and writes two explainability fields to every record in skills_index.json:
category_confidence(numeric confidence score)category_reason(how the category was selected)
Current Status
✅ Current repository indexed through the generated catalog
- Most skills are in meaningful categories
- A smaller tail still needs manual review or better keyword coverage
- 11 primary categories
- Categories sorted by skill count (most first)
Category Distribution
| Category | Count | Examples |
|---|---|---|
| Backend | 164 | Node.js, Django, Express, FastAPI |
| Web Development | 107 | React, Vue, Tailwind, CSS |
| Automation | 103 | Workflow, Scripting, RPA |
| DevOps | 83 | Docker, Kubernetes, CI/CD, Git |
| AI/ML | 79 | TensorFlow, PyTorch, NLP, LLM |
| Content | 47 | Documentation, SEO, Writing |
| Database | 44 | SQL, MongoDB, PostgreSQL |
| Testing | 38 | Jest, Cypress, Unit Testing |
| Security | 36 | Encryption, Authentication |
| Cloud | 33 | AWS, Azure, GCP |
| Mobile | 21 | React Native, Flutter, iOS |
| Game Dev | 15 | Unity, WebGL, 3D |
| Data Science | 14 | Pandas, NumPy, Analytics |
How It Works
1. Keyword-Based Analysis
The system analyzes skill names and descriptions for keywords:
- Backend: nodejs, express, fastapi, django, server, api, database
- Web Dev: react, vue, angular, frontend, css, html, tailwind
- AI/ML: ai, machine learning, tensorflow, nlp, gpt
- DevOps: docker, kubernetes, ci/cd, deploy
- And more...
2. Priority System
Frontmatter category > Detected Keywords > Fallback (uncategorized)
If a skill already has a category in frontmatter, that's preserved.
When no known keyword category matches, the index builder derives a dynamic category from the skill id tokens. This keeps the system open-ended and allows new categories without maintaining a fixed allow-list.
3. Scope-Based Matching
- Exact phrase matches weighted 2x higher than partial matches
- Uses word boundaries to avoid false positives
Using the Auto-Categorization
Run on Uncategorized Skills
python tools/scripts/auto_categorize_skills.py
Build index with explainable auto-categorization
python tools/scripts/generate_index.py
Preview Changes First (Dry Run)
python tools/scripts/auto_categorize_skills.py --dry-run
Output
======================================================================
AUTO-CATEGORIZATION REPORT
======================================================================
Summary:
✅ Categorized: 776
⏭️ Already categorized: 46
❌ Failed to categorize: 124
📈 Total processed: full repository
Sample changes:
• 3d-web-experience
uncategorized → web-development
• ab-test-setup
uncategorized → testing
• agent-framework-azure-ai-py
uncategorized → backend
Web App Improvements
Category Filter
Before:
- Unordered list including "uncategorized"
- No indication of category size
After:
- Categories sorted by skill count (most first, "uncategorized" last)
- Shows count: "Backend (164)" "Web Development (107)"
- Much easier to browse
Example Dropdowns
Sorted Order:
- All Categories
- Backend (164)
- Web Development (107)
- Automation (103)
- DevOps (83)
- AI/ML (79)
- ... more categories ...
- Uncategorized (126) ← at the end
For Skill Creators
When Adding a New Skill
Include category in frontmatter:
---
name: my-skill
description: "..."
category: web-development
date_added: "2026-03-06"
---
If You're Not Sure
The system will automatically categorize on next index regeneration:
python tools/scripts/generate_index.py
Keyword Reference
Available auto-categorization keywords by category:
Backend: nodejs, node.js, express, fastapi, django, flask, spring, java, python, golang, rust, server, api, rest, graphql, database, sql, mongodb
Web Development: react, vue, angular, html, css, javascript, typescript, frontend, tailwind, bootstrap, webpack, vite, pwa, responsive, seo
Database: database, sql, postgres, mysql, mongodb, firestore, redis, orm, schema
AI/ML: ai, machine learning, ml, tensorflow, pytorch, nlp, llm, gpt, transformer, embedding, training
DevOps: docker, kubernetes, ci/cd, git, jenkins, terraform, ansible, deploy, container, monitoring
Cloud: aws, azure, gcp, serverless, lambda, storage, cdn
Security: encryption, cryptography, jwt, oauth, authentication, authorization, vulnerability
Testing: test, jest, mocha, pytest, cypress, selenium, unit test, e2e
Mobile: mobile, react native, flutter, ios, android, swift, kotlin
Automation: automation, workflow, scripting, robot, trigger, integration
Game Development: game, unity, unreal, godot, threejs, 2d, 3d, physics
Data Science: data, analytics, pandas, numpy, statistics, visualization
Customization
Add Custom Keywords
Edit tools/scripts/auto_categorize_skills.py:
CATEGORY_KEYWORDS = {
'your-category': [
'keyword1', 'keyword2', 'exact phrase', 'another-keyword'
],
# ... other categories
}
Then re-run:
python tools/scripts/auto_categorize_skills.py
python tools/scripts/generate_index.py
Troubleshooting
"Failed to categorize" Skills
Some skills may be too generic or unique. You can:
- Manually set category in the skill's frontmatter:
category: your-chosen-category
-
Add keywords to CATEGORY_KEYWORDS config
-
Move to folder if it fits a broader category:
skills/backend/my-new-skill/SKILL.md
Regenerating Index
After making changes to SKILL.md files:
python tools/scripts/generate_index.py
This will:
- Parse frontmatter categories
- Fallback to folder structure
- Generate new skills_index.json
- Copy to apps/web-app/public/skills.json
Result: Much cleaner category filter with smart, meaningful organization! 🎉