Skill Seekers Roadmap
Transform Skill Seekers into the easiest way to create Claude AI skills from any knowledge source - documentation websites, PDFs, codebases, GitHub repos, Office docs, and more - with both CLI and MCP interfaces.
🎯 Current Status: v3.2.0 ✅
Latest Release: v3.2.0 (March 2026)
What Works:
- ✅ 17 source types — documentation, GitHub, PDF, video, Word, EPUB, Jupyter, local HTML, OpenAPI, AsciiDoc, PowerPoint, RSS/Atom, man pages, Confluence, Notion, Slack/Discord, local codebase
- ✅ Unified multi-source scraping with generic merge for any source combination
- ✅ 40 MCP tools fully functional
- ✅ Multi-platform support (16 platforms: Claude, Gemini, OpenAI, LangChain, LlamaIndex, Haystack, ChromaDB, FAISS, Weaviate, Qdrant, Cursor, Windsurf, Cline, Continue.dev, Pinecone, Markdown)
- ✅ Auto-upload to all platforms
- ✅ 24 preset configs (including 7 unified configs)
- ✅ Large docs support (40K+ pages with router skills)
- ✅ C3.x codebase analysis suite (C3.1-C3.10)
- ✅ Bootstrap skill feature - self-hosting capability
- ✅ 1,880+ tests passing
- ✅ Unified `create` command with auto-detection for all 17 source types
- ✅ Enhancement workflow presets (5 bundled: default, minimal, security-focus, architecture-comprehensive, api-documentation)
- ✅ Cloud storage integration (S3, GCS, Azure)
- ✅ Source auto-detection via `source_detector.py`
Recent Improvements (v3.2.0):
- ✅ 13 new source types: Word, EPUB, video, Jupyter, local HTML, OpenAPI, AsciiDoc, PowerPoint, RSS/Atom, man pages, Confluence, Notion, Slack/Discord
- ✅ Generic merge system: `_generic_merge()` in `unified_skill_builder.py` handles arbitrary source combinations
- ✅ Unified CLI: `create` command auto-detects all 17 source types
- ✅ Workflow Presets: YAML-based enhancement presets with CLI management
- ✅ Progressive Disclosure: Default help shows 13 universal flags, detailed help per source
- ✅ Bug Fixes: Markdown parser h1 filtering, paragraph length filtering
- ✅ Docs Cleanup: Removed 47 stale planning/QA/release markdown files
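The auto-detection behind the `create` command can be pictured as a small dispatch on the input string. The sketch below is illustrative only: `detect_source_type`, the extension table, the host lists, and the returned labels are all assumptions made for this example, not the contents of `source_detector.py`, and it covers only a handful of the 17 source types.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical extension table; the real detector may use different rules.
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".docx": "word",
    ".epub": "epub",
    ".ipynb": "jupyter",
    ".pptx": "powerpoint",
    ".adoc": "asciidoc",
    ".html": "local_html",
}


def detect_source_type(source: str) -> str:
    """Guess a source type from a URL or local path (illustrative only)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if host in ("github.com", "www.github.com"):
            return "github"
        if host in ("youtube.com", "www.youtube.com", "youtu.be", "vimeo.com"):
            return "video"
        # Any other web URL is assumed to be a documentation site.
        return "documentation"
    suffix = Path(source).suffix.lower()
    if suffix in EXTENSION_TYPES:
        return EXTENSION_TYPES[suffix]
    # Anything else is treated as a local codebase directory in this sketch.
    return "local_codebase"
```

For example, `detect_source_type("manual.PDF")` returns `"pdf"` and a path such as `"./src"` falls through to `"local_codebase"`.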
🧭 Development Philosophy
Small tasks → Pick one → Complete → Move on
Instead of rigid milestones, we use a flexible task-based approach:
- 136 small, independent tasks across 10 categories
- Pick any task, any order
- Start small, ship often
- No deadlines, just continuous progress
Philosophy: Small steps → Consistent progress → Compound results
📋 Task-Based Roadmap (136 Tasks, 10 Categories)
🌐 Category A: Community & Sharing
Small tasks that build community features incrementally
A1: Config Sharing (Website Feature)
- Task A1.1: Create simple JSON API endpoint to list configs ✅ COMPLETE
  - Status: Live at https://api.skillseekersweb.com
  - Features: 6 REST endpoints, auto-categorization, auto-tags, filtering, SSL enabled
- Task A1.2: Add MCP tool `fetch_config` to download from website ✅ COMPLETE
  - Features: List 24 configs, filter by category, download by name
- Task A1.3: Add MCP tool `submit_config` to submit custom configs
  - Purpose: Allow users to submit custom configs via MCP (creates GitHub issue)
  - Time: 2-3 hours
- Task A1.4: Create static config catalog website (GitHub Pages)
  - Purpose: Read-only catalog to browse/search configs
  - Time: 2-3 hours
- Task A1.5: Add config rating/voting system
  - Purpose: Community feedback on config quality
  - Time: 3-4 hours
- Task A1.6: Admin review queue for submitted configs
  - Approach: Use GitHub Issues with labels
  - Time: 1-2 hours
- Task A1.7: Add MCP tool `install_skill` for one-command workflow ✅ COMPLETE
  - Features: fetch → scrape → enhance → package → upload
  - Completed: December 21, 2025
- Task A1.8: Add smart skill detection and auto-install
  - Purpose: Auto-detect missing skills from user queries
  - Time: 4-6 hours
Start Next: Pick A1.3 (MCP submit tool)
A2: Knowledge Sharing (Website Feature)
- Task A2.1: Design knowledge database schema
- Task A2.2: Create API endpoint to upload knowledge (.zip files)
- Task A2.3: Add MCP tool `fetch_knowledge` to download from site
- Task A2.4: Add knowledge preview/description
- Task A2.5: Add knowledge categorization (by framework/topic)
- Task A2.6: Add knowledge search functionality
Start Small: Pick A2.1 first (schema design, no coding)
A3: Simple Website Foundation
- Task A3.1: Create single-page static site (GitHub Pages)
- Task A3.2: Add config gallery view
- Task A3.3: Add "Submit Config" link
- Task A3.4: Add basic stats
- Task A3.5: Add simple blog using GitHub Issues
- Task A3.6: Add RSS feed for updates
Start Small: Pick A3.1 first (single HTML page)
🛠️ Category B: New Input Formats
Add support for non-HTML documentation sources
B1: PDF Documentation Support ✅ COMPLETE (v3.0.0)
- Task B1.1: Research PDF parsing libraries ✅
- Task B1.2: Create simple PDF text extractor (POC) ✅
- Task B1.3: Add PDF page detection and chunking ✅
- Task B1.4: Extract code blocks from PDFs ✅
- Task B1.5: Add PDF image extraction ✅
- Task B1.6: Create `pdf_scraper.py` CLI tool ✅
- Task B1.7: Add MCP tool `scrape_pdf` ✅
- Task B1.8: Create PDF config format ✅
B2: Microsoft Word (.docx) Support ✅ COMPLETE (v3.2.0)
- Task B2.1-B2.7: Word document parsing and scraping ✅
B3: Excel/Spreadsheet (.xlsx) Support
- Task B3.1-B3.6: Spreadsheet parsing and API extraction
B4: Markdown Files Support ✅ COMPLETE (v3.1.0)
- Task B4.1-B4.6: Local markdown directory scraping ✅
B5: Additional Source Types ✅ COMPLETE (v3.2.0)
- EPUB - `epub_scraper.py` ✅
- Video - `video_scraper.py` (YouTube, Vimeo, local files) ✅
- Jupyter Notebook - `jupyter_scraper.py` ✅
- Local HTML - `html_scraper.py` ✅
- OpenAPI/Swagger - `openapi_scraper.py` ✅
- AsciiDoc - `asciidoc_scraper.py` ✅
- PowerPoint - `pptx_scraper.py` ✅
- RSS/Atom - `rss_scraper.py` ✅
- Man pages - `manpage_scraper.py` ✅
- Confluence - `confluence_scraper.py` ✅
- Notion - `notion_scraper.py` ✅
- Slack/Discord - `chat_scraper.py` ✅
💻 Category C: Codebase Knowledge
Generate skills from actual code repositories
C1: GitHub Repository Scraping
- Task C1.1-C1.12: GitHub API integration and code analysis
C2: Local Codebase Scraping
- Task C2.1-C2.8: Local directory analysis and API extraction
C3: Code Pattern Recognition
- Task C3.1: Detect common patterns (singleton, factory, etc.) ✅ v2.6.0
  - 10 GoF patterns, 9 languages, 87% precision
- Task C3.2: Extract usage examples from test files ✅ v2.6.0
  - 5 categories, 9 languages, 80%+ high-confidence examples
- Task C3.3: Build "how to" guides from code
- Task C3.4: Extract configuration patterns
- Task C3.5: Create architectural overview
- Task C3.6: AI Enhancement for Pattern Detection ✅ v2.6.0
  - Claude API integration for enhanced insights
- Task C3.7: Architectural Pattern Detection ✅ v2.6.0
  - Detects 8 architectural patterns, framework-aware
Start Next: Pick C3.3 (build guides from workflow examples)
🔌 Category D: Context7 Integration
- Task D1.1-D1.4: Research and planning
- Task D2.1-D2.5: Basic integration
🚀 Category E: MCP Enhancements
Small improvements to existing MCP tools
E1: New MCP Tools
- Task E1.3: Add `scrape_pdf` MCP tool ✅
- Task E1.1: Add `fetch_config` MCP tool
- Task E1.2: Add `fetch_knowledge` MCP tool
- Task E1.4-E1.9: Additional format scrapers
E2: MCP Quality Improvements
- Task E2.1: Add error handling to all tools
- Task E2.2: Add structured logging
- Task E2.3: Add progress indicators
- Task E2.4: Add validation for all inputs
- Task E2.5: Add helpful error messages
- Task E2.6: Add retry logic for network failures ✅ Utilities ready
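As a rough illustration of the retry logic in E2.6, one common approach is exponential backoff (also listed as F1.5). The helper below is a minimal sketch under stated assumptions: `retry_with_backoff`, its parameters, and the choice to retry only on `OSError` (the base class of `ConnectionError` and `TimeoutError`) are hypothetical and do not describe the project's existing retry utilities.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on OSError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...),
            # plus up to 10% random jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the helper can be exercised in tests without real waiting.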
⚡ Category F: Performance & Reliability
Technical improvements to existing features
F1: Core Scraper Improvements
- Task F1.1: Add URL normalization
- Task F1.2: Add duplicate page detection
- Task F1.3: Add memory-efficient streaming
- Task F1.4: Add HTML parser fallback
- Task F1.5: Add network retry with exponential backoff ✅
- Task F1.6: Fix package path output bug
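Tasks F1.1 and F1.2 are closely related: once URLs are normalized, duplicate pages can be detected by comparing the normalized forms. The following is a minimal sketch using only the standard library. The function names and the specific normalization rules (lowercase host, drop default ports, strip trailing slashes and fragments) are assumptions chosen for illustration, not the project's implementation, and the sketch ignores userinfo and IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a canonical form of url for use as a 'visited' key."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the scheme's default.
    default_ports = {"http": 80, "https": 443}
    if parts.port is not None and parts.port != default_ports.get(scheme):
        host = f"{host}:{parts.port}"
    # Treat "/guide/" and "/guide" as the same page; keep the bare root "/".
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment: "#section" points inside the same page.
    return urlunsplit((scheme, host, path, parts.query, ""))


def unique_pages(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each page, compared by normalized URL."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With these rules, `https://example.com/a/` and `https://example.com/a#top` collapse to the same key, so only one of them would be scraped.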
F2: Incremental Updates
- Task F2.1-F2.5: Track modifications, update only changed content
🎨 Category G: Tools & Utilities
Small standalone tools that add value
G1: Config Tools
- Task G1.1: Create `validate_config.py`
- Task G1.2: Create `test_selectors.py`
- Task G1.3: Create `auto_detect_selectors.py` (AI-powered)
- Task G1.4: Create `compare_configs.py`
- Task G1.5: Create `optimize_config.py`
G2: Skill Quality Tools
- Task G2.1-G2.5: Quality analysis and reporting
📚 Category H: Community Response
- Task H1.1-H1.5: Address open GitHub issues
🎓 Category I: Content & Documentation
- Task I1.1-I1.6: Video tutorials
- Task I2.1-I2.5: Written guides
🧪 Category J: Testing & Quality
- Task J1.1-J1.6: Test expansion and coverage
🎯 Recommended Starting Tasks
Quick Wins (1-2 hours each):
- H1.1 - Respond to Issue #8
- J1.1 - Install MCP package
- A3.1 - Create GitHub Pages site
- B1.1 - Research PDF parsing
- F1.1 - Add URL normalization
Medium Tasks (3-5 hours each):
- ✅ A1.1 - JSON API for configs (COMPLETE)
- G1.1 - Config validator script
- C1.1 - GitHub API client
- I1.1 - Video script writing
- E2.1 - Error handling for MCP tools
📊 Release History
✅ v2.6.0 - C3.x Codebase Analysis Suite (January 14, 2026)
Focus: Complete codebase analysis with multi-platform support
Completed Features:
- C3.x suite (C3.1-C3.8): Pattern detection, test extraction, architecture analysis
- Multi-platform support: Claude, Gemini, OpenAI, Markdown
- Platform adaptor architecture
- 18 MCP tools (up from 9)
- 700+ tests passing
- Unified multi-source scraping maturity
✅ v2.1.0 - Test Coverage & Quality (November 29, 2025)
Focus: Test coverage and unified scraping
Completed Features:
- Fixed 12 unified scraping tests
- GitHub repository scraping with unlimited local analysis
- PDF extraction and conversion
- 427 tests passing
✅ v1.0.0 - Production Release (October 19, 2025)
First stable release
Core Features:
- Documentation scraping with BFS
- Smart categorization
- Language detection
- Pattern extraction
- 12 preset configurations
- MCP server with 9 tools
- Large documentation support (40K+ pages)
- Auto-upload functionality
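For readers unfamiliar with the term, "documentation scraping with BFS" means visiting pages breadth-first: every page linked from the start page is visited before any page two links away. The sketch below shows the general technique only. `crawl_bfs` and the injected `get_links` callback are hypothetical names, and the real scraper's fetching, link filtering, and page limits are not shown in this document.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_bfs(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit pages breadth-first from start_url and return them in visit order."""
    visited: set[str] = {start_url}
    queue: deque[str] = deque([start_url])
    order: list[str] = []
    while queue and len(order) < max_pages:
        url = queue.popleft()  # FIFO queue gives breadth-first order
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)  # mark on enqueue so no page is queued twice
                queue.append(link)
    return order
```

Passing `get_links` as a callback keeps the traversal separate from network access, so the same loop can be run against an in-memory link graph in tests.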
📅 Release Planning
Release: v2.7.0 (Estimated: February 2026)
Focus: Router Quality Improvements & Multi-Source Maturity
Planned Features:
- Router skill quality improvements
- Enhanced multi-source synthesis
- Source-parity for all scrapers
- AI enhancement improvements
- Documentation refinements
Release: v2.8.0 (Estimated: Q1 2026)
Focus: Web Presence & Community Growth
Planned Features:
- GitHub Pages website (skillseekersweb.com)
- Interactive documentation
- Config submission workflow
- Community showcase
- Video tutorials
Release: v2.9.0 (Estimated: Q2 2026)
Focus: Developer Experience & Integrations
Planned Features:
- Web UI for config generation
- CI/CD integration examples
- Docker containerization
- Enhanced scraping formats (Sphinx, Docusaurus detection)
- Performance optimizations
🔮 Long-term Vision (v3.0+)
Major Features Under Consideration
Advanced Scraping
- Real-time documentation monitoring
- Automatic skill updates
- Change notifications
- Multi-language documentation support
Collaboration
- Collaborative skill curation
- Shared skill repositories
- Community ratings and reviews
- Skill marketplace
AI & Intelligence
- Enhanced AI analysis
- Better conflict detection algorithms
- Automatic documentation quality scoring
- Semantic understanding and natural language queries
Ecosystem
- VS Code extension
- IntelliJ/PyCharm plugin
- Interactive TUI mode
- Skill diff and merge tools
📈 Metrics & Goals
Current State (v3.2.0) ✅
- ✅ 17 source types supported
- ✅ 24 preset configs (14 official + 10 test/examples)
- ✅ 1,880+ tests (excellent coverage)
- ✅ 40 MCP tools
- ✅ 4 platform adaptors (Claude, Gemini, OpenAI, Markdown)
- ✅ C3.x codebase analysis suite complete
- ✅ Multi-source synthesis with generic merge for any combination
Goals for v2.7-v2.9
- 🎯 Professional website live
- 🎯 50+ preset configs
- 🎯 Video tutorial series (5+ videos)
- 🎯 100+ GitHub stars
- 🎯 Community contributions flowing
Goals for v3.0+
- 🎯 Auto-detection for 80%+ of sites
- 🎯 <1 minute skill generation
- 🎯 Active community marketplace
- 🎯 Quality scoring system
- 🎯 Real-time monitoring
🤝 How to Influence the Roadmap
Priority System
Features are prioritized based on:
- User impact - How many users will benefit?
- Technical feasibility - How complex is the implementation?
- Community interest - How many upvotes/requests?
- Strategic alignment - Does it fit our vision?
Ways to Contribute
- Vote on Features - ⭐ Star feature request issues
- Contribute Code - Pick any task from the 136 available
- Share Feedback - Open issues, share success stories
- Help with Documentation - Write tutorials, improve docs
See CONTRIBUTING.md for detailed guidelines.
🎨 Flexibility Rules
- Pick any task, any order - No rigid dependencies
- Start small - Research tasks before implementation
- One task at a time - Focus, complete, move on
- Switch anytime - Not enjoying it? Pick another!
- Document as you go - Each task should update docs
- Test incrementally - Each task should have a quick test
- Ship early - Don't wait for "complete" features
📊 Progress Tracking
Completed Tasks: 10+ (C3.1, C3.2, C3.6, C3.7, A1.1, A1.2, A1.7, E1.3, E2.6, F1.5)
In Progress: Router quality improvements (v2.7.0)
Total Available Tasks: 136
No pressure, no deadlines, just progress! ✨
🔗 Related Projects
- Model Context Protocol
- Claude Code
- Anthropic Claude
- Documentation frameworks we support: Docusaurus, GitBook, VuePress, Sphinx, MkDocs
📚 Learn More
- Project Board: https://github.com/users/yusufkaraaslan/projects/2
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Discussions: https://github.com/yusufkaraaslan/Skill_Seekers/discussions
- Issues: https://github.com/yusufkaraaslan/Skill_Seekers/issues
Last Updated: March 15, 2026
Philosophy: Small steps → Consistent progress → Compound results
Together, we're building the future of documentation-to-AI skill conversion! 🚀