# Skill Seekers Architecture

> Updated 2026-04-08 | StarUML project: `docs/UML/skill_seekers.mdj`

## Overview
Skill Seekers converts documentation from 18 source types into production-ready formats for 24+ AI platforms. The architecture follows a layered module design with 8 core modules and 5 utility modules. All source types are routed through a single `skill-seekers create` command via the `SkillConverter` base class + factory pattern.

## Package Diagram
**Core Modules** (upper area):

- **CLICore** -- Git-style command dispatcher, entry point for all `skill-seekers` commands
- **Scrapers** -- 18 source-type converters (web, GitHub, PDF, Word, EPUB, video, etc.)
- **Adaptors** -- Strategy+Factory pattern for 20+ output platforms (Claude, Gemini, OpenAI, RAG frameworks)
- **Analysis** -- C3.x codebase analysis pipeline (AST parsing, 10 GoF pattern detectors, guide builders)
- **Enhancement** -- AI-powered skill improvement via `AgentClient` (API mode: Anthropic/Kimi/Gemini/OpenAI; LOCAL mode: Claude Code/Kimi/Codex/Copilot/OpenCode/custom; `--enhance-level` 0-3)
- **Packaging** -- Package, upload, and install skills to AI agent directories
- **MCP** -- FastMCP server exposing 40 tools via stdio/HTTP transport (includes marketplace and config publishing)
- **Sync** -- Documentation change detection and re-scraping triggers
**Utility Modules** (lower area):

- **Parsers** -- CLI argument parsers (18 `SubcommandParser` subclasses)
- **Storage** -- Cloud storage abstraction (S3, GCS, Azure)
- **Embedding** -- Multi-provider vector embedding generation
- **Benchmark** -- Performance measurement framework
- **Utilities** -- Shared helpers (LanguageDetector, RAGChunker, MarkdownCleaner, etc.)

## Core Module Diagrams
### CLICore

Entry point: the `skill-seekers` CLI. `CLIDispatcher` maps subcommands to modules via the `COMMAND_MODULES` dict. `CreateCommand` auto-detects the source type via `SourceDetector`, initializes the `ExecutionContext` singleton (a Pydantic model and the single source of truth for all config), then calls `get_converter()` → `converter.run()`. Enhancement runs centrally in `CreateCommand` after the converter completes.
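The `ExecutionContext` singleton can be pictured with the minimal sketch below. It uses a stdlib frozen `dataclass` in place of the Pydantic model so that it runs without dependencies. The field names (`source`, `source_type`, `enhance_level`, `options`) and the `get()`/`reset()` helpers are hypothetical, not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, ClassVar, Dict, Optional


@dataclass(frozen=True)
class ExecutionContext:
    """Stand-in for the Pydantic ExecutionContext (field names are hypothetical)."""

    source: str
    source_type: str
    enhance_level: int = 0
    options: Dict[str, Any] = field(default_factory=dict)

    # ClassVar is not a dataclass field, so it can hold the one shared instance.
    _instance: ClassVar[Optional["ExecutionContext"]] = None

    @classmethod
    def initialize(cls, **kwargs: Any) -> "ExecutionContext":
        # Called once per CLI run, after source detection.
        if cls._instance is not None:
            raise RuntimeError("ExecutionContext already initialized")
        cls._instance = cls(**kwargs)
        return cls._instance

    @classmethod
    def get(cls) -> "ExecutionContext":
        # Later stages read config from here instead of re-parsing arguments.
        if cls._instance is None:
            raise RuntimeError("call ExecutionContext.initialize() first")
        return cls._instance

    @classmethod
    def reset(cls) -> None:
        # Useful in tests to start from a clean state.
        cls._instance = None
```

Because the instance is frozen, downstream stages can read the shared configuration but cannot mutate it behind each other's backs.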
### Scrapers

18 converter classes inherit from the `SkillConverter` base class (Template Method: `run()` → `extract()` → `build_skill()`). Factory: `get_converter(source_type, config)` via `CONVERTER_REGISTRY`. There are no `main()` entry points; all routing goes through `CreateCommand`. Notable: `GitHubScraper` (3-stream fetcher) + `GitHubToSkillConverter` (builder), `UnifiedScraper` (multi-source orchestrator).
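Here is a minimal sketch of the Template Method + registry arrangement. It assumes converters register themselves through a hypothetical `register_converter` decorator; the real `CONVERTER_REGISTRY` may simply be a literal dict. `TextConverter` is a toy stand-in, not one of the 18 real converters.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Type

# source_type -> converter class; filled in by the decorator below.
CONVERTER_REGISTRY: Dict[str, Type["SkillConverter"]] = {}


def register_converter(source_type: str) -> Callable[[type], type]:
    """Hypothetical helper that adds a converter class to CONVERTER_REGISTRY."""
    def decorator(cls: type) -> type:
        CONVERTER_REGISTRY[source_type] = cls
        return cls
    return decorator


class SkillConverter(ABC):
    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def run(self) -> str:
        # Template method: the skeleton is fixed, subclasses supply the steps.
        data = self.extract()
        return self.build_skill(data)

    @abstractmethod
    def extract(self) -> Any:
        """Pull raw content from the source."""

    @abstractmethod
    def build_skill(self, data: Any) -> str:
        """Turn extracted content into SKILL.md text."""


def get_converter(source_type: str, config: Dict[str, Any]) -> SkillConverter:
    # Factory: look up the class for the detected type and instantiate it.
    try:
        converter_cls = CONVERTER_REGISTRY[source_type]
    except KeyError:
        raise ValueError(f"unsupported source type: {source_type!r}") from None
    return converter_cls(config)


@register_converter("text")
class TextConverter(SkillConverter):
    """Toy converter: 'extracts' words from an inline string."""

    def extract(self) -> List[str]:
        return self.config["text"].split()

    def build_skill(self, data: List[str]) -> str:
        return f"# {self.config['name']}\n\n{len(data)} words extracted\n"
```

A call such as `get_converter("text", {"name": "demo", "text": "a b c"}).run()` goes through the same `run()` skeleton every converter shares.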
### Adaptors

`SkillAdaptor` ABC with 3 abstract methods: `format_skill_md()`, `package()`, `upload()`. Two-level hierarchy: direct subclasses (Claude, Gemini, OpenAI, Markdown, OpenCode, RAG adaptors) and `OpenAICompatibleAdaptor` intermediate (MiniMax, Kimi, DeepSeek, Qwen, OpenRouter, Together, Fireworks).
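The following sketch shows roughly how the two-level hierarchy could fit together. The following are all assumptions: the method parameters, the `platform`/`base_url` attributes, the `get_adaptor(platform)` signature, and the placeholder URLs. `upload()` returns a description of the request instead of calling a real API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class SkillAdaptor(ABC):
    """Strategy interface: one concrete subclass per output platform."""

    platform = "base"

    @abstractmethod
    def format_skill_md(self, name: str, body: str) -> str:
        """Render SKILL.md the way the platform expects."""

    @abstractmethod
    def package(self, skill_md: str) -> Dict[str, bytes]:
        """Return the files that make up the package."""

    @abstractmethod
    def upload(self, files: Dict[str, bytes]) -> Dict[str, str]:
        """Send the package to the platform."""


class OpenAICompatibleAdaptor(SkillAdaptor):
    """Shared middle layer: subclasses override data, not behavior."""

    base_url = ""

    def format_skill_md(self, name: str, body: str) -> str:
        return f"# {name}\n\n{body}\n"

    def package(self, skill_md: str) -> Dict[str, bytes]:
        return {"SKILL.md": skill_md.encode("utf-8")}

    def upload(self, files: Dict[str, bytes]) -> Dict[str, str]:
        # No network call in this sketch: describe the request instead.
        return {"platform": self.platform, "endpoint": self.base_url, "files": ",".join(sorted(files))}


class KimiAdaptor(OpenAICompatibleAdaptor):
    platform = "kimi"
    base_url = "https://kimi.invalid/v1"  # placeholder, not the real endpoint


class DeepSeekAdaptor(OpenAICompatibleAdaptor):
    platform = "deepseek"
    base_url = "https://deepseek.invalid/v1"  # placeholder, not the real endpoint


_ADAPTORS: Dict[str, Type[SkillAdaptor]] = {cls.platform: cls for cls in (KimiAdaptor, DeepSeekAdaptor)}


def get_adaptor(platform: str) -> SkillAdaptor:
    # Factory: callers only ever see the SkillAdaptor interface.
    return _ADAPTORS[platform]()
```

The intermediate class reflects a simple observation: OpenAI-compatible providers differ mainly in endpoint and naming, so each leaf class can override data rather than behavior.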
### Analysis (C3.x Pipeline)

`UnifiedCodebaseAnalyzer` controller orchestrates: `CodeAnalyzer` (AST, 9 languages), `PatternRecognizer` (10 GoF detectors via `BasePatternDetector`), `TestExampleExtractor`, `HowToGuideBuilder`, `ConfigExtractor`, `SignalFlowAnalyzer`, `DependencyAnalyzer`, `ArchitecturalPatternDetector`.
### Enhancement

Two enhancement hierarchies: `AIEnhancer` (API mode, multi-provider via `AgentClient`) and `UnifiedEnhancer` (C3.x pipeline enhancers). Each has specialized subclasses for patterns, test examples, guides, and configs. `WorkflowEngine` orchestrates multi-stage `EnhancementWorkflow`. The `AgentClient` (`cli/agent_client.py`) centralizes all AI invocations, supporting API mode (Anthropic, Moonshot/Kimi, Gemini, OpenAI) and LOCAL mode (Claude Code, Kimi Code, Codex, Copilot, OpenCode, custom agents).
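The split between `WorkflowEngine` and `EnhancementWorkflow` can be sketched as an ordered list of named stages applied in sequence. In the real system each stage would presumably call `AgentClient`. Here, stages are plain `str -> str` functions, and the `add_stage`/`run` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A stage receives the current SKILL.md text and returns a revised version.
Stage = Callable[[str], str]


@dataclass
class EnhancementWorkflow:
    name: str
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add_stage(self, stage_name: str, fn: Stage) -> "EnhancementWorkflow":
        self.stages.append((stage_name, fn))
        return self  # allow chaining


class WorkflowEngine:
    def run(self, workflow: EnhancementWorkflow, skill_md: str) -> Tuple[str, List[str]]:
        completed: List[str] = []
        for stage_name, fn in workflow.stages:
            # Each stage sees the output of the previous one.
            skill_md = fn(skill_md)
            completed.append(stage_name)
        return skill_md, completed
```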
### Packaging

`PackageSkill` delegates to adaptors for format-specific packaging. `UploadSkill` handles platform API uploads. `InstallSkill`/`InstallAgent` install to AI agent directories. `OpenCodeSkillSplitter` handles large file splitting.
### MCP Server

`SkillSeekerMCPServer` (FastMCP) with 40 tools in 10 categories. Supporting classes: `SourceManager` (config CRUD), `AgentDetector` (environment detection), `GitConfigRepo` (community configs), `MarketplacePublisher` (publish skills to marketplace repos), `MarketplaceManager` (marketplace registry CRUD), `ConfigPublisher` (push configs to registered source repos).
### Sync

`SyncMonitor` controller schedules periodic checks via `ChangeDetector` (SHA-256 hashing, HTTP headers, content diffing). `Notifier` sends alerts when changes are found. Pydantic models: `PageChange`, `ChangeReport`, `SyncConfig`, `SyncState`.
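Here is a sketch of hash-based change detection in the spirit of `ChangeDetector`, using `hashlib.sha256`. It covers only the content-hashing path, not HTTP headers or diffing. It uses dataclasses instead of the Pydantic models. The `compare()` method and the `change_type` values are assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageChange:
    url: str
    change_type: str  # "added", "modified" or "deleted"


@dataclass
class ChangeReport:
    changes: List[PageChange] = field(default_factory=list)

    @property
    def has_changes(self) -> bool:
        return bool(self.changes)


def content_hash(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ChangeDetector:
    def compare(self, stored_hashes: Dict[str, str], fetched: Dict[str, str]) -> ChangeReport:
        """stored_hashes: url -> hash from the last sync; fetched: url -> fresh page body."""
        current = {url: content_hash(body) for url, body in fetched.items()}
        report = ChangeReport()
        for url in sorted(current.keys() - stored_hashes.keys()):
            report.changes.append(PageChange(url, "added"))
        for url in sorted(current.keys() & stored_hashes.keys()):
            if current[url] != stored_hashes[url]:
                report.changes.append(PageChange(url, "modified"))
        for url in sorted(stored_hashes.keys() - current.keys()):
            report.changes.append(PageChange(url, "deleted"))
        return report
```

Storing only digests keeps the sync state small; a `Notifier` would only need to fire when `has_changes` is true.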
## Utility Module Diagrams
### Parsers

`SubcommandParser` ABC with 18 subclasses. The individual scraper parsers were removed after the Grand Unification, since all source types now route through `CreateParser`. Remaining: Create, Doctor, Config, Enhance, EnhanceStatus, Package, Upload, Estimate, Install, InstallAgent, TestExamples, Resume, Quality, Workflows, SyncConfig, Stream, Update, Multilang.
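With `argparse`, the `SubcommandParser` Template Method might look like the following sketch. The base class registers the subcommand, and each subclass adds only its own arguments. `--enhance-level` and `--browser` come from this document. The following are assumptions: the `register`/`add_arguments`/`build_parser` names, the default level of 0, and `PackageParser`'s `skill_dir` argument.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Any, Iterable


class SubcommandParser(ABC):
    name = ""
    help_text = ""

    def register(self, subparsers: Any) -> argparse.ArgumentParser:
        # Template method: shared registration, subclass-specific arguments.
        parser = subparsers.add_parser(self.name, help=self.help_text)
        self.add_arguments(parser)
        return parser

    @abstractmethod
    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        """Add this subcommand's own arguments."""


class CreateParser(SubcommandParser):
    name = "create"
    help_text = "create a skill from any supported source"

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("source")
        parser.add_argument("--enhance-level", type=int, choices=range(4), default=0)
        parser.add_argument("--browser", action="store_true")


class PackageParser(SubcommandParser):
    name = "package"
    help_text = "package a built skill"

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("skill_dir")


def build_parser(parsers: Iterable[SubcommandParser]) -> argparse.ArgumentParser:
    root = argparse.ArgumentParser(prog="skill-seekers")
    subparsers = root.add_subparsers(dest="command", required=True)
    for p in parsers:
        p.register(subparsers)
    return root
```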
### Storage

`BaseStorageAdaptor` ABC with `S3StorageAdaptor`, `GCSStorageAdaptor`, `AzureStorageAdaptor`. `StorageObject` dataclass for file metadata.
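Here is a hedged sketch of the storage abstraction. The method names (`put_object`, `get_object`, `list_objects`) and the `StorageObject` fields are hypothetical. An in-memory adaptor stands in for the S3/GCS/Azure subclasses, which would wrap their vendor SDK clients instead.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class StorageObject:
    key: str
    size: int
    content_type: Optional[str] = None


class BaseStorageAdaptor(ABC):
    @abstractmethod
    def put_object(self, key: str, data: bytes, content_type: Optional[str] = None) -> StorageObject: ...

    @abstractmethod
    def get_object(self, key: str) -> bytes: ...

    @abstractmethod
    def list_objects(self, prefix: str = "") -> List[StorageObject]: ...


class InMemoryStorageAdaptor(BaseStorageAdaptor):
    """Test double; cloud subclasses would call their SDKs here instead."""

    def __init__(self) -> None:
        self._blobs: Dict[str, Tuple[bytes, Optional[str]]] = {}

    def put_object(self, key: str, data: bytes, content_type: Optional[str] = None) -> StorageObject:
        self._blobs[key] = (data, content_type)
        return StorageObject(key, len(data), content_type)

    def get_object(self, key: str) -> bytes:
        return self._blobs[key][0]

    def list_objects(self, prefix: str = "") -> List[StorageObject]:
        # Sorted by key so listings are deterministic.
        return [
            StorageObject(k, len(d), ct)
            for k, (d, ct) in sorted(self._blobs.items())
            if k.startswith(prefix)
        ]
```

Callers depend only on `BaseStorageAdaptor`, so the choice of cloud backend is made once, at construction time.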
### Embedding

`EmbeddingGenerator` (multi-provider: OpenAI, Sentence Transformers, Voyage AI). `EmbeddingPipeline` coordinates provider, caching, and cost tracking. `EmbeddingProvider` ABC with OpenAI and Local implementations.
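The caching role of `EmbeddingPipeline` can be illustrated with a toy provider. `HashingProvider` derives deterministic pseudo-vectors from SHA-256 digests; it is not a real embedding model. The per-1k-token pricing attribute is a placeholder, and the whitespace token count is a crude stand-in for a real tokenizer.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List


class EmbeddingProvider(ABC):
    cost_per_1k_tokens = 0.0  # placeholder pricing

    @abstractmethod
    def embed(self, texts: List[str]) -> List[List[float]]: ...


class HashingProvider(EmbeddingProvider):
    """Toy local provider: deterministic pseudo-vectors, not a real model."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim  # at most 32, the SHA-256 digest length in bytes
        self.calls = 0

    def embed(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        return [[b / 255 for b in hashlib.sha256(t.encode("utf-8")).digest()[: self.dim]] for t in texts]


class EmbeddingPipeline:
    def __init__(self, provider: EmbeddingProvider) -> None:
        self.provider = provider
        self.cache: Dict[str, List[float]] = {}
        self.estimated_cost = 0.0

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Send only unique cache misses to the provider, in first-seen order.
        missing = [t for t in dict.fromkeys(texts) if t not in self.cache]
        if missing:
            self.cache.update(zip(missing, self.provider.embed(missing)))
            tokens = sum(len(t.split()) for t in missing)  # crude token estimate
            self.estimated_cost += tokens / 1000 * self.provider.cost_per_1k_tokens
        return [self.cache[t] for t in texts]
```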
### Benchmark

`BenchmarkRunner` orchestrates `Benchmark` instances. `BenchmarkResult` collects timings/memory/metrics and produces `BenchmarkReport`. Supporting data types: `Metric`, `TimingResult`, `MemoryUsage`, `ComparisonReport`.
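Here is a minimal, timing-only sketch of the runner/result/report relationship using `time.perf_counter`. Memory tracking (`MemoryUsage`) and comparisons are omitted, and the method names and constructor arguments are assumptions.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TimingResult:
    name: str
    samples: List[float]  # seconds per run

    @property
    def mean(self) -> float:
        return statistics.mean(self.samples)

    @property
    def best(self) -> float:
        return min(self.samples)


@dataclass
class BenchmarkReport:
    results: Dict[str, TimingResult]

    def summary(self) -> str:
        return "\n".join(
            f"{r.name}: mean={r.mean * 1000:.3f} ms, best={r.best * 1000:.3f} ms"
            for r in self.results.values()
        )


class BenchmarkRunner:
    def __init__(self, repeat: int = 5) -> None:
        self.repeat = repeat

    def run(self, benchmarks: Dict[str, Callable[[], object]]) -> BenchmarkReport:
        results = {}
        for name, fn in benchmarks.items():
            samples = []
            for _ in range(self.repeat):
                start = time.perf_counter()
                fn()
                samples.append(time.perf_counter() - start)
            results[name] = TimingResult(name, samples)
        return BenchmarkReport(results)
```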
### Utilities

16 shared helper classes: `LanguageDetector`, `MarkdownCleaner`, `RAGChunker`, `RateLimitHandler`, `ConfigManager`, `ConfigValidator`, `SkillQualityChecker`, `QualityAnalyzer`, `LlmsTxtDetector`/`Downloader`/`Parser`, `ConfigSplitter`, `ConflictDetector`, `IncrementalUpdater`, `MultiLanguageManager`, `StreamingIngester`.
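As an example of the kind of helper `RAGChunker` provides, here is a word-window chunker with overlap. It is a hypothetical simplification that splits on whitespace-separated words, and its `chunk_size`/`overlap` parameters were chosen for illustration. The real class may count tokens or respect Markdown structure.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of chunk_size words; neighbors share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from both neighboring chunks.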
## Key Design Patterns

| Pattern | Where | Classes |
|---------|-------|---------|
| Strategy + Factory | Adaptors | `SkillAdaptor` ABC + `get_adaptor()` factory + 20+ implementations |
| Strategy + Factory | Storage | `BaseStorageAdaptor` ABC + S3/GCS/Azure |
| Strategy + Factory | Embedding | `EmbeddingProvider` ABC + OpenAI/Local |
| Template Method + Factory | Scrapers | `SkillConverter` base + `get_converter()` factory + 18 converter subclasses |
| Singleton | Configuration | `ExecutionContext` Pydantic model, the single source of truth for all config |
| Command | CLI | `CLIDispatcher` + `COMMAND_MODULES` lazy dispatch |
| Template Method | Pattern Detection | `BasePatternDetector` + 10 GoF detectors |
| Template Method | Parsers | `SubcommandParser` + 18 subclasses |
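The lazy dispatch in the Command row can be sketched with `importlib.import_module`, so that a subcommand's module is imported only when that subcommand runs. The module paths in `COMMAND_MODULES` and the `main(argv) -> int` convention are assumptions for illustration.

```python
import importlib
import sys
from typing import Dict, List

# Hypothetical subcommand -> module mapping; nothing is imported up front.
COMMAND_MODULES: Dict[str, str] = {
    "create": "skill_seekers.cli.create_command",
    "package": "skill_seekers.cli.package_skill",
}


class CLIDispatcher:
    def __init__(self, command_modules: Dict[str, str]) -> None:
        self.command_modules = command_modules

    def dispatch(self, argv: List[str]) -> int:
        if not argv or argv[0] not in self.command_modules:
            choices = ",".join(sorted(self.command_modules))
            print(f"usage: skill-seekers {{{choices}}} ...", file=sys.stderr)
            return 2
        # Import happens here, so startup cost scales with the command used.
        module = importlib.import_module(self.command_modules[argv[0]])
        # Each command module is assumed to expose main(argv) -> int.
        return module.main(argv[1:])
```

Deferring imports means that running a light command never pays for heavy dependencies (browser rendering, embeddings) that only other commands need.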
## Behavioral Diagrams

### Create Pipeline Sequence

`CreateCommand` is now the pipeline orchestrator. Flow: User → `execute()` → `SourceDetector.detect(source)` → `validate_source()` → `ExecutionContext.initialize()` → `_validate_arguments()` → `get_converter(type, config)` → `converter.run()` (extract + build_skill) → `_run_enhancement(ctx)` → `_run_workflows()`. Enhancement is centralized in CreateCommand, not inside each converter.
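The orchestration order in this sequence can be sketched as follows. Collaborators are injected as callables so the example runs standalone, and a plain dict stands in for `ExecutionContext`. Only the step order and the method names quoted in the sequence description come from this document; the signatures are assumptions.

```python
from typing import Any, Callable, Dict


class CreateCommand:
    """Runs the create pipeline in the order shown in the sequence diagram."""

    def __init__(
        self,
        detect: Callable[[str], str],
        get_converter: Callable[[str, Dict[str, Any]], Any],
        enhance: Callable[[str, Dict[str, Any]], str],
        run_workflows: Callable[[str, Dict[str, Any]], None],
    ) -> None:
        self.detect = detect
        self.get_converter = get_converter
        self.enhance = enhance
        self.run_workflows = run_workflows

    def execute(self, source: str, enhance_level: int = 0) -> str:
        source_type = self.detect(source)  # SourceDetector.detect()
        self.validate_source(source)
        # A plain dict stands in for ExecutionContext.initialize().
        ctx = {"source": source, "source_type": source_type, "enhance_level": enhance_level}
        self._validate_arguments(ctx)
        skill = self.get_converter(source_type, ctx).run()  # extract + build_skill
        skill = self._run_enhancement(skill, ctx)  # centralized, not per converter
        self.run_workflows(skill, ctx)
        return skill

    def validate_source(self, source: str) -> None:
        if not source.strip():
            raise ValueError("source must not be empty")

    def _validate_arguments(self, ctx: Dict[str, Any]) -> None:
        if not 0 <= ctx["enhance_level"] <= 3:
            raise ValueError("--enhance-level must be between 0 and 3")

    def _run_enhancement(self, skill: str, ctx: Dict[str, Any]) -> str:
        # Level 0 skips AI entirely; any higher level hands off to the enhancer.
        if ctx["enhance_level"] == 0:
            return skill
        return self.enhance(skill, ctx)
```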
### GitHub Unified Flow + C3.x

`UnifiedScraper` orchestrates GitHub scraping (3-stream fetch), then delegates to `analyze_codebase(enhance_level)` for C3.x analysis. The diagram shows all 5 C3.x stages: `PatternRecognizer` (C3.1), `TestExampleExtractor` (C3.2), `HowToGuideBuilder` with examples from C3.2 (C3.3), `ConfigExtractor` (C3.4), and `ArchitecturalPatternDetector` (C3.5). Note: `enhance_level` is the sole AI control parameter; `enhance_with_ai`/`ai_mode` are internal to the C3.x classes only.
### Source Auto-Detection

Activity diagram showing the `source_detector.py` decision tree in the order the code evaluates it: file extension first (.json config, .pdf/.docx/.epub/.ipynb/.html/.pptx, etc.) → video URL → `os.path.isdir()` (Codebase) → GitHub pattern (owner/repo or github.com URL) → http/https URL (Web) → bare domain inference → error.
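The decision order can be expressed as a single function. The extension table is partial. `VIDEO_HOSTS`, the regular expressions, and the returned type labels are assumptions rather than the contents of `source_detector.py`.

```python
import os
import re
from urllib.parse import urlparse

# Partial, illustrative extension table (the real detector knows more types).
EXTENSION_TYPES = {
    ".json": "config", ".pdf": "pdf", ".docx": "word", ".epub": "epub",
    ".ipynb": "notebook", ".html": "html", ".pptx": "pptx",
}
VIDEO_HOSTS = ("youtube.com", "youtu.be", "vimeo.com")  # assumed list
# owner/repo: GitHub usernames allow letters, digits and hyphens, but no dots.
GITHUB_SHORTHAND = re.compile(r"^[A-Za-z0-9-]+/[A-Za-z0-9_.-]+$")
BARE_DOMAIN = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(/\S*)?$")


def detect_source_type(source: str) -> str:
    is_url = source.startswith(("http://", "https://"))
    # 1. File extension (for URLs, only the path part counts).
    ext = os.path.splitext(urlparse(source).path if is_url else source)[1].lower()
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext]
    # 2. Video URL, with or without a scheme.
    host = urlparse(source if is_url else "https://" + source).netloc.lower()
    if any(host == h or host.endswith("." + h) for h in VIDEO_HOSTS):
        return "video"
    # 3. Existing local directory -> codebase.
    if os.path.isdir(source):
        return "codebase"
    # 4. GitHub: github.com URL or owner/repo shorthand.
    if (is_url and host in ("github.com", "www.github.com")) or GITHUB_SHORTHAND.match(source):
        return "github"
    # 5. Any other http(s) URL -> web documentation.
    if is_url:
        return "web"
    # 6. Bare domain such as docs.example.com -> assume web documentation.
    if BARE_DOMAIN.match(source):
        return "web"
    raise ValueError(f"cannot detect source type for {source!r}")
```

Excluding dots from the owner part keeps an input like `docs.example.com/guide` from being misread as `owner/repo`, so it falls through to bare-domain inference instead.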
### MCP Tool Invocation

MCP Client (Claude Code/Cursor) → FastMCPServer (stdio/HTTP) with two invocation paths: **Path A** (scraping tools) uses `get_converter(type, config).run()` in-process via `_run_converter()` helper, **Path B** (packaging/config tools) uses direct Python imports (`get_adaptor()`, `sync_config()`). Both return TextContent → JSON-RPC.
### Enhancement Pipeline

`--enhance-level` decision flow with precise internal variable mapping: Level 0 sets `ai_mode=none` and skips all AI. Level >= 1 selects `ai_mode=api` (if any supported API key is set: Anthropic, Moonshot/Kimi, Gemini, OpenAI) or `ai_mode=local` (via `AgentClient` with a configurable agent: Claude Code, Kimi, Codex, Copilot, OpenCode, or custom); SKILL.md enhancement then happens post-build via `enhance_command`. Level >= 2 enables `enhance_config=True` and `enhance_architecture=True` inside `analyze_codebase()`. Level 3 adds `enhance_patterns=True` and `enhance_tests=True`.
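The level-to-flag mapping reads naturally as a pure function. The environment variable names beyond the widely used `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are assumptions, as is the `EnhancementPlan` type; the real code sets these flags on its own objects.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# MOONSHOT_API_KEY and GOOGLE_API_KEY are assumed names for the Kimi and Gemini keys.
API_KEY_VARS = ("ANTHROPIC_API_KEY", "MOONSHOT_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY")


@dataclass(frozen=True)
class EnhancementPlan:
    ai_mode: str  # "none", "api" or "local"
    enhance_skill_md: bool
    enhance_config: bool
    enhance_architecture: bool
    enhance_patterns: bool
    enhance_tests: bool


def plan_enhancement(level: int, env: Optional[Mapping[str, str]] = None) -> EnhancementPlan:
    if not 0 <= level <= 3:
        raise ValueError("--enhance-level must be 0-3")
    env = os.environ if env is None else env
    if level == 0:
        ai_mode = "none"  # skip all AI
    elif any(env.get(var) for var in API_KEY_VARS):
        ai_mode = "api"  # some provider key is available
    else:
        ai_mode = "local"  # fall back to a local agent via AgentClient
    return EnhancementPlan(
        ai_mode=ai_mode,
        enhance_skill_md=level >= 1,
        enhance_config=level >= 2,
        enhance_architecture=level >= 2,
        enhance_patterns=level >= 3,
        enhance_tests=level >= 3,
    )
```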
### Runtime Components

Component diagram with runtime dependencies. Key flows: `CLI Core` dispatches to `Scrapers` via `get_converter()` → `converter.run()` (in-process, no subprocess). `Scrapers` call `Codebase Analysis` via `analyze_codebase(enhance_level)`. `Codebase Analysis` uses `C3.x Classes` internally and `Enhancement` when level ≥ 2. `MCP Server` reaches `Scrapers` via `get_converter()` in-process and `Adaptors` via direct import. `Scrapers` optionally use `Browser Renderer (Playwright)` via `render_page()` when `--browser` flag is set for JavaScript SPA sites.
### Browser Rendering Flow

When `--browser` flag is set, `DocScraper.scrape_page()` delegates to `BrowserRenderer.render_page(url)` instead of `requests.get()`. The renderer auto-installs Chromium on first use, navigates with `wait_until='networkidle'` to let JavaScript execute, then returns the fully-rendered HTML. The rest of the pipeline (BeautifulSoup → `extract_content()` → `save_page()`) remains unchanged. Optional dependency: `pip install "skill-seekers[browser]"`.
## File Locations

- **StarUML project**: `docs/UML/skill_seekers.mdj`
- **Diagram exports**: `docs/UML/exports/*.png`
- **Source code**: `src/skill_seekers/`