# CLI Architecture Refactor Proposal

## Fixing Issue #285 (Parser Sync) and Enabling Issue #268 (Preset System)

**Date:** 2026-02-14
**Status:** Proposal - Pending Review
**Related Issues:** #285, #268

---

## Executive Summary

This proposal outlines a unified architecture to:

1. **Fix Issue #285**: Parser definitions are out of sync with scraper modules
2. **Enable Issue #268**: Add a preset system to simplify the user experience

**Recommended Approach:** Pure Explicit (shared argument definitions)
**Estimated Effort:** 2-3 days
**Breaking Changes:** None (fully backward compatible)

---

## 1. Problem Analysis

### Issue #285: Parser Drift

Current state:

```
src/skill_seekers/cli/
├── doc_scraper.py          # 26 arguments defined here
├── github_scraper.py       # 15 arguments defined here
├── parsers/
│   ├── scrape_parser.py    # 12 arguments (OUT OF SYNC!)
│   └── github_parser.py    # 10 arguments (OUT OF SYNC!)
```

**Impact:** Users cannot use arguments like `--interactive`, `--url`, `--verbose` via the unified CLI.

**Root Cause:** Code duplication - the same arguments are defined in two places.

### Issue #268: Flag Complexity

The current `analyze` command has 10+ flags. Users are overwhelmed.

**Proposed Solution:** Preset system (`--preset quick|standard|comprehensive`)

---

## 2. Proposed Architecture: Pure Explicit

### Core Principle

Define arguments **once** in a shared location. Both the standalone scraper and the unified CLI parser import and use the same definition.

```
┌────────────────────────────────────────────────────────────┐
│                SHARED ARGUMENT DEFINITIONS                 │
│          (src/skill_seekers/cli/arguments/*.py)            │
├────────────────────────────────────────────────────────────┤
│  scrape.py   ← All 26 scrape arguments defined ONCE       │
│  github.py   ← All 15 github arguments defined ONCE       │
│  analyze.py  ← All analyze arguments + presets            │
│  common.py   ← Shared arguments (verbose, config, etc)    │
└────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐      ┌──────────────────────────┐
│   Standalone Scrapers   │      │   Unified CLI Parsers    │
├─────────────────────────┤      ├──────────────────────────┤
│ doc_scraper.py          │      │ parsers/scrape_parser.py │
│ github_scraper.py       │      │ parsers/github_parser.py │
│ codebase_scraper.py     │      │ parsers/analyze_parser.py│
└─────────────────────────┘      └──────────────────────────┘
```

### Why "Pure Explicit" Over "Hybrid"

| Approach | Description | Risk Level |
|----------|-------------|------------|
| **Pure Explicit** (Recommended) | Define arguments in shared functions, call from both sides | ✅ Low - Uses only public APIs |
| **Hybrid with Auto-Introspection** | Use `parser._actions` to copy arguments automatically | ⚠️ High - Uses internal APIs |
| **Quick Fix** | Just fix scrape_parser.py | 🔴 Tech debt - Problem repeats |

**Decision:** Use Pure Explicit. Slightly more code, but rock-solid maintainability.
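The pattern itself is plain argparse composition, using only public APIs. A minimal, self-contained sketch of the idea (the names here are illustrative, not the proposed API):

```python
import argparse


def add_shared_arguments(parser: argparse.ArgumentParser) -> None:
    """Single source of truth: every consumer calls this one function."""
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose output")


# The standalone entry point builds its own parser...
standalone = argparse.ArgumentParser(prog="doc_scraper")
add_shared_arguments(standalone)

# ...and the unified CLI attaches the same arguments to a subparser.
unified = argparse.ArgumentParser(prog="skill-seekers")
subparsers = unified.add_subparsers(dest="command")
add_shared_arguments(subparsers.add_parser("scrape"))

# Both now accept --verbose; neither can drift from the other.
assert standalone.parse_args(["-v"]).verbose
assert unified.parse_args(["scrape", "-v"]).verbose
```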
---

## 3. Implementation Details

### 3.1 New Directory Structure

```
src/skill_seekers/cli/
├── arguments/                # NEW: Shared argument definitions
│   ├── __init__.py
│   ├── common.py             # Shared args: --verbose, --config, etc.
│   ├── scrape.py             # All scrape command arguments
│   ├── github.py             # All github command arguments
│   ├── analyze.py            # All analyze arguments + preset support
│   └── pdf.py                # PDF arguments
│
├── presets/                  # NEW: Preset system (Issue #268)
│   ├── __init__.py
│   ├── base.py               # Preset base class
│   └── analyze_presets.py    # Analyze-specific presets
│
├── parsers/                  # EXISTING: Modified to use shared args
│   ├── __init__.py
│   ├── base.py
│   ├── scrape_parser.py      # Now imports from arguments/
│   ├── github_parser.py      # Now imports from arguments/
│   ├── analyze_parser.py     # Adds --preset support
│   └── ...
│
└── scrapers/                 # EXISTING: Modified to use shared args
    ├── doc_scraper.py        # Now imports from arguments/
    ├── github_scraper.py     # Now imports from arguments/
    └── codebase_scraper.py   # Now imports from arguments/
```
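Only `arguments/scrape.py` is shown in full in 3.2 below. As a sketch of the smaller `arguments/common.py`, assuming `--verbose` and `--config` are the shared flags (the exact argument set is an open detail of the implementation):

```python
"""Shared argument definitions used by multiple commands (sketch)."""

import argparse


def add_common_arguments(parser: argparse.ArgumentParser) -> None:
    """Add flags that several commands share.

    Note: a command whose own argument module already defines these
    flags (e.g. scrape.py below) must not also call this function,
    or argparse will raise a conflicting-option error.
    """
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--config", "-c",
        type=str,
        help="Load configuration from JSON file"
    )
```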
### 3.2 Shared Argument Definitions

**File: `src/skill_seekers/cli/arguments/scrape.py`**

```python
"""Shared argument definitions for scrape command.

This module defines ALL arguments for the scrape command in ONE place.
Both doc_scraper.py and parsers/scrape_parser.py use these definitions.
"""

import argparse


def add_scrape_arguments(parser: argparse.ArgumentParser) -> None:
    """Add all scrape command arguments to a parser.

    This is the SINGLE SOURCE OF TRUTH for scrape arguments.

    Used by:
    - doc_scraper.py (standalone scraper)
    - parsers/scrape_parser.py (unified CLI)
    """
    # Positional argument
    parser.add_argument(
        "url",
        nargs="?",
        help="Documentation URL (positional argument)"
    )

    # Core options
    parser.add_argument(
        "--url",
        type=str,
        help="Base documentation URL (alternative to positional)"
    )
    parser.add_argument(
        "--interactive", "-i",
        action="store_true",
        help="Interactive configuration mode"
    )
    parser.add_argument(
        "--config", "-c",
        type=str,
        help="Load configuration from JSON file"
    )
    parser.add_argument(
        "--name",
        type=str,
        help="Skill name"
    )
    parser.add_argument(
        "--description", "-d",
        type=str,
        help="Skill description"
    )

    # Scraping options
    parser.add_argument(
        "--max-pages",
        type=int,
        dest="max_pages",
        metavar="N",
        help="Maximum pages to scrape (overrides config)"
    )
    parser.add_argument(
        "--rate-limit", "-r",
        type=float,
        metavar="SECONDS",
        help="Override rate limit in seconds"
    )
    parser.add_argument(
        "--workers", "-w",
        type=int,
        metavar="N",
        help="Number of parallel workers (default: 1, max: 10)"
    )
    parser.add_argument(
        "--async",
        dest="async_mode",
        action="store_true",
        help="Enable async mode for better performance"
    )
    parser.add_argument(
        "--no-rate-limit",
        action="store_true",
        help="Disable rate limiting"
    )

    # Control options
    parser.add_argument(
        "--skip-scrape",
        action="store_true",
        help="Skip scraping, use existing data"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Preview what will be scraped without scraping"
    )
    parser.add_argument(
        "--resume",
        action="store_true",
        help="Resume from last checkpoint"
    )
    parser.add_argument(
        "--fresh",
        action="store_true",
        help="Clear checkpoint and start fresh"
    )

    # Enhancement options
    parser.add_argument(
        "--enhance",
        action="store_true",
        help="Enhance SKILL.md using Claude API (requires API key)"
    )
    parser.add_argument(
        "--enhance-local",
        action="store_true",
        help="Enhance using Claude Code (no API key needed)"
    )
    parser.add_argument(
        "--interactive-enhancement",
        action="store_true",
        help="Open terminal for enhancement (with --enhance-local)"
    )
    parser.add_argument(
        "--api-key",
        type=str,
        help="Anthropic API key (or set ANTHROPIC_API_KEY)"
    )

    # Output options
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--quiet", "-q",
        action="store_true",
        help="Minimize output"
    )

    # RAG chunking options
    parser.add_argument(
        "--chunk-for-rag",
        action="store_true",
        help="Enable semantic chunking for RAG"
    )
    parser.add_argument(
        "--chunk-size",
        type=int,
        default=512,
        metavar="TOKENS",
        help="Target chunk size in tokens (default: 512)"
    )
    parser.add_argument(
        "--chunk-overlap",
        type=int,
        default=50,
        metavar="TOKENS",
        help="Overlap between chunks (default: 50)"
    )
    parser.add_argument(
        "--no-preserve-code-blocks",
        action="store_true",
        help="Allow splitting code blocks"
    )
    parser.add_argument(
        "--no-preserve-paragraphs",
        action="store_true",
        help="Ignore paragraph boundaries"
    )
```

### 3.3 How Existing Files Change

**Before (doc_scraper.py):**

```python
def create_argument_parser():
    parser = argparse.ArgumentParser(...)
    parser.add_argument("url", nargs="?", help="...")
    parser.add_argument("--interactive", "-i", action="store_true", help="...")
    # ... 24 more add_argument calls ...
    return parser
```

**After (doc_scraper.py):**

```python
from skill_seekers.cli.arguments.scrape import add_scrape_arguments

def create_argument_parser():
    parser = argparse.ArgumentParser(...)
    add_scrape_arguments(parser)  # ← Single function call
    return parser
```

**Before (parsers/scrape_parser.py):**

```python
class ScrapeParser(SubcommandParser):
    def add_arguments(self, parser):
        parser.add_argument("url", nargs="?", help="...")  # ← Duplicate!
        parser.add_argument("--config", help="...")        # ← Duplicate!
        # ... only 12 args, missing many!
```

**After (parsers/scrape_parser.py):**

```python
from skill_seekers.cli.arguments.scrape import add_scrape_arguments

class ScrapeParser(SubcommandParser):
    def add_arguments(self, parser):
        add_scrape_arguments(parser)  # ← Same function as doc_scraper!
```
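With both sides delegating to the same function, parity can be sanity-checked in a few lines. A sketch, assuming the refactored modules above are in place:

```python
import argparse

from skill_seekers.cli.doc_scraper import create_argument_parser
from skill_seekers.cli.parsers.scrape_parser import ScrapeParser

standalone = create_argument_parser()

unified = argparse.ArgumentParser()
ScrapeParser().add_arguments(unified)

# Flags that Issue #285 reported as broken now parse via the unified path.
args = unified.parse_args(["--interactive", "--verbose"])
assert args.interactive and args.verbose
```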
### 3.4 Preset System (Issue #268)

**File: `src/skill_seekers/cli/presets/analyze_presets.py`**

```python
"""Preset definitions for analyze command."""

from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AnalysisPreset:
    """Definition of an analysis preset."""

    name: str
    description: str
    depth: str  # "surface", "deep", "full"
    features: Dict[str, bool]
    enhance_level: int
    estimated_time: str


# Preset definitions
PRESETS = {
    "quick": AnalysisPreset(
        name="Quick",
        description="Fast basic analysis (~1-2 min)",
        depth="surface",
        features={
            "api_reference": True,
            "dependency_graph": False,
            "patterns": False,
            "test_examples": False,
            "how_to_guides": False,
            "config_patterns": False,
        },
        enhance_level=0,
        estimated_time="1-2 minutes"
    ),
    "standard": AnalysisPreset(
        name="Standard",
        description="Balanced analysis with core features (~5-10 min)",
        depth="deep",
        features={
            "api_reference": True,
            "dependency_graph": True,
            "patterns": True,
            "test_examples": True,
            "how_to_guides": False,
            "config_patterns": True,
        },
        enhance_level=0,
        estimated_time="5-10 minutes"
    ),
    "comprehensive": AnalysisPreset(
        name="Comprehensive",
        description="Full analysis with AI enhancement (~20-60 min)",
        depth="full",
        features={
            "api_reference": True,
            "dependency_graph": True,
            "patterns": True,
            "test_examples": True,
            "how_to_guides": True,
            "config_patterns": True,
        },
        enhance_level=1,
        estimated_time="20-60 minutes"
    ),
}


def apply_preset(args, preset_name: str) -> None:
    """Apply a preset to args namespace."""
    preset = PRESETS[preset_name]
    args.depth = preset.depth
    args.enhance_level = preset.enhance_level
    for feature, enabled in preset.features.items():
        setattr(args, f"skip_{feature}", not enabled)
```

**Usage in analyze_parser.py:**

```python
from skill_seekers.cli.arguments.analyze import add_analyze_arguments
from skill_seekers.cli.presets.analyze_presets import PRESETS


class AnalyzeParser(SubcommandParser):
    def add_arguments(self, parser):
        # Add all base arguments
        add_analyze_arguments(parser)

        # Add preset argument
        parser.add_argument(
            "--preset",
            choices=list(PRESETS.keys()),
            help=f"Analysis preset ({', '.join(PRESETS.keys())})"
        )
```
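The snippet above wires `--preset` into the parser but leaves resolution to the command handler. A sketch of that step (the helper name `resolve_preset` is hypothetical):

```python
from argparse import Namespace

from skill_seekers.cli.presets.analyze_presets import apply_preset


def resolve_preset(args: Namespace) -> Namespace:
    """Expand --preset into concrete flags before analysis runs."""
    if getattr(args, "preset", None):
        apply_preset(args, args.preset)
    return args
```

One detail the implementation must settle is precedence: if a user passes both `--preset quick` and an explicit `--depth full`, the handler has to decide which wins. Detecting "explicitly set" usually means giving the overridable arguments a default of `None` and applying the preset only to attributes that are still `None`.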
---

## 4. Testing Strategy

### 4.1 Parser Sync Test (Prevents Regression)

**File: `tests/test_parser_sync.py`**

```python
"""Test that parsers stay in sync with scraper modules."""

import argparse

import pytest


class TestScrapeParserSync:
    """Ensure scrape_parser has all arguments from doc_scraper."""

    def test_scrape_arguments_in_sync(self):
        """Verify unified CLI parser has all doc_scraper arguments."""
        from skill_seekers.cli.doc_scraper import create_argument_parser
        from skill_seekers.cli.parsers.scrape_parser import ScrapeParser

        # Get source arguments from doc_scraper
        source_parser = create_argument_parser()
        source_dests = {a.dest for a in source_parser._actions}

        # Get target arguments from unified CLI parser
        target_parser = argparse.ArgumentParser()
        ScrapeParser().add_arguments(target_parser)
        target_dests = {a.dest for a in target_parser._actions}

        # Check for missing arguments
        missing = source_dests - target_dests
        assert not missing, f"scrape_parser missing arguments: {missing}"


class TestGitHubParserSync:
    """Ensure github_parser has all arguments from github_scraper."""

    def test_github_arguments_in_sync(self):
        """Verify unified CLI parser has all github_scraper arguments."""
        from skill_seekers.cli.github_scraper import create_argument_parser
        from skill_seekers.cli.parsers.github_parser import GitHubParser

        source_parser = create_argument_parser()
        source_dests = {a.dest for a in source_parser._actions}

        target_parser = argparse.ArgumentParser()
        GitHubParser().add_arguments(target_parser)
        target_dests = {a.dest for a in target_parser._actions}

        missing = source_dests - target_dests
        assert not missing, f"github_parser missing arguments: {missing}"
```

### 4.2 Preset System Tests

```python
"""Test preset system functionality."""

import pytest

from skill_seekers.cli.presets.analyze_presets import (
    PRESETS,
    apply_preset,
    AnalysisPreset,
)


class TestAnalyzePresets:
    """Test analyze preset definitions."""

    def test_all_presets_have_required_fields(self):
        """Verify all presets have required attributes."""
        required_fields = ['name', 'description', 'depth', 'features',
                           'enhance_level', 'estimated_time']
        for preset_name, preset in PRESETS.items():
            for field in required_fields:
                assert hasattr(preset, field), \
                    f"Preset '{preset_name}' missing field '{field}'"

    def test_preset_quick_has_minimal_features(self):
        """Verify quick preset disables most features."""
        preset = PRESETS['quick']
        assert preset.depth == 'surface'
        assert preset.enhance_level == 0
        assert preset.features['dependency_graph'] is False
        assert preset.features['patterns'] is False

    def test_preset_comprehensive_has_all_features(self):
        """Verify comprehensive preset enables all features."""
        preset = PRESETS['comprehensive']
        assert preset.depth == 'full'
        assert preset.enhance_level == 1
        assert all(preset.features.values()), \
            "Comprehensive preset should enable all features"

    def test_apply_preset_modifies_args(self):
        """Verify apply_preset correctly modifies args."""
        from argparse import Namespace

        args = Namespace()
        apply_preset(args, 'quick')

        assert args.depth == 'surface'
        assert args.enhance_level == 0
        assert args.skip_dependency_graph is True
```
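The sync tests in 4.1 only catch arguments missing from the unified CLI. A symmetric check in the same style (a sketch; the class name is illustrative) would also flag arguments that exist only on the unified side, so drift becomes impossible in either direction:

```python
class TestScrapeParserSyncReverse:
    """Ensure scrape_parser adds nothing doc_scraper lacks."""

    def test_scrape_arguments_no_extras(self):
        """Verify unified CLI parser has no extra arguments."""
        import argparse

        from skill_seekers.cli.doc_scraper import create_argument_parser
        from skill_seekers.cli.parsers.scrape_parser import ScrapeParser

        source_dests = {a.dest for a in create_argument_parser()._actions}

        target_parser = argparse.ArgumentParser()
        ScrapeParser().add_arguments(target_parser)
        target_dests = {a.dest for a in target_parser._actions}

        extra = target_dests - source_dests
        assert not extra, f"scrape_parser has extra arguments: {extra}"
```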
---

## 5. Migration Plan

### Phase 1: Foundation (Day 1)

1. **Create `arguments/` module**
   - `arguments/__init__.py`
   - `arguments/common.py` - shared arguments
   - `arguments/scrape.py` - all 26 scrape arguments
2. **Update `doc_scraper.py`**
   - Replace inline argument definitions with an import from `arguments/scrape.py`
   - Test: `python -m skill_seekers.cli.doc_scraper --help` works
3. **Update `parsers/scrape_parser.py`**
   - Replace inline definitions with an import from `arguments/scrape.py`
   - Test: `skill-seekers scrape --help` shows all 26 arguments

### Phase 2: Extend to Other Commands (Day 2)

1. **Create `arguments/github.py`**
2. **Update `github_scraper.py` and `parsers/github_parser.py`**
3. **Repeat for `pdf`, `analyze`, `unified` commands**
4. **Add parser sync tests** (`tests/test_parser_sync.py`)

### Phase 3: Preset System (Day 2-3)

1. **Create `presets/` module**
   - `presets/__init__.py`
   - `presets/base.py`
   - `presets/analyze_presets.py`
2. **Update `parsers/analyze_parser.py`**
   - Add `--preset` argument
   - Add preset resolution logic
3. **Update `codebase_scraper.py`**
   - Handle preset mapping in main()
4. **Add preset tests**

### Phase 4: Documentation & Cleanup (Day 3)

1. **Update docstrings**
2. **Update README.md** with preset examples
3. **Run full test suite**
4. **Verify backward compatibility**

---

## 6. Backward Compatibility

### Fully Maintained

| Aspect | Compatibility |
|--------|---------------|
| Command-line interface | ✅ 100% compatible - no removed arguments |
| JSON configs | ✅ No changes |
| Python API | ✅ No changes to public functions |
| Existing scripts | ✅ Continue to work |

### New Capabilities

| Feature | Availability |
|---------|--------------|
| `--interactive` flag | Now works in unified CLI |
| `--url` flag | Now works in unified CLI |
| `--preset quick` | New capability |
| All scrape args | Now available in unified CLI |

---

## 7. Benefits Summary

| Benefit | How Achieved |
|---------|--------------|
| **Fixes #285** | Single source of truth - parsers cannot drift |
| **Enables #268** | Preset system built on a clean foundation |
| **Maintainable** | Explicit code, no magic, no internal APIs |
| **Testable** | Easy to verify sync with automated tests |
| **Extensible** | Easy to add new commands or presets |
| **Type-safe** | Functions can be type-checked |
| **Documented** | Arguments defined once, documented once |

---

## 8. Trade-offs

| Aspect | Trade-off |
|--------|-----------|
| **Lines of code** | ~200 more lines than the hybrid approach (acceptable) |
| **Import overhead** | One extra import per module (negligible) |
| **Refactoring effort** | 2-3 days vs. 2 hours for the quick fix (worth it) |

---

## 9. Decision Required

Please review this proposal and indicate:

1. **✅ Approve** - Start implementation of the Pure Explicit approach
2. **🔄 Modify** - Request changes to the approach
3. **❌ Reject** - Choose an alternative (Hybrid or Quick Fix)

**Questions to consider:**

- Does this architecture meet your long-term maintainability goals?
- Is the 2-3 day timeline acceptable?
- Should we include any additional commands in the refactor?

---

## Appendix A: Alternative Approaches Considered

### A.1 Quick Fix (Rejected)

Just fix `scrape_parser.py` to match `doc_scraper.py`.

**Why rejected:** The problem will recur; there is no systematic solution.

### A.2 Hybrid with Auto-Introspection (Rejected)

Use `parser._actions` to copy arguments automatically.

**Why rejected:** Uses internal argparse APIs (`_actions`). Fragile.

```python
# FRAGILE - Uses internal API
for action in source_parser._actions:
    if action.dest not in common_dests:
        # How to clone? _clone_argument doesn't exist!
        ...
```
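To make the fragility concrete, here is roughly what the clone loop would have to do (a deliberately naive sketch, not working code):

```python
import argparse


def naive_clone(source: argparse.ArgumentParser,
                target: argparse.ArgumentParser,
                skip: set) -> None:
    """Copy arguments via the undocumented _actions list. Fragile."""
    for action in source._actions:
        if action.dest in skip or not action.option_strings:
            continue
        # Already broken: a store_true flag cloned this way becomes a
        # plain store option that demands a value. Recovering nargs,
        # type, const, choices, and the Action subclass from `action`
        # has no public API - which is why this approach was rejected.
        target.add_argument(*action.option_strings,
                            dest=action.dest, help=action.help)
```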
### A.3 Click Framework (Rejected)

Migrate the entire CLI to Click.

**Why rejected:** Major refactor, breaking changes, too much effort.

---

## Appendix B: Example User Experience

### After Fix (Issue #285)

```bash
# Before: ERROR
$ skill-seekers scrape --interactive
error: unrecognized arguments: --interactive

# After: WORKS
$ skill-seekers scrape --interactive
? Enter documentation URL: https://react.dev
? Skill name: react
...
```

### With Presets (Issue #268)

```bash
# Before: Complex flags
$ skill-seekers analyze --directory . --depth full \
    --skip-patterns --skip-test-examples ...

# After: Simple preset
$ skill-seekers analyze --directory . --preset comprehensive

🚀 Comprehensive analysis mode: all features + AI enhancement (~20-60 min)
```

---

*End of Proposal*