Files
skill-seekers-reference/CLI_REFACTOR_PROPOSAL.md
yusyus ba1670a220 feat: Unified create command + consolidated enhancement flags
This commit includes two major improvements:

## 1. Unified Create Command (v3.0.0 feature)
- Auto-detects source type (web, GitHub, local, PDF, config)
- Three-tier argument organization (universal, source-specific, advanced)
- Routes to existing scrapers (100% backward compatible)
- Progressive disclosure: 15 universal flags in default help

**New files:**
- src/skill_seekers/cli/source_detector.py - Auto-detection logic
- src/skill_seekers/cli/arguments/create.py - Argument definitions
- src/skill_seekers/cli/create_command.py - Main orchestrator
- src/skill_seekers/cli/parsers/create_parser.py - Parser integration

**Tests:**
- tests/test_source_detector.py (35 tests)
- tests/test_create_arguments.py (30 tests)
- tests/test_create_integration_basic.py (10 tests)

## 2. Enhanced Flag Consolidation (Phase 1)
- Consolidated 3 flags (--enhance, --enhance-local, --enhance-level) → 1 flag
- --enhance-level 0-3 with auto-detection of API vs LOCAL mode
- Default: --enhance-level 2 (balanced enhancement)

**Modified files:**
- arguments/{common,create,scrape,github,analyze}.py - Added enhance_level
- {doc_scraper,github_scraper,config_extractor,main}.py - Updated logic
- create_command.py - Uses consolidated flag

**Auto-detection:**
- If ANTHROPIC_API_KEY set → API mode
- Else → LOCAL mode (Claude Code)

## 3. PresetManager Bug Fix
- Fixed module naming conflict (presets.py vs presets/ directory)
- Moved presets.py → presets/manager.py
- Updated __init__.py exports

**Test Results:**
- All 160+ tests passing
- Zero regressions
- 100% backward compatible

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:29:19 +03:00

22 KiB

CLI Architecture Refactor Proposal

Fixing Issue #285 (Parser Sync) and Enabling Issue #268 (Preset System)

Date: 2026-02-14
Status: Proposal - Pending Review
Related Issues: #285, #268


Executive Summary

This proposal outlines a unified architecture to:

  1. Fix Issue #285: Parser definitions are out of sync with scraper modules
  2. Enable Issue #268: Add a preset system to simplify user experience

Recommended Approach: Pure Explicit (shared argument definitions)
Estimated Effort: 2-3 days
Breaking Changes: None (fully backward compatible)


1. Problem Analysis

Issue #285: Parser Drift

Current state:

src/skill_seekers/cli/
├── doc_scraper.py          # 26 arguments defined here
├── github_scraper.py       # 15 arguments defined here
├── parsers/
│   ├── scrape_parser.py    # 12 arguments (OUT OF SYNC!)
│   ├── github_parser.py    # 10 arguments (OUT OF SYNC!)

Impact: Users cannot use arguments like --interactive, --url, --verbose via the unified CLI.

Root Cause: Code duplication - same arguments defined in two places.

Issue #268: Flag Complexity

Current analyze command has 10+ flags. Users are overwhelmed.

Proposed Solution: Preset system (--preset quick|standard|comprehensive)


2. Proposed Architecture: Pure Explicit

Core Principle

Define arguments once in a shared location. Both the standalone scraper and unified CLI parser import and use the same definition.

┌─────────────────────────────────────────────────────────────┐
│              SHARED ARGUMENT DEFINITIONS                     │
│         (src/skill_seekers/cli/arguments/*.py)              │
├─────────────────────────────────────────────────────────────┤
│  scrape.py      ← All 26 scrape arguments defined ONCE      │
│  github.py      ← All 15 github arguments defined ONCE      │
│  analyze.py     ← All analyze arguments + presets           │
│  common.py      ← Shared arguments (verbose, config, etc)   │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐      ┌─────────────────────────┐
│   Standalone Scrapers   │      │   Unified CLI Parsers   │
├─────────────────────────┤      ├─────────────────────────┤
│ doc_scraper.py          │      │ parsers/scrape_parser.py│
│ github_scraper.py       │      │ parsers/github_parser.py│
│ codebase_scraper.py     │      │ parsers/analyze_parser.py│
└─────────────────────────┘      └─────────────────────────┘

Why "Pure Explicit" Over "Hybrid"

Approach Description Risk Level
Pure Explicit (Recommended) Define arguments in shared functions, call from both sides Low - Uses only public APIs
Hybrid with Auto-Introspection Use parser._actions to copy arguments automatically ⚠️ High - Uses internal APIs
Quick Fix Just fix scrape_parser.py 🔴 Tech debt - Problem repeats

Decision: Use Pure Explicit. Slightly more code, but rock-solid maintainability.


3. Implementation Details

3.1 New Directory Structure

src/skill_seekers/cli/
├── arguments/                    # NEW: Shared argument definitions
│   ├── __init__.py
│   ├── common.py                # Shared args: --verbose, --config, etc.
│   ├── scrape.py                # All scrape command arguments
│   ├── github.py                # All github command arguments
│   ├── analyze.py               # All analyze arguments + preset support
│   └── pdf.py                   # PDF arguments
│
├── presets/                     # NEW: Preset system (Issue #268)
│   ├── __init__.py
│   ├── base.py                  # Preset base class
│   └── analyze_presets.py       # Analyze-specific presets
│
├── parsers/                     # EXISTING: Modified to use shared args
│   ├── __init__.py
│   ├── base.py
│   ├── scrape_parser.py         # Now imports from arguments/
│   ├── github_parser.py         # Now imports from arguments/
│   ├── analyze_parser.py        # Adds --preset support
│   └── ...
│
└── scrapers/                    # EXISTING: Modified to use shared args
    ├── doc_scraper.py           # Now imports from arguments/
    ├── github_scraper.py        # Now imports from arguments/
    └── codebase_scraper.py      # Now imports from arguments/

3.2 Shared Argument Definitions

File: src/skill_seekers/cli/arguments/scrape.py

"""Shared argument definitions for scrape command.

This module defines ALL arguments for the scrape command in ONE place.
Both doc_scraper.py and parsers/scrape_parser.py use these definitions.
"""

import argparse


def add_scrape_arguments(parser: argparse.ArgumentParser) -> None:
    """Add all scrape command arguments to a parser.
    
    This is the SINGLE SOURCE OF TRUTH for scrape arguments.
    Used by:
    - doc_scraper.py (standalone scraper)
    - parsers/scrape_parser.py (unified CLI)
    """
    # Positional argument
    parser.add_argument(
        "url",
        nargs="?",
        help="Documentation URL (positional argument)"
    )
    
    # Core options
    parser.add_argument(
        "--url",
        type=str,
        help="Base documentation URL (alternative to positional)"
    )
    parser.add_argument(
        "--interactive", "-i",
        action="store_true",
        help="Interactive configuration mode"
    )
    parser.add_argument(
        "--config", "-c",
        type=str,
        help="Load configuration from JSON file"
    )
    parser.add_argument(
        "--name",
        type=str,
        help="Skill name"
    )
    parser.add_argument(
        "--description", "-d",
        type=str,
        help="Skill description"
    )
    
    # Scraping options
    parser.add_argument(
        "--max-pages",
        type=int,
        dest="max_pages",
        metavar="N",
        help="Maximum pages to scrape (overrides config)"
    )
    parser.add_argument(
        "--rate-limit", "-r",
        type=float,
        metavar="SECONDS",
        help="Override rate limit in seconds"
    )
    parser.add_argument(
        "--workers", "-w",
        type=int,
        metavar="N",
        help="Number of parallel workers (default: 1, max: 10)"
    )
    parser.add_argument(
        "--async",
        dest="async_mode",
        action="store_true",
        help="Enable async mode for better performance"
    )
    parser.add_argument(
        "--no-rate-limit",
        action="store_true",
        help="Disable rate limiting"
    )
    
    # Control options
    parser.add_argument(
        "--skip-scrape",
        action="store_true",
        help="Skip scraping, use existing data"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Preview what will be scraped without scraping"
    )
    parser.add_argument(
        "--resume",
        action="store_true",
        help="Resume from last checkpoint"
    )
    parser.add_argument(
        "--fresh",
        action="store_true",
        help="Clear checkpoint and start fresh"
    )
    
    # Enhancement options
    parser.add_argument(
        "--enhance",
        action="store_true",
        help="Enhance SKILL.md using Claude API (requires API key)"
    )
    parser.add_argument(
        "--enhance-local",
        action="store_true",
        help="Enhance using Claude Code (no API key needed)"
    )
    parser.add_argument(
        "--interactive-enhancement",
        action="store_true",
        help="Open terminal for enhancement (with --enhance-local)"
    )
    parser.add_argument(
        "--api-key",
        type=str,
        help="Anthropic API key (or set ANTHROPIC_API_KEY)"
    )
    
    # Output options
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--quiet", "-q",
        action="store_true",
        help="Minimize output"
    )
    
    # RAG chunking options
    parser.add_argument(
        "--chunk-for-rag",
        action="store_true",
        help="Enable semantic chunking for RAG"
    )
    parser.add_argument(
        "--chunk-size",
        type=int,
        default=512,
        metavar="TOKENS",
        help="Target chunk size in tokens (default: 512)"
    )
    parser.add_argument(
        "--chunk-overlap",
        type=int,
        default=50,
        metavar="TOKENS",
        help="Overlap between chunks (default: 50)"
    )
    parser.add_argument(
        "--no-preserve-code-blocks",
        action="store_true",
        help="Allow splitting code blocks"
    )
    parser.add_argument(
        "--no-preserve-paragraphs",
        action="store_true",
        help="Ignore paragraph boundaries"
    )

3.3 How Existing Files Change

Before (doc_scraper.py):

def create_argument_parser():
    parser = argparse.ArgumentParser(...)
    parser.add_argument("url", nargs="?", help="...")
    parser.add_argument("--interactive", "-i", action="store_true", help="...")
    # ... 24 more add_argument calls ...
    return parser

After (doc_scraper.py):

from skill_seekers.cli.arguments.scrape import add_scrape_arguments

def create_argument_parser():
    parser = argparse.ArgumentParser(...)
    add_scrape_arguments(parser)  # ← Single function call
    return parser

Before (parsers/scrape_parser.py):

class ScrapeParser(SubcommandParser):
    def add_arguments(self, parser):
        parser.add_argument("url", nargs="?", help="...")  # ← Duplicate!
        parser.add_argument("--config", help="...")         # ← Duplicate!
        # ... only 12 args, missing many!

After (parsers/scrape_parser.py):

from skill_seekers.cli.arguments.scrape import add_scrape_arguments

class ScrapeParser(SubcommandParser):
    def add_arguments(self, parser):
        add_scrape_arguments(parser)  # ← Same function as doc_scraper!

3.4 Preset System (Issue #268)

File: src/skill_seekers/cli/presets/analyze_presets.py

"""Preset definitions for analyze command."""

from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AnalysisPreset:
    """Definition of an analysis preset."""
    name: str
    description: str
    depth: str  # "surface", "deep", "full"
    features: Dict[str, bool]
    enhance_level: int
    estimated_time: str


# Preset definitions
PRESETS = {
    "quick": AnalysisPreset(
        name="Quick",
        description="Fast basic analysis (~1-2 min)",
        depth="surface",
        features={
            "api_reference": True,
            "dependency_graph": False,
            "patterns": False,
            "test_examples": False,
            "how_to_guides": False,
            "config_patterns": False,
        },
        enhance_level=0,
        estimated_time="1-2 minutes"
    ),
    
    "standard": AnalysisPreset(
        name="Standard",
        description="Balanced analysis with core features (~5-10 min)",
        depth="deep",
        features={
            "api_reference": True,
            "dependency_graph": True,
            "patterns": True,
            "test_examples": True,
            "how_to_guides": False,
            "config_patterns": True,
        },
        enhance_level=0,
        estimated_time="5-10 minutes"
    ),
    
    "comprehensive": AnalysisPreset(
        name="Comprehensive",
        description="Full analysis with AI enhancement (~20-60 min)",
        depth="full",
        features={
            "api_reference": True,
            "dependency_graph": True,
            "patterns": True,
            "test_examples": True,
            "how_to_guides": True,
            "config_patterns": True,
        },
        enhance_level=1,
        estimated_time="20-60 minutes"
    ),
}


def apply_preset(args, preset_name: str) -> None:
    """Apply a preset to args namespace."""
    preset = PRESETS[preset_name]
    args.depth = preset.depth
    args.enhance_level = preset.enhance_level
    
    for feature, enabled in preset.features.items():
        setattr(args, f"skip_{feature}", not enabled)

Usage in analyze_parser.py:

from skill_seekers.cli.arguments.analyze import add_analyze_arguments
from skill_seekers.cli.presets.analyze_presets import PRESETS

class AnalyzeParser(SubcommandParser):
    def add_arguments(self, parser):
        # Add all base arguments
        add_analyze_arguments(parser)
        
        # Add preset argument
        parser.add_argument(
            "--preset",
            choices=list(PRESETS.keys()),
            help=f"Analysis preset ({', '.join(PRESETS.keys())})"
        )

4. Testing Strategy

4.1 Parser Sync Test (Prevents Regression)

File: tests/test_parser_sync.py

"""Test that parsers stay in sync with scraper modules."""

import argparse
import pytest


class TestScrapeParserSync:
    """Ensure scrape_parser has all arguments from doc_scraper."""
    
    def test_scrape_arguments_in_sync(self):
        """Verify unified CLI parser has all doc_scraper arguments."""
        from skill_seekers.cli.doc_scraper import create_argument_parser
        from skill_seekers.cli.parsers.scrape_parser import ScrapeParser
        
        # Get source arguments from doc_scraper
        source_parser = create_argument_parser()
        source_dests = {a.dest for a in source_parser._actions}
        
        # Get target arguments from unified CLI parser
        target_parser = argparse.ArgumentParser()
        ScrapeParser().add_arguments(target_parser)
        target_dests = {a.dest for a in target_parser._actions}
        
        # Check for missing arguments
        missing = source_dests - target_dests
        assert not missing, f"scrape_parser missing arguments: {missing}"


class TestGitHubParserSync:
    """Ensure github_parser has all arguments from github_scraper."""
    
    def test_github_arguments_in_sync(self):
        """Verify unified CLI parser has all github_scraper arguments."""
        from skill_seekers.cli.github_scraper import create_argument_parser
        from skill_seekers.cli.parsers.github_parser import GitHubParser
        
        source_parser = create_argument_parser()
        source_dests = {a.dest for a in source_parser._actions}
        
        target_parser = argparse.ArgumentParser()
        GitHubParser().add_arguments(target_parser)
        target_dests = {a.dest for a in target_parser._actions}
        
        missing = source_dests - target_dests
        assert not missing, f"github_parser missing arguments: {missing}"

4.2 Preset System Tests

"""Test preset system functionality."""

import pytest
from skill_seekers.cli.presets.analyze_presets import (
    PRESETS, 
    apply_preset,
    AnalysisPreset
)


class TestAnalyzePresets:
    """Test analyze preset definitions."""
    
    def test_all_presets_have_required_fields(self):
        """Verify all presets have required attributes."""
        required_fields = ['name', 'description', 'depth', 'features', 
                          'enhance_level', 'estimated_time']
        
        for preset_name, preset in PRESETS.items():
            for field in required_fields:
                assert hasattr(preset, field), \
                    f"Preset '{preset_name}' missing field '{field}'"
    
    def test_preset_quick_has_minimal_features(self):
        """Verify quick preset disables most features."""
        preset = PRESETS['quick']
        
        assert preset.depth == 'surface'
        assert preset.enhance_level == 0
        assert preset.features['dependency_graph'] is False
        assert preset.features['patterns'] is False
    
    def test_preset_comprehensive_has_all_features(self):
        """Verify comprehensive preset enables all features."""
        preset = PRESETS['comprehensive']
        
        assert preset.depth == 'full'
        assert preset.enhance_level == 1
        assert all(preset.features.values()), \
            "Comprehensive preset should enable all features"
    
    def test_apply_preset_modifies_args(self):
        """Verify apply_preset correctly modifies args."""
        from argparse import Namespace
        
        args = Namespace()
        apply_preset(args, 'quick')
        
        assert args.depth == 'surface'
        assert args.enhance_level == 0
        assert args.skip_dependency_graph is True

5. Migration Plan

Phase 1: Foundation (Day 1)

  1. Create arguments/ module

    • arguments/__init__.py
    • arguments/common.py - shared arguments
    • arguments/scrape.py - all 26 scrape arguments
  2. Update doc_scraper.py

    • Replace inline argument definitions with import from arguments/scrape.py
    • Test: python -m skill_seekers.cli.doc_scraper --help works
  3. Update parsers/scrape_parser.py

    • Replace inline definitions with import from arguments/scrape.py
    • Test: skill-seekers scrape --help shows all 26 arguments

Phase 2: Extend to Other Commands (Day 2)

  1. Create arguments/github.py
  2. Update github_scraper.py and parsers/github_parser.py
  3. Repeat for pdf, analyze, unified commands
  4. Add parser sync tests (tests/test_parser_sync.py)

Phase 3: Preset System (Day 2-3)

  1. Create presets/ module

    • presets/__init__.py
    • presets/base.py
    • presets/analyze_presets.py
  2. Update parsers/analyze_parser.py

    • Add --preset argument
    • Add preset resolution logic
  3. Update codebase_scraper.py

    • Handle preset mapping in main()
  4. Add preset tests

Phase 4: Documentation & Cleanup (Day 3)

  1. Update docstrings
  2. Update README.md with preset examples
  3. Run full test suite
  4. Verify backward compatibility

6. Backward Compatibility

Fully Maintained

Aspect Compatibility
Command-line interface 100% compatible - no removed arguments
JSON configs No changes
Python API No changes to public functions
Existing scripts Continue to work

New Capabilities

Feature Availability
--interactive flag Now works in unified CLI
--url flag Now works in unified CLI
--preset quick New capability
All scrape args Now available in unified CLI

7. Benefits Summary

Benefit How Achieved
Fixes #285 Single source of truth - parsers cannot drift
Enables #268 Preset system built on clean foundation
Maintainable Explicit code, no magic, no internal APIs
Testable Easy to verify sync with automated tests
Extensible Easy to add new commands or presets
Type-safe Functions can be type-checked
Documented Arguments defined once, documented once

8. Trade-offs

Aspect Trade-off
Lines of code ~200 more lines than hybrid approach (acceptable)
Import overhead One extra import per module (negligible)
Refactoring effort 2-3 days vs 2 hours for quick fix (worth it)

9. Decision Required

Please review this proposal and indicate:

  1. Approve - Start implementation of Pure Explicit approach
  2. 🔄 Modify - Request changes to the approach
  3. Reject - Choose alternative (Hybrid or Quick Fix)

Questions to consider:

  • Does this architecture meet your long-term maintainability goals?
  • Is the 2-3 day timeline acceptable?
  • Should we include any additional commands in the refactor?

Appendix A: Alternative Approaches Considered

A.1 Quick Fix (Rejected)

Just fix scrape_parser.py to match doc_scraper.py.

Why rejected: Problem will recur. No systematic solution.

A.2 Hybrid with Auto-Introspection (Rejected)

Use parser._actions to copy arguments automatically.

Why rejected: Uses internal argparse APIs (_actions). Fragile.

# FRAGILE - Uses internal API
for action in source_parser._actions:
    if action.dest not in common_dests:
        # How to clone? _clone_argument doesn't exist!

A.3 Click Framework (Rejected)

Migrate entire CLI to Click.

Why rejected: Major refactor, breaking changes, too much effort.


Appendix B: Example User Experience

After Fix (Issue #285)

# Before: ERROR
$ skill-seekers scrape --interactive
error: unrecognized arguments: --interactive

# After: WORKS
$ skill-seekers scrape --interactive
? Enter documentation URL: https://react.dev
? Skill name: react
...

With Presets (Issue #268)

# Before: Complex flags
$ skill-seekers analyze --directory . --depth full \
    --skip-patterns --skip-test-examples ...

# After: Simple preset
$ skill-seekers analyze --directory . --preset comprehensive
🚀 Comprehensive analysis mode: all features + AI enhancement (~20-60 min)

End of Proposal