diff --git a/engineering/skill-tester/references/quality-scoring-rubric.md b/engineering/skill-tester/references/quality-scoring-rubric.md index 00ed0b6..e639227 100644 --- a/engineering/skill-tester/references/quality-scoring-rubric.md +++ b/engineering/skill-tester/references/quality-scoring-rubric.md @@ -1,12 +1,27 @@ # Quality Scoring Rubric **Version**: 2.0.0 -**Last Updated**: 2026-03-26 +**Last Updated**: 2026-03-27 **Authority**: Claude Skills Engineering Team ## Overview -This document defines the comprehensive quality scoring methodology used to assess skills within the claude-skills ecosystem. The scoring system evaluates five key dimensions, each weighted equally at 20%, to provide an objective and consistent measure of skill quality. +This document defines the comprehensive quality scoring methodology used to assess skills within the claude-skills ecosystem. The scoring system evaluates four key dimensions by default (each weighted at 25%), with an optional fifth Security dimension (enabled via `--include-security` flag). + +### Dimension Configuration + +**Default Mode (backward compatible)**: +- **Documentation Quality**: 25% +- **Code Quality**: 25% +- **Completeness**: 25% +- **Usability**: 25% + +**With `--include-security` flag**: +- **Documentation Quality**: 20% +- **Code Quality**: 20% +- **Completeness**: 20% +- **Security**: 20% +- **Usability**: 20% ## Scoring Framework @@ -23,13 +38,15 @@ This document defines the comprehensive quality scoring methodology used to asse - **D (50-54)**: Very poor quality, fundamental issues present - **F (0-49)**: Failing quality, does not meet basic standards -### Dimension Weights +### Dimension Weights (Default: 4 dimensions × 25%) + Each dimension contributes equally to the overall score: -- **Documentation Quality**: 20% -- **Code Quality**: 20% -- **Completeness**: 20% -- **Security**: 20% -- **Usability**: 20% +- **Documentation Quality**: 25% +- **Code Quality**: 25% +- **Completeness**: 25% +- **Usability**: 25% + +When `--include-security` is used, all five dimensions are weighted at 20% each. ## Documentation Quality (20% Weight) @@ -380,10 +397,19 @@ def assign_letter_grade(overall_score): - Create beginner-friendly tutorials - Add interactive examples -## Security (20% Weight) +## Security (Optional, 20% Weight when enabled) ### Overview -The Security dimension evaluates Python scripts for security vulnerabilities and best practices. This dimension is critical for ensuring that skills do not introduce security risks into the claude-skills ecosystem. +The Security dimension evaluates Python scripts for security vulnerabilities and best practices. This dimension is **optional** and only evaluated when the `--include-security` flag is passed to the quality scorer. + +**Important**: By default, the quality scorer uses 4 dimensions × 25% weights for backward compatibility. To include Security assessment, use: +```bash +python quality_scorer.py --include-security +``` + +When Security is enabled, all dimensions are rebalanced to 20% each (5 dimensions × 20% = 100%). + +This dimension is critical for ensuring that skills do not introduce security risks into the claude-skills ecosystem. ### Scoring Components @@ -490,10 +516,17 @@ eval(user_input) # VERY BAD ``` ### Security Score Impact on Tiers + +**Note**: Security requirements only apply when `--include-security` is used. + +When Security dimension is enabled: - **POWERFUL Tier**: Requires Security score ≥ 70 - **STANDARD Tier**: Requires Security score ≥ 50 - **BASIC Tier**: No minimum Security requirement +When Security dimension is not enabled (default): +- Tier recommendations are based on the 4 core dimensions (Documentation, Code Quality, Completeness, Usability) + #### Low Security Scores - Remove hardcoded credentials, use environment variables - Fix path traversal vulnerabilities diff --git a/engineering/skill-tester/references/tier-requirements-matrix.md b/engineering/skill-tester/references/tier-requirements-matrix.md index 871d776..0e756e7 100644 --- a/engineering/skill-tester/references/tier-requirements-matrix.md +++ b/engineering/skill-tester/references/tier-requirements-matrix.md @@ -1,13 +1,15 @@ # Tier Requirements Matrix **Version**: 2.0.0 -**Last Updated**: 2026-03-26 +**Last Updated**: 2026-03-27 **Authority**: Claude Skills Engineering Team ## Overview This document provides a comprehensive matrix of requirements for each skill tier within the claude-skills ecosystem. Skills are classified into three tiers based on complexity, functionality, and comprehensiveness: BASIC, STANDARD, and POWERFUL. +**Note**: Security dimension requirements are optional and only apply when `--include-security` flag is used. By default, tier recommendations are based on 4 core dimensions (Documentation, Code Quality, Completeness, Usability) at 25% weight each. + ## Tier Classification Philosophy ### BASIC Tier @@ -33,9 +35,9 @@ Advanced skills that provide comprehensive functionality with sophisticated impl | **Documentation Depth** | Functional | Comprehensive | Expert-level | | **Examples Provided** | ≥1 | ≥3 | ≥5 | | **Test Coverage** | Basic validation | Sample data testing | Comprehensive test suite | -| **Security Score** | ≥40 | ≥50 | ≥70 | -| **Hardcoded Secrets** | None | None | None | -| **Input Validation** | Basic | Comprehensive | Advanced with sanitization | +| **Security Score** *(opt-in)* | ≥40 | ≥50 | ≥70 | +| **Hardcoded Secrets** *(opt-in)* | None | None | None | +| **Input Validation** *(opt-in)* | Basic | Comprehensive | Advanced with sanitization | ## Detailed Requirements by Tier @@ -68,7 +70,9 @@ Advanced skills that provide comprehensive functionality with sophisticated impl - **Usability**: Clear usage instructions and examples - **Completeness**: All essential components present -#### Security Requirements +#### Security Requirements *(opt-in with --include-security)* +**Note**: These requirements only apply when the Security dimension is enabled via `--include-security` flag. + - **Security Score**: Minimum 40/100 - **Hardcoded Secrets**: No hardcoded passwords, API keys, or tokens - **Input Validation**: Basic validation for user inputs @@ -106,7 +110,9 @@ Advanced skills that provide comprehensive functionality with sophisticated impl - **Testing**: Sample data processing with validation - **Integration**: Consideration for CI/CD and automation use -#### Security Requirements +#### Security Requirements *(opt-in with --include-security)* +**Note**: These requirements only apply when the Security dimension is enabled via `--include-security` flag. + - **Security Score**: Minimum 50/100 - **Hardcoded Secrets**: No hardcoded credentials (zero tolerance) - **Input Validation**: Comprehensive validation with error messages @@ -146,7 +152,9 @@ Advanced skills that provide comprehensive functionality with sophisticated impl - **Integration**: Full CI/CD integration capabilities - **Maintainability**: Designed for long-term maintenance and extension -#### Security Requirements +#### Security Requirements *(opt-in with --include-security)* +**Note**: These requirements only apply when the Security dimension is enabled via `--include-security` flag. + - **Security Score**: Minimum 70/100 - **Hardcoded Secrets**: Zero tolerance for hardcoded credentials - **Input Validation**: Advanced validation with sanitization and type checking diff --git a/engineering/skill-tester/scripts/quality_scorer.py b/engineering/skill-tester/scripts/quality_scorer.py index 62acc60..14f876b 100644 --- a/engineering/skill-tester/scripts/quality_scorer.py +++ b/engineering/skill-tester/scripts/quality_scorer.py @@ -13,7 +13,9 @@ Author: Claude Skills Engineering Team Version: 2.0.0 Dependencies: Python Standard Library Only Changelog: - v2.0.0 - Added Security dimension (20% weight), rebalanced all dimensions to 20% + v2.0.0 - Added Security dimension (opt-in via --include-security flag) + Default: 4 dimensions × 25% (backward compatible) + With --include-security: 5 dimensions × 20% v1.0.0 - Initial release with 4 dimensions (25% each) """ @@ -156,21 +158,35 @@ class QualityReport: usability_score = self.dimensions.get("Usability", QualityDimension("", 0, "")).score security_score = self.dimensions.get("Security", QualityDimension("", 0, "")).score - # POWERFUL tier requirements (all dimensions must be strong) - if (self.overall_score >= 80 and - all(score >= 75 for score in [doc_score, code_score, completeness_score, usability_score]) and - security_score >= 70): - self.tier_recommendation = "POWERFUL" - - # STANDARD tier requirements (most dimensions good) - elif (self.overall_score >= 70 and - sum(1 for score in [doc_score, code_score, completeness_score, usability_score, security_score] if score >= 65) >= 4 and - security_score >= 50): - self.tier_recommendation = "STANDARD" - - # BASIC tier (minimum viable quality) + # Check if Security dimension is included + has_security = "Security" in self.dimensions + + # POWERFUL tier requirements + if has_security: + # With Security: all 5 dimensions must be strong + if (self.overall_score >= 80 and + all(score >= 75 for score in [doc_score, code_score, completeness_score, usability_score]) and + security_score >= 70): + self.tier_recommendation = "POWERFUL" + + # STANDARD tier requirements (with Security) + elif (self.overall_score >= 70 and + sum(1 for score in [doc_score, code_score, completeness_score, usability_score, security_score] if score >= 65) >= 4 and + security_score >= 50): + self.tier_recommendation = "STANDARD" else: - self.tier_recommendation = "BASIC" + # Without Security: 4 dimensions must be strong + if (self.overall_score >= 80 and + all(score >= 75 for score in [doc_score, code_score, completeness_score, usability_score])): + self.tier_recommendation = "POWERFUL" + + # STANDARD tier requirements (without Security) + elif (self.overall_score >= 70 and + sum(1 for score in [doc_score, code_score, completeness_score, usability_score] if score >= 65) >= 3): + self.tier_recommendation = "STANDARD" + + # BASIC tier (minimum viable quality) + # Falls through to BASIC if no other tier matched def _generate_improvement_roadmap(self): """Generate prioritized improvement suggestions""" @@ -209,10 +225,11 @@ class QualityReport: class QualityScorer: """Main quality scoring engine""" - def __init__(self, skill_path: str, detailed: bool = False, verbose: bool = False): + def __init__(self, skill_path: str, detailed: bool = False, verbose: bool = False, include_security: bool = False): self.skill_path = Path(skill_path).resolve() self.detailed = detailed self.verbose = verbose + self.include_security = include_security self.report = QualityReport(str(self.skill_path)) def log_verbose(self, message: str): @@ -229,12 +246,20 @@ class QualityScorer: if not self.skill_path.exists(): raise ValueError(f"Skill path does not exist: {self.skill_path}") - # Score each dimension (20% weight each, 5 dimensions total) - self._score_documentation() - self._score_code_quality() - self._score_completeness() - self._score_security() - self._score_usability() + # Score each dimension + # Default: 4 dimensions at 25% each (backward compatible) + # With --include-security: 5 dimensions at 20% each + weight = 0.20 if self.include_security else 0.25 + + self._score_documentation(weight) + self._score_code_quality(weight) + self._score_completeness(weight) + + if self.include_security: + self._score_security(0.20) + self._score_usability(0.20) + else: + self._score_usability(0.25) # Calculate overall metrics self.report.calculate_overall_score() @@ -247,11 +272,11 @@ class QualityScorer: return self.report - def _score_documentation(self): - """Score documentation quality (25% weight)""" + def _score_documentation(self, weight: float = 0.25): + """Score documentation quality""" self.log_verbose("Scoring documentation quality...") - dimension = QualityDimension("Documentation", 0.20, "Quality of documentation and written materials") + dimension = QualityDimension("Documentation", weight, "Quality of documentation and written materials") # Score SKILL.md self._score_skill_md(dimension) @@ -501,11 +526,11 @@ class QualityScorer: dimension.add_score("examples_availability", score, 25, f"Found {len(example_files)} example/sample files") - def _score_code_quality(self): - """Score code quality (25% weight)""" + def _score_code_quality(self, weight: float = 0.25): + """Score code quality""" self.log_verbose("Scoring code quality...") - dimension = QualityDimension("Code Quality", 0.20, "Quality of Python scripts and implementation") + dimension = QualityDimension("Code Quality", weight, "Quality of Python scripts and implementation") scripts_dir = self.skill_path / "scripts" if not scripts_dir.exists(): @@ -688,11 +713,11 @@ class QualityScorer: if avg_output_score < 15: dimension.add_suggestion("Add support for both JSON and human-readable output formats") - def _score_completeness(self): - """Score completeness (25% weight)""" + def _score_completeness(self, weight: float = 0.25): + """Score completeness""" self.log_verbose("Scoring completeness...") - dimension = QualityDimension("Completeness", 0.20, "Completeness of required components and assets") + dimension = QualityDimension("Completeness", weight, "Completeness of required components and assets") # Score directory structure self._score_directory_structure(dimension) @@ -810,11 +835,11 @@ class QualityScorer: if not test_files: dimension.add_suggestion("Add test files or validation scripts") - def _score_usability(self): - """Score usability (25% weight)""" + def _score_usability(self, weight: float = 0.25): + """Score usability""" self.log_verbose("Scoring usability...") - dimension = QualityDimension("Usability", 0.20, "Ease of use and user experience") + dimension = QualityDimension("Usability", weight, "Ease of use and user experience") # Score installation simplicity self._score_installation(dimension) @@ -946,11 +971,11 @@ class QualityScorer: dimension.add_score("practical_examples", score, 25, f"Practical examples: {len(example_files)} files") - def _score_security(self): - """Score security quality (20% weight)""" + def _score_security(self, weight: float = 0.20): + """Score security quality""" self.log_verbose("Scoring security quality...") - dimension = QualityDimension("Security", 0.20, "Security practices and vulnerability prevention") + dimension = QualityDimension("Security", weight, "Security practices and vulnerability prevention") # Find Python scripts python_files = list(self.skill_path.rglob("*.py")) @@ -1078,13 +1103,17 @@ Examples: python quality_scorer.py engineering/my-skill python quality_scorer.py engineering/my-skill --detailed --json python quality_scorer.py engineering/my-skill --minimum-score 75 + python quality_scorer.py engineering/my-skill --include-security -Quality Dimensions (each 25%): +Quality Dimensions (default: 4 dimensions × 25%): Documentation - SKILL.md quality, README, references, examples Code Quality - Script complexity, error handling, structure, output Completeness - Directory structure, assets, expected outputs, tests Usability - Installation simplicity, usage clarity, help accessibility +With --include-security (5 dimensions × 20%): + Security - Sensitive data exposure, command injection, input validation + Letter Grades: A+ (95+), A (90+), A- (85+), B+ (80+), B (75+), B- (70+), C+ (65+), C (60+), C- (55+), D (50+), F (<50) """ ) @@ -1104,12 +1133,15 @@ Letter Grades: A+ (95+), A (90+), A- (85+), B+ (80+), B (75+), B- (70+), C+ (65+ parser.add_argument("--verbose", action="store_true", help="Enable verbose logging") + parser.add_argument("--include-security", + action="store_true", + help="Include Security dimension (switches to 5 dimensions × 20%% each)") args = parser.parse_args() try: # Create scorer and assess quality - scorer = QualityScorer(args.skill_path, args.detailed, args.verbose) + scorer = QualityScorer(args.skill_path, args.detailed, args.verbose, args.include_security) report = scorer.assess_quality() # Format and output report