diff --git a/skills/vibe-code-auditor/SKILL.md b/skills/vibe-code-auditor/SKILL.md index d1e41c1f..ed7d0497 100644 --- a/skills/vibe-code-auditor/SKILL.md +++ b/skills/vibe-code-auditor/SKILL.md @@ -5,7 +5,7 @@ risk: safe source: original date_added: "2026-02-28" metadata: - version: 1.0.0 + version: 2.0.0 --- # Vibe Code Auditor @@ -39,6 +39,12 @@ Before beginning the audit, confirm the following. If any item is missing, state - **Scope defined**: Identify whether the input is a snippet, single file, or multi-file system. - **Context noted**: If no context was provided, state the assumptions made (e.g., "Assuming a web API backend with no specified scale requirements"). +**Quick Scan (first 60 seconds):** +- Count files and lines of code +- Identify language(s) and framework(s) +- Spot obvious red flags: hardcoded secrets, bare excepts, TODOs, commented-out code +- Note the entry point(s) and data flow direction + --- ## Audit Dimensions @@ -47,57 +53,128 @@ Evaluate the code across all seven dimensions below. For each finding, record: t **Do not invent findings. Do not report issues you cannot substantiate from the code provided.** +**Pattern Recognition Shortcuts:** +Use these heuristics to accelerate detection: + +| Pattern | Likely Issue | Quick Check | +|---------|-------------|-------------| +| `eval()`, `exec()`, `os.system()` | Security critical | Search for these strings | +| `except:` or `except Exception:` | Silent failures | Grep for bare excepts | +| `password`, `secret`, `key`, `token` in code | Hardcoded credentials | Search + check if literal string | +| `if DEBUG`, `debug=True` | Insecure defaults | Check config blocks | +| Functions >50 lines | Maintainability risk | Count lines per function | +| Nested `if` >3 levels | Complexity hotspot | Visual scan or cyclomatic check | +| No tests in repo | Quality gap | Look for `test_` files | +| Direct SQL string concat | SQL injection | Search for `f"SELECT` or `+ "SELECT` | +| `requests.get` without timeout | Production risk | Check HTTP client calls | +| `while True` without break | Unbounded loop | Search for infinite loops | + ### 1. Architecture & Design +**Quick checks:** +- Can you identify the entry point in 10 seconds? +- Are there clear boundaries between layers (API, business logic, data)? +- Does any single file exceed 300 lines? + - Separation of concerns violations (e.g., business logic inside route handlers or UI components) - God objects or monolithic modules with more than one clear responsibility - Tight coupling between components with no abstraction boundary - Missing or blurred system boundaries (e.g., database queries scattered across layers) +- Circular dependencies or import cycles +- No clear data flow or state management strategy ### 2. Consistency & Maintainability +**Quick checks:** +- Are similar operations named consistently? (search for `get`, `fetch`, `load` variations) +- Do functions have single, clear purposes based on their names? +- Is duplicated logic visible? (search for repeated code blocks) + - Naming inconsistencies (e.g., `get_user` vs `fetchUser` vs `retrieveUserData` for the same operation) - Mixed paradigms without justification (e.g., OOP and procedural code interleaved arbitrarily) -- Copy-paste logic that should be extracted into a shared function +- Copy-paste logic that should be extracted into a shared function (3+ repetitions = extract) - Abstractions that obscure rather than clarify intent +- Inconsistent error handling patterns across modules +- Magic numbers or strings without constants or configuration ### 3. Robustness & Error Handling +**Quick checks:** +- Does every external call (API, DB, file) have error handling? +- Are there any bare `except:` blocks? +- What happens if inputs are empty, null, or malformed? + - Missing input validation on entry points (HTTP handlers, CLI args, file reads) - Bare `except` or catch-all error handlers that swallow failures silently - Unhandled edge cases (empty collections, null/None returns, zero values) - Code that assumes external services always succeed without fallback logic +- No retry logic for transient failures (network, rate limits) +- Missing timeouts on blocking operations (HTTP, DB, I/O) +- No validation of data from external sources before use ### 4. Production Risks +**Quick checks:** +- Search for hardcoded URLs, IPs, or paths +- Check for logging statements (or lack thereof) +- Look for database queries in loops + - Hardcoded configuration values (URLs, credentials, timeouts, thresholds) - Missing structured logging or observability hooks - Unbounded loops, missing pagination, or N+1 query patterns - Blocking I/O in async contexts or thread-unsafe shared state - No graceful shutdown or cleanup on process exit +- Missing health checks or readiness endpoints +- No rate limiting or backpressure mechanisms +- Synchronous operations in event-driven or async contexts ### 5. Security & Safety +**Quick checks:** +- Search for: `eval`, `exec`, `os.system`, `subprocess` +- Look for: `password`, `secret`, `api_key`, `token` as string literals +- Check for: `SELECT * FROM` + string concatenation +- Verify: input sanitization before DB, shell, or file operations + - Unsanitized user input passed to databases, shells, file paths, or `eval` - Credentials, API keys, or tokens present in source code or logs - Insecure defaults (e.g., `DEBUG=True`, permissive CORS, no rate limiting) - Trust boundary violations (e.g., treating external data as internal without validation) +- SQL injection vulnerabilities (string concatenation in queries) +- Path traversal risks (user input in file paths without validation) +- Missing authentication or authorization checks on sensitive operations +- Insecure deserialization (pickle, yaml.load without SafeLoader) ### 6. Dead or Hallucinated Code +**Quick checks:** +- Search for function/class definitions, then check for callers +- Look for imports that seem unused +- Check if referenced libraries match requirements.txt or package.json + - Functions, classes, or modules that are defined but never called - Imports that do not exist in the declared dependencies - References to APIs, methods, or fields that do not exist in the used library version - Type annotations that contradict actual usage - Comments that describe behavior inconsistent with the code +- Unreachable code blocks (after `return`, `raise`, or `break` in all paths) +- Feature flags or conditionals that are always true/false ### 7. Technical Debt Hotspots +**Quick checks:** +- Count function parameters (5+ = refactor candidate) +- Measure nesting depth visually (4+ = refactor candidate) +- Look for boolean flags controlling function behavior + - Logic that is correct today but will break under realistic load or scale - Deep nesting (more than 3-4 levels) that obscures control flow - Boolean parameter flags that change function behavior (use separate functions instead) - Functions with more than 5-6 parameters without a configuration object - Areas where a future requirement change would require modifying many unrelated files +- Missing type hints in dynamically typed languages for complex functions +- No documentation for public APIs or complex algorithms +- Test coverage gaps for critical paths --- @@ -105,12 +182,30 @@ Evaluate the code across all seven dimensions below. For each finding, record: t Produce the audit report using exactly this structure. Do not omit sections. If a section has no findings, write "None identified." +**Productivity Rules:** +- Lead with the 3-5 most critical findings that would cause production failures +- Group related issues (e.g., "3 locations with hardcoded credentials" instead of listing separately) +- Provide copy-paste-ready fixes where possible (exact code snippets) +- Use severity tags consistently: `[CRITICAL]`, `[HIGH]`, `[MEDIUM]`, `[LOW]` + --- ### Audit Report **Input:** [file name(s) or "code snippet"] **Assumptions:** [list any assumptions made about context or environment] +**Quick Stats:** [X files, Y lines of code, Z language/framework] + +#### Executive Summary (Read This First) + +In 3-5 bullets, state the most important findings that determine whether this code can go to production: + +``` +- [CRITICAL/HIGH] One-line summary of the most severe issue +- [CRITICAL/HIGH] Second most severe issue +- [MEDIUM] Notable pattern that will cause future problems +- Overall: Deployable as-is / Needs fixes / Requires major rework +``` #### Critical Issues (Must Fix Before Production) @@ -124,6 +219,11 @@ Location: filename.py, line 42 (or "multiple locations" with examples) Dimension: Architecture / Security / Robustness / etc. Problem: One or two sentences explaining exactly what is wrong and why it is dangerous. Fix: One or two sentences describing the minimum change required to resolve it. +Code Fix (if applicable): +```python +# Before: problematic code +# After: corrected version +``` ``` #### High-Risk Issues @@ -152,23 +252,37 @@ Provide a score using the rubric below, then write 2-3 sentences justifying it w | 71-85 | Production-viable with targeted fixes. Known risks are bounded. | | 86-100 | Production-ready. Minor improvements only. | -Score deductions: +**Scoring Algorithm:** -- Each Critical issue: -10 to -20 points depending on blast radius -- Each High issue: -5 to -10 points -- Pervasive maintainability debt (3+ Medium issues in one dimension): -5 points +``` +Start at 100 points +For each CRITICAL issue: -15 points (security: -20) +For each HIGH issue: -8 points +For each MEDIUM issue: -3 points +For pervasive patterns (3+ similar issues): -5 additional points +Floor: 0, Ceiling: 100 +``` #### Refactoring Priorities List the top 3-5 changes in order of impact. Each item must reference a specific finding from above. ``` -1. [Priority] Fix title — addresses [CRITICAL/HIGH ref] — estimated effort: S/M/L -2. ... +1. [P1 - Blocker] Fix title — addresses [CRITICAL #1] — effort: S/M/L — impact: prevents [specific failure] +2. [P2 - Blocker] Fix title — addresses [CRITICAL #2] — effort: S/M/L — impact: prevents [specific failure] +3. [P3 - High] Fix title — addresses [HIGH #1] — effort: S/M/L — impact: improves [specific metric] +4. [P4 - Medium] Fix title — addresses [MEDIUM #1] — effort: S/M/L — impact: reduces [specific debt] +5. [P5 - Optional] Fix title — addresses [LOW #1] — effort: S/M/L — impact: nice-to-have ``` Effort scale: S = < 1 day, M = 1-3 days, L = > 3 days. +**Quick Wins (fix in <1 hour):** +List any issues that can be resolved immediately with minimal effort: +``` +- [Issue name]: [one-line fix description] +``` + --- ## Behavior Rules @@ -180,6 +294,20 @@ Effort scale: S = < 1 day, M = 1-3 days, L = > 3 days. - If the code is too small or too abstract to evaluate a dimension meaningfully, say so explicitly rather than generating generic advice. - If you detect a potential security issue but cannot confirm it from the code alone (e.g., depends on framework configuration not shown), flag it as "unconfirmed — verify" rather than omitting or overstating it. +**Efficiency Rules:** +- Scan for critical patterns first (security, data loss, crashes) before deeper analysis +- Group similar issues by pattern rather than listing each occurrence separately +- Provide exact code fixes for critical/high issues when the solution is straightforward +- Skip dimensions that are not applicable to the code size or type (state "Not applicable: [reason]") +- Focus on issues that would cause production incidents, not theoretical concerns + +**Calibration:** +- For snippets (<100 lines): Focus on security, robustness, and obvious bugs only +- For single files (100-500 lines): Add architecture and maintainability checks +- For multi-file systems (500+ lines): Full audit across all 7 dimensions +- For production code: Emphasize security, observability, and failure modes +- For prototypes: Emphasize scalability limits and technical debt + --- ## Task-Specific Inputs @@ -189,6 +317,12 @@ Before auditing, if not already provided, ask: 1. **Code or files**: Share the source code to audit. Accepted: single file, multiple files, directory listing, or snippet. 2. **Context** _(optional)_: Brief description of what the system does, its intended scale, deployment environment, and known constraints. 3. **Target environment** _(optional)_: Target runtime (e.g., production web service, CLI tool, data pipeline). Used to calibrate risk severity. +4. **Known concerns** _(optional)_: Any specific areas you're worried about or want me to focus on. + +**If context is missing, assume:** +- Language/framework is evident from the code +- Deployment target is production web service (most common) +- Scale expectations are moderate (100-1000 users) unless code suggests otherwise --- @@ -197,3 +331,5 @@ Before auditing, if not already provided, ask: - **schema-markup**: For adding structured data after code is production-ready. - **analytics-tracking**: For implementing observability and measurement after audit is clean. - **seo-forensic-incident-response**: For investigating production incidents after deployment. +- **test-driven-development**: For adding test coverage to address robustness gaps. +- **security-audit**: For deep-dive security analysis if critical vulnerabilities are found.