| title | description |
|---|---|
| Security Penetration Testing — Agent Skill & Codex Plugin | Use when the user asks to perform security audits, penetration testing, vulnerability scanning, OWASP Top 10 checks, or offensive security. Agent skill for Claude Code, Codex CLI, Gemini CLI, OpenClaw. |
Security Penetration Testing
claude /plugin install engineering-skills
Hands-on offensive security testing skill for finding vulnerabilities before attackers do. This is NOT compliance checking (see senior-secops) or security policy writing (see senior-security) — this is about systematic vulnerability discovery through authorized testing.
Table of Contents
- Overview
- OWASP Top 10 Systematic Audit
- Static Analysis
- Dependency Vulnerability Scanning
- Secret Scanning
- API Security Testing
- Web Vulnerability Testing
- Infrastructure Security
- Pen Test Report Generation
- Responsible Disclosure Workflow
- Workflows
- Anti-Patterns
- Cross-References
Overview
What This Skill Does
This skill provides the methodology, checklists, and automation for offensive security testing — actively probing systems to discover exploitable vulnerabilities. It covers web applications, APIs, infrastructure, and supply chain security.
Distinction from Other Security Skills
| Skill | Focus | Approach |
|---|---|---|
| security-pen-testing (this) | Finding vulnerabilities | Offensive — simulate attacker techniques |
| senior-secops | Security operations | Defensive — monitoring, incident response, SIEM |
| senior-security | Security policy | Governance — policies, frameworks, risk registers |
| skill-security-auditor | CI/CD gates | Automated — pre-merge security checks |
Prerequisites
All testing described here assumes written authorization from the system owner. Unauthorized testing is illegal under the U.S. Computer Fraud and Abuse Act (CFAA) and comparable laws in other jurisdictions. Always obtain a signed scope-of-work or rules-of-engagement document before starting.
OWASP Top 10 Systematic Audit
Use the vulnerability scanner tool for automated checklist generation:
# Generate OWASP checklist for a web application
python scripts/vulnerability_scanner.py --target web --scope full
# Quick API-focused scan
python scripts/vulnerability_scanner.py --target api --scope quick --json
Quick Reference
| # | Category | Key Tests |
|---|---|---|
| A01 | Broken Access Control | IDOR, vertical escalation, CORS, JWT claim manipulation, forced browsing |
| A02 | Cryptographic Failures | TLS version, password hashing, hardcoded keys, weak PRNG |
| A03 | Injection | SQLi, NoSQLi, command injection, template injection, XSS |
| A04 | Insecure Design | Rate limiting, business logic abuse, multi-step flow bypass |
| A05 | Security Misconfiguration | Default credentials, debug mode, security headers, directory listing |
| A06 | Vulnerable Components | Dependency audit (npm/pip/go), EOL checks, known CVEs |
| A07 | Auth Failures | Brute force, session cookie flags, session invalidation, MFA bypass |
| A08 | Integrity Failures | Unsafe deserialization, SRI checks, CI/CD pipeline integrity |
| A09 | Logging Failures | Auth event logging, sensitive data in logs, alerting thresholds |
| A10 | SSRF | Internal IP access, cloud metadata endpoints, DNS rebinding |
# Audit dependencies
python scripts/dependency_auditor.py --file package.json --severity high
python scripts/dependency_auditor.py --file requirements.txt --json
See owasp_top_10_checklist.md for detailed test procedures, code patterns to detect, remediation steps, and CVSS scoring guidance for each category.
Static Analysis
Recommended tools: CodeQL (custom queries for project-specific patterns), Semgrep (rule-based scanning with auto-fix), ESLint security plugins (eslint-plugin-security, eslint-plugin-no-unsanitized).
Key patterns to detect: SQL injection via string concatenation, hardcoded JWT secrets, unsafe YAML/pickle deserialization, missing security middleware (e.g., Express without Helmet).
See attack_patterns.md for code patterns and detection payloads across injection types.
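To give a feel for the key patterns listed above, here is a minimal line-based sketch. The rule names and regular expressions are illustrative assumptions, not rules taken from attack_patterns.md, and a regex pass like this is far less precise than Semgrep or CodeQL, which parse the code.

```python
import re

# Hypothetical heuristic rules for illustration only; expect false positives and misses.
RISKY_PATTERNS = {
    "sql-string-concat": re.compile(r"execute\(\s*[\"'].*[\"']\s*\+"),
    "unsafe-yaml-load": re.compile(r"yaml\.load\((?!.*Loader\s*=\s*yaml\.SafeLoader)"),
    "pickle-deserialization": re.compile(r"pickle\.loads?\("),
}


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a risky pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, rule))
    return hits


# Made-up sample input: three risky lines and one safe yaml.load call.
sample = '''cursor.execute("SELECT * FROM users WHERE id = " + user_id)
config = yaml.load(raw)
safe = yaml.load(raw, Loader=yaml.SafeLoader)
obj = pickle.loads(blob)
'''
findings = scan_source(sample)
```

Treat any hit as a prompt for manual review rather than a confirmed vulnerability.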
Dependency Vulnerability Scanning
Ecosystem commands: npm audit, pip-audit, govulncheck ./..., bundle audit check
CVE Triage Workflow:
- Collect — Run ecosystem audit tools, aggregate findings
- Deduplicate — Group by CVE ID across direct and transitive deps
- Prioritize — Critical + exploitable + reachable = fix immediately
- Remediate — Upgrade, patch, or mitigate with compensating controls
- Verify — Rerun audit to confirm fix, update lock files
python scripts/dependency_auditor.py --file package.json --severity critical --json
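The Deduplicate and Prioritize steps of the triage workflow can be sketched as follows. This is not how dependency_auditor.py works internally; the `Advisory` record, its fields, and the placeholder CVE IDs used with it are assumptions made for illustration.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Advisory:
    """Hypothetical record aggregated from ecosystem audit tools."""
    cve_id: str
    package: str
    severity: str      # one of the SEVERITY_RANK keys
    exploitable: bool  # assumed input: a working exploit is known
    reachable: bool    # assumed input: the vulnerable code path is actually called


def fix_immediately(adv: Advisory) -> bool:
    # Critical + exploitable + reachable = fix immediately
    return adv.severity == "critical" and adv.exploitable and adv.reachable


def triage(advisories: list[Advisory]) -> list[Advisory]:
    """Keep the most severe record per CVE ID, then put fix-immediately items first."""
    by_cve: dict[str, Advisory] = {}
    for adv in advisories:
        current = by_cve.get(adv.cve_id)
        if current is None or SEVERITY_RANK[adv.severity] < SEVERITY_RANK[current.severity]:
            by_cve[adv.cve_id] = adv
    return sorted(
        by_cve.values(),
        key=lambda a: (not fix_immediately(a), SEVERITY_RANK[a.severity], a.cve_id),
    )
```

Sorting on a tuple keeps the policy in one place: urgency first, then severity, then CVE ID for a stable order.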
Secret Scanning
Tools: TruffleHog (git history + filesystem), Gitleaks (regex-based with custom rules).
# Scan git history for verified secrets
trufflehog git file://. --only-verified --json
# Scan filesystem
trufflehog filesystem . --json
Integration points: Pre-commit hooks (gitleaks, trufflehog), CI/CD gates (GitHub Actions with trufflesecurity/trufflehog@main). Configure .gitleaks.toml for custom rules (AWS keys, API keys, private key headers) and allowlists for test fixtures.
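To illustrate what regex-based rules and allowlists do, here is a small sketch in Python. It is not Gitleaks or TruffleHog code; the two patterns are simplified assumptions, and the sample key is the placeholder access key ID used in AWS documentation.

```python
import re

# Simplified illustrative rules; production scanners ship far more thorough patterns.
SECRET_RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str, allowlist: frozenset[str] = frozenset()) -> list[tuple[str, str]]:
    """Return (rule_name, matched_text) pairs, skipping allowlisted test fixtures."""
    hits = []
    for rule, pattern in SECRET_RULES.items():
        for match in pattern.finditer(text):
            if match.group(0) not in allowlist:
                hits.append((rule, match.group(0)))
    return hits


fixture = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'  # AWS documentation placeholder key
hits = find_secrets(fixture)
allowed = find_secrets(fixture, allowlist=frozenset({"AKIAIOSFODNN7EXAMPLE"}))
```

The allowlist plays the same role as the test-fixture allowlists mentioned above: known placeholder values are suppressed so that real findings stand out.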
API Security Testing
Authentication Bypass
- JWT manipulation: Change alg to none, RS256-to-HS256 confusion, claim modification (role: "admin", exp: 9999999999)
- Session fixation: Check if session ID changes after authentication
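For the alg-to-none check, a test token can be built with the standard library alone. This is a sketch for use against systems you are authorized to test. The claim values are placeholders, and a correctly configured server should reject the resulting token.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a JWT segment, restoring the padding that was stripped."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def unsigned_token(claims: dict) -> str:
    """Build a token whose header declares alg=none and whose signature is empty."""
    header = {"alg": "none", "typ": "JWT"}
    segments = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    return ".".join(segments) + "."  # trailing dot marks the empty signature segment


# Placeholder claims for an authorized test account.
token = unsigned_token({"sub": "test-user", "role": "admin", "exp": 9999999999})
```

If the API accepts this token, record it as an authentication bypass finding and stop at proof of concept.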
Authorization Flaws
- IDOR/BOLA: Change resource IDs in every endpoint — test read, update, delete across users
- BFLA: Regular user tries admin endpoints (expect 403)
- Mass assignment: Add privileged fields (role, is_admin) to update requests
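The mass assignment check comes down to comparing the resource before and after an update that carries extra fields. The sketch below assumes you have already fetched both states as dictionaries. The field values and helper names are hypothetical.

```python
# Hypothetical privileged fields to inject; adjust to the API under test.
PRIVILEGED_FIELDS = {"role": "admin", "is_admin": True}


def build_probe(legit_update: dict) -> dict:
    """Add privileged fields to an otherwise legitimate update payload."""
    return {**legit_update, **PRIVILEGED_FIELDS}


def accepted_fields(before: dict, after: dict) -> list[str]:
    """List privileged fields whose stored value changed to the injected value."""
    return sorted(
        name
        for name, injected in PRIVILEGED_FIELDS.items()
        if before.get(name) != injected and after.get(name) == injected
    )
```

An empty result means the server ignored the injected fields. Any listed field is a candidate finding to confirm manually.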
Rate Limiting & GraphQL
- Rate limiting: Rapid-fire requests to auth endpoints; expect 429 after threshold
- GraphQL: Test introspection (should be disabled in prod), query depth attacks, batch mutations bypassing rate limits
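Once the status codes from a burst of requests have been collected, the rate-limiting expectation can be evaluated with a small helper. This sketch assumes the codes are in request order. The threshold argument is something you supply from the documented or expected limit.

```python
from typing import Optional, Sequence


def first_throttled(status_codes: Sequence[int]) -> Optional[int]:
    """Return the 1-based position of the first 429 response, or None."""
    for position, code in enumerate(status_codes, start=1):
        if code == 429:
            return position
    return None


def rate_limit_verdict(status_codes: Sequence[int], max_allowed: int) -> str:
    """Pass only if throttling starts no later than request max_allowed + 1."""
    hit = first_throttled(status_codes)
    if hit is None:
        return "fail: no throttling observed"
    if hit > max_allowed + 1:
        return f"fail: throttled only at request {hit}"
    return f"pass: throttled at request {hit}"
```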
See attack_patterns.md for complete JWT manipulation payloads, IDOR testing methodology, BFLA endpoint lists, GraphQL introspection/depth/batch attack patterns, and rate limiting bypass techniques.
Web Vulnerability Testing
| Vulnerability | Key Tests |
|---|---|
| XSS | Reflected (script/img/svg payloads), Stored (persistent fields), DOM-based (innerHTML + location.hash) |
| CSRF | Replay without token (expect 403), cross-session token replay, check SameSite cookie attribute |
| SQL Injection | Error-based (unbalanced quote to surface database errors), authentication-bypass tautology (' OR 1=1--), union-based enumeration, time-based blind (SLEEP(5)), boolean-based blind |
| SSRF | Internal IPs, cloud metadata endpoints (AWS/GCP/Azure), IPv6/hex/decimal encoding bypasses |
| Path Traversal | ../ sequences targeting /etc/passwd, URL encoding, double encoding bypasses |
See attack_patterns.md for complete test payloads (XSS filter bypasses, context-specific XSS, SQL injection per database engine, SSRF bypass techniques, and DOM-based XSS source/sink pairs).
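As an example of the SSRF encoding bypasses in the table, the alternative spellings of an IPv4 address can be derived with the standard `ipaddress` module. The sketch uses 169.254.169.254, the link-local address of the AWS instance metadata service. Whether a given filter or HTTP client accepts each form depends on the target.

```python
import ipaddress


def ipv4_encodings(address: str) -> dict[str, str]:
    """Return alternative representations of one IPv4 address for SSRF filter tests."""
    ip = ipaddress.IPv4Address(address)
    as_int = int(ip)
    return {
        "dotted": str(ip),
        "decimal": str(as_int),
        "hex": hex(as_int),
        "ipv4_mapped_ipv6": f"[::ffff:{ip}]",
    }


variants = ipv4_encodings("169.254.169.254")
```

Each variant goes into the URL parameter under test. A filter that blocks only the dotted form is bypassable.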
Infrastructure Security
Key checks:
- Cloud storage: S3 bucket public access (aws s3 ls s3://bucket --no-sign-request), bucket policies, ACLs
- HTTP security headers: HSTS, CSP (no unsafe-inline/unsafe-eval), X-Content-Type-Options, X-Frame-Options, Referrer-Policy
- TLS configuration: nmap --script ssl-enum-ciphers -p 443 target.com or testssl.sh; reject TLS 1.0/1.1, RC4, 3DES, export-grade ciphers
- Port scanning: nmap -sV target.com; flag dangerous open ports (FTP/21, Telnet/23, Redis/6379, MongoDB/27017)
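The HTTP security header check can be automated once response headers are available as a dictionary. The sketch below covers only the headers named above and does simple substring checks on the CSP. It is an illustration, not a full policy parser.

```python
REQUIRED_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
)


def audit_headers(headers: dict[str, str]) -> list[str]:
    """Report missing security headers and unsafe CSP keywords."""
    normalized = {name.lower(): value for name, value in headers.items()}
    issues = [f"missing {name}" for name in REQUIRED_HEADERS if name not in normalized]
    csp = normalized.get("content-security-policy", "")
    for keyword in ("'unsafe-inline'", "'unsafe-eval'"):
        if keyword in csp:
            issues.append(f"CSP allows {keyword}")
    return issues
```

Header names are lowercased before comparison because HTTP header names are case-insensitive.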
Pen Test Report Generation
Generate professional reports from structured findings:
# Generate markdown report from findings JSON
python scripts/pentest_report_generator.py --findings findings.json --format md --output report.md
# Generate JSON report
python scripts/pentest_report_generator.py --findings findings.json --format json --output report.json
Findings JSON Format
[
{
"title": "SQL Injection in Login Endpoint",
"severity": "critical",
"cvss_score": 9.8,
"cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
"category": "A03:2021 - Injection",
"description": "The /api/login endpoint is vulnerable to SQL injection via the email parameter.",
"evidence": "Request: POST /api/login {\"email\": \"' OR 1=1--\", \"password\": \"x\"}\nResponse: 200 OK with admin session token",
"impact": "Full database access, authentication bypass, potential remote code execution",
"remediation": "Use parameterized queries. Replace string concatenation with prepared statements.",
"references": ["https://cwe.mitre.org/data/definitions/89.html"]
}
]
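Before handing a findings file to the report generator, it can help to sanity-check it. The validator below is a sketch. Which keys are mandatory and which severity labels are allowed are assumptions inferred from the example above and the gate policy, not rules confirmed by pentest_report_generator.py.

```python
import json

# Assumed required keys and severity labels; adjust to the generator's real schema.
REQUIRED_KEYS = {"title", "severity", "cvss_score", "description", "evidence", "impact", "remediation"}
SEVERITIES = {"critical", "high", "medium", "low", "info"}


def validate_findings(raw: str) -> list[str]:
    """Return human-readable problems found in a findings JSON document."""
    errors = []
    for index, finding in enumerate(json.loads(raw)):
        missing = REQUIRED_KEYS - set(finding)
        if missing:
            errors.append(f"finding {index}: missing {sorted(missing)}")
        if finding.get("severity") not in SEVERITIES:
            errors.append(f"finding {index}: unknown severity {finding.get('severity')!r}")
        score = finding.get("cvss_score")
        # CVSS base scores range from 0.0 to 10.0.
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 10.0:
            errors.append(f"finding {index}: cvss_score out of range")
    return errors
```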
Report Structure
- Executive Summary: Business impact, overall risk level, top 3 findings
- Scope: What was tested, what was excluded, testing dates
- Methodology: Tools used, testing approach (black/gray/white box)
- Findings Table: Sorted by severity with CVSS scores
- Detailed Findings: Each with description, evidence, impact, remediation
- Remediation Priority Matrix: Effort vs. impact for each fix
- Appendix: Raw tool output, full payload lists
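The Findings Table item can be sketched as a small renderer that sorts by severity and then by CVSS score. This is an illustration of the idea rather than the output format of pentest_report_generator.py. The column layout is an assumption.

```python
# Assumed severity ordering, most severe first.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def findings_table(findings: list[dict]) -> str:
    """Render findings as a Markdown table, most severe first."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER.index(f["severity"]), -f["cvss_score"]),
    )
    rows = ["| # | Title | Severity | CVSS |", "|---|---|---|---|"]
    for number, finding in enumerate(ordered, start=1):
        rows.append(
            f"| {number} | {finding['title']} | {finding['severity']} | {finding['cvss_score']:.1f} |"
        )
    return "\n".join(rows)
```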
Responsible Disclosure Workflow
Responsible disclosure is mandatory for any vulnerability found during authorized testing. Standard timeline: report on day 1, follow up at day 7, status update at day 30, public disclosure at day 90.
Key principles: Never exploit beyond proof of concept, encrypt all communications, do not access real user data, document everything with timestamps.
See responsible_disclosure.md for full disclosure timelines (standard 90-day, accelerated 30-day, extended 120-day), communication templates, legal considerations, bug bounty program integration, and CVE request process.
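The standard timeline can be turned into calendar dates with a few lines of Python. The sketch assumes that day 1 is the date of the initial report, so day N falls N - 1 days later. If your program counts from day 0, shift the offsets accordingly.

```python
from datetime import date, timedelta

# Milestones from the standard 90-day timeline, keyed by day number.
MILESTONES = {
    "initial report": 1,
    "follow-up": 7,
    "status update": 30,
    "public disclosure": 90,
}


def disclosure_schedule(report_date: date) -> dict[str, date]:
    """Map each milestone to a calendar date, treating report_date as day 1."""
    return {
        name: report_date + timedelta(days=day - 1)
        for name, day in MILESTONES.items()
    }


schedule = disclosure_schedule(date(2024, 1, 1))
```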
Workflows
Workflow 1: Quick Security Check (15 Minutes)
For pre-merge reviews or quick health checks:
# 1. Generate OWASP checklist
python scripts/vulnerability_scanner.py --target web --scope quick
# 2. Scan dependencies
python scripts/dependency_auditor.py --file package.json --severity high
# 3. Check for secrets in recent commits
# (Use gitleaks or trufflehog as described in Secret Scanning section)
# 4. Review HTTP security headers
curl -sI https://target.com | grep -iE "(strict-transport|content-security|x-frame|x-content-type)"
Decision: If any critical or high findings appear, block the merge.
Workflow 2: Full Penetration Test (Multi-Day Assessment)
Day 1 — Reconnaissance:
- Map the attack surface: endpoints, authentication flows, third-party integrations
- Run automated OWASP checklist (full scope)
- Run dependency audit across all manifests
- Run secret scan on full git history
Day 2 — Manual Testing:
- Test authentication and authorization (IDOR, BOLA, BFLA)
- Test injection points (SQLi, XSS, SSRF, command injection)
- Test business logic flaws
- Test API-specific vulnerabilities (GraphQL, rate limiting, mass assignment)
Day 3 — Infrastructure and Reporting:
- Check cloud storage permissions
- Verify TLS configuration and security headers
- Port scan for unnecessary services
- Compile findings into structured JSON
- Generate pen test report
# Generate final report
python scripts/pentest_report_generator.py --findings findings.json --format md --output pentest-report.md
Workflow 3: CI/CD Security Gate
Automated security checks on every PR: secret scanning (TruffleHog), dependency audit (npm audit, pip-audit), SAST (Semgrep with p/security-audit, p/owasp-top-ten), and security headers check on staging.
Gate Policy: Block merge on critical/high findings. Warn on medium. Log low/info.
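The gate policy reduces to a small decision function. The sketch below assumes that each tool's findings have already been normalized to the severity labels used in this document. The return values are illustrative names.

```python
from typing import Iterable


def gate_decision(severities: Iterable[str]) -> str:
    """Apply the gate policy: block on critical/high, warn on medium, else pass."""
    levels = {severity.lower() for severity in severities}
    if levels & {"critical", "high"}:
        return "block"
    if "medium" in levels:
        return "warn"
    return "pass"  # low/info findings are logged but do not affect the merge
```

A CI job could exit non-zero when the result is "block" so that the merge check fails.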
Anti-Patterns
- Testing in production without authorization — Always get written permission and use staging/test environments when possible
- Ignoring low-severity findings — Low findings compound; a chain of lows can become a critical exploit path
- Skipping responsible disclosure — Every vulnerability found must be reported through proper channels
- Relying solely on automated tools — Tools miss business logic flaws, chained exploits, and novel attack vectors
- Testing without a defined scope — Scope creep leads to legal liability; document what is and isn't in scope
- Reporting without remediation guidance — Every finding must include actionable remediation steps
- Storing evidence insecurely — Pen test evidence (screenshots, payloads, tokens) is sensitive; encrypt and restrict access
- One-time testing — Security testing must be continuous; integrate into CI/CD and schedule periodic assessments
Cross-References
| Skill | Relationship |
|---|---|
| senior-secops | Defensive security operations — monitoring, incident response, SIEM configuration |
| senior-security | Security policy and governance — frameworks, risk registers, compliance |
| dependency-auditor | Deep supply chain security — SBOMs, license compliance, transitive risk |
| code-reviewer | Code review practices — includes security review checklist |