feat: add three-layer PII defense system (pre-commit + gitleaks + CLAUDE.md)

Prevents sensitive data (user paths, phone numbers, personal IDs) from
entering git history. Born from redacting 6 historical commits.

- .gitleaks.toml: custom rules for absolute paths, phone numbers, usernames, personal email addresses
- .githooks/pre-commit: dual-layer scan (gitleaks when installed, plus a regex scan that always runs)
- CLAUDE.md: updated Privacy section documenting the defense system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
daymade
2026-04-04 12:54:10 +08:00
parent 28cd6bd813
commit 0715ffb4bd
3 changed files with 117 additions and 2 deletions

.githooks/pre-commit (executable file, 53 lines added)

@@ -0,0 +1,53 @@
#!/bin/bash
# Pre-commit hook: scan staged changes for sensitive data
# Install: git config core.hooksPath .githooks
set -euo pipefail
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
echo "🔍 Scanning staged changes for sensitive data..."
FAILED=0
# Layer 1: gitleaks (if available)
if command -v gitleaks &>/dev/null; then
  if ! gitleaks protect --staged --config .gitleaks.toml --no-banner 2>/dev/null; then
    echo -e "${RED}❌ gitleaks found secrets in staged changes${NC}"
    FAILED=1
  fi
else
  echo -e "${YELLOW}⚠ gitleaks not installed (brew install gitleaks), relying on the pattern scan only${NC}"
fi
# Layer 2: fast regex scan (always runs, catches what gitleaks config might miss)
STAGED_DIFF=$(git diff --cached --diff-filter=ACDMR)
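PATTERNS=(
  '/Users/[a-zA-Z][a-zA-Z0-9_-]+/'
  '/home/[a-zA-Z][a-zA-Z0-9_-]+/'
  'C:\\Users\\[a-zA-Z]'
  'songtiansheng'
  'tiansheng'
  '15366[0-9]+'
)
for pattern in "${PATTERNS[@]}"; do
  # Keep added lines only. grep -n prefixes every line with "N:", so the
  # "+++ b/file" headers must be dropped with a pattern that allows for that prefix.
  # Note: the exclusion list matches line content, not the path of the changed file.
  MATCHES=$(printf '%s\n' "$STAGED_DIFF" | grep -nE '^\+' | grep -vE '^[0-9]+:\+\+\+ ' | grep -E -- "$pattern" | grep -vE '\.gitleaks\.toml|\.githooks/|\.gitignore|placeholder|example|CLAUDE\.md' || true)
  if [ -n "$MATCHES" ]; then
    echo -e "${RED}❌ Found sensitive pattern '${pattern}':${NC}"
    echo "$MATCHES" | head -5
    FAILED=1
  fi
done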
if [ "$FAILED" -eq 1 ]; then
  echo ""
  echo -e "${RED}Commit blocked. Fix the issues above, or use --no-verify to bypass (not recommended).${NC}"
  exit 1
fi
echo -e "${GREEN}✅ No sensitive data found in staged changes.${NC}"
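
The regex layer can be tried out without staging anything by feeding it a synthetic diff. The sketch below is not part of the commit: `scan_added_lines` is a hypothetical helper, and the sample paths are made up. It keeps only added lines, skips the `+++ b/file` headers, and returns 0 when any pattern matches.

```shell
# Hypothetical helper (not in the commit): print added diff lines that match
# any of the given extended-regex patterns. Reads a unified diff on stdin.
# Returns 0 if at least one line matched, 1 otherwise.
scan_added_lines() {
  scan_input=$(cat)
  scan_found=1
  for scan_pattern in "$@"; do
    # Added lines start with "+"; "+++ b/file" headers are not content.
    if printf '%s\n' "$scan_input" | grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -E -- "$scan_pattern"; then
      scan_found=0
    fi
  done
  return "$scan_found"
}

# Synthetic diff: one offending added line, one clean added line,
# and one removed line that must be ignored. All paths are invented.
sample_diff='+++ b/notes.md
+see /Users/alice/projects/demo for details
+see ~/workspace/demo for details
-old line mentioning /Users/bob/tmp/'

printf '%s\n' "$sample_diff" | scan_added_lines '/Users/[a-zA-Z][a-zA-Z0-9_-]+/' '/home/[a-zA-Z][a-zA-Z0-9_-]+/'
```

Only the second line of the sample should be reported. The removed line is skipped because the scan looks at additions only, which mirrors how the hook treats `git diff --cached` output.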

.gitleaks.toml (normal file, 53 lines added)

@@ -0,0 +1,53 @@
# Gitleaks custom rules for claude-code-skills repo
# Catches personal info that shouldn't be in an open source repo
title = "claude-code-skills sensitive data rules"
[extend]
useDefault = true
# Global allowlist: files that are allowed to contain patterns
# (the config file itself, hooks, and contribution guides)
[allowlist]
paths = [
'''\.gitleaks\.toml$''',
'''\.githooks/''',
'''CONTRIBUTING\.md$''',
'''CLAUDE\.md$''',
]
[[rules]]
id = "absolute-user-path-macos"
description = "Hardcoded macOS user home directory path"
regex = '''/Users/[a-zA-Z][a-zA-Z0-9_-]+/'''
tags = ["pii", "path"]
[[rules]]
id = "absolute-user-path-linux"
description = "Hardcoded Linux home directory path"
regex = '''/home/[a-zA-Z][a-zA-Z0-9_-]+/'''
tags = ["pii", "path"]
[[rules]]
id = "windows-user-path"
description = "Hardcoded Windows user profile path"
regex = '''C:\\Users\\[a-zA-Z][a-zA-Z0-9_-]+\\'''
tags = ["pii", "path"]
[[rules]]
id = "phone-number-cn"
description = "Chinese mobile phone number"
regex = '''1[3-9]\d{9}'''
tags = ["pii", "phone"]
[[rules]]
id = "douban-user-id-literal"
description = "Hardcoded Douban user ID"
regex = '''songtiansheng'''
tags = ["pii", "username"]
[[rules]]
id = "email-personal"
description = "Personal email address"
regex = '''[a-zA-Z0-9._%+-]+@(gmail|qq|163|126|outlook|hotmail|yahoo|icloud|foxmail)\.[a-zA-Z]{2,}'''
tags = ["pii", "email"]
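
Custom rules like these are easy to sanity-check against sample strings before relying on them. The sketch below is an illustration, not part of the commit: it translates the `phone-number-cn` regex to POSIX ERE (`\d` written as `[0-9]`, since `grep -E` does not reliably support `\d`) and compares it with a hypothetical stricter variant that requires a non-digit on both sides. The `check` helper, the `strict` pattern, and the sample strings (a widely used dummy number and an invented timestamp) are assumptions.

```shell
# ERE translation of the phone-number-cn rule from .gitleaks.toml.
loose='1[3-9][0-9]{9}'
# Hypothetical tighter variant: a non-digit (or line edge) on both sides.
strict='(^|[^0-9])1[3-9][0-9]{9}([^0-9]|$)'

# Hypothetical helper: report whether TEXT matches REGEX.
check() {  # usage: check LABEL REGEX TEXT
  if printf '%s\n' "$3" | grep -Eq -- "$2"; then
    printf '%s: match: %s\n' "$1" "$3"
  else
    printf '%s: no match: %s\n' "$1" "$3"
  fi
}

for sample in 'tel: 13800138000' 'ts=1712345678901'; do
  check loose "$loose" "$sample"
  check strict "$strict" "$sample"
done
```

The unanchored pattern also matches inside longer digit runs such as millisecond timestamps, so a rule written this way may produce false positives. Whether to tighten it is a judgment call for the repository.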

CLAUDE.md

@@ -115,13 +115,22 @@ description: Clear description with activation triggers. This skill should be us
 ---
 ```
-### Privacy and Path Guidelines
+### Privacy and Path Guidelines (Enforced by Pre-commit Hook)
 Skills for public distribution must NOT contain:
 - Absolute paths to user directories (`/home/username/`, `/Users/username/`)
 - Personal usernames, company names, product names
+- Phone numbers, personal email addresses
 - OneDrive paths or environment-specific absolute paths
-- Use relative paths within skill bundle or standard placeholders
+- Use relative paths within skill bundle or standard placeholders (`~/workspace/`, `<user_id>`)
+**Three-layer defense system:**
+1. **CLAUDE.md rules** (this section): Claude avoids generating sensitive content
+2. **Pre-commit hook** (`.githooks/pre-commit`): blocks commits with sensitive patterns
+3. **gitleaks** (`.gitleaks.toml`): deep scan with custom rules for this repo
+The pre-commit hook is activated by running `git config core.hooksPath .githooks` once per clone.
+If it fires, fix the issue; do NOT use `--no-verify` to bypass.
 ### Content Organization