feat(history-finder): Add claude-code-history-files-finder skill

Add a new skill for finding and recovering content from Claude Code session history files (.claude/projects/).

Features:
- Search sessions by keywords across project history
- Recover deleted files from Write tool calls
- Analyze session statistics and tool usage
- Track file evolution across multiple sessions

Best practice improvements applied:
- Third-person description in frontmatter
- Imperative writing style throughout
- Progressive disclosure (workflows in references/)
- No content duplication between SKILL.md and references
- Proper exception handling in scripts
- Documented magic numbers

Marketplace integration:
- Updated marketplace.json (v1.13.0, 20 plugins)
- Updated README.md badges, skill section, use cases
- Updated README.zh-CN.md with Chinese translations
- Updated CLAUDE.md skill count and available skills list
- Updated CHANGELOG.md with v1.13.0 entry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-09 14:44:57 +08:00
parent 31a535b409
commit 20cc442ec4
11 changed files with 1736 additions and 9 deletions
--- a/claude-code-history-files-finder/references/session_file_format.md
+++ b/claude-code-history-files-finder/references/session_file_format.md
@@ -0,0 +1,285 @@
+# Claude Code Session File Format
+
+## Overview
+
+Claude Code stores conversation history in JSONL (JSON Lines) format, where each line is a complete JSON object representing a message or event in the conversation.
+
+## File Locations
+
+### Session Files
+
+```
+~/.claude/projects/<normalized-project-path>/<session-id>.jsonl
+```
+
+**Path normalization**: Project paths are converted by replacing `/` with `-`
+
+Example:
+- Project: `/Users/username/Workspace/js/myproject`
+- Directory: `~/.claude/projects/-Users-username-Workspace-js-myproject/`
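
The normalization rule can be sketched as a small helper. This is a minimal illustration that assumes only `/` is replaced, exactly as stated above; `session_dir_for` and its `home` parameter are hypothetical names, and real installations may normalize other characters as well.

```python
from pathlib import Path
from typing import Optional


def session_dir_for(project_path: str, home: Optional[Path] = None) -> Path:
    """Map a project path to its assumed session directory."""
    base = (home or Path.home()) / ".claude" / "projects"
    # Assumption taken from the rule above: only "/" is replaced with "-".
    return base / project_path.replace("/", "-")
```

For the example project, `session_dir_for("/Users/username/Workspace/js/myproject")` ends in `-Users-username-Workspace-js-myproject`.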
+
+### File Types
+
+| Pattern | Type | Description |
+|---------|------|-------------|
+| `<uuid>.jsonl` | Main session | User conversation sessions |
+| `agent-<id>.jsonl` | Agent session | Sub-agent execution logs |
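
A sketch of how the two filename patterns in the table might be told apart. `classify_session_file` is a hypothetical helper, and the check that a main session stem parses as a UUID is an assumption based on the `<uuid>.jsonl` pattern.

```python
import uuid
from pathlib import Path


def classify_session_file(path: str) -> str:
    """Return "agent", "main", or "other" based on the filename alone."""
    name = Path(path).name
    if not name.endswith(".jsonl"):
        return "other"
    stem = name[: -len(".jsonl")]
    if stem.startswith("agent-"):
        return "agent"
    try:
        uuid.UUID(stem)  # Assumed: main sessions are named by a UUID.
    except ValueError:
        return "other"
    return "main"
```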
+
+## JSON Structure
+
+### Message Object
+
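+A typical line in a session file follows this structure (`"user" | "assistant"` lists the allowed values, and `[...]` stands for the content blocks described below). Some fields may appear in different places; see Field Locations: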
+
+```json
+{
+  "role": "user" | "assistant",
+  "message": {
+    "role": "user" | "assistant",
+    "content": [...]
+  },
+  "timestamp": "2025-11-26T00:00:00.000Z",
+  "uuid": "message-uuid",
+  "parentUuid": "parent-message-uuid",
+  "sessionId": "session-uuid"
+}
+```
+
+### Content Types
+
+The `content` array contains different types of content blocks:
+
+#### Text Content
+
+```json
+{
+  "type": "text",
+  "text": "Message text content"
+}
+```
+
+#### Tool Use (Write)
+
+```json
+{
+  "type": "tool_use",
+  "name": "Write",
+  "input": {
+    "file_path": "/absolute/path/to/file.js",
+    "content": "File content here..."
+  }
+}
+```
+
+#### Tool Use (Edit)
+
+```json
+{
+  "type": "tool_use",
+  "name": "Edit",
+  "input": {
+    "file_path": "/absolute/path/to/file.js",
+    "old_string": "Original text",
+    "new_string": "Replacement text",
+    "replace_all": false
+  }
+}
+```
+
+#### Tool Use (Read)
+
+```json
+{
+  "type": "tool_use",
+  "name": "Read",
+  "input": {
+    "file_path": "/absolute/path/to/file.js",
+    "offset": 0,
+    "limit": 100
+  }
+}
+```
+
+#### Tool Use (Bash)
+
+```json
+{
+  "type": "tool_use",
+  "name": "Bash",
+  "input": {
+    "command": "ls -la",
+    "description": "List files"
+  }
+}
+```
+
+#### Tool Result
+
+```json
+{
+  "type": "tool_result",
+  "tool_use_id": "tool-use-uuid",
+  "content": "Result content",
+  "is_error": false
+}
+```
+
+## Common Extraction Patterns
+
+### Finding Write Operations
+
+Look for content blocks in assistant messages with `type: "tool_use"` and `name: "Write"`:
+
+```python
+if item.get("type") == "tool_use" and item.get("name") == "Write":
+    file_path = item["input"]["file_path"]
+    content = item["input"]["content"]
+```
+
+### Finding Edit Operations
+
+```python
+if item.get("type") == "tool_use" and item.get("name") == "Edit":
+    file_path = item["input"]["file_path"]
+    old_string = item["input"]["old_string"]
+    new_string = item["input"]["new_string"]
+```
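
The two fragments above assume an `item` that is already in scope. A fuller sketch that starts from raw JSONL lines might look like the following; `iter_tool_uses` is a hypothetical helper, and the fallback between top-level and nested `content` follows the Field Locations section of this document.

```python
import json
from typing import Iterable, Iterator


def iter_tool_uses(lines: Iterable[str], tool_name: str) -> Iterator[dict]:
    """Yield the `input` dict of every tool_use block named `tool_name`."""
    for line in lines:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip malformed lines
        if not isinstance(data, dict):
            continue
        message = data.get("message")
        nested = message.get("content") if isinstance(message, dict) else None
        content = data.get("content") or nested
        if not isinstance(content, list):
            continue
        for item in content:
            is_match = (
                isinstance(item, dict)
                and item.get("type") == "tool_use"
                and item.get("name") == tool_name
            )
            if is_match:
                yield item.get("input") or {}
```

Calling `iter_tool_uses(open_file, "Write")` or `iter_tool_uses(open_file, "Edit")` then covers both patterns with one loop.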
+
+### Extracting Text Content
+
+```python
+for item in message_content:
+    if item.get("type") == "text":
+        text = item.get("text", "")
+```
+
+## Field Locations
+
+Due to schema variations, some fields may appear in different locations:
+
+### Role Field
+
+```python
+role = data.get("role") or data.get("message", {}).get("role")
+```
+
+### Content Field
+
+```python
+content = data.get("content") or data.get("message", {}).get("content", [])
+```
+
+### Timestamp Field
+
+```python
+timestamp = data.get("timestamp", "")
+```
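
The three lookups can be combined into one normalization step. This is a sketch only; `normalize_record` is a hypothetical helper, and it additionally guards against a `message` value that is missing or not an object, which the one-line lookups above do not.

```python
def normalize_record(data: dict) -> dict:
    """Return role, content, and timestamp from either field location."""
    message = data.get("message")
    if not isinstance(message, dict):
        message = {}  # Guard: "message" may be absent or null.
    return {
        "role": data.get("role") or message.get("role"),
        "content": data.get("content") or message.get("content") or [],
        "timestamp": data.get("timestamp", ""),
    }
```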
+
+## Common Use Cases
+
+### Recover Deleted Files
+
+1. Search for `Write` tool calls with matching file path
+2. Extract `input.content` from latest occurrence
+3. Save to disk with original filename
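
The three steps above can be sketched as one function over already-parsed records. `recover_latest_write` is a hypothetical helper, not one of the bundled scripts, and it assumes records arrive in chronological order so that the last match is the latest version.

```python
from pathlib import Path
from typing import Iterable, Optional


def recover_latest_write(
    records: Iterable[dict], target: str, out_dir: str
) -> Optional[Path]:
    """Save the last Write content for `target` under `out_dir`."""
    latest = None
    for record in records:
        message = record.get("message")
        nested = message.get("content") if isinstance(message, dict) else None
        content = record.get("content") or nested
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") == "tool_use" and item.get("name") == "Write":
                tool_input = item.get("input") or {}
                if tool_input.get("file_path") == target:
                    latest = tool_input.get("content", "")  # Later wins
    if latest is None:
        return None
    out_path = Path(out_dir) / Path(target).name  # Keep original filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(latest, encoding="utf-8")
    return out_path
```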
+
+### Track File Changes
+
+1. Find all `Edit` and `Write` operations for a file
+2. Build chronological list of changes
+3. Reconstruct file history
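
A sketch of steps 1 and 2, using a hypothetical `file_history` helper over parsed records. It assumes every record carries an ISO 8601 `timestamp` in the same format as the example above, so plain string sorting yields chronological order; records without a timestamp sort first.

```python
from typing import Iterable, List


def file_history(records: Iterable[dict], target: str) -> List[dict]:
    """Collect Write and Edit operations on `target`, oldest first."""
    history = []
    for record in records:
        message = record.get("message")
        nested = message.get("content") if isinstance(message, dict) else None
        content = record.get("content") or nested
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict) or item.get("type") != "tool_use":
                continue
            tool_input = item.get("input") or {}
            if (
                item.get("name") in ("Write", "Edit")
                and tool_input.get("file_path") == target
            ):
                history.append({
                    "timestamp": record.get("timestamp", ""),
                    "tool": item["name"],
                    "input": tool_input,
                })
    # Same-format ISO 8601 strings sort chronologically as text.
    history.sort(key=lambda change: change["timestamp"])
    return history
```

Replaying the returned list (a `Write` sets the content, an `Edit` applies `old_string` to `new_string`) is one possible way to approach step 3.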
+
+### Search Conversations
+
+1. Extract all `text` content from messages
+2. Search for keywords or patterns
+3. Return matching sessions
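
These steps might be sketched as follows. `matching_sessions` and `record_texts` are hypothetical names, the input is assumed to be a mapping from session name to parsed records, and ranking by total hit count is an illustrative choice rather than the behavior of the bundled scripts.

```python
from typing import Dict, Iterable, Iterator, List


def record_texts(record: dict) -> Iterator[str]:
    """Yield the text of every text block in one record."""
    message = record.get("message")
    nested = message.get("content") if isinstance(message, dict) else None
    content = record.get("content") or nested
    if isinstance(content, list):
        for item in content:
            if isinstance(item, dict) and item.get("type") == "text":
                yield item.get("text", "")


def matching_sessions(
    sessions: Dict[str, Iterable[dict]], keywords: List[str]
) -> List[str]:
    """Return session names with at least one hit, most hits first."""
    lowered = [k.lower() for k in keywords]
    scored = []
    for name, records in sessions.items():
        hits = 0
        for record in records:
            for text in record_texts(record):
                haystack = text.lower()
                hits += sum(haystack.count(k) for k in lowered)
        if hits:
            scored.append((hits, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```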
+
+### Analyze Tool Usage
+
+1. Count occurrences of each tool type
+2. Track which files were accessed
+3. Generate usage statistics
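
A minimal sketch of the counting steps, assuming parsed records as input. `tool_usage` is a hypothetical helper; it treats any `tool_use` block with an `input.file_path` as a file access.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def tool_usage(records: Iterable[dict]) -> Tuple[Counter, Set[str]]:
    """Count tool_use blocks per tool name and collect touched file paths."""
    counts: Counter = Counter()
    files: Set[str] = set()
    for record in records:
        message = record.get("message")
        nested = message.get("content") if isinstance(message, dict) else None
        content = record.get("content") or nested
        if not isinstance(content, list):
            continue
        for item in content:
            if isinstance(item, dict) and item.get("type") == "tool_use":
                counts[item.get("name", "unknown")] += 1
                path = (item.get("input") or {}).get("file_path")
                if path:
                    files.add(path)
    return counts, files
```

`counts.most_common()` then gives a ready-made ranking for a usage report.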
+
+## Edge Cases
+
+### Empty Content
+
+Some messages may have empty content arrays:
+
+```python
+content = data.get("content") or data.get("message", {}).get("content", [])
+if not content:
+    continue
+```
+
+### Missing Fields
+
+Always use `.get()` with defaults:
+
+```python
+file_path = item.get("input", {}).get("file_path", "")
+```
+
+### JSON Decode Errors
+
+Session files may contain malformed lines:
+
+```python
+try:
+    data = json.loads(line)
+except json.JSONDecodeError:
+    continue  # Skip malformed lines
+```
+
+### Large Files
+
+Session files can be very large (>100MB). Process line-by-line:
+
+```python
+with open(session_file, 'r', encoding='utf-8') as f:
+    for line in f:  # Streaming, not f.read()
+        process_line(line)
+```
+
+## Performance Tips
+
+### Memory Efficiency
+
+- Process files line-by-line (streaming)
+- Don't load entire file into memory
+- Use generators for large result sets
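
The three points above can be combined in a generator that streams records one at a time. `iter_records` is a hypothetical helper; it assumes UTF-8 session files and silently skips lines that are malformed or not JSON objects.

```python
import json
from typing import Iterator


def iter_records(session_file: str) -> Iterator[dict]:
    """Lazily yield one parsed record per valid JSONL line."""
    with open(session_file, "r", encoding="utf-8") as f:
        for line in f:  # Streaming: only one line is held at a time
            try:
                data = json.loads(line)
            except json.JSONDecodeError:
                continue  # Skip malformed or blank lines
            if isinstance(data, dict):
                yield data
```

Because nothing is parsed until the caller iterates, an expression such as `sum(1 for _ in iter_records(path))` counts records without building a list.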
+
+### Search Optimization
+
+- Early exit when keyword count threshold met
+- Case-insensitive search: normalize once
+- Use `in` operator for substring matching
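
A sketch of how these three tips fit together. `meets_threshold` is a hypothetical function, and counting one hit per keyword per text is an assumption about what "keyword count" means here.

```python
from typing import Iterable, List


def meets_threshold(texts: Iterable[str], keywords: List[str], threshold: int) -> bool:
    """Return True as soon as `threshold` keyword hits have been seen."""
    lowered = [k.lower() for k in keywords]  # Normalize keywords once
    hits = 0
    for text in texts:
        haystack = text.lower()  # Normalize each text once
        for keyword in lowered:
            if keyword in haystack:  # Substring match via `in`
                hits += 1
                if hits >= threshold:
                    return True  # Early exit: remaining texts are not read
    return False
```

When `texts` is a generator over a large session, the early return means the rest of the file is never parsed.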
+
+### Deduplication
+
+When recovering files, keep latest version only:
+
+```python
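+files_by_path = {}
+for call in write_calls:  # write_calls: Write tool_use blocks, oldest first
+    file_path = call.get("input", {}).get("file_path", "")
+    if file_path:
+        files_by_path[file_path] = call  # Overwrites earlier versions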
+```
+
+## Security Considerations
+
+### Personal Information
+
+Session files may contain:
+- Absolute file paths with usernames
+- API keys or credentials in code
+- Company-specific information
+- Private conversations
+
+### Safe Sharing
+
+Before sharing extracted content:
+1. Remove absolute paths
+2. Redact sensitive information
+3. Use placeholders for usernames
+4. Verify no credentials present
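
Parts of this checklist can be automated. The sketch below is illustrative only: `redact` and both regular expressions are hypothetical, the home-directory pattern assumes `/Users/<name>` or `/home/<name>` layouts, and the credential pattern catches only simple `key = value` forms, so a manual review is still needed for step 4.

```python
import re

# Assumed layouts: macOS-style /Users/<name> and Linux-style /home/<name>.
HOME_PATH = re.compile(r"(/Users|/home)/[^/\s]+")
# Hypothetical pattern: matches only simple "name = value" or "name: value".
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)\b(\s*[:=]\s*)(\S+)")


def redact(text: str) -> str:
    """Replace usernames in home paths and simple credential values."""
    text = HOME_PATH.sub(r"\1/<user>", text)
    return SECRET.sub(r"\1\2<redacted>", text)
```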
--- a/claude-code-history-files-finder/references/workflow_examples.md
+++ b/claude-code-history-files-finder/references/workflow_examples.md
@@ -0,0 +1,88 @@
+# Workflow Examples
+
+Detailed workflow examples for common session history recovery scenarios.
+
+## Recover Files Deleted in Cleanup
+
+**Scenario**: Files were deleted during code review, and specific components need to be recovered.
+
+```bash
+# 1. Find sessions mentioning the deleted files
+python3 scripts/analyze_sessions.py search /path/to/project \
+    DeletedComponent ModelScreen RemovedFeature
+
+# 2. Recover content from most relevant session
+python3 scripts/recover_content.py ~/.claude/projects/.../session-id.jsonl \
+    -k DeletedComponent ModelScreen \
+    -o ./recovered/
+
+# 3. Review recovered files
+ls -lh ./recovered/
+```
+
+## Track File Evolution Across Sessions
+
+**Scenario**: Understand how a file changed over multiple sessions.
+
+```bash
+# 1. Find sessions that modified the file
+python3 scripts/analyze_sessions.py search /path/to/project \
+    "componentName.jsx"
+
+# 2. Analyze each session's file operations
+for session in session1.jsonl session2.jsonl session3.jsonl; do
+    python3 scripts/analyze_sessions.py stats "$session" --show-files | \
+        grep "componentName.jsx"
+done
+
+# 3. Recover all versions
+python3 scripts/recover_content.py session1.jsonl -k componentName -o ./v1/
+python3 scripts/recover_content.py session2.jsonl -k componentName -o ./v2/
+python3 scripts/recover_content.py session3.jsonl -k componentName -o ./v3/
+
+# 4. Compare versions
+diff ./v1/componentName.jsx ./v2/componentName.jsx
+```
+
+## Find Session with Specific Implementation
+
+**Scenario**: You remember implementing a feature but cannot find the session that contains it.
+
+```bash
+# Search for distinctive keywords from that implementation
+python3 scripts/analyze_sessions.py search /path/to/project \
+    "useModelStatus" "downloadProgress" "ModelScope"
+
+# Review top match
+python3 scripts/analyze_sessions.py stats <top-result-session.jsonl>
+```
+
+## Batch Recovery Across Multiple Sessions
+
+**Scenario**: Recover files containing a keyword from all matching sessions.
+
+```bash
+# Find relevant sessions
+sessions=$(python3 scripts/analyze_sessions.py search /path/to/project \
+    keyword --limit 999 | grep "Path:" | awk '{print $2}')
+
+# Recover from each session
+for session in $sessions; do
+    output_dir="./recovery_$(basename "$session" .jsonl)"
+    python3 scripts/recover_content.py "$session" -k keyword -o "$output_dir"
+done
+```
+
+## Custom Extraction from Raw JSONL
+
+For extraction needs not covered by bundled scripts:
+
+```python
+import json
+
+with open('session.jsonl', 'r', encoding='utf-8') as f:
+    for line in f:
+        try:
+            data = json.loads(line)
+        except json.JSONDecodeError:
+            continue  # Skip malformed lines
+        # Custom extraction logic
+        # See references/session_file_format.md for structure
+```