feat(history-finder): Add claude-code-history-files-finder skill

Add new skill for finding and recovering content from Claude Code
session history files (.claude/projects/).

Features:
- Search sessions by keywords across project history
- Recover deleted files from Write tool calls
- Analyze session statistics and tool usage
- Track file evolution across multiple sessions

Best practice improvements applied:
- Third-person description in frontmatter
- Imperative writing style throughout
- Progressive disclosure (workflows in references/)
- No content duplication between SKILL.md and references
- Proper exception handling in scripts
- Documented magic numbers

Marketplace integration:
- Updated marketplace.json (v1.13.0, 20 plugins)
- Updated README.md badges, skill section, use cases
- Updated README.zh-CN.md with Chinese translations
- Updated CLAUDE.md skill count and available skills list
- Updated CHANGELOG.md with v1.13.0 entry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
daymade
2025-12-09 14:44:57 +08:00
parent 31a535b409
commit 20cc442ec4
11 changed files with 1736 additions and 9 deletions

@@ -0,0 +1,285 @@
# Claude Code Session File Format
## Overview
Claude Code stores conversation history in JSONL (JSON Lines) format, where each line is a complete JSON object representing a message or event in the conversation.
## File Locations
### Session Files
```
~/.claude/projects/<normalized-project-path>/<session-id>.jsonl
```
**Path normalization**: The project's absolute path becomes the directory name, with every `/` replaced by `-`.
Example:
- Project: `/Users/username/Workspace/js/myproject`
- Directory: `~/.claude/projects/-Users-username-Workspace-js-myproject/`
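
The normalization rule can be sketched as a small helper. This is a minimal illustration that assumes `/` is the only character rewritten, as stated above; the function name `project_history_dir` and its `claude_home` parameter are hypothetical and not part of the bundled scripts.

```python
from pathlib import Path
from typing import Optional


def project_history_dir(project_path: str, claude_home: Optional[Path] = None) -> Path:
    """Return the directory assumed to hold a project's session files."""
    # Assumption: only "/" is rewritten, exactly as the rule above states.
    normalized = project_path.replace("/", "-")
    base = claude_home if claude_home is not None else Path.home() / ".claude"
    return base / "projects" / normalized


print(project_history_dir("/Users/username/Workspace/js/myproject"))
```

If the computed directory does not exist, listing `~/.claude/projects/` and matching by eye is the safer fallback.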
### File Types
| Pattern | Type | Description |
|---------|------|-------------|
| `<uuid>.jsonl` | Main session | User conversation sessions |
| `agent-<id>.jsonl` | Agent session | Sub-agent execution logs |
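
The two patterns in the table can be told apart by filename alone. The sketch below assumes that every `.jsonl` file without the `agent-` prefix is a main session; `classify_session_file` is a hypothetical helper name.

```python
from pathlib import Path


def classify_session_file(path) -> str:
    """Classify a file as 'agent', 'main', or 'other' by its name alone."""
    name = Path(path).name
    if not name.endswith(".jsonl"):
        return "other"
    # Agent logs are the only pattern with a fixed prefix, so test for it first.
    if name.startswith("agent-"):
        return "agent"
    return "main"
```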
## JSON Structure
### Message Object
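Most lines in a session file follow the general structure below. The snippet is schematic rather than literal JSON: `|` separates alternative values and `[...]` stands for the content blocks described in the next section.
```json
{
  "role": "user" | "assistant",
  "message": {
    "role": "user" | "assistant",
    "content": [...]
  },
  "timestamp": "2025-11-26T00:00:00.000Z",
  "uuid": "message-uuid",
  "parentUuid": "parent-message-uuid",
  "sessionId": "session-uuid"
}
```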
### Content Types
The `content` array contains different types of content blocks:
#### Text Content
```json
{
  "type": "text",
  "text": "Message text content"
}
```
#### Tool Use (Write)
```json
{
  "type": "tool_use",
  "name": "Write",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "content": "File content here..."
  }
}
```
#### Tool Use (Edit)
```json
{
  "type": "tool_use",
  "name": "Edit",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "old_string": "Original text",
    "new_string": "Replacement text",
    "replace_all": false
  }
}
```
#### Tool Use (Read)
```json
{
  "type": "tool_use",
  "name": "Read",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "offset": 0,
    "limit": 100
  }
}
```
#### Tool Use (Bash)
```json
{
  "type": "tool_use",
  "name": "Bash",
  "input": {
    "command": "ls -la",
    "description": "List files"
  }
}
```
#### Tool Result
```json
{
  "type": "tool_result",
  "tool_use_id": "tool-use-uuid",
  "content": "Result content",
  "is_error": false
}
```
## Common Extraction Patterns
### Finding Write Operations
Look for content blocks in assistant messages with `type: "tool_use"` and `name: "Write"`:
```python
# `item` is one block from a message's `content` array
if item.get("type") == "tool_use" and item.get("name") == "Write":
    file_path = item["input"]["file_path"]
    content = item["input"]["content"]
```
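
To show where the `item` check sits in a full pass over a session, here is a minimal sketch. It takes any iterable of JSONL lines (an open file works), and it assumes content blocks live under either `content` or `message.content`, as described under Field Locations. The name `iter_write_calls` is hypothetical.

```python
import json


def iter_write_calls(lines):
    """Yield (file_path, content) for each Write tool call in JSONL lines."""
    for line in lines:  # stream line by line; session files can be large
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or blank lines
        if not isinstance(data, dict):
            continue
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue  # plain-string content carries no tool calls
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") == "tool_use" and item.get("name") == "Write":
                tool_input = item.get("input") or {}
                yield tool_input.get("file_path", ""), tool_input.get("content", "")
```

Typical use would be `with open(session_file, encoding="utf-8") as f: calls = list(iter_write_calls(f))`.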
### Finding Edit Operations
```python
# `item` is one block from a message's `content` array
if item.get("type") == "tool_use" and item.get("name") == "Edit":
    file_path = item["input"]["file_path"]
    old_string = item["input"]["old_string"]
    new_string = item["input"]["new_string"]
```
### Extracting Text Content
```python
for item in message_content:
    if item.get("type") == "text":
        text = item.get("text", "")
```
## Field Locations
Due to schema variations, some fields may appear in different locations:
### Role Field
```python
role = data.get("role") or data.get("message", {}).get("role")
```
### Content Field
```python
content = data.get("content") or data.get("message", {}).get("content", [])
```
### Timestamp Field
```python
timestamp = data.get("timestamp", "")
```
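
The three lookups can be folded into one helper so the rest of a script never touches the raw layout. This is a sketch only: `normalize_record` is a hypothetical name, and wrapping a bare string `content` into a single text block is an assumption about a variation that may occur, not something the format description above guarantees.

```python
def normalize_record(data: dict) -> dict:
    """Return role, content blocks, and timestamp from either layout."""
    message = data.get("message")
    if not isinstance(message, dict):
        message = {}
    role = data.get("role") or message.get("role")
    content = data.get("content") or message.get("content") or []
    # Assumption: content may occasionally be a bare string; wrap it as one text block.
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
    return {"role": role, "content": content, "timestamp": data.get("timestamp", "")}
```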
## Common Use Cases
### Recover Deleted Files
1. Search for `Write` tool calls with matching file path
2. Extract `input.content` from latest occurrence
3. Save to disk with original filename
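
The three recovery steps can be sketched as one function over already-parsed records. It assumes records arrive in file order and that file order is chronological, so the last matching `Write` is the latest. `recover_latest` is a hypothetical helper, not the bundled `recover_content.py`.

```python
from pathlib import Path


def recover_latest(records, target_path, out_dir):
    """Write the last Write-call content for target_path into out_dir."""
    latest = None
    for data in records:
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") != "tool_use" or item.get("name") != "Write":
                continue
            tool_input = item.get("input") or {}
            if tool_input.get("file_path") == target_path:
                latest = tool_input.get("content", "")  # later calls replace earlier ones
    if latest is None:
        return None  # no Write call touched this path
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    destination = out_dir / Path(target_path).name  # keep the original filename
    destination.write_text(latest, encoding="utf-8")
    return destination
```

The function returns the path it wrote, or `None` when no `Write` call matched, so callers can report missing files instead of creating empty ones.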
### Track File Changes
1. Find all `Edit` and `Write` operations for a file
2. Build chronological list of changes
3. Reconstruct file history
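
One way to sketch these steps is to collect `Write` and `Edit` calls for a path, sort them by timestamp, and replay them. The code assumes all timestamps share the ISO 8601 format shown earlier, so string order matches time order, and it leaves the text as `None` for any `Edit` that precedes the first `Write`. `file_history` is a hypothetical name.

```python
def file_history(records, target_path):
    """Return [(timestamp, tool_name, reconstructed_text), ...] for one file."""
    operations = []
    for data in records:
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict) or item.get("type") != "tool_use":
                continue
            tool_input = item.get("input") or {}
            if item.get("name") in ("Write", "Edit") and tool_input.get("file_path") == target_path:
                operations.append((data.get("timestamp", ""), item["name"], tool_input))

    # Assumption: uniform ISO 8601 timestamps, so lexicographic order is time order.
    operations.sort(key=lambda op: op[0])

    history, text = [], None
    for timestamp, name, tool_input in operations:
        if name == "Write":
            text = tool_input.get("content", "")  # a Write replaces the whole file
        else:
            old = tool_input.get("old_string", "")
            if text is not None and old:
                count = -1 if tool_input.get("replace_all") else 1
                text = text.replace(old, tool_input.get("new_string", ""), count)
        history.append((timestamp, name, text))
    return history
```

The replay is only an approximation of the file on disk, because changes made outside the session (manual edits, shell commands) leave no `Write` or `Edit` record.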
### Search Conversations
1. Extract all `text` content from messages
2. Search for keywords or patterns
3. Return matching sessions
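
A minimal sketch of these steps follows. It takes a mapping from session name to an iterable of JSONL lines, counts case-insensitive keyword occurrences in `text` blocks, and returns the matching sessions ranked by hit count. Both function names are hypothetical, and the ranking is an illustrative choice rather than the behaviour of `analyze_sessions.py`.

```python
import json


def iter_text(lines):
    """Yield the text of every `text` block found in JSONL lines."""
    for line in lines:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(data, dict):
            continue
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if isinstance(item, dict) and item.get("type") == "text":
                yield item.get("text", "")


def search_sessions(sessions, keywords):
    """Return [(session_name, hit_count), ...] for sessions with at least one hit."""
    needles = [k.lower() for k in keywords if k]  # normalize keywords once
    results = []
    for name, lines in sessions.items():
        hits = 0
        for text in iter_text(lines):
            lowered = text.lower()
            hits += sum(lowered.count(needle) for needle in needles)
        if hits:
            results.append((name, hits))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```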
### Analyze Tool Usage
1. Count occurrences of each tool type
2. Track which files were accessed
3. Generate usage statistics
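
These counts map naturally onto `collections.Counter`. The sketch below works on already-parsed records and treats any `tool_use` block with an `input.file_path` as a file access, which is an assumption based on the `Write`, `Edit`, and `Read` shapes shown earlier. `tool_usage` is a hypothetical name.

```python
from collections import Counter


def tool_usage(records):
    """Return (calls per tool name, calls per file path) as two Counters."""
    tools, files = Counter(), Counter()
    for data in records:
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict) or item.get("type") != "tool_use":
                continue
            tools[item.get("name", "unknown")] += 1
            file_path = (item.get("input") or {}).get("file_path")
            if file_path:  # Bash and similar tools carry no file_path
                files[file_path] += 1
    return tools, files
```

`tools.most_common(5)` then gives the most frequently used tools for a session.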
## Edge Cases
### Empty Content
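Some messages have an empty content array. Skip them inside the per-line loop:
```python
content = data.get("content") or data.get("message", {}).get("content", [])
if not content:
    continue  # Nothing to extract from this message
```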
### Missing Fields
Always use `.get()` with defaults:
```python
file_path = item.get("input", {}).get("file_path", "")
```
### JSON Decode Errors
Session files may contain malformed lines:
```python
import json

try:
    data = json.loads(line)
except json.JSONDecodeError:
    continue  # Inside the per-line loop: skip malformed lines
```
### Large Files
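Session files can be very large (over 100 MB). Process them line by line instead of reading the whole file:
```python
with open(session_file, 'r', encoding='utf-8') as f:
    for line in f:  # Streaming, not f.read()
        process_line(line)  # process_line is a placeholder for your own handler
```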
## Performance Tips
### Memory Efficiency
- Process files line-by-line (streaming)
- Don't load entire file into memory
- Use generators for large result sets
### Search Optimization
- Early exit when keyword count threshold met
- Case-insensitive search: normalize once
- Use `in` operator for substring matching
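
The three search tips combine into a short routine. In this sketch a "match" means one keyword found in one text, so the threshold counts keyword-text pairs rather than every occurrence. That definition, like the name `meets_threshold`, is an assumption made for illustration.

```python
def meets_threshold(texts, keywords, threshold):
    """Return True as soon as `threshold` keyword matches have been seen."""
    needles = [k.lower() for k in keywords if k]  # normalize keywords once
    seen = 0
    for text in texts:
        lowered = text.lower()  # normalize each text once, not once per keyword
        for needle in needles:
            if needle in lowered:  # plain substring test
                seen += 1
                if seen >= threshold:
                    return True  # early exit: remaining texts are never read
    return False
```

Passing a generator as `texts` means the early exit also stops reading the underlying file.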
### Deduplication
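When recovering files, keep only the latest version of each path:
```python
files_by_path = {}
for call in write_calls:  # each call is a Write tool_use block, in chronological order
    file_path = call["input"]["file_path"]
    files_by_path[file_path] = call  # Overwrites earlier versions
```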
## Security Considerations
### Personal Information
Session files may contain:
- Absolute file paths with usernames
- API keys or credentials in code
- Company-specific information
- Private conversations
### Safe Sharing
Before sharing extracted content:
1. Remove absolute paths
2. Redact sensitive information
3. Use placeholders for usernames
4. Verify no credentials present
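
Parts of this checklist can be assisted by a script, although a manual review remains necessary. The sketch below replaces the username segment of macOS- and Linux-style home paths and flags lines that look like credential assignments. Both regular expressions are illustrative assumptions, not an exhaustive or vetted set, and the helper names are hypothetical.

```python
import re

# Illustrative patterns only; real sessions need a broader, reviewed set.
HOME_PATH = re.compile(r"/(Users|home)/[^/\s]+")
CREDENTIAL_HINT = re.compile(r"(api[_-]?key|secret|token|password)\s*[:=]", re.IGNORECASE)


def redact_paths(text: str) -> str:
    """Replace the username segment of home-directory paths with a placeholder."""
    return HOME_PATH.sub(r"/\1/<username>", text)


def credential_warnings(text: str) -> list:
    """Return the lines that look like they assign a credential."""
    return [line for line in text.splitlines() if CREDENTIAL_HINT.search(line)]
```

An empty result from `credential_warnings` does not prove the content is clean; it only means none of the listed patterns matched.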

@@ -0,0 +1,88 @@
# Workflow Examples
Detailed workflow examples for common session history recovery scenarios.
## Recover Files Deleted in Cleanup
**Scenario**: Files were deleted during code review, and specific components need to be recovered.
```bash
# 1. Find sessions mentioning the deleted files
python3 scripts/analyze_sessions.py search /path/to/project \
DeletedComponent ModelScreen RemovedFeature
# 2. Recover content from most relevant session
python3 scripts/recover_content.py ~/.claude/projects/.../session-id.jsonl \
-k DeletedComponent ModelScreen \
-o ./recovered/
# 3. Review recovered files
ls -lh ./recovered/
```
## Track File Evolution Across Sessions
**Scenario**: Understand how a file changed over multiple sessions.
```bash
# 1. Find sessions that modified the file
python3 scripts/analyze_sessions.py search /path/to/project \
"componentName.jsx"
# 2. Analyze each session's file operations
for session in session1.jsonl session2.jsonl session3.jsonl; do
  python3 scripts/analyze_sessions.py stats "$session" --show-files | \
    grep "componentName.jsx"
done
# 3. Recover all versions
python3 scripts/recover_content.py session1.jsonl -k componentName -o ./v1/
python3 scripts/recover_content.py session2.jsonl -k componentName -o ./v2/
python3 scripts/recover_content.py session3.jsonl -k componentName -o ./v3/
# 4. Compare versions
diff ./v1/componentName.jsx ./v2/componentName.jsx
```
## Find Session with Specific Implementation
**Scenario**: You remember implementing a feature but cannot find the session that contains it.
```bash
# Search for distinctive keywords from that implementation
python3 scripts/analyze_sessions.py search /path/to/project \
"useModelStatus" "downloadProgress" "ModelScope"
# Review top match
python3 scripts/analyze_sessions.py stats <top-result-session.jsonl>
```
## Batch Recovery Across Multiple Sessions
**Scenario**: Recover files containing a keyword from all matching sessions.
```bash
# Find relevant sessions
sessions=$(python3 scripts/analyze_sessions.py search /path/to/project \
keyword --limit 999 | grep "Path:" | awk '{print $2}')
# Recover from each session
for session in $sessions; do
  output_dir="./recovery_$(basename "$session" .jsonl)"
  python3 scripts/recover_content.py "$session" -k keyword -o "$output_dir"
done
```
## Custom Extraction from Raw JSONL
For extraction needs not covered by bundled scripts:
```python
import json

with open('session.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip malformed lines
        # Custom extraction logic
        # See references/session_file_format.md for structure
```
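
As one concrete instance of custom extraction logic, the sketch below lists every shell command run through the `Bash` tool, using the block shape documented in the format reference. `bash_commands` is a hypothetical helper; it accepts any iterable of JSONL lines, including an open file.

```python
import json


def bash_commands(lines):
    """Yield (timestamp, command) for each Bash tool call in JSONL lines."""
    for line in lines:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(data, dict):
            continue
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") == "tool_use" and item.get("name") == "Bash":
                command = (item.get("input") or {}).get("command", "")
                yield data.get("timestamp", ""), command
```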