Claude Code Session File Format
Overview
Claude Code stores conversation history in JSONL (JSON Lines) format, where each line is a complete JSON object representing a message or event in the conversation.
File Locations
Session Files
~/.claude/projects/<normalized-project-path>/<session-id>.jsonl
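Path normalization: the absolute project path is converted by replacing every / with -, so the home directory shown as ~ below appears in expanded form in the directory name.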
Example:
- Project: ~/Workspace/js/myproject
- Directory: ~/.claude/projects/-Users-<username>-Workspace-js-myproject/
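The normalization rule above can be sketched in a few lines of Python. This is a minimal sketch that assumes the only transformation is the `/` to `-` replacement described here; `normalize_project_path` and `session_dir` are hypothetical helper names, not part of Claude Code, and the example path is made up.

```python
from pathlib import Path


def normalize_project_path(project_path: str) -> str:
    """Apply the documented rule: every "/" becomes "-"."""
    return project_path.replace("/", "-")


def session_dir(project_path: str, claude_home: Path) -> Path:
    """Build <claude_home>/projects/<normalized-project-path>."""
    return claude_home / "projects" / normalize_project_path(project_path)


# Hypothetical absolute project path, used only for illustration.
print(normalize_project_path("/Users/example/Workspace/js/myproject"))
# -> -Users-example-Workspace-js-myproject
```

Pass the absolute path (with ~ already expanded), since the directory name is derived from the full path.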
File Types
| Pattern | Type | Description |
|---|---|---|
| <uuid>.jsonl | Main session | User conversation sessions |
| agent-<id>.jsonl | Agent session | Sub-agent execution logs |
JSON Structure
Message Object
Each line in a session file is a JSON object with roughly this shape (the | notation marks alternative values and is not literal JSON):
{
"role": "user" | "assistant",
"message": {
"role": "user" | "assistant",
"content": [...]
},
"timestamp": "2025-11-26T00:00:00.000Z",
"uuid": "message-uuid",
"parentUuid": "parent-message-uuid",
"sessionId": "session-uuid"
}
Content Types
The content array contains different types of content blocks:
Text Content
{
"type": "text",
"text": "Message text content"
}
Tool Use (Write)
{
"type": "tool_use",
"name": "Write",
"input": {
"file_path": "/absolute/path/to/file.js",
"content": "File content here..."
}
}
Tool Use (Edit)
{
"type": "tool_use",
"name": "Edit",
"input": {
"file_path": "/absolute/path/to/file.js",
"old_string": "Original text",
"new_string": "Replacement text",
"replace_all": false
}
}
Tool Use (Read)
{
"type": "tool_use",
"name": "Read",
"input": {
"file_path": "/absolute/path/to/file.js",
"offset": 0,
"limit": 100
}
}
Tool Use (Bash)
{
"type": "tool_use",
"name": "Bash",
"input": {
"command": "ls -la",
"description": "List files"
}
}
Tool Result
{
"type": "tool_result",
"tool_use_id": "tool-use-uuid",
"content": "Result content",
"is_error": false
}
Common Extraction Patterns
Finding Write Operations
Look for content blocks in assistant messages with type "tool_use" and name "Write":
if item.get("type") == "tool_use" and item.get("name") == "Write":
    file_path = item["input"]["file_path"]
    content = item["input"]["content"]
Finding Edit Operations
if item.get("type") == "tool_use" and item.get("name") == "Edit":
    file_path = item["input"]["file_path"]
    old_string = item["input"]["old_string"]
    new_string = item["input"]["new_string"]
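The two checks above can be combined into one pass over a message's content array. The following is a minimal sketch: `iter_file_operations` is a hypothetical helper name, and it assumes `content` is a list of blocks shaped like the Tool Use examples in this document.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_file_operations(
    content: Iterable[Any],
) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (tool_name, file_path, tool_input) for each Write or Edit block."""
    for item in content:
        if not isinstance(item, dict):
            continue  # Ignore anything that is not a content block
        if item.get("type") != "tool_use":
            continue
        if item.get("name") not in ("Write", "Edit"):
            continue
        tool_input = item.get("input") or {}
        file_path = tool_input.get("file_path", "")
        if file_path:
            yield item["name"], file_path, tool_input
```

Yielding the whole `input` object lets the caller read `content` for Write calls or `old_string` and `new_string` for Edit calls without a second lookup.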
Extracting Text Content
for item in message_content:
    if item.get("type") == "text":
        text = item.get("text", "")
Field Locations
Due to schema variations, some fields may appear in different locations:
Role Field
role = data.get("role") or data.get("message", {}).get("role")
Content Field
content = data.get("content") or data.get("message", {}).get("content", [])
Timestamp Field
timestamp = data.get("timestamp", "")
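The three lookups above fit naturally into one helper. This is a sketch under stated assumptions: `extract_fields` is a hypothetical name, and wrapping a plain-string `content` value into a single text block is an assumption added for robustness, not something this document specifies.

```python
from typing import Any, Dict, List, Tuple


def extract_fields(data: Dict[str, Any]) -> Tuple[str, List[Any], str]:
    """Return (role, content, timestamp), checking both field locations."""
    message = data.get("message")
    if not isinstance(message, dict):
        message = {}
    role = data.get("role") or message.get("role") or ""
    content = data.get("content") or message.get("content") or []
    if isinstance(content, str):
        # Assumption: treat a bare string as one text block.
        content = [{"type": "text", "text": content}]
    timestamp = data.get("timestamp", "")
    return role, content, timestamp
```

Top-level fields take precedence over the nested `message` fields, matching the order of the `or` expressions above.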
Common Use Cases
Recover Deleted Files
- Search for Write tool calls with matching file path
- Extract input.content from latest occurrence
- Save to disk with original filename
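The recovery steps above can be sketched as follows. `recover_file` and `_blocks` are hypothetical helper names; the sketch takes any iterable of already-parsed message objects and assumes they arrive in file order, so the last matching Write is treated as the latest.

```python
from typing import Any, Dict, Iterable, List, Optional


def _blocks(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return content blocks from the top-level or nested location."""
    message = data.get("message") if isinstance(data.get("message"), dict) else {}
    content = data.get("content") or message.get("content") or []
    if not isinstance(content, list):
        return []
    return [block for block in content if isinstance(block, dict)]


def recover_file(
    records: Iterable[Dict[str, Any]], target_path: str
) -> Optional[str]:
    """Return the content of the last Write to target_path, or None."""
    latest: Optional[str] = None
    for data in records:
        for item in _blocks(data):
            if item.get("type") != "tool_use" or item.get("name") != "Write":
                continue
            tool_input = item.get("input") or {}
            if tool_input.get("file_path") == target_path:
                latest = tool_input.get("content", "")  # Later calls win
    return latest
```

The returned string can then be written to a file named after the last component of `target_path`. Edit calls made after the last Write are not applied here, so the result may be older than the file's final state.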
Track File Changes
- Find all Edit and Write operations for a file
- Build chronological list of changes
- Reconstruct file history
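One way to sketch these steps is shown below. `file_history` and `replay` are hypothetical names. The sketch assumes all timestamps use the ISO 8601 UTC form shown earlier (so string sorting is chronological), and it assumes an Edit replaces the first occurrence of `old_string` unless `replace_all` is true; that Edit behavior is an assumption, not something this document states.

```python
from typing import Any, Dict, Iterable, List


def file_history(
    records: Iterable[Dict[str, Any]], target_path: str
) -> List[Dict[str, Any]]:
    """Collect Write/Edit calls for target_path, sorted by timestamp."""
    changes: List[Dict[str, Any]] = []
    for data in records:
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict) or item.get("type") != "tool_use":
                continue
            if item.get("name") not in ("Write", "Edit"):
                continue
            tool_input = item.get("input") or {}
            if tool_input.get("file_path") == target_path:
                changes.append({
                    "timestamp": data.get("timestamp", ""),
                    "tool": item["name"],
                    "input": tool_input,
                })
    changes.sort(key=lambda change: change["timestamp"])  # Stable sort
    return changes


def replay(changes: List[Dict[str, Any]]) -> List[str]:
    """Return the file text after each change, oldest first."""
    text = ""
    versions: List[str] = []
    for change in changes:
        tool_input = change["input"]
        if change["tool"] == "Write":
            text = tool_input.get("content", "")
        else:
            old = tool_input.get("old_string", "")
            new = tool_input.get("new_string", "")
            count = -1 if tool_input.get("replace_all") else 1
            if old:
                text = text.replace(old, new, count)
        versions.append(text)
    return versions
```

The replay only reflects calls recorded in the session. It cannot tell whether a tool call succeeded, and it starts from an empty string if the history begins with an Edit.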
Search Conversations
- Extract all text content from messages
- Search for keywords or patterns
- Return matching sessions
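A keyword search along these lines might look like the sketch below. `search_sessions` is a hypothetical name; it takes an iterable of already-parsed message objects, does a case-insensitive substring match on text blocks, and returns the `sessionId` values of matching messages in first-seen order.

```python
from typing import Any, Dict, Iterable, List


def search_sessions(records: Iterable[Dict[str, Any]], keyword: str) -> List[str]:
    """Return session IDs whose text blocks contain keyword (case-insensitive)."""
    needle = keyword.lower()  # Normalize the keyword once
    matches: Dict[str, None] = {}  # dict keeps insertion order
    for data in records:
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict) or item.get("type") != "text":
                continue
            if needle in str(item.get("text", "")).lower():
                matches.setdefault(data.get("sessionId", ""), None)
                break  # One hit per message is enough
    return list(matches)
```

For pattern matching rather than plain keywords, the `in` test could be swapped for a compiled regular expression.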
Analyze Tool Usage
- Count occurrences of each tool type
- Track which files were accessed
- Generate usage statistics
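These counts can be gathered in a single pass with `collections.Counter`. The sketch below uses a hypothetical `tool_usage` function that takes an iterable of already-parsed message objects; it counts a file as accessed whenever a tool_use block carries an `input.file_path`.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def tool_usage(records: Iterable[Dict[str, Any]]) -> Tuple[Counter, Counter]:
    """Return (calls per tool name, calls per file path)."""
    tools: Counter = Counter()
    files: Counter = Counter()
    for data in records:
        message = data.get("message") if isinstance(data.get("message"), dict) else {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict) or item.get("type") != "tool_use":
                continue
            tools[item.get("name", "unknown")] += 1
            file_path = (item.get("input") or {}).get("file_path")
            if file_path:
                files[file_path] += 1
    return tools, files
```

`tools.most_common(5)` then gives the five most frequently used tools.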
Edge Cases
Empty Content
Some messages may have empty content arrays:
content = data.get("content") or data.get("message", {}).get("content", [])
if not content:
    continue
Missing Fields
Always use .get() with defaults:
file_path = item.get("input", {}).get("file_path", "")
JSON Decode Errors
Session files may contain malformed lines:
try:
    data = json.loads(line)
except json.JSONDecodeError:
    continue  # Skip malformed lines
Large Files
Session files can be very large (>100MB). Process line-by-line:
with open(session_file, 'r') as f:
    for line in f:  # Streaming, not f.read()
        process_line(line)
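The edge cases in this section can be handled together in one generator. This is a minimal sketch: `iter_records` is a hypothetical name, and it accepts any iterable of lines (an open file object works) so that only one line is held in memory at a time.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per valid line, skipping bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Blank line
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip malformed lines
        if isinstance(data, dict):
            yield data  # Ignore valid JSON that is not an object
```

Typical use is `with open(session_file, "r", encoding="utf-8") as f:` followed by `for data in iter_records(f):`, which streams the file instead of reading it whole.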
Performance Tips
Memory Efficiency
- Process files line-by-line (streaming)
- Don't load entire file into memory
- Use generators for large result sets
Search Optimization
- Early exit when keyword count threshold met
- Case-insensitive search: normalize once
- Use the in operator for substring matching
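The three tips above can be combined in a short sketch. `has_enough_hits` is a hypothetical helper; it lowercases the keyword once, uses `in` for substring matching, and stops consuming input as soon as the threshold is reached.

```python
from typing import Iterable


def has_enough_hits(texts: Iterable[str], keyword: str, threshold: int) -> bool:
    """Return True once `threshold` texts contain keyword (case-insensitive)."""
    if threshold <= 0:
        return True
    needle = keyword.lower()  # Normalize the keyword once
    hits = 0
    for text in texts:
        if needle in text.lower():
            hits += 1
            if hits >= threshold:
                return True  # Early exit: remaining texts are never read
    return False
```

Because `texts` can be a generator, the early exit also avoids parsing the rest of a large session file.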
Deduplication
When recovering files, keep latest version only:
files_by_path = {}
for call in write_calls:
    file_path = call["input"]["file_path"]
    files_by_path[file_path] = call  # Later calls overwrite earlier versions
Security Considerations
Personal Information
Session files may contain:
- Absolute file paths with usernames
- API keys or credentials in code
- Company-specific information
- Private conversations
Safe Sharing
Before sharing extracted content:
- Remove absolute paths
- Redact sensitive information
- Use placeholders for usernames
- Verify no credentials present
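As a starting point for the path-related steps above, home directory prefixes can be replaced with a placeholder. This is only a sketch: `redact_home_paths` is a hypothetical helper, the `/Users/` and `/home/` prefixes are assumptions about where home directories live, and the function does nothing about credentials or other sensitive content, which still need a separate review.

```python
import re

# Matches /Users/<name> or /home/<name>; the name ends at the next
# slash, whitespace, or quote character.
_HOME_PATTERN = re.compile(r"/(?:Users|home)/[^/\s\"']+")


def redact_home_paths(text: str, placeholder: str = "<home>") -> str:
    """Replace home directory prefixes in text with a placeholder."""
    return _HOME_PATTERN.sub(placeholder, text)


print(redact_home_paths("/Users/example/project/a.js"))
# -> <home>/project/a.js
```

Normalized directory names such as -Users-<username>-Workspace-js-myproject are not covered, because a username containing a hyphen cannot be told apart from the path separators.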