claude-code-skills-reference/claude-code-history-files-finder/references/session_file_format.md
daymade 2833aaec42 Fix PII: replace /Users/username/ with ~ and <username> placeholder
Replace hardcoded user paths that triggered gitleaks PII detection:
- /Users/username/ → ~/
- /Users/user/ → ~/
- -Users-username- → -Users-<username>- (normalized paths)

Also fix the sed example to use <home> placeholder instead of
regex pattern that would match actual usernames.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:36:50 +08:00


Claude Code Session File Format

Overview

Claude Code stores conversation history in JSONL (JSON Lines) format, where each line is a complete JSON object representing a message or event in the conversation.

File Locations

Session Files

~/.claude/projects/<normalized-project-path>/<session-id>.jsonl

Path normalization: The absolute project path (with ~ expanded to the home directory) is converted by replacing each / with -

Example:

  • Project: ~/Workspace/js/myproject
  • Directory: ~/.claude/projects/-Users-<username>-Workspace-js-myproject/
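The normalization rule can be sketched in a few lines. This is a minimal illustration that applies only the "/" to "-" replacement stated above; the function names are hypothetical, and other characters in a path may be handled differently in practice.

```python
from pathlib import PurePosixPath


def normalize_project_path(absolute_path: str) -> str:
    """Apply the rule described above: replace every "/" with "-"."""
    return absolute_path.replace("/", "-")


def session_dir(absolute_path: str, claude_home: str = "~/.claude") -> str:
    """Build the directory that would hold the sessions for a project.

    Assumption: claude_home is the root shown in this document.
    """
    normalized = normalize_project_path(absolute_path)
    return str(PurePosixPath(claude_home) / "projects" / normalized)


print(session_dir("/Users/alice/Workspace/js/myproject"))
```

The input must already be absolute: a path that still starts with ~ would not produce the leading -Users-<username>- segment.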

File Types

Pattern            Type            Description
<uuid>.jsonl       Main session    User conversation sessions
agent-<id>.jsonl   Agent session   Sub-agent execution logs
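The two filename patterns can be told apart with a simple prefix check. The sketch below is illustrative only; the function name and the returned labels are hypothetical, and it does not validate that the main-session name is a well-formed UUID.

```python
def classify_session_file(filename):
    """Return "agent", "main", or None for files that are not session logs."""
    if not filename.endswith(".jsonl"):
        return None
    # Agent logs are distinguished by the "agent-" prefix in the table above.
    if filename.startswith("agent-"):
        return "agent"
    return "main"


for name in ["agent-abc123.jsonl", "session.jsonl", "notes.txt"]:
    print(name, "->", classify_session_file(name))
```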

JSON Structure

Message Object

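A typical message line in a JSONL file has the following shape. The "user" | "assistant" notation lists the allowed values and [...] stands for the content blocks; neither is literal JSON: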

{
  "role": "user" | "assistant",
  "message": {
    "role": "user" | "assistant",
    "content": [...]
  },
  "timestamp": "2025-11-26T00:00:00.000Z",
  "uuid": "message-uuid",
  "parentUuid": "parent-message-uuid",
  "sessionId": "session-uuid"
}
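The uuid and parentUuid fields link each message to the one before it. As a minimal sketch, assuming parentUuid holds the uuid of the preceding message in the same file and is null or absent on the first message, one branch of a conversation can be rebuilt by walking those links backwards. The function name thread_to is hypothetical.

```python
def thread_to(records, leaf_uuid):
    """Return the chain of records from the root down to leaf_uuid.

    records: already-parsed message objects (dicts).
    Assumption: parentUuid refers to a uuid in the same list.
    """
    by_uuid = {r["uuid"]: r for r in records if "uuid" in r}
    chain = []
    seen = set()  # guards against accidental cycles
    current = by_uuid.get(leaf_uuid)
    while current is not None and current["uuid"] not in seen:
        seen.add(current["uuid"])
        chain.append(current)
        current = by_uuid.get(current.get("parentUuid"))
    chain.reverse()
    return chain


records = [
    {"uuid": "a", "parentUuid": None},
    {"uuid": "b", "parentUuid": "a"},
    {"uuid": "c", "parentUuid": "b"},
    {"uuid": "d", "parentUuid": "a"},  # a second branch from "a"
]
print([r["uuid"] for r in thread_to(records, "c")])
```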

Content Types

The content array contains different types of content blocks:

Text Content

{
  "type": "text",
  "text": "Message text content"
}

Tool Use (Write)

{
  "type": "tool_use",
  "name": "Write",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "content": "File content here..."
  }
}

Tool Use (Edit)

{
  "type": "tool_use",
  "name": "Edit",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "old_string": "Original text",
    "new_string": "Replacement text",
    "replace_all": false
  }
}

Tool Use (Read)

{
  "type": "tool_use",
  "name": "Read",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "offset": 0,
    "limit": 100
  }
}

Tool Use (Bash)

{
  "type": "tool_use",
  "name": "Bash",
  "input": {
    "command": "ls -la",
    "description": "List files"
  }
}

Tool Result

{
  "type": "tool_result",
  "tool_use_id": "tool-use-uuid",
  "content": "Result content",
  "is_error": false
}
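A tool result points back to its call through tool_use_id. The sketch below pairs the two, assuming each tool_use block also carries an "id" field that tool_use_id refers to; that field is not shown in the examples above, so treat it as an assumption. The function name is hypothetical.

```python
def pair_tool_calls(blocks):
    """Match tool_result blocks to the tool_use blocks that produced them.

    blocks: content blocks in chronological order.
    Assumption: tool_use blocks have an "id" matching tool_use_id.
    """
    calls = {}
    pairs = []
    for block in blocks:
        kind = block.get("type")
        if kind == "tool_use" and "id" in block:
            calls[block["id"]] = block
        elif kind == "tool_result":
            call = calls.get(block.get("tool_use_id"))
            if call is not None:
                pairs.append((call, block))
    return pairs


blocks = [
    {"type": "tool_use", "id": "t1", "name": "Bash", "input": {"command": "ls -la"}},
    {"type": "tool_result", "tool_use_id": "t1", "content": "total 0", "is_error": False},
    {"type": "tool_result", "tool_use_id": "t9", "content": "orphan", "is_error": True},
]
for call, result in pair_tool_calls(blocks):
    print(call["name"], "->", result["content"])
```

Results whose call cannot be found are dropped rather than raising, which suits logs that may be truncated.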

Common Extraction Patterns

Finding Write Operations

Look for content blocks in assistant messages whose type is tool_use and whose name is "Write":

if item.get("type") == "tool_use" and item.get("name") == "Write":
    file_path = item["input"]["file_path"]
    content = item["input"]["content"]

Finding Edit Operations

if item.get("type") == "tool_use" and item.get("name") == "Edit":
    file_path = item["input"]["file_path"]
    old_string = item["input"]["old_string"]
    new_string = item["input"]["new_string"]

Extracting Text Content

for item in message_content:
    if item.get("type") == "text":
        text = item.get("text", "")
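The fragments above assume an item variable that already holds one content block. The following sketch shows one way to get there from raw JSONL lines. The helper names iter_blocks and find_tool_calls are hypothetical, and treating a plain-string content value as a single text block is an assumption, not documented behavior.

```python
import json


def iter_blocks(lines):
    """Yield (record, block) for every content block in an iterable of lines."""
    for line in lines:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or blank lines
        if not isinstance(data, dict):
            continue
        message = data.get("message")
        if not isinstance(message, dict):
            message = {}
        content = data.get("content") or message.get("content") or []
        if isinstance(content, str):
            # Assumption: a bare string is treated as one text block.
            content = [{"type": "text", "text": content}]
        if not isinstance(content, list):
            continue
        for item in content:
            if isinstance(item, dict):
                yield data, item


def find_tool_calls(lines, tool_name):
    """Collect tool_use blocks with the given name, in file order."""
    return [
        item
        for _, item in iter_blocks(lines)
        if item.get("type") == "tool_use" and item.get("name") == tool_name
    ]


lines = [
    json.dumps({"message": {"role": "assistant", "content": [
        {"type": "text", "text": "hi"},
        {"type": "tool_use", "name": "Write",
         "input": {"file_path": "/tmp/a.js", "content": "x"}},
    ]}}),
    "{not json",
    "",
    json.dumps({"role": "user", "content": "hello"}),
]
writes = find_tool_calls(lines, "Write")
print(writes[0]["input"]["file_path"])
```

Because iter_blocks accepts any iterable of strings, an open file object can be passed directly and is read lazily.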

Field Locations

Due to schema variations, some fields may appear in different locations:

Role Field

role = data.get("role") or data.get("message", {}).get("role")

Content Field

content = data.get("content") or data.get("message", {}).get("content", [])

Timestamp Field

timestamp = data.get("timestamp", "")

Common Use Cases

Recover Deleted Files

  1. Search for Write tool calls with matching file path
  2. Extract input.content from latest occurrence
  3. Save to disk with original filename
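The first two recovery steps can be sketched as follows. This assumes lines appear in chronological order within a session file, so the last matching Write is the latest one; the function names are hypothetical.

```python
import json


def content_blocks(line):
    """Parse one JSONL line and return its content blocks (empty on any problem)."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    message = data.get("message")
    if not isinstance(message, dict):
        message = {}
    content = data.get("content") or message.get("content") or []
    if not isinstance(content, list):
        return []
    return [block for block in content if isinstance(block, dict)]


def latest_written_content(lines, target_path):
    """Return the content of the last Write to target_path, or None if absent."""
    latest = None
    for line in lines:
        for block in content_blocks(line):
            if block.get("type") != "tool_use" or block.get("name") != "Write":
                continue
            params = block.get("input")
            if isinstance(params, dict) and params.get("file_path") == target_path:
                latest = params.get("content", "")  # later writes win
    return latest


def _write_line(path, content):
    """Build a sample session line (test data only)."""
    block = {"type": "tool_use", "name": "Write",
             "input": {"file_path": path, "content": content}}
    return json.dumps({"message": {"role": "assistant", "content": [block]}})


lines = [_write_line("/p/a.js", "v1"), _write_line("/p/b.js", "other"),
         _write_line("/p/a.js", "v2")]
print(latest_written_content(lines, "/p/a.js"))
```

The returned string can then be written to a recovery directory under the original filename. Edits applied after the last Write are not reflected in this result.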

Track File Changes

  1. Find all Edit and Write operations for a file
  2. Build chronological list of changes
  3. Reconstruct file history
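Reconstructing history amounts to replaying the recorded operations in order. The sketch below assumes that an Edit replaces the first occurrence of old_string unless replace_all is true, and that the log records requested operations rather than whether they succeeded; both are assumptions, and the function names are hypothetical.

```python
def apply_operation(current, block):
    """Apply one Write or Edit tool_use block to the current file text."""
    params = block.get("input", {})
    if block.get("name") == "Write":
        return params.get("content", "")
    if block.get("name") == "Edit":
        old = params.get("old_string", "")
        new = params.get("new_string", "")
        if current is None or not old or old not in current:
            return current  # edit cannot be applied; keep the text unchanged
        count = -1 if params.get("replace_all") else 1
        return current.replace(old, new, count)
    return current


def replay(operations):
    """Return the file text after each operation, in chronological order."""
    history = []
    current = None  # unknown until the first Write
    for block in operations:
        current = apply_operation(current, block)
        history.append(current)
    return history


ops = [
    {"name": "Write", "input": {"file_path": "/p/a.txt", "content": "a b a"}},
    {"name": "Edit", "input": {"file_path": "/p/a.txt", "old_string": "a",
                               "new_string": "x", "replace_all": False}},
    {"name": "Edit", "input": {"file_path": "/p/a.txt", "old_string": "a",
                               "new_string": "y", "replace_all": True}},
]
print(replay(ops))
```

If the file existed before the session and was only edited, the starting text is unknown and the history stays None until a Write appears.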

Search Conversations

  1. Extract all text content from messages
  2. Search for keywords or patterns
  3. Return matching sessions
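These three steps can be sketched as below. The example takes a mapping from a session name to its lines so that it stays independent of the file system; the function names and that input shape are assumptions made for illustration.

```python
import json


def session_texts(lines):
    """Yield the text of every text block found in one session's lines."""
    for line in lines:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(data, dict):
            continue
        message = data.get("message")
        if not isinstance(message, dict):
            message = {}
        content = data.get("content") or message.get("content") or []
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                yield block.get("text", "")


def search_sessions(sessions, keyword):
    """Return the names of sessions containing keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        name
        for name, lines in sessions.items()
        if any(needle in text.casefold() for text in session_texts(lines))
    ]


def _text_line(text):
    """Build a sample session line (test data only)."""
    return json.dumps({"message": {"role": "user",
                                   "content": [{"type": "text", "text": text}]}})


sessions = {
    "s1.jsonl": [_text_line("Fix the Login bug"), "{broken"],
    "s2.jsonl": [_text_line("Refactor the parser")],
}
print(search_sessions(sessions, "login"))
```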

Analyze Tool Usage

  1. Count occurrences of each tool type
  2. Track which files were accessed
  3. Generate usage statistics
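A usage summary reduces to two counters. The sketch below works on already-extracted content blocks and counts a file as accessed whenever a tool call's input carries a file_path; the function name is hypothetical.

```python
from collections import Counter


def tool_usage(blocks):
    """Count tool calls by name and file accesses by path."""
    tools = Counter()
    files = Counter()
    for block in blocks:
        if block.get("type") != "tool_use":
            continue
        tools[block.get("name", "unknown")] += 1
        params = block.get("input")
        if isinstance(params, dict) and params.get("file_path"):
            files[params["file_path"]] += 1
    return tools, files


blocks = [
    {"type": "tool_use", "name": "Read", "input": {"file_path": "/p/a.js"}},
    {"type": "tool_use", "name": "Read", "input": {"file_path": "/p/a.js"}},
    {"type": "tool_use", "name": "Bash", "input": {"command": "ls -la"}},
    {"type": "text", "text": "done"},
]
tools, files = tool_usage(blocks)
print(tools.most_common(), files.most_common())
```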

Edge Cases

Empty Content

Some messages may have empty content arrays:

content = data.get("content") or data.get("message", {}).get("content", [])
if not content:
    continue  # Nothing to extract from this line

Missing Fields

Always use .get() with defaults:

file_path = item.get("input", {}).get("file_path", "")

JSON Decode Errors

Session files may contain malformed lines:

import json

for line in f:  # f is an open session file
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        continue  # Skip malformed lines

Large Files

Session files can be very large (>100MB). Process line-by-line:

with open(session_file, 'r', encoding='utf-8') as f:
    for line in f:  # Streaming, not f.read()
        process_line(line)

Performance Tips

Memory Efficiency

  • Process files line-by-line (streaming)
  • Don't load entire file into memory
  • Use generators for large result sets

Search Optimization

  • Exit early once the keyword match count reaches the threshold
  • For case-insensitive search, normalize the keyword once, not on every comparison
  • Use the in operator for substring matching
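The three tips combine naturally in one loop. This is a minimal sketch, assuming the threshold is at least 1 and that texts is any iterable of strings; the function name is hypothetical.

```python
def reaches_threshold(texts, keyword, threshold):
    """Return True as soon as `threshold` texts contain keyword."""
    needle = keyword.lower()  # normalize the keyword once
    hits = 0
    for text in texts:
        if needle in text.lower():  # substring match with the in operator
            hits += 1
            if hits >= threshold:
                return True  # early exit: remaining texts are never read
    return False


consumed = []


def sample_texts():
    """Generator that records how many items were actually read."""
    for text in ["Error one", "ok", "ERROR two", "error three"]:
        consumed.append(text)
        yield text


print(reaches_threshold(sample_texts(), "error", 2), len(consumed))
```

Passing a generator rather than a list means the early exit also avoids parsing the lines that would have produced the remaining texts.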

Deduplication

When recovering files, keep latest version only:

files_by_path = {}
for call in write_calls:  # Write tool_use blocks in chronological order
    file_path = call["input"]["file_path"]
    files_by_path[file_path] = call  # Overwrites earlier versions

Security Considerations

Personal Information

Session files may contain:

  • Absolute file paths with usernames
  • API keys or credentials in code
  • Company-specific information
  • Private conversations

Safe Sharing

Before sharing extracted content:

  1. Remove absolute paths
  2. Redact sensitive information
  3. Use placeholders for usernames
  4. Verify no credentials present
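Steps 1 and 3 can be partly automated. The sketch below rewrites home-directory prefixes to ~ and masks the username in normalized project directory names. The patterns assume macOS-style /Users/ and Linux-style /home/ layouts and usernames without hyphens, and the function name is hypothetical. It does not detect credentials, so steps 2 and 4 still need a manual review or a dedicated scanner.

```python
import re

# Assumption: home directories live under /Users/<name> or /home/<name>.
HOME_PATTERN = re.compile(r"(?:/Users|/home)/[^/\s\"']+")
# Assumption: normalized paths look like -Users-<name>-... with no hyphen in <name>.
NORMALIZED_PATTERN = re.compile(r"-(Users|home)-[^-\s/]+-")


def redact_paths(text):
    """Replace home-directory prefixes and normalized usernames with placeholders."""
    text = HOME_PATTERN.sub("~", text)
    return NORMALIZED_PATTERN.sub(r"-\1-<username>-", text)


print(redact_paths("/Users/alice/Workspace/app.js"))
print(redact_paths("-Users-alice-Workspace-js-myproject"))
```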