# Claude Code Session File Format

## Overview

Claude Code stores conversation history in JSONL (JSON Lines) format, where each line is a complete JSON object representing a message or event in the conversation.

## File Locations

### Session Files

```
~/.claude/projects/<normalized-project-path>/<session-id>.jsonl
```

**Path normalization**: Project paths are converted by replacing `/` with `-`

Example:
- Project: `~/Workspace/js/myproject`
- Directory: `~/.claude/projects/-Users-<username>-Workspace-js-myproject/`

### File Types

| Pattern | Type | Description |
|---------|------|-------------|
| `<uuid>.jsonl` | Main session | User conversation sessions |
| `agent-<id>.jsonl` | Agent session | Sub-agent execution logs |

## JSON Structure

### Message Object

Every line in a JSONL file follows this structure:

```json
{
  "role": "user" | "assistant",
  "message": {
    "role": "user" | "assistant",
    "content": [...]
  },
  "timestamp": "2025-11-26T00:00:00.000Z",
  "uuid": "message-uuid",
  "parentUuid": "parent-message-uuid",
  "sessionId": "session-uuid"
}
```

### Content Types

The `content` array contains different types of content blocks:

#### Text Content

```json
{
  "type": "text",
  "text": "Message text content"
}
```

#### Tool Use (Write)

```json
{
  "type": "tool_use",
  "name": "Write",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "content": "File content here..."
  }
}
```

#### Tool Use (Edit)

```json
{
  "type": "tool_use",
  "name": "Edit",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "old_string": "Original text",
    "new_string": "Replacement text",
    "replace_all": false
  }
}
```

#### Tool Use (Read)

```json
{
  "type": "tool_use",
  "name": "Read",
  "input": {
    "file_path": "/absolute/path/to/file.js",
    "offset": 0,
    "limit": 100
  }
}
```

#### Tool Use (Bash)

```json
{
  "type": "tool_use",
  "name": "Bash",
  "input": {
    "command": "ls -la",
    "description": "List files"
  }
}
```

### Tool Result

```json
{
  "type": "tool_result",
  "tool_use_id": "tool-use-uuid",
  "content": "Result content",
  "is_error": false
}
```

## Common Extraction Patterns

### Finding Write Operations

Look for assistant messages with `tool_use` type and `name: "Write"`:

```python
if item.get("type") == "tool_use" and item.get("name") == "Write":
    file_path = item["input"]["file_path"]
    content = item["input"]["content"]
```

### Finding Edit Operations

```python
if item.get("type") == "tool_use" and item.get("name") == "Edit":
    file_path = item["input"]["file_path"]
    old_string = item["input"]["old_string"]
    new_string = item["input"]["new_string"]
```

### Extracting Text Content

```python
for item in message_content:
    if item.get("type") == "text":
        text = item.get("text", "")
```

## Field Locations

Due to schema variations, some fields may appear in different locations:

### Role Field

```python
role = data.get("role") or data.get("message", {}).get("role")
```

### Content Field

```python
content = data.get("content") or data.get("message", {}).get("content", [])
```

### Timestamp Field

```python
timestamp = data.get("timestamp", "")
```

## Common Use Cases

### Recover Deleted Files

1. Search for `Write` tool calls with matching file path
2. Extract `input.content` from latest occurrence
3. Save to disk with original filename

### Track File Changes

1. Find all `Edit` and `Write` operations for a file
2. Build chronological list of changes
3. Reconstruct file history

### Search Conversations

1. Extract all `text` content from messages
2. Search for keywords or patterns
3. Return matching sessions

### Analyze Tool Usage

1. Count occurrences of each tool type
2. Track which files were accessed
3. Generate usage statistics

## Edge Cases

### Empty Content

Some messages may have empty content arrays:

```python
content = data.get("content", [])
if not content:
    continue
```

### Missing Fields

Always use `.get()` with defaults:

```python
file_path = item.get("input", {}).get("file_path", "")
```

### JSON Decode Errors

Session files may contain malformed lines:

```python
try:
    data = json.loads(line)
except json.JSONDecodeError:
    continue  # Skip malformed lines
```

### Large Files

Session files can be very large (>100MB). Process line-by-line:

```python
with open(session_file, 'r') as f:
    for line in f:  # Streaming, not f.read()
        process_line(line)
```

## Performance Tips

### Memory Efficiency

- Process files line-by-line (streaming)
- Don't load entire file into memory
- Use generators for large result sets

### Search Optimization

- Early exit when keyword count threshold met
- Case-insensitive search: normalize once
- Use `in` operator for substring matching

### Deduplication

When recovering files, keep latest version only:

```python
files_by_path = {}
for call in write_calls:
    files_by_path[file_path] = call  # Overwrites earlier versions
```

## Security Considerations

### Personal Information

Session files may contain:
- Absolute file paths with usernames
- API keys or credentials in code
- Company-specific information
- Private conversations

### Safe Sharing

Before sharing extracted content:
1. Remove absolute paths
2. Redact sensitive information
3. Use placeholders for usernames
4. Verify no credentials present