Replace hardcoded user paths that triggered gitleaks PII detection: - /Users/username/ → ~/ - /Users/user/ → ~/ - -Users-username- → -Users-<username>- (normalized paths) Also fix the sed example to use <home> placeholder instead of regex pattern that would match actual usernames. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
286 lines
5.3 KiB
Markdown
286 lines
5.3 KiB
Markdown
# Claude Code Session File Format
|
|
|
|
## Overview
|
|
|
|
Claude Code stores conversation history in JSONL (JSON Lines) format, where each line is a complete JSON object representing a message or event in the conversation.
|
|
|
|
## File Locations
|
|
|
|
### Session Files
|
|
|
|
```
|
|
~/.claude/projects/<normalized-project-path>/<session-id>.jsonl
|
|
```
|
|
|
|
**Path normalization**: Project paths are converted by replacing `/` with `-`
|
|
|
|
Example:
|
|
- Project: `~/Workspace/js/myproject`
|
|
- Directory: `~/.claude/projects/-Users-<username>-Workspace-js-myproject/`
|
|
|
|
### File Types
|
|
|
|
| Pattern | Type | Description |
|
|
|---------|------|-------------|
|
|
| `<uuid>.jsonl` | Main session | User conversation sessions |
|
|
| `agent-<id>.jsonl` | Agent session | Sub-agent execution logs |
|
|
|
|
## JSON Structure
|
|
|
|
### Message Object
|
|
|
|
Every line in a JSONL file follows this structure:
|
|
|
|
```json
|
|
{
|
|
"role": "user" | "assistant",
|
|
"message": {
|
|
"role": "user" | "assistant",
|
|
"content": [...]
|
|
},
|
|
"timestamp": "2025-11-26T00:00:00.000Z",
|
|
"uuid": "message-uuid",
|
|
"parentUuid": "parent-message-uuid",
|
|
"sessionId": "session-uuid"
|
|
}
|
|
```
|
|
|
|
### Content Types
|
|
|
|
The `content` array contains different types of content blocks:
|
|
|
|
#### Text Content
|
|
|
|
```json
|
|
{
|
|
"type": "text",
|
|
"text": "Message text content"
|
|
}
|
|
```
|
|
|
|
#### Tool Use (Write)
|
|
|
|
```json
|
|
{
|
|
"type": "tool_use",
|
|
"name": "Write",
|
|
"input": {
|
|
"file_path": "/absolute/path/to/file.js",
|
|
"content": "File content here..."
|
|
}
|
|
}
|
|
```
|
|
|
|
#### Tool Use (Edit)
|
|
|
|
```json
|
|
{
|
|
"type": "tool_use",
|
|
"name": "Edit",
|
|
"input": {
|
|
"file_path": "/absolute/path/to/file.js",
|
|
"old_string": "Original text",
|
|
"new_string": "Replacement text",
|
|
"replace_all": false
|
|
}
|
|
}
|
|
```
|
|
|
|
#### Tool Use (Read)
|
|
|
|
```json
|
|
{
|
|
"type": "tool_use",
|
|
"name": "Read",
|
|
"input": {
|
|
"file_path": "/absolute/path/to/file.js",
|
|
"offset": 0,
|
|
"limit": 100
|
|
}
|
|
}
|
|
```
|
|
|
|
#### Tool Use (Bash)
|
|
|
|
```json
|
|
{
|
|
"type": "tool_use",
|
|
"name": "Bash",
|
|
"input": {
|
|
"command": "ls -la",
|
|
"description": "List files"
|
|
}
|
|
}
|
|
```
|
|
|
|
### Tool Result
|
|
|
|
```json
|
|
{
|
|
"type": "tool_result",
|
|
"tool_use_id": "tool-use-uuid",
|
|
"content": "Result content",
|
|
"is_error": false
|
|
}
|
|
```
|
|
|
|
## Common Extraction Patterns
|
|
|
|
### Finding Write Operations
|
|
|
|
Look for assistant messages with `tool_use` type and `name: "Write"`:
|
|
|
|
```python
|
|
if item.get("type") == "tool_use" and item.get("name") == "Write":
|
|
file_path = item["input"]["file_path"]
|
|
content = item["input"]["content"]
|
|
```
|
|
|
|
### Finding Edit Operations
|
|
|
|
```python
|
|
if item.get("type") == "tool_use" and item.get("name") == "Edit":
|
|
file_path = item["input"]["file_path"]
|
|
old_string = item["input"]["old_string"]
|
|
new_string = item["input"]["new_string"]
|
|
```
|
|
|
|
### Extracting Text Content
|
|
|
|
```python
|
|
for item in message_content:
|
|
if item.get("type") == "text":
|
|
text = item.get("text", "")
|
|
```
|
|
|
|
## Field Locations
|
|
|
|
Due to schema variations, some fields may appear in different locations:
|
|
|
|
### Role Field
|
|
|
|
```python
|
|
role = data.get("role") or data.get("message", {}).get("role")
|
|
```
|
|
|
|
### Content Field
|
|
|
|
```python
|
|
content = data.get("content") or data.get("message", {}).get("content", [])
|
|
```
|
|
|
|
### Timestamp Field
|
|
|
|
```python
|
|
timestamp = data.get("timestamp", "")
|
|
```
|
|
|
|
## Common Use Cases
|
|
|
|
### Recover Deleted Files
|
|
|
|
1. Search for `Write` tool calls with matching file path
|
|
2. Extract `input.content` from latest occurrence
|
|
3. Save to disk with original filename
|
|
|
|
### Track File Changes
|
|
|
|
1. Find all `Edit` and `Write` operations for a file
|
|
2. Build chronological list of changes
|
|
3. Reconstruct file history
|
|
|
|
### Search Conversations
|
|
|
|
1. Extract all `text` content from messages
|
|
2. Search for keywords or patterns
|
|
3. Return matching sessions
|
|
|
|
### Analyze Tool Usage
|
|
|
|
1. Count occurrences of each tool type
|
|
2. Track which files were accessed
|
|
3. Generate usage statistics
|
|
|
|
## Edge Cases
|
|
|
|
### Empty Content
|
|
|
|
Some messages may have empty content arrays:
|
|
|
|
```python
|
|
content = data.get("content", [])
|
|
if not content:
|
|
continue
|
|
```
|
|
|
|
### Missing Fields
|
|
|
|
Always use `.get()` with defaults:
|
|
|
|
```python
|
|
file_path = item.get("input", {}).get("file_path", "")
|
|
```
|
|
|
|
### JSON Decode Errors
|
|
|
|
Session files may contain malformed lines:
|
|
|
|
```python
|
|
try:
|
|
data = json.loads(line)
|
|
except json.JSONDecodeError:
|
|
continue # Skip malformed lines
|
|
```
|
|
|
|
### Large Files
|
|
|
|
Session files can be very large (>100MB). Process line-by-line:
|
|
|
|
```python
|
|
with open(session_file, 'r') as f:
|
|
for line in f: # Streaming, not f.read()
|
|
process_line(line)
|
|
```
|
|
|
|
## Performance Tips
|
|
|
|
### Memory Efficiency
|
|
|
|
- Process files line-by-line (streaming)
|
|
- Don't load entire file into memory
|
|
- Use generators for large result sets
|
|
|
|
### Search Optimization
|
|
|
|
- Early exit when keyword count threshold met
|
|
- Case-insensitive search: normalize once
|
|
- Use `in` operator for substring matching
|
|
|
|
### Deduplication
|
|
|
|
When recovering files, keep latest version only:
|
|
|
|
```python
|
|
files_by_path = {}
|
|
for call in write_calls:
|
|
files_by_path[file_path] = call # Overwrites earlier versions
|
|
```
|
|
|
|
## Security Considerations
|
|
|
|
### Personal Information
|
|
|
|
Session files may contain:
|
|
- Absolute file paths with usernames
|
|
- API keys or credentials in code
|
|
- Company-specific information
|
|
- Private conversations
|
|
|
|
### Safe Sharing
|
|
|
|
Before sharing extracted content:
|
|
1. Remove absolute paths
|
|
2. Redact sensitive information
|
|
3. Use placeholders for usernames
|
|
4. Verify no credentials present
|