Claude Code — Session JSONL Files
Claude Code stores complete machine-readable transcripts of every session as JSONL files on disk. Understanding the format is necessary before using them for summarization or analysis.
Why / When to Use
When building tooling that reads, summarises, or analyses Claude Code sessions — e.g. a daily PKM hook, a cost dashboard, or a post-session review script.
Core Concept / Commands
File locations
~/.claude/projects/<project-hash>/sessions/<session-uuid>.jsonl # full transcript
~/.claude/projects/<project-hash>/sessions-index.json # lightweight index
The index contains auto-generated summaries, message counts, git branches, and creation/modification timestamps.
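To see what is on disk before parsing anything, the layout above can be walked with pathlib. This is a minimal sketch: `list_session_files` is a hypothetical helper name, and it assumes only the directory layout shown above. It searches recursively for `*.jsonl` instead of relying on the exact `sessions/` subfolder, and it does not read the index file because its schema is not described here.

```python
from pathlib import Path


def list_session_files(root=None):
    """Return (path, size_in_bytes) for every .jsonl file under root, largest first."""
    # Default root follows the layout described above (an assumption about your install).
    root = Path(root) if root is not None else Path.home() / ".claude" / "projects"
    if not root.is_dir():
        return []
    files = [(p, p.stat().st_size) for p in root.rglob("*.jsonl") if p.is_file()]
    # Largest first, so oversized sessions are easy to spot.
    return sorted(files, key=lambda item: item[1], reverse=True)
```

Calling `list_session_files()[:10]` would give the ten largest transcripts, which is a reasonable first check before opening any of them.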
What’s in the JSONL
Each line is a structured event. The file contains:
- Full human/assistant turn content
- Tool calls with exact inputs and outputs
- Extended thinking blocks
- Subagent spawning events
- Token usage per turn
- Model selection
- Working directory and git state snapshots
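Because the exact event schema is not spelled out here, it can help to inventory a file before writing a filter. The sketch below counts events per `type` value and records which top-level keys appear for each. `inventory_events` is a hypothetical name, and the only field it relies on is `type`, which the filter in this note also uses.

```python
import json
from collections import Counter


def inventory_events(jsonl_path):
    """Count events per "type" and collect the top-level keys seen for each type."""
    counts = Counter()
    keys_by_type = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            if not isinstance(event, dict):
                continue  # assumption: every event is a JSON object
            event_type = event.get("type", "<missing>")
            counts[event_type] += 1
            keys_by_type.setdefault(event_type, set()).update(event.keys())
    return counts, keys_by_type
```

`counts.most_common()` then shows which event types dominate a session, and `keys_by_type` shows where fields such as token usage or working directory actually live in your files.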
Filter before summarising
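Raw JSONL is usually too large to feed directly to the API, because tool inputs and outputs and other non-conversational events make up most of the file. Extract only the conversational turns: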
import json
def extract_conversation(jsonl_path):
    """Return only the user/assistant events from a session JSONL file."""
    messages = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            event = json.loads(line)
            # Only keep human/assistant turns, skip tool noise
            if event.get("type") in ("user", "assistant"):
                messages.append(event)
    return messages
Then send the filtered list to the Claude API for summarization.
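Before sending, the filtered events usually need flattening into plain text with a size cap. The following is a sketch only: `render_transcript` is a hypothetical helper, and the nested shape it reads (a `message` object whose `content` is either a string or a list of blocks with `type` and `text` keys) is an assumption that should be checked against your own files.

```python
def render_transcript(messages, max_chars=20_000):
    """Flatten filtered events into "type: text" paragraphs, truncated to max_chars."""
    parts = []
    for event in messages:
        # Assumption: the turn payload sits under "message" with a "content" field.
        message = event.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, str):
            text = content
        elif isinstance(content, list):
            # Assumption: content is a list of blocks; keep only plain-text blocks.
            text = "\n".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        else:
            text = ""
        if text.strip():
            parts.append(f"{event.get('type')}: {text.strip()}")
    # Simple character cap; it may cut the last turn mid-sentence.
    return "\n\n".join(parts)[:max_chars]
```

Passing the result of `extract_conversation(path)` to `render_transcript` yields a single string that can go into a summarization prompt. Dropping non-text blocks is a deliberate choice here, since tool payloads are the part the filter is meant to exclude.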
Key Options / Variants
- sessions-index.json — safe for quick lookups (title, count, dates); no need to parse full JSONL
- Filter by event.get("type") to scope what you extract
Gotchas
- A single session can be tens of thousands of lines and grow to 5+ GB in extreme cases.
- Known bug: “progress” events can bloat the file — a 1,739-entry session averaged ~3 MB per entry while the conversation itself was only 0.67 MB.
- Never feed raw JSONL directly to the API; always filter first.
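To check whether a particular session is affected by the bloat described above, the bytes on disk can be totalled per event type while streaming the file one line at a time. This is a minimal sketch: `bytes_by_type` is a hypothetical name, and the `<unparseable>` and `<missing>` buckets are labels invented for illustration.

```python
import json
from collections import defaultdict


def bytes_by_type(jsonl_path):
    """Return {event type: total bytes of its lines} for one session file."""
    totals = defaultdict(int)
    # Binary mode so sizes are bytes on disk; iteration reads one line at a time.
    with open(jsonl_path, "rb") as f:
        for raw in f:
            if not raw.strip():
                continue
            try:
                event = json.loads(raw)
            except ValueError:  # covers JSONDecodeError and invalid UTF-8
                totals["<unparseable>"] += len(raw)
                continue
            if isinstance(event, dict):
                event_type = event.get("type", "<missing>")
            else:
                event_type = "<non-object>"
            totals[event_type] += len(raw)
    return dict(totals)
```

Comparing the total for "progress" against the totals for "user" and "assistant" shows how much of the file is conversation. Note that each line is still parsed in full, so a single multi-megabyte event is held in memory briefly, but the file as a whole never is.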
Source
Conversation: “CC-Hooks” — 2026-05-16