
What Is Prompt Caching?
How to Save 90% on Repeated AI Context

Prompt caching lets you save a prefix of your prompt (system instructions, documents, examples) so that subsequent calls reuse it at a fraction of the cost. Anthropic's implementation cuts cache-read costs by 90% vs standard input pricing. Last verified: 2026-04-01.

8 min read·Updated April 2026
Prompt Caching Quick Reference (Claude, April 2026)
  • 90% off: cache read vs standard input
  • $0.10/M: Haiku 4.5 cache reads
  • 5 min: default cache TTL
  • 1,024 tok: minimum cacheable prefix

How Prompt Caching Works

Without caching, every API call processes your entire prompt from scratch — even if the first 10,000 tokens are identical across all calls (e.g., the same system prompt or document).

With caching, you mark a prefix as cacheable. The first call writes it to cache (cache write, charged at a small premium). All subsequent calls within the TTL read from cache (cache read, charged at 90% off standard input price).

# Standard call — pays full input price for the system prompt every time
# (in the Messages API, `system` is a top-level parameter, not a message role)
{"system": "You are a helpful assistant. [2,000 tokens of instructions]..."}
# With caching — cache_control marks the system prompt as a cacheable prefix
{"system": [{"type": "text", "text": "...", "cache_control": {"type": "ephemeral"}}]}

Cache Pricing: All Claude Models

Model | Standard input | Cache write (5-min) | Cache write (1-hour) | Cache read | Savings on read
Claude Haiku 4.5 | $1.00/M | $1.25/M | $2.00/M | $0.10/M | 90%
Claude Sonnet 4.6 | $3.00/M | $3.75/M | $6.00/M | $0.30/M | 90%
Claude Opus 4.6 | $5.00/M | $6.25/M | $10.00/M | $0.50/M | 90%

Cache write costs 25% more than standard input (5-min TTL) or 2× more (1-hour TTL). Cache reads break even after just a few calls.

The Break-Even Calculation

Cache writes cost 25% more than standard input; cache reads cost 10% of it. The first call pays a one-time premium of 0.25× per cached token, and every subsequent read saves 0.90× per token. Break-even is the number of reads needed to repay that premium.

Formula: break-even reads = 0.25 / (1 − 0.10) ≈ 0.28 → the premium is recovered on the very first cache read.

From the first cache read onward, every call saves 90% vs standard pricing. Any prefix reused at least once within the TTL makes caching profitable.
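The crossover can be checked directly by treating the first call as the cache write. Costs below are in standard-input-token units, using the 5-minute-TTL multipliers from the pricing table:

```python
# Break-even check: write = 1.25x standard input, read = 0.10x.

WRITE_MULT, READ_MULT = 1.25, 0.10

def cached_cost(n_calls: int) -> float:
    """Relative cost of n_calls on one prefix: one write, then cache reads."""
    return WRITE_MULT + (n_calls - 1) * READ_MULT

def standard_cost(n_calls: int) -> float:
    """Relative cost without caching: full input price every call."""
    return float(n_calls)

# Write premium (0.25x) divided by per-read saving (0.90x):
break_even_reads = (WRITE_MULT - 1.0) / (1.0 - READ_MULT)  # ~0.28

assert cached_cost(1) > standard_cost(1)  # one call only: caching costs more
assert cached_cost(2) < standard_cost(2)  # a single cache read already wins
```

Two calls cost 1.35 units with caching vs 2.0 without, so caching is ahead as soon as the prefix is reused once.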

Worked Example: SaaS Chatbot with 100K Calls/Month

System prompt: 2,000 tokens (company context, instructions, persona). Each call also includes user message (200 tokens) + history (300 tokens).

Scenario | Input tokens/call | Cost/call | Monthly cost (100K calls)
Without caching (Haiku) | 2,500 total input | $0.00250 | $250
With caching (Haiku), system prompt cached | 2,000 cache read + 500 uncached | $0.00070 | $70 + ~$3 (writes) = $73
Monthly savings | | | $177/month (71% saved)

Cache writes: with a 5-minute TTL, the prefix must be rewritten whenever a traffic gap lets the cache expire; at roughly 1,200 rewrites per month, that's 1,200 × 2,000 tokens × $1.25/M ≈ $3. Effective savings are ~70-72% once write overhead is included.
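The monthly figures can be reproduced with a few lines of arithmetic. The ~1,200 monthly cache rewrites are an assumption chosen to match the ~$3 write overhead in the table; the other numbers come straight from the scenario:

```python
# Worked example: Haiku 4.5 pricing, 2,000-token cached system prompt,
# 500 uncached tokens per call, 100K calls/month.

INPUT, READ, WRITE = 1.00e-6, 0.10e-6, 1.25e-6  # $/token

calls = 100_000
without = calls * 2_500 * INPUT                  # all 2,500 tokens at full price
reads = calls * (2_000 * READ + 500 * INPUT)     # cached prefix + uncached tail
writes = 1_200 * 2_000 * WRITE                   # ~$3 of assumed cache rewrites
with_cache = reads + writes

savings = without - with_cache                   # dollars saved per month
pct = savings / without                          # fraction saved
```

Running this gives $250 without caching vs $73 with it, i.e. about 71% saved after write overhead.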

Haiku cache reads ($0.10/M) are cheaper than GPT-5.4 nano's standard input ($0.20/M). If your system prompt is large and reused heavily, Claude Haiku 4.5 with caching active can become the lowest-cost option in the market — even cheaper than nano on a per-effective-token basis.

When Caching Helps Most

  • Large system prompts (500+ tokens): Instructions, persona, product context, examples — anything repeated across all calls
  • Document-in-context: The same policy document, knowledge base, or legal text injected into every query
  • Agent memory: A persistent memory block representing the agent's accumulated context
  • Few-shot examples: A bank of 10–20 labeled examples included in every prompt for consistency

When Caching Does NOT Help

  • Dynamic prefixes: If every call has a unique user-specific prefix, there's nothing to cache
  • Low-volume endpoints: Below ~3 calls/hour on the same prefix, cache may expire before it's useful
  • Short system prompts (<1,024 tokens): Anthropic requires at least 1,024 cacheable tokens — short prompts don't qualify
  • Output-dominated costs: If you're generating long outputs (reports, content), the savings on cached input tokens are less impactful relative to output cost
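The two lists above boil down to a rough eligibility check. This is a heuristic sketch: the 1,024-token minimum is Anthropic's documented floor, while the calls-per-hour cutoff mirrors the article's "~3 calls/hour" rule of thumb and is not an API limit:

```python
# Rough "is caching worth it?" check distilled from the lists above.

def caching_worthwhile(prefix_tokens: int, calls_per_hour: float) -> bool:
    """True if a shared prefix plausibly benefits from caching."""
    if prefix_tokens < 1024:   # below the minimum cacheable prefix
        return False
    if calls_per_hour < 3:     # cache likely expires between calls
        return False
    return True

assert caching_worthwhile(2_000, 50)     # large, heavily reused system prompt
assert not caching_worthwhile(600, 50)   # prefix too short to cache
assert not caching_worthwhile(2_000, 1)  # too infrequent for the TTL
```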

TTL Options: 5-Minute vs 1-Hour Cache

Anthropic offers two cache durations:

  • Ephemeral (5 minutes): Write costs 25% premium. Best for high-frequency chatbots where the same user makes multiple turns quickly
  • 1-hour cache: Write costs 2× standard input. Best for document Q&A sessions where context needs to persist across a longer working session

For most chatbots, ephemeral (5-min) caching is sufficient — users rarely take more than 5 minutes between messages. Use 1-hour for document analysis workflows.
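To see when the 2× write premium pays off, consider a document-analysis session whose queries are spaced more than five minutes apart: a 5-minute cache expires and is rewritten before every call, while a 1-hour cache is written once. A sketch in relative standard-input-token units:

```python
# Compare TTLs for a session of n queries spread over an hour, with gaps
# longer than 5 minutes between queries (worst case for the ephemeral cache).

def session_cost(n_queries: int, prefix_tokens: int, ttl: str) -> float:
    """Relative cost of the cached prefix across the session."""
    if ttl == "5m":
        # Every query re-pays the 1.25x write premium (cache expired).
        return n_queries * prefix_tokens * 1.25
    if ttl == "1h":
        # One 2x write, then 0.10x cache reads for the rest.
        return prefix_tokens * 2.0 + (n_queries - 1) * prefix_tokens * 0.10
    raise ValueError(f"unknown ttl: {ttl}")

# Six queries over an hour against a 10,000-token document:
five_min = session_cost(6, 10_000, "5m")
one_hour = session_cost(6, 10_000, "1h")
```

Here the 1-hour cache costs 25,000 token-units vs 75,000 for the repeatedly expiring 5-minute cache, which is why slower-paced document workflows justify the larger write premium.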

See Your Prompt Caching Savings

Enter your system prompt size and monthly call volume to calculate exact savings with caching.

AI API Cost Calculator