What Is Prompt Caching?
How to Save 90% on Repeated AI Context
Prompt caching lets you save a prefix of your prompt (system instructions, documents, examples) so that subsequent calls reuse it at a fraction of the cost. Anthropic's implementation cuts cache-read costs by 90% vs standard input pricing. Last verified: 2026-04-01.
How Prompt Caching Works
Without caching, every API call processes your entire prompt from scratch — even if the first 10,000 tokens are identical across all calls (e.g., the same system prompt or document).
With caching, you mark a prefix as cacheable. The first call writes it to cache (cache write, charged at a small premium). All subsequent calls within the TTL read from cache (cache read, charged at 90% off standard input price).
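In code, marking the prefix cacheable comes down to a single `cache_control` annotation on the system block. A minimal request-body sketch (field names follow Anthropic's published Messages API; the model id and prompt text here are placeholders):

```python
LONG_SYSTEM_PROMPT = "You are the support assistant for ExampleCo..."  # placeholder prefix

# Everything up to and including the block carrying cache_control is cached;
# the user message below it stays uncached and can vary per call.
request = {
    "model": "claude-haiku-4-5",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,  # the stable prefix: instructions, docs, examples
            "cache_control": {"type": "ephemeral"},  # 5-minute TTL
        }
    ],
    "messages": [{"role": "user", "content": "What is your refund policy?"}],
}
```

The first call with this payload writes the prefix to cache; later calls with a byte-identical prefix read from it instead.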
Cache Pricing: All Claude Models
| Model | Standard input | Cache write (5-min) | Cache write (1-hour) | Cache read | Savings on read |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00/M | $1.25/M | $2.00/M | $0.10/M | 90% |
| Claude Sonnet 4.6 | $3.00/M | $3.75/M | $6.00/M | $0.30/M | 90% |
| Claude Opus 4.6 | $5.00/M | $6.25/M | $10.00/M | $0.50/M | 90% |
Cache writes cost 25% more than standard input with the 5-minute TTL, or 2× the standard rate with the 1-hour TTL. Because reads are 90% off, the write premium is recouped within the first couple of reads.
The Break-Even Calculation
Cache writes cost 25% more than standard input; cache reads cost 10% of standard input. Break-even is the number of cache reads at which total cost with caching drops below what standard pricing would have charged.
Formula (treating the write as a dedicated warm-up call, i.e. pure overhead): break-even N solves 1.25 + 0.10N = N, so N = 1.25 / (1 − 0.10) ≈ 1.39 → break even after 2 reads. In practice the write rides on the first real call, which would have paid standard input price anyway, so caching is typically already cheaper by the first read.
Past break-even, every additional read saves 90% vs standard pricing. For any system prompt reused more than once or twice within the TTL, caching is profitable.
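The break-even arithmetic can be checked in a few lines of Python. `break_even_warmup` and `break_even_inline` are this sketch's own helpers, with all prices expressed as multiples of the standard input rate:

```python
# N = number of cache reads at which caching matches standard pricing.
def break_even_warmup(write_mult: float, read_mult: float = 0.10) -> float:
    # Write is a dedicated warm-up call (pure overhead): write + read*N = N
    return write_mult / (1 - read_mult)

def break_even_inline(write_mult: float, read_mult: float = 0.10) -> float:
    # Write rides on the first real call: write + read*N = 1 + N
    return (write_mult - 1) / (1 - read_mult)

print(break_even_warmup(1.25))  # ≈ 1.39 → 2 reads (the conservative formula)
print(break_even_inline(1.25))  # ≈ 0.28 → cheaper from the very first read
print(break_even_inline(2.00))  # 1-hour TTL: ≈ 1.11 → cheaper from the second read
```

Either way, the write premium is amortized almost immediately at realistic call volumes.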
Worked Example: SaaS Chatbot with 100K Calls/Month
System prompt: 2,000 tokens (company context, instructions, persona). Each call also includes user message (200 tokens) + history (300 tokens).
| Scenario | Input tokens/call | Cost/call | Monthly cost (100K calls) |
|---|---|---|---|
| Without caching (Haiku) | 2,500 total input | $0.00250 | $250 |
| With caching (Haiku) — system prompt cached | 2,000 cache read + 500 uncached | $0.000700 | $70 + ~$3 (writes) = $73 |
| Monthly savings | — | — | $177/month (71% saved) |
Cache write overhead depends on how often the 5-minute cache expires between calls. In the best case of continuous traffic (≈100 writes/month), writes add only 100 × 2,000 tokens × $1.25/M = $0.25; the table's ≈$3 estimate conservatively assumes roughly 1,200 re-writes during traffic lulls. Either way, effective savings land around 70–72% after write overhead.
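The table's numbers can be reproduced directly. The ~1,200 monthly cache re-writes below is an assumption chosen to match the table's ≈$3 write estimate; the true count depends on how often the 5-minute cache expires between calls:

```python
# Haiku pricing from the table above, in $/token.
STANDARD = 1.00 / 1_000_000
CACHE_READ = 0.10 / 1_000_000
CACHE_WRITE = 1.25 / 1_000_000

CALLS = 100_000
system_tokens, dynamic_tokens = 2_000, 500  # cached prefix vs per-call input

without = CALLS * (system_tokens + dynamic_tokens) * STANDARD          # ≈ $250
with_cache = CALLS * (system_tokens * CACHE_READ
                      + dynamic_tokens * STANDARD)                     # ≈ $70
writes = 1_200 * system_tokens * CACHE_WRITE  # assumed re-write count # ≈ $3

print(round(without, 2), round(with_cache + writes, 2))
```

Swapping in Sonnet or Opus rates from the pricing table gives the same ~71% savings ratio, since reads are 90% off in every tier.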
When Caching Helps Most
- Large system prompts (500+ tokens): Instructions, persona, product context, examples — anything repeated across all calls
- Document-in-context: The same policy document, knowledge base, or legal text injected into every query
- Agent memory: A persistent memory block representing the agent's accumulated context
- Few-shot examples: A bank of 10–20 labeled examples included in every prompt for consistency
When Caching Does NOT Help
- Dynamic prefixes: If every call has a unique user-specific prefix, there's nothing to cache
- Low-volume endpoints: The 5-minute cache needs a hit roughly every 5 minutes (~12 calls/hour on the same prefix) to stay warm, and the 1-hour cache at least ~1 call/hour; below that, the cache expires before it's reused
- Short system prompts (<1,024 tokens): Anthropic requires at least 1,024 cacheable tokens — short prompts don't qualify
- Output-dominated costs: If you're generating long outputs (reports, content), the savings on cached input tokens are less impactful relative to output cost
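The rules above can be collapsed into a rough go/no-go check. This helper and its thresholds are a sketch of the heuristics just listed, not an official API:

```python
# Rough heuristic: is this prefix worth caching?
def worth_caching(prefix_tokens: int, calls_per_hour: float,
                  ttl_minutes: int = 5, min_tokens: int = 1024) -> bool:
    if prefix_tokens < min_tokens:       # below the minimum cacheable size
        return False
    reads_per_ttl = calls_per_hour * ttl_minutes / 60
    return reads_per_ttl >= 1            # need at least one hit before expiry

print(worth_caching(2_000, 30))   # busy chatbot → True
print(worth_caching(2_000, 2))    # 2 calls/hour with a 5-min TTL → False
print(worth_caching(800, 100))    # prefix too short to cache → False
```

Dynamic prefixes fail this check implicitly: if no two calls share the prefix, the effective calls-per-hour on any one prefix is near zero.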
TTL Options: 5-Minute vs 1-Hour Cache
Anthropic offers two cache durations:
- Ephemeral (5 minutes): Writes carry a 25% premium. Best for high-frequency chatbots where the same user sends several messages in quick succession
- 1-hour cache: Writes cost 2× standard input. Best for document Q&A sessions where context must persist across a longer working session
For most chatbots, ephemeral (5-min) caching is sufficient — users rarely take more than 5 minutes between messages. Use 1-hour for document analysis workflows.
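That guidance reduces to the expected gap between calls that reuse the prefix. A rule-of-thumb chooser (this sketch's own labels, not official API values):

```python
# Pick a cache TTL from the expected gap between calls sharing the prefix.
def recommend_ttl(expected_gap_minutes: float) -> str:
    if expected_gap_minutes <= 5:
        return "ephemeral-5min"  # cache stays warm between calls anyway
    if expected_gap_minutes <= 60:
        return "1-hour"          # a 5-min cache would expire and re-write each time
    return "no-cache"            # even the 1-hour cache expires between calls

print(recommend_ttl(1))    # rapid chat turns → ephemeral-5min
print(recommend_ttl(20))   # document-analysis session → 1-hour
print(recommend_ttl(180))  # sporadic traffic → no-cache
```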
See Your Prompt Caching Savings
Enter your system prompt size and monthly call volume to calculate exact savings with caching.