What Is a Context Window?
LLM Memory Limits and Cost Implications Explained
A context window is the maximum number of tokens an AI model can process in a single request — its working memory. This guide compares context window limits across production models, explains why context size drives cost, and shows how to work within or around those limits. Last verified: 2026-04-01.
What Is a Context Window?
Think of the context window as the AI model's working memory for a single conversation or task. It encompasses everything the model can "see" and reason over at once:
- Your system prompt (instructions and persona)
- The full conversation history (all previous messages in a chat)
- Any documents or data you inject (via RAG or direct paste)
- The current user message
- The model's response (which also consumes tokens)
When the total exceeds the context window limit, the request fails — or older content gets truncated, degrading quality.
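A rough pre-flight check catches this before the request fails. The sketch below assumes the common ~4-characters-per-token rule of thumb for English prose; for exact counts, use your provider's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def fits_in_context(system_prompt: str, history: list[str],
                    documents: list[str], user_message: str,
                    max_output_tokens: int, context_limit: int) -> bool:
    """Check that every component of the request, plus the budget
    reserved for the model's response, fits in the context window."""
    total = (estimate_tokens(system_prompt)
             + sum(estimate_tokens(m) for m in history)
             + sum(estimate_tokens(d) for d in documents)
             + estimate_tokens(user_message)
             + max_output_tokens)  # the response consumes tokens too
    return total <= context_limit
```

Note that the output budget counts against the limit: a 127K-token input with a 4K-token `max_output_tokens` will not fit in a 128K window.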
Context Window Sizes by Model (2026)
| Model | Context window | Approx. pages of text | Max document size |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | 1M tokens | ~750 pages | Full codebase / book-length |
| Gemini 2.5 Flash | 1M tokens | ~750 pages | Full codebase / book-length |
| Gemini 2.5 Pro | 1M tokens | ~750 pages | Full codebase / book-length |
| Claude Haiku 4.5 | 200K tokens | ~150 pages | Long reports, mid-size codebases |
| Claude Sonnet 4.6 | 200K tokens | ~150 pages | Long reports, mid-size codebases |
| Claude Opus 4.6 | 200K tokens | ~150 pages | Long reports, mid-size codebases |
| GPT-5.4 nano | 128K tokens | ~96 pages | Short to medium documents |
| GPT-5.4 mini | 128K tokens | ~96 pages | Short to medium documents |
| GPT-5.4 | 1M tokens | ~750 pages | Full codebase / book-length |
| Mistral Small 3.2 | 128K tokens | ~96 pages | Short to medium documents |
Context Window vs Use Case
| Use Case | Tokens needed | Minimum context window | Which models work |
|---|---|---|---|
| Chatbot (5 turns) | ~3,500 | Any model | All models |
| 10-page PDF analysis | ~8,000 | 8K+ | All models |
| 50-page report | ~40,000 | 40K+ | All models (well within any) |
| 100-page report | ~80,000 | 80K+ | All (at 63% of 128K — approaching GPT/Mistral limit) |
| Full legal contract review (200 pages) | ~150,000 | 150K+ | Claude (200K) ✓, Gemini (1M) ✓ — GPT nano/mini ✗ |
| Full codebase (1,000 files) | ~500,000 | 500K+ | Gemini 2.5 Flash/Pro, GPT-5.4 (1M) only |
| Book-length analysis | ~400,000 | 400K+ | Gemini 2.5 Flash/Pro, GPT-5.4 only |
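The fit check in the table can be expressed as a small helper. The catalog below mirrors the context sizes listed above; the 10% headroom default is an illustrative safety margin for the system prompt and response, not a provider recommendation.

```python
# Context window sizes (tokens) from the comparison table above.
MODEL_CONTEXT = {
    "Gemini 2.5 Flash": 1_000_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Claude Haiku 4.5": 200_000,
    "Claude Sonnet 4.6": 200_000,
    "GPT-5.4 nano": 128_000,
    "GPT-5.4": 1_000_000,
    "Mistral Small 3.2": 128_000,
}

def models_that_fit(tokens_needed: int, headroom: float = 0.1) -> list[str]:
    """Return models whose window covers the request plus a safety
    margin (default 10%) for the system prompt and the response."""
    required = int(tokens_needed * (1 + headroom))
    return [name for name, limit in MODEL_CONTEXT.items() if limit >= required]
```

For the 200-page contract review (~150K tokens), this returns the 200K and 1M models but excludes the 128K ones, matching the table.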
Context Window and Cost: The Relationship
A larger context window doesn't change your per-token price — but it changes how much you can spend per request. Sending 100K tokens of document context to Claude Sonnet 4.6 costs $0.30 just for the input, before any output.
In practice:
- Large context = large input cost — a 100K-token document at $3/M = $0.30/call in input alone
- Chatbot context grows with turns — a 30-turn conversation may accumulate 15K+ input tokens from history
- RAG limits context cost — instead of sending full documents, retrieve only the 3–5 relevant chunks (~2,000 tokens) via vector search
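The input-cost arithmetic above is one line of code:

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost in dollars for a single request."""
    return tokens / 1_000_000 * price_per_million

# 100K-token document sent to a $3/M-input model (e.g. Claude Sonnet 4.6):
cost = input_cost(100_000, 3.0)  # -> 0.30 per call, before any output
```

At 1,000 calls a day, that single document is $300/day in input tokens alone — which is why the strategies below focus on shrinking what you send.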
Context Window Strategies
1. Truncate conversation history
For chatbots, keep only the last N turns (3–5) in the context. Older turns rarely affect answer quality for most use cases, and keeping them makes input cost grow with every turn.
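A minimal sketch of this strategy, assuming the common messages-as-dicts chat format where each turn is one user message plus one assistant reply:

```python
def truncate_history(messages: list[dict], keep_turns: int = 4) -> list[dict]:
    """Keep the system prompt (if present) plus the last N turns;
    older turns are dropped to cap per-request input tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # One turn = one user message + one assistant reply (2 messages).
    return system + rest[-keep_turns * 2:]
```

The system prompt is always retained: dropping it would change the model's instructions, not just its memory of the conversation.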
2. Use RAG instead of full-document injection
Rather than injecting 50 pages into the context, use embeddings to retrieve the 3–5 most relevant passages (~2,000 tokens). This keeps context small, cost low, and often improves relevance vs. overwhelming the model with noise.
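The retrieval step reduces to ranking chunks by similarity to the query. The sketch below assumes embeddings have already been computed (by whichever embedding model you use) and ranks with plain cosine similarity; production systems typically use a vector database for this.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec: list[float],
                 chunks: list[tuple[str, list[float]]],
                 k: int = 3) -> list[str]:
    """chunks: (text, precomputed_embedding) pairs.
    Return the k chunk texts most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Only the returned k chunks (roughly 2,000 tokens total) go into the prompt, instead of the full 50 pages.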
3. Match model to document size
Don't use Claude Sonnet 4.6 ($3/M) for short chatbot turns — use Claude Haiku 4.5 ($1/M) or GPT-5.4 nano ($0.20/M). Reserve large-context models for tasks that actually need it.
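This routing rule can be a simple dispatch on estimated input size. The thresholds and model names below are illustrative assumptions, not provider guidance; tune them against your own quality requirements.

```python
def pick_model(estimated_input_tokens: int) -> str:
    """Route requests to the cheapest model whose context comfortably
    covers the input; thresholds leave headroom below each window."""
    if estimated_input_tokens < 8_000:
        return "GPT-5.4 nano"      # $0.20/M input, 128K context
    if estimated_input_tokens < 180_000:
        return "Claude Haiku 4.5"  # $1/M input, 200K context
    return "Gemini 2.5 Flash"      # 1M context for very large inputs
```

A short chatbot turn (~3,500 tokens) routes to the cheapest tier; a full contract (~150K tokens) routes to a 200K-window model; only truly large inputs pay for a 1M-window model.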
4. Prompt caching for large repeated contexts
Claude's prompt caching lets you pay 90% less for re-reading the same context. If you inject the same 10,000-token document into every call for a given user session, caching that prefix at $0.10/M (vs $1.00/M uncached on Haiku) saves $0.009 per call — significant at high volume.
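The savings arithmetic, using the Haiku prices quoted above:

```python
def caching_savings_per_call(cached_tokens: int,
                             base_price_per_m: float,
                             cache_read_price_per_m: float) -> float:
    """Dollars saved per call by reading a cached prefix instead of
    paying the full input price for those tokens."""
    return cached_tokens / 1_000_000 * (base_price_per_m - cache_read_price_per_m)

# 10K-token prefix on Haiku: $1.00/M uncached vs $0.10/M cache read.
saved = caching_savings_per_call(10_000, 1.00, 0.10)  # -> 0.009 per call
```

At 100K calls/month against the same cached document, that is roughly $900/month saved on that prefix alone. (Cache writes typically cost more than plain input, so caching pays off only when the prefix is reused enough times — check your provider's pricing for the exact write surcharge and cache lifetime.)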
Calculate Your Context Cost
See exactly what your document size or conversation length will cost across all major models.
AI API Cost Calculator