
What Is a Context Window?
LLM Memory Limits and Cost Implications Explained

A context window is the maximum number of tokens an AI model can process in a single request — its working memory. This guide explains context window limits across production models, why they affect cost, and how to work within or around them. Last verified: 2026-04-01.

8 min read · Updated April 2026
Context Window Sizes — Production Models 2026

  • Gemini 2.5 Flash / Pro: 1M tokens
  • Claude 4.6 family: 200K tokens
  • GPT-5.4 nano/mini: 128K tokens
  • 1M tokens ≈ 750 pages of text

What Is a Context Window?

Think of the context window as the AI model's working memory for a single conversation or task. It encompasses everything the model can "see" and reason over at once:

  • Your system prompt (instructions and persona)
  • The full conversation history (all previous messages in a chat)
  • Any documents or data you inject (via RAG or direct paste)
  • The current user message
  • The model's response (which also consumes tokens)

When the total exceeds the context window limit, the request fails — or older content gets truncated, degrading quality.
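The budgeting above can be sketched in a few lines. This is a rough illustration, not any vendor's API: the 4-characters-per-token heuristic, `request_tokens`, and `fits` are all made up here (real tokenizers such as tiktoken or SentencePiece give exact counts).

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def request_tokens(system_prompt, history, documents, user_message, max_output):
    """Sum every component that occupies the context window.

    history and documents are lists of strings; max_output is the
    reserved response budget, which counts against the window too.
    """
    parts = [system_prompt, user_message] + history + documents
    input_tokens = sum(estimate_tokens(p) for p in parts)
    return input_tokens + max_output

def fits(total_tokens: int, window: int = 128_000) -> bool:
    """True if the request stays inside the model's context window."""
    return total_tokens <= window
```

Running the check before sending the request lets you truncate or retrieve less context instead of getting a hard failure from the API.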

Context Window Sizes by Model (2026)

| Model | Context window | Approx. pages of text | Max document size |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | 1M tokens | ~750 pages | Full codebase / book-length |
| Gemini 2.5 Flash | 1M tokens | ~750 pages | Full codebase / book-length |
| Gemini 2.5 Pro | 1M tokens | ~750 pages | Full codebase / book-length |
| Claude Haiku 4.5 | 200K tokens | ~150 pages | Long reports, mid-size codebases |
| Claude Sonnet 4.6 | 200K tokens | ~150 pages | Long reports, mid-size codebases |
| Claude Opus 4.6 | 200K tokens | ~150 pages | Long reports, mid-size codebases |
| GPT-5.4 nano | 128K tokens | ~96 pages | Short to medium documents |
| GPT-5.4 mini | 128K tokens | ~96 pages | Short to medium documents |
| GPT-5.4 | 1M tokens | ~750 pages | Full codebase / book-length |
| Mistral Small 3.2 | 128K tokens | ~96 pages | Short to medium documents |

Context Window vs Use Case

| Use case | Tokens needed | Minimum context window | Which models work |
|---|---|---|---|
| Chatbot (5 turns) | ~3,500 | Any | All models |
| 10-page PDF analysis | ~8,000 | 8K+ | All models |
| 50-page report | ~40,000 | 40K+ | All models (well within any) |
| 100-page report | ~80,000 | 80K+ | All (at 63% of 128K, approaching GPT/Mistral limit) |
| Full legal contract review (200 pages) | ~150,000 | 150K+ | Claude (200K) ✓, Gemini (1M) ✓, GPT nano/mini ✗ |
| Full codebase (1,000 files) | ~500,000 | 500K+ | Gemini 2.5 Flash/Pro, GPT-5.4 (1M) only |
| Book-length analysis | ~400,000 | 400K+ | Gemini 2.5 Flash/Pro, GPT-5.4 only |

Context Window and Cost: The Relationship

A larger context window doesn't change your per-token price — but it changes how much you can spend per request. Sending 100K tokens of document context to Claude Sonnet 4.6 costs $0.30 just for the input, before any output.

In practice:

  • Large context = large input cost — a 100K-token document at $3/M = $0.30/call in input alone
  • Chatbot context grows with turns — a 30-turn conversation may accumulate 15K+ input tokens from history
  • RAG limits context cost — instead of sending full documents, retrieve only the 3–5 relevant chunks (~2,000 tokens) via vector search
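The input-cost arithmetic above is just tokens times price. A one-line sketch (the function name is illustrative):

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost in dollars for a single request."""
    return tokens / 1_000_000 * price_per_million

# 100K-token document at $3/M input (the Claude Sonnet 4.6 example above)
# costs $0.30 per call before any output tokens.
```

The same function shows why RAG pays off: 2,000 retrieved tokens at $3/M is $0.006 per call, a 50x reduction versus the full document.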

Context Window Strategies

1. Truncate conversation history

For chatbots, keep only the last N turns (3–5) in the context. For most use cases, older turns don't affect answer quality — and keeping them adds cost that grows with every turn.
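A sliding-window truncation can be as simple as a list slice. A minimal sketch, assuming the common message format of `{"role": ..., "content": ...}` dicts where one turn is a user message plus an assistant reply (the helper name is made up here):

```python
def truncate_history(messages: list[dict], keep_turns: int = 4) -> list[dict]:
    """Keep only the most recent N user/assistant turns.

    One turn = one user message + one assistant reply, so we keep
    the last keep_turns * 2 messages.
    """
    return messages[-keep_turns * 2:]
```

In practice you would keep the system prompt out of this list and prepend it to every request, so it is never truncated away.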

2. Use RAG instead of full-document injection

Rather than injecting 50 pages into the context, use embeddings to retrieve the 3–5 most relevant passages (~2,000 tokens). This keeps context small, cost low, and often improves relevance vs. overwhelming the model with noise.
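The retrieval step boils down to ranking chunk embeddings by similarity to the query embedding. A toy sketch with hand-made 2-D vectors standing in for real embeddings (a production system would use an embedding model and a vector database; `top_k` is an illustrative helper, not a library call):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks most similar to the query embedding."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]
```

Only the top-k chunks (a few thousand tokens) go into the prompt, instead of the whole document.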

3. Match model to document size

Don't use Claude Sonnet 4.6 ($3/M) for short chatbot turns — use Claude Haiku 4.5 ($1/M) or GPT-5.4 nano ($0.20/M). Reserve large-context models for tasks that actually need it.
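Routing by task size can be a cheapest-first lookup. A sketch using the windows and the nano/Haiku prices quoted in the text; the Gemini price is deliberately left as `None` because this article doesn't state it, and the tier list itself is illustrative:

```python
# (model, context window in tokens, $/M input) — ordered cheapest first.
TIERS = [
    ("GPT-5.4 nano", 128_000, 0.20),
    ("Claude Haiku 4.5", 200_000, 1.00),
    ("Gemini 2.5 Flash", 1_000_000, None),  # price not given in the text
]

def pick_model(input_tokens: int) -> str:
    """Return the cheapest model whose window covers the task."""
    for name, window, _price in TIERS:
        if input_tokens <= window:
            return name
    raise ValueError("task exceeds every context window")
```

Short chatbot turns land on the nano tier; only genuinely large contexts pay for a 1M-token model.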

4. Prompt caching for large repeated contexts

Claude's prompt caching lets you pay 90% less for re-reading the same context. If you inject the same 10,000-token document into every call for a given user session, caching that prefix at $0.10/M (vs $1.00/M uncached on Haiku) saves $0.009 per call — significant at high volume.
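The savings arithmetic generalizes to any prefix size and price pair. A quick sketch reproducing the Haiku numbers from the paragraph above (the function name is illustrative):

```python
def cache_savings_per_call(prefix_tokens: int,
                           base_price_per_m: float,
                           cached_price_per_m: float) -> float:
    """Dollars saved per call by reading a cached prefix
    instead of paying full input price for it."""
    return prefix_tokens * (base_price_per_m - cached_price_per_m) / 1_000_000

# 10,000-token prefix: $1.00/M uncached vs $0.10/M cached = $0.009 saved/call.
saving = cache_savings_per_call(10_000, 1.00, 0.10)
```

At a million calls, that $0.009 per call is $9,000 — which is why caching matters mainly at volume.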

Calculate Your Context Cost

See exactly what your document size or conversation length will cost across all major models.

AI API Cost Calculator