What Is Token Pricing?
How AI APIs Charge Per Token Explained

Token pricing is the standard billing model for AI language model APIs. You pay per token — a unit of text roughly equal to ¾ of a word — for both text sent to the model (input) and text the model returns (output). This guide explains exactly how it works and how to estimate costs. Last verified: 2026-04-01.

7 min read · Updated April 2026
Token Pricing Quick Reference (2026)

  • $0.10/M: cheapest input (Gemini 2.5 Flash-Lite, Mistral Small 3.2)
  • $5.00/M: mid-range output (Claude Haiku 4.5)
  • $25/M: premium output (Claude Opus 4.6)
  • ~750 words ≈ 1,000 tokens

What Is a Token?

A token is the basic unit of text that language models process. Tokenization splits text into subword units — not always clean word boundaries. Key rules of thumb:

  • 1 token ≈ 4 characters of English text
  • 1 token ≈ ¾ of a word (or ~0.75 words)
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words ≈ 1.5 pages of text
  • Code tokenizes densely — special characters, indentation, and symbols are often individual tokens

Examples: "Hello" = 1 token. "I love natural language processing" ≈ 5–7 tokens, depending on the tokenizer. A 3-page PDF ≈ 2,000 tokens.

How Token Pricing Works

All major AI APIs charge separately for input tokens (what you send) and output tokens (what the model generates). The formula is simple:

cost = (input_tokens / 1,000,000) × input_price_per_million
     + (output_tokens / 1,000,000) × output_price_per_million
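
The formula maps directly onto a small helper; a minimal sketch (the function name and rates used in the example are ours, not any provider SDK's):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 350 input + 200 output tokens at Haiku 4.5 rates ($1.00 / $5.00 per M)
print(token_cost(350, 200, 1.00, 5.00))  # 0.00135
```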

Input vs Output Pricing: The Key Asymmetry

Output tokens almost always cost more than input tokens, typically 3–6× more, because generating text requires more compute than processing it:

| Model | Input price/1M | Output price/1M | Output multiplier |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 4× |
| Mistral Small 3.2 | $0.10 | $0.30 | 3× |
| GPT-5.4 nano | $0.20 | $1.25 | 6.25× |
| Claude Haiku 4.5 | $1.00 | $5.00 | 5× |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 5× |
| Claude Opus 4.6 | $5.00 | $25.00 | 5× |
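
The multiplier column is simply output price divided by input price; a quick sketch over a few of the rates above:

```python
# (input $/M, output $/M) for a few models from the table
models = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-5.4 nano": (0.20, 1.25),
    "Claude Opus 4.6": (5.00, 25.00),
}

for name, (inp, out) in models.items():
    # Output multiplier = how many times more an output token costs than an input token
    print(f"{name}: {out / inp:g}x")
```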

What Counts as Input Tokens?

Everything you send to the model in a single API call counts as input:

  • System prompt — instructions, persona, context rules
  • Conversation history — all previous turns (in chat applications)
  • User message — the current user input
  • Retrieved context — documents injected via RAG
  • Tool results — outputs from function/tool calls returned to the model

In multi-turn chatbots, the conversation history grows with each turn, so each successive message costs more than the previous one: turn 10's input can be 10× larger than turn 1's.

Worked Example: Customer Support Chatbot

| Turn | Input tokens | Output tokens | Cost (Haiku 4.5) |
|---|---|---|---|
| Turn 1 (system + user) | 350 | 200 | $0.00135 |
| Turn 2 (+ history) | 700 | 200 | $0.00170 |
| Turn 5 (+ growing history) | 1,750 | 200 | $0.00275 |
| Turn 10 | 3,500 | 200 | $0.00450 |
| Total (10-turn conversation) | 19,250 | 2,000 | $0.0293 |

A 10-turn conversation costs $0.029 on Haiku 4.5, more than double the $0.0135 (10 × $0.00135) that a naive per-message estimate would suggest.
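
The conversation total can be checked with a short sketch (the 350-tokens-per-turn growth rate is taken from the worked example; real histories grow unevenly):

```python
# Haiku 4.5 rates, $ per million tokens
INPUT_PRICE, OUTPUT_PRICE = 1.00, 5.00

total_input = sum(350 * turn for turn in range(1, 11))  # history grows ~350 tokens/turn
total_output = 200 * 10                                 # 200 output tokens per turn
total_cost = (total_input / 1e6) * INPUT_PRICE + (total_output / 1e6) * OUTPUT_PRICE

print(total_input, total_output)  # 19250 2000, as in the worked example
print(total_cost)                 # ≈ 0.0293 dollars
```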

Special Token Pricing: Caching and Batch

| Pricing type | How it works | Discount vs standard | When to use |
|---|---|---|---|
| Prompt caching (Anthropic) | Cache a repeated prefix; pay 90% less on cache reads | 90% off | Large system prompts reused across calls |
| Batch API (Anthropic & OpenAI) | Submit async jobs; results in <24h | 50% off | Non-realtime bulk processing |
| Standard (default) | Real-time inference, full price | — | Interactive applications |
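
The caching discount compounds with the cost formula; a sketch assuming Anthropic-style cache reads billed at 10% of the standard input rate (model, rates, and token counts here are illustrative):

```python
INPUT_PRICE = 3.00            # e.g. Sonnet 4.6, $ per million input tokens
CACHE_READ_MULTIPLIER = 0.10  # cache reads cost 90% less than standard input

system_prompt_tokens = 10_000  # large shared prefix, reused on every call
calls = 1_000

standard = calls * (system_prompt_tokens / 1e6) * INPUT_PRICE
cached = standard * CACHE_READ_MULTIPLIER

# Roughly $30 vs $3 for the shared prefix across 1,000 calls
print(standard, cached)
```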

How to Estimate Your Token Usage

  1. Use a tokenizer — OpenAI's tiktoken and Anthropic's tokenizer count exact tokens for your prompts
  2. Rule of thumb — 1 token ≈ 4 chars. Count characters, divide by 4
  3. Log real usage — Every API response includes token counts in the response metadata; log these for accurate actuals
  4. Watch for context accumulation — In chat apps, total tokens per session grows with each turn
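
Step 2 can be sketched as a stdlib-only helper; for exact counts use a real tokenizer such as tiktoken, since this chars ÷ 4 heuristic is only an estimate:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the 1 token ≈ 4 characters rule of thumb."""
    return max(1, round(len(text) / 4))

prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))  # 13 (53 characters / 4, rounded)
```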

Why Does Token Pricing Vary So Much Between Models?

Token price reflects model capability and size. Larger, more capable models require more compute per token:

  • Reasoning quality — Opus 4.6 ($5/$25) produces substantially better analysis than Flash-Lite ($0.10/$0.40)
  • Model size — Larger parameter counts = more GPU time = higher cost
  • Provider economics — Google can price Gemini Flash-Lite aggressively due to infrastructure scale; models also compete on price
  • Market positioning — Budget models (nano, Flash-Lite) are priced to win high-volume workloads; frontier models charge a premium for quality

Calculate Exact Token Costs for Your Workload

Enter your input/output token counts and model to get a precise monthly cost estimate.
