What Is Token Pricing?
How AI APIs Charge Per Token Explained
Token pricing is the standard billing model for AI language model APIs. You pay per token — a unit of text roughly equal to ¾ of a word — for both text sent to the model (input) and text the model returns (output). This guide explains exactly how it works and how to estimate costs. Last verified: 2026-04-01.
What Is a Token?
A token is the basic unit of text that language models process. Tokenization splits text into subword units — not always clean word boundaries. Key rules of thumb:
- 1 token ≈ 4 characters of English text
- 1 token ≈ ¾ of a word (or ~0.75 words)
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words ≈ 1.5 pages of text
- Code tokenizes densely — special characters, indentation, and symbols are often individual tokens
Examples (exact counts vary by tokenizer): "Hello" = 1 token; "I love natural language processing" ≈ 5–7 tokens; a 3-page PDF ≈ 2,000 tokens.
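The rules of thumb above translate directly into a quick estimator. A minimal sketch (the function names are illustrative, not from any SDK; for exact counts use a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the 1 token ≈ 4 characters heuristic."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate using the 1 word ≈ 0.75 tokens-per-word... i.e. 1 token ≈ 3/4 word."""
    return round(word_count / 0.75)

print(estimate_tokens("Hello"))         # → 1
print(estimate_tokens_from_words(750))  # → 1000 (≈ 750 words, per the rule above)
```

These heuristics are fine for budgeting; for billing-accurate counts, run your actual prompts through the provider's tokenizer.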
How Token Pricing Works
All major AI APIs charge separately for input tokens (what you send) and output tokens (what the model generates). The formula is simple: total cost = (input tokens ÷ 1,000,000) × input price per 1M + (output tokens ÷ 1,000,000) × output price per 1M.
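That formula, as a small sketch in Python (the helper name is illustrative; the example prices are Claude Sonnet 4.6's $3/$15 per 1M from the table below):

```python
def api_call_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """cost = input_tokens/1M * input_price + output_tokens/1M * output_price"""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 2,000 input + 500 output tokens at $3/$15 per 1M
cost = api_call_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # → $0.0135
```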
Input vs Output Pricing: The Key Asymmetry
Output tokens almost always cost more than input tokens, typically 3–5× more. This is because generation is inherently sequential: input tokens can be processed in parallel, while output tokens must be produced one at a time, each requiring its own forward pass:
| Model | Input price/1M | Output price/1M | Output multiplier |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 4× |
| Mistral Small 3.2 | $0.10 | $0.30 | 3× |
| GPT-5.4 nano | $0.20 | $1.25 | 6.25× |
| Claude Haiku 4.5 | $1.00 | $5.00 | 5× |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 5× |
| Claude Opus 4.6 | $5.00 | $25.00 | 5× |
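To see how wide the spread is, the same workload can be priced across a few rows of the table above (prices copied from the table; the dictionary and function are illustrative):

```python
# (input $/1M, output $/1M), from the pricing table above
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-5.4 nano":          (0.20, 1.25),
    "Claude Haiku 4.5":      (1.00, 5.00),
    "Claude Opus 4.6":       (5.00, 25.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Same job (1M input + 1M output tokens) on every model:
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 1_000_000, 1_000_000):.2f}")
# → Flash-Lite $0.50, nano $1.45, Haiku 4.5 $6.00, Opus 4.6 $30.00
```

A 60× price gap between the cheapest and most expensive rows is why model selection is usually the single biggest cost lever.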
What Counts as Input Tokens?
Everything you send to the model in a single API call counts as input:
- System prompt — instructions, persona, context rules
- Conversation history — all previous turns (in chat applications)
- User message — the current user input
- Retrieved context — documents injected via RAG
- Tool results — outputs from function/tool calls returned to the model
In multi-turn chatbots, the conversation history grows with each turn, meaning each successive message costs more than the previous one. By turn 10, the input can be 10× larger than on turn 1, with total conversation cost growing roughly quadratically in the number of turns.
Worked Example: Customer Support Chatbot
| Turn | Input tokens | Output tokens | Cost (Haiku 4.5) |
|---|---|---|---|
| Turn 1 (system + user) | 350 | 200 | $0.00135 |
| Turn 2 (+ history) | 700 | 200 | $0.00170 |
| Turn 5 (+ growing history) | 1,750 | 200 | $0.00275 |
| Turn 10 | 3,500 | 200 | $0.00450 |
| Total (10-turn conversation) | 19,250 | 2,000 | $0.0293 |
A 10-turn conversation costs $0.029 on Haiku 4.5, more than twice the $0.0135 ($0.00135 × 10) that a naive per-message estimate would suggest.
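The table's totals can be reproduced in a few lines (token counts and Haiku 4.5 prices are the ones used above; the linear-growth assumption of 350 input tokens per turn of history matches the table):

```python
# Per-turn input grows linearly because the full history is resent each turn.
INPUT_PRICE, OUTPUT_PRICE = 1.00, 5.00   # Haiku 4.5, $ per 1M tokens
BASE_INPUT, OUTPUT_PER_TURN = 350, 200   # tokens, per the table above

total_input = 0
total_cost = 0.0
for turn in range(1, 11):
    input_tokens = BASE_INPUT * turn     # turn 1: 350, turn 2: 700, ... turn 10: 3,500
    total_input += input_tokens
    total_cost += (input_tokens * INPUT_PRICE
                   + OUTPUT_PER_TURN * OUTPUT_PRICE) / 1e6

print(total_input)             # → 19250 input tokens
print(f"${total_cost:.5f}")    # → $0.02925
```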
Special Token Pricing: Caching and Batch
| Pricing type | How it works | Discount vs standard | When to use |
|---|---|---|---|
| Prompt caching (Anthropic) | Cache a repeated prompt prefix; cache reads are billed at a 90% discount (cache writes cost slightly more than standard input) | 90% off reads | Large system prompts reused across calls |
| Batch API (Anthropic & OpenAI) | Submit asynchronous jobs; results return within 24 hours | 50% off | Non-realtime bulk processing |
| Standard (default) | Real-time inference, full price | — | Interactive applications |
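A sketch of how the two discounts compare on a prompt-heavy job. The workload numbers here are hypothetical (10,000-token reusable system prompt, 500 fresh input tokens and 300 output tokens per call, at Haiku 4.5 prices), and for simplicity the cache write is billed at the standard input rate, though in practice providers charge a small premium for cache writes:

```python
INPUT, OUTPUT = 1.00, 5.00      # Haiku 4.5, $ per 1M tokens
CACHE_READ = INPUT * 0.10       # cache reads at 90% off
PREFIX, FRESH, OUT = 10_000, 500, 300   # hypothetical tokens per call

def standard_cost(calls: int) -> float:
    return calls * ((PREFIX + FRESH) * INPUT + OUT * OUTPUT) / 1e6

def cached_cost(calls: int) -> float:
    # First call populates the cache (billed here at the standard input
    # rate, a simplification); later calls read the prefix at 90% off.
    first = ((PREFIX + FRESH) * INPUT + OUT * OUTPUT) / 1e6
    rest = (calls - 1) * ((PREFIX * CACHE_READ + FRESH * INPUT)
                          + OUT * OUTPUT) / 1e6
    return first + rest

def batch_cost(calls: int) -> float:
    return standard_cost(calls) * 0.50   # flat 50% off, asynchronous

for name, fn in [("standard", standard_cost), ("cached", cached_cost),
                 ("batch", batch_cost)]:
    print(f"{name}: ${fn(1000):.2f}")
# → standard $12.00, cached $3.01, batch $6.00
```

For this prompt shape, caching beats batching because nearly all the spend is in the repeated prefix; for prompts dominated by fresh input or output, batching wins.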
How to Estimate Your Token Usage
- Use a tokenizer — OpenAI's tiktoken and Anthropic's tokenizer count exact tokens for your prompts
- Rule of thumb — 1 token ≈ 4 chars. Count characters, divide by 4
- Log real usage — Every API response includes token counts in the response metadata; log these for accurate actuals
- Watch for context accumulation — In chat apps, total tokens per session grows with each turn
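Logging real usage is worth automating. Provider SDKs expose per-response token counts under slightly different field names (for example `usage.input_tokens` and `usage.output_tokens` in Anthropic's API, `usage.prompt_tokens` and `usage.completion_tokens` in OpenAI's); this illustrative tracker just takes the counts as plain integers so it works with any of them:

```python
class UsageTracker:
    """Accumulates the per-call token counts reported in API response metadata."""

    def __init__(self, input_price_per_m: float, output_price_per_m: float):
        self.input_tokens = 0
        self.output_tokens = 0
        self.prices = (input_price_per_m, output_price_per_m)

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def cost(self) -> float:
        return (self.input_tokens * self.prices[0]
                + self.output_tokens * self.prices[1]) / 1e6

tracker = UsageTracker(1.00, 5.00)   # Haiku 4.5 prices
tracker.record(350, 200)             # counts taken from each response's usage field
tracker.record(700, 200)
print(f"${tracker.cost():.5f}")      # → $0.00305
```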
Why Does Token Pricing Vary So Much Between Models?
Token price reflects model capability and size. Larger, more capable models require more compute per token:
- Reasoning quality — Opus 4.6 ($5/$25) produces substantially better analysis than Flash-Lite ($0.10/$0.40)
- Model size — Larger parameter counts = more GPU time = higher cost
- Provider economics — Google can price Gemini Flash-Lite aggressively due to infrastructure scale; models also compete on price
- Market positioning — Budget models (nano, Flash-Lite) are priced to win high-volume workloads; frontier models charge a premium for quality
Calculate Exact Token Costs for Your Workload
Enter your input/output token counts and model to get a precise monthly cost estimate.
AI API Cost Calculator