What Is Token Pricing?
How AI APIs Charge Per Token Explained

Token pricing is the standard billing model for AI language model APIs. You pay per token — a unit of text roughly equal to ¾ of a word — for both text sent to the model (input) and text the model returns (output). This guide explains exactly how it works and how to estimate costs. Last verified: 2026-04-01.

7 min read · Updated April 2026
Token Pricing Quick Reference (2026)

  • $0.10/M: cheapest input (Gemini 2.5 Flash-Lite, Mistral Small 3.2)
  • $5.00/M: mid-range output (Claude Haiku 4.5)
  • $25/M: premium output (Claude Opus 4.6)
  • ~750 words ≈ 1,000 tokens

What Is a Token?

A token is the basic unit of text that language models process. Tokenization splits text into subword units — not always clean word boundaries. Key rules of thumb:

  • 1 token ≈ 4 characters of English text
  • 1 token ≈ ¾ of a word (or ~0.75 words)
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words ≈ 1.5 pages of text
  • Code tokenizes densely — special characters, indentation, and symbols are often individual tokens

Examples: "Hello" = 1 token. "I love natural language processing" ≈ 5–7 tokens, depending on the tokenizer. A 3-page PDF ≈ 2,000 tokens.

How Token Pricing Works

All major AI APIs charge separately for input tokens (what you send) and output tokens (what the model generates). The formula is simple:

cost = (input_tokens / 1,000,000) × input_price_per_million
     + (output_tokens / 1,000,000) × output_price_per_million
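
The formula maps directly onto a small helper; a minimal sketch (the function name and rates used in the example are ours, not any provider SDK's):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 350 input + 200 output tokens at Haiku 4.5 rates ($1.00 / $5.00 per M)
print(token_cost(350, 200, 1.00, 5.00))  # 0.00135
```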

Input vs Output Pricing: The Key Asymmetry

Output tokens almost always cost more than input tokens, typically 3–6× more, because generating text requires more compute than processing it:

| Model | Input price/1M | Output price/1M | Output multiplier |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 4× |
| Mistral Small 3.2 | $0.10 | $0.30 | 3× |
| GPT-5.4 nano | $0.20 | $1.25 | 6.25× |
| Claude Haiku 4.5 | $1.00 | $5.00 | 5× |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 5× |
| Claude Opus 4.6 | $5.00 | $25.00 | 5× |
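
The multiplier column is simply output price divided by input price; a quick sketch over a few of the rates above:

```python
# (input $/M, output $/M) for a few models from the table
models = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-5.4 nano": (0.20, 1.25),
    "Claude Opus 4.6": (5.00, 25.00),
}

for name, (inp, out) in models.items():
    # Output multiplier = how many times more an output token costs than an input token
    print(f"{name}: {out / inp:g}x")
```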

What Counts as Input Tokens?

Everything you send to the model in a single API call counts as input:

  • System prompt — instructions, persona, context rules
  • Conversation history — all previous turns (in chat applications)
  • User message — the current user input
  • Retrieved context — documents injected via RAG
  • Tool results — outputs from function/tool calls returned to the model

In multi-turn chatbots, the conversation history grows with each turn, so each successive message costs more than the previous one: turn 10's input can be 10× larger than turn 1's.

Worked Example: Customer Support Chatbot

| Turn | Input tokens | Output tokens | Cost (Haiku 4.5) |
|---|---|---|---|
| Turn 1 (system + user) | 350 | 200 | $0.00135 |
| Turn 2 (+ history) | 700 | 200 | $0.00170 |
| Turn 5 (+ growing history) | 1,750 | 200 | $0.00275 |
| Turn 10 | 3,500 | 200 | $0.00450 |
| Total (10-turn conversation) | 19,250 | 2,000 | $0.0293 |

A 10-turn conversation costs $0.029 on Haiku 4.5, more than double the $0.0135 (10 × $0.00135) that a naive per-message estimate would suggest.
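
The conversation total can be checked with a short sketch (the 350-tokens-per-turn growth rate is taken from the worked example; real histories grow unevenly):

```python
# Haiku 4.5 rates, $ per million tokens
INPUT_PRICE, OUTPUT_PRICE = 1.00, 5.00

total_input = sum(350 * turn for turn in range(1, 11))  # history grows ~350 tokens/turn
total_output = 200 * 10                                 # 200 output tokens per turn
total_cost = (total_input / 1e6) * INPUT_PRICE + (total_output / 1e6) * OUTPUT_PRICE

print(total_input, total_output)  # 19250 2000, as in the worked example
print(total_cost)                 # ≈ 0.0293 dollars
```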

Special Token Pricing: Caching and Batch

| Pricing type | How it works | Discount vs standard | When to use |
|---|---|---|---|
| Prompt caching (Anthropic) | Cache a repeated prefix; pay 90% less on cache reads | 90% off | Large system prompts reused across calls |
| Batch API (Anthropic & OpenAI) | Submit async jobs; results in <24h | 50% off | Non-realtime bulk processing |
| Standard (default) | Real-time inference, full price | — | Interactive applications |
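
The caching discount compounds with the cost formula; a sketch assuming Anthropic-style cache reads billed at 10% of the standard input rate (model, rates, and token counts here are illustrative):

```python
INPUT_PRICE = 3.00            # e.g. Sonnet 4.6, $ per million input tokens
CACHE_READ_MULTIPLIER = 0.10  # cache reads cost 90% less than standard input

system_prompt_tokens = 10_000  # large shared prefix, reused on every call
calls = 1_000

standard = calls * (system_prompt_tokens / 1e6) * INPUT_PRICE
cached = standard * CACHE_READ_MULTIPLIER

# Roughly $30 vs $3 for the shared prefix across 1,000 calls
print(standard, cached)
```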

How to Estimate Your Token Usage

  1. Use a tokenizer — OpenAI's tiktoken and Anthropic's tokenizer count exact tokens for your prompts
  2. Rule of thumb — 1 token ≈ 4 chars. Count characters, divide by 4
  3. Log real usage — Every API response includes token counts in the response metadata; log these for accurate actuals
  4. Watch for context accumulation — In chat apps, total tokens per session grows with each turn
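
Step 2 can be sketched as a stdlib-only helper; for exact counts use a real tokenizer such as tiktoken, since this chars ÷ 4 heuristic is only an estimate:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the 1 token ≈ 4 characters rule of thumb."""
    return max(1, round(len(text) / 4))

prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))  # 13 (53 characters / 4, rounded)
```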

Why Does Token Pricing Vary So Much Between Models?

Token price reflects model capability and size. Larger, more capable models require more compute per token:

  • Reasoning quality — Opus 4.6 ($5/$25) produces substantially better analysis than Flash-Lite ($0.10/$0.40)
  • Model size — Larger parameter counts = more GPU time = higher cost
  • Provider economics — Google can price Gemini Flash-Lite aggressively due to infrastructure scale; models also compete on price
  • Market positioning — Budget models (nano, Flash-Lite) are priced to win high-volume workloads; frontier models charge a premium for quality

Calculate Exact Token Costs for Your Workload

Enter your input/output token counts and model to get a precise monthly cost estimate.
