How to Calculate AI API Costs 2026: Step-by-Step Guide

Step 1: Understand Tokens

Every AI API charges by tokens, not words. Understanding this is the foundation of cost estimation:

Quick test: Use OpenAI's Tokenizer at platform.openai.com/tokenizer to count tokens in your actual prompts.

For each request, estimate:

System prompt tokens: The fixed instructions you send every request (e.g., 200–500 tokens)
User input tokens: What the user sends (varies widely, 50–2,000 tokens)
Output tokens: What the model generates (50–2,000 tokens)
Total per request: Sum of all three

Scenario	Tokens/Request	10K req/mo (GPT-4o)	10K req/mo (GPT-4o mini)
Simple chatbot (short)	500 in / 200 out	$32.50	$1.95
Customer support (medium)	800 in / 400 out	$60.00	$3.60
Document analysis	3,000 in / 500 out	$125.00	$7.50
Long-context analysis	10,000 in / 1,000 out	$350.00	$21.00

Model	Input $/1M	Output $/1M	10M tokens/mo cost
Gemini 2.0 Flash	$0.10	$0.40	$1.75
Claude Haiku 3.5	$0.80	$4.00	$16.80
GPT-4o mini	$0.15	$0.60	$3.00
GPT-4o	$2.50	$10.00	$50.00
Claude Sonnet 4.5	$3.00	$15.00	$60.00
Claude Opus 4	$15.00	$75.00	$300.00

Every major provider has spending controls:

OpenAI: Platform settings → Billing → Usage limits. Set soft and hard limits.
Anthropic: Console → Billing → Spending limit. Configure per-month caps.
Google AI: Cloud Console → Billing → Budgets & Alerts. Set by project.
AWS Bedrock: AWS Budgets service with SNS notifications.

Recommended approach: Set alert at 50%, 80%, and hard limit at 100% of budget.

Forgetting system prompts: A 500-token system prompt × 100K requests = 50M tokens/month. At GPT-4o rates, that's $125/month just for the system prompt.
Underestimating output length: Models often generate more than you expect. Test with real data.
Ignoring retry logic: If your app retries on errors, you could be paying 2× for the same request.
Not using caching: If the same context is sent repeatedly, enable prompt caching for 50–90% savings.
Using GPT-4o for everything: 80% of tasks can use mini/flash at 10–20× lower cost.

Startup tip: Use Gemini Flash free tier for development and testing — it's 1M tokens/day, enough for most prototypes.