OpenAI Batch API: 50% Cost Savings — How It Works in 2026
OpenAI's Batch API cuts costs by 50% across GPT-4o, GPT-4o mini, and o-series models. Here's exactly how it works, when to use it, and how much you can save.
7 min read · Updated April 2026
Key numbers at a glance:
- 50% discount vs the standard API
- 24h maximum turnaround time
- GPT-4o batch input: $1.25/M (vs $2.50/M standard)
- GPT-4o batch output: $5.00/M (vs $10.00/M standard)
Batch API Pricing vs Standard API
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| GPT-4o | $2.50/M | $1.25/M | $10.00/M | $5.00/M |
| GPT-4o mini | $0.15/M | $0.075/M | $0.60/M | $0.30/M |
| o3-mini | $1.10/M | $0.55/M | $4.40/M | $2.20/M |
| o3 | $10.00/M | $5.00/M | $40.00/M | $20.00/M |
| text-embedding-3-large | $0.13/M | $0.065/M | — | — |
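The pricing table above can be turned into a quick cost estimator. This is a minimal sketch using the prices as listed here; the dictionary keys are illustrative labels, not official API model IDs:

```python
# Per-1M-token prices from the table above (USD).
PRICES = {
    "gpt-4o":      {"in": 2.50,  "out": 10.00},
    "gpt-4o-mini": {"in": 0.15,  "out": 0.60},
    "o3-mini":     {"in": 1.10,  "out": 4.40},
    "o3":          {"in": 10.00, "out": 40.00},
}
BATCH_DISCOUNT = 0.50  # batch price is half the standard price

def estimate(model: str, input_m: float, output_m: float, batch: bool = False) -> float:
    """Cost in USD for input_m / output_m million tokens."""
    p = PRICES[model]
    cost = input_m * p["in"] + output_m * p["out"]
    return cost * (1 - BATCH_DISCOUNT) if batch else cost
```

For example, 1M input and 1M output tokens on GPT-4o cost $12.50 standard versus $6.25 in a batch.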
How the Batch API Works
The Batch API processes requests asynchronously — you submit a batch of requests, OpenAI processes them within 24 hours, and you retrieve results when ready:
- Create a batch file: JSONL file with up to 50,000 requests
- Upload the file: via Files API
- Submit the batch: POST to /v1/batches
- Poll for completion: GET /v1/batches/{batch_id}
- Retrieve results: Download output JSONL file
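The five steps above can be sketched with the official `openai` Python SDK. `build_batch_line` constructs one request in the Batch JSONL format; `submit_batch` uploads the file and creates the batch (the function names and the file path are placeholders for illustration):

```python
import json

def build_batch_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    """One line of the batch input file: a self-describing JSONL request."""
    return json.dumps({
        "custom_id": custom_id,  # your own ID, echoed back in the results file
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model,
                 "messages": [{"role": "user", "content": prompt}]},
    })

def submit_batch(jsonl_path: str):
    """Upload the JSONL file and create a batch (requires OPENAI_API_KEY)."""
    from openai import OpenAI
    client = OpenAI()
    batch_file = client.files.create(file=open(jsonl_path, "rb"), purpose="batch")
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
```

Afterwards, poll `client.batches.retrieve(batch.id)` until its status is `completed`, then download the results with `client.files.content(batch.output_file_id)`.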
Key constraints:
- 24-hour completion window: batches finish within 24 hours, but there is no per-request latency guarantee
- Up to 50,000 requests per batch file
- Enqueued token limits: per-model caps on how many tokens can be queued at once (limits vary by usage tier)
- No streaming — results are available only after the batch completes
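Because a single batch file caps out at 50,000 requests, larger jobs need to be split across multiple batches. A minimal chunking helper (the cap is a parameter so you can also stay under file-size or enqueued-token limits):

```python
def chunk_requests(lines: list, max_per_batch: int = 50_000) -> list:
    """Split request lines into consecutive batches of at most max_per_batch."""
    return [lines[i:i + max_per_batch] for i in range(0, len(lines), max_per_batch)]
```

Each resulting chunk is written to its own JSONL file and submitted as a separate batch.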
Real-World Batch API Savings
Example: Content Moderation at Scale
Processing 100,000 user reviews per day for sentiment and safety:
- Average: 200 tokens input + 50 tokens output per review
- Daily tokens: 20M input + 5M output
- Standard GPT-4o mini: 20M × $0.15 + 5M × $0.60 = $3.00 + $3.00 = $6.00/day
- Batch GPT-4o mini: 20M × $0.075 + 5M × $0.30 = $1.50 + $1.50 = $3.00/day
- Annual savings: $3.00/day saved × 365 = $1,095
Example: Research Paper Analysis Pipeline
Processing 10,000 academic papers monthly with GPT-4o:
- Average: 3,000 tokens input + 800 tokens output
- Monthly tokens: 30M input + 8M output
- Standard: 30M × $2.50 + 8M × $10.00 = $75 + $80 = $155/month
- Batch: 30M × $1.25 + 8M × $5.00 = $37.50 + $40 = $77.50/month
- Annual savings: $77.50/month saved × 12 = $930
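Both worked examples reduce to the same token arithmetic. A short sketch that reproduces the numbers above:

```python
def token_cost(input_m: float, output_m: float, in_price: float, out_price: float) -> float:
    """Cost in USD for a volume given in millions of tokens, at per-1M prices."""
    return input_m * in_price + output_m * out_price

# Content moderation: 20M input + 5M output per day on GPT-4o mini
daily_std = token_cost(20, 5, 0.15, 0.60)      # about $6.00/day
daily_batch = token_cost(20, 5, 0.075, 0.30)   # about $3.00/day
annual_moderation_savings = (daily_std - daily_batch) * 365   # about $1,095

# Paper analysis: 30M input + 8M output per month on GPT-4o
monthly_std = token_cost(30, 8, 2.50, 10.00)   # $155/month
monthly_batch = token_cost(30, 8, 1.25, 5.00)  # $77.50/month
annual_paper_savings = (monthly_std - monthly_batch) * 12     # $930
```

Swapping in your own token volumes and the batch prices from the table gives your expected savings directly.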
When to Use Batch vs Real-Time API
| Use Batch API | Use Standard API |
|---|---|
| ✅ Nightly data processing jobs | ⚡ User-facing chatbots (response in <5s) |
| ✅ Document classification pipelines | ⚡ Real-time content generation |
| ✅ Embedding generation for vector DBs | ⚡ Interactive code completion |
| ✅ Research analysis (offline) | ⚡ Real-time translation |
| ✅ Content moderation (async) | ⚡ Customer service with SLA |
| ✅ SEO metadata generation | ⚡ Safety-critical real-time decisions |
Combining Batch API with Prompt Caching
For additional savings, combine the Batch API (50% off) with prompt caching, which discounts input tokens that repeat an already-seen prompt prefix (50% off cached input on most models):
- A long system prompt (10K tokens) shared across all batch requests
- After the first request, the repeated system-prompt tokens are billed at the cached rate
- If the cache discount stacks with the batch discount, cached prompt tokens cost about $0.0375/M on GPT-4o mini
- That is 75% cheaper than standard uncached GPT-4o mini input; check OpenAI's current pricing to confirm the two discounts combine for your models
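Assuming a 50% cache discount on repeated-prefix input tokens and the 50% batch discount, and that the two stack multiplicatively (verify this against OpenAI's current pricing page before budgeting on it), the effective rate works out as:

```python
# Stacked-discount arithmetic for GPT-4o mini input tokens.
STANDARD_INPUT = 0.15   # USD per 1M input tokens, GPT-4o mini
BATCH_DISCOUNT = 0.50
CACHE_DISCOUNT = 0.50   # discount on cached (repeated-prefix) input tokens

effective = STANDARD_INPUT * (1 - BATCH_DISCOUNT) * (1 - CACHE_DISCOUNT)
total_saving = 1 - effective / STANDARD_INPUT
print(f"${effective:.4f}/M input, {total_saving:.0%} below standard")
```

Two stacked 50% discounts compound to 75% off, not 100%: each discount applies to the already-reduced price.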