AWS Bedrock Pricing 2026: Claude, Llama, Titan & All Models
Complete AWS Bedrock pricing guide for 2026 — all available models including Anthropic Claude, Meta Llama, Amazon Titan, Mistral, and Cohere. Includes on-demand vs provisioned throughput comparison.
AWS Bedrock Model Pricing (2026)
| Model | Input $/1K tokens | Output $/1K tokens | Context Window |
|---|---|---|---|
| Anthropic Claude Sonnet 4.5 | $0.00300 | $0.01500 | 200K tokens |
| Anthropic Claude Haiku 3.5 | $0.00080 | $0.00400 | 200K tokens |
| Anthropic Claude Opus 4 | $0.01500 | $0.07500 | 200K tokens |
| Meta Llama 3.1 8B Instruct | $0.00030 | $0.00060 | 128K tokens |
| Meta Llama 3.1 70B Instruct | $0.00265 | $0.00350 | 128K tokens |
| Meta Llama 3.3 70B Instruct | $0.00265 | $0.00350 | 128K tokens |
| Amazon Titan Text Express | $0.00030 | $0.00040 | 8K tokens |
| Amazon Titan Text Lite | $0.00015 | $0.00020 | 4K tokens |
| Mistral 7B Instruct | $0.00015 | $0.00020 | 32K tokens |
| Mistral Large 2 | $0.00200 | $0.00600 | 128K tokens |
| Cohere Command R | $0.00050 | $0.00150 | 128K tokens |
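The table rates make it straightforward to estimate a monthly bill. A minimal sketch using a few of the on-demand rates above (model keys are illustrative shorthand, not Bedrock model IDs):

```python
# Sketch: estimate monthly on-demand Bedrock cost from the rates above.
# Rates are $ per 1K tokens, as listed in the pricing table.
RATES = {
    "claude-sonnet-4.5": (0.00300, 0.01500),
    "claude-haiku-3.5": (0.00080, 0.00400),
    "llama-3.1-70b": (0.00265, 0.00350),
    "titan-text-express": (0.00030, 0.00040),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """On-demand cost in USD for a month's token volume."""
    in_rate, out_rate = RATES[model]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate

# Example: 10M input + 2M output tokens/month on Claude Haiku 3.5
cost = monthly_cost("claude-haiku-3.5", 10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $16.00
```

Note that output tokens dominate cost for chat-heavy workloads: Haiku's output rate is 5× its input rate, and Opus's output rate is 25× Haiku's input rate.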
AWS Bedrock vs Direct API: Which Is Cheaper?
Bedrock's pricing relative to direct APIs varies by model: Anthropic models typically match Anthropic's own API prices, while open-weight models like Llama cost substantially more on Bedrock than on specialized inference providers:
- Claude Sonnet via Anthropic API: $3.00/M input, $15.00/M output
- Claude Sonnet via AWS Bedrock: $3.00/M input, $15.00/M output (same price)
- Llama 3.1 70B via Groq: $0.59/M input, $0.79/M output
- Llama 3.1 70B via AWS Bedrock: $2.65/M input, $3.50/M output (4× more expensive)
Key insight: AWS Bedrock's value is not price — it's AWS ecosystem integration (IAM, VPC, CloudWatch, compliance). For pure cost, direct APIs are usually cheaper.
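The Llama gap is worth quantifying for your own traffic mix. A quick sketch comparing the same workload at the per-million-token rates quoted above:

```python
# Sketch: same Llama 3.1 70B workload priced on Groq vs AWS Bedrock,
# using the $/M-token rates quoted above.
def workload_cost(in_rate_per_m, out_rate_per_m, in_tokens_m, out_tokens_m):
    """Cost in USD; rates are $/M tokens, volumes in millions of tokens."""
    return in_rate_per_m * in_tokens_m + out_rate_per_m * out_tokens_m

# Example workload: 5M input + 1M output tokens
groq = workload_cost(0.59, 0.79, 5, 1)       # $3.74
bedrock = workload_cost(2.65, 3.50, 5, 1)    # $16.75
print(f"Bedrock is {bedrock / groq:.1f}x the cost")  # 4.5x
```

For Claude, by contrast, the same calculation yields a 1.0× ratio, so the choice comes down to the ecosystem factors above rather than price.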
AWS Bedrock Provisioned Throughput
For predictable, high-volume workloads, Bedrock offers Provisioned Throughput (PT) — reserved model units billed hourly:
- Minimum commitment: 1 month
- Benefit: Guaranteed throughput, lower per-token cost at high volume
- Break-even: Typically at 50M+ tokens/month
- Example: Claude Haiku PT at 100M tokens/month saves ~30–40% vs on-demand
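The break-even point is just the PT commitment divided by your blended on-demand rate. A sketch of that arithmetic — the $270/month PT figure is a HYPOTHETICAL placeholder, since actual model-unit pricing varies by model and region:

```python
# Sketch: break-even volume between on-demand and Provisioned Throughput.
def break_even_tokens_m(pt_monthly_cost: float, blended_od_rate_per_m: float) -> float:
    """Monthly token volume (millions) above which PT beats on-demand."""
    return pt_monthly_cost / blended_od_rate_per_m

# Claude Sonnet blended on-demand rate at an 80/20 input/output mix:
blended = 0.8 * 3.00 + 0.2 * 15.00   # $5.40 per million tokens

# HYPOTHETICAL PT commitment of $270/month (real quotes come from AWS):
print(break_even_tokens_m(270, blended))  # 50.0 -> matches the "50M+ tokens/month" rule of thumb
```

The blended rate depends heavily on your input/output ratio, so rerun this with your actual traffic mix before committing to a month of PT.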
AWS Bedrock Batch Inference (50% Off)
Like OpenAI's Batch API, Bedrock offers batch inference at a 50% discount for asynchronous workloads:
- Submit S3-stored JSONL files
- Results written back to S3
- Typical completion: 6–24 hours
- Supported models: Claude, Llama, Titan, Mistral
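Each line of the batch JSONL file is one record pairing a unique ID with a model-specific payload. A minimal sketch of building such a file — the `recordId`/`modelInput` field names follow Bedrock's batch inference input schema, and the payload shown assumes Anthropic's Messages format (other model families use their own payload shapes):

```python
import json

# Sketch: build a JSONL input file for Bedrock batch inference.
# Each line: {"recordId": ..., "modelInput": <model-specific payload>}.
def build_batch_jsonl(prompts):
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"rec-{i:06d}",
            "modelInput": {  # Anthropic Messages payload; adjust for other models
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = build_batch_jsonl(["Summarize Q3 earnings.", "Translate to French: hello"])
# Upload this file to S3, then start the job via Bedrock's
# CreateModelInvocationJob API, pointing its input and output data
# configs at your S3 locations; results land back in S3 at the batch rate.
print(jsonl.splitlines()[0])
```

Because results take hours, batch is best for offline jobs like document summarization, evaluation runs, or embedding backfills, not interactive traffic.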
Why Choose AWS Bedrock?
- ✅ AWS account consolidation — single bill, existing credits, enterprise agreements
- ✅ VPC endpoint support — traffic stays inside your AWS network and never traverses the public internet
- ✅ IAM-based access control — no separate API key management
- ✅ CloudWatch monitoring — integrated with your existing observability stack
- ✅ HIPAA/SOC2/GDPR compliance — inherited from AWS infrastructure
- ✅ Model diversity — switch between providers without new contracts
- ❌ Not cheapest for Llama — Groq/Together are 4× cheaper for Llama models
- ❌ No ChatGPT/GPT-4o — OpenAI not available on Bedrock
Compare AWS Bedrock vs Direct API Costs
See total cost for your workload across all major providers.