
AWS Bedrock Pricing 2026: Claude, Llama, Titan & All Models

Complete AWS Bedrock pricing guide for 2026 — all available models including Anthropic Claude, Meta Llama, Amazon Titan, Mistral, and Cohere. Includes on-demand vs provisioned throughput comparison.

13 min read · Updated March 2026
AWS Bedrock Pricing at a Glance
  • $3.00/M — Claude Sonnet input tokens
  • $2.65/M — Llama 3.1 70B input tokens
  • $0.30/M — Amazon Titan input tokens
  • 50% — Batch inference discount

AWS Bedrock Model Pricing (2026)

| Model | Input $/1K tokens | Output $/1K tokens | Context Window |
|---|---|---|---|
| Anthropic Claude Sonnet 4.5 | $0.00300 | $0.01500 | 200K tokens |
| Anthropic Claude Haiku 3.5 | $0.00080 | $0.00400 | 200K tokens |
| Anthropic Claude Opus 4 | $0.01500 | $0.07500 | 200K tokens |
| Meta Llama 3.1 8B Instruct | $0.00030 | $0.00060 | 128K tokens |
| Meta Llama 3.1 70B Instruct | $0.00265 | $0.00350 | 128K tokens |
| Meta Llama 3.3 70B Instruct | $0.00265 | $0.00350 | 128K tokens |
| Amazon Titan Text Express | $0.00030 | $0.00040 | 8K tokens |
| Amazon Titan Text Lite | $0.00015 | $0.00020 | 4K tokens |
| Mistral 7B Instruct | $0.00015 | $0.00020 | 32K tokens |
| Mistral Large 2 | $0.00200 | $0.00600 | 128K tokens |
| Cohere Command R | $0.00050 | $0.00150 | 128K tokens |
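The per-1K rates above translate directly into workload costs. A minimal sketch (prices copied from the table; the model keys and helper function are illustrative, not an AWS API):

```python
# Estimate on-demand AWS Bedrock cost from per-1K-token rates.
# Prices copied from the table above (USD per 1K tokens): (input, output).
PRICES = {
    "claude-sonnet-4.5": (0.00300, 0.01500),
    "claude-haiku-3.5": (0.00080, 0.00400),
    "llama-3.1-70b": (0.00265, 0.00350),
    "titan-text-express": (0.00030, 0.00040),
}

def bedrock_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one workload on a given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate

# Example: 10M input + 2M output tokens on Claude Sonnet 4.5
cost = bedrock_cost("claude-sonnet-4.5", 10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $60.00 (= $30 input + $30 output)
```

Note that $0.00300/1K is the same rate as $3.00/M, so these numbers match the at-a-glance figures above.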

AWS Bedrock vs Direct API: Which Is Cheaper?

AWS Bedrock's premium over direct API access varies widely by model. Anthropic models cost the same on Bedrock as on the Anthropic API, while open-weight models like Llama cost several times more on Bedrock than on specialized inference hosts:

  • Claude Sonnet via Anthropic API: $3.00/M input, $15.00/M output
  • Claude Sonnet via AWS Bedrock: $3.00/M input, $15.00/M output (same price)
  • Llama 3.1 70B via Groq: $0.59/M input, $0.79/M output
  • Llama 3.1 70B via AWS Bedrock: $2.65/M input, $3.50/M output (roughly 4.5× more expensive)

Key insight: AWS Bedrock's value is not price — it's AWS ecosystem integration (IAM, VPC, CloudWatch, compliance). For pure cost, direct APIs are usually cheaper.
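The Llama gap above is easy to verify with a quick comparison (rates copied from the bullets; per-million-token units, illustrative workload):

```python
# Compare Llama 3.1 70B cost: AWS Bedrock vs Groq
# (USD per 1M tokens, rates from the comparison above).
BEDROCK = {"input": 2.65, "output": 3.50}
GROQ = {"input": 0.59, "output": 0.79}

def monthly_cost(rates: dict, in_millions: float, out_millions: float) -> float:
    """Cost for a workload measured in millions of tokens."""
    return rates["input"] * in_millions + rates["output"] * out_millions

# Example workload: 100M input, 20M output tokens per month
bedrock = monthly_cost(BEDROCK, 100, 20)  # 265 + 70  = $335.00
groq = monthly_cost(GROQ, 100, 20)        # 59 + 15.8 = $74.80
print(f"Bedrock ${bedrock:.2f} vs Groq ${groq:.2f} ({bedrock / groq:.1f}x)")
```

At this mix the multiplier comes out near 4.5×, which is where the "Bedrock's value is not price" conclusion comes from.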

AWS Bedrock Provisioned Throughput

For predictable, high-volume workloads, Bedrock offers Provisioned Throughput (PT) — reserved model units billed hourly:

  • Minimum commitment: 1 month
  • Benefit: Guaranteed throughput, lower per-token cost at high volume
  • Break-even: Typically at 50M+ tokens/month
  • Example: Claude Haiku PT at 100M tokens/month saves ~30–40% vs on-demand
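The break-even logic can be sketched as a simple comparison of a fixed hourly commitment against per-token on-demand spend. The hourly PT rate below is a hypothetical placeholder (real model-unit rates depend on model and commitment term and are quoted in the AWS console); the $1.44/M blended rate corresponds to Claude Haiku 3.5 at an 80/20 input/output mix:

```python
# On-demand vs Provisioned Throughput (PT) break-even sketch.
# HOURLY_RATE below is a *hypothetical* placeholder, not a real AWS price.
HOURS_PER_MONTH = 730

def pt_monthly_cost(hourly_rate: float) -> float:
    """Fixed monthly cost of one PT model unit billed hourly."""
    return hourly_rate * HOURS_PER_MONTH

def break_even_tokens_m(hourly_rate: float, blended_rate_per_m: float) -> float:
    """Million tokens/month above which PT beats on-demand."""
    return pt_monthly_cost(hourly_rate) / blended_rate_per_m

# Hypothetical $0.10/hr PT unit vs a $1.44/M blended on-demand rate
# (Haiku 3.5: 0.8 * $0.80 + 0.2 * $4.00 = $1.44/M at an 80/20 mix)
print(break_even_tokens_m(0.10, 1.44))  # ≈ 50.7M tokens/month
```

Under these assumed numbers the crossover lands just above 50M tokens/month, consistent with the rule of thumb above.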

AWS Bedrock Batch Inference (50% Off)

Like OpenAI's Batch API, Bedrock offers batch inference at 50% discount for asynchronous workloads:

  • Submit S3-stored JSONL files
  • Results written back to S3
  • Typical completion: 6–24 hours
  • Supported models: Claude, Llama, Titan, Mistral
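A batch job's input is a JSONL file where each line carries a record ID and a model request body. A minimal sketch of preparing one is below; the `recordId`/`modelInput` field names and the Anthropic-on-Bedrock message schema reflect the batch record format as documented by AWS, but verify against the current docs before relying on them:

```python
import json

# Build a JSONL input file for Bedrock batch inference.
# Each line: {"recordId": ..., "modelInput": {...}} -- the modelInput
# body uses the Anthropic messages schema as used on Bedrock.
prompts = ["Summarize AWS Bedrock pricing.", "What is provisioned throughput?"]

lines = []
for i, prompt in enumerate(prompts):
    record = {
        "recordId": f"rec-{i:06d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    lines.append(json.dumps(record))

jsonl = "\n".join(lines)
with open("batch_input.jsonl", "w") as f:
    f.write(jsonl)
# Next steps: upload the file to S3, then start the job (e.g. via the
# boto3 bedrock client's create_model_invocation_job) pointing at the
# S3 input URI; results land back in your S3 output location.
```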

Why Choose AWS Bedrock?

  • AWS account consolidation — single bill, existing credits, enterprise agreements
  • VPC endpoint support — traffic stays in your AWS network, never the public internet
  • IAM-based access control — no separate API key management
  • CloudWatch monitoring — integrated with your existing observability stack
  • HIPAA/SOC 2/GDPR compliance — inherited from AWS infrastructure
  • Model diversity — switch between providers without new contracts

And when it's the wrong fit:

  • Not the cheapest for Llama — Groq and Together are roughly 4× cheaper for Llama models
  • No GPT-4o or ChatGPT — OpenAI models are not available on Bedrock

Compare AWS Bedrock vs Direct API Costs

See total cost for your workload across all major providers.

AI Cost Calculator