AI COGS for SaaS 2026: What Goes Into Cost of Goods Sold
AI SaaS cost of goods sold (COGS) is not just your LLM API bill. It includes vector DBs, hosting, support infrastructure, and model costs. Here's exactly what goes into AI COGS and how to target 70–80% gross margins. Last verified: 2026-04-01.
What Is COGS for an AI SaaS Product?
COGS (Cost of Goods Sold) in SaaS represents the direct costs of delivering your product to customers. For AI SaaS, this typically includes:
- LLM API costs — what you pay Anthropic, OpenAI, or Google per token
- Vector database — Pinecone, Weaviate, pgvector hosting for semantic search/RAG
- Application hosting — servers, CDN, load balancers running your app
- Customer-facing support infrastructure — support tooling, status page, monitoring
- Embedding model costs — typically OpenAI embeddings at $0.02/M tokens
- Third-party APIs — STT (Deepgram), TTS, search APIs if applicable
What is NOT in COGS: engineering salaries, marketing, sales, G&A. Those are operating expenses (OpEx).
COGS Breakdown by AI Product Type
| Product type | LLM API | Vector DB | Hosting | Other | Total COGS % revenue |
|---|---|---|---|---|---|
| AI writing tool (content gen) | 8% | 1% | 4% | 2% | 15% |
| AI customer support bot | 6% | 4% | 5% | 3% | 18% |
| AI coding copilot | 25% | 2% | 8% | 5% | 40% |
| AI voice agent (STT+LLM+TTS) | 15% | 2% | 5% | 18% (STT/TTS) | 40% |
| AI document processor | 4% | 3% | 5% | 2% | 14% |
| AI knowledge base / RAG tool | 10% | 12% | 6% | 2% | 30% |
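Coding copilots and voice agents have structurally higher COGS, about 40% of revenue in the table above. For copilots the driver is heavy LLM usage (25% of revenue) sold at commoditized prices; for voice agents it is the STT/TTS stack, which adds roughly 18 points on top of the LLM bill. Document processors and content tools are the most margin-friendly AI product types, at 14–15% COGS.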
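As a quick sanity check on the table above, the sketch below sums the per-category percentages and derives the implied gross margin for each product type. The figures are copied from the table; the dictionary layout and function names are illustrative assumptions, not part of any tool.

```python
# COGS as a percent of revenue, copied from the table above.
COGS_BY_PRODUCT = {
    "AI writing tool": {"llm_api": 8, "vector_db": 1, "hosting": 4, "other": 2},
    "AI customer support bot": {"llm_api": 6, "vector_db": 4, "hosting": 5, "other": 3},
    "AI coding copilot": {"llm_api": 25, "vector_db": 2, "hosting": 8, "other": 5},
    "AI voice agent": {"llm_api": 15, "vector_db": 2, "hosting": 5, "other": 18},
    "AI document processor": {"llm_api": 4, "vector_db": 3, "hosting": 5, "other": 2},
    "AI knowledge base / RAG tool": {"llm_api": 10, "vector_db": 12, "hosting": 6, "other": 2},
}


def gross_margin_pct(components: dict) -> float:
    """Gross margin as a percent of revenue: 100 minus total COGS percent."""
    return 100 - sum(components.values())


def largest_driver(components: dict) -> str:
    """Name of the COGS category that takes the largest share of revenue."""
    return max(components, key=components.get)


for product, parts in COGS_BY_PRODUCT.items():
    print(f"{product}: margin {gross_margin_pct(parts)}%, "
          f"largest driver: {largest_driver(parts)}")
```

Looking at the largest driver per product is a cheap way to decide where optimization effort pays off first: the LLM bill for a copilot, the "other" (STT/TTS) line for a voice agent, the vector DB for a RAG tool.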
Worked Example: $100K MRR AI Writing Tool
| COGS line item | Monthly cost | % of $100K MRR |
|---|---|---|
| LLM API (Claude Haiku 4.5, ~1B output tokens at $5/M) | $5,000 | 5.0% |
| LLM API (Claude Sonnet 4.6, ~500M output tokens for premium users at $15/M) | $7,500 | 7.5% |
| Embeddings (OpenAI text-embedding-3-small, ~10B tokens at $0.02/M) | $200 | 0.2% |
| Vector DB (Pinecone Starter, 10M vectors) | $70 | 0.1% |
| App hosting (Vercel + Railway) | $800 | 0.8% |
| Monitoring, logging, error tracking | $300 | 0.3% |
| Support tooling (Intercom) | $500 | 0.5% |
| Total COGS | $14,370 | 14.4% |
| Gross Margin | $85,630 | 85.6% |
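This is a realistic scenario for an AI writing tool that routes premium users to Sonnet and the majority to Haiku. The two LLM lines account for $12,500 of the $14,370 total, so model spend is the lever that matters. A gross margin of 85.6% sits above the 70–80% target range for AI-native SaaS.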
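The totals in the worked example can be reproduced with a few lines of arithmetic. This is a minimal sketch using the dollar figures from the table; the variable names are illustrative.

```python
MRR = 100_000  # monthly recurring revenue in dollars

# Monthly COGS line items in dollars, copied from the worked example table.
line_items = {
    "LLM API (Haiku)": 5_000,
    "LLM API (Sonnet, premium users)": 7_500,
    "Embeddings": 200,
    "Vector DB": 70,
    "App hosting": 800,
    "Monitoring, logging, error tracking": 300,
    "Support tooling": 500,
}

total_cogs = sum(line_items.values())
gross_margin = MRR - total_cogs

cogs_pct = round(100 * total_cogs / MRR, 1)
gross_margin_pct = round(100 * gross_margin / MRR, 1)

# Share of COGS that comes from the two LLM lines.
llm_cogs = sum(v for k, v in line_items.items() if k.startswith("LLM API"))
llm_share_of_cogs = round(100 * llm_cogs / total_cogs, 1)

print(f"Total COGS: ${total_cogs:,} ({cogs_pct}% of MRR)")
print(f"Gross margin: ${gross_margin:,} ({gross_margin_pct}%)")
print(f"LLM share of COGS: {llm_share_of_cogs}%")
```

Swapping in your own line items gives the same view for your product; the LLM share of COGS is usually the first number worth tracking month over month.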
COGS Warning: High-Compute Use Cases
| Scenario | Why COGS explodes | COGS % at risk | Fix |
|---|---|---|---|
| Flat-fee unlimited AI coding tool | Power users run 1,000+ completions/day vs 50 for light users | 60–80% | Hard daily limits or credit model |
| Agent-first product with no loop limit | 5-step agent can become 50-step if no max_iterations set | 100%+ | Always set max_iterations in agent code |
| Long-context document Q&A with no chunking | 100K-token document sent on every query = $0.30/query at Sonnet | 30–50% | Use RAG to retrieve 3–5 relevant chunks instead |
| Chatbot with full conversation history | Turn 20 of a conversation has 10K+ accumulated input tokens | 20–40% | Truncate to last 5 turns; summarize old turns |
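Three of the fixes in the table above are easy to express in code. The sketch below shows a hard iteration cap for an agent loop, history truncation for a chatbot, and the per-query input cost of sending a full document versus a handful of retrieved chunks. The `step` callable, the message format, and the 500-token chunk size are hypothetical placeholders; the $3/M Sonnet input price is the one implied by the table.

```python
from typing import Callable

SONNET_INPUT_PER_M = 3.00  # dollars per million input tokens, per the table above


def run_agent(step: Callable[[int], bool], max_iterations: int = 5) -> int:
    """Run `step` until it reports completion or the cap is reached.

    `step` is a hypothetical callable that performs one agent step
    (typically one or more LLM calls) and returns True when the task is done.
    Returns the number of steps actually executed.
    """
    for i in range(1, max_iterations + 1):
        if step(i):
            return i
    return max_iterations  # cap hit: stop instead of looping indefinitely


def truncate_history(messages: list, keep_turns: int = 5) -> list:
    """Keep only the last `keep_turns` turns, assuming two messages per turn
    (one user, one assistant). Summarizing the dropped turns is omitted here."""
    if keep_turns <= 0:
        return []
    return messages[-2 * keep_turns:]


def input_cost(tokens: int, price_per_m: float) -> float:
    """Dollar cost of sending `tokens` input tokens at `price_per_m` $/M."""
    return tokens * price_per_m / 1_000_000


full_document = input_cost(100_000, SONNET_INPUT_PER_M)   # whole document per query
rag_chunks = input_cost(5 * 500, SONNET_INPUT_PER_M)      # 5 chunks of ~500 tokens (assumed)
print(f"Full document: ${full_document:.4f}/query, RAG: ${rag_chunks:.4f}/query")
```

Under these assumptions the retrieved-chunk query costs a small fraction of the full-document query, which is the whole argument for RAG in the third row. In a real agent you would also log when the iteration cap is hit, since frequent cap hits usually point to a prompt or tool problem rather than a cost problem.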
COGS Reduction Playbook
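- Model tiering: Use Haiku for 80% of requests, Sonnet for premium users. At Haiku's $1/M vs Sonnet's $3/M input price, an 80/20 split gives a blended price of about $1.40/M, which cuts LLM cost by roughly 53% compared with sending everything to Sonnet.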
- Prompt caching: System prompts cached at 90% discount. 2,000-token system prompt cached saves $0.0018/call on Haiku vs uncached.
- Batch API for async: All non-realtime processing (document enrichment, scheduled reports) should go through Batch API at 50% off.
- Output length control: Instruct models to be concise. Every token saved is money saved. "Respond in 3 bullet points max" vs "Respond thoroughly".
- Response caching: Cache LLM responses for identical or near-identical inputs (FAQ questions, common prompts). No API call needed for repeat questions.
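The arithmetic behind the playbook can be sketched as follows. The prices and discounts are the ones quoted in the list above; the function names, the equal-tokens-per-request assumption, and the `call_model` callable are hypothetical. The response cache matches prompts only after lowercasing and whitespace cleanup, so catching genuinely near-identical questions would need something stronger, such as embedding similarity.

```python
HAIKU_INPUT_PER_M = 1.00   # dollars per million input tokens, from the text
SONNET_INPUT_PER_M = 3.00  # dollars per million input tokens, from the text


def blended_price(haiku_share: float) -> float:
    """Blended $/M price when `haiku_share` of tokens go to Haiku and the
    rest to Sonnet. Assumes similar token counts per request on both tiers."""
    return haiku_share * HAIKU_INPUT_PER_M + (1 - haiku_share) * SONNET_INPUT_PER_M


def cache_saving_per_call(prompt_tokens: int, price_per_m: float,
                          discount: float = 0.90) -> float:
    """Dollars saved per call when a cached prompt is billed at a discount."""
    return prompt_tokens / 1_000_000 * price_per_m * discount


def batch_cost(realtime_cost: float, discount: float = 0.50) -> float:
    """Cost of the same workload sent through a batch endpoint at 50% off."""
    return realtime_cost * (1 - discount)


# In-memory response cache keyed by a normalized prompt.
_response_cache = {}


def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivially different prompts match."""
    return " ".join(prompt.lower().split())


def cached_completion(prompt: str, call_model) -> str:
    """Return a cached answer if present; otherwise call the model once.
    `call_model` is a hypothetical function that takes a prompt string and
    returns the model's text response."""
    key = normalize(prompt)
    if key not in _response_cache:
        _response_cache[key] = call_model(prompt)
    return _response_cache[key]


blended = blended_price(0.80)
saving_vs_all_sonnet = 1 - blended / SONNET_INPUT_PER_M
print(f"Blended price: ${blended:.2f}/M ({saving_vs_all_sonnet:.0%} below all-Sonnet)")
print(f"Cache saving, 2,000-token prompt on Haiku: "
      f"${cache_saving_per_call(2_000, HAIKU_INPUT_PER_M):.4f}/call")
```

With an 80/20 Haiku/Sonnet split this gives a blended price of about $1.40/M, roughly 53% below an all-Sonnet setup. In production the cache would normally live in a shared store with an expiry rather than a module-level dictionary.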