Unit Economics for AI: Calculating Cost Per Token, Per Inference, and Per Customer
Introduction
Cloud bills tell you what you spent. Unit economics tell you whether what you spent was worth it.
For traditional SaaS, the formula is simple: infrastructure cost per customer per month. For AI-native products, the formula has more variables: tokens per request, model tier, cache hit rate, agent loop depth, and embedding costs—all of which vary per user, per feature, and per interaction.
49% of organizations now use unit economics to connect technology spend to business outcomes. For AI workloads, this is not optional—it is how you avoid building a product that loses money on every user.
Section 1: The Three Unit Economics Metrics for AI
Cost per token
The atomic unit of LLM spend:
cost_per_token = (input_tokens × input_price) + (output_tokens × output_price)
Track this per model, per use case. A support bot averaging 3,000 tokens per response at GPT-4o pricing costs ~$0.03–$0.05 per response. At 10,000 responses/day, that is $300–$500/day.
Cost per inference
One inference = one model call (which may include multiple tokens):
cost_per_inference = total_tokens × cost_per_token + embedding_cost + reranking_cost
For RAG systems, add retrieval costs:
cost_per_rag_query = embedding_cost + vector_search_cost + llm_inference_cost + reranking_cost
Cost per customer
The metric finance cares about:
cost_per_customer_month = (total_ai_spend + total_infra_spend) / active_customers
Compare against revenue per customer. If cost per customer exceeds gross margin, you have a unit economics problem—not a scaling problem.
Section 2: Building the Attribution Pipeline
You need to trace every dollar from API call to business unit:
Instrumentation layer
Log on every AI request:
request_id,customer_idortenant_id,model_id,input_tokens,output_tokens,use_case(support, search, generation, classification),cache_hit(boolean),latency_ms,estimated_cost_usd.
Aggregation layer
Roll up daily:
- cost per customer per day,
- cost per use case per day,
- cost per model per day,
- cache hit rate per use case.
Reporting layer
Weekly dashboard:
| Customer tier | Avg cost/customer/month | Revenue/customer/month | Margin |
|---|---|---|---|
| Free | $0.45 | $0 | -$0.45 |
| Pro | $2.10 | $29 | $26.90 |
| Enterprise | $18.50 | $500 | $481.50 |
This table tells you whether your free tier is subsidized acceptably and whether enterprise margins hold at scale.
Section 3: The Agent Loop Multiplier
Agents are the unit economics wildcard. A single user request can trigger:
- 1 planning call (2,000 tokens),
- 3 tool invocations (1,500 tokens each),
- 1 synthesis call (3,000 tokens),
- 1 retry after a tool failure (5,000 tokens).
Total: 15,500 tokens for one "simple" user request. At GPT-4o pricing, that is $0.15–$0.25 per agent task vs $0.03 for a single-shot completion.
Track cost per agent task, not just cost per API call. Set token budgets per task and per session.
Section 4: Optimizing Unit Economics
Once you can measure, you can optimize:
| Lever | Typical savings | Tradeoff |
|---|---|---|
| Semantic caching | 50–70% on repeated queries | Stale responses if invalidation is poor |
| Model routing | 60–80% on simple tasks | Quality drop if routing is too aggressive |
| Prompt compression | 20–40% on token count | May lose context for complex tasks |
| Batch processing | 30–50% on non-real-time workloads | Increased latency |
| Context window management | 15–30% on long conversations | Requires summarization logic |
Optimize in order of impact: caching first, then model routing, then prompt engineering.
Section 5: When to Worry
Red flags in your unit economics:
- Cost per customer growing faster than revenue per customer (margin compression),
- Free tier cost exceeding 10% of total AI spend (unsustainable subsidy),
- Agent tasks costing 5x+ single-shot completions (loop control problem),
- No correlation between AI spend and retention/engagement (feature may not justify cost).
Conclusion
Unit economics for AI is not a finance exercise—it is an engineering discipline. Instrument cost per token today, aggregate to cost per customer this week, and review margins monthly.
If you cannot tell your CFO what it costs to serve one customer, you are not ready to scale AI features.
Related reading:
- FinOps 2.0: Technology Value Engineering
- Building the Cost Observability Layer
- Semantic Caching at Scale
For cost optimization: