Unit Economics for AI: Calculating Cost Per Token, Per Inference, and Per Customer

Introduction

Cloud bills tell you what you spent. Unit economics tell you whether what you spent was worth it.

For traditional SaaS, the formula is simple: infrastructure cost per customer per month. For AI-native products, the formula has more variables: tokens per request, model tier, cache hit rate, agent loop depth, and embedding costs—all of which vary per user, per feature, and per interaction.

49% of organizations now use unit economics to connect technology spend to business outcomes. For AI workloads, this is not optional—it is how you avoid building a product that loses money on every user.

Section 1: The Three Unit Economics Metrics for AI

Cost per token

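The atomic unit of LLM spend. Input and output tokens are priced separately, so compute a blended rate from real traffic:

total_token_spend = Σ over requests of (input_tokens × input_price) + (output_tokens × output_price)
cost_per_token = total_token_spend / total_tokens

where input_price and output_price are per-token rates (providers typically quote them per million tokens).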

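Track this per model and per use case. At GPT-4o-class prices, a support bot averaging 3,000 tokens per response costs roughly $0.03–$0.05 per response. The exact figure depends on how those tokens split between input and the more expensive output, and on the provider's current rates. At 10,000 responses/day, that is $300–$500/day.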
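The per-request and blended per-token calculations can be sketched in a few lines of Python. This is an illustration, not a billing implementation. The function names are hypothetical, and the prices used in the example ($5 per million input tokens, $15 per million output tokens) are placeholders chosen to land inside the range above, not quoted list prices.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one model call; input and output tokens are priced separately.

    Prices are per million tokens, the unit providers usually quote.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def blended_cost_per_token(requests: list[tuple[int, int]],
                           input_price_per_m: float,
                           output_price_per_m: float) -> float:
    """Total spend divided by total tokens over a batch of (input, output) requests."""
    total_spend = sum(request_cost(i, o, input_price_per_m, output_price_per_m)
                      for i, o in requests)
    total_tokens = sum(i + o for i, o in requests)
    return total_spend / total_tokens if total_tokens else 0.0


if __name__ == "__main__":
    # Hypothetical placeholder prices: $5/1M input, $15/1M output.
    per_response = request_cost(1_000, 2_000, 5.0, 15.0)
    print(f"per response: ${per_response:.4f}")
    print(f"per day at 10,000 responses: ${per_response * 10_000:.2f}")
```

With these placeholder prices, a response with 1,000 input and 2,000 output tokens costs $0.035, which is $350/day at 10,000 responses. Shifting the same 3,000 tokens toward output pushes the figure up, which is why the split matters as much as the total.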

Cost per inference

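One inference is one model call. Because input and output tokens are priced separately, its cost is:

cost_per_inference = (input_tokens × input_price) + (output_tokens × output_price)

For RAG systems, one user query also pays for the retrieval steps around that model call:

cost_per_rag_query = embedding_cost + vector_search_cost + reranking_cost + llm_inference_cost

where llm_inference_cost is the cost_per_inference of the generation call. Counting embedding and reranking here, and only here, avoids billing them twice.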
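A RAG query's cost can be broken into the four components of the formula with a small helper. This is a sketch under stated assumptions: the class and field names are hypothetical, and the pricing units (per million embedding tokens, per vector search, per rerank call) are stand-ins. Some vendors bill differently, for example rerankers priced per batch of documents, so map the units to your actual contracts.

```python
from dataclasses import dataclass


@dataclass
class RagUsage:
    embedded_query_tokens: int   # tokens sent to the embedding model for the query
    vector_searches: int         # vector-store queries issued
    rerank_calls: int            # reranker requests
    llm_input_tokens: int        # prompt tokens, including retrieved context
    llm_output_tokens: int       # generated tokens


@dataclass
class RagPrices:
    # Hypothetical pricing units; adjust to how your vendors actually bill.
    embedding_per_m_tokens: float
    per_vector_search: float
    per_rerank_call: float
    llm_input_per_m_tokens: float
    llm_output_per_m_tokens: float


def rag_query_cost(u: RagUsage, p: RagPrices) -> dict[str, float]:
    """Split one RAG query into embedding, search, reranking, and generation cost."""
    parts = {
        "embedding_cost": u.embedded_query_tokens * p.embedding_per_m_tokens / 1_000_000,
        "vector_search_cost": u.vector_searches * p.per_vector_search,
        "reranking_cost": u.rerank_calls * p.per_rerank_call,
        "llm_inference_cost": (u.llm_input_tokens * p.llm_input_per_m_tokens
                               + u.llm_output_tokens * p.llm_output_per_m_tokens) / 1_000_000,
    }
    # Each component is counted exactly once in the total.
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    usage = RagUsage(20, 1, 1, 2_000, 500)
    prices = RagPrices(0.02, 0.0001, 0.002, 5.0, 15.0)  # made-up rates
    for name, usd in rag_query_cost(usage, prices).items():
        print(f"{name}: ${usd:.6f}")
```

Keeping the components separate, rather than returning only the total, makes it visible which stage to optimize. With the made-up rates above the generation call dominates, but that is worth checking against real rates rather than assuming.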

Cost per customer

The metric finance cares about:

cost_per_customer_month = (total_ai_spend + total_infra_spend) / active_customers

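Compare it against revenue per customer. If the cost to serve a customer exceeds what that customer pays, or eats more of their revenue than your target gross margin allows, you have a unit economics problem, not a scaling problem: adding customers adds losses.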
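The cost-per-customer formula and the comparison against revenue can be expressed directly. This is a minimal sketch. The helper names are hypothetical, the monthly totals in the example are invented for illustration, and `margin_ratio` returns `None` for customers who pay nothing, such as a free tier, where a ratio is undefined.

```python
from typing import Optional


def cost_per_customer_month(total_ai_spend: float, total_infra_spend: float,
                            active_customers: int) -> float:
    """Average monthly cost to serve one active customer."""
    if active_customers <= 0:
        raise ValueError("active_customers must be positive")
    return (total_ai_spend + total_infra_spend) / active_customers


def margin_ratio(revenue_per_customer: float,
                 cost_per_customer: float) -> Optional[float]:
    """Share of revenue left after cost to serve; negative means each customer loses money.

    Returns None when there is no revenue (e.g. a free tier).
    """
    if revenue_per_customer <= 0:
        return None
    return (revenue_per_customer - cost_per_customer) / revenue_per_customer


if __name__ == "__main__":
    # Invented monthly totals: $12,000 AI spend, $3,000 infra, 5,000 active customers.
    cost = cost_per_customer_month(12_000.0, 3_000.0, 5_000)
    print(f"cost/customer/month: ${cost:.2f}")
    print(f"margin ratio at $29/month: {margin_ratio(29.0, cost):.1%}")
```

A blended average like this hides tier differences. The same calculation run per pricing tier is what feeds a per-tier view.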

Section 2: Building the Attribution Pipeline

You need to trace every dollar from API call to business unit:

Instrumentation layer

Log on every AI request:

request_id,
timestamp,
customer_id or tenant_id,
model_id,
input_tokens, output_tokens,
use_case (support, search, generation, classification),
cache_hit (boolean),
latency_ms,
estimated_cost_usd.
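A per-request record with the fields listed above can be built and emitted as one JSON line per call. This is a sketch under assumptions. The class and helper names are hypothetical, a timestamp field is included because the daily rollup needs one, and a cache hit is assumed to cost nothing at the model layer. That last assumption does not hold if your cache or provider still bills discounted tokens. Where the JSON line is shipped (stdout, a log pipeline, a warehouse) is left open.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass
class AIRequestLog:
    """One record per AI request; fields mirror the list above plus a timestamp."""
    request_id: str
    timestamp: float           # Unix epoch seconds, used for daily rollups
    tenant_id: str             # or customer_id, whichever you bill against
    model_id: str
    input_tokens: int
    output_tokens: int
    use_case: str              # e.g. "support", "search", "generation", "classification"
    cache_hit: bool
    latency_ms: float
    estimated_cost_usd: float


def make_log(tenant_id: str, model_id: str, input_tokens: int, output_tokens: int,
             use_case: str, cache_hit: bool, latency_ms: float,
             input_price_per_m: float, output_price_per_m: float) -> AIRequestLog:
    # Assumption: a cache hit is served without a model call, so its model cost is zero.
    if cache_hit:
        cost = 0.0
    else:
        cost = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return AIRequestLog(
        request_id=str(uuid.uuid4()),
        timestamp=time.time(),
        tenant_id=tenant_id,
        model_id=model_id,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        use_case=use_case,
        cache_hit=cache_hit,
        latency_ms=latency_ms,
        estimated_cost_usd=cost,
    )


def to_json_line(record: AIRequestLog) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(record))


if __name__ == "__main__":
    rec = make_log("acme", "model-a", 1_000, 2_000, "support", False, 850.0, 5.0, 15.0)
    print(to_json_line(rec))
```

Computing `estimated_cost_usd` at write time, from the price table in effect at that moment, keeps historical records accurate when list prices change later.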

Aggregation layer

Roll up daily:

cost per customer per day,
cost per use case per day,
cost per model per day,
cache hit rate per use case.
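The four daily rollups can be sketched as an in-memory aggregation over the request records. In practice this is often a GROUP BY in a data warehouse. The Python version below only shows the grouping keys and is not a production pipeline. It assumes each record is a dict carrying the instrumentation fields, including a Unix `timestamp`, and it buckets days in UTC, which is an arbitrary choice; `daily_rollup` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def utc_day(ts: float) -> str:
    """Bucket a Unix timestamp into a UTC calendar day."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


def daily_rollup(records: list[dict]) -> dict[str, dict]:
    cost_by_customer = defaultdict(float)   # (day, tenant_id) -> USD
    cost_by_use_case = defaultdict(float)   # (day, use_case) -> USD
    cost_by_model = defaultdict(float)      # (day, model_id) -> USD
    requests = defaultdict(int)             # (day, use_case) -> request count
    hits = defaultdict(int)                 # (day, use_case) -> cache hits

    for r in records:
        day = utc_day(r["timestamp"])
        cost = r["estimated_cost_usd"]
        cost_by_customer[(day, r["tenant_id"])] += cost
        cost_by_use_case[(day, r["use_case"])] += cost
        cost_by_model[(day, r["model_id"])] += cost
        requests[(day, r["use_case"])] += 1
        hits[(day, r["use_case"])] += int(r["cache_hit"])

    return {
        "cost_per_customer": dict(cost_by_customer),
        "cost_per_use_case": dict(cost_by_use_case),
        "cost_per_model": dict(cost_by_model),
        # Every key in requests has at least one request, so no division by zero.
        "cache_hit_rate": {k: hits[k] / n for k, n in requests.items()},
    }
```

Keying every rollup by day first makes week-over-week comparisons a simple filter, and the cache hit rate sits next to the cost figures it explains.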

Reporting layer

Weekly dashboard:

Customer tier	Avg cost/customer/month	Revenue/customer/month	Margin/customer/month
Free	$0.45	$0	-$0.45
Pro	$2.10	$29	$26.90
Enterprise	$18.50	$500	$481.50

This table tells you whether your free tier is subsidized acceptably and whether enterprise margins hold at scale.

Section 3: The Agent Loop Multiplier

Agents are the unit economics wildcard. A single user request can trigger:

1 planning call (2,000 tokens),
3 tool invocations (1,500 tokens each),
1 synthesis call (3,000 tokens),
1 retry after a tool failure (5,000 tokens).

Total: 14,500 tokens for one "simple" user request, nearly five times a 3,000-token single-shot completion. At GPT-4o-class pricing, that is roughly $0.15–$0.25 per agent task vs $0.03–$0.05 for a single-shot completion.

Track cost per agent task, not just cost per API call. Set token budgets per task and per session.
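A per-task token budget can be sketched as a counter that every model call in the loop charges. This is one possible shape, not a standard API, and the class names and the 10,000-token budget are hypothetical. Tokens are charged after each call, because output length is only known once the call returns. The last call can therefore overshoot, but no further calls are made. The step list reproduces the example above.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when an agent task uses more tokens than its budget allows."""


class TaskBudget:
    """Tracks tokens spent by one agent task and halts the loop once over budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, step: str, tokens: int) -> None:
        # Charged after the call returns, so the check stops the *next* call.
        self.used += tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"{step} brought the task to {self.used} tokens (budget {self.max_tokens})")


# The example request from the text: planning, three tool calls, synthesis, one retry.
EXAMPLE_STEPS = [("planning", 2_000), ("tool_1", 1_500), ("tool_2", 1_500),
                 ("tool_3", 1_500), ("synthesis", 3_000), ("retry", 5_000)]

if __name__ == "__main__":
    budget = TaskBudget(max_tokens=10_000)  # hypothetical per-task budget
    try:
        for name, tokens in EXAMPLE_STEPS:
            budget.charge(name, tokens)
    except TokenBudgetExceeded as exc:
        print(f"stopped: {exc}")
```

With a 10,000-token budget the example task stops at the retry. A session budget can reuse the same class as a second counter that every task in the session also charges.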

Section 4: Optimizing Unit Economics

Once you can measure, you can optimize:

Lever	Typical savings	Tradeoff
Semantic caching	50–70% on repeated queries	Stale responses if invalidation is poor
Model routing	60–80% on simple tasks	Quality drop if routing is too aggressive
Prompt compression	20–40% on token count	May lose context for complex tasks
Batch processing	30–50% on non-real-time workloads	Increased latency
Context window management	15–30% on long conversations	Requires summarization logic

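A reasonable default order is caching first, then model routing, then prompt compression and context management. The savings ranges above apply to different slices of traffic (repeated queries, simple tasks, long conversations), so the lever with the highest percentage is not automatically the one with the largest dollar impact. Measure how much of your spend each lever actually touches before committing to an order.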
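The effect of stacking levers can be estimated with a small calculation. This is a rough sketch, not a forecasting model, and `apply_levers` is a hypothetical helper. It assumes each lever is described by the share of *remaining* spend it touches and a savings rate on that share, and that levers are applied in sequence and act independently, which real traffic may not do. The traffic mix in the example is invented, and the savings rates are picked from within the table's ranges.

```python
def apply_levers(monthly_spend: float,
                 levers: list[tuple[str, float, float]]) -> list[tuple[str, float]]:
    """Apply (name, share_of_remaining_spend, savings_rate) levers in order.

    Returns the remaining monthly spend after each lever.
    """
    remaining = monthly_spend
    trail = []
    for name, share, savings_rate in levers:
        if not (0.0 <= share <= 1.0 and 0.0 <= savings_rate <= 1.0):
            raise ValueError(f"{name}: share and savings_rate must be between 0 and 1")
        remaining -= remaining * share * savings_rate
        trail.append((name, remaining))
    return trail


if __name__ == "__main__":
    # Invented mix: 30% of spend is repeated queries; half of what remains is
    # simple enough to route to a cheaper model.
    plan = [("semantic caching", 0.30, 0.60), ("model routing", 0.50, 0.70)]
    for name, spend in apply_levers(10_000.0, plan):
        print(f"after {name}: ${spend:,.0f}/month")
```

With this invented mix, $10,000/month falls to $8,200 after caching and $5,330 after routing. Swapping in the share each lever actually touches in your own logs is what turns the table's percentages into a dollar ranking.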

Section 5: When to Worry

Red flags in your unit economics:

Cost per customer growing faster than revenue per customer (margin compression),
Free tier cost exceeding 10% of total AI spend (unsustainable subsidy),
Agent tasks costing 5x+ single-shot completions (loop control problem),
No correlation between AI spend and retention/engagement (feature may not justify cost).
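The first three red flags are numeric thresholds and can be checked automatically each month. The thresholds below come from the list (cost growth outpacing revenue growth, free tier above 10% of AI spend, agent tasks at 5x or more of single-shot cost). The metric names and the `MonthlyMetrics` container are hypothetical. The fourth flag, spend versus retention or engagement, needs cohort data and is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class MonthlyMetrics:
    cost_per_customer_growth: float      # month over month, e.g. 0.12 for +12%
    revenue_per_customer_growth: float
    free_tier_ai_spend: float            # USD
    total_ai_spend: float                # USD
    avg_agent_task_cost: float           # USD per agent task
    avg_single_shot_cost: float          # USD per single-shot completion


def red_flags(m: MonthlyMetrics) -> list[str]:
    """Return the threshold-based red flags that fire for this month."""
    flags = []
    if m.cost_per_customer_growth > m.revenue_per_customer_growth:
        flags.append("margin compression")
    if m.total_ai_spend > 0 and m.free_tier_ai_spend / m.total_ai_spend > 0.10:
        flags.append("free tier above 10% of AI spend")
    if m.avg_single_shot_cost > 0 and m.avg_agent_task_cost / m.avg_single_shot_cost >= 5:
        flags.append("agent tasks cost 5x+ single-shot completions")
    return flags
```

Guarding the two ratios against zero denominators keeps the check usable in the first month of a product, when some of these totals may still be zero.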

Conclusion

Unit economics for AI is not a finance exercise—it is an engineering discipline. Instrument cost per token today, aggregate to cost per customer this week, and review margins monthly.

If you cannot tell your CFO what it costs to serve one customer, you are not ready to scale AI features.