Building the 'Cost Observability' Layer: Every AI Architecture Needs One in 2026
Introduction
You have application monitoring. You have infrastructure monitoring. You probably do not have cost monitoring—and for AI systems, that gap is expensive.
Traditional APM tracks latency, error rates, and throughput. None of those tell you that a single agent task burned $0.47 in tokens, that your free-tier users consume 3x the AI budget of paid users, or that switching models last Tuesday doubled inference cost with no quality improvement.
Cost observability is the practice of tracking dollar-level spend per request, per feature, per model, and per customer—with the same granularity you apply to latency and error rates. In 2026, it is foundational infrastructure for any AI-native product.
Section 1: What Cost Observability Tracks
Per-request cost
Every AI API call logs:
{
  "request_id": "req_abc123",
  "model": "gpt-4o",
  "input_tokens": 1842,
  "output_tokens": 356,
  "input_cost_usd": 0.0046,
  "output_cost_usd": 0.0036,
  "total_cost_usd": 0.0082,
  "latency_ms": 1240,
  "cache_hit": false,
  "use_case": "support_bot"
}
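The cost fields in a record like this can be derived from the token counts and per-1K-token prices. Here is a minimal sketch, assuming the illustrative gpt-4o rates from the pricing table in Section 2; the `PRICES` mapping and the `request_cost` function are hypothetical names, not part of any provider SDK.

```python
from decimal import Decimal

# Illustrative per-1K-token prices in USD (assumed values, not live provider pricing).
PRICES = {
    "gpt-4o": {"input_per_1k": Decimal("0.0025"), "output_per_1k": Decimal("0.01")},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Return input, output, and total cost in USD for one model call."""
    price = PRICES[model]
    input_cost = Decimal(input_tokens) / 1000 * price["input_per_1k"]
    output_cost = Decimal(output_tokens) / 1000 * price["output_per_1k"]
    return {
        "input_cost_usd": input_cost,
        "output_cost_usd": output_cost,
        "total_cost_usd": input_cost + output_cost,
    }

cost = request_cost("gpt-4o", 1842, 356)
# Round only for display; keep full precision in the stored event.
print({name: float(round(value, 4)) for name, value in cost.items()})
```

Using `Decimal` rather than floats is a design choice here: sub-cent amounts summed over millions of requests are less likely to drift.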
Per-task cost (agents)
Agents make multiple model calls per user request. Roll up:
{
  "task_id": "task_xyz789",
  "total_model_calls": 4,
  "total_tokens": 12400,
  "total_cost_usd": 0.047,
  "tool_calls": 3,
  "duration_ms": 8500,
  "outcome": "completed"
}
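One way to produce this rollup is to tag every per-call cost event with a task ID and sum by that key. The sketch below assumes each event carries `task_id`, `input_tokens`, `output_tokens`, and `total_cost_usd`; the `task_id` field on call events and the sample values are hypothetical.

```python
from collections import defaultdict

def roll_up_tasks(call_events):
    """Group per-call cost events by task_id and sum calls, tokens, and cost."""
    tasks = defaultdict(
        lambda: {"total_model_calls": 0, "total_tokens": 0, "total_cost_usd": 0.0}
    )
    for event in call_events:
        task = tasks[event["task_id"]]
        task["total_model_calls"] += 1
        task["total_tokens"] += event["input_tokens"] + event["output_tokens"]
        task["total_cost_usd"] += event["total_cost_usd"]
    # Round once at the end so per-call rounding errors do not accumulate.
    return {
        task_id: {**totals, "total_cost_usd": round(totals["total_cost_usd"], 4)}
        for task_id, totals in tasks.items()
    }

# Hypothetical events for one agent task that made four model calls.
events = [
    {"task_id": "task_xyz789", "input_tokens": 2800, "output_tokens": 300, "total_cost_usd": 0.010},
    {"task_id": "task_xyz789", "input_tokens": 2900, "output_tokens": 200, "total_cost_usd": 0.012},
    {"task_id": "task_xyz789", "input_tokens": 3000, "output_tokens": 400, "total_cost_usd": 0.015},
    {"task_id": "task_xyz789", "input_tokens": 2500, "output_tokens": 300, "total_cost_usd": 0.010},
]
print(roll_up_tasks(events)["task_xyz789"])
```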
Per-customer cost
Aggregate daily/monthly:
{
  "customer_id": "cust_456",
  "period": "2026-06",
  "total_requests": 2840,
  "total_cost_usd": 42.30,
  "cost_per_request_usd": 0.0149,
  "revenue_usd": 29.00,
  "margin_usd": -13.30
}
That last field, margin_usd, is the one that should trigger an alert: this customer costs $13.30 more to serve than they pay.
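The derived fields in the customer record are simple arithmetic over the period totals. A minimal sketch, assuming you already have the request count, total cost, and revenue for the period; the `customer_margin` function and the `unprofitable` flag are hypothetical additions for illustration.

```python
def customer_margin(customer_id, period, total_requests, total_cost_usd, revenue_usd):
    """Build a per-customer summary with cost per request and margin."""
    # Guard against customers with no requests in the period.
    cost_per_request = total_cost_usd / total_requests if total_requests else 0.0
    margin = revenue_usd - total_cost_usd
    return {
        "customer_id": customer_id,
        "period": period,
        "total_requests": total_requests,
        "total_cost_usd": round(total_cost_usd, 2),
        "cost_per_request_usd": round(cost_per_request, 4),
        "revenue_usd": round(revenue_usd, 2),
        "margin_usd": round(margin, 2),
        "unprofitable": margin < 0,  # the condition worth alerting on
    }

summary = customer_margin("cust_456", "2026-06", 2840, 42.30, 29.00)
print(summary["cost_per_request_usd"], summary["margin_usd"], summary["unprofitable"])
```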
Section 2: Architecture
Middleware layer
Insert cost tracking between your application and model providers:
App → Cost Middleware → Model Provider
            ↓
     Cost Event Store
            ↓
  Aggregation + Dashboard
The middleware:
- intercepts every model call,
- calculates cost from token counts and model pricing,
- attaches cost metadata to the request context,
- emits events to your cost store.
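The four responsibilities above can be sketched as a thin wrapper around whatever function calls your provider. Everything named here is an assumption for illustration: `with_cost_tracking`, the `fake_provider` stand-in, the response shape with a `usage` dictionary, and the price values. A real provider SDK will return usage in its own format.

```python
import time

# Illustrative per-1K-token prices in USD (assumed values).
PRICES = {"gpt-4o": {"input_per_1k": 0.0025, "output_per_1k": 0.01}}

def with_cost_tracking(call_provider, emit, prices=PRICES):
    """Wrap a provider call so every request produces a cost event."""
    def tracked_call(model, prompt, use_case):
        start = time.perf_counter()
        response = call_provider(model, prompt)  # intercept the model call
        latency_ms = int((time.perf_counter() - start) * 1000)

        usage, price = response["usage"], prices[model]
        input_cost = usage["input_tokens"] / 1000 * price["input_per_1k"]
        output_cost = usage["output_tokens"] / 1000 * price["output_per_1k"]
        event = {
            "model": model,
            "use_case": use_case,
            "input_tokens": usage["input_tokens"],
            "output_tokens": usage["output_tokens"],
            "total_cost_usd": input_cost + output_cost,
            "latency_ms": latency_ms,
        }
        response["cost"] = event  # attach cost metadata to the request context
        emit(event)               # send the event to the cost store
        return response
    return tracked_call

def fake_provider(model, prompt):
    """Stand-in for a real SDK call; returns fixed token usage."""
    return {"text": "ok", "usage": {"input_tokens": 1842, "output_tokens": 356}}

cost_store = []  # a list stands in for the real event store
ask = with_cost_tracking(fake_provider, cost_store.append)
reply = ask("gpt-4o", "Where is my order?", use_case="support_bot")
print(cost_store[0]["total_cost_usd"])
```

Passing `emit` in as a callable keeps the middleware independent of the store, so the same wrapper can write to structured logs first and a time-series database later.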
Cost event store
Options by scale:
- Structured logs (CloudWatch, Datadog): good for < 100K requests/day,
- Time-series DB (TimescaleDB, InfluxDB): good for aggregation and alerting,
- Dedicated cost platforms (Helicone, LangSmith): good for LLM-specific features.
Pricing table
Maintain a versioned pricing table:
{
  "gpt-4o": {
    "input_per_1k": 0.0025,
    "output_per_1k": 0.01,
    "effective_date": "2026-01-01"
  }
}
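Update the table whenever a provider changes pricing, and add a new dated entry rather than overwriting the old one. Historical costs must be computed with the price in effect at the time of the request.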
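To price a historical request correctly, the lookup needs the latest entry whose effective date is on or before the request date. A minimal sketch, assuming each model keeps a date-sorted list of price versions; the `PRICING_HISTORY` structure, the `price_at` function, and the second (2026-07-01) entry are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Each model maps to price versions sorted by effective date (oldest first).
PRICING_HISTORY = {
    "gpt-4o": [
        (date(2026, 1, 1), {"input_per_1k": 0.0025, "output_per_1k": 0.01}),
        # Hypothetical later price change, included only to show versioning.
        (date(2026, 7, 1), {"input_per_1k": 0.002, "output_per_1k": 0.008}),
    ],
}

def price_at(model: str, request_date: date) -> dict:
    """Return the price entry in effect for `model` on `request_date`."""
    versions = PRICING_HISTORY[model]
    effective_dates = [effective for effective, _ in versions]
    index = bisect_right(effective_dates, request_date) - 1
    if index < 0:
        raise LookupError(f"no price for {model} on {request_date}")
    return versions[index][1]

print(price_at("gpt-4o", date(2026, 6, 15)))  # priced with the 2026-01-01 entry
print(price_at("gpt-4o", date(2026, 7, 1)))   # priced with the 2026-07-01 entry
```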
Section 3: Dashboards and Alerts
Essential dashboards
- Daily spend trend (total + by model + by use case),
- Cost per request distribution (p50, p95, p99),
- Top 10 most expensive customers/tasks,
- Cache hit rate vs cost savings,
- Model routing effectiveness (cost vs quality by model tier).
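For the cost-per-request distribution, the percentiles can be computed directly from logged per-request costs before you have a dashboard tool in place. This sketch uses Python's standard `statistics.quantiles`; the `cost_percentiles` function and the sample data are hypothetical.

```python
from statistics import quantiles

def cost_percentiles(costs_usd):
    """Return p50, p95, and p99 of per-request cost; needs at least two samples."""
    # n=100 yields 99 cut points; cut point k-1 is the k-th percentile.
    cuts = quantiles(costs_usd, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Hypothetical sample: mostly cheap requests with a few expensive outliers.
costs = [0.008] * 90 + [0.03] * 8 + [0.25, 0.47]
print(cost_percentiles(costs))
```

A sample like this shows why the tail matters: the median stays near the cheap requests while p99 is dominated by a couple of expensive ones.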
Essential alerts
| Alert | Threshold | Action |
|---|---|---|
| Daily spend exceeds budget | > 100% of daily budget | Page on-call |
| Cost per request spike | > 2x 7-day average | Investigate model/routing change |
| Customer margin negative | cost > revenue for tier | Review usage patterns |
| Agent task cost spike | > $0.50 per task | Check for infinite loops |
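The four rules in the table reduce to simple comparisons over metrics you already aggregate. The sketch below assumes those metrics arrive as one snapshot object; `CostSnapshot`, `evaluate_alerts`, and the alert names are hypothetical, and the thresholds mirror the table.

```python
from dataclasses import dataclass

@dataclass
class CostSnapshot:
    daily_spend_usd: float
    daily_budget_usd: float
    cost_per_request_usd: float
    avg_cost_per_request_7d_usd: float
    customer_cost_usd: float
    customer_revenue_usd: float
    max_task_cost_usd: float

def evaluate_alerts(s: CostSnapshot, task_cost_limit_usd: float = 0.50):
    """Return (alert_name, suggested_action) pairs for every rule that fires."""
    alerts = []
    if s.daily_spend_usd > s.daily_budget_usd:
        alerts.append(("daily_spend_over_budget", "Page on-call"))
    if s.cost_per_request_usd > 2 * s.avg_cost_per_request_7d_usd:
        alerts.append(("cost_per_request_spike", "Investigate model/routing change"))
    if s.customer_cost_usd > s.customer_revenue_usd:
        alerts.append(("customer_margin_negative", "Review usage patterns"))
    if s.max_task_cost_usd > task_cost_limit_usd:
        alerts.append(("agent_task_cost_spike", "Check for infinite loops"))
    return alerts

snapshot = CostSnapshot(
    daily_spend_usd=120.0, daily_budget_usd=100.0,
    cost_per_request_usd=0.012, avg_cost_per_request_7d_usd=0.010,
    customer_cost_usd=42.30, customer_revenue_usd=29.00,
    max_task_cost_usd=0.047,
)
for name, action in evaluate_alerts(snapshot):
    print(f"{name}: {action}")
```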
Section 4: Integrating with Existing Observability
Cost observability should live alongside your APM, not in a separate silo:
- Attach cost_usd as a span attribute in distributed traces,
- Include cost in your existing request dashboards,
- Correlate cost spikes with deployment events and model changes,
- Export cost metrics to the same alerting system (PagerDuty, Opsgenie).
When debugging a latency spike, you should also see whether cost spiked—often the same root cause (model change, context window bloat, agent loop).
Section 5: Implementation Priority
Week 1:
- Log token counts and estimated cost on every model call,
- Create a daily spend spreadsheet from logs.
Week 2:
- Build per-request cost middleware,
- Set daily budget alert.
Week 3:
- Add per-customer aggregation,
- Build cost per request dashboard.
Week 4:
- Integrate with distributed tracing,
- Add cost to CI/CD eval reports.
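The Week 1 spreadsheet step needs very little code if each log line is already a JSON cost record. A minimal sketch, assuming every record also carries an ISO 8601 `timestamp` field (not shown in the per-request example earlier) alongside `total_cost_usd`; the `daily_spend_csv` function is hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

def daily_spend_csv(log_lines):
    """Sum total_cost_usd per calendar day from JSON log lines and return CSV text."""
    totals = defaultdict(float)
    for line in log_lines:
        record = json.loads(line)
        day = record["timestamp"][:10]  # assumes ISO 8601, e.g. 2026-06-01T09:15:00Z
        totals[day] += record["total_cost_usd"]

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["date", "total_cost_usd"])
    for day in sorted(totals):
        writer.writerow([day, f"{totals[day]:.4f}"])
    return out.getvalue()

# Hypothetical log lines; in practice these would be read from your log export.
logs = [
    '{"timestamp": "2026-06-01T09:15:00Z", "total_cost_usd": 0.0082}',
    '{"timestamp": "2026-06-01T17:40:00Z", "total_cost_usd": 0.0082}',
    '{"timestamp": "2026-06-02T08:05:00Z", "total_cost_usd": 0.047}',
]
print(daily_spend_csv(logs))
```

The returned text can be written to a `.csv` file and opened in any spreadsheet tool.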
Conclusion
Cost observability is not a FinOps luxury—it is production infrastructure. You cannot optimize what you cannot measure, and you cannot measure AI costs with cloud billing alone.
Build the middleware layer this week. The first time it catches a runaway agent loop, it pays for itself.