CodeBurn
All posts

You Are Flying Blind on AI Coding Costs

Every tool shows you a bill. None of them show you where the money went. That used to be fine. It is not fine anymore.

A developer left Claude Code running overnight on an automation loop. The loop kept rebuilding a large context and re-sending it, dozens of times an hour. By morning the bill was about $6,000. There was no live counter to warn them. The first sign of trouble was the charge.

Stories like that are easy to file under “rookie mistake.” They are not. They are what happens when a powerful, metered tool has no instrument panel. And right now, almost nobody flying these tools has one.

A bill is not an instrument panel

Anthropic has a usage dashboard. So does OpenAI. But a usage dashboard answers exactly one question: how much money left your account. It does not tell you where the money went.

It cannot tell you that your debugging sessions cost three times what your coding sessions do. It cannot tell you that one project ate $200 last Tuesday, a project that is not even shipping this quarter. It cannot tell you that your cache hit rate quietly dropped after a prompt change and is now costing you hundreds extra a month. The number on the bill is real, but it is the answer to the wrong question.

And if you use more than one tool, even that number is split across logins. Your Codex spend is on one page, your Claude Code spend on another, your Cursor usage behind a third. You might be paying $20 here, $20 there, $100 for one API and a bit more for another, and the total exists nowhere except in your head, where you are probably underestimating it.

Why this matters more than it did last year

A year ago, AI coding was a side-project expense. The cost was small and the stakes were lower. Two things changed.

First, pricing moved to metered. GitHub Copilot switched to usage-based billing in June 2026, and some developers reported burning through their quota in a couple of hours. Cursor replaced flat-rate requests with a credit pool last year and had to apologize publicly when users hit unexpected overages of hundreds of dollars. “Flat monthly fee” is turning into “you pay for what you use,” and without knowing what you use, that transition is a trap.

Second, the scale went up. Uber rolled Claude Code out to its engineering org in late 2025 and, according to Fortune, burned through its entire 2026 AI budget within a few months, with per-engineer costs landing between $500 and $2,000 a month and adoption around 95 percent. At that scale, flying blind is not an inconvenience. It is a budget that disappears before anyone notices.

What an instrument panel would actually show

The same things you would want from production monitoring, pointed at a different system. Three readouts matter most.

  1. Cost by project, model, and activity. Not just a monthly total, but which project, which model, and whether the spend went to writing code, exploring, debugging, or plain conversation. That is the difference between knowing you spent $3,000 and knowing that $1,200 of it went to one model on one project doing one thing you could have done cheaper.
  2. Cache hit rate. Most coding tools cache the prompt prefix, and when it works you pay a fraction of full price. When it breaks, from a context window that grew too large or a system prompt that changed, you pay full price on every call. A few points of cache rate, at volume, is real money. You want it on the dashboard, not in a postmortem.
  3. Retry and context patterns. Sessions with high retry counts and fat context windows are where money quietly evaporates. Seeing them lets you spot a runaway loop while it is running instead of on the statement.

What mine looked like

Here is one recent 30-day window from my own usage across several tools, priced at public API rates (real money on metered billing, the equivalent value on a flat subscription):

None of that is visible on a billing page. I got it by reading the session data that was already sitting on my disk and doing the math.

The data is already on your machine

Here is the part that makes this tractable, and a little sensitive. Your AI coding sessions contain your code, your prompts, and your project names. Claude Code stores transcripts in ~/.claude. Cursor keeps a SQLite database. Codex writes to its own local store. The raw material for full cost visibility is already there.

Which is also why I would be wary of any cost tool that uploads this data to a server to analyze it. You should not have to hand over your code and prompts to find out what you spent. A tool that reads those files locally and prices them with public rate data can answer every one of the questions above without anything leaving your laptop.

That is the approach I took with CodeBurn. It is an open-source CLI that reads your local session data, prices each call, and breaks your spend down by project, model, activity, and cache rate. One command, nothing uploaded:

npx codeburn

Whether you use that or build your own, the point stands on its own. You would not run a production service with no monitoring and a monthly invoice as your only signal. A metered AI coding workflow deserves the same instrument panel. The bill tells you that you spent. It is worth knowing where.