CodeBurn

The Real Cost of Vibe Coding

“Just keep iterating until it works” feels free while you do it. The surprising part is what actually drives the bill.

One night I spent $424 on a single debugging session. Fifty-nine rounds of paste-the-error, get-a-fix, watch-it-break-differently, paste-the-error. I never stopped to read the code. I just kept feeding the model the latest failure and hoping the next answer would be the one. It was not. I closed the laptop with nothing shipped and a bill that could have bought a decent dinner for four.

That is vibe coding. Not the good kind, where you let the model draft something and then shape it. The bad kind, where iteration replaces thinking. And it has a cost most people never see, because the cost does not show up until the invoice does.

The thing I got wrong about retries

I assumed the expense scaled with how many times I retried. More rounds, more money. Then I looked at two of my most expensive runs side by side, and the numbers said something stranger.

One run took 263 retries and cost $884. The other took more than twice as many retries and cost $397, less than half as much. If retries were the driver, that makes no sense. So what was different?

Context. The $884 session was dragging about 40 million tokens of input and cache into every single round. The $397 one carried around 15 million. Each retry re-sends the whole conversation so far, so the price of a retry is set by how much you are hauling along with it, not by the fact that you hit enter again.

That reframes the whole problem. The retry tax is really a context tax. A loop that re-sends a giant pile of files and failed attempts on every round is expensive whether it runs 50 times or 500. A loop that keeps its context small stays cheap even if it runs forever. Cutting the number of retries helps. Cutting the size of each one helps far more.
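To make the context tax concrete, here is a minimal sketch of the arithmetic. The flat price per million tokens and the per-round context sizes are made-up illustrative values, not the rates or sizes behind the sessions above. The model also ignores output tokens and any difference between cached and uncached input.

```python
# Toy model of the context tax. Every number here is an illustrative assumption.

PRICE_PER_MILLION_TOKENS = 3.00  # hypothetical flat input price, in dollars


def session_cost(rounds: int, context_tokens_per_round: int) -> float:
    """Cost of a loop that re-sends the same amount of context every round."""
    total_tokens = rounds * context_tokens_per_round
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


# A short loop hauling a large context on every round.
fat_loop = session_cost(rounds=50, context_tokens_per_round=400_000)

# A loop with ten times the retries but a much smaller context.
lean_loop = session_cost(rounds=500, context_tokens_per_round=20_000)

print(f"fat loop:  ${fat_loop:.2f}")
print(f"lean loop: ${lean_loop:.2f}")
```

Under these assumed numbers the lean loop runs ten times as many rounds and still costs half as much, because the total tokens sent is what gets billed, not the number of times you hit enter.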

(Two of my dramatic examples here were actually an automated eval harness, not me typing. I am being upfront about that. But the mechanic is identical, and my human sessions, like the $424 one above, show the same shape at smaller scale. An agent looping on bloated context is the purest, most expensive form of vibe coding there is. And to be clear on the dollars: these are tokens priced at public API rates. That is real money if you pay per token, or the equivalent value of what you burned if you are on a flat subscription.)

Why the loop does not converge

There is a deeper reason these sessions run long. AI models are not debuggers. When you paste an error, the model does not narrow down a root cause. It generates a plausible fix. If that fix creates a new error, it generates another plausible fix. Nothing tracks what has already been tried, so the model can and does suggest the same fix it offered thirty rounds ago.

A 263-retry session is not 263 steps toward a solution. It is closer to a random walk where every step has a price tag. Mine cost $884, and I cannot point to what it produced.
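The missing piece described above, a record of what has already been tried, is small enough to sketch. This is only an illustration: `AttemptLedger` and its whitespace-based normalization are hypothetical, and no model or tool mentioned here is claimed to work this way. It fingerprints each proposed fix so that a repeat can be flagged before another round is paid for.

```python
import hashlib
from typing import Dict, Optional


class AttemptLedger:
    """Remembers fixes already tried so a repeated suggestion can be flagged."""

    def __init__(self) -> None:
        # Maps a fix fingerprint to the round where it first appeared.
        self._seen: Dict[str, int] = {}

    @staticmethod
    def _fingerprint(fix: str) -> str:
        # Collapse whitespace so trivially reformatted fixes still match.
        normalized = " ".join(fix.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record(self, round_number: int, fix: str) -> Optional[int]:
        """Return the round where this fix was first seen, or None if it is new."""
        key = self._fingerprint(fix)
        if key in self._seen:
            return self._seen[key]
        self._seen[key] = round_number
        return None


ledger = AttemptLedger()
first = ledger.record(1, "timeout = 30")
second = ledger.record(2, "retries = 5")
repeat = ledger.record(31, "timeout  =  30")  # same fix, different spacing

if repeat is not None:
    print(f"round 31 repeats the fix from round {repeat}")
```

Exact matching after whitespace cleanup is a deliberately crude choice. It will miss fixes that are reworded but equivalent, so treat it as a floor for detection, not a ceiling.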

The cost does not stop when the code appears

Even when the loop spits out something that runs, you are not done paying. The research on this is consistent enough to be uncomfortable.

METR ran a randomized trial with experienced open-source developers in early 2025. Beforehand, the developers expected AI tools to speed them up by about 24 percent. Afterward, they still believed the tools had made them about 20 percent faster. Measured against the clock, including review and rework, they were roughly 19 percent slower. The feeling of speed and the actual speed pointed in opposite directions. (METR notes models have improved since, so treat this as a snapshot, not a law.)

Faros AI measured the downstream effects across engineering orgs in 2025: code reviews took about 91 percent longer, and bugs per developer rose about 9 percent. The code shows up fast, then takes longer to verify and breaks more often.

GitClear's analysis of 211 million changed lines found refactoring dropping from about 25 percent of changes to under 10 percent, while code churn rose from 5.5 to 7.9 percent. Teams are writing more and cleaning up less, then rewriting more of what they wrote.

And Cursor's own 18-month dataset suggests the tools widen the gap between developers rather than closing it. Strong engineers use AI as a power tool and apply judgment about when to accept and when to rewrite. Weaker ones accept more and review less, and generate more debt doing it. The loop rewards the people who already know when to stop.

How to stop paying the tax

The fix is not exciting. It is the boring discipline that AI was supposed to free us from, applied on purpose.

Seeing your own loops

The reason vibe coding persists is that the cost is invisible while it happens. There is no running meter in your editor. The retry loop that cost you $400 looks exactly like the one that cost $4 until the statement arrives.
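A running meter does not need much. The sketch below is hypothetical: the `SessionMeter` name, the flat price, and the budget are assumptions for illustration, not a feature of any editor or of codeburn. It adds up the input tokens sent each round and reports when the session crosses a budget set in advance.

```python
class SessionMeter:
    """Accumulates an estimated cost per round and flags a blown budget."""

    def __init__(self, budget_dollars: float, price_per_million: float) -> None:
        self.budget_dollars = budget_dollars
        self.price_per_million = price_per_million  # assumed flat input price
        self.rounds = 0
        self.spent = 0.0

    def add_round(self, input_tokens: int) -> bool:
        """Record one round; return True while the session is within budget."""
        self.rounds += 1
        self.spent += input_tokens / 1_000_000 * self.price_per_million
        return self.spent <= self.budget_dollars


meter = SessionMeter(budget_dollars=5.00, price_per_million=3.00)
context = 100_000  # tokens sent in the first round (illustrative)
while meter.add_round(context):
    context += 50_000  # each retry re-sends a longer conversation

print(f"stopped after {meter.rounds} rounds, about ${meter.spent:.2f} spent")
```

Because the context grows every round in this toy loop, the later rounds cost several times what the first one did, which is the part that stays invisible without a meter.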

I only found my own pattern because I went looking in my session history: which runs had high retry counts, fat context, and nothing shipped at the end. If you want to check yours, that is one command and it reads only your local session files:

npx codeburn optimize

It will show you your retry-heavy sessions and how much context each one carried. The $424 night taught me more than the code would have. Twenty minutes of reading the codebase up front would have been cheaper than 59 rounds of guessing.