The Real Cost of Vibe Coding
“Just keep iterating until it works” feels free while you do it. The surprising part is what actually drives the bill.
One night I spent $424 on a single debugging session. Fifty-nine rounds of paste-the-error, get-a-fix, watch-it-break-differently, paste-the-error. I never stopped to read the code. I just kept feeding the model the latest failure and hoping the next answer would be the one. It was not. I closed the laptop with nothing shipped and a bill that could have bought a decent dinner for four.
That is vibe coding. Not the good kind, where you let the model draft something and then shape it. The bad kind, where iteration replaces thinking. And it has a cost most people never see, because the cost does not show up until the invoice does.
The thing I got wrong about retries
I assumed the expense scaled with how many times I retried. More rounds, more money. Then I looked at two of my most expensive runs side by side, and the numbers said something stranger.
- One session: $884 over 263 retries. That works out to $3.36 per retry.
- Another: $397 over 600 retries. That is $0.66 per retry.
More than twice as many retries, and it cost less than half as much. If retries were the driver, that would make no sense. So what was different?
Context. The $884 session pushed about 40 million tokens of input and cache through its run, roughly 150,000 per round. The $397 one carried around 15 million, roughly 25,000 per round. Each retry re-sends the whole conversation so far, so the price of a retry is set by how much you are hauling along with it, not by the fact that you hit enter again.
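The arithmetic is easy to check. This sketch assumes the 40 million and 15 million token figures are totals for each whole session, so dividing by the retry count gives an approximate context size per round.

```python
from typing import Dict, Tuple

# Totals for the two sessions described above. Assumption: "tokens" is the
# session-wide total of input plus cache tokens, not a per-round figure.
sessions: Dict[str, Dict[str, int]] = {
    "heavy": {"cost_usd": 884, "retries": 263, "tokens": 40_000_000},
    "light": {"cost_usd": 397, "retries": 600, "tokens": 15_000_000},
}


def per_retry(session: Dict[str, int]) -> Tuple[float, float]:
    """Return (dollars per retry, tokens per retry) for one session."""
    return (session["cost_usd"] / session["retries"],
            session["tokens"] / session["retries"])


for name, s in sessions.items():
    dollars, tokens = per_retry(s)
    print(f"{name}: ${dollars:.2f} per retry, ~{tokens:,.0f} tokens per retry")
```

The heavy session carries about six times as much context per round and costs about five times as much per retry. The retry count barely enters into it.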
That reframes the whole problem. The retry tax is really a context tax. A loop that re-sends a giant pile of files and failed attempts on every round is expensive whether it runs 50 times or 500. A loop that keeps its context small stays cheap even if it runs forever. Cutting the number of retries helps. Cutting the size of each one helps far more.
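A toy model makes the "expensive at 50, cheap at 500" claim concrete. Every number below is made up purely to show the shape. The only assumption is the one described above: each round re-sends everything that came before it.

```python
def total_input_tokens(rounds: int, base: int, growth_per_round: int) -> int:
    """Total context sent over a retry loop.

    Round i (counting from 0) re-sends the base context plus i earlier
    attempts, each adding growth_per_round tokens (error text plus the
    model's proposed fix). Hypothetical model, not a billing formula.
    """
    return sum(base + i * growth_per_round for i in range(rounds))


# Bloated loop: large starting context, and every failed attempt rides along.
bloated = total_input_tokens(rounds=50, base=100_000, growth_per_round=4_000)

# Lean loop: small context, old attempts dropped each round (no growth).
lean = total_input_tokens(rounds=500, base=10_000, growth_per_round=0)

print(f"bloated, 50 rounds: {bloated:,} tokens")
print(f"lean, 500 rounds:   {lean:,} tokens")
```

Fifty bloated rounds send nearly twice the tokens of five hundred lean ones. Because every round carries all earlier attempts, the bloated total grows with the square of the round count. Dropping old attempts makes it grow only linearly.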
(Two of my dramatic examples here were actually an automated eval harness, not me typing. I am being upfront about that. But the mechanic is identical, and my human sessions, including the $424 one above, show the same shape at smaller scale. An agent looping on bloated context is the purest, most expensive form of vibe coding there is. And to be clear about the dollars: these are tokens priced at public API rates. That is real money if you pay per token, and the equivalent value of what you burned if you are on a flat subscription.)
Why the loop does not converge
There is a deeper reason these sessions run long. AI models are not debuggers. When you paste an error, the model does not narrow down a root cause. It generates a plausible fix. If that fix creates a new error, it generates another plausible fix. Nothing tracks what has already been tried, so the model can and does suggest the same fix it offered thirty rounds ago.
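The missing piece is plain bookkeeping. Here is a minimal sketch, my own illustration and not a feature of any tool, of a ledger that remembers which fixes were already tried. It normalizes whitespace and case before hashing, so it only catches near-verbatim repeats, not fixes that are equivalent but worded differently.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(fix: str) -> str:
    """Hash a fix after collapsing whitespace and lowercasing it."""
    normalized = " ".join(fix.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FixLedger:
    """Remembers which fixes were tried, and in which round."""

    def __init__(self) -> None:
        self._first_tried: Dict[str, int] = {}

    def record(self, round_no: int, fix: str) -> Optional[int]:
        """Log a fix; return the round it was first tried if repeated, else None."""
        key = fingerprint(fix)
        if key in self._first_tried:
            return self._first_tried[key]
        self._first_tried[key] = round_no
        return None


# Hypothetical suggestions; fetch_user() is a made-up function name.
ledger = FixLedger()
ledger.record(1, "Add `await` before fetch_user()")
print(ledger.record(31, "add `await`  before   fetch_user()"))  # prints 1
```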
A 263-retry session is not 263 steps toward a solution. It is closer to a random walk where every step has a price tag. Mine cost $884, and I cannot point to what it produced.
The cost does not stop when the code appears
Even when the loop spits out something that runs, you are not done paying. The research on this is consistent enough to be uncomfortable.
METR ran a randomized trial with experienced open-source developers in early 2025. The developers expected AI tools to speed them up by about 24 percent. Afterward, they estimated they had been about 20 percent faster. Measured against the clock, including review and rework, they were roughly 19 percent slower. The feeling of speed and the actual speed pointed in opposite directions. (METR notes models have improved since, so treat this as a snapshot, not a law.)
Faros AI measured the downstream effects across engineering orgs in 2025: code reviews took about 91 percent longer, and bugs per developer rose about 9 percent. The code shows up fast, then takes longer to verify and breaks more often.
GitClear's analysis of 211 million changed lines found refactoring dropping from about 25 percent of changes to under 10 percent, while code churn rose from 5.5 to 7.9 percent. Teams are writing more and cleaning up less, then rewriting more of what they wrote.
And Cursor's own 18-month dataset suggests the tools widen the gap between developers rather than closing it. Strong engineers use AI as a power tool and apply judgment about when to accept and when to rewrite. Weaker ones accept more and review less, and generate more debt doing it. The loop rewards the people who already know when to stop.
How to stop paying the tax
The fix is not exciting. It is the boring discipline that AI was supposed to free us from, applied on purpose.
- Keep the context small. This is the single biggest lever, because cost tracks context size per round. Close files the model does not need. Do not paste entire logs when three lines matter. The leaner each round, the cheaper the whole loop.
- Start a new session instead of extending an old one. A fresh session has a clean window with none of the failed attempts still riding along. You pay for the context you need now, not the debris of every wrong turn since lunch.
- Stop after about three failed retries. If the model has not solved it in three honest attempts, it usually lacks the context to solve it at all. Read the code yourself, rewrite the prompt with the missing information, or fix it by hand. The first few retries are cheap. The fiftieth is not.
- Use cheaper models to explore. Sonnet costs a fraction of Opus, and Haiku a fraction of Sonnet. When you are not sure an approach will even work, find out on the cheap model, then switch up once you have a plan.
- Write one sentence of plan first. What you want, and how you will know it worked. Thirty seconds of that kills entire categories of retry loop before they start.
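Two of those habits can be made mechanical. The sketch below sends only the tail of a log and gives up after three failed attempts. `attempt_fix` is a hypothetical stand-in for whatever sends the prompt and runs your tests. Keeping the last few lines assumes the error sits at the end of the log, as it does in a Python traceback.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # "stop after about three failed retries"


def trim_log(log: str, keep_last: int = 3) -> str:
    """Keep only the last few non-empty lines of a log."""
    lines = [line for line in log.splitlines() if line.strip()]
    return "\n".join(lines[-keep_last:])


def bounded_retry(attempt_fix: Callable[[str], Optional[str]], log: str) -> bool:
    """Call attempt_fix at most MAX_RETRIES times.

    attempt_fix (hypothetical) receives only the latest trimmed error, never
    the accumulated history, and returns None on success or a new error log
    on failure. Returns True if any attempt succeeded.
    """
    error = trim_log(log)
    for _ in range(MAX_RETRIES):
        new_log = attempt_fix(error)
        if new_log is None:
            return True
        error = trim_log(new_log)
    return False  # out of budget: read the code instead of retrying
```

Passing only the latest error, rather than every previous attempt, is the same idea as starting a fresh session: each round pays for what it needs now.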
Seeing your own loops
The reason vibe coding persists is that the cost is invisible while it happens. There is no running meter in your editor. The retry loop that cost you $400 looks exactly like the one that cost $4 until the statement arrives.
I only found my own pattern because I went looking in my session history: which runs had high retry counts, fat context, and nothing shipped at the end. If you want to check yours, that is one command and it reads only your local session files:
npx codeburn optimize
It will show you your retry-heavy sessions and how much context each one carried. The $424 night taught me more than the code would have. Twenty minutes of reading the codebase up front would have been cheaper than 59 rounds of guessing.
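If you keep your own records, the same three-part test (many retries, heavy context per round, nothing shipped) fits in a few lines. This is a rough illustration only. The `Session` fields, both thresholds, and the sample data are hypothetical, and it does not read any real tool's session-file format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Session:
    name: str
    retries: int
    context_tokens: int  # total input + cache tokens over the session
    shipped: bool


def flag_loops(sessions: Iterable[Session], min_retries: int = 10,
               min_tokens_per_retry: int = 100_000) -> List[str]:
    """Return names of sessions that look like expensive, unproductive loops."""
    flagged = []
    for s in sessions:
        per_retry = s.context_tokens / max(s.retries, 1)
        if (s.retries >= min_retries
                and per_retry >= min_tokens_per_retry
                and not s.shipped):
            flagged.append(s.name)
    return flagged


# Made-up sample history for illustration.
history = [
    Session("refactor", retries=2, context_tokens=80_000, shipped=True),
    Session("flaky-test", retries=45, context_tokens=6_000_000, shipped=False),
    Session("new-endpoint", retries=12, context_tokens=300_000, shipped=False),
]
print(flag_loops(history))  # prints ['flaky-test']
```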
