When vibe coding turns into an unfixable mess

I want to share my experience with AI agents: I've used them enough now to understand how they work in the real world!

I wanted to create a new slider and had a clear idea in mind. I built most of it myself, then decided to hand control over to the AI. I got lazy, and boy, am I sorry now. The code is unreadable at this point, with too many flags and functions scattered all over the place. I was forced to give up on it about a week ago, after burning about $50…

What I've noticed is that this is definitely not true AI: it has no real sense of context. It fixes one thing and breaks two more. For example, I had a transition with an opacity effect and a separate pixelation effect. The pixelation inherited the opacity transition, but the opacity had no pixelation. I spent an entire day trying to make the pixelation apply only to the pixelation effect while still combining correctly with opacity. And don't tell me I did not prompt it right, because I tried more than one hundred times, gave it every detail of how it works, and when I asked whether it understood, it gave the right answer, only to mess things up again with each iteration.

It would fix one issue and create two more. It doesn’t remember anything if what you’re building hasn’t been done before, which is exactly my case. It just hallucinates back and forth until you lose your mind. And of course, every time I say something is wrong, it replies, “Yes, you are right.”

Creating an app involves a huge amount of context and countless fine details that AI simply doesn’t understand. For it, everything is just a concatenation of words. The hallucinations get worse the longer you use it, and the larger the context becomes. In my case, by that point, I could not even take over as a developer anymore — there are red flags everywhere: functions inside functions, conditions triggering other conditions, and everything tangled into an unsalvageable mess.

So the conclusion is simple: never let it take full control. At that point, it’s game over — the code becomes unmodifiable and nearly impossible to fix, even for you as the developer.

I don’t see how this could ever be true AGI, since it clearly doesn’t understand context. Explaining an app to it is just feeding it concatenated words; it doesn’t grasp scope or how to fine-tune the small details that make a project truly good and finished.

But the propaganda works: companies fall into this mess, thinking that skill is no longer required. If you let an agent loose inside a commercial or already working app or platform, it will destroy it, 100%, and if you demand accountability, it will say, "Yes, you are right :)". CEOs don't understand this part, and of course they want to replace everything so it's just them and a bunch of agents that can "read their thoughts."

So the conclusion, considering that this doesn't really improve with new model iterations, is worrying. I could have used GPT-4.1 on my project and ended up with the same mess; there's not much difference. I expect the bubble to burst badly, because you can't just pour billions in forever with no return. And there isn't much of a return: if I, with 25 years of experience and solid skills in creative development, failed this badly, then 99% of other devs, if not more, will fail the same or worse.

Yes, it can write an app from a prompt up to, let's say, 99.9%, but what do you do with the remaining 0.1%? Without it the app will not work, and you will not be able to fix it as a developer because of the mess of code it writes. What is the point then?

You can see the same thing in all the vibe-coding videos on YouTube: every attempt is unfinished, looks bad, and is riddled with bugs. If you think you can fix such a mess as a dev, well, you are wrong, my friend.

The only sane use of this right now is for small, contained tasks in the code, where you understand and follow everything it does, so you can give strong guidance and not let it drift off to la-la land… and for that you need solid skills, so it's probably best to just do it yourself.

Is it good? I honestly can’t decide anymore. I feel as confused as the mess of words it produces.

It's probably a big fat lie at this point. I don't see this getting better; the model is broken by design!

Another thing: all their tests and shiny new graphs showing improvements mean nothing in the real world.

It's funny that Sam Altman talks about curing cancer — really? More likely, AI will end up creating a new form of cancer that is infectious and airborne. That outcome seems far more probable than a cure. It feels like a disaster, destroying things around it and making the world worse overall, while the promise sounds like a big, fat lie.

I’m really angry that I wasted a week on this project, but at least I understand now how this ‘AI’ works.

As for replacing me as a developer, that will never happen. The more I use it, the more I see what this really is — propaganda designed to attract billions more in funding, because apparently, it’s never enough.

4 Likes

You have to define the scope and the dimensions upfront. Describing something even slightly bigger still requires thousands of lines of description, if not tens of thousands.
And of course that will always be that way.
Each dimension of complexity adds a new level of work and of course also a new level of possibilities.
I don’t think AI will be capable of building software the way we want it.
But it can build something different: a shiny new world with a UI that has 125 refresh buttons in an event-driven, WebSocket-based app.
It won’t be as bad as windows though.

2 Likes

You still have to be skilled in the area you are working in to properly "guide" the AI to do the right thing.

For example, I can usually guide the AI to write code well, but I am horrible at guiding it to set up a working sync with QuickBooks. I can usually catch the errors it makes when writing code, but I have no idea what assumptions it is making about QuickBooks.

So you have to guide it with your expertise to be effective. Otherwise, you can end up wasting whole days, like I did with accounting, doing something considered advanced that is outside your knowledge or experience.

4 Likes

How could a coder write accounting software without learning everything about accounting first, right?

1 Like

In my specific case, GPT 5.2 assumed it was a perfect accountant and that the Stripe integration into QuickBooks reported the Stripe fees (it doesn't). So this is more the kind of thing where you would assume crowdsourced expertise would come in and the AI would have been trained on it.

Same with code. I feel like the models perform at a medium depth without guidance, but aren't really remarkable without it. I had to teach the AI the trick of picking random records out of a database scan: generate a random hash, then walk up or down from that hash to the nearest stored record. It probably should already know this. But if I didn't know better, I would have ended up with suboptimal code.
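For anyone curious, here is a minimal sketch of that trick as I understand it (all names are hypothetical, and an in-memory dict stands in for the real table): generate a random point in the key space, then take the nearest stored hash at or above it, wrapping around past the end.

```python
import bisect
import hashlib
import os

# Hypothetical stand-in for a hash-keyed table: each record is stored under
# the hex digest of its ID, so keys are spread roughly evenly over the space.
records = {hashlib.sha256(str(i).encode()).hexdigest(): f"record-{i}"
           for i in range(1_000)}
sorted_keys = sorted(records)

def random_record():
    """Pick a roughly uniform random record without scanning the whole table."""
    probe = os.urandom(32).hex()                        # random point in the key space
    i = bisect.bisect_left(sorted_keys, probe)          # first stored key >= probe
    return records[sorted_keys[i % len(sorted_keys)]]   # wrap around past the end

print(random_record())
```

The uniformity only holds because hashed keys spread evenly; with a skewed key distribution the records sitting after large gaps would be over-sampled.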

Bottom line here: if you want high-end stuff, you still need to be the expert guiding the AI. Otherwise you will end up with average results. Sometimes those average results are good enough, since most things in life have yet to be seriously considered for automation by an expert. So it may feel awesome, and like progress, but there is such a knowledge gap that it is probably worth another pass by a much smarter AI in the future to get a real upgrade.

You can see this over time, as code written by GPT-3 is dwarfed by what we have now. And presumably this trend will continue as the models improve. There may be a time where no expertise is needed, and perhaps that is when we can declare achieving AGI.

But for now, you still need to guide the AI.

The bigger fear is that of enfeeblement, or basically losing your skills by letting AI do all the work for you. I’m sure software folks here are already feeling this, but imagine how enfeeblement will hit other sectors.

2 Likes

I don’t see the enfeeblement striking hard.
Yes, I am too lazy to write code by hand (and how could I write tens of thousands of LoC per hour without AI?), but I could. I totally understand the code it writes, and with a few days of practice the old instincts would kick in.

1 Like

Probably like anything in this world, there are people who like things and others who don’t.
I am pretty busy in this space too, so I believe I can speak with some knowledge on this matter.

What doesn’t work is uttering a sentence to Codex and expecting it to magically produce something out of thin air. It’s not humanly possible, so how would a machine be able to do this at the stage we are in right now (today)?

What works are approaches that have already shown their value in the real world:

  • Test Driven Development
  • Spec Driven Development
  • Normalized Systems Theory
  • Agile/Sliced approach to software engineering
  • …

When you put all of this in a repo and manage to get all the ambiguity out of the system before the first line of code (read: test code) is written, you get some pretty good code out of the models. I often (7 out of 10 times) get first-time-right results with this method.
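As a tiny illustration of the test-first half of that stack (names are hypothetical, assuming pytest): the test is written before any implementation exists, which is where most of the ambiguity gets squeezed out before the model touches the code.

```python
# slider.py - the implementation the model is asked to produce so the
# pre-written test passes (shown here only to keep the example self-contained).
def clamp_position(x: int, track_length: int) -> int:
    """Clamp a slider handle position to the [0, track_length] range."""
    return max(0, min(x, track_length))


# test_slider.py - written *first*, before any implementation exists; run with pytest.
def test_position_is_clamped_to_track_bounds():
    assert clamp_position(-10, track_length=100) == 0
    assert clamp_position(250, track_length=100) == 100
    assert clamp_position(42, track_length=100) == 42
```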

Ideal yet? Probably not
Question: Is human software development ideal yet (after 50 years) :wink:

Food for thought…

2 Likes

I have 25 years of experience and about 50K licenses sold all over the world!

This is not AI; it feels more like a game. It literally does not understand anything. It's all just word concatenation that feels like magic!

1 Like

Using Codex for the first time is like a religious experience.

The experience of having your code touched by the noodly appendage of a flying spaghetti-writing monster.

2 Likes

It is more like autocompletion and pattern recognition. It would be a stretch to say the current model architecture “understands” what is going on.

This has pretty much been my experience. You get solid gains when your repo has plenty of good examples to look at and your AGENTS.md file is up to date.

You also need to really watch how big the context window gets, or else you might find yourself paying big bucks. I recently did a big repo-wide refactor that took days and burned about $600. If I hadn't periodically summarized and started a new session with the summary, that could easily have ballooned to $3000 or more. A summary/reboot also helps it stay focused.
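A rough sketch of that summarize-and-reboot loop, assuming the OpenAI Python SDK; the model name and the message-count threshold are placeholders (a real version would count tokens instead):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"        # assumption: any chat-capable model works here
MAX_MESSAGES = 40       # crude stand-in for "the context is getting expensive"

def compact(history: list[dict]) -> list[dict]:
    """If the running history is long, replace it with a summary so the next
    session starts small but still knows what has been done so far."""
    if len(history) < MAX_MESSAGES:
        return history
    summary = client.chat.completions.create(
        model=MODEL,
        messages=history + [{
            "role": "user",
            "content": "Summarize the work so far: decisions made, files touched, open TODOs.",
        }],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Summary of the previous session:\n{summary}"}]
```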

Another trick is model interleaving. We used to do this in the old GPT-3 / GPT-4 days: start with a higher model, like GPT-4, then follow with a lesser model like GPT-3. You get better and cheaper results, since the lesser model learns from the higher model, and because the two models know different things, they end up complementing each other. You can do this with different frontier/powerhouse models as well; both models learn from one another through the same shared context and history, really stirring up those neural net weights.
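A minimal sketch of interleaving with shared history, again assuming the OpenAI Python SDK; the two model names are just placeholders for a stronger and a cheaper model:

```python
from openai import OpenAI

client = OpenAI()

def interleave(prompt: str, strong: str = "gpt-4o", cheap: str = "gpt-4o-mini") -> str:
    """Ask the stronger model first, then let the cheaper model continue with
    the same shared history so it builds on the stronger model's answer."""
    history = [{"role": "user", "content": prompt}]
    first = client.chat.completions.create(model=strong, messages=history)
    history.append({"role": "assistant", "content": first.choices[0].message.content})
    history.append({"role": "user", "content": "Refine and extend the answer above."})
    second = client.chat.completions.create(model=cheap, messages=history)
    return second.choices[0].message.content
```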

2 Likes

It’s definitely still like having a 2nd year college student interning for you.
But any manager worth his or her salt will know that you need to supervise interns.
It’s AI, it’s just not Jesus Level™ yet. Relax, take a breath, and don’t vibe code complicated things you don’t understand the architecture for.

4 Likes

i remember calling GPT-2 a toddler with a 100+ IQ. :sweat_smile:

curt summed it up well for me - you really have to know (currently) what you’re doing in order to steer the models. that’s becoming more and more automated, though.

like, I wonder if the $10k+ C compiler would’ve been cheaper if someone else was driving the model? that had problems too, IIRC, though it was still impressive… if you remember GPT-2 was less than a decade ago… and things are ramping up even more.

the next hockey-stick moment is gonna be nuts, imho.

2 Likes

I totally agree. One thing that I'm a bit worried about is how hard it is to get rid of baked-in GPT-isms. "It's not about X — it's about Y" is a good example that still won't go away.

1 Like

we’ve got a few gems among our Prompting threads if you search around. it really comes down to the prompt, but even then, some things are definitely baked in so to speak.

1 Like