Adventures in Vibe Coding

AI is everywhere these days, and more and more our corporate overlords are saying it will replace anyone and everyone who does anything creative or technical or in some cases legal (which ironically is usually illegal).

There's more and more pressure to "explore the tools" and "come up with agentic workflows" and "put Instagram-style duck face filters on all games".

So this week I installed the tools and played with them, so I could at least say I had should that become a mandatory rather than heavily advised requirement.

I installed the various plugins for VS Code and tried Cline and Codex, which were then hooked up to one of the models, in this case ChatGPT 5.4.

First I tried to use it with our source, and it wasn't initially promising. Codex refused to work over our remote tunnel to the build servers, only able to function locally, so that was out right away. I switched to Cline and asked it where various debug lines were in the code and what they meant. It just spat out a grep command with the parameters I'd already specified, which wasn't really what I was after.

We've had various engineers throwing patches at Cline to check them, and almost always it has reported bugs that aren't bugs or advised changes that would actively break things. When it's right, it has so far been something very trivial, like an extra carriage return on the end of a string that would automatically get one anyway due to the type of string handling they were using. It's been really pissing off some of our senior devs: they submit a code drop for review, someone runs Cline on it and copy-pastes the results into the review, and then they spend a few hours trying to work out what the hell the thing is trying to say, find it was completely wrong or some minor thing, and they've wasted half a day on that.

So not that promising. I wondered if our code base is just too large for it to fit in the context window; it's hundreds of thousands of files and a lot of code. When it pulls that in you get one or two questions before it runs out of tokens and then has to compress the context, at which point it loses most of its capabilities and really goes to shit.

So maybe I needed something smaller, something it could start from scratch with. I decided to throw a simple project at it and vibe code it as the AI boosters want you to: I asked it to make me a ray casting engine in HTML5, so essentially Doom or Wolfenstein. With just that prompt it spat out a game with solid-coloured walls generated from a simple array map, WASD controls, collision detection, and a rendered minimap. From there I had it add mouse look and then change the solid-coloured walls to procedural textures. The textures had issues when the camera was very close; a quick sentence and it fixed that. It had a collision issue with corners; it fixed that. I had it add enemies, health, health pickups, and basic hitscan guns. I had it add simple AI to the enemies with patrol routes and chasing mechanics based on the last position they saw the player. I had it add a debug mode that would show paths, states, positions and such; initially it put that under the other GUI elements, and it took a few goes to move it somewhere you could see it.
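Part of why that first pass looks so magical is that the core of a Wolfenstein-style raycaster really is small: one ray marched through a grid map per screen column. A minimal sketch of the idea in Python (the actual project was HTML5/JavaScript; the map, positions, and angle below are made-up illustrative values, not from the generated game):

```python
import math

# Minimal fixed-step ray march over a grid map: the core of a
# Wolfenstein-style renderer. One ray per screen column gives the 2.5D view.
MAP = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

def cast_ray(px, py, angle, max_dist=20.0, step=0.01):
    """March from (px, py) along angle until a wall cell; return the distance."""
    dx, dy = math.cos(angle), math.sin(angle)
    dist = 0.0
    while dist < max_dist:
        x, y = px + dx * dist, py + dy * dist
        if not (0 <= int(x) < len(MAP[0]) and 0 <= int(y) < len(MAP)):
            return dist  # left the map
        if MAP[int(y)][int(x)] == 1:
            return dist  # hit a wall
        dist += step
    return max_dist

# Wall slice height on screen is proportional to 1 / distance
# (with a cosine correction to avoid the fisheye effect).
d = cast_ray(2.5, 2.0, 0.0)  # facing +x; the wall column is at x = 4
```

Everything else (textures, sprites, minimap) is layered on top of that one loop, which is why the model can produce a convincing first version from a single prompt.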

Then I had it change the map to be more procedural rather than fixed. It did that, but that fucked up the textures on the walls and put all the enemies in the same room. I got it to fix that, but more and more issues began to pile up. Since it had changed the map generation to rooms connected by corridors, rather than a big open room with a few fixed walls, when the patrol routes overlapped in a corridor and two or more enemies were in there at the same time, they would collide and get stuck. I asked it to fix that; it made them try to shuffle past each other, but they still got stuck. I asked it again; it tried some sort of queuing mechanism, which also didn't work. I asked it a third time and it claimed it had it, but again it just didn't work.
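For what it's worth, one conventional fix for this class of bug, which the model never landed on, is to make movement tile-based and have each agent reserve its next tile before stepping, so two enemies can never claim the same corridor cell. A minimal sketch in Python; all the names and coordinates here are illustrative, not from the generated game:

```python
# Tile reservation: an agent may move only into a tile that nobody occupies
# and nobody has reserved this tick.

reserved = set()  # tiles claimed during the current tick

def try_move(agent_id, target, positions):
    """Claim target and move there, or stay put if it's taken."""
    if target in reserved or target in positions.values():
        return positions[agent_id]  # blocked: wait this tick (or pick a detour)
    reserved.add(target)
    return target

# Two enemies heading for the same one-tile corridor cell:
positions = {"a": (1, 0), "b": (3, 0)}
reserved.clear()  # start of tick
positions["a"] = try_move("a", (2, 0), positions)  # succeeds
positions["b"] = try_move("b", (2, 0), positions)  # blocked, stays at (3, 0)
```

A pure reservation scheme still deadlocks on head-on swaps, so you also need a tie-break (one agent yields and repaths), which seems to be exactly the part the model kept fumbling.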

At this point it had burned through 200k tokens, and increasingly it was taking more and more attempts to fix things and just failing. It also began breaking other parts as it added to them: pathing that had worked began to fail, attacks stopped working, and things that had functioned didn't anymore. Each of these was then a loop you had to go through, trying to get it to fix the thing and hoping that didn't break something else.

I gave it one more attempt and then gave up.

So it was an interesting experiment. What it managed to throw together with me just saying things like "add enemies", "add health pickups", "the health pickups look weird when you are further away", "the health pickups look like they are sliding into the floor when further away" and so on was genuinely surprising.

It's an impressive magic trick what it can throw together from plain English, producing something quite polished, at least on the surface. But as you go further and the code gets more complex, it starts to break down; it takes more and more goes to get it to fix things, and increasingly the more complex issues seemed to be too much for it. Then I could go in and try to fix it myself, but unlike a normal coding project, where I've put it together piece by piece, already understand its systems, and have some idea of what I even need to look at, with this code I've not looked at any of it. I don't understand what it is doing or why. I could go through it and work that out, but I would basically need to reverse engineer the whole thing from top to bottom to understand it enough to make changes and fix it.

And this seems to me to be the fundamental problem with the AI coding model. The more it is used, the less you understand what it has developed, the more of a crutch it becomes, and the more you can't fix it yourself and need to hope it can be persuaded to fix the thing by asking over and over in slightly different ways, each attempt potentially breaking something else as the context window slowly balloons and it burns more and more tokens.

I can also see why non-technical people are fooled by it. That first pass, that first 90% it can generate from a simple prompt, is pretty impressive, but the further you go, the more it breaks down and requires someone who actually understands the code to come in and fix it. That's not even considering whatever crazy security holes it creates, or code maintainability and reuse, or compatibility with other projects outside its limited context, or performance and memory considerations. When what it produces is an on-the-fly sort of design thrown together with bits grabbed from a whole host of sources, it's a bit of a black box in terms of what it's doing under the hood.

Back in the day when I were a youngun and I wanted to make games, I bought a book on coding doomlikes. It was probably already obsolete by that point, as it used video hardware interrupt-based frame buffers and manual blitting, peeking and poking memory and so on, which had already started to go out of fashion in favour of more hardware-based rendering and the early 3D stuff. But it gave me a glimpse into how all of that worked, and I took a few stabs at writing my own before giving up and trying to learn Glide, the Voodoo's proprietary OpenGL-esque 3D API. If I'd had this tool back then, I could have gotten a lot further, but without having to learn the basics of how code is put together, how hardware works, how you get it to comply with your whims, how it fails most of the time, and what to do when it does.

I can see a kid these days with the same interests just throwing prompts at ChatGPT and then not having the tools to get beyond a certain point. By then the code would be sufficiently weird and complicated that even if they have the interest to learn what is going on under the hood, they have a much more complex and difficult task than if they'd started from scratch themselves and worked their way up. I can easily see people thinking the magic trick gets them close enough, and then, when they hit that wall, they get discouraged and can't do anything more but repeatedly prod ChatGPT till they run out of tokens or patience.

This is also at a time when a lot of these models are offering vastly more tokens than you would get if you actually had to pay for them. It's definitely the first-taste-is-free part of the marketing, and when that goes away either you are paying hundreds or thousands or even in some cases tens of thousands of dollars in compute for the same service, or you try to get by on the lower-tier rate-limited or smaller-context modes. And if it only just barely works with the floodgates wide open, what is it going to be like when they turn that down to a trickle?

Still, I feel like the magic trick of how it can get you most of the way there in one go will easily convince a lot of managers and people who have never actually touched this stuff that they can just fire all their engineers and replace them with one guy and a ChatGPT sub. For a while that might work, but in the long run it's going to stack up a lot of problems, although by the time those chickens come home to roost, maybe it will be too late, or they'll hire us all back at a fraction of our previous salaries since we'll be desperate. I worry for the next generation. As I said, if I'd had this when I was a kid, I could easily see I might have used it and never learned the actual coding, and the only way to really get good at these things is to write and review a lot of code. If AI is doing all of that grunt work, how do young kids even get a foot in the door to start down the path to becoming cynical, broken, grizzled veterans like wot I am?

Comments

An interesting read! I've not tried vibe coding anything complex from scratch but I have used it for specific things:

Rename refactoring that isn't just types but variable names too

For example, take this pseudocode:


class Car
{
    int wheels = 4;
}
var myCar = new Car();

And you realise that you're not making an app for an ordinary person but for Pete instead, so you prompt something like:

I'm making this for bikes not cars, rename and suggest fixes


class Bike
{
    int wheels = 2; // TODO suggested fix
}
var myBike = new Bike();

Now most type-safe languages will let you change Car to Bike, but picking up the variables is trickier.

Writing shit when I can't remember the syntax

Particularly in the shell. I don't do enough bash or PowerShell or putting together docker containers to remember the syntax. You can ask AI to do that.

Act as an architectural sounding board

In VSCode with Unity code for Clomper, I asked how it might improve some of the classes. It could not see the makeup of the Unity project, which might be my fault, but it still suggested some reasonable changes. I could have just told it to make them, but we were just before a release and didn't want any regression impact. It did help me reason about where I could improve maintenance.

Regex

Fuck regex. But AI is quite good at it, especially if you dump a load of test data into it.
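The dump-test-data approach works because a regex can be verified mechanically: whatever pattern the AI hands back, run it over known-good and known-bad samples before trusting it. A hypothetical example in Python (the semver-ish pattern below is invented for illustration, not something an AI actually produced here):

```python
import re

# Whatever pattern the AI suggests, check it against samples you control
# before it goes anywhere near real data.
pattern = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")  # e.g. an AI-suggested version matcher

should_match = ["v1.0.0", "v12.34.56"]
should_not_match = ["1.0.0", "v1.0", "v1.0.0-beta", "vX.Y.Z"]

assert all(pattern.match(s) for s in should_match)
assert not any(pattern.match(s) for s in should_not_match)

major, minor, patch = pattern.match("v2.10.3").groups()
```

If the pattern is wrong, the assertions fail immediately, so the hallucination risk is bounded in a way it isn't for freeform code.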

Others have done more

I spoke briefly with a game dev chum, Lau, who makes Fireworks Simulator (which has done very well, fair play to him). He has an agent that keeps the mod documentation up to date automatically. Every time he commits to the mod tools, a GitHub-hosted agent checks for changes and updates the documentation site automatically. That's cool.

Lau also said that he got it to edit prefab and scene files, which is a big plus because sticking only to the code can leave you in a pickle.

Still accelerating

While I feel the commercial hype has a ceiling, as does what people will pay monthly for a sensible number of tokens, I think the tooling is going to improve and broaden as more model context servers pop up, especially for the arcane business processes that many medium enterprises have spread across them.

When I am back from holiday, I'm going to sign up to the $20-a-month Claude and get it to help me build the Clomper tutorial, even if only as a sounding board. It's a bit of a weird ask, as I like writing code. It's my hobby. I don't want to watch someone else do it! I almost like the idea of it being handcrafted.

brainwipe's picture

Yeah, as a tool for more sophisticated refactoring it's slightly better than the existing IDE functions, and with something like that it's easy to spot where it's wrong, so it's theoretically safe from hallucinations if you are paying attention.

Reformatting and putting existing text into other document formats works well. I've used it to take tables and spit out markdown versions, since writing out tables in markdown is a bit of a pain. Again, it's easy to see when it fucks up, which admittedly it has done on this, putting things in completely the wrong line now and again, but that's easy to fix and still a net gain in terms of my time.
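The table-to-markdown chore is mechanical enough that a few lines of code also do it, deterministically, which sidesteps the occasional wrong-line fuckups entirely. A minimal sketch in Python; the header and row values are made up for illustration:

```python
# Render a list of rows as a markdown pipe table with padded columns.

def to_markdown(headers, rows):
    """Build a markdown table string from headers and rows."""
    # Width of each column = widest cell in it (header included).
    widths = [max(len(str(c)) for c in [h] + [r[i] for r in rows])
              for i, h in enumerate(headers)]

    def line(cells):
        return "| " + " | ".join(str(c).ljust(w) for c, w in zip(cells, widths)) + " |"

    sep = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([line(headers), sep] + [line(r) for r in rows])

table = to_markdown(["tool", "verdict"],
                    [["cline", "mixed"], ["codex", "no tunnel"]])
```

The trade-off versus asking the AI is that this only reformats data you already have structured; the AI wins when the source is messy free text.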

Syntax for stuff like shell scripts, I've seen people use it for, although often its understanding of what is happening is not complete. I've seen things like debug or return output in lines after it's turned off the power or reset the device, or lines that can never be executed, because it lacks an understanding of what it's driving. Still generally good, but it needs that level of review by an actual thinking person who knows what it's trying to do and how that should work.

I can see automatically generating documentation as a good use. It's something developers hate doing, so it's always out of date, non-existent, or wildly inaccurate; AI can't do a worse job :D

Automatic testing worries me. There have already been cases where closed-loop AI testing and validation missed critical failures and gave a false sense of confidence. I've also seen people throw AI feedback on code into reviews that is wrong, and then an engineer has to spend extra time to review, understand, and dismiss it, costing a lot of extra time for nothing.

It struggles with complex large codebases, and it's also more capable in the languages it was designed to code with, so Python and HTML/CSS front end stuff. For less common things or high performance languages it has less to work with; C and C++, or heaven forbid Rust, it's just not as good at.

It can get quite far, like the first 90%, but the last 10 is tricky; those really complex interactions make it fall flat.

Some of the really tricky stuff, race conditions, cache coherency, memory fragmentation, latency issues, the bugs that take a week of continuous operation to show up, and the like, I suspect are just too rooted in understanding the system as a whole for it to even spot.

I think it can give people a false sense of confidence, and if you're not using it sensibly it can stack up issues. It's something I've heard is being increasingly used as a crutch.

One of my friends who's in a more management role was working on a project with another team's software, and there was a lot of feedback about how his team were not writing very good code, that they didn't understand how the software works. He found out his engineers had just thrown it at ChatGPT and not dug into the code to try to understand it. So it wrote a bunch of code that was not quite right, and, lacking the understanding and relying too much on the tool, they didn't realise that. They got shot down in the pull requests and took a hit in terms of respect. Some of that was down to a lot of pressure to finish things now: if there is a tool that claims it can do things for you, it becomes tempting to just run it without doing the work to understand what the code should be. Especially in situations like that, where you are dealing with an external bit of software or libraries you have no familiarity with, it can seem like a good time saver, and if it's something you are not completely familiar with, it can look correct when it's not.

I think the worst part of the AI coding stuff is that it's effective 90% of the time, so you can end up trusting it, as most of the time it's right or at least looks like it's right. Couple that with the relentless pace people are being required to maintain, and it's easy to lean on this tech to lift that burden, but then it lets you down or slips in some terrifying security hole or critical bug.

It very easily becomes a crutch and degrades people's ability to fix or understand the code. You just hit the button and it does the rest: tests, validates, commits, documents, and you can move quickly on to the next thing. And companies are mandating its use; I hear Facebook mandates that 70% of commits must be AI. So people have a budget they have to work with, and it's always the question: what simple things can I give the AI to meet my quota and save my 30% of real development time for the hard issues it can't manage?

It is a tool that has uses, but you cannot trust it. So long as you know this and always do the extra work, it may be a useful addition, but it's easy for it to become a crutch if you're not careful, or, in these instances where you are forced to use it, it becomes another hurdle you have to work around.

Will it improve? Probably, but I also wonder a bit about how many bad habits it's going to acquire over time. It's an ouroboros: it's already ingested all the code online, and since more and more of the code fed into it is AI generated, there is a bit of a feedback loop there. It's likely as good as it's ever going to get with this tech. It will maybe get cheaper, but then again maybe not, given the exponential token consumption of these recursive reasoning models.

I worry most about the younguns coming on board at a time when this tech is being pushed as the ultimate replacement for knowing how to code. It can get you pretty far and it can look like it has all the answers, but without that struggle, that long period of learning and failure, the process of becoming a grizzled veteran (an angry old man who at his optician's appointment got told "you don't need old man reading glasses yet... but soon!"), they'll never learn those skills, or they'll become hopelessly reliant on a tool that can and will let them down and then have no idea how to fix it.

I had a colleague who was about a year out of college and had somehow never learned how to debug. They would throw lines into the code to diagnose the problem, not realising they'd written the code such that those statements were never getting called. They had never learned how you narrow down an issue: you look for what is set where and logically use that to find the source of a bug. Somehow they'd gotten by without learning that, and I think a lot of the reason they got that far in a company without anyone spotting it was that their mentor was basically telling them exactly how to fix any issue they had, rather than guiding them to learn how to fix things themselves. With AI I feel it's going to increase that sort of thing, where they never learn the skills to truly understand how things work.

Evilmatt's picture

I wonder a lot about the long term viability of this tech. I mean, look at the video stuff: Sora made 2.1 million over the last 6 months, and the conservative estimate for the cost to run it was between 1 and 15 million dollars per day. I know the coding isn't anywhere near that expensive to run, but still it's not cheap, and at the moment none of these companies are paying market rate for their compute. It's all given to them as part of their ownership deals: in the case of OpenAI it's Microsoft, and for Anthropic it's Amazon and Google, who own it, giving them subsidized rates.

AI is currently a furnace that burns money in the billions, and they are just gambling that the cost is somehow going to come down, with no real plan to achieve that. A lot of the economics they do release are sus; take depreciation: they currently rate it at 5 years, but that's just nuts, it's more like 2, so the costs are unrealistic.

Yet again, the only outfit making any money is Nvidia.

Evilmatt's picture