LLMs seem to be flawed at the core for real programming - Help Needed

Ozone421 · February 24, 2025, 3:57pm

Has anyone actually been able to use ChatGPT or any other LLM for coding other than for simple code completion? No, I don’t want a 100 line python snake game.

I am a programmer with more than 20 years of experience with programming various languages from assembler all the way to C#, JavaScript, HTML, Java etc.

I have been trying for the last year, since the hype started about how smart these LLMs are, to actually use them for real-world programming and have not been able to do anything useful with them.

I gave it a new prompt this weekend and I gave up after about 2 hours to get a simple ExpressJS app to work. I always run into the same issue: The LLM has been trained on all the versions of the libraries and does not know the difference between the latest and older code. It always makes stupid errors and gets itself wrapped up around an axle of uselessness. Even when telling it to use a specific version of the library or the “latest” and including the links, they still can’t do it without me having to fix things.

Has anyone been able to get around this problem? Do I have to attach the full source code of the library I want to use? How do I solve this issue?

hugebelts · February 24, 2025, 5:22pm

Hello, @Ozone421.

Can you give a more precise kind of thing you’re trying to do?

Or even an alternative project if the one you’re working on is too highly confidential. So we can see if we get a proof of concept working here.

Would that be a feasible suggestion?

johnroy · February 24, 2025, 5:26pm

Probably not the answer you are looking for but I believe the use case of present day LLMs in general purpose coding is a myth/overblown.

Sure it’s easy to get them to implement a function given a definition or simple CRUD code. But the task of doing something very specific … (library / style / special considerations etc. etc.) is hard. And that’s before you even start the job of refining the code… (which is most of the job of a software developer).

To solve that problem I think requires a lot of prompting and other work above and beyond the LLM – some folks like using claude (sp?) there is the cursor IDE, lovable.dev – all examples of what I believe are a lot of refining work on top of LLMs – and at that they are not perfect.

The route I’ve committed to is using LLM’s to build nocode applications composed from smaller building blocks each of which is well defined and above the level of programming language, so it’s easier to get right, be useful and critically: be refineable and enhanceable. When it does write code it’s to a very specific plugin API that is well described in prompting.

Thats my take though I don’t have as broad an overview of the state of the art as some.
-J

hugebelts · February 24, 2025, 6:10pm

Ever tried to solve this using meta prompting? Because I rarely see someone here applying this.

I just saw now you’ve offered to attach the code potentially as well. Clearly seen faut on my PC. But not on my mobile, only now.

I’m not here to proof it may be possible or not possible to do things using llm or hype or unhype them. Only yo help to ser, especially in this case, how much we can close the gap and the model may potentially help or not. If that makes sense. Nothing more. Nothing less.

So, if you’re still open to it we may have a look and see how far we get it together or if the models can’t do it.

davejh · February 24, 2025, 6:21pm

If you want a counter-example, you could take a look at the Humbug tool I’ve been building for the last 3 months. 80%+ of this has been built by AI although it takes some effort by me to keep the architecture right (although I have AI do all the large-scale refactoring). As it stands, the app is just over 26k lines of Python.

It doesn’t have many dependencies (there are just 5 plus the Python standard library), but the next thing I’m building addresses some of the major problems with third party code.

Before I built this I did come up with a simple language, Metaphor, that lets me construct and iterate complex prompts. That includes modules that can be chained together and embedding files. Typical prompts range from about 1200 to 6000 lines, although the biggest one I’ve used was about 26k lines.

The biggest single new output I got was a lexer/parser combo at about 800 lines of code (had a couple of bugs to fix) and there are a couple of demo videos on YouTube. The most complex single output was a 1300 line refactor in the latest release (went out today).

Sources for both what I’m building and the metaphor compiler are on github:

GitHub - m6r-ai/humbug: A GUI-based AI development tool with integrated Metaphor support and GitHub - m6r-ai/m6rc: Metaphor prompt compiler

Ozone421 · February 25, 2025, 2:35am

Thank you everyone for your answers. I tried a different route and seem to have come up with a solution. I first asked the LLM what they suggest based on a description of what I want to do and then I used the suggested libraries. I had previously suggested an older library that there wasn’t enough examples and documentation for. So when I used a newer and more popular library, the LLM produced near perfect code. I still had to resolve some issues but they were minor. I used ChatGPT successfully tonight to create the simple web application I wanted.

I am not sure how to use chatGPT for a large or complex project though. I don’t know how one would attach hundreds of files for review or maybe I should only attached files that are relevant to the changes needed which might be hard to do if you don’t know enough of the project.

So in summary: Know the limitations of LLMs. Use code that is popular and has many examples.

PS: Where I eventually want to go is complete modification/creation of an application and giving the LLM the tools to compile and check the results by itself. Has anyone done this before and or is there a commercial product like this?

mokhir56 · February 25, 2025, 4:47am

Hey man… you really need to be using LLMs in an IDE.

If you use AI properly with tools that support it, it’s a 3000% speed boost.

If you use LLMs in a chat window, it’s mostly a helpful assistant

Ozone421 · February 25, 2025, 5:15am

Which IDE and extension or plugin do you suggest? I tried a few but most had some serious limitations or issues for me.

thomas11 · February 25, 2025, 8:41am

We’re using it in our SQL Studio component, attaching the schema of the database, allowing the user to use natural language and have it transpiled into SQL - Which allows our users to use natural language to “query” their database, and it works 100% perfect, as long as the schema isn’t too large …

vb · February 26, 2025, 1:55pm

I don’t want to suggest any commercial solutions without a proper reason but there is an open source project ‘aider’ that you can look into.
It’s a step into the right direction.

davejh · February 27, 2025, 9:48am

The problem with attaching very large numbers of files is that will burn a lot of input tokens. Depending on the model that can be a huge problem, but also ramps up the cost.

The Metaphor tool I mentioned (which is free and open source) lets you decide which files to embed into any prompt. Typically I’ll embed all the files I need modified and then any that are necessary to give the AI enough context to know what to do. For example if I want something rewritten in a particular style I’ll embed something with the style I want and tell the AI that I want it to use this as a style template.

The nice thing about generating these large context prompts automatically is if the AI does something wrong then I can tweak the design of what I’m asking and then get it to do the same thing again in a few seconds.

Another thing I’ve been exploring with a client is using fragments of API definitions and embedding them so the AI knows what a particular library or module is supposed to look like. This is a real headache with libraries where there have been many published versions as the AIs generally struggle with differentiating things that have changed over time

thomas11 · March 28, 2025, 1:52pm

You can’t create “programs”, but you can (sometimes) very efficiently create well defined components, which is where it shines …

lucid.dev · March 28, 2025, 3:22pm

My experience with the same has been similar. I’ve successfully gotten the LLM to produce around 25k lines of working python and .tsx files into a single integrated system. Of course, I managed the blueprint/architectural process of the project, and conceptually designed it. However I had the LLM work at all levels of blueprinting, re-factoring, code sprints, etc.

I echo @davejh - I’ve actually gotten recently .tsx files out of the LLM at around 1800 lines (primarily using o1 and sometimes o3-mini). I’ve done refactoring of highly logically complex python parts of the system by doing rounds of review (producing (.mmd) mermaid flow charts/sequence diagrams), blueprinting, and then giving a go for re-factoring… and getting 5-10 files at a time back, averaging between 300-500 lines each… in a response… very often 100% complete and compile-without-error code.

What helped me was:

Understanding quite specifically how the context window and stateless nature of the LLM’s functions, and how the prompt structure can be dramatically manipulated beyond traditional turn-based chat structures in order to achieve greater results.
Understanding you have to work within the scope of the training dataset that the LLM has access to, even if extending that via web search or direct provision of documentation, you’re going to hit limitations if the data isn’t already a part of the “training parameters” (though you “extend it’s capacity” within a given context window quite a bit, if you play your tokens right..)
As others said, either building or using an integrated IDE environment, specifically one that allows dynamic management of documentation and sharing of documents and versions during a “chat” (i.e. when sending a context window to the LLM)

+1 for the previously mentioned “humbug” : https://m6r.ai/ - that looks awesome!!

I can only state my own personal success with using the LLM for programming tasks - it was a significant learning curve, because you have to accept the process of designing your own flow for integrating the LLM into your process. But once you start to design a dedicated workflow and management process for providing proper prompting and context window building, like others said - the capacity is astounding. It’s truly remarkable. It’s just also very “limited” in a sense I see it like this:

the capacity of it to produce the desired response is very high,

but the likelihood of that happening is completely and totally based on:

YOUR capacity to provide the prompt (context window) that allows the model to output the response your looking for!

Topic		Replies	Views
I cant get chatgpt (or any llm) to actually contribute beyond examples Prompting chatgpt	1	205	March 20, 2025
Developers how do you efficiently modify any code using gpt4 or gpt4o? Prompting gpt-4 , prompt-engineering	7	2349	June 6, 2024
Turning chatgpt API into a assistant for a (complex) website API	20	4330	December 21, 2023
Proven and reliable productivity use cases for GPT4 Community gpt-4	32	5652	June 20, 2023
Are GPT writers a waste of time? GPT builders	17	1824	December 11, 2024

LLMs seem to be flawed at the core for real programming - Help Needed

Related topics