LLMs seem to be flawed at the core for real programming - Help Needed

Has anyone actually been able to use ChatGPT or any other LLM for coding, other than for simple code completion? No, I don’t want a 100-line Python snake game.

I am a programmer with more than 20 years of experience programming in various languages, from assembler all the way to C#, JavaScript, HTML, Java, etc.

For the last year, ever since the hype started about how smart these LLMs are, I have been trying to use them for real-world programming, and I have not been able to do anything useful with them.

I gave it a fresh attempt this weekend and gave up after about 2 hours of trying to get a simple ExpressJS app to work. I always run into the same issue: the LLM has been trained on all the versions of a library and does not know the difference between the latest code and older code. It always makes stupid errors and gets itself wrapped around an axle of uselessness. Even when I tell it to use a specific version of the library, or the “latest”, and include the links, it still can’t do it without me having to fix things.
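For concreteness, part of what I mean by "a specific version" is pinning exact dependency versions so there is no ambiguity about which API generation I want the model to target. A quick sketch of the kind of check I mean (the dependency names here are just examples, not my actual project):

```javascript
// Flag any dependency in a package.json "dependencies" object that is
// left as a semver range (^, ~, *, >=, ...) rather than an exact pin.
// With ranges, the installed version can drift away from whatever
// version the LLM happened to write code for.
function findUnpinned(dependencies) {
  return Object.entries(dependencies)
    .filter(([, version]) => /^[~^*>=<]/.test(version))
    .map(([name]) => name);
}

// Example: express is pinned exactly, body-parser is left as a range.
const deps = { express: '4.18.2', 'body-parser': '^1.20.0' };
// findUnpinned(deps) → ['body-parser']
```

Even with everything pinned like this and the version stated in the prompt, the model still mixes API generations.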

Has anyone been able to get around this problem? Do I have to attach the full source code of the library I want to use? How do I solve this issue?


Hello, @Ozone421.

Can you give a more precise description of the kind of thing you’re trying to do?

Or even an alternative project, if the one you’re working on is too confidential, so we can see if we can get a proof of concept working here.

Would that be a feasible suggestion?

Probably not the answer you are looking for, but I believe the usefulness of present-day LLMs for general-purpose coding is a myth/overblown.

Sure, it’s easy to get them to implement a function given a definition, or to write simple CRUD code. But the task of doing something very specific … (library / style / special considerations etc. etc.) is hard. And that’s before you even start the job of refining the code… (which is most of the job of a software developer).

Solving that problem, I think, requires a lot of prompting and other work above and beyond the LLM. Some folks like using Claude; there is the Cursor IDE, lovable.dev – all examples of what I believe is a lot of refining work on top of LLMs, and even then they are not perfect.

The route I’ve committed to is using LLMs to build no-code applications composed from smaller building blocks, each of which is well defined and above the level of a programming language, so it’s easier to get right, be useful and, critically, be refinable and enhanceable. When the LLM does write code, it’s against a very specific plugin API that is well described in the prompting.

That’s my take, though I don’t have as broad an overview of the state of the art as some.
-J


Have you ever tried to solve this using meta-prompting? I rarely see anyone here applying it.

I only just now saw that you’ve offered to attach the code as well. It showed up clearly on my PC, but not on my mobile.

I’m not here to prove whether things are possible or impossible with LLMs, or to hype or un-hype them. I’m only here to help see, especially in this case, how far we can close the gap and whether the model can actually help or not. If that makes sense. :smiling_face: Nothing more, nothing less.

So, if you’re still open to it, we could have a look and see how far we get together, or whether the models really can’t do it.

If you want a counter-example, you could take a look at the Humbug tool I’ve been building for the last 3 months. More than 80% of it has been built by AI, although it takes some effort from me to keep the architecture right (though I have the AI do all the large-scale refactoring). As it stands, the app is just over 26k lines of Python.

It doesn’t have many dependencies (just 5, plus the Python standard library), but the next thing I’m building addresses some of the major problems with third-party code.

Before I built this, I came up with a simple language, Metaphor, that lets me construct and iterate on complex prompts. It supports modules that can be chained together, as well as embedding files. Typical prompts range from about 1,200 to 6,000 lines, although the biggest one I’ve used was about 26k lines.

The biggest single new output I’ve had was a lexer/parser combo of about 800 lines of code (with a couple of bugs to fix), and there are a couple of demo videos on YouTube. The most complex single output was a 1,300-line refactor in the latest release (which went out today).

Sources for both what I’m building and the Metaphor compiler are on GitHub:

GitHub - m6r-ai/humbug: A GUI-based AI development tool with integrated Metaphor support and GitHub - m6r-ai/m6rc: Metaphor prompt compiler

Thank you everyone for your answers. I tried a different route and seem to have come up with a solution. I first asked the LLM what it would suggest based on a description of what I want to do, and then I used the suggested libraries. I had previously specified an older library for which there weren’t enough examples and documentation. When I switched to a newer and more popular library, the LLM produced near-perfect code. I still had to resolve some issues, but they were minor. I used ChatGPT successfully tonight to create the simple web application I wanted.

I am not sure how to use ChatGPT for a large or complex project, though. I don’t know how one would attach hundreds of files for review; maybe I should only attach the files that are relevant to the changes needed, which might be hard to do if you don’t know the project well enough.

So in summary: Know the limitations of LLMs. Use code that is popular and has many examples.

PS: Where I eventually want to go is complete modification/creation of an application, with the LLM given the tools to compile and check the results by itself. Has anyone done this before, and is there a commercial product like this?

Hey man… you really need to be using LLMs in an IDE.

If you use AI properly with tools that support it, it’s a 3000% speed boost.

If you use LLMs in a chat window, it’s mostly just a helpful assistant.


Which IDE and which extension or plugin do you suggest? I tried a few, but most had some serious limitations or issues for me.

We’re using it in our SQL Studio component: we attach the schema of the database and let the user write natural language, which is transpiled into SQL. This allows our users to use natural language to “query” their database, and it works 100% perfectly, as long as the schema isn’t too large …
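Roughly, the prompt we build follows this schema-grounding pattern. The sketch below is simplified; the function name, schema, and question are placeholders, not our actual component:

```javascript
// Build a prompt that grounds the model in the database schema, so the
// generated SQL only references tables and columns that actually exist.
function buildSqlPrompt(schemaDdl, question) {
  return [
    'You translate natural-language questions into SQL.',
    'Only use tables and columns from this schema:',
    schemaDdl,
    `Question: ${question}`,
    'Answer with a single SQL statement and nothing else.',
  ].join('\n');
}

// Hypothetical example schema and question.
const schema =
  'CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT);';
const prompt = buildSqlPrompt(schema, 'How many users signed up this week?');
```

The schema-size limit is exactly why this breaks down on large databases: the whole DDL has to fit in the context window alongside the question.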