I don’t see any value in a system scoring itself.
So prove it. I will specify something by the end of next week, and you release the finished product on GitHub as an example within two days.
You don’t give the model just the code, so it is not the model doing the evaluation; it is the model transforming your prompt into tests… You can put a lot of information into it. Or do you mean tests like “hey model, check this code”?
Let me give an example. Prompt: Python code, input two strings, concatenate them, space between, function. Generate 10 tests with corner cases and run them and print them for me.
def concatenate_strings(str1: str, str2: str) -> str:
"""Concatenates two strings with a space in between."""
return f"{str1} {str2}"
# Define test cases with corner cases
test_cases = [
("Hello", "World"), # Normal case
("", "World"), # First string empty
("Hello", ""), # Second string empty
("", ""), # Both strings empty
(" ", "Test"), # First string contains only spaces
("Test", " "), # Second string contains only spaces
("Special!@#", "$%^&*()"), # Special characters
("LongString" * 10, "Short"), # Very long first string
("Short", "LongString" * 10), # Very long second string
("こんにちは", "世界") # Unicode characters (Japanese)
]
# Run tests and print results
test_results = [(str1, str2, concatenate_strings(str1, str2)) for str1, str2 in test_cases]
import pandas as pd
df = pd.DataFrame(test_results, columns=["Input 1", "Input 2", "Concatenated Output"])
import ace_tools as tools
tools.display_dataframe_to_user(name="Concatenation Test Results", dataframe=df)
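For what it’s worth, ace_tools.display_dataframe_to_user only exists inside ChatGPT’s code-interpreter sandbox, so pasting this verbatim won’t run locally. Here is a minimal sketch of the same check outside the sandbox with plain asserts instead of a displayed dataframe; the expected values are my assumption that exactly one space is always inserted.
def concatenate_strings(str1: str, str2: str) -> str:
    """Concatenates two strings with a space in between."""
    return f"{str1} {str2}"
# Plain asserts in place of the sandbox-only dataframe display.
# Expected outputs below assume one space is always inserted between the inputs.
assert concatenate_strings("Hello", "World") == "Hello World"
assert concatenate_strings("", "World") == " World"    # leading space when the first string is empty
assert concatenate_strings("Hello", "") == "Hello "    # trailing space when the second string is empty
assert concatenate_strings("", "") == " "              # a lone space when both are empty
assert concatenate_strings("こんにちは", "世界") == "こんにちは 世界"
print("All local checks passed.")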
I would look at this, copy and paste it, and use this code based on code inspection and the self-test results. Does this make me bad?
Always better than no tests for sure.
Hi ‘nearly 4 o’clock’, we meet again! I loved your response to that thread - I hadn’t thought about OpenAI making use of local server IP geolocation and passing that on to an unsuspecting ChatGPT! As to following you, no, sorry - that’s perhaps your self-importance emerging again lol. I spotted this thread and was reading through when I noticed your unusual and intriguing avatar, and then your put-down. See you again sometime!
So I’m looking at a huge PR in which the LLM wrote the code and the new tests and miraculously the CI is green.
What confidence do I have in the PR? Not that much.
If the LLM’s goal is to make the tests pass, it has the freedom to change the code and/or the tests.
This is too much freedom, and in this case you have the LLM deciding the spec.
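One way to take that freedom away, purely as a sketch and not something anyone in this thread has described: a CI step that fails whenever a single change edits both the implementation and its tests, so a human has to own at least one side. The src/ and tests/ paths and the main base branch are assumptions about the repository layout.
# Hypothetical CI guard: reject change sets that rewrite code and tests together.
# Assumes a src/ + tests/ layout and a base branch named main.
import subprocess
import sys

changed = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

touched_code = any(p.startswith("src/") for p in changed)
touched_tests = any(p.startswith("tests/") for p in changed)

if touched_code and touched_tests:
    sys.exit("This change edits both src/ and tests/ -- the spec side needs a human.")
print("OK: code and tests were not rewritten in the same change.")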
It will do it better than 90% of the developers who do it normally.
By the way, I have manually checked thousands of results from my code analyzer, and from my perspective it was flawless.
You don’t think we add a GPT model to control stuff, do you?
Code does. Solid rule-based systems created by GPT are the maximum I can imagine.
Good developers are better than 90% of developers
Yeah, and 90% of developers believe they belong to the 10%.
A joke popped into my head when I read this:
Students of University Foo evaluated themselves and all received 100% score!
or, more real-life
The Department of Justice investigated itself and discovered nothing wrong.
I can understand where you’re coming from. In an ideal setting, the unit tests are built to benchmark and constrain the AI.
If you ask it to write these tests, then they will just match whatever it decided to write, and not serve as a true evaluation.
They are not meant to test model quality. They are meant to make sure the code still runs when you change it later.
Other than that they have no purpose at all.
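To make that concrete with the concatenate_strings example from earlier in the thread: the constraining version is a human-written spec, committed before the model touches the implementation, and it doubles as the regression check later. This is only a sketch; the concat module name is an assumption.
# test_concat.py -- human-authored spec; the model may edit the implementation, not this file.
import pytest
from concat import concatenate_strings  # assumed module name for the implementation

@pytest.mark.parametrize("a, b, expected", [
    ("Hello", "World", "Hello World"),
    ("", "World", " World"),      # leading space is part of the agreed behavior
    ("Hello", "", "Hello "),
])
def test_concatenate_strings(a, b, expected):
    assert concatenate_strings(a, b) == expected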
While I appreciate you starting a discussion on the important topic of AI’s impact on software development, I feel compelled to address a broader issue: the manner in which you’ve been interacting with other users throughout this thread. It’s not so much about my personal exchanges with you, but more about the overall tone you’ve set and how it’s impacting the potential for a productive conversation for everyone.
From the outset, your approach has been characterized by a dismissiveness and condescension that, frankly, is not conducive to a healthy forum environment. It’s becoming increasingly difficult for anyone to engage constructively when met with such consistent negativity and a perceived need to assert dominance in the conversation.
Looking at your interactions with various users, a pattern emerges:
- Dismissing @j.wischnat’s expertise: When @j.wischnat, who seems to be genuinely trying to engage with your points, mentions GitHub Copilot and the current state of AI coding, your response is to immediately question their programming abilities and imply they are out of touch:
This immediately shuts down @j.wischnat’s attempt to contribute and makes them defensive.
- Condescending towards general knowledge: You repeatedly imply that anyone who doesn’t immediately grasp your claims, or who expresses skepticism, is simply uninformed and relying on
This is a broad generalization and dismisses the valid experiences and perspectives of others.
- Ignoring valid points with assertions of fact: When I, and others like @curt.kennedy, raise concerns about the need for human oversight, coding knowledge for effective AI use, or the impossibility of 100% testing coverage, you often respond with pronouncements like
or
suggesting this is some perfect, magic-bullet solution without providing any such evidence, and insinuating that others not using this prompting method are somehow inferior, without actually addressing the substance of the arguments. This is not discussion; it’s simply stating your position and dismissing all others.
- Creating a hostile environment for questions: When @j.wischnat and I, and even @CupCake, express curiosity or ask for more information about your system, you become defensive and even accuse @j.wischnat of being
for simply asking questions. This discourages curiosity and makes it feel as though questioning your claims is unwelcome; maybe it is. But you’re making some pretty grandiose claims, so you should expect others to want information they can verify and validate.
- Boasting and self-promotion without substance: You consistently highlight your own accomplishments and the supposed superiority of your “secret” system, stating things like
and
. However, you steadfastly refuse to provide any verifiable details, making these claims sound less like insightful contributions and more like unsubstantiated boasts.
It’s not about whether your system is real or not, or whether your predictions are accurate. It’s about the way you are communicating and the impact it’s having on this thread. A forum like this thrives on open discussion, the sharing of diverse perspectives, and respectful debate. Your current approach is actively undermining these principles by creating an atmosphere where questioning or disagreeing with you is met with dismissal and condescension.
If you genuinely want to discuss the future of AI and coding, I would strongly encourage you to reconsider your communication style. Engage with people’s questions and concerns respectfully, explain your points without resorting to arrogance, and consider that others might have valuable insights to offer, even if they don’t perfectly align with your own views. A more collaborative and less adversarial approach would benefit everyone, including yourself, and could actually lead to the kind of insightful discussion your initial post seemed to invite. It’s worth noting that in some of your more recent posts, there’s been a slight shift towards a less confrontational tone, which is a positive step. However, the overall pattern of dismissive and condescending communication throughout the thread remains a significant issue that I hope you will address.
You’re clearly intelligent and capable, but your responses also come across as arrogant and egotistical. Your approach to this thread has ensured it devolved into a contentious and combative argument instead of a fun, friendly, and fruitful discussion.
I can only hope this is a one-off for you, maybe you’re simply having a bad day and when you have a chance to reflect on how you’ve communicated here you will choose to do it differently next time.
This is not exclusively what unit tests are for. They’re not put in place only to make sure the code runs, but to make sure it runs efficiently and as expected; that even includes how errors are handled.
For a small system it doesn’t matter, but as it grows in complexity it’s important to ensure that changes in one location don’t cause unintended changes in others.
LLMs are typically used on small, contained pieces of code. It’s nice to know that if one modifies Module A, the downstream task in Module B still runs as expected.
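A hedged sketch of that Module A / Module B point, with invented module and function names: a downstream test that pins the contract between the two modules, including the error path, so an edit on one side can’t silently break the other.
# module_a / module_b and their functions are hypothetical names for illustration only.
import pytest
from module_a import parse_record
from module_b import summarize

def test_module_b_still_handles_module_a_output():
    # If an LLM reshapes parse_record's output, this cross-module check catches it.
    record = parse_record('{"id": 1, "name": "x"}')
    assert summarize(record).startswith("Record 1")

def test_module_a_error_handling_is_preserved():
    # Error behavior is part of the contract, not just the happy path.
    with pytest.raises(ValueError):
        parse_record("not json")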
Personally, I delegate a lot of coding to LLMs. I still continuously find them trying to do things that aren’t strictly necessary because, well, they don’t know the grand picture. I imagine this will change in the future, though.
You mean a step-by-step plan to build it yourself. Nah, not gonna happen.
That is what I wrote. When you change the code… Of course I meant the overall code base.
Wow, a very philosophical question outline and a long thread. Jumping in on a few points from the OP that jiggle my own evolving opinion:
- Existing programming languages (Python, JavaScript, C, asm) were designed to help humans convey intention to computers (asm, Fortran), and then to do so while minimizing human error (modern languages with type-checking). I find the conversation about AI writing Python code misses the point – it’s like using rocket technology to make a better steam locomotive. It’s a failure of imagination to measure AI in terms of how well it writes Python, and critically it burdens AI with the unnecessary task of getting the semicolons right (metaphorically speaking), which is error prone. So I imagine (and pursue) the nocode route as a first step in bypassing that failed paradigm.
- The really consequential job of “programmers”, at its peak, is very high-level executive conceptualization and problem solving. I grew bored of programming languages 20 years ago; the real skill is understanding a problem space, simplifying it, and “coding” something that works great for what the problem becomes 2 years down the line. I’ve only met maybe <10 people who think this way – it’s always somewhat of a frustration to encounter teams who are so very, very serious about where the semicolons go. The difference in thinking I’m conveying is very important when we consider how AI can be applied.
- On a sidenote (and follow-on from above): I was blown away a year back when I attached some functions to a GUI that could draw shapes but not yet delete them – for fun I asked it to delete a blue circle it had just drawn – “I can’t delete the circle but I have added a white one above it to hide it” <== this was more innovation than I saw from 75% of the programmers I ever worked with! Right now, today, 50%-75% of programmers are obsolete in terms of the jobs they fill.
- I think the 1-year timeframe underestimates just how long some of this change may take – people can be very attached to their ways once you look at corporations, their structures, and concepts. The two other great disruptive events I lived through, 1. the mass adoption of the internet (yes, I’m that old) and 2. globalization, each took 10 years of shake-up, grasping, and re-org to settle on new equilibriums.
- Those are some thoughts on the immediate future. In my old-age wisdom I’m shy about trying to predict anything beyond the next 1-2 years; in sci-fi terms we could be approaching the “singularity” event, beyond which everything changes so quickly we can’t begin to fathom it. I don’t know whether to be excited or scared.
Jochen, no, my friend, obviously I am not referring to GPT; I am talking about the artificial intelligence industry applied to different fields. This technology, which is now novel, will soon be implemented in all systems—we will have AI even in the most absurd places, just as we now have the internet even in washing machines.
What is important, and what I emphasize, is how beneficial this technology is in certain fields and how highly dangerous it is in others.
This technology offers incredible versatility. It is only a matter of time before it is integrated into weapons. There is a video on the internet of some Chinese robots performing a choreographed dance that is genuinely terrifying. We are no longer just talking about machines in virtual environments; we are also talking about machines in physical environments—machines that, sooner or later, will control highly dangerous systems.
@jochenschultz, I have to say, your response leaves me a little puzzled. To think I was asking for a “step-by-step plan” seems to miss the point of the questions many of us have been raising here.
It does bring up something important, though. If just showing some real evidence – enough for someone like me, who you see as just getting information online, to understand and maybe even copy your work – would be enough to recreate your “masterpiece,” then it probably isn’t as incredibly complex as you suggest. It’s hard to believe something so easily copied would be brand new, especially in the fast-moving world of AI.
The simpler explanation, and what most people in AI development believe, is that we haven’t reached that point yet. The fact that we haven’t seen truly independent AI coders, even though many smart people are trying to build them, says a lot.
Also, if you really had this amazing technology, it’s difficult to imagine you wouldn’t be using it more widely. There are so many problems it could solve, for everyone. It feels like you’d almost have to share it somehow, even if you wanted to make money later. And practically speaking, even a small demonstration of what it can do would surely be worth billions.
When claims are this extraordinary, as the saying goes, extraordinary evidence is required. And that’s really what many of us have been asking for. Because ultimately, the original point of this discussion wasn’t about your specific, unverified system. It was about the broader future of software development and the potential impact of AI. But when the discussion keeps circling back to claims that seem, frankly, quite unbelievable without any supporting evidence, it becomes difficult to have a serious conversation with you about those larger, genuinely important questions.
So, honestly, I’m struggling to connect what you’re saying with what seems logical and realistic. Maybe if you could explain things more clearly, beyond just saying it exists, it would help us understand better, and perhaps get back to the more meaningful discussion we started with.