Inspired by @vertinski and built by Codex. I am only a bystander here
Top notch video!
The temperature tuning experiments shed more light on the levels of abstraction in this particular Codex model.
I posted the link to the one that I generated…not sure what you mean
Right, and the model, as you’ve seen, was able to get the job done in half as many lines of code…300-something vs 140
To be fair, those 300 lines were generated as a result of your on-camera tuning. I had already got the code in 140 lines, with the game logic intact – using my original prompt and settings. As the saying goes – don’t fix what isn’t broken.
Although the viewers seemed to be entertained by all the action, judging by the comments.
I agree about the abundance of snake-game code on GitHub.
My reply was more about the style of code and prompt presentation in the video.
Speaking of more complex, or just different, tasks: my 3D Cube animation prompt works, and is even reproducible and adjustable.
But, taking into account the gatekeepy and somewhat misrepresentative nature of the information channels around this new craft of Prompt Design, I’m starting to reconsider my publishing strategy.
I’m not against showing the prompting magic of Codex to the “outside-of-beta-test-crowd” people; I’m all about showing this tech to more people. But first there is a huge need to explain, in more detail, what the prompt actually is and how important it is. The specific wording of the prompt matters a great deal.
Yes, I noticed that effect when testing many prompts. I guess the engine is still learning and changing too.
Also, I was thinking about future token prices. The code generation and testing process eats a lot of tokens! So they’ve got to be affordable somehow, or prompting has to evolve into a maximally efficient craft. Which I’d also like…
The outputs will usually be identical; however, when the probabilities of two different candidate tokens are very close, non-deterministic floating-point operations can give a very slightly different result, causing a different token to be chosen. After one different token, all subsequent tokens are likely to be affected as well, producing a different completion.
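A toy sketch of the effect (just an illustration of greedy, near-zero-temperature decoding, not the actual Codex sampler; the noise term stands in for non-deterministic floating-point reduction order):

```python
import numpy as np

# Greedy decoding picks the highest-scoring token. When two logits are
# nearly tied, a tiny floating-point wobble can flip which one wins.
logits = np.array([4.1000001, 4.1000000, 1.3])  # tokens A, B, C

# Stand-in for non-deterministic float error (e.g. parallel summation order)
noise = np.random.normal(0.0, 1e-6, size=logits.shape)

chosen = int(np.argmax(logits + noise))
print("chosen token:", "ABC"[chosen])  # sometimes A, sometimes B

# The chosen token is fed back as context for the next step, so a single
# flip changes the conditioning and the two completions keep diverging.
```

Run it a few times: the printed token occasionally flips, and in a real model everything generated after that point would differ too.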
tbh it also seems to retain a bit of memory between many re-runs – I can observe results converging to mostly one thing, and it generates slightly different quirks again after a complete engine restart (0 < temp < 0.05; no parameter changes, for testing purposes). sooo, idk ’bout that…
I think you’re reading too much into the random behaviour. It’s the same model, with no state / memory.
The code generated by Codex should either be verified by humans, or there should be significant safety restrictions on what it is allowed to do, so that it can be deployed safely. Yes, no rocket moon landing without anyone checking the outputs just yet!
Let me try to get my head around the phrase “code generation”. Is it actually creating new code we have never seen, or is it simply recalling from memory (like a fancy Google search) code that has already been written by one of us? I believe it is more the latter, so we should expect a lot of this code not to compile, or not to run efficiently, given the pace we have seen in these domains. A great achievement still; I just want to make sure we don’t frighten people unintentionally.
I was going to ask: can it tell the difference between multithreading, multiprocessing, and coroutines? I hope not, at least not yet. We should always leave room for improvement.
Nice!!! What I meant was whether it knows when to apply the appropriate threading/coroutine approach in a given situation without being explicitly told, for obvious reasons (otherwise we will end up with a lot of inefficient code). Or what happens if we can find examples on GitHub for async sorting, but no one has contributed a coroutine-based UTF-8 transformation to GitHub yet, and we need to write a program that requires both? How often will the model be updated? If someone contributes a new piece of code to the internet, will Google or this model find it first?
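To make my question concrete, here is roughly what the three options look like in Python; picking the right one for a workload without being told is the judgment call I’m asking about (the task functions here are hypothetical placeholders):

```python
import asyncio
import multiprocessing
import threading
import time

def blocking_io(n):
    time.sleep(0.1)                      # stands in for disk/network I/O
    return n

def heavy_cpu(n):
    return sum(i * i for i in range(n))  # stands in for real computation

async def non_blocking_io(n):
    await asyncio.sleep(0.1)             # cooperative, non-blocking wait
    return n

if __name__ == "__main__":
    # Threads: fine for blocking I/O; the GIL limits CPU parallelism
    threads = [threading.Thread(target=blocking_io, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Processes: true parallelism for CPU-bound work, with IPC overhead
    with multiprocessing.Pool(4) as pool:
        print(pool.map(heavy_cpu, [100_000] * 4))

    # Coroutines: one thread, many concurrent I/O waits
    async def main():
        print(await asyncio.gather(*(non_blocking_io(i) for i in range(4))))

    asyncio.run(main())
```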
Btw, this is the best solution we have at hand, and that’s understandable: to get to 99 percent, we need to get past 60 percent first. While I feel blessed to be part of this journey, it does seem to me that coding is not the ideal problem for the law of large numbers, or a pure ML problem, because we don’t really need many permutations to express an algorithm, and there are best solutions. The fact that we have 3,000 GitHub repositories of pygame snake says more about how we techies like to reinvent the wheel than anything else. And I hope we will never have to decide between 100 ML frameworks at any particular moment.

This, however, does lower the barrier for the general public to become programmers for popular subjects, while at the same time it creates another problem for infrastructure, which will need to power up to bear the brunt of all this inefficient code. And the code-quality problem could get worse as more of this generated, inefficient code arrives on GitHub.
nicely put.
Btw, the consumers of tutorials are humans. We love examples, and more often than not, the more the merrier. Not sure the same can be said for computers.
If we are looking at this as just another way to find code, then as search it is a zero-sum game. Have we seen what happened to Yahoo?
You mean these snake games actually got stars? Sigh…
Btw, I hope I didn’t rub the conversation the wrong way. This is a really cool and history-defining project.
I can’t stop wondering what new skills this will create. Will a Shakespearean or Jedi style of writing net us the same favoritism in code?