ChatGPT's Response to Context in Coding Prompts: Snake tests

TL;DR: I tested ChatGPT’s response to the same coding request (“Ok write a python script that recreates the game snake”) set in different contexts. I found notable variations in the scripts’ complexity, structure, and playability, highlighting how prompt context shapes ChatGPT-generated code.

Experiment Setup

I gave ChatGPT the same final request but embedded it within different contextual scenarios. These ranged from straightforward prompts to ones with specific constraints or altered scenarios (such as a tight deadline, or an instruction to provide only a code block without the thought process).


Each context led to a unique version of the Snake game, varying in code complexity, structure, and user experience. Some versions were basic, while others incorporated advanced features and refined gameplay mechanics. The best results were obtained by providing Custom Instructions specifying that answers be given only in code blocks starting and ending with ```.
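For context, a Custom Instructions entry of the kind described might look something like this (my own wording, not the exact text used in the tests):

```text
Answer only with a single code block, starting and ending with triple backticks.
Do not include any plan, explanation, or commentary outside the code block.
Target Python 3; assume pygame is available.
```

Keeping the instruction short and unambiguous seems to matter more than its exact phrasing.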

Repository and Evaluation

I’ve documented the entire process in a GitHub repo.

Insights and Discussion

Test the different scripts yourself. I’ll run more tests in the coming days with different coding tasks while keeping the same prompts. If anyone is interested, I’ll share my findings.


How to prompt for code the better way:

  • Don’t restrict the AI’s output; instead, expand the AI’s language production with planning ability;
  • Define the programming environment for clarity;
  • Write better game instructions, or let the AI rewrite them into better game specifications before starting.
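As an illustration of what “better game specifications” might look like, here is a hypothetical spec one could pin down before asking for code. Every name and value below is my own assumption for illustration, not something taken from the original prompts:

```python
# Hypothetical Snake specification, expressed as constants a prompt could pin down.
# All values are assumptions for illustration, not taken from the original tests.
GRID_WIDTH = 20          # playfield width in cells (a square grid avoids distortion)
GRID_HEIGHT = 20         # playfield height in cells
TICK_RATE = 8            # game steps per second; keeps gameplay from being "too fast"
WALLS_ARE_FATAL = True   # crash at the edge instead of wrapping around
FOOD_MARGIN = 1          # food never spawns in the outermost (border) cells
START_LENGTH = 3         # initial snake length

SPEC = (
    f"Grid {GRID_WIDTH}x{GRID_HEIGHT}, {TICK_RATE} ticks/s, "
    f"walls {'fatal' if WALLS_ARE_FATAL else 'wrap'}, "
    f"food keeps a {FOOD_MARGIN}-cell margin from the border."
)
print(SPEC)
```

Pasting something like `SPEC` into the prompt removes exactly the ambiguities (speed, wrapping, food placement) that the threads below complain about.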

I upgraded the model and the fun to gpt-3.5-turbo, where 10+ iterations cost the same as one gpt-4-turbo response (or infinitely more free in ChatGPT).


4 iterations and it still looks like you are the one assisting gpt :sweat_smile:. It’s almost like it confuses itself with all the thought process and unnecessary descriptions of what it’s about to do.

Hey, it’s gpt-3.5-turbo: what one would have dismissed for coding earlier, if gpt-4 hadn’t been taken down a peg.

The prediscussion is chain-of-thought, giving the AI planning ability. AI text generation is one-directional; it can’t go back and add an “import” statement when, at the point of writing code tokens for a task, it discovers the need for a library or an extra function.

Did you try out what the AI actually wrote? Your “medium” is silly Logo-turtle code with “pen up” etc. that does nothing. You’re going to need a lot of manual assisting on that junk.

I added “rewrite the user’s specification” as a task the AI must also do, and handed the prompt I wrote over to gpt-4-turbo (gpt-4-1106-preview). Not GPT-4.

Guess what: as a hint of its origins and qualities, it also gives a traceback on the first go; then the code has the same faults as gpt-3.5-turbo: gameplay is too fast, the window dimensions similarly needed to be squared, and the snake wraps around instead of crashing into the edge. But then, to add insult, it goes into code-omission mode, filling its output with ellipsis elide markers. Then it again draws the food in the border wall I specified. Not worth wasting more tokens on what is clearly the same path as gpt-3.5.
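For reference, the two gameplay faults mentioned above (wrapping at the edges, and food spawning inside the border wall) come down to a few lines of grid logic. A minimal sketch in plain Python, with made-up names and an assumed grid size, of what the fixed behaviour might look like:

```python
import random

GRID = 20  # square playfield, in cells (an assumed size for illustration)

def step_head(head, direction):
    """Advance the head one cell; return None on a wall crash instead of wrapping."""
    x, y = head[0] + direction[0], head[1] + direction[1]
    if not (0 <= x < GRID and 0 <= y < GRID):
        return None  # fatal walls: no modulo wrap-around here
    return (x, y)

def spawn_food(snake, margin=1):
    """Pick a free cell at least `margin` cells away from the border wall."""
    free = [(x, y)
            for x in range(margin, GRID - margin)
            for y in range(margin, GRID - margin)
            if (x, y) not in snake]
    return random.choice(free)

# Crashing into the edge returns None rather than wrapping to the far side:
print(step_head((19, 5), (1, 0)))   # moving right off the board -> None
print(step_head((10, 5), (1, 0)))   # a legal move -> (11, 5)
```

The “too fast” complaint is the remaining fault, and in a pygame version it would just be a matter of capping the loop with something like `clock.tick(8)`.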

You can try giving no prompt except the date to anything lesser than GPT-4 and see how that works out for you…
