A better Chain Of Thought prompt

I also think it may be useful in the CoT prompt to specify variable names. We know that GPT can write information to variables for storage. It may help to make this explicit.
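For example, something along these lines is what I have in mind (the wording and variable names here are just an illustration, not an exact prompt):

```
Before solving, extract each quantity from the problem and write it as a
named variable, e.g. "price_per_widget = 3". Then list your plan,
referring to those variable names. Finally, execute the plan step by
step, writing out every new variable assignment before you use it, and
end with "Final answer: <value>".
```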

I use an internal compression that is only seen within the role attribute to reduce my token count. It seems to only work on GPT-4.

The same basic idea as Better CoT. One tip I’d give is that if you lead the prompt’s output by re-stating your first step, the model will follow all the steps more reliably. The model likes seeing itself follow patterns, so this just helps kick-start it into following the steps.
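As a rough sketch of what I mean, if the prompt itself ends with the first step already restated, the completion picks up the pattern from there:

```
Steps:
1. Restate the problem in your own words.
2. List the known quantities as named values.
3. Work through the solution one step at a time.

Step 1: Restating the problem in my own words, we need to find...
```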

1 Like

Pretty good CoT - I tried it with several cases and its process was impeccable.

1 Like

I’ve evolved Better CoT into a full-fledged programming language I’m calling INSTRUCT.

8 Likes

While I have evolved a slightly different approach that is more efficient for our specific purpose, I find your CoT/INSTRUCT information to be so foundational to effective prompt writing for these models that I’ve declared this article required reading for anyone on our team who works with them.

Excellent work. Thank you for sharing.

2 Likes

Thank you so much… Honestly, I’m just trying to share what I’ve worked out with the community. There are definitely more specialized approaches you can take to prompting, but I’m trying to find foundational patterns and feel like I’m on the cusp of that.

1 Like

Had you told me 2 weeks ago that you could essentially create a programming language for LLMs that they would run reliably, I would have said you’re nuts. 2 weeks is an eternity in the land of LLMs, isn’t it? 🙂

3 Likes

The programming language idea is genius. Take a look; my promptlang() could be modified to work with CoT.

2 Likes

A lot of my inspirations for INSTRUCT were taken directly from GPT. I noticed that it would often want to write Python programs when trying to solve more complex problems. I was fascinated by the fact that the model could write a program which it would then mock-execute as part of its output. The issue is it would often get the program’s output wrong. After some digging I narrowed the issue down to the model not seeing its variable assignments.

If the model can see the values being assigned to variables, it will often come to the correct final answer. When it can’t see the value assigned to a variable, it guesses at what the value is and often guesses wrong. That’s why I tell it not to write code in a lot of my example prompts. By telling it not to write code, I’m preventing it from reaching for its beloved Python. It still wants to write a program, which is fine, but instead of using Python it has to spell out everything it’s doing, including any variable assignments. It will more often than not reach the correct final conclusion when forced to solve the problem without using code.
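A sketch of the kind of instruction I mean (not the exact wording from the article):

```
Do not write or execute any code. Solve the problem in plain language,
one step at a time. Every time you compute a value, write it out
explicitly (e.g. "combined_earnings = 7500") before using it in a
later step, then state the final answer.
```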

2 Likes

I am sure you have already tried the “auto-gpt” package. Pretty interesting. I think that Auto-GPT is the next step beyond CoT.

Has anyone here started developing something along the lines of Auto-GPT?

I’m actually working towards something along the lines of Auto-GPT, but I don’t want to build anything until I can leapfrog what everyone else has built. The core issue with GPT-4 is that, as great as it is, it’s mediocre at planning. A human can easily outperform GPT-4 from a plan-authoring perspective. LangChain, Auto-GPT, and systems like them currently lean on GPT to do all the planning, and it generally sucks at that. That’s actually the last thing you should be asking it to do.

I want to solve that first (I’m making progress with INSTRUCT) and then I’ll build a brain of some sort…

3 Likes

I’ve been doing this reiteration of the first step in some cases without really theorizing about the reason for the effectiveness. Good tip!

2 Likes

Thank you so much for your post. I’ve already been doing the “chain of instructions”, but I didn’t realize you could put them in a single prompt using the method you described above.

I never would have guessed that the model treats the generated text as if it is part of the prompt. And upon testing, it does. Even GPT-3.5-turbo.

1 Like

That’s a useful way of thinking about it, but it’s not technically true. What is actually happening is the model is trying to predict the most correct next token. Determining the most correct next token requires considering all previous tokens. So, in that way, everything generated previously will always influence what comes next.

I have also come to the conclusion that, at least with gpt-3.5-turbo, it has essentially zero memory for anything except the tokens that it outputs. Therefore, assigning to a variable in memory will not work consistently, but having it output the value it assigns to a variable usually does.
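A minimal sketch of why that is, assuming a typical chat-completions loop (call_model here is just a stand-in for whatever client call you’re using):

```python
# The only "memory" the model has is the text sitting in `messages`.
# A value that was never emitted as tokens simply isn't anywhere.
messages = [
    {"role": "system", "content": "Solve step by step and write out every value you compute."},
    {"role": "user", "content": "A widget costs $3. What do 40 widgets cost?"},
]

reply = call_model(messages)  # hypothetical helper wrapping the chat API
messages.append({"role": "assistant", "content": reply})

# If the reply contains "total_cost = 40 * 3 = 120", that assignment is now
# literally in the context window, so a follow-up about total_cost works.
# A value the model was told to "remember silently" has no tokens here at all.
messages.append({"role": "user", "content": "What was total_cost?"})
reply = call_model(messages)
```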

I wrote this post months ago, which might as well have been a year ago in LLM time. I’ve spent a lot of time with gpt-3.5-turbo over the last month or two, and I would actually argue that it needs to see example tokens of proper behavior more than any of the other models. It definitely has a bias towards recency, so it’s less likely to follow instructions if there are too many tokens between the instruction and the current token being predicted.
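For example (a rough sketch, not an exact prompt), giving it a sample of the expected output and keeping the key instruction close to the end of the prompt both play to those tendencies:

```
System: You extract action items. Respond only with a bulleted list.

Example output:
- Send the revised contract to legal by Friday

User: <long meeting transcript goes here>

Reminder: respond only with a bulleted list of action items.
```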

I’ve pretty much mastered the art of getting gpt-3.5-turbo to reliably follow instructions which I would have said was impossible even a month ago.

What I usually do is guide the model to output its analysis first and then, based on that, produce the actual output.
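For example, an output template along these lines (just an illustration of the shape, not an exact prompt):

```
Analysis:
<work through the problem here, showing every intermediate value>

Answer:
<only the final result, based on the analysis above>
```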

1 Like

I’m sure others have noticed as well, but ChatGPT fails to provide the correct response to the family income problem, even though it follows the instructions as well as GPT-4 does:

Plan:

  1. Determine Valerie’s brother’s monthly earnings.
  2. Calculate the combined monthly earnings of Valerie and her brother.
  3. Determine the mother’s monthly earnings.
  4. Add up the total amount of money earned by Valerie, her brother, and their mother.

Additional steps:

  1. Divide Valerie’s monthly earnings by 1/2 to find her brother’s earnings.
  2. Add Valerie’s monthly earnings to her brother’s earnings to get their combined earnings.
  3. Multiply the combined earnings of Valerie and her brother by 2 to find the mother’s earnings.
  4. Add Valerie’s earnings, her brother’s earnings, and their mother’s earnings to calculate the total amount of money they all have together.

Step-by-step calculations:

  1. Valerie’s brother’s earnings: Valerie’s brother earns 1/2 of what Valerie earns, so his monthly earnings are: Brother’s earnings = 1/2 * $5000 = $2500
  2. Combined earnings of Valerie and her brother: Combined earnings = Valerie’s earnings + Brother’s earnings = $5000 + $2500 = $7500
  3. Mother’s earnings: The mother earns twice the combined earnings of Valerie and her brother, so her monthly earnings are: Mother’s earnings = 2 * $7500 = $15000
  4. Total amount of money they all have together: Total = Valerie’s earnings + Brother’s earnings + Mother’s earnings = $5000 + $2500 + $15000 = $22500

The total amount of money Valerie, her brother, and their mother have together is $22,500.
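For reference, here is the arithmetic the plan’s own “additional steps” describe (assuming, per step 1, that Valerie earns half of what her brother earns rather than the other way around); the model’s written calculation multiplied by 1/2 where its plan said to divide:

```python
valerie = 5000
brother = valerie / (1 / 2)      # "divide Valerie's earnings by 1/2", i.e. 10000
combined = valerie + brother     # 15000
mother = 2 * combined            # 30000
total = valerie + brother + mother
print(total)                     # 45000.0, not the $22,500 reported above
```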

One week it will be able to solve it, the next week it can’t. Kind of all over the map right now.

(this long old thread got a rude bump, btw)

This is essential to understanding what you can do with the AI. It has no inner monologue, thoughts, memory, variables, random number generator, increasing counters, or any of the other things you might associate with machine intelligence.

The only thing it does is generate the next likely token based on what’s in the context window.

Especially in chat, it can only go off of what it has seen produced in prior outputs.

Prompts bound for failure:

  • Create an adventure game that has an 8-room fixed map of a spooky hotel.
  • The player will battle a D&D orc, create character attributes at random for this NPC but don’t tell the player the hit points remaining.
  • Think of a number between 1-10 and I’ll try to guess.
  • Simulate two people that try to answer the user’s questions, you’ll assign each personality attributes, a ranking 1-10 for [friendly, confident, knowledgeable]

There is no ‘map’, ‘hit points’, ‘number’, or ‘personality’ in those cases (nor can the AI make up things randomly). There are only generated tokens, merely colored by the language that came before them.

Then you have to consider the value of what you are even asking. “Roll a 256-sided die three times. Print the values. Then use those values printed to give me a CSS color code in hexadecimal”. Even if it could reliably do hex math, is that better than just asking for the random color? Set the temperature and top-p low and you get the same answer every time anyway, and even with high temperature the results are still heavily weighted.
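And if you do want real randomness or hidden state (dice rolls, hit points, a secret number), the reliable place for it is your own code, with only the text you choose placed in the prompt. A quick sketch of the die-roll example done that way:

```python
import random

# Roll three "256-sided dice" (0-255) and build the hex color in code.
# The same goes for hidden state like hit points or a secret number:
# keep it in the application and only reveal what the model needs to see.
rolls = [random.randint(0, 255) for _ in range(3)]
color = "#{:02X}{:02X}{:02X}".format(*rolls)
print(rolls, color)   # e.g. [12, 200, 87] #0CC857
```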

1 Like