Impact of Pre-Structured Reasoning in LLM Prompts

Hi all,

I am interested in how pre-structured reasoning within AI model prompts affects output quality. Specifically, I’m examining two structures:

  1. ‘p r → o’, where reasoning (r) is part of the prompt (p), and
  2. ‘p → r o’, where the model generates reasoning after the prompt.

Does including reasoning in the prompt (‘p r’) enhance output (o) similarly to when the model generates its own reasoning?
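To make the two structures concrete, here is a rough sketch of what I mean, using the OpenAI Python client as an example (the model name, question, and reasoning text are just placeholders, not my actual task):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?"
)
reasoning = (
    "Let the ball cost x. Then the bat costs x + 1.00, so "
    "x + (x + 1.00) = 1.10, which gives x = 0.05."
)

# Structure 1, 'p r -> o': the reasoning r is supplied as part of the prompt p.
pr_to_o = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"{question}\n\nReasoning: {reasoning}\n\nState the final answer.",
    }],
)

# Structure 2, 'p -> r o': the model is asked to generate its own reasoning, then the answer.
p_to_ro = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"{question}\n\nReason step by step, then state the final answer.",
    }],
)

print(pr_to_o.choices[0].message.content)
print(p_to_ro.choices[0].message.content)
```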

Any insights or related studies would be very helpful!

Thanks!


Hi!
The answer you are looking for can probably be found easily.

Example 1: tell the model how to reason in general, and if done correctly it will improve the results.

Example 2: provide specific instructions on how it should behave and act, and the result quality will change accordingly, e.g.:

  • explain the concept of mass as if you were a researcher from bizarroworld (but I don’t have a paper for this)
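As a rough sketch of what I mean (the prompt wording here is purely illustrative):

```python
# Example 1: a general instruction about *how* to reason, applicable to any question.
general_reasoning_prompt = (
    "Before answering, break the problem into steps, check each step, "
    "and only then give your final answer.\n\n"
    "Question: Why does ice float on water?"
)

# Example 2: specific instructions about how to behave and act;
# the result quality (and style) shifts accordingly.
persona_prompt = (
    "Explain the concept of mass as if you were a researcher from bizarroworld. "
    "Stay in character and keep the explanation under 100 words."
)
```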

There are countless extensions of this, including:

  • Tree-of-thought
  • Graph-of-thought
  • Program-of-thought
  • Etc.

If the basic “Chain-of-Thought” doesn’t give you what you need, perhaps one of these will.

(I’ll provide links when I’m able.)


I have a dumb question :thinking:

where did the reasoning in this case come from?

Or are you thinking

( p → r ) → o

There are some technical considerations that are more specific to OpenAI’s GPT chat models if you’re interested:

There didn’t use to be a difference, but now it seems to make a big difference whether the reasoning is supplied in a user message or an assistant message.

Sequence diagram:

```mermaid
sequenceDiagram
    actor User
    activate User
    alt 1: allow machine to reason and conclude
        User->>gpt call 1: user: prompt + please reason, then conclude
        activate gpt call 1
        gpt call 1->>gpt call 1: generate reasoning
        gpt call 1->>gpt call 1: generate conclusion
        gpt call 1->>User: assistant: reasoning + conclusion
        deactivate gpt call 1
    else 2: present machine reasoning as human reasoning
        User->>gpt call 1: user: prompt + please reason
        activate gpt call 1
        gpt call 1->>gpt call 1: generate reasoning
        gpt call 1->>User: assistant: reasoning
        deactivate gpt call 1
        User->>gpt call 2: user: prompt + reasoning + please conclude
        activate gpt call 2
        gpt call 2->>gpt call 2: generate conclusion
        gpt call 2->>User: assistant: conclusion
        deactivate gpt call 2
    else 3: present machine reasoning as machine reasoning
        User->>gpt call 1: user: prompt + please reason
        activate gpt call 1
        gpt call 1->>gpt call 1: generate reasoning
        User-->>gpt call 2: user: prompt + please reason
        gpt call 1-->>gpt call 2: assistant: reasoning
        deactivate gpt call 1
        User-->>gpt call 2: user: based on that, please conclude
        activate gpt call 2
        gpt call 2->>gpt call 2: generate conclusion
        gpt call 2->>User: assistant: conclusion
        deactivate gpt call 2
    end
    deactivate User
```
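For reference, scenarios 2 and 3 as actual calls look roughly like this (a sketch with the OpenAI Python client; the model name and prompt wording are placeholders). The only thing that changes between them is which role carries the reasoning in the second call:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o"   # placeholder model name

prompt = "Is 1001 a prime number?"

# Call 1 (shared by scenarios 2 and 3): ask the model to reason only.
reasoning = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": f"{prompt}\n\nPlease reason step by step, but do not conclude yet."}],
).choices[0].message.content

# Scenario 2: present the machine reasoning as human reasoning,
# i.e. fold it into a fresh user message for call 2.
scenario_2 = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": f"{prompt}\n\nReasoning: {reasoning}\n\nPlease conclude."}],
)

# Scenario 3: present the machine reasoning as machine reasoning,
# i.e. replay it as a prior assistant message in call 2.
scenario_3 = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": f"{prompt}\n\nPlease reason step by step, but do not conclude yet."},
        {"role": "assistant", "content": reasoning},
        {"role": "user", "content": "Based on that, please conclude."},
    ],
)

print(scenario_2.choices[0].message.content)
print(scenario_3.choices[0].message.content)
```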

In scenario 3, I’ve often seen the model confuse itself:

  1. after the follow-up prompt, the model might say "Apologies for the confusion in my previous response."
  2. the model will then sometimes rehash the entire process from the start
  3. when it comes to concluding, you might get completely unreliable results, because the model may start mixing and matching incongruent points from two different reasoning processes. This happens when you have similar sequences in a long context.

This can also happen in scenario 2, but there it’s easier to get the model to accept the reasoning at face value.


Found an example of how including the line of reasoning in the prompt increases output quality.

Note that the models have improved over time, and some of the examples will today be answered correctly without additional help. But the general conclusion still holds: instructing the model exactly how to reason about a specific type of question will improve the output for that use case.

Also, all of this is happening inside a single prompt-reply conversational turn.
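Roughly, such a single-turn prompt looks like this (illustrative wording, not quoted from the source; the worked example inside the prompt is the reasoning r):

```python
# A single prompt-reply turn: the prompt p already contains a worked example
# showing exactly how to reason about this type of question.
prompt = """Q: Take the last letters of the words in "red car" and concatenate them.
A: The last letter of "red" is "d". The last letter of "car" is "r". Concatenating them gives "dr". The answer is dr.

Q: Take the last letters of the words in "blue boat" and concatenate them.
A:"""
```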


Hi Diet,

Thank you so much for your detailed response!

I can’t really tell you about the source of the reasoning in this scenario, but I’ll still try to shed some light on the motivation for my question:

Within my domain, I have access to this mysterious cheap way of acquiring gold standard reasoning r* for any p. However, I don’t have a direct source for the corresponding output, o.

My goal is to fine-tune a model that can efficiently transition from ‘p’ to ‘o’, utilizing this r*.

If the output quality of ‘p r → o’ and ‘p → r o’ is similar, I can train my model to use r* in its input.

Otherwise, I would have to train my model to approximate r* in its output and then generate o. This would yield both lower-quality r (and therefore lower-quality o) and also incur higher costs, because r would then consist of output tokens instead of cheaper input tokens.

My scenario is most analogous to case 2 in your diagram, just with the alteration that the reasoning comes from a different source. Since I am fine-tuning a model, I should be able to prevent it from rehashing the entire process!
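Concretely, the two training layouts I’m weighing look roughly like this (a sketch in a chat-style fine-tuning format; the field contents are placeholders):

```python
# Option A, 'p r* -> o': the gold-standard reasoning r* goes into the input
# (cheaper tokens), and the model is trained to emit only the output o.
option_a = {
    "messages": [
        {"role": "user", "content": "<task prompt p>\n\nReasoning: <gold-standard reasoning r*>"},
        {"role": "assistant", "content": "<target output o>"},
    ]
}

# Option B, 'p -> r o': the model must learn to approximate r* itself,
# so the reasoning becomes output tokens (more expensive, and an extra source of error).
option_b = {
    "messages": [
        {"role": "user", "content": "<task prompt p>"},
        {"role": "assistant", "content": "<approximated reasoning r>\n\n<target output o>"},
    ]
}
```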

Thank you again for your input!
