In scenario 3, I’ve often seen the model confuse itself:
After the follow-up prompt, the model might say "Apologies for the confusion in my previous response."
The model will then sometimes rehash the entire process from the start.
When it comes to concluding, you might get completely unreliable results, because the model may start mixing and matching incongruent points from two different reasoning processes. This happens when you have similar sequences in a long context.
This can also happen in scenario 2, but there it’s easier to get the model to accept the reasoning at face value.
Found an example of how including the line of reasoning in the prompt increases output quality.
Note that the models have improved over time, and some of the examples would today be answered correctly without additional help. But the general conclusion still holds:
instructing the model exactly how to reason about a specific type of question will improve the output for that use case.
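To illustrate the idea (with hypothetical wording, using the classic bat-and-ball question as a stand-in for the kind of question that used to trip models up):

```python
# A minimal sketch: the same question asked plainly vs. with explicit
# instructions on exactly how to reason about it. The question and the
# reasoning recipe are illustrative, not from the example I found.
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

plain_prompt = question

guided_prompt = (
    question
    + "\n\nReason as follows before answering:\n"
    + "1. Write the two statements as equations.\n"
    + "2. Solve the equations step by step.\n"
    + "3. Only then state the final answer.\n"
)

print(guided_prompt)
```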
Also, all of this is happening inside a single prompt-reply conversational turn.
I can’t really tell you about the source of the reasoning in this scenario, but I’ll still try to shed some light on the motivation for my question:
Within my domain, I have access to this mysterious, cheap way of acquiring gold-standard reasoning r* for any p. However, I don’t have a direct source for the corresponding output, o.
My goal is to fine-tune a model that can efficiently transition from p to o, utilizing this r*.
If the output quality of ‘p r → o’ and ‘p → r o’ is similar, I can train my model to take r* as part of its input.
Otherwise, I would have to train my model to approximate r* in its output and then generate o. That would yield both lower-quality r (and therefore lower-quality o) and higher costs, because r would then be output tokens rather than input tokens, which are cheaper.
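For concreteness, here is a minimal sketch of the two training layouts, assuming an OpenAI-style chat-format JSONL; the placeholder strings and message layout are illustrative, not my actual data:

```python
import json

# Placeholders for my actual domain data.
p = "example prompt"
r_star = "gold-standard reasoning acquired cheaply"
o = "desired output"

# Layout A, 'p r* -> o': the reasoning is supplied as input tokens,
# so the model only has to learn to produce o.
example_a = {"messages": [
    {"role": "user", "content": f"{p}\n\nReasoning:\n{r_star}"},
    {"role": "assistant", "content": o},
]}

# Layout B, 'p -> r o': the model must approximate r* as output tokens
# before emitting o; any drift in r degrades o, and the r tokens are
# billed at output rates.
example_b = {"messages": [
    {"role": "user", "content": p},
    {"role": "assistant", "content": f"{r_star}\n\n{o}"},
]}

print(json.dumps(example_a))
print(json.dumps(example_b))
```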
–
My scenario is most analogous to case 2 in your diagram, except that the reasoning comes from a different source. Since I am fine-tuning a model, I should be able to prevent it from rehashing the entire process!