[PAPERS] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

PaulBellow · March 16, 2024, 12:45am

Quiet-STaR is a method that helps language models (LMs) improve their predictions by teaching them to generate rationales, or internal thoughts, at each token to explain the text that follows. It builds on an earlier system called STaR, which helped LMs learn from rationales in a question-answering context. Quiet-STaR addresses three main challenges: the high computational cost of generating continuations, the fact that the LM does not initially know how to produce or use internal thoughts, and the need to predict beyond just the next token. The solution includes a new sampling algorithm that operates token by token in parallel, special learned tokens that mark the start and end of a thought, and an extended training technique. As a result, the model better predicts difficult parts of the text and improves its performance on complex reasoning tasks without needing task-specific training. The authors present Quiet-STaR as a step toward more general and scalable reasoning in language models. The quote from Kierkegaard at the end underlines the idea that understanding comes from reflection, just as Quiet-STaR allows an LM to “understand” text by reflecting on its internal rationale.
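To make the "special tokens" idea concrete, here is a toy sketch in Python. It assumes thoughts appear inline in a flat token stream, wrapped in `<|startofthought|>` and `<|endofthought|>` markers, and shows how filtering them out leaves only the visible text. The helper `strip_thoughts` and the example stream are hypothetical, and this does not model the parallel per-token sampling the method actually uses.

```python
# Toy illustration only: a flat token stream with inline thoughts.
# The real method samples thoughts in parallel at every position;
# that is NOT modeled here.
START, END = "<|startofthought|>", "<|endofthought|>"

def strip_thoughts(tokens):
    """Return only the tokens that fall outside thought spans."""
    visible = []
    in_thought = False
    for tok in tokens:
        if tok == START:
            in_thought = True
        elif tok == END:
            in_thought = False
        elif not in_thought:
            visible.append(tok)
    return visible

# Hypothetical stream: visible tokens interleaved with two thoughts.
stream = ["2", "+", START, "add", "the", "numbers", END,
          "2", "=", START, "sum", "is", "4", END, "4"]
print(strip_thoughts(stream))  # ['2', '+', '2', '=', '4']
```

The thoughts influence what is predicted next, but only the tokens outside the markers count as output, which is where the "quiet" in the name comes from.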

oceanpeach.vn · March 17, 2024, 5:08am

Is this the same as Q*, which came up during Sam’s firing last year?

_j · March 17, 2024, 5:13am

It is not a “q-star” whatever, like the nonsense video headline would have you believe.

This is from Stanford.

STaR = Self-Taught Reasoner (multi-step agent)

“Quiet” because only the final result is presented, which is something you’ve always been able to do, even with an OpenAI cookbook.

ezelikman · March 25, 2024, 1:10am

Hi, author here to clarify a few details. You might be confusing vanilla chain-of-thought prompting with Quiet-STaR. No worries at all - the main differences are:

We train the model to generate more useful thoughts using RL, like in the original STaR paper from a few years ago.
Unlike STaR, we reward the model for generating inner monologues that help predict web text instead of answers to specific questions - this helps it generate thoughts that are less domain-specific.

One of the coolest results is that this internal monologue also improves the model’s external CoT: by “thinking” before each external CoT token, the model makes fewer mistakes in its steps and scores better on reasoning tasks

But there are a lot of details needed to make this work. If the OAI cookbook happens to mention how to do tokenwise-parallel RL fine-tuning with learned meta-tokens and an LM-objective-based reward, please share that page and we’d definitely cite it as related work
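For readers wondering what an "LM-objective-based reward" could look like, here is a minimal Python sketch. It assumes that several thoughts are sampled at one position, that each is scored by the log-likelihood it gives the true following tokens, and that the mean over the samples is subtracted as a baseline. The function names and the numbers are hypothetical, and the baseline choice is an assumption for illustration rather than a statement of the exact training recipe.

```python
import math

def future_logprob(token_probs):
    """Log-likelihood of the true future tokens, given per-token probabilities."""
    return sum(math.log(p) for p in token_probs)

def thought_rewards(logps):
    """Reward each sampled thought by how much it beats the average thought.

    logps[i] is the log-likelihood of the true future text after thought i.
    Subtracting the mean is an assumed baseline, used here for illustration.
    """
    baseline = sum(logps) / len(logps)
    return [lp - baseline for lp in logps]

# Hypothetical log-likelihoods for three thoughts sampled at one position.
logps = [-2.0, -1.0, -3.0]
print(thought_rewards(logps))  # [0.0, 1.0, -1.0]
```

Under these assumptions, a thought with positive reward made the real continuation more likely than its siblings did, so an RL update would make that thought more probable, with no task-specific answer label involved.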

ikibeer · April 8, 2024, 10:48am

I would like to test this approach and design. How can this be done?

haotian200107 · July 30, 2024, 9:17am

I checked the code and found that <|startofthought|> and <|endofthought|> are not sampled at inference. So when does the model start to think? Can we see the rationale explicitly at inference?

dustinw · September 11, 2024, 3:34am

It would help people not misunderstand your work if you included in the paper a full page or more of generated tokens. More specifically:

Cherry-pick an output-only string S, i.e. with all “thought” strings filtered out. You show:

S
