2-shot plus step-by-step prompts for gpt-3.5-turbo performance at gpt-4 level?

another paper along these lines … [2305.02301] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Very interested in similar references, especially fresh ones.


Great find @qrdl :hugs:

They make a good observation about their data:

It is worth noting that the behavior of the our downstream smaller models is subject to biases inherited from the larger teacher LLM.

I found a similar paper about improving instruction data for smaller models i found very interesting:


I just realized that there might be a need for a disclaimer here:

For those who are not familiar, arxiv.org is a popular repository for scientific preprints in various fields. While it’s a fantastic platform for sharing research quickly and openly, please note that the papers posted there haven’t necessarily undergone the rigorous peer-review process that is standard for published journal articles.

Tldr: ArXiv Paper ≠ Peer Reviewed


The point is reasonable, though after reading and trying to repro a lot of papers that have been ‘rigorously peer-reviewed’, well…

Frankly, the only papers that matter to me are the ones with git repos attached. Peer reviewed or not. But maybe that’s specific to my field.

1 Like

I can totally understand where you’re coming from, I’m having the same experience as you.

But when I say rigours I actually mean it, it’s very normal for papers to stay at the “peer review stage” for 6 months or more when submitted to an actual journal, it’s a shame the process is so long, but it’s usually because the author has to do more work. The comment I made earlier would most likely cause the author to provide more data or rewrite his article, had I been a peer reviewer for the article Bruce posted.

Unfortunately there’s a lot of scam “peer-review” journals out there, I was spammed by 3 such journals just this morning, all they do is publish your article 4 money.

It’s a huge problem, because these journals will be the first to publish every time there is some “new, hot tech”, in this case GPT-4 and large language models in general. This period align very well with the period in which the “new thing” is most newsworthy.

What we end up with is mainstream media and various internet services promoting science that’s not really peer reviewed.

Yup, agree completely.
Too much irreproducable junk is published, even in peer-reviewed journals. Even with repositories and reproducibility, results can be so narrowly applicable as to be useless. Maybe we can use this topic (or start a discord, or …) where we help each other track papers that meet our expectations? There are a few gems out there in this fast-moving field, I know I’m missing most.

I need to read the paper but I would add that 2-shot is essential how I’ve been getting GPT-3.5 to behave in my Self-INSTRUCT stuff. I use GPT-4 to generate an example that I feed into 3.5 and it definitely improves 3.5s reasoning moving forward

You may not find all that much there. It’s a pretty expensive prompt if you don’t need TOM capability. I’m always looking to understand, from a TOM perspective about LLMs themselves, where their limitations are.

Here’s another paper I just stumbled on. Long, but probably a must read. Eric Horvitz is chief scientific officer (or something like that) at Microsoft research, and a very smart guy. Acknowledgement - I followed a link in another topic about prompts, @mstefanec found it before me.

Sparks of Artificial General Intelligence: Early experiments with GPT-4


Yeah, that’s a classic at this point. Really great paper, must read.

Maybe start a new thread (I’ll contribute!) though to track generic great papers as it’s somewhat off topic to this one.

Good idea, what should we call it? (the new thread?)

Good question, I like asking GPT4 for advice on these things


Maybe “must read GPT/LLM papers”? heh

works for me.
I’ll repost that link to start the thread

1 Like

forums rejected the title! must be at least 25 chars :rofl:

I added a ‘Foundational’ up front

1 Like

For anyone looking for this it’s here - Foundational must read GPT/LLM papers

If you post a paper in another thread that works well in the foundational one, link back so people find it :slight_smile: It’s a little hidden away in the ChatGPT category which is muted for everyone I think. Not necessarily a bad thing

1 Like

Didn’t realize that! it was the closest category I could find, no way I could find to create a new category. Maybe I can still edit the original post if there is a better category you can suggest.

The community category would probably be good. It’s below the fold, but I think it’s most appropriate.

FYI the API needs you to send the entire session history each time you want a response, as they don’t retain any state on their end. So conversations get exponentially more expensive as they go on, as this history counts towards token costs each time.