Foundational must read GPT/LLM papers

Starting a new thread on the very best must-read, well-written papers on Large Language Model capabilities, limits, and use. We’re all balancing ‘do I continue my own work’ vs ‘do I keep up with the literature so my work stays cutting-edge and relevant, and I don’t waste my time re-inventing the wheel.’
Suggested criterion for posting - if, as an academically inclined practitioner, you were only going to read one paper on GPT and LLMs this month, would this be it?
Sorry to sound harsh, I really do encourage participation. In any case, I have no moderation authority, anyone can post anything they want!
Here’s my initial contribution. It’s from Microsoft Research, which has had full access to all of GPT4 since long before most of us. Eric Horvitz, one of the authors and CSO (or something like that) of Microsoft Research, is one of the smartest guys I know.
Warning: the paper is long.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Note: There is a separate topic for discussions about this topic.

Blog pages

While the entries in this section are not research papers, they are blog pages that note key research papers.


I really liked this paper as it talked about some of the capabilities for generating new science, as well as some interesting warnings about the dangers of “AI Malassist”, which is AI helping bad guys do bad things.

I think this is an important paper as it talks about how smaller models can perform as well as bigger models. Potentially relevant when you look at GPT3.5 being an order of magnitude cheaper than GPT4.

Good paper for everyone to start out with. Dated now, but provides a good foundation.

Just one page to this paper, but it makes a point:

This was a fun paper. Check out the simulation here - Reverie

Good twitter user to follow who regularly posts popular papers:


Interesting paper from IBM on fine-tuning LLaMA without distilling from GPT.

Similar paper

Hey @qrdl, you posted a bunch of papers! It will take me a while to catch up :grinning:
On the scientific one, nice paper outlining the basics of engineering LLMs into larger applications.
I noticed the comment about segmenting the documentation into 7800-token windows for embedding. Hmm - has anyone tried embedding multiple (two is probably enough) offset sliding windows of a text, to handle the case where the most relevant piece is split across two windows in one of the windowings?
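To make the offset-window idea concrete, here is a minimal sketch. The function names `windows` and `offset_windowings` are made up for illustration, the tokens are stand-in integer ids, and the half-window shift is my assumption rather than anything from the paper. With a half-window shift, any passage no longer than half a window should land intact in at least one chunk of one of the two windowings.

```python
from typing import List, Sequence


def windows(tokens: Sequence[int], size: int, start: int = 0) -> List[List[int]]:
    """Split tokens into consecutive windows of `size`, beginning at `start`.

    Tokens before `start` are kept as a shorter leading window so nothing is dropped.
    """
    chunks: List[List[int]] = []
    if start > 0:
        chunks.append(list(tokens[:start]))
    for i in range(start, len(tokens), size):
        chunks.append(list(tokens[i:i + size]))
    return chunks


def offset_windowings(tokens: Sequence[int], size: int) -> List[List[int]]:
    """Return the chunks of two windowings: one aligned at 0, one shifted by half a window."""
    return windows(tokens, size, 0) + windows(tokens, size, size // 2)


# Toy example: 10 token ids, window size 4 (a real setup would use something like 7800).
tokens = list(range(10))
chunks = offset_windowings(tokens, size=4)

# The passage [3, 4, 5] is split by the first windowing but intact in the second.
passage = {3, 4, 5}
intact_somewhere = any(passage <= set(chunk) for chunk in chunks)
```

Each chunk would then be embedded separately. The cost is roughly double the embedding calls and storage, so whether it pays off depends on how often relevant passages actually straddle a boundary.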

On the ‘dangers’ - Hmm. Whom are ‘we’ protecting from what? Few people will have the actual physical resources to actually do any of the syntheses, and those who do probably don’t need gpt to figure out the synthesis. But that is just revealing my bias.

More importantly, I’ve been around long enough to be skeptical: new technology behind small demos often tops out quickly.

Having said that, I’m spending sleepless days and nights pursuing adaptations of exactly these techniques! Nice layout of the essential process of building around LLMs.

On the ‘dangers’ - Hmm. Whom are ‘we’ protecting from what? Few people will have the actual physical resources to actually do any of the syntheses, and those who do probably don’t need gpt to figure out the synthesis. But that is just revealing my bias.

The problem of AI Malassist was, I think, best discussed in this video. This is not sci-fi AGI kill all humans stuff, but a real and present danger. I really encourage anyone and everyone interested in AI safety to watch this. I link to the critical part, but there are other earlier parts in the video that are related.

I am quite concerned about this, but I also am cognizant that this will be an excuse for regulatory capture, which I and pretty much everyone else would hate to see. It’s a difficult situation.

I’m posting papers which aren’t necessarily foundational, but ones I find particularly interesting. Hope that’s ok.

I find any paper which addresses the ‘chain of thought’ type planning/execution/reasoning to be very interesting, and the more I can read the better. They all give me lots of cool ideas.

If the forum lets me, I’ll re-edit and add more COT papers here to avoid too much bumping noise.

Good paper with some already known tips (sadly they don’t bother looking for prior art unless it’s in a paper, I guess), but still useful for lowering GPT4 costs (by 98%, they say).

Whose job does AI automate? (YouTube)
A.I. and Stochastic Parrots, FACTUALLY with Emily Bender and Timnit Gebru (YouTube)

Some thoughtful discussions on the real impact of AI on our society. I believe that the good will outweigh the bad, by a lot. But it is also important to recognize the potential problems now in order to develop the strategies to deal with them.

Yeah those were good papers to start the discussion. I was a bit skeptical of the job one, but at least someone started the ball rolling.

The technical report on PaLM 2, which was announced today at Google I/O, is out.

It doesn’t seem to do so well at HumanEval, which is considered by some to be a gold standard for Python code generation.


I strongly recommend that people read the ReAct paper. It introduces a novel method for synergistically combining reasoning and acting in large language models (LLMs). Their approach outperforms existing baselines on various tasks and also tries to address common issues such as hallucination and error propagation in chain-of-thought reasoning.
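For anyone who wants the shape of the idea before reading, here is a rough sketch of a reason/act/observe loop. It is not the paper's code: `react_loop`, `scripted_policy`, and the `lookup` tool are hypothetical names, and the policy is a hard-coded stand-in for what would be an LLM call in practice.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A policy reads the transcript so far and returns (thought, action, argument).
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(policy: Policy, tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Interleave thoughts, actions, and observations until the policy finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, transcript
        # Ground the next thought in what the tool actually returned.
        observation = tools[action](arg)
        transcript.append(f"Observation: {observation}")
    return None, transcript


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: look something up once, then answer from the observation."""
    prefix = "Observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return ("I should look this up.", "lookup", "capital of France")
    return ("The observation answers the question.", "finish", observations[-1][len(prefix):])


facts = {"capital of France": "Paris"}  # toy knowledge source
tools = {"lookup": lambda query: facts.get(query, "not found")}

answer, transcript = react_loop(scripted_policy, tools, "What is the capital of France?")
```

The point of the structure is that each answer is conditioned on an observation from a tool rather than on the model's own earlier text, which is how the approach tries to limit hallucination.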


Sorry about the deleted posts! Perhaps the forum software can be updated not to show them?

Moving discussion about papers to Discussion thread for "Foundational must read GPT/LLM papers"

Develop GPT-3 from scratch with Andrej Karpathy, legendary founding member of OpenAI (and former head of AI at Tesla). The focus is on the original paper “Attention Is All You Need”.

In the video, the model is trained on Shakespeare, but you can evolve it from there. All code is in the Colab notebook. You can generalize it from the bigram to the trigram, etc., to get closer and closer to GPT-level quality. You can also swap out the tokenizer for the one GPT-4 uses, etc.
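If you have never seen a bigram model, here is a tiny character-level sketch. To be clear, this is a count-based stand-in written with the standard library, not the code from the Colab notebook, and `train_bigram` and `generate` are names I made up for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import Dict


def train_bigram(text: str) -> Dict[str, Counter]:
    """Count how often each character follows each other character."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: Dict[str, Counter], start: str, length: int, seed: int = 0) -> str:
    """Sample up to `length` characters, each drawn in proportion to the bigram counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # this character was never followed by anything in training
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


counts = train_bigram("to be or not to be")
sample = generate(counts, start="t", length=12)
```

Going to a trigram means keying the counts on the previous two characters instead of one. The table grows quickly as the context gets longer, which is part of why learned models are used instead of count tables.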

Then develop on his nanoGPT framework on an A100, but that’s cheating :rofl:


Tree of Thoughts - Great paper, simple idea, described cleanly. More evals would have been nice, but it’s a start.

I actually proposed this idea a week ago, though I’ll admit the concept of prune/backtrack had not yet occurred to me, but from a DFS/BFS point of view it seems sort of obvious now.
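Here is roughly what the BFS-with-pruning view looks like as code. This is a sketch under my own assumptions, not the paper's implementation: `tree_search_bfs`, `propose`, `score`, and `is_goal` are hypothetical names, and in a real system `propose` and `score` would be LLM calls rather than the toy arithmetic used here.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]


def tree_search_bfs(root: State,
                    propose: Callable[[State], List[State]],
                    score: Callable[[State], float],
                    is_goal: Callable[[State], bool],
                    beam_width: int = 2,
                    max_depth: int = 5) -> Optional[State]:
    """Breadth-first search over partial solutions, keeping the best few per level."""
    frontier = [root]
    for _ in range(max_depth):
        # Expand every surviving state into its candidate next steps.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # Prune: keep only the highest-scoring candidates for the next level.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None


TARGET = 10  # toy task: pick steps from {1, 3, 4} that sum to 10


def propose(state: State) -> List[State]:
    return [state + (step,) for step in (1, 3, 4)]


def score(state: State) -> float:
    return -abs(TARGET - sum(state))  # closer to the target is better


def is_goal(state: State) -> bool:
    return sum(state) == TARGET


solution = tree_search_bfs((), propose, score, is_goal)
```

This version only prunes. Backtracking shows up in a DFS variant, where a branch whose score drops below some threshold is abandoned and the search returns to its parent.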

What’s interesting is this is somewhat thematically related to simbiotico, the mindmapping tool I built for GPT4. There are a number of alternative solutions, btw,

These are going to be a lot to read through. A lot of times I’ll read a paper and then find out it was not even close to anything I’ve needed. I’ve incorporated AI into my paper/report reading regimen.

If I am on mobile I use Ask Your PDF.
If I am on desktop I use LightPDF.

They help me interrogate the documents a lot faster than me trying to skip over parts and wondering if maybe I didn’t search it thoroughly enough. It’s rare that I find a paper that I need to read 100% of. So for the people in the same boat, I suggest using those to increase your research throughput.

Know of a comparable desktop tool for Linux? Seems like a great idea: too many papers, too little time.

Maybe I’ll have to build one

Answering my own question - there is a plugin for this.
Also LinkReader

An earlier paper with very similar ideas to the “tree of thoughts” paper:

They conflate AI misuse/malassist with autonomous fantasy, but otherwise a foundational / must read.

TBH, you could just read this one page and gain about 90% of the utility of this paper.

Limits on Compositionality

LLMs just do pattern-matching on linearized short multi-step logic chains; they can’t scale reasoning very far.

@jwatte will probably find much to agree with here.
I don’t find much to disagree with…


Thinking further about the paper, I believe the lack of ‘logical reasoning’ on language, as compared to the incredible capability at code generation, speaks more to the amount of logical thinking present in the training texts than it does to any inability of LLMs.
Maybe we should increase the representation of Principia Mathematica and Plato in the training corpus and decrease the representation of your favorite flame blog and troll posts.

You’re not wrong :smiley:

I think this will get further in pattern matching on logic that it’s trained on, but it still doesn’t have the ability to extrapolate to new patterns.
(It can extrapolate within existing patterns, sometimes to great effect, and sometimes … that’s where hallucinations come from!)

That being said – if this is a tool for “everyday work” rather than “high end thinking,” then maybe filling in enough of the basic cases will just make them good enough for those use cases?
