Foundational must read GPT/LLM papers

bruce.dambrosio · May 7, 2023, 10:04pm

Initializing a new thread on the very best, must read, well-written, papers on Large Language Model capabilities, limits, and use. We’re all balancing ‘do I continue my own work’ vs ‘do I keep up with the literature so my work stays cutting-edge and relevant, and I don’t waste my time re-inventing the wheel.’
Suggested criteria for posting - if, as an academically inclined practitioner, you were only going to read one paper on GPT and LLMs this month, would this be it?
Sorry to sound harsh, I really do encourage participation. In any case, I have no moderation authority, anyone can post anything they want!
Here’s my initial contribution. It’s from Microsoft Research, which has had full access to all of GPT4 since long before most of us. Eric Horvitz, one of the authors and CSO (or something like that) of Microsoft Research, is one of the smartest guys I know.
Warning: the paper is long.

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Note: There is a separate topic for discussions about this topic.

Blog pages

While the items in this section are not research papers, they are blog pages that note key research papers.

qrdl · May 7, 2023, 11:07pm

I really liked this paper as it talked about some of the capabilities for generating new science, as well as some interesting warnings about the dangers of “AI Malassist”, which is AI helping bad guys do bad things.

I think this is an important paper as it talks about how smaller models can perform as well as bigger models. Potentially relevant when you look at GPT3.5 being an order of magnitude cheaper than GPT4.

Good paper for everyone to start out with. Dated now, but provides a good foundation.

Just one page to this paper, but it makes a point:

This was a fun paper. Check out the simulation here - Reverie

Good twitter user to follow who regularly posts popular papers:

https://twitter.com/_akhaliq

qrdl · May 9, 2023, 4:10pm

Interesting paper on fine tuning llama without distilling from GPT. IBM

Similar paper

bruce.dambrosio · May 9, 2023, 6:35pm

Hey @qrdl, you posted a bunch of pprs! It will take me a while to catch up
On the scientific one, nice paper outlining the basics of engineering LLMs into larger applications.
I noticed the comment about segmenting the documentation into 7800 token windows for embedding. Hmm - has anyone tried embedding multiple (two is probably enough) offset sliding windows of a text, in the case where the most relevant piece is split over two windows in one or another of the windowings?
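To make the offset-windows question concrete, here is a minimal sketch. It assumes whitespace-split words stand in for tokens and uses a tiny window instead of 7800 tokens; `windows`, `two_windowings`, and `covering_windows` are hypothetical helper names, not anything from the paper. It only computes the spans; embedding each span is left out.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def windows(tokens: List[str], size: int, offset: int = 0) -> List[Span]:
    """Fixed-size windows whose first full window starts at `offset`.

    A non-zero offset adds a short leading window [0, offset) so no
    token is dropped.
    """
    spans: List[Span] = []
    if offset > 0 and tokens:
        spans.append((0, min(offset, len(tokens))))
    start = offset
    while start < len(tokens):
        spans.append((start, min(start + size, len(tokens))))
        start += size
    return spans


def two_windowings(tokens: List[str], size: int) -> List[Span]:
    """Plain windowing plus a second one shifted by half a window."""
    return windows(tokens, size, 0) + windows(tokens, size, size // 2)


def covering_windows(spans: List[Span], lo: int, hi: int) -> List[Span]:
    """Spans that fully contain the token range [lo, hi)."""
    return [s for s in spans if s[0] <= lo and hi <= s[1]]


tokens = "a b c d e f g h i j k l".split()  # 12 stand-in tokens
size = 4
plain = windows(tokens, size)
both = two_windowings(tokens, size)

# A passage over tokens [3, 6) straddles the plain boundary at index 4,
# so no plain window holds it whole; the shifted windowing does.
print(covering_windows(plain, 3, 6))
print(covering_windows(both, 3, 6))
```

With a half-window shift, any passage no longer than roughly half a window lands whole in at least one window of one of the two windowings, at the cost of embedding about twice as many chunks.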

On the ‘dangers’ - Hmm. Whom are ‘we’ protecting from what? Few people will have the actual physical resources to actually do any of the syntheses, and those who do probably don’t need gpt to figure out the synthesis. But that is just revealing my bias.

More importantly, I’ve been around long enough to know how quickly new technology behind small demos can top out.

Having said that, I’m spending sleepless days and nights pursuing adaptations of exactly these techniques! Nice layout of the essential process of building around LLMs.

qrdl · May 9, 2023, 7:20pm

On the ‘dangers’ - Hmm. Whom are ‘we’ protecting from what? Few people will have the actual physical resources to actually do any of the syntheses, and those who do probably don’t need gpt to figure out the synthesis. But that is just revealing my bias.

The problem of AI Malassist was, I think, best discussed in this video. This is not sci-fi AGI kill all humans stuff, but a real and present danger. I really encourage anyone and everyone interested in AI safety to watch this. I link to the critical part, but there are other earlier parts in the video that are related.

I am quite concerned about this, but I also am cognizant that this will be an excuse for regulatory capture, which I and pretty much everyone else would hate to see. It’s a difficult situation.

qrdl · May 9, 2023, 8:03pm

I’m posting papers which aren’t necessarily foundational, but ones I find particularly interesting. Hope that’s ok.

I find any paper which addresses the ‘chain of thought’ type planning/execution/reasoning to be very interesting, and the more I can read the better. They all give me lots of cool ideas.

If the forum lets me, I’ll re-edit and add more COT papers here to avoid too much bumping noise.

Good paper with some already known tips (sadly they don’t bother looking for prior art unless it’s in a paper, I guess), but still useful for lowering GPT4 costs (by 98%, they say).

SomebodySysop · May 10, 2023, 5:31am

Whose job does AI automate? (YouTube)
A.I. and Stochastic Parrots | FACTUALLY with Emily Bender and Timnit Gebru (YouTube)

Some thoughtful discussions on the real impact of AI on our society. I believe that the good will outweigh the bad, by a lot. But it is also important to recognize the potential problems now in order to develop the strategies to deal with them.

qrdl · May 10, 2023, 8:42pm

Yeah those were good papers to start the discussion. I was a bit skeptical of the job one, but at least someone started the ball rolling.

The technical report on PaLM 2, which was announced today at Google I/O, is out.

It doesn’t seem to do so well on HumanEval, which some consider a gold standard for Python code generation.

N2U · May 13, 2023, 5:09pm

I strongly recommend that people read the ReAct paper. It introduces a novel method for synergistically combining reasoning and acting in large language models (LLMs). Their approach not only outperforms existing baselines on various tasks but also tries to address common issues such as hallucination and error propagation in chain-of-thought reasoning.
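For anyone who wants the shape of the idea before reading the paper, here is a rough sketch of a reason-then-act loop. It is not the authors' code: `react_loop`, the `name[argument]` action syntax as parsed here, the toy `add` tool, and `scripted_llm` (a canned stand-in for a real model call) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, Optional


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model output (Thought/Action) with tool output (Observation)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model sees everything so far
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action:
            name, arg = action.group(1), action.group(2)
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            # Feeding the real tool result back is what grounds the next thought.
            transcript += f"Observation: {observation}\n"
    return None  # gave up without a final answer


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: acts once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I should add the numbers.\nAction: add[2,3]"
    observed = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The tool gave me the sum.\nFinal Answer: {observed}"


TOOLS = {"add": lambda arg: str(sum(int(x) for x in arg.split(",")))}

print(react_loop(scripted_llm, TOOLS, "What is 2 + 3?"))
```

Swapping `scripted_llm` for a real model call and `TOOLS` for search or lookup functions gives the general pattern; the prompt wording and few-shot examples the paper relies on are not reproduced here.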

qrdl · May 14, 2023, 3:04am

Sorry about the deleted posts! Perhaps the forum software can be updated not to show them?

Moving discussion about papers to Discussion thread for "Foundational must read GPT/LLM papers"

curt.kennedy · May 15, 2023, 1:58am

Develop GPT-3 from scratch with Andrej Karpathy, legendary founding member of OpenAI (and lead AI engineer at Tesla). The focus is on the original paper “Attention is All You Need”.

In the video, the model is trained on Shakespeare, but you can evolve it from there. All code is in the Colab notebook. You can generalize it from the bigram to the trigram, etc., to get closer and closer to GPT-level quality. You can also swap out the tokenizer for the one GPT-4 uses, etc.
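As a rough illustration of the bigram-to-trigram step, here is a count-based character n-gram sketch. Note the assumption: this is a plain frequency-table stand-in written with the standard library, not the neural model from the video or the Colab notebook, and `train_ngram` and `generate` are hypothetical names.

```python
import random
from collections import Counter, defaultdict
from typing import DefaultDict


def train_ngram(text: str, n: int) -> DefaultDict[str, Counter]:
    """Count which character follows each (n-1)-character context."""
    if n < 2:
        raise ValueError("n must be at least 2 (2 = bigram, 3 = trigram)")
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, nxt = text[i : i + n - 1], text[i + n - 1]
        counts[context][nxt] += 1
    return counts


def generate(counts, seed: str, length: int, n: int, rng: random.Random) -> str:
    """Sample up to `length` characters, each conditioned on the last n-1."""
    out = seed  # seed must be at least n-1 characters long
    for _ in range(length):
        options = counts.get(out[-(n - 1) :])
        if not options:  # context never seen in training: stop early
            break
        chars, weights = zip(*options.items())
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out


text = "to be or not to be"
bigram = train_ngram(text, 2)   # context = 1 character
trigram = train_ngram(text, 3)  # context = 2 characters

print(dict(bigram["t"]))    # what follows "t"
print(dict(trigram["to"]))  # what follows "to"
print(generate(trigram, "to", 20, 3, random.Random(0)))
```

Going from `n=2` to `n=3` only lengthens the context; the longer context makes samples more coherent but needs far more text before the counts mean anything, which is part of why the video moves on to attention.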

Then develop on his nanoGPT framework on an A100, but that’s cheating

qrdl · May 20, 2023, 9:15pm

Tree of Thoughts - Great paper, simple idea, described cleanly. More evals would have been nice, but it’s a start.

I actually proposed this idea a week ago, though I’ll admit the concept of prune/backtrack had not yet occurred to me, but from a DFS/BFS point of view it seems sort of obvious now.
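Here is a small sketch of the BFS-with-pruning view, under loud assumptions: `tot_bfs` is a hypothetical name, and `propose` and `score` are ordinary Python functions on a toy number puzzle standing in for the LLM calls that would generate and evaluate thoughts. It is not the paper's implementation.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a path of partial results ("thoughts" so far)


def tot_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_goal: Callable[[State], bool],
    beam: int = 2,
    max_depth: int = 5,
) -> Optional[State]:
    """Breadth-first search that keeps only the `beam` best states per level."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for child in candidates:
            if is_goal(child):
                return child
        if not candidates:
            return None
        # Pruning step: low-scoring branches are dropped here.
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return None


# Toy problem: starting from 1, reach TARGET using "+3" or "*2" steps.
TARGET = 11


def propose(state: State) -> List[State]:
    last = state[-1]
    return [state + (last + 3,), state + (last * 2,)]


def score(state: State) -> float:
    return -abs(TARGET - state[-1])  # closer to the target is better


def is_goal(state: State) -> bool:
    return state[-1] == TARGET


print(tot_bfs((1,), propose, score, is_goal))
```

A DFS variant would recurse into the best child first and backtrack when a branch scores below some threshold; the beam version above trades that for a fixed amount of work per level.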

github.com/openai/evals

eval approach: GPT4 provides alternative responses with reasoning and then has to pick the best one

opened 02:43AM - 14 May 23 UTC

qrdlgit

### Describe the feature or improvement you're requesting

I think it would be… interesting to see how GPT4 performs on more ideal scenarios, where COT is allowed extensively. In particular, GPT4 is allowed to think step by step along several different approaches to solve a problem and provide its reasoning. Some prompting should be done to ensure that it tries to look at the problem from different angles / viewpoints / perspectives. Then, in a second prompt, it's asked to pick from the approach it thinks is most likely to be correct. If there has been a paper or effort done on evaluating GPT4 for something like this, I'd greatly appreciate a link. :)

### Additional context

_No response_

What’s interesting is this is somewhat thematically related to simbiotico, the mindmapping tool I built for GPT4. There are a number of alternative solutions, btw,

codie · May 20, 2023, 9:41pm

These are going to be a lot to read through. A lot of times I’ll read a paper and then find out it was not even close to anything I’ve needed. I’ve incorporated AI into my paper/report reading regimen.

If I am on mobile I use Ask Your PDF.
If I am on desktop I use LightPDF.

They help me interrogate the documents a lot faster than me trying to skip over parts and wondering if maybe I didn’t search it thoroughly enough. It’s rare that I find a paper that I need to read 100% of. So for the people in the same boat, I suggest using those to increase your research throughput.

bruce.dambrosio · May 20, 2023, 10:45pm

Know of a comparable desktop tool for Linux? Seems like a great idea: too many papers, too little time.

Maybe I’ll have to build one

bruce.dambrosio · May 21, 2023, 6:41pm

Answering my own question: there is a plugin for this.
AskYourPdf
Also LinkReader

N2U · May 25, 2023, 5:20pm

An earlier paper with very similar ideas to the “tree of thoughts” paper:

qrdl · May 25, 2023, 9:02pm

They conflate AI misuse/malassist with autonomous fantasy, but otherwise a foundational / must read.

TBH, you could just read this one page and gain about 90% of the utility of this paper.

bruce.dambrosio · May 31, 2023, 7:43pm

Limits on Compositionality

tl;dr:
LLMs just do pattern-matching on linearized short multi-step logic chains; they can’t scale reasoning very far.

@jwatte will probably find much to agree with here.
I don’t find much to disagree with…

bruce.dambrosio · May 31, 2023, 9:57pm

Thinking further about the paper, I believe the lack of ‘logical reasoning’ on language, as compared to the incredible capability at code generation, speaks more to how much logical thinking exists in the training texts than it does to any inability of LLMs.
Maybe we should increase the representation of Principia Mathematica and Plato in the training corpus and decrease the representation of your favorite flame blog and troll posts.

jwatte · June 1, 2023, 12:00am

You’re not wrong

I think this will get further in pattern matching on logic that it’s trained on, but it still doesn’t have the ability to extrapolate to new patterns.
(It can extrapolate within existing patterns, sometimes to great effect, and sometimes … that’s where hallucinations come from!)

That being said – if this is a tool for “everyday work” rather than “high end thinking,” then maybe filling in enough of the basic cases will just make them good enough for those use cases?
