Foundational must-read GPT/LLM papers

This is pretty close to what I mentioned above regarding the quality of human ‘reasoning’ in most of the training corpus:

Entailment as Robust Self-Learner

Basically, training on text that is logically entailed by prior text delivers much improved performance.

Duh. Anyone surprised?


A significant step forward in model inference performance and context.


@bruce.dambrosio this is really good work! Might I suggest you rethink this as a wiki so you might be better able to organize papers by category?

You can turn a topic or post into a wiki here. I’m not sure off-hand if there is a trust-level requirement or not.

After I drop my kid off at school I’ll take another look.


This post is now a wiki and anyone should be able to edit it.

You can turn a topic/post into a wiki by clicking on the three dots menu item for the topic/post, clicking the :gear: then selecting make wiki.

Please see What is a wiki post? for more details.

You could just make your original topic post a wiki, but I think it might make more sense for you to start a new topic and organize it with some structure and bring all of these papers together more concisely.

I’d also include them as simple links rather than as “onebox” links.

I’d then also add a second post to the wiki outlining the “rules” for editing the wiki in terms of structure and how you want people to add additional papers for inclusion or consideration (maybe have an “Uncategorized” category for people to just drop links to papers, etc.).

One thing I think would be nice is a format like this for each paper:

Paper Title (as a link to the source)

  • Published date
  • List of authors, as links to their pages

Summary written by ChatGPT

  • Link to a separate topic created for discussion of this particular paper

Discussion summary

Summary of the topic discussion, generated by Discourse AI.

I think this type of structure would keep the wiki post very focused and high-quality.

Feel free to use any of this or none—I think this should be your baby and I’ll be very happy to support you and it in any way you want.


This topic was originally created as a spinoff from conversations between @qrdl, @bruce.dambrosio, and myself, and because the “sharing papers” PM between me and @bruce.dambrosio was reaching 100+ posts.

I think it’s a good idea to turn this into a wiki section after a bit of “off-topic” cleanup.

Let’s discuss


It is a wiki now.

Click the button below the first post.

You can move your references to papers into the first post, then delete your reply.

Working to get a Table of Contents added to the topic.

No need to include more people for the Table of Contents; two admins and Bruce have the needed info from me via DM.


The Orca paper. An oldie but a goodie. I especially like the example cases at the end for exercising my latest prompt ideas…


This was the paper that changed the entire way I create prompts and instruct the model. Great post!


I’ve recently been reading about toxic language detection & moderation using AI, and I found these papers particularly interesting due to their different approaches to the generation of training datasets.

In this study, the authors introduce HateBERT, a modified BERT model specifically designed for detecting abusive language in English, trained on data from banned Reddit communities. Not only does HateBERT surpass general BERT models in identifying hate speech, but the research also delves into how training data influences the adaptability of such models across various datasets.

The next paper tries to address the issue of toxic language detection systems inaccurately flagging mentions of minority groups.

This study introduces “ToxiGen”, a large dataset produced with GPT-3 containing both subtly toxic and benign statements; initial tests showed that refining toxicity classifiers with ToxiGen significantly enhances their accuracy on human-written content.


This was pretty interesting to me. Something on the nature of intelligence and consciousness.


Probably not ‘foundational’, but the idea of using multiple embeddings of a prompt, sending the retrieval results from each as part of a separate GPT query, and then ensembling the GPT query results is certainly one I hadn’t thought of.
It seems relevant only for multiple-choice-type queries, or maybe you could feed all the results back to the LLM for a final compilation?
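The ensemble idea above can be sketched roughly as follows. This is a minimal illustration, not the paper’s implementation: `embed`, `retrieve`, and `llm_answer` are hypothetical toy stand-ins for real embedding, vector-search, and GPT calls.

```python
from collections import Counter

# All functions below are toy stand-ins -- in practice these would be
# real embedding, nearest-neighbour retrieval, and GPT API calls.
def embed(prompt, variant):
    # In practice: different embedding models or prompt paraphrases.
    return f"{variant}:{prompt}"

def retrieve(embedding):
    # In practice: nearest-neighbour search over a document store.
    return [f"doc matching {embedding}"]

def llm_answer(prompt, context):
    # In practice: a GPT call given the prompt plus retrieved context;
    # here, a deterministic toy that always answers "B".
    return "B"

def ensemble_answer(prompt, variants=("v1", "v2", "v3")):
    """Embed the prompt several ways, retrieve per embedding, query the
    LLM once per retrieval set, then majority-vote the answers."""
    answers = [llm_answer(prompt, retrieve(embed(prompt, v)))
               for v in variants]
    # Majority voting is why this fits multiple-choice queries best.
    return Counter(answers).most_common(1)[0][0]

print(ensemble_answer("Which option is correct?"))  # -> B
```

For open-ended questions, the final voting step could instead concatenate all the per-embedding answers and pass them back to the LLM for a final compilation, as suggested above.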


Liquid NNs are one of the most impressive-looking potential steps forward I’ve seen.


It looks interesting, but can it be applied to a transformer network?

1 Like

Similar to the “tree of thoughts” paper, here’s “graph of thoughts”:

Thoughts and comments can be found in the discussion topic over here.


Welcome posters of papers!

I want to remind forum members that posts to this thread should be the links to the paper and a brief summary of the paper only.

Please use the accompanying discussion breakout thread to discuss papers of interest.


Some of what I run across isn’t necessarily ‘foundational’, and the depth isn’t always extensive (at least math-wise), but it addresses an idea I’m interested in that perhaps doesn’t lend itself to formal exploration, when few other papers will. Probably because it’s hard to measure precisely and conclude concretely.

This is one example:

It’s particularly relevant, I think, because of GPT-4’s visual capabilities, where you can generate code from UML models.


The biggest takeaway I got from this was how it used the Object Constraint Language (OCL) to enhance GPT-4’s code-generation capability. When I go to experiment with it, I’ll need to read that section more carefully to see what my skimming missed.

They used PlantUML (text-based UML), so some of that is less useful now given image support, but it is interesting. Maybe converting UML diagrams to PlantUML first would work better than going straight to code, especially if the diagrams are extensive and disparate.
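To make the PlantUML + OCL idea concrete, here is a hedged sketch of how such a prompt might be assembled. The class model, the constraint, and the prompt wording are all illustrative, not taken from the paper; the actual GPT call is omitted.

```python
# Hypothetical example: build a code-generation prompt from a text-based
# UML model (PlantUML) plus an OCL constraint the generated code should
# enforce. Only the prompt assembly is shown, not the model call.
PLANTUML = """\
@startuml
class Account {
  -balance: Integer
  +withdraw(amount: Integer)
}
@enduml
"""

# Illustrative OCL precondition on the withdraw operation.
OCL = "context Account::withdraw(amount: Integer) pre: amount <= self.balance"

def build_prompt(uml: str, ocl: str) -> str:
    """Combine a PlantUML model and an OCL constraint into one prompt."""
    return (
        "Generate Python code implementing the following UML model.\n\n"
        f"{uml}\n"
        "The generated code must respect this OCL constraint:\n"
        f"{ocl}\n"
    )

prompt = build_prompt(PLANTUML, OCL)
print("withdraw" in prompt and "pre:" in prompt)  # -> True
```

The appeal of this route is that PlantUML is unambiguous text, so converting a diagram image to PlantUML first gives the model a clean intermediate representation before code generation.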