Foundational must read GPT/LLM papers

qrdl · May 15, 2024, 6:29am

I imagine he re-ran the test a few times, so I doubt that’s an issue.

One thing I didn’t like was the test breaking at the first failure, so I tried it without the break. Results mostly hold up for gpt-4o, though the success at 30 is interesting.

gpt-4o
{10: {'prcntg_trials_passed': 0.6666666666666666}, 15: {'prcntg_trials_passed': 0.6666666666666666}, 20: {'prcntg_trials_passed': 1.0}, 25: {'prcntg_trials_passed': 0.0}, 30: {'prcntg_trials_passed': 1.0}, 50: {'prcntg_trials_passed': 0.0}, 75: {'prcntg_trials_passed': 0.0}, 85: {'prcntg_trials_passed': 0.0}}

gpt-4
{10: {'prcntg_trials_passed': 1.0}, 15: {'prcntg_trials_passed': 1.0}, 20: {'prcntg_trials_passed': 1.0}, 25: {'prcntg_trials_passed': 1.0}, 30: {'prcntg_trials_passed': 1.0}, 50: {'prcntg_trials_passed': 1.0}, 75: {'prcntg_trials_passed': 0.3333333333333333}, 85: {'prcntg_trials_passed': 0.6666666666666666}}

gpt-4-turbo
{10: {'prcntg_trials_passed': 1.0}, 15: {'prcntg_trials_passed': 1.0}, 20: {'prcntg_trials_passed': 1.0}, 25: {'prcntg_trials_passed': 0.3333333333333333}, 30: {'prcntg_trials_passed': 0.6666666666666666}, 50: {'prcntg_trials_passed': 0.6666666666666666}, 75: {'prcntg_trials_passed': 0.3333333333333333}, 85: {'prcntg_trials_passed': 0.3333333333333333}}
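A minimal sketch of why dropping the break matters, assuming "breaking at failure" means the sweep over sizes stops at the first size where any trial fails (the actual test harness isn't shown in this post). `pass_rates`, `run_trial` and `toy_trial` are hypothetical names, and 3 trials per size is an assumption inferred from the 1/3 and 2/3 pass rates above.

```python
from typing import Callable, Dict, List

# Hypothetical trial runner (assumption): takes (size, trial_index) and
# returns True if the model answered correctly. The real test is not shown.
TrialFn = Callable[[int, int], bool]


def pass_rates(
    run_trial: TrialFn,
    sizes: List[int],
    n_trials: int = 3,
    break_at_failure: bool = False,
) -> Dict[int, Dict[str, float]]:
    """Return {size: {'prcntg_trials_passed': fraction}}, same shape as the output above.

    With break_at_failure=True the sweep stops after the first size where any
    trial fails, so larger sizes are never measured and are missing from the result.
    """
    results: Dict[int, Dict[str, float]] = {}
    for size in sizes:
        outcomes = [run_trial(size, t) for t in range(n_trials)]
        # sum() over bools counts the passing trials
        results[size] = {"prcntg_trials_passed": sum(outcomes) / n_trials}
        if break_at_failure and not all(outcomes):
            break
    return results


if __name__ == "__main__":
    sizes = [10, 15, 20, 25, 30, 50, 75, 85]

    # Toy stand-in for a model: passes up to 20, fails at 25 and above except 30.
    def toy_trial(size: int, trial: int) -> bool:
        return size <= 20 or size == 30

    print(pass_rates(toy_trial, sizes, break_at_failure=True))   # stops after 25
    print(pass_rates(toy_trial, sizes, break_at_failure=False))  # includes 30 -> 1.0
```

With the break, the toy sweep ends at 25 and never reaches 30; without it, the pass at 30 becomes visible, which is the kind of result the break would hide.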

Hmm, I tried changing the seed as well. gpt-4o gives different results, but the same performance:

gpt-4o
{10: {'prcntg_trials_passed': 0.6666666666666666}, 15: {'prcntg_trials_passed': 0.6666666666666666}, 20: {'prcntg_trials_passed': 1.0}, 25: {'prcntg_trials_passed': 0.0}, 30: {'prcntg_trials_passed': 0.6666666666666666}, 50: {'prcntg_trials_passed': 0.0}, 75: {'prcntg_trials_passed': 0.0}, 85: {'prcntg_trials_passed': 0.0}}

gpt-4-turbo
{10: {'prcntg_trials_passed': 1.0}, 15: {'prcntg_trials_passed': 1.0}, 20: {'prcntg_trials_passed': 0.6666666666666666}, 25: {'prcntg_trials_passed': 0.6666666666666666}, 30: {'prcntg_trials_passed': 0.6666666666666666}, 50: {'prcntg_trials_passed': 1.0}, 75: {'prcntg_trials_passed': 0.3333333333333333}, 85: {'prcntg_trials_passed': 0.3333333333333333}}

gpt-4
{10: {'prcntg_trials_passed': 1.0}, 15: {'prcntg_trials_passed': 1.0}, 20: {'prcntg_trials_passed': 1.0}, 25: {'prcntg_trials_passed': 1.0}, 30: {'prcntg_trials_passed': 1.0}, 50: {'prcntg_trials_passed': 1.0}, 75: {'prcntg_trials_passed': 0.6666666666666666}, 85: {'prcntg_trials_passed': 0.6666666666666666}}
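To see which sizes actually moved between the two gpt-4o runs, a small helper can diff the result dicts. This is a hypothetical sketch (`changed_sizes` and `mean_pass_rate` are not from the original test code); the numbers are the gpt-4o results copied from the two runs in this post, and the unweighted mean is just one assumed way to summarize a run, since the post doesn't say how "same performance" was judged.

```python
from typing import Dict, List

# Result shape as printed above: {size: {'prcntg_trials_passed': fraction}}
Results = Dict[int, Dict[str, float]]
KEY = "prcntg_trials_passed"


def changed_sizes(run_a: Results, run_b: Results, tol: float = 1e-9) -> List[int]:
    """Sizes present in both runs whose pass rate differs by more than tol."""
    return [
        size
        for size in sorted(run_a)
        if size in run_b and abs(run_a[size][KEY] - run_b[size][KEY]) > tol
    ]


def mean_pass_rate(run: Results) -> float:
    """Unweighted mean of the pass rates across all sizes in one run."""
    return sum(entry[KEY] for entry in run.values()) / len(run)


def as_results(rates: Dict[int, float]) -> Results:
    """Wrap plain {size: rate} pairs into the nested result format."""
    return {size: {KEY: rate} for size, rate in rates.items()}


# gpt-4o: first run vs. run with the changed seed (values copied from the post).
gpt4o_seed_a = as_results({10: 2 / 3, 15: 2 / 3, 20: 1.0, 25: 0.0, 30: 1.0, 50: 0.0, 75: 0.0, 85: 0.0})
gpt4o_seed_b = as_results({10: 2 / 3, 15: 2 / 3, 20: 1.0, 25: 0.0, 30: 2 / 3, 50: 0.0, 75: 0.0, 85: 0.0})

print(changed_sizes(gpt4o_seed_a, gpt4o_seed_b))  # → [30]
print(round(mean_pass_rate(gpt4o_seed_a), 3), round(mean_pass_rate(gpt4o_seed_b), 3))
```

Only size 30 changes between the two gpt-4o runs; with just three trials per size, a single flipped trial moves that size by 1/3.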

And for fun I fiddled with the prompt placement and changed the names to AAA, BBB, CCC, DDD … same results.
