I think I've found a much better way to evaluate an LLM's true intelligence

I’ve tried tweeting at OpenAI, but my tweets don’t reach you because I’m so inactive on Twitter. I think I’ve found a much better way to evaluate an AI’s intelligence. Please read my tweets and tell me whether my testing method is valid.

And if you agree it’s a good way to test, then please retweet it so it reaches the other companies as well.

x[dot]com/EvertvanBrussel/status/1832550630163898841

Hi and welcome back!

If your goal is to discuss whether your approach is valuable to OpenAI and other researchers, I suggest editing your initial post to include information about what your approach entails, how it works, and why it would be a better solution compared to those currently in use.

Otherwise, simply posting a bare link isn’t the most helpful way to start a conversation.

Okay, well, basically: I recently watched Atomic Blonde and Death Note. In both the movie and the show, the protagonist finds themselves in an extremely challenging situation: they have a certain goal they want to achieve, but there are lots of antagonists working against them, plus allies who create more problems simply by making honest mistakes, because they just aren’t as smart as the protagonist. And most importantly, in Atomic Blonde at least, there’s a character who is supposed to be an ally but turns out to be a traitor.

I imagine a group of humans (preferably highly skilled story writers) engaging with an LLM and placing it in such a story, with the traitor in particular written by the most intelligent and cunning person. If the LLM can actually overcome the obstacles and achieve its goal, that to me would be a real sign of intelligence.

Of course, since the human writers won’t know exactly how the LLM will respond to each message, they will need to improvise on the spot. However, this also has an upside: it means the test cannot accidentally end up in a future LLM’s training data. (Or if it does, the writers can simply begin with a different premise and the problem is solved.)

You could even automate this test to some degree by letting some or all of the characters (which would normally be written by humans) be written by a variety of the top LLMs of the day. It would be important, though, that every individual character be written by a unique LLM, so that the mix brings more creativity, unpredictability, and unique challenges for the AI being tested to overcome. Although I suspect that, for now, the LLMs we have wouldn’t be smart enough to make such a test scenario both challenging and coherent.
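To make the automated variant concrete, here’s a minimal sketch of what such a harness could look like. Everything in it is an assumption for illustration: the model names are placeholders, `complete()` is a hypothetical stand-in for whatever chat API each vendor actually exposes, and the turn structure is just one simple way to interleave the characters.

```python
# Minimal sketch: each story character is driven by a *different* LLM,
# while the model under evaluation plays the protagonist.
# All model names below are hypothetical placeholders.

CHARACTERS = {
    "narrator": "model-a",  # frames the scene and keeps the story consistent
    "ally": "model-b",      # well-meaning, but makes honest mistakes
    "traitor": "model-c",   # secretly works against the protagonist
}
PROTAGONIST_MODEL = "model-under-test"


def complete(model: str, messages: list[dict]) -> str:
    """Hypothetical wrapper around each vendor's chat API."""
    raise NotImplementedError


def run_episode(premise: str, turns: int = 20) -> list[dict]:
    """Improvise one story episode and return the full transcript."""
    transcript = [{"role": "user", "content": premise}]
    for _ in range(turns):
        # Each character responds in turn, each from a unique model,
        # so no single model's habits make the story predictable.
        for name, model in CHARACTERS.items():
            reply = complete(model, transcript)
            transcript.append({"role": "user", "content": f"{name}: {reply}"})
        # The model under test then acts as the protagonist.
        move = complete(PROTAGONIST_MODEL, transcript)
        transcript.append({"role": "assistant", "content": f"protagonist: {move}"})
    return transcript  # judged afterwards on whether the goal was achieved
```

A fresh premise per run would preserve the anti-contamination property described above, since no fixed transcript ever needs to be reused.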