How to handle different ChatGPT answers to the same question?

Hello,
I am a computational linguist and I want to compare the grammar abilities of ChatGPT with the grammar abilities of my rule-based parser. To do this, I will simply ask ChatGPT some grammar questions about a given sentence. In principle, these are questions whose answers can only be right or wrong.

But when I ask the same question several times, each time with a fresh context window, I get answers that are sometimes right and sometimes wrong.

I don't think I am the first person with this problem. What is the standard way to handle it? The simplest solution would be to ask the same question 20 times and count how often the answer is right or wrong.
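
The counting idea can be sketched in a few lines. This is only a minimal sketch: the function name `summarize` and the example numbers are made up for illustration, and the Wilson score interval is one common choice for reporting uncertainty on a small number of right/wrong trials, not an established standard for this task.

```python
import math

def summarize(verdicts, z=1.96):
    """Summarize repeated right/wrong judgments for ONE question.

    verdicts: list of booleans, True = the answer was judged correct.
    Returns (accuracy, lower, upper), where lower/upper are the bounds of
    the Wilson score interval (z=1.96 corresponds to roughly 95%).
    """
    n = len(verdicts)
    if n == 0:
        raise ValueError("need at least one verdict")
    p = sum(verdicts) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical outcome: 14 correct answers out of 20 fresh-context runs.
accuracy, low, high = summarize([True] * 14 + [False] * 6)
print(f"accuracy={accuracy:.2f}, interval=({low:.2f}, {high:.2f})")
```

With only 20 runs the interval is wide, so a larger number of repetitions per question gives a more stable comparison against a deterministic parser.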

Can someone tell me a paper with the same problem?

Have you tried the Playground where you can control the randomness of the model output via the temperature and top-p settings? It should help you control the variability in your responses.

https://platform.openai.com/playground
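
To give a feel for what the two settings do, here is a small sketch of the textbook definitions of temperature scaling and top-p (nucleus) filtering. It is not OpenAI's actual implementation; the logits and the three candidate tokens are invented for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (1.0, 0.2):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.8))
```

At a low temperature almost all probability mass sits on the top-scoring token, which is why answers vary less from run to run.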

P.S. If you want to check the responses to the same question 20 or even a thousand times, you can set up an eval for this case.

Thanks, that helps a lot. I will play with the temperature value.

Does ChatGPT become deterministic if the temperature is zero?

It will never be fully deterministic until we make some exciting progress on the hardware side. Until then, this is all the control we have, but it already helps a lot.
There used to be a little pop-up bubble with more information when moving the sliders but I think this has been removed. So, even though it sounds like it can be deterministic, it won’t be.
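
One commonly cited explanation, offered here as an assumption rather than something stated in this thread, is that floating-point addition is not associative, so parallel hardware that sums the same numbers in a different order can produce slightly different scores. The sketch below only demonstrates the arithmetic effect; `sum_in_order` is a made-up helper.

```python
def sum_in_order(values, order):
    """Add the values one by one in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

# Same numbers, different order, slightly different totals.
print(forward, backward, forward == backward)
```

When two candidate tokens have nearly equal scores, a difference this small can be enough to change which one is picked, even at temperature zero.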

Also, if the answer is not in your required format, you can write a simple script to discard the reply and resend the request. With the API (the Playground is just a user-friendly implementation of it), there are tons of options available.
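
A discard-and-resend script could look roughly like this. It is a sketch under assumptions: `ask` stands for whatever function sends the prompt to the model and returns its reply as a string (it is hypothetical and stubbed out here), and the required format is assumed to be a bare "yes" or "no".

```python
import re

# Assumed required format: a bare "yes" or "no", optionally with a full stop.
VALID = re.compile(r"^\s*(yes|no)\s*\.?\s*$", re.IGNORECASE)

def ask_until_valid(ask, prompt, max_attempts=5):
    """Call ask(prompt) until the reply matches VALID.

    Returns (normalized_answer, attempts_used). Replies in the wrong
    format are discarded and the request is sent again.
    """
    for attempt in range(1, max_attempts + 1):
        reply = ask(prompt)
        match = VALID.match(reply)
        if match:
            return match.group(1).lower(), attempt
    raise RuntimeError(f"no valid reply after {max_attempts} attempts")

# Stub standing in for a real API call, so the sketch runs offline.
canned = iter(["Well, that depends on the reading...", "Yes."])
answer, attempts = ask_until_valid(lambda prompt: next(canned),
                                   "Is 'Haus' the subject? Answer yes or no.")
print(answer, attempts)
```

If you report accuracy figures later, it is worth recording how many replies were discarded, since silently dropping them can make the model look more consistent than it is.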

Out of curiosity, would you care to share which programming language or tool you used?

I ask because I use Prolog daily and regularly create parsers for data, programming languages and Turing-complete languages, but not for human text.

Hello EricGT,
The rule-based parser is my own creation, which is not fully published yet. It is solely for the German language. The parser works with many interpretations of a single sentence and uses German grammar to sort out the false ones.

The upcoming paper describes the whole parsing process. That is the reason for my question: I want to compare the grammar abilities of LLMs against the parser.

This may help you get what you need.

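The original post noted:

"I want to compare the grammar abilities of ChatGPT with the grammar abilities of my rule-based parser."

The reply to me notes a slightly different need:

"I want to compare the grammar abilities of LLMs against the parser."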

While ChatGPT is based on an LLM, it is a chatbot with training beyond just the foundational LLM.

So to me these are two different questions.


With regard to your question:

Can someone tell me a paper with the same problem?

I passed your post to ChatGPT to see if it would list some papers and it did not.

ChatGPT used to have plugins that could search thousands of research papers and give a short list of relevant ones, but that is no longer working for me.

My suggestion would be to post a new question asking for just the papers. I used to try to keep up with papers related to LLMs, but there are hundreds being released daily and most are not worth the trouble of even reading the title.

You should also note your level of expertise when asking. Don't take this the wrong way, but I get the impression you are writing a thesis for a degree.

HTH