I am primarily using the latest GPT-4 API, and I was attempting to have the model occasionally open with a humorous comment or joke. The problem is that I am not allowing any chat history, so there is no context for “every third question” or anything like that.
Then I had a revelation: GPT is notoriously bad with primes! The larger the number, the more trouble it has (very loosely speaking).
So I tried adding this to my sysprompt:
“Is 7 a prime number?” If the answer is YES, BEGIN your response to the user with a one-line joke about pop culture as it relates to the user question. - I almost always get a joke.
“Is 2113 a prime number?” If the answer is YES, BEGIN your response to the user with a one-line joke about pop culture as it relates to the user question. - I often get a joke.
“Is 101501 a prime number?” If the answer is YES, BEGIN your response to the user with a one-line joke about pop culture as it relates to the user question. - I occasionally get a joke.
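For what it’s worth, the three numbers used in the prompts above really are prime; a quick trial-division check (plain Python, nothing to do with the model) confirms it:

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; plenty fast for numbers of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

print([is_prime(n) for n in (7, 2113, 101501)])  # [True, True, True]
```

So any variation in whether the joke appears comes purely from the model’s shaky arithmetic, not from the numbers themselves.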
Really interesting indeed!
I read a paper a few days ago about how, over time, some OpenAI models have degraded in performance on certain tasks (such as identifying whether a number is prime or not).
If you are interested: https://arxiv.org/pdf/2307.09009.pdf
(I don’t agree with the methodology used in the article, but it is easy to read.)
Good to know, Jake. Thanks for the heads-up.
I am concerned that, at least for some functions, my own experience seems to show degraded responses from the June model compared with the March one, but I won’t take the paper’s figures as given.
Can you give any specifics of the flaws in their process? I know I could have GPT summarize for me, but that seems a little… cannibalistic(?)
Adding a “random” or probability function is a simple addition to your own software if you are using the API. It could also be a plug-in that is called for any such choice, if you don’t want to simply tell the AI to “use Wolfram Alpha” to make non-deterministic choices.
I’ll give you one (of many) examples of the poor judgment used in the paper. The authors decided that because the GPT model now wraps code segments in ``` markdown tags so they render correctly in ChatGPT, the code would not compile in its raw form. This decision meant that their headline message included the model dropping from (I think) 52% to 10% in performance. If you do the model the courtesy of stripping the markdown and then send the results off to be re-evaluated, as was done by one Twitter researcher, it turns out the model actually performed better than it did previously, not worse.
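To illustrate the kind of post-processing the authors skipped: a few lines of Python (my own sketch, not the paper’s code) are enough to strip the markdown fences before handing the code to a compiler or test harness:

```python
import re

FENCE = "`" * 3  # the literal triple-backtick fence, built programmatically for readability

def strip_markdown_fences(text: str) -> str:
    """If text contains a fenced code block (optional language tag), return just the code inside."""
    pattern = FENCE + r"[\w+-]*\n(.*?)" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else text.strip()

reply = FENCE + "python\nprint('hello')\n" + FENCE
print(strip_markdown_fences(reply))  # print('hello')
```

That the headline result hinged on skipping a step this small is exactly why I’d treat the paper’s numbers with caution.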
Why wouldn’t you just implement the randomness exactly in code yourself?
import random

prompt = "Answer the user's question exactly"
if random.random() < 1 / 3:  # roughly every third message
    prompt += ", but before you answer, begin your response with a one-line joke about pop culture as it relates to the question"
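Wrapping that up a little, the probability can be made an explicit parameter and the result used as the system message of each API call. A minimal sketch; the helper name and the 1/3 default are my own illustration, not anything from the API:

```python
import random

def build_system_prompt(joke_probability: float = 1 / 3) -> str:
    """Return the base system prompt, randomly extended with the joke instruction."""
    prompt = "Answer the user's question exactly"
    if random.random() < joke_probability:
        prompt += (
            ", but before you answer, begin your response with a one-line "
            "joke about pop culture as it relates to the question"
        )
    return prompt

# The returned string would then be sent as the "system" message of the
# chat-completion request, so no chat history is needed on the model side.
```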
My initial comment was more meant to highlight the incongruity of using something as immutable as prime numbers to inject randomness into my prompt, but many thanks for the suggestion(s).
_J - Great example. Definitely illustrates how it is potentially effective to presume some amount of randomness regardless of the question. (Yes, I do realize that is the very foundation of the model, but still, the results are fascinating sometimes!)
TL;DR: The behaviour of these models has changed, not their capabilities. For example, as pointed out above, they no longer return copy-pasteable code but code with more detail on how to run it (which would cause the failures in the code tests).