How to do repeatable testing for ChatGPT prompts?

I am going in circles with gpt-3.5-turbo. Every day I make some modification to the system prompt to prevent one of the following behaviours:

  1. adding greetings and filler before the answer (“Sure, I can tell you how to do X”) instead of outputting the answer directly
  2. wrapping the answer in quotation marks (“hello”) instead of outputting it bare (hello)
  3. adding unrequested explanations (“This is how X works”) that pad out the answer

After making the modification, I test it on a couple of sample inputs, it seems to work fine, and I commit it. Then the next day someone comes up with a new sample input for which the system prompt doesn’t work - ChatGPT doesn’t follow the instruction I gave. Now I have to modify the prompt and re-test all my sample inputs all over again.

So, my question: is there a repeatable testing harness for ChatGPT - one where I can add my sample inputs, run the prompt on all of them, and check that the outputs satisfy some requirements (like the three requirements above)?

I can write this myself, but I’m hoping someone else has thought through this first :slightly_smiling_face:
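To be concrete, here is a rough sketch of what I imagine (Python; `ask` stands in for whatever function actually calls the model, and the regexes are just my first guesses at detecting the three behaviours):

```python
import re


def violations(output: str) -> list[str]:
    """Check one model output against my three requirements."""
    problems = []
    s = output.strip()
    # 1. Filler/greeting before the answer ("Sure, ...", "Certainly, ...")
    if re.match(r"^(sure|certainly|of course|here('s| is))\b", s, re.IGNORECASE):
        problems.append("starts with filler")
    # 2. Whole answer wrapped in quotation marks (straight or curly)
    if len(s) >= 2 and s[0] in '"\u201c' and s[-1] in '"\u201d':
        problems.append("wrapped in quotes")
    # 3. Unrequested explanation tacked on ("This is how X works" style)
    if re.search(r"\bThis is (how|because)\b", s):
        problems.append("contains explanation")
    return problems


def run_suite(ask, system_prompt: str, cases: list[str]) -> dict[str, list[str]]:
    """Run every sample input through `ask` and collect violations per case.

    `ask(system_prompt, user_input)` is a placeholder for whatever thin
    wrapper you have around the chat completions call.
    """
    return {case: violations(ask(system_prompt, case)) for case in cases}
```

A passing run would be one where every case maps to an empty violation list; any non-empty list tells me which requirement the new prompt broke.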

Follow-up question: ChatGPT answers are non-deterministic, so sometimes an underlying issue is only revealed on the fifth or sixth submission of the same prompt. Is there any way to surface these issues quickly, on the first or second test run? By issues, I mean instances where ChatGPT doesn’t follow the system prompt’s instructions (the three examples above).