Getting inconsistent results with same prompts

Greetings from Australia!

I am trying to build a GPT-3 powered debate moderator, and one of its features is that it needs to take two seemingly opposing arguments and identify the relationship between them. Below are the prompts I sent to davinci_instruct (temp=0, top_p=1).

The screenshot on the left had the desired result because GPT took the two seemingly opposing statements and synthesized a new idea that established a causal link between them, namely that dogs want to protect because of their loyalty. Awesome!

The picture on the right has the exact same prompt but phrased in a less “polite” fashion. I reproduced both a few times with the same results. Saying “please” gets me the desired answer.

I then tried formatting the arguments as one claim plus three warrants (reasons that back the claim), and asked essentially the same question as in the first screenshot.

I didn’t get the answer I wanted. Maybe the prompt made things more confusing, but the answer is technically correct in this context nonetheless. So I tried again with basically the same prompt as the first time, but with different IDs for the statements.


This is essentially the same prompt as the first one, and I was even “polite”. But I didn’t get the same dialectical synthesis as with the first prompt…

Can anyone weigh in on this? Any help is appreciated!


Hey @philhosophy, I suggest you use finetuning instead if you want more consistent results. With a prompted model alone you would have to add examples to the input to steer how it performs completions. For finetuning, you can use the prompted model to generate the training data, then filter out the examples that don’t fit the behavior you’re seeking.
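A minimal sketch of the "generate, then filter" step described above, assuming the generated completions have already been collected as (prompt, completion) pairs. The filter rule (`keep_example`, the word `because`, the length threshold) is a made-up stand-in for whatever behavior you are actually seeking, and the one-JSON-object-per-line layout with `prompt` and `completion` keys is the format the legacy fine-tuning tooling accepted; check the current docs before relying on it.

```python
import json

def keep_example(completion, required_words=("because",), min_words=8):
    """Hypothetical filter: keep completions that are long enough and state a causal link."""
    words = completion.lower().split()
    return len(words) >= min_words and any(w in words for w in required_words)

def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    return "\n".join(json.dumps({"prompt": p, "completion": c}) for p, c in examples)

# Made-up outputs standing in for completions generated by the prompted model.
generated = [
    ("Describe the link between (1) and (2).",
     "Dogs protect their owners because they are loyal to them."),
    ("Describe the link between (1) and (2).",
     "They are different."),
]

kept = [(p, c) for p, c in generated if keep_example(c)]
print(to_jsonl(kept))
```

In practice the filter would probably be a manual review or a stricter rule; the point is only that examples which miss the desired behavior never reach the training file.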


Hi @philhosophy :wave:

This is to be expected. It is also mentioned in the OpenAI docs.

The API is stochastic by default which means that you might get a slightly different completion every time you call it, even if your prompt stays the same. You can control this behavior with the temperature setting.
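To illustrate what the temperature setting does, here is a toy sketch of temperature sampling over made-up token scores. It is not the API's actual implementation; `sample_token` and the `logits` values are hypothetical. It assumes the common convention that temperature 0 means always picking the highest-scoring token.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.9, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(0)

greedy = {sample_token(logits, 0, rng) for _ in range(20)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(200)}

print(greedy)   # only index 0 is ever chosen at temperature 0
print(sampled)  # at temperature 1.0 several indices show up
```

Note that at temperature 0 the same prompt should give the same completion, so differences between two prompts come from the wording of the prompts rather than from sampling.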

I’m not 100% sure how to fix this, but being very intentional and specific with the prompt, and providing examples of how you want it to complete, may help.

E.g., if you want a detailed completion, you can write the prompt as:

Describe the link between (1) and (2) in detail


or, to control the length:

Describe the link between (1) and (2) in 30-40 words.

Hope it helps.


Hey Philhosophy

another philosopher here using GPT3 for philosophy as well

Isn’t it awesome to finally have a patient, unemotional interlocutor to test philosophical theses till your wife yells at you? lol ;p this is the best tool humans have ever made

to answer your questions

yes, it is stochastic, but this is not really the issue. it will only give very minor fluctuations when you are on the cusp between two semantic relations (say, as a hypothetical example, candidates at x.1% and x.2%, where the sampling might choose one or the other)

yes, fine tuning would help in theory, but not really in practice, because on a prompt like this you do not want to pollute the semantic relations. you want its true opinion, so to speak. if you keep using dog examples, then no argument about dogs, or even mammals or cats, will ever be safe again (eg): it will all be influenced by your arbitrary examples. make sure to randomly choose different points of the argument, never accidentally comparing 1 and 3 too much, for example

in this case, keeping it simple and literally saying “please” accidentally tied into a semantic of deeper grammar and more well-reasoned, explicative responses. I have seen how changing one seemingly insignificant word can change an entire branch of hidden semantics (in ethics, for example, the right action suddenly becomes the good action, or vice versa, which is a large distinction lost on GPT3)

the short answer is keep it simple and just say please (and i use davinci, not instruct - i don’t see instruct being any better, not for philosophy - i’ve spent a thousand hours at this… literally)

for a prompt like this, where you don’t want to taint the answer with your prompt, you need to keep the examples light - DON’T use that big C1W2E4.Wittgenstein format, you are confusing it :slight_smile:

also, this does not appear to be syllogistic. GPT3 can and does and will think syllogistically out of the box, so that will mess with your random analyses between propositions as well - GPT3 is looking for structure, so if you don’t want that structure, don’t give it any :slight_smile:

so keep it simple, fewer words is better, and give it a very good, thorough example - just one really good one should do for what you want to do - keep temperature at zero of course - use the QnA settings from their example, that should be best for you. learn to use the probabilities feature to see what it wanted to say, and give it examples to move the numbers up as you want (you cannot move the numbers down, only up: give it positive reinforcement, no negatives)
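a rough sketch of what "one good example in Q&A format" could look like when assembled in code. `build_qa_prompt` and the example statements are made up for illustration, and the `Q:` / `A:` labels are an assumption modelled on the Q&A preset, not a required format.

```python
def build_qa_prompt(example_q, example_a, question):
    """Assemble a Q&A-style prompt: one worked example, then the open question."""
    return f"Q: {example_q}\nA: {example_a}\n\nQ: {question}\nA:"

# Hypothetical statements; swap in arguments from your own debate.
prompt = build_qa_prompt(
    "What links (1) 'Dogs are loyal' and (2) 'Dogs guard their owners'?",
    "Dogs guard their owners because they are loyal to them.",
    "What links (1) 'Cats are independent' and (2) 'Cats hunt alone'?",
)
print(prompt)
```

the prompt ends on an open `A:` so the model continues with an answer in the same shape as the example, which is also why one-word example answers tend to produce one-word completions.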

if you have any questions at all feel free to msg me! i’d be happy to help


A Radically Minimalist Approach, I love it

Hey @joshbachynski! Not a philosopher here, more a philosophist :stuck_out_tongue:

I appreciate immensely your detailed insights and advice, as well as @m-a.schenk @asabet and @sps! I will definitely implement all of them to see how well GPT can do dialectics.

You guys are the reason I feel motivated to continue my project, thanks for being an awesome community and helping a newbie like myself. Rest assured I will have many more questions for you guys in the future xD

Lots to read, lots to do. Time to get to work! :smiley:


Hey @joshbachynski! I took your advice and tried out the Q&A format, giving the bot a few good examples. I gave it the first two examples and it did the rest.

Since I only gave one-word answers in the examples, it followed the format and gave all one-word answers. They are all very sensible answers and are kinda cool, because they make GPT seem like a wise old man giving incomplete yet insightful answers for the student to ponder himself xD

Just to get some clarification on a piece of your earlier advice: by probability do you mean the Top_P parameter? Specifically, do you mean I should start low and only move up when I get answers that are undesirable? Or do you mean the “show probabilities” feature? As in, I should aim to see the “right” answers have high probability instead of low?

Another question: I am building a debate moderator that has to detect context change, find causal/dialectical relationships between the arguments in a cross examination, find formal and informal fallacies, etc. From your extensive experience with GPT-3, what would you say are the limits of its reasoning capabilities? Is there anything you would say it just cannot do? Or would you say the limitations lie with us, in terms of how we “program” it and how much we train it?

Also, I liked the Google super highway vs bing bike lane analogy from your talk. Good stuff! :smiley:


can you give me a few examples (like you were teaching GPT3) of context change, causal/dialectical relationships, and finding formal/informal fallacies?

i suspect it can do the first 2. it might not be able to read between the lines and pick up the subtle semantic needed to detect formal or informal fallacies, but i want to try it (again: i tried before and gave up, it is difficult)

you might need fine tuning with more examples to nail the semantic for the latter two, which is no problem. build as much as you can in playground in stages, and then hire someone on upwork to make a fine tuning model for you and host your AI. it should not cost more than a few hundred bucks

re: probabilities, i meant show probabilities. i build my prompts from basic statements first, line by line, with a test question at the bottom. after each line i click on the completion to check that the “wrong” semantic is moving down in percentage chance of being chosen, and the correct semantic is moving up

“x is y” goes at the bottom of the semantic stack (the top of the prompt)

“x is y because of this, in these conditions, but excluding this” goes at the top of the semantic stack (the bottom of the prompt)

or like in some philosophy, basic paradigm shifting definitions first, fine details later
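a small sketch of the bookkeeping behind that line-by-line check, assuming you have copied the candidate tokens' log probabilities from two versions of the prompt (before and after adding a line). the numbers and token names are invented for illustration, and `to_percent` / `shift` are hypothetical helpers, not part of any API.

```python
import math

def to_percent(logprob):
    """Convert a natural-log probability to a percentage."""
    return 100.0 * math.exp(logprob)

def shift(before, after):
    """Percentage-point change per candidate token between two prompt versions."""
    return {tok: to_percent(after[tok]) - to_percent(before[tok]) for tok in before}

# Invented log probabilities for the first token of the test answer.
before = {"loyalty": math.log(0.40), "opposite": math.log(0.45)}
after = {"loyalty": math.log(0.70), "opposite": math.log(0.20)}

for tok, delta in shift(before, after).items():
    print(f"{tok}: {delta:+.1f} points")
```

if the token you want goes up and the unwanted one goes down, the new prompt line is pulling in the right direction; if not, reword or drop it before adding the next line.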

sorry, i am in a hurry. i can give more examples if this does not make sense - lmk your examples for the 3 things and i will try to build examples for you for each
