Tuning the prompt?

The very existence of the “Prompt Assistance” category seems to suggest that a properly phrased prompt can have a positive impact on performance. Also, GPT-3 is entirely capable of producing similarly worded, but different, prompts.

Does it make sense to fine-tune my prompts this way: by having GPT-3 generate lots of similar ones and then keeping the ones that work best? Or is that some form of over-fitting? Or does GPT-3 do something internally that would render this approach ineffective?
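Here is a rough sketch of what I have in mind, with a toy `complete()` standing in for a real GPT-3 call (the templates and eval set are made up for illustration):

```python
# Brute-forcing prompts: score each candidate template on a small eval set
# and keep the winner. `complete` is a toy stand-in for a real GPT-3 call.

def complete(prompt: str) -> str:
    # Toy "model": it only answers when the prompt mentions "capital".
    if "capital" in prompt.lower():
        for country, city in {"France": "Paris", "Japan": "Tokyo"}.items():
            if country in prompt:
                return city
    return "?"

# In practice these would be GPT-3's own paraphrases of a seed prompt.
templates = [
    "What is the capital of {x}?",
    "Tell me about {x}.",
    "Describe {x} in one word.",
]

# Small labelled eval set; to check for over-fitting you'd also score the
# winning template on a held-out set the search never saw.
eval_set = [("France", "Paris"), ("Japan", "Tokyo")]

def score(template: str) -> float:
    hits = sum(complete(template.format(x=q)) == a for q, a in eval_set)
    return hits / len(eval_set)

best = max(templates, key=score)  # keep the template that scores highest
```

The over-fitting worry would show up exactly here: a template that happens to work on these two examples might not generalize, which is why the held-out check matters.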

Probably a silly question, sorry. It just makes me smile to think of using GPT-3 to brute-force good prompts for GPT-3.


What you propose is called a “meta prompt” or “prompt chaining”. The hardest part is measuring how “good” a prompt is, since natural language is not easy to quantify. For some tasks you have to fall back on more subjective, qualitative measures. In other cases you can quantify “did GPT-3 get the answer right, yes or no”, but forcing natural language into a boolean like that kind of misses the point of reaching towards AGI.

Thanks for the link, fascinating that this is a real problem people are thinking about! This seems like a good tool for varying the prompt. I’m not sure I understand what you mean by repeating (input, output) pairs, though. I probably need to read the article more closely.


Let me make sure I understand: by “create” you are referring to openai.Completion.create(model, prompt) from the Python API? And by “pairs” you mean a list of many completions, all from the same prompt? I see that you can always get more completions and then fine-tune with those. I see how you could repeat that process and get a better fine-tuned model, with diminishing returns. But I don’t understand at what point in that process the prompt changes.
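For what it’s worth, here’s how I picture the pairs ending up as fine-tuning data, assuming the JSONL prompt/completion format from the fine-tuning docs (the example pairs themselves are invented):

```python
import json

# (prompt, completion) pairs — e.g. completions you generated and hand-picked
# as "good". The prompt stays fixed; only the training data grows each round.
pairs = [
    ("Summarize: The cat sat on the mat.", " A cat rested on a mat."),
    ("Summarize: It rained all day in Oslo.", " Oslo had a full day of rain."),
]

# Each pair becomes one JSON object per line, ready to upload for fine-tuning.
with open("finetune.jsonl", "w") as f:
    for prompt, completion in pairs:
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
```

If that reading is right, then the prompt never changes in this loop; only the model does, which is what my question is about.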

I think you are saying that the iterative process of fine-tuning for better and better completions makes more sense than doing something similar while varying the prompt?


I’m very interested to learn more about prompt chaining! I think the tool m-a.schenk has provided will help me try it out. Do you know of any projects that already use prompt chaining with GPT-3?

My goal is only to use GPT-3 as effectively as possible. Are you saying that if I try to use it for tasks that are quantitatively measurable, tasks to which other machine-learning techniques could be applied, it’s an inferior tool?

It seems to me like a small amount of prompt chaining, when possible, would be useful.
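To make sure I have the idea right, here is a minimal sketch of chaining, again with a toy `complete()` in place of the real API call:

```python
# Prompt chaining: each step's completion is spliced into the next prompt.
# `complete` is a toy stand-in for a real GPT-3 call, with hard-coded
# behaviour for the two step types used below.

def complete(prompt: str) -> str:
    if prompt.startswith("Extract the topic:"):
        # "Extract" step: return the text between the first pair of quotes.
        return prompt.split('"')[1]
    if prompt.startswith("Write one sentence about"):
        # "Expand" step: wrap the extracted topic in a sentence.
        topic = prompt.split("about ")[1].rstrip(".")
        return f"Here is a sentence about {topic}."
    return ""

step1 = complete('Extract the topic: "prompt chaining" is fun.')
step2 = complete(f"Write one sentence about {step1}.")
```

The point being that step 2’s prompt is built from step 1’s output, so the chain can decompose a task the model handles poorly in one shot.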


The opposite. Conventional ML approaches are inferior, largely because comp-sci purists are trained to think only quantitatively. I see people publishing studies about GPT-3 and other LLMs and their ability to do trivial tasks, but no one is really exploring the full depth of what it means to have mastered language. You can ask GPT-3 to write Shakespearean sonnets, but you can’t put a number on that, so a lot of scientists ignore it.

For instance, this response is far more nuanced and comprehensive than what 90% of humans are capable of when it comes to science. I think the fact that GPT-3 surpasses many people is exactly why they don’t comprehend it.
