What's your system for creating and iteratively improving prompts?

Curious how others here are approaching creating and improving prompts. My method is pretty basic:

  1. Start with the closest example I can find among the Playground examples.
  2. Modify it until I get a working result, save it as version 1 in the Playground, and keep a backup locally, in Google Docs, etc.
  3. Copy version 1 as version 2 and improve it. Once it’s where I like it, I try to strip it down to be as short as possible without affecting results. Rinse and repeat.
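
For the local-backup part of those steps, here is a minimal sketch of one way to keep numbered copies on disk. The function names (`save_prompt_version`, `latest_prompt`) and the `v<N>.txt` naming scheme are made up for illustration; this is not a Playground feature.

```python
import tempfile
from pathlib import Path


def _versions(directory: Path) -> dict:
    """Map version number -> file path for files named v<N>.txt (hypothetical scheme)."""
    found = {}
    for path in directory.glob("v*.txt"):
        number = path.stem[1:]
        if number.isdigit():
            found[int(number)] = path
    return found


def save_prompt_version(prompt: str, directory: Path) -> Path:
    """Write the prompt to the next free v<N>.txt file and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    next_version = max(_versions(directory), default=0) + 1
    path = directory / f"v{next_version}.txt"
    path.write_text(prompt, encoding="utf-8")
    return path


def latest_prompt(directory: Path) -> str:
    """Return the text of the highest-numbered saved version."""
    versions = _versions(directory)
    if not versions:
        raise FileNotFoundError(f"no saved prompt versions in {directory}")
    return versions[max(versions)].read_text(encoding="utf-8")


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is overwritten.
    folder = Path(tempfile.mkdtemp()) / "prompts"
    save_prompt_version("Translate to French:", folder)
    save_prompt_version("Translate the text below to French:", folder)
    print(latest_prompt(folder))
```

Putting the folder under git would give diffs between versions for free, but plain numbered files already cover the "copy version 1 as version 2" step.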

What is your system? Is anyone testing prompt variations more methodically, e.g. using a script to swap out common words? Curious to hear the current “state of the art”.
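
To make the word-swapping idea concrete, here is a rough sketch that generates every combination of a few alternative words. It assumes the prompt is written as a template with named `{slots}`, which is my own simplification; the function name `prompt_variants` and the example words are hypothetical. Each variant would still have to be sent to the model and judged separately.

```python
from itertools import product


def prompt_variants(template: str, substitutions: dict) -> list:
    """Return every prompt obtained by filling each {slot} with each of its alternatives.

    `substitutions` maps a slot name to a list of candidate words.
    Literal braces in the template must be escaped as {{ and }}.
    """
    slots = list(substitutions)
    variants = []
    # product(...) yields one tuple per combination, in slot order.
    for choice in product(*(substitutions[slot] for slot in slots)):
        variants.append(template.format(**dict(zip(slots, choice))))
    return variants


if __name__ == "__main__":
    template = "{verb} the following text in a {tone} tone:"
    options = {
        "verb": ["Summarize", "Condense"],
        "tone": ["neutral", "friendly"],
    }
    for variant in prompt_variants(template, options):
        print(variant)
```

The number of variants grows multiplicatively with each slot, so it is worth keeping the lists short if every variant costs an API call.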


That’s a very interesting question @mlapeter,

I personally have a workflow similar to yours. Here’s another very recent post that you might find interesting


Thanks @sps! That is helpful. I’m sure we’ll very soon have some kind of version control and a more automated way to iteratively improve things. I find it super interesting since it’s such a new field and everyone’s kind of just inventing their own way at the moment.


I’ve been doing exactly that. I have a set of scripts to do the fine-tuning and processing, and as I do each one, I archive the previous JSONL file. I’m on my 13th iteration now. :) I’ve got a Medium article about prompt engineering, but not about the archival process.
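
For anyone wanting to try something similar, here is a rough sketch of what such an archival step could look like. It is not the set of scripts mentioned above; the function name `archive_training_file` and the `_iterNN` naming are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def archive_training_file(current: Path, archive_dir: Path) -> Path:
    """Copy the current training file into archive_dir under the next iteration number."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Count earlier archives of this file to pick the next number.
    # Assumes archived files are never deleted; otherwise numbers could collide.
    pattern = f"{current.stem}_iter*{current.suffix}"
    iteration = len(list(archive_dir.glob(pattern))) + 1
    target = archive_dir / f"{current.stem}_iter{iteration:02d}{current.suffix}"
    shutil.copy2(current, target)  # copy2 also preserves the modification time
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory with a made-up one-line training file.
    work = Path(tempfile.mkdtemp())
    train = work / "train.jsonl"
    train.write_text('{"prompt": "Hello", "completion": " world"}\n', encoding="utf-8")
    print(archive_training_file(train, work / "archive"))
```

Calling it right before each new fine-tuning run leaves one numbered snapshot per iteration, so any earlier dataset can be diffed against the current one.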


I like to write “half” the prompt and let GPT-3 complete the rest for me. That way I know what’s more likely to be “within distribution”, and it generally yields better results than working out the prompt on my own (my own phrasing may sometimes be very different from what is common on the internet).

But that was back when the base models were the default. I haven’t tried it with the instruct series models.