Is there any gain in having GPT-4-Turbo edit/rewrite the custom instructions?

The use case here is writing a complex set of custom instructions for a Custom GPT.

These instructions are numerous, quite specific and complex, so I would not necessarily have GPT write them from scratch (e.g., via GPT Builder). However, I can write the prompt by myself, and then ask GPT to edit it.

In your experience, did you find any gain in having GPT-4-Turbo (or GPT Builder?) helping edit/rewrite a complex prompt?
By “gain” here I mean better adherence to the instructions, with less mistakes and attention lapses, etc.

Reasons for which it might not be a good idea is that GPT-4-Turbo doesn’t necessarily know the best way of prompting itself, so it’s just going to rewrite or summarize a precisely crafted prompt, possibly losing details in the process.

Conversely, even if that’s true, one could argue that GPT-4-Turbo will rewrite the instructions in a way which is “natural” for GPT-4-Turbo to interpret, specifically via relatively-high-probability sequences under the GPT-4-Turbo model. So maybe this will have a positive effect, making it easier for the GPT to follow the instructions. (Not sure if there has been any study on how the likelihood of a sequence affects the attention mechanism.)

Of course I plan to test this out as well for my usage case; but it sounds like something people here would have experimented with already and have some opinions, at least anecdotally.

An AI can get you to 80% if you only want to put in 50% of the effort required.

However, it doesn’t comprehend itself with the subtlety and techniques of an experienced human. It might not even write instructions in the correct “person”.

One benefit of “improving quality” by AI is that the sequences the AI generates will be more common and more similar to what it is trained on, thereby improving comprehension.

image

(in the last paragraph, you can see the original text of mine was written as one benefit of an “improve quality” - letting you infer I am talking about an exact instruction to the AI. That meaning has been changed.)

One benefit of “improving quality” by AI is that the sequences the AI generates will be more common and more similar to what it is trained on, thereby improving comprehension.

Yes, this is exactly the point I was making above.

I think you can summarize my question to whether higher-likelihood sequences (under a specific LLM) are “easier to attend to” and elicit better responses for that LLM; compared to semantically equivalent sequences but with lower likelihood under the LLM.

It seems a reasonable hypothesis, but I don’t know if it has been proven, nor it seems trivially true. I was wondering if people here has gathered any (anecdotal) evidence for that.

The availability of logprobs allows you to see how certain an AI model is of its response. This allows you to get down to your crafting of particular turns of phrase to ensure high likelihood of a response at least starting the way you expect.

Indeed, that’s how one could research it on some open-source LLMs, and it may be a nice idea if it has not been done before; but here to be honest I was being pragmatic, simply asking for empirical evidence from the prompt engineering crowd. Surely people must have noticed whether this is the case, specifically for GPT-4-Turbo.

My short term goal is just some intuition whether this seems to be true (in which case I would start revising my prompts with GPT-4-Turbo for my Custom GPTs), or not particularly promising (i.e., no noticeable difference between an already decent human-crafted prompt and a revised GPT prompt).