When using the new gpt-4-turbo models, I seem to get better results when I leave examples out of my system prompts. As soon as I put them in (even with 3-4 different examples), the model sticks too closely to those examples, and I basically get an overfitted answer.
Let’s say I want JSON output: I’ve had better results by providing an output template than by giving three output examples. When I use the examples, the model sticks too closely to the values I used there.
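For illustration, a template along these lines (the field names here are just placeholders, not from my actual prompts):

```
Respond with JSON matching this template:
{
  "title": "<short title>",
  "summary": "<one-sentence summary>",
  "tags": ["<tag>", "..."]
}
```

The model fills in the angle-bracket slots, instead of echoing values it saw in examples.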
I should say that I always work at temperature 0 or close to 0, as that makes the most sense for the type of work I do. I did not see this behavior before the turbo models were introduced, though.
Any others experiencing this, or do I just need to work on my “example engineering” craft?
Personally, I’ve never gotten multi-shot working satisfactorily.
We’re discussing multi-shot here, if you wanna have a look at it:
For JSON output, you can do stuff like “start with {” and “you are part of a larger system, and must provide JSON output - any other response will cause the system to crash.”
The model is generally capable of following simple schemas fairly reliably (I like using TS type definitions).
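Something like this as the system prompt, for example (the type name and fields are made up, just to show the shape):

```
You are part of a larger system and must provide JSON output only.
Your reply must match this TypeScript type:

type Answer = {
  category: "billing" | "technical" | "other";
  confidence: number; // between 0 and 1
  reply: string;
};

Start your reply with {
```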
Adding examples tends to add a lot of unnecessary noise to your signal with minimal utility, as you’ve discovered. Working on your description of what the model should do, and cutting as much noise as possible, is generally a good idea IMO.
So, tl;dr: this individual (me) believes using examples isn’t a good idea in most cases.
Sorry for the late reply, but thanks for taking the time to answer in detail!
By the way, I never have issues with the model generating JSON: I just give a JSON output instruction and also set the JSON response-format parameter on the call. My question was more about the best way to specify the exact structure of the generated JSON.
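Concretely, something like this (just a sketch using the openai Node SDK; the model name and messages are placeholders, adjust to your setup):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const completion = await client.chat.completions.create({
  model: "gpt-4-turbo",
  temperature: 0,
  // the "json parameter": forces the model to return valid JSON
  response_format: { type: "json_object" },
  messages: [
    { role: "system", content: "Respond with JSON matching the given template." },
    { role: "user", content: "..." }, // actual task input goes here
  ],
});

console.log(completion.choices[0].message.content);
```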
Your answer was useful though, and confirmed some of my own findings, thanks!