Same prompt gives the required output, then misses details when prompted again

I have crafted a prompt for extracting information from a document in a required format. The prompt gives me the desired output in the specified format, but when I run the same prompt again without any changes, it misses information or does not follow the given format.

I am using an open-source model that I have fine-tuned.

Why is there inconsistency with the same prompt? Is the prompt missing something or is it the model that needs more training?

Are you sending it with the context of the previous request, or as a fresh new request?

Also, you can’t guarantee the output, but it should be very similar…

I have tried both giving it the previous context and starting with a fresh context, but the result is the same. In my view, if the LLM did not understand the prompt, then the correct result should not be generated even once.

Hi @jitender.verma ,

this can happen for several reasons, such as:

a) temperature set to > 0;
b) training done on an insufficient amount of data (different approaches require different quantities of samples: for p-tuning, 20 samples can be enough, while for LoRA it’s better to have 100+ samples, etc.);
c) fine-tuning done on a sufficient quantity of samples, but the quality of the samples is not good enough;
d) the prompt itself is not good enough.

And even if none of that is the case: I had this kind of issue with the Cohere Classify model, and when I spoke to their engineers they said it could happen because of the complexity of inference and large computations on GPUs.

So, unfortunately, there can be multiple causes, and the only way forward is to iterate on the prompt-engineering side, increase the size of the training dataset, improve the quality of the training data (generating synthetic data if necessary), and so on.
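To illustrate reason (a), here is a minimal sketch in plain Python (no real model, a toy three-token vocabulary) of why a temperature above zero produces different outputs for the same input, while a temperature of zero degenerates to greedy decoding:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from logits at the given temperature."""
    # Temperature ~ 0 degenerates to argmax (greedy decoding).
    if temperature <= 1e-6:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.5, 0.5]
greedy = [sample_token(logits, 0.0, random.Random(i)) for i in range(100)]
sampled = [sample_token(logits, 1.0, random.Random(i)) for i in range(100)]
print(len(set(greedy)), len(set(sampled)))  # greedy is always the same token; sampling varies
```

The same effect holds in real inference stacks: with temperature > 0, two identical requests can legitimately take different token paths, and one path may drop a field or break the format.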


Share your prompt. It’s impossible to diagnose prompt issues without knowing what the prompt is. Just share it and someone will likely be able to point you in the right direction.

Wait… this isn’t a GPT-4 issue? You’re using some open-source model that you’ve then further fine-tuned.

Yeah… maybe someone will be able to give you some advice on the prompt, but you’re literally off the map here. No one other than you has any intuition for your model’s behavior.

If you’re just asking this in a general sense, the answer is that GPT models are stochastic token-predicting machines. There’s always going to be some degree of unexpected behavior—by design.

Even if you do everything you can to clamp things down, like setting a seed, setting the temperature to zero, using a very small top-p, etc., there is still the potential for non-deterministic results, due to things like the difficulty of ensuring reproducible computation in a heavily parallelized environment, especially on the GPU.
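As a toy illustration of those knobs (seed and top-p), here’s a self-contained nucleus-sampling sketch in plain Python. With a fixed seed it is fully reproducible, which real GPU inference is not guaranteed to be:

```python
import math
import random

def top_p_sample(logits, top_p, rng):
    """Sample one token from the top-p (nucleus) of the softmax distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the kept tokens and draw.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]

def generate(logits, top_p, seed, n):
    rng = random.Random(seed)  # fixed seed -> reproducible token sequence
    return [top_p_sample(logits, top_p, rng) for _ in range(n)]

logits = [2.0, 1.5, 0.5, 0.1]
run1 = generate(logits, top_p=0.9, seed=42, n=20)
run2 = generate(logits, top_p=0.9, seed=42, n=20)
print(run1 == run2)  # True: same seed, same parameters, same tokens
```

In pure, single-threaded Python this is exactly reproducible; on real hardware, non-associative floating-point reductions across parallel GPU kernels can still perturb the logits themselves, which is why a seed alone doesn’t guarantee identical outputs.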

In short, you’ll likely never be able to guarantee 100% reproducible results. The best you can realistically aim for is to reduce the incidence rate of bad results. For anyone here to have a hope of helping you do that, though, they’re going to need a lot more information from you.

Based on the context you mentioned, I have constructed prompts similar to yours: extracting information and requiring it to be filled into certain formats. My method is to use a [xxxxx] tag that contains the information or a definition/rule, so that it can be recalled when giving GPT the task; it makes GPT focus on [xxxxx]. It is also strongly recommended to give GPT one or two examples that demonstrate the proper output format. I usually put those examples at the end. I have used this method and run it hundreds of times, and missing information or not following the given structure is an extremely rare occurrence for me. Last but not least, your examples must be written concisely, which sometimes means avoiding similar words that might confuse GPT (I run into this situation quite a lot).
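To make that concrete, here is a hypothetical sketch of the [xxxxx] tagging approach in Python. The tag names, fields, and example document are invented for illustration; substitute your own extraction schema:

```python
# Rule tag first, so it can be referenced by name in the task.
RULES = '[FORMAT] Return one JSON object with keys: name, date, amount.'

# One worked example, placed at the end, showing the exact output shape.
EXAMPLES = """\
[EXAMPLE 1]
Document: Invoice from Acme Corp, dated 2024-01-05, total $120.
Output: {"name": "Acme Corp", "date": "2024-01-05", "amount": 120}
"""

def build_prompt(document: str) -> str:
    # Rule tag first, task in the middle, worked examples last,
    # so the model can "recall" [FORMAT] while producing the answer.
    return (
        f"{RULES}\n\n"
        f"Extract the fields defined in [FORMAT] from the document below.\n"
        f"Document: {document}\n\n"
        f"{EXAMPLES}\n"
        f"Output:"
    )

print(build_prompt("Receipt from Foo Ltd, dated 2023-11-02, total $45."))
```

The point of the layout is that the rule and the example jointly pin down the output format, which in my experience is what keeps the structure stable across repeated runs.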