Running the exact same code sometimes returns wrong and sometimes correct output

I’ve been trying to match a list of words to their corresponding synonyms using GPT-4o. The problem is that sometimes the output is 100% correct and sometimes it’s total nonsense. These are just repeated runs without changing a single character.

I understand that the variations in answers are due to the randomness of the models, but it’s not to the extent of matching something like “alcohol” to “nicotine” as a synonym :sweat_smile:.

Looking at this old topic about the same randomness issue, I tried setting the temperature to 0 to make the model deterministic, but I still sometimes get 100% correct answers and sometimes 100% wrong answers.

Note: I don’t really know the list of words I’ll have, so I cannot code any validation functions for this.

Please let me know if there’s something I can do to get consistent correct replies. Thanks! :smiley:

The unpredictable behavior of the model’s responses is somewhat troublesome, but you can make the model attempt to sample deterministically by specifying a seed value (although this does not guarantee the same result will be reproduced).
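For reference, a minimal sketch of pinning down the sampling parameters. The model name, messages, and seed value here are placeholders; `seed`, `temperature`, and `top_p` are real Chat Completions parameters, but determinism is best-effort only:

```python
# Sketch of a request aiming for (near-)deterministic sampling.
# The prompt content and seed value are illustrative placeholders.
request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "Match each word to its synonym from the candidate list."},
        {"role": "user", "content": "Words: booze. Candidates: alcohol, nicotine."},
    ],
    "temperature": 0,   # minimize sampling randomness
    "top_p": 1,
    "seed": 12345,      # request reproducible sampling (not guaranteed)
}

# With the official client this would be sent as:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**request)
#
# The response carries a system_fingerprint field; if it changes between
# runs, the backend configuration changed and identical outputs should
# not be expected even with the same seed.
print(request["seed"], request["temperature"])
```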

I hope this can be of some help :slightly_smiling_face:

Thanks for your response. Actually, I tried setting the seed several times but unfortunately, the issue is not solved.

Many seem to feel that GPT-4o is more unpredictable than GPT-4 Turbo.

If you set the temperature and top_p parameters to very low values, use a seed, and still get random results, there is nothing more we can do.

No one knows why the results are so unpredictable, but it is what it is :sweat_smile:

This is a close semantic match, if not a synonym, which explains why you are getting this behaviour.

I’ve seen similar issues with categorisation where it gives a general match but misses nuance.

AI not replacing humans any time soon :sweat_smile:

I tried changing the prompt so many times, swapping in synonyms of the word “synonym” to try to make it work, but sometimes it works and sometimes it doesn’t. :sweat_smile:

It won’t replace humans in terms of logic and understanding, but its knowledge has surpassed humans’.

Out of interest, is the embedding of “booze” closer to “alcohol” than to “nicotine”?

I suspect it might be …

You might think about embedding a large dictionary and doing a cosine similarity comparison with a small threshold …

… problem is your threshold might have to be dynamic to get similar results across domains.
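A minimal sketch of that idea, with toy 3-dimensional vectors standing in for real embeddings (in practice you’d fetch e.g. 1536-dimensional vectors from an embeddings API, and the 0.8 threshold is an arbitrary placeholder):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(word_vec, dictionary, threshold=0.8):
    """Return the dictionary entry most similar to word_vec,
    or None if nothing clears the threshold."""
    best_word, best_score = None, threshold
    for word, vec in dictionary.items():
        score = cosine_similarity(word_vec, vec)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy vectors; a real dictionary would hold API-produced embeddings.
dictionary = {
    "alcohol":  [0.9, 0.1, 0.0],
    "nicotine": [0.1, 0.9, 0.0],
}
booze = [0.85, 0.15, 0.05]
print(best_match(booze, dictionary))  # "alcohol"
```

With a fixed threshold you get the domain-sensitivity problem mentioned above: a cutoff that works for one vocabulary may reject valid synonyms in another.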

Not a simple problem I suspect.

It’s not just this example; I’ve gotten other weird matches. They disappeared in the second run and came back in the fifth, even though I controlled the temperature and random seed.

I’m currently using fuzzy string matching with the help of the “TheFuzz” package. I’m working toward a “smarter” solution, because synonyms can be completely different in terms of tokens.
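To illustrate that token-level problem, here is a sketch using the stdlib’s difflib as a stand-in for TheFuzz-style fuzzy matching (both are edit-distance based rather than semantic): a true synonym pair shares almost no characters, while a lexically close but different word scores high.

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Character-level similarity ratio in [0, 1] — a stdlib stand-in
    for edit-distance-based fuzzy matchers like TheFuzz."""
    return SequenceMatcher(None, a, b).ratio()

# True synonyms, almost no shared characters -> low score:
print(string_similarity("booze", "alcohol"))
# Lexically similar but not a synonym -> high score:
print(string_similarity("alcohol", "alcoholic"))
```

This is why string-level matchers can never recover semantic synonyms on their own; embeddings (or the model itself) are needed for that part.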

Thanks a lot for your recommendations.
