I fine-tuned GPT-4o and fixed a seed for the job (seed=123). This is my code for the Chat Completions call:
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model=model,  # the fine-tuned model ID
    temperature=0.0000001,
    top_p=0.1,
    seed=123,
    messages=[
        {…}  # prompt messages elided
    ],
)
I run the model multiple times, but each time I get slightly different output. My understanding was that by setting the seed during fine-tuning I could guarantee reproducibility, but that was not the case! Any explanation?
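For reference, this is roughly how I test it (the prompt here is just a placeholder):

# Run the same request several times and collect the distinct outputs.
outputs = set()
for _ in range(5):
    completion = client.chat.completions.create(
        model=model,
        temperature=0.0000001,
        top_p=0.1,
        seed=123,
        messages=[{"role": "user", "content": "Translate 'New game' to French."}],
    )
    outputs.add(completion.choices[0].message.content)
print(f"{len(outputs)} distinct outputs out of 5 runs")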
(Un?)fortunately, nothing is really deterministic anymore. Temperature and top_p of 0 can get you pretty far, but it also depends on your prompt to a degree. If you engineer your prompt such that the answer is obvious, you’re more likely to get “deterministic” behavior. If you created the equivalent of a slot machine with your prompt, even turning off the sampler won’t save you.
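Also worth knowing: the seed parameter is documented as best-effort only. Each response carries a system_fingerprint field, and if that value changes between runs, the backend configuration changed and outputs can differ even with a fixed seed. A quick sketch to check, reusing the client and model from the question (the prompt is illustrative):

# If system_fingerprint differs between runs, backend changes rather than
# sampling explain the divergence.
for i in range(3):
    completion = client.chat.completions.create(
        model=model,
        temperature=0,
        seed=123,
        messages=[{"role": "user", "content": "Translate 'Game over' to French."}],
    )
    print(i, completion.system_fingerprint, completion.choices[0].message.content)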
When I can, I use response variability between models (not logprobs) as a measure of prompt stability. I measure the final workflow output, though, not the CoT steps.
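A minimal sketch of that idea, using difflib for a crude pairwise similarity score (the model list and prompt are placeholders):

import difflib
from itertools import combinations
from openai import OpenAI

client = OpenAI()
models = ["gpt-4o", "gpt-4o-mini"]  # placeholder model list
prompt = [{"role": "user", "content": "..."}]  # the final workflow prompt

outputs = {
    m: client.chat.completions.create(model=m, temperature=0, messages=prompt)
    .choices[0]
    .message.content
    for m in models
}
# Higher pairwise similarity across models suggests a more stable prompt.
for a, b in combinations(models, 2):
    ratio = difflib.SequenceMatcher(None, outputs[a], outputs[b]).ratio()
    print(f"{a} vs {b}: {ratio:.2f}")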
Well, my task is translation from a source to a target language in a gaming context. I designed the system prompt to ensure the output provides high-quality translations that are accurate, contextually aware, and preserve the original tone. It also ensures cultural nuances are adapted effectively to suit the target audience.
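Something along these lines (an illustrative sketch with placeholder languages, not the exact prompt):

system_prompt = (
    "You are an expert game localizer translating from English to French. "
    "Produce accurate, contextually aware translations that preserve the "
    "original tone, and adapt cultural references so they feel natural to "
    "the target audience."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Your save file is corrupted."},
]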
Translation is by nature a fairly uncertain task. Give the same assignment to two different human translators and you’ll get divergence within a few words, if not at the very first one. An AI model, however, can show you why statistically rather than by intuition:
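For instance, the API can return token log probabilities, which reveal the positions where alternatives ranked close enough to flip (a sketch with an illustrative prompt):

completion = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    logprobs=True,
    top_logprobs=5,  # return the five most likely alternatives per position
    messages=[{"role": "user", "content": "Translate 'Press any key' to German."}],
)
# Positions where the runner-up logprob is close to the chosen token's are
# the ones most likely to flip between runs.
for tok in completion.choices[0].logprobs.content:
    alts = ", ".join(f"{t.token!r}:{t.logprob:.2f}" for t in tok.top_logprobs)
    print(f"{tok.token!r} -> {alts}")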
A very useful tool not (yet? please) available from OpenAI would be assistant response continuation and completion. When producing a particular piece of work or building training data, a native speaker could spot the highlighted token positions with high uncertainty or closely ranked alternatives (or simply places where the writing turns awkward), edit them manually or pick from the token alternates, even ones close enough to flip despite attempts at determinism, and send the AI back on track from that point, still producing the whole piece far faster in total than one could write it oneself.