Playground assistant doesn't give the same response as Python code

I have created an assistant through my Python code. I'm calling the same assistant instance from the code and from the Playground, both with the same prompt, yet the response is different on each platform: more complete from the Playground (as expected) than from the code. I have tried several times with different prompts.
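For context, the call from code looks roughly like the minimal sketch below (assuming a recent openai Python SDK (1.x) with the beta Assistants endpoints; the assistant ID and prompt are placeholders, not my real values):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASSISTANT_ID = "asst_..."  # placeholder: the same assistant I test in the Playground

# Each conversation lives in its own thread
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="the same prompt I use in the Playground",
)

# Run the assistant on the thread and wait for it to finish
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=ASSISTANT_ID,
)

# Messages come back newest first; print the assistant's reply
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```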

Can anyone explain how this can happen?

My goal is to get stable, homogeneous and coherent responses. Is that possible?

Since hallucination is part of the product, it doesn't seem like absolute consistency is attainable. You might want to experiment with temperature and the other sampling settings, but getting a word-for-word identical response to the same query doesn't seem possible. Also, since we don't know what's behind the scenes, the Playground could certainly behave differently from external API calls.
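If you want to reduce the variance from code, the sampling settings can be pinned per run, something along these lines (continuing the sketch in the first post, and assuming your SDK/API version accepts run-level sampling parameters; this narrows the spread of answers but won't make them identical):

```python
# Lower temperature reduces sampling variance; it does not guarantee
# identical outputs across calls, and assumes run-level sampling
# parameters are supported by your SDK / API version.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=ASSISTANT_ID,
    temperature=0.0,
    top_p=1.0,
)
```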

Thanks for your message @scharleswatson

I’m trying to fight hallucinations by using two assistants. I ask the first assistant to give a broad response, then pass that output as input to a second assistant that refines it. I also have a loop that iterates whenever the app or the second assistant detects an error, so it asks for a correction (rough sketch below).
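Very roughly, the flow is like this (a simplification, with the same openai Python SDK assumptions as above; `ask_assistant`, the two assistant IDs, and the `NEEDS FIX` convention are illustrative placeholders, not my exact code):

```python
from openai import OpenAI

client = OpenAI()

DRAFTER_ID = "asst_draft..."    # placeholder: first assistant (broad answer)
REVIEWER_ID = "asst_review..."  # placeholder: second assistant (checks/refines)

def ask_assistant(assistant_id: str, prompt: str) -> str:
    """Send one prompt to an assistant on a fresh thread and return its reply."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(thread_id=thread.id, role="user", content=prompt)
    client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant_id)
    messages = client.beta.threads.messages.list(thread_id=thread.id)  # newest first
    return messages.data[0].content[0].text.value

def answer(question: str, max_rounds: int = 3) -> str:
    # First assistant drafts a broad answer
    draft = ask_assistant(DRAFTER_ID, question)
    for _ in range(max_rounds):
        # Second assistant refines the draft or flags an error
        review = ask_assistant(
            REVIEWER_ID,
            f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
            "Improve the draft. If you find an error, start your reply with "
            "'NEEDS FIX:' and explain it; otherwise reply with the final answer only.",
        )
        if not review.startswith("NEEDS FIX:"):
            return review
        # Feed the reviewer's critique back to the first assistant for another attempt
        draft = ask_assistant(
            DRAFTER_ID,
            f"Question:\n{question}\n\nPrevious draft:\n{draft}\n\n"
            f"Reviewer feedback:\n{review}\n\nRewrite the answer addressing the feedback.",
        )
    return draft
```

The cap on correction rounds is there so a stubborn hallucination can't keep the loop spinning forever.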

My results are not bad, but far from perfect. The hardest part of finding a solution is the instability that hallucinations introduce. I have also noticed the models behave differently for short periods within the same day, which doesn't help when trying to settle on a clear strategy.

Just sharing ideas in case someone else has found a way to keep hallucinations under control.

Hallucination is pretty much the antithesis of control, but maybe you'll find a middle ground. Having a second assistant check the output is a neat idea, but it also feeds into the same underlying issue. It's interesting that you find the models behaving differently over time; I think that comes down to the fact that we are using a closed-source product we have no real control over.

So I guess embrace the chaos, or only use it for smaller tasks where a bad result won't be detrimental to your end product. If your entire product relies on ChatGPT, as things stand right now you'll never get full consistency, and the degree of inconsistency will itself keep shifting over time, since you have no actual control over it. Just hope you aren't working in banking, law, governance, medicine, or anything else where poor results could harm your end user.

At the end of the day, this tech is in its infancy, so keep expectations moderate.