I’ve been trying to compare different models to find the one that works best for me, since ChatGPT 4 has gotten so bad lately it’s pretty much unusable. I’ve been comparing GPT, Perplexity, Claude, and Gemini. Today I decided to do more 4 vs 4o testing, since I have a paid subscription and am debating cancelling it.
My input was: “What is the most effective optimal method to diffuse essential oil during the day while I’m working at my desk?”
ChatGPT4 - DIY hot water diffuser.
ChatGPT4o - Cotton ball or tissue.
I asked each model to provide reasoning for its answer.
ChatGPT4 - stronger dispersion, sustained release, adjustable intensity, safety and cleanliness.
ChatGPT4o - ease of setup, safety, consistency, maintenance, portability.
Now, how is it possible to get two completely different optimal methods, even after follow-ups like ‘are you sure?’ and ‘explain the why’? So I asked each version to explain the discrepancy. Note that I used the EXACT SAME WORDING in my inputs for both models.
GPT4 - model variations based on different training datasets or updated algorithms, plus inherent ambiguity in the question. It suggested making the query as specific as possible, specifying criteria, deciding which model is more advantageous for you, using external sources, and refining inquiries.
GPT4o - variability in how each model interprets context (even the slightest differences in context can lead to different interpretations), training data and model variability, and user preferences and feedback. It suggested explicitly stating preferences to get a more targeted, consistent response - in this scenario, naming the specific concern (safety, ease of use, intensity, etc.) - as well as combining methods and consulting external sources.
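If you want to make this kind of comparison less anecdotal, here’s a minimal sketch of how you could test it yourself, assuming the official OpenAI Python SDK and API access to both models (the prompt string and trial count here are just illustrative, not part of my original test). It sends the exact same prompt to each model a few times and prints the answers, so you can see run-to-run and model-to-model variability directly:

```python
# Minimal sketch: send the exact same prompt to gpt-4 and gpt-4o
# several times each and print the answers, to compare how much the
# "optimal method" varies between models and between runs.
# Assumes the official OpenAI Python SDK (`pip install openai`)
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "What is the most effective optimal method to diffuse essential oil "
    "during the day while I'm working at my desk? "
    "Answer in one short sentence."
)

for model in ["gpt-4", "gpt-4o"]:
    for trial in range(3):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            temperature=0,  # lowest sampling randomness; outputs can still differ
        )
        print(f"{model} (run {trial + 1}): {resp.choices[0].message.content}")
```

Even with temperature set to 0 the outputs aren’t guaranteed to be identical between runs, and the two models were trained differently, which lines up with the explanations both versions gave me.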
It helped me better understand the variability and lack of consistency in each model, but it still left me pondering the reliability of both.