When testing my app, I noticed a big difference in the responses between gpt-3.5 and gpt-4o. For example, when analyzing a document against a set of requirements, gpt-3.5 will say it meets the requirements, while gpt-4o will say it doesn't. Could it be my prompt that is causing the difference?
Sure, it could. What's far more likely, though, is that gpt-4o is using its big ol' brain and thinking harder about the question. gpt-3.5 finished training over 2 years ago, so it's very likely that it simply doesn't understand or process the request as well.
What does your prompt look like? And what does the data you feed to the model look like?
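One way to rule the prompt in or out: send the exact same messages to both models at temperature 0 and diff the answers. A rough sketch using the official `openai` Python SDK (the system prompt, model names, and sample texts here are just placeholders for yours):

```python
def build_messages(requirements: str, document: str) -> list[dict]:
    """Build one fixed set of messages, so the model is the only variable."""
    system = (
        "You review documents against a set of requirements. "
        "Answer 'meets requirements' or 'does not meet requirements', then justify."
    )
    user = f"Requirements:\n{requirements}\n\nDocument:\n{document}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


def compare_models(requirements: str, document: str) -> dict[str, str]:
    """Send identical messages to both models at temperature 0 and collect answers."""
    from openai import OpenAI  # requires the `openai` package and OPENAI_API_KEY set

    client = OpenAI()
    messages = build_messages(requirements, document)
    answers = {}
    for model in ("gpt-3.5-turbo", "gpt-4o"):
        resp = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=messages,
        )
        answers[model] = resp.choices[0].message.content
    return answers
```

If the two answers still disagree with an identical prompt at temperature 0, the difference is the model, not your prompt, and the fix is usually to tighten the prompt until the weaker model can follow it too.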