We are giving ChatGPT a text and a set of Qs, which it needs to classify as ‘Q based on text’ or ‘Q out of text’. We noticed that 4o and o3-mini give different results. Which model would you guys recommend for this Q classification task?
I’d run each model 100/500/1000 times and grade accuracy on control data. I presume o3-mini might be slightly better, but cost an order of magnitude more.
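A rough sketch of what that repeated-run grading could look like, in case it helps. Everything here is an assumption for illustration: the label strings `in_text` / `out_of_text`, the `evaluate` helper, and the `keyword_classifier` stub are made up, and the stub only stands in for a real model call so the script runs offline. You would swap in your own function that sends the text and question to 4o or o3-mini and returns one of the two labels.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# One control example: (text, question, gold_label).
# The label names "in_text" / "out_of_text" are hypothetical.
Example = Tuple[str, str, str]


def evaluate(
    classify: Callable[[str, str], str],
    examples: List[Example],
    runs: int = 100,
) -> Dict[str, float]:
    """Call `classify` `runs` times per example and grade it.

    accuracy    = share of all calls that matched the gold label
    consistency = share of calls that agreed with the most common
                  answer for their example (1.0 = never flips)
    """
    correct = 0
    agreeing = 0
    for text, question, gold in examples:
        votes = Counter(classify(text, question) for _ in range(runs))
        correct += votes[gold]
        agreeing += votes.most_common(1)[0][1]
    total = len(examples) * runs
    return {"accuracy": correct / total, "consistency": agreeing / total}


def keyword_classifier(text: str, question: str) -> str:
    """Offline stand-in for a model call (NOT a real API client).

    Says "in_text" when any longer word of the question appears in the text.
    """
    words = [w.strip("?.,!").lower() for w in question.split()]
    hit = any(len(w) > 4 and w in text.lower() for w in words)
    return "in_text" if hit else "out_of_text"


if __name__ == "__main__":
    control = [
        ("The Eiffel Tower is in Paris.", "Where is the Eiffel Tower?", "in_text"),
        ("The Eiffel Tower is in Paris.", "Who painted the Mona Lisa?", "out_of_text"),
    ]
    print(evaluate(keyword_classifier, control, runs=100))
```

Run the same control set through each model and compare both numbers. Consistency matters here because a model that flips its answer between runs on the same question is hard to trust even if its average accuracy looks fine.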
GPT-4o is a strong general-purpose model with good contextual understanding, which makes it a reliable choice for classification tasks. Note that o3-mini is not GPT-3.5: it is a smaller reasoning model, so it may do better on questions that need multi-step inference, at the price of extra latency from its reasoning tokens. Testing both on your dataset is the best way to determine which aligns with your needs.