Classifying Qs based on text. Which model is best? 4o or o3-mini?

We give ChatGPT a text and a set of Qs, and it needs to classify each one as ‘Q based on text’ or ‘Q out of text’. We noticed that 4o and o3-mini give different results. Which model would you guys recommend for this Q classification task?


I’d run each model 100, 500, or 1000 times and grade accuracy against labeled control data. I presume o3-mini might be slightly better, but it could cost up to an order of magnitude more.
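A rough sketch of what that grading loop could look like in Python. Everything here is made up for illustration: `evaluate`, `CONTROL`, and `keyword_stub` are hypothetical names, and the stub just stands in for a real model call so the script runs offline. In practice you would swap in one function per model (4o, o3-mini) that sends the text and question to the API and returns one of the two labels.

```python
from collections import Counter

# Hypothetical control set: (text, question, expected label).
CONTROL = [
    ("The Nile flows north through Egypt.",
     "Which direction does the Nile flow?", "Q based on text"),
    ("The Nile flows north through Egypt.",
     "Who painted the Mona Lisa?", "Q out of text"),
]


def evaluate(classify, examples, runs=3):
    """Call classify(text, question) `runs` times per example.

    Returns (accuracy, consistency): the share of all calls that matched
    the expected label, and the share of examples where every run
    returned the same label.
    """
    correct = 0
    stable = 0
    total = 0
    for text, question, gold in examples:
        votes = Counter(classify(text, question) for _ in range(runs))
        correct += votes[gold]          # Counter returns 0 for unseen labels
        stable += 1 if len(votes) == 1 else 0
        total += 1
    return correct / (total * runs), stable / total


def keyword_stub(text, question):
    """Offline stand-in for a model call (assumption, not a real classifier):
    answers 'based on text' if a longer question word also appears in the text."""
    text_words = {w.strip("?.,").lower() for w in text.split()}
    q_words = {w.strip("?.,").lower() for w in question.split()}
    shared = {w for w in text_words & q_words if len(w) > 3}
    return "Q based on text" if shared else "Q out of text"


if __name__ == "__main__":
    accuracy, consistency = evaluate(keyword_stub, CONTROL, runs=5)
    print(f"accuracy={accuracy:.2f} consistency={consistency:.2f}")
```

Running the same `evaluate` call once per model gives you two numbers to compare side by side. The consistency figure is worth tracking because a model that flips its answer between runs on the same question is a problem even when its average accuracy looks fine.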

GPT-4o generally has strong contextual understanding, which makes it a reliable choice for classification tasks. Note that o3-mini is not GPT-3.5; it is a smaller reasoning model, so it may handle borderline questions well but spends extra time and tokens on reasoning before it answers. Testing both on your dataset is the best way to determine which one fits your needs.
