I am interested in synthetically generating text data, for example entity summaries. How do o3-mini, o1, and 4o compare on such tasks? Are there any rules of thumb here?
I’ve found 4o-mini to be the best so far, but I’m still waiting on API access for the o3 models. I’ve generally skipped o1 because I’m getting great results with 4o-mini. The 4o model was a little too verbose and tended to drift into adjacent topics. For textual generation, 4o, with the proper guidance, has hands down been great.
4o writes better in terms of creativity, but if you are using another text as input and generating summaries, the output quality will depend on the input length (ref: Reasoning Degradation in LLMs with Long Context Windows: New Benchmarks).
In this case, for very long texts, o3-mini is better.
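If it helps, here is a rough sketch of that rule of thumb as a routing function: short inputs go to a 4o-family model and very long inputs go to o3-mini. The cutoff of 8000 words, the function name `pick_model`, and the choice of `gpt-4o-mini` as the default are all assumptions for illustration; nobody in this thread gave a number, so tune it against your own data.

```python
# Hypothetical cutoff: the thread gives no number, so treat this as a placeholder.
LONG_INPUT_WORDS = 8000


def pick_model(source_text: str, long_input_words: int = LONG_INPUT_WORDS) -> str:
    """Pick a model name for summarizing source_text, based only on its length.

    Word count is used as a crude stand-in for token count.
    """
    word_count = len(source_text.split())
    if word_count > long_input_words:
        # Very long inputs: route to the reasoning model, per the advice above.
        return "o3-mini"
    # Everything else: default to the cheaper 4o-family model (an assumption).
    return "gpt-4o-mini"
```

You would call `pick_model(document)` before each request and pass the result as the model name. Swapping the default to `gpt-4o` may make sense if you care more about writing style than cost.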