A lot of users have been complaining about the degradation of GPT-4, but most of them haven’t presented concrete prompt/response examples to support their claims. To be fair, I understand we don’t have access to previous versions of GPT-4 to compare against. Incidentally, I suggest OpenAI publish release notes that document the changes in detail, so people won’t have to speculate about them.
Without that information, we cannot compare GPT-4 releases vertically (against its own earlier versions), so I suggest we compare GPT-4 horizontally against other models instead.
Here is my first comparison. I am pleasantly stunned by Mistral’s ability to solve this math problem, which OpenAI presented here, without resorting to any kind of code interpreter. In a way, I think Mistral’s language ability is stronger than GPT-4’s, at least on this particular problem. What do you think?