From what I understand, the "turbo" models are ablated (reduced-size) models: the contribution of each neuron/layer/parameter is traced backwards from the output, and those with very little overall impact are removed. That way the model needs less memory and therefore runs faster, while giving approximately the same output as the pre-ablation model.
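To make the idea concrete, here is a minimal sketch of magnitude-based pruning, one common way to estimate "impact" by a weight's absolute value. This is purely illustrative; OpenAI has not published how the turbo models are actually produced, and the function below is hypothetical.

```python
# Illustrative magnitude pruning (NOT OpenAI's actual, unpublished method):
# treat a weight's absolute value as a rough proxy for its impact on the
# output, and zero out the smallest ones.

def prune_weights(weights, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights, keeping `keep_ratio` of them."""
    ranked = sorted(weights, key=abs, reverse=True)
    # The smallest magnitude we still keep becomes the cutoff threshold.
    threshold = abs(ranked[int(len(ranked) * keep_ratio) - 1])
    return [w if abs(w) >= threshold else 0.0 for w in weights]

print(prune_weights([0.9, -0.05, 0.4, 0.01, -0.7, 0.002], keep_ratio=0.5))
# -> [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In a real model the zeroed parameters would then be removed or stored in a sparse format, which is where the memory and speed savings come from.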
I personally believe "yes." I also believe it understands my prompts better. While I haven't conducted any formal tests, based on my usage, GPT-4-Turbo performs as well as GPT-4. The responses from GPT-4 and GPT-4-Turbo are subtly different, but acceptable. For instance, with GPT-4-Turbo you might encounter a leading `` ```html `` or `` ```json `` fence at the beginning of the response, but you can strip that while streaming or after completion, on the backend or frontend. So... no significant issue.
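For the post-processing mentioned above, a small helper like the following (a hypothetical example, not part of any OpenAI SDK) can strip a wrapping code fence from a completed response:

```python
import re

def strip_code_fence(text: str) -> str:
    """Remove a wrapping ```lang ... ``` fence (e.g. ```json) if present."""
    match = re.match(r"^```[a-zA-Z]*\n(.*?)\n?```\s*$", text, re.DOTALL)
    return match.group(1) if match else text

print(strip_code_fence('```json\n{"ok": true}\n```'))  # -> {"ok": true}
print(strip_code_fence("plain response"))              # unchanged
```

When streaming, you would instead buffer the first chunk and drop a leading fence line before forwarding tokens to the client.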
I have the same issue. I have the feeling that GPT-4 Turbo is less smart than the original GPT-4... I sometimes encounter oddities or a lack of "reflection" from GPT-4 Turbo.