I am testing the API Quick Start code provided by OpenAI and ran into the following situation. (Billing is working normally, of course.)
- When I run the Python code locally, the GPT-4 model clearly produces better results than the GPT-3.5 model.
- When I deployed the same Python code to a web server (a Microsoft Azure virtual machine, running the same Quick Start code from GitHub), the results were at GPT-3.5 level even though I specified the GPT-4 model. (The log does show a response indicating that the GPT-4 model was used; see the sketch after this list.) It was not like this from the beginning; the behavior started at some point during more than two weeks of development testing.
- Another problem: the usage-limit error messages follow the GPT-4 criteria, yet the results match what GPT-3.5 gives me when tested locally.
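
For reference, this is roughly how the server code calls the API and how I check which model actually served the request. It is a minimal sketch assuming the pre-1.0 `openai` Python library used in the Quick Start at the time; the prompt and key handling are placeholders, not my actual code:

```python
import os
import openai

# API key is read from the environment, as in the Quick Start sample.
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4",  # same model parameter on the server and locally
    messages=[{"role": "user", "content": "Test prompt"}],  # placeholder prompt
)

# The response reports which model served the request (e.g. "gpt-4-0613");
# this is the value that shows up as GPT-4 in the server log.
print(response["model"])
print(response["choices"][0]["message"]["content"])
```
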
How can this phenomenon be explained? (I only took the Quick Start sample code and changed the front end slightly for testing…)