This is about a simple GPT API call with a "You are a helpful assistant" system prompt in a straightforward application (model="gpt-4"). In my experience, during off hours the latency isn't noticeable, so the user experience is decent. At peak hours, though, like mid-to-late morning or mid-afternoon, the same call incurs considerable latency, which seriously compromises the user experience. Since this is a business application, most users rely on it during regular business hours, so the latency is a real problem. A sketch of the call pattern is below.
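For reference, here is a minimal sketch of the kind of call I mean, with a timer around it so the peak-hour difference can be quantified. This assumes the openai Python SDK (v1+) with OPENAI_API_KEY set in the environment; the user prompt is just illustrative.

```python
# Minimal timed chat completion call, roughly matching the setup described above.
# Assumes openai>=1.0 and OPENAI_API_KEY in the environment; prompt is illustrative.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-4",  # same latency observed with "gpt-4-turbo" (see edit below)
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this quarter's sales figures."},
    ],
)
elapsed = time.perf_counter() - start

print(f"Completion received in {elapsed:.1f}s")
print(response.choices[0].message.content)
```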
Questions:
(1) What has your experience been with the same or similar GPT API calls?
(2) If many developers are experiencing the same thing, I hope OpenAI will take action on it. For most individual users, I guess, the ChatGPT experience is pretty good, so we definitely want our GPT API-based applications to deliver a similar user experience.
(3) Any suggestion to use a lesser model like "gpt-3.5-turbo" is moot; we are not talking about a demo application but a production application. For the fee we are paying, we definitely want to use the latest stable model.
Edit: I see the same latency with model="gpt-4-turbo".