Performance issue with gpt-4-turbo-preview API

The gpt-4-turbo-preview API generates inconsistent responses to the same prompts, which makes its output unreliable. I am also experiencing delays in API calls, which hurts workflow efficiency. What strategies should I employ to mitigate these problems?

Hello! I can offer a few changes to how you may currently be using the OpenAI API that should give a better experience.

API parameters

If you are using the gpt-4-turbo AI model and getting inconsistent results between runs, with more unexpected conversation paths or word choices than you would like, then alongside the API parameter "model": "gpt-4-turbo-preview" you can add another API parameter: "top_p": 0.5

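The purpose of top_p is to constrain the AI's output to only the most likely choices as it generates each token: sampling is limited to the smallest set of candidate tokens whose combined probability reaches the top_p value. At 0.5, only the tokens that make up the top half of the probability mass can be chosen. It can be set as high as 0.99 and still block some poor word selections, because the least likely 1% tail is excluded.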
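To make the effect concrete, here is a minimal Python sketch of top-p (nucleus) filtering. It is a simplified illustration of the idea, not OpenAI's actual implementation; the function name nucleus_filter and the candidate probabilities are invented for the example.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest set of most likely tokens whose total probability
    reaches top_p, then renormalise what is left."""
    kept = {}
    cumulative = 0.0
    # Walk the candidates from most to least likely.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token probabilities, invented for illustration.
candidates = {"blue": 0.60, "clear": 0.25, "grey": 0.145, "banana": 0.005}

print(sorted(nucleus_filter(candidates, 0.5)))   # only "blue" survives
print(sorted(nucleus_filter(candidates, 0.99)))  # "banana" is still excluded
```

With these made-up numbers, a top_p of 0.5 leaves a single candidate, while 0.99 keeps three plausible words and drops only the very unlikely one.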

Measuring time

If you experience delays in API calls that impact your application, a good start is to log timing information, so you can see exactly when the API was invoked and when the response finished.
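One way to add this kind of logging is a small wrapper around whatever function makes the request. The sketch below uses only the Python standard library; timed_call and fake_api_call are hypothetical names, and fake_api_call stands in for your real API call so the example runs without a network connection.

```python
import logging
import time
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("api_timing")


def timed_call(func, *args, **kwargs):
    """Call func, log its start time and duration, return (result, seconds)."""
    started_at = datetime.now(timezone.utc).isoformat()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Logged even if the call raises, so failures are timed too.
        elapsed = time.perf_counter() - start
        log.info("call started %s, finished after %.3f s", started_at, elapsed)
    return result, elapsed


def fake_api_call(prompt):
    """Stand-in for a real API request (assumption: replace with your own)."""
    time.sleep(0.05)
    return f"echo: {prompt}"


reply, seconds = timed_call(fake_api_call, "hello")
```

Collecting these durations over many calls shows whether the delay is constant, occasional, or tied to particular prompts.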

Use streaming responses

Furthermore, you can modify your chat completions API endpoint code to use the "stream": true parameter (a boolean, not the string "True"). The response then arrives as chunks that your code must iterate over. By doing this, you can monitor the time it takes to receive the first tokens. If you're waiting for over five seconds without a response, the client can be closed and you can try again.
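The retry-on-slow-first-token idea can be sketched with the standard library alone. In the example below, stream_with_first_chunk_timeout and fake_stream are hypothetical names: fake_stream simulates a stream whose first attempt stalls, and in real use you would pass a function that opens your streaming chat completion and yields the text of each chunk. Note one simplification: the abandoned attempt is left to finish in a background thread instead of being closed, which a real client should do.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of a stream


def _pump(open_stream, out):
    """Open the stream in this thread and copy its chunks into a queue."""
    try:
        for chunk in open_stream():
            out.put(chunk)
    finally:
        out.put(_DONE)


def stream_with_first_chunk_timeout(open_stream, first_chunk_timeout=5.0,
                                    max_attempts=3):
    """Return (text, seconds_to_first_chunk); retry if the first chunk is slow."""
    for _ in range(max_attempts):
        out = queue.Queue()
        start = time.perf_counter()
        threading.Thread(target=_pump, args=(open_stream, out),
                         daemon=True).start()
        try:
            item = out.get(timeout=first_chunk_timeout)
        except queue.Empty:
            continue  # too slow: abandon this attempt and try again
        first_latency = time.perf_counter() - start
        parts = []
        while item is not _DONE:
            parts.append(item)
            item = out.get()
        return "".join(parts), first_latency
    raise TimeoutError(
        f"no first chunk within {first_chunk_timeout} s "
        f"after {max_attempts} attempts"
    )


attempts = {"count": 0}


def fake_stream():
    """Simulated stream: the first attempt stalls, later ones respond quickly."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        time.sleep(0.5)
    for piece in ["Hello", ", ", "world"]:
        time.sleep(0.01)
        yield piece


text, latency = stream_with_first_chunk_timeout(fake_stream,
                                                first_chunk_timeout=0.2)
print(text, f"(first chunk after {latency:.3f} s)")
```

The demo uses a 0.2 second limit so it runs quickly; the five-second limit suggested above is the function's default.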

I hope the techniques advance your application into the realm of success.
