Seeking Advises on Optimizing openAI API Calls

harishchitluri · November 16, 2023, 6:00pm

My Application has integrated with open AI for API calls for the responses to the user prompts. . The API calls are taking longer time to generate a response. I need a best feasible solution to achieve faster responses within milli seconds.

fra_ab · November 16, 2023, 6:45pm

One solution could be to switch to GPT 3.5 Turbo.

If this is not feasible because of quality, you could try fine-tuning a GPT 3.5 Turbo using GPT 4 to give you the fine-tuning dataset.

Another thing you can do is add a semantic caching layer between your server and OpenAI, and check if that query has already been asked and just fetch the answer from your semantic cache.

Foxalabs · November 16, 2023, 6:46pm

The only way you will get millisecond responses from an LLM reliably without a dedicated instance will be to host a small open source model on a high performance GPU and have only you as the client. You could take advantage of a dedicated instance and have it make commercial sense if your needs are 450M tokens per day for greater.

Topic		Replies	Views
How can I improve response times from the OpenAI API while generating responses based on our knowledge base? API chatgpt , api	3	21525	November 9, 2023
Performance issue with gpt-4-turbo-preview API API gpt-4 , api , performance	1	1232	February 17, 2024
Too long response time on API gpt-3.5-turbo model API	3	1598	December 25, 2023
How to reduce OpenAI response time? API	13	17525	December 13, 2023
Discrepancy in Response Speed between GPT-3.5-turbo API and ChatGPT UI API gpt-35-turbo , chatgpt , api	4	2943	December 24, 2023

Seeking Advises on Optimizing openAI API Calls

Related topics