Hello, I have a Node.js project where I utilize Microsoft’s Cognitive Search for indexed searching. This allows me to perform structured searches on my own knowledge base. Once I receive the query response, I make a call to the OpenAI API to generate the natural language response.
I chose this approach because I was inspired by a recent example published by Microsoft that showcased how powerful the combination of both tools can be. It also makes it easier to update information automatically and in real time, which keeps my knowledge base scalable. I’m even able to extract text from images, which is truly amazing.
However, I have noticed that the response from OpenAI is somewhat slow. I’ve come across suggestions that this might be due to the lengthy input in terms of the number of tokens, resulting in longer processing times.
I would greatly appreciate any recommendations you can provide to help improve the response times. I understand that server performance and memory also play a role, but I specifically seek techniques to enhance response times.
I have considered utilizing conversation history, employing natural language libraries to identify frequently asked questions, and reducing input length. However, these ideas are currently scattered, and I would highly value your guidance in determining the best course of action.
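To make two of these ideas concrete, here is a minimal sketch of trimming the retrieved text to a token budget and caching answers to repeated questions. All names here (`estimateTokens`, `trimContext`, `normalizeQuery`, `answerWithCache`) are hypothetical helpers, not part of any library, and the "about 4 characters per token" figure is only a rough rule of thumb for English text, not a real tokenizer.

```javascript
// Rough token estimate. Assumption: ~4 characters per token for English text.
// This is a heuristic only; a real tokenizer would give exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Trim the retrieved context to a token budget, cutting at the last
// sentence boundary inside the budget when one exists.
function trimContext(text, maxTokens) {
  if (estimateTokens(text) <= maxTokens) return text;
  const cut = text.slice(0, maxTokens * 4);
  const lastStop = cut.lastIndexOf(". ");
  return lastStop > 0 ? cut.slice(0, lastStop + 1) : cut;
}

// Normalize a query so trivially different phrasings share one cache key.
function normalizeQuery(query) {
  return query
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation
    .replace(/\s+/g, " ")             // collapse whitespace
    .trim();
}

// In-memory cache of answers to frequently asked questions.
const answerCache = new Map();

// generateAnswer is a caller-supplied async function (for example, the
// Cognitive Search lookup followed by the OpenAI call).
async function answerWithCache(query, generateAnswer) {
  const key = normalizeQuery(query);
  if (answerCache.has(key)) return answerCache.get(key); // no API call needed
  const answer = await generateAnswer(query);
  answerCache.set(key, answer);
  return answer;
}
```

A cache like this only helps with questions that repeat almost word for word; matching paraphrased questions would need something closer to the natural language approach mentioned above.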
Here is an example of a typical response from the API:
usage: {prompt_tokens: 526, completion_tokens: 175, total_tokens: 701}
The response time was 17001 ms, using the gpt-3.5-turbo model. The information from my knowledge base is usually up to 300-500 tokens per response.
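For context, the numbers above work out to roughly 10 generated tokens per second. Since generation time generally grows with the number of tokens produced, tracking this figure per request may help show whether a change actually improves things. The sketch below is illustrative only; `tokensPerSecond` and `timeCall` are hypothetical helper names.

```javascript
// Generated tokens per second for one request.
function tokensPerSecond(completionTokens, elapsedMs) {
  return completionTokens / (elapsedMs / 1000);
}

// Wrap any async call and report how long it took in milliseconds.
async function timeCall(fn) {
  const start = Date.now();
  const result = await fn();
  return { result, elapsedMs: Date.now() - start };
}

// Figures from the example above: 175 completion tokens in 17001 ms.
console.log(tokensPerSecond(175, 17001).toFixed(1)); // prints "10.3"
```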
// Template literal, so the double quotes inside the text do not end the string early
const system = `You are an enthusiastic representative of (NAMEOFAPP), dedicated to helping people. You have extensive knowledge of (NAMEOFAPP) and its systems, including (NAMEOFAPP) and (NAMEOFAPP). You are asked to answer questions using only the information provided in the (NAMEOFAPP) and (NAMEOFAPP) documentation. Please avoid copying the text verbatim and try to be brief in your answers. If necessary, you can structure the text in steps and attach URLs to provide a more visual understanding how to use the applications. For example: Step 1. Enter the link https://url.com. If you are not sure of the answer or there is not enough information, indicate that you do not know and answer: "Unfortunately, that question is not related to (NAMEOFAPP)." It then provides general information about (NAMEOFAPP) and offers to help with related topics.`;
const prompt = `Please answer this query: ${query}\n\n`
+ `Use only the following information:\n\n${responseFromCognitiveSearch.value[0].formattedText}`;
// Structure of the JSON request body sent via curl (system and prompt are the variables defined above)
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": system
},
{
"role": "user",
"content": prompt
}
]
}
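Two request-level options that may be worth testing with a body like the one above are `max_tokens`, which caps how many tokens the model may generate, and `stream`, which returns the answer in chunks as it is produced. Streaming does not shorten the total generation time, but the user can start reading sooner. The following is only a sketch: `buildChatRequest` is a hypothetical helper, and the default of 150 tokens is an arbitrary illustrative value, not a recommendation.

```javascript
// Hypothetical helper that builds the chat completion request body.
// Assumption: maxTokens = 150 is an illustrative cap, chosen arbitrarily.
function buildChatRequest(system, prompt, { maxTokens = 150, stream = true } = {}) {
  return {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: system },
      { role: "user", content: prompt },
    ],
    max_tokens: maxTokens, // upper bound on the number of generated tokens
    stream,                // when true, the answer arrives in chunks
  };
}

const body = buildChatRequest(
  "You are an enthusiastic representative of (NAMEOFAPP).",
  "Please answer this query: (QUERY)"
);
console.log(JSON.stringify(body, null, 2));
```

With `stream` set to true, the client has to read the response as a sequence of chunks instead of a single JSON object, so the code that handles the response needs to change as well.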
As a learner, I am seeking guidance and assistance regarding improving response times from the OpenAI API while generating responses based on our knowledge base. I would greatly appreciate any help and advice that experienced developers or community members can provide. Thank you in advance for your support.