When we connect to OpenAI using chat.completions.create, we have to make this call for every single response:
const response: any = await openAi.chat.completions.create({
  model: OPENAI_MODEL,
  store: false,
  messages: conversationHistory,
  stream: true,
  stream_options: { include_usage: true },
});
This call alone takes anywhere between 500 and 800 ms, which is a lot in real-time conversation use cases.
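For context, here is roughly how I am measuring that number. This is a minimal, self-contained sketch: the model name and the one-message history are placeholders, and the timing covers only the await on create() (i.e. until the stream handle is returned), not the streamed tokens themselves:

import OpenAI from "openai";

const openAi = new OpenAI(); // reads OPENAI_API_KEY from the environment
const OPENAI_MODEL = "gpt-4o"; // placeholder model name
const conversationHistory = [{ role: "user" as const, content: "Hello" }];

const start = performance.now();
const response = await openAi.chat.completions.create({
  model: OPENAI_MODEL,
  store: false,
  messages: conversationHistory,
  stream: true,
  stream_options: { include_usage: true },
});
// Time until the stream object is handed back, before any tokens arrive.
console.log(`create() resolved in ${Math.round(performance.now() - start)} ms`);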
Is there a way to initialize the connection once and reuse it across requests, thereby reducing this OpenAI chat initialization latency?
Something like a persistent socket:
// Imagined API: open the connection once up front...
const response: any = await openAi.chat.completions.create({
  model: OPENAI_MODEL,
  store: false,
  messages: conversationHistory,
  stream: true,
  stream_options: { include_usage: true },
});

// ...then push each new turn over the same connection.
// (send() is made-up pseudocode, not a real method on the SDK's return value.)
response.send(conversationHistory);
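For what it's worth, the v4 Node SDK does let you pass a keep-alive agent via the httpAgent constructor option, which should at least avoid repeating the TCP/TLS handshake on every call. A sketch:

import OpenAI from "openai";
import https from "node:https";

// Reuse one TLS connection across requests instead of reconnecting each time.
const keepAliveAgent = new https.Agent({ keepAlive: true });

const openAi = new OpenAI({ httpAgent: keepAliveAgent });

But that only addresses connection setup, not whatever else happens before create() resolves, so I am still looking for something socket-like.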
Has anyone else found a solution to this?