Is there any way to make my ChatGPT application generate quicker responses?
It takes about 30 seconds to generate a response, and I want to bring that down to 10-15 seconds, since I am building a conversation application. I find that Snapchat’s “My AI” responds about twice as fast as my application.
Currently, I’m using the GPT-4 API and limiting responses to 300 tokens. I heard that fine-tuning the model makes it faster, but I wonder if there are any alternatives to this.
Not sure if you are already using it, but if not, try enabling streaming. It makes the response appear faster because you start seeing output instead of waiting for the complete response.
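A minimal sketch of streaming in Python, assuming the `openai` package’s `ChatCompletion` interface (v0.x style; adapt the call if you use the newer client). The delta-collection helper is split out so the incremental-display logic works with any iterable of strings:

```python
def collect_stream(deltas):
    """Print each text delta as it arrives and return the full reply."""
    parts = []
    for delta in deltas:
        print(delta, end="", flush=True)  # user sees text immediately
        parts.append(delta)
    print()
    return "".join(parts)

def stream_reply(prompt):
    # Assumes `pip install openai` and an API key in OPENAI_API_KEY.
    import openai

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
        stream=True,  # server sends chunks as tokens are generated
    )
    # Each chunk carries a small "delta" of the assistant's message.
    deltas = (chunk.choices[0].delta.get("content", "") for chunk in response)
    return collect_stream(deltas)
```

Total generation time is unchanged; the win is perceived latency, because the first tokens show up after roughly one second instead of after the full 20-30 seconds.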
It seems streaming cannot solve this problem. I tested GPT-4 and the turbo model both ways; the results are below:
| Model | stream=True | stream=False |
|-------|-------------|--------------|
| turbo | 3.59 s | 3.92 s |
| GPT-4 | 20.55 s | 20.91 s |
I’ve never heard of streaming. Does this work in Python? I should also mention that I expect detailed and accurate responses from GPT-4; will enabling this affect the quality of the responses?
Streaming will make it seem faster, since you get tokens as they are generated, but it does not change the quality of the responses. There are not a whole lot of options to make generation actually faster: simplify your prompt, use a less advanced model, or maybe try the Azure OpenAI Service.
If your content (or context) allows splitting (by token length or by task), you can multithread your user task and make the calls in parallel. I do that with up to 20 threads.
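A sketch of that parallel pattern, assuming your prompt can be split into independent sub-prompts. `ask_model` here is a hypothetical stand-in for your actual API call:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_model(sub_prompt):
    # Placeholder: replace with your real API call, e.g.
    # openai.ChatCompletion.create(...) on this sub-prompt.
    return f"answer for: {sub_prompt}"

def answer_in_parallel(sub_prompts, max_workers=20):
    """Fire one request per sub-prompt concurrently.

    pool.map preserves input order, so results line up with the
    sub-prompts even though the requests complete out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask_model, sub_prompts))
```

Since the calls are network-bound, threads are enough here; the wall-clock time approaches that of the single slowest request rather than the sum of all of them.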