Why is OpenAI API gpt-4o slow to respond?

I’m currently building a chatbot with FastAPI using the OpenAI API.

The chatbot reviews business plans, and it takes almost 20-30 seconds to respond.

I tried reducing and increasing max_tokens, but there was no significant change. When I switched down to a gpt-3 model, the answer quality was too low.

I know the input prompt is long, but I can’t reduce it further. Has the API response speed always been like this, or is it a recent issue?

Below is the code I’m currently working on, is there a way to improve the speed?

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import requests
import openai
import os
import asyncio
import time
from dotenv import load_dotenv

load_dotenv()

app = FastAPI()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
client = openai.OpenAI(api_key=OPENAI_API_KEY)
ACCESS_TOKEN = os.getenv("ACCESS_TOKEN")

class ChatRequest(BaseModel):
    message: str

async def get_business_plan_detail(parse_url, access_token=ACCESS_TOKEN, member_no=1000):
    parse_split = parse_url.split("/")
    bbi_no = parse_split[-1]

    url = "my_api_which_get_business_plan_document"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    data = {
        "bbiNo": int(bbi_no),
        "memberNo": member_no
    }

    start_time = time.time()
    # Run the blocking requests call off the event loop
    response = await asyncio.to_thread(requests.post, url, headers=headers, json=data)
    end_time = time.time()
    print(f"⏱️ Business plan fetch time: {end_time - start_time:.2f}sec")

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": True, "status_code": response.status_code, "message": response.text}

async def analyze_business_plan(text):
    start_time = time.time() 

    response = await asyncio.to_thread(
        client.chat.completions.create,
        model="gpt-4o",
        response_format={"type": "text"},
        messages=[
            {"role": "system", "content": "You are an expert business consultant with deep market analysis expertise. Please provide a **highly detailed** and thorough review of the following business plan."},
            {"role": "user", "content": f"""
                You are an AI-based business consultant.
                Please review the market response to the following business plan and provide feedback in Korean, in **as much detail as possible**, from a virtual customer's point of view.
                Seeing it from the customer's point of view is the most important thing.

                📌 **Request for detailed analysis**:
                1️⃣ **Product power and differentiators** → Deeply analyze strengths and points to improve, compare them with the existing market, and explain functional differentiators
                2️⃣ **Customer experience and marketing** → Detailed customer persona analysis, key considerations for the customer purchase journey
                3️⃣ **Market competitiveness analysis** → Competitive advantage compared to similar products in the current market
                4️⃣ **Sustainability of the business model** → Revenue model, B2C vs B2B strategy, long-term scalability
                5️⃣ **Market expansion potential and global expansion strategy** → Regional characteristics to consider when expanding overseas, localization strategy

                📄 **Contents of the business plan:**
                {text}
                """},
        ],
        max_tokens=4096,
        temperature=0.7,
    )


    end_time = time.time()  
    print(f"⏱️ Time to answer OpenAI Response: {end_time - start_time:.2f}sec") 

    return response.choices[0].message.content

@app.post("/chat")
async def chat(request: ChatRequest):
    total_start_time = time.time()

    user_message = request.message

    if not user_message:
        raise HTTPException(status_code=400, detail="Enter the message to chat.")

    # Pre-processing data if a particular URL pattern is included
    if "something-specific-url-pattern" in user_message:
        result = await get_business_plan_detail(user_message)

        if isinstance(result, dict) and "sections" in result:
            business_document = f"""
            Project Name : {result['sections'][0]['details']['bbiProjectName']}
            Team Name : {result['sections'][0]['details']['bbiTeamName']}
            Item Overview : {result['sections'][0]['details']['bbiItemOverview']}
            
            1. Item Overview
            Item Name : {result['sections'][1]['details']['bitItemName']}
            Key Features : {result['sections'][1]['details']['bitKeyFeatures']}
            Relevant Technologies : {result['sections'][1]['details'].get('bitRelevantTechnologies', 'N/A')}
            
            2. Problem Recognition
            Background Motivation : {result['sections'][2]['details']['bprBackgroundMotivation']}
            Purpose Needed : {result['sections'][2]['details']['bprPurposeNeed']}
            
            3. Feasibility
            Commercialization Strategy : {result['sections'][3]['details']['bfCommercializationStrategy']}
            Market Analysis : {result['sections'][3]['details']['bfMarketAnalysis']}
            
            4. Growth Strategy
            Growth Strategy : {result['sections'][4]['details']['bgsGrowthStrategy']}
            Market strategy : {result['sections'][4]['details'].get('bgsMarketStrategy', 'N/A')}
            """

            print("Pre-processed Document:\n", business_document)
            
            analysis = await analyze_business_plan(business_document)

            total_end_time = time.time()
            print(f"⏱️ Total Answer Response: {total_end_time - total_start_time:.2f}sec")

            return {
                "message": "📄 Finish Answer.",
                "analysis": analysis
            }

        else:
            return {"error": "Failed to get data of business plan document.", "details": result}

    return {"message": "🔍 Give me the Link of the Business Plan Document."}

# Run FastAPI
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)
```

There is no “gpt-3” model for you to use.

You also seem to be needlessly sending a response_format parameter. {"type": "text"} is already the default, and it is strict structured-output formats that make the API spend several seconds setting them up. I would omit the parameter, and you may also get higher-quality responses.

You can rotate through the dated snapshots gpt-4o-2024-11-20, gpt-4o-2024-08-06, and gpt-4o-2024-05-13 and see whether one provides faster responses at a particular time.

Sending no temperature parameter can be faster, and max_tokens is unnecessary here; you have set it higher than the model’s typical response length anyway.
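
Putting those three suggestions together, the trimmed call might look something like this (a sketch based on your code above; the snapshot name is just an example to rotate through, and `prompt` stands in for your existing long prompt string):

```python
# Trimmed request: no response_format, no temperature, no max_tokens,
# and a pinned dated snapshot you can swap out to compare latency.
response = await asyncio.to_thread(
    client.chat.completions.create,
    model="gpt-4o-2024-08-06",  # example snapshot; try the others too
    messages=[
        {"role": "system", "content": "You are an expert business consultant."},
        {"role": "user", "content": prompt},  # placeholder for your long prompt
    ],
)
```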

You can also eliminate the OpenAI SDK entirely to cut down on loading someone else’s platform code: just make RESTful requests to the API with a preinstalled library such as `requests`.
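
For example, a direct call to the chat completions endpoint looks something like this (a sketch; the model and prompt are placeholders, and OPENAI_API_KEY is read from the environment as in your code):

```python
import os
import requests

# Call the Chat Completions REST endpoint directly, no SDK involved.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-2024-08-06",  # placeholder snapshot
        "messages": [{"role": "user", "content": "Review this business plan: ..."}],
    },
    timeout=120,  # long generations can take a while
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```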


OMG, when I modified the code following your advice, the response time dropped drastically, from 20-30s down to 10-20s. THANK YOU SO MUCH… You’re a genius and my savior… really…

But is there any way to improve the response speed further? I was wondering if it’s possible to get responses under 10 seconds.


AI language models use a lot of computation for every “token” generated in a response. The longer it writes, the longer it takes.

You can use streaming, so you at least see the AI response as it is generated token by token instead of waiting for it to complete; it makes for a much better user experience.
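
With the Python SDK it is just `stream=True` and iterating the chunks; a minimal sketch, assuming the `client` object from your code and a placeholder prompt:

```python
# Stream the completion and print tokens as they arrive.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Review this business plan: ..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) carry no text
        print(delta, end="", flush=True)
```

In FastAPI you would wrap a generator like this in a StreamingResponse so the client receives tokens as they are produced.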

Another caveat is that writing in Korean requires many more tokens than Latin scripts to form a response of similar content, just because of the way text is internally encoded into tokens.

If it is an automated task, you can send off several API calls at once.
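
For example, with the async analyze_business_plan() from your code you can fan several documents out concurrently with asyncio.gather (a sketch; doc1 through doc3 are placeholder document strings):

```python
import asyncio

# Run several analyses concurrently; each one already offloads its
# blocking API call with asyncio.to_thread, so the requests overlap.
async def analyze_many(documents):
    return await asyncio.gather(
        *(analyze_business_plan(doc) for doc in documents)
    )

# doc1, doc2, doc3: placeholder pre-processed documents
results = asyncio.run(analyze_many([doc1, doc2, doc3]))
```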

Below is a quick summary table of each model’s average token rate, as measured right now with a 256-token output and two trials per model:

| Model | Avg Token Rate (tokens/s) |
| --- | --- |
| gpt-4o-2024-08-06 | 63.80 |
| gpt-4o-2024-05-13 | 64.55 |
| gpt-4o-2024-11-20 | 52.45 |
| gpt-4o-mini | 139.50 |
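
If you want to reproduce numbers like these, the rate is just completion tokens divided by wall-clock time. A rough sketch, reusing the client object from the code above; the filler prompt and trial count are arbitrary:

```python
import time

def token_rate(model, trials=2):
    # Average completion tokens per second over a few trials.
    rates = []
    for _ in range(trials):
        start = time.time()
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Write a 400-word story."}],
            max_tokens=256,  # fixed 256-token output, as in the table
        )
        rates.append(resp.usage.completion_tokens / (time.time() - start))
    return sum(rates) / len(rates)

for m in ["gpt-4o-2024-08-06", "gpt-4o-2024-05-13",
          "gpt-4o-2024-11-20", "gpt-4o-mini"]:
    print(f"{m}: {token_rate(m):.2f} tokens/s")
```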