Chat Completion API super slow and hanging

I am using the latest version of the async OpenAI Python client.

As you can see in the trace of my calls below, the API calls are extremely slow, and sometimes they hang indefinitely. I am on Tier 1, but my RPM and TPM are well under the hard limits.
I have this issue with both gpt-4-1106-preview and gpt-3.5-turbo-1106.

My code is:

import datetime
import json
from random import randint
from typing import Dict, List, Union

from openai import AsyncOpenAI

openai_async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def call_to_llm_async(system_message: str, messages: List[str], return_json: bool = False) -> Union[str, Dict]:
    # Alternate user/assistant roles over the message history.
    all_messages = [
        {'role': 'system', 'content': system_message},
        *[
            {'role': ('user' if i % 2 == 0 else 'assistant'), 'content': message}
            for i, message in enumerate(messages)
        ]
    ]
    model = 'gpt-4-1106-preview'
    x = randint(0, 1000000)  # random id to pair the start/end log lines
    print(f'start completion {x} {datetime.datetime.utcnow().isoformat()}')
    completion = await openai_async_client.chat.completions.create(
        model=model,
        messages=all_messages,
        response_format={'type': 'json_object' if return_json else 'text'}
    )
    content = completion.choices[0].message.content
    tokens = num_tokens_from_string(f'{all_messages}\n{content}', model=model)  # tiktoken-based helper
    print(f'end   completion {x} {datetime.datetime.utcnow().isoformat()} - {tokens}')
    if return_json:
        try:
            result = json.loads(content)
        except json.JSONDecodeError:
            print(content)
            raise
    else:
        result = content
    return result
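(For the hangs specifically, an explicit timeout and retry budget on the client would at least make a stuck request fail fast instead of waiting forever. A minimal sketch, assuming the v1 openai package; the values are illustrative:)

from openai import AsyncOpenAI

# Fail a request after 60s instead of hanging indefinitely, and let the
# SDK retry transient failures with backoff.
openai_async_client = AsyncOpenAI(
    timeout=60.0,    # per-request timeout in seconds
    max_retries=3,   # SDK default is 2
)

# The timeout can also be overridden on a single call:
# completion = await openai_async_client.chat.completions.create(..., timeout=30.0)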

The format of the log below is:
start|end completion task_id datetime - tokens(input+output)

start completion 692959 2023-12-11T21:37:01.073218
start completion 174451 2023-12-11T21:37:01.131440
start completion 42366 2023-12-11T21:37:01.220825
start completion 622334 2023-12-11T21:37:01.278182
start completion 109161 2023-12-11T21:37:01.322227
start completion 666597 2023-12-11T21:37:01.331944
start completion 305363 2023-12-11T21:37:01.342785
start completion 128124 2023-12-11T21:37:01.350892
start completion 275284 2023-12-11T21:37:01.393148
start completion 618638 2023-12-11T21:37:01.633575
end   completion 275284 2023-12-11T21:38:08.203531 - 3354
end   completion 128124 2023-12-11T21:38:27.111886 - 3040
end   completion 692959 2023-12-11T21:38:39.072215 - 3357
end   completion 109161 2023-12-11T21:38:54.443224 - 3308
end   completion 666597 2023-12-11T21:39:17.491550 - 3357
end   completion 174451 2023-12-11T21:39:34.002941 - 3291
end   completion 618638 2023-12-11T21:39:44.970294 - 3182
end   completion 305363 2023-12-11T21:39:57.987723 - 3190
end   completion 622334 2023-12-11T21:40:08.626620 - 4041
end   completion 42366 2023-12-11T21:40:32.624114 - 4160

It seems like you are not gathering the results asynchronously. I'm not sure what your calling code looks like, but you could try this (GPT-4 generated):

import asyncio
from typing import List, Union, Dict
import json
import datetime
# Import other necessary modules here

async def call_to_llm_async(system_message: str, messages: List[str], return_json: bool = False) -> Union[str, Dict]:
    # ... [rest of your existing function here] ...
    ...  # a comment alone is not a valid function body

async def main():
    # Example system message
    system_message = "Your system message here"

    # Example list of user messages for different calls
    list_of_user_messages = [
        ["User message set 1"], 
        ["User message set 2"], 
        ["User message set 3"]
    ]

    # Create tasks for each set of messages
    tasks = [call_to_llm_async(system_message, messages) for messages in list_of_user_messages]

    # Run tasks concurrently
    results = await asyncio.gather(*tasks)

    # Print or process the results
    for result in results:
        print(result)

# Run the main function
asyncio.run(main())

Here is my code - I do gather the results as soon as they are available, with asyncio.as_completed:

async def process_data(self, data_input: DataInput, limit_amount: int = 30) -> List[ProcessedData]:
    """Execute operations on input data and save results"""
    dataset = self.query_data_source(data_input)

    processing_tasks = []
    for item in dataset[:limit_amount]:  # slicing already clamps to len(dataset)
        try:
            # Note: this fetch is synchronous (requests), so the downloads
            # run sequentially before any task is awaited.
            file_name, data_file = self.fetch_data_item(item['fileUrl'])
        except requests.RequestException as error_request:
            print(str(error_request))
            continue
        processing_task = asyncio.create_task(
            ProcessedData.from_data_file(self.current_user, file_name, data_file, None, item)
        )
        processing_tasks.append(processing_task)

    processed_data: List[ProcessedData] = []
    # Consume each result as soon as its task finishes.
    for task in asyncio.as_completed(processing_tasks):
        try:
            result_data = await task
        except (CustomErrorType1, CustomErrorType2) as error_type:
            print(str(error_type))
            continue
        processed_data.append(result_data)
    return processed_data
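(Side note: since fetch_data_item raises requests.RequestException, it is a blocking call, so the downloads above run one after another before the tasks are awaited. That should not serialize the completion calls themselves, but a sketch of moving the fetch off the event loop with asyncio.to_thread, assuming fetch_data_item is safe to call from a worker thread; fetch_and_process is a hypothetical name:)

import asyncio

async def fetch_and_process(self, item):
    # Run the blocking download in a worker thread so the event loop
    # stays free for the other in-flight tasks.
    file_name, data_file = await asyncio.to_thread(self.fetch_data_item, item['fileUrl'])
    return await ProcessedData.from_data_file(self.current_user, file_name, data_file, None, item)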

Your results are arriving sequentially, 7 to 15 seconds apart, so something is not right somewhere. I have async systems running in production right now, and they are not suffering from this.

It is possible that Tier 1 stacks calls like this and the higher tiers do not; that is my only explanation, other than an issue on your receiving side.
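One way to test that theory from your side is to cap how many requests are in flight at once and see whether the per-call latency changes. A minimal sketch with asyncio.Semaphore, wrapping your call_to_llm_async (the cap of 3 is arbitrary, and call_to_llm_limited is a hypothetical name):

import asyncio

semaphore = asyncio.Semaphore(3)  # at most three completion requests in flight

async def call_to_llm_limited(system_message, messages, return_json=False):
    # Same call as before, but gated so excess tasks wait client-side.
    async with semaphore:
        return await call_to_llm_async(system_message, messages, return_json)

If ten capped calls are no slower per call than ten uncapped ones, the queuing is happening on OpenAI's side.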


Oh, I feel your pain! My plugins seem to have a mind of their own too. Hanging, printing BBCode, making a total mess: it’s like they’re having a party without an invitation. Solidarity in plugin struggles! If only they came with a user manual for users and themselves.

On a more serious note, it sounds like you’re dealing with some challenging issues with the async OpenAI Python client: slow API calls, indefinite hangs, and Tier 1 blues. I hear you. Have you tried any wizardry, or are you hoping for a magical fix? Feel free to share more details, and maybe the collective wisdom here can conjure up a solution!

@Foxalabs
I checked with a simple task.

import asyncio
from random import randint
from uuid import uuid4

async def task():
    # Stand-in for an API call: sleep a random 5-30 seconds.
    x = uuid4()
    time = randint(5, 30)
    print('START ', x, time)
    await asyncio.sleep(time)
    print('END ', x, time)
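A minimal driver along these lines, assuming ten tasks consumed with as_completed as in my real code:

async def main():
    tasks = [asyncio.create_task(task()) for _ in range(10)]
    for finished in asyncio.as_completed(tasks):
        await finished

asyncio.run(main())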

As you can see in the trace below, the results are gathered asynchronously without a problem. The problem lies with the OpenAI API:

START  372ddb6a-2f00-4a7f-9ff7-e572c0400eaa 12
START  435a40c8-1f4d-4d58-aca0-eb429e91bcae 17
START  d9d63125-3e2d-4fd8-b576-3004f62dc4aa 10
START  8f16c113-7753-455f-a69a-ea9a2c9dff7b 24
START  95d44840-980f-4510-aede-228fa5f84293 9
START  1cda8e33-c0d3-4d6c-9129-98ba187992ec 5
START  5c7e5913-e228-47e1-9e41-a158431d0e12 6
START  c7153185-6f08-4102-9b78-50f87f4f4594 22
START  be1f5ad7-2323-45a0-880f-7b61539a7b25 14
START  122dd570-f5e7-4694-8a75-316a4e9172ae 23
END  1cda8e33-c0d3-4d6c-9129-98ba187992ec 5
END  5c7e5913-e228-47e1-9e41-a158431d0e12 6
END  95d44840-980f-4510-aede-228fa5f84293 9
END  d9d63125-3e2d-4fd8-b576-3004f62dc4aa 10
END  372ddb6a-2f00-4a7f-9ff7-e572c0400eaa 12
END  be1f5ad7-2323-45a0-880f-7b61539a7b25 14
END  435a40c8-1f4d-4d58-aca0-eb429e91bcae 17
END  c7153185-6f08-4102-9b78-50f87f4f4594 22
END  122dd570-f5e7-4694-8a75-316a4e9172ae 23
END  8f16c113-7753-455f-a69a-ea9a2c9dff7b 24

This topic should have been resolved much earlier.

OpenAI decided to worsen the performance of accounts that hadn’t paid significant money, starting with the prepaid accounts.

The forum was left in the dark by OpenAI as the deluge of issues that were specific to organizations came in.

Then the tier system was slowly revealed, with evasive language like “may” and “latency”.

Plainly:
You are throttled, with your token rate slowed or your requests passed off to lower-performance compute, until you have paid OpenAI more than $50 on the API if on the prepay plan - at least that is the threshold for now.

Oh wow, what is going on in that company?

(Actually, what is going on in the tech sector - it feels like everybody is losing their minds)

That’s honestly the only explanation I see. Speed depends on the API key… (org).