I am using the latest version of the async OpenAI Python client.
As you can see in the trace of my calls below, the API calls are extremely slow, and sometimes they hang indefinitely. I am on Tier 1, but my RPM and TPM are well under the hard limits.
I have this issue with both gpt-4-1106-preview and gpt-3.5-turbo-1106.
My code is:
import datetime
import json
from random import randint
from typing import Dict, List, Union

# openai_async_client and num_tokens_from_string are defined elsewhere in my code.

async def call_to_llm_async(system_message: str, messages: List[str], return_json: bool = False) -> Union[str, Dict]:
    # Build the chat history: system prompt first, then alternating user/assistant turns.
    all_messages = [
        {'role': 'system', 'content': system_message},
        *[
            {'role': ('user' if i % 2 == 0 else 'assistant'), 'content': message}
            for i, message in enumerate(messages)
        ]
    ]
    model = 'gpt-4-1106-preview'
    # Random id so the start/end log lines of one call can be matched up.
    x = randint(0, 1000000)
    print(f'start completion {x} {datetime.datetime.utcnow().isoformat()}')
    completion = await openai_async_client.chat.completions.create(
        model=model,
        messages=all_messages,
        response_format={'type': 'json_object' if return_json else 'text'}
    )
    content = completion.choices[0].message.content
    tokens = num_tokens_from_string(f'{all_messages}\n{content}', model=model)
    print(f'end completion {x} {datetime.datetime.utcnow().isoformat()} - {tokens}')
    if return_json:
        try:
            result = json.loads(content)
        except json.JSONDecodeError:
            print(content)
            raise
    else:
        result = content
    return result
The format of the log below is: start|end completion task_id datetime - tokens (input + output)
start completion 692959 2023-12-11T21:37:01.073218
start completion 174451 2023-12-11T21:37:01.131440
start completion 42366 2023-12-11T21:37:01.220825
start completion 622334 2023-12-11T21:37:01.278182
start completion 109161 2023-12-11T21:37:01.322227
start completion 666597 2023-12-11T21:37:01.331944
start completion 305363 2023-12-11T21:37:01.342785
start completion 128124 2023-12-11T21:37:01.350892
start completion 275284 2023-12-11T21:37:01.393148
start completion 618638 2023-12-11T21:37:01.633575
end completion 275284 2023-12-11T21:38:08.203531 - 3354
end completion 128124 2023-12-11T21:38:27.111886 - 3040
end completion 692959 2023-12-11T21:38:39.072215 - 3357
end completion 109161 2023-12-11T21:38:54.443224 - 3308
end completion 666597 2023-12-11T21:39:17.491550 - 3357
end completion 174451 2023-12-11T21:39:34.002941 - 3291
end completion 618638 2023-12-11T21:39:44.970294 - 3182
end completion 305363 2023-12-11T21:39:57.987723 - 3190
end completion 622334 2023-12-11T21:40:08.626620 - 4041
end completion 42366 2023-12-11T21:40:32.624114 - 4160
It seems like you are not asynchronously gathering the results. I'm not sure what your calling code looks like, but you could try this (GPT-4 generated):
import asyncio
import datetime
import json
from typing import Dict, List, Union
# Import other necessary modules here

async def call_to_llm_async(system_message: str, messages: List[str], return_json: bool = False) -> Union[str, Dict]:
    ...  # [rest of your existing function here]

async def main():
    # Example system message
    system_message = "Your system message here"
    # Example list of user messages for different calls
    list_of_user_messages = [
        ["User message set 1"],
        ["User message set 2"],
        ["User message set 3"],
    ]
    # Create tasks for each set of messages
    tasks = [call_to_llm_async(system_message, messages) for messages in list_of_user_messages]
    # Run tasks concurrently
    results = await asyncio.gather(*tasks)
    # Print or process the results
    for result in results:
        print(result)

# Run the main function
asyncio.run(main())
Your results are arriving sequentially, roughly 10 to 25 seconds after each other, so something is not right somewhere. I have async systems running in production environments at the moment and they are not suffering from this.
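For reference, here is a quick way to verify that spacing by parsing the end timestamps from your log above:

import datetime

# End timestamps copied from the log above.
end_times = [
    '21:38:08.203531', '21:38:27.111886', '21:38:39.072215',
    '21:38:54.443224', '21:39:17.491550', '21:39:34.002941',
    '21:39:44.970294', '21:39:57.987723', '21:40:08.626620',
    '21:40:32.624114',
]
parsed = [datetime.datetime.strptime(t, '%H:%M:%S.%f') for t in end_times]
gaps = [round((b - a).total_seconds(), 1) for a, b in zip(parsed, parsed[1:])]
print(gaps)  # gaps of roughly 10-24 seconds between consecutive completions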
It is possible that Tier 1 stacks calls like this and the higher tiers do not; that is my only explanation, other than an issue on your receiving side.
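Either way, it may be worth ruling out a stalled connection on your side by setting an explicit client timeout and capping concurrency. A minimal sketch, assuming the standard AsyncOpenAI constructor; the timeout, retry, and concurrency values here are illustrative, and call_with_limit is just a hypothetical wrapper around your existing function:

import asyncio
import httpx
from openai import AsyncOpenAI

# Illustrative values: fail a stuck request after 60 seconds instead of
# hanging, retry twice, and allow at most 5 requests in flight at once.
openai_async_client = AsyncOpenAI(
    timeout=httpx.Timeout(60.0, connect=5.0),
    max_retries=2,
)
request_semaphore = asyncio.Semaphore(5)

async def call_with_limit(system_message, messages):
    # Hypothetical wrapper: bounds concurrency around call_to_llm_async.
    async with request_semaphore:
        return await call_to_llm_async(system_message, messages)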
Oh, I feel your pain! My plugins seem to have a mind of their own too. Hanging, printing BB code, making a total mess - it's like they're having a party without an invitation. Solidarity in plugin struggles! If only they came with a user manual.
On a more serious note, it sounds like you're dealing with some challenging issues with the async OpenAI Python client. Slow API calls, indefinite hangs, and Tier 1 blues - I hear you. Have you tried any wizardry, or are you hoping for a magical fix? Feel free to share more details, and maybe the collective wisdom here can conjure up a solution!
This topic should have been resolved much earlier.
OpenAI decided to worsen the performance of accounts that hadn’t paid significant money, starting with the prepaid accounts.
The forum was left in the dark by OpenAI as the deluge of organization-specific issues came in.
Then the tier system was slowly revealed, described with evasive language like “may” and “latency”.
Plainly:
Your token generation rate is slowed, or your requests are passed off to lower-performance compute, until you have paid OpenAI more than $50 on the API if you are on the prepay plan - at least, that is the threshold for now.