2025-05-19 17:09:32,904 - INFO - event_handler.py - [_handle_run_created] Run 'run_GeouQ7Kxwq4KAWuKYf3ze8ix' created for thread 'thread_pg3RT23KONlxNsJuDxTm7c3i'.
2025-05-19 17:09:36,648 - INFO - event_handler.py - [_handle_run_queued] Run queued for thread 'thread_pg3RT23KONlxNsJuDxTm7c3i'.
2025-05-19 17:10:28,572 - ERROR - event_handler.py - [start_run] Error during run: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID req_9fabd5879874b0985105e99f395fd40d in your email.)
Traceback (most recent call last):
  File "/home/razvansavin/Proiecte/flexiai-toolsmith/flexiai/core/handlers/event_handler.py", line 73, in start_run
    async for event in run_stream:
    ...<4 lines>...
    self.event_dispatcher.dispatch(etype, event, thread_id)
  File "/home/razvansavin/miniconda3/envs/.conda_flexiai/lib/python3.13/site-packages/openai/_streaming.py", line 147, in __aiter__
    async for item in self._iterator:
    yield item
  File "/home/razvansavin/miniconda3/envs/.conda_flexiai/lib/python3.13/site-packages/openai/_streaming.py", line 193, in __stream__
    raise APIError(
    ...<3 lines>...
    )
openai.APIError: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID req_9fabd5879874b0985105e99f395fd40d in your email.)
2025-05-19 17:10:28,575 - DEBUG - _trace.py - response_closed.started
2025-05-19 17:10:28,575 - DEBUG - _trace.py - receive_response_body.failed exception=GeneratorExit()
2025-05-19 17:10:28,576 - DEBUG - _trace.py - response_closed.complete
A first step is to write code that accurately reports 500-status errors like the one you received, plus error-handling logic that retries a few times, while failing fast on 404 or 429 errors of the “bad model ID” or “not paying your bill” variety.
You might also check how portable your application is to gpt-4.1-mini. It outperforms gpt-4o-mini, except for its tendency to write more than 1,500 tokens.
Here’s the performance of each right now, with all models launched in parallel via asyncio (a sketch of the harness follows the tables). Note that it is nighttime, between 7 am in London and midnight in California.
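Something along the lines of this minimal sketch would do it, assuming the current openai Python SDK (>= 1.x); the function name, retry counts, and backoff are my own illustration, not your code:

```python
# Hedged sketch: retry transient server errors, fail fast on non-retryable statuses.
import asyncio
import logging

import openai

client = openai.AsyncOpenAI()

# Per the advice above: bad model ID (404) or quota/billing (429) won't improve on retry.
NON_RETRYABLE_STATUS = {404, 429}

async def chat_with_retry(messages, model="gpt-4.1-mini", max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return await client.chat.completions.create(model=model, messages=messages)
        except openai.APIStatusError as e:
            # An HTTP response came back with an error status code attached.
            logging.error("HTTP %s on attempt %d: %s", e.status_code, attempt, e.message)
            if e.status_code in NON_RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except openai.APIError as e:
            # Covers mid-stream 500-style failures like the one in your traceback.
            logging.error("APIError on attempt %d: %s", attempt, e)
            if attempt == max_attempts:
                raise
        await asyncio.sleep(2 ** attempt)  # simple exponential backoff
```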
1024 max tokens

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-mini | 10 | 0.938 | 36.083 |
| gpt-4.1-mini | 10 | 0.749 | 72.579 |

512 max tokens

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-mini | 3 | 0.704 | 49.785 |
| gpt-4.1-mini | 3 | 0.753 | 62.024 |

128 max tokens

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-mini | 3 | 0.812 | 38.981 |
| gpt-4.1-mini | 3 | 0.682 | 59.107 |
(Prompt caching is defeated by a varying nonce at token position 0.)
And then, five hours later, the generation rates of the models have reversed, nullifying any recommendation.
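For reference, here is a rough sketch of the kind of asyncio harness behind the numbers above. It is illustrative only: the prompt, the chunk-as-token counting, and the exact timing points are assumptions, but it shows the parallel launches, the cache-breaking nonce at message position 0, and the non-default top_p discussed further down.

```python
import asyncio
import time
import uuid

import openai

client = openai.AsyncOpenAI()
MODELS = ["gpt-4o-mini", "gpt-4.1-mini"]

async def one_trial(model: str, max_tokens: int) -> tuple[float, float]:
    nonce = str(uuid.uuid4())
    messages = [
        # A varying nonce at position 0 breaks prompt-cache reuse between trials.
        {"role": "system", "content": f"{nonce} is chat session ID."},
        {"role": "user", "content": "Write an essay about latency benchmarking."},
    ]
    start = time.monotonic()
    stream = await client.chat.completions.create(
        model=model, messages=messages, max_tokens=max_tokens, stream=True, top_p=0.001
    )
    first = None
    chunks = 0
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.monotonic()  # time to first token = "latency"
            chunks += 1
    end = time.monotonic()
    latency = (first or end) - start
    rate = chunks / max(end - (first or start), 1e-6)  # rough tokens/s, 1 chunk ≈ 1 token
    return latency, rate

async def main(max_tokens: int = 1024, trials: int = 10):
    # Launch every trial for every model in parallel, then average per model.
    jobs = [(m, one_trial(m, max_tokens)) for m in MODELS for _ in range(trials)]
    results = await asyncio.gather(*(coro for _, coro in jobs))
    for model in MODELS:
        rows = [r for (m, _), r in zip(jobs, results) if m == model]
        print(f"{model}: avg latency {sum(r[0] for r in rows)/len(rows):.3f} s, "
              f"avg rate {sum(r[1] for r in rows)/len(rows):.3f} tokens/s")

asyncio.run(main())
```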
1024 max tokens before

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-mini | 10 | 0.891 | 43.081 |
| gpt-4.1-mini | 10 | 0.683 | 65.232 |

1024 max tokens now

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-mini | 10 | 0.951 | 51.417 |
| gpt-4.1-mini | 10 | 1.084 | 37.569 |

768 max tokens now

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-mini | 10 | 0.759 | 53.049 |
| gpt-4.1-mini | 10 | 0.751 | 39.203 |
If you’re curious: this blast of calls disrupts any cache by inserting an initial random system message of the form “{session_id} is chat session ID.”, which differs not just in content but in token length. The requests are not large enough to receive a cache discount; however, my prior statistical distributions showed a performance difference correlated with cacheability even when no discount applied.
top_p: 0.001 reflects the typical developer’s desire to control sampling; note that departing from the defaults for temperature or top_p also affects performance.
Another idea: don’t route your calls through Assistants as a middleman, which adds its own variable delays. Multiple API calls are required to set it up and run, with little benefit beyond its internal tools and thread reuse (which reduces the network transmission of historic messages).
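As a rough illustration (model name, prompt, and history handling are placeholders), calling Chat Completions directly and carrying the conversation yourself is a single request per turn:

```python
# Hedged sketch: skip the Assistants thread/run flow and keep your own "thread".
import openai

client = openai.OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(model="gpt-4.1-mini", messages=history)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # reuse on the next turn
    return answer

print(ask("Hello!"))
```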