Intermittent failed runs in the Assistants API when generating images with Code Interpreter

I have deployed an app that uses the beta Assistants API to build graphs from a CSV uploaded to a thread. The assistant uses Code Interpreter to parse and graph the CSV: it generates the graphs with matplotlib and exports them as images.

After deploying my changes, I noticed that runs fail fairly often (roughly 1 in 10 tries). The interesting thing is that Code Interpreter appears to have run fine (it generated a valid image/graph), but the run step is marked as failed, with last_error set to {"code": "server_error", "message": "Sorry, something went wrong."}. As a result, no message containing the link to the image is generated, which leaves my users with a bad experience.
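For anyone hitting the same thing, here is a rough sketch of the workaround I would try: retry the run when it fails with the server_error shown above. It is only a sketch under assumptions. `start_run` is a hypothetical callable (not part of the OpenAI SDK) that you would write yourself to create a run, poll it until it reaches a terminal state, and return it as a dict with `status` and `last_error` keys. `is_transient_failure` and `run_with_retry` are also names I made up for illustration.

```python
import time
from typing import Any, Callable, Mapping


def is_transient_failure(run: Mapping[str, Any]) -> bool:
    """Return True if the run failed with the server_error code from this report."""
    last_error = run.get("last_error") or {}
    return run.get("status") == "failed" and last_error.get("code") == "server_error"


def run_with_retry(
    start_run: Callable[[], Mapping[str, Any]],  # hypothetical: creates and polls one run
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call start_run up to max_attempts times, backing off between server_error failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    run = start_run()
    attempt = 1
    while is_transient_failure(run) and attempt < max_attempts:
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        run = start_run()
        attempt += 1
    return run
```

Note that each retry starts a fresh run on the thread, so Code Interpreter executes again and may produce a second image. Failures with any other error code are returned immediately rather than retried.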

To check whether the generated code was the problem, I asked for the same graph in a separate thread. That run completed fine, and the code it wrote to generate the graph was functionally identical.

I verified that these errors didn’t occur during OpenAI incidents. My guess is that this is a provisioning issue with the sandboxes that Code Interpreter uses to run code.

I really appreciate all the hard work you guys have put into this new endpoint and Code Interpreter. The results we’ve been getting have been truly stunning! If you need more context on the issue, don’t hesitate to ask!

ThreadIds with failed runs: