Assistance Needed with File Processing Tracking in OpenAI GPT-4 API

  1. File Upload to OpenAI: I’m successfully uploading files to OpenAI using the /v1/files endpoint. This returns a file_id, which indicates that the file upload was successful. However, this doesn’t confirm whether the file has been fully processed and is ready to use.
  2. File Processing Status: My main issue is tracking the processing status of these uploaded files. The OpenAI API doesn’t seem to provide a direct method to check if a file is fully processed and ready for use.
  3. Use of file_id in Threads: I’m using the file_id in threads for text generation, but without knowing the file’s processing status, I run into uncertainties. If the file isn’t ready, it might lead to errors or unsatisfactory responses from the assistant.
  4. Current Approach: Currently, I’m introducing a delay between file upload and its usage in a thread, hoping the file will be processed in this time. I also have error handling for cases where the file might not be ready. However, this approach doesn’t guarantee that the file is processed.

I am seeking advice or solutions from anyone who has experience with handling file processing status in the OpenAI GPT-4 API, particularly for asynchronous file processing scenarios. Any insights or alternative approaches to ensure a file is ready for use in threads would be greatly appreciated.

Thank you!

I will admit I have not been using ‘huge’ files, but I have been using a lot of files and never ran into problems with them being ‘ready’. It seems to be more or less instantaneous. Did you run into problems before?
I run email processing where I upload all email attachments and then run a thread with those files attached.

I agree it would be nice to have a ‘formal’ status!

I have the same problem, but it does not happen all the time. Some days my OpenAI assistant bot uploads files, I query their content, and I receive textual and graphical answers. Other days I receive a standard negative answer such as “I cannot find any csv file, did you upload it?”. It is unpredictable.
According to the official documentation, after uploading a file it is necessary to attach it to the thread. This is probably the only way to “inform” the assistant that there is a file to take into consideration. After the upload, the OpenAI server sends the following (printed with PHP print_r):

OpenAI\Responses\Files\CreateResponse Object
(
    [id] => file-0KD1uVuJMNHaC2Fd7mZw3CBQ
    [object] => file
    [bytes] => 189220
    [createdAt] => 1706604908
    [filename] => myname.csv
    [purpose] => assistants
    [status] => processed
    [statusDetails] =>
    [meta:OpenAI\Responses\Files\CreateResponse:private] => OpenAI\Responses\Meta\MetaInformation Object
        (
            [requestId] => f058fec0f51234281410e4b3d7vn80ed
            [openai] => OpenAI\Responses\Meta\MetaInformationOpenAI Object
                (
                    [model] =>
                    [organization] => user-f5qx9xvfkopf71xboa67dsrn
                    [version] => 2020-10-01
                    [processingMs] => 395
                )
            [requestLimit] =>
            [tokenLimit] =>
        )
)

As you can see, the response contains no information about the specific assistant.
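Since the upload response carries a `status` field, one option is to poll that field instead of sleeping for a fixed time. The following is only a minimal sketch: `waitForFileStatus` and its parameters are hypothetical names, the status values `uploaded`, `processed` and `error` are my understanding of the file object and should be checked against the current API reference, and the status lookup is passed in as a callable so the helper does not depend on any particular client version.

```php
<?php
declare(strict_types=1);

/**
 * Poll a status source until it reports $wanted, an error, or a timeout.
 *
 * $fetchStatus is a hypothetical callable that returns the file's current
 * status string (for example by retrieving the file object from the API).
 * Returns true only if $wanted was observed before the deadline.
 */
function waitForFileStatus(
    callable $fetchStatus,
    string $wanted = 'processed',
    float $timeoutSeconds = 30.0,
    float $intervalSeconds = 1.0
): bool {
    $deadline = microtime(true) + $timeoutSeconds;
    while (true) {
        $status = $fetchStatus();
        if ($status === $wanted) {
            return true;
        }
        // Assumption: 'error' means processing failed and will not recover.
        if ($status === 'error' || microtime(true) >= $deadline) {
            return false;
        }
        usleep((int) round($intervalSeconds * 1000000));
    }
}

// Assumed usage with the client from this thread; files()->retrieve() and
// its status property should be verified against your client version:
// $ready = waitForFileStatus(
//     fn () => $clientOpenai->files()->retrieve($fileId)->status
// );
```

If the helper returns false, the file can be skipped or the upload retried before any message is created. Note that a `processed` status only says the upload was handled; it may not rule out the intermittent “cannot find file” answers described above.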

After uploading the file, I attach the file ID to the thread (PHP code):

// Attach the uploaded file to a user message on the thread.
$res = $clientOpenai->threads()->messages()->create(
    $threadId,
    ['role' => 'user', 'content' => $userMessage, 'file_ids' => [$fileId]]
);
// Start a run and poll once per second until it completes.
$run = $clientOpenai->threads()->runs()->create(threadId: $threadId, parameters: ['assistant_id' => $botOpenaiId]);
$runId = $run->id;
do {
    sleep(1);
    $res = $clientOpenai->threads()->runs()->retrieve(threadId: $threadId, runId: $runId);
} while ($res->status !== 'completed');

Then I get an object that contains arrays and other objects.
Having carefully analysed the object received as the answer, I do not detect any anomaly. Therefore, I can only attribute this to “Beta” problems. I would like to find other kinds of problems that I can solve myself; Beta problems can be solved only on the OpenAI side.
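One thing worth ruling out on the client side: a loop that waits only for `completed` never exits if the run ends in another state. Below is a hedged sketch of a bounded polling helper. `waitForRun` is a hypothetical name, the list of final statuses reflects my understanding of the Assistants API and should be checked against the current documentation, and `'timeout'` is a local marker rather than an API value.

```php
<?php
declare(strict_types=1);

/**
 * Poll a run until it stops progressing or the attempt budget is used up.
 *
 * $fetchRunStatus is a hypothetical callable returning the run's current
 * status string. Returns the last decisive status, or 'timeout'.
 */
function waitForRun(
    callable $fetchRunStatus,
    int $maxAttempts = 120,
    float $intervalSeconds = 1.0
): string {
    // Assumption: after these statuses the run does not change on its own.
    // 'requires_action' is included because polling alone cannot advance it.
    $stopStatuses = ['completed', 'failed', 'cancelled', 'expired', 'requires_action'];
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        if ($attempt > 0) {
            usleep((int) round($intervalSeconds * 1000000));
        }
        $status = $fetchRunStatus();
        if (in_array($status, $stopStatuses, true)) {
            return $status;
        }
    }
    return 'timeout';
}

// Assumed usage, reusing the retrieve call from the snippet above:
// $status = waitForRun(
//     fn () => $clientOpenai->threads()->runs()->retrieve(
//         threadId: $threadId,
//         runId: $runId
//     )->status
// );
// if ($status !== 'completed') { /* log $status and decide whether to retry */ }
```

Logging the returned status on the bad days would at least show whether the run itself failed or whether it completed and the assistant simply did not see the file.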

Thanks for the response. I am still testing. I have been working with a Gradio interface, and I am not sure it is the appropriate choice. I see an inconsistency in the way I manage the events, and therefore in the file upload process as well. I wrote a hard-coded script that uploads a file and has the custom assistant analyze it, and it worked. I will try to redo my Gradio setup with simpler event handling to see whether files are managed correctly in the thread message. I will keep you posted.

One important point: as I understand the documentation, you first upload the file, and then use the file ID returned in the response to link it to the thread message. Am I right? Suggestions are welcome. Regards

Actually, the first upload has been a headache. I solved it by first sending a message in the background, that is, a message the user does not see, such as: “I am about to upload a file.” The OpenAI server then responds with a thread ID, and the uploaded file is attached to that thread.

Interesting. Do you have a code example? So you create another thread only for the file?

I do not create a thread for each file; I create a thread only when the file upload is the first user action. A running thread is reused, so there is no need to create a new one.
Here is the PHP example:
// Hidden priming message, sent before the upload.
$userMessage = 'You are about to receive a file';
if (strlen($lastThreadId) < 6) {
    // No usable thread ID yet: create a thread and run it in one call.
    $res = $clientOpenai->threads()->createAndRun([
        'assistant_id' => $botOpenaiId,
        'thread' => ['messages' => [['role' => 'user', 'content' => $userMessage]]],
    ]);
    $runId = $res->id;
    $threadId = $res->threadId;
} else {
    // Reuse the running thread.
    $threadId = $lastThreadId;
    $res = $clientOpenai->threads()->messages()->create(
        $threadId,
        ['role' => 'user', 'content' => $userMessage]
    );
    $run = $clientOpenai->threads()->runs()->create(threadId: $threadId, parameters: ['assistant_id' => $botOpenaiId]);
    $runId = $run->id;
}
// Wait for the priming run to complete.
do {
    sleep(1);
    $res = $clientOpenai->threads()->runs()->retrieve(threadId: $threadId, runId: $runId);
} while ($res->status !== 'completed');
sleep(1);
// Upload the file.
$fileRes = $clientOpenai->files()->upload(['purpose' => 'assistants', 'file' => fopen($fileToUpload, 'r')]);
$fileId = $fileRes->id;
sleep(1);
// Attach the file to a new message on the thread and run the assistant again.
$userMessage = 'You received a file id = ' . $fileId;
$res = $clientOpenai->threads()->messages()->create(
    $threadId,
    ['role' => 'user', 'content' => $userMessage, 'file_ids' => [$fileId]]
);
$run = $clientOpenai->threads()->runs()->create(threadId: $threadId, parameters: ['assistant_id' => $botOpenaiId]);
$runId = $run->id;
// Wait for the run that reads the file to complete.
do {
    sleep(1);
    $res = $clientOpenai->threads()->runs()->retrieve(threadId: $threadId, runId: $runId);
} while ($res->status !== 'completed');