Thread unable to access newer assistant files

Hi everyone,

I’ve been running into a problem I can’t seem to fix.

When I create an assistant, upload a file, and then create a thread, everything works and I can query the uploaded file.

But if I then replace the file with a newer version (a new data file to read from), my existing thread can no longer read it.

So it looks like older threads aren’t able to access newer assistant files.

What I do: whenever a new data file is uploaded, I delete the old one (so I don’t end up with a gigantic pile of files). Is this wrong?

Has anyone else run into this problem?


I use PHP; my code is below.

1. Creation of my assistant

 $assistant = OpenAI::assistants()->create([
      'instructions' => $instructionPrompt,
      'name' => $name,
      'tools' => [
          [
              'type' => 'code_interpreter',
          ],
          [
              'type' => 'retrieval',
          ]
      ],
      'model' => 'gpt-3.5-turbo-1106',
  ]);

2. Uploading my JSON data file and attaching it to the assistant

$response = OpenAI::files()->upload([
      'purpose' => 'assistants',
      'file' => $json,
  ]);

  if (!$response || !$response->id) throw new \Exception('Error [1]: could not upload file');

  $response = OpenAI::assistants()->files()->create($assistantId, [
      'file_id' => $response->id,
  ]);
  if (!$response || !$response->id) throw new \Exception('Error [2]: could not attach file to assistant');

3. Adding my message to the thread (the fileId is a valid id, triple-checked)

 $message = OpenAI::threads()->messages()->create(
    $threadId,
    [
        'role' => 'user',
        'content' => $message,
        'file_ids' => [$fileId],

    ],
);

Hi @nickQ , and welcome to the community!

It really depends on the use-case. Are these data files incremental information over previous versions?

If so, then yes, you should avoid piling up files as much as possible, since you are charged for storage.

Now, to your original issue: this is expected. If you want to delete the old files, there are two possible solutions:

  1. You can add a file not only to an assistant but also to a thread. So once you’ve uploaded the new file to your assistant, you could also add the file id to any active threads.
  2. When an old thread becomes active again (the user sends a new message), copy all of that thread’s history into a new thread, which will have access to the newest files. This carries the context over while gaining access to the new files.

Cheers!

Hi @jorgeintegrait !

Thanks for your response!

These are indeed incremental changes, so I don’t want to pile them up.

How can you add a file_id to an active thread? I’m not seeing it in the docs… through the modify function?

I was already doing it using the messages create endpoint, but no luck.

Yes, it means adding a message that has no content (or empty content) but carries the fileId (the one you get when you upload the file to the /files endpoint).

https://platform.openai.com/docs/api-reference/messages/createMessage

You don’t need to create a run or modify the thread (sadly modifying the thread is only good for metadata).
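A minimal sketch of that first option, assuming the file_ids message field used earlier in this thread. buildFileAttachMessages is a hypothetical helper that prepares one create-message payload per active thread. It uses a short placeholder note rather than an empty string as a precaution, since it is not confirmed here whether the endpoint accepts empty content.

```php
<?php
// Hypothetical helper: one create-message payload per active thread, each
// pointing at the newly uploaded file via the 'file_ids' field.
function buildFileAttachMessages(
    array $threadIds,
    string $newFileId,
    string $note = 'Updated data file attached.' // placeholder text (assumption)
): array {
    $payloads = [];
    foreach ($threadIds as $threadId) {
        $payloads[$threadId] = [
            'role' => 'user',
            'content' => $note,
            'file_ids' => [$newFileId],
        ];
    }
    return $payloads;
}

// Usage with the client from this thread (not executed here, no run is created):
// foreach (buildFileAttachMessages($activeThreadIds, $newFileId) as $threadId => $payload) {
//     OpenAI::threads()->messages()->create($threadId, $payload);
// }
```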

The file must have been uploaded with purpose 'assistants'. I’ve worked with this before, so it should work. Let me know what issue you see if it doesn’t. If you’re not 100% sure how to do this, you can use the Playground, add a file as a message, and look at the API calls listed on the right side.

Good luck!

Thanks for your suggestion, @jorgeintegrait. Unfortunately I can’t get it to work… As soon as I upload a new file and delete the old one, my thread can no longer access the data and just throws errors, which makes it useless once a new file is present…

This is how I’m doing it now

Step 1. Create my assistant

$assistant = OpenAI::assistants()->create([
  'instructions' => $instructionPrompt,
  'name' => $name,
  'tools' => [
      [
          'type' => 'code_interpreter',
      ]
  ],
  'model' => 'gpt-3.5-turbo-0125',
]);

Step 2. Upload my initial data file

$file = OpenAI::files()->upload([
    'purpose' => 'assistants',
    'file' => $json,
]);
$fileId = $file->id; // id of the uploaded file, used in Step 3

Step 3. Attach my file to my assistant

OpenAI::assistants()->files()->create($assistantId, [
      'file_id' => $fileId,
  ]);

Up to this point it works

Next, a new file is available

Step 4. Delete the current file from assistant and directory

$response = OpenAI::assistants()->files()->delete(
    assistantId: $assistantId,
    fileId: $fileId
);
// delete from file directory
$response = OpenAI::files()->delete($fileId);

Step 5. Upload my new file and attach it to my assistant

$file = OpenAI::files()->upload([
    'purpose' => 'assistants',
    'file' => $json,
]);
$fileId = $file->id; // the NEW file's id, used below and in Step 6
OpenAI::assistants()->files()->create($assistantId, [
      'file_id' => $fileId,
  ]);

Step 6. Update each existing thread with an empty message and fileId

OpenAI::threads()->messages()->create(
      $thread->openai_assistant_thread_id,
      [
          'role' => 'user',
          'content' => '',
          'file_ids' => [$fileId],
      ],
  );

BREAKS

This is the point where the thread breaks and no data can be accessed any more

Can you spot what I’m doing wrong?
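One detail worth double-checking in Steps 5 and 6: both must use the id returned by the new upload ($file->id). If $fileId still holds the id from Step 3, the assistant and the threads are pointed at the file that was just deleted. The sketch below runs Steps 4 to 6 in order and returns the new id; replaceAssistantFile and its four callables are hypothetical stand-ins for the OpenAI:: calls shown above, so the ordering can be checked without network access.

```php
<?php
// Hypothetical routine for Steps 4-5. Each callable wraps one client call:
//   $detachFromAssistant(string $fileId): remove file from the assistant
//   $deleteFile(string $fileId):          delete the file itself
//   $uploadFile(string $json): string     upload, returning the NEW file id
//   $attachToAssistant(string $fileId):   attach file to the assistant
function replaceAssistantFile(
    string $oldFileId,
    string $json,
    callable $detachFromAssistant,
    callable $deleteFile,
    callable $uploadFile,
    callable $attachToAssistant
): string {
    $detachFromAssistant($oldFileId);
    $deleteFile($oldFileId);

    // Keep the id returned by the upload; reusing the old id here would point
    // the assistant (and later thread messages) at a file that no longer exists.
    $newFileId = $uploadFile($json);
    $attachToAssistant($newFileId);

    return $newFileId; // pass this id to Step 6 as well
}
```

To wire it up, each callable would wrap the matching call from Steps 4 and 5; for example, the upload callable would return OpenAI::files()->upload([...])->id.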

Hm. I’ll admit PHP isn’t my tool of choice, but I can’t see anything obviously wrong. I would need to run and debug it. Is there a specific error message you see?

If there is an internal error, I assume the culprit is the file deletion. If that is the case, there is only one more suggestion I can think of:

Instead of your current Step 6, you could create a new thread containing all the message history of the original thread, i.e.:

  1. Delete previous file version.
  2. Upload new file, attach to assistant.
  3. List all the messages in original thread
  4. Create a new thread with all the same messages (user & assistant). Do NOT create a run.
  5. When a new user message comes, add it to the new thread, not the old one.
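A sketch of steps 3 and 4 above, assuming the old thread's messages have been listed and decoded into arrays with a role, a created_at timestamp, and a content list of text parts. The hypothetical buildThreadCopyPayload sorts them chronologically, keeps only the text, drops the old file_ids (that file was deleted), and returns a payload for creating the new thread. Whether assistant-role messages are accepted when creating a thread may depend on the API version, so a flag folds them into user messages as a fallback.

```php
<?php
// Hypothetical helper: turns messages listed from an old thread into the
// payload for creating a new thread. Assumes each message looks like
// ['role' => 'user', 'created_at' => 1700000000,
//  'content' => [['type' => 'text', 'text' => ['value' => '...']]]].
function buildThreadCopyPayload(array $messages, bool $assistantRoleSupported = true): array
{
    // Listing returns newest first by default, so restore chronological order.
    usort($messages, fn (array $a, array $b): int => $a['created_at'] <=> $b['created_at']);

    $copied = [];
    foreach ($messages as $message) {
        // Keep text parts only; old file_ids are dropped on purpose.
        $parts = [];
        foreach ($message['content'] as $part) {
            if (($part['type'] ?? '') === 'text') {
                $parts[] = $part['text']['value'];
            }
        }
        $text = implode("\n", $parts);
        if ($text === '') {
            continue; // e.g. empty messages that only carried file_ids
        }
        $role = $message['role'];
        if ($role === 'assistant' && !$assistantRoleSupported) {
            // Fallback for API versions that only accept user messages here.
            $role = 'user';
            $text = '(Earlier assistant reply) ' . $text;
        }
        $copied[] = ['role' => $role, 'content' => $text];
    }

    return ['messages' => $copied];
}
```

With the client used in this thread, the input could come from OpenAI::threads()->messages()->list($oldThreadId, ['limit' => 100])->toArray()['data'] (assuming the response's toArray() method; longer threads would need pagination), and the result would be passed to OpenAI::threads()->create(...) without starting a run.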

You could also wait until a new message arrives on a thread before updating it, to avoid a flood of API calls updating all your active threads at once.

At that point, the new thread will have the latest file, and you wouldn’t need to add it as an empty message.
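The wait-for-the-next-message variant can be sketched as a small bookkeeping class. LazyThreadMigrator and its methods are hypothetical: the rebuild closure stands in for the list-and-copy steps above, and the in-memory array would in practice be stored in your database next to openai_assistant_thread_id.

```php
<?php
// Hypothetical lazy migration: a conversation's thread is rebuilt only when
// the user writes to it again and the data file has changed since.
final class LazyThreadMigrator
{
    /** @var array<string, array{threadId: string, fileId: string}> */
    private array $state = [];

    // $rebuild copies an old thread's history into a new thread, returns its id.
    public function __construct(private \Closure $rebuild) {}

    public function register(string $conversation, string $threadId, string $fileId): void
    {
        $this->state[$conversation] = ['threadId' => $threadId, 'fileId' => $fileId];
    }

    // Returns the thread id that a new user message should be added to.
    public function threadForNewMessage(string $conversation, string $currentFileId): string
    {
        if (!isset($this->state[$conversation])) {
            throw new \InvalidArgumentException("Unknown conversation: $conversation");
        }
        $entry = $this->state[$conversation];
        if ($entry['fileId'] !== $currentFileId) {
            // Thread was built against an older file: copy it into a new thread.
            $newThreadId = ($this->rebuild)($entry['threadId']);
            $entry = ['threadId' => $newThreadId, 'fileId' => $currentFileId];
            $this->state[$conversation] = $entry;
        }
        return $entry['threadId'];
    }
}
```

Threads nobody writes to again are never touched, which keeps the number of API calls proportional to actual activity.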

As a final note, these types of limitations are why we use custom RAG stacks in our real production projects. We play with the Assistants API for quick prototypes, but it is a tad unreliable and limiting at the moment.

Sorry the first idea didn’t work, but the approach above should definitely work (albeit with quite a few API calls).

Best of Luck, and happy building!