Assistants API - File "purpose" confusion

When uploading files to the Assistants API, we must assign them a “purpose” at time of upload. The issue I’m having is when the user subsequently asks to use the files for multiple different purposes, results are very unreliable / unhelpful.

Issues below assume an assistant configured with file_search enabled, code_interpreter enabled, and vision support (i.e. using a 4o model)

Scenario 1:

  1. User uploads 2 images, and we assign a purpose of vision to them
  2. The user asks questions about them… works great for this use case.
  3. However, if they subsequently ask to create a .zip containing those two files, the assistant thinks it can do so via code interpreter… but subsequently fails in strange ways when it tries to do it. I’ve seen it either refuse (“I can’t seem to locate the files”) or generates a zip with random programmatically-generated files created based on the prior description of the image (e.g. image of a solid red square when the provided image was described as being something red, etc.)

Scenario 2:

  1. User uploads a PDF, and we assign a purpose of file_search
  2. The user asks questions about the doc contents… works great for this use case.
  3. However, if they then ask to extract all the images from that PDF, it tries to do so using code_interpreter (and theoretically could, if its purpose was different), but fails… as, like before, it can’t actually locate the file in the code_interpreter sandbox

I understand the underlying issue is that the purposes determine where the files are stored internally and how they are managed within the Assistant… but my question is: what is the recommended way to handle these sorts of scenarios?

  • Is there a way to assign multiple purposes to the same file?
  • Should we pre-emptively upload every file multiple times to cover each potential purpose?
  • Is there some way to make the assistant less… confused… about what files it has access to within different tools?
  • Is there a way to identify when the user’s prompt is trying to utilize a file for a different purpose and then re-upload the file or re-assign its purpose after-the-fact?

Eager to hear any suggestions, or if anyone has worked through something similar.

Thanks!!

1 Like

When the AI is passed an image, it isn’t given any indication that the picture it is seeing is a “file” or an “upload”, unless you talk to it in such a manner.
You cannot prevent the user from confusing the AI, though. They might say, “I uploaded my screenshot.jpg, have a look…and then convert it to the best size PNG”.

You also can be meaning to supplement vision information with metadata, such as adding [text:“image filename 123.jpg”, image: picture, text:“image filename 456.jpg”, image: picture] as user message contents. Such interleaving can help the AI answer correctly about them, but could also give misinformation about their availability as python files.

So: A code interpreter session costs you. It shouldn’t cost any more to upload the files also as purpose:assistants and then attach them to the code interpreter tool. (A multi-purpose files endpoint would be useful but is not available.)

What I would do, regardless of vision attachment or not, is provide what ChatGPT has but isn’t automatically given to you in assistants, some additional instructions:

The python notebook environment tool currently has these read-only files provided by {user | developer}:
/mnt/data/using_api.txt
/mnt/data/interface.jpg
/mnt/data/screenshot.jpg
/mnt/data/pdf_image.png

That may reduce hallucinations of what can be done there.

1 Like

Thanks so much for the feedback, @_j . It helped me realize that I was conflating the file.purpose and attachment.tools and that ultimately led me to resolutions for both scenarios I listed above.

So, just to close the loop on the solution I landed on…

Scenario 2: Using a file for both file_search + code_interpreter

As was pointed out in the reply above, purpose: assistants covers both the code_interpreter and file_search tools, as it’s just a matter of specifying both of those tools when attaching the file to a message. (for some reason I also thought you could only pick one tool there, which was incorrect). Once I provided both tools with the attachment, everything worked perfectly: it used file search when appropriate, and accessing the file from within the code_interpreter’s sandbox when appropriate.

Scenario 1: Using a file for both vision + code_interpreter

This one was a bit more off the beaten path, but I figured I’d just try it… and it worked! If you upload an image with purpose: vision, you can then also attach it to a message specifying the code_interpreter tool, and it is then able to be manipulated and used from within the code_interpreter sandbox. So, scenario 1 specified above works: you can ask about the image contents, and it will answer using the vision capability but you can also ask it to resize/reformat/etc. and the code_interpreter is able to find the file and manipulate it.

It would seem that attaching it with the code_interpreter tool specified simply means it gets copied to the sandbox when the code interpreter session runs, regardless of the specified purpose. I’ve not seen anywhere that explicitly says this should be possible, but I also don’t see anywhere that says it shouldn’t be. And in my testing, it worked every time.

Additional Note:
Similar to the recommendation above, I also insert a message describing all the files that have been uploaded, including their fileId, a display name, and primary purpose and/or tool (to help address scenarios where the user might refer to the filename they used when uploading). This seems to help with confusion/hallucination around which file a user might be referring to.

Hopefully this is of some help to anyone looking to achieve the same use case.

1 Like