When uploading files to the Assistants API, we must assign them a “purpose” at upload time. The issue I’m having is that when the user subsequently asks to use the files for several different purposes, the results are very unreliable or unhelpful.
Issues below assume an assistant configured with file_search enabled, code_interpreter enabled, and vision support (i.e. using a 4o model).
Scenario 1:
- User uploads 2 images, and we assign a purpose of vision to them
- The user asks questions about them… works great for this use case.
- However, if they subsequently ask to create a .zip containing those two files, the assistant thinks it can do so via code interpreter… but then fails in strange ways when it tries. I’ve seen it either refuse (“I can’t seem to locate the files”) or generate a zip of programmatically generated files based on its earlier description of the image (e.g. an image of a solid red square when the provided image was described as being something red, etc.)
Scenario 2:
- User uploads a PDF, and we assign a purpose of file_search
- The user asks questions about the doc contents… works great for this use case.
- However, if they then ask to extract all the images from that PDF, it tries to do so using code_interpreter (and theoretically could, if the file’s purpose were different), but fails… as, like before, it can’t actually locate the file in the code_interpreter sandbox.
I understand the underlying issue is that the purposes determine where the files are stored internally and how they are managed within the Assistant… but my question is: what is the recommended way to handle these sorts of scenarios?
- Is there a way to assign multiple purposes to the same file?
- Should we pre-emptively upload every file multiple times to cover each potential purpose?
- Is there some way to make the assistant less… confused… about what files it has access to within different tools?
- Is there a way to identify when the user’s prompt is trying to utilize a file for a different purpose and then re-upload the file or re-assign its purpose after-the-fact?
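To make the "cover each potential purpose" idea concrete, here is a minimal Python sketch of what I have in mind: attaching one uploaded file to both tools on a message, rather than uploading it twice. This is an assumption on my part, not something I've confirmed fixes the problem. The helper names (`build_attachment`, `send_with_attachments`) are made up for illustration, `client` is assumed to be an `openai.OpenAI()` instance, and the file is assumed to have been uploaded already with a purpose that both tools accept.

```python
from typing import Any

# Tool type names accepted in per-message attachments.
KNOWN_TOOLS = ("file_search", "code_interpreter")


def build_attachment(file_id: str, tools: tuple[str, ...] = KNOWN_TOOLS) -> dict[str, Any]:
    """Build one attachment entry that exposes a file to each listed tool."""
    unknown = [t for t in tools if t not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"unsupported tool(s): {unknown}")
    return {"file_id": file_id, "tools": [{"type": t} for t in tools]}


def send_with_attachments(client: Any, thread_id: str, text: str, file_ids: list[str]) -> Any:
    """Post a user message with every file attached to both tools.

    `client` is assumed to be an openai.OpenAI() instance (hypothetical usage,
    not exercised in this sketch).
    """
    return client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=text,
        attachments=[build_attachment(fid) for fid in file_ids],
    )
```

For the images in Scenario 1 I would presumably pass only `("code_interpreter",)`, since as far as I know file_search does not index image formats. Whether a file uploaded with the vision purpose can be attached this way at all is exactly the part I'm unsure about.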
Eager to hear any suggestions, or if anyone has worked through something similar.
Thanks!!