Hi,
When adding a message to a thread, attachments can be added to the message. Each attachment includes a file_id and tools, and the tools accept file_search or code_interpreter as the type.
Before the last update, the file was added without specifying a tool, and the assistant would decide whether the file should go to the code interpreter or to a function.
I am building an assistant that includes the code interpreter and a function for image analysis, so before the last update I would include the file or the image with the message and the assistant would decide whether to call the code interpreter or the image-analyzing function.
Now there are tool types for the attached file, so I am wondering whether the assistant will work as before if the tools parameter is omitted.
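For concreteness, here is a minimal sketch of the two variants with the Python SDK; the thread and file IDs are placeholders, and whether the second variant behaves like the old API is exactly what I am asking:

```python
from openai import OpenAI

client = OpenAI()

# Variant 1: attach the file and name the tool explicitly (code_interpreter or file_search).
client.beta.threads.messages.create(
    thread_id="thread_abc123",  # placeholder thread ID
    role="user",
    content="Here is the file to work with.",
    attachments=[
        {"file_id": "file_abc123", "tools": [{"type": "code_interpreter"}]}
    ],
)

# Variant 2: attach the file and omit "tools" entirely -- whether the assistant
# then routes the file on its own, as before the update, is the open question.
client.beta.threads.messages.create(
    thread_id="thread_abc123",
    role="user",
    content="Here is the file to work with.",
    attachments=[{"file_id": "file_abc123"}],
)
```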
The file isn’t really “sent” by the AI. It’s already in the tool.
If you attach a file to code interpreter, it is made available at the Python environment’s mount point, and a message is placed with those file names. Code analysis could then be done, e.g. “histogram of pixel brightness”.
If you attach to file search, a new vector store is created if the thread doesn’t have one, and the file becomes something the AI can search. The file contents are not always present in context, as “attachment” may imply. Images are not a supported input for file search.
In the case of using an external vision tool to analyze images, how you implement that is up to you, but the files would stay on your server to be sent to a vision model. It could be automatic message injection with some analysis already done, or additional_instructions, like:
<<the user has placed images [{“hostel.jpg”: “A narrow brick building with a doorway and window”},] into the analyze_images tool.>>
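A rough sketch of that additional_instructions approach, assuming you run your own vision step on the uploaded images first (the IDs, file names, and the analyze_images tool name are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical output of a vision pass you ran yourself on the user's images.
image_descriptions = '[{"hostel.jpg": "A narrow brick building with a doorway and window"}]'

run = client.beta.threads.runs.create(
    thread_id="thread_abc123",   # placeholder
    assistant_id="asst_abc123",  # placeholder
    additional_instructions=(
        "<<the user has placed images " + image_descriptions
        + " into the analyze_images tool.>>"
    ),
)
```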
Thank you @_j and @nikunj
I included the files and images with the message using the file_id parameter. If the assistant decided to call the code interpreter, it would use those files; if it decided to call the “image analyzing” function, it would ignore them, and I would take care of sending them to the vision model.
Now, if I include the files with the message, I can define the tools or omit that parameter. If I define the tool, I am forced to choose code interpreter or file search, and if I choose code interpreter, I’m afraid this will bias the assistant’s decision away from calling the function. So I am wondering what will happen if I just omit the attachments’ tools parameter.
It is really painful and costly to go back, rerun all the needed experiments, and repair whatever is necessary to make sure everything works, so I hope someone has ideas about this.
It seems, rather, this is the type of decision you can place in your user interface, and leave it up to the intelligent user. Asking for an image to be analyzed with vision is a very distinct contrast to making it available to be processed with code the AI writes. Someone will know if they want to count the dogs, extract what model camera was used, or resize and sharpen the image blindly.
Invoking a code interpreter session also costs $0.03 for an hour of uptime, compared to analyzing a detail:low picture at $0.001 per upload; for that same $0.03 you could get 900 tokens of words about the picture.
I went through a painful attempt to modify my workflows and see how things would work, and I found that nikunj was right:
So I add any files and images to the message’s attachments with code interpreter as the tool, and the assistant decides whether to call the function or the code interpreter.
In my first attempts I defined the function with only the prompt as a parameter, but the assistant sent the ID of an image instead of the prompt!
So I added the IDs as a second parameter (an array), and the assistant succeeded in passing them. In most cases, though, if there is more than one image, the assistant calls the function once per image, passing one ID per call. I tried editing the descriptions in the function spec, but I could not make it always pass all the IDs in a single call. However, my workflow can handle this, since the calls arrive as an array in the first place.
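For anyone following along, a sketch of a function definition along these lines (the names, descriptions, and model are illustrative, not the exact spec used here):

```python
from openai import OpenAI

client = OpenAI()

# Illustrative function spec: a prompt plus an array of image file IDs, so the
# assistant can pass several images in one call (in practice it often still
# makes one call per image).
analyze_images_tool = {
    "type": "function",
    "function": {
        "name": "analyze_images",
        "description": (
            "Analyze one or more user-provided images with an external vision "
            "model. Pass ALL relevant image file IDs in a single call."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": "What the user wants to know about the images.",
                },
                "image_file_ids": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "File IDs of every image to analyze.",
                },
            },
            "required": ["prompt", "image_file_ids"],
        },
    },
}

assistant = client.beta.assistants.create(
    name="Code + image analysis assistant",  # placeholder name
    model="gpt-4o",                          # placeholder model
    tools=[{"type": "code_interpreter"}, analyze_images_tool],
)
```

Since the per-image calls arrive together in the run’s required_action.submit_tool_outputs.tool_calls array, handling one call per image is just a loop over that array.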