I’m trying to use the Assistants API to generate images in threads. So far I haven’t had any success. All I get back is a pretty good description of the image I asked it to create, but no image whatsoever. Has anyone else been able to figure this out?
You can use an assistant to generate images, though not through a built-in tool like Code Interpreter. Instead, you define a function that passes an image prompt to the image generation endpoint and returns the result to the assistant’s run.
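A minimal sketch of that function approach, assuming the official `openai` Python client. The tool name `generate_image` and the helpers `build_tool_outputs` and `dalle_generate` are hypothetical names chosen for illustration, and the model and size values are assumptions you would adjust:

```python
import json

# Function tool schema, passed as tools=[IMAGE_TOOL] when creating the assistant.
# "generate_image" is a hypothetical name picked for this sketch.
IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Generate an image from a text prompt.",
        "parameters": {
            "type": "object",
            "properties": {"prompt": {"type": "string"}},
            "required": ["prompt"],
        },
    },
}


def dalle_generate(client, prompt):
    """Call the image generation endpoint and return the image URL."""
    resp = client.images.generate(
        model="dall-e-3", prompt=prompt, n=1, size="1024x1024"
    )
    return resp.data[0].url


def build_tool_outputs(tool_calls, generate):
    """Answer the pending generate_image calls of a run.

    `generate` is any callable mapping a prompt string to an image URL.
    Returns (tool_outputs, urls); the URLs are for your own UI, because
    the assistant only gets a short text confirmation back.
    """
    outputs, urls = [], []
    for call in tool_calls:
        if call.function.name != "generate_image":
            continue  # other tools would need their own handling
        args = json.loads(call.function.arguments)
        urls.append(generate(args["prompt"]))
        outputs.append({
            "tool_call_id": call.id,
            "output": "The image was successfully generated and displayed.",
        })
    return outputs, urls
```

When a run reaches the `requires_action` status, the pending calls are in `run.required_action.submit_tool_outputs.tool_calls`. You would pass them to `build_tool_outputs` with `lambda p: dalle_generate(client, p)`, then send the outputs back with `client.beta.threads.runs.submit_tool_outputs(thread_id=..., run_id=..., tool_outputs=outputs)`.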
This is how I display the image file generated by the assistant:
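from io import BytesIO
from PIL import Image  # Pillow

messages = client.beta.threads.messages.list(thread_id)
# The list is returned newest first, so reverse it for chronological order
for message in reversed(messages.data):
    for message_content in message.content:
        …
        # Image content parts carry the id of a file generated by the assistant
        if hasattr(message_content, "image_file"):
            file_id = message_content.image_file.file_id
            # Download the raw bytes of the generated file
            resp = client.files.with_raw_response.retrieve_content(file_id)
            if resp.status_code == 200:
                image_data = BytesIO(resp.content)
                img = Image.open(image_data)
                …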
So, if an item in message.content has the attribute “image_file”, we can retrieve the id of the file generated by the assistant.
Does this mean the image is downloaded every time the thread is viewed?
When you use assistants, you are typically retrieving a past conversation in order to recall a chat and display it for a user’s past session.
It would be far more efficient to keep a local database of conversations for this recall than to rely on a remote server, which requires multiple connections to fetch things not stored in the thread itself: any state you want to restore in Code Interpreter, which only keeps its memory for one hour, or the images and other data generated and stored by your tools.
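As a rough illustration of local recall for assistant-generated files, here is a small disk cache keyed by file id, so each image is downloaded at most once. The function name `get_image_bytes`, the `fetch` parameter, and the `image_cache` directory are all hypothetical choices for this sketch:

```python
from pathlib import Path


def get_image_bytes(file_id, fetch, cache_dir="image_cache"):
    """Return the bytes for file_id, calling fetch(file_id) only on a cache miss.

    `fetch` is any callable that downloads the file and returns bytes.
    """
    path = Path(cache_dir) / file_id
    if path.exists():
        # Cache hit: no network round trip needed
        return path.read_bytes()
    data = fetch(file_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

With the download call shown earlier in the thread, `fetch` could be `lambda fid: client.files.with_raw_response.retrieve_content(fid).content`. Viewing the thread again then reads the image from disk instead of hitting the API.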
If you were to use DALL-E to generate an image on demand from an assistant’s tool_call, and then return a message such as “the image was successfully generated and displayed” (with the display happening in your own UI, since the AI can’t receive images), then you would need to store that image in a separate database of binary contents, linked to the thread at that message. You could perhaps use the metadata feature to attach an index into the image database, which then has to be looked up on retrieval.
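One way this could look, as a sketch only: a SQLite table holding the image bytes, with a generated id that you then attach to the message as metadata. The table layout and the helper names `open_store`, `save_image`, and `load_image` are assumptions made for illustration:

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open (or create) the local image store. Defaults to in-memory for demos."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "image_id TEXT PRIMARY KEY, thread_id TEXT NOT NULL, data BLOB NOT NULL)"
    )
    return conn


def save_image(conn, thread_id, data):
    """Store image bytes for a thread and return the new image id."""
    image_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO images (image_id, thread_id, data) VALUES (?, ?, ?)",
        (image_id, thread_id, data),
    )
    conn.commit()
    return image_id


def load_image(conn, image_id):
    """Return the stored bytes, or None if the id is unknown."""
    row = conn.execute(
        "SELECT data FROM images WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row[0] if row else None
```

The returned id is a plain string, so it can go into message metadata, for example with `client.beta.threads.messages.update(message_id, thread_id=thread_id, metadata={"image_id": image_id})`. When the thread is displayed later, you read `image_id` from the message metadata and call `load_image`.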
So it is only when your DALL-E tool is first called that you generate an image with a DALL-E API call. You cannot download it again from anywhere later.