Integrating Custom Image Generation with ChatGPT

Hello everyone! I’m developing an AI chat within our company’s platform, and one of the required features is image generation. Currently, I implement it as follows:

  • In the ChatGPT request, I specify a callback function for image generation.
  • This function returns a link to the generated image (which I store on my S3).
  • Then, I send another hidden message to ChatGPT indicating that the image has already been shown to the user and should not be mentioned in any way.
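For anyone following along, the "callback function" in the first bullet would be declared as a function tool in the request. Below is a minimal sketch of such a declaration; only the name generate_image comes from my transcript, while the description text and the parameter layout are assumptions for illustration.

```python
import json

# Hypothetical tool declaration. Only the name "generate_image" is taken from
# the conversation in this post; the description and schema are assumptions.
GENERATE_IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Generate an image from a text prompt and return its URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {"type": "string", "description": "What to draw."}
            },
            "required": ["prompt"],
        },
    },
}


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Decode the arguments of a tool call, which arrive as a JSON string."""
    return json.loads(raw_arguments)
```

The arguments field of a tool call is a JSON-encoded string rather than an object, so it has to be decoded before the prompt is handed to the image generator.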

The process looks like this:

1. {"role": "user", "content": "draw me a tree"}
2. {"role": "assistant", "tool_calls": [{"id": "call_identifier", "type": "function", "function": {"name": "generate_image", "arguments": "{\"prompt\":\"tree\"}"}}]}
3. {"name": "generate_image", "role": "tool", "content": "https://yourimagestorage.cloudfront.net/path/to/generated/image", "tool_call_id": "call_identifier"}
4. {"role": "assistant", "content": "DALL·E displayed 1 images. The images are already plainly visible, so don't repeat the descriptions in detail. Do not list download links as they are available in the ChatGPT UI already. The user may download the images by clicking on them, but do not mention anything about downloading to the user."}
5. {"role": "assistant", "content": "Here is your tree. Please have a look at the image provided."}
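Messages 1 to 4 above can be assembled in code roughly as follows. This is only a sketch: build_image_turn is a hypothetical helper name, and using json.dumps for the arguments is my way of keeping the nested quotes correctly escaped.

```python
import json


def build_image_turn(user_text, call_id, prompt, image_url, hidden_note):
    """Return messages 1-4 of the exchange as a list of dicts.

    Hypothetical helper; the field names mirror the transcript above.
    """
    return [
        {"role": "user", "content": user_text},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": "generate_image",
                        # Arguments are a JSON string, not a nested object.
                        "arguments": json.dumps({"prompt": prompt}),
                    },
                }
            ],
        },
        {
            "name": "generate_image",
            "role": "tool",
            "content": image_url,
            # Must match the id of the tool call it answers.
            "tool_call_id": call_id,
        },
        # The hidden note, sent with the assistant role as described above.
        {"role": "assistant", "content": hidden_note},
    ]
```

Message 5 is then whatever the model returns when this list is sent as the conversation history.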

If I don’t inform ChatGPT that the image has already been shown to the user, it embeds the image in its reply as markdown. That is a problem for two reasons: it’s not officially stated anywhere that ChatGPT responds in markdown, and I need to store the image links in a separate table.

The problem is that ChatGPT sometimes still embeds the image link in the text, ignoring my instruction. Can anyone share their thoughts and experiences on integrating image generation into a dialogue?
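One fallback I am considering, since the instruction is not always followed, is to post-process the reply: strip any markdown image embeds and collect their URLs for the separate table. A rough sketch, assuming the embeds use the usual ![alt](url) form; strip_markdown_images is a hypothetical helper and it does not catch bare URLs pasted as plain text.

```python
import re
from typing import List, Tuple

# Matches ![alt](url) and ![alt](url "title"); captures the URL only.
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)')


def strip_markdown_images(reply: str) -> Tuple[str, List[str]]:
    """Remove markdown image embeds from reply and return (text, urls)."""
    urls = MD_IMAGE.findall(reply)
    cleaned = MD_IMAGE.sub("", reply)
    # Tidy up the whitespace left behind by the removed embeds.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned).strip()
    return cleaned, urls
```

The returned URLs can go into the image table, and the cleaned text is what gets shown to the user.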

I’m not an expert at this stuff, but it seems a bit odd to be trying to pass it a hidden message by appending a fake assistant message to the chat history?

The markdown format thing is just a behaviour that it seems to have, quite useful in interfaces like Streamlit but can get annoying when it’s undesired.

First off, have you tried sending that message as user instead of assistant, or even modifying the system prompt for that one interaction? Again, I’m not an expert so that might be a dumb suggestion but could be worth a try!
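To make the system prompt idea concrete, here is a rough sketch of adding the instruction to the system message for a single request only. The helper name with_one_off_system_note and the wording of the note are made up for illustration, and I haven’t tested whether this suppresses the link any more reliably than the fake assistant message.

```python
import copy

# Hypothetical wording; adjust to taste.
IMAGE_SHOWN_NOTE = (
    "The generated image has already been shown to the user. "
    "Do not include its link or embed it in your reply."
)


def with_one_off_system_note(history, note):
    """Return a copy of history with note added to the system message.

    The stored history is left untouched, so the note applies only to
    the request built from the returned list.
    """
    messages = copy.deepcopy(history)
    if messages and messages[0].get("role") == "system":
        messages[0]["content"] = messages[0]["content"].rstrip() + "\n\n" + note
    else:
        messages.insert(0, {"role": "system", "content": note})
    return messages
```

Because the function works on a copy, the extra instruction never ends up in the saved conversation, which was the part that seemed odd about the fake assistant message.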