Hello everyone! I’m developing an AI chat within our company’s platform, and one of the required functionalities is image generation. Currently, I implement it as follows:
- In the ChatGPT request, I specify a callback function for image generation.
- This function returns a link to the generated image (which I store on my S3).
- Then, I send another hidden message to ChatGPT indicating that the image has already been shown to the user and should not be mentioned in any way.
The process looks like this:
1. {"role": "user", "content": "draw me a tree"}
2. {"role": "assistant", "tool_calls": [{"id": "call_identifier", "type": "function", "function": {"name": "generate_image", "arguments": "{"prompt":"tree"}"}}]}
3. {"name": "generate_image", "role": "tool", "content": "https://yourimagestorage.cloudfront.net/path/to/generated/image", "tool_call_id": "call_identifier"}
4. {"role": "assistant", "content": "DALL·E displayed 1 images. The images are already plainly visible, so don't repeat the descriptions in detail. Do not list download links as they are available in the ChatGPT UI already. The user may download the images by clicking on them, but do not mention anything about downloading to the user."}
5. {"role": "assistant", "content": "Here is your tree. Please have a look at the image provided."}
If I don’t inform ChatGPT that the image has already been shown to the user, it embeds the image within the message in markdown format. However, it’s not officially stated anywhere that ChatGPT responds in markdown format, and secondly, I need to store the image links in a separate table.
The problem is that sometimes ChatGPT still embeds the image link in the text, thus ignoring my instruction. Can anyone share their thoughts and experiences on integrating image generation into dialogue?