Response sequence with images

Hello,

I created a custom GPT that acts as a D&D Dungeon Master. It works more or less fine. In its Instructions section, I specified a clear response format:

Your response format example:
Step 1: Description section. (Make a description, based on the current developments.)
Step 2: Actions section. (Provide 5 unique actions in the context of the game.)
Step 3: Photo section. (Generate a photo, based on the current developments. Don’t announce photo generation. Don’t provide any text at all.)

But during the game, ChatGPT either doesn’t generate a photo at all or generates it before the Description section. If I try to correct it, it replies with something like ‘Sorry, I will do as instructed next time,’ but it never follows through.
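
To make the two failures concrete, here is a minimal Python sketch of the check I have in mind. It assumes a reply can be represented as an ordered list of (kind, content) parts, where kind is either "text" or "image". That representation and the function name are hypothetical, made up for illustration, and not anything ChatGPT actually exposes.

```python
from typing import List, Tuple

# Hypothetical representation of one reply: an ordered list of parts.
# Each part is (kind, content), with kind being "text" or "image".
# This is an illustration only, not a real ChatGPT data format.
Part = Tuple[str, str]


def check_photo_placement(parts: List[Part]) -> str:
    """Classify a reply against the rule 'the photo comes last'."""
    kinds = [kind for kind, _ in parts]
    if "image" not in kinds:
        # Failure 1: no photo was generated at all.
        return "missing photo"
    if kinds.index("image") != len(kinds) - 1:
        # Failure 2: the first photo appears before some other part,
        # e.g. before the Description section.
        return "photo not last"
    return "ok"


expected = [("text", "Description ..."), ("text", "Actions ..."), ("image", "<photo>")]
photo_first = [("image", "<photo>"), ("text", "Description ..."), ("text", "Actions ...")]
no_photo = [("text", "Description ..."), ("text", "Actions ...")]

print(check_photo_placement(expected))     # ok
print(check_photo_placement(photo_first))  # photo not last
print(check_photo_placement(no_photo))     # missing photo
```

The replies I get fall into the second and third cases, never the first.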

What do you guys think the problem might be?