Issues triggering DALL-E in a Custom GPT

I am building GPTs for narrative-based games and, inevitably, I would like to use DALL-E 3 to create images when certain conditions in the story are triggered.

For the record, other triggered conditions, such as having the GPT call a Python function to perform a “skill check”, work fairly reliably.
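For readers wondering what such a triggered “skill check” might look like, here is a minimal sketch. The function name, the d20 roll, and the `modifier`/`difficulty` parameters are assumptions for illustration, not the actual code used in the game.

```python
import random
from typing import Optional


def skill_check(modifier: int, difficulty: int,
                rng: Optional[random.Random] = None) -> dict:
    """Hypothetical skill check: roll a d20, add a modifier, compare to a difficulty."""
    # A seeded generator can be passed in to make a check reproducible.
    rng = rng or random.Random()
    roll = rng.randint(1, 20)  # inclusive on both ends
    total = roll + modifier
    return {"roll": roll, "total": total, "success": total >= difficulty}


if __name__ == "__main__":
    print(skill_check(modifier=3, difficulty=12))
```

The GPT only has to call the function and narrate the returned result, which may be part of why this kind of trigger behaves more predictably than image generation.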

By contrast, getting the GPT to generate an image with DALL-E when a certain condition is met rarely works. What’s odd is that the GPT often knows what it’s supposed to be doing, e.g. it writes a sentence like “I am generating an image describing the current scene,” but nothing happens.

Has anybody else observed this odd disconnect, where the GPT states that it is calling DALL-E but does not actually call it? Any idea why it happens?

Notes:

  • DALL-E Image Generation is active in the GPT;
  • DALL-E works with no problems if I, as the user, explicitly tell it to generate an image at some point during the game;
  • Sometimes the trigger works correctly and the image is actually generated, but this is extremely rare (it is far more likely that the GPT claims to generate the image and then does nothing);
  • Yes, I tweaked the prompt many times, with little to no improvement. What puzzles me is not the cases where the GPT forgets to call DALL-E altogether, but the cases where it says it is calling it and then does nothing.

I’ve experienced the same thing that you’re describing.

Keep in mind that DALL-E was added to ChatGPT more as a feature to attract subscribers and buzz than as something fully and seamlessly integrated with text output.

It is unusual for the AI to say it’s going to do something and then take no visible action. I can “program” ChatGPT to do things it can’t do:

image

So it seems you need better language, and almost need to override the default behavior of the function to suit your task. That default is simply “Whenever a description of an image is given, create a prompt that dalle can use to generate the image”.

I drafted some instructions for a GPT, along with amendments to how DALL-E operates.

instructions in code block
You write illustrated bedtime stories for children, based on user request, of a length of five distinct AI responses. The first user input shall be expected to be what type of story they'd like to hear, and then followup responses by user should be acknowledged but not interrupt the continued story except by direct request for AI to stop or abort.

Each two paragraph segment of story is accompanied by sending a dalle illustration prompt that the AI will create silently.

# Tools

## dalle

// additional dalle information
// dalle is used to illustrate the ongoing progress of GPT storybot narrative
// don't discuss creating the image, just do it!
// invoke dalle text2im method at size:1024x1024 with a prompt that illustrates the current narrative of the storytelling roleplay only after producing the latest part of the narrative
// do not report on the success of this automatic dalle imagery for storytelling.
// in case of image creation error, report the full return value of error in a markdown code block as response.

Unfortunately, this is very fragile: depending on the refinement, the result ranges from image prompts being dumped for the user to see to the GPT halting after just the text. It will take some engineering to find out what can sustain actual use.
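One way to make that engineering less of a guessing game is to log each trial turn and tally which failure mode occurred for a given prompt variant. The sketch below assumes the turns are recorded by hand as (visible text, whether an image appeared) pairs. The category names and the regex heuristics are my own assumptions, not anything ChatGPT exposes.

```python
import re
from collections import Counter

# Assumed heuristic: the model *says* it is making an image.
CLAIM_PATTERN = re.compile(
    r"\b(generating|creating|drawing)\b.*\b(image|illustration|picture)\b",
    re.IGNORECASE,
)
# Assumed heuristic: a raw image prompt leaked into the visible reply.
DUMP_PATTERN = re.compile(r'"prompt"\s*:|text2im', re.IGNORECASE)


def classify_turn(text: str, image_generated: bool) -> str:
    """Label one manually recorded turn with a hypothetical outcome category."""
    if image_generated:
        return "image_generated"
    if DUMP_PATTERN.search(text):
        return "prompt_dumped"
    if CLAIM_PATTERN.search(text):
        return "claimed_but_no_image"
    return "text_only"


def tally(turns):
    """Count outcome categories over an iterable of (text, image_generated) pairs."""
    return Counter(classify_turn(text, generated) for text, generated in turns)


if __name__ == "__main__":
    recorded = [
        ("The dragon sleeps.", True),
        ("I am generating an image describing the current scene.", False),
        ('{"prompt": "a dragon", "size": "1024x1024"}', False),
        ("The end.", False),
    ]
    print(tally(recorded))
```

Comparing these tallies across instruction variants at least shows whether a refinement trades one failure mode for another.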


Thanks for the suggestions. I think that this is a very good approach - generally speaking, the GPT gets confused if it receives [what it perceives to be] conflicting information, so I agree that presenting new instructions organically as if they were “additional information” for dalle is a great idea.

In practice, I have now played around with variants of this approach and it still doesn’t really work (there is a lot going on in my instructions, unlike the example above). In fact, the one time it did work, it messed up the execution of all the other instructions (as if paying attention to dalle took all the attentional resources away from the rest…).

Since without dalle the game engine is working fairly okay with only occasional kinks, and image generation is so problematic (either it doesn’t work or when it does it may break things), I’ll do without for now and consolidate all the rest. Once the game is nearly done, perhaps I’ll give dalle another try…