Correctly aligning images to commentary

I am trying to use OpenAI to tell the story of a sporting event with images and commentary.

I have requested significant events to be created in conjunction with the commentary,
and insignificant events created as padding in between the main ones.

The idea is that every event has a starting and an ending word, and they join up to match the commentary.

This isn’t as easy as I had hoped. The events don’t align correctly.

Is this something I can fix with prompts?
I’ve been trying to feed the outputs back into ChatGPT to create a new set of prompts.
I don’t have any experience with training it using examples.

I have ChatGPT, but I’m trying to get this to work through the API using gpt-4.1.

I would welcome some guidance on how best to tackle this.


I would do the following:

  1. Generate the story of the sporting event, using gpt-4.1-nano or writing by hand
  2. Break up the sporting event by each moment you’d like an image of (you can do this yourself, or ask AI to do this)
  3. Feed one moment at a time into gpt-image-1
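
The steps above could be sketched roughly like this. It is only a minimal sketch under some assumptions: the helper names (`split_into_moments`, `render_moment`), the one-sentence-per-moment rule, and the output path are made up for illustration. Splitting the commentary in code rather than asking the model for start and end words means the moments join up with no gaps or overlaps by construction. The image call uses `client.images.generate` from the OpenAI Python SDK, so it needs the `openai` package and an API key.

```python
import base64
import re


def split_into_moments(commentary: str) -> list[str]:
    """Split commentary into one moment per sentence (hypothetical rule).

    Joining the returned moments with single spaces reproduces the
    whitespace-normalized commentary, so nothing is skipped or repeated.
    """
    text = " ".join(commentary.split())  # collapse runs of whitespace
    # Split on whitespace that follows a '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def render_moment(moment: str, out_path: str) -> None:
    """Generate one image for one moment with gpt-image-1 (needs an API key)."""
    # Imported lazily so the splitter can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(model="gpt-image-1", prompt=moment)
    # gpt-image-1 returns the image as base64-encoded data.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))


commentary = "The runners line up. The gun fires! Who will take the lead?"
moments = split_into_moments(commentary)
# The moments cover the commentary exactly, in order.
assert " ".join(moments) == commentary
```

You would then call `render_moment(moment, f"moment_{i}.png")` for each moment in turn. If you want padding events between the main ones, you could mark some sentences as minor after splitting, rather than asking the model to invent the boundaries.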

To help with consistency:

  1. Consider adding details that describe the setting to help the moment look right. For example, all prompts about moments in the audience should include a description of the setting: is the audience outside, inside, etc?
  2. To make images even more consistent, you can feed in the previous image and tell the model to generate the new image so that it matches the style and setting of the previous image, while also displaying the moment you want to show. For example, say you generated an image of the athletes at the starting line, and the next moment is right after they start running. Your prompt could be “Create an image of the athletes beginning their run on the track. The previous image has been included.”
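
Both consistency tips could look something like the sketch below. Again, this is an assumption-heavy illustration, not a definitive implementation: `SETTING`, `build_prompt`, and `render_with_previous` are hypothetical names, and the setting text is an invented example. It uses `client.images.edit` from the OpenAI Python SDK to pass the previous image along with the prompt.

```python
import base64
from typing import Optional

# Assumed example setting; reuse the same description in every prompt.
SETTING = "Outdoor athletics stadium, sunny afternoon, red running track."


def build_prompt(moment: str, setting: str, has_previous: bool) -> str:
    """Combine the moment, the shared setting, and a note about the previous image."""
    parts = [f"Create an image of: {moment}", f"Setting: {setting}"]
    if has_previous:
        parts.append(
            "The previous image has been included. Match its style and setting."
        )
    return " ".join(parts)


def render_with_previous(prompt: str, prev_path: Optional[str], out_path: str) -> None:
    """Generate an image, passing the previous image when there is one."""
    # Imported lazily so build_prompt can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    if prev_path is None:
        # First moment: nothing to match yet, so generate from the prompt alone.
        result = client.images.generate(model="gpt-image-1", prompt=prompt)
    else:
        # Later moments: send the previous image as the reference.
        with open(prev_path, "rb") as prev:
            result = client.images.edit(
                model="gpt-image-1", image=prev, prompt=prompt
            )
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
```

For the first moment you would pass `prev_path=None`, then pass each output file as `prev_path` for the next call.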

Best of luck!