I’m trying to build a storybook for my daughter, but I’m having major trouble getting any sort of character consistency. My workflow with the API is:
- Generate character images
- Generate a storybook page using the character plus a scene description
Here are two images generated from similar prompts and the same characters. There are two major issues:
- The generated character bears no resemblance to the input
- The art style changes from image to image (though the examples below really only demonstrate the first issue)
{
  "model": "gpt-4o",
  "tools": [
    {
      "type": "image_generation",
      "size": "1536x1024",
      "quality": "high",
      "output_format": "jpeg",
      "background": "auto",
      "moderation": "auto",
      "input_fidelity": "high"
    }
  ],
  "tool_choice": { "type": "image_generation" },
  "background": true,
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "${page_prompt}"
        },
        { "type": "input_image", "image_url": "data:image/jpeg;base64,...", "detail": "high" },
        { "type": "input_image", "image_url": "data:image/jpeg;base64,...", "detail": "high" }
      ]
    }
  ]
}
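For anyone who wants to reproduce this, here is a minimal sketch of how a request body like the one above could be assembled in Python. The helper names `to_data_url` and `build_page_request` are hypothetical, the field values are copied from the JSON above, and the image bytes are placeholders. It only builds the dictionary and makes no API call.

```python
import base64
import json


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_page_request(page_prompt: str, character_images: list) -> dict:
    """Assemble one page request with the same fields as the JSON above."""
    # The text prompt comes first, followed by one input_image per character.
    content = [{"type": "input_text", "text": page_prompt}]
    for image_bytes in character_images:
        content.append({
            "type": "input_image",
            "image_url": to_data_url(image_bytes),
            "detail": "high",
        })
    return {
        "model": "gpt-4o",
        "tools": [{
            "type": "image_generation",
            "size": "1536x1024",
            "quality": "high",
            "output_format": "jpeg",
            "background": "auto",
            "moderation": "auto",
            "input_fidelity": "high",
        }],
        "tool_choice": {"type": "image_generation"},
        "background": True,
        "input": [{"role": "user", "content": content}],
    }


# Placeholder bytes stand in for the real character JPEGs.
body = build_page_request("scene text here", [b"fake-jpeg-1", b"fake-jpeg-2"])
print(json.dumps(body, indent=2)[:300])
```

Building the body in one place makes it easier to confirm that every page sends the same tool settings and the same character images, so any drift between pages can be traced to the prompt or the model rather than to the request.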
page_prompt:
Your most important task is to maintain all facial features, skin tone, outfits and hairstyles from the input images, and use them per the scene descriptions. Adjust poses and expressions to suit the scene. Produce images consistent with input images and match the following Art Style: Soft watercolour with pencil outlines, warm pastel palette, gentle textures, cosy indoor light, expressive toddler-friendly faces. SCENE DESCRIPTION Medium close-up on the child detective opening a small notebook and holding a pencil. The tan dwarf hamster inside the clear travel ball looks up eagerly. Background shows the empty plate and the small silver fan on the counter. Soft afternoon light, slight vignette focus.
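The prompt above appears to have three parts: a fixed instruction, a fixed art style, and a per-page scene description. A rough sketch of assembling it that way follows, so the instruction and style wording stay byte-for-byte identical from page to page. The names `INSTRUCTION`, `ART_STYLE`, and `build_page_prompt` are hypothetical, and the split into three parts is my assumption about how the prompt is structured.

```python
# Fixed text, copied from the prompt above.
INSTRUCTION = (
    "Your most important task is to maintain all facial features, skin tone, "
    "outfits and hairstyles from the input images, and use them per the scene "
    "descriptions. Adjust poses and expressions to suit the scene. Produce "
    "images consistent with input images and match the following Art Style:"
)
ART_STYLE = (
    "Soft watercolour with pencil outlines, warm pastel palette, gentle "
    "textures, cosy indoor light, expressive toddler-friendly faces."
)


def build_page_prompt(scene_description: str) -> str:
    """Join the fixed instruction and style with one page's scene text."""
    return f"{INSTRUCTION} {ART_STYLE} SCENE DESCRIPTION {scene_description.strip()}"


page_one = build_page_prompt(
    "Medium close-up on the child detective opening a small notebook "
    "and holding a pencil."
)
# Hypothetical second scene, only to show that the prefix is shared.
page_two = build_page_prompt("Wide shot of the kitchen counter.")
print(page_one)
```

Only the text after `SCENE DESCRIPTION` differs between `page_one` and `page_two`, which rules out accidental wording changes as a cause of style drift.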
The log in the following screenshot shows the input and output images:
Any clues or guidance on what can be done to make the output adhere more closely to the input images?

