Hello OpenAI Image Generation Team,
I would like to formally report a consistent and critical limitation I have observed in the GPT image generation system regarding identity preservation in multi-person scenarios.
Through repeated testing, I confirmed that single-person image enhancement and upscaling work reliably. When one individual is present, facial structure, proportions, and overall visual identity are generally preserved with high accuracy.
However, two major failure cases repeatedly occur when multiple people are involved.
First, when upscaling or enhancing an existing group photo containing multiple individuals, the system fails to preserve each person’s original facial structure and appearance. Faces subtly but clearly change, resulting in individuals who no longer resemble the original subjects. This happens even when prompts explicitly demand strict identity preservation and prohibit beautification or alteration.
Second, when attempting to generate a group photo by combining multiple individual reference images into a single shared scene (for example, recreating a missed group selfie), the system does not preserve the identity of each uploaded person. Facial shapes, proportions, and defining features drift toward averaged or hallucinated results, despite clear reference images being provided for every subject.
This suggests that the current identity-conditioning pipeline is optimized for single-subject scenarios and breaks down when multiple identity anchors are introduced simultaneously.
This is not a niche edge case. Real-world use cases include friends forgetting to take group photos, families restoring old images, professionals enhancing team photos, and users attempting to preserve shared social memories. In all of these cases, identity accuracy is more important than visual stylization.
The core issue is not image quality or realism, but multi-person identity consistency.
I strongly encourage you to consider improvements such as independent identity locking per subject, reduced identity averaging across multiple references, and dedicated evaluation metrics for group-photo identity preservation.
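To make the last suggestion concrete, here is a minimal sketch of what a per-subject identity metric could look like. It assumes each face in the reference and in the output has already been converted to an embedding vector by some face-recognition model, which is not specified here. The function names, the dictionary layout, and the 0.8 threshold are all hypothetical choices for illustration, not a description of any existing OpenAI tooling.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def group_identity_report(reference, generated, threshold=0.8):
    """Score identity preservation per subject instead of per image.

    reference, generated: dicts mapping a subject label to that subject's
    face embedding (hypothetical layout). A subject missing from the
    generated image scores 0.0. The threshold is an assumed cut-off.
    """
    scores = {}
    for name, ref_embedding in reference.items():
        gen_embedding = generated.get(name)
        if gen_embedding is None:
            scores[name] = 0.0
        else:
            scores[name] = cosine_similarity(ref_embedding, gen_embedding)
    return {
        "per_subject": scores,
        # The worst score matters most: one unrecognizable face ruins the photo.
        "worst": min(scores.values()),
        "mean": sum(scores.values()) / len(scores),
        "failed": sorted(n for n, s in scores.items() if s < threshold),
    }


# Toy two-dimensional embeddings, purely illustrative.
reference = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
generated = {"alice": [1.0, 0.0], "bob": [1.0, 1.0]}
report = group_identity_report(reference, generated)
print(report["failed"])
```

The key design choice is reporting the worst per-subject score alongside the mean, because an averaged score can look acceptable even when one person in the group no longer resembles the original.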
Solving this issue would significantly improve trust and real-world usability for GPT’s image-to-image workflows.
Thank you for your continued work and for considering this feedback.