There is a thing about the text model, it gets quite bold with semantic concepts that are competing and has to settle for a textual definition. That is not the case for the visual semantic. If GPT learns to rely these fuzzy language semantics to visuals, then rewriting prompts would not be an issue.
In my prompts I try to delegate textual semantic and visual semantic concepts for the image generations.