Overview: I’m using GPT to generate a visual description of what a player sees in a game scene, which I then pass to an image model.
The output is as follows:

Rikon stands at the entrance of the Hall of Clockwork Wonders, a complex of towering spires and intricate clockwork mechanisms. The hall is a marvel of steampunk technology, with a variety of exhibits, from automatons to flying machines to steam-powered robots. He can see an automaton standing at 2m tall, a steam-powered robot standing at 1.5m tall, a flying machine at 2m tall, a clockwork table at 5m tall, and a steam-powered generator at 4m tall. Further in the hall, he can make out a clockwork clock at 5m tall and a magical artifact and magical mirror at 1m tall. To the side, Rikon can see an automaton workshop at 4m tall and a steam-powered elevator at 3m tall. He takes a deep breath as he prepares to enter the warehouse and retrieve the artifacts.
Problem: Even though GPT itself generated this text, the image model rejects it with an error saying the prompt may not be allowed.
Workaround: I have a workaround that seems acceptable, though it costs more. I wrote a loop that catches this error and, when it occurs, feeds the description into a new prompt asking GPT to rephrase it. It usually succeeds after a few tries, though not always. Still, it’s strange: what exactly is the filter catching?!
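A minimal sketch of the catch-and-rephrase loop described above, in Python. `PromptRejectedError`, `generate`, and `rephrase` are hypothetical placeholders, not names from any real SDK: the idea is to wrap your actual image call so that a "prompt may not be allowed" rejection raises `PromptRejectedError`, and to wrap your GPT rephrase prompt as `rephrase`. The `max_attempts` cap is an added assumption to keep the extra cost bounded. The fake functions at the bottom are purely illustrative and say nothing about what the real filter flags.

```python
from typing import Callable, Tuple


class PromptRejectedError(Exception):
    """Hypothetical stand-in for the image API's 'prompt may not be allowed' error."""


def generate_with_rephrase(
    description: str,
    generate: Callable[[str], str],
    rephrase: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Try the image call; on rejection, ask for a rephrased prompt and retry.

    Returns (image_result, prompt_that_succeeded). Re-raises the last
    rejection once max_attempts image calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    prompt = description
    for attempt in range(1, max_attempts + 1):
        try:
            return generate(prompt), prompt
        except PromptRejectedError:
            if attempt == max_attempts:
                raise  # give up and surface the last rejection
            prompt = rephrase(prompt)  # one extra text-model call per failure
    raise RuntimeError("unreachable")


# Illustrative fakes only: reject anything not yet rephrased.
def fake_generate(prompt: str) -> str:
    if not prompt.startswith("Rephrased: "):
        raise PromptRejectedError("prompt may not be allowed")
    return "image-ok"


def fake_rephrase(prompt: str) -> str:
    return "Rephrased: " + prompt


if __name__ == "__main__":
    result, used = generate_with_rephrase("Rikon stands at the entrance...", fake_generate, fake_rephrase)
    print(result, used)  # image-ok Rephrased: Rikon stands at the entrance...
```

Passing the two calls in as functions keeps the retry logic separate from any particular client library, and logging `used` alongside the original description would let you compare which rewrites get through.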