I encountered below scenario where “gpt-4-vision-preview” api responded as if there is an image generated in response. See screenshot. Like there should be an image where it says [image of a …]
The quality of the system instruction informs the AI what it is, and you can avoid such hallucinations there.
I wasn’t able to stimulate such response even with “You are a helpful AI assistant with multimodal skills such as image creation.” However it is not unusual for an AI to answer with text that would be a predicable response to an input, outside of its capabilities.
You can also reduce the top_p API parameter to something like 0.5. If there is a 1% chance the AI says “Sure!” instead of “I’m sorry”, it will try to complete on its promise.