HTTP 400 from TTS in connection with GPT-4 Vision

I’ve built a small app with vision and audio. I load an image to get a description, and then the description is transformed into an audio file. With some images it works great, and with others not at all. When it isn’t working, I get the error attached. How can it be that it works with some images and not with others? All images are the same format, .jpg.

Some images, especially ones resembling CAPTCHAs or images containing explicit content, trigger a refusal response from GPT-4 models with vision capabilities (Turbo, Vision, and 4o). This can occasionally happen with false positives too.

Maybe certain images are being flagged by the vision model, which then returns a refusal or error response, and your program then attempts to send that response to the TTS model even though it isn’t a usable description.
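If that’s the case, one workaround is to validate the vision model’s reply before handing it to the TTS endpoint. Here is a minimal sketch of that idea — the helper name and the list of refusal phrases are illustrative assumptions, not an official API; adjust the markers to the refusals you actually see in your logs:

```python
def looks_like_refusal(description: str) -> bool:
    """Heuristic check for an empty or refusal-style reply from the vision model.

    The phrase list below is an assumption for illustration; tune it to
    the actual refusal wording your app receives.
    """
    if not description or not description.strip():
        return True
    refusal_markers = (
        "i'm sorry",
        "i can't assist",
        "i cannot assist",
        "unable to help with",
    )
    text = description.lower()
    return any(marker in text for marker in refusal_markers)


# Only call the TTS endpoint when the description looks usable:
description = "A golden retriever sitting on a sunny porch."
if looks_like_refusal(description):
    print("Vision model refused; skipping TTS call.")
else:
    print("Sending description to TTS.")
```

This way a flagged image results in a clean skip (or a retry with a different image) instead of an HTTP 400 from the TTS request.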