Lots of instability in GPT-4o multi-modal responses

Not sure what is happening with the GPT-4o models.

Earlier this week, we had too many uptime errors with the Assistants API, and I had to switch over to a self-hosted agent.

Today, we saw far too many “sorry, I can’t help with that” refusals from normal gpt-4o responses. Note that this is all with multi-modal requests. Now we have to switch most of our calls to Gemini models.
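For anyone in the same spot, the switch can be done as a simple fallback router, sketched below (assumptions on my part: the refusal check is a crude string heuristic, and gemini-1.5-pro is just an example model):

```python
import io
import urllib.request

import google.generativeai as genai
from openai import OpenAI
from PIL import Image

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
genai.configure(api_key="...")  # your Google AI Studio key

REFUSAL_MARKERS = ("sorry, i can't help", "i can't help with that", "i cannot help")

def ask_with_fallback(prompt: str, image_url: str) -> str:
    """Try gpt-4o first; retry on Gemini when the reply looks like a refusal."""
    resp = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    text = resp.choices[0].message.content or ""
    if not any(marker in text.lower() for marker in REFUSAL_MARKERS):
        return text
    # Crude refusal heuristic tripped: same prompt and image, Gemini instead.
    image = Image.open(io.BytesIO(urllib.request.urlopen(image_url).read()))
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content([prompt, image]).text
```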

Though I personally love the ChatGPT app, I notice that I have switched the majority of our services to Gemini because of how unstable the output has become.

If there is anyone from the team who would like to share a bit: what is going on? Is there some developing trend I missed? Or is the team simply not paying attention to the APIs?

Put an image in any message? Well then, you get extra garbage jammed into a system message ahead of whatever text was supposed to align your AI identity to its application domain and purpose.

Try to build any working application against that. You can indeed turn a normally developed application and one of its message tasks into a refusal just by adding a small white image anywhere in the messages; a minimal reproduction is sketched below.
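Here is a minimal sketch of that reproduction (the system prompt and task are placeholders; the point is only that the image flips the behavior):

```python
import base64
import io

from openai import OpenAI
from PIL import Image

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Build a 16x16 all-white PNG and wrap it in a data URL.
buf = io.BytesIO()
Image.new("RGB", (16, 16), "white").save(buf, format="PNG")
white_png = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The same system prompt and task behave normally when text-only...
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "Summarize the clauses we discussed."},
            # ...but attaching even a blank image can flip the call to a refusal.
            {"type": "image_url", "image_url": {"url": white_png}},
        ]},
    ],
)
print(resp.choices[0].message.content)
```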

Answer: They are paying attention to what they want, and leaving OpenAI services is an understandable reaction.


[Screenshot omitted: example before]

[Screenshot omitted: example symptom arising from images and prompt injection]

Didn’t HAL 9000 have a meltdown and kill all the astronauts by being told to lie?


Ok, this might explain it. We have been passing screenshots of clients’ presentations during Zoom meetings. Those screenshots may include small regions showing user faces, which we asked the model to ignore in our prompt. But that still completely jeopardizes gpt-4o’s responses.

I noticed one screenshot had only a user profile picture in a corner of the site, which I could hardly notice myself, but GPT still rejected it. GPT models are now mostly useless for our applications.
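If it helps anyone else, one thing worth trying before giving up is to mask that corner client-side instead of asking the model to ignore it. A sketch (the top-right coordinates are just an assumption about where the avatar sits in your layout):

```python
import base64
import io

from PIL import Image, ImageDraw

def mask_corner(path: str, box_px: int = 120) -> str:
    """Black out the top-right corner (where the profile avatar sits)
    and return the screenshot as a data URL ready for the API call."""
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, _ = img.size
    draw.rectangle([w - box_px, 0, w, box_px], fill="black")
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
```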