Assistant struggling reading image, whilst chat completion is right 99% of the time

Hi all!

A bit of back story…

I’m currently coding up a solution that will allow me to upload an image of a paper document with the data I am expecting to see on it. The idea is that GPT will be able to tell me whether the data matches or not. I’m having success so far with gpt-4o and chat completions. I’m defining the system topic and the function it uses in my code and as I said it’s working quite well. I then came across Assistants which I’d never seen before. I did some reading up on it and created one in the assistants playground. I quickly realised I could define my system topic and functions here, meaning I wouldn’t need to do it in the code. Unfortunately, upon testing the assistant I created, it is totally unable to read the most basic information from the documents I upload.

So for comparison, if I start a new chat in the playground (not assistant), upload a document and send a text prompt saying “What is the name on the document?”. I get the right answer every time. Whereas if I start a new chat with the assistant, upload the document and then ask the same question. It gives me different names every time that aren’t remotely close. I also tried creating a new assistant with no real context to see if that made a difference - it didn’t.

So, I guess my question is, why is the default chat using gpt-4o able to read the name right nearly every single time. Yet a default assistant using gpt-4o can’t even get close? It seems to read the rest of the information okay, but it just totally fumbles the name every time. I’d even argue it’s clearer than most other text on the image.

Thanks! :slight_smile:

Hi @aaron.murphy,

Welcome to the dev forum.

I wasn’t able to reproduce this.

Image functionality is working on the assistants with gpt-4o in my recent test.

Hi, thanks for responding.

It seems to read the rest of the information okay, but it just totally fumbles the name every time. I’d even argue it’s clearer than most other text on the image. Unfortunately, I can’t share the image. But when I give it to a chat playground it will read it with ease every single time. Yet the assistant doesn’t even come close. It almost looks like it’s guessing as the names are that far off.