Parse image to text with gpt-4o with ChatGpt UI and OpenAI chat.completions.create endpoint - Very Different Results

I am testing some OCR image-to-text parsing with GPT-4o using both the ChatGPT UI and the OpenAI chat.completions.create endpoint. I have a few questions I would like to get your help and input on.

  1. I am trying to understand why the ChatGPT UI is performing much better at extracting information from images correctly. When I use the same GPT-4o with the OpenAI chat.completions.create endpoint, I encounter many errors and random pieces of information that are not present in the image.

  2. My assumption is that this discrepancy is related to the parameters of chat.completions.create, such as frequency_penalty, temperature, top_p, and max_tokens. Is there documentation where I can find out the settings that the ChatGPT UI uses to interact with GPT-4o?

  3. Is GPT-4o the right pick for OCR type of tasks?

4 Likes

BUMP Any news or update on this? Identical problem for me with 4o-mini. Only limited text parsing output via the API while the ChatGPT UI parses all the text from the image

Do you guys have any sample images?

I struggled with this and eventually found the API sample code I lifted from docs had max-tokens set to 300 (at the bottom of the json). I changed to 4095 and received the full response.

1 Like