Gpt4o ocr mistakenly bolds math symbol uvw (mathbf)

I tried using gpt4o’s Vision API to recognize some mathematical literature and get LaTeX Markdown. In most cases, its accuracy reached a basically usable level. However, there is one particularly noticeable error: for u,v,w it repeatedly recognized them as bolded uvw thousands of times(Not correct even once).
image
For example, this formula was recognized as:


\[ \mathbf{v} \cdot \mathbf{w}_2 = 0 \]

But it should actually be:


\[ v \cdot w_2 = 0 \]

Although I can post-process and change all \mathbf{v} to v, I still hope the official team can improve this aspect.

1 Like

Add “avoid using bold font” to your prompt, here’s an example:

2 Likes

For me gpt-4o wouldn’t even have a look, instead, sending to python. Just a forum screenshot fixes its attitude though on a new PDF-sourced image.

1 Like

Some of it might be just random chance. Because, using the model in ChatGPT it produces the correct result,

Though, in the playground I get the boldfaced version.

Now…

It might not actually be a “mistake.”

(Hear me out, please!)

1

Typically vectors are typeset in bold.

From on Wikipedia,

For representing a vector, the common typographic convention is lower case, upright boldface type, as in v. The International Organization for Standardization (ISO) recommends either bold italic serif, as in \mathbf{v}, or non-bold italic serif accented by a right arrow, as in \displaystyle {\vec {v}}.

It could be that Omni is recognizing this as a dot-product between vectors and “correcting” the typesetting according to what it believes it should be.

But, more likely…

2

Your image does appear to be bolded.

I made a *non-*bolded version in \LaTeX which you can see here:

This is a much higher resolution version, but if I crop and scale yours and mine you can see yours is clearly in a heavy typeface.

output

When I provide the lighter typeface version to the gpt-4o model, it seems to get the result correct every time.

So, why does gpt-4-turbo not struggle with this? I don’t know, probably just because it is a larger model.

Anyway, I would spend at least a bit of time seeing if you can just get a better quality original source image before picking a fight with the model over this.

2 Likes

With my tests yesterday, today the GPT4o web version actually seems to recognize the above image more accurately…

I can’t get a clearer image, and personally, I think the provided scanned photo is clear enough for recognition.

I don’t believe \mathbf{v} and v are very similar; the difference between them is quite significant.
image
Although the scanned document does bold the v here, if a human were handling it, they wouldn’t mistakenly recognize it as bold in LaTeX (I’ve tried specialized LaTeX OCR tools like Mathpix, SimpleTex, and Pix2Tex).

I added “avoid using bold font for u, v, w” to the prompt, and the results improved somewhat (there were fewer bold fonts, but more LaTeX syntax errors, possibly because the model was more confused, insisting it was bold, resulting in similar invalid LaTeX syntax like \(\mathbf{ u ).

I’m not saying that it’s “seeing” your scanned image as being the result of using \mathbf. I’m saying that it’s “seeing” your scanned image appears to be in a heavy (read: bold) font face and is making a “best effort” to give you what it “thinks” you want.

When I use an image with a lighter-weight font, there is zero issue.

As for humans, there is the \bm command from the bm package which produces a heavier-weight italic serif font.