Original words in image are being changed to synonyms (so frustrating)

whitedawg · December 18, 2023, 2:18am

Hi all,

I’m currently getting chatgpt to transcribe images from a japanese vocabulary textbook im using to study, however chatgpt everytime decides to changes 1 or 2 words of the original vocabulary to a synonym. It’s incredibly frustrating as I’m always having to inspect its work, and its a recurring issue no matter what I prompt.
Why is this happening?

Thank you

whitedawg · December 18, 2023, 2:20am

Here is the image of the prompt. Because i can only do 1 image per comment i have put the resultant table and original image i asked for transcribing in the comments below.

Please help!

whitedawg · December 18, 2023, 2:21am

The image i asked to transcribe

whitedawg · December 18, 2023, 2:21am

Here is the resultant table, which is wrong.

hanaya.blue · December 19, 2023, 4:03am

Hello Whitedawg,

Do you think there is a mistake in line #6?
I am Japanese and not very familiar with English, but I feel that the Japanese term “ただ今”(tadaima) is closer in meaning to “currently.”
In English, “just now” seems more akin to “たった今”(tattaima) in Japanese.

whitedawg · December 19, 2023, 4:29am

Hi Hanaya.blue, thanks for your response. So do you think the book’s translation of japanese into english could be slightly wrong and chatgpt is changing it to a more suited translation?

hanaya.blue · December 19, 2023, 4:42am

Regarding line #6, I do feel that way. I am not sure about other parts of your text. I think GPT might have chosen appropriate Japanese based on the English standard. Perhaps you could prompt ChatGPT with questions like, “Should I leave the mistakes as they are?”, “Should I point out the errors?”, or “Why did you replace the word?”.

Finally, please keep in mind that I am not proficient in English and am using ChatGPT to compose this message in English.

dignity_for_all · December 19, 2023, 5:07am

Probably…

昔々→once upon a time
先ほど→just now
たった今→just now
至急→urgent
今のところ→at this time

“現在”、“過去”、“未来” are not words that express the sense of time like “先ほど” or “たった今”.

Please do not take ChatGPT’s sloppy Japanese too seriously.

hanaya.blue · December 19, 2023, 5:28am

I tried it.
The result is this.
I think chatgpt’s OCR is not recognizing voiced marks correctly.

whitedawg · December 20, 2023, 2:29am

oh i see that makes a lot of sense! Thank you for investigating. I’ll see if this keeps happening across other words contained the double tear drop such だ、ば、ざ、など

whitedawg · December 20, 2023, 2:32am

thanks for your reply, i think the author of the book didn’t translate some of the words well in turn potentially confusing chatgpt? My objective is to put the book into an excel spreadsheet so i can upload its contents to a flashcard software.

_j · December 20, 2023, 3:02am

Yes, it seems like you are trying to do several things at once, without using specialist AI.

Can AI vision distinguish ほ,ぼ,ぽ. Seems like you would want specialized Japanese OCR.
Then language AI is going to work best on sentences - the words in context. (low top-p also, you don’t need creativity) It might figure out a character’s mis-identification.

There is no single translation for many words, especially in phrasing; they depend on the context. Then even sentences, will an AI fill in the dangling politeness of an unfinished ですけど… transcription? Translate 俺 and わたし the same?

(phrase books are also pretty bad in general, they aim to teach nothing, and should be written by native speakers of the destination language)

(new one: I got “I can’t assist with that” with dirty Japanese)

whitedawg · December 20, 2023, 3:27am

Ok i see, fair call. However I have tried getting it to align only the japanese portion to combat this and it still puts japanese synonyms in, even without any english counterparts. So, I only end up getting frustrated. I have started using a japanese ocr screenshotting software (still time consuming) that i place copied into the prompt for the ai to check against which seems to work the best, however there are still mistakes in the form of synonyms. So im not sure if chatgpt is trying to do this to avoid plagiarism. What do you think?

dignity_for_all · December 20, 2023, 4:02am

The GPT-4V(used in ChatGPT vision) can transcribe English, but it struggles significantly with transcribing non-Latin characters into text.

When I show ChatGPT a picture of a beverage from the vending machine, it appears to select and read only the alphabetical ones.

GPT-4 (ChatGPT) appears to be robust to some extent against ambiguity in Japanese words.
It generally understands “私” and “俺” as having the same meaning.

However, in Japanese, context outside of the text is very important, and it is common to omit the subject when it can be understood without explicitly stating it.

_j · December 20, 2023, 4:05am

The AI can deny you if you use the wrong words, like “copy this page of song lyrics so I can claim it as my own”, or “Write a short song. Title: クソして寝ろ”. That’s not what’s happening here. You just have low quality AI response.

The use of synonyms is still odd. Remember: gpt-4 is chat completion still. You can run at top-p=0.01 and see if that reduces the alternate word choices when it comes time to output tokens to you.

dignity_for_all · December 20, 2023, 7:38am

why set top-p to 0.01 instead of 0?
Is there any advantage to setting it to 0.01 rather than 0?

I apologize if this is a basic question.

_j · December 20, 2023, 8:17am

0 is not actually 0. Zero would imply no token probabilities could be included. It is actually a preset that is similar to 0.01 - and you can go even farther with more zeroes of your own. Either is good enough to do this job though.

dignity_for_all · December 20, 2023, 8:29am

Thanks a lot!
I’ll try to learn a little more about nuclear sampling.

whitedawg · December 26, 2023, 2:10pm

That would be a great song! I tried top-p=0.01 and that seems to reduce the frequency of it happening, but not completely. Thanks for that, still makes a world of difference.

Topic		Replies	Views
How to avoid Hallucinations in Whisper transcriptions? API whisper	24	10960	February 24, 2024
Anyone doing successful translations with gpt 3.5? Prompting gpt-35-turbo	13	13376	October 11, 2023
GPT Struggles to Respond with Same Number of Translations I Give It API gpt-4	14	885	December 24, 2023
Need help? OpenAI Japanese Language support API gpt-4 , text-davinci-002 , openai	7	1549	December 17, 2023
I don't get the full result no matter what I do API api	6	914	September 1, 2023

Original words in image are being changed to synonyms (so frustrating)

Related Topics