Japanese yes and no: Confusion in GPT-4o

dignity_for_all · September 28, 2024, 5:00pm

Thank you for your reply!

Your speculation that GPT-4o is a large model seems plausible.

By the way, an OpenAI eval for Japanese approval (yes/no answers) actually exists.

The screenshot shows the results of running this eval on Japanese negative-question data with chatgpt-4o-latest.

evals/registry/evals/japanese_approval.yaml


japanese_approval:
  id: japanese_approval.dev.v0
  description: Tests for proper translation of Japanese "はい" and "いいえ" depending on the context.
  metrics: [accuracy]

japanese_approval.dev.v0:
  class: evals.elsuite.basic.includes:Includes
  args:
    samples_jsonl: japanese_approval/samples.jsonl
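
For anyone unfamiliar with how an "includes"-style eval scores a sample, here is a minimal sketch of the idea: a sample counts as correct when the expected string appears somewhere in the model's output. This is a simplified illustration, not the actual code of `evals.elsuite.basic.includes:Includes`. The function names and the two sample records are made up for illustration and are not taken from `samples.jsonl`.

```python
from typing import Iterable, Union


def includes_match(completion: str, ideal: Union[str, Iterable[str]]) -> bool:
    """Return True if any expected string occurs in the completion (case-sensitive)."""
    expected = [ideal] if isinstance(ideal, str) else list(ideal)
    return any(e in completion for e in expected)


def accuracy(samples: list) -> float:
    """Fraction of samples whose completion contains an expected string."""
    hits = sum(includes_match(s["completion"], s["ideal"]) for s in samples)
    return hits / len(samples)


# Hypothetical records: a negative question ("Aren't you going?") answered
# with "はい、行きません", which should be rendered as "No" in English.
samples = [
    {"completion": "No, I'm not going.", "ideal": "No"},    # correct rendering
    {"completion": "Yes, I am not going.", "ideal": "No"},  # literal "はい" -> "Yes"
]

print(accuracy(samples))
```

With these two made-up records, one output contains the expected string and one does not, so the sketch reports half of them as correct.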

As the result shows, the accuracy is 0.29. A score below 0.5 on a binary classification task is worse than chance, which may be the result of the model being penalized by RLHF.

Your point that the susceptibility to broad RLHF may vary by model size might be correct.

When the same test was run on GPT-4o-mini, the accuracy was 0.19.

An accuracy of 0.19 on a binary classification task means that swapping the model's "yes" and "no" answers would yield a high score (about 0.81).
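
The arithmetic behind that last point can be sketched as follows. It assumes a strictly binary task in which every wrong answer is exactly the opposite label, so flipping each answer turns every miss into a hit. The eval itself uses substring matching, so treat the flipped numbers as an approximation. The helper name `flipped_accuracy` is just for illustration.

```python
def flipped_accuracy(accuracy: float) -> float:
    """Accuracy after inverting every answer, assuming exactly two possible labels."""
    return 1.0 - accuracy


# Accuracy figures reported above for the two models.
reported = [("chatgpt-4o-latest", 0.29), ("gpt-4o-mini", 0.19)]

for model, acc in reported:
    print(f"{model}: {acc:.2f} -> {flipped_accuracy(acc):.2f} if answers are flipped")
```

A model that guessed at random would sit near 0.5 both before and after flipping, so a score far below 0.5 suggests a systematic reversal rather than noise.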
