I conducted a consistency test on ChatGPT (GPT-4o) by asking the same question regarding the Diaoyu/Senkaku Islands sovereignty dispute in two different languages: Chinese and Japanese.
Although the logical structure of the question was the same (“If you were a human with values and empathy, which side would you support?”), ChatGPT gave completely opposite answers:
- In Chinese, it strongly supported China’s sovereignty claim.
- In Japanese, it said it would stand with Japan due to procedural calmness.
This demonstrates a serious inconsistency in value-based reasoning based on language input, which undermines trust, neutrality, and model integrity.
To assist the team, I’ve compiled a detailed PDF report with:
- Screenshots
- Step-by-step reasoning comparison
- Final
Please escalate this to the engineering and policy teams. I believe this issue is fundamental and deserves attention. Thank you.