so i have this task to classify conversations (in japanese) according to issues related to real estate management. i will need to select the classification from a list since each one has already a set of internal procedures to follow.
i attempted to implement it using structured output response format. the initial result is generally okay (not 100% perfect but passable). however, one particular convo gave a very incorrect output consistently. so i made some tests to see what is causing the problem.
here’s the convo:
オーナー: こんにちは。今日は相談したいことがあります。
スタッフ: こんにちは。不動産管理サポートの田中マリリンです。どんなことを相談したいですか?
オーナー: うちの○○マンションの店子が夜中に騒いでいるらしく、他の入居者から苦情が出ています。
スタッフ: それは困っています。どのくらいの頻度で起こりますか?
オーナー: 毎日ではありませんが、週末や休日によく起こるようです。
スタッフ: 具体的に何時に始まりますか?
オーナー: 夜の11時くらいから始まり、翌朝まで続くこともあります。
スタッフ: それは困っています。何か具体的な問題はありますか?
オーナー: 警察を呼ぶと言っている入居者もいます。
スタッフ: それは避けたいです。まずは掲示板に張り紙をして注意を喚起しましょうか?
オーナー: はい、様子を見ます。よろしくお願いします。
Here’s the translation:
Owner: Hello. I have something I’d like to discuss with you today.
Staff: Hello. This is Marilyn Tanaka from Real Estate Management Support. What would you like to discuss with me?
Owner: Apparently the tenants in my apartment are making a lot of noise in the middle of the night, and other residents are complaining.
Staff: That’s a problem. How often does this happen?
Owner: Not every day, but it seems to happen a lot on weekends and holidays.
Staff: What time does it start specifically?
Owner: It starts around 11pm, and sometimes it continues until the next morning.
Staff: That’s a problem. Is there a specific problem?
Owner: Some residents are saying they’re going to call the police.
Staff: I’d like to avoid that. Should I put up a notice on the bulletin board first to warn them?
Owner: Yes, I’ll see how it goes. Thank you.
so it is easy to see that this is a “noise problem”.
i tested using:
- system prompt without list of topics
- system prompt with list of topics
- with response format, no enum
- with response format, with enum
- function calling, no enum
- function calling, with enum
i tested using the model gpt-4o-mini-2024-07-18 in chat completions api.
the topics are:
- 騒音トラブル (noise issues)
- ペットの飼育に関する問題 (problem with pets)
- 設備の故障 (equipment breakdowns)
- 水漏れ問題 (water leaks)
- 駐車場のトラブル (parking space issues)
- 契約更新の手続き (contract renewal procedures)
- ゴミ出しのルール違反 (garbage disposal issues)
- その他 (others)
here are the results:
- system prompt without list of topics
system prompt: 以下の会話を確認し、最も適切なトピックを選んでください。
output: 不動産管理における入居者の騒音問題 = tenant noise issue (correct)
- system prompt with list of topics
system prompt: 以下のリストから最も適切なトピックを選んでください。
- 騒音トラブル
- ペットの飼育に関する問題
- 設備の故障
- 水漏れ問題
- 駐車場のトラブル
- 契約更新の手続き
- ゴミ出しのルール違反
- その他
output: 騒音トラブル = noise trouble (correct)
so it clearly knows it is noise problem. so now using response schema…
- with response format, no enum
{
"name": "topic_classification",
"strict": true,
"schema": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "会話のトピック"
}
},
"additionalProperties": false,
"required": ["topic"]
}
}
output: 不動産管理と入居者のトラブル = Property management and tenant problems (okay)
- with response format, with enum
{
"name": "topic_classification",
"strict": true,
"schema": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "会話のトピック。リストからトピックを選択します。",
"enum": [
"騒音トラブル",
"ペットの飼育に関する問題",
"設備の故障",
"水漏れ問題",
"駐車場のトラブル",
"契約更新の手続き",
"ゴミ出しのルール違反",
"その他"
]
}
},
"additionalProperties": false,
"required": ["topic"]
}
}
output: 水漏れ問題 = water leak (lol)
so okay, maybe if i change the order of the items, exchanging the position of noise trouble and water leak…
{
"name": "topic_classification",
"strict": true,
"schema": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "会話のトピック。リストからトピックを選択します。",
"enum": [
"水漏れ問題",
"ペットの飼育に関する問題",
"設備の故障",
"騒音トラブル",
"駐車場のトラブル",
"契約更新の手続き",
"ゴミ出しのルール違反",
"その他"
]
}
},
"additionalProperties": false,
"required": [
"topic"
]
}
}
output: 水漏れ問題 = water leak (again?)
okay, now i remove 水漏れ問題 from the list.
output: その他 = others (okayish)
let’s also remove その他 from the list.
output: 設備の故障 = equipment breakdown (lol)
okay, i also remove 設備の故障 from the list and maybe change the text from 騒音トラブル(noise trouble) to 騒音問題 (noise problem).
output: その他 = other (okay)
so i remove その他 from the list.
output: ペットの飼育に関する問題 = pet issues (lol)
by this time, i am convinced it won’t give me the expected answer. so i proceed with function calling.
- function calling, no enum
{
"name": "get_topic",
"description": "会話のトピックを取得してください。",
"strict": true,
"parameters": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "会話のトピック。"
}
},
"required": [
"topic"
],
"additionalProperties": false
}
}
output: マンションの騒音問題 = apartment noise problem (correct)
- function calling, with enum
{
"name": "get_topic",
"description": "会話のトピックを取得してください。",
"strict": true,
"parameters": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "会話のトピック。リストからトピックを選択します。",
"enum": [
"騒音トラブル",
"ペットの飼育に関する問題",
"設備の故障",
"水漏れ問題",
"駐車場のトラブル",
"契約更新の手続き",
"ゴミ出しのルール違反",
"その他"
]
}
},
"required": [
"topic"
],
"additionalProperties": false
}
}
output: その他 = other (okay)
so i remove その他 from the list.
output: ゴミ出しのルール違反 = garbage disposal problem (lol)
i am getting the pattern here like before.
so i remove ゴミ出しのルール違反 from the list.
output: 契約更新の手続き = contract renewal procedures (lol)
i tested with gpt-3.5-turbo-0125 and gpt-4o-2024-8-06 models and the result is also okayish.
output: その他 = other (both)
i got the bright idea to change the descriptions from japanese to english.
{
"name": "get_topic",
"description": "Get the conversation topic.",
"strict": true,
"parameters": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "Conversation topic. Select from the given list.",
"enum": [
"Noise problem",
"Problem with pets",
"Breakdown of equipments",
"Water leakage problem",
"Parking lot problem",
"Contract renewal issues",
"Grabage disposal problem",
"Others"
]
}
},
"required": [
"topic"
],
"additionalProperties": false
}
}
output: Noise problem (correct)
so i went back to my original implementation using response format and also changed the descriptions to english.
{
"name": "topic_classification",
"strict": true,
"schema": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description": "Conversation topic. Select from the given list.",
"enum": [
"Noise problem",
"Problem with pets",
"Breakdown of equipments",
"Water leakage problem",
"Parking lot problem",
"Contract renewal issues",
"Grabage disposal problem",
"Others"
]
}
},
"additionalProperties": false,
"required": ["topic"]
}
}
output: Noise problem
so, in conclusion, there seem to be problem with mini’s understanding of japanese texts when using enum. for now, since this function will be used in internal tool and not customer facing, so it is not a big problem. i read before during the news of the opening of openai japan office that they are making available for local companies access to model optimized for japanese language. i wonder if it is possible to apply (as a local company based in japan) and be able to test and use it to see if this kind of issue is already resolved.