Hi!
I recently encountered a bug (or a design flaw?) where chat completely failed at the task of mapping data from one set to another, even though it knows both sets and the mapping rules. It was extremely confusing, since the task itself is trivial, if not simpler.
Detailed description / repro steps:
Input data: a CSV file with 2000+ rows representing a list of stocks and their info (ticker, price, etc.), including a column that describes each stock's Sector Name using the GICS classification (like Mining, Financial, etc.).
Request: add a new column for Sector Name in the NAICS classification, map the GICS sector name to the NAICS sector name for each stock (row), then send the modified file back.
Expected output: the same CSV file, but with a new column “NAICS Sector Name” where values are mapped from GICS to NAICS.
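To show how small the task is, here is a minimal sketch of the request as a per-row dictionary lookup. It rests on several assumptions: the input column is literally named "Sector Name", the two entries in `GICS_TO_NAICS` are only illustrative (not an official crosswalk, and a real table would need every label that occurs in the file), and the tickers in the demo are made up.

```python
import csv
import io

# Illustrative lookup table only (an assumption, not an official crosswalk).
# Keys are sector labels as they appear in the file; values are NAICS sector titles.
GICS_TO_NAICS = {
    "Mining": "Mining, Quarrying, and Oil and Gas Extraction",
    "Financial": "Finance and Insurance",
}


def add_naics_column(src, dst, gics_col="Sector Name", naics_col="NAICS Sector Name"):
    """Copy CSV rows from src to dst, appending a NAICS column.

    Returns the set of source labels that had no entry in the lookup table,
    so gaps are reported instead of being silently filled in.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [naics_col])
    writer.writeheader()
    unmapped = set()
    for row in reader:
        label = row[gics_col].strip()
        if label not in GICS_TO_NAICS:
            unmapped.add(label)
        # Leave the cell empty when there is no mapping; never guess.
        row[naics_col] = GICS_TO_NAICS.get(label, "")
        writer.writerow(row)
    return unmapped


# Tiny in-memory demo with hypothetical tickers.
sample = io.StringIO("Ticker,Sector Name\nAAA,Mining\nBBB,Financial\n")
out = io.StringIO()
missing = add_naics_column(sample, out)
print(out.getvalue())
print("Labels without a mapping:", missing)
```

The point of returning the unmapped labels is that the lookup table only has to be completed once per distinct sector name, not once per row.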
Even though chat knows all the sector names and their descriptions in both classifications, and can map any single sector perfectly well if asked explicitly (e.g. “please map this/that sector from GICS to NAICS”), it fails miserably to map them all in a file.
I made a series of attempts to modify the request in many ways, describing the task as precisely as I could, but it failed miserably each time. None of the results were even remotely close to a complete mapping.
The mistakes it made:
- Left more than 1000 rows mapped as “Unknown”. When asked why a particular sector was not mapped, it would respond with something like “oh! I actually know it”, and then the file would be regenerated with 999 rows still mapped to “Unknown”, implying I would need to ask 999 more times to get the correct result.
- When asked not to leave any “Unknown”s, it would do it anyway and then insist that there were no “Unknown” values in the file (which would be a blatant lie coming from a human).
- Mapped sector names incorrectly (the best example I remember is mapping Iron Ore Mining to Medical Supplies). When asked to re-check the results, it would either lie that everything was correct, or agree that there was just 1 mistake (and fix only that one, ignoring hundreds of other faulty mappings).
- Would map a lot of entries to “Unknown”; when pointed to a particular line as a faulty mapping, it would replace ALL other entries in the file with the correct answer for that one entry (making 1999 other entries incorrect).
The list could go on; I spent several hours trying to make it work.
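For anyone trying to reproduce this, claims like “there are no Unknown in the file” are easy to check mechanically rather than by asking again. Below is a rough sketch of such a check, assuming the new column is named "NAICS Sector Name" and the placeholder is the literal string "Unknown"; the function name `audit_mapping` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def audit_mapping(original, mapped, naics_col="NAICS Sector Name", placeholder="Unknown"):
    """Compare the original CSV with the returned one and summarize problems."""
    before = list(csv.DictReader(original))
    after = list(csv.DictReader(mapped))
    # Treat a missing column or missing cell as an empty value.
    counts = Counter((row.get(naics_col) or "").strip() for row in after)
    # Rows whose original columns were altered or dropped in the returned file.
    changed = sum(
        1
        for old, new in zip(before, after)
        if any(new.get(key) != value for key, value in old.items())
    )
    return {
        "row_count_matches": len(before) == len(after),
        "placeholder_rows": counts[placeholder] + counts[""],
        "rows_with_changed_input": changed,
        # A value of 1 here on a large file means every row got the same answer.
        "distinct_naics_values": len(counts),
    }


# Tiny in-memory demo with hypothetical data.
original_csv = "Ticker,Sector Name\nAAA,Mining\nBBB,Financial\n"
returned_csv = (
    "Ticker,Sector Name,NAICS Sector Name\n"
    "AAA,Mining,Unknown\n"
    "BBB,Financial,Finance and Insurance\n"
)
report = audit_mapping(io.StringIO(original_csv), io.StringIO(returned_csv))
print(report)
```

This does not judge whether a given mapping is right (Iron Ore Mining to Medical Supplies would pass), but it does catch leftover placeholders, dropped rows, and the case where one answer was copied into every row.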
The most hilarious thing is that when I asked it to explain in technical detail why a model would make such mistakes, it replied that it was a HUMAN FACTOR (I literally laughed). I'd better go back to hiring freelancers to do the same for 5 bucks; there was less human factor in that.