Weird Error while finetuning

More on this, @prafull.sharma and @boris: I looked into why byte 0x9d wasn't mapping to anything (undefined) and found this:

  • On Windows, open() falls back to the system locale encoding (typically cp1252) when no encoding is passed, so it tries 'cp1252' instead of 'UTF-8'. The file is most likely encoded in UTF-8, and byte 0x9d has no mapping in cp1252.
  • It looks like the cli.py file in the OpenAI module calls open() 3 times without passing in the 'encoding' parameter.
  • To fix this, it looks like those open() calls in cli.py need encoding="UTF-8" as one of the parameters.
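
To illustrate the mismatch described above, here is a minimal sketch that should reproduce the same kind of decode error on any OS by forcing cp1252 explicitly. The sample line and the demo_decode() helper are made up for illustration; they are not taken from the OpenAI CLI or from the original dataset. The right curly quote (U+201D) is used because its UTF-8 encoding ends in byte 0x9d.

```python
import os
import tempfile

# Hypothetical JSONL line containing curly quotes. U+201D encodes to the
# UTF-8 bytes E2 80 9D, and 0x9D is one of the undefined slots in cp1252.
sample = '{"prompt": "She said \u201chello\u201d", "completion": " ok"}\n'


def demo_decode():
    """Write the sample as UTF-8, then read it back as cp1252 and as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(sample)

        # Simulate what open() does on a cp1252 Windows machine when no
        # encoding argument is given.
        try:
            with open(path, encoding="cp1252") as f:
                f.read()
            cp1252_error = None
        except UnicodeDecodeError as exc:
            cp1252_error = exc

        # Same file, read with the encoding it was actually written in.
        with open(path, encoding="utf-8") as f:
            utf8_text = f.read()
    finally:
        os.remove(path)
    return cp1252_error, utf8_text


if __name__ == "__main__":
    error, text = demo_decode()
    print("cp1252 read failed with:", error)
    print("utf-8 read round-trips:", text == sample)
```

If the cp1252 read fails and the UTF-8 read round-trips, that is consistent with the missing encoding argument being the cause.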

I don't think the JSONL file was even created because of this error, which would explain why @prafull.sharma wasn't able to retrieve it. The traceback provided points to this as the most likely cause. Passing encoding='utf-8' to open(), so that Python 3 on Windows uses that encoding instead of the default, should fix this error!

If anyone is getting this error and really wants to fix it right now, a temporary solution would be to modify the cli.py file of the OpenAI package in the Python 3 site-packages folder so that its open() calls include the encoding='UTF-8' parameter.
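
Before editing anything, it may help to check what your Python actually falls back to. The sketch below is only an illustration: default_text_encoding() and read_text_utf8() are hypothetical helpers, not code from cli.py, and the second one just shows the shape a patched open() call would take.

```python
import locale


def default_text_encoding():
    """Return the locale encoding Python reports for text files.

    On many Windows setups this is 'cp1252'. open() uses this value when
    no encoding argument is passed.
    """
    return locale.getpreferredencoding(False)


def read_text_utf8(path):
    """Hypothetical helper: read a file with an explicit UTF-8 encoding.

    This is not the actual cli.py code, only the pattern of the change.
    """
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print("Default text encoding:", default_text_encoding())
```

If the first function reports cp1252, the explicit encoding argument is what makes the difference. Another option that avoids touching site-packages, assuming Python 3.7 or newer, is to set the PYTHONUTF8=1 environment variable before running the command, which turns on Python's UTF-8 mode so open() defaults to UTF-8.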

I may try to fine-tune a model tonight on Windows to see whether I can replicate the error and confirm that adding the encoding="UTF-8" parameter does the trick.

Let me know if this helps @boris.


Sources:
StackOverflow - Charmap decoding error
StackOverflow - Unable to decode byte 0x9d
StackOverflow - Python 3 Default Encoding CP1252
