Weird error while fine-tuning

Actually, I’m running the fine-tuning now with a test JSONL file I made. I’ll be using Git Bash like @prafull.sharma, since I can’t use CMD either (it tries to make me open the file with another program).

Wow, thanks! I’ll try to get this fix deployed ASAP

Glad to help! I was able to recreate the error a couple of posts down. Further research shows that there’s a special right quote character (”, U+201D) whose UTF-8 encoding contains the byte 0x9d.

>>> '“”'.encode()
b'\xe2\x80\x9c\xe2\x80\x9d'  # the 0x9d byte is part of ”
>>> b'\xe2\x80\x9c\xe2\x80\x9d'.decode("cp1252")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\iadmin\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 5: character maps to <undefined>
>>> b'\xe2\x80\x9c\xe2\x80\x9d'.decode("UTF-8")
'“”'

I got the same error that @prafull.sharma did, so the special right quote is most likely the troublesome character in this scenario. Specifying encoding="utf-8" when opening the file should fix the error!
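
Here is a minimal Python sketch of that fix, using a throwaway temp file rather than real training data: it writes one JSONL record containing ” as UTF-8, shows that reading it back as cp1252 fails, and shows that passing encoding="utf-8" to open() reads it cleanly. `read_text` is just an illustrative helper name.

```python
import os
import tempfile

# One JSONL record ending in the right double quote (U+201D). Its UTF-8
# bytes are e2 80 9d, and 0x9d has no mapping in cp1252.
RECORD = '{"prompt": "Test”", "completion": " ok”"}\n'


def read_text(path, encoding):
    """Read a whole file as text using an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Write the record as UTF-8 (how the JSONL file ends up on disk).
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(RECORD)

cp1252_failed = False
try:
    read_text(path, "cp1252")  # what open() falls back to on many Windows setups
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252:", exc)

utf8_text = read_text(path, "utf-8")  # explicit encoding, same result on every OS
print("utf-8 round-trip OK:", utf8_text == RECORD)
os.remove(path)
```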

Thanks @DutytoDevelop.
Actually, the JSONL file does get created when using Git Bash. I checked the content as well, and it looked fine to me.

But while running the fine-tuning command (with Git Bash), I ran into this error.

However, when I used the same JSONL file (which was created in Git Bash) to fine-tune on an Ubuntu machine, it ran smoothly.

@prafull.sharma, I was able to create the JSONL file too, and I recreated the error you got by putting the special right quote character in the JSONL file. When no encoding is specified, Python’s open() uses the locale’s default encoding: UTF-8 on Ubuntu, but typically Windows-1252 (“cp1252” in Python) on Windows. That’s why fine-tuning with your prepared JSONL file succeeded on Ubuntu.
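
To see which encoding your own Python falls back to, a quick check like this should work (a sketch; the exact value depends on your locale settings). `sys.flags.utf8_mode` reports whether Python’s UTF-8 Mode is on; turning it on with `PYTHONUTF8=1` is a possible alternative to editing cli.py, though I haven’t tested it with the OpenAI CLI.

```python
import locale
import sys

# The encoding open() falls back to when no encoding= argument is passed.
# Typically "UTF-8" on Ubuntu and "cp1252" on Western-locale Windows.
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding:", default_encoding)

# Python's UTF-8 Mode (Python 3.7+) makes the default UTF-8 regardless of
# the locale; enable it with `python -X utf8` or PYTHONUTF8=1.
print("UTF-8 Mode enabled:", bool(sys.flags.utf8_mode))
```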

My test-finetune2_prepared.jsonl file looks exactly like this:

{"prompt":"1Finetune Windows Error Test with special quotes that contain byte 0x9d”””","completion":" 1Testing... Please ignore”””"}
{"prompt":"2Finetune Windows Error Test”””","completion":" 2Testing... Please ignore”””"}
{"prompt":"3Finetune Windows Error Test”””","completion":" 3Testing... Please ignore”””"}

Success!

I modified the cli.py file to pass encoding="utf-8" to the open() calls, and I was able to get past the error with the same JSONL file!!

I have run into this type of error before, caused by string characters not being properly escaped. I run a Python script to escape the strings properly before converting to the JSONL format for GPT-3 fine-tuning.
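
The script itself isn’t shown here, but one common way to get that effect is to let json.dumps escape non-ASCII characters (its default, ensure_ascii=True). Each line then becomes pure ASCII, which decodes the same under cp1252 and UTF-8. This is a sketch assuming nothing later in the pipeline rewrites the file with the characters unescaped; the sample pairs are made up.

```python
import json

# Hypothetical prompt/completion pairs containing curly quotes (U+201C, U+201D).
pairs = [
    ("Finetune Windows Error Test “quoted”", " Testing... Please ignore”"),
    ("Plain ASCII prompt", " Plain ASCII completion"),
]

# With the default ensure_ascii=True, json.dumps writes every non-ASCII
# character as a \uXXXX escape, so each JSONL line is pure ASCII.
lines = [json.dumps({"prompt": p, "completion": c}) for p, c in pairs]
jsonl_text = "\n".join(lines) + "\n"
print(jsonl_text, end="")

# Pure ASCII bytes decode the same way under cp1252 and UTF-8...
raw = jsonl_text.encode("ascii")
assert raw.decode("cp1252") == raw.decode("utf-8")

# ...and json.loads turns the escapes back into the original characters.
assert [json.loads(line)["prompt"] for line in lines] == [p for p, _ in pairs]
```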

Well, now you won’t have to 🙂

Great job. Please share how to modify it.
The way I handle it today is that, every time before I use the preparation tool, I use Notepad++ to convert the file to UTF-8, and for the actual fine-tuning I use the PythonAnywhere Bash console. The encoding issue is quite irritating.

Hello @NSY,

If you want to fix this right away, you will need to modify the cli.py file in the OpenAI module that you’ve installed. These changes let the CLI open files that would otherwise cause errors during fine-tuning, and they shouldn’t cause any other issues:

  1. Get the install path of Python. An easy way is to open Command Prompt or PowerShell and run:
python -c "import sys; print(sys.executable)"
# For me, this prints: C:\Users\iadmin\AppData\Local\Programs\Python\Python39\python.exe
  2. From that directory, go into .\Lib\site-packages\openai and you’ll find the ‘cli.py’ file that we need to edit:
...\Python39\Lib\site-packages\openai\cli.py
  3. Open ‘cli.py’ in any editor, preferably one that can find / search for specific text, and search for “open(”. There are 3 instances in my version of cli.py. For every instance, add the ‘encoding’ parameter to the open() call exactly as follows:
# Add encoding="utf-8" to each open() call so that every instance now looks like this:
Line 204:             file=open(args.file, encoding="utf-8"),
Line 250:                     file=open(file, encoding="utf-8"), purpose="fine-tune"
Line 283:                     file=open(file, encoding="utf-8"),
  4. Save the modified ‘cli.py’ file, and you should now be able to fine-tune with the files that previously triggered errors!
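
If you’d rather not hunt for the open( calls by hand in step 3, a small script along these lines can list them. It’s a sketch: it only reports line numbers and doesn’t edit anything, the line numbers will differ between openai versions, and `package_file`, `find_open_calls`, and `needs_encoding` are hypothetical helper names. Calls that open files in binary mode ("rb") are skipped, since binary mode doesn’t decode text and doesn’t accept encoding=.

```python
import ast
import importlib.util
import os


def package_file(package, filename):
    """Return the path of `filename` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    path = os.path.join(os.path.dirname(spec.origin), filename)
    return path if os.path.exists(path) else None


def find_open_calls(source):
    """Return sorted (line, has_encoding, mode) tuples for each open(...) call."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        has_encoding = any(kw.arg == "encoding" for kw in node.keywords)
        mode = None  # None means the default mode, "r" (text)
        if len(node.args) >= 2 and isinstance(node.args[1], ast.Constant):
            mode = node.args[1].value
        for kw in node.keywords:
            if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                mode = kw.value.value
        calls.append((node.lineno, has_encoding, mode))
    return sorted(calls)


def needs_encoding(calls):
    """Line numbers of text-mode open() calls that lack encoding=."""
    return [line for line, has_enc, mode in calls
            if not has_enc and not (isinstance(mode, str) and "b" in mode)]


# Demo on a small snippet (line 1 is empty because the string starts with "\n").
SNIPPET = '''
upload = dict(file=open(args.file), purpose="fine-tune")
with open(args.file, "rb") as file_reader:
    data = file_reader.read()
'''
print(needs_encoding(find_open_calls(SNIPPET)))

# On a machine with the openai package installed, inspect the real cli.py.
cli_path = package_file("openai", "cli.py")
if cli_path:
    with open(cli_path, encoding="utf-8") as f:
        print(cli_path, needs_encoding(find_open_calls(f.read())))
else:
    print("openai/cli.py not found in this environment")
```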

Let me know if there are any issues so I can assist further!

Thanks a lot! I’ll try it out.

Sounds good! If any issues come up, just reach back out on here and I’ll do my best to assist!

OMG, you just saved me a year of my life. You deserve heaven! (In many years to come, don’t worry.) Thanks a lot for this.

Glad to hear that the fix works!

Thank you for saving my life! Worked perfectly. Hope the fix goes out soon.

Did you mean “args.file”, or just “file”?

I left “args.file” unchanged on Line 204.

I had to change it to “file” to get it to work, I think. Odd.

That is odd. I have the OpenAI module installed on multiple systems, and they all have it as ‘args.file’, since the function is passed the ‘args’ parameter.

What are the function parameters in your version? If it works, that’s good, but definitely make sure you’re accessing the correct parameter so the command executes properly.

Thank you for the details, Duty. I am experiencing the noted error and wondering whether this fix applies to me. My cli.py file looks very different; is this still required for Windows users, or has it been fixed in more recent releases?

Much appreciated.

Yes, same here (Python311). The open() call already takes two arguments:
…with open(args.file, "rb") as file_reader:…
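
For what it’s worth, an open(args.file, "rb") call like the one quoted above reads raw bytes and never decodes them, so that particular call can’t raise the cp1252 UnicodeDecodeError, and adding encoding="utf-8" to it would raise a ValueError instead. The sketch below (using a temp file) shows both behaviors. Whether that newer cli.py still decodes the file somewhere else isn’t something this thread shows.

```python
import os
import tempfile

# Write a UTF-8 JSONL line that contains the right double quote (U+201D).
fd, path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "wb") as f:
    f.write('{"prompt": "Test”"}\n'.encode("utf-8"))

# Binary mode ("rb") returns raw bytes and never decodes them, so the
# platform's default text encoding (e.g. cp1252) plays no part here.
with open(path, "rb") as file_reader:
    raw = file_reader.read()
print(raw)

# Binary mode rejects an encoding argument outright.
binary_encoding_error = None
try:
    open(path, "rb", encoding="utf-8")
except ValueError as exc:
    binary_encoding_error = str(exc)
print("ValueError:", binary_encoding_error)

os.remove(path)
```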