Hey all, I’m trying to train a fine-tune using davinci. I’ve done this before with smaller models, but this time around it’s not working. I prepared a dataset using the data preparation tool, and it generated a JSONL file. When I submit it, it sits for a minute or two and I ultimately get this:
I generated a new one from the webUI, in case that was the problem, and it didn’t help, unfortunately. I do have a credit card linked and raised the account limit of $350, so I don’t think it’s that either.
@sps, thanks for staying active here and being a part of the “signal” which keeps the signal-to-noise ratio a bit higher here.
The SNR is so low these days. You are one of the few posters here who post developer “signal” without the “hand waving”, “cheerleading”, or “complaining” noise we see here a lot, and you always post referenced technical facts in the spirit of a true coder / software engineer / developer.
Thank you!
We really need to increase the SNR here, as the “signal” for OpenAI API coders is getting lost in the noise. Your posts, @sps, are much appreciated, especially since they do not have a commercial “looking for business” undertone, which is really nice to see here.
I used api files.list and got a list of my past fine-tunes, but this most recent job doesn’t appear in the list, so I can’t get a status. I also can’t look it up based on fine-tune id, because the process fails before it reports an id.
I found the verbose mode and re-ran the command in verbose mode (which I should have thought of!). Here’s a more detailed error message:
[2023-02-18 10:25:27,749] message='Request to OpenAI API' method=get path=https://api.openai.com/v1/files/dataset4_prepared.jsonl
[2023-02-18 10:25:28,086] message='OpenAI API response' path=https://api.openai.com/v1/files/dataset4_prepared.jsonl processing_ms=120 response_code=404
[2023-02-18 10:25:28,086] error_code=None error_message='No such File object: dataset4_prepared.jsonl' error_param=id error_type=invalid_request_error message='OpenAI API error received' stream_error=False
[2023-02-18 10:25:31,978] message='Request to OpenAI API' method=get path=https://api.openai.com/v1/files
[2023-02-18 10:25:32,578] message='OpenAI API response' path=https://api.openai.com/v1/files processing_ms=553 response_code=200
Seems like it’s failing to get the file from openai, but it’s doing this before it says it uploaded the file in the first place, so I’m puzzled.
From some googling it seems like utf-8 encoding is a problem, but the proposed hot fix didn’t work for me, so I tried forcibly converting all the text to ASCII (it’s mostly ASCII anyway) before generating the dataset, and that reduced the number of errors. Now it’s just this error remaining:
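For anyone wanting to try the same ASCII conversion, here is a minimal sketch of one way to do it in Python. The function name `to_ascii` is just for illustration, and this is not necessarily how the conversion above was done. It decomposes accented characters and then drops anything that still isn't ASCII, so characters like curly quotes or dashes are removed rather than replaced.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Best-effort ASCII conversion (hypothetical helper).

    Accented letters are decomposed into a base letter plus a combining
    mark; the final encode step then silently drops every character
    that is still outside the ASCII range.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("café"))
```

Run this over the raw text before building the prompt/completion pairs, so the JSONL file is generated from already-cleaned strings.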
[2023-02-18 11:26:46,732] error_code=None error_message='No such File object: dataset4_prepared.jsonl' error_param=id error_type=invalid_request_error message='OpenAI API error received' stream_error=False
I seem to have resolved the last error by splitting the file into three equal pieces. It looks like my text dataset (roughly 500 MB) is simply too large for the CLI uploader, so I’m going to upload and fine-tune on one piece at a time to work around the problem.
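In case it helps anyone else, the split can be sketched roughly like this in Python. The function name `split_jsonl` and the `_partN` output naming are my own assumptions, not anything the CLI provides. It splits on line boundaries so no JSON record gets cut in half, and it reads the whole file into memory, which should be fine at this size but is worth knowing.

```python
from pathlib import Path


def split_jsonl(src: str, parts: int = 3) -> list[Path]:
    """Split a JSONL file into up to `parts` files of roughly equal line count.

    Hypothetical helper: output files are written next to the source as
    <name>_part1.jsonl, <name>_part2.jsonl, and so on.
    """
    src_path = Path(src)
    with src_path.open(encoding="utf-8") as f:
        # Keep one record per line; skip blank lines.
        lines = [line for line in f if line.strip()]

    chunk = -(-len(lines) // parts)  # ceiling division
    outputs = []
    for i in range(parts):
        piece = lines[i * chunk:(i + 1) * chunk]
        if not piece:
            break  # fewer records than requested parts
        out = src_path.with_name(f"{src_path.stem}_part{i + 1}{src_path.suffix}")
        out.write_text("".join(piece), encoding="utf-8")
        outputs.append(out)
    return outputs
```

Calling `split_jsonl("dataset4_prepared.jsonl", parts=3)` would write three files next to the original, which can then be uploaded one at a time.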