fine_tunes.prepare_data: no response

Hi, I am kind of new to GPT. I looked around the OpenAI API docs and other topics/posts in the forum but wasn't able to find an answer to the issue I am having.

When I run `!openai tools fine_tunes.prepare_data -f "mydata.json"`, it only works when mydata.json has fewer than 36 prompt-completion pairs (rows); then I get a response within seconds, as expected. If mydata.json has more than 36 prompt-completion pairs, the command runs "forever" (hours) silently, without any response. I can't tell whether it is still running or has failed, since I never get any output from it. Data size shouldn't be the issue: mydata.json is really small, about 14 KB (even with more than 36 prompt/completion pairs).
I am currently on the free trial; I'm not sure whether that is related to the issue.
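For reference, the legacy fine-tuning guide described training data as JSONL: one JSON object per line, each with a `prompt` and a `completion` key. If mydata.json is a single JSON array of such objects (an assumption; the post doesn't show its layout), a minimal Python sketch can write the JSONL file directly, without going through the CLI tool. The function name `json_array_to_jsonl` and the output file name are hypothetical.

```python
import json
from pathlib import Path


def json_array_to_jsonl(src: str, dst: str) -> int:
    """Convert a JSON array of {"prompt": ..., "completion": ...} objects
    into JSONL (one object per line). Returns the number of pairs written."""
    records = json.loads(Path(src).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError(f"{src} should contain a JSON array of objects")
    # Validate everything first so a bad record doesn't leave a half-written file.
    for i, rec in enumerate(records):
        if not isinstance(rec, dict) or "prompt" not in rec or "completion" not in rec:
            raise ValueError(f"record {i} is missing 'prompt' or 'completion'")
    with open(dst, "w", encoding="utf-8") as out:
        for rec in records:
            # Keep only the two keys the legacy format uses; one object per line.
            line = {"prompt": rec["prompt"], "completion": rec["completion"]}
            out.write(json.dumps(line) + "\n")
    return len(records)
```

For example, `json_array_to_jsonl("mydata.json", "mydata.jsonl")` returns the number of pairs written, and raises `ValueError` before writing anything if a record lacks either key.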

So, my questions are:

  1. What is the output (file) of `!openai tools fine_tunes.prepare_data -f "mydata.json"`? Is it the same file, mydata.json, or does the tool create another JSON file for the prepared data? (When I run it successfully with fewer than 36 prompt/completion pairs, there seems to be no output file other than my own data file, mydata.json.)
  2. Any idea why I can't provide more than 36 prompt/completion pairs in mydata.json? Is there a data size limit for fine_tunes.prepare_data?
  3. Is there any way to check the status of `!openai tools fine_tunes.prepare_data -f "mydata.json"` if I don't get any response?

Thank you,
Frank

OK, I have basically figured this issue out myself through a series of trial-and-error tests.

  1. In my tests, running `!openai tools fine_tunes.prepare_data -f "mydata.json"` produced no output file. The command is more about "validating" that mydata.json is ready to be used to create a fine-tuned model according to OpenAI's API "standard" (e.g. the right format). If the file is "good", the response tells you to go ahead and use it, e.g.: "You can use your file for fine-tuning: `> openai api fine_tunes.create -t "mydata.json"`".

Suggestion to OpenAI: enhance/clarify the documentation on fine-tune data preparation at https://beta.openai.com/docs/guides/fine-tuning/cli-data-preparation-tool
The current documentation says "We developed a tool which validates, gives suggestions and reformats your data". However, in my experience it only validates and gives suggestions; it does not seem to reformat anything, even though "reformats" implies an output file. That's what confused me (I was trying to figure out what the output of the prepare data command was).
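To see roughly what "validating" involves, here is a minimal local check in Python. It is not the CLI tool's actual logic, only a sketch assuming the legacy JSONL layout (one object per line with string `prompt` and `completion` fields); `check_jsonl` is a hypothetical helper. It also reports the record count and file size, which makes it easy to compare runs when testing whether the number of pairs or the file size is what matters.

```python
import json
import os


def check_jsonl(path: str) -> dict:
    """Rough local check of a legacy-style fine-tuning file: every non-blank
    line must be a JSON object with string "prompt" and "completion" fields.
    Collects all problems into a summary instead of stopping at the first."""
    errors = []
    records = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(rec, dict):
                errors.append(f"line {lineno}: expected a JSON object")
                continue
            records += 1
            for key in ("prompt", "completion"):
                if not isinstance(rec.get(key), str):
                    errors.append(f"line {lineno}: missing or non-string '{key}'")
    return {
        "records": records,
        "size_kb": round(os.path.getsize(path) / 1024, 1),
        "errors": errors,
    }
```

Calling `print(check_jsonl("mydata.jsonl"))` prints the count, the size in KB, and every problem found, with line numbers.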

  2. The size of mydata.json matters. In my case, if mydata.json is larger than 14.1 KB, `!openai tools fine_tunes.prepare_data -f "mydata.json"` runs "forever" without any response. The number of prompt/completion pairs does not matter. I'm not sure whether the size limit is due to my account status (trial period).
    Suggestion to OpenAI:
    It would be very helpful if fine_tunes.prepare_data returned an error message when the file size is not acceptable, stating the reason (e.g. the trial period, if that's the cause).

  3. I did not find a way to check the status of the fine-tune prepare data command (although there is a way to check the status of a fine_tunes.create call).
    Suggestion to OpenAI:
    It would be helpful if the prepare data command provided a way to check its status, just as we can for fine_tunes.create.
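One possible explanation for the silent hang, not confirmed anywhere in this thread: the pre-1.0 `openai` CLI's prepare_data tool could stop and ask [Y/n] questions about suggested fixes (and, if fixes were accepted, write a new `..._prepared.jsonl` file), and a notebook `!` command has no way to answer those prompts. A minimal Python sketch that runs the command with no stdin and a time limit, so it either returns its output or fails with a readable message instead of running for hours, is below. The `-q` flag (auto-accept suggestions) is an assumption from memory of that CLI; check `openai tools fine_tunes.prepare_data -h` for your installed version. Both function names are hypothetical.

```python
import subprocess


def prepare_data_cmd(path: str) -> list[str]:
    # Assumption: the pre-1.0 `openai` CLI accepts -q/--quiet to auto-accept
    # suggestions instead of prompting; confirm with `-h` for your version.
    return ["openai", "tools", "fine_tunes.prepare_data", "-f", path, "-q"]


def run_with_timeout(cmd: list[str], timeout_s: float = 120.0) -> str:
    """Run a command with no stdin and a time limit; return its combined
    stdout/stderr, or raise RuntimeError with a readable reason."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # any [Y/n] prompt sees EOF instead of waiting
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before re-raising.
        raise RuntimeError(f"no result after {timeout_s} s: {cmd!r}") from exc
    output = result.stdout + result.stderr
    if result.returncode != 0:
        raise RuntimeError(f"exit code {result.returncode}:\n{output}")
    return output
```

In a notebook cell, `print(run_with_timeout(prepare_data_cmd("mydata.jsonl")))` shows everything the tool printed on success, and a non-zero exit includes that output in the error message, which is a crude substitute for the missing status check.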

Thank you.
Frank


I am a complete beginner at programming, and I agree with these points, Frank!

I am trying to fit my data into those two columns (prompt and completion) for training. I have already created the columns, but I don't know how to fill them.

I know how to run a training process, because I have already used other libraries like Keras, PyTorch, and TensorFlow. But how do I make it work with the OpenAI API?

If someone can help me with that, I would really appreciate it :slight_smile:
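For the column question: in each row, put the example input in the `prompt` column and the output you want the model to learn in the `completion` column. The legacy data-preparation guide described prepare_data as accepting CSV, TSV, XLSX, JSON, or JSONL files as long as they contain those two columns/keys, so a CSV may be enough on its own. If you prefer to produce the JSONL yourself, a minimal Python sketch follows, assuming a UTF-8 CSV with a header row `prompt,completion`. `csv_to_jsonl` is hypothetical, and the default separator and stop sequence follow the legacy guide's suggestions (a fixed separator such as `\n\n###\n\n` at the end of each prompt; a leading space and a fixed stop sequence on each completion) rather than being required values.

```python
import csv
import json


def csv_to_jsonl(src: str, dst: str, separator: str = "\n\n###\n\n", stop: str = "\n") -> int:
    """Read a CSV with 'prompt' and 'completion' columns and write one
    {"prompt": ..., "completion": ...} JSON object per line. Returns rows written."""
    count = 0
    with open(src, newline="", encoding="utf-8") as f_in, open(dst, "w", encoding="utf-8") as f_out:
        for row in csv.DictReader(f_in):
            # The separator marks where the prompt ends and the completion begins.
            prompt = row["prompt"].strip() + separator
            # Leading space and trailing stop sequence, per the legacy guide's advice.
            completion = " " + row["completion"].strip() + stop
            f_out.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
            count += 1
    return count
```

The same separator must then be appended to prompts when you query the fine-tuned model, and the stop sequence passed as `stop`, so that inputs look like the training data.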