Fine-tune error when using the openai tool to parse a JSONL file

I’m trying to prepare data for fine-tuning using this command:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

And getting the following error:

The indices of the long examples has changed as a result of a previously applied recommendation.
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\pandas\core\frame.py", line 5266, in drop
    return super().drop(
           ^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\pandas\core\generic.py", line 4549, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\pandas\core\generic.py", line 4591, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 6696, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: '[242] not found in axis'

The JSONL file I’m working on looks OK. Any idea what’s causing the error?
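A quick sanity check is to confirm that every line parses as JSON and has the fields prepare_data expects (a minimal sketch; `data.jsonl` is a placeholder for your own file):

```python
import json

# Placeholder file name; substitute your own JSONL file.
with open("data.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            print(f"Line {i} is not valid JSON: {e}")
            continue
        # prepare_data expects a "prompt" and a "completion" key per record
        missing = {"prompt", "completion"} - record.keys()
        if missing:
            print(f"Line {i} is missing keys: {missing}")
```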

Just as a sanity check, are you replacing <LOCAL_FILE> with the name of your local file?

Yes, of course :smile:
the answer needs to be longer, then…

Ahh, the short reply limit :smiley: Ok, well, the error you are getting says pandas can’t find the row it is trying to remove, which makes me wonder if you have missed a step in the procedure.

Can you go through the process again and try?

I noticed now that this happens only when I approve the following:
- [Recommended] Remove 30 long examples [Y/n]: Y

Otherwise, it continues and creates the file (with some strange notes, but there is still an output).

Ok, so it seems that maybe the entire thing is the >30 long examples? Or… rather, it thinks it is.

Sorry, your question isn’t clear to me.
Can you please explain again?

My apologies. From your reply I take it that if the Y flag is set, the procedure fails.

If this is the case, and the rule for Y is “Remove 30 long examples”, then it seems the system thinks “everything” is within the 30 long examples, so the file ends up empty (or lacking certain content), and it then fails with the 242 error when the rows are removed.

So you are suggesting that maybe there are mistaken or missing chars in the JSONL file that cause all rows to look like 30 long rows making up the entire doc?

And so, once they are removed, the document is empty?

Perhaps not that exactly, but something like that, yes. I can’t think of another reason why it would behave this way.
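For what it’s worth, the KeyError itself is easy to reproduce in pandas: if an earlier step has already dropped or re-indexed rows, a later drop that still remembers the old label fails in exactly this way (a minimal sketch, not the tool’s actual code):

```python
import pandas as pd

df = pd.DataFrame({"prompt": ["a", "b", "c"]}, index=[0, 1, 242])

# A first "recommendation" drops row 242...
df = df.drop(242)

# ...and a later step that still remembers label 242 now fails:
df.drop(242)  # KeyError: '[242] not found in axis'
```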

Thanks, I’ll try to find the problematic row and troubleshoot it.

I found the problematic row, but I cannot find out what’s wrong with it.

I tried looking for hidden chars that can break strings but couldn’t find anything.

It’s super long, but I really want to know what the issue is so I can make sure not to repeat it…
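One way to hunt for hidden characters is to flag any non-printable code points line by line (a sketch; the file name is a placeholder):

```python
import unicodedata

# Placeholder file name.
with open("data.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        for j, ch in enumerate(line.rstrip("\n")):
            # Category "C*" covers control/format code points that can break strings
            if unicodedata.category(ch).startswith("C"):
                name = unicodedata.name(ch, "UNNAMED")
                print(f"Line {i}, col {j}: U+{ord(ch):04X} {name}")
```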

The fine-tuning models have a 2k token limit and you’re trying to send them a 5k prompt. That could be part of the issue.
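You can measure this per example with tiktoken (a sketch; `r50k_base` is assumed as the encoding for the GPT-3 base models, and the file name is a placeholder):

```python
import json
import tiktoken  # pip install tiktoken

# Assumption: r50k_base is the encoding used by the GPT-3 base models.
enc = tiktoken.get_encoding("r50k_base")

with open("data.jsonl", encoding="utf-8") as f:  # placeholder file name
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        n = len(enc.encode(record["prompt"])) + len(enc.encode(record["completion"]))
        if n > 2048:  # the 2k context limit mentioned above
            print(f"Line {i}: {n} tokens")
```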

The indices of the long examples has changed as a result of a previously applied recommendation.

It seems like it’s trying to split your prompts and then for whatever reason completely loses track of them. I wonder if it’s splitting them more than once?

Reduce your prompts to a suitable token length. The prompt you’re trying to send is incredibly noisy. If you are trying to tune the model to always return a consistent object, you are probably better off just using GPT-4.


I have added some code to limit the prompt to 4096 tokens, as per the GPT-3 instructions.
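Something along these lines (a sketch of the idea; the tiktoken encoding name is an assumption):

```python
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # assumed GPT-3 encoding

def truncate_prompt(prompt: str, max_tokens: int = 4096) -> str:
    """Keep at most max_tokens tokens of the prompt."""
    tokens = enc.encode(prompt)
    return enc.decode(tokens[:max_tokens])
```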
I still get the same error from this row:

I can simply delete it and move on, but the fact that I cannot find any reason for the error is making me lose sleep at night :exploding_head: