The indices of the long examples has changed as a result of a previously applied recommendation.
File "C:\Program Files\Python311\Lib\site-packages\pandas\core\frame.py", line 5266, in drop
return super().drop(
^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pandas\core\generic.py", line 4549, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pandas\core\generic.py", line 4591, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 6696, in drop
raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: '[242] not found in axis'
The JSONL file I’m working on looks OK. Any idea what’s causing the error?
Ahh, the short reply limit. OK, well, the error you are getting says pandas can’t remove a row or column it expects to find (in this case row index 242), which makes me wonder if you have missed a step in the procedure.
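To show what I mean, here is a minimal sketch with a toy DataFrame (not the prepare_data step itself): pandas only raises that exact KeyError when drop() is handed an index label that no longer exists.

import pandas as pd

# Toy two-row frame, index 0..1 (placeholder data, nothing to do with your JSONL)
df = pd.DataFrame({"prompt": ["a", "b"], "completion": ["x", "y"]})

try:
    df.drop([242])                    # 242 is not in the index
except KeyError as e:
    print(e)                          # prints the "[242] not found in axis" message

df = df.drop([242], errors="ignore")  # same call, but missing labels are skipped

That would also line up with the message in your paste about the indices of the long examples having changed after a previously applied recommendation: if the row labels shifted before the drop, the old label 242 would no longer be there to remove.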
My apologies, from your reply I take it that if the Y flag is set the procedure fails.
If this is the case, and the rule for Y is “Remove 30 long examples”, then it seems the system thinks “everything” falls within the 30 long examples, so the file ends up empty (or missing certain content), and that is why it fails with the 242 error when the rows are removed.
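One quick way to test that is to parse the file line by line yourself instead of trusting the tool. A rough sketch only, and the train.jsonl name is a placeholder for your file: if stray or missing characters are merging rows, the parsed record count won’t match the line count, or individual lines will fail to parse.

import json

path = "train.jsonl"   # placeholder, use your file name
records, bad_lines = [], []
with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            bad_lines.append((lineno, str(err)))

print(f"{len(records)} records parsed, {len(bad_lines)} lines failed")
for lineno, err in bad_lines[:5]:         # show the first few problem lines
    print(f"  line {lineno}: {err}")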
So you are suggesting that maybe there are mistaken or missing characters in the JSONL file that cause all rows to look like the 30 long rows, i.e. that the 30 long rows are the entire doc?
The fine-tuning models have a 2k token limit and you’re trying to send them a 5k prompt. That could be part of the issue.
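If you want to see which examples actually blow past that limit, something like this rough sketch would list them. tiktoken, the cl100k_base encoding and the train.jsonl file name are my assumptions here, so swap in whatever matches your setup.

import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # assumption: pick the encoding that matches your model
LIMIT = 2048                                 # the 2k limit mentioned above

with open("train.jsonl", encoding="utf-8") as f:   # placeholder file name
    for lineno, line in enumerate(f, start=1):
        rec = json.loads(line)
        tokens = len(enc.encode(rec.get("prompt", "") + rec.get("completion", "")))
        if tokens > LIMIT:
            print(f"line {lineno}: {tokens} tokens (over {LIMIT})")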
“The indices of the long examples has changed as a result of a previously applied recommendation.”
It seems like it’s trying to split your prompts and then for whatever reason completely loses track of them. I wonder if it’s splitting them more than once?
Reduce your prompts to a suitable token length. The prompt you’re trying to send is incredibly noisy. If you are trying to tune the model to always return a consistent object, you are probably better off just using GPT-4.
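If you do stick with fine-tuning, one crude way to enforce a budget is to cut each prompt down to a fixed number of tokens before it goes into the file. A sketch only: truncate_to is my own helper name and the 2000-token budget is just an example.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # assumption: match this to your model

def truncate_to(text: str, max_tokens: int = 2000) -> str:
    # keep only the first max_tokens tokens; blunt, but it guarantees the cap
    tokens = enc.encode(text)
    return enc.decode(tokens[:max_tokens])

long_prompt = "..."                          # stand-in for the real 5k-token prompt
print(len(enc.encode(truncate_to(long_prompt))))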