I understand that when using the fine-tuning API, each training sample (prompt and completion combined) must not exceed 2048 tokens. My question: how does the API behave if you accidentally include a sample in the JSONL file that exceeds this limit? Does it:
(a) throw some kind of error or provide a warning in the response? (And if so, is it provided in the response of the file upload endpoint or the fine-tune endpoint?)
(b) silently ignore the sample and move on to the next one?
(c) trim the sample by removing excess tokens and train with the trimmed sample?
I assume that the best practice is to use the tiktoken library to check token counts before sending samples for fine-tuning. However, I’ve already run a few fine-tuning iterations before I knew this, so it would help me to understand how my results may have been affected if any of my samples exceeded the limit.
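For reference, here is a minimal sketch of the kind of pre-check I have in mind. It does not answer the question about what the API does; it only flags oversized samples locally. Several things here are my assumptions: each JSONL line has `prompt` and `completion` keys, the base model uses tiktoken's `r50k_base` encoding, and the helper names `make_tiktoken_counter` and `find_oversized` are made up for illustration.

```python
import json
from typing import Callable, Iterable, List, Tuple

MAX_TOKENS = 2048  # combined prompt + completion limit stated above


def make_tiktoken_counter(encoding_name: str = "r50k_base") -> Callable[[str], int]:
    """Return a function that counts tokens with tiktoken.

    Assumption: "r50k_base" matches the model being fine-tuned;
    pass a different encoding name if yours differs.
    """
    import tiktoken  # third-party: pip install tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() encodes strings such as "<|endoftext|>" as
    # ordinary text instead of raising an error.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def find_oversized(
    lines: Iterable[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> List[Tuple[int, int]]:
    """Return (line_number, token_count) for every sample over the limit."""
    oversized = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Count the concatenated text, since the limit applies to the pair.
        total = count_tokens(record["prompt"] + record["completion"])
        if total > max_tokens:
            oversized.append((line_no, total))
    return oversized
```

To use it, open the training file (for example a hypothetical `train.jsonl`) and call `find_oversized(f, make_tiktoken_counter())`; an empty list means no sample exceeds the limit under the assumed encoding. The token counter is passed in as a parameter so a different encoding can be swapped in without changing the check itself.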