How does the fine-tuning API handle excess tokens?

keithaw · May 10, 2023, 1:52pm

I understand that when using the fine-tuning API, each training sample (prompt and completion combined) must not exceed 2048 tokens. My question: what does the API do if you accidentally include a sample in the JSONL file that exceeds this limit? Does it:
(a) throw some kind of error or provide a warning in the response? (And if so, is it provided in the response of the file upload endpoint or the fine-tune endpoint?)
(b) silently ignore the sample and move on to the next one?
(c) trim the sample by removing excess tokens and train with the trimmed sample?

I assume that the best practice is to use the tiktoken library to check token counts before sending samples for fine-tuning. However, I’ve already run a few fine-tuning iterations before I knew this, so it would help me to understand how my results may have been impacted if any of my samples exceeded the token count.
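
For reference, this is roughly the pre-check I have in mind. It is a minimal sketch, not an official recipe: it assumes each JSONL line has `prompt` and `completion` fields, and that `r50k_base` is the right tiktoken encoding for the base model being tuned (that choice is my assumption and should be checked against the model). The function names are my own. Counting prompt and completion separately may also differ by a token or so from how the API counts the combined sample.

```python
import json
from typing import Callable, Iterable, List, Tuple

MAX_TOKENS = 2048  # limit quoted above (prompt + completion combined)


def tiktoken_counter(encoding_name: str = "r50k_base") -> Callable[[str], int]:
    """Build a token counter backed by tiktoken (imported lazily).

    The default encoding name is an assumption; pick the one that
    matches the model you are fine-tuning.
    """
    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() makes strings such as "<|endoftext|>" get
    # encoded as ordinary text instead of raising an error.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def find_oversized(
    lines: Iterable[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> List[Tuple[int, int]]:
    """Return (line_number, token_count) for every sample over the limit."""
    oversized = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        total = count_tokens(record["prompt"]) + count_tokens(record["completion"])
        if total > max_tokens:
            oversized.append((line_no, total))
    return oversized
```

Running `find_oversized(open("train.jsonl", encoding="utf-8"), tiktoken_counter())` on a training file (the file name here is just a placeholder) would list the line numbers of the samples I would need to shorten or drop before uploading.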
