GPT-3.5 Fine-Tuning Dataset Size

Was wondering if there is a limit on the fine-tuning dataset size. Can we potentially have a million or more examples for “messages” in our dataset to fine-tune GPT-3.5? Don’t worry, I’m aware of the costs associated with this; I just want to confirm.

Docs: Each file is currently limited to 50 MB.

I recently read “50,000 examples” as a maximum, but I can’t find it again, so it is perhaps not applicable.


I think you are referring to this:

Token limits
Each training example is limited to 4096 tokens. Examples longer than this will be truncated to the first 4096 tokens when training. To be sure that your entire training example fits in context, consider checking that the total token counts in the message contents are under 4,000. Each file is currently limited to 50 MB.

But I might be wrong.
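If it helps, here is a rough sketch for sanity-checking a JSONL training file against the two limits quoted above (50 MB per file, ~4,096 tokens per example). It uses only the standard library and a crude characters-per-token heuristic rather than a real tokenizer, so treat the token counts as ballpark estimates; the sample data at the bottom is just a stand-in for a real file.

```python
import json

# Limits from the fine-tuning docs quoted above.
MAX_FILE_BYTES = 50 * 1024 * 1024      # 50 MB per file
MAX_TOKENS_PER_EXAMPLE = 4096          # per training example

# Crude heuristic: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(example: dict) -> int:
    """Roughly estimate the token count of one example's message contents."""
    text = "".join(m.get("content", "") for m in example.get("messages", []))
    return len(text) // CHARS_PER_TOKEN


def check_dataset(lines: list[str]) -> dict:
    """Summarize a JSONL fine-tuning dataset against the documented limits."""
    total_bytes = sum(len(line.encode("utf-8")) for line in lines)
    examples = [json.loads(line) for line in lines if line.strip()]
    over_limit = sum(
        1 for ex in examples if estimate_tokens(ex) > MAX_TOKENS_PER_EXAMPLE
    )
    return {
        "num_examples": len(examples),
        "file_bytes": total_bytes,
        "under_50mb": total_bytes <= MAX_FILE_BYTES,
        "examples_over_token_limit": over_limit,
    }


# Tiny in-memory sample standing in for the lines of a real .jsonl file.
sample = [
    json.dumps({"messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
    ]}),
]
report = check_dataset(sample)
print(report)
```

In practice you would also want a real tokenizer for the per-example check, since examples over the limit get silently truncated during training.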