I am getting an invalid_request_error while creating a fine-tuning job for GPT-3.5 Turbo via the API

I am getting the following error while creating a fine-tuning job through API commands,
following the guide here: GPT-3.5 Turbo fine-tuning and API updates

I have data in JSONL format with 1 system message and 24 messages. The data has 557 tokens.

{
  "error": {
    "message": "file- id has 1 example(s), but must have at least 10 examples",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_n_examples"
  }
}

Then I referred to the OpenAI Fine-tuning Guide,
and I am still getting the same error message as above.

Here is a sample of the data. I am now confused about the data format.
Please let me know if there is a mistake in the format:

{“messages”: [{“role”: “system”, “content”: " You are a helpful assistant of a bank customer service. Please be respectful and helpful.}]}
{“messages”:[{“role”: “user”, “content”: “what can i do if my credit card is lost ?” }]}
{“messages”:[{“role”: “assistant”,“content”: " Immediately call your nearest branch and ask them to discontinue the card. " }]}
{“messages”:[{“role”: “user”, “content”: “I want a virtual card!”}]}
{“messages”:[{“role”: “assistant”,“content”: “f course, I’d be happy to assist you with that! Thank you for trusting the Bank with your financial needs. To get started you can easily request a virtual card through our online banking platform or mobile app. If you don’t have an account with us yet, you can open one today by visiting our website or visiting one of our branches. Once you have an account, you can request a virtual card through our online banking platform or mobile app. Please let me know if you have any other questions or if there’s anything else I can help you with!”"}]}

Or is it a different error altogether, unrelated to the data format?

Please help! Thanks!

Here is an example of correct formatting required for fine-tuning an AI model, containing three distinct conversations:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

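Yours, by contrast, appears to contain only one conversation, split across several lines, and it also has syntax errors, so it does not comply with the data formatting guidelines for fine-tuning. As the error message states, a training file must have at least 10 examples. Also, per the documentation, it takes 50-100 examples before you start to see behavior changes in the specific area of enhancement.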

{“messages”: [{“role”: “system”, “content”: " You are a helpful assistant of a bank customer service. Please be respectful and helpful.}]}
{“messages”:[{“role”: “user”, “content”: “what can i do if my credit card is lost ?” }]}
{“messages”:[{“role”: “assistant”,“content”: " Immediately call your nearest branch and ask them to discontinue the card. " }]}
{“messages”:[{“role”: “user”, “content”: “I want a virtual card!”}]}
{“messages”:[{“role”: “assistant”,“content”: “f course, I’d be happy to assist you with that! Thank you for trusting the Bank with your financial needs. To get started you can easily request a virtual card through our online banking platform or mobile app. If you don’t have an account with us yet, you can open one today by visiting our website or visiting one of our branches. Once you have an account, you can request a virtual card through our online banking platform or mobile app. Please let me know if you have any other questions or if there’s anything else I can help you with!”"}]}

The provided example does not comply with the data formatting guidelines for fine-tuning an AI model for the following reasons:

  1. Mismatched Quotation Marks: The example uses curly double quotation marks (“ ”) instead of straight double quotation marks (" ") which are the standard format for specifying strings in JSON.

  2. Missing Closing Quotation Mark: In the “system” message, the closing quotation mark after “helpful.” is missing, causing a syntax error.

  3. Incomplete System Message Line: The system message sits alone on its own line, with no user or assistant messages alongside it, so that line does not form an example conversation. The last assistant message is also damaged: it starts with “f course” instead of “Of course” and ends with an extra quotation mark.

  4. Improper JSON Structure: All messages of one conversation should be objects inside a single "messages" array, each enclosed in curly braces ({ }) and separated by commas, as in the correctly formatted example above. In the incorrect example, every message is wrapped in its own "messages" object on a separate line, so each line is treated as a separate example.
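
A quick local check can catch problems like these before uploading the file. The following is only a minimal sketch, not OpenAI's own validator. The function name check_training_lines is hypothetical, the minimum of 10 examples is taken from the error message in the question, and the rule that every example needs at least one assistant message is an assumption about what makes a usable training example.

```python
import json

MIN_EXAMPLES = 10  # taken from the error message in the question
CURLY_QUOTES = ("\u201c", "\u201d")  # the typographic quotes seen in the sample


def check_training_lines(lines):
    """Return (line_number, problem) tuples; line_number 0 means file-level."""
    problems = []
    n_examples = 0
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            hint = " (curly quotes found)" if any(q in line for q in CURLY_QUOTES) else ""
            problems.append((number, f"invalid JSON: {exc.msg}{hint}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, 'missing or empty "messages" list'))
            continue
        if not all(isinstance(m, dict) and "role" in m and "content" in m for m in messages):
            problems.append((number, 'a message lacks "role" or "content"'))
            continue
        # Assumption: an example without an assistant reply teaches nothing.
        if "assistant" not in [m["role"] for m in messages]:
            problems.append((number, "no assistant message in this example"))
            continue
        n_examples += 1
    if n_examples < MIN_EXAMPLES:
        problems.append((0, f"only {n_examples} valid example(s), need at least {MIN_EXAMPLES}"))
    return problems
```

Passing the lines of the training file to this function returns an empty list only when every line parses and at least 10 usable examples are present.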

Here’s the corrected version of the example that complies with the data formatting guidelines:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant of a bank customer service. Please be respectful and helpful."},
    {"role": "user", "content": "what can i do if my credit card is lost ?"},
    {"role": "assistant", "content": "Immediately call your nearest branch and ask them to discontinue the card."},
    {"role": "user", "content": "I want a virtual card!"},
    {"role": "assistant", "content": "Of course, I’d be happy to assist you with that! Thank you for trusting the Bank with your financial needs. To get started, you can easily request a virtual card through our online banking platform or mobile app. If you don’t have an account with us yet, you can open one today by visiting our website or visiting one of our branches. Once you have an account, you can request a virtual card through our online banking platform or mobile app. Please let me know if you have any other questions or if there’s anything else I can help you with!"}
  ]
}

This corrected version maintains the proper JSON formatting, uses standard quotation marks, and ensures that each message is properly enclosed and separated within the JSON structure.

Additionally, in the JSONL file each example conversation must occupy exactly one line, so the multi-line version above has to be collapsed into a single line:

{"messages": [{"role": "system", "content": "You are a helpful assistant of a bank customer service. Please be respectful and helpful."}, {"role": "user", "content": "what can i do if my credit card is lost ?"}, {"role": "assistant", "content": "Immediately call your nearest branch and ask them to discontinue the card."}, {"role": "user", "content": "I want a virtual card!"}, {"role": "assistant", "content": "Of course, I’d be happy to assist you with that! Thank you for trusting the Bank with your financial needs. To get started, you can easily request a virtual card through our online banking platform or mobile app. If you don’t have an account with us yet, you can open one today by visiting our website or visiting one of our branches. Once you have an account, you can request a virtual card through our online banking platform or mobile app. Please let me know if you have any other questions or if there’s anything else I can help you with!"}]}
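
One way to avoid hand-editing quotes and line breaks is to build the file with a JSON library. The sketch below assumes the conversations are already available as Python lists of message dictionaries. The helper name to_jsonl is hypothetical and the sample conversation is shortened from the one above.

```python
import json


def to_jsonl(conversations):
    """Serialize each conversation (a list of message dicts) as one JSONL line."""
    lines = []
    for messages in conversations:
        # json.dumps emits straight double quotes and escapes newlines inside
        # strings, so every example stays on a single line.
        lines.append(json.dumps({"messages": messages}, ensure_ascii=False))
    return "\n".join(lines) + "\n"


conversation = [
    {"role": "system", "content": "You are a helpful assistant of a bank customer service. Please be respectful and helpful."},
    {"role": "user", "content": "what can i do if my credit card is lost ?"},
    {"role": "assistant", "content": "Immediately call your nearest branch and ask them to discontinue the card."},
]

jsonl_text = to_jsonl([conversation])
```

The resulting string can then be written to a file opened with encoding="utf-8" and uploaded as the training file.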

AI assisted by prompting with documentation.
