Can someone help me (with fine-tuning)

I’ve had no success fine-tuning a model since the outage. Right now I’m trying the fine-tuning web interface again but when I load both the training and validation files, I get this error in red:

There was an error uploading the file: Unexpected file format, expected either prompt/completion pairs or chat messages.

I am using a correctly formatted and prepared JSONL file, so why is it saying it expected prompt/completion pairs or chat messages? That's not how the newest OpenAI documentation says data should be prepared, which is with messages using the system/user/assistant roles.

Please, for the love of god, can someone help me. I’ve now spent two days on this.

Just guessing here, since I just started reading the docs, but I remembered this piece: "The conversational chat format is required to fine-tune gpt-3.5-turbo. For babbage-002 and davinci-002, you can follow the prompt completion pair format used for legacy fine-tuning." Maybe it's the type of model?

I'm fine-tuning gpt-3.5-turbo and, according to the documentation, the data is supposed to be in this format, which is what I've done:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

I cannot get any help on this anywhere.

What version of the OpenAI API library are you using?

Mattrosine

Have you figured it out yet? If so, please tell me and future readers how to fix this issue.

Perhaps you could share some samples of the dataset for everyone to see.

You can't use several conversations in the same file; try one conversation per file, for example:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}

in one file

Here are 3 rows in a JSONL file using the Prompt/Completion pair format, but it just will not accept my file. I even generated a simple sample file from ChatGPT and it won’t take.

{“Prompt”:“Compare deforestation trends over the past decade.”,“Completion”:“The answer is {A}”}
{“Prompt”:“Explore flexible pricing models for emerging businesses.”,“Completion”:“The answer is {B}”}
{“Prompt”:“Showcase success stories from businesses that have subscribed to GreenAnt.”,“Completion”:“The answer is {C}”}
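Two things stand out in rows like these, and a small script can flag both. This is a rough sketch under stated assumptions: the documented legacy format uses the lowercase keys "prompt" and "completion", so capitalized keys are treated as suspect (whether the uploader rejects them for that reason is not confirmed here), and the curly quotes may be a forum rendering artifact rather than something in the real file. Note also that, per the documentation quoted earlier in this thread, the pair format applies to babbage-002 and davinci-002, not gpt-3.5-turbo. The function name is hypothetical.

```python
import json

# Assumption: the legacy pair format expects exactly these lowercase keys.
EXPECTED_KEYS = {"prompt", "completion"}


def diagnose_pair_line(line):
    """Return a list of likely problems in one prompt/completion JSONL line."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        # Retry with curly double quotes swapped for straight ones.
        repaired = line.replace("\u201c", '"').replace("\u201d", '"')
        try:
            record = json.loads(repaired)
        except json.JSONDecodeError as exc:
            return [f"not valid JSON: {exc.msg}"]
        problems.append("curly double quotes used where JSON needs straight quotes")

    if not isinstance(record, dict):
        return problems + ["line is not a JSON object"]

    for key in record:
        if key not in EXPECTED_KEYS and key.lower() in EXPECTED_KEYS:
            problems.append(f"key {key!r} should be lowercase {key.lower()!r}")
    for key in sorted(EXPECTED_KEYS - {k.lower() for k in record}):
        problems.append(f"missing key {key!r}")
    return problems
```

Running diagnose_pair_line on each row of the file and printing any non-empty result gives a quick list of what to fix before trying the upload again.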

Hi @edward4 - which model are you trying to fine-tune?

Hello, I am trying to create a JSON file with rows like these:
name: name of cinema
movie: name of movie
times: times of shows
and so on, one per row.
When I try to upload it, I get "There was an error uploading the file: Unexpected file format, expected either prompt/completion pairs or chat messages." Does anybody know why, and what I have to do?
The JSON file was generated with AI help and checked with a validation tool, so I really don't know why fine-tuning won't accept it.

Hi - could you share an actual example of your training data set? Have you ensured that the structure is consistent with the example provided here?

I have the same problem. When uploading a training file, I get an error about how the file is formatted, but I did it as explained, one per row. It is a very simple file; in substance it is:

cine name: name of cinema
show: name of movie
price: price
and so on
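The error message asks for prompt/completion pairs or chat messages, so plain data rows like these would probably need to be wrapped into one of those shapes first. Below is a minimal sketch of one way to do that for the chat format. Everything specific in it is an assumption for illustration: the keys "cinema", "movie", "times" and "price", the system prompt, and the question/answer wording are hypothetical and would need to match the real data and the behavior you want the model to learn.

```python
import json

# Hypothetical system prompt; replace with one that fits your use case.
SYSTEM_PROMPT = "You answer questions about cinema listings."


def listing_to_example(listing):
    """Turn one cinema listing dict into one chat-format training example."""
    question = f"When is {listing['movie']} showing at {listing['cinema']}?"
    answer = (
        f"{listing['movie']} is showing at {listing['cinema']} at "
        f"{', '.join(listing['times'])}. Tickets cost {listing['price']}."
    )
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def write_jsonl(listings, path):
    """Write one JSON object per line, with no line breaks inside an object."""
    with open(path, "w", encoding="utf-8") as f:
        for listing in listings:
            f.write(json.dumps(listing_to_example(listing), ensure_ascii=False) + "\n")
```

Using json.dumps for each row, rather than building the text by hand, keeps the quotes straight and each example on a single line.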

As indicated, it would be easiest to troubleshoot if you provided an actual example rather than just the logic.

Is your assistant message/output a JSON object?

@mattrosine It should be in .jsonl format. Check for line breaks inside a single object; these should be avoided. You must also have at least 10 examples for fine-tuning.
Note: only files in .jsonl format are allowed for now.
https://platform.openai.com/docs/api-reference/fine-tuning/create#fine-tuning-create-training_file
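The three points above (the .jsonl extension, no line breaks inside an object, at least 10 examples) can be checked locally before uploading. This is only a sketch: the minimum of 10 is the figure quoted in this thread, and the function names are made up for illustration.

```python
import json

# Figure quoted in this thread; check the current documentation for the real limit.
MIN_EXAMPLES = 10


def count_examples(path):
    """Count JSON objects, one per non-empty line.

    Raises ValueError if a line is not a complete JSON value, which is what
    happens when a pretty-printed object is spread over several lines.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"line {number} is not a complete JSON object: {exc.msg}"
                ) from exc
            count += 1
    return count


def ready_for_upload(path):
    """True if the file has a .jsonl extension and enough one-line examples."""
    return path.endswith(".jsonl") and count_examples(path) >= MIN_EXAMPLES
```

A ValueError pointing at line 1 with a message about an incomplete object usually means the file is regular pretty-printed JSON rather than JSONL.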