Hello! I have data that I would like to use to train a custom model, and I want to do it from a file in memory, without saving anything to disk or to a bucket. I'm using Python and the OpenAI library, and I keep running into trouble: sometimes the problem is sending the file, and sometimes the data format is reported as wrong. Below is the code I have so far; if anyone can shed some light, it would help me a lot.
import io, json
import pandas as pd
import openai

OPENIA_KEY = "XXXXXXXXXX"  # placeholder for the real API key

# The required argument has to come before the one with a default value.
def train_new_model(sample_text, model_name=None):
    book_df = pd.DataFrame({'text': sample_text})
    json_sample = {
        "prompt": "page from article",
        "completion": "Have many pages on those article - Author: Diego"
    }
    # Serialize the DataFrame as JSON Lines into an in-memory buffer.
    csv_buffer = io.StringIO()
    book_df.to_json(csv_buffer, orient='records', lines=True)
    openai.api_key = OPENIA_KEY
    # This is the call that fails for me.
    new_trained_model = openai.FineTune.create(
        training_file=csv_buffer,
        model='ada'
    )
    print(new_trained_model)
I've tried both StringIO and BytesIO. I'm possibly building my object incorrectly before submitting it, but the documentation and online examples contain a lot of conflicting information, so I ended up rather lost at this stage of development. Sometimes I get invalid-format errors, and sometimes an error saying the file has an incorrect ID…
The purpose of my code is:
1-Receive a long text variable;
2-Create the training object in the correct format;
3-Create a new model or continue training an existing one.
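The three steps above could be sketched roughly as follows. This is only a sketch under assumptions, not a confirmed fix: it assumes the legacy (pre-1.0) `openai` Python SDK, where, as far as I understand it, `openai.FineTune.create` expects the ID of a file that was already uploaded with `openai.File.create` rather than the buffer itself. The helper names (`split_text`, `build_training_buffer`, `start_fine_tune`), the fixed-size chunking, the reuse of "page from article" as the prompt, and the filename are all hypothetical choices made for illustration.

```python
import io
import json


def split_text(long_text, chunk_size=1000):
    """Step 1: cut the long text into fixed-size chunks (naive, illustrative strategy)."""
    return [long_text[i:i + chunk_size] for i in range(0, len(long_text), chunk_size)]


def build_training_buffer(chunks, prompt="page from article"):
    """Step 2: serialize one prompt/completion JSON object per line, kept in memory."""
    lines = [json.dumps({"prompt": prompt, "completion": chunk}) for chunk in chunks]
    return io.BytesIO(("\n".join(lines) + "\n").encode("utf-8"))


def start_fine_tune(buffer, api_key, base_model="ada"):
    """Step 3 (assumes the legacy openai<1.0 SDK): upload the buffer, then fine-tune by file ID."""
    import openai  # imported lazily so the two helpers above work without the SDK installed

    openai.api_key = api_key
    uploaded = openai.File.create(
        file=buffer,
        purpose="fine-tune",
        user_provided_filename="training_data.jsonl",  # hypothetical filename
    )
    # training_file takes the ID string of the uploaded file, not the buffer itself.
    return openai.FineTune.create(training_file=uploaded["id"], model=base_model)
```

Usage would look something like `start_fine_tune(build_training_buffer(split_text(sample_text)), OPENIA_KEY)`. To continue training instead of starting fresh, my understanding is that the legacy endpoint accepted the name of a previously fine-tuned model in place of `"ada"`, but that is worth checking against the documentation for the SDK version in use.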