I am experimenting with fine-tuning, so I created a very small JSONL file (I know these should normally be much larger, but I thought I would give it a shot).
Original jsonl: training_prompts_bunny.json · GitHub
Prepared: training_prompts_bunny_prepared.jsonl · GitHub
This is what prepare_data suggested, and I accepted both, BTW:
- [Recommended] Add a suffix ending \n to all completions [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y
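As far as I understand it, accepting those two suggestions amounts to roughly the transformation below. This is only a sketch of my understanding, not the tool's actual code: the function name apply_suggestions and the example record are made up for illustration and are not taken from my training file.

```python
import json


def apply_suggestions(record, suffix="\n"):
    """Mimic the two accepted suggestions on one prompt/completion pair.

    Hypothetical helper for illustration; not the real prepare_data logic.
    """
    completion = record["completion"]
    # Suggestion 2: completions should start with a whitespace character.
    if not completion.startswith(" "):
        completion = " " + completion
    # Suggestion 1: completions should end with a fixed suffix.
    if not completion.endswith(suffix):
        completion = completion + suffix
    return {"prompt": record["prompt"], "completion": completion}


# Made-up record, only to show the shape of the result.
example = {"prompt": "Human: Who are you?\n\nHoppy:", "completion": "I am Hoppy."}
prepared = apply_suggestions(example)
print(json.dumps(prepared))
```

If that is right, the "\n" suffix is also why I pass stop=["\n"] when querying the model.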
Then I wrote this very simple Python program to test my prompts:
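import os
import openai

# Read the API key from the environment (the openai library also does this by default).
openai.api_key = os.getenv("OPENAI_API_KEY")

def ask(question):
    # Build the conversation prompt that ends where Hoppy should answer.
    prompt_text = "The following conversation is between Hoppy, the bunny and a human. \n\nHoppy: Hello!\n\nHuman: " + question + "\n\nHoppy:"
    response = openai.Completion.create(
        model="davinci:ft-personal-2022-12-29-00-14-18",
        prompt=prompt_text,
        temperature=0.8,
        top_p=1,
        max_tokens=100,
        frequency_penalty=0.0,
        presence_penalty=0.3,
        stop=["\n"],
    )
    response_text = response['choices'][0]['text']
    # The original call to response_text.isalnum() discarded its result, so it did nothing;
    # strip the leading whitespace of the completion instead.
    return response_text.strip()

def start_chat():
    # Keep asking until the program is interrupted (Ctrl+C).
    while True:
        input_text = input("Ask: ")
        response = ask(input_text)
        print("Response: ", response)

def main():
    start_chat()

if __name__ == "__main__":
    main()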
But it's not really working. I am getting inconsistent and often wrong responses, as if my fine-tuned model was not really taking my JSONL into account:
Ask: How old are you?
Response: I am two years old.
Ask: How old are you?
Response: I am a bunny. I am 2 years old.
Ask: How old are you?
Response: I am 8 months old.
Ask: How old are you?
Response: I am a bunny!
Ask: How old are you?
Response: I am 1.
Ask: Who are you?
Response: I am Hoppy.
Ask: Where do you live?
Response: I live in the forest.
Ask: Where do you live?
Response: I live in the forest.
Ask: Where do you live?
Response: I live under the hill.
Ask: Where do you live?
Response: I live in a forest.
Ask: Where do you live?
Response: I live in a rabbit hole.
Ask: Do you have a brother?
Response: Yes I have a brother!
Ask: Who is your brother?
Response: Goppy.
Ask: Who is your brother?
Response: I have no brother.
Ask: Who is your brother?
Response: I have no brother.
Ask: Who is your brother?
Response: Bear.
Ask: Who is your brother?
Response: My brother is Hoppy.
Ask: Who is your brother?
Response: I am.
Ask: Do you have friends?
Response: Yes, I have 2 friends.
Ask: Who are your friends?
Response: I have a bear, a pig and a bunny.
Ask: How old are you?
Response: I am 5 years old.
Ask: How old are you?
Response: I am two years old.
Ask: How old are you?
Response: I am six months old.
Ask: How old are you?
Response: I'm 2 years old.
Ask: Do you go to school?
Response: No. I'm a bunny.
Ask: Do you go to school?
Response: Yes, I go to school.
Ask: Do you go to school?
Response: Yes, I do.
Ask: Do you go to school?
Response: Yes, I do.
Ask: Do you go to school?
Response: Yes! I go to Jump-n-Go bunny school.
Is this due to the lack of quality training data? Or am I doing something wrong in the JSONL file format or the Python code itself?
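To rule out the file format on my side, here is a small sanity check I can run over the prepared file. It is a rough sketch that assumes every line is a JSON object with exactly the keys "prompt" and "completion", and that completions follow the two suggestions I accepted; check_jsonl_lines is a name I made up, and the sample lines are invented, not copied from my file.

```python
import json


def check_jsonl_lines(lines, suffix="\n"):
    """Return a list of human-readable problems found in JSONL lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
            problems.append(f"line {number}: expected exactly 'prompt' and 'completion' keys")
            continue
        completion = record["completion"]
        if not isinstance(completion, str):
            problems.append(f"line {number}: completion is not a string")
            continue
        if not completion.startswith(" "):
            problems.append(f"line {number}: completion does not start with a space")
        if not completion.endswith(suffix):
            problems.append(f"line {number}: completion does not end with the suffix")
    return problems


# Invented sample lines: the first is well-formed, the second is not.
sample = [
    '{"prompt": "Who are you?", "completion": " I am Hoppy.\\n"}',
    '{"prompt": "How old are you?", "completion": "I am 3."}',
]
for problem in check_jsonl_lines(sample):
    print(problem)
```

For the real file, I would pass the open file object instead of the sample list, since iterating over a file yields its lines.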
Thanks a lot in advance!