Chatbot produces bad answers with the fine-tuned model

Hello,

I am not sure if I am posting this question in the right place.

I have a chatbot that I intend to give to the open-source community and later to the medical community; it is meant to provide different kinds of assistance for patients and clinical staff. Currently it is in a small testing phase with a fine-tuned model built from a document with only 2 columns: the symptoms and the names of the possible outcomes (diseases).

My chatbot works and gives back answers with text-davinci-002 on general topics, but when I use my fine-tuned model I get answers like ['Lip swelling'] ['Lip swelling'], i.e. User: Lip swelling ->Model: ['Lip swelling'] ['Lip swelling'], instead of "Cellulitis or abscess of mouth".
I sent this as the user prompt from the chatbot UI: "I have lip swelling as a symptom".

In the training JSONL, the completion was the disease name (cellulitis etc.) and the prompt was the symptom (lip swelling). The end of the prompt was marked with the "->" separator, which is appended in Flask. Flask receives the messages from a JS chatbot I made and sends them to OpenAI. So davinci works well and gives a good, though not specific, response; but when I change the model to my fine-tuned one, it gives back the user prompt, sometimes it gives a good answer, and other times it just returns some random characters.
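To make the format concrete, here is a small Python sketch of the shape I am aiming for with the training JSONL, using the same "->" separator as at chat time; the example rows, the file name, and the " END" stop marker are placeholders and assumptions, not my real data:

import json

# Placeholder symptom -> disease pairs; the real file has the same two-column shape
examples = [
    ("Lip swelling", "Cellulitis or abscess of mouth"),
    ("Headache", "Tension-type headache"),
]

with open("symptoms_finetune.jsonl", "w", encoding="utf-8") as f:
    for symptom, disease in examples:
        record = {
            # prompt ends with the same "->Model:" separator that Flask appends at chat time
            "prompt": f"User: {symptom}\n->Model:",
            # completion starts with a space and ends with an explicit stop marker
            "completion": f" {disease} END",
        }
        f.write(json.dumps(record) + "\n")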

Here is the code:

prompt = f"User: {message}\n->Model:"
# Generate a response from OpenAI API
completions = openai.Completion.create(
engine=“davinci:ft-myorganization-2023-04-05-19-29-51”,
prompt=prompt,
max_tokens=50,
n=1,
stop=None,
temperature=0.1,
)

So when I use it with text-davinci-002 it works well and does not give me back bad characters.
I tried with stop words, but the performance was even worse than with the fine-tuned model as it is. I also tried with more tokens, and it is still bad.

I am sure the error is in my own knowledge. Could someone give me a good diagnosis of what to change in the code? Or did I mix up the engine and model parameters?
I tried adding both engine and model, and then passing only the model for the response, but then I got no response at all.
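For reference, this is roughly the model-only variant I understand from the fine-tuning guide, with the fine-tune passed as model and a stop sequence mirroring the end of the training completions; the " END" marker and the max_tokens value are just my assumptions:

import openai

openai.api_key = "sk-..."  # set from the environment in the real app

message = "I have lip swelling as symptom"  # example of what the UI sends
prompt = f"User: {message}\n->Model:"       # same separator as in the training prompts

completions = openai.Completion.create(
    model="davinci:ft-myorganization-2023-04-05-19-29-51",  # fine-tune passed as `model`, not `engine`
    prompt=prompt,
    max_tokens=20,
    n=1,
    stop=[" END"],      # assumed stop marker at the end of each training completion
    temperature=0.1,
)
reply = completions.choices[0].text.strip()
print(reply)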

Thank you,
Istvan

The dataset is very small, under 100 examples. I know this isn't enough data to train on, but I also tried with around 300 prompt-completion pairs on my other topics, and the answers were very random: sometimes it just gave back the user prompt as the answer, and other times it didn't even reply to the question but to another topic.

Is that what causes the problem?
Should the dataset be over 500 examples?

Thanks.

This is the form of the answer I get after asking about headache:

['Headache'] -> ['Headache'] ['Headache'] ['Headache'] ['Headache'] ['Headache'] ['Headache']