I’m having an issue with my fine-tuned davinci model: it performs very poorly on completions for questions that are not related to the training data. For example, if I ask for business ideas from a model that was fine-tuned on therapist conversations, the results are very poor.
I was expecting the model to perform more like text-davinci-003, but it behaves closer to the original davinci model when facing unfamiliar prompts.
I’m wondering if I’m missing something or if there are any strategies that I could try to improve the model’s performance on these types of inputs. Any thoughts or suggestions would be greatly appreciated.
Let’s assume I do not have enough training data for the model to perform as expected, i.e. to give the user the right answer in the “therapist” use case.
I’m wondering if the base model that was fine-tuned is not “text-davinci-003”, but rather the most basic version “davinci”. If that is the case, it may explain the model’s poor performance.
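One way to confirm which base model your fine-tune was built on is to list your fine-tune jobs and read the base `model` field of each job. A minimal sketch, assuming the pre-1.0 `openai` Python SDK and a valid API key:

```python
# Assumed: pre-1.0 openai Python SDK, where fine-tune jobs are listed via FineTune.list().
import openai

openai.api_key = "sk-..."  # your API key

# Each job records the base model ("model") and the resulting model name ("fine_tuned_model").
for job in openai.FineTune.list()["data"]:
    print(job["fine_tuned_model"], "was trained on base model:", job["model"])
```

If the base model shows up as plain “davinci”, that would match the behaviour you’re describing.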
Could you tell me how your fine-tuned models perform with other prompts? Do they exhibit similar difficulties, or is this issue specific to the “therapist” prompt?
I have the same problem with a dataset of more than 500 lines.
OpenAI support advised me to use Embeddings rather than Completions for better results with Q&A.
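A rough sketch of that embeddings-based Q&A approach is below. It assumes the pre-1.0 `openai` Python SDK; the model names, the example documents, and the prompt template are all placeholders, not anything prescribed by OpenAI:

```python
import numpy as np
import openai

openai.api_key = "sk-..."

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of text."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Your knowledge base: the passages you would otherwise have fine-tuned on.
documents = [
    "When a client feels anxious, start by validating the feeling...",
    "Active listening means reflecting back what the client said...",
]
doc_vectors = [embed(d) for d in documents]

def answer(question: str) -> str:
    # Find the most relevant document via dot product (the vectors are unit length).
    q_vec = embed(question)
    best_doc = documents[int(np.argmax([q_vec @ v for v in doc_vectors]))]

    # Hand the retrieved context to a completion model instead of relying on
    # a fine-tune to have memorised it.
    prompt = (
        "Answer the question using the context.\n\n"
        f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
    )
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=200)
    return resp["choices"][0]["text"].strip()
```

The retrieval step keeps the answers grounded in your own material, and the completion model only has to rephrase the retrieved passage.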
Because the vectors are normalized to a length of 1, the denominator of the cosine similarity is always 1, so the dot product gives the same result.
I left the cosine_similarity function in the video so the example matched the examples on the OpenAI site, but the dot product is faster and easier to code.
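A tiny sketch of why the two agree for unit-length vectors (the toy vectors here are made up):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard definition: dot product divided by the product of the norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy unit-length vectors, like the embeddings returned by the API.
a = np.array([0.6, 0.8])
b = np.array([1.0, 0.0])

# For unit-length vectors the denominator is 1, so the two values coincide.
print(cosine_similarity(a, b))  # 0.6
print(float(np.dot(a, b)))      # 0.6
```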
The basic davinci model fine-tuned through the API is completely useless unless you have about a month to spend messing with it. I did, and I’ve since abandoned the whole idea, especially since a simple one-shot prompt using 003 yields much better results (albeit costing the earth).
My final take on these AI use cases for commercial purposes is that it’s just not ready. When they release the ability to create and fine-tune a model on top of text-davinci-003 (or ChatGPT), then all will be good. My advice: save whatever training sets you have and wait for something that doesn’t drive you round the twist.
You should reduce the max_tokens parameter when calling the API to cut off these multiple responses. Maybe try the OpenAI GPT-2 tokenizer or the OpenAI API tokenizer to see how many tokens your desired responses are. Also, your fine-tuned model will work better when you format your prompts like the ones in your chat.jsonl. A sketch of both ideas follows.
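For counting tokens locally you can also use the `tiktoken` library instead of the web tokenizers. The fine-tuned model name, the prompt format, and the stop sequence below are assumptions; match them to whatever separator your own chat.jsonl actually uses:

```python
import tiktoken
import openai

openai.api_key = "sk-..."

# Count how many tokens a typical desired response uses, then size max_tokens accordingly.
enc = tiktoken.encoding_for_model("text-davinci-003")
desired_response = "It sounds like that situation left you feeling overwhelmed."
print(len(enc.encode(desired_response)))  # roughly a dozen tokens

resp = openai.Completion.create(
    model="davinci:ft-your-org-2023-01-01",                 # placeholder fine-tuned model name
    prompt="Client: I can't sleep at night.\nTherapist:",   # mirror the format used in chat.jsonl
    max_tokens=60,        # cap the length so the model can't ramble into extra turns
    stop=["\nClient:"],   # stop before the model starts the next turn (assumed separator)
)
print(resp["choices"][0]["text"].strip())
```

Using a stop sequence that matches the end-of-completion marker from your training data is usually what actually prevents the run-on “multiple responses”, with max_tokens as a safety net.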