How to test fine tuned models in sandbox?

I fine tuned a davinci-02 model with about 35 examples in jsonl file, and validated with another 5. The output should be structured JSON, like imagine training to break a paragraph into an array of sentence objects,

The training seemed to get well, and the results looked promising, though i really have no reference point and just asked chatGPT :slight_smile:

Training loss: 1.2571
Validation loss: 0.8444

When i actually submitted prompts against this model however, the response was completely out to lunch, no sign of JSON anywhere, and not even related to the prompt. Like my prompt was something like ā€˜Say this is a sentenceā€™ and the response was:

ā€œchoicesā€: [
{
ā€œtextā€: ", and youā€™re not really going to do it. Youā€™re just testing to see if Iā€™ll do it. And if I do it, then youā€™ll know that Iā€™m a good person, and youā€™ll let me live. And if I donā€™t do it, then youā€™ll know that Iā€™m a bad person, and youā€™ll kill me. So, Iā€™m going to do it. Iā€™m going to kill you. Iā€™m going to kill you. Iā€™m going to kill you. Iā€™m going to kill you. Iā€™m going to kill you. Iā€™m going to kill you. Iā€™m going to kill you. Iā€™m going to kill you.

Not very encouraging :skull_and_crossbones:

So my questions are: was my training successful or not? are those training/validation loss numbers alright? Is there some step iā€™m missing?

And finally, the documentation seems to indicate that i can select my fine tuned model in the playground, but I donā€™t see that ability. In fact, i only see gpt3.5 and gpt4 models for selection, not even davinci-2 or any other model.

Any help would be appreciated!
Thanks,
Klaus

2 Likes

Oh my :rofl:

If you are aiming for a homicidal maniac GPT then Iā€™d say so

Your training statistics are as good as the data you have given it.

You are going to need a lot more than a 35/5 split and Iā€™m going to make a wild guess that your training data is not diverse enough. How many epochs did you use?

You need to be in the Completions section

3 Likes

Thank you for the immediate response!! Selecting completions obviously did the trick, donā€™t know how i didnā€™t see that :\

I realize 35/5 is pretty weak, but the 35 are very diverse and i was expecting at least something in the ballpark, but there is no trace of it providing json output.

is davinci-002 the wrong model for generating JSON structured output?

Also, the playground says completions are deprecated, and to use chat. Any idea which chat model would be best for json structure output?

1 Like

:rofl:

Seems like you bypassed the guardrails!

2 Likes

Hmm, may have found the answer. Guessing i should be using a gpt model with functions to get structure data. Will try that nextā€¦

With enough training data / epochs your model should always be outputting JSON. The fact that your training data is JSON and the output is ā€¦ uhhhā€¦ not JSON to me indicates that it needs more training.

I honestly donā€™t know where Completions is heading. Itā€™s losing documentation and is labelled as deprecated, but they suggest using these models.

But, yes, ideally you would use gpt-3.5-turbo for the price alone.

A good rule of thumb is to first use few-shot examples

This is a good method to practice your data and see what the model works with best.
Then, once you have enough few-shot examples and the cost of the tokens outweighs the difference in a fine-tuned model you can move towards it.

You may find that few-shot examples is enough for your use-case and fine-tuning isnā€™t necessary.

Taken from the OpenAI Fine-Tuning Guide:

Fine-tuning improves on few-shot learning by training on many more examples than can fit in the prompt, letting you achieve better results on a wide number of tasks. Once a model has been fine-tuned, you wonā€™t need to provide as many examples in the prompt. This saves costs and enables lower-latency requests.

2 Likes

Thanks, this was also helpful. Trying out my training data in a one shot against gpt3.5 worked exactly as it should have, so not sure what davinciā€™s problem is. Probably pissed itā€™s being deprecated :smiley:

1 Like

I really want to know what lovecraftian horror this fine-tuned davinci saw to make it want to murder someone.

2 Likes