Issues with Fine-Tuned Babbage-002 Model Returning Incorrect Completions

babbage-002 is an extremely small model. Its specifications are not published, but its pricing, comparable to the former ada, should give a sense of the compute behind it. Its generations have very high perplexity: the output token distribution is diffuse, so sampling wanders easily.

It will need low temperature and top_p settings to constrain its outputs.
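For example, a minimal sketch of constrained sampling with the current openai Python client; the fine-tune model name and prompt are placeholders, and the exact values are starting points to tune, not recommendations:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Low temperature and top_p narrow the token distribution that a
# high-perplexity small model samples from.
response = client.completions.create(
    model="ft:babbage-002:my-org::abc123",  # placeholder fine-tune name
    prompt="Q: What are your hours?\nA:",
    temperature=0.2,
    top_p=0.5,
    max_tokens=60,
    stop=["\n"],  # a base model will keep completing unless stopped
)
print(response.choices[0].text)
```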

Then there is the matter of actually fine-tuning a base model. The AI comes with no instruction-following skills, only text completion. For perspective, compare the community efforts at fine-tuning larger open-source models, which consume days of high-end compute on instruction datasets now approaching millions of examples, and consider what that implies for your chances of success.

The underlying question in the original case is inference versus overfitting. Do you want the model to infer the style of response it should produce, for breadth of answering, or do you need it to produce only responses from a very specific set? The latter will take much more reinforcement (epochs: learning passes through your training data).
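A sketch of kicking off such a job with a raised epoch count, again assuming the current openai Python client; the file name and n_epochs value are placeholders to adjust for your own data:

```python
from openai import OpenAI

client = OpenAI()

# Upload prompt/completion JSONL training data for the base model
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# More epochs push the model toward reproducing exact trained responses
# (overfitting); fewer leave room for it to infer style and generalize.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="babbage-002",
    hyperparameters={"n_epochs": 10},  # assumption: raise for fixed outputs
)
print(job.id, job.status)
```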

If you have a fixed set of tokens to be output, a max_tokens=1 application, you can use a large logit_bias dictionary to promote the probabilities of those tokens.
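A sketch of that technique, assuming tiktoken for token IDs (babbage-002 uses the cl100k_base encoding); the label set, prompt, and model name are hypothetical:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")  # encoding used by babbage-002

# Hypothetical label set; each must encode to exactly one token for a
# max_tokens=1 call (note the leading space, which matches how the
# completion follows the prompt's trailing colon).
labels = [" yes", " no", " maybe"]
bias = {}
for label in labels:
    ids = enc.encode(label)
    assert len(ids) == 1, f"{label!r} is not a single token"
    bias[str(ids[0])] = 100  # +100 effectively restricts output to these

response = client.completions.create(
    model="ft:babbage-002:my-org::abc123",  # placeholder fine-tune name
    prompt="Review: works exactly as advertised.\nPositive:",
    max_tokens=1,
    temperature=0,
    logit_bias=bias,
)
print(response.choices[0].text)
```

With a +100 bias on each allowed token and max_tokens=1, the model can essentially only choose among your label set, which sidesteps the high-perplexity sampling problem entirely for classification-style uses.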