My main goal is to reduce the cost of training the model, because my application needs to update the “training base” very dynamically, adding 10-20 requests each time.
I thought I could somehow add a new dataset to the existing model’s dataset, but then I found out that OpenAI does not currently support this. After that I looked for a way to train a new model using the previous model as a “reference”.
But now, when I make the request to the OpenAI API, I get this response:
{
  "error": {
    "message": "Model ft:gpt-3.5-turbo-0613:{org}::{id} is not available for fine-tuning or does not exist.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_available"
  }
}
{org} and {id} are my actual org and model ID. I think that gpt-3-based models do not support this, but when I retrieve my model it says that I can fine-tune it.
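For reference, the request was presumably something along these lines (a minimal sketch using the openai Python client; the training file ID and the fine-tuned model name are placeholders, not my real values):

```python
# Minimal sketch (openai Python client >= 1.0); file ID and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Attempting to start a new fine-tuning job on top of an existing fine-tuned model.
# Passing the ft: model as the base is what produces the "model_not_available" error above.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",                   # placeholder: an uploaded JSONL training file
    model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder: the previously fine-tuned model
)
print(job.id, job.status)
```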
Don’t use fine-tuning to add knowledge. Use a knowledge graph w/ embeddings. Fine-tuning is for behavioral changes. You can teach it new things while also fine-tuning, but it’s inefficient and most of the time just plain wrong.
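To show the retrieval idea, here is a minimal sketch (not a full knowledge graph, just a flat embedding lookup) assuming the openai Python client; the knowledge-base entries and the embedding model name are illustrative placeholders:

```python
# Minimal sketch of retrieval with embeddings (openai Python client >= 1.0).
# The knowledge-base entries and model name are illustrative placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

knowledge_base = [
    "Container no. FRHVW235894245 belongs to shipment 1042.",
    "Total price for shipment 1042 is 12,400 USD.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

kb_vectors = embed(knowledge_base)

def lookup(query, top_k=1):
    q = embed([query])[0]
    # Cosine similarity between the query and every knowledge-base entry.
    sims = kb_vectors @ q / (np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:top_k]
    return [knowledge_base[i] for i in best]

print(lookup("Which container is in shipment 1042?"))
```

Updating or correcting a fact is then just editing one entry, instead of retraining a model.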
As of now you can’t fine-tune an already fine-tuned model.
Describe your use case. Continued training was possible for deprecated models using the prior endpoint, but it is not yet available for gpt-3.5-turbo or the replacement completion models davinci-002 and babbage-002.
The conversation about continuing to tune the same model would likely end anyway once you observe, through practical experience, that fine-tuning doesn’t meet the goals of the application you envision.
I am doing search within strings, and the data returned for a request must identify what a given search string is. For example, “fruit” - “banana”, or “container no.” - “FRHVW235894245”.
The strings also include similar values such as “FRHVW235894245” and “F6RVW23894235”. The main goal is to get this information as accurate as possible by adding new training data to the model. A fine-tuned model can also convert some data based on previously given references; for example, I get “Container Name MCS” and I need “Container Name”.
Another example is determining a value in a string by a nearby word, such as “Total price”.
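For the near-duplicate strings mentioned above (“FRHVW235894245” vs. “F6RVW23894235”), plain fuzzy matching already goes a long way; a minimal sketch using only the Python standard library, with an assumed list of known containers:

```python
# Minimal sketch: fuzzy matching of container numbers with the standard library.
import difflib

known_containers = ["FRHVW235894245", "MSCU1234567", "TCLU7654321"]  # placeholder list

# "F6RVW23894235" is a corrupted variant of the first container number.
matches = difflib.get_close_matches("F6RVW23894235", known_containers, n=1, cutoff=0.6)
print(matches)  # -> ['FRHVW235894245']
```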
Previously I did data extraction from the strings using algorithms and regular expressions, but the database was too big and the result was often wrong because of conflicts between the regexes. Of course I could expand the database for employees to get better results, but then it is hard to work with such a big database.
After the first try with a fine-tuned model I got the proper result that I needed.
If you can describe it, I will be very grateful.
So there is no way to train a new model using an existing fine-tuned model?
I have seen this on the forum from @logankilpatrick. Is that functionality deprecated now?
You absolutely should not be using fine-tuning for this use-case. How will you update information? What happens when information becomes incorrect/outdated? You cannot just simply remove information or overwrite it. It’s for this reason that GPT-4 using the API still “believes” that it’s GPT-3. It’s for this reason as well that you can find multiple variations of truth depending on how you ask a question.
What you are looking for are pointers, or references. Because they are keyword specific you can use common user-friendly technologies like ElasticSearch to accomplish this. Then you also have the option to add descriptions which encapsulate your products for powerful semantic searching.
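As an illustration of the pointer idea, here is a minimal sketch assuming the Elasticsearch 8.x Python client and a local cluster; the index name, documents, and URL are placeholders:

```python
# Minimal sketch: keyword lookup with Elasticsearch (8.x Python client assumed).
# Index name, document fields, and the local URL are illustrative placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Each document is a "pointer": a canonical name plus a description used for searching.
es.index(index="catalog", id="1", document={
    "name": "Container Name",
    "description": "Canonical field for container names such as 'Container Name MCS'.",
})
es.indices.refresh(index="catalog")

# Keyword search; a dense-vector query could be used instead for semantic search.
hits = es.search(index="catalog", query={"match": {"description": "Container Name MCS"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["name"])
```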
Let me repeat again: Fine-tuning is a terrible option for what you want. You want a database and you want a LLM such as GPT to retrieve information from it. If you decide to ignore this and continue attempting to fine-tune you will not have a good time. That is a guarantee. You may as well just donate $100 to charity and at least you can feel good about yourself afterwards.
You never want to trust an LLM to generate facts. NEVER. You want to (a minimal sketch of this loop follows after the list):
1. Use an LLM to shape/transform/filter a query
2. Run the query through your database
3. Return the information
4. Have the LLM respond using the information
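Here is that loop as a minimal sketch, assuming the openai Python client; the "database" is a plain dict, and the prompts, model name, and data are placeholders:

```python
# Minimal sketch of the retrieve-then-answer loop (openai Python client >= 1.0).
# The "database" is a plain dict here; model names, prompts, and data are placeholders.
from openai import OpenAI

client = OpenAI()

database = {
    "FRHVW235894245": {"field": "container no.", "shipment": "1042"},
}

def ask(question: str) -> str:
    # 1. Use the LLM to shape the user question into a database key.
    key = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract the container number from the question. Reply with the number only."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content.strip()

    # 2-3. Run the query through the database and return the stored record.
    record = database.get(key, {})

    # 4. Have the LLM respond using only the retrieved information.
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using only this record: {record}"},
            {"role": "user", "content": question},
        ],
    )
    return answer.choices[0].message.content

print(ask("Which shipment is container FRHVW235894245 in?"))
```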
Shit from a month ago should be considered deprecated.
If for whatever reason you decide to continue with fine-tuning, I ask you to consider what exactly you are requiring. You are requiring a perfect validation sequence which matches a product name to a random string of characters and letters.
Just start with 100 examples. Fine-tune it, test it, and see what results you get.
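If you do try the 100-example route, each example is one JSON line in the chat fine-tuning format used by gpt-3.5-turbo. A minimal sketch that writes such a file, with an assumed system prompt and the field examples from earlier in the thread:

```python
# Minimal sketch: building a chat-format JSONL training file for gpt-3.5-turbo fine-tuning.
# The system prompt and example pairs are illustrative placeholders.
import json

examples = [
    ("Container Name MCS", "Container Name"),   # normalization example from this thread
    ("FRHVW235894245", "container no."),        # classification example from this thread
]

with open("training.jsonl", "w") as f:
    for source, target in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Identify or normalize the field for the input string."},
                {"role": "user", "content": source},
                {"role": "assistant", "content": target},
            ]
        }
        f.write(json.dumps(record) + "\n")
```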