Poor output after fine-tuning with 800 prompt/completion pairs

Hello everyone,

Hope this is the right place for my question.
My question is basic, as I am just starting out and would like to clarify a few things. Essentially, I need to build a basic understanding of AI so that I can use it as a foundation for querying a set of data.

First, I want to rule out one specific use case: my goal is not to answer questions in a customer-service style.

Here are a couple of examples to clarify the final scenario. Case 1: input a list of products and a brief description of a company, and have the AI tell me which products that company would be most likely to buy. Case 2: if the company is already a customer, input its first five products and have the AI tell me which related products could be proposed.

I have tested this by giving ChatGPT a long prompt, and the responses I got were correct.

Now I have run a fine-tuning test, for which I structured a document with about 800 questions about the company (about 80 questions on vision and mission, and about 12 questions for each product). Then I ran a test in the Playground, but the AI's responses were somewhat random.
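For context, here is roughly how I am building the training file (the questions and answers below are placeholders, not my real data):

```python
import json

# Placeholder examples of my prompt/completion pairs.
# "\n\n###\n\n" is the separator and " " / trailing "\n" the
# whitespace conventions suggested in the fine-tuning docs.
pairs = [
    {
        "prompt": "What is the company's mission?\n\n###\n\n",
        "completion": " To provide sustainable packaging solutions.\n",
    },
    {
        "prompt": "What does Product A do?\n\n###\n\n",
        "completion": " Product A is a biodegradable food wrap.\n",
    },
]

# Write one JSON object per line (the JSONL format the
# fine-tuning endpoint expects).
with open("training_data.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```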

Did I use the wrong fine-tuning method? Is there something wrong with my Python code, or something else?
I set 4 epochs; should I set more?

In summary, I can do the fine-tuning and I can provide a lot of information, but I have the impression that something is never quite right. Can you enlighten me, or point me to documentation that shows the ideal prompt/completion pairs for training an AI for my type of case?

Thank you very much!

It seems like what you want is a recommendation system.
Fine-tuning is not the way to go.

Fine-tuning adapts a pre-trained model to new tasks or domains by learning new patterns from the new data. It is not as simple as “I see this, now I know it, and I will repeat it and apply it to my complete wealth of knowledge.” This is why embeddings are almost always recommended instead.
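For your product-recommendation case, the embeddings approach could be sketched like this. The vectors below are toy stand-ins; in practice you would get them from an embeddings endpoint, one for each product description and one for the company description:

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in vectors (made-up products; real embeddings would
# come from an embeddings model, not be hand-written like this).
product_embeddings = {
    "industrial glue": np.array([1.0, 0.0, 0.0]),
    "packing tape":    np.array([0.7, 0.3, 0.0]),
    "office chairs":   np.array([0.0, 0.1, 1.0]),
}
company_embedding = np.array([0.9, 0.1, 0.0])  # e.g. "a packaging company"

# Rank products by similarity to the company description;
# the top results are the recommendation candidates.
ranked = sorted(
    product_embeddings.items(),
    key=lambda kv: cosine_similarity(company_embedding, kv[1]),
    reverse=True,
)
for name, _ in ranked:
    print(name)
```

The same ranking trick covers your second case as well: embed the five products the customer already buys, average them, and rank the rest of the catalog against that average.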

You have found that ChatGPT on its own works great with a few few-shot examples, so why not just use a more up-to-date model such as GPT-3.5 with those few-shot examples?
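A minimal sketch of what such a few-shot prompt could look like (the companies and products here are made up for illustration):

```python
# A few worked examples followed by the new case, assembled into a
# single completion-style prompt.
few_shot_examples = [
    ("A bakery chain", "Recommended products: flour mixer, oven racks"),
    ("A law firm", "Recommended products: document scanner, shredder"),
]

new_company = "A packaging manufacturer"

prompt = "Given a company description, recommend products from our catalog.\n\n"
for description, answer in few_shot_examples:
    prompt += f"Company: {description}\n{answer}\n\n"

# End with the unanswered case so the model completes the pattern.
prompt += f"Company: {new_company}\nRecommended products:"

print(prompt)
```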

I think there’s a huge misunderstanding that the base models available for fine-tuning are comparable to ChatGPT - they are not. It would be like stepping out of a completely modified Honda Civic purpose-built for speed and expecting a generic base model from Pop’s Car Depot to run the same.

You should be trying few-shot prompts with whatever model you are attempting to fine-tune - which is most likely base “davinci”, not “text-davinci-003”.

If you decide you want to continue with fine-tuning, you should build a train/validation set (usually an 80:20 split) and check at intervals how good your training data is and how the model is adjusting to it.
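A quick sketch of that 80:20 split, with index placeholders standing in for your 800 examples:

```python
import random

# Placeholder "examples": in practice this would be the list of
# prompt/completion records loaded from your JSONL file.
examples = list(range(800))

random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(examples)  # shuffle before splitting to avoid ordering bias

split = int(len(examples) * 0.8)
train, validate = examples[:split], examples[split:]

print(len(train), len(validate))  # 640 160
```

You then fine-tune only on the train set and evaluate on the held-out validation set; if validation quality is poor while training quality looks fine, that points at noisy or inconsistent training data rather than the epoch count.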

Running 800 pieces of training data blindly will result in noise.
