Fine-tuning problem

Hi, everyone. Nice to meet you. I recently started using OpenAI for my personal projects and am currently facing a problem I can’t seem to overcome.

My goal is to create a fine-tuned model that is trained on a few long stories (about 10,000 words per story) and can answer any question about them, from straightforward Q&A to insights about the story.

These are the problems I face currently:

  1. I can’t seem to label the stories with their titles in a way that lets the model know which story I am asking about.

  2. The influence of the base model (“text-davinci-002” in this case) is too strong. For example, say there is a John in one of the stories. The model sometimes assumes this John is from a completely different story and outputs facts that do not exist in mine.

  3. Similar to the problem above: if there are two Johns in two different stories I train on, the model seems to confuse them and produces inconsistent results, even when I specify the story name.

So, these are the problems I currently face with my fine-tuned model.

Any suggestions or ideas would greatly help me understand this API better, and since I expect to be here for a while, I would be happy to meet some people here.

Hi Ravikiran, Interesting question.

Have you considered using embeddings, or one of the other replacements for the Answers API endpoint, as a way of matching questions to the set of documents that contain the most relevant answers? This would keep GPT-3 focused on your story alone.

Doing so would involve breaking the stories into many different small examples that are in Q&A format.
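To make the matching step concrete, here is a minimal Python sketch of embedding-based retrieval. The vectors below are toy values purely to illustrate the ranking step; in a real setup they would come from the Embeddings API.

```python
# Sketch: pick the story chunk most relevant to a question by cosine
# similarity between embeddings. The vectors below are toy values; real
# ones would come from the OpenAI Embeddings API.
from math import sqrt

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed chunk embeddings, labelled by story.
chunks = [
    {"story": "Story A", "text": "John sails to the island...", "vec": [0.9, 0.1, 0.0]},
    {"story": "Story B", "text": "John opens a bakery...", "vec": [0.1, 0.8, 0.2]},
]

# Embedding of the question (toy values again).
question_vec = [0.85, 0.15, 0.05]

best = max(chunks, key=lambda c: cosine_similarity(question_vec, c["vec"]))
# best["text"] would then be pasted into the completion prompt as context.
```

Because retrieval is labelled by story, a question about “John in Story A” only ever sees Story A’s text, which addresses the cross-story confusion directly.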

This link summarizes some of the alternatives:
OpenAI - Answers API alternatives

This is a Q&A example using embeddings:
OpenAI - Q&A Using Embeddings

This is a question-answering example that uses older APIs, but I thought it was interesting because it describes how they used GPT-3 to create the questions from the text.
OpenAI - older Q&A example

This example breaks a larger text into chunks, summarizes each chunk, and then creates a summary of the summaries.
Perhaps a similar technique could get you answers to questions like “Summarize the first scene” or “Summarize the entire story”.

Reddit - recursive summarization of a novel
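The recursive approach might be sketched as follows. The summarize() stub stands in for a real GPT-3 completion call, and the chunk size is an arbitrary assumption; only the control flow is the point.

```python
# Sketch of recursive ("summary of summaries") summarization. summarize()
# is a placeholder for a GPT-3 completion call; here it just truncates so
# the control flow can be shown end to end.
def summarize(text, max_words=30):
    # A real version would call the Completions API with something like
    # "Summarize the following text:\n\n" + text
    return " ".join(text.split()[:max_words])

def chunk(text, chunk_words=500):
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

def recursive_summary(text, chunk_words=500):
    parts = [summarize(c) for c in chunk(text, chunk_words)]
    combined = " ".join(parts)
    # Fold again if the combined summary is still too long for one chunk.
    if len(combined.split()) > chunk_words:
        return recursive_summary(combined, chunk_words)
    return combined
```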

And perhaps you could use GPT-3 to extract named entities (e.g. characters, their roles, their relationships) and create other questions and answers based on these.
towardsdatascience - entity extraction
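As a sketch of that idea, a prompt along these lines could ask GPT-3 for characters and their relationships. The template wording here is my own assumption, not a recipe from the linked article.

```python
# Sketch: build an entity-extraction prompt for GPT-3. The template wording
# is an assumption; the completion would then be parsed into rows that seed
# further question/answer pairs.
def build_entity_prompt(passage):
    return (
        "Extract the named characters from the passage below. For each "
        "character, list their role and their relationships to the others.\n\n"
        f"Passage:\n{passage}\n\n"
        "Characters (name | role | relationships):\n"
    )

prompt = build_entity_prompt("John, the lighthouse keeper, warned his sister Mary about the storm.")
# response = openai.Completion.create(model="text-davinci-002", prompt=prompt)
```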

And perhaps use sentiment analysis on the chunks to get questions and answers concerning what the characters are feeling?
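That could look something like the following. classify() is a stub for a GPT-3 sentiment call, and the prompt/completion shape is just one plausible way to turn the results into training pairs.

```python
# Sketch: turn per-chunk sentiment into prompt/completion training pairs.
def classify(passage, character):
    # Placeholder for a Completions API call asking e.g.
    # "What is <character> feeling in this passage?"
    return "anxious" if "storm" in passage else "calm"

def sentiment_qa(chunks, character):
    pairs = []
    for c in chunks:
        pairs.append({
            "prompt": f"How does {character} feel in this passage?\n\n{c}\n\nAnswer:",
            "completion": " " + classify(c, character),
        })
    return pairs
```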


Wow, thank you for the detailed answer.

I have already tried a few of these, but the results weren’t up to my expectations. The last one, however, piqued my interest, and I will definitely try it in combination with the chunking method.

As an extension of my question above, I face the same problems in a different use case: a text-completion problem.

In this case I want to train my model on a bunch of stories and, instead of simple Q&A, I expect it to output something like predictions of future events, which I would guess is a little more sophisticated.

My question is: can the above methods be translated to this use case? If so, that’s great news. For reference, I will describe the method I tried.

So this is what I did. Let’s say a story has 10 chapters. I broke each chapter into chunks so they wouldn’t cross the token limit, summarized each chunk using GPT-3, and combined the summaries at the end to produce a summary for each chapter. As for the prompt engineering, I trained the model in a recursive way: I built a JSONL file with the summary of chapter 1 as a prompt and the summary of chapter 2 as its completion, then the summary of chapter 2 as a prompt and the summary of chapter 3 as its completion, and so on, and fine-tuned the model on this data.
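Building that JSONL could be sketched like this. The “###” separator and the leading space on completions follow common fine-tuning conventions and are assumptions here, not part of the original description.

```python
# Sketch: pair each chapter summary with the next chapter's summary, one
# JSON record per line. Separator and whitespace conventions are assumptions.
import json

def build_next_chapter_jsonl(chapter_summaries, path):
    with open(path, "w") as f:
        for prev, nxt in zip(chapter_summaries, chapter_summaries[1:]):
            record = {"prompt": prev + "\n\n###\n\n", "completion": " " + nxt}
            f.write(json.dumps(record) + "\n")
```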

The problem is this: say we want to predict chapter 11 with the fine-tuned model and pass in the summary of chapter 10 as the prompt. The model doesn’t seem to take the summaries of chapters 1-9 (which we trained on) into account; it treats only the chapter 10 prompt as relevant. So there is a high chance that even if I get more data for the story, the accuracy will not increase, which obviously isn’t productive.

Let me know what you think about this, and thank you for the response earlier.

Currently I’m experimenting with having GPT-3 create a couple of questions for each paragraph I feed it in the prompt; then I want to use these questions together with the paragraph for fine-tuning. I don’t know if that makes sense yet; I’m still trying.

I would try the following…

Finetune Time:

  1. Divide your stories or chapters into multiple sections.
  2. Convert the sections into embeddings, and label each embedding by story and chapter.
  3. For each section, generate 1-3 sample Q&A pairs that focus on that particular section.
  4. Combine the section and the sample Q&A together and add them as a record in your JSONL file.
  5. Fine-tune your model.
  6. For all the sections, store their embeddings and labels somewhere you can retrieve them at prompt time.
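The fine-tune-time steps above might be sketched like this. embed() is a stub for the Embeddings API, and the record shape is one plausible reading of “combine the section and the sample QA”.

```python
# Sketch of fine-tune time: label embeddings (steps 2 and 6) and bundle each
# section with its sample Q&A into a JSONL record (step 4). embed() is a
# stand-in for the Embeddings API.
import json

def embed(text):
    # Placeholder; a real call would return a high-dimensional vector.
    return [float(len(text) % 7), float(len(text) % 5)]

def build_records(sections):
    index, records = [], []
    for s in sections:
        index.append({"story": s["story"], "chapter": s["chapter"],
                      "vec": embed(s["text"])})
        qa_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in s["qa"])
        records.append(json.dumps({"prompt": s["text"] + "\n\n###\n\n",
                                   "completion": " " + qa_text}))
    return index, records
```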

Prompt Time:

  1. Use semantic search to find the most relevant embeddings, filtering down to the story and chapters you are looking for.
  2. Use the sample Q&A from that section as the prefix of your prompt.
  3. Append the actual question and wait for the completion.

The idea is to use the sample Q&A to help the fine-tuned model limit its scope to the section of the text you are referring to.
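Prompt time could then look like this sketch. The names and the prompt shape are illustrative, and the question embedding would again come from the Embeddings API.

```python
# Sketch of prompt time: filter the stored index by story, rank sections by
# cosine similarity, and prefix the winning section's sample Q&A before the
# actual question.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def build_prompt(index, question_vec, question, story):
    candidates = [e for e in index if e["story"] == story]
    best = max(candidates, key=lambda e: cosine(question_vec, e["vec"]))
    return best["sample_qa"] + "\nQ: " + question + "\nA:"
```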

Hope that helps, let me know if you have any other questions.

From Superinsight