How to fine tune a chatbot for Q&A

Hi everyone, excited to post here!

I have been trying to fine-tune a GPT-3 model, and even after reading the documentation and API reference upside down (and following the notebooks), I still cannot achieve what I’m aiming for.

Essentially, my goal is to set up a chatbot that would do Q&A from a knowledge base (KB) of articles. And when I follow the default Q&A recipe in the playground, the chatbot works quite as expected, even with Curie—see my customized preset for the fictional “MyPetLove” company:

However, when I try to fine-tune a model by feeding it Q&As (as will eventually be the case for a large KB), the results for the prompts are complete nonsense, even at a low temperature (more deterministic), so I must surely be doing something wrong.

I have tried a number of JSONL training sets (again, based on the playground testing that worked as expected).

A “vanilla” format:

{"prompt": "MyPetLove history", "completion": "MyPetLove was founded in 1979 in Buenos Aires, Argentina. It opened two branches in the United States, in Chicago in 1982 and in New York City in 1989. It expanded to Europe, opening a branch in Berlin in 2005, and another one in Singapore in 2015 when it expanded to Asia."}
{"prompt": "What are MyPetLove main products?", "completion": "MyPetLove sells the brand Kitty Ultra for very high quality food (the best in the market, really), and Doggie Care, a premium family of products for dogs, including food, pest control, and toys."}
{"prompt": "Does MyPetLove have a line of bird products?", "completion": "Not at the moment, but we are thinking in creating one."}
{"prompt": "Who are MyPetLove key people?", "completion": "Emiliano Bastardez, the CEO and Founder; Ricky Ricardez, the Head of Marketing; and Sancha Sánchez, the spokesperson."}

Preformatted with \n\n###\n\n for prompts, white spaces and END separators for completions:

{"prompt": "MyPetLove Q&A\nCompany history\n\n###\n\n", "completion": " MyPetLove was founded in 1979 in Buenos Aires, Argentina. It opened two branches in the United States, in Chicago in 1982 and in New York City in 1989. It expanded to Europe, opening a branch in Berlin in 2005, and another one in Singapore in 2015 when it expanded to Asia. END"}
{"prompt": "MyPetLove Q&A\nWhat are the main products?\n\n###\n\n", "completion": " MyPetLove sells the brand Kitty Ultra for very high quality food (the best in the market, really), and Doggie Care, a premium family of products for dogs, including food, pest control, and toys. END"}
{"prompt": "MyPetLove Q&A\nIs there a line of bird products?\n\n###\n\n", "completion": " Not at the moment, but we are thinking in creating one. END"}
{"prompt": "MyPetLove Q&A\nWho are the key people?\n\n###\n\n", "completion": " Emiliano Bastardez, the CEO and Founder; Ricky Ricardez, the Head of Marketing; and Sancha Sánchez, the spokesperson. END"}

And it doesn’t matter how I prompt the resulting model (with or without --stop, with or without an extra line stating that it’s a MyPetLove Q&A): all I get is complete nonsense, as if the model hadn’t been fine-tuned at all.

$ openai api completions.create --max-tokens 128 --temperature 0.25 --stop "\n" -m curie:ft-personal-2022-07-26-14-12-06 -p "MyPetLove Q&A\nCompany history\n"

MyPetLove Q&A\nCompany history\nPetco was founded in San Diego, California in 1994 by John and Marsha Coelho. The company was originally called Petco Animal Supplies, but was renamed in 1996 to reflect the company’s broader focus on pets.


$ openai api completions.create --max-tokens 128 --temperature 0.25 --stop "\n" -m curie:ft-personal-2022-07-26-14-12-06 -p "Where are MyPetLove branches"

Where are MyPetLove branches?

We have branches in the following locations:

Brisbane, Australia

Melbourne, Australia

Perth, Australia

Sydney, Australia

Canberra, Australia

Adelaide, Australia

Brisbane, Australia

We have a branch in Brisbane, Australia.

Where can I find MyPetLove in the media?

You can find us in the media in the following ways:

We are listed in the following directories:

We are listed in the following magazines:

We are listed in the following newspapers:

:woozy_face:

What am I missing?
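One thing that may be worth checking before giving up on the fine-tuned model: the CLI prompts above do not end with the `\n\n###\n\n` separator used in the training prompts, they stop on `"\n"` rather than on ` END`, and the echoed output suggests the shell passed `\n` as a literal backslash-n rather than a newline. Below is a minimal sketch of building a request whose prompt mirrors the training format. The helper name `build_completion_request` is hypothetical, and it only assembles the parameters; it does not call the API, and matching the format is not guaranteed to fix the answers.

```python
import json

PROMPT_SEPARATOR = "\n\n###\n\n"  # same separator as in the training prompts
STOP_SEQUENCE = " END"            # same end marker as in the training completions


def build_completion_request(model: str, question: str) -> dict:
    """Assemble completion parameters whose prompt mirrors the training format.

    Hypothetical helper for illustration; it returns a plain dict and makes
    no network call.
    """
    # Python turns "\n" into real newlines, unlike a double-quoted shell string.
    prompt = f"MyPetLove Q&A\n{question}{PROMPT_SEPARATOR}"
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0,
        "stop": [STOP_SEQUENCE],
    }


request = build_completion_request(
    "curie:ft-personal-2022-07-26-14-12-06", "Company history"
)
print(json.dumps(request, indent=2))
```

The resulting dict could then be passed to whichever client is used for completions. Even with a matching format, a handful of examples is unlikely to teach the model the facts themselves, which is the point the replies below make.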

You might try breaking up your source documents into chunks and embedding each one. Search the embeddings against your query, then summarize the top matches. This is the approach that Dave Shapiro takes in his multi-document answering video.
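To make the search step concrete, here is a rough sketch of ranking KB chunks against a query by cosine similarity. It is not Dave's implementation: `toy_embed` is a hypothetical bag-of-words stand-in so the example runs without any API key, and in practice it would be replaced by a real embedding model. The sample chunks are adapted from the Q&A pairs in the original post.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k chunks most similar to the query, best first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


kb_chunks = [
    "MyPetLove was founded in 1979 in Buenos Aires, Argentina.",
    "MyPetLove sells Kitty Ultra cat food and the Doggie Care family of products for dogs.",
    "MyPetLove key people: Emiliano Bastardez, the CEO and Founder; "
    "Ricky Ricardez, the Head of Marketing.",
]

results = search("Who are the key people at MyPetLove?", kb_chunks, top_k=1)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The top matches would then be handed to a completion model to summarize or answer from, rather than relying on the model to recall the facts on its own.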

This is a non-trivial problem for which I have tackled several components. You may also look up BlenderBot.

I would recommend that you start with just answering individual questions from arbitrary sources as I did. Then also learn to do chatbots separately. Then combine the work. Here are some of my projects:

Hi Hanley! Thanks for your reply.

I looked into the embedding guide and it would seem that my use case would be “information retrieval” and that I’d need to use one of the text-search-* models.

The OpenAI notebook for semantic text search using embeddings gives some hints, but the product-reviews example doesn’t translate so clearly to my Q&A case; and quite frankly, I thought fine-tuning would be easier and more straightforward!

So I’m still not clear about how to tackle my use case with embedding. Let me know if you have some additional hints! :slight_smile:

Hi Dave! @jhsmith12345 also mentioned your approach.

I’ll look into your videos and will get back to this later! :smiley:

Did you try the ‘multi document search’ implementation?

Hey, Dave @daveshapautomator and Hanley @jhsmith12345, thanks to the both of you for following up!

I’ve found the repos quite useful for examining the approach Dave has taken (unrelated, but the Recursive Summarizer solves quite well the problem of parsing long texts, which I’ve also been wondering how to tackle :tada:).

The MultiDocumentAnswering repo seems like something I could try to generalize for Q&A. Still, the whole process of vectorizing and indexing in order to use embeddings seems like overkill for a problem that, after reading the docs and PR for fine-tuning, I would have guessed was a lot simpler; that is, if fine-tuning actually worked as documented

… and when does fine-tuning become relevant for the model in any way, after 100 docs, 1,000, ever?


Okay, rant mode off; I’ll try multi document search and let you guys know later how it goes :slight_smile:

There are a few reasons that finetuning is not an appropriate use for knowledge storage and retrieval (but it is good for other aspects of the Q&A process).

  1. You would need to continuously finetune a model as you add to your KB, database, or repository.
  2. In some cases, this would be prohibitively expensive (some organizations have many gigabytes or terabytes of data to sift through).
  3. There are better tools, such as search indexes like Solr, Elasticsearch, Pinecone, Weaviate, and others, that are lightning fast at search and can also integrate vector-based search.
  4. While finetuning does reduce confabulation (spontaneous generation of false/imaginary information) it does not completely remove it.

It’s critical to remember that finetuning only increases the consistency of behavior; it does not teach the model anything new (not really).

We may arrive at a time in the future when neural representations of memory make sense. Indeed, this possibility is intoxicating to think about: the idea that we can compress an arbitrary amount of knowledge into neural embeddings is great. However, keep in mind that this is how LLMs are already trained, and yet confabulation is still an issue. You have no way of knowing whether the model is reporting accurately or not. This necessitates an external repository of trusted facts. In other words, if you rely on neural memory today, you’ll still need to solve the search problem to ensure that you’re reporting accurate information. So you might as well just cut out a step and use search integrated with QA.
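As a rough illustration of "search integrated with QA", the sketch below takes passages that a search index has already returned and builds a prompt that asks the model to answer only from those trusted facts. The function name `build_grounded_prompt` and the exact prompt wording are assumptions for illustration, not a prescribed format, and the passages are hard-coded here in place of a real search call.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Build a QA prompt that restricts the model to the retrieved passages.

    Hypothetical helper: the wording is one possible phrasing, not a standard.
    """
    facts = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the facts below. "
        "If the answer is not in the facts, say \"I don't know.\"\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# In a real system these would come from the search index, not a literal list.
retrieved = [
    "MyPetLove opened branches in Chicago in 1982 and in New York City in 1989.",
    "MyPetLove opened a branch in Berlin in 2005 and one in Singapore in 2015.",
]

prompt = build_grounded_prompt("Where are MyPetLove branches?", retrieved)
print(prompt)
```

Because the facts travel in the prompt, updating the KB only means updating the index; no model needs to be retrained, and the answer can be checked against the passages that produced it.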

I have been hard at work on several microservices that will aid in this endeavor. They are not fully optimized, so they cannot scale beyond a few tens of thousands of documents (yet). But I plan on integrating FAISS into them, which means they will be able to scale to billions or trillions of documents in the coming months.

This is a RESTful microservice that includes several search functions. It’s basically an extremely lightweight version of SOLR, but puts vector search first. As mentioned, it is not optimized so it’s not nearly as fast as SOLR, but it is easy to use.

This is a RESTful microservice that performs offline embeddings using Google’s Universal Sentence Encoder v5. This produces smaller embeddings than even ADA, but it does it for free (and it’s lightning fast). This is optimized for short entries, such as single sentences or short paragraphs.

This is a Python module called VDBLITE (vector database lite), meant to emulate SQLite (a lightweight serverless SQL database). This is the same work that the Nexus is based upon. It can act as a serverless vector database and search engine. Again, it is not optimized yet, but it should work for up to 100k or 1M documents (depending on size; some folks say it starts to fail at 400k). It should certainly remain performant and stable in the 1k to 10k range:

It is uncanny how many times I’m puzzling over something, then go and look on the forums or Discord and you have just posted the answer to my question!

In this instance, it had to do with the length of text that you can feed into the Google embedding algo

You ask good questions :wink:
