What does fine-tuning actually do? (Fine-tuning vs. Knowledge Retrieval)

There appear to be some discrepancies about what fine-tuning actually does and whether it will produce the desired result compared to Knowledge Retrieval.

First, some on the forum state specifically that fine-tuning does not teach an OpenAI model new facts.
See here: “Generally, fine-tuning is to teach the model to reply in a specific way. Fine-tuning won’t be sufficient to teach the model “new content”.” - Link

Others say that fine-tuning is supposed to teach an OpenAI model new knowledge which is not already in the dataset.
See here: " Fine-tuning is giving new knowledge to an AI by retraining it. You « feed it » new data that will now be part of the AI, therefore changing the core of that specific AI." - Link

The OpenAI documentation is unclear: it does not state anything about teaching a model new facts which it does not already know. For example, on OpenAI’s Pricing page, it is not clear what “Create your own custom models” is supposed to mean. The first page of the Fine Tuning documentation is equally vague.

I think fine-tuning is supposed to mean one of the following:

  1. Prioritizing the use of specific data in a response (isn’t this just Knowledge Retrieval?)
  2. Giving the GPT brand new information which it did not have in the past
  3. Both 1 and 2

For example, regarding #1: the field I work in is very technical, with many conflicting pieces of information that require a lot of context, and we find that the base models often give vague, outdated, or incorrect responses to technical questions. We have a dataset that provides the most current and correct information in our field and want to use it in a GPT so that the GPT can give proper context and provide up-to-date information. Should I use Knowledge Retrieval or Fine-Tuning for this?

Hi and welcome to the Community!

The short answer is that you should opt for knowledge retrieval in this case. Fine-tuning is indeed not intended to teach a model new facts; instead, it is suited to getting the model to produce output in a certain style or format. You have probably already looked at this, but here again is the link to an overview of common fine-tuning use cases.
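To make the style-vs-facts distinction concrete, here is a minimal sketch of what a fine-tuning training record looks like in OpenAI's chat JSONL format. The system/user/assistant contents below are made-up examples; the point is that each record demonstrates a *response style* you want the model to imitate, not a fact you want it to memorize.

```python
import json

# One training example in OpenAI's chat fine-tuning JSONL format.
# Note the goal: demonstrating a *style* (terse release-note answers),
# not injecting facts the model should recall later.
example = {
    "messages": [
        {"role": "system", "content": "You answer in terse, formal release-note style."},
        {"role": "user", "content": "Summarize: we fixed the login timeout bug."},
        {"role": "assistant", "content": "Fixed: login sessions no longer time out prematurely."},
    ]
}

# A training file is one such JSON object per line (JSONL).
line = json.dumps(example)
```

A full training file would contain dozens of such lines; what the model learns from them is the mapping from input shape to output shape, which is why this mechanism is a poor fit for knowledge injection.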

Additionally, there is the option to work with OpenAI to create a domain-specific custom model. The latest announcement from a few days ago speaks to that in more detail. In order to be considered for this program you need to fill out this inquiry form.

I agree that the distinction between fine-tuning and custom models is not fully delineated in the documentation. But to reiterate: when it comes to knowledge injection, RAG and custom models are the key options to consider.

Thanks for the response, but maybe if I provide more detail you can tell me if Knowledge Retrieval is still the right approach.

The field I work in requires very specific vocabulary, and GPT tends to get the vocabulary wrong. In instances where it does not already have the required knowledge in its training data, it will make up vocabulary that is similar or synonymous, but incorrect.

To illustrate what I mean, consider an example of software documentation. I have found consistently that the base GPT models only have about 50% accuracy when stating the correct names and operations of features in a software application. The 30,000-ft. description is correct, but the specific vocabulary used in the description is consistently wrong (hallucinated).

Would Knowledge Retrieval still be the best tool to solve this problem, or would fine-tuning be better? What is the difference in terms of cost and accuracy?

Hi @AltDev,
I can assure you that the fine-tuning approach would be wrong for your use case. I would suggest considering either RAG or routing by semantic similarity (a technique that uses embeddings as a classifier to route a query to the most relevant prompt, where you can put all the necessary up-to-date information, provided it is under 128k tokens per topic), depending on the size of your knowledge base. If your knowledge base is large and covers various topics, use RAG. If it is well structured (can be divided into clear topics) and those topics are relatively small (less than 128k tokens each), you may get better results with the routing technique: it is simpler, faster, and we have seen it perform very well with this kind of knowledge base. Have a look at the LangChain links I added; it is very clear and easy to implement with just a few lines of code.
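The routing idea above can be sketched in a few lines. In a real system `embed()` would call an embeddings API; here it returns tiny hand-made vectors (pure assumptions, chosen only to illustrate the mechanics), and the topic names and prompts are hypothetical placeholders.

```python
import math

# Stand-in "embeddings": toy 3-d vectors keyed by topic keyword.
# A real implementation would call an embeddings API instead.
TOY_VECTORS = {
    "billing": [0.9, 0.1, 0.0],
    "api errors": [0.1, 0.9, 0.1],
    "deployment": [0.0, 0.1, 0.9],
}

def embed(text):
    # Toy lookup standing in for a real embedding call.
    for key, vec in TOY_VECTORS.items():
        if key in text.lower():
            return vec
    return [0.33, 0.33, 0.33]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Each topic routes to a prompt preloaded with that topic's
# up-to-date documentation (placeholders here).
TOPIC_PROMPTS = {
    "billing": "You are a billing assistant. Reference: <billing docs>",
    "api errors": "You are an API support assistant. Reference: <error docs>",
    "deployment": "You are a deployment assistant. Reference: <deploy docs>",
}
TOPIC_EMBEDDINGS = {topic: embed(topic) for topic in TOPIC_PROMPTS}

def route(query):
    # Pick the topic whose embedding is most similar to the query's.
    q = embed(query)
    return max(TOPIC_EMBEDDINGS, key=lambda t: cosine(q, TOPIC_EMBEDDINGS[t]))

topic = route("Why do I keep getting api errors on POST?")  # -> "api errors"
```

The model-facing prompt is then simply `TOPIC_PROMPTS[topic]` plus the user's question; the embedding comparison replaces any hand-written keyword rules.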


Fine-tuning won’t be helpful with the vocabulary-specific part. You would be able to get the model to respond in a certain tone or style, but it will not systematically pick up terms and definitions through the fine-tuning process.

Ultimately you have to find a way for the specific terms to be included in the context for them to be considered in the response.

Besides creating your own RAG pipeline, you could try out the Assistants API and upload a dedicated glossary of terms (in addition to other knowledge files), and then, through the instructions, get the Assistant to always reference the glossary when responding to a user request.

Thanks @vasyl and @jr.2509, these are very useful responses.

Last question on this note: How would RAG/semantic similarity compare with Knowledge Retrieval? The OpenAI docs state that Retrieval will:

“…automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.”

It sounds like this is just a variation on RAG, but to me it is a black box and I am not an expert on RAG. Any insight as to whether Knowledge Retrieval would be good for a first-run at an Assistant will be helpful.

Glad to hear that! Knowledge retrieval is part of RAG and can also be used on its own. Pure knowledge retrieval is AI search or semantic search. But if you take the retrieved chunk (document) and add it to your prompt template, so that an LLM generates an answer based on it (grounding), that is actually RAG: your retrieval is augmented with generative AI, and the most relevant document is not just shown as a search result (as in Google). For example, if you type the question “What is the US GDP?” into Google, you get an answer below the search bar and a list of pages below that. Those pages are retrieved based on semantic similarity (among other algorithms), while the answer below the search bar is the AI-augmented part, generated from one of those retrieved documents (web sites). Hope that helps.
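The retrieve-then-ground shape described above can be sketched minimally. Real pipelines embed the chunks and use vector search; here simple word overlap stands in for similarity (an assumption purely for illustration, as are the two documentation chunks), to show the flow: retrieve the best chunk, inject it into a prompt template, then let an LLM generate from that context.

```python
# Hypothetical documentation chunks that would normally live in a vector store.
CHUNKS = [
    "The Export button writes the current report to CSV.",
    "The Publish action pushes a report to the shared dashboard.",
]

def score(query, chunk):
    # Toy relevance score: count of shared words. A real pipeline would
    # compare embedding vectors instead.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c)

def ground(query):
    # Retrieve the best chunk, then "augment" the prompt with it;
    # an LLM would generate the final answer from this context.
    best = max(CHUNKS, key=lambda ch: score(query, ch))
    return (
        f"Context: {best}\n\n"
        "Answer the question using only the context.\n"
        f"Q: {query}"
    )

prompt = ground("How do I publish a report to the dashboard?")
```

Everything before the prompt template is the "retrieval"; feeding the template to the model is the "augmented generation" — which is all RAG means.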

Hello! @vasyl and @jr.2509 are right: fine-tuning will not be useful in your case. Vasyl’s last message is accurate regarding the more technical aspects of RAG.

You can check out this session from OpenAI themselves that illustrates the difference between fine-tuning and RAG and when to use one or the other (or both!): https://www.youtube.com/watch?v=ahnGLM-RC1Y (worth watching).

Finally, @AltDev linked one of my posts in the initial message, so I feel I need to clarify one thing. When I stated that "Fine-tuning is giving new […] data that will now be part of the AI.", I wanted to illustrate that, unlike a custom GPT, you cannot take back the data or change the dataset. Fine-tuning changes the AI’s behavior, but it should not be used to teach specific information: that is unreliable and will lead to failure. Fine-tuning is useful for teaching an AI how to behave.