Need some help understanding embeddings/fine-tuning

New to OpenAI and GPT. I am creating a POC for our internal knowledge base at my company. So far I've used LlamaIndex to index our docs, and used prompting to change the behavior of the engine itself, sending the prompts first with each query invocation; this seems inefficient, but it works for a small number of prompts. I am also exploring storing our index in Pinecone and using the Pinecone reader integration in LlamaIndex.

Still a bit confused by all the strategies and want to clear up my understanding; please correct me if anything is wrong:

  • With fine-tuning, I provide a set of my prompts, choose a base model such as gpt-3.5-turbo, and the fine-tuning job returns a new custom model that I can use. This new model has all my prompts baked in, so I can use it without supplying the prompts with each conversation turn. Fine-tuning can be very expensive.
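For reference, chat-model fine-tuning takes a JSONL training file where each line is one example conversation. A minimal sketch of building such a line (the knowledge-base content here is hypothetical):

```python
import json

# One training example in the chat fine-tuning JSONL format:
# each line of the training file is a JSON object with a "messages" list.
example = {
    "messages": [
        # Hypothetical internal-KB behavior, for illustration only.
        {"role": "system", "content": "You answer questions about the internal knowledge base."},
        {"role": "user", "content": "How do I request VPN access?"},
        {"role": "assistant", "content": "File a ticket with the IT service desk under 'Network access'."},
    ]
}

jsonl_line = json.dumps(example)  # one line of the .jsonl training file
print(jsonl_line[:50])
```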

  • Embedding is the process of taking a piece of text and turning it into a vector. For instance, I can create an embedding of the conversation so far and provide it as context for the next response. This overcomes the maximum-token limit when supplying context as text, and is also more cost-efficient.
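To make the "vector" part concrete: embeddings are compared by similarity, most commonly cosine similarity, so that a query can find the chunks of text closest to it in meaning. A toy sketch with made-up 3-dimensional vectors (real embedding models produce vectors with hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values, not model output).
vpn_doc   = [0.9, 0.1, 0.0]   # a doc about VPN access
lunch_doc = [0.0, 0.2, 0.9]   # a doc about the cafeteria menu
query     = [0.8, 0.2, 0.1]   # "how do I get on the VPN?"

# The query vector is far closer to the VPN doc than to the lunch menu.
print(cosine_similarity(query, vpn_doc) > cosine_similarity(query, lunch_doc))  # True
```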

  • Instead of fine-tuning, can I just create an embedding of my prompts and then supply that with each conversation before the context embedding? i.e. query(embedding(prompts) + embedding(context) + latest conversation line item)

Question: why use expensive fine-tuning if I can use embeddings for my prompts? I am guessing this may be performance-related? I understand I have to generate embeddings for the context, as that is dynamic based on what the user says.
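For what it's worth, in the usual retrieval-augmented pattern the embeddings are only used to *select* relevant text chunks; the chat model itself still receives plain text, not raw vectors. A hedged sketch of how the final request is typically assembled (this is an assumed pattern, not the poster's exact setup):

```python
def build_prompt(system_prompt, retrieved_chunks, history, user_message):
    """Assemble the chat messages actually sent to the model.

    retrieved_chunks: text passages found via embedding similarity search;
    history: prior conversation turns as {"role", "content"} dicts.
    """
    context = "\n---\n".join(retrieved_chunks)
    messages = [
        {"role": "system", "content": system_prompt + "\n\nContext:\n" + context},
    ]
    messages += history
    messages.append({"role": "user", "content": user_message})
    return messages

msgs = build_prompt(
    "Answer using only the context.",          # hypothetical system prompt
    ["VPN access requires an IT ticket."],     # chunk retrieved from the index
    [],                                        # no prior turns yet
    "How do I get on the VPN?",
)
print(len(msgs))  # 2
```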


Fine-tuning doesn’t perform well if you want to retrieve your specific knowledge or facts.

However, we have seen that if you increase the number of epochs, the fine-tune will “burn in” your facts (set epochs to 16 or more), and maybe you would be happy with those results, possibly at the cost of some of the model’s generalization.

But what if your knowledge increases or changes over time? Updating a fine-tune is more work and money, and there is no way to have a fine-tune “forget” part of its training data without restarting from scratch.

The solution here is embeddings. New information can be added on the fly, and stale information can be deleted on the fly. IMO, it’s this property of embeddings that makes them much more useful than a fine-tune.
