ChatGPT 3.5: fine-tuning, embeddings, or both?

Hi Everyone,

I serve as the director of an educational publishing company specializing in Computer Science curriculum tailored for Basic Education in Brazil, spanning Early Childhood to Elementary Education.

While ChatGPT 3.5/4 can address many of the foundational concepts present in our course (based on its last training data up to September 2021), we are keen to train it to better understand our specific curriculum, platform, and our unique use of computational kits. This would enable it to provide better pedagogical support to our facilitators and client schools.

Given that our complete curriculum documentation encompasses roughly 20 books, each being 80 pages and dedicated to a specific semester module, and a 160-page pedagogical reference that provides an overall foundation of the program and addresses FAQs…

Which specialization strategy would be most effective?

  • Fine-tuning (However, we want it to maintain all its foundational knowledge from the original GPT Turbo model. My initial tests using Davinci yielded subpar results.)
  • Or Embeddings (Though it’s unclear if this approach aligns with our objectives)

We believe using functions might not be ideal, given our need for diverse and analytical answers. Any information we have "structured" is already accessible on our platform.

Thanks in advance. Any contribution will be greatly appreciated!

From Fine Tuning Guide

When should I use fine-tuning vs embeddings with retrieval?

Embeddings with retrieval are best suited to cases where you need a large database of documents with relevant context and information.
By default OpenAI’s models are trained to be helpful generalist assistants. Fine-tuning can be used to make a model which is narrowly focused, and exhibits specific ingrained behavior patterns. Retrieval strategies can be used to make new information available to a model by providing it with relevant context before generating its response. Retrieval strategies are not an alternative to fine-tuning and can in fact be complementary to it.

If you are trying to "teach" the model new information, embeddings are the way to go. If you want to change the structure or the way it responds, use fine-tuning.

In general,

  • Fine-tuning: Teach the model how to answer a question (e.g. structure/format, personality, etc)
  • Embedding: Provide the model with new/specific information with which to answer questions.

Thank you so much, folks! It seems that embeddings are the way to go in our situation. I don’t have any specific questions, but there’s a lot of content that ChatGPT isn’t familiar with.


Is this a specific service offered by OpenAI? Or just something you do locally before calling the API?

OpenAI provides an API to generate the embeddings, but it's up to you to build the functionality around it: generating embeddings from chunks of your documents, storing the vectors, running a similarity search at question time, and adding the retrieved chunks to your GPT prompt.
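The steps above can be sketched roughly as follows. This is a toy, offline illustration: `embed` is a stand-in for a real embeddings call (e.g. OpenAI's embeddings endpoint with a model like `text-embedding-3-small`), implemented here as a bag-of-words counter over a made-up vocabulary so the example runs without an API key; the chunk texts, `VOCAB`, and `retrieve` are all hypothetical.

```python
import math

# Hypothetical vocabulary for the toy embedding; a real system would
# call an embeddings API instead of counting words.
VOCAB = ["curriculum", "robotics", "kit", "semester", "module",
         "assessment", "facilitator", "platform"]

def embed(text: str) -> list[float]:
    # Stand-in for the embeddings API call: count vocabulary words.
    words = [w.strip(".,?!") for w in text.lower().split()]
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Chunk your documents and store (chunk, vector) pairs.
chunks = [
    "Each semester module ships with a robotics kit and a facilitator guide.",
    "Assessment rubrics are published on the platform each semester.",
    "The curriculum covers computational thinking from Early Childhood on.",
]
index = [(c, embed(c)) for c in chunks]

# 2. At question time, embed the query and rank chunks by similarity.
def retrieve(question: str, k: int = 2) -> list[str]:
    qv = embed(question)
    ranked = sorted(index, key=lambda cv: cosine(qv, cv[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# 3. Prepend the retrieved chunks to the chat prompt as context.
context = "\n".join(retrieve("Which kit comes with each semester module?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
```

In production you would swap the toy `embed` for real API calls, persist the vectors in a vector store, and batch the document embeddings ahead of time; only the query embedding happens per request.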