GPT-3.5-turbo fine-tuning plus document retrieval

My use case is conditional generation: document retrieval finds the parts of the data model needed to produce a DB query from the user's question. After reading the documentation for the new fine-tuning of GPT-3.5-turbo, I wonder a few things:

Does the system prompt in the training dataset for the new fine-tuning approach always have to be exactly the same?
If so, I cannot add the retrieved documents there (because they are dynamic) and should instead add them to the user prompt?
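Under that assumption, a single training example might look like the sketch below: a short system prompt kept identical across all examples, with the dynamically retrieved schema fragments placed in the user message instead. The field names follow the chat-format fine-tuning JSONL layout; the schema text, question, and SQL are made up for illustration.

```python
import json

# Hypothetical retrieved schema fragments and user question (illustrative only).
retrieved_docs = "Table customers(id, name, country); Table orders(id, customer_id, total)"
question = "Total order value per country?"

# Static system prompt: identical across every training example.
SYSTEM_PROMPT = "You translate questions into SQL for the given schema."

training_example = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        # Dynamic retrieved context goes in the user message, not the system prompt.
        {"role": "user", "content": f"Schema:\n{retrieved_docs}\n\nQuestion: {question}"},
        {"role": "assistant", "content": (
            "SELECT c.country, SUM(o.total) FROM orders o "
            "JOIN customers c ON o.customer_id = c.id GROUP BY c.country;"
        )},
    ]
}

# Each line of the fine-tuning JSONL file is one such example.
jsonl_line = json.dumps(training_example)
print(jsonl_line)
```

At inference time the same system prompt and user-message layout would be reproduced, with the retrieved fragments swapped in per request.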

Can the static part of my prompt, which provides general guidelines for the model (currently GPT-4), be shortened to a single generic prompt, with the fine-tuned model learning these guidelines from the provided examples alone? Does that increase the required example count? If so, by how much?

Thanks for any helpful answers

If you are trying to “teach” the model new information, embeddings with retrieval are the way to go. If you want to change the structure or the way it responds, then use fine-tuning.

From the Fine-Tuning Guide:

When should I use fine-tuning vs embeddings with retrieval?

Embeddings with retrieval is best suited for cases when you need to have a large database of documents with relevant context and information.
By default OpenAI’s models are trained to be helpful generalist assistants. Fine-tuning can be used to make a model which is narrowly focused, and exhibits specific ingrained behavior patterns. Retrieval strategies can be used to make new information available to a model by providing it with relevant context before generating its response. Retrieval strategies are not an alternative to fine-tuning and can in fact be complementary to it.
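As a sketch of how the two complement each other at request time: retrieval supplies fresh context per question, while the fine-tuned model carries the ingrained query-writing behavior. The toy keyword "retriever" below stands in for a real embedding search, and the model name in the comment is a placeholder, not a real endpoint.

```python
def retrieve_schema(question: str, index: dict[str, str]) -> list[str]:
    """Toy keyword retriever standing in for a real embedding search."""
    return [doc for key, doc in index.items() if key in question.lower()]

def build_messages(question: str, docs: list[str]) -> list[dict]:
    # Static system prompt, matching the one used in every training example.
    system = "You translate questions into SQL for the given schema."
    context = "\n".join(docs)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Schema:\n{context}\n\nQuestion: {question}"},
    ]

# Toy "index" mapping keywords to schema fragments (illustrative only).
index = {
    "order": "Table orders(id, customer_id, total)",
    "customer": "Table customers(id, name, country)",
}

q = "Average order total per customer?"
messages = build_messages(q, retrieve_schema(q, index))
# These messages would then be sent to the fine-tuned model, e.g.:
# client.chat.completions.create(model="ft:gpt-3.5-turbo:...", messages=messages)
print(messages[1]["content"])
```

The point of the split: fine-tuning fixes *how* the model writes queries, while retrieval decides *which* schema fragments it sees for each question.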


Thanks for the reply.

The narrow focus here would be the generation of valid DB queries that might be more complex than what GPT-4 is able to produce by default.

This is exactly what I would like to understand better: how to combine the two.