Train ChatGPT on a confidential dataset

I would like to fine-tune ChatGPT to design a solution for my clients. I would like to know if there is an option to keep the data confidential. Thanks.


Fine-tuning ChatGPT is not possible. Neither is it possible (currently) to fine-tune the next best model, text-davinci-003. The best option you have is the base Davinci model, and good luck fine-tuning that without going totes insane. Yikes.


Thank you, this partly answers my question. However, for the hypothetical scenario where we fine-tune using your guide (OpenAI API), I’d really like to know how far the data we upload for fine-tuning can be used. In other words, is there a risk this data can leak? Can you use it or sell it on? I believe legal clarity around this would greatly increase the possibility of more vertical applications, and many would be interested. So, to sum up: how safe is the confidential data that I upload to train the model? Thank you.

Hey @georgejs

I think the OpenAI document referenced below clears everything up. The short answer is that you can contact OpenAI and request that they not use your data internally (or externally, I assume).


How your data is used to improve model performance


One way to keep the information confidential (to some degree) is to store the data locally. By using embeddings, you only send up the small pieces of information required to answer a specific question.
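
To make the idea concrete, here is a minimal sketch of the retrieval step. It is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `LOCAL_CHUNKS`, `top_chunks`, and `build_prompt` are hypothetical names, not part of any OpenAI library. The point is that the full document set stays on your machine and only the best-matching snippet ends up in the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model (assumption): word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question, chunks, k=1):
    # Rank the locally stored chunks by similarity to the question.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=1):
    # Only the k most relevant chunks are included in the outgoing prompt.
    context = "\n".join(top_chunks(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical local data; in practice this would be your confidential corpus.
LOCAL_CHUNKS = [
    "Refunds are processed within 14 days of the return request.",
    "The warehouse in Leeds ships orders on weekdays only.",
    "Support tickets are answered within one business day.",
]

prompt = build_prompt("How long do refunds take?", LOCAL_CHUNKS, k=1)
print(prompt)
```

Note that with a hosted embedding model, the text you embed is also sent to the provider, so whether the embeddings themselves are computed locally is a separate decision.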

If the issue is people’s names, you might be able to sanitize the data by replacing names (automatically) before you do the training.
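
A rough sketch of what that automatic replacement could look like, assuming you already have a list of the names to hide (the `pseudonymize` and `restore` helpers and the sample names are hypothetical). It only catches names on the list; finding unknown names would need something like a named-entity recognizer, which is not shown here.

```python
import re


def pseudonymize(text, names):
    # Replace each known name with a stable placeholder before training.
    # Longer names go first so "Ann Lee" is replaced before "Ann".
    mapping = {}
    for i, name in enumerate(sorted(names, key=len, reverse=True), start=1):
        placeholder = f"PERSON_{i}"
        mapping[placeholder] = name
        text = re.sub(rf"\b{re.escape(name)}\b", placeholder, text)
    return text, mapping


def restore(text, mapping):
    # Put the real names back into model output, locally.
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


raw = "Jane Doe emailed John Smith about the contract."
clean, mapping = pseudonymize(raw, ["Jane Doe", "John Smith"])
print(clean)
```

The mapping stays on your side, so placeholders in the model's answers can be swapped back without the real names ever being uploaded.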

If the confidentiality is related to knowledge or IP, you have to weigh up whether small snippets taken out of context will cause you issues, or whether they need to be in the larger surrounding text to make sense. If this is not a problem, embedding is also a good solution (for the reason described above).


Thank you so much, @raymonddavey and @DutytoDevelop. This is very helpful.
As for actually using the fine-tuned model via the API: how safe / confidential are the “chats”?
