Hi and welcome to the Developer Forum!
Data sent to the API is not used for training, so it will not make its way into any training data. It is only retained for 30 days for legal compliance and then deleted.
There is a method called RAG (Retrieval-Augmented Generation) which takes advantage of a data storage technique called a vector database. These vector stores hold semantic vectors of text and can be searched by similarity: not on the words themselves, but on the underlying meaning those words carry. That makes them ideal for pulling only the relevant information out of a large corpus.
You can then pass that information as context to the AI to answer queries.
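Here is a rough sketch of that retrieve-then-answer step in Python, just to make the flow concrete. It assumes your chunks have already been embedded (an example of that step is after the docs link below) and uses a simple in-memory cosine similarity search plus an assumed model name, so treat it as illustrative rather than a drop-in solution:

```python
# Illustrative RAG sketch (openai>=1.0 Python library) — the model name,
# prompt wording and in-memory search are assumptions, not recommendations.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def retrieve(query_vector, chunk_vectors, chunks, k=3):
    # Cosine similarity: compares meaning, not exact wording.
    sims = chunk_vectors @ query_vector / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def answer(question, context_chunks):
    # Pass only the retrieved chunks as context, keeping the prompt small.
    context = "\n\n".join(context_chunks)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, swap in whichever you use
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```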
The method for converting the text into those vectors is called embedding; details can be found here:
https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
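For completeness, the embedding step itself looks roughly like this. The model name is just one of the available embedding models, and the example strings are placeholders:

```python
# Rough example of the embedding step (openai>=1.0 Python library).
from openai import OpenAI

client = OpenAI()

chunks = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am to 5pm.",
]

resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
chunk_vectors = [d.embedding for d in resp.data]  # one vector per chunk
# Wrap in np.array(chunk_vectors) to use with the similarity sketch above.
print(len(chunk_vectors[0]))  # vector dimensionality, e.g. 1536 for this model
```

In practice you would store these vectors in a vector database rather than in memory, but the principle is the same.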
The amount of data you have may be too large for the current Assistants offering, which is a pre-built system that bundles all of the above elements into an easy-to-use API, so some experimentation may be required. Standard vector database methods have effectively unlimited storage, so there are options that will cover your use case.
Let me know if you need any more information.