Question about data security and size of vectorised knowledge base

I want to understand the following:

  1. Does OpenAI retain the information it receives from these queries and documents? If the documents used to create the vectorized knowledge base are highly sensitive and private, is it advisable to use OpenAI and its APIs to help process that data?

  2. How large can this vectorized database be? Are there any limits? Will the LLM be able to search over a large pool of embedded data and still return correct information?

  1. OpenAI states in its documentation these days that it doesn’t save or use API data to train its models. However, if you want to be even more secure, it’s worth running an anonymiser over the input you send to GPT (ours is JSON, so we anonymise the values for the keys; see the sketch after this list).

  2. I don’t think there are hard limits per se, but there is a chance of some contextual loss when the vector database gets very large, since only the retrieved chunks ever reach the model (see the retrieval sketch below).
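
For point 1, here is a minimal sketch of what that anonymisation step can look like: mask sensitive JSON values with placeholders before the text goes to the model, then swap the real values back into the reply. The key names, placeholder format, and the stand-in "reply" are my own assumptions for illustration, not the original poster's code.

```python
import json

# Keys whose values we consider sensitive (assumption for this example).
SENSITIVE_KEYS = {"name", "email", "account_id"}

def anonymise(record: dict) -> tuple[dict, dict]:
    """Replace sensitive values with placeholders; return masked record and mapping."""
    mapping, masked = {}, {}
    for key, value in record.items():
        if key in SENSITIVE_KEYS:
            placeholder = f"<{key.upper()}_{len(mapping)}>"
            mapping[placeholder] = value
            masked[key] = placeholder
        else:
            masked[key] = value
    return masked, mapping

def deanonymise(text: str, mapping: dict) -> str:
    """Swap placeholders in the model's output back to the original values."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, str(value))
    return text

record = {"name": "Jane Doe", "email": "jane@example.com", "question": "What is my balance?"}
masked, mapping = anonymise(record)
prompt = json.dumps(masked)  # this masked JSON is what would be sent to the API
# ... send `prompt` to the model and receive `reply` ...
reply = f"Hello {masked['name']}, your balance is available."  # stand-in for a model reply
print(deanonymise(reply, mapping))
```

The model never sees the real values, and the mapping stays on your side, so de-anonymisation happens entirely locally.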
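
For point 2, the "contextual loss" concern comes from the fact that only the top-k most similar chunks are handed to the LLM, so in a very large store the relevant passage may not make the cut. This toy sketch (random embeddings, cosine similarity, an assumed k of 5; not any specific vector database's API) shows how few chunks actually reach the model out of the whole pool.

```python
import numpy as np

def retrieve_top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the k chunks most cosine-similar to the query embedding."""
    scores = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return list(np.argsort(scores)[::-1][:k])

# Toy data: 10,000 random "chunk" embeddings of dimension 128.
rng = np.random.default_rng(0)
chunks = rng.normal(size=(10_000, 128))
query = rng.normal(size=128)

top = retrieve_top_k(query, chunks, k=5)
print("chunks sent to the LLM:", top)  # only these 5 of 10,000 ever reach the model
```

So the practical limit is less about how many vectors you can store and more about whether the retrieval step surfaces the right chunks within the model's context window.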