Confused about whether my data goes to OpenAI

So I'm using LlamaIndex, and I get that the embeddings can be stored in a location that you own. But my question is this: does the data from my training documents go to OpenAI?

Documentation search result:

There are many embedding models to pick from. By default, LlamaIndex uses text-embedding-ada-002 from OpenAI. We also support any embedding model offered by Langchain, as well as providing an easy-to-extend base class for implementing your own embeddings.

Thanks, but this doesn't quite answer my question. So does the data get sent to OpenAI and stored there?

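Every piece of text that you want to be findable by similarity search must be sent to an embeddings model, which returns a vector that is stored along with the original text chunk. With the default OpenAI embedding model, that means the text of your documents is sent to OpenAI, even if the resulting vectors are stored somewhere you own.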
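To make that flow concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's or OpenAI's actual API: `fake_remote_embed`, `build_index`, `search`, and the `sent_to_model` log are hypothetical names made up for illustration, and the "embedding" is just a hash-derived vector. The point is only to show which text has to be handed to the embedding model and what stays in your own storage.

```python
import hashlib
import math
from typing import Callable, List, Tuple

# Log of every text handed to the embedding model in this sketch.
# With a hosted model, this is the text that leaves your machine.
sent_to_model: List[str] = []


def fake_remote_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a hosted embedding API call.

    A real model returns a semantic vector; this one derives a
    deterministic 8-dimensional unit vector from a hash instead.
    """
    sent_to_model.append(text)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:8]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


Embedder = Callable[[str], List[float]]
Index = List[Tuple[str, List[float]]]


def build_index(chunks: List[str], embed: Embedder) -> Index:
    """Embed every chunk and keep (text, vector) pairs in local storage."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: Index, query: str, embed: Embedder) -> str:
    """Embed the query (it is sent to the model too) and return the closest chunk."""
    q = embed(query)
    best = max(index, key=lambda item: sum(a * b for a, b in zip(q, item[1])))
    return best[0]


chunks = ["Refunds are issued within 14 days.", "Support is open on weekdays."]
index = build_index(chunks, fake_remote_embed)  # the index itself stays local
```

After `build_index` runs, `sent_to_model` contains the full text of every chunk, which is the part of the pipeline the question is about. Passing a locally running embedding function as `embed` instead would keep that text on your own machine.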

Data sent through the API is no longer used for training models. For other retention details, you can look at OpenAI's privacy and security policies. Some data collection is still done, for example to track repeated abuse that can lead to an API account ban.
