Looking for clarification on knowledge retrieval and using OpenAI's vector database

If I understand correctly, using knowledge retrieval means uploading a document into an OpenAI-hosted vector database. A couple of questions:

  1. How long are the documents stored there?
  2. Can we essentially use this as an alternative to Pinecone, Elasticsearch, or other options?
  3. Is there any loss in search performance by using a shared vector database with every other OpenAI app? What limitations should I be aware of?

Hi and welcome to the Developer Forum!

The data is currently limited to 20 files per assistant, each up to 512 MB (0.5 GB). The cost is $0.20 per GB of data stored per day.
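To put those figures together, here is a rough worked example of the maximum storage cost per assistant (an illustration based on the numbers above, not an official pricing calculation):

```python
# Rough cost estimate: 20 files per assistant, 512 MB each,
# $0.20 per GB of stored data per day (figures from the reply above).
files = 20
mb_per_file = 512
gb_total = files * mb_per_file / 1024   # 10.0 GB if every slot is full
price_per_gb_per_day = 0.20

daily_cost = gb_total * price_per_gb_per_day
print(gb_total, daily_cost)  # 10.0 GB -> $2.00 per day at the cap
```

So a fully loaded assistant tops out around $2 per day; most real workloads will store far less.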

There is currently no time limit on storage duration, so this could be used as a replacement for other vector database systems.

Search performance will only be affected by server load, your database will not be shared with others.


Fox, you said:

“Search performance will only be affected by server load, your database will not be shared with others.”

What do you mean by “your database will not be shared with others” ?

I mean, what’s the difference if users can just download your GPT’s files directly? No need to access the vector database.

It is not right on OpenAI’s part to leave the retrieval documents we upload for use with our GPTs unsecured.


I made the assumption that you were referring to Assistants and not GPTs, since you mentioned pricing; GPTs have no charge.

It is the duty of the application creator to ensure data security within their application. AI interaction can be challenging and may require significant engineering investment to resolve all potential issues; the same concerns apply to any vector retrieval platform where a user is allowed unlimited, unrestricted access.


Thanks for the quick reply. A follow-up question to understand the capabilities of retrieval:

Could I upload all 20 documents when the application starts, with each document holding thousands of lines of JSON? If so, does the OpenAI database chunk each JSON block within each document and optimize each block for retrieval? An example could look like this:

  • 20 documents are uploaded, each with 1,000 blocks of JSON, each looking something like:

{
  "title": "sample title",
  "content": "sample content"
}
  • Once all documents are uploaded, there are 20,000 blocks of JSON.
  • For each query, the assistant finds the most relevant block of JSON across all 20 documents using semantic search.

Is this how it would work?

I’ve not tested it yet, but that is in keeping with my understanding, yes.
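The flow described above can be sketched in a few lines. This is a toy illustration of embed-and-rank retrieval over JSON blocks, not how OpenAI’s internal retrieval is actually implemented; a simple word-count vector stands in for a real embedding model:

```python
# Sketch of the retrieval flow: embed each JSON block, embed the query,
# and return the block with the highest cosine similarity.
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: a bag-of-words vector. A real system would call
    # an embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(blocks, query):
    q = embed(query)
    return max(blocks, key=lambda blk: cosine(embed(blk["content"]), q))

blocks = [
    {"title": "billing", "content": "how to update your payment method"},
    {"title": "retrieval", "content": "semantic search over uploaded documents"},
]
best = search(blocks, "how does semantic search work")
print(best["title"])  # retrieval
```

With 20,000 blocks the principle is the same; only the embedding model and the index (a vector database instead of a linear scan) change.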

Hello! Thank you for your insights!

Story 1. My knowledge retrieval application scrapes a documentation portal with about 40k pages. For each page, I create a .json file containing a single JSON object: {"URL": "url1", "CONTENT": "content1"}. The next step is parsing each .json file: if its size exceeds 500 tokens (is that an optimal chunk size?), the JSON object is split into several JSONs of 500 tokens each, preserving the JSON structure: {"URL": "url1", "CONTENT": "1st_content1_chunk"}, {"URL": "url1", "CONTENT": "2nd_content1_chunk"}, etc. Then an embedding is generated for each chunk via the OpenAI API and inserted into ChromaDB (PersistentClient) together with the chunk itself. The main idea is that the GPT model (chat completions endpoint of gpt-3.5-turbo/gpt-4-turbo) can supplement its response to the user with a valid URL to refer to for additional information.
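The splitting step in Story 1 can be sketched like this. A plain word count stands in for token counting here; a real pipeline would measure the 500-token limit with an actual tokenizer:

```python
# Sketch of the chunking step: split a page's content into fixed-size
# chunks while preserving the {"URL": ..., "CONTENT": ...} structure,
# so every chunk stays linked to its source URL.
def split_page(url, content, max_words=500):
    words = content.split()
    chunks = [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
    return [{"URL": url, "CONTENT": chunk} for chunk in chunks]

# A 1200-word page splits into chunks of 500 + 500 + 200 words.
page = {"URL": "url1", "CONTENT": "word " * 1200}
parts = split_page(page["URL"], page["CONTENT"])
print(len(parts))  # 3
```

Because the URL is repeated in every chunk, the link between content and URL survives the split regardless of where the content is cut.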

Question 1. What chunk size does the Assistants API use for embedding generation? The problem is that the structure of my chunks can get messed up, and the content will no longer be linked to its URL. How can I solve this issue with the Assistants API?

Story 2. When I receive a non-relevant response from the GPT model, I check whether the corresponding information was actually parsed and inserted into the database. Then I check the result of the vector search for the user’s question text, to verify that the context received the correct information. These are my steps to localize the issue.
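The localization steps in Story 2 amount to re-running the retrieval step in isolation and inspecting the ranking. A minimal sketch, with a word-overlap score standing in for the real embedding-distance query against the vector database:

```python
# Debug helper: rank stored chunks against the user's question and
# inspect the top matches, to see whether the right context would have
# been retrieved. The score function is a placeholder for the real
# embedding + distance computation used in production.
import math

def score(chunk, question):
    a, b = set(chunk.lower().split()), set(question.lower().split())
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def top_k(chunks, question, k=3):
    ranked = sorted(chunks, key=lambda c: score(c, question), reverse=True)
    return ranked[:k]

chunks = [
    "reset your password from the account settings page",
    "invoices are emailed at the end of each billing cycle",
]
question = "how do I reset my password"
for c in top_k(chunks, question, k=2):
    print(round(score(c, question), 3), c)
```

If the expected chunk is missing from the corpus, the ingestion step is at fault; if it is present but ranked low, the embedding/search step is.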

Question 2. How can I debug non-relevant responses with Assistants?


Hey guys,

I’m also interested in how the Assistants’ retrieval tool works under the hood. I could not find much at https://platform.openai.com/docs/assistants/tools/knowledge-retrieval :

  1. What is the size of the “short document” that gets passed in without vector search?
  2. What chunk size is used when generating embeddings for “longer documents”?
  3. Which vector database does the retrieval tool use?

Thanks a lot


I am 90% positive it is not proper RAG. They have just implemented a quick-and-dirty page-by-page keyword search so they can quickly say: “oh wow, we have knowledge search.”

Imagine they do use a proper RAG setup and vector DB :wink: I doubt it :wink:


I’m having a problem uploading a file (for retrieval) in the Assistants playground.

It worked for very small files, but the upload failed for a 1.3 MB text file without any error message.
I initially thought that it was too big, but it is in fact far below the limit.

How can I get more information about the problem?