I assumed you were referring to Assistants rather than GPTs, since you mentioned pricing; GPTs carry no charge.
It is the duty of the application creator to ensure data security within their application. AI interaction can be challenging and may require significant investment in engineering resources to resolve all potential issues; the same concerns apply to any vector-retrieval platform where a user is allowed unlimited, unrestricted access.
Thanks for the quick reply. A follow-up question, as I'm trying to understand the capabilities of retrieval.
Could I upload all 20 documents when the application starts, with each document holding thousands of lines of JSON? If so, does the OpenAI database chunk each JSON code block within each document and optimize each block for retrieval? An example could look like this:
20 documents are uploaded, each with 1000 blocks of JSON, each looking something like:
Story 1. My Knowledge Retrieval Application scrapes a documentation portal with about 40k pages. For each page, I create a .json file containing a single JSON object: {"URL": "url1", "CONTENT": "content1"}. The next step is parsing each .json file: if its size exceeds 500 tokens (is that an optimal chunk size?), the JSON object is split into several 500-token JSON objects while preserving the JSON structure: {"URL": "url1", "CONTENT": "1st_content1_chunk"}, {"URL": "url1", "CONTENT": "2nd_content1_chunk"}, etc. Then an embedding is generated for each chunk via the OpenAI API and inserted into ChromaDB (PersistentClient) along with the chunk itself. The main idea is that the GPT model (chat completions endpoint of gpt-3.5-turbo/gpt-4-turbo) can supplement its response to the user with a valid URL to refer to for additional information.
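The splitting step described above can be sketched roughly as follows. This is illustrative, not my exact code: the `chunk_page` name is made up, and the 4-characters-per-token ratio is a crude stand-in for a real tokenizer such as tiktoken.

```python
import json

MAX_TOKENS = 500      # the chunk size from the story; whether it is optimal is an open question
CHARS_PER_TOKEN = 4   # rough heuristic; a real tokenizer would be more accurate

def chunk_page(url, content, max_tokens=MAX_TOKENS):
    """Split one page into JSON chunks, each repeating the URL so the
    content stays linked to its source page after splitting."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    # Reserve room for the JSON wrapper ({"URL": ..., "CONTENT": ...}) itself.
    overhead = len(json.dumps({"URL": url, "CONTENT": ""}))
    budget = max_chars - overhead
    return [
        {"URL": url, "CONTENT": content[i:i + budget]}
        for i in range(0, len(content), budget)
    ]
```

Repeating the URL in every chunk is the key design choice: no matter which chunk the retriever pulls, the model still sees the source URL next to the content.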
Question 1. What chunk size does the Assistants API use for embedding generation? The problem is that the structure of my chunks can get broken up, so the content is no longer linked to its URL. How can I solve this with the Assistants API?
Story 2. When I receive a non-relevant response from the GPT model, I first check whether the corresponding information was actually parsed and inserted into the database. Then I check the result of the vector search for the user's question text, to verify that the retrieved context contains the correct information. These are my steps for localizing the issue.
Question 2. How do I debug non-relevant responses with Assistants?
I am 90% positive it is not proper RAG. They have just implemented a quick-and-dirty, page-by-page keyword search so they could quickly say: "oh now, we have knowledge search".
Even if they do use proper RAG and a vector DB, I doubt it.
I'm having a problem uploading a file (for retrieval) in the Assistants playground.
It worked for very small files, but the upload failed for a 1.3 MB text file without any error message.
I initially thought that it was too big, but it is in fact far below the limit.
How can I get more information about the problem?