I’m building a GPT (in ChatGPT’s GUI GPT builder) to analyze data from a ZIP archive, which is quite large at about 400 MB. Initially, my GPT was unable to open it and always froze. To address this, I wrote a custom Python script optimized to open it quickly. Now the GPT can complete tasks and produce the data I need using that script (it edits hardcoded values in the script based on my prompts and then runs it).
However, every prompt requires the GPT to open the ZIP archive again, which is time-consuming. Is there a better approach, such as training the GPT once on all the data in the archive, so it retains the information without having to open the archive every time? I apologize if this is a stupid question; I am new to the field.
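For context, here is a simplified, hypothetical sketch of the kind of script I mean (the archive path, member name, and byte cap are placeholders that get edited per prompt):

```python
import zipfile

# Hardcoded values the GPT rewrites based on my prompt before re-running the script.
ARCHIVE = "/mnt/data/data.zip"    # placeholder path to the uploaded archive
TARGET_MEMBER = "2023/sales.csv"  # placeholder: the one file needed for this prompt
MAX_BYTES = 1_000_000             # cap how much is read into memory

# Read only the needed member lazily instead of extracting the whole 400 MB archive.
with zipfile.ZipFile(ARCHIVE) as zf:
    with zf.open(TARGET_MEMBER) as member:
        data = member.read(MAX_BYTES)

print(data[:500].decode(errors="replace"))
```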
What you are asking for is not possible. A GPT essentially performs RAG (retrieval-augmented generation) over information that you either upload directly to the GPT’s knowledge base or expose by connecting the GPT to an external database.
For example, when you use the knowledge-upload functionality of a GPT, the information in the files is accessed for every query via semantic search. When the files are uploaded, their contents are chunked and converted into vector embeddings. When a query requires knowledge from the files, the query itself is also converted to an embedding vector in the backend, a similarity search retrieves the most similar vectors, and the text associated with them is included as context when formulating the response. In that sense, every query is handled separately as far as knowledge retrieval is concerned.
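To make that loop concrete, here is a minimal, illustrative sketch. The `embed()` function is a toy stand-in for a real embedding model and the chunk texts are invented; only the chunk-embed-search flow is the point:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. Real systems use dense
    # neural embeddings, but the retrieval logic is the same.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. At upload time: chunk the documents and embed every chunk.
chunks = ["GPTs retrieve knowledge separately for each query.",
          "ZIP files are only handled by Code Interpreter."]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. At query time: embed the query, rank chunks by similarity, and pass
#    the best matches to the model as context for its answer.
query = "how does the GPT retrieve uploaded knowledge?"
q_vec = embed(query)
best_chunk, _ = max(index, key=lambda pair: cosine(q_vec, pair[1]))
print("Retrieved context:", best_chunk)
```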
The logic is similar for external data, although the nature of the information retrieval depends on how your information is stored externally.
Only within the chat itself do you get some temporary retention, in the form of the recent conversation history.
ZIP files are only for Code Interpreter: the AI must extract them within the Python environment and run scripts to return their contents in order to know what’s inside, and the output it can see is limited to 32k characters.
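Roughly, something like the following has to run in the sandbox every time the archive is touched (the member name here is hypothetical; `/mnt/data` is where uploads typically land):

```python
import zipfile

archive_path = "/mnt/data/archive.zip"  # uploads typically land under /mnt/data

with zipfile.ZipFile(archive_path) as zf:
    print(zf.namelist())                 # first, discover what's inside
    with zf.open("data/table.csv") as f: # hypothetical member; read it directly,
        preview = f.read(2000)           # without extracting the whole archive

print(preview.decode(errors="replace"))
```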
Only supported file types, not ZIP (the platform determines this by scanning the binary itself), are allowed for document extraction and ingestion into the retrieval tool. The documentation link in this forum’s navigation bar will take you to Assistants → Tools, where there is a list of which destinations each file type can go to.
Retrieval doesn’t have an “unzipper” that would let you bypass the file-count limit.