I’m trying to make a Custom GPT for data (in CSV) larger than 512MB. Is that just not possible at this time? As far as I can tell, the data limit for knowledge files on a Custom GPT is also 512MB (when I try to upload a 600MB file, nothing happens, but it did allow a 150MB file).
And does anyone know the data limit for files retrieved when the Custom GPT calls an API? Is that limit simply the context length of GPT-4o?
Action responses are injected into context, not downloaded as files.
That’s what I thought. So it’s limited by context. Is there any way to dynamically change the knowledge files?
No, this isn’t possible.
The solution is to not return entire files via the action, but set up the action to perform retrieval against those files so you’re only injecting relevant context.
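For example, here’s a minimal sketch of what a retrieval-style Action backend could look like, assuming FastAPI and pandas; the /search route, CSV path, and the “description” column are placeholders, not anything the platform prescribes:

```python
# Minimal retrieval-style Action backend: return matching rows, not the whole file.
import json

import pandas as pd
from fastapi import FastAPI, Query

app = FastAPI()
df = pd.read_csv("big_dataset.csv")  # loaded once at startup (placeholder name)

@app.get("/search")
def search(q: str = Query(..., description="keyword to match"),
           limit: int = Query(20, le=100)):
    # Only the matching rows get injected into the GPT's context,
    # which keeps each Action response far below the context limit.
    mask = df["description"].str.contains(q, case=False, na=False)
    rows = df[mask].head(limit)
    return {"rows": json.loads(rows.to_json(orient="records"))}
```

The Action’s OpenAPI schema would then only expose /search, so the model never tries to pull the whole file in one call.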
As far as I know, there is a hard limit on Data Analysis (structured) files such as .csv, and 512 MB sounds about right.
There is not a file size limit on unstructured knowledge base files. But:
- Longer files are harder to read.
- There is a 20-file limit for a Custom GPT, whether the data is structured or unstructured.
- Yes, there is a data limit for retrieving information via a GPT Action. I suspect it’s the same as the API’s default limits, but there’s no telling what the actual value is. It isn’t huge.
Can I ask what your goal is in uploading a 600MB file?
First, I imagine this isn’t a flat text file, so the first course of action would be to reformat the content of the file into a more LLM-friendly format; Markdown is recommended (see the sketch after the next point).
Second, that is likely going to be simply too much information and will overload the attention mechanism.
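On the Markdown point, a tiny sketch of the conversion, assuming pandas plus the tabulate package (which to_markdown needs); the file names are just examples:

```python
# Illustrative only: converting a slice of the CSV to Markdown.
# Writing the whole 600 MB as Markdown would still be far too big,
# so this just shows the mechanics on a manageable preview.
import pandas as pd

df = pd.read_csv("big_dataset.csv")  # placeholder file name

with open("dataset_preview.md", "w") as f:
    f.write(df.head(1000).to_markdown(index=False))
```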
I am told all the data in the csv is necessary for proper analysis, but I will push harder on it.
I don’t expect the LLM to load in the entire dataset, but to run some Python on it for analysis – so it loads in enough to understand what it’s dealing with.
In that case you will almost certainly need to implement your own code interpreter the model can access via the API.
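A heavily stripped-down sketch of that idea, assuming a FastAPI endpoint that exec()s model-supplied pandas code against the dataframe; the route and field names are made up, and running exec() on model output needs real sandboxing before you use it anywhere serious:

```python
# "Bring your own code interpreter" sketch: the model POSTs pandas code,
# we run it against the dataframe and return whatever it leaves in `result`.
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
df = pd.read_csv("big_dataset.csv")  # placeholder name

class CodeRequest(BaseModel):
    code: str  # e.g. 'result = df.groupby("region")["sales"].sum().to_dict()'

@app.post("/run")
def run(req: CodeRequest):
    # The submitted code is expected to leave a JSON-serializable value in `result`.
    # No sandboxing here -- this is a sketch, not something to deploy as-is.
    scope = {"df": df, "result": None}
    exec(req.code, {"pd": pd}, scope)
    return {"result": scope["result"]}
```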
Break the csv into smaller chunks. A-K and L-Z, or whatever makes it manageable. Cut out separate workbooks. Having it divided this way will make understanding more immediate, too. Just clearly reference how the files are to be used in your Instructions.
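If it’s all in one big CSV, a quick pandas sketch for splitting it into uploadable pieces (the file name and chunk size are just examples; you could equally filter on a key column to get the A-K / L-Z style split):

```python
# Split one oversized CSV into smaller files, each safely under the upload limit.
import pandas as pd

reader = pd.read_csv("big_dataset.csv", chunksize=500_000)  # rows per piece
for i, chunk in enumerate(reader):
    chunk.to_csv(f"dataset_part_{i:02d}.csv", index=False)
```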
You can pull data in multiple steps; you’re just limited in how much you can get in a single go. Just “load them in the Data Analyzer.”
And Elmsedt is right, these are advanced needs and you should look into an Assistant. They don’t have the same restrictions as a cGPT.
Sorry, what’s an Assistant?
Yeah, I figure this is possible using the API, but then I’ll have to implement the logic where it can write Python and run it on the data myself – which doesn’t sound super easy.
Oh, the Assistants API. I’ve played with the API a bit, but I had never heard of that. That will make this a lot easier to do in the API. Thanks!
Dang it. The Assistants have the same 512MB limit for files. I’ll have to figure out if there is another way to handle the data.
Upload it as a zip file and have the first task unzip it. Here is a GPT I made to test the exact issue you’re having.
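Compressing it locally is only a few lines (file names are just examples):

```python
# Zip the CSV before uploading it as a knowledge file.
import zipfile

with zipfile.ZipFile("big_dataset.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write("big_dataset.csv")
```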
That’s a pretty smart solution. Does ChatGPT have a built-in way to unzip, or does that need to be added?
You can direct it to use the Jupyter environment; it will then store the unzipped files temporarily in /mnt/data. You can also load the CSV into SQLite and query the data.
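Something like this is what you could instruct the GPT’s Python session to run (the zip/CSV names and table name are placeholders):

```python
# Run inside the GPT's code interpreter: unzip into /mnt/data,
# then load the CSV into SQLite so later turns only pull back small query results.
import sqlite3
import zipfile

import pandas as pd

with zipfile.ZipFile("/mnt/data/big_dataset.zip") as zf:
    zf.extractall("/mnt/data")

# For a very large CSV you might read it in chunks instead of all at once.
df = pd.read_csv("/mnt/data/big_dataset.csv")

conn = sqlite3.connect("/mnt/data/dataset.db")
df.to_sql("dataset", conn, if_exists="replace", index=False)

print(pd.read_sql_query("SELECT COUNT(*) AS n FROM dataset", conn))
```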
This is something I can instruct the Custom GPT to do? I was unaware it had any environments or tools it could use (though I guess it would have to have some to run Python). Do you know of any place I could find more instruction on how to do this? Thanks!