Training GPTs through uploaded docs

Here is an alternate way to do this that you could try, which is how I would have approached this.

From what you said, your issue isn’t about getting the correct data, but your issue is about how it how it gives you a structured response.

Now one way you could achieve this is by using the basic API and fine tuning how you would like to output to be structured.

I would recommend:

Start with few shot (content from a few PDF)

Run it through a Series of prompts structured prompts as each API

If your response starts to improve, but you need to teach it from a larger corpus, create a JSONL file with at least 250 items and train it

It seems like a lot of work, but I don’t think GPTs can do this just because you have a huge amount of data.