I’ll admit, that’s something I’m still trying to figure out myself. It’s been one of those cases where I don’t know why it works better, only that it does. My educated guess has to do with token counts and token limits.
Since you seem relatively comfortable around the API, you could use tiktoken and some logic to parse the document into chunks of around ~10k tokens, I think? Someone else is going to have to pitch in with the exact input limit; I can’t find it right off the bat for some reason.
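Just as a rough sketch of what I mean (the ~10k chunk size is only what I recall, and the model name here is an assumption, so swap in whatever limit your model actually has):

```python
import tiktoken

def chunk_by_tokens(text: str, max_tokens: int = 10_000,
                    model: str = "gpt-3.5-turbo") -> list[str]:
    # Encode the whole document, then slice the token list into
    # fixed-size windows and decode each slice back to text.
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]
```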
The vector mappings could definitely help if you’re comfortable working with them. @Diet’s solution should work, or maybe a combination of our suggestions.
To answer your second question: for me personally, refinement is a natural, intrinsic part of this process, but I’m realizing it’s not always necessary. For this, though, it definitely is. I’m assuming map-reduce means using vector embeddings/mappings to achieve this; to me that’s just the earlier step in the process, before you refine toward the summary you want.
I’d call it a “reiterative” approach: you iterate over the process as you go, feeding it chunks of data so it can change and refine its summary with each new piece.
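Something like this is what I have in mind, as a minimal sketch only; the model name and prompt wording are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

def refine_summary(chunks: list[str], model: str = "gpt-3.5-turbo") -> str:
    # Carry a running summary forward and ask the model to revise it
    # with each new chunk, instead of summarizing everything at once.
    summary = ""
    for chunk in chunks:
        prompt = (
            "Here is the summary so far:\n"
            f"{summary or '(empty)'}\n\n"
            "Here is the next chunk of the document:\n"
            f"{chunk}\n\n"
            "Update and refine the summary so it also covers this chunk."
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        summary = response.choices[0].message.content
    return summary
```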
TL;DR: chunk it by token count. I’m not the right person to ask about names or preferred methods yet; everything I do is self-taught through personal trial and error, from before I even knew prompt engineering was a thing.