Length of prompts and limits for fine-tuned models

I am attempting to perform Named Entity Recognition (NER) using Promptify. Each API or model call corresponds to one or two prompts. The challenge I'm encountering is the input length constraint for a single prompt, since my PDF/text documents are about 3-4 pages long. I'm unsure whether segmenting the document would help, given that the answers to the prompts could be located anywhere within the document. What would be the most effective strategy to work around this?

Have you experimented with the GPT-3.5-Turbo-16k model? It seems like something it would be fairly good at, although I don't know whether the issue here is more one of attention being spread too thin over a long context or purely a token-limit thing.
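Independent of model choice, a common workaround when answers can appear anywhere in the document is to split the text into *overlapping* chunks (so an entity straddling a chunk boundary is still seen whole in at least one chunk), run NER on each chunk, and then deduplicate the merged results. Here is a minimal sketch; the `run_ner` callable and the `text`/`type` dictionary keys are placeholders for whatever your Promptify setup actually returns, not Promptify's real API:

```python
def chunk_text(text, max_chars=4000, overlap=400):
    """Split text into overlapping chunks so an entity that straddles
    a chunk boundary still appears whole in at least one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks overlap
    return chunks


def merge_entities(per_chunk_results):
    """Deduplicate entities found across chunks.

    Keys on (lowercased surface text, entity type), so the same
    entity found in two overlapping chunks is only kept once.
    Assumes each entity is a dict like {"text": ..., "type": ...}.
    """
    seen = set()
    merged = []
    for entities in per_chunk_results:
        for ent in entities:
            key = (ent["text"].lower(), ent["type"])
            if key not in seen:
                seen.add(key)
                merged.append(ent)
    return merged


# Hypothetical driver loop: run_ner would wrap your Promptify prompt.
# results = [run_ner(chunk) for chunk in chunk_text(document)]
# entities = merge_entities(results)
```

The overlap size is a trade-off: it should be at least as long as the longest entity mention you expect, but larger overlaps mean more duplicated tokens sent to the API.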