Reduce the number of tokens

Hey guys, I am using gpt-4-0613 (8k). Here is an outline of what I am doing.

I have an API that takes a PDF file as input, extracts the text from it, and then sends the extracted text to the GPT model to generate a quiz based on the PDF's content.

The problem is that, for some PDF files, the extracted text comes to more than 8k tokens, which is more than the model accepts.

So I need a way to reduce the number of tokens so that the input fits the 8k GPT model.

You can try summarizing what you are sending, sending it in chunks, or using RAG.

Use gpt-3.5-turbo-16k for more token allowance.

The long-context models aren’t that great, because they miss a bunch of the information in the context. As far as I can tell, they still don’t have more attention than the smaller-context variants.

It sounds like you’re generating quizzes. You could split the document into chunks of 4-7k tokens each, and ask the model to generate a quiz question per chunk. You can also ask the model to summarize each chunk and concatenate all the summaries, repeating that if necessary, to get down to a smaller input size.
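A minimal sketch of the chunk-then-summarize idea, under loud assumptions: `count_tokens` here just counts whitespace-separated words as a crude stand-in for a real tokenizer (actual model token counts will be higher), and `summarize` is a hypothetical callable that you would back with your own model call. None of these names come from a library.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in: counts whitespace-separated words, NOT real model tokens."""
    return len(text.split())


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens words each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def reduce_to_budget(
    text: str,
    budget: int,
    summarize: Callable[[str], str],  # hypothetical: wrap your own model call
    max_rounds: int = 5,
) -> str:
    """Summarize chunk by chunk and concatenate, repeating until the text fits.

    Stops after max_rounds even if the text is still too long, so a
    summarizer that fails to shrink its input cannot loop forever.
    """
    for _ in range(max_rounds):
        if count_tokens(text) <= budget:
            break
        chunks = split_into_chunks(text, budget)
        text = " ".join(summarize(chunk) for chunk in chunks)
    return text
```

For quiz generation you could skip `reduce_to_budget` entirely and send each chunk from `split_into_chunks` to the model with a "write one quiz question about this text" prompt. Splitting on a fixed word count can cut sentences in half, so splitting on paragraph or page boundaries may give cleaner chunks.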

It says 16k, but that’s in AND out.
It limits me to 8k in.
But maybe that’s just me. I am tier 3, though…

You could use 15k in and 1k out. Input and output tokens go into the same context window. Once the model has generated one token, it immediately becomes a new input token. In fact, the model can’t tell the difference between tokens that you supplied as input and tokens that it previously generated!
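The arithmetic behind "15k in and 1k out" can be sketched as below. The function name and the `overhead_tokens` parameter are hypothetical, added only for illustration, and the window sizes are round numbers rather than exact model limits.

```python
def max_input_tokens(
    context_window: int,
    max_output_tokens: int,
    overhead_tokens: int = 0,  # hypothetical allowance for instructions etc.
) -> int:
    """Return how many input tokens fit when input and output share one window."""
    budget = context_window - max_output_tokens - overhead_tokens
    if budget <= 0:
        raise ValueError("no room left for input tokens")
    return budget


# Round-number illustration of the split described above.
print(max_input_tokens(16_000, 1_000))  # 15000
```

The same function shows why an 8k model with 1k reserved for the quiz leaves only about 7k tokens for the extracted PDF text.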

I wish I could. Maybe it’s just me, but if I send even one token over 8192 as input, the API rejects the request, even with max_new_tokens set to 100.

That’s surprising; are you sure you’re using the model with 16k context?

I’ve used the 16k model with 13k context, and it “worked” but the accuracy was so bad I preferred to engineer smaller context prompts.

Sigh. I wish I would stop making stupid mistakes.
Thanks for motivating me to double-check.

Adjust the temperature: 0 to 0.1 is more stable, and anything above that is more ‘creative’.
