Hey guys, I'm using gpt-4-0613 (8k). I'll give you an outline of what I'm doing.
I have an API that takes a PDF file as input, extracts the text from it, and sends the extracted text to the GPT model to generate a quiz based on the content of the PDF.
The problem is that for some PDFs, the extracted text comes to more than 8k tokens. So I need a way to reduce the number of tokens so the input fits the 8k model.
The long-context models aren't that great, because they miss a bunch of the information in the context. As far as I can tell, they still don't attend to it any better than the smaller-context variants.
It sounds like you're generating quizzes. You could split the document into chunks of 4-7k tokens each and ask the model to generate a quiz question per chunk. You could also ask the model to summarize each chunk and concatenate all the summaries, perhaps repeating the process, to get down to a smaller input size.
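A minimal sketch of the chunking step, assuming the common rough heuristic of ~4 characters per token for English text (for exact counts you'd use a real tokenizer such as tiktoken). The function name and sizes here are illustrative, not anything from an OpenAI SDK:

```python
def split_into_chunks(text, max_tokens=6000, chars_per_token=4):
    """Split text into pieces of at most roughly max_tokens tokens,
    preferring paragraph or sentence boundaries when one is available."""
    max_chars = max_tokens * chars_per_token
    chunks = []
    while text:
        piece = text[:max_chars]
        if len(text) > max_chars:
            # Try to break at the last paragraph or sentence boundary
            cut = max(piece.rfind("\n\n"), piece.rfind(". "))
            if cut > 0:
                piece = text[:cut + 1]
        chunks.append(piece.strip())
        text = text[len(piece):].lstrip()
    return chunks
```

You'd then send each chunk to the model in its own request ("write one quiz question about the following passage: ..."), or summarize each chunk and concatenate the summaries for a second pass.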
You could use 15k in and 1k out. Input and output tokens go into the same vector in the GPU. Once the model has generated one token, it immediately becomes a new input token. In fact, the model can't tell the difference between tokens that you supplied as input and tokens that it previously generated!
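In other words, whatever you reserve for generation comes out of the same window as the prompt. A quick arithmetic sketch (assuming a 16k-token context window; the names here are illustrative, not an API call):

```python
CONTEXT_WINDOW = 16_384  # assumed total window shared by input and output

def max_input_tokens(context_window, max_output_tokens):
    # Input and output share one window, so tokens reserved for the
    # completion are unavailable for the prompt.
    return context_window - max_output_tokens

print(max_input_tokens(CONTEXT_WINDOW, 1_000))  # → 15384
```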