How to build a Question and Answer Bot for context greater than 2048 tokens?

siddhant.saurabh · November 4, 2022, 7:03am

Hey,
I wanted to build a Question and Answer bot for context texts longer than 2048 letters, but I am restricted by the 2048 limit.
Is there any way to surpass the limit or any other way to give the long context once and then query the context with questions multiple times?

cody · November 4, 2022, 1:10pm

First, take a quick look at this:
https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

It’s not actually a letter limit - but a “token” limit, and there’s no getting around it unless/until larger models are released in the future. It really is part of the way the completion model works, and not an arbitrary rate-limit, etc.
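To get a feel for whether a prompt will fit, here is a minimal sketch that estimates token counts from character counts. It assumes the rough rule of thumb of about 4 characters per token for English text; it is not a real tokenizer, and the function names are just illustrative. The limit applies to the prompt and the completion together, so the check leaves room for the answer.

```python
import math

# Assumption: roughly 4 characters per token for typical English text.
# This is only an estimate; a real tokenizer will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(prompt: str, max_answer_tokens: int, limit: int = 2048) -> bool:
    """Check that the prompt plus the space reserved for the answer stays within the limit."""
    return estimate_tokens(prompt) + max_answer_tokens <= limit
```

For anything close to the limit, count with an actual tokenizer rather than relying on this estimate.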

How to handle that for a Q&A bot is a question I’ve seen asked a lot, and there are a few approaches. I’m assuming you have a body of text longer than 2048 tokens that contains the info you want to be able to answer questions about (if you instead mean you need a Q&A experience that considers a chat history longer than 2048 tokens - or some other scenario - let me know).

I’ve seen two common approaches so far using just GPT-3, and both involve breaking the larger text down into chunks:

1. Break the text into chunks, have GPT-3 summarize those chunks to create a smaller overall text, then use that text as the context. Fairly simple - but also potentially “lossy” in the summarizing.
2. For each question, first go through each chunk with a prompt that asks “is this question likely to be answerable from this text?” Then only pass the “yes” chunks in as context for the actual question.
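The second approach could be sketched roughly like this. The `ask_model` argument is a hypothetical stand-in for whatever function sends a prompt to the completion model and returns its text; it is not a real library call, and the prompt wording and the word-based chunk size are assumptions you would want to tune.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def relevant_chunks(question: str, chunks: List[str],
                    ask_model: Callable[[str], str]) -> List[str]:
    """Keep only the chunks the model says are likely to answer the question."""
    keep = []
    for chunk in chunks:
        prompt = (
            f"Text:\n{chunk}\n\n"
            f"Question: {question}\n"
            "Is this question likely to be answerable from this text? Answer yes or no:"
        )
        # ask_model is a placeholder for a call to the completion model.
        if ask_model(prompt).strip().lower().startswith("yes"):
            keep.append(chunk)
    return keep


def answer(question: str, text: str, ask_model: Callable[[str], str],
           max_words: int = 300) -> str:
    """Filter the chunks first, then ask the real question over the 'yes' chunks only."""
    chunks = chunk_words(text, max_words)
    context = "\n\n".join(relevant_chunks(question, chunks, ask_model))
    return ask_model(f"Text:\n{context}\n\nQuestion: {question}\nAnswer:")
```

Note that this costs one model call per chunk for every question, and the selected chunks still have to fit within the limit together, so it may need a cap on how many “yes” chunks are passed through.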

In reality, the best approach here might actually be to use an embedding, or a different paid or open source model, to quickly identify the right portion of the text to consider, then use that.
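As a rough illustration of the embedding idea: embed each chunk, embed the question, and keep the chunks whose vectors are most similar to the question. In this sketch `embed` is a hypothetical function that maps text to a vector (for example a wrapper around an embeddings model), and cosine similarity is one common choice of measure, not the only one.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: List[str],
               embed: Callable[[str], Sequence[float]], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question, best match first."""
    q_vec = embed(question)
    # embed is a placeholder for a call to an embedding model.
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]
```

In practice you would embed the chunks once and store the vectors, so each new question only needs one embedding call plus the similarity ranking. That gets close to “give the long context once, then query it multiple times”.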

sergeliatko · November 8, 2022, 6:48pm

Hi, why do you need a long context? What’s the task, and what does the context look like? I bet there is a step missing somewhere (a sort of internal reflection before crafting the final answer).
