How to increase context length

Most models have a context length of 2048 tokens (except for the newest models, which support 4096).
How can I increase this?

Welcome to the community.

There’s no way to increase the token limit on our end, but some have used summaries of previous prompts to work around it.

What are you trying to do exactly? Maybe we can help.

Hi @parthgupta, I did a quick search and found this:

There was another thread on how to increase the token limit in a clever way. I will update this reply when I get time to find it.

As @PaulBellow has said above, summarization is the way to go. AFAIK, text-ada-001 is the model of choice for summarization for the same audience as the original text, as it is fast and cheap (whereas the davinci models excel at summarizing for a specific audience). So to put it into pseudocode, it might look something like this:

summary = adaResponse(conversationHistory)
prompt = summary + input("User: ")
response = davinciResponse(prompt)
print(response)
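
If you want to run the same idea end to end, here is a minimal sketch in Python using the openai package's Completion endpoint; the model names, max_tokens values, and prompt wording are only assumptions to illustrate the flow:

import openai  # pip install openai; set openai.api_key first

def summarize(history):
    # A cheap, fast model condenses the running conversation
    resp = openai.Completion.create(
        model="text-ada-001",
        prompt="Summarize this conversation:\n" + history,
        max_tokens=200,
    )
    return resp["choices"][0]["text"].strip()

conversation_history = "User: Hi!\nAssistant: Hello, how can I help?"
summary = summarize(conversation_history)

user_turn = input("User: ")
prompt = summary + "\nUser: " + user_turn + "\nAssistant:"

# The stronger model answers from the condensed history instead of the full transcript
resp = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=300)
print(resp["choices"][0]["text"].strip())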

My use case:
I have set up the Completions API integration in my app. When I query it with a particular question, the result, including the prompt, is capped at about 4,000 tokens. How can I increase this limit?

See the two answers above. That’s about it at the moment. Good luck!
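
One thing worth checking along the way: that roughly 4,000-token budget is shared between the prompt and the completion, so counting the prompt’s tokens first tells you how much room is left for the answer. A rough sketch using the tiktoken package (the model name and sample text are only examples):

import tiktoken  # OpenAI's tokenizer library

enc = tiktoken.encoding_for_model("text-davinci-003")  # example model
report_text = "Quarterly revenue rose while costs stayed flat."
prompt = "Summarize the following report:\n" + report_text

# The context window covers prompt plus completion, so leave room for the answer
prompt_tokens = len(enc.encode(prompt))
context_limit = 4097
print(prompt_tokens, "prompt tokens,", context_limit - prompt_tokens, "tokens left for the completion")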

I’m trying to import a PDF file all at once, which takes over 4,097 tokens. Is there a way to do that without hitting this length limit?

Consider filtering out low-value words like “the”, etc.
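
A minimal sketch of that idea, assuming a small hard-coded stop-word list (the list and sample text are only illustrative, and dropping words can change the meaning the model sees):

import re

# Illustrative stop-word list; extend or trim it for your own text
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "on", "is", "are"}

def strip_stop_words(text):
    # Keep every word that is not a stop word (trailing punctuation is ignored when matching)
    words = re.findall(r"\S+", text)
    kept = [w for w in words if w.lower().strip(".,;:!?") not in STOP_WORDS]
    return " ".join(kept)

sample = "The summary of the report is in the first section of the appendix."
print(strip_stop_words(sample))  # -> "summary report first section appendix."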

One option I haven’t seen mentioned is Dedicated Instances. It says in the docs:

"Dedicated instances

We are also now offering dedicated instances for users who want deeper control over the specific model version and system performance. By default, requests are run on compute infrastructure shared with other users, who pay per request. Our API runs on Azure, and with dedicated instances, developers will pay by time period for an allocation of compute infrastructure that’s reserved for serving their requests.

Developers get full control over the instance’s load (higher load improves throughput but makes each request slower), the option to enable features such as longer context limits, and the ability to pin the model snapshot."

Probably an expensive option, but an option nonetheless.
