How to increase context length

Most models have a context length of 2048 tokens (except for the newest models, which support 4096).
How can I increase this?

Welcome to the community.

There’s no way to increase the token limit on our end, but some people work around it by summarizing previous prompts and sending the summary instead of the full history.

What are you trying to do exactly? Maybe we can help.


Hi @parthgupta, I did a quick search and found this:

There was another thread on how to work around the token limit in a clever way. I will update this reply when I get time to find it.

As @PaulBellow has said above, summarization is the way to go. AFAIK, text-ada-001 is a good choice when the summary is meant for the same audience as the original text, since it is fast and cheap, whereas the davinci models are better at summarizing for a specific audience. In pseudocode, it might look something like this:

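# Condense the earlier turns with the cheaper model
summary = adaResponse(conversationHistory)
# Append the new user message to the summary
prompt = summary + "\nUser: " + input("User: ")
# Answer with the stronger model
response = davinciResponse(prompt)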
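
The pseudocode above could be fleshed out along these lines in Python. This is only a minimal sketch: `summarize` and `complete` are hypothetical callables standing in for whatever wrappers you write around the cheaper and the stronger model (they are not part of any SDK), and the 2000-character threshold is an arbitrary assumption.

```python
from typing import Callable, List


def build_prompt(
    history: List[str],
    user_input: str,
    summarize: Callable[[str], str],
    max_history_chars: int = 2000,  # arbitrary threshold, tune for your model
) -> str:
    """Build a prompt, replacing a long history with a summary of it."""
    transcript = "\n".join(history)
    # Only pay for a summarization call when the history is actually long.
    if len(transcript) > max_history_chars:
        transcript = "Summary of the conversation so far: " + summarize(transcript)
    parts = [transcript] if transcript else []
    parts.append(f"User: {user_input}")
    parts.append("AI:")
    return "\n".join(parts)


def chat_turn(
    history: List[str],
    user_input: str,
    summarize: Callable[[str], str],  # hypothetical wrapper around the cheap model
    complete: Callable[[str], str],   # hypothetical wrapper around the strong model
    max_history_chars: int = 2000,
) -> str:
    """Run one turn and record both sides of it in the history."""
    prompt = build_prompt(history, user_input, summarize, max_history_chars)
    reply = complete(prompt)
    history.append(f"User: {user_input}")
    history.append(f"AI: {reply}")
    return reply
```

Passing the two model calls in as arguments keeps the sketch independent of any particular client library and makes it easy to try out with stub functions first.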

My use case:
I have set up the Completions API integration in my app. When I query it with a particular question, the result, including the prompt, is capped at about 4000 tokens. How can I increase this limit?

See two answers above. That’s about it at the moment. Good luck!
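
Since the cap covers the prompt and the result together, it can help to work out how much room is left for the answer before sending a request. A tiny sketch of that arithmetic, assuming you already know the prompt's token count; the function name is made up for illustration and the 4097 default is simply the figure mentioned in this thread:

```python
def completion_budget(prompt_tokens: int, context_limit: int = 4097) -> int:
    """Tokens left for the completion once the prompt has been counted."""
    # The limit is shared, so a longer prompt leaves less room for the answer.
    return max(context_limit - prompt_tokens, 0)
```

For example, a 3000-token prompt leaves 1097 tokens for the result, and anything at or above the limit leaves none.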

I’m trying to send an entire PDF file in a single request, which takes more than 4097 tokens. Is there a way to do that without running into the length limit?
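
One common workaround for a long document is to split it into pieces that each fit under the limit and process them one at a time (for example, summarizing each piece). Below is a rough sketch, assuming the text has already been extracted from the PDF. The 4-characters-per-token figure is only a rule of thumb for English text, not an exact count, and the function name and defaults are made up for illustration.

```python
from typing import List


def split_into_chunks(
    text: str,
    max_tokens: int = 3000,    # assumed budget, leaves room for the completion
    chars_per_token: int = 4,  # rough estimate, not a real tokenizer
) -> List[str]:
    """Greedily pack paragraphs into chunks under an estimated token budget."""
    max_chars = max_tokens * chars_per_token
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is too long on its own.
        pieces = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Adding this piece would overflow, so close the current chunk.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. For exact budgeting you would replace the character estimate with a real token count.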

Consider filtering out low-value words like “the”.
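
A minimal sketch of that kind of filtering is shown below. The stop-word list is a small hand-picked sample rather than a standard one, and the function name is made up for illustration. Tokens are not the same as words, so the savings will vary, and stripping words may hurt the quality of the answer.

```python
# Small illustrative list; a real one would be longer and tuned to your text.
STOP_WORDS = {
    "a", "an", "the", "of", "to", "and", "or",
    "is", "are", "was", "were", "that", "this",
}


def drop_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Remove low-value words, ignoring case and surrounding punctuation."""
    kept = [
        word
        for word in text.split()
        if word.strip(".,;:!?\"'()").lower() not in stop_words
    ]
    # Note: joining on single spaces also collapses line breaks.
    return " ".join(kept)
```

For instance, `drop_stop_words("The cat is on the mat.")` returns `"cat on mat."`.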

One option I haven’t seen mentioned is Dedicated Instances. It says in the docs:

"Dedicated instances

We are also now offering dedicated instances for users who want deeper control over the specific model version and system performance. By default, requests are run on compute infrastructure shared with other users, who pay per request. Our API runs on Azure, and with dedicated instances, developers will pay by time period for an allocation of compute infrastructure that’s reserved for serving their requests.

Developers get full control over the instance’s load (higher load improves throughput but makes each request slower), the option to enable features such as longer context limits, and the ability to pin the model snapshot."

Probably an expensive option, but an option nonetheless.