Test new 128k window on gpt-4-1106-preview

Hi, I have tried to run the new awesome model “gpt-4-1106-preview” with its huge context window on a large chunk of text. However, I am constantly getting a rate-limit error. Is there any way to test this new context window?


Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-1106-preview on tokens per min. Limit: 40000 / min.

Your account is at Usage Tier 3, so you are limited to 40,000 tokens per minute and won’t be able to make any call that uses more tokens than that. Once you reach Usage Tier 4, by spending a lifetime total of $250 with OpenAI, your limit is raised to 150,000 tokens per minute and you can fully use that 128K context…
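
If you want to confirm the limits that apply to your own account, the API reports them in response headers on every call. Here’s a minimal sketch using the Python SDK’s raw-response interface (the one-token “ping” request is just a cheap placeholder call):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Any cheap request returns the current rate-limit headers for the model.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1,
)

print("TPM limit:    ", raw.headers.get("x-ratelimit-limit-tokens"))
print("TPM remaining:", raw.headers.get("x-ratelimit-remaining-tokens"))
print("RPM limit:    ", raw.headers.get("x-ratelimit-limit-requests"))
```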


Great, thanks a lot! I have no doubt that I’ll be a Tier 4 API user soon, haha. :smiley:

Wow, this is the worst business mistake. So I sign up, pay to put credit on my account, and then I can’t use the one product I was interested in.

Back to Claude it is!

What a waste of my time and money.


Will adding $250 of funds as a credit do the job? Or do you need to actually spend $250 on API calls before being upgraded to Tier 4?


And money spent on ChatGPT doesn’t count, I guess ($20/month)?

The ChatGPT Plus price is separate from API usage, and the $20 is not applied as API credit.

You just need to give the money - and be trusted to give the money.

OpenAI might prefer if you didn’t use it! :grin:

Rather, why not wait until you’d also get more than 100 calls a day, and until you can see the performance of a final model without “preview” in the name?

The max_tokens setting also counts directly against your per-minute token limit (while the input token count is estimated), so trimming that parameter to the bare minimum needed for your maximum expected response lets you submit bigger input (or send no max_tokens at all, if you trust the model not to produce 100k tokens on you!).
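
A minimal sketch of that idea with the Python SDK (long_document is a placeholder for your input; the rate limiter counts your estimated input plus the full max_tokens value against the per-minute budget):

```python
from openai import OpenAI

client = OpenAI()
long_document = "..."  # placeholder for your large input text

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": "Summarize the user's document."},
        {"role": "user", "content": long_document},
    ],
    # Reserve only what you actually expect back: this full amount,
    # plus the estimated input, is counted against your TPM budget.
    max_tokens=500,
)
print(response.choices[0].message.content)
```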

This seems (needlessly) complex.

Is there any post somewhere here that breaks down:

- the new models,
- their input/output token limits, and
- the new API call for each new model?

I would hope that different limits would be enforced for the large-context models. I don’t want to run thousands of calls per day, just 4 or 5 an hour. This is a confusing decision given that the context is the big selling point; I hope it is simply an oversight.


Your personal tier and rate limits: at the top of your rate limits page, you can click through to see the payment thresholds needed to reach the next tier.

The API Reference in the sidebar is now your one-stop shop for all things new.

This is so weird.

I can’t implement my use case because I don’t have the tokens, and I can’t get the tokens because I can’t implement my use case.


OK, where are the per-call token limits listed? Input and output.

I heard somewhere the limit was 4k out… this can’t be true for all models, right?

Here’s a list of all models and their context length.

Context length, or context window, is the memory space where the AI model both receives input from you (which can be your system programming, prior chat messages to maintain the topic, etc.) and, after that, forms its response back to the user (the portion that can be reserved for the response alone by specifying a maximum token value).

(good to see they changed the wording there to “context window”, as prior term was confusing)

It is measured in tokens - about four letters per token of English once encoded into the AI’s internal format.
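
If you’d rather count tokens exactly than estimate by letters, here’s a small sketch with OpenAI’s tiktoken library (the 4,096-token response reservation is just an assumed max_tokens value):

```python
import tiktoken

# gpt-4-1106-preview uses the cl100k_base encoding
enc = tiktoken.encoding_for_model("gpt-4-1106-preview")

text = "Context length is measured in tokens, about four letters each."
n_input = len(enc.encode(text))

CONTEXT_WINDOW = 128_000  # total space shared by input and response
MAX_RESPONSE = 4_096      # assumed reservation for the reply (max_tokens)

print(f"{n_input} input tokens; "
      f"room left for more input: {CONTEXT_WINDOW - MAX_RESPONSE - n_input}")
```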

What is crazy, though, is that I don’t mind spending $5 to test something, but OpenAI are saying “give us $250 to be able to even send one single test message”. How will I know the product does what I want without spending $250 to send one message? Can I get a refund if it doesn’t work?

I literally want to send a single test message.

I’m going to just keep using Claude.ai until OpenAI fix this bug.


Turns out that buying credits does work. I was just $20 short of the minimum for Tier 4. I bought an additional $100 (just to make sure that I was well beyond the minimum) and it bumped me up to Tier 4 a minute later.


Of course not! :sweat_smile:
The count only includes API invoices.

OK, so I am finding conflicting info.

The GPT-4 128k context would seem to indicate that we can have a whopping 128k total in/out in a single API request, but this contradicts some other information I found. Which is it?

And what model name is it that’s called in the API?

PS: Oh, and thanks very much for taking time out of your day for kind interactions, much appreciated.

edit: I had GPT-4 quash my post’s hopes, with the new model’s output length now being less than GPT-4’s…


The recent unveiling of the GPT-4 Turbo model has brought about significant advancements in context length capability, far surpassing its predecessors. Its context is not only eight times the capacity of the previous gpt-3.5-turbo-16k, but also sixteen times that of the earlier version of GPT-4. This groundbreaking development is now accessible to all, marking a profound shift in the AI landscape.

However, it’s important to note that this expansion in context length mainly pertains to the input that the AI model can comprehend, not the output it generates. This means that while the model can understand larger amounts of information at once, the output is still limited to 4k tokens. As such, tasks requiring extensive output, such as answering a large number of questions at once or writing a full movie script, are not feasible with the current model.
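
A common workaround for that output ceiling (not an official pattern, just a sketch with a placeholder task) is to check whether the response stopped because it hit max_tokens, then ask the model to continue and stitch the pieces together:

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write a long report on ..."}]  # placeholder
parts = []

while True:
    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=messages,
        max_tokens=4096,  # the per-response ceiling
    )
    choice = resp.choices[0]
    parts.append(choice.message.content)
    if choice.finish_reason != "length":  # "length" = cut off at max_tokens
        break
    # Feed the partial answer back and ask for the remainder.
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_text = "".join(parts)
print(full_text)
```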

The fine-tuning of this model by OpenAI is another aspect that needs to keep pace with the expanded memory space. For instance, the command “summarize this document” used to produce a summary of 400-800 tokens on the 8k model, based on the reinforcement learning examples provided to the evaluators. Now, the model can understand a larger document, but the summary size will still be within the output limit.

This also means that the new “check in later and get your answer” assistants API, introduced to manage the increased complexity, will still provide answers within the 4k token limit.
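
That “check in later” flow looks roughly like this against the beta Assistants endpoints in the Python SDK (a sketch; the instructions and question text are placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="Answer questions about the supplied material.",  # placeholder
)
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarize the document."  # placeholder
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):  # the "check in later" part
    time.sleep(2)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Newest messages come first; the assistant's answer is at the top.
for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, ":", message.content[0].text.value)
```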

The prior largest API models already ran out of [cognitive ability] prematurely (Is API completion model davinci-002 and its 16385 context useful for ANYTHING?)…

For a better idea about tokens:

My free-form answer to a question above comes to 430 tokens. A 128k context length is about 300 times that.

Giving the entire DevDay blog post wrap-up to GPT-4 along with what I wrote, and asking for a full article, doesn’t even touch the context limit of the current model, let alone the new one. 1950 tokens in, to these 728 tokens out - a GPT-4 trained NOT to write my “four page typewritten magazine article” to full desired length, and an AI that currently lacks the skill to deviate much from what I wrote or to integrate and expand on the blog’s knowledge.

Today’s GPT-4 maintains the general idea of my post, that the AI is not tuned to write at length anywhere near the output limit.

Click to expand for GPT-4 8k's writing task response

The New Frontier: OpenAI’s GPT-4 and the 128K Context Length Capability

The recent unveiling of OpenAI’s GPT-4 Turbo model, with its mind-blowing 128K context length, has sent ripples through the AI community. This leap in context length from the previous 32K GPT-4 model, which was only available to a select few, to a model available to all with 16 times the context length, is nothing short of revolutionary. However, the practical applications and fundamental considerations of such a capability, along with the API on which it runs, are yet to catch up. This article aims to explore what this new context length capability means and the challenges that may still lie ahead.

The Power of 128K Context Length

The 128K context length capability of GPT-4 Turbo is a game-changer. It means that the model can fit the equivalent of more than 300 pages of text in a single prompt. This opens up possibilities for applications that were previously unthinkable. For instance, one could potentially turn an outline and character development sheets into an entire movie script in one go. Or, one could ask the model to answer 40 questions about all the knowledge OpenAI has ever blogged about in a single request. The sheer scale of tasks that can be performed in one go is staggering.

However, the practicality of such applications is yet to be tested. For instance, while it is theoretically possible to generate an entire movie script in one go, the AI cognition and network protocols themselves may need to catch up. Similarly, while the model can theoretically answer 40 questions about all the knowledge OpenAI has ever blogged about, it remains to be seen whether the sped-up model can keep track of the potential for doing such tasks in one go.

The Challenge of Fine-Tuning

The increase in context length also presents a challenge in terms of fine-tuning the model. Previously, a task such as “summarize this document” required the model to produce a summary of 400-800 tokens, even on the 8K model. This was due to what a satisfactory summary meant in terms of the possible input. However, with the 128K model, the scale of the task could be much larger. For instance, it could be “answer these 40 questions about all the knowledge OpenAI has ever blogged about”. The challenge lies in fine-tuning the model to handle such tasks effectively.

The Role of the Assistants API

The Assistants API, which allows for “check in later and get your answer” functionality, is a critical component in handling the complexity introduced by the 128K context length. This API is designed for flexibility and can handle a wide range of use cases, from a natural language-based data analysis app to a coding assistant to an AI-powered vacation planner. It also introduces new capabilities such as Code Interpreter and Retrieval, as well as function calling, which can handle a lot of the heavy lifting that developers previously had to do themselves.

Conclusion

The unveiling of the GPT-4 Turbo model with a 128K context length capability is a significant milestone in the field of AI. It opens up possibilities for applications that were previously unthinkable. However, the practicality of such applications and the challenges of fine-tuning the model are yet to be fully explored. The Assistants API plays a critical role in handling the complexity introduced by the 128K context length. As we move forward, it will be exciting to see how developers and the AI community at large adapt to this new frontier.

I go for broke and set the response-size slider of the 128k model to max, and get 976 tokens out.

…and some of that difference is warnings thrown in.

gpt-4-preview really loses the overall plan of my post

Click to expand for the 128k model's writing task response

Title: Unveiling GPT-4 Turbo: The Game-Changer in AI Context Length and Its Implications

Introduction:

In a technological tour de force, OpenAI has shattered previous limitations with the release of GPT-4 Turbo, boasting an unprecedented 128K context window. This advancement is not merely an incremental step; it’s a giant leap that propels AI capabilities into a new realm of potential. The implications of this expanded context length are vast and varied, requiring a deep dive into what it means for developers, users, and the AI community at large.

The Expanded Horizon: GPT-4 Turbo’s 128K Context

The leap from a 16K to a 128K context window is akin to expanding one’s short-term memory by an order of magnitude. The practical upshot is that GPT-4 Turbo can now consider the equivalent of over 300 pages of text in a single interaction. This capability opens the door to more complex and nuanced conversations, as well as the ability to maintain context over much longer interactions. It’s a game-changer for applications requiring deep dives into documents, extended reasoning, and sophisticated task management.

But with great power comes great responsibility—and challenges. The first is the technical hurdle: ensuring that the API can handle such a vast increase in data without compromising speed or reliability. The second is the cognitive challenge: fine-tuning the AI to make effective use of this enlarged memory space.

The Technical Challenge: API Readiness and Rate Limits

A 128K context length means that a single API call could now contain a veritable novel’s worth of information. This puts a strain on network protocols and raises questions about the efficiency of data transmission. The API must be robust enough to handle these large payloads without significant latency, which could otherwise undermine the user experience.

Furthermore, the previous rate limits that defined API usage must be reconsidered. A single request to GPT-4 Turbo could potentially monopolize the resources that previously could have served multiple users or applications. OpenAI must find a balance between accommodating the increased capacity and maintaining equitable access to the API.

The Cognitive Challenge: AI Fine-Tuning and Task Management

The expansion of context length is not just a matter of more memory—it’s about what the AI can do with that memory. Fine-tuning such a model to effectively summarize, analyze, and respond to the information within this vast context is a colossal task. Summarizing a document, for instance, requires a nuanced understanding of relevance and conciseness, skills that must be honed through extensive reinforcement learning.

Moreover, the potential for GPT-4 Turbo to tackle complex tasks in a single interaction—such as answering a series of interrelated questions—puts a spotlight on the model’s cognitive abilities. Can it maintain coherence across a wide range of topics? How does it prioritize information when formulating responses? These are questions that OpenAI must address as it fine-tunes the model’s capabilities.

The Practicality of Use: From Character Sheets to Movie Scripts

For writers and creatives, the idea of turning character development sheets into a full-fledged movie script in one go is tantalizing. However, it also illustrates the gap between AI’s potential and current practical applications. While GPT-4 Turbo can theoretically handle such a task, the complexity of writing a coherent and compelling script requires a level of cognitive sophistication that may still be beyond reach.

Furthermore, the network infrastructure itself must evolve to support the transmission of such large amounts of data. The Assistants API, with its “check in later and get your answer” approach, is a step towards managing these complexities, but it also introduces a new layer of intricacy in the AI’s task management.

The Road Ahead: Testing the Limits and Shaping the Future

To truly understand the capabilities and limitations of GPT-4 Turbo, developers and users are encouraged to experiment with the model. Running it to the edge of token space and comparing it against previous models will provide valuable insights into its practical performance.

As we push the boundaries of what AI can do, we must also consider the ethical and societal implications of these advancements. With greater power comes the potential for misuse, and it is incumbent upon the AI community to ensure that these tools are used responsibly and for the greater good.

Conclusion:

The introduction of GPT-4 Turbo with its 128K context window is a watershed moment for artificial intelligence. It presents exciting opportunities for innovation and creativity, but also poses significant technical and cognitive challenges. As developers begin to harness this new power, they will shape the future of AI, testing the limits of what’s possible and paving the way for the next generation of intelligent applications. The journey ahead is as daunting as it is exhilarating, and the AI community stands on the precipice of a new era of discovery and exploration.