It looks like GPT-4-32k is rolling out

OpenAI has this; it’s called the Retrieval Plugin! Not sure about ultra cheap, but you can pick and choose your vector DB, which is normally the largest cost driver.

1 Like

Interesting result. Claude mentioned names, while GPT-4 did not, which I found annoying.

I’m a little skeptical of this twitter account because he’s fudged some tweets in the past. Maybe he fudged this as well, maybe not.

It would also be good to see more holistic prompts. These summary prompts can easily work on a retrieval/chunking basis.

Better would be, imho:

1 Like

Hmm, from what I saw in the tweet, GPT-4 did mention names. I thought both responses were similar, and pretty accurate. I watched the first 75% of it.

Not sure what Sam Altman’s strategy is though, walking into the lion’s den; keep your enemies close? Looks like Congress and their constituents are mostly interested in suing AI companies for all sorts of things. Nobody cares about SMI, or AGI, or whatever “singularity” stuff Altman is worried about.

I’m new to GPT, and I just subscribed to GPT Plus on my account. But I cannot find the ‘gpt-4’ model on the model rate limit page or the Playground page.
In my API calls, I also cannot find the gpt-4 model in my model list API result. GPT-3 and 3.5-turbo are available. Can anyone help me?

Hey champ and welcome to the community developer forum :heart:

I understand your confusion: ChatGPT and the API are two different products. Purchasing ChatGPT Plus gives you access to GPT-4 (and beta features such as plugins and browsing) through the ChatGPT website, not the API :hugs:

1 Like

There’s always good reason to test everything with the cheapest model. I was playing around with running agents multi-threaded and got the following error:

User (yes/no): yes
agent tasks [': Find a Danish "lagkage" recipe without nuts or nut-based ingredients to accommodate a guest with a nut allergy.']
An exception occurred: This model's maximum context length is 4097 tokens. However, your messages resulted in 138155 tokens. Please reduce the length of the messages.
An exception occurred: This model's maximum context length is 4097 tokens. However, your messages resulted in...

The total number of tokens requested across the messages was 1,936,916, or approximately $116.21 if I had used GPT-4-32K.

That’s quite expensive for a cake recipe :sweat_smile:
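A quick back-of-the-envelope sketch of that cost estimate. The per-1K-token prices below are assumptions based on OpenAI’s published prompt-token rates at the time; check current pricing before relying on them:

```python
# Rough guard against runaway agent loops: estimate prompt cost before sending.
# Prices here are assumed per-1K prompt-token rates, not authoritative.
PROMPT_PRICE_PER_1K = {
    "gpt-3.5-turbo": 0.002,
    "gpt-4": 0.03,
    "gpt-4-32k": 0.06,
}

def estimate_prompt_cost(token_count: int, model: str) -> float:
    """Return the approximate USD cost of sending `token_count` prompt tokens."""
    return token_count / 1000 * PROMPT_PRICE_PER_1K[model]

# The 1,936,916 tokens from the failed run above:
print(round(estimate_prompt_cost(1_936_916, "gpt-4-32k"), 2))  # ≈ 116.21
```

Checking the estimate against what your agent is about to send (e.g. with a tokenizer like tiktoken) before the call is a cheap way to avoid surprises like this.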

1 Like

Haha, 1024k I want that! :star_struck:


I’ve been maxing out 8K prompts lately, trying to iteratively generate a technical design document. Some observations

  • Feeding in full 8k context, requirements document and current version of the design document helps…
  • … but GPT4 is missing a lot of the subtle details and nuances when I provide a full 8k context.
  • When we talk about 8k/32k context, we should stress that this is input context. There may be some tasks (like translation) that GPT4 can do for output, but it really fails to generate anything useful beyond a certain point for things involving complex reasoning.
  • ergo, use it to generate small sections / snippets of text and code, maybe no more than 1000 tokens. Replace those in the current doc and refeed again.

The last point may be obvious to some folks, but the documentation and pricing obfuscate it.
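The iterate-and-splice workflow above can be sketched as a small loop. Everything here is hypothetical scaffolding: `llm_complete` is a stand-in for a real chat-completion call, and the section/prompt structure is just one way to do it:

```python
# Sketch of "generate small sections, splice back into the doc, refeed".
# `llm_complete` is a placeholder, not a real API function.

def llm_complete(prompt: str) -> str:
    # Stub: in practice, call the chat completions API here.
    return f"<draft for: {prompt[:40]}...>"

def iterate_design_doc(requirements: str, doc_sections: dict) -> dict:
    """Regenerate each section (kept under ~1000 tokens) with the full
    requirements + current doc supplied as context each time."""
    for name in doc_sections:
        context = requirements + "\n\n" + "\n\n".join(doc_sections.values())
        prompt = (f"{context}\n\nRewrite ONLY the section '{name}'. "
                  "Keep it under 1000 tokens.")
        doc_sections[name] = llm_complete(prompt)  # splice replacement back in
    return doc_sections
```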

My general impression is that as a textual pattern/replacement matcher, GPT4 is pretty good (obviously way superior to a human), but when you ask it to start reasoning, there are significant limitations in its capability. It can reason, but only while juggling a few variables (way fewer than a human can) and with only a small lookahead (again, way less than a human can manage). You can work around this via extensive CoT and self-critique, but at the cost of a lot of tokens and time.

It’s early, but I’m beginning to suspect the right approach is to use reverse prompting to substitute human reasoning for GPT4 reasoning and just leverage the textual pattern/replacement.

Another conclusion I have: skip trying to extend the context via the heads and just go full retrieval architecture. We’ll get to an effectively infinite input context almost immediately. Extending context in any other way isn’t working afaict and will only produce slightly superior results with very limited windows, and those results will probably only be on holistically related tasks.

Tasks that can be solved on a retrieval basis will likely perform better in smaller contexts. In fact, we’ll probably end up using retrieval architectures anyway…
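The retrieval idea boils down to: embed chunks once, then pull only the top-k most similar chunks into the prompt instead of growing the window. A minimal sketch, where `embed` is a toy stand-in (in practice you’d call an embeddings endpoint and a vector DB):

```python
# Hedged sketch of retrieval-over-chunks. The embedding here is a toy
# bag-of-letters vector just to make the example runnable end to end.
import math

def embed(text: str) -> list:
    # Toy embedding: letter frequency counts. Replace with a real model.
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Only the retrieved chunks go into the prompt, so the effective input corpus can be arbitrarily large while the context window stays small.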


I don’t have access to GPT-4 32K (and don’t really care about that) but I’d say that latency is always going to be related to output tokens…

In my mind the biggest use case for a larger context window is showing the model more input text but I’m working on ways around even having to do that…

I have a new AI SDK coming out at Microsoft //build next week (unfortunately not my Agent stuff) but once I get past that I have a set of experiments I’m planning to run around getting GPT to read and answer deep questions over long documents on the order of 50k - 100k tokens and I’m planning to use crappy old GPT 3.5 for my test. I have this idea for a multi-pass approach that I believe will work with any length document and even with GPT 3.5.

I’m not convinced you need mega large context windows if you’re willing to give GPT a little more time to process things.
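One way to read that multi-pass idea (the actual SDK may do something different; function names here are mine) is a map-reduce over the document: answer per chunk, then merge the partial answers in a final pass:

```python
# Hypothetical sketch of multi-pass QA over a long document.
# `ask(prompt) -> str` is whatever model call you have, even gpt-3.5.

def split_into_chunks(text: str, chunk_chars: int = 2000) -> list:
    """Naive fixed-size splitter; real code would split on sections/sentences."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def answer_over_long_doc(question: str, document: str, ask) -> str:
    # Map: answer the question against each chunk independently.
    partials = [ask(f"Context:\n{c}\n\nQ: {question}\nAnswer from context only.")
                for c in split_into_chunks(document)]
    # Reduce: merge the partial answers in one final call.
    return ask(f"Combine these partial answers to: {question}\n" + "\n".join(partials))
```

Since each pass only ever sees one chunk, document length is bounded by time and cost rather than by the model’s context window.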


Can you elaborate? Just curious what you’re thinking here.

I’ve been doing this to solve some RW tasks and the results have been competitive with what another expert at my level could do.

GPT4 is really superior at pattern matching and then massaging the text it finds to fit your prompts.

On top of that, it can only do some very limited reasoning, but nothing particularly competitive with what a human can do.

The other problem is that it lacks the total context usually required to competently do a task.

The key to optimal GPT4 usage I am beginning to believe is leveraging the pattern matching and finding a way to inject your own reasoning skills. The best way to do this is via ‘reverse prompting’ which is getting GPT4 to ask you questions as if you were the expert.

This does two things - it injects your superior human reasoning skills and it helps complete the context that GPT4 is missing. The results are really great.

This still fails for some things, like complex documents (e.g., a technical design document), but it works for others that are not as cognitively complex (e.g., a requirements document).

That said, even when it fails, it still asks a lot of great questions which help you realize what you forgot to consider.
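The reverse-prompting loop described above can be sketched in a few lines. Both callables are placeholders: `ask_model` stands in for a real chat-completion call, and `answer_question` is the human in the loop:

```python
# Minimal reverse-prompting sketch: the model interviews YOU, and your
# answers become the context for the final draft. Names are hypothetical.

REVERSE_PROMPT = ("You are drafting a requirements document. Before writing, "
                  "ask me one question at a time about anything you are missing.")

def reverse_prompt_session(ask_model, answer_question, rounds: int = 3) -> list:
    """Collect (question, answer) pairs; feed the transcript back each round."""
    transcript = []
    for _ in range(rounds):
        question = ask_model(REVERSE_PROMPT + "\n" + repr(transcript))
        answer = answer_question(question)  # the human supplies the reasoning
        transcript.append((question, answer))
    return transcript
```

The transcript then gets prepended to the final “now write the document” prompt, which is where the injected human reasoning pays off.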


Ok this is the path I thought you might be going down… You don’t actually need to get it to ask you a question so you can save a model call…

You just need to give it human inspiration and it will make better decisions…

1 Like

Try asking yourself next time you do this and you look at the returned content - could I patent this? You might try, but you’ll probably find there exists a lot of prior art and it’s not particularly novel.

Asking GPT4 is a great way to learn, but it’s not yet at a point where you can innovate with it, not without injecting significant amounts of human reasoning. Reverse prompting is the best way I’ve found to do this.


Innovation is something that incorporates new ideas, knowledge, and thought. You can build your innovation on top of existing technology, but you will have to add significant amounts of your own thought to make it truly “new”.

I use the same method as you (reverse prompts) when I’m writing experimental descriptions and lab guides, because it helps me catch information and context that I forget to write.

This method has made teaching complex topics earlier both feasible and much easier. Just the other day we had second-year undergraduate students measure the quantum efficiency of a photolysis reaction, something that’s usually a post-grad lab experiment because it requires a lot of contextual knowledge.

The take home advice here is:
Talking to GPT is like talking to a smart person who doesn’t understand the context of what you’re trying to do. Don’t view this as a limitation; use it to your advantage by having GPT ask you questions about your task.


I joined the waitlist on March 15th and I’m still waiting on even basic GPT-4 access! And that despite being an OS Dev …


The only news I can report is that 32k is WAAAY faster in Playground than either davinci-003 or GPT-4-8k, on the User message/prompt of “Who is going to win Eurovision in 2023?”. (Bogus, yeah, I know).

This isn’t scientific or anything, but 32k is like a hypersonic missile in how fast it completes versus the others with this prompt. Haven’t benched the API version yet, but this is repeatable, and I’m wondering what others are seeing in 32k performance vs. the other models.

I am starting to track my output tokens per second too for each call, and logging to a database. Hopefully I can publish a graph over time which depicts the performance vs. time on this key metric.
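A minimal version of that per-call logging, assuming sqlite3 as the database; the schema and function names are my own, not from any SDK:

```python
# Log output tokens per second for each completion call to a small database.
# Using an in-memory sqlite3 DB here; point it at a file for real tracking.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE perf (model TEXT, tokens INT, seconds REAL, tps REAL)")

def log_call(model: str, output_tokens: int, seconds: float) -> float:
    """Record one call's throughput and return tokens/sec."""
    tps = output_tokens / seconds
    db.execute("INSERT INTO perf VALUES (?, ?, ?, ?)",
               (model, output_tokens, seconds, tps))
    db.commit()
    return tps
```

Wrap each API call with a timer (`time.perf_counter()` before and after), grab `usage.completion_tokens` from the response, and pass both in; graphing `tps` over time per model then falls out of a simple `SELECT`.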


Could that be due to low usage levels though? I wonder how much of that is load? I would assume the model is roughly the same size memory-wise as GPT-4 8K, so it’s more a function of technique to get it to output more tokens, and that should still take somewhat constant time. If GPT-4 32k were 4x larger than 8K, I would expect it to take roughly 4x as long to generate a 32k output.

The other explanation would be that they’re running it on beefier hardware with more memory because it would need 4x memory to work.


They’re definitely not taking good care of their developer community. I don’t have basic GPT-4 access either, even though they just sent me email saying how valued I am as a level-2 contributor to the community… I have other means to access GPT-4 thanks to Microsoft but I personally feel like if you’re a level-2 contributor to the community you should have access to everything… Massive fail otherwise…

1 Like

May I ask a stupid question?

Is 32k up and running through the API?
If yes, why do I still get an error when I try to use the model ‘gpt-4-32k’?

1 Like

Yes, it is live in the API, but the invites are slow to roll out. Check the Playground dropdown under “Chat” to see if you have it. You also need gpt-4 before you get granted gpt-4-32k. So get on the gpt-4 waitlist if you haven’t already.

1 Like