It looks like GPT-4-32k is rolling out

In theory yes, but in practice, I don’t see that.

I’ve tried various prompts, and while the model handles up to 8k of input reasonably well, most of my prompts that ask for something fairly complicated longer than 1K tokens get a response like “As an AI language model, I can’t do that.”

Does anyone know the default for max_tokens in the API call? In the Playground the slider only goes up to 2048 tokens.


In the docs it says it defaults to inf

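For what it’s worth, a minimal sketch of setting max_tokens explicitly via the API (using the 0.x `openai` Python package; the key, prompt, and cap are placeholders). Per the docs, omitting the parameter on the chat endpoint means generation is bounded only by the remaining context window:

```python
# Minimal sketch: cap the completion length explicitly instead of relying
# on the default ("inf", i.e. whatever fits in the remaining context).
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4-32k",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the attached report."},
    ],
    max_tokens=8000,  # explicit cap on the completion, not the total context
)
print(response["choices"][0]["message"]["content"])
```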

I don’t think that has anything to do with the token size. You should give GPT some kind of role in the system message. I’m not really sure what you’re trying to do.

I also don’t understand why they limited the Playground to a 2048 max.

The prompt does matter, right.

All that context length isn’t very useful if it can’t generate anything interesting to use it up.

32K sounds fun, but if you can’t actually generate something interesting that’s 32K in size, it sounds like it’s mostly useful for input.

Congrats, Curt. I have access to the gpt-4 API, but no access yet to the gpt-4-32k model, although I have noted that OpenAI have updated the documentation. I still feel special though :slight_smile:


With a much larger context window, you can do so much more.

I think a popular application for many companies will be Q&A chatbots. They could ditch embeddings and the complications those bring, fit their entire dataset into the 32k prompt, and use the API directly, without complicated correlations and databases. I know it’s a hassle for a lot of folks, and this could help them directly (rough sketch of the idea below).
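Something like this hypothetical sketch (placeholder file and question, 0.x `openai` package) is all it would take:

```python
# Hypothetical sketch: skip the vector DB and stuff the whole knowledge
# base into the prompt, relying on the 32k context window to hold it.
import openai

knowledge_base = open("company_faq.txt").read()  # must fit in ~32k tokens,
                                                 # minus room for the answer

response = openai.ChatCompletion.create(
    model="gpt-4-32k",
    messages=[
        {"role": "system",
         "content": "Answer using only the documentation below.\n\n" + knowledge_base},
        {"role": "user", "content": "What is your refund policy?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```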

Another obvious one is summarization/understanding of large amounts of data at once. My first thought here is that with AI agents pouring out massive amounts of data, you could create amazingly insightful summaries that distill it all down to a polished, cohesive representation built from the micro-projections the agent produced along the way. I was going to fold this into my CurtGPT agent as a first step.

Finally, and I know this example is silly, but feed in the IRS tax code and have it do my taxes! Remember, GPT-4 passed the bar exam. Half-joking aside, I’m not going to ditch my CPA yet, but larger context plus advanced reasoning with GPT-4 is the ultimate one-two punch for gaining deeper insights into these fact- and reasoning-rich domains. And maybe it’ll get me to ask a smart question or two of my CPA. :rofl:


Pricing, according to OpenAI, is twice as much per token. Perhaps you’ll get it back when you complete your tax return :slight_smile:

8K context: $0.03 / 1K prompt tokens, $0.06 / 1K completion tokens
32K context: $0.06 / 1K prompt tokens, $0.12 / 1K completion tokens
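To put those numbers in perspective, a back-of-the-envelope calculation for one large call (assuming a hypothetical 24k-token prompt plus an 8k-token completion):

```python
# Cost of one maxed-out gpt-4-32k call at the prices quoted above.
PROMPT_PRICE = 0.06 / 1000      # $ per prompt token
COMPLETION_PRICE = 0.12 / 1000  # $ per completion token

prompt_tokens, completion_tokens = 24_000, 8_000
cost = prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE
print(f"${cost:.2f} per call")  # -> $2.40 per call
```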

Yeah, for massive rollouts the pricing needs to go down. I expect it will, just as it has in the past. For now we’re paying for all the non-recurring expenses (NRE) OpenAI put into the models, but costs should and will come down over time, and what we’re left with is the core technology. Which is exciting/scary/good all at the same time!


Agreed,

I think embeddings will still have their place for most companies because of the optimization they bring. But it’s still amazing for chatbot applications that use a sliding context window, especially for more complicated tasks. I often have to copy-paste context from earlier in the conversation because it slides out of the 8k context window, and when I do, that pushes out even more context.
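For anyone curious, the sliding window itself can be as simple as evicting the oldest turns until the conversation fits the budget. A hypothetical sketch using tiktoken for the token counts (it ignores the small per-message overhead):

```python
# Hypothetical sliding-window sketch: keep the system message, evict the
# oldest user/assistant turns until the conversation fits the token budget.
import tiktoken

def fit_window(messages, model="gpt-4", budget=8_000):
    enc = tiktoken.encoding_for_model(model)
    msgs = list(messages)

    def tokens():
        return sum(len(enc.encode(m["content"])) for m in msgs)

    while tokens() > budget and len(msgs) > 2:
        msgs.pop(1)  # index 0 is the system message; drop the oldest turn
    return msgs
```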

Totally agree here again. AI agents can also create large amounts of instruction and contextual data that doesn’t need to be shown to the user, but they tend to lose track of the original task if you allow them to consume too many tokens, again because the original context gets pushed out of the context window.


Agree. Embeddings are here to stay!

And yeah, 32k gives much more context to the model, so less juggling (less hassle) and less chance that embeddings will be required.

Although AI agents still need embeddings, because they can easily pour out >32k tokens in a short time. But with 32k I can make the summary more coherent and longer (maybe even more meaningful or impactful).

Knowledge bases will also be around for sure, though some pivot might be in store.

It’s interesting to see that the MosaicML model just came out with a 64K context. It might suffice for certain basic chatbot-query-type things, for folks who want to host their own.


What I think needs to be said here is that 32k tokens is roughly 40 pages of text (at ~0.75 words per token, that’s about 24,000 words).

That’s a lot of text, and users don’t want to read a response that is 40 pages long, but they will get frustrated if your chatbot forgets their original request

I think it’s great that they rolled out the 8k-token version first. I’m still going to recommend that people create and optimize their application for the 8k version and only upgrade to 32k when necessary, not only because it’s cheaper to send and receive fewer tokens, but also because it makes their application more responsive.

For reference, my current response time on GPT-4-8K is about 140–160 seconds on average, with an average input length of ~7500 tokens. But I’m happy; the responses I get are amazing. I’m expecting the 32k window will take roughly 10 minutes to process.

What is your experience with the response time on the 32k model?


I ran the timer and got 38 seconds for an estimated 860 tokens in the response. (Input was minor in tokens, and I assume most of the processing time is in the response.)

So, extrapolating this to 8k, I get approximately 6 minutes to respond (38 s ÷ 860 ≈ 0.044 s per token; 0.044 × 8,000 ≈ 350 s)!

Granted, this is extrapolating off of one point, but, yeah, it’s in the ballpark of your 10 minute estimate!
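If anyone wants to reproduce the measurement, here’s a rough sketch (one data point, as noted; the model and prompt are placeholders):

```python
# Rough timing sketch: measure seconds-per-token on one call, then
# extrapolate to an 8k-token completion (a single data point!).
import time
import openai

start = time.perf_counter()
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a short essay on context windows."}],
)
elapsed = time.perf_counter() - start

per_token = elapsed / resp["usage"]["completion_tokens"]
print(f"{per_token:.3f} s/token -> ~{per_token * 8_000 / 60:.1f} min for 8k tokens")
```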

Good things come to those who wait, I suppose :sleeping:


I wonder what the pricing is going to be like. I’ve already started to rack up a decent GPT-4 chat bill :smile:

It’s pretty stout at:
Input: $0.06 / 1K tokens
Output: $0.12 / 1K tokens

So I’ve got to build up the database and surrounding code to hold all these precious nuggets of wisdom, and use it sparingly and wisely. Oh, and I think I’ll also need to up my spend limit :sunglasses:

But I’m busy right now getting GPT-4-8k rolled out for my business, so 32k will be more experimental.


Heh, hmm.


Right @qrdl, another good use of 32k (or larger) is to feed a bunch of code into the prompt and ask it lots of questions.

This is one of the main reasons why LLMs could benefit from larger context windows; otherwise you would need some other underlying architecture, say, a graph (edges/nodes) holding the code, so the model can see the logical paths through it.

But 32k (or larger) solves this if you can feed it the code and the context in which it gets called.
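Something like this is all it takes once the window is big enough (hypothetical paths and question; 0.x `openai` package):

```python
# Hypothetical sketch: concatenate a small codebase into one prompt and
# ask questions that require seeing the call paths across files.
from pathlib import Path
import openai

code = "\n\n".join(
    f"### {path}\n{path.read_text()}" for path in Path("src").rglob("*.py")
)

response = openai.ChatCompletion.create(
    model="gpt-4-32k",
    messages=[
        {"role": "system",
         "content": "You answer questions about the codebase below.\n\n" + code},
        {"role": "user",
         "content": "Which functions can raise ValueError, and via which call paths?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```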


I wonder how he got it in a matter of ‘seconds’. Privileged account or just light load?

BabyAGI is, like it says, a little baby (maybe cuter, too). I based my CurtGPT on it, so you can run this small amount of code quickly through just about anything.

It’s 2069 words, just checked.

Your estimate of 38 s for 860 tokens implies that it should have been over a minute before he got a response.

The video showed an immediate response. I’m guessing load, but who knows.

The vid was from the Playground, not the API, and it took 30 seconds, not “seconds,” since I don’t count streaming start, only the final finish. (Plus, I used the API, not the Playground, in my measurements.)

And of course there’s the posts on API being slower than Playground too. But we won’t go there.

Also, I forgot to mention you can’t get 8k output in the Playground currently, since the slider maxes out at 2k. So to get 8k out, you need the API.