For those with access to gpt-4-32K, what do you see in openai.Engines.list()?
I see
gpt-4, permissions None, ready True
gpt-4-0314, permissions None, ready True (the March 14 checkpoint, presumably equivalent to the above)
will I see something like gpt-4-32k also in the list eventually so I can choose between?
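In the meantime you can poll the model list yourself to see when a 32k variant shows up. A minimal sketch (the helper and the example id list are mine, not from the API; in the older openai-python 0.x library the real ids would come from `openai.Model.list()`):

```python
def has_32k(model_ids):
    """Return True if any listed model id is a gpt-4-32k variant."""
    return any(m.startswith("gpt-4-32k") for m in model_ids)

# In practice you'd pull the ids from the API, e.g. (openai-python 0.x):
#   import openai
#   model_ids = [m.id for m in openai.Model.list().data]
model_ids = ["gpt-4", "gpt-4-0314", "gpt-3.5-turbo"]  # example account without 32k access
print(has_32k(model_ids))  # False until gpt-4-32k appears in your list
```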
Anyone with access to GPT-4-32k interested in helping out on a test? I want to know how good GPT is at condensing a full movie script into a synopsis. I'll pay for usage, of course.
I cannot wait to have it on my personal account. I got to play with this a little bit when it first came out because another account had access.
What I was able to test earlier, in applications of fiction: the 32k model was great at reading more material, but didn't necessarily write more when prompted. However, as a larger context window for the chain-type prompting that authors do by hand or with tools like Sudowrite's Story Engine, I think the 32k model has a lot of potential applications.
My experience so far is that when I fed it 19k tokens, it spat out a small 300-token response, smaller than what the typical GPT-4-8k version gives on a much smaller, similar input.
I was expecting the output to be much larger, and I was surprised it wasn't. But I'm not sure if this is a "prompt engineering" issue I had or something else. So good to know @eawestwrites that this is what you experienced too.
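For experiments like this it helps to know roughly how many tokens you're actually feeding in before paying for a call. The real count comes from a tokenizer (e.g. the tiktoken library); the ~4-characters-per-token heuristic below is just a rule of thumb, not an exact figure:

```python
# Rough token estimate before sending a long prompt -- ~4 characters per
# token is a common rule of thumb for English text, not an exact count.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

draft = "word " * 1000  # ~5000 characters of filler text
print(approx_tokens(draft))  # roughly 1250 "tokens" by this heuristic
```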
Weâll see how this really behaves as more folks start using it.
Still on 8k, sad. Considering I've been submitting feedback and have been on the waitlist since day one. Really wanted to compare it to Claude's new 100k model.
I want to play with prompts to get it to write more than just 300-500-word responses. Specific limits I want to test include:
What are the limits of the "memory" component if I ask it to draft, say, the next chapter?
Could I just say "and here's what happens in the next chapter?" followed by a list of commands? (I've had success with this on 8k)
Really, my hypothesis is that the best way to leverage 32k is to figure out a "chat"-like instruction sequence, almost like you're playing Managing Editor to your "junior writing partner."
Read this: (the story so far). Respond "got it" when you're done.
-got it-
What are your specific ideas to continue the story in the next chapter if {your ending} is where we want to end up in {8} chapters?
-get result-
That's great! What kind of prompt would I need to give you, including instructions to match the writing style of the existing book, to write that chapter and all of the plot ideas you have and to keep characterization?
And then that's probably where that prompt could be fed into GPT-4-8k or possibly even Turbo…
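The "managing editor" sequence above can be sketched as a chat-completions message list. This is only one way to structure it; the message contents are placeholders, not a tested recipe:

```python
# Placeholder inputs -- in practice these would be your actual text.
story_so_far = "..."   # the existing chapters, pasted in
ending = "..."         # where the book should end up

messages = [
    {"role": "system", "content": "You are a junior writing partner."},
    {"role": "user", "content": f"Read this: {story_so_far} Respond 'got it' when you're done."},
    {"role": "assistant", "content": "got it"},
    {"role": "user", "content": f"What are your specific ideas to continue the story in the next chapter if {ending} is where we want to end up in 8 chapters?"},
]
# The model's reply to the last message could then be turned into a drafting
# prompt and handed off to gpt-4 (8k) or gpt-3.5-turbo, as suggested above.
```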
I guess I'll just wait until 32k is out for everyone, unless there's someone I can ask for access.
I like your idea of playing managing editor for a junior writer AI
I think what you're asking about in terms of "memory" is the context window, i.e. the amount of context the model can "remember"; in this case that's 32k tokens (about 40 pages, as mentioned earlier).
With the prices mentioned earlier:
That will be ~$2-4 (excluding VAT and tax) every time you use the full 32k context window, meaning you will burn through the standard approved usage limit of $120 in 30-60 messages.
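A quick back-of-envelope check of those numbers, using the gpt-4-32k launch prices ($0.06 per 1k prompt tokens, $0.12 per 1k completion tokens; check the current pricing page before relying on these figures):

```python
# gpt-4-32k launch pricing -- verify against the current pricing page.
PROMPT_PER_1K = 0.06
COMPLETION_PER_1K = 0.12

def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens / 1000 * PROMPT_PER_1K + completion_tokens / 1000 * COMPLETION_PER_1K

full_window = cost_usd(31_000, 1_000)   # a nearly-full 32k window
print(f"${full_window:.2f}")            # about $1.98 per call
budget_calls = int(120 / full_window)   # calls before hitting a $120 limit
print(budget_calls)                     # roughly 60 calls
```

That lands at the low end of the "~$2-4, 30-60 messages" estimate; longer completions push the per-call cost toward the high end.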
Please keep experimenting
Think of GPT as a person who's really good at language and text, but doesn't understand what you want or in what context. You can achieve what you're asking for with a bit of creative use of gpt-3.5-turbo. If you want to know more, I can highly suggest the course:
I've gotten access as well, but I haven't done much 32k testing because I want to make sure I've tested my prompts first.
I think it would be really impractical if the model constantly tried to consume the maximum number of tokens without specifically being told to, so we may have to construct a task that does that.
I'm thinking something like a translation task into multiple languages might do the trick.
I've been testing 32k with my roguelike code base… expensive, but very helpful when the 4k or 8k can't handle the question and needs more context (code)… It still tends to direct you in weird directions sometimes to "solve" the problem… reminds me of the "Get rid of all spam" request where the AI says, "Okay, I will remove all humans and there will be no more spam." lol You have to be really specific. Same with fiction, though, really…
Like others, I've found that taking what it outputs and feeding it back to it as if the user came up with it can work well… or saying that you want to get to the root cause of the issue rather than just make the error/bug go away… Overall, though, super useful for a non-coder like myself.
Are you thinking a bigger context window could be worse in some cases? I guess it depends on what you're putting into the prompt(s)… garbage in / garbage out is more relevant than ever. Heh.
The question is, whatâs the best way to find bugs in code - prompt GPT4 8k at a time, or 32k at a time? Or both? Ignore the cost for a moment, just assume that isnât a factor.
Some bugs will obviously span chunking boundaries, so 32k by default may be required. But will it pick up subtle ones, or will you still have to go 8k at a time?
From my limited time with 32k and 3+ years with older GPT prompting, I'd say it's best to give it only what it needs to solve the problem or find the bug. However, in some cases I've found that I didn't include the offending function/method or whatever, and it had trouble "spotting the bug" without all the info/context… So it's a balance, I'd say… finding the right context length for your particular task. Bigger isn't always better, in my experience.
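If you do end up reviewing code 8k at a time, overlapping the chunks reduces the chance a bug straddles a boundary. A rough sketch (sizes here are in characters for simplicity, not tokens; the function and defaults are mine):

```python
def chunks(source: str, size: int = 8000, overlap: int = 1000):
    """Split source into fixed-size pieces, each overlapping the previous
    one by `overlap` characters, so boundary-spanning bugs appear whole
    in at least one chunk."""
    step = size - overlap
    return [source[i:i + size] for i in range(0, max(1, len(source)), step)]

code = "x" * 20_000
parts = chunks(code)
print(len(parts))                     # 3 chunks for 20k characters
print(len(parts[0]), len(parts[-1])) # 8000 and 6000
```

Each chunk would then be sent as its own review prompt; anything flagged near a chunk edge is worth re-checking in the neighboring chunk.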
Are you asking "will it be as good at finding subtle bugs within an 8k chunk IF I'm lucky enough to get the chunking right?" That seems a very different question than "will it be as good at finding subtle bugs in a random 8k chunk drawn independently from my chunking?" or "will it be as good at finding subtle bugs in code only 8k long?"
Three different questions, no? All interesting.
Does 32k have the same number of attention heads?
Yeah, exactly, I made the same point above. By default, you'll probably want to do 32k to get chunk-spanning bugs. But should you also do 8k? Presumably you wouldn't chunk randomly, but with reasonable cohesion.
I think the consensus is yes, if you want to be sure, but hopefully someone will do some official evals.
Sorry, I meant to reply to bruce. Can't seem to edit the reply-to, unfortunately.
I cannot imagine ever sending 32k tokens in pursuit of a bug; even 8k seems like a lot.
Thatâs just a lot of code.
But, I guess it really depends on the bug you're looking for. If the code throws an error, you definitely shouldn't need to send that much code.
If the code is outputting the incorrect result, that's different, but still a ton of code and probably not something an LLM can do unless the code is very well documented, you've got correct pseudocode and well-formatted algorithms to reference, or both.
It isn't really a bug pursuit, but rather just a second look. Getting to 100% code coverage in unit tests can be expensive, and it isn't just bugs but also security reviews, which GPT-4 is quite good at. It can be used for style commentary as well.
For actual known bugs, I actually find GPT-4 isn't that great, and I frequently have to intervene.