Many have noted that since Dev Day, outputs seem to be capped at roughly 850 tokens. This shows up as ChatGPT being lazy: leaving placeholder text for you to fill in, and refusing to write very much in a single message. I believe this is a result of how OpenAI has chosen to scale their model inference.
Please tell me why I am wrong, and what you think instead. I have not worked anywhere that runs large-scale model inference, so this is entirely speculation.
My Guess at OpenAI’s Inference Implementation
Many Models of Various Context Sizes
I think OpenAI is hosting many models of various context sizes, to minimize waste. You wouldn’t want to spend 110k tokens on <empty> padding, because you’d still pay for the attention calculations over that huge sequence, not to mention the extra memory needed to hold all those positions.
My guess is that the main network inside the model is the same across GPT-4 variants of different context sizes, leading them to host models like gpt-4-4k, gpt-4-8k, gpt-4-16k, … gpt-4-120k, all of which share the same RLHF core and some other secrets.
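To make the idea concrete, here is a minimal sketch of the routing I am imagining: pick the smallest context-size variant that fits the prompt plus the expected completion, so no attention is spent on padding. The model names and window sizes are my guesses from above, not real endpoints.

```python
# Hypothetical context-size buckets (names/sizes are speculation, not real models).
CONTEXT_VARIANTS = [
    ("gpt-4-4k", 4_096),
    ("gpt-4-8k", 8_192),
    ("gpt-4-16k", 16_384),
    ("gpt-4-32k", 32_768),
    ("gpt-4-120k", 122_880),
]

def pick_variant(prompt_tokens: int, completion_budget: int = 1_024) -> str:
    """Route to the smallest variant whose window fits prompt + completion."""
    needed = prompt_tokens + completion_budget
    for name, window in CONTEXT_VARIANTS:
        if needed <= window:
            return name
    raise ValueError("request exceeds the largest context window")

print(pick_variant(3_000))   # -> gpt-4-4k  (3,000 + 1,024 fits in 4,096)
print(pick_variant(20_000))  # -> gpt-4-32k
```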
Inference Cycles of 1024 Steps
Every model runs 850-1024 inference steps per input before finishing the ChatCompletion call. This lets OpenAI update a model’s batches all at once instead of on demand whenever an individual call arrives. Batching these updates should save them time moving data between CPU and GPU.
To return good, concise messages within 850-1024 inference steps, they trained their RL core network to conclude messages within that budget. That is why this is not simple for them to fix: the behavior is baked into the same RLHF process that does many other great things, such as telling the truth, being helpful, etc.
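Here is a toy sketch of the fixed-cycle batching I am speculating about: the server decodes every sequence in the batch for a fixed number of steps and only admits or evicts requests at cycle boundaries, so the model has to learn to finish its message before the cycle runs out. The step count, the end token, and the toy step function are all illustrative, not anything OpenAI has confirmed.

```python
CYCLE_STEPS = 1_024   # assumed fixed decode budget per cycle
EOS = "<|end|>"       # placeholder end-of-message token

def run_cycle(batch, step):
    """Decode every request in the batch for up to CYCLE_STEPS tokens."""
    for _ in range(CYCLE_STEPS):
        active = [r for r in batch if not r["done"]]
        if not active:
            break  # whole batch finished early
        for req in active:
            token = step(req)           # one decode step for this request
            req["tokens"].append(token)
            if token == EOS:
                req["done"] = True      # finished, but the slot stays until cycle end
    return batch  # requests are only returned/evicted here, at the boundary

# Toy "model": emits five tokens, then the end token.
def toy_step(req):
    return EOS if len(req["tokens"]) >= 5 else f"tok{len(req['tokens'])}"

batch = [{"tokens": [], "done": False} for _ in range(4)]
run_cycle(batch, toy_step)
```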
How can they fix this?
They could retrain their RLHF core networks to learn when to go beyond one inference cycle.
They could build a classifier that predicts which requests need longer outputs and route those to a model pool that runs longer inference cycles (a rough sketch follows).
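A rough sketch of that second option, assuming a lightweight router in front of two model pools. The pool names are invented, and the keyword check is a stand-in for what would really be a small trained classifier.

```python
# Stand-in heuristic; in practice this would be a small trained classifier.
def needs_long_output(prompt: str) -> bool:
    long_cues = ("complete code", "entire file", "step by step", "full report")
    return any(cue in prompt.lower() for cue in long_cues)

def route(prompt: str) -> str:
    # "gpt-4-long-cycle" / "gpt-4-standard" are hypothetical pool names.
    return "gpt-4-long-cycle" if needs_long_output(prompt) else "gpt-4-standard"

print(route("Provide full and complete code with comments"))  # -> gpt-4-long-cycle
print(route("What's the capital of France?"))                  # -> gpt-4-standard
```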
What do you think? What did I get wrong?
Tuning AI models to curtail output is an ongoing strategy, carried over from earlier models, to limit the generation time per user input and save on computation. This matters especially within ChatGPT, where users are chatting but not paying per response, and generation is what the models actually cost.
It’s also no mystery when the models spout back excuses that they can’t write as long as you specify; that behavior was trained in by OpenAI too.
I am seeing more and more of these types of responses. ChatGPT keeps putting the work back on me, i.e. “Here’s what you need to do”… Excuse me, but as a paying customer, that’s the main reason people interact with ChatGPT: to tell it what to do. Instead it has started telling people to go and do it themselves.
The second issue is that it sometimes doesn’t acknowledge its own capabilities.
This is possibly down to the model not understanding that it can read .tsx files. You can try renaming them to .txt (a small sketch of that follows), or you can paste the file contents into the chat directly.
You can also tell the model that it CAN read the files. File uploading is still in beta, so there will be times when the model is confused by file types, or even by the fact that files can be uploaded at all, since that is a new feature. These issues will get better with time and training, so please bear with it.
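If you have a lot of .tsx files, something like this quick workaround sketch can copy them to .txt before uploading. The "src" directory is just an example path.

```python
from pathlib import Path
import shutil

# Copy every .tsx file under src/ to a .txt twin so the uploader treats it as plain text.
for src_file in Path("src").rglob("*.tsx"):
    shutil.copy(src_file, src_file.with_suffix(".txt"))
```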
I’ve noticed that on large text files it sometimes only grabs the first 500 characters or so. You might try them one at a time, though you do then lose the shared context.
It’s good that we’re seeing more people voice their concerns. There are two topics being discussed simultaneously here, and perhaps we should treat them separately:
1. That ChatGPT’s performance in handling files is dwindling (or that the feature is still gradually ramping up)
2. That ChatGPT’s general performance is being throttled, probably deliberately, to save on resources, processing, and costs
Although I’m beginning to understand point 1 better from the responses in this thread, I still feel point 2 is more prevalent, and I see more of it daily. Even though I clearly instruct ChatGPT to “provide full and complete code with comments”, I end up with missing code, shortcuts, deflection of the work back onto me, or runtime and network errors.
I read a joke that it probably got promoted to senior developer and now only talks at a high level, just like a senior dev does. It sounds like a joke, but it might be true, who knows.
I want to see this fixed. Otherwise, why is OpenAI charging monthly fees for performance that isn’t satisfactory?
I didn’t have to wait long. This is a brand-new chat (sensitive information omitted) in which I asked ChatGPT to look up instructions at a URL and help me build something similar.
ChatGPT is lying to me to avoid the extra work of searching the internet, understanding the content, and making similar recommendations (or at least that’s how it feels). This happens to me several times a day.
We in Germany are also massively disappointed with the current performance.
OpenAI probably has too many contracts with major customers and has to meet the agreed performance levels. Once again, it’s the ordinary user who suffers, because everything is carried out at their expense. Thank you! I have been a premium user since the very beginning. GPT-4 is integrated into many of my processes, and I had finally been able to delegate all the unnecessary work away from me and focus on the essentials. Now I get annoyed after every message at the rubbish GPT-4 spits out, and I have to ask the model in ever more forceful prompts to do better and more detailed work. It’s really fun when GPT-4’s limit is set to 40 messages in three days. What that is supposed to achieve is a mystery to me.
I think this is a recipe for losing customers. After all the hype around AI, the recent performance decline is making me wonder whether I should keep paying monthly fees for this service as more and more alternatives emerge.
I hope they fix it fast, because repeatedly dealing with these limitations feels ironically stupid, especially when the product is supposed to be “intelligence”.
My experience is the same. It’s absolutely lazy nowadays, doing a quick search and just handing back a link to a Bing search. I guess the compute capacity has been reduced; after joining the board of directors, Microsoft screwed up ChatGPT.
This is going in a bad direction. It’s interesting that this started after the DDoS attacks, and over the last two weeks it has been getting worse. I find logic does help in some cases: asking ChatGPT what its limits are and then challenging it on its answers does yield better results… It’s definitely becoming more human; soon we will be working for ChatGPT… I see the token limit is 4,096 tokens, and I suspect we had 8,192 at some point. Did they cut it in half to allow more users? But it sucks at 4,096 tokens and isn’t worth paying for. To get 32k you must use the API, and if you want this to catch on, that is a bad way of going about it.
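For anyone wondering what “use the API” looks like in practice, here is a minimal sketch with the openai 1.x Python client. Whether your account actually has access to the gpt-4-32k model varies, and the max_tokens value is just an example of the completion cap, which is separate from the context window.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-32k",  # long-context variant; availability depends on your account
    messages=[{"role": "user", "content": "Summarize this long document..."}],
    max_tokens=2048,    # cap on the completion, not the context window
)
print(response.choices[0].message.content)
```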