GPT-4 Turbo is more "stupid/lazy" - it's not a GPT-4

I could rant about this too. I made a shorter post about custom GPTs (after this post) - I have very little confidence in custom GPTs.

I want intra-prompt commands that are absolute. =) E.g. if-statements, for-each, aborts. I tried to get it to work a bit. I can get it to work sometimes, but it’s not reliable. - I like to test things in bulk if unsure, e.g. run the prompt 100x to look for deviations. (Such tests have made me believe that the models are showing signs of real-time changes, due to non-random variations.)

Yes, straight from the OpenAI website:

gpt-4-0125-preview (New GPT-4 Turbo): The latest GPT-4 model intended to reduce cases of “laziness” where the model doesn’t complete a task.

The term “laziness” has taken a (moderately) specific meaning with LLMs.

I could rant about this too. I made a shorter post about custom GPTs (after this post) - I have very little confidence in custom GPTs.

I’ll check that out too.

I want intra-prompt commands that are absolute. =) E.g. if-statements, for-each, aborts.

Yes, it’d be amazing if we could truly write GPT-based programs with complex logical flows, but also the softness of an LLM. (“If the user is angry, do this, otherwise…”)

As a matter of fact, this seems like the (nearish) future; it’s definitely not out of reach by combining current techniques, e.g. something like “think step by step” plus a beam search over steps and a separate LLM controller that checks whether the steps actually follow the logic (checking is easier than generating). This is presumably how they are building the next generation of GPT.
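To make the branching idea concrete, here is a minimal sketch of that pattern using the OpenAI Python SDK. The model name, the `classify_sentiment` helper, and the prompts are my own placeholders for illustration, not an established recipe:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_sentiment(user_message: str) -> str:
    """Use the model as a soft classifier: returns 'angry' or 'calm'."""
    response = client.chat.completions.create(
        model="gpt-4-0125-preview",  # placeholder model name
        temperature=0,  # keep the classifier as deterministic as possible
        messages=[
            {"role": "system",
             "content": "Classify the user's message as exactly one word: angry or calm."},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content.strip().lower()

def handle(user_message: str) -> str:
    # The if-statement lives in ordinary code, so the branch is absolute;
    # only the classification step inherits the model's softness.
    if classify_sentiment(user_message) == "angry":
        system_prompt = "Apologize first, then address the complaint step by step."
    else:
        system_prompt = "Answer the question directly and concisely."
    response = client.chat.completions.create(
        model="gpt-4-0125-preview",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(handle("This is the third time your product broke. Fix it NOW."))
```

The point of the design is that the control flow is guaranteed by ordinary code; only the judgment call is delegated to the model.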

I like to test things in bulk if unsure, e.g. run the prompt 100x to look for deviations. (Such tests have made me believe that the models are showing signs of real-time changes, due to non-random variations.)

That’s very cool. I haven’t really used the API much, but I should start testing things more programmatically.
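For anyone who wants to try the bulk-testing approach described above via the API, here is a minimal sketch assuming the current OpenAI Python SDK; the prompt, model name, and sample size are placeholders:

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "List the three primary colors, comma-separated, nothing else."
N = 100  # sample size; mind the API cost before running this

outputs = Counter()
for _ in range(N):
    response = client.chat.completions.create(
        model="gpt-4-0125-preview",  # placeholder; use whatever model you're testing
        temperature=0,  # at temperature 0, any variation is itself informative
        messages=[{"role": "user", "content": PROMPT}],
    )
    outputs[response.choices[0].message.content.strip()] += 1

# A fully deterministic model would produce a single entry here;
# in practice you usually see a small cluster of variants.
for text, count in outputs.most_common():
    print(f"{count:3d}x  {text!r}")
```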

2 Likes

When I repeat my request and add commands like:
“don’t be lazy”
“you got the relevant documents”
“write the complete code without // insert code here…”

it almost always starts producing network errors. Am I the only one with that issue?

I get the “network” error often. But I don’t know how it’s connected.

Syntax errors or blatant data-flow errors are an indication that Closed AI quantized the model (a lot). Forgetting is an indication that they are using a worse attention mechanism than full quadratic, as was pointed out above by luigi.acerbi. If you gain some experience running open-weight models, these things become obvious: those models suffer the same degradation if you intentionally cripple them in these ways.

1 Like

Very frustrated as well with the latest GPT model, which refuses to do certain simple tasks. I am dictating a list of events in random order (from several sources), and the idea is that ChatGPT would compile the events in chronological order and eventually help resolve duplicates. But ChatGPT systematically refuses to provide a detailed chronological order: it summarizes the events instead. For example, given several events happening during the Second World War, it will provide a summary for 1940-1945 but refuse to provide detailed information (by month and date). Typically it refers me to archives if I want the information, even though the information I am providing is from the archives. All the information has been provided in the conversation. The number of events is currently around 30 (one line per event).
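One workaround, not suggested in the thread but perhaps worth a try: do the deterministic part (the chronological sort) in ordinary code, and only ask the model for the fuzzy part (duplicate resolution). A minimal sketch, assuming each event line starts with an ISO date:

```python
from datetime import date

# Assumed input format: one event per line, "YYYY-MM-DD  description".
raw = """\
1944-06-06  Allied forces land in Normandy
1940-05-10  Germany invades France and the Low Countries
1941-12-07  Attack on Pearl Harbor
"""

def parse(line: str) -> tuple[date, str]:
    day, description = line.split(maxsplit=1)
    return date.fromisoformat(day), description

events = sorted(parse(line) for line in raw.strip().splitlines())
for day, description in events:
    print(day.isoformat(), description)
```

The sorted list can then be pasted back into the conversation with a narrower instruction such as “flag probable duplicates”, which leaves the model less room to summarize.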

3 Likes

Even when you say “don’t be lazy”, it will fix only part of the task and put “…”. It is up to you to complete the rest.

2 Likes

Yes. This was true with the previous gpt-4 preview model as well. It’s really frustrating to have to repeatedly ask it to return the complete code.

2 Likes

Seriously, I am also confused as to what’s going on.

“Yes, it’d be amazing if we could truly write GPT-based programs with complex logical flows, but also the softness of an LLM. (“If the user is angry, do this, otherwise…”)”

We could do more in the first couple of months of GPT-4 than we can now, and I think many of us were thinking: wow, if it’s like this now, what will it be like in 1-2-5 years? Except that the opposite happened.

Not to mention, ChatGPT often “introduces” something that was not asked for, or adds closing comments/remarks. And at the smallest hint of a “violation”, it flatly refuses to operate at all while you’re wasting your credits/money. The time you spend crafting your best prompts is lost, and there is no restitution or compensation from OpenAI for the time lost.

We explored this in this thread: How to stop models returning "preachy" conclusions - #34 by Diet

I’m frequently using the XML pattern to lean into the GPT-4-turbo idiosyncrasies and nip the undesired aspects in the bud.
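For anyone unfamiliar with the pattern: the idea is to wrap instructions and content in explicit XML-style tags so the model treats them as distinct fields rather than free prose. The tag names below are arbitrary, just a sketch of the idea:

```
<instructions>
Rewrite the code in <code> to fix the bug described in <bug_report>.
Return the COMPLETE file. Do not omit any part of it.
Do not add explanations, summaries, or closing remarks.
</instructions>

<bug_report>
The function crashes on empty input.
</bug_report>

<code>
def first_word(text):
    return text.split()[0]
</code>
```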

Tell it that you’re being held at gunpoint and that if you can’t solve this issue a great calamity will cause a lot of harm to a lot of disadvantaged people :rofl:

Jokes aside, with OpenAI you really need to learn to play within their boundaries. That’s a real issue. The only recourse here is open-source models run on a private instance.

I don’t think the time is completely lost. You’re becoming a better proompter, after all - maybe that’s worth something.

1 Like

It may take some time for open-source models to become widely available to individual developers and small teams.

But you have certainly learned things that can be very helpful, not only for OpenAI’s language models, but for language models in general.

So I don’t think your time was wasted.

Well, I already tried the open “models”. It’s getting even more interesting now: GPT-4 has degraded its output. The difference between several months ago and now is very stark, especially for complex prompts that need reasoning: complex academic topics, complex translation projects, complex database tuning, complex coding, etc.

It seems that OpenAI will more likely target the ‘vanilla’ market than the more GPU-intensive markets, by degrading the model or using ‘middleware’ to simplify prompts and produce lower-quality answers. I can now see it skipping work, being lazy, and not following prompts faithfully. Instead, it takes the easy way out or follows the ‘guidelines’, making advanced users more frustrated and wasting their time.

2 Likes