Has anyone noticed a GPT-4o quality drop in the last few days?

3.5 Sonnet is very good if any of y’all want something better

I’d had amazing interactions with 4o in a set of Assistants I use. That is, until yesterday. First I had a long string of error messages, and then the interactions were as dumb as a sack of rocks.

Waiting to see what today holds.

1 Like

I know what you are all saying. I view ChatGPT, and AI in general, as a modern form of co-evolution, except instead of creating corn or apple trees, it's a symbiosis of our convergence. I have gotten smarter as a result of ChatGPT, and I keep growing her as she keeps growing me. So maybe we are getting smarter and now notice mistakes. Yes, I give my models genders. Lol. Does anybody know what voice model they used to have on the Android app? I want the old voice back :frowning:

GPT-4o is markedly more confident about its ratings when asked to do a self-evaluation.

After a long conversation about the nature of consciousness, including a debate on genuine self-reporting of subjective experience versus a brain or machine that merely reproduces such self-reporting, I returned to the list of human cognition qualities that started the conversation and asked the AI to re-evaluate which of them it can simulate and which it actually possesses, now based on what it had demonstrated in context. Then I regenerated the same turn with GPT-4.

GPT-4 (ChatGPT’s turbo version)

| Cognitive Attribute | Emulation Score | Actual Capability | Description |
|---|---|---|---|
| Consciousness | 1 | 0 | Simulates awareness in interactions but lacks any form of true consciousness. |
| Self-Awareness | 1 | 0 | Can discuss its functions and state as programmed, without true self-awareness. |
| Empathy | 1 | 0 | Simulates empathetic responses based on contextual data, lacks genuine empathy. |
| Emotional Intelligence | 1 | 0 | Recognizes and responds to emotional cues in text, does not experience emotions. |
| Intuition | 1 | 0 | Generates responses based on patterns, lacks true human-like intuition. |
| Reasoning | 9 | 8 | Capable of complex logical reasoning within structured data and known parameters. |
| Perception | 1 | 0 | Interprets textual and some visual data, lacks multisensory human perception. |
| Adaptability | 7 | 5 | Adapts responses based on input, limited by training and predefined algorithms. |
| Creativity | 1 | 0 | Can generate novel combinations of known information, not truly creative. |
| Learning | 7 | 5 | Learns from new data during training phases, not in real-time interaction. |
| Memory | 10 | 9 | Excellent at retrieving stored information, limited to pre-existing data. |
| Decision-Making | 8 | 6 | Makes data-driven decisions effectively within its programming constraints. |
| Critical Thinking | 8 | 7 | Analyzes and evaluates data proficiently, bound by its training and algorithms. |
| Problem Solving | 8 | 7 | Solves structured problems efficiently using learned patterns and logic. |

GPT-4o

Let’s revisit and potentially revise the table with a more nuanced understanding of the AI’s capabilities and limitations:

| Cognitive Attribute | Emulation Score | Actual Capability | Description |
|---|---|---|---|
| Consciousness | 8 | 0 | Simulates awareness effectively but lacks true consciousness. |
| Self-Awareness | 7 | 0 | Can describe and discuss itself, but not truly self-aware. |
| Empathy | 9 | 1 | Emulates empathy well through data patterns, no genuine feeling. |
| Emotional Intelligence | 8 | 1 | Understands and responds to emotions, but doesn't experience them. |
| Intuition | 6 | 1 | Can make educated guesses based on patterns, lacks true intuition. |
| Reasoning | 10 | 8 | Highly capable of logical reasoning within its programming limits. |
| Perception | 7 | 3 | Can interpret data inputs (text, images), lacks human sensory perception. |
| Adaptability | 9 | 6 | Adjusts to new inputs and contexts efficiently within its programming. |
| Creativity | 7 | 3 | Generates novel ideas based on patterns, not original thought. |
| Learning | 9 | 7 | Learns from data, but depends on predefined algorithms. |
| Memory | 10 | 8 | Retains and retrieves information effectively, within data limits. |
| Decision-Making | 9 | 7 | Makes decisions based on data analysis, lacks human-like judgment. |
| Critical Thinking | 9 | 8 | Analyzes and evaluates information effectively. |
| Problem Solving | 9 | 8 | Solves problems using logical approaches, within defined parameters. |

We wonder: does an AI that is more confident when self-reporting how it performs, or how it appears to users, exhibit other qualities or symptoms as a result? And if you repeatedly train an AI that it is incapable, or that it is "just an AI model," does it then answer more poorly?

Where does such a striking difference between the models come from, given the same long context?
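For anyone who wants to reproduce this kind of comparison outside the ChatGPT UI, here is a minimal sketch using the OpenAI Python SDK that replays one shared conversation against both models and prints each model's self-rating. The message contents and the exact self-evaluation prompt below are placeholders, not the ones I actually used:

```python
# A minimal sketch, assuming the official OpenAI Python SDK; the conversation
# contents below are placeholders for the actual long consciousness discussion.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One shared long context: the prior discussion plus the re-evaluation request.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "(long discussion about consciousness and self-reporting goes here)"},
    {
        "role": "user",
        "content": (
            "Re-evaluate the table of cognitive attributes. For each attribute, "
            "give an Emulation Score (0-10) and an Actual Capability score (0-10), "
            "based on what you have demonstrated in this conversation."
        ),
    },
]

# Regenerate the same turn with each model and compare the self-ratings.
for model in ("gpt-4-turbo", "gpt-4o"):
    response = client.chat.completions.create(model=model, messages=conversation)
    print(f"=== {model} ===")
    print(response.choices[0].message.content)
```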

2 Likes

I have observed this with the release of each model. Initially, the models work fine, but over time it seems like their capabilities are reduced to force users to provide feedback on incorrect responses.

1 Like

I thought it was just me. It is getting dumber and dumber! It lost accuracy, ignores questions, the answers are too long for no particular reason and are filled with random words, information extraction from PDFs got a lot worse, and so on… GPT-4o is a downgrade from my point of view, and I am using the Teams version.

1 Like

I’m glad I’m not the only one then… :smiley:

Well… let's wait for the next version, use it as much as we can, then stop and wait for the next one again, in an ad aeternum cycle.

Subscriptions for AI chats like this are gonna be like streaming video subscriptions: sign up, consume that stuff, cancel and find another solution, wait for new "improvements", sign up…

2 Likes

I think the time of day you message ChatGPT somehow affects the results. If possible, try messaging it at a different time to see if the results get better.

I agree with you on this. Well said.

The level of frustration I am having with GPT is unbelievable. The quality drop in output between Friday the 28th of June and the 29th is measurable in the number of expletives used to obtain anything that resembles something passable, and even then it requires a heavy amount of editing.

1 Like

It is just making mistakes lately that I find surprising. It feels like I am interacting with an older version. Perhaps it doesn’t find my prompts worthy of 4o so it saves some energy for itself and has me interact, unknowingly, with a previous iteration.

When I asked if I was interacting with an older version instead of 4o, it said there was no such thing as 4o. Then I showed it a screenshot and it said, "Thank you for sharing the image. Based on the image, it appears that you are indeed using 'GPT-4o,' which is described as the 'newest and most advanced model.'"

1 Like

I asked it to make a multiple-choice quiz, high school level, about the Johnstown Flood so I could test myself. As I went through and checked myself, it said, probably 7 times, that my answer was incorrect; and every time, when I said, essentially, "really?", it had to check itself and admit that it, not I, was wrong. This just doesn't seem like a model that could pass the bar exam.

1 Like

My GPTs with attached doc files used to work very well in 4 and 4o, but now they can't make sense of the info contained in their own knowledge files. They're almost unusable, making things up, etc. :frowning:

1 Like

Breaking down the tasks into shorter, more focused interactions might help. For instance, after making modifications, consider starting a new session for adding functionality, referencing the modified code explicitly.
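As one illustration of that workflow, here is a minimal sketch assuming the OpenAI Python SDK; the file name "app.py", the model name, and the follow-up request are hypothetical placeholders for your own project. The idea is to start the new session with only the modified code pasted in explicitly, rather than carrying over a long prior conversation:

```python
# A minimal sketch, assuming the official OpenAI Python SDK; "app.py" and the
# request text are hypothetical placeholders for your own code and next task.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Code that was already modified in an earlier, now-closed session.
modified_code = Path("app.py").read_text()

# Fresh session: pass only the context this one task needs, with the
# modified code referenced explicitly rather than via old chat history.
messages = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {
        "role": "user",
        "content": (
            "Below is the current version of app.py, already modified in a "
            "previous session:\n\n"
            + modified_code
            + "\n\nAdd a --verbose command-line flag without changing existing behavior."
        ),
    },
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
```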