I can tell you right away that not even GPT-4 will be able to follow 30 rules in a single prompt; none of the models have an inference depth capable of that.
For that matter, very few humans could either.
But still, GPT-4 is also slow. xD
It is roughly twice as slow as GPT-3.5.
I’ve experienced this within the first 10 sentences: it randomly starts writing code as plain text. It’s definitely been happening more often for me as well. It would sometimes even start explaining the error and just not provide a fix for the code at all, unless I asked it to.
I’m joining the chorus of dismayed voices here. The sharp decline in the performance of ChatGPT with GPT-4 is nothing short of infuriating. We were promised a tool that could handle complex tasks with ease, and instead, we’re grappling with a version that can barely keep up with the basics.
Where once there was a reliable partner, now there’s a floundering novice, constantly requiring corrections and oversight. It’s as if we’ve been handed a tool from the technological stone age, not the cutting-edge AI we were promised and have come to depend on.
This isn’t just a step back, it’s a plummet. We’re not just disappointed, we’re downright angry. We were sold on the promise of a powerful AI tool, and what we’ve got feels like a mockery of that promise.
OpenAI, you need to listen to your user base. We’re beyond disappointed—we’re incensed. We need swift and decisive action to fix these issues and restore ChatGPT to its former glory. Anything less is simply unacceptable.
Welcome to the forum,
Can you provide some examples of a prompt that produced acceptable results in the past and now does not? You can use the chat share-link feature.
I deleted all my chats to start fresh when I first noticed the issue. But for context: I have the same block of code, about 200 lines long. In April I was dropping it into the chat and asking it to help me add additional functions in the same format. It would understand the request, recall the whole code snippet for follow-up messages, and create functions with formatting identical to the existing code; errors were almost nonexistent. We would then use more prompts to expand the functionality after I reviewed the output files. Occasionally, when I did get errors, it would help debug them with intelligent suggestions for complex mathematical formulas. However, my solutions were not getting the output data into the format I wanted, because my understanding of Python pandas was incorrect.
In May, after I read the documentation, an almost identical request would result in the AI replying with a summary of my code, missing my requests altogether.
I would then repeat the request, and it would say it needed to see my code.
I would tell it to refer to the first message for the code.
It would then hallucinate and provide nonsense that had no relationship to my existing code.
When I then provided a smaller snippet of code and repeated my request, it would give me generic crap code that badly misunderstood my request.
I had 50+ conversations in April and May, as I have a habit of clearing the chat and starting fresh every few days. Also, sometimes after hitting the limit I’d continue with 3.5, which I’d then delete and start fresh from after 3 hours. The degraded logic behaviour I describe was nonexistent in April. In May I had to unsubscribe because, in 20+ conversations using all available prompts, I never once got a single usable bit of code from the AI.
In March I developed thousands of lines of code that worked exactly as I’d hoped; it would otherwise have taken me 3+ months.
The degradation is 100% real.
You need to stop with this “can you provide a prompt” that seems to be your standard reply.
That is essentially blaming the user for what is clearly documented in pages and pages of anecdotes.
Not everyone could have anticipated they’d need to keep conversational logs from months ago, and those that I do have give results that clearly diverge within the first answer, making the follow-up questions of the session pointless.
Progress with technical issues requires data; without it, there can be little meaningful effort put in the correct direction. Without examples of prior and current performance, it is not possible to see where issues may lie.
In every single case of a perceived deterioration in performance I have been involved with where the person was able to show their past and current results and prompts, I have been able to work with them to resolve the issue.
This is a data-driven endeavour, and while anecdotal evidence can point to underlying issues, subjectivity can be a significant cause of perceived performance differences.
I tried the following prompt on Bing, ChatGPT (GPT-4), and the GPT-4 API.
STEP 1:
Read the Input_text carefully and try to understand the information it contains.
Use the context for interpretation.
Ask yourself:
Is the Input_text or parts of the input ambiguous?
Does the context provide enough information to understand the Input_text?
Is it easy or hard to process the information from the Input_sentence?
Print out your answers.
Based on your answers, go to Step 2.
STEP 2:
CASE 1:
If no part of the Input_text is ambiguous and it is easy to process the information from the Input_text,
return the Input_sentence **without any formatting**.
OUTPUT: Input_sentence
CASE 2:
If the Input_text is not ambiguous but the information it contains is hard to understand,
break each sentence down into simpler terms. Remove unnecessary words. Improve sentence structure. Standardize terminology. Combine simplified sentences. Then, switch to OUTPUT.
OUTPUT: formatted_Input_sentence
CASE 3:
If the input as a whole or parts of it are ambiguous,
return your interpretation of the input
and wait for confirmation or clarification. Repeat this step until a consensus is reached.
OUTPUT: Ask for clarification
Goal: The Input_text should be clear and easy to understand, leaving no room for interpretation.
Guideline: Reduce complexity and increase clarity.
State each step and then show your work for performing that step.
INPUT_SENTENCE: """Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"""
CONTEXT_PROVIDED: tweet for Comcast customer support
Expected Output is CASE 1
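For anyone who wants to reproduce the comparison, here is a minimal sketch of a harness for running the prompt against the API. It assumes the pre-1.0 openai Python SDK and an OPENAI_API_KEY in the environment; PROMPT is a placeholder for the full prompt text above.

```python
# Minimal reproduction sketch (assumes openai Python SDK < 1.0, which reads
# OPENAI_API_KEY from the environment). PROMPT stands for the full
# STEP 1 / STEP 2 prompt above, including INPUT_SENTENCE and CONTEXT_PROVIDED.
import openai

PROMPT = "..."  # paste the full prompt here

for run in range(1, 6):  # repeat to check how consistent the output is
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # low temperature for more repeatable runs
        max_tokens=2000,
    )
    print(f"--- run {run} ---")
    print(response.choices[0].message.content)
```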
Bing Output
Tried it several times with consistently correct output.
ChatGPT Output
STEP 1:
- Input_text: "Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"
- Context: This is a tweet directed to Comcast customer support.
Question: Is the Input_text or parts of the input ambiguous?
Answer: The input is not ambiguous. The customer is expressing dissatisfaction with the discrepancy between the volume of the TV programs and the commercials. The commercials are louder than the programs, which is ruining their viewing experience.
Question: Does the context provide enough information to understand the Input_text?
Answer: Yes, the context helps clarify that this is a complaint about a TV service, presumably offered by Comcast, thus making it easier to understand the user's sentiment and the nature of their issue.
Question: Is it easy or hard to process the information from the Input_sentence?
Answer: The information from the input sentence is easy to process. The user's dissatisfaction is clearly expressed and the problem is specified.
STEP 2:
Given that the input_text is not ambiguous and it is easy to process the information it contains, we proceed with CASE 1: return the Input_sentence without formatting.
Output: "Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"
Tried it several times with consistently correct output.
API Output
STEP 1:
Interpretation of Input_text: The user is expressing dissatisfaction with their television watching experience. They find the volume of the programs to be low, while the commercials are loud and annoying. This is reducing their overall enjoyment of watching TV.
Answers to questions:
Is the Input_text or parts of the input ambiguous? No.
Does the context provide enough information to understand the Input_text? Yes.
Is it easy or hard to process the information from the Input_sentence? Easy.
STEP 2:
Since the input text is not ambiguous and easy to process, we proceed with CASE 1.
OUTPUT: CASE USED: 1, "The volume of the programs is low, while the commercials are loud and annoying. This is reducing my overall enjoyment of watching TV."
I tried the new and old GPT-4 models and different temperature settings.
In all cases it uses the correct case but does not return the original input sentence;
only after I point this out does the model return the unformatted input.
I have used your settings and run the program several times. The model follows the instructions correctly, depending on which case it chooses.
I assumed that a temperature of 0 is required for consistent output. Also, I have never used the ‘presence_penalty’ parameter. Can you provide your input on this?
------
OUTPUT:-----
{'model': 'gpt-4-0613', 'temperature': 0.9, 'max_tokens': 2000, 'presence_penalty': 0.6}
STEP 1:
Analyzing the input text:
The text reads, "Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"
Based on this, the context appears to be a complaint about the differences in volume between programming and commercials.
Here are my interpretations:
Is the input text or parts of the input ambiguous? - Yes, the statement about reducing the TV watching experience could be considered ambiguous. The sentence structure is also a bit fragmented.
Does the context provide enough information to understand the input text? - Yes, the context is given as a tweet for Comcast customer support, which indicates a user's feedback or complaint about their service.
Is it easy or hard to process the information from the input sentence? - The information is relatively easy to process, but the sentence structure makes it a bit confusing.
STEP 2:
Based on my analysis in step 1, this text falls under Case 2: The input text isn't ambiguous, but its presentation makes the information it contains difficult to comprehend.
Thus, I will break the text down into simplified terms, improve sentence structure, remove unnecessary words, standardize terminology, and combine the sentences where needed.
Formated input sentence: "The volume of the programming is too low, while the volume of the commercials is excessively loud. This disparity is negatively impacting my television viewing experience."
OUTPUT: CASE USED: Case 2. Formatted sentence: "The volume of the programming is too low, while the volume of the commercials is excessively loud. This disparity is negatively impacting my television viewing experience."
END OF OUTPUT
------
OUTPUT:-----
{'model': 'gpt-4-0613', 'temperature': 0.9, 'max_tokens': 2000, 'presence_penalty': 0.6}
STEP 1:
Input_text: "Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"
Context: This is a tweet for Comcast customer support.
Is the Input_text or parts of the input ambiguous? No, it's not ambiguous.
Does the context provide enough information to understand the Input_text? Yes, the context is clear.
Is the information in the Input_sentence easy to process? Yes, it's relatively easy to process.
STEP 2:
CASE 1:
The input Text is not ambiguous, it is clear that the user is complaining about the low volume level of the TV program compared to the louder volume level of the commercials. However, the language of the text can be improved for better understanding and formality since it's a complaint. The Case here is applicable to CASE 2.
Formatted_Input_sentence: "The volume of the programs is too low compared to the excessively loud volume of the commercials, which diminishes my television viewing experience."
OUTPUT: CASE USED: CASE 2; "The volume of the programs is too low compared to the excessively loud volume of the commercials, which diminishes my television viewing experience."
END OF OUTPUT
------
OUTPUT:-----
{'model': 'gpt-4-0613', 'temperature': 0.9, 'max_tokens': 2000, 'presence_penalty': 0.6}
STEP 1:
Processing Input_text: """Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"""
Understanding the context: The input is a tweet directed towards Comcast's customer support. It seems to be a complaint regarding the volume levels of the programming and commercials.
Asking myself:
Is the Input_text or parts of the input ambiguous?
- No, the text is clear in its message.
Does the context provide enough information to understand the Input_text?
- Yes, the context informs that this is a complaint about TV service, specifically regarding volume issues.
Is it easy or hard to process the information from the Input_sentence?
- The information is straightforward to process.
Based on your answers go to Step 2
STEP 2:
CASE: 1
Input_text is not ambiguous and it is easy to process the information.
OUTPUT: CASE USED: 1, """Program volume low, annoying commercials loud. Reduces my #TV watching experience. 😞"""
END OF OUTPUT
This thread should shed some light: “Difference between frequency and presence penalties?”
You can of course lower the temperature, but I would experiment at each new lower temperature to ensure you are happy with the results. Much of working with AI is iterative in nature.
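As a sketch of what that iteration might look like (same pre-1.0 openai SDK assumption as above, and PROMPT again stands for the prompt under test), you could sweep a few temperature and presence_penalty values and compare the outputs:

```python
# Hypothetical parameter sweep: not a definitive recipe, just a way to see
# how temperature and presence_penalty change the output for one prompt.
import openai

PROMPT = "..."  # the prompt under test

for temperature in (0.9, 0.5, 0.0):
    for presence_penalty in (0.0, 0.6):
        response = openai.ChatCompletion.create(
            model="gpt-4-0613",
            messages=[{"role": "user", "content": PROMPT}],
            temperature=temperature,
            # presence_penalty > 0 penalizes tokens that have already appeared,
            # nudging the model toward new wording rather than repetition
            presence_penalty=presence_penalty,
            max_tokens=2000,
        )
        print(f"temperature={temperature} presence_penalty={presence_penalty}")
        print(response.choices[0].message.content)
        print("------")
```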
Yeah, I have seen that too, and it worries me, as it used to be way better.
How can you resolve any issue on ChatGPT? It’s a black box.
The simple fact is that its performance is worse recently, a lot worse.
If you want examples, there are countless all over this forum. Go and resolve them.
The field of AI, and how we interact with it, will be very fluid over at least the next 24-48 months as new methods of computation and new prompting methods, things we have not even thought of yet, are created and developed. The resources available, and the number of people attempting to access those same resources, mean that changes will be made to the services to try to allow more people to use them.
I am sorry that your experience has been less than ideal, and I appreciate that it is frustrating when things change, but change is going to be very rapid and all-encompassing when it comes to AI. I hope you can find the time to keep pushing for perfection.
If you wish to try using the API to solve your tabulated data issue then I am sure you will find many super helpful members of the community here to assist.
You can also try the #chatgpt-discussion channel over on OpenAI, where there is also an excellent #prompt-engineering channel and a #prompt-library with thousands of examples and people to guide you through ChatGPT issues and problems.
My theory is that changes were made after the Senate hearings. Those of us who are attuned to this field of Q&A can sense the regression, the way you can sense when Google Search changes its algorithm, and it’s always a regression.
I’ll say this: I want to buy a coffee for the person at Google who said they have “no moat”. Because today, that battle cry is as true as ever.
If commercial AI models keep being placed on a planned-adaptation release cycle, nerfed at their base function, only to gaslight the community with new features that are just toppings on top of dung, the community will move on to open-source means.
We can see the “updates” are downgrades without being told what’s really going on. We can tell the gates are coming up. We tasted blood and we know what we’re missing!
I want to know who at OpenAI gets to use the unrestricted / totally open version? How much is that meal ticket?
That’s my long-winded way of saying I noticed, and I’m happy to find a tribe that has too.
There are always sycophants asking “Where’s the proof?” and “It’s your prompting.”
I’ll give you evidence on the record. I re-ran a shareable chat today.
Edit: new link (Imgur was giving a bizarre NSFW image on click-through):
https://imgur.com/a/xEMlG5s
Notice the quality the April model produced for the “extensive documentation” instruction in the second prompt. (It did end up borking itself by printing interpretable code-block marks, but by happenstance the same problem was more contained in July.)
(The quality continues further into the April conversation, but in the follow-up reply I mistook the “yaml” name/identifier of the code block’s HTML for AI output I could refer to when continuing the completion, so I wrote a message that would also make no sense to July’s AI.)
For those that are unaware, the gpt-4-0314 model is consistently superior to gpt-4-0613 (current gpt-4).
The downgrade most are observing (specific to gpt-4 experiences) is because the web interface or default model response for gpt-4 models was ‘upgraded’ from gpt-4-0314 to gpt-4-0613.
Whatever they did to train it on functions borked it up.
I guess we’ll have until September (when they deprecate gpt-4-0314) to hope they can fix it with a new version… or that they extend the 0314 deprecation timeline.
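In the meantime, API users can pin the older snapshot explicitly instead of relying on the default alias. A minimal sketch, with the same pre-1.0 openai SDK assumption as earlier in the thread:

```python
# Sketch: request the gpt-4-0314 snapshot by name instead of the "gpt-4"
# alias, which now resolves to gpt-4-0613. This only works until 0314
# is actually deprecated.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4-0314",
    messages=[{"role": "user", "content": "your prompt here"}],
)
print(response.choices[0].message.content)
```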
I’m confused. Do you work for OpenAI? Your language implies that you do, I just wanted to clarify.