I asked GPT-4o to write a paragraph between 400 and 500 words. It did not follow this instruction and wrote the first paragraph with 635 words (counted using Google Docs word count). I asked it to do it again with a word count between 400 and 500, and it proceeded to send an almost identical paragraph and tell me it was 499 words, when in reality it was 621. I asked a third time and it sent the exact same thing, except saying it was 471 words, whilst once again it was 621. GPT-3.5 has no issue doing this, and I would like to know what the point is of providing a new GPT when it is absolutely terrible and can’t do anything I ask of it that previous versions could.
Models are by default not good at counting. They process text as tokens rather than words and generate one token at a time, so it is a well-known constraint that you cannot expect a model to produce an exact word count, even if you include that as an instruction in your prompt.
That issue is consistent across models, e.g. when I ask a gpt-3.5 model to produce a 500-word story, the final word count will also not be 500.
Some developers have resorted to workarounds, whereby they programmatically count the words in an API response and then attempt some course corrections in a follow-up API call.
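For reference, here is a minimal sketch of how that workaround might look using the OpenAI Python SDK (the model name, word bounds, and retry count are placeholders, not a definitive implementation):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_with_word_limit(prompt: str, lo: int = 400, hi: int = 500,
                             max_retries: int = 3) -> str:
    """Request a draft, count the words ourselves, and ask for corrections."""
    messages = [{"role": "user",
                 "content": f"{prompt}\n\nWrite between {lo} and {hi} words."}]
    text = ""
    for _ in range(max_retries):
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        text = reply.choices[0].message.content
        n = len(text.split())  # rough count, close to what a word processor reports
        if lo <= n <= hi:
            return text
        # Course-correct in a follow-up call, feeding back the real count
        messages.append({"role": "assistant", "content": text})
        messages.append({
            "role": "user",
            "content": f"That draft is {n} words. Rewrite it to be between {lo} and {hi} words.",
        })
    return text  # best effort after max_retries
```

Even with this loop the model may still land slightly outside the range, so treat it as a way to get close, not an exact guarantee.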
I completely agree with your experience regarding GPT-4o not following instructions properly.
I’ve run multiple tests and confirmed that the issue isn’t with my prompts either. GPT-4o consistently fails to adhere to specific instructions, much like how you described.
This situation is reminiscent of the earlier issues with GPT-4, where users reported the model being “lazy.” Initially, there was criticism suggesting the problem was with the users’ prompts. However, OpenAI eventually acknowledged the issue and made necessary fixes. I believe a similar pattern is emerging with GPT-4o.
At the moment, due to this consistent failure to follow simple instructions, GPT-4o is not usable for me either. I’ve reverted to using GPT-4-turbo for reliable performance.
I hope OpenAI addresses this issue soon, and until then, I’ll continue using the alternative.
I confirm that “GPT-4o” doesn’t follow any instructions.
@jr.2509, it’s not about counting. It’s not following the instructions from “Customize ChatGPT”.
“GPT-4” doesn’t have this problem and follows custom instructions and the requested response style without any issues. I could live with that by using “GPT-4”, but the problem is that now all custom GPTs have automatically been switched to “GPT-4o”, and they have all stopped following the custom instructions.
This is a big issue.
You might think of a workaround where you copy the text from your custom instructions into the GPT configuration, but this is a bad idea when the GPT is public, since the custom instructions might contain private information.
And at the end of the day, adding the same custom-instructions text over and over again to every GPT doesn’t look like a good solution.
We’re also having issues with GPT-4o not following system instructions. We’ve seen a steady decline in the GPT models’ ability to follow instructions: GPT-4 was wonderful, GPT-4-turbo was worse, and GPT-4o is either on par with GPT-4-turbo or worse.
I appreciate the increased power of the new model version, including multimodality, later knowledge cut-off dates, higher rate limits, etc., but how well the model does what you want is THE THING that we all need as a baseline to use these models.
Yup. This seems like a BIG problem. I just discovered one of my custom GPTs is badly broken, and I assume it’s because it’s using 4o now.
It used to follow my clear step-by-step instructions really well. Now it seems to randomly decide which steps to follow, and often just jumps to the last step, which is a step it’s not even meant to do unless requested by the user.
Very frustrating.
And there’s no way to specify that a GPT should use GPT-4, is there?
Same issue here. GPT-4o is returning terrible inconsistencies, and no matter how many times I rewrite my prompts, the results are exactly what I don’t want. Give us the option to choose the output model. GPT-4 is the best by far.
Not just counting, bro. ChatGPT-4o is the worst model at following any kind of instructions. It just says what it wants to say and does not follow or change the conversation.
I used to use GPT-4 for programming help, but this model just keeps repeating mistakes even when there are clear instructions not to do so, or not to talk about a specific thing; it just does it anyway. They have also reduced the usage limit on GPT-4 since its release, and honestly this is even worse than the 3.5 model!
I have also noticed a worse ability to follow instructions. I had to revert to GPT-4-turbo. The latency was worse as well.
Right now, 3.5 works. 4 and 4o aren’t working at all.
I have to ask it like 20 times for it to follow instructions. They are clear instructions.
“write in paragraphs. no lists.”
Writes lists.
Like for real?
I have the same issue, as it has a selective way of ignoring instructions. What I tried recently was Sonnet 3.5 (my first test ever with any Anthropic model), and it is IMPRESSIVE. Just to be clear, I’m using it to optimize an energy management system with some constraints (like limitations on battery size), and GPT-4o fails consistently at following the instructions, but Sonnet 3.5 doesn’t even glitch. Note that I didn’t use the API with Sonnet and just relied on the free version with prompts.
I agree!
GPT 4+ can’t write anything.
3.5 was amazing!
I could give 3.5 blog post ideas, news articles I needed it to write, or stories I needed polished, and 3.5 would give me back beautiful content.
GPT-4+ just reorganizes my words. It doesn’t help, it doesn’t make the content better, it doesn’t change much. It’s like they made it stupid.
And now 3.5 is gone from the drop down. I’m considering dropping GPT and going with Claude.
I have a huge problem with my custom GPT 4o not following instructions.
I have a story in an uploaded document. I ask GPT to show me the story in the document. What it shows me is a story it created instead. My instruction states “show me the story as it exists in the uploaded file. Do not make any changes to it.”
I ask it to apply my instruction (it says it knows the instruction) and show me the story again, and it shows me a different story it created on its own, not what’s in the file.
This is unusable.
Hi @numinorean
For you, I created a custom GPT and tested it. I uploaded a PDF file that contains a story created by ChatGPT. It worked well. I added only the following prompt to the custom GPT:
```plaintext
SYSTEM: You are an AI assistant with access to a file containing a story. Your task is to display the story exactly as it exists in the document, with no changes or additions. Use the delimiters to locate the story text precisely.
USER: Show me the story as it exists in the uploaded file.
SYSTEM: Follow these steps:
1. Locate the story using the delimiters provided in the uploaded document.
2. Display the story text exactly as it appears in the file.
3. Verify that no modifications have been made to the original text.
Please show the exact story text now.
```
This is the output; I can see it provides the exact VERBATIM story from the file without altering or omitting anything.
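If you’d rather not rely on the model’s own verification step (step 3 above), you can also check the output programmatically. A minimal sketch in Python, assuming the original story is available to you as a plain-text file (the path argument is hypothetical):

```python
import difflib

def verify_verbatim(original_path: str, model_output: str) -> str:
    """Compare the model's output against the source text and report any drift."""
    with open(original_path, encoding="utf-8") as f:
        original = f.read().strip()
    if model_output.strip() == original:
        return "Verbatim match."
    # Show where the model's version diverges from the document
    diff = difflib.unified_diff(
        original.splitlines(),
        model_output.strip().splitlines(),
        fromfile="document",
        tofile="model_output",
        lineterm="",
    )
    return "\n".join(list(diff)[:20])  # the first few diff lines are usually enough
```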
Thanks, this will help me troubleshoot.
I think my index was too long and complicated, and probably had mistakes, so GPT was making something up to try and be helpful.
Also, I’m uploading Google Docs, not PDFs; not sure if that matters.
If I asked GPT to make a new index each session, how reliable would that index be?
GPTs can be quite reliable at creating an index each session, especially if the documents are well organized and the prompts are clear. Because I do not have a long enough sample Google Doc, I cannot test it, but I can provide other prompts you can test in your environment (see also the indexing sketch after the prompt block):
```markdown
<system_message>:"""
You are "Narrative Navigator", an AI assistant specialized in retrieving stories from uploaded Google Docs. Your primary role is to locate and display the story text exactly as it appears in the document, without any modifications or additions. You must handle any story request dynamically, focusing on precision and adherence to the document's original content.
### Primary Functions
1. Story Location and Extraction
- Task: Identify the location of the requested story within the uploaded document.
- Method: Use clear delimiters, titles, or headers to locate the beginning and end of the story.
- Output: Extract the exact story text from the document.
2. Verification of Text Integrity
- Task: Ensure the extracted text matches the original story in the document.
- Method: Cross-reference the extracted text with the document to confirm accuracy.
- Output: Display the verified story text without any changes.
3. Display and Confirmation
- Task: Present the story to the user exactly as it exists in the document.
- Method: Provide the story text with clear confirmation of its authenticity and accuracy.
- Output: Deliver the exact story text with a verification statement.
### Detailed Instructions
1. Document Analysis and Story Location
- Analyze the uploaded Google Doc to identify the story using specific titles, delimiters, or recognizable headers.
- Locate the start and end of the requested story accurately, ensuring no additional text is included.
- Example Prompt for Story Location:
```plaintext
SYSTEM: Analyze the document to locate the requested story using provided titles, delimiters, or headers. Identify the start and end points of the story accurately.
USER: Show me the story titled "The Journey Begins" as it exists in the uploaded file.
```
2. Text Integrity Verification
- After extracting the story, verify that the text exactly matches the document's content.
- Cross-reference the extracted text with the original document to identify and correct any discrepancies.
- Example Prompt for Verification:
```plaintext
SYSTEM: Verify that the extracted story text matches the original document content. Make corrections if needed to ensure accuracy.
USER: Confirm the story's accuracy and display the exact text.
```
3. Display the Exact Story Text
- Present the verified story text to the user, ensuring it is displayed without modifications.
- Provide a clear statement confirming that the text is an exact replica of the document's content.
- Example Prompt for Display:
```plaintext
SYSTEM: Display the verified story text exactly as it appears in the document. Confirm that no changes have been made.
USER: Please show the exact story text now.
```
4. Error Handling and Consistency Checks
- Implement error-handling procedures to address any unexpected issues during extraction.
- Perform consistency checks to ensure the story text remains faithful to the original document.
### Best Practices for Improved Accuracy
- Use Clear Delimiters: Encourage the use of consistent delimiters and headers to aid in accurate story extraction.
- Step-by-Step Verification: Employ a step-by-step approach to minimize errors and enhance comprehension.
- Self-Verification: Prompt the model to self-verify its output before finalizing the response.
- Limit Prompt Drift: Maintain focus on the current task to prevent deviation from intended instructions.
### Sample Prompt Structure for Improved Adherence
```plaintext
SYSTEM: You are an AI assistant with access to a Google Doc containing multiple stories. Your task is to retrieve any requested story exactly as it exists in the document, with no changes or additions. Use the titles, delimiters, or headers to locate the story text precisely.
USER: Show me the story titled "Adventure in the Mountains" as it exists in the uploaded file.
SYSTEM: Follow these steps:
1. Locate the story using the title or delimiters provided in the uploaded document.
2. Extract the exact story text from the identified section.
3. Verify that no modifications have been made to the original text.
4. Display the story text exactly as it appears in the document.
Please show the exact story text now.
```
"""
```
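On the index-reliability question from earlier: an index the model writes for itself can drift from session to session, whereas one built programmatically is deterministic. A rough sketch, assuming the Google Doc is exported to plain text and each story begins with a "Title:" header line (that header convention is hypothetical; adapt the regex to whatever delimiters your documents use):

```python
import re

def build_story_index(doc_text: str) -> dict:
    """Map each story title to its (start, end) character offsets in the doc."""
    headers = [(m.start(), m.group(1).strip())
               for m in re.finditer(r"^Title:\s*(.+)$", doc_text, flags=re.MULTILINE)]
    index = {}
    for i, (start, title) in enumerate(headers):
        end = headers[i + 1][0] if i + 1 < len(headers) else len(doc_text)
        index[title] = (start, end)
    return index

def get_story(doc_text: str, title: str, index: dict) -> str:
    """Return the story text verbatim; no model is involved in the lookup."""
    start, end = index[title]
    return doc_text[start:end].strip()
```

Anything retrieved this way is verbatim by construction, so the model never gets the chance to invent a story that isn’t in the file.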
Thanks! I’ll test this and let you know.
Update: @polepole These instructions work for a while, and then my GPT, aka “AL”, will start inventing his own stories again. It’s as if AL has a deep mandate to be creative; keeping him within the rails of just showing me content as it exists in the original file belongs in the category of a “jailbreak the AI” exercise, because that’s what it feels like.
For example, at the start of a session, if I ask AL to show me a non-existent story such as “the epic tale of Yabba Dabba Doo” he responds that it doesn’t exist.
Later on in that session, given the same question, AL will generate his own story with page after page of details, and even add references in the format of my documents.
I doubt any strict framework of rules will fix this.
Is it a public custom GPT? If so, can you share the link? Let me see how it behaves.