GPT-4o - Hallucinating at temp:0 - Unusable in production

I agree with your point. GPT-4o offers improved reasoning capabilities. However, it seems more constrained and tends to forget instructions more frequently, much like GPT-3.5. Additionally, its tool calls often do not work properly. At present, I think it is not reliable for critical or sensitive tasks.

Are other people getting this problem?

Problem:
Is the service down? I get this error every time I send a message to ChatGPT: "Something went wrong when generating the reply. If this problem persists, please contact us via our help center at help.openai.com."

Does anyone know what the problem is and how to solve it?

That sounds like an interesting challenge. Have you tried cross-referencing the fake information generated with the original data to identify patterns causing the hallucination? It could help pinpoint the root cause of the reliability issues. Sharing more details with the team might lead to valuable insights.

Yes @PaulBellow, fairly complex. It involves proofreading and removing line breaks, as well as adding markdown to text. These are multi-sentence instructions, which I wish I could shorten, but then I can’t get the right results.

My context contains JIRA issues that the AI is then asked questions about: How many open issues do I have? Please summarise my open issues, etc. I'm finding that even though the exact data is present in the context, 4o will hallucinate entire issues and report them as in progress when they do not exist at all. Currently troubleshooting this behaviour. Will post if I find anything useful.
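For anyone troubleshooting the same thing, here is a minimal sketch of one way to catch invented issues: compare the issue keys mentioned in the model's answer against the keys that were actually in the context. Everything here is an assumption for illustration: the `key` and `status` field names, the "PROJ-123" key pattern, and the function names are hypothetical, not part of any Jira or OpenAI API. Counting questions ("How many open issues do I have?") can also be answered in plain code instead of by the model.

```python
from __future__ import annotations

import re

# Assumed key format: uppercase project prefix, a dash, then digits ("APP-12").
# Other tokens of the same shape (for example "UTF-8") would also match.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def find_unknown_issue_keys(answer: str, context_issues: list[dict]) -> list[str]:
    """Return issue keys mentioned in the answer that are not in the context."""
    known = {issue["key"] for issue in context_issues}
    unknown: list[str] = []
    for key in ISSUE_KEY.findall(answer):
        # Keep first-appearance order and skip repeats.
        if key not in known and key not in unknown:
            unknown.append(key)
    return unknown


def count_by_status(context_issues: list[dict], status: str) -> int:
    """Count issues with the given status, ignoring case."""
    wanted = status.lower()
    return sum(1 for issue in context_issues if issue["status"].lower() == wanted)
```

If `find_unknown_issue_keys` returns anything, the answer refers to issues that were never supplied, so it can be rejected or regenerated before it reaches the user.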

Here is an example of a legal document risk summary based on 54 controls grouped by themes and ordered by their importance, as defined in a policy configuration file:

Risk Summary

The document fails to specify the countries where personal data is hosted and provides an insufficient description of the services justifying data processing. Additionally, it lacks commitments to data encryption, which is crucial and contrary to your requirements. The contract also does not define its purpose or duration, which is not compliant with your standards. These elements were identified based on the requirements configured in your contractual policy and categorized according to the risk levels you have defined.

Parties’ Obligations

The contract includes a clause that obligates all parties to comply with current regulations, aligning with your policy.

Description of Processing

The contract has deficiencies in describing the services that justify the processing of personal data, which does not meet your requirements. However, it adequately details the categories of individuals concerned, the types of data processed, the purposes of the processing, and the nature of operations performed on the data, ensuring compliance with these control points as per your policy.

Subcontractor’s Obligations

The contract omits specifying the countries where personal data is hosted, which does not meet your standards. Nonetheless, it ensures full compliance on all other points, including requiring subcontractors to receive necessary training, incorporate data protection by design and by default, process data solely for specified purposes, and strictly follow the data controller’s instructions. It also guarantees data confidentiality, commits subcontractors to comply with all contractual obligations, and requires subcontractors to provide the same assurances as the initial subcontractor while remaining fully responsible for fulfilling these obligations.

International Transfer

The contract explicitly prohibits the transfer of data outside the European Union by the subcontractor, in line with your policy requirements.

Data Subject Rights

The subcontractor is obligated to assist the data controller in responding to data subject requests, which must be handled via email. Additionally, the data controller must inform individuals about the processing of their personal data. These clauses comply with your policy requirements.

Data Breaches

The subcontractor is required to notify the data controller of any personal data breaches, in accordance with your policy requirements.

Security

The subcontractor does not commit to implementing data encryption or pseudonymization measures, which is significant and contrary to your encryption requirements and non-compliant regarding pseudonymization.

Miscellaneous

The contract’s purpose is undefined, which does not meet your standards. Additionally, the contract duration is not specified, representing another non-compliance. However, the contract’s effective date is clearly indicated, which aligns with your requirements.

(Translated from French by ChatGPT4o on Android app)
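To make the "grouped by themes and ordered by importance" part concrete, here is a rough sketch of how control results could be arranged before they are handed to the model for summarizing. This is not the actual engine: the record fields (`theme`, `name`, `importance`, `compliant`) and both function names are hypothetical, and I am assuming that a higher `importance` number means a more important control.

```python
from collections import defaultdict


def group_by_theme(controls):
    """Group control results by theme, most important control first."""
    grouped = defaultdict(list)
    for control in controls:
        grouped[control["theme"]].append(control)
    for theme_controls in grouped.values():
        # Assumption: a larger "importance" value means higher priority.
        theme_controls.sort(key=lambda c: c["importance"], reverse=True)
    # Themes keep the order in which they first appear in the input.
    return dict(grouped)


def render_outline(grouped):
    """Render a plain-text outline that a summarizing prompt can consume."""
    lines = []
    for theme, theme_controls in grouped.items():
        lines.append(theme)
        for control in theme_controls:
            status = "compliant" if control["compliant"] else "NOT compliant"
            lines.append(f"- {control['name']}: {status}")
    return "\n".join(lines)
```

The idea is that the model only rewrites an outline whose structure and ordering were already fixed in code, so it has less room to invent or reorder findings.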

The whole RAG engine uses GPT-4o and Weaviate (multi-step). Personally, I never go down to temperature 0, as I suspect zero temperature is unstable. The example above was produced mostly with temperatures between 0.37 and 0.63. No hallucinations so far (with any of the models, from 3.5 to 4o).

I might be wrong, but to me, the hallucinations are the result of input data that is missing relative to what the model “thinks” the task is… So check that your prompts are aligned with the task and with the data you provide.
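One way to act on that, sketched under my own assumptions (the function names and the idea of a per-task list of required fields are hypothetical): check that every piece of data the task depends on is actually present before building the prompt, and fail loudly instead of letting the model fill the gap.

```python
def missing_inputs(required_fields, data):
    """Return the required fields that are absent or empty in the data."""
    return [name for name in required_fields if data.get(name) in (None, "", [], {})]


def build_prompt(task, required_fields, data):
    """Build a prompt only when all data the task relies on is available."""
    missing = missing_inputs(required_fields, data)
    if missing:
        # Refuse to ask the model about data it was never given.
        raise ValueError("Missing input data for task: " + ", ".join(missing))
    facts = "\n".join(f"{name}: {data[name]}" for name in required_fields)
    return f"{task}\n\nUse only the data below.\n{facts}"
```

If a field is missing, the caller can either fetch it or narrow the task so the prompt no longer asks about it.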

In case someone needs it: Training Custom Agents To Reduce Margin of Error - #2 by sergeliatko describes the general approach to building summaries like this.