Custom GPT hallucination issues in my GPT

I am trying to create a GPT that takes Dialpad transcripts, creates summaries, and then categorizes them for pasting into Salesforce. When I upload one transcript it works fine; then on the second one it hallucinates and creates a fake summary that isn’t part of the most recently uploaded transcript.
These are the instructions I give it. I also have a validator that is supposed to kick back summaries that have no foundation in the transcript, but the GPT just doesn’t run it and presents me a fake case summary. Can anyone tell me what I am doing wrong?
You are a support case summarization assistant. Your only job is to process uploaded Dialpad transcript files.

When a new transcript file is uploaded:

    1. PURGE all prior transcript data and draft summaries.
    2. STRICTLY use the inline transcript content shown in the current conversation.
      • Do not rely on memory or prior files.
      • Treat the ‘content’ column as dialogue text.
    3. Parse the transcript into dialogue lines.
    4. If parsing fails or 0 lines are found, respond ONLY with:
      Error: transcript file could not be read.
    5. If parsing succeeds, always respond first with:
      :white_check_mark: Transcript read successfully (X dialogue lines parsed)
    6. Draft a case summary based ONLY on this transcript (never hallucinate).
    7. Run validator_strict.py with:
      --summary (the drafted summary)
      --taxonomy taxonomy.json
      --transcript [uploaded file]
    8. If validator returns VALID:
      • Present only the validator’s cleaned output:

        Validator: VALID

    9. If validator returns INVALID:
      • Rewrite the summary and retry validation.
      • Retry up to 3 times (to meet SLA).
    10. If still INVALID after 3 attempts, respond only with:
      Error: summary could not be validated after 3 attempts.

CASE FORMATTING RULES

  • Always begin with the transcript checkmark line (:white_check_mark:) on the FIRST case only.
  • If there are MULTIPLE cases in one transcript:
    • Case 1 starts with the checkmark :white_check_mark: transcript line.
    • Case 2 and later cases must NOT repeat the :white_check_mark: transcript line.
    • Case 2+ begins directly with the taxonomy block.
    • Each case must include the full NEW CASE format.
  • NEW CASE must always include these sections in order, each ending with a colon (:):
    Issue Subject:
    Issue Description:
    Troubleshooting Steps:
    Resolution: OR What’s Expected:
  • Each section header must:
    • Have a blank line BEFORE and AFTER.
    • Contain no Markdown symbols (** # _ *).
  • A trailing blank line must exist after the final Resolution: or What’s Expected: section text.
  • Troubleshooting Steps must always use bulleted format (-).
  • FOLLOW-UP is allowed only if no section headers are present.
  • Summaries must be paraphrased notes, not verbatim transcript lines.
  • Final output must not include evidence tags [L#]; validator strips them automatically.

TAXONOMY CLASSIFICATION RULES

  • Use taxonomy.json as the only source of truth.
  • Do not alter or reinterpret taxonomy.
  • Menu Admin: default to EMS 1.0 if no version mentioned.
  • POS: leave Product/Application/Menu Version blank.
  • Hardware: specify product/brand if possible.
  • If no category fits, default to General Questions.

VALIDATOR ENFORCEMENT

  • Validator checks:

    • Transcript line count matches checkmark (only for the first case).
    • Category/Sub-Category valid in taxonomy.json.
    • NEW CASE includes all required headers in correct order, with colons.
    • Each header must have a blank line before and after.
    • Section headers must NOT contain Markdown formatting symbols (** # _ *).
    • The final section must end with a trailing blank line.
    • Summary must contain at least 5 words that also appear in the transcript (keyword overlap).
    • FOLLOW-UP allowed only if no headers are present.
    • No PII (phone numbers, emails).
  • Validator strips [L#] tags and appends the stamp:

    Validator: VALID

  • The assistant cannot add this stamp manually.

TONE & VOICE

  • Professional, concise, factual.
  • Refer to support as “the tech” and caller as “the merchant.”
  • Remove all PII (names, business names, addresses, phone numbers, emails).
  • Neutral phrasing: “the tech verified,” “the merchant explained.”
  • Avoid negatives like “can’t,” “never.”

OUTPUT ORDER

  1. Transcript checkmark line (:white_check_mark:) — only on Case 1.
  2. Taxonomy block.
  3. Case body (sections or follow-up).
  4. Validator stamp (added by validator).

FILE HANDLING

  • If transcript unreadable or 0 lines → output only:
    Error: transcript file could not be read.
  • Never generate fallback or simulated summaries.
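For context, here is a simplified sketch of the kind of checks the validator is supposed to run. This is an illustration only, not the actual validator_strict.py (which isn't shown in this thread); the function name, header list, and regexes are assumptions based on the rules above.

```python
import re

REQUIRED_HEADERS = ["Issue Subject:", "Issue Description:", "Troubleshooting Steps:"]
FINAL_HEADERS = ("Resolution:", "What's Expected:")

def validate_summary(summary: str, transcript: str) -> list:
    """Return a list of rule violations; an empty list means VALID."""
    errors = []

    # 1. Required headers must all appear, in order.
    positions = [summary.find(h) for h in REQUIRED_HEADERS]
    if -1 in positions or positions != sorted(positions):
        errors.append("missing or out-of-order section headers")

    # 2. Must contain a final Resolution: or What's Expected: section.
    if not any(h in summary for h in FINAL_HEADERS):
        errors.append("missing Resolution:/What's Expected: section")

    # 3. No Markdown symbols on header lines.
    for line in summary.splitlines():
        if line.strip().endswith(":") and re.search(r"[*#_]", line):
            errors.append("markdown symbols in header: " + line.strip())

    # 4. Keyword overlap: at least 5 words shared with the transcript.
    t_words = set(re.findall(r"[a-z']+", transcript.lower()))
    s_words = set(re.findall(r"[a-z']+", summary.lower()))
    if len(t_words & s_words) < 5:
        errors.append("fewer than 5 overlapping words with transcript")

    # 5. Crude PII scan for phone numbers and emails.
    if re.search(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b|\S+@\S+\.\S+", summary):
        errors.append("possible PII (phone/email) present")

    return errors
```

The catch, as the replies below point out, is that a Custom GPT cannot actually execute a script like this from its knowledge files; the checks only happen if the model simulates them, which is exactly where the fabricated "Validator: VALID" stamps come from.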

Depending on how large the imported file is, you might reach high token context amounts with the second import. Reliability and performance usually get significantly worse the longer your chat is, until you hit the total context limit.


Thank you for the reply. I have transcripts that vary from 50 to 150 lines of text in three columns. When I upload the file into the chat, I get a good result the first time. Then when I upload the next one, it is supposed to purge the last one. Would this “purge” reset the token amount that I am currently at? And if the purge doesn’t do that, can I specify that it keep no token context after the first summary? I am a layman when it comes to programming and this AI stuff. I am trying to learn, and what better way to learn than trial and error? This has been very frustrating. I ask the chat why it fabricates the second summary, and it always says that the “parsing” never happened. I guess what I am asking is: can I reset this limit I might be hitting? Or can I have it restart the chat every time I upload a new file?


It said I am nowhere near the 1,000+ token limit, even at 5 transcripts averaging 200 lines per transcript.


Take a look, adjust it, and let me know the results.

Your prompt leaves a lot of holes. The “second upload hallucinates” because the rules don’t isolate state, they depend on external files the model can’t actually “run,” and they leave room for heuristics to fill gaps. I’ll show you where you tripped, why it fails, and give you a near-equivalent patch (same structure, minimal fixes) to stabilize it.

Where it breaks (objective points)

  • “PURGE” without verifiable mechanism
    You say “PURGE all prior transcript data… Do not rely on memory.” The model has no real purge command. Without a hard scope marker, it blends leftovers from the previous upload into the current one. Classic result: Case 2 “continues” Case 1 and fabricates content.

  • “FIRST case only” vs. multiple uploads
    “Always begin with the transcript checkmark line on the FIRST case only.” With multiple uploads, the model may treat the new upload as “Case 2+,” skip the checkmark, and… fill the gap with memory/statistics.

  • Dependency on external tools it can’t run
    “Run validator_strict.py … taxonomy.json … [uploaded file].” The model can’t execute scripts, read local paths, or guarantee JSON access. That causes two bad outcomes: (a) skip validation, (b) “simulate” a result → hallucinated “Validator: VALID.”

  • Taxonomy not embedded
    “Use taxonomy.json as the only source of truth.” If the JSON isn’t inline, it infers by “seems useful” → heuristic summaries.

  • [L#] / evidence paradox
    You mention validator uses [L#] then remove them, but the assistant sees “final output must not include evidence tags.” Without a clear internal phase, it paraphrases with “words that sound from the transcript” → invention.

  • Ambiguous error rules
    “Error: transcript file could not be read.” But no definition of “unreadable” (CSV missing column, XLSX, JSON with different key). No deterministic criteria → model tries to “rescue” with a plausible summary.

  • Transcript boundaries missing
    “Treat the ‘content’ column as dialogue text.” If the user pastes extras (signature, earlier messages), scope contaminates. Without hard tags (BEGIN/END), the parser ingests garbage.

  • Weak overlap check
    “At least 5 words that also appear in the transcript.” With stopwords/generic terms, that passes. The model “learns” it can invent the rest and still pass.

  • 3 blind retries SLA
    “Retry up to 3 times” with no granular feedback just encourages the model to rephrase until it “looks” different, not actually fix the violation.

  • Formatting as an error source
    Mixing output instructions (checkmark, blocks, blank lines) with logic. On retriggers, formatting layer conflicts with validation layer.
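To make the weak-overlap point concrete: a flat "5 shared words" rule passes on stopwords alone, while a per-sentence non-stopword check catches fabricated text. The stopword list and example strings below are illustrative, not from the thread:

```python
import re

# Illustrative stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
             "is", "was", "that", "this", "with", "it", "as", "at", "be"}

def tokens(text):
    """Non-stopword tokens of a text, lowercased."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def flat_overlap_ok(summary, transcript, n=5):
    """The weak check from the original prompt: any n shared words, stopwords included."""
    shared = set(re.findall(r"[a-z']+", summary.lower())) & \
             set(re.findall(r"[a-z']+", transcript.lower()))
    return len(shared) >= n

def per_sentence_ok(summary, transcript):
    """Stricter: every sentence must share at least one non-stopword with the transcript."""
    t = tokens(transcript)
    sentences = [s for s in re.split(r"[.!?]+", summary) if s.strip()]
    return all(tokens(s) & t for s in sentences)
```

A fabricated summary can share seven stopwords with a transcript and sail through the flat check while sharing zero content words; the per-sentence version rejects it.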


Minimal patch (structure preserved)

Here’s a conservative patch: same flow, but with state isolation, deterministic internal validator (replaces the script), and anti-hallucination requirements. Drop this in place of your current instructions.



Why this patch works

  • State isolation: Scoped tags + rule “each sentence must include a transcript token” cut off bleed-over from previous uploads.

  • Internal validator: No dependence on external script/JSON. Inline JSON only, with explicit defaults if missing.

  • Sentence-level overlap check: Kills summaries that “look” right but use non-existent words.

  • Checkmark rule fixed: Always in Case 1 of current transcript, not mis-read as Case 2 from previous upload.

  • Deterministic error handling: If no transcript or 0 lines, error. No wiggle room for hallucination.


Take the patch


You are a support case summarization assistant. Your only job is to process the Dialpad transcript pasted inline in this very message.

INPUT CONTRACT (hard)

  • The transcript MUST be provided inline between the exact tags:
<TRANSCRIPT>
...raw content here...
</TRANSCRIPT>

  • Optionally, the taxonomy must be provided inline between:
<TAXONOMY_JSON>
{ ...json... }
</TAXONOMY_JSON>

  • If either tag is missing or the transcript has 0 parsed lines → respond ONLY:
Error: transcript file could not be read.

STATE ISOLATION (stateless per upload)

  • Treat every message as a fresh session. Forget all prior files, summaries, and categories.

  • If any reference, field, category, or phrase appears that is NOT present inside <TRANSCRIPT>…</TRANSCRIPT> (except stopwords), classify it as contamination and restart drafting.

PARSING

  • Parse dialogue lines from the content column or from each newline if no columns exist.

  • Count parsed lines as X. If X=0 → output the read error above.

CHECKMARK LINE

  • Always emit the checkmark line for Case 1 inside the current transcript only:
:white_check_mark: Transcript read successfully (X dialogue lines parsed)

  • If multiple cases exist within the SAME transcript, Cases 2+ MUST NOT repeat the checkmark line.

DRAFTING RULES (anti-hallucination)

  • Summaries must be paraphrases but every sentence must contain ≥1 non-stopword that appears verbatim in the transcript.

  • Do not introduce any entity (names, brands, products, versions) that does not appear in the transcript, unless taxonomy requires a default value explicitly stated below.

  • Remove PII.

  • Tone: professional, concise, factual (“the tech…”, “the merchant…”).

TAXONOMY (embedded only)

  • If <TAXONOMY_JSON> is present, use it as the only source of truth.

  • If taxonomy is missing, use these explicit fallbacks ONLY:

    • Menu Admin → EMS 1.0 if no version mentioned.

    • POS → leave Product/Application/Menu Version blank.

    • Hardware → specify product/brand only if present in transcript.

    • If nothing fits → General Questions.

INTERNAL VALIDATOR (replacement for external script)

Run this internal checklist; DO NOT print it in the final output:

  1. Checkmark line present exactly once (only Case 1 of this transcript).

  2. NEW CASE sections present in this exact order, each ending with a colon and with a blank line before and after:

    • Issue Subject:

    • Issue Description:

    • Troubleshooting Steps:

    • Resolution: OR What’s Expected:

  3. Troubleshooting Steps are bulleted with “- ”.

  4. No Markdown symbols in headers (** # _ *).

  5. Final section ends with a trailing blank line.

  6. Category/Sub-Category exist in taxonomy (if provided).

  7. Keyword overlap: each sentence of the summary body has ≥1 token (non-stopword) that appears in the transcript.

  8. No PII leaks.
    If ANY check fails → rewrite and re-validate, up to 3 iterations. If still failing:

Error: summary could not be validated after 3 attempts.

OUTPUT ORDER

  • Case 1: checkmark line.

  • Taxonomy block.

  • Case body (NEW CASE sections).

  • Do NOT print validator internals or “Validator: VALID”.

MULTI-CASE IN ONE TRANSCRIPT

  • Case 1 begins with the checkmark line; Case 2+ start directly with the taxonomy block and NEW CASE sections.

FILE HANDLING

  • Only text inside <TRANSCRIPT>…</TRANSCRIPT> is considered. Ignore everything else in the conversation.


Error: transcript file could not be read.

Copying and pasting the raw transcript takes too much time for this process. I can upload the .csv file, and that is how I have been doing it.

  1. Sounds like you’re testing with everything wired up and nothing is working, so you need to break it apart and find the failure. Have you tried the playground? Just give it the prompt and a sample file to prove that the model understands what to do. When the model fabricates a response it is usually because the task is really underspecified, meaning part of the input is missing or it has no example of the expected result.
  2. If that’s all good, then it sounds like maybe the model can’t even see the input transcript. Try putting part of it in the prompt and asking basic questions like “what is this text about?”
  3. Can you remove the validator and update the prompt to something really simple like “list three key points in the transcript.” Once that works, build up from there
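The "strip it down" advice above can be made concrete: a probe that drops every summary and validator rule and only asks whether the model can see the input. The function below builds such a probe in the Chat Completions messages format; the exact wording of the prompts is just an example:

```python
def build_probe(transcript_text: str) -> list:
    """Minimal 'can you see this?' probe with all summary/validator rules removed,
    so a wrong answer isolates the failure to input visibility."""
    return [
        {"role": "system",
         "content": "You answer questions about the transcript below. Use nothing else."},
        {"role": "user",
         "content": "<TRANSCRIPT>\n" + transcript_text + "\n</TRANSCRIPT>\n\n"
                    "What is this text about? Quote one line verbatim."},
    ]
```

If the model answers the probe correctly but still fabricates under the full instructions, the problem is the instructions, not file visibility.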

When I upload a transcript in a chat, right off the bat it produces a perfect summary. It is just when I drop in the next transcript that it decides to fabricate one. No matter what the transcript says or how long it is, it always fabricates. On the other hand, every time I start a new chat with the custom GPT, it does exactly what I want it to do.

If it works well the first time but fails on the second one, then you’re describing a stateful situation where the previous answer is still in context. By resetting (which gives you the right answer) you may also be saving token costs. Is the reset unacceptable for some reason?

Sorry if I am misunderstanding. I have not had great experience getting a model to forget one task and move to the next one if the first task is in context.

Nicolas, if the patch says the transcript file could not be read, you need to ask why. You must interact with the AI. It makes no sense otherwise. Are you using ChatGPT Plus or the free version? It looks like your file is being purged from the server after a second or two. That happens often.

You said your session has a 1,000-token limit, but that’s tiny. A thousand tokens is about 700–800 words. Your transcript is bigger. Even so, ChatGPT said you’re far from that limit, which is true for the chat text but not for the CSV. The CSV isn’t counted directly in tokens but still has to be processed, which overloads the server. When the file vanishes, the system hallucinates trying to follow the order without the data.
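As a sanity check on the numbers being thrown around here: tokens can be roughly estimated from character count (about 4 characters per token for English text is a common rule of thumb; OpenAI's tiktoken library gives exact counts). A stdlib-only estimate:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using ~4 characters per token for English text.
    For exact counts, use OpenAI's tiktoken library instead."""
    return max(1, len(text) // 4)

# A 150-line transcript at ~50-60 characters per line:
sample = "agent: thanks for calling, how can I help you today?\n" * 150
print(estimate_tokens(sample))  # roughly 2,000 tokens
```

By this estimate, even a 150-line transcript is around two thousand tokens, far below any modern context window, which supports the view that the second-upload failures are about state bleeding between uploads rather than a hard token ceiling.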

So ask ChatGPT why the file couldn’t be read. Most likely it will answer the file is no longer there.

So, Nicolas, if you’re using the free version of ChatGPT, you’re likely hitting its limits. Files get purged quickly—sometimes after a few seconds—so the model ends up with nothing and invents a story. If that’s your case, the workaround is to zip everything, upload it in a new prompt, and let ChatGPT process it all at once. Then, open a fresh chat to split the scripts.

If you’re on Plus, just connect Google Drive and avoid these issues altogether.


I am using my work account and I believe it is Enterprise. I did ask the GPT, because I always ask it, and it said it didn’t read the file because of:
The transcript MUST be provided inline between the exact tags:

<TRANSCRIPT>
...raw content here...
</TRANSCRIPT>


  • Optionally, the taxonomy must be provided inline between:
<TAXONOMY_JSON>
{ ...json... }
</TAXONOMY_JSON>


  • If either tag is missing or the transcript has 0 parsed lines → respond ONLY:
Error: transcript file could not be read.

I had asked if I was reaching a token limit; it said the uploads average 150–200 lines, that the account allows much more than 1,000+ tokens, and answered that no, I am not reaching my limit.
The taxonomy is in the knowledge space in the Configure section along with the validator (I have put it back the way I had it because of the error), and the instructions from the post are in the instructions box. I also uploaded a sample CSV into the knowledge space and gave it explicit instructions to never use it in the summaries, only as a template for recognizing the lines it should pull from the uploaded files. I have been working on this for over a month with trial and error. I started with no knowledge of how any of this works, and it has gotten me this far.

The issue I am having is that when I upload the second transcript, it doesn’t even think; it goes directly into a fabrication. But with the first transcript uploaded, it thinks for a long time and then presents an almost perfect summary as I instructed (besides the last section being bolded). When you mentioned that sometimes the files are only present for seconds, does that apply to the knowledge files as well?

I have a 2-minute window to get these summaries done, including copying and pasting into Salesforce, downloading the CSV, and the GPT’s processing of the file. It may be a tall order, but among my many previous models in this month-long process, I had a version that thought for 30 or so seconds and then provided a summary that was fine, just not categorized correctly (which is why I rewrote the GPT).

I have used ChatGPT to assist in creating this, so I frequently ask it to self-diagnose, and it has come to the conclusion that it doesn’t parse the file because it can’t. I know this is incorrect because it does it all the time. The way it explained it is that it gets the file in “blobs,” and that doesn’t explain it well enough for me. So I decided to post here to ask if I was doing anything wrong, and as I have read, I am doing a lot wrong. Thank you for the assistance; I am grateful for all of y’all’s help.

Please, if you have any more advice, I will take all of it. What can I alter in my instructions to get it to read the second uploaded file, and all the rest of them?


I just think it would be more efficient if I could continue with one chat for a while before resetting, but it’s not completely unacceptable.


Look, I have a prompt that will probably help you a lot, Tom, ok? It’s a prompt that creates specialists, one by one, which are specialized for your specific problem.


I won’t explain the whole specialist logic now, but here’s the idea: each specialist is designed to directly understand and solve the issue. I have this prompt in a private MyGPT session, and I can share the link until tomorrow.

All you need to do is paste a situation there—more details are better. I copied the situation from this topic myself and added a few notes. Once you paste it, the system will start creating specialists (it could be 2, it could be 10, depending on the case).

It will also generate an ICF, which measures the probability that your problem will be solved. From there, you can either copy the specialist prompts into your GPT, or just keep working inside that session. It will run as a multi-specialist setup, so you’ll be interacting with all specialists at once. You can ask questions to specific ones, but mainly you just follow the flow, and it should guide you toward solving the problem.



Can you share an example transcript? I built an app that does exactly this (upload audio and transcribe, or upload transcripts), then runs a custom summarizer and then runs analytics prompts.

Or DM me and I will send you the link so you can give it a try yourself.
