I’m testing a GPT that has a single 3MB CSV file (20K rows) attached. It is a catalogue of historical objects.
I have written an instruction prompt that urges the GPT to be accurate when presenting the data, and not to change any text from the “Description” field when answering questions.
However, ChatGPT always changes the Description. For example, I can ask “Show me the record for object ID123”. It analyses the CSV and presents a believable-looking record, listing each field. It can do this for any of the 20K records, and it finds the right record every time: the searching works. But…
When I look closely, it is making things up. When I ask it about a portrait, it adds information to the Description that just isn’t there, such as the artist’s name (always a name taken from somewhere else in the data), which way the subject is sitting, etc.
Real data from CSV: “A portrait of a man lit by candlelight dated 1780. Artist and subject unknown”
ChatGPT output: “A portrait of a man lit by candlelight dated 1780. He is sitting on a chair facing left. The signature has been identified as well-known local artist John Smith.”
The same happens with the object’s location. A value in the “Location” field such as “Room 1, Wall A” might appear as “Room 1, Wall A, hangs above the door”. It has simply made up “hangs above the door”.
I have tried some crafty prompting in the GPT instructions. I have told it in many ways to present only the actual data. I’ve asked it to write Python code that extracts only the field values, without amendment or summarisation, and used every synonym I can think of to try to get it to give me the real, unadulterated records. But it never works.
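A minimal sketch of the kind of extraction script described above, using only Python’s standard csv module. It assumes the ID column is called “ObjectID”, which is a hypothetical name since the real header isn’t given here, and the sample row is illustrative only, assembled from the Description and Location examples quoted above. It looks up one row by ID and prints every field exactly as csv.DictReader parsed it, with no rewording.

```python
import csv
import io
from typing import Optional, TextIO

# Hypothetical column name: the real ID column in the catalogue may be named differently.
ID_FIELD = "ObjectID"


def find_record(f: TextIO, object_id: str, id_field: str = ID_FIELD) -> Optional[dict]:
    """Return the first row whose id_field matches object_id, values exactly as stored."""
    for row in csv.DictReader(f):
        if (row.get(id_field) or "").strip() == object_id:
            return row  # values are returned untouched: no summarising or rewording
    return None


def format_record(row: dict) -> str:
    """Render each field as 'Field: value', one per line, copying values verbatim."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# Illustrative sample only, built from the examples in the post (not a real catalogue row).
SAMPLE_CSV = (
    "ObjectID,Description,Location\n"
    'ID123,"A portrait of a man lit by candlelight dated 1780. '
    'Artist and subject unknown","Room 1, Wall A"\n'
)

record = find_record(io.StringIO(SAMPLE_CSV), "ID123")
print(format_record(record) if record else "No record found for ID123")
```

Because the values come straight from csv.DictReader, any text that is not in the file would have to be introduced after this step, when the printed result is turned into a chat reply.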
I tried splitting the CSV into 10 smaller files, even in a new GPT, and it made no difference whatsoever. I also tried a single smaller file (2K rows), and the same errors and hallucinations occurred.
Can anyone recommend a different approach? Is the kind of output that I want even possible using a Custom GPT? All tips welcomed.
Thanks in advance.