I have loaded a dataset of about 50K rows of survey data. There are 8 columns, two of which are free-text. When I ask it to pull quotes that demonstrate a given topic, it consistently hallucinates the quotes: each one starts off matching my source data, then quickly drifts into nonsense. What am I getting wrong here? Is there a way to force ChatGPT to return my source data exactly as it appears?
Here’s an example of a prompt: “Show me some examples of respondents developing as leaders and mentors, skills like delivering feedback, coaching, mentoring, delegating work, etc. Include the full quote from the time card.”
GPT-4 can only handle up to ~8,000 tokens of context, so feeding it 50K rows of data can't work; it will effectively only keep the last ~8,000 tokens of whatever you paste. And if that wasn't bad enough, there are more problems on top of that.
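If you want to see how little actually fits, you can count tokens locally with OpenAI's tiktoken library before pasting anything. A minimal sketch, assuming a CSV export; "survey.csv" and the row-to-text formatting are placeholders for your own data:

```python
# Minimal sketch: count how many tokens the survey actually is,
# and how many rows fit in a ~8,000-token context window.
# "survey.csv" is a placeholder for your own export.
import pandas as pd
import tiktoken

df = pd.read_csv("survey.csv")
enc = tiktoken.encoding_for_model("gpt-4")

# Render each row as plain text and count its tokens
row_text = df.astype(str).agg(" | ".join, axis=1)
row_tokens = row_text.map(lambda s: len(enc.encode(s)))

print(f"total tokens: {row_tokens.sum():,}")
print(f"rows that fit in 8,000 tokens: {(row_tokens.cumsum() <= 8000).sum()}")
```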
Thanks for the heads up! I tried again with only 100 rows of data (8,100 tokens) and it still only paraphrases the quotes, all while insisting they are exact. Still, if the tool can only handle 100 rows at a time, I would rather just look at the data manually, or script the lookup as in the sketch below…
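For what it's worth, pulling verbatim quotes by topic doesn't really need an LLM at all; a plain keyword filter returns the rows untouched. A rough sketch, where the file name and the comment_1/comment_2 column names are stand-ins for your own:

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # placeholder path

# Keywords for the leadership/mentoring topic from the prompt above
pattern = r"feedback|coach|mentor|delegat"

# Match either free-text column, case-insensitively (column names are assumptions)
mask = (
    df["comment_1"].str.contains(pattern, case=False, na=False)
    | df["comment_2"].str.contains(pattern, case=False, na=False)
)

# Every quote printed here is copied straight from the source data
# (only the first free-text column is shown, for brevity)
for quote in df.loc[mask, "comment_1"].head(10):
    print(f'- "{quote}"')
```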
I also find it difficult to use ChatGPT to analyze tables & numbers. In case it helps, I tried to develop a custom GPT that self-reports hallucinations; feel free to test it out in your research here, and good luck!
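If you'd rather not rely on the model policing itself, an exact-substring check against the source text flags paraphrased quotes directly. A minimal sketch; the quote list and column names are stand-ins:

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # placeholder path

# Concatenate both free-text columns into one searchable corpus
corpus = (
    " ".join(df["comment_1"].dropna().astype(str))
    + " "
    + " ".join(df["comment_2"].dropna().astype(str))
)

# Quotes the model claimed were verbatim (stand-in values)
returned_quotes = [
    "I learned to give constructive feedback to my team.",
]

for q in returned_quotes:
    verdict = "verbatim" if q in corpus else "paraphrased or hallucinated"
    print(f"[{verdict}] {q}")
```

In practice you'd probably want to normalize whitespace and curly quotes on both sides before comparing, since the model often changes those even when the words match.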