I can’t seem to get this to work properly. I’m confused about whether it’s actually possible; I’ve seen conflicting answers without sources.
What I want to do: Have a Custom GPT read from an online Google Sheet at https://docs.google.com/spreadsheets/ which would effectively function as a straightforward knowledge source, without having to do more extensive API work.
It seems right now that it isn’t working as expected. The GPT hallucinates and makes up new info not in the spreadsheet. I’m getting more reliable results when I export and UPLOAD the spreadsheet as a fixed file, but I want to have the Custom GPT work with the spreadsheet in near-realtime, without those extra steps.
What is the current expected behavior? Clarity is appreciated.
RAG just doesn’t work that way in real time. You have to upload the document anyway, either manually or through automation that simulates an upload. Each upload has an embedding cost.
What I’m saying is that when the model accesses a Google Sheet it’s really just parsing the static HTML of that page—it’s basically scraping the page contents.
If you’ve ever done any web scraping you’ll know that web pages which rely heavily on JavaScript can be a real pain to scrape because the actual content is built and loaded dynamically.
In this case, the utility that OpenAI uses for ChatGPT to access web pages very likely doesn’t even get any of the actual content of the target sheet, but rather just receives the initial HTML which is used to eventually build and render the page, so there’s not actually any information available for the GPT model to do anything with.
Therefore, building an action which involves the Google API for retrieving a .csv file of the Google Sheet you are interested in is the only sensible way to link this data to a custom GPT.
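To make the retrieval step concrete, here is a minimal Python sketch. Note that it reads the sheet through the Sheets API v4 `spreadsheets.values.get` endpoint, which returns JSON rather than a .csv file. The spreadsheet ID, range, and API key are placeholders, and the helper names (`values_url`, `rows_to_records`, `fetch_records`) are made up for illustration. As far as I know, a plain API key only works for sheets that are publicly readable; private sheets would need OAuth instead.

```python
import json
import urllib.parse
import urllib.request

SHEETS_API = "https://sheets.googleapis.com/v4/spreadsheets"


def values_url(spreadsheet_id: str, cell_range: str, api_key: str) -> str:
    """Build the spreadsheets.values.get URL for one A1-notation range."""
    quoted_range = urllib.parse.quote(cell_range, safe="")
    query = urllib.parse.urlencode({"key": api_key})
    return f"{SHEETS_API}/{spreadsheet_id}/values/{quoted_range}?{query}"


def rows_to_records(values):
    """Turn [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not values:
        return []
    header, *rows = values
    records = []
    for row in rows:
        # Rows can come back shorter than the header, so pad them with "".
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


def fetch_records(spreadsheet_id: str, cell_range: str, api_key: str):
    """Fetch one range over the network and return it as a list of dicts."""
    url = values_url(spreadsheet_id, cell_range, api_key)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return rows_to_records(payload.get("values", []))
```

Calling `fetch_records("YOUR_SHEET_ID", "Sheet1!A1:C10", "YOUR_API_KEY")` would then give you one dict per data row, with the first row of the range treated as the header.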
Using the Google Sheets API is not trivial, but it’s doable.
Building the action requires describing the endpoint in an OpenAPI v3 specification, which you can read about here.
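For a rough idea of what such a specification looks like, here is a sketch that builds a small one as a Python dict and prints it as JSON. Everything specific in it is an assumption for illustration: the server URL `https://sheets-gateway.example.com`, the `/rows` path, the `getSheetRows` operation ID, and the response shape. Your own endpoint will differ.

```python
import json


def build_action_spec(server_url: str) -> dict:
    """Return a minimal OpenAPI 3.1 document describing one GET endpoint."""
    return {
        "openapi": "3.1.0",
        "info": {
            "title": "Sheet rows",
            "version": "1.0.0",
            "description": "Returns the current rows of a spreadsheet.",
        },
        # Hypothetical server; replace with wherever the endpoint is hosted.
        "servers": [{"url": server_url}],
        "paths": {
            "/rows": {
                "get": {
                    # The operationId is the name the model uses for the call.
                    "operationId": "getSheetRows",
                    "summary": "Get the current spreadsheet rows",
                    "responses": {
                        "200": {
                            "description": "Rows keyed by column header",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "object",
                                        "properties": {
                                            "rows": {
                                                "type": "array",
                                                "items": {"type": "object"},
                                            }
                                        },
                                    }
                                }
                            },
                        }
                    },
                }
            }
        },
    }


spec = build_action_spec("https://sheets-gateway.example.com")
print(json.dumps(spec, indent=2))
```

The printed JSON is what you would paste into the action's schema field. Clear `summary` and `description` text matters here, since that is what the model reads when deciding whether to call the endpoint.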
You can either call the Google API directly or set up essentially an API gateway on your own server (preferred) so you can further control the response you provide to the model in terms of how the sheet data is represented.
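The gateway idea could be sketched like this, using only the Python standard library. This is an assumption-laden illustration, not a production server: `load_records` returns hard-coded sample rows where a real gateway would fetch the sheet, and the `/rows` path, `shape_response` helper, and `max_rows` limit are all made up for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_records():
    """Placeholder data source; a real gateway would fetch the sheet here."""
    return [
        {"name": "apple", "price": "1.20"},
        {"name": "pear", "price": "0.90"},
    ]


def shape_response(records, max_rows=50):
    """Trim and label the data so the model gets an explicit, bounded payload."""
    rows = records[:max_rows]
    return {
        "columns": list(rows[0].keys()) if rows else [],
        "row_count": len(rows),
        "truncated": len(records) > max_rows,
        "rows": rows,
    }


class SheetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string and serve a single hypothetical path.
        if self.path.split("?", 1)[0] != "/rows":
            self.send_error(404, "Unknown path")
            return
        body = json.dumps(shape_response(load_records())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create the server without starting it; call serve_forever() to run."""
    return HTTPServer(("127.0.0.1", port), SheetHandler)
```

Running `make_server().serve_forever()` starts it locally. The point of `shape_response` is the control mentioned above: you decide the column names, row limit, and flags like `truncated` that the model sees, instead of passing through whatever the upstream API returns.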