For context: I work as a faculty member at a medical school. I have been using ChatGPT since March 2023, with over 1,260 conversations, primarily for research and medical education. I subscribed to ChatGPT Plus in September 2023. My school does not have a formal AI or GenAI policy, despite my urging since November last year, when I drafted one.

On March 14, out of the blue, our IT department put out a guideline for using GenAI, based on security and privacy concerns in a healthcare environment. It contained Do's and Don'ts and was based on policy from Gartner.com. One flaw in the communication: "Any information entered into a user prompt, or shared in the use of ChatGPT, Gemini or other GenAI tools may appear in another user's output." To my knowledge and experience, this has occurred once, from an account takeover, not from OpenAI insecurity. This past week, I was blocked from uploading a spreadsheet into GPT-4. IT had indeed blocked that action. After immediate complaints and some push from upper administration, the block was removed. However, this statement appeared in the final resolution: "Generative AI as a category has been flagged for its potential harm and OpenAI ChatGPT specifically was identified as High Risk."
I need advice and information to argue against this view. To avoid any misunderstanding: privacy, security, and legal compliance are utmost priorities for me. However, I do not think our IT department has evidence of any breach in these areas, nor do they fully understand how GenAI works. Thank you.
Welcome to the community!
Unfortunately, I can’t help you argue against that guidance.
1. While OpenAI is really sneaky in its terminology, they have so far never stated that they don't use chats for training. Their privacy policy links to this article: https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed#h_2df02d4917 - and while it says that they're trying their best to scrub out personal data, they are not saying that they don't use your data. And trying their best is no guarantee that personal data won't land in a future model.
2. Following point 1, we know OpenAI advertises that their Enterprise plan doesn't use your data for training. Since that is an explicit selling point, you can expect that on the other plans they at least might use your data for training purposes.
I don’t think your IT department, or the guidance here, is wrong. I personally think it would be irresponsible to upload any sensitive data to ChatGPT. While unlikely, there’s no guarantee that your data won’t land in GPT-5. But there are more issues.
The API offers slightly more relief in that regard, but only in a limited fashion. While they don’t use API data for training, they do retain it, and that data may be inspected by a human. It wouldn’t be the first time that data accessible to privileged insiders got leaked, whether it’s Trump’s tax returns or the woman whose private photos were leaked from her Apple account.
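For concreteness, here is a minimal sketch of what "going through the API" looks like with the official openai Python package. The model name and prompt are purely illustrative, and the retention/inspection caveats above still apply; check OpenAI's current API data-usage terms rather than taking this comment as authoritative.

```python
import os
from openai import OpenAI

# Sketch only: API traffic is handled under the API data-usage terms rather than
# the consumer ChatGPT terms, but requests may still be retained for a period
# and reviewed for abuse monitoring (verify the current policy yourself).
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a research assistant."},
        {"role": "user", "content": "Outline a review paper on sepsis biomarkers."},  # no PHI/PII here
    ],
)
print(resp.choices[0].message.content)
```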
The only way to remove the human from the system is to exercise a processing exemption with Azure. But even then, you’re still sending sensitive data to Azure.
Overall, this is a gigantic liability issue. Your organization’s legal department needs to square data processing agreements with your AI provider before you should even consider sending potentially sensitive data to the AI.
What can you do? You need to ask your organization for a sanctioned alternative. If you are an enterprise customer of a cloud provider, that provider likely has a solution that could work for you, whether it’s Azure OpenAI, IBM watsonx, ChatGPT Enterprise, or something on AWS Bedrock. But using a private ChatGPT account… please don’t.
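To make the "sanctioned alternative" idea concrete, here is a minimal sketch of calling a model through Azure OpenAI with the official openai Python package. The endpoint, deployment name, and API version are placeholders, not your organization's actual values; the point is that traffic goes to your institution's own Azure resource under its enterprise agreement rather than to a personal ChatGPT account.

```python
import os
from openai import AzureOpenAI

# Sketch only: endpoint, deployment name, and api_version are hypothetical placeholders.
# Requests go to your organization's Azure OpenAI resource, governed by the data
# processing terms your legal department has agreed with Microsoft.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

resp = client.chat.completions.create(
    model="your-gpt4-deployment",  # the deployment name configured in Azure, not a raw model name
    messages=[{"role": "user", "content": "Draft a lecture outline on antibiotic stewardship."}],
)
print(resp.choices[0].message.content)
```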
We do not really know what OpenAI does with our data.
You type something into ChatGPT and it goes into training. Knowledge workers grade it. It can be used as language data to train future models. It can be sold to other parties. It can be silently subpoenaed by governments. It can be auctioned to the highest bidder if the company goes bankrupt. Database corruption can display chats to the wrong users…
API use through a managed platform would be a better case for an institution.
Yes, look at “open source” models and hosting your own instance.
That said, you will need some pretty expensive hardware, and none of the open-source models approach the power of the big players.
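Purely as an illustrative sketch (the model choice and hardware assumptions are examples, not recommendations), self-hosting an open-weight model with Hugging Face transformers can look like this; nothing leaves the local machine, which is the whole point, but a 7B-parameter model is far from GPT-4-level capability and typically still wants a GPU with substantial VRAM or quantization.

```python
# Sketch of a fully local, self-hosted open-source model via Hugging Face transformers.
# Model name and hardware expectations are illustrative assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",  # spread across available GPU/CPU memory
)

out = generator(
    "Summarize the key privacy considerations for using LLMs in medical education.",
    max_new_tokens=200,
)
print(out[0]["generated_text"])
```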
Perhaps in time a privacy focused product will emerge.
Are these policies from Gartner listed on their homepage somewhere? If so, can you point me towards them? I know a few people there I can bark at, but I can’t promise anything.
Hi!
You can expect that everything you put into ChatGPT will be used for training.
The ChatGPT Team subscription, however, comes with a promise not to use your data for training, similar to the API. In short, the Team subscription is for business customers and ChatGPT Plus is for private individuals.
According to the terms of service you are not allowed to send personally identifiable information to the model. If your research and education use cases do not involve this type of sensitive information, you should be fine with a Team subscription.
From what I can see in your request, these two factors should be a solid start for a constructive discussion with your IT department.
I have not been able to track down the policies on the Gartner site.
You may want to opt out of data collection for personal accounts. Maybe that will satisfy the requirements? I’ve done it myself for my Plus account, but that’s just my standard security posture for every service that offers it. Unfortunately, I didn’t keep the link I used to opt out.
A quick search on Hacker News turned up:
Yeah, I think the summary here is:
- keeping private data out of ChatGPT (or opting out of training on a personal account)
- using ChatGPT Enterprise to avoid your data going into OpenAI’s training
- using the API for the same purpose
I have some experience with data collection, cybersecurity, DLP, etc. I’ve worked in IT and Engineering throughout my career.
tldr1 - Context and Insight Into IT
It’s possible your IT department is not using a risk-based approach to chatbots; instead, they’ve implemented a zero tolerance approach, which may be overly restrictive.
tldr2 - Solution
The best course would be for an IT executive or your department to buy an Enterprise license from OpenAI and to specify in the agreement that none of your prompts or uploaded information will be used to train OpenAI’s LLMs.
Rationale for the Solution
Without a formal agreement in place (tldr2), I would side with your IT team. There is no commitment, promise, SLA, or SLO that OpenAI will not use the research data you’ve given them as part of a response to another user. There is too much risk, without any apparent reward.
In other words
- If your research is that critical to the medical school, then
- the value of ChatGPT outweighs the risk, and
- to further mitigate the risk and increase the value, a formal agreement can be made with OpenAI via an Enterprise license agreement.
All bases are then covered.
Summary
I am sorry we could not help you more, but one clear solution stands out: the value of your research has to be higher than the cost of an Enterprise license with OpenAI. If it is, then great, get an Enterprise license. Have an IT VP pay for it or sign off on it, or pay for it out of your operating budget, show the agreement to IT, and ask them to turn off the data checks for your IP address/account. All is then well.
Background Info
No need to read further unless you want a deeper understanding
Let me share what can happen if you don’t have an agreement in place. OpenAI can store your files indefinitely on its servers. If there’s a breach, there’s no financial recourse for you or your school, because all you have is a one-off free or individual paid license.
Furthermore, FWIW, I would personally be concerned about a researcher using ChatGPT. ChatGPT does not follow the research method of sharing its sources; instead, it requires the human prompter to determine the sources, which is nearly impossible without knowing (at least generally) what ChatGPT’s sources are.
- How can a model that hallucinates be trusted for research? Even with a human attempting to verify sources?
- Well, why use chatbots at all? No, no, this is a bad extrapolation. Using a chatbot for coding is fine; that is very different, because the programmer has to test the code as part of their release process. As such, the code’s source is immaterial because the code will be exhaustively tested. Now, if programmers use ChatGPT and release without testing, that’s very irresponsible, and very uncommon!
Thank you for your reply. My experience is completely different. GPT-4 has repeatedly found precise, peer-reviewed sources and cited them correctly in my conversations. The sources were legitimate, accurate, and contained the necessary details being referenced. I have tested ideas, formulated hypotheses, challenged faulty thinking, developed complete review paper outlines, found the cutting edge in a wide range of fields, and refuted peer reviewers’ critiques, all with evidence-based content published among the more than 212 million papers that exist. I do not blindly trust GPT-4 output. In my experience, the days of consistent hallucinations are quickly fading away. I appreciate your concern, but I kindly suggest that it is born of inexperience and misinformation.

Further, the landscape of AI-driven search and analysis tools is evolving at a rapid rate. Semantic Scholar, Elicit.com, ResearchRabbit, and Consensus are just several that I use nearly every day. These tools, coupled with informed GPT-4 prompts and output, are quickly revolutionizing how research is performed. That does not include all the ways non-generative AI is finding new drugs, proteins, disease-associated genes, and [include your favorite topic]. We are in a new era, which is light-years from when the WWW was forming, events that I witnessed firsthand. I leave you with a statement on how NEJM AI views using LLMs in the preparation of research papers for submission.
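For readers curious how these literature tools plug into a workflow, here is a small illustrative sketch of querying Semantic Scholar's public Graph API. The query string is made up for this example; the endpoint and parameters come from the documented public API, but treat this as a sketch rather than a vetted research pipeline.

```python
# Sketch: search Semantic Scholar's Graph API for candidate papers.
# Query text is illustrative; fields and limit can be adjusted as needed.
import requests

resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={
        "query": "large language models medical education",
        "fields": "title,year,externalIds",
        "limit": 5,
    },
    timeout=30,
)
for paper in resp.json().get("data", []):
    print(paper.get("year"), paper.get("title"))
```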
This is a good example. The topic is ChatGPT, IT, and data security, not really hallucinations; that was a side comment (at best). It wasn’t mentioned once in the tldr.
But it demonstrates a lot about how we debate online. We find a point we have the experience to disagree with, and then we home in on it and swerve the conversation to that tertiary point.
- The OP stated they’ve used ChatGPT, without listing a version number.
- The OP’s usage date is 2023 forward; we don’t know if that’s January 2023 forward or March 2023. That’s important because GPT-4 arrived in late Q1 2023.
- Also, unless I missed it, the OP seemed to imply it was the free or individual paid version; otherwise, the Enterprise license agreement would’ve stipulated that none of the information would be used for training.
- The researcher could’ve shown that agreement to the IT team, and I think the conversation would’ve been resolved quickly.
That’s the main and most important point for us to focus on to move this thread forward.
I am very glad that ChatGPT 4 has addressed any concerns about hallucinations for you.
I’m also glad that ChatGPT 4 shows its sources.
I’ll accept your experience with ChatGPT 4, but again, that is far off the main point.