Hello,
I have a HIPAA-compliant application where we use OpenAI. We currently have a signed BAA with OpenAI and use the zero-retention APIs (which is how we remain HIPAA compliant!).
We are very interested in using the newly released features like assistants, threads, etc., but they do not fall under the zero-retention API purview, per this article: https://platform.openai.com/docs/models/how-we-use-your-data (because, by the nature of the features, they store data).
Can someone share whether it is on the roadmap to make these new APIs usable in a HIPAA-compliant manner? And if so, is there a timeline for that?
Thank you!
HIPAA compliance is not about zero retention, it is about secure management of retained data at every stage in the data chain, the BAA you have covers that for API related actions.
So long as you have a breach protocol and register, and you have BAAs with everyone in the data custody chain, you are compliant.
While it is true that zero retention is not a technical requirement of HIPAA, OpenAI’s approach to HIPAA compliance is to sign a BAA that covers only the API endpoints that support zero retention, and then to turn zero retention on.
Anyone else have any visibility into whether HIPAA compliance for non-zero-retention APIs is on the roadmap?
Yes, I did ask, and the reply was that it is on the roadmap, but not until the vision model comes out of beta; there is no timeline for when that will happen, though. Realistically I think that means Q3 to Q4 of 2024. It could potentially be in the next few months, although I doubt it.
I’m currently working on ensuring HIPAA compliance for my application, which utilizes the GPT Turbo API. To maintain context on the user’s device, I use protected storage for server app access and IndexedDB for WebAssembly app access. Given that OpenAI does not retain any data, am I correct in understanding that the responsibility for data security in this context falls to the user?
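In case it helps to be concrete, here is roughly what I mean by “protected storage” on the server side: a minimal sketch assuming a Fernet key from the `cryptography` package (in a real deployment the key would come from a secrets manager, not be generated inline).

```python
# Illustrative sketch only: encrypt conversation context at rest on the server.
import json
from cryptography.fernet import Fernet  # pip install cryptography

# In practice the key comes from a secrets manager or OS keystore;
# generating it inline means nothing survives a process restart.
key = Fernet.generate_key()
fernet = Fernet(key)

def save_context(path: str, messages: list[dict]) -> None:
    """Serialize the chat context and write it to disk encrypted."""
    with open(path, "wb") as f:
        f.write(fernet.encrypt(json.dumps(messages).encode("utf-8")))

def load_context(path: str) -> list[dict]:
    """Read, decrypt, and deserialize the chat context."""
    with open(path, "rb") as f:
        return json.loads(fernet.decrypt(f.read()))

save_context("context.bin", [{"role": "user", "content": "..."}])
print(load_context("context.bin"))
```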
As a clinician, practice owner, and developer, I can tell you one of the easiest ways to deal with HIPAA is to not be involved with it at all. Remove the PHI from the data you pass in and you never have to deal with it. There are several existing tools that can do this for you. Just scrub the PHI before passing the data to the APIs, and the HIPAA rules no longer apply.
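To make the idea concrete, here is a deliberately naive sketch of the “scrub before sending” step. The patterns are hypothetical stand-ins; real de-identification tools use NER models and much broader rule sets than a handful of regexes.

```python
# Toy illustration only: a few regex passes, nowhere near full Safe Harbor
# de-identification of all 18 HIPAA identifiers.
import re

# Hypothetical patterns; production tools combine NER models with curated rules.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text: str) -> str:
    """Replace matches with a bracketed tag before the text leaves your boundary."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

note = "Pt called from 555-867-5309 on 01/05/2024 re: refill."
print(scrub(note))  # Pt called from [PHONE] on [DATE] re: refill.
```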
Hey @mark.pruitt – also facing the same challenge, and your approach makes sense. Could you refer me to some of these tools if you’ve seen success here, ideally more lightweight ones? Thanks!
I’d love to hear what tool(s) would do that on an “industrial scale”. Scrubbing away the 18 identifiers of PHI is not tantamount to scrubbing away “PHI”.
“Patient has had worsening insomnia due to the stress associated with events surrounding Jan 6 … has stopped using orange hair dye and does not feel finasteride is as effective in stemming hair loss …”
Is above PHI? I.e., information that could be used to plausibly reconstruct a person’s identity?
Up to lawyers I suppose.
And how does someone propose that we “scrub” that kind of information “programmatically”?
Of course, perhaps we’ll just use “AI” to scrub it clean so that we could then use it with AI …
Who knows, when Blackwell Gen 2 or Gen 3 arrives, we may be able to have locally hosted or “on-prem” AI assistants, which would make full HIPAA compliance far more attainable, since the PHI would never leave your own infrastructure …
Generally you want to remove information that can be used to identify an individual. Removing name, DOB, address, phone numbers, and localized ZIP codes is the obvious part. Where it is not obvious is for unique individuals, like “a 7’1" former center for the Los Angeles Lakers”, who could easily be identified. Google’s Healthcare APIs are certainly industrial scale, and Google will sign a BAA with you for free. Here is a link to their de-identification API: De-identifying sensitive data | Cloud Healthcare API | Google Cloud. Of course, this assumes FHIR data, which most every EHR should be producing if that is your data source.
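For anyone curious what that call looks like, below is a rough sketch against the v1 `datasets:deidentify` endpoint. The project, location, and dataset names are placeholders, and the exact config schema should be checked against the current API reference before relying on this.

```python
# Sketch: kick off a de-identification job with the Cloud Healthcare API.
# Resource names below are placeholders; adjust to your own project layout.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

base = "https://healthcare.googleapis.com/v1"
source = f"projects/{project}/locations/us-central1/datasets/phi-dataset"
body = {
    # The scrubbed copy is written to a new dataset; the source is untouched.
    "destinationDataset": f"projects/{project}/locations/us-central1/datasets/deid-dataset",
    "config": {
        # An empty infoTypes list applies redaction to the default info types.
        "text": {"transformations": [{"infoTypes": [], "redactConfig": {}}]}
    },
}

# :deidentify returns a long-running operation; poll its name for completion.
resp = session.post(f"{base}/{source}:deidentify", json=body)
resp.raise_for_status()
print(resp.json()["name"])  # e.g. projects/.../operations/...
```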
Just came across this again, and I wanted to highlight this as a great alternative and IMHO best practice.
Having high-quality data available leads to better-trained AIs. Having that data anonymised so that you do not fall under HIPAA is the best of both worlds: you get to use your high-quality data while ensuring compliance and best practice, and if any breach were to occur, the data is anonymous.