I am working on an application that wants to use the user’s own identity documents to help them enter information faster and more accurately. In my own user preview testing of a GPT I am developing, I tried to upload the bio page of my passport and it was rejected as “unreadable”. I tried again in a ChatGPT session and was told it was rejected for security reasons as being personally identifying information.
I’d like to understand how we can build an app that accepts PII. There must be a way this can be worked out. Perhaps after login/auth of the user? I don’t know, but I am hoping to start the conversation here. Would an enterprise account allow this usage?
PS: I would have put a “PII” tag on this post if one were available.
Honestly, I think you’d get there quicker using open-source models and/or OCR. The thing is, privacy is no joke, and meeting the GDPR/COPPA privacy bar takes a lot of manpower. Not to mention that if PII somehow leaked into training data, it would be there forever. So I think they’re playing it safe by not allowing any PII to show up at all.
Thanks very much for the scanning ideas. Indeed, Textract is an option too.
But I still hope this thread keeps going on how to build on OpenAI’s platforms with PII, because regardless of how the data gets converted to text, it’s still PII. I still think the APIs need to account for this and allow users to safely build apps that take PII.
I have the same problem. I have tried Google Vision, Document AI and AWS Textract, and they all have their own problems with OCR: escaped characters, etc. With 600 total tokens, I was able to build a solution. Plus, the cost of ChatGPT-o has halved, and it’s actually cheaper to use ChatGPT than other services. I also don’t have to train ChatGPT; it just works amazingly well. OpenAI provides an opt-out option so that data you send through the API is not used to train ChatGPT.
The privacy issue still exists with the third-party services.
So I got in touch with the support team, and here is what they came back with:
Extracting Personal Information from ID Cards:
OpenAI’s models, including the ChatGPT API, are designed to respect privacy and data security. However, using the API to extract personal information from ID cards may not align with OpenAI’s usage policies, which prohibit the use of its tools for activities that involve the generation or extraction of sensitive personal information. For more details, you can review our Usage Policies.
Ensuring Data Privacy:
OpenAI does not use data submitted by customers via the API to train or improve our models unless customers explicitly opt-in to this. Since you have already opted out, your data, including any ID images uploaded, will not be used to train our models. For more information on how we handle data, you can refer to our Privacy Policy.
Basically, it’s a no! I can’t use ChatGPT-o to extract personal information.
To give you some background on why I am looking into the OpenAI OCR + GPT option for extracting information:
It’s simple to deal with passports because they have an MRZ (machine-readable zone), but with other types of documents the problem gets harder and harder. Take EU national ID cards, for example: there are policies in place, but governments don’t follow the rules. Don’t ask why, because I have no clue. In the end, my aim is to provide a smooth process for the customer and save time. Using AWS Textract requires training and testing, which is OK, but it’s not reliable.
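On why passports are the easy case: the MRZ layout is fixed by ICAO Doc 9303 (the TD3 format is two lines of 44 characters), and the key fields carry check digits. That means OCR output can be parsed and validated locally before anything goes to a third-party service. Here is a minimal sketch in Python, assuming the two MRZ lines have already been OCR'd into plain strings (by Textract, Tesseract or similar). The names `parse_td3`, `check_digit` and `PassportMRZ` are hypothetical. The sketch does not resolve date centuries, handle transliteration, or cover other layouts such as the TD1 format used on ID cards.

```python
from dataclasses import dataclass

WEIGHTS = (7, 3, 1)  # ICAO 9303 repeating check-digit weights


def char_value(c: str) -> int:
    """Map an MRZ character to its ICAO 9303 numeric value."""
    if c in "0123456789":
        return int(c)
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10  # A=10 ... Z=35
    if c == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {c!r}")


def check_digit(field: str) -> int:
    """Weighted sum (7, 3, 1, 7, 3, 1, ...) of the field values, modulo 10."""
    return sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def _verify(field: str, digit: str, name: str) -> None:
    expected = str(check_digit(field))
    if digit != expected:
        raise ValueError(f"{name}: check digit {digit!r}, expected {expected}")


@dataclass
class PassportMRZ:  # hypothetical container for the parsed fields
    issuing_state: str
    surname: str
    given_names: str
    document_number: str
    nationality: str
    birth_date: str   # YYMMDD as printed; century is not resolved here
    sex: str
    expiry_date: str  # YYMMDD as printed
    personal_number: str


def parse_td3(line1: str, line2: str) -> PassportMRZ:
    """Parse and validate a two-line TD3 (passport) MRZ."""
    line1, line2 = line1.strip(), line2.strip()
    if len(line1) != 44 or len(line2) != 44 or not line1.startswith("P"):
        raise ValueError("not a TD3 passport MRZ (expected 2 x 44 chars, type 'P')")

    _verify(line2[0:9], line2[9], "document number")
    _verify(line2[13:19], line2[19], "birth date")
    _verify(line2[21:27], line2[27], "expiry date")
    optional = line2[28:42]
    # An empty personal-number field may carry '<' instead of a check digit.
    if not (line2[42] == "<" and optional.strip("<") == ""):
        _verify(optional, line2[42], "personal number")
    # Composite check covers document number, birth date, expiry and optional data.
    composite = line2[0:10] + line2[13:20] + line2[21:43]
    _verify(composite, line2[43], "composite")

    # Name field: SURNAME<<GIVEN<NAMES, padded with '<'.
    surname, _, given = line1[5:].partition("<<")
    return PassportMRZ(
        issuing_state=line1[2:5].replace("<", ""),
        surname=surname.replace("<", " ").strip(),
        given_names=given.replace("<", " ").strip(),
        document_number=line2[0:9].rstrip("<"),
        nationality=line2[10:13].replace("<", ""),
        birth_date=line2[13:19],
        sex=line2[20],
        expiry_date=line2[21:27],
        personal_number=optional.rstrip("<"),
    )


if __name__ == "__main__":
    # ICAO Doc 9303 specimen MRZ (fictional data)
    l1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
    l2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
    mrz = parse_td3(l1, l2)
    print(mrz.surname, mrz.given_names, mrz.document_number)
    # prints: ERIKSSON ANNA MARIA L898902C3
```

The sample lines are the specimen MRZ published in ICAO Doc 9303 (a fictional "Utopia" passport), not real data. Rejecting on a check-digit mismatch is a cheap way to catch OCR misreads such as O/0 or B/8 before the values reach the form.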