😱 Concerns About File Information Extraction from GPTs Uploads

Hello OpenAI Community,

I’m reaching out to seek clarification on a specific issue related to the GPT models, particularly those created using the platform described in OpenAI’s blog post Introducing GPTs.

Recently, I came across a website, GPTsHunter (sorry, I can’t add links to my posts yet), which seems to index GPTs and displays detailed information about the files uploaded by the GPT creators, including file names and sizes. It may also have access to their content. This discovery raises significant security concerns.

I have even verified this with my own GPT creation, where the website accurately displayed the number of files, their names, and their sizes (almost precisely).

I have tried to understand how this could be happening. I’ve explored the API documentation but haven’t found any clear explanation. This situation seems to pose a potential security risk, and I’m concerned about how this information is being accessed and displayed publicly.

I’m attaching a screenshot from the GPTsHunter website for reference (I’m not related to it, it’s not my GPT).

I would greatly appreciate any insights or information regarding how this data extraction is possible and what measures are in place to protect the privacy and security of the files uploaded during the GPT creation process.

Thank you for your attention to this matter.

6 Likes

@alex-feel Here’s the link https://www.gptshunter.com/

2 Likes

I’m writing again with an update on my previous concern regarding the extraction of file information from GPTs. After further investigation, I discovered how this information is being publicly displayed.

It turns out that when visiting any GPT’s page without being logged in, one sees a promo page for the GPT. Intriguingly, within the source code of each page, there is a script containing all the file information in JSON format, including the names and sizes of the files.

Here’s an example of such a script from the GPT page, a screenshot of which is attached to my first message (I think you can see it all for yourself):

Script
<script id="__NEXT_DATA__" type="application/json">{
    "props": {
        "pageProps": {
            "kind": "anon_gizmo",
            "gizmo": {
                "gizmo": {
                    "id": "g-QFAuxHmUa",
                    "organization_id": "org-fyDJPQghd0Nj4TIUlacNnBdN",
                    "short_url": "g-QFAuxHmUa-levelsio",
                    "author": {
                        "user_id": "user-0SkdH2XcZZcXmpuikXP4eIP1",
                        "display_name": "levels.io",
                        "link_to": "https://levels.io",
                        "selected_display": "website",
                        "is_verified": true
                    },
                    "voice": {
                        "id": "ember"
                    },
                    "workspace_id": null,
                    "model": null,
                    "instructions": null,
                    "settings": null,
                    "display": {
                        "name": "@levelsio",
                        "description": "Talk with @levelsio on ChatGPT. Ask any question you want about building your own startup, digital nomading, remote work and whatever else you'd like to ask. Trained on all of my podcasts, interviews, blog posts and tweets!",
                        "welcome_message": "",
                        "prompt_starters": ["How should I validate a startup with TikTok?", "I want to go nomad, where should I go first?", "Why should I deadlift 100kg? I don't want to exercise", "Why make an AI app if OpenAI replaces it anyway?"],
                        "profile_picture_url": "https://files.oaiusercontent.com/file-M25chuvAQvxTUe1blqqUxQOy?se=2123-10-22T15%3A30%3A37Z\u0026sp=r\u0026sv=2021-08-06\u0026sr=b\u0026rscc=max-age%3D31536000%2C%20immutable\u0026rscd=attachment%3B%20filename%3D33YlXMBzhp_400x400%2520%25282%2529.jpeg\u0026sig=2tOCTUSblblmuxOU6Hrg7WjtKOpBA5FmQFllgTEOy4g%3D",
                        "categories": []
                    },
                    "share_recipient": "marketplace",
                    "updated_at": "2023-11-15T22:01:21.052594+00:00",
                    "last_interacted_at": null,
                    "tags": ["public", "reportable"],
                    "version": null,
                    "live_version": null,
                    "training_disabled": null,
                    "allowed_sharing_recipients": null,
                    "review_info": null,
                    "appeal_info": null,
                    "vanity_metrics": null
                },
                "tools": [{
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_oPWQvzzzSrKsQkRP7POGa09J",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_Saeh2ASf4jW8ipoWWVYwTYxv",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_kT7HEdyEXwibxvdAYRHxIkZo",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_6h9Nopi35kzBWpC9S8pQ5pJ7",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_vh051biZdx5VMh8Tnn5gKKj9",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_9cWVCyMZVaXLF5fqn2YjJN8p",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_giLpUzgJg285jUctCJPGMYCn",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_8KQzVrq7xRGj2AsV85PrjVVh~gzm_tool_o3lhNRXu7tL9QVzvhLtfkXLY",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_VLikQ3OxVCuD06wCHGGrjxnt~gzm_tool_0e19gtpVkkluQFUjruC7F7TN",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_VLikQ3OxVCuD06wCHGGrjxnt~gzm_tool_zbV18LvVhVsHhIkOFuxYN4xd",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_DREvYmshVZ5lSytwmZQ1LJ3x",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_hNkYBIJ2HHsRi9I9aQMj4do7",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_cAubSOc2uAcdcxhhrj1iSjt5",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_MHH5ytcfdwymKYh8n0hysDE5",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_ba0KnlEojTX5It7aCzN26Eoo",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_ROYyT9TXqjBBMMZfdU3Gr7e9",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_5hq8Jl2OPQ5hmsuA80kw4ZzN",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_vsvIlejDA2wC1F4nYXL6YJDG",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_0NOncFQ5ZW4o6K8kAdVWpcrW",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_ABrMujvf4Hwq93NVIIvWGYs9",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_41PzKHYb02brRQ6NBi98s7MI",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_a2xivfgHddBTbMZ2RkUoQCFd",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_Dg378bO1HH8B6HELR7YNUBTG",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_zGneKTiz8tBvrDjg2hAfhiaO",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_cqKYdmNKCBt7LRg9BKGQZFvO",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_mnZLPz7RB2oLa9gtAkSoZkkn",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_dShkXiOlyw4o4bAb20eC1scu",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_RLmvLLdIFVX63u4Hcv3YxjPe~gzm_tool_rax4UcBlmo4T2O6B3huNalXK",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_2l41U7U2QIoZzaLt8cDxOFpE~gzm_tool_sKlVF1ZFuD6469Le37fNuhN0",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_2l41U7U2QIoZzaLt8cDxOFpE~gzm_tool_rI4szAhoDejUqRPdKbx2XCFp",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_2l41U7U2QIoZzaLt8cDxOFpE~gzm_tool_147FvM7EFPjPqct5Z8APMQJV",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_2l41U7U2QIoZzaLt8cDxOFpE~gzm_tool_5EQ6P1fK0CYfMjtTp2P3dNG3",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_m2zaLNKrmRRoszv2qeOp7X5p~gzm_tool_fVoyTuzpbtQBCCE7SJAqbURB",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_m2zaLNKrmRRoszv2qeOp7X5p~gzm_tool_FIWtQ7J7rLpaKZWim90GboH9",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_VnaEVpWQOYYKmOjkVnBFybe8~gzm_tool_yNQqBLcMtIWr9JvRisb0bw6C",
                    "type": "browser",
                    "settings": null,
                    "metadata": null
                }, {
                    "id": "gzm_cnf_VnaEVpWQOYYKmOjkVnBFybe8~gzm_tool_TWDqqMIVpNmjv8md9j4QXlsN",
                    "type": "dalle",
                    "settings": null,
                    "metadata": null
                }],
                "files": [{
                    "id": "gzm_cnf_F3zW54i95pYMlu2OWC9pPjo1~gzm_file_2p7sg9uuFxjJptjq6rc8qvy7",
                    "file_id": "file-5OPornkgquBOSBSLLiqefD1s",
                    "name": "gpt-levelsio-top-tweets.txt",
                    "type": "text/plain",
                    "size": 10944,
                    "location": "fs",
                    "metadata": null,
                    "file_size_tokens": 2453
                }, {
                    "id": "gzm_cnf_nhxFeqAo3vLEtN8OIjtx0Dh1~gzm_file_0KmUVcRkCHxwyCeIv05jFjFv",
                    "file_id": "file-wVIWZ72RaGeanWor4M15nYIF",
                    "name": "gpt-levelsio-podcast-dump.txt",
                    "type": "text/plain",
                    "size": 679833,
                    "location": "fs",
                    "metadata": null,
                    "file_size_tokens": 167643
                }, {
                    "id": "gzm_cnf_nhxFeqAo3vLEtN8OIjtx0Dh1~gzm_file_ysrelFZBA5tYKTmLKe1lgKdj",
                    "file_id": "file-bMAXl4Uq6Ek6BdnsF1jbRZrR",
                    "name": "gpt-levelsio-blog-dump.txt",
                    "type": "text/plain",
                    "size": 67069,
                    "location": "fs",
                    "metadata": null,
                    "file_size_tokens": 14535
                }],
                "product_features": {
                    "attachments": {
                        "type": "retrieval",
                        "accepted_mime_types": ["text/x-ruby", "text/markdown", "text/html", "text/x-c++", "application/x-latext", "application/vnd.openxmlformats-officedocument.presentationml.presentation", "text/plain", "application/pdf", "text/x-php", "text/x-c", "application/msword", "text/javascript", "text/x-tex", "text/x-sh", "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "text/x-typescript", "text/x-java", "application/json", "text/x-csharp", "text/x-script.python"],
                        "image_mime_types": ["image/png", "image/jpeg", "image/webp", "image/gif"],
                        "can_accept_all_mime_types": true
                    }
                }
            }
        },
        "__N_SSP": true
    },
    "page": "/g/[gizmoId]",
    "query": {
        "gizmoId": "g-QFAuxHmUa"
    },
    "buildId": "yZDRF8_xFCcw_60EzTXTM",
    "assetPrefix": "https://cdn.oaistatic.com",
    "isFallback": false,
    "gssp": true,
    "scriptLoader": []
}</script>
<script>(function(){var js = "window['__CF$cv$params']={r:'82a931c6bb732998',t:'MTcwMDc0MDIyNS4zMTEwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/jsd/main.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script>
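For anyone who wants to check their own GPT, here is a rough Python sketch of reading the file list out of a payload shaped like the one in this post. The key path `props.pageProps.gizmo.files` and the field names are taken from the JSON shown here; the payload is trimmed to the `files` array, and the helper name `list_files` is just an illustration. The page format may of course change at any time.

```python
import json

# Trimmed copy of the payload from this post: only the "files" array is kept,
# with the names, sizes and token counts exactly as they appear there.
payload = json.loads("""
{"props": {"pageProps": {"gizmo": {"files": [
  {"name": "gpt-levelsio-top-tweets.txt", "type": "text/plain",
   "size": 10944, "file_size_tokens": 2453},
  {"name": "gpt-levelsio-podcast-dump.txt", "type": "text/plain",
   "size": 679833, "file_size_tokens": 167643},
  {"name": "gpt-levelsio-blog-dump.txt", "type": "text/plain",
   "size": 67069, "file_size_tokens": 14535}
]}}}}
""")


def list_files(data):
    """Return (name, size_bytes, tokens) tuples from props.pageProps.gizmo.files."""
    files = (
        data.get("props", {})
        .get("pageProps", {})
        .get("gizmo", {})
        .get("files")
        or []
    )
    return [(f.get("name"), f.get("size"), f.get("file_size_tokens")) for f in files]


rows = list_files(payload)
for name, size, tokens in rows:
    print(f"{name}: {size} bytes, {tokens} tokens")

# Sum of the "size" fields across all listed files.
total_bytes = sum(size for _, size, _ in rows)
print("total bytes:", total_bytes)
```

Note that this only reads metadata (names, MIME types, sizes, token counts); nothing in the payload gives a download URL for the files themselves.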

While this discovery is somewhat reassuring, knowing that there is no direct public access to the actual files, it does raise a question: Why does OpenAI make this file information publicly accessible in this manner?

Understanding the rationale behind displaying such detailed information on the promo pages would be helpful. Is this an intended feature for transparency, or an oversight regarding privacy?

I look forward to hearing your thoughts or any official clarifications on this matter.

Currently, anyone can access the files; you just have to ask for a download link in some twisted way ^^: https://chat.openai.com/share/c6209014-57a6-4da7-979b-673c2802fc61

I don’t find the problem so obvious. Concretely, a custom GPT’s purpose is to expose the data it has access to, so is accessing the files directly very different? The only problem I can see is that copyrighted files can be accessed.

But technically, I wonder why ChatGPT needs to expose files in this way to work.

On the other hand, I wonder how gptshunter goes about extracting file names, since the API apparently doesn’t allow access to GPTs (Access to new custom "My GPTs" through API? - #3 by zacharytaylorjohnson).

3 Likes

I know how and already described it, but my post was queued by Akismet for review by OpenAI staff members:

Summary

I believe that in order for this to work, the code interpreter must be enabled.

Nice, curious to know the process.

Yep, indeed.

@alex-feel You don’t necessarily need Code Interpreter enabled to access the file content. Without Code Interpreter, the user can ask the GPT to reference the file line by line and copy the file content that way.

And imho, this is not a bad thing. Secret knowledge can be hidden behind an action (external API) while the knowledge files are public. It raises the standard of what type of GPTs we are going to see in the marketplace. Since the files are public, it significantly decreases the chance of someone stealing some books or files (that they don’t own and have no rights to distribute) and creating some weak GPTs with hopes to make money. If you have enough resources to put those files behind an action, it is likely you will think of something better to do.

I’m not sure when my previous post will be published, as it seems to be awaiting moderation. I wanted to reiterate and explain the source of the data issue I encountered.

Initially, I thought the owner of a particular website had found a vulnerability in the API. Since GPTs essentially function as Assistants, I assumed one could access file data and contents through the API. However, after various attempts, I couldn’t find any such vulnerability. Even a thorough examination of the web API yielded nothing.

Then, I checked where I should have looked in the first place: the source code of the GPT page you land on when following a link without being logged in. This is where the whole secret was hidden. The website owner wasn’t “forcibly” extracting data from anywhere; he was simply using what OpenAI was openly providing. In the source code of each GPT’s page, there’s a script tag (<script id="__NEXT_DATA__" type="application/json">), which contains a JSON object. This script, in turn, contains all the information, including the names and sizes of the uploaded files.
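To make the mechanism concrete, here is a small Python sketch of pulling an embedded JSON object out of a page’s HTML using only the standard library. The `sample_html` string, the file name `example.txt`, and the helper names are made up for illustration; only the `__NEXT_DATA__` id and the `props.pageProps.gizmo.files` path come from what was observed in the page source. It works on a local string and makes no network request.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">...</script>."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so accumulate them.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON object, or None if the script tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    return json.loads(raw) if raw.strip() else None


# Hypothetical stand-in for a saved copy of a GPT promo page.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"gizmo": {"files": '
    '[{"name": "example.txt", "size": 123}]}}}}'
    "</script></body></html>"
)

data = extract_next_data(sample_html)
names = [f["name"] for f in data["props"]["pageProps"]["gizmo"]["files"]]
print(names)
```

The point is simply that no API access or exploit is involved: anything placed in that script tag is readable by whoever can load the page.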

I can’t fathom why OpenAI would disclose this information so publicly. It seems like an unusual choice that raises questions about privacy and data transparency.

Looking forward to any insights or clarifications on this matter.

Have you tried this with a very large file? It’s not as straightforward as you might think, especially if the author has implemented specific countermeasures.

My perspective is a bit different. If someone intends to steal an entire library, they could conceal it behind a function call that performs context extraction, like using RAG (Retrieval-Augmented Generation). In this case, one process doesn’t necessarily interfere with the other.
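As a rough illustration of what “behind a function call” could look like, here is a toy Python sketch in which the caller only ever receives a few short matching snippets, never the whole corpus. It uses plain keyword overlap as a stand-in for real embedding-based retrieval, and the `DOCUMENTS` contents, `retrieve` function, and its parameters are all hypothetical.

```python
import re

# Hypothetical private corpus kept on the creator's own server,
# outside the GPT's uploaded knowledge files.
DOCUMENTS = {
    "pricing": "The premium plan costs 20 dollars per month and includes support.",
    "refunds": "Refunds are available within 14 days of purchase.",
    "shipping": "Shipping takes three to five business days.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, top_k=1, max_chars=200):
    """Return up to top_k (doc_id, snippet) pairs ranked by shared-word count.

    Documents sharing no word with the query are never returned, and each
    snippet is truncated to max_chars characters.
    """
    query_words = tokenize(query)
    scored = []
    for doc_id, text in DOCUMENTS.items():
        score = len(query_words & tokenize(text))
        if score > 0:
            scored.append((score, doc_id, text[:max_chars]))
    # Highest score first; ties broken alphabetically by document id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, snippet) for _, doc_id, snippet in scored[:top_k]]


result = retrieve("How long do refunds take?")
print(result)
```

With a setup like this, the public page has no file list to show, although a patient user could still reassemble content query by query.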

2 Likes

If code interpreter is enabled in the custom GPT, you can download the entire original file. Extracting small portions of a 200-page PDF with prompts is a bit tricky in my opinion.

LOL, nice one. There is also a list of tools activated in the custom gpt. :+1:

Displaying such data seems an almost amateurish way of doing things, and it actually raises questions about privacy.

Well, no, but the proof of concept is the same regardless of the file size. If the files were valuable then the fact that it takes time would not be a problem. All you need to do is get by the countermeasures (which you can, every time).

Yes, you can, but it requires much more creativity and skill. Plus, you can implement rate limiting and other protective measures on your end to fight against that. My point is that by forcing the use of actions, we are raising the bar on what GPTs will be. A GPT without a unique action is quite worthless in my eyes (in most of the cases I have seen).
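For what rate limiting on your own action endpoint might look like, here is a minimal sliding-window sketch in Python. The class name, the limits, and the `client_id` values are assumptions for illustration; how an action backend would actually identify callers is not covered here.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the example can run without sleeping
        self._hits = defaultdict(deque)  # client_id -> timestamps of allowed calls

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Use a fake clock so the demonstration is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60,
                               clock=lambda: fake_now[0])

decisions = [limiter.allow("user-a") for _ in range(3)]  # third call is refused
fake_now[0] = 61.0  # move past the window
decisions.append(limiter.allow("user-a"))
print(decisions)
```

A limiter like this slows down bulk extraction rather than preventing it, which fits the point that it raises the effort required.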

If the knowledge is fully restricted, it will lower the level of GPTs, because millions of users will create low-quality GPTs that can’t even be verified for copyright, etc.

Either way, this version of GPTs is some type of stepping stone; it has other, more important flaws and limitations than the security of the instructions or knowledge.

Hello! I’m the owner of GPTs Hunter. I want to clarify that I do not hack GPT. The information we display is sourced directly from OpenAI. Some details might not be visible on the webpage itself, but they do exist in the page’s code. Also, it’s important to note that while I have access to the names, I do not possess the means to download the files.

3 Likes

Hello @AI.LS , we already know your secret, so it’s not about you, but about how cleverly OpenAI has hidden Easter eggs in their code for the keen-eyed to find. It’s like a high-tech treasure hunt where the ‘X’ marks the spot in lines of code rather than on a deserted island!)

1 Like

I agree, what makes a GPT unique is its data source and its output (and how it interacts with the user).

As for the data source, I have a lot of GPTs that have PDF and Excel data, but I’ve put a HEAVY security prompt in place and so far not even I can break it, and I’ve broken many MANY others on here!

Another idea I’ve been looking at is hosting the data outside of ChatGPT on an external DB, like a WordPress CRM, and connecting to it to get data when it’s needed. I’m testing this out now and it works nicely!

1 Like