You’re right to be cautious about data privacy, especially in a professional context. Let’s address your concerns about using the open-source Whisper model.
The short answer is yes: the open-source Whisper model, downloaded from the GitHub repository and run locally, is safe in the sense that your audio is never sent to OpenAI. Inference runs entirely on infrastructure you control (in this case, a Google Colab runtime), and you control the whole pipeline. The only network activity is a one-time download of the model weights the first time you load a model; after that, no data leaves the environment unless you explicitly program it to.
Here’s a breakdown to explain why and how to assure your boss:
Open-Source Nature: Whisper’s code is publicly available on GitHub under the MIT license. This means you can inspect the code yourself, verify its functionality, and confirm that nothing in it transmits your audio externally. While OpenAI trained the model, the released version is simply a set of weights and the code to run inference. It’s like downloading a calculator app: it performs calculations locally; it doesn’t send your calculations back to the calculator’s developer.
Local Execution: Running Whisper on Google Colab means the processing happens on Google’s servers, not OpenAI’s. Again, unless you write code to send the data elsewhere, it stays within your Colab environment.
No Phone-Home Code: You can examine the Whisper repository’s code directly and search for network connections or data transmission functions. The only network code you’ll find is the downloader that fetches the model checkpoint on first use; it downloads weights, and never uploads your audio or transcripts anywhere. Once the weights are cached, the model runs fully offline.
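To make the code inspection systematic rather than a manual read-through, you can scan the source tree for imports of networking modules. Here is a minimal sketch using only the Python standard library (the module list is illustrative, not exhaustive, and pointing `scan_tree` at a local clone of the repo is an assumption about your setup); if you run it over the Whisper source, expect its checkpoint downloader to show up as a `urllib` hit, which is the expected weight download, not an upload:

```python
import ast
import pathlib

# Modules whose presence would indicate outbound-network capability.
# An illustrative list, not an exhaustive one.
NETWORK_MODULES = {"requests", "urllib", "http", "socket", "httpx"}

def network_imports(source: str) -> set[str]:
    """Return the networking modules imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in NETWORK_MODULES:
                    found.add(root)
        elif isinstance(node, ast.ImportFrom) and node.module:
            root = node.module.split(".")[0]
            if root in NETWORK_MODULES:
                found.add(root)
    return found

def scan_tree(repo_root: str) -> dict[str, set[str]]:
    """Map each .py file under repo_root to the networking modules it imports."""
    report = {}
    for path in pathlib.Path(repo_root).rglob("*.py"):
        hits = network_imports(path.read_text(encoding="utf-8", errors="ignore"))
        if hits:
            report[str(path)] = hits
    return report

if __name__ == "__main__":
    # Flag a snippet that could download something vs. one that can't.
    print(network_imports("import urllib.request\nimport os"))  # {'urllib'}
    print(network_imports("import numpy\nimport torch"))        # set()
```

Showing your boss the (very short) list of files this flags, and what those files actually do, is usually more persuasive than asserting the absence of phone-home code.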
Difference from ChatGPT: ChatGPT is a service offered by OpenAI, with data processing happening on their servers. Whisper, when downloaded and run from the open-source repository, is a self-contained program. It’s a crucial distinction.
How to demonstrate this to your boss:
Code Inspection: The most convincing evidence is the code itself. Show your boss the relevant parts of the Whisper repository and explain the absence of any data transmission logic.
Local Setup Demonstration: Run a small test transcription on Colab with sample audio. Show that the output is generated locally, with no external network activity during transcription (you can monitor or even block outbound connections from the Colab runtime to prove it).
MIT License: Point to the MIT license in the repository. It explicitly grants permission to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software. This reinforces the fact that it’s truly open source and under your control.
While there isn’t a single, official “Whisper doesn’t send data to OpenAI” statement on a webpage (because it’s implied by the nature of open-source software), the combination of the open codebase, local execution, and the MIT license provides strong evidence for its safe usage. By walking your boss through these points and demonstrating the local execution, you should be able to confidently address their concerns.
Separately, OpenAI offers Whisper as a paid API service. That route does transmit your audio to OpenAI’s servers, though under OpenAI’s published API data-usage policy the data is not used to train their models and is retained only temporarily for abuse monitoring. If your boss’s requirement is that audio must never leave your environment at all, the locally run open-source model is the right choice.