Is Whisper open source safe?

I would like to use open source Whisper v20240927 with Google Colab.

My concern is whether the video and voice data I process will be sent to OpenAI. In other words, I am afraid it will be used as training data.

I know that there is an opt-in setting when using ChatGPT,
but I'm worried about Whisper.

I would appreciate an answer from someone official.

Hi!
Yes, the open-source version of Whisper does not send any data to OpenAI.
You are free to deploy it as you wish.

Hi, Mr vb.
Thank you for your prompt reply.

Do you have any documents to prove it?
For example, a page that contains sentences that declare this.

I would like to use Whisper at work, but I have to prove to my boss that Whisper is safe.

I checked the URL below, but I couldn’t find where it was listed.
Any additional replies would be greatly appreciated.

https://openai.com/index/whisper/

You’re right to be cautious about data privacy, especially in a professional context. Let’s address your concerns about using the open-source Whisper model.

The short answer is yes: the open-source Whisper model, installed from the GitHub repository and run yourself, is safe in the sense that your audio data is not sent to OpenAI. The model runs entirely inside an environment you control (in this case a Google Colab runtime, which is hosted on Google's servers rather than on your own hardware), and you control the entire pipeline. No data leaves that environment unless you explicitly program it to do so.

Here’s a breakdown to explain why and how to assure your boss:

  1. Open-Source Nature: Whisper’s code is publicly available on GitHub under the MIT license. This means you can inspect the code yourself, verify its functionality, and confirm that no data is being transmitted externally. While OpenAI trained the model, the released version is simply a set of weights and the code to run inference. It’s like downloading a calculator app – it performs calculations locally; it doesn’t send your calculations back to the calculator’s developer.

  2. Local Execution: Running Whisper on Google Colab means the processing happens on Google’s servers, not OpenAI’s. Again, unless you write code to send the data elsewhere, it stays within your Colab environment.

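  3. No Phone-Home Code: You can examine the Whisper repository’s code directly. Search for any network connections or data transmission functions. You won’t find any that send data back to OpenAI. The one network request the package does make is the download of the model weights the first time a model is loaded; after the weights are cached, transcription works completely offline.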

  4. Difference from ChatGPT: ChatGPT is a service offered by OpenAI, with data processing happening on their servers. Whisper, when downloaded and run from the open-source repository, is a self-contained program. It’s a crucial distinction.
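To make the code inspection in point 3 less manual, here is a minimal sketch that lists every import of a network-related module inside an installed package. The function name `find_network_imports` and the `NETWORK_MODULES` list are my own illustrative choices, not part of Whisper, and the list is not exhaustive. A hit is only a place to read more closely, not proof of data transmission: a library that downloads its own model weights will legitimately show an HTTP import.

```python
import ast
import importlib.util
from pathlib import Path

# Modules commonly used for network access (illustrative, not exhaustive).
NETWORK_MODULES = {"socket", "ssl", "http", "urllib", "requests",
                   "httpx", "aiohttp", "ftplib", "smtplib"}


def find_network_imports(package_dir):
    """Return sorted (file, line, module) tuples for network-module imports."""
    hits = []
    for path in Path(package_dir).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif (isinstance(node, ast.ImportFrom)
                  and node.module and node.level == 0):
                names = [node.module]  # absolute "from x import y" only
            else:
                continue
            for name in names:
                if name.split(".")[0] in NETWORK_MODULES:
                    hits.append((str(path), node.lineno, name))
    return sorted(hits)


if __name__ == "__main__":
    # Locate the installed package without importing it.
    spec = importlib.util.find_spec("whisper")
    if spec is None or not spec.submodule_search_locations:
        print("whisper is not installed in this environment")
    else:
        for location in spec.submodule_search_locations:
            for file, line, module in find_network_imports(location):
                print(f"{file}:{line}: imports {module}")
```

Each reported line can then be opened in the repository to see what the import is actually used for. Note that this only covers Python source files in the package itself, not its dependencies.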

How to demonstrate this to your boss:

  • Code Inspection: The most convincing evidence is the code itself. Show your boss the relevant parts of the Whisper repository and explain the absence of any data transmission logic.

  • Local Setup Demonstration: Run a small test transcription on Colab with sample audio. Show that the output is generated locally without any external network activity (you can monitor this in Colab).

  • MIT License: Point to the MIT license in the repository. It explicitly grants permission to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software. This reinforces the fact that it’s truly open source and under your control.
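For the local setup demonstration, one possible approach (a sketch, not an official procedure) is to block outbound connections in the Python process and show that transcription still succeeds. The helper names `block_network` and `transcribe_offline` and the file `sample.wav` are hypothetical. The model is loaded before the block because loading may download the weights on first use. This only intercepts Python-level sockets in the current process, so it complements, rather than replaces, network monitoring at the system level.

```python
import socket
from contextlib import contextmanager


@contextmanager
def block_network():
    """Make every Python-level socket connection attempt in this process fail."""
    original_connect = socket.socket.connect
    original_connect_ex = socket.socket.connect_ex

    def refuse(self, *args, **kwargs):
        raise RuntimeError("network access blocked for this test")

    socket.socket.connect = refuse
    socket.socket.connect_ex = refuse
    try:
        yield
    finally:
        # Restore normal behaviour even if the body raised.
        socket.socket.connect = original_connect
        socket.socket.connect_ex = original_connect_ex


def transcribe_offline(audio_path, model_name="base"):
    """Transcribe audio_path with outbound Python connections disabled."""
    import whisper  # requires the openai-whisper package and ffmpeg

    # Load first: weights may be downloaded here if they are not cached yet.
    model = whisper.load_model(model_name)
    with block_network():
        # If transcription tried to open a connection, this would raise.
        return model.transcribe(audio_path)["text"]
```

In a Colab cell, calling `print(transcribe_offline("sample.wav"))` and getting a transcript back, with no `RuntimeError`, shows that the transcription step itself did not open any connection from Python.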

While there isn’t a single, official “Whisper doesn’t send data to OpenAI” statement on a webpage (because it’s implied by the nature of open-source software), the combination of the open codebase, local execution, and the MIT license provides strong evidence for its safe usage. By walking your boss through these points and demonstrating the local execution, you should be able to confidently address their concerns.

OpenAI also offers Whisper as a paid API service. That option does involve transmitting your audio to OpenAI, but they do not retain API data longer than is necessary for safety purposes, and they do not use it to train AI models.

Thank you for your detailed and kind answer.
I would like to try running it locally once and verify whether communication is occurring.
