Synthetic instructions generated by OpenAI

Can I use Self-Instruct methods with ChatGPT/GPT-4 to generate synthetic instructions to fine-tune open-source LLMs used in commercial products? OpenAI’s Terms of Service clearly state that it’s not allowed. However, there are plenty of datasets that were generated by GPT-3 or GPT-4 and are available for commercial use (e.g. OpenOrca, WizardLM, UltraChat, etc.). I’m confused.

Hello and welcome to the community!

Yes, I think your observations are valid, and the situation is somewhat confusing. But note that the terms themselves are quite clear, and in some cases it appears OpenAI is cracking down on competitors fast.

But let’s try to understand from a common sense perspective what to make of it.

The terms of service state that it is not allowed to use the API to create competing models.

use Output (as defined below) to develop any artificial intelligence models that compete with our products and services.

When taking a look at the repos from the Wizard model, as an example, we find statements like this:

To commen concern about dataset:
Recently, there have been clear changes in the open-source policy and regulations of our overall organization’s code, data, and models. Despite this, we have still worked hard to obtain opening the weights of the model first, but the data involves stricter auditing and is in review with our legal team. Our researchers have no authority to publicly release them without authorization. Thank you for your understanding.

Disclaimer
The resources, including code, data, and model weights, associated with this project are restricted for academic research purposes only and cannot be used for commercial purposes.

Both quotes are from

To me this reads like they are trying to protect themselves from any legal challenge that could arise in the future.

Just because it has not happened yet does not mean that it won’t. Just because somebody else does it does not make it “ok”.
If you expect that your use case will break the terms of service, then don’t do it.

No.
