Synthetic instructions generated by OpenAI

duruku · December 23, 2023, 8:34pm

Can I use Self-Instruct methods with ChatGPT/GPT-4 to generate synthetic instructions to fine-tune open-source LLMs used in commercial products? OpenAI’s Terms of Service clearly state that it’s not allowed. However, there are plenty of datasets that were generated by GPT-3 or GPT-4 and are available for commercial use (e.g. OpenOrca, WizardLM, UltraChat, etc.). I’m confused.

vb · December 23, 2023, 9:42pm

Hello and welcome to the community!

Yes, I think your observations are valid and that it is somewhat confusing. But note that the situation is actually quite clear, and in some cases it appears OpenAI is cracking down on competitors fast.

But let’s try to understand from a common sense perspective what to make of it.

The terms of service state that using the API to create competing models is not allowed.

use Output (as defined below) to develop any artificial intelligence models that compete with our products and services.

When taking a look at the repos from the Wizard model, as an example, we find statements like this:

To commen concern about dataset:
Recently, there have been clear changes in the open-source policy and regulations of our overall organization’s code, data, and models. Despite this, we have still worked hard to obtain opening the weights of the model first, but the data involves stricter auditing and is in review with our legal team . Our researchers have no authority to publicly release them without authorization. Thank you for your understanding.

Disclaimer
The resources, including code, data, and model weights, associated with this project are restricted for academic research purposes only and cannot be used for commercial purposes.

Both quotes are from

To me this reads like they are trying to protect themselves from any legal challenge that could arise in the future.

Just because it has not happened yet does not mean that it won’t. Just because somebody else does it does not make it “ok”.
If you expect that your use case will break the terms of service, then don’t do it.

elmstedt · December 24, 2023, 1:13am

No.
