Greetings! I noticed that the docs for the moderation endpoint don’t have a posted rate limit (or I’m blind) and it is a free service…
So, what if I create a product centered entirely around mass-crawling social media and passing large amounts of text asynchronously into the moderation endpoint, explicitly for the response data, which I then aggregate and report on for potential trends?
It seems to comply with OpenAI terms, but would require significant loads being POSTed to the endpoint… what are the boundaries here?
None posted. If they exist I’m sure OpenAI will let you know when you reach them.
My guess is the moderation API is an extremely simple neural network (it is just a classifier, after all), so if they were to bill for it, it would be on the order of fractions of a cent per million (or even billion) tokens. Honestly, I wouldn’t be surprised if the bandwidth had more value than the processing.
It’s an endpoint which, in theory, every prompt for every model should be run through before passing it to the LLM, so unless you’re going absolutely bonkers with it they probably won’t care too much.
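That pre-moderation gate is easy to sketch. A minimal example, assuming the endpoint's documented response shape (a `results` list whose entries carry a `flagged` boolean); the actual API call is left as a comment since it needs an API key, and the gating logic itself is my own assumption about how one would wire it up:

```python
# Sketch of gating a prompt on a moderation result before it reaches the LLM.
# The real OpenAI call is commented out so the gating logic stands alone.

def should_forward(moderation_result: dict) -> bool:
    """Return True if the prompt looks safe to pass on to the model.

    `moderation_result` mirrors one entry of the moderation endpoint's
    `results` list: a `flagged` boolean plus per-category details.
    """
    return not moderation_result.get("flagged", False)

# With the `openai` Python client it would look roughly like:
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.moderations.create(input=prompt)
#   if should_forward(resp.results[0].model_dump()):
#       ...  # forward the prompt to the chat model

if __name__ == "__main__":
    clean = {"flagged": False}
    flagged = {"flagged": True}
    print(should_forward(clean))    # True
    print(should_forward(flagged))  # False
```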
All that said, on the Moderation API overview page they have this to say:
Ah, ok, thank you. It seems that little sentence does preclude my idea of using the moderation endpoint exclusively, without the other models.
Still curious about the details… I’d like to build a large-scale emotional/psychosocial tracker that aggregates large swathes of media and plots trends over time. I wonder if that’s what they’re trying to prevent, or if that’s ok?
I have trained and used plenty of sentiment classifiers. The specific question I have here is about using the moderation endpoint to perform sentiment analysis on OpenAI’s hardware instead of mine because it’s cheaper/free and faster than using other sentiment classifiers, which have to run on my hardware. There is no rate limit on the Moderation endpoint. So theoretically, I can build a powerful “social scanning” application purely using the Moderation endpoint response data.
Seems like a powerful tool for an application on its own.
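For what it’s worth, the aggregation side of such an application is straightforward once the responses are in hand. A minimal sketch, assuming the endpoint's documented `category_scores` field (a mapping of category name to a float score per result); averaging per category is my own guess at what "plotting trends" would mean:

```python
from collections import defaultdict

def aggregate_category_scores(results: list[dict]) -> dict[str, float]:
    """Average each moderation category score across many responses.

    Each item in `results` mirrors one entry of the moderation endpoint's
    `results` list and carries a `category_scores` mapping of
    category name -> float.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for result in results:
        for category, score in result.get("category_scores", {}).items():
            totals[category] += score
            counts[category] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

if __name__ == "__main__":
    # Synthetic responses standing in for real moderation output.
    sample = [
        {"category_scores": {"harassment": 0.2, "violence": 0.1}},
        {"category_scores": {"harassment": 0.4, "violence": 0.3}},
    ]
    print(aggregate_category_scores(sample))
```

Bucketing the inputs by timestamp before averaging would give the time series to plot.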
The term you are going to fall afoul of is Section 2(c)(iv): “including scraping, web harvesting, or web data extraction”.
(c) Restrictions. You may not (i) use the Services in a way that infringes, misappropriates or violates any person’s rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Services to develop models that compete with OpenAI; (iv) except as permitted through the API, use any automated or programmatic method to extract data or output from the Services, including scraping, web harvesting, or web data extraction; (v) represent that output from the Services was human-generated when it is not or otherwise violate our Usage Policies; (vi) buy, sell, or transfer API keys without our prior consent; or (vii) send us any personal information of children under 13 or the applicable age of digital consent. You will comply with any rate limits and other requirements in our documentation. You may use Services only in geographies currently supported by OpenAI.
Pretty sure the language here discusses (and is intended to get at) scraping the Services’ own output, i.e. someone making an extension for ChatGPT that also scrapes the GUI chat history (“…from the Services”), etc. It doesn’t say you can’t use the Service to process or augment web scraping… it says you can’t scrape data FROM the service (i.e. the GUI).