If I have prepared a large text to create embeddings from, should I send the chunks to the moderation endpoint before calling the embeddings API, so that I don't break OpenAI's usage rules?
That is an interesting question. I don't know whether ada-002 does any moderation checking, since the end result is a vector rather than a text representation… I'm not sure.
A belt-and-braces approach would say yes, moderation-check everything. If I can get an official answer to that question, I'll post it back here.
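For the belt-and-braces route, here is a minimal Python sketch that moderation-checks each batch of chunks and only embeds the chunks that pass. The names `moderate_then_embed`, `moderate`, `embed`, `fake_moderate`, and `fake_embed` are hypothetical. The two callables are injected so the sketch runs offline; in practice `moderate` might wrap `client.moderations.create(input=batch)` from the `openai` Python SDK (v1) and return `[r.flagged for r in resp.results]`, and `embed` might wrap `client.embeddings.create(model="text-embedding-ada-002", input=batch)` and return `[d.embedding for d in resp.data]`. Treat that wiring as an assumption and check it against the current API reference.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callable signatures (assumptions, not an OpenAI API):
#   moderate(batch) -> one bool per chunk, True meaning "flagged"
#   embed(batch)    -> one vector per chunk
ModerateFn = Callable[[List[str]], List[bool]]
EmbedFn = Callable[[List[str]], List[List[float]]]


def moderate_then_embed(
    chunks: Sequence[str],
    moderate: ModerateFn,
    embed: EmbedFn,
    batch_size: int = 32,
) -> Tuple[List[Tuple[str, List[float]]], List[int]]:
    """Moderation-check every chunk; embed only the chunks that pass.

    Returns (chunk, vector) pairs for clean chunks plus the indices
    of flagged chunks, so they can be reviewed instead of silently lost.
    """
    embedded: List[Tuple[str, List[float]]] = []
    flagged: List[int] = []
    for start in range(0, len(chunks), batch_size):
        batch = list(chunks[start:start + batch_size])
        verdicts = moderate(batch)
        if len(verdicts) != len(batch):
            raise ValueError("moderate() must return one verdict per chunk")
        passed: List[str] = []
        for offset, (chunk, is_flagged) in enumerate(zip(batch, verdicts)):
            if is_flagged:
                flagged.append(start + offset)  # index into the original list
            else:
                passed.append(chunk)
        if passed:
            # Only chunks that passed moderation are sent for embedding.
            embedded.extend(zip(passed, embed(passed)))
    return embedded, flagged


# Offline stand-ins so the sketch runs without network access or an API key.
def fake_moderate(batch: List[str]) -> List[bool]:
    return ["forbidden" in text for text in batch]


def fake_embed(batch: List[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]


pairs, flagged = moderate_then_embed(
    ["hello", "forbidden text", "world"], fake_moderate, fake_embed, batch_size=2
)
```

Moderating and embedding per batch keeps memory bounded for a large text, and returning the flagged indices lets you decide later whether to drop, rewrite, or manually review those chunks.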
On the safe side, yes.
Logically, it doesn't make sense to run embedding inputs through moderation: you're not generating any content, and generated content is where the usage policies come into play.
This is a genuinely new question and as @Foxabilo said, an official answer from OpenAI staff would be authoritative.
Even if depictions within your data are not screened and counted against you as policy violations, the moderations endpoint can still be a useful tool: use it to flag a chunk so that it doesn't later get fed into an AI language model as knowledge augmentation.
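The flag-and-exclude idea can be sketched in Python, assuming each chunk's moderation verdict was recorded alongside its vector at indexing time. `StoredChunk`, `cosine`, and `retrieve_for_augmentation` are hypothetical names, and the brute-force cosine-similarity search stands in for whatever vector store you actually use. The point is that flagged chunks can stay in the index but are never returned as context for the language model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class StoredChunk:
    text: str
    vector: List[float]
    flagged: bool  # moderation verdict recorded when the chunk was indexed


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_for_augmentation(
    query_vector: List[float], store: List[StoredChunk], k: int = 3
) -> List[str]:
    """Return the k most similar chunk texts, never returning a flagged chunk."""
    candidates = [c for c in store if not c.flagged]
    candidates.sort(key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return [c.text for c in candidates[:k]]


store = [
    StoredChunk("benign A", [1.0, 0.0], flagged=False),
    StoredChunk("flagged B", [1.0, 0.1], flagged=True),
    StoredChunk("benign C", [0.0, 1.0], flagged=False),
]
context = retrieve_for_augmentation([1.0, 0.05], store, k=2)
```

Here "flagged B" is the closest match to the query, yet it is filtered out before ranking, so only the two benign chunks reach the prompt.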
Consider, though, these excerpts from the usage policies:
we’ve created usage policies that apply to all users of OpenAI’s models, tools, and services.
Disallowed usage of our models
– We don’t allow the use of our models for the following
It doesn’t say anything about only being in regards to inputs or solicited outputs.
Improved semantic search on your hate-filled website = embeddings.