sps
11
Nope.
Chat completion in itself is RLHF'd to not do BAD stuff.
However, some of the simplest and easiest things devs can do to prevent termination of their OpenAI API access because of abuse are:
- Check requests with the Moderation API before sending them to the models.
- Pass end-user IDs to the API when making requests. This helps OpenAI identify and communicate with you about users that are abusing the API via your application, and allows them to provide your team with more actionable feedback in the event that they detect any policy violations in your application.
Further best practices are listed here.
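For anyone looking for a starting point, here's a minimal sketch of both steps with the Python SDK (the model name and identifiers are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def checked_completion(user_text: str, end_user_id: str):
    # Step 1: screen the raw input with the Moderation API first.
    mod = client.moderations.create(input=user_text)
    if mod.results[0].flagged:
        return None  # refuse before the request ever reaches a generation model

    # Step 2: pass a stable end-user ID so OpenAI can attribute abuse to
    # specific users of your app rather than to your whole API key.
    return client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": user_text}],
        user=end_user_id,
    )
```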
5 Likes
OpenAI doesn’t support BYOK applications.
These people never should have gotten API keys to begin with.
It was unfair of the users to get a developer API key.
It’s unfair of the developers to not read the documentation.
2 Likes
Agreed. It’s a key for developers. Yet the current official stance on BYOK seems to be… undefined. We’ve seen numerous developers use BYOK for whatever reason.
Although I’m not going to point fingers, I have seen some support for paid applications that also offer a self-hosted BYOK version. I’d say that in this case it’s a great idea. Yet I’d also say that in most cases it’s just bad practice.
sps
14
The problem with this approach is the unnecessarily high frequency of requests hitting the endpoint. Such an approach adds needlessly to compute requirements.
Imagine abusive and normal requests hitting the same endpoint, with the decision making happening on OpenAI’s side: whether to pass the request on for generation (think about the hold time) or deny it.
Then there’s the complexity of the API response: one shape for denial, an entirely different one for generation.
Compare this to the scenario where only filtered requests hit the generation endpoint (no unnecessary ones, obviously) and the decision making happens on the devs’ systems.
The simplicity of the response: devs know what kind of responses to expect from each endpoint.
+ Edit: An integrated approach would also prevent devs from bringing their own moderation system, which could be faster/better.
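To illustrate, here's a rough sketch of that dev-side pattern; the local filter is just a stand-in for whatever faster/better moderation system a dev might bring:

```python
def moderate_locally(text: str) -> bool:
    """Stand-in for a dev's own moderation system; return True to block."""
    banned_terms = {"example-banned-term"}  # placeholder rules
    return any(term in text.lower() for term in banned_terms)

def generate(client, user_text: str):
    if moderate_locally(user_text):
        # One fixed, dev-defined denial shape: no guessing what the API returned.
        return {"error": "blocked_by_local_moderation"}
    # Only filtered requests ever reach the generation endpoint.
    return client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": user_text}],
    )
```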
3 Likes
vb
15
Communication is key.
Who should use the moderation endpoint, when and how, and what will happen if there are problems.
Then everybody can make their own choices.
I would leave things as they are because I think the rules are common sense.
wclayf
16
The problem with making two API calls every time instead of one is the latency, degrading performance for no reason.
I can see use cases where developers might need to call the ‘moderation’ endpoint standalone, with no intent to RUN the query if it passes moderation, but just because that’s true doesn’t mean that 100% of the most common usage patterns of the API should require two back-to-back HTTP requests.
The obvious solution is a “moderate=true” flag on the completions endpoint that defaults to true.
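Purely as an illustration of the proposal (no such parameter exists on the API today), the imagined flag could be passed through the Python SDK’s escape hatch for extra request parameters:

```python
# Hypothetical: "moderate" is NOT a real parameter of the completions endpoint.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model
    messages=[{"role": "user", "content": user_text}],
    extra_body={"moderate": True},  # imagined flag: moderate server-side, then generate
)
```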
wclayf
18
Yes indeed, the additional HTTP call adds latency.
I think one of the most common use cases is chat bots, where apps take raw input from end users, so 100% of those calls would need to be sent through moderation (according to many people’s interpretations of the vague policy docs). If that’s the case, the rational choice for API design is to have moderation built into the ‘completions’ call as an option that defaults to true.
I’d like to say to the API, “block this if it violates OpenAI policy, and don’t ding my API key.” So maybe the more applicable parameter name is “ding=false”. lol.
This seems like a lot of extra work and compromise just to avoid a few milliseconds of latency.
When you are building a tool for all programmers, the fewer assumptions and opinions, the better.
1 Like
N2U
20
I can agree with that; I think it’s worth delving into this a bit.
This is absolutely true. Let’s remember that the moderation endpoint is an AI classifier. It’s probably not as resource-intensive as GPT, but it still requires GPUs, and those are currently in short supply.
1 Like
sps
21
Not just that. At the scale at which OpenAI receives requests per second, even a slight hold can create a huge backlog.
As the saying goes, “the more moving parts, the more things can go wrong.”
1 Like
N2U
22
Fair point, although I think it’s worth mentioning that OpenAI already passes the input, the output, and the generated title on ChatGPT through the moderation endpoint, so they clearly know how to put those cogs together successfully.
1 Like
sps
23
ChatGPT is an end-user product, and it uses the moderation endpoint really well.
3 Likes
wclayf
24
Don’t you think cutting the number of requests per second in half is a good idea? lol. I do.
wclayf
25
When developing a tool for everyone, the most commonly occurring usage patterns become particularly relevant. If most of the time a call does moderation + completion, that’s a pretty good hint that you need an endpoint that encapsulates exactly that.
N2U
26
I 100% agree with @sps, but I fail to see the logic here.
Why would this cut the number of requests per second in half?
1 Like
wclayf
28
“API Endpoints with Integrated Content Moderation” take one HTTP request. Without integration, it’s two.
wclayf
29
You’re reading in a whole lot of assumptions I wasn’t implying. I’m just saying: have a “moderate=true” flag, and if I set it, the server can simply throw a bad-request error with the reason. It’s not mixing responsibilities; it’s good API design. If something takes 10 steps, you don’t always tell the API consumers to make 10 HTTP requests. It’s an art, not a science.
wclayf
31
HTTP APIs normally have all kinds of different “reasons” that any particular request can fail. I’m just saying “bad morality” is most certainly one of the reasons a “completions” call should be able to fail. And from my 23 years of experience as a dev, I can tell you it’s about 3 lines of code in OpenAI’s implementation to check this and throw an exception, aside from adding “dontDingMeBro” as an argument.
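Roughly what that server-side check could look like; every name here is invented for illustration, and this is in no way OpenAI’s actual implementation:

```python
# Invented names throughout; illustrative sketch only.
class ModerationError(Exception):
    def __init__(self, categories):
        super().__init__(f"blocked by moderation: {categories}")
        self.status = 400
        self.categories = categories

def handle_completion(request):
    if request.get("moderate", True):                 # the proposed flag, default true
        verdict = run_moderation(request["prompt"])   # assumed internal classifier
        if verdict["flagged"]:
            # Fail with the reason, without "dinging" the caller's API key.
            raise ModerationError(verdict["categories"])
    return run_generation(request)                    # assumed internal generation call
```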
N2U
32
Rate limits are counted per endpoint, so using the moderation endpoint is not going to count towards your rate limits on completion requests.
Regardless of academically good API design, Azure already does this: they run filtering on all chat completion requests and return moderation errors from that same endpoint.
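For example, with the Python SDK pointed at Azure, a filtered prompt comes back as a 400 whose error code is content_filter (the endpoint, key, and deployment name below are placeholders):

```python
import openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example-resource.openai.azure.com",  # placeholder
    api_key="...",  # placeholder
    api_version="2024-02-01",
)

try:
    resp = client.chat.completions.create(
        model="my-deployment",  # placeholder Azure deployment name
        messages=[{"role": "user", "content": "some user input"}],
    )
except openai.BadRequestError as err:
    if err.code == "content_filter":
        # Treat as a moderation denial, not a generic request failure.
        print("blocked by Azure's content filter")
    else:
        raise
```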