Creating ad hoc API keys for giving credits to visitors

Is there a way to create ad hoc API keys?
My use case is allowing visitors to an open-source project to try it out by giving them a certain budget.
If I manage it all manually on a single API key I will surely hit the rate limits.
What’s the best current approach to let users sample your service?

Hi and welcome to the Developer Forum!

There is no way to do that currently without being an OpenAI partner. The only option would be to use a single API key for your account and generate pseudo API keys for your clients. You would then intercept those pseudo keys on your backend, do the required usage checks, and then pass the calls on to the OpenAI API and return the results, acting as a relay of sorts.
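For illustration, here is a minimal sketch of that relay pattern, assuming Flask and the openai Python package; the pseudo-key store, budget values, and endpoint path are placeholders you would replace with your own implementation:

```python
import os

from flask import Flask, abort, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # your real key, server-side only

# Hypothetical store mapping pseudo keys you issued to remaining token budgets;
# a real deployment would persist this in a database.
PSEUDO_KEYS = {"visitor-abc123": 50_000}

@app.post("/v1/chat/completions")
def relay_chat():
    pseudo_key = request.headers.get("Authorization", "").removeprefix("Bearer ")
    budget = PSEUDO_KEYS.get(pseudo_key)
    if budget is None or budget <= 0:
        abort(401)  # unknown pseudo key, or budget exhausted

    body = request.get_json()
    response = client.chat.completions.create(
        model=body["model"],
        messages=body["messages"],
    )
    # Deduct the actual usage from the visitor's budget.
    PSEUDO_KEYS[pseudo_key] -= response.usage.total_tokens
    return jsonify(response.model_dump())
```

A real relay would also want per-key rate limiting and request validation, but the overall shape stays the same.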

4 Likes

In that case, could my account be blocked if users misuse it with policy-violating prompts for DALL·E or GPT?
Or will I simply get an error in the API response no matter how many times a disallowed prompt is sent?
Or is the practice to send every request to the moderation endpoint beforehand?

Correct. Best practice is to send all input and output to the moderation endpoint and check that a) it does not trigger any flags, and b) using the values sent back, you can build your own moderation and acceptable use policy. If a user then creates a prompt that gets past those checks, you have a log of the prompt and the corresponding moderation check; that way you can show due diligence and that you are following best practice. This should protect your account from issues, but you should always take steps to handle users who continually attempt prompts that cause triggers.
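As a rough sketch of that flow, assuming the openai Python package (log_for_audit stands in for whatever persistence you use):

```python
from openai import OpenAI

client = OpenAI()

def log_for_audit(text: str, moderation) -> None:
    """Placeholder: persist the text and moderation result as due-diligence evidence."""
    print(text, moderation.results[0].flagged)

def moderated_completion(user_prompt: str) -> str:
    # a) check the input before sending it to the model
    mod_in = client.moderations.create(input=user_prompt)
    log_for_audit(user_prompt, mod_in)
    if mod_in.results[0].flagged:
        raise ValueError("Input rejected by moderation")

    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_prompt}],
    )
    output = completion.choices[0].message.content

    # b) check the output as well before returning it to the user
    mod_out = client.moderations.create(input=output)
    log_for_audit(output, mod_out)
    if mod_out.results[0].flagged:
        raise ValueError("Output withheld by moderation")
    return output
```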

2 Likes

Do you happen to know the ballpark response times for these moderation endpoints? It seems like quite a detour. Instead of the straightforward:

User => Server => API

it requires:

1. User => Server
2. Server => API (for input moderation)
3. Server => API (for the original request)
4. Server => API (for output moderation)
5. Back to the user

Wouldn’t it be easier for OpenAI to add a flag for all endpoints:
moderateInput: true
moderateOutput: true

It’s a single responsibility that leads to easier, more manageable code.
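Purely to illustrate the proposal, a request might then look like this (note that neither flag exists in the actual API):

```python
# Hypothetical request body; moderateInput and moderateOutput are the
# proposed flags from this post, not real API parameters.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
    "moderateInput": True,   # proposed: moderate the prompt server-side
    "moderateOutput": True,  # proposed: moderate the completion server-side
}
```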

The latency of the API request is not substantial.

If they mixed the two, you would need to account for completely different response objects, including potential ambiguities with errors if, for example, the moderation endpoint went down.

The moderation endpoint also sends values with the response in case you want to add your own filters. If it were reduced to a simple error, the possibility of returning these values would be lost.
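For example, a custom filter built on those returned values might look like this (a sketch assuming the openai Python package; the 0.01 threshold is an arbitrary example):

```python
from openai import OpenAI

client = OpenAI()
result = client.moderations.create(input="some user text").results[0]

# Beyond the binary `flagged` field, category_scores exposes a float per
# category, so you can apply stricter thresholds than the defaults.
scores = result.category_scores.model_dump()
too_risky = [category for category, score in scores.items() if score > 0.01]
if too_risky:
    print("Custom policy would reject this for:", too_risky)
```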

Anyways, they do run their own moderation check and some services like Azure handle it for you.

So it could make sense to have a simple boolean check, but it restricts the possibilities and also adds complexity to the process, leading to coupled code and responsibilities.

1 Like

I get your point. Splitting into multiple endpoints enhances flexibility, robustness, and control.

Yet, look at the Assistants API evolution. Initially, we handled conversation state and context; now, OpenAI manages it with threads and runs, simplifying developers’ work.
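For context, the threads-and-runs flow looks roughly like this (a sketch based on the beta Assistants endpoints in the openai Python package):

```python
import time

from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="You are a helpful assistant.",
)
thread = client.beta.threads.create()  # OpenAI stores the conversation state
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Hello!"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
# Runs execute asynchronously, so you poll until completion.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
```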

Similarly, OpenAI might take care of moderation for us. They are not mutually exclusive. We can use flags for common cases and moderation endpoints for complex situations.

If I understand correctly, it seems that for 100% of the use cases in which your customers use your API key through your service, the moderation endpoints become an inseparable part of the call sequence. Every single call passes through them.

Simplification almost always leads to a loss of control.

If you look deeper you will see many people abandoning the Assistants framework because it strips away the ability to manage context and tokens, leading to high costs that cannot be managed (yet), and it also lacks some features such as streaming.

If you want a gamepad-like setup to fly a plane that’s fine. Some people want the cockpit full of controls. Making the assumption that everyone wants a gamepad is just bad practice.

Yes, but you’re not thinking this completely through. Not everyone is using the GPT endpoints with public-facing (inherently dangerous) queries. Some people, for example, may be using it to summarize their own writing, which doesn’t require the moderation endpoint.

Even if there are 90% reasons to use it, the 10% should be enough to separate the responsibilities. This is a common programming principle that leads to better code: you want to decouple responsibilities for better management, and also have the ability to inject your own intermediate processes.

I’m not against having a simple “validateInput”. In fact, the last time I argued this point an OpenAI staff member muted me for 2 weeks and then said they would “look into it (having a validateInput parameter)”. Nothing ever came of it though.

I’m just trying to justify why they are separated. I personally would never use it, though, because it would inevitably lead to more, and possibly harder, error handling. The latency difference of making an API request (or two) is negligible, especially compared to the disadvantages I’ve mentioned.

Okay, thanks for your in-depth reply. I’ve learned more about the current trends. I didn’t know people are abandoning the Assistants API; I assumed the missing features would be added once it’s out of beta. Regarding ‘you’re not thinking this completely through’ 🙂 : I am just a single developer giving my (narrow) angle on my (specific) needs. It is up to the OpenAI team to gather the different perspectives and needs of various developers and ‘think it through’ to establish a robust and useful interface for their API.

It’s good to know that the latency is negligible. I think it’s now time for me to roll up my sleeves, write the moderation API calls, and see how it goes…

1 Like

They sort of do that already on output, terminating your generation if the AI is writing something that looks copyrighted…

Moderation on output really just serves as a mechanism of discouragement or of raising user alarm, so as not to give users the satisfaction of receiving what was flagged, or as a way of tracking how much of that is produced on an account.

You also aren’t given a moderation rate limit that can keep up with going full-tilt at 2M tokens a minute on gpt-3.5-turbo.

1 Like