OpenAI API proxy to share a limited token budget with others?

We are building a tool for students in a university course that uses the OpenAI API for certain functionality. Typical usage costs less than $1 per user, but we do not want to require every student to create their own OpenAI account and provide their own billing information (the $5 welcome credit is only available if they do not already have an older account).

Are there any proxy solutions available where we can provide one real API key and generate many API keys for students, each limited to a certain token/pricing budget? This feels like a very common requirement, but I have not found an answer.

The desired solution would offer the following:

  • publicly hosted server proxy (preferred) or on-premise hosting
  • free to use or a reasonably small fee (<$10)
  • generate custom tokens and assign individual token budgets to them (e.g., 100 users at $1 each, i.e., $100 in total)
  • optionally, restrict individual tokens to certain endpoints and/or models
  • optionally, interface to programmatically generate individual tokens for a list of users

Are you aware of any existing solutions, or would we have to write this ourselves? Thanks in advance!
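To make the intended setup concrete, the client side for each student would look roughly like this (just a sketch; the base URL and the per-student key are placeholders for whatever such a proxy would issue):

```python
from openai import OpenAI

# Intended setup: the single real OpenAI key lives only on the proxy server,
# while each student receives their own budget-limited proxy key.
# Both base_url and api_key below are placeholders.
client = OpenAI(
    base_url="https://llm-proxy.example-university.edu/v1",
    api_key="student-key-abc123",
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```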


So you just want to re-sell the OpenAI API?

This has been discussed before. Note that there is no code solution in that thread, but the OP might have found or developed one.

There are several services that emulate the OpenAI API. Giving away what they coded might erode their advantage.

Re-selling suggests commercial interests. :slight_smile: I just want to share keys in a restricted way with a limited audience for educational purposes, yes. There are tons of other services, like Poe, that basically do this.


Yeah, I saw that thread, but it does not recommend a solution that generates and limits individual tokens for users.

I just discovered BricksLLM (GitHub: bricks-cloud/BricksLLM, "Simplifying LLM ops in production") and might give it a try.


Hi Christoph! I'm one of the creators of BricksLLM. We have both a publicly hosted version of BricksLLM and the open-source version that you found.

Could you email me at donovan@bricks-tech.com? I’d love to chat more to see if we can help.


Hi Donovan, thanks for the message! I just set up a self-hosted instance of BricksLLM for testing purposes, and it looks very promising. I do have a few questions; maybe you could help:

  1. Is it possible to define rate limits at the same granularity as the OpenAI API, i.e., different RPM/TPM for each model? The context is that we want to give 100 students access to the API at the same time through our tier 3 account, so every key needs 35 RPM/1,600 TPM for gpt-3.5-turbo and 50 RPM/5,000 TPM for text-embedding-ada-002. If I understand the documentation correctly, we can currently only set one RPM/TPM pair per key, i.e., the minimum across both models?

  2. Similarly, can we whitelist the available models per key (so that our students could not spend our tokens on any other models)?

  3. I noticed that streaming mode for chat completions is much less fluid through the proxy than with the normal OpenAI API. The normal API delivers many chunks per second (5-10, I guess) to the client, while the proxy seems to update the response only about once per second without forwarding individual chunks. Would it be complicated to fix that?

    (I only took a short look at the implementation; maybe the buffer size is too large, or synchronous cost estimation takes too much time? Of course, I don't have deeper knowledge of your codebase, even though it is nice to read :slight_smile: )
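Regarding question 3: in case it helps to reproduce, a quick script along these lines makes the chunk cadence visible (base_url and api_key are placeholders; point them either at the proxy or directly at OpenAI to compare):

```python
import time
from openai import OpenAI

# Print the arrival time of each streamed chunk; swap base_url/api_key between
# the proxy and the OpenAI API to compare (both values below are placeholders).
client = OpenAI(base_url="http://localhost:8002/v1", api_key="my-proxy-key")

start = last = time.monotonic()
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write three sentences about proxies."}],
    stream=True,
)
for chunk in stream:
    now = time.monotonic()
    text = chunk.choices[0].delta.content if chunk.choices else None
    print(f"+{now - last:.3f}s {text!r}")
    last = now
print(f"total: {time.monotonic() - start:.3f}s")
```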

Would appreciate your help a lot. Thanks in advance & have a nice weekend!

I am the maintainer of the repo. To answer your questions: we don't currently have features in place to support the use cases in questions 1 and 2. That being said, I am willing to add those features quickly and have you try them out in the cloud or self-hosted version.

Meanwhile, I am looking into the issue you raised in question 3. I have created issues to keep track of all the problems you raised and will update you once I make some progress.


@LinqLover Hi Christoph, we (https://llmetrics.app) are building a product that focuses more on monitoring and limiting LLM API usage on a user level rather than (necessarily) a key level. If you have a system in place to authenticate users (and can provide us an anonymised user ID), our product may be of help. The analytics live on a hosted platform, and you can query any info you need.

In any case, I have sent you a DM and would be delighted to talk more about the issues you mentioned above :slight_smile:


Hi @spike, thanks a lot for your reply and sorry for the huge delay! I saw you included a fix for the streaming lag in the new release. Are there any docs on how to upgrade my self-hosted BricksLLM Docker container to the latest version so I can try it out?

Of course, it would be awesome if you could add more fine-grained control over rate limits and the models that can be used. May I ask for an ETA, though, as we want to bring our tool to our students really soon (ideally at the beginning of next week)? :sweat_smile:

Thank you, please see my direct message. :slight_smile:

Hey friend, I added the feature where you can have both path and model access control. I put up a detailed guide in a folder called cookbook in the GitHub repo.

Regarding how to update BricksLLM, I added a section called How to Update to the README. Cheers.

Sorry, I can't post links since I'm new to the forum.


The first feature you requested will take a little longer to finish. In the meantime, I suggest using the new feature to create separate API keys: one for the embeddings endpoint and one for the chat completions endpoint.
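For example, on the client side that would look roughly like this (base_url and the key values are placeholders; the path restrictions themselves are configured on the keys via the proxy, as described in the cookbook):

```python
from openai import OpenAI

# Two proxy keys, each restricted on the proxy side to a single endpoint
# (base_url and key values below are placeholders).
chat_client = OpenAI(base_url="http://localhost:8002/v1", api_key="key-chat-only")
embed_client = OpenAI(base_url="http://localhost:8002/v1", api_key="key-embeddings-only")

chat = chat_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
)
embeddings = embed_client.embeddings.create(
    model="text-embedding-ada-002",
    input="hello world",
)
```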


Hey @spike, this is great! The upgrade via Docker worked for me, and streaming responses are very smooth now.

Regarding the rate limits: as we want to give 1.6k TPM to each student, I will express that to BricksLLM using costLimitInUsdOverTime and costLimitInUsdUnit. Unfortunately, costLimitInUsdUnit does not support m (minutes) at the moment, so we will aggregate the limit per hour and hope that this does not cause too many problems.
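For reference, the back-of-the-envelope conversion I am using looks like this (the per-1K-token price is an assumption and needs to be adjusted to the actual model pricing):

```python
# Map a per-minute token budget to an hourly USD cap for costLimitInUsdOverTime.
# The price below is an assumed blended gpt-3.5-turbo rate, not an official figure.
TPM = 1600                    # desired tokens per minute per student
PRICE_PER_1K_TOKENS = 0.002   # assumed USD per 1,000 tokens; adjust to real pricing

tokens_per_hour = TPM * 60                                   # 96,000 tokens
cost_limit_per_hour = tokens_per_hour / 1000 * PRICE_PER_1K_TOKENS
print(f"costLimitInUsdOverTime ~ {cost_limit_per_hour:.3f} USD/hour")  # ~ 0.192
```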

Besides that, here are some other issues and thoughts that are not critical for us, but I wanted to share them anyway in case they help your promising product (ranked in descending relevance for our setup):

  1. Monthly budgets. We want to give each student $0.20/month, which I currently implement via the costLimitInUsd parameter (since costLimitInUsdOverTime is already needed for the rate limits), but this means that the cost limit is never refreshed. So we would either need the ability to pass multiple costLimitInUsdOverTime/costLimitInUsdUnit pairs per key, or there should be an option to manually reset the token consumption each month. For now, I will just increase costLimitInUsd by 0.2 each month, but this means that students could "save" their budget from previous months and overuse the account in a later month …

  2. Preventing abuse of keys. We would like to prevent abuse of student keys, which could lead to our OpenAI account being banned. I see two alternative options to achieve that:

    1. The proxy could support auto-moderation, i.e., send each request to the moderation endpoint before forwarding it to the actual model (see the sketch after this list).
    2. The proxy could log all requests for maybe 30 days, so that in the event of an account ban we would be able to tell which student abused their key.

    Would any of that be easily achievable with your current solution? Maybe the full prompts could be preserved in the events?

  3. The new allowedModels setting does not seem to support gpt-3.5-turbo-1106; is that right?

  4. I did not manage to use GET /api/events without a customId parameter to get all events, even though the docs say it is optional.

  5. Deleting keys: While not officially supported, DELETE "$SERVER/api/key-management/keys/$keyId" works for me. Surprisingly, though, it does not revoke the key; I have to do that manually beforehand.
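To illustrate the auto-moderation idea from point 2 above: the flow I have in mind is roughly the following, here sketched with the OpenAI Python SDK on the client side (the proxy would need to do the equivalent before forwarding a request):

```python
from openai import OpenAI

client = OpenAI()  # placeholder; in our setup this would point at the proxy

def moderated_chat(prompt: str):
    # Pre-flight check against the moderation endpoint before spending any tokens.
    moderation = client.moderations.create(input=prompt)
    if moderation.results[0].flagged:
        raise ValueError("Prompt rejected by moderation; not forwarded to the model.")
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
```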

Again, thanks a lot for your help. :pray: I think I will get BricksLLM into “production” for our students next week! I will mention this helpful product to them. :slight_smile:

PS: What system requirements (RAM, CPUs, storage) do you recommend for hosting the proxy for 100 people, if you have any empirical values for that?

Hello. It seems like your product is close to what we are looking for. We would be glad to chat about it.

To answer your questions:

  1. I can add a feature to support monthly rate limits.
  2. I need to think about potential solutions. When you deploy our service, you can enable logging of all requests by turning privacy mode off when starting the application: -m production -p verbose. By default, we don't log the content of requests due to privacy concerns.
  3. gpt-3.5-turbo-1106 should be supported; I just tested it out.
  4. For now, you can only fetch a single event via its custom ID with that endpoint, since fetching all events does not seem scalable. Sorry the docs didn't make that clear.
  5. The deletion endpoint is not officially supported because we want to keep a record of everything. In the future, we will probably deprecate it and add an endpoint to archive keys instead.
  6. 512 MB of RAM and 1 CPU should be good enough.

We have a Discord server if you want more immediate communication. I am super pumped for you to give BricksLLM a try :slight_smile: Thanks for all the amazing suggestions!


Perfect, thanks a lot! Looking forward to further updates :slight_smile: