How can I control the expenditure of a budget?

Hi,

I'm developing an app/connector (OpenAI_Assistant ↔ MyApp ↔ Business_chats) that connects clients' chats with OpenAI assistants.

I want to provide it to clients. But I had a question:

Let’s say I have 3 clients. I created an API key and an assistant for each, and each client accesses his assistant through his own API key.

It turns out they all draw from the one shared balance on my account, which I keep topping up. How can I control/limit the API keys?

For example, my clients allocate different amounts for the work of their OpenAI assistants: $10, $70, and $100.

I have a total of $500 on my balance.

How can I control consumption?

What possible steps do you currently see for this situation?

Regards, Viktor

Hi Viktor!

OpenAI doesn’t offer a solution to this. You need to create a gateway that tracks usage and billing (“API Monetization”).

I expect you’d want to either offer a fixed-price product with a cost (“fair usage”) ceiling, or include an uplift to cover your ancillary costs.

In either case, you need to manually track your users’ actual or statistical spending.
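A minimal sketch of what such a gateway ledger could look like, assuming hypothetical per-client budget figures (taken from the question: $10, $70, $100) and a per-call cost you compute yourself from the API's reported usage — all class and client names here are illustrative, not part of any OpenAI API:

```python
# Hypothetical per-client budget ledger for an API-monetization gateway.
# Budgets and the per-call cost are illustrative; in practice you'd derive
# each call's cost from the token usage the API reports back.

class BudgetExceeded(Exception):
    pass

class BudgetLedger:
    def __init__(self, budgets):
        # budgets: dict of client_id -> allocated dollars
        self.budgets = dict(budgets)
        self.spent = {client: 0.0 for client in budgets}

    def charge(self, client, cost):
        """Record a call's cost, refusing it if it would exceed the budget."""
        if self.spent[client] + cost > self.budgets[client]:
            raise BudgetExceeded(f"{client} would exceed ${self.budgets[client]:.2f}")
        self.spent[client] += cost
        return self.budgets[client] - self.spent[client]  # remaining budget

ledger = BudgetLedger({"client_a": 10.0, "client_b": 70.0, "client_c": 100.0})
remaining = ledger.charge("client_a", 0.25)  # one call costing $0.25
print(remaining)  # 9.75
```

The gateway would call `charge` before (or right after) each API call and return an error to the client once their allocation is used up.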


Hi Diet, thank you for your message!

Currently, I see the simplest course of action as registering an OpenAI account for each client, which they can top up themselves. My application would then be offered on a subscription basis. However, from a service perspective, this is not convenient.

Otherwise, as far as I understand, I would need to calculate how many tokens they use, how much it costs, etc. But I don’t see the point in doing this: OpenAI already does it, and what I calculate may not match what OpenAI calculates.

In fact, OpenAI already does all these calculations, and they could very easily implement what I need. Do I just have to wait?

It’s possible.

They’re rolling out limited tracking at https://platform.openai.com/usage (right-hand side), but for now it only shows how many calls were made and to which models.

I wouldn’t wait on features that they haven’t announced (but tbh I wouldn’t even wait on things they do announce).

Thank you, I’ve seen it.

Another option is token accounting using standard methods. You can use: https://platform.openai.com/docs/guides/text-generation/chat-completions-api
It returns usage data:

"usage": {
    "completion_tokens": 17,
    "prompt_tokens": 57,
    "total_tokens": 74
}
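Given that usage block, the dollar cost of a call can be computed locally. A sketch, with assumed per-1K-token prices (the real prices vary by model, so check the current pricing page):

```python
# Compute one call's cost from the API's reported token usage.
# The prices below are assumptions for illustration only.
PRICE_PER_1K_PROMPT = 0.0005      # assumed $ per 1K prompt tokens
PRICE_PER_1K_COMPLETION = 0.0015  # assumed $ per 1K completion tokens

def call_cost(usage):
    """Dollar cost of a single call, given its 'usage' dict."""
    return (usage["prompt_tokens"] / 1000 * PRICE_PER_1K_PROMPT
            + usage["completion_tokens"] / 1000 * PRICE_PER_1K_COMPLETION)

# The usage object quoted above:
usage = {"completion_tokens": 17, "prompt_tokens": 57, "total_tokens": 74}
print(f"{call_cost(usage):.8f}")  # cost in dollars for this one call
```

Summing `call_cost` per client over time gives the spend figure a gateway would check against each budget.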

But it doesn’t fit my case. As far as I understand, chat completions is a one-time question-and-answer, not a dialogue with history.

ooh.

Well, the way it works is that when you have a conversation, you append the new messages to the old message list and send the whole shebang to the model again. So you get quadratic growth in input token cost the longer the conversation gets.

If you use streaming, the API won’t compute that for you, but you can use tiktoken to get a really good estimate.
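That quadratic growth is easy to see with a toy calculation. Assuming each conversation turn adds a fixed 50 tokens (a made-up figure; tiktoken would give real counts), the prompt tokens billed per turn and in total look like this:

```python
# Toy illustration of quadratic input-token growth in a chat:
# every turn resends the whole history, so turn n's prompt is ~n * turn_tokens.
turn_tokens = 50  # assumed tokens added per conversation turn
turns = 10

prompt_tokens_per_turn = [n * turn_tokens for n in range(1, turns + 1)]
total_prompt_tokens = sum(prompt_tokens_per_turn)

print(prompt_tokens_per_turn[-1])  # 500 tokens sent on the 10th turn
print(total_prompt_tokens)         # 2750 tokens billed in total
```

The per-turn cost grows linearly, so the cumulative cost over the conversation grows roughly with the square of the number of turns.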

edit: I’m wondering, did you choose assistants because you didn’t know you could do it with chat completions? The cost would be similar or less by using the chat completion api directly.

In my application, the work with the API is implemented like this:

#python
import asyncio
from openai import AsyncOpenAI

clientAI = AsyncOpenAI(api_key='secret_key')

thread = await clientAI.beta.threads.create()
# add the user's message to the thread
message_ai = await clientAI.beta.threads.messages.create(thread_id=thread.id,
                                                         role="user",
                                                         content=message)
# start a run with the assistant
run = await clientAI.beta.threads.runs.create(thread_id=thread.id,
                                              assistant_id='asst_id')
# poll until the run is no longer queued or in progress
run_status = await clientAI.beta.threads.runs.retrieve(thread_id=thread.id,
                                                       run_id=run.id)
while run_status.status in ('queued', 'in_progress'):
    await asyncio.sleep(1)
    run_status = await clientAI.beta.threads.runs.retrieve(thread_id=thread.id,
                                                           run_id=run.id)
messages = await clientAI.beta.threads.messages.list(thread_id=thread.id)

There is also additional logic for processing assistant functions, when:

run_status.status == 'requires_action'
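That `requires_action` logic is essentially a dispatch loop over the tool calls the run requests. A sketch with the run's `required_action` stubbed as plain dicts, shaped like the API's response (in the real flow these objects come from the Assistants API, and the outputs are sent back via `submit_tool_outputs`; the `get_weather` function is a made-up example):

```python
import json

# Local functions the assistant is allowed to call (illustrative).
def get_weather(city):
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def build_tool_outputs(required_action):
    """Turn a run's required_action into the tool_outputs payload."""
    outputs = []
    for call in required_action["submit_tool_outputs"]["tool_calls"]:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        outputs.append({"tool_call_id": call["id"], "output": fn(**args)})
    return outputs

# Stubbed required_action, shaped like what the API returns:
required_action = {"submit_tool_outputs": {"tool_calls": [
    {"id": "call_1",
     "function": {"name": "get_weather", "arguments": '{"city": "Kyiv"}'}}]}}

print(build_tool_outputs(required_action))
# [{'tool_call_id': 'call_1', 'output': 'Sunny in Kyiv'}]
```

Note that the tokens consumed by these tool-call round trips also count toward the run's cost, which is part of why threads are hard to budget for.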

Do I have quadratic growth in input token cost here?

As I understand it, in my case, I need to count it myself, for example, through tiktoken

But I don’t want to do this, because there might be discrepancies in calculations with OpenAI

Why calculate myself what OpenAI is already calculating?

Yeah, but it’s extra complicated with threads. I’m not sure whether it still does auto-truncation, and it depends on whether you use runs, but in general there’s a lot of debate around cost control with the assistants.

OpenAI will probably eventually release tools that help you track this (unless they decide to retire assistants completely)

In my biased opinion, the decision to use assistants in a product comes with a ton of risks and not many rewards. You can achieve the same thing with other tools, but I understand that it may be easier for novice developers to get started by just using assistants.


I am that noob, such that I know not of the risks of which you speak. Could you please explain?

From my perspective having assistants that tackle different problems or approaches to problem solving and threads for different users interacting seems pretty useful.

Well, in the short term you have uncontrollable costs, at least that’s my understanding.

In the long term, you have vendor lock-in.


Thank you for opening my eyes to the real pricing of the API. Before this, I thought I was only paying for the message and response. I read the topic “Assistants API pricing details per message” and I’m outraged.

I will probably postpone my idea until better times.

I have similar tasks. I came across the service Team-GPT (team-gpt.com); it seems to be able to track keys and limit the number of requests.

You can also look at WorkBot, another AI platform.