How exactly do you get charged for using the API for assistants?

I racked up $200 over an hour of running ChatGPT on some data it was cleaning. I had an assistant with probably 5 paragraphs of text in the intro box, and 1 paragraph of content on every request, limited to 8000 characters.

What goes into the calculation for the tokens? I see $0.03 per 1K tokens for gpt-4, which is what I was using, but does my 5-paragraph intro text go into the calculation on every request? Does the response size count against me too? What exactly goes into the calculation? I can’t figure it out.

At first I was using GPT-3.5 and was getting charged $1.50, $2; then I switched to GPT-4, ran it for an hour, and I’m at $200.

Everything in the conversation goes to the model every time you make a call; that is how the conversational aspect of the model works. The model is internally stateless, so it needs to be fed all of the context required for that query, which means all of the prior context: A+B, then A+B+C, then A+B+C+D, then …
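In code, roughly (a sketch with the Python SDK; the `client` setup, system text, and model name are illustrative, not from this thread):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The "intro box" text rides along as a system message on every call.
    messages = [{"role": "system", "content": "You clean tabular data."}]

    def ask(user_text: str) -> str:
        # Each call sends the entire history: A+B, then A+B+C, then A+B+C+D ...
        messages.append({"role": "user", "content": user_text})
        response = client.chat.completions.create(model="gpt-4", messages=messages)
        answer = response.choices[0].message.content
        # The reply joins the history too, so it is re-billed as input
        # tokens on every later call.
        messages.append({"role": "assistant", "content": answer})
        return answer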

Does this mean the entire message history is fed in every request, so each request gets larger and larger each time?

Edit: oh, I see, that was it then… Dang… A+B, then A+B+C, then A+B+C+D, then …

It doesn’t work well without the message history…

Yes, up to a maximum determined by which model you are using; past that, the system will truncate the message history and lose things from the start. If you are using a 128K model, that can be around 300 pages’ worth of text before truncation.

You can of course handle the thread yourself by removing elements and keeping it smaller, but you may lose contextual accuracy and awareness.

Maybe you could get away with only giving it some snippets of the very long sections in the message history?

If I am using message history, then it is going to max out my token usage after a few calls, right? So 8192 tokens every call: 8.192 × $0.03 ≈ 25 cents per input, and up to another 50 cents per output at $0.06/1K. That is basically 75 cents a call! Am I doing this correctly?

At that point it’s cheaper to hire a human to do what I’ve been doing lol.


Count your input tokens for the entire call, divide by 1000, then multiply by $0.03.
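A rough sketch of that calculation with tiktoken (the API adds a few tokens of per-message framing, so treat this as an estimate):

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")

    def estimate_input_cost(messages, price_per_1k=0.03):
        # Sum tokens over every message being sent with the call; the
        # per-message framing overhead is ignored here for simplicity.
        total = sum(len(enc.encode(m["content"])) for m in messages)
        return total / 1000 * price_per_1k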

Here is the calculation, yep: 8000/1000 = 8, and 8 × $0.03 = 24 cents.

What is your technique for limiting the request size? If I don’t use message history anymore, does the intro text count against me for the assistant? Would it be better to just go back to the completions API?

(How do I even trim the message history size?)

It sounds like you’re using gpt-4 rather than gpt-4-1106-preview? The preview model is around 2.75 times cheaper, and it has a higher tokens-per-minute limit (probably 300,000, depending on your usage tier).
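For reference, the published prices at the time: gpt-4 at $0.03/1K input and $0.06/1K output, gpt-4-1106-preview at $0.01/1K input and $0.03/1K output. How much cheaper it works out depends on your input/output mix; a quick sketch:

    def call_cost(input_tokens, output_tokens, in_price, out_price):
        # Prices are per 1K tokens, charged separately for input and output.
        return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

    # A history-heavy call: 8,000 tokens in, 500 tokens out.
    gpt4 = call_cost(8000, 500, 0.03, 0.06)      # $0.27
    preview = call_cost(8000, 500, 0.01, 0.03)   # $0.095
    print(gpt4 / preview)                        # ~2.8x cheaper on this mix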

You don’t need to add the entire conversation into your new prompt as context either; you can just append the last message.


        # Setup assumed earlier in the script:
        #   import streamlit as st
        #   from openai import OpenAI
        #   client = OpenAI()
        #   if "openai_model" not in st.session_state:
        #       st.session_state["openai_model"] = "gpt-4"
        #   if "messages" not in st.session_state:
        #       st.session_state.messages = []

        if prompt := st.chat_input("What is up?"):
            st.session_state.messages.append({"role": "user", "content": prompt})
            with st.chat_message("user"):
                st.markdown(prompt)

            with st.chat_message("assistant"):
                message_placeholder = st.empty()
                full_response = ""
                # Stream the completion; note the whole message history is
                # sent as context on every call.
                for response in client.chat.completions.create(
                    model=st.session_state["openai_model"],
                    messages=[
                        {"role": m["role"], "content": m["content"]}
                        for m in st.session_state.messages
                    ],
                    stream=True,
                    max_tokens=4000,
                ):
                    # Each streamed chunk carries a delta whose content may be None.
                    message_content = response.choices[0].delta.content
                    full_response += message_content if message_content is not None else ""
                    # Re-render the partial response with a cursor character.
                    message_placeholder.markdown(full_response + "▌")
                message_placeholder.markdown(full_response)
            st.session_state.messages.append({"role": "assistant", "content": full_response})

This is how I do mine for my Streamlit interface (it’s like GPT Plus for my co-workers without actually getting GPT Plus :sunglasses:)

Ignore all of the #'d out notes, I’m bad at coding lol

Where can I find a JS variant haha, python is hard to parse in my head atm.

gpt-4-1106-preview has a 200-requests-per-day limit, I think, which is why I switched. I need to make 10,000 requests.

You cannot “handle the thread yourself”. With what? There is no “truncate chat” feature on offer. You can add more metadata, but it is unclear whether that is just more tokens for the AI to read and ignore.

You can only put user messages in; only the AI can write assistant messages. That is a hard firewall against utility and against constructing a smaller conversation.

From the API reference:

role (string, Required): The role of the entity that is creating the message. Currently only user is supported.
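In SDK terms (a sketch against the v1 Python client’s beta Assistants endpoints; `client` and `thread` are assumed from earlier setup, reflecting the API as documented at the time):

    # Only role="user" was accepted here, so there was no way to write
    # "assistant" messages and hand-craft a smaller conversation.
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="Clean the next batch of rows.",
    )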

Lesson:

Assistants will empty your account by design

DO NOT USE


They keep updating it, I think they’re looking for the sweet spot. Currently tiers 1-5 have a 10,000 RPD limit.

First, does message history provide a material benefit to your use case? If so, consider reducing it to half. Remember, we are dealing with compounding data here, so halving the history produces a much larger reduction in total tokens sent over a session.

If message history has no value… don’t use it.


If message history has no value… don’t use it.

Simple, yet effective


Please explain your method for doing so.

Even if you keep the context from filling completely with messages by restarting threads (at the cost of the assistant no longer being able to answer follow-ups like “what about the other one”), any retrieval will still make sure the context is filled before the AI is set loose iterating on function calls against your API or the code interpreter.


The thread object and the messages it contains can be modified, so just reduce the messages to 50% of what they were, as in, lose the oldest 50%. The next time you perform a run, there will be less data to process. Context may be lost, but that is the cost of token reduction. Do this if the messages are, let’s say, more than 4096 tokens’ worth.
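A sketch of that policy on a plain message list (`count_tokens` is a stand-in for whatever tokenizer you use, e.g. a tiktoken-based counter like the one above):

    def halve_history(messages, count_tokens, budget=4096):
        # Assumes messages[0] is the system/instructions message; drop the
        # oldest half of the rest once the running total exceeds the budget.
        system, rest = messages[:1], messages[1:]
        total = sum(count_tokens(m["content"]) for m in messages)
        if total > budget:
            rest = rest[len(rest) // 2:]  # lose the oldest 50%
        return system + rest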


The models tend to repeat the instructions they were given anyway. You could fairly easily enforce a max-token context or a maximum number of prompt-response interactions (maybe you only want the most recent 3 back-and-forths included). I agree with @Foxalabs
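For instance, a fixed-window variant (hypothetical helper; one “back-and-forth” is a user message plus the assistant reply):

    def last_n_turns(messages, n=3):
        # One turn = a user message plus the assistant reply, so keep the
        # final 2*n entries after the system/instructions message.
        system, rest = messages[:1], messages[1:]
        return system + rest[-2 * n:]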
