How exactly do you get charged for using the API for assistants?

Yeah, there may be a very easy-to-implement check for duplication like that…

1 Like

Assume I am really stupid, but have only just read all the documentation and the API reference.

https://platform.openai.com/docs/api-reference/messages/modifyMessage

How do I simulate what, locally, is just system + chat[-turns*2:] + user, passing only the number of turns that fit in my token budget?

No need to answer hastily, I’ll give you all the time you need.
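
For reference, a minimal sketch of that local pattern: keep the system message, the last N user/assistant turns, and the new user input. The build_request name and the sample values are illustrative, not from any SDK:

def build_request(system, chat, user_input, turns=5):
    # system: the system message dict
    # chat: full history as a list of {"role": ..., "content": ...} dicts
    # user_input: the new user message dict
    # Keep only the last `turns` user/assistant pairs.
    return [system] + chat[-turns * 2:] + [user_input]

conversation = []  # accumulated {"role": ..., "content": ...} dicts

messages = build_request(
    {"role": "system", "content": "You are a helpful assistant."},
    conversation,
    {"role": "user", "content": "What did I just ask?"},
    turns=5,
)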

Then we have the utility of “answer not ready yet, still looping calling your API endlessly with the same thing”.

def truncate_messages(thread):
    # Cap every message at 4096 characters (a crude stand-in for a token budget)
    max_length = 4096
    truncated_thread = []

    for message in thread:
        # Truncate message to 4096 characters if it's longer
        truncated_message = message[:max_length]
        truncated_thread.append(truncated_message)

    return truncated_thread

Not tokenised, but you get the idea.
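
A tokenised variant of the same idea, assuming the tiktoken library and the cl100k_base encoding (the 4096-token budget is illustrative):

import tiktoken

def truncate_messages_tokens(thread, max_tokens=4096):
    # Encode each message, keep at most max_tokens tokens, decode back to text
    enc = tiktoken.get_encoding("cl100k_base")
    truncated_thread = []
    for message in thread:
        tokens = enc.encode(message)
        truncated_thread.append(enc.decode(tokens[:max_tokens]))
    return truncated_thread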

2 Likes

Just to clarify: we manage our conversations ourselves anyway, ignoring the thread feature?

Because if you try to send an actual user/assistant conversation into a new thread, I expect you will get a nice traceback:

openai.BadRequestError: Error code: 400 - {'error': {'message': "'assistant' is not one of ['user'] - 'messages.1.role'", 'type': 'invalid_request_error', 'param': None, 'code': None}}

(simulated, even though errors are free)

Assistants have no way to receive conversation history other than through a thread. An assistant can receive “instructions” upon creation, or it can receive “user” input messages.

Will it believe you when you say:

user: (no this is actually what you said)

Reference: https://platform.openai.com/docs/api-reference/threads/createThread
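
A minimal sketch of that constraint, assuming the v1 Python SDK: at the time of this thread, createThread only accepted messages with role “user”, so a prior assistant turn has to be smuggled in as user-authored text (whether the model believes it is the open question above):

from openai import OpenAI

client = OpenAI()

# Only the "user" role is accepted here; an assistant turn has to be
# disguised as user-authored text.
thread = client.beta.threads.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "user", "content": "(no, this is actually what you said: Paris.)"},
    ]
)
print(thread.id)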

I’ve not tried it; are you saying there is 100% no way to modify an existing message, or to manage message content in a thread at all? If that is so, then it seems like an oversight.

Anti-multishot technology™, because the API developer is an adversary not to be trusted.

1 Like

So the way I’m doing it is retrieving and reusing the same thread for an existing conversation. I noticed that IT IS quite expensive. Is it correct that every time you run the same thread, the whole conversation from first message to latest is charged in tokens?

let currentThreadId = null; // Global variable to store the current thread ID
let conversationHistory = []; // Stores the history of the conversation
const assistantId = '***';

async function askGPT(question) {
    try {
        conversationHistory.push({ role: "user", content: question });

        // Retrieve the assistant (debugging only; not needed for the run)
        const myAssistant = await openai.beta.assistants.retrieve(assistantId);
        console.log(myAssistant);

        if (!currentThreadId) {
            // Create a new thread if there isn't an existing one;
            // otherwise keep reusing the stored thread ID
            const thread = await openai.beta.threads.create();
            currentThreadId = thread.id;
        }

        // Add a message to the thread with the user's question
        await openai.beta.threads.messages.create(currentThreadId, {
            role: "user",
            content: question
        });

        // Run the assistant on the thread to get a response
        const run = await openai.beta.threads.runs.create(
            currentThreadId,
            { assistant_id: assistantId }
        );

        let runStatus = await openai.beta.threads.runs.retrieve(
            currentThreadId,
            run.id
        );

        // Poll for run completion, bailing out on terminal failure states
        // so this loop can't spin forever
        while (runStatus.status !== 'completed') {
            if (['failed', 'cancelled', 'expired'].includes(runStatus.status)) {
                throw new Error(`Run ended with status: ${runStatus.status}`);
            }
            await new Promise(resolve => setTimeout(resolve, 1000)); // Wait 1 second
            runStatus = await openai.beta.threads.runs.retrieve(currentThreadId, run.id);
        }

        // Retrieve the messages after the run completes; the list is
        // newest-first by default, so the first match is the latest reply
        const messagesResponse = await openai.beta.threads.messages.list(currentThreadId);

        const aiMessages = messagesResponse.data.filter(
            (msg) => msg.run_id === run.id && msg.role === "assistant"
        );

        return aiMessages[0].content[0].text.value;
    } catch (error) {
        console.error('Error in askGPT:', error.response ? error.response.data : error);
        return 'An error occurred while processing your request.';
    }
}

Is it much cheaper to create a new thread on every input and output, and no longer remember the conversation, to lessen the usage of tokens?

1 Like

Which of course really nullifies most of the benefits. This API is ridiculously expensive.

Nope. No way. We have no control over messages at all besides modifying metadata, adding a new user message, and attaching assistants.
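
For the record, the modifyMessage endpoint linked earlier bears that out: metadata is the only writable field. A minimal sketch, assuming the v1 Python SDK (the IDs are placeholders):

from openai import OpenAI

client = OpenAI()

# metadata is the only field this endpoint accepts; there is no way to
# edit or delete the message content itself.
client.beta.threads.messages.update(
    message_id="msg_abc123",
    thread_id="thread_abc123",
    metadata={"reviewed": "true"},
)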

100%. There is no way this is ready for production. It’s incredibly slow, expensive, and lacking in a lot of features.

I have hopes for the future though.

3 Likes

Well, this is what beta development is about: testing the product and refining it. I can see a context-limiting setting becoming a thing, and also a message length and size limit setting as well.

Constructive feedback forms an essential part of the development cycle.

1 Like

Am I right that there is the ability to limit tokens on the Chat Completions API, but not on the Assistants API?
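
That matches the API reference at the time: Chat Completions accepts a max_tokens cap on the response, while runs.create on the Assistants API exposed no equivalent. A minimal sketch of the Chat Completions side, assuming the v1 Python SDK (the model name is illustrative):

from openai import OpenAI

client = OpenAI()

# Chat Completions: the response is capped at 200 tokens.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Explain threads briefly."}],
    max_tokens=200,
)
print(response.choices[0].message.content)

# An Assistants run had no such cap: the input is the whole thread and
# the output length is up to the model.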

imho, this is generic functionality that many of us have already built locally in various ways, to get around the fact that no one had yet offered a decent “assistant API” to date.

However, this version was poorly thought out and too simplistic, and it should have better anticipated client sensitivity to cost. Paying ~25c per call is ridiculous and will not fly in production.
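
For a sense of where ~25c comes from (my arithmetic, assuming gpt-4-1106-preview pricing of $0.01 per 1K input tokens): a run that re-reads about 25K tokens of accumulated thread history costs 25 × $0.01 = $0.25 in input alone, before any output tokens.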

It would be nice to have a more open approach to improving this API.

The developers should:

  • summarise the feedback they’ve received so far
  • inform us what improvements they are looking into … and then
  • update us on the new version when it becomes available.

Perhaps that’s in progress already and someone could share the official comms?

1 Like

That’s not a beta release. This is not even alpha. It’s not working as intended.

Check out my bug report:

Why is it generating messages over and over again? There is no thread context to be aware of. To be honest, it feels like a big rip-off and a way to make some extra money.

2 Likes

I only just figured this out tonight. I’ve been busy building out additions, and when dumping the returned data I saw this happening; I thought it was my local JSON being appended, until I deleted it. It seems you may be better off generating a new thread on every call and grabbing the last returned messages to append your new message to. Keep track of the last-used and current thread IDs. Would that not get the same response? I have to try. It would reduce cost drastically if it adheres to the message structure the same way. I’ll have to experiment.
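
A sketch of that new-thread-per-call idea, assuming the v1 Python SDK and working around the user-only role restriction by relabelling prior turns as user text (ask_fresh_thread and MAX_SEED_MESSAGES are illustrative names):

from openai import OpenAI

client = OpenAI()
MAX_SEED_MESSAGES = 6  # how many prior turns to carry forward

def ask_fresh_thread(history, question, assistant_id):
    # history: list of (role, text) tuples from previous exchanges.
    # Seed a brand-new thread with only the most recent turns, all sent
    # as "user" messages, since that is the only role threads accept.
    seed = [
        {"role": "user", "content": f"[{role} said earlier]: {text}"}
        for role, text in history[-MAX_SEED_MESSAGES:]
    ]
    seed.append({"role": "user", "content": question})
    thread = client.beta.threads.create(messages=seed)
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    return thread.id, run.id  # poll the run as before, then list messages

Whether the model treats those relabelled turns as its own prior output is exactly the doubt raised earlier in the thread.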

1 Like