When we use streaming with OpenAI models, I am not getting the token count.

Is this by design, or is there a way to get the token count? And how do I calculate the token cost if streaming does not return the total token count?

Yes, this behavior is documented: a streamed response does not include a usage field, and the only closing signal you get is a finish_reason in the last delta.
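For illustration, the final chunk of a stream looks roughly like this (legacy API shape, abbreviated; extra fields omitted):

# the closing chunk: an empty delta, a finish_reason, and no "usage" field
last_chunk = {
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
}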

For counting tokens yourself, you can record and append the deltas as they arrive to reconstruct the full response. Then you can use OpenAI's tiktoken library to count the tokens in that text.
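For example, a minimal counting sketch (assuming the tiktoken package is installed; encoding_for_model picks the tokenizer that matches the model):

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    # encode() turns the text into a list of token IDs
    return len(encoding.encode(text))

print(count_tokens("Hello! How can I help you today?"))

Note that billed prompt tokens also include a few tokens of chat-format overhead per message, so a raw text count will run slightly under the usage figure.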

To verify that your calculation and the accounting agree, compare your counts for both the sent input and the received output against the daily "usage" record of the same exchange on the account management web page (keep the request isolated by ten minutes from other queries so it shows up as an individual record).
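Once you have the prompt and completion counts, the cost is just counts times the per-token rates; a sketch with placeholder per-1K prices (these numbers are illustrative only, check the current pricing page):

# Illustrative rates only -- replace with the model's current pricing.
PRICE_PER_1K_INPUT = 0.0015   # USD per 1K prompt tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.002   # USD per 1K completion tokens (assumed)

def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens / 1000) * PRICE_PER_1K_INPUT \
        + (completion_tokens / 1000) * PRICE_PER_1K_OUTPUT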

Example of a simple looping chatbot:

import openai
openai.api_key = "sk-xxxxx"
system = [{"role": "system", "content": "You are a helpful AI assistant."}]
user = [{"role": "user", "content": "Introduce yourself."}]
chat = []
while user[0]['content'] != "exit":
    response = openai.ChatCompletion.create(
        messages = system + chat[-10:] + user,
        model="gpt-3.5-turbo", stream=True)
    reply = ""
    for delta in response:
        if not delta['choices'][0]['finish_reason']:
            # the first chunk may carry only the role, so default to ""
            word = delta['choices'][0]['delta'].get('content', '')
            reply += word  # append the deltas to record the whole response
            print(word, end="", flush=True)

    # Here you can use tiktoken to count the tokens of everything sent:
    # the contents of system, chat[-10:], and user, plus the chat format's
    # per-message overhead (a helper for this is sketched after the loop),
    # and of the "reply" variable, which holds the bare response text.

    chat += user + [{"role": "assistant", "content": reply}]
    user = [{"role": "user", "content": input("\nPrompt: ")}]
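As a sketch of the "function that understands the formatted messages" idea, here is a message counter modeled on OpenAI's cookbook example; the per-message overhead values are approximations and can differ between model snapshots:

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    # Approximation following OpenAI's cookbook: each message carries a few
    # tokens of chat-format overhead on top of its encoded text.
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 4  # assumed overhead per message; varies by snapshot
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():  # encodes both the role and the content
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # the reply is primed with <|start|>assistant<|message|>
    return num_tokens

# e.g. num_tokens_from_messages(system + chat[-10:] + user)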