Can't get a token count out of the API for Python/Flask Langchain RetrievalQA chatbot(*Closed*)

The title says it all, to save people's time and energy if this is irrelevant to them: for months I haven't been able to get a decent token count even printed to my terminal. LangChain's agents seem to screw it up somehow, or tiktoken isn't compatible, or something, but I'm days into trying to get this to work and ChatGPT can't answer me no matter how many documents I feed it.

I've got a num_tokens function that supposedly works, with passing tests, but I cannot get a total from a chat completion. Is it because of the new gpt-4-1106-preview?
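For context, here's roughly the shape of that function. This is an illustrative sketch rather than my exact code; the notable part is the fallback, since older tiktoken releases don't recognize the -1106 model names even though those models use the same cl100k_base encoding as the other gpt-4 models:

import tiktoken

def num_tokens(text: str, model: str = "gpt-4-1106-preview") -> int:
    """Count tokens the way the API tokenizes text for this model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Older tiktoken versions raise KeyError for the -1106 names;
        # all gpt-4 / gpt-3.5-turbo models use cl100k_base.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))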

Is this a known issue? Am I just a noob for even playing with LangChain? (Don't worry, I'm done with it after this, but I fear I'm in too deep at this point.)

Any help is appreciated, or a point in the right direction.

"Show me your code" (c) millions of people online trying to help somebody who posts a code problem without the code.

Sorry. I learned that you can't use stream=True with a LangChain retrieval agent, and that get_openai_callback is probably my best bet. My tests with it worked, but I can't get it configured to run on every request at my endpoint here:

@app.route('/chat', methods=['POST'])
def chat():
    try:
        data = request.get_json()
        user_message = data.get('message')
        user_id = data.get('user_id')  # Extract user_id from the request

        if not user_message or not user_id:
            logger.error("No message or user ID provided")
            return jsonify({"error": "Message or user ID not provided"}), 400

        logger.debug(f"Received user message: {user_message} for user ID: {user_id}")

        with get_openai_callback() as cb:
            # Generate response using the agent; calling the agent directly
            # (instead of agent.run) returns a dict with an 'output' key
            agent_response = agent(user_message)
            logger.debug(f"Agent response: {agent_response}")

            # Check if the agent returned a valid response
            if not agent_response or 'output' not in agent_response:
                logger.error("Agent did not return a valid response")
                return jsonify({"error": "Agent error"}), 500

            original_response = agent_response['output']
            professional_response = post_process_with_professional_tone(user_id, original_response)

            # Log token usage
            logger.info(f"Total Tokens: {cb.total_tokens}")
            print(f"Total Tokens: {cb.total_tokens}")
            total_tokens_used = cb.total_tokens

            # Update memory with user message and chatbot response
            memory.append({'role': 'user', 'content': user_message})
            memory.append({'role': 'assistant', 'content': professional_response})

        user = db.session.get(User, user_id)

        # Deduct credits based on token usage
        deduct_credits(user_id, total_tokens_used)

        # Create or retrieve the ChromaDB collection for this user
        user_collection = get_user_chromadb_collection(user.chromadb_path)

        # Embed the user message and responses
        user_message_embedding = Embeddings.embed_query(user_message)
        original_response_embedding = Embeddings.embed_query(original_response)
        professional_response_embedding = Embeddings.embed_query(professional_response)

        if professional_response_embedding is None:
            app.logger.info("Embedding for post-processed response is None")

        # Add embeddings to ChromaDB
        vectorstore.add([user_message_embedding, original_response_embedding, professional_response_embedding])
        add_user_embeddings_to_chroma(user_id, user_message, [user_message_embedding, original_response_embedding, professional_response_embedding], user_collection)

        return jsonify({"user_message": user_message, "chatbot_response": professional_response})

    except Exception as e:
        app.logger.error(f"Exception in chat endpoint: {e}")
        traceback.print_exc()
        return jsonify({"error": str(e)}), 500
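To bisect where the count gets lost, one option is a stripped-down debug route with nothing but the agent call inside the callback. The route name and response shape here are just for illustration, and it assumes the same app, agent, and get_openai_callback as above:

@app.route('/chat_debug', methods=['POST'])
def chat_debug():
    user_message = request.get_json().get('message', '')
    with get_openai_callback() as cb:
        agent_response = agent(user_message)
    # If these counts are non-zero here but zero in /chat, the problem is
    # somewhere in the post-processing path, not in the callback itself.
    app.logger.info(f"prompt={cb.prompt_tokens} completion={cb.completion_tokens} total={cb.total_tokens}")
    return jsonify({"output": agent_response['output'], "total_tokens": cb.total_tokens})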

Which error are you getting when executing this code, or what is the expected behaviour and where does it fail?

The "error" I'm getting is that it simply won't count tokens for me, so I can't deduct the credits from my users' accounts. I'm certain my deduct_credits function works fine. If I run this callback code just after initializing my agent, like this:

agent = initialize_agent(
    agent="conversational-react-description",
    llm=llm,
    tools=tools,
    verbose=True,
    max_iterations=1,
    early_stopping_method='generate',
    handle_parsing_errors=True,
    memory=memory,
    persona=persona
)

with get_openai_callback() as cb:
    response = agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")

Then I get a beautiful printout of the RetrievalQA chain's token count in my terminal. That doesn't work to track my users' chatbot interactions, though, so it seems this callback has to be integrated into my /chat endpoint here:
@app.route('/chat', methods=['POST'])
def chat():
    try:
        data = request.get_json()
        user_message = data.get('message')
        if not user_message:
            app.logger.error("No message provided")
            return jsonify({"error": "No message provided"}), 400

        # Embed the user message
        user_message_embedding = Embeddings.embed_query(user_message)
        if user_message_embedding is None:
            app.logger.info("Embedding for user message is None")

        # Generate response using the agent; calling the agent directly
        # (instead of agent.run) returns a dict with an 'output' key
        agent_response = agent(user_message)
        original_response = agent_response['output']

        # Embed the original agent response
        original_response_embedding = Embeddings.embed_query(original_response)
        if original_response_embedding is None:
            app.logger.info("Embedding for original agent response is None")

        # Post-process the response to ensure it adheres to Stanton's persona
        professional_response = post_process_with_professional_tone(original_response)

        # Embed the post-processed response
        professional_response_embedding = Embeddings.embed_query(professional_response)
        if professional_response_embedding is None:
            app.logger.info("Embedding for post-processed response is None")

        # Update memory with user message and chatbot response
        memory.append({'role': 'user', 'content': user_message})
        memory.append({'role': 'assistant', 'content': professional_response})

        # Store the embeddings in the Chroma DB
        vectorstore.add([user_message_embedding, original_response_embedding, professional_response_embedding])

        return jsonify({"user_message": user_message, "chatbot_response": professional_response})
    except Exception as e:
        traceback.print_exc()
        return jsonify({"error": str(e)}), 500
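If the with-block scoping is what's getting in the way, one alternative worth trying: get_openai_callback is a thin wrapper around LangChain's OpenAICallbackHandler, and (assuming a LangChain version whose chains accept a callbacks argument) the handler can be attached to a single call explicitly:

from langchain.callbacks import OpenAICallbackHandler

token_handler = OpenAICallbackHandler()

# Pass the handler for just this call; its counters accumulate across every
# OpenAI request the agent makes while answering this one message.
agent_response = agent(user_message, callbacks=[token_handler])
total_tokens_used = token_handler.total_tokens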

Note: the professional_response is a post-processing element I created to improve the general chatbot output, and I wonder if that is what confuses ChatGPT when I ask it to help me write this code properly. I'm too new to coding to really grasp what to do here, but the idea is simply to get the token usage amount from the gpt-4-1106-preview API call and then send it to my deduct_credits function (which I've tested extensively, and it works fine):

def deduct_credits(user_id, tokens_used):
    """Deduct credits from the user's account based on tokens used."""
    timestamp = datetime.now().isoformat()
    # Print and log to confirm the function is called
    print(f"[{timestamp}] deduct_credits called with user_id: {user_id} and tokens_used: {tokens_used}")
    logger.debug(f"deduct_credits called with user_id: {user_id} and tokens_used: {tokens_used}")
    logger.info(f"Deducting {tokens_used} tokens from user_id: {user_id}")
    try:
        user = User.query.get(user_id)
        print(f"User retrieved inside deduct_credits: {user}")
        # Debug: log the user retrieval (don't touch user.credits until we
        # know the user exists)
        logger.info(f"User retrieved for credit deduction: {user}")

        if user:
            logger.info(f"User {user_id} found with {user.credits} credits available")
            if user.credits >= tokens_used:
                # Debug: log the balance before deducting
                logger.info(f"User {user_id} credits before deduction: {user.credits}, tokens used: {tokens_used}")
                user.credits -= tokens_used
                db.session.commit()

                # Debug: log the balance after committing
                logger.info(f"User {user_id} new credit balance: {user.credits}")
                return True
            else:
                logger.warning(f"Insufficient credits for user {user_id}. Required: {tokens_used}, Available: {user.credits}")
                return False
        else:
            logger.error(f"User {user_id} not found.")
            return False
    except Exception as e:
        logger.error(f"Error in deduct_credits for user {user_id}: {e}")
        print(f"[{timestamp}] Error in deduct_credits: {e}")
        db.session.rollback()  # Rollback in case of error
        return False

I just need to reliably get the token usage amount from each API call and then deduct the equivalent credits from my user's account, one to one.
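In isolation that wiring is short; a minimal sketch, where the 402 status and the error message are illustrative assumptions rather than part of my code:

with get_openai_callback() as cb:
    agent_response = agent(user_message)

# One credit per token the call consumed (1:1)
if not deduct_credits(user_id, cb.total_tokens):
    return jsonify({"error": "Insufficient credits"}), 402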

I'm closing this down. No support, and I've decided to go another route.

OpenAI needs to do way more for developers; this technological lead won't last, and then we'll remember how we were treated when we needed help.