max_tokens doesn't work: GPT-4o

I tried to generate JSON-formatted text with GPT-4o, but the output is much longer than expected. Despite setting `max_tokens`, the API generates excessively long responses with duplicate content, and I'm also receiving the following warning:
Warning: API response for chunk 2 exceeds max_tokens limit.

Any insights or suggestions would be greatly appreciated!
Thank you so much in advance for your help!

import openai

def generate_survey(text, intermediate_file_name):
    max_chunk_size = 250
    text_length = len(text)
    text_chunks = []
    previous_questions = []  # must be initialized; it is referenced below
    
    for i in range(0, text_length, max_chunk_size):
        text_chunks.append(text[i:i + max_chunk_size])

    combined_response = []

    chat_history = [{"role": "system", "content": "You are a survey generator."}]
    for index, chunk in enumerate(text_chunks):
        chat_history.append({
                "role": "user",
                "content": f"""

Convert the following text into a JSON format for a survey with the following structure
Make sure to review the chat history and do not repeat questions you have already generated.
I will combine all your responses later, so remove the initial [ and final ] from each response and add a comma ',' immediately after the closing brace at the end of each response.

{{
    "question_number": "<number>",
    "question_text": "<question_content>",
    "answer_type": "<single_choice|multiple_choice|matrix_single_choice|matrix_multiple_choice|free_single_text|free_multiple_text>",
    "options": [
        {{
            "option_number": "<number>",
            "option_text": "<text>"
        }}
    ],
    "others": "<other information>"
}}

Here is the text to convert:

{chunk}

Ensure the format is correct, paying attention to curly brackets and commas, especially the last comma at the end of every response.

"""
                })

        try:
            response = openai.chat.completions.create(
                model="gpt-4o",
                messages=chat_history,
                max_tokens=500,
                temperature=0.5
            )

            # Error handling for when max_tokens is not applied and the output is too long
            if response.usage.total_tokens > 500:
                print(f"Warning: API response for chunk {index + 1} exceeds max_tokens limit. Tokens used: {response.usage.total_tokens}")
                continue

            response_message = response.choices[0].message.content.strip()
            print(f"API response for chunk {index + 1}: {response_message[:50]}...")  # debug output
            cleaned_response = clean_response(response_message)
            
            # Check for duplicates and avoid repeating existing questions
            if any(prev_question in cleaned_response for prev_question in previous_questions):
                print(f"Duplicate question detected in chunk {index + 1}, skipping...")
                continue
            
            combined_response.append(cleaned_response)
            previous_questions.append(cleaned_response)

            # Add the assistant's response to the chat history
            chat_history.append({
                "role": "assistant",
                "content": cleaned_response
            })

        except Exception as e:
            print(f"Error processing chunk {index + 1}: {e}")
            continue

    return combined_response

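For reference, the stitching step the prompt asks for (strip the outer brackets, comma-separate the fragments, re-wrap them into one array) can be sketched like this. The `clean_response` here is a minimal stand-in for the helper used above, not its actual implementation:

```python
import json

def clean_response(text):
    # Minimal stand-in for the clean_response helper used above:
    # strip whitespace, a leading "[", a trailing "]", and a trailing comma.
    return text.strip().lstrip("[").rstrip("]").strip().rstrip(",")

def combine_chunks(parts):
    # Re-wrap the comma-joined object fragments into one JSON array.
    return "[" + ", ".join(parts) + "]"

parts = [clean_response('[{"question_number": "1"},'),
         clean_response('{"question_number": "2"}')]
combined = combine_chunks(parts)
survey = json.loads(combined)  # parses as a list of question objects
```

Parsing the combined string with `json.loads` at the end is a cheap way to catch the bracket/comma mistakes the prompt worries about.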
You have to instruct the AI, in the prompt, what kind of response you want. It cannot see the parameters you have set; `max_tokens` only truncates the output (or causes an error) after the fact.
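Because the model never sees `max_tokens`, the API simply cuts the completion off and reports it via `finish_reason`. A sketch of detecting that truncation — the `finish_reason` field is part of the real Chat Completions response; the stub object below is only for illustration:

```python
from types import SimpleNamespace

def was_truncated(response):
    # When the completion hits max_tokens, the API reports
    # finish_reason "length" instead of the normal "stop".
    return response.choices[0].finish_reason == "length"

# Stub standing in for an openai.chat.completions.create(...) result:
stub = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(stub))  # True -> the JSON is likely cut off mid-object
```

A truncated completion is almost never valid JSON, so checking this before `clean_response` avoids appending broken fragments.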

Getting GPT-4o not to spew double the length of output at you is also a challenge; it doesn't have much regard for your instructions or wishes. You sometimes have to spell out, at length, that non-compliance will cause errors and crashes before the instruction sticks.
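On the warning itself: `response.usage.total_tokens` counts prompt *plus* completion tokens, so it can exceed 500 even when the completion respects `max_tokens=500` — the chat history grows with every chunk. Comparing `completion_tokens` instead is closer to what the original check intends (sketch with a stubbed usage object):

```python
from types import SimpleNamespace

MAX_TOKENS = 500

def exceeds_limit(usage, limit=MAX_TOKENS):
    # max_tokens caps only the completion, so compare completion_tokens,
    # not total_tokens (which also includes the growing chat history).
    return usage.completion_tokens > limit

# Stub usage: a long prompt, but a completion within the cap.
usage = SimpleNamespace(prompt_tokens=900, completion_tokens=480,
                        total_tokens=1380)
print(exceeds_limit(usage))  # False: the completion obeyed max_tokens
```

With the original `total_tokens > 500` check, this same response would have been flagged and skipped, which matches the warning you saw.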


It worked!!
Thanks a lot!!