Structured Outputs sometimes failing due to "Could not parse response content as the length limit was reached"

empideveloper · February 27, 2025, 5:27am

Hi all!

I am facing a problem with API calls that use Structured Outputs in my application. I use them in several places, but the last one, which builds the payload for a WhatsApp API request, sometimes fails with a length-limit error. The odd thing is that the input is nowhere near the token limit, and the same input sometimes goes through fine. The issue is also hard to reproduce because it happens only rarely, but it still blocks us from going to production.

Can someone tell me why this call fails sometimes?

Here is the reference:

LengthFinishReasonError('Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=1310, total_tokens=17694, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')

Traceback (most recent call last):
  File "/app/src/core/openai_chat_completion.py", line 188, in openai_chat_completion_structured
    response = openai_client.beta.chat.completions.parse(
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 160, in parse
    return self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1283, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 960, in request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1066, in _request
    return self._process_response(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1165, in _process_response
    return api_response.parse()
  File "/usr/local/lib/python3.10/site-packages/openai/_response.py", line 325, in parse
    parsed = self._options.post_parser(parsed)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 154, in parser
    return _parse_chat_completion(
  File "/usr/local/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 72, in parse_chat_completion
    raise LengthFinishReasonError(completion=chat_completion)
openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=1310, total_tokens=17694, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
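Reading the usage numbers: `completion_tokens=16384` is exactly the maximum output length of gpt-4o-mini, and 16,384 + 1,310 = 17,694 matches `total_tokens`. The prompt is small; the model kept generating until it hit the output cap. For a reply that normally needs a few hundred tokens, that suggests the model got stuck in a loop (for example, repeated whitespace or newlines inside the JSON) rather than an oversized input. In current openai-python releases the exception keeps the truncated `ChatCompletion` on its `completion` attribute, so the partial text can be inspected. The helper below is a hypothetical, stdlib-only heuristic (not part of any library) that flags a whitespace-heavy or repeating tail:

```python
import re


def diagnose_truncation(text: str, tail_chars: int = 2000) -> dict:
    """Heuristic check of a truncated completion (hypothetical helper).

    Only the last `tail_chars` characters are inspected, since a runaway
    loop shows up at the end of the output.
    """
    tail = text[-tail_chars:]
    # Number of whitespace characters at the very end of the output
    trailing_whitespace = len(tail) - len(tail.rstrip())
    # Fraction of the inspected tail that is whitespace
    whitespace_ratio = sum(c.isspace() for c in tail) / max(len(tail), 1)
    # A 1-40 character chunk repeated at least 21 times back to back at the end
    repeated = re.search(r"(.{1,40}?)\1{20,}\Z", tail, flags=re.DOTALL)
    return {
        "trailing_whitespace": trailing_whitespace,
        "whitespace_ratio": round(whitespace_ratio, 2),
        "repeated_suffix": repeated.group(1) if repeated else None,
    }


# Hypothetical usage inside the existing retry loop:
# except LengthFinishReasonError as err:
#     partial = err.completion.choices[0].message.content or ""
#     logger.warning(f"Truncated output: {diagnose_truncation(partial)}")
```

If `repeated_suffix` comes back as something like `"\n "`, the cap was reached by a degenerate loop, not by real content.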

response_instructions: |
  
  # Goal
  Generate an output based on the Pydantic Model to create WhatsApp payloads (including text, images and videos) for the WhatsApp API.

  # Rules
  - Strictly use the content provided in the input without adding or removing any information.
    - Maintain the order of the messages exactly as provided.
    - Preserve any emojis/icons that are part of the input.
    - Keep the message structure as given (e.g., if there is an introduction and an outro, both should be included in the output). 
    - Group relevant information in one message where possible. For example, if asking for property requirements to create a filter, put all relevant information in the same message.
    - If an image URL and a caption/description are provided, treat it as an "image" type and use the description as the description/caption of the image. **Do not include it as type text**.
      - If a video URL and a caption/description are provided, treat it as a "video" type and use the description as the description/caption of the video. **Do not include it as type text**.

      **Output Rules for Image:**
      - Strictly ensure that if an image_url is provided that you send that content as a message with the type "image"!!!

      - Copy the image_url from the input into the output "image_url".
      - Take the description exactly as provided from the input fields, such as "property_description" and use it as the "image_description."
      - Strictly ensure that if an image_url is provided that you send that content as a message with the type "image"!!!

      **Output Rules for Video:**
       - Strictly ensure that if a video_url is provided that you send that content as a message with the type "video"!!!

       # Key "type"
       - Use the type "video" for the key "type"

       # Key "video_url"
       - Copy the "video_url" from the input into the output "video_url".
         Never put the URL in the description! Always put it as the value of the key "video_url".

      # Key "description"
       - Put the "community_information" and the "community_response_message" into the description of the video.
         Separate the content of the "community_information" and the "community_response_message" with a paragraph break (a blank line). The separation should be clear and easy to understand for better readability.


  # WhatsApp-specific Rules
  - WhatsApp formats bold text with *<word>*. Therefore, if the input has **<word>**, convert it to *<word>* so that it appears correctly as bold in WhatsApp. Avoid leaving both ** and * around the text.
  - Do not use # for headlines. Instead, transform them into bold text.
  - Do not use a "-" for listing items if an emoji/icon is also provided.
  - If "-" is used for listing, replace it with "•".
  Example for the WhatsApp-specific rules:
      Input: "### 5. **The Springs**
  - 🌊 *Lakes and Parks*: Beautiful lakes and landscaped gardens.
  - 🏊 *Amenities*: Community pools and sports facilities.
  - 🏫 *Schools*: Close to schools and nurseries."
      Output: "*5. The Springs*
   🌊 *Lakes and Parks*: Beautiful lakes and landscaped gardens.
   🏊 *Amenities*: Community pools and sports facilities.
   🏫 *Schools*: Close to schools and nurseries."

  # Instructions
  - Use the Pydantic Model to structure the output.
  - If a weblink/hyperlink is provided in the input also keep the weblink/hyperlink in the output. This is for example important to provide the link to book a meeting with the agent.

  # Output
  - If **"Fallback mode status" is true**, process the **"Fallback message"** instead of the main input.
user_input: |-
  Let's find the perfect project for you! To get started, could you please share a bit more about your preferences? Here are some questions to guide us:

  • *Property Type*: Are you interested in a villa, townhouse, or apartment?  
  • *Budget*: What is your budget range for the property?  
  • *Bedrooms*: How many bedrooms do you need?  

  Once I have this information, I can recommend some exciting projects that suit your needs! 😊
PydanticModel: <class 'models.whatsapp_model.WhatsAppPayload'>
temperature: 0
model: gpt-4o-mini
frequency_penalty: 0
presence_penalty: 0
timeout: 180
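The WhatsApp-specific rules in this prompt are plain string rewrites, so one option is to apply them deterministically in code after the model returns, instead of relying on the model to follow them. A minimal sketch with regular expressions, assuming one markdown construct per line; `to_whatsapp_markdown` is a hypothetical name, and treating any non-ASCII first character of a list item as an emoji is a rough heuristic:

```python
import re


def to_whatsapp_markdown(text: str) -> str:
    """Apply the prompt's WhatsApp formatting rules line by line (hypothetical helper)."""
    out = []
    for line in text.splitlines():
        heading = re.match(r"^\s*#{1,6}\s+(.*)$", line)
        if heading:
            # "### 5. **The Springs**" -> "*5. The Springs*"
            inner = heading.group(1).replace("**", "").strip()
            out.append(f"*{inner}*")
            continue
        # "**word**" -> "*word*" (WhatsApp bold)
        line = re.sub(r"\*\*(.+?)\*\*", r"*\1*", line)
        bullet = re.match(r"^(\s*)-\s+(.*)$", line)
        if bullet:
            indent, rest = bullet.groups()
            # Rough emoji check: drop the dash if the item starts with a non-ASCII symbol
            if rest and not rest[0].isascii():
                line = f"{indent}{rest}"
            else:
                line = f"{indent}• {rest}"
        out.append(line)
    return "\n".join(out)
```

For the example in the prompt, `"### 5. **The Springs**"` becomes `"*5. The Springs*"` and `"- 🌊 *Lakes and Parks*: ..."` loses its dash.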

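import json
import random
import time
from typing import Any, Union

from langsmith import traceable  # assumed: the @traceable decorator comes from LangSmith
from openai import (
    APIConnectionError,
    APIError,
    APITimeoutError,
    AuthenticationError,
    BadRequestError,
    InternalServerError,
    PermissionDeniedError,
    RateLimitError,
)
from pydantic import BaseModel
from rich.panel import Panel

# App-level objects defined elsewhere: openai_client, logger, console,
# log_chat_completion_details, count_token_usage

@traceable(run_type="llm", name="openai_chat_completion_structured")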
def openai_chat_completion_structured(
    response_instructions: str,
    user_input: str,
    PydanticModel: Any,
    temperature: float = 0.0,
    model="gpt-4o-mini",
    frequency_penalty=0.0,
    presence_penalty=0.0,
    timeout=180,
    **kwargs,
) -> Union[Any, None]:
    """
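    Generate a structured response with an OpenAI chat model (default gpt-4o-mini) and validate it with a Pydantic model.

    Args:
    - response_instructions: The system-level instructions for generating the response.
    - user_input: The user-specific input with all necessary information formatted.
    - PydanticModel: The Pydantic model passed as response_format and used for validation.
    - temperature: The temperature for the completion model.
    - model: The chat model to call.
    - frequency_penalty, presence_penalty: Sampling penalties for the completion model.
    - timeout: Time budget in seconds across all retry attempts.

    Returns:
    - The validated response as a PydanticModel instance.
    - None: If the input validation fails.
    - A dict {"error": ..., "message": ...}: If the API call fails or retries are exhausted.
    Other exceptions, including openai.LengthFinishReasonError (which is not an APIError
    subclass), are logged and re-raised.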
    """
    try:
        # Log detailed information about the chat completion request
        log_chat_completion_details(
            system_prompt=response_instructions,
            user_input=user_input,
            model=model,  # `model` is a named parameter, so it never appears in kwargs
            temperature=temperature,
            additional_params={
                "response_model": PydanticModel.__name__,
                **{k: v for k, v in kwargs.items() if k != "model"},
            },
        )

        logger.info("Initiating chat completion request")

        # Input validation
        if not isinstance(response_instructions, str) or not response_instructions:
            logger.error("Invalid or missing response instructions.")
            return None
        if not isinstance(user_input, str) or not user_input:
            logger.error("Invalid or missing user_input.")
            return None
        if not isinstance(PydanticModel, type) or not issubclass(
            PydanticModel, BaseModel
        ):
            logger.error(
                "Invalid Pydantic model provided. It must be a subclass of BaseModel."
            )
            return None

        # Count tokens (but don't enforce limits)
        instruction_tokens, input_tokens = count_token_usage(
            response_instructions, user_input, model
        )

        max_retries = 3
        attempt = 0
        start_time = time.time()

        while attempt < max_retries:
            try:
                # Check for timeout
                if time.time() - start_time > timeout:
                    logger.error("Request timed out")
                    return {"error": "timeout", "message": "Request timed out"}

                # Making the API call to OpenAI
                response = openai_client.beta.chat.completions.parse(
                    model=model,
                    messages=[
                        {
                            "role": "system",
                            "content": f"{response_instructions}\nProvide your response in JSON format.",
                        },
                        {"role": "user", "content": user_input},
                    ],
                    temperature=temperature,
                    frequency_penalty=frequency_penalty,
                    presence_penalty=presence_penalty,
                    response_format=PydanticModel,
                )

                # Log the token usage
                if response.usage:
                    logger.info(
                        f"Token usage - Prompt: {response.usage.prompt_tokens}, "
                        f"Response: {response.usage.completion_tokens}, "
                        f"Total: {response.usage.total_tokens}"
                    )

                # Process the response...
                response_content = response.choices[0].message.content

                # Log the raw response before parsing
                try:
                    # Convert response_content to dict if it's a string
                    if isinstance(response_content, str):
                        import json
                        response_dict = json.loads(response_content)
                    else:
                        response_dict = response_content
                    
                    # Create a multi-line string representation
                    flat_response = "\n".join([f"{k}: {v}" for k, v in response_dict.items()])
                    
                    console.print(
                        Panel(
                            flat_response,
                            title="[green]Raw Response[/green]",
                            border_style="green",
                            padding=(1, 2),
                        )
                    )
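                except Exception as e:
                    logger.warning(f"Failed to format response for logging: {e}")

                # parse() has already validated the content against PydanticModel, so return
                # that instance instead of re-parsing (parse_raw is deprecated in Pydantic v2)
                return response.choices[0].message.parsed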

            except RateLimitError as e:
                logger.warning(f"Rate limit exceeded: {str(e)}")
                # Implement exponential backoff
                wait_time = (2**attempt) + random.uniform(0, 1)
                time.sleep(wait_time)
                attempt += 1
                continue

            except APITimeoutError as e:
                logger.error(f"Request timed out: {str(e)}")
                return {
                    "error": "timeout",
                    "message": "The request timed out. Please try again.",
                }

            except APIConnectionError as e:
                logger.error(f"Connection error: {str(e)}")
                return {
                    "error": "connection",
                    "message": "Failed to connect to the API. Please check your network connection.",
                }

            except AuthenticationError as e:
                logger.error(f"Authentication error: {str(e)}")
                return {
                    "error": "auth",
                    "message": "Authentication failed. Please check your API key.",
                }

            except BadRequestError as e:
                logger.error(f"Bad request error: {str(e)}")
                return {
                    "error": "bad_request",
                    "message": "The request was malformed. Please check your inputs.",
                }

            except PermissionDeniedError as e:
                logger.error(f"Permission denied: {str(e)}")
                return {
                    "error": "permission",
                    "message": "You don't have permission to access this resource.",
                }

            except InternalServerError as e:
                logger.error(f"OpenAI server error: {str(e)}")
                attempt += 1
                if attempt < max_retries:
                    time.sleep(2**attempt)  # Exponential backoff
                    continue
                return {
                    "error": "server",
                    "message": "OpenAI servers are experiencing issues. Please try again later.",
                }

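            except APIError as e:
                logger.error(f"API error: {str(e)}")
                # NOTE: openai.LengthFinishReasonError subclasses OpenAIError, not APIError,
                # so this branch never sees it; it reaches the outer `except Exception`
                # and is re-raised, which is the traceback shown above.
                if "length limit was reached" in str(e).lower():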
                    return {
                        "error": "token_limit_exceeded",
                        "message": "The input is too large to process. Please break down your request into smaller parts.",
                    }
                attempt += 1
                if attempt >= max_retries:
                    return {"error": "api", "message": str(e)}

        return {
            "error": "max_retries",
            "message": "Maximum retry attempts reached. Please try again later.",
        }

    except Exception as e:
        logger.error(f"Error in chat completion: {str(e)}", exc_info=True)
        raise
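Because `LengthFinishReasonError` escapes the retry loop above, a rare runaway response fails the whole request after generating the full 16,384 tokens. One option is to cap the output well below that with `max_completion_tokens` (so a runaway fails fast) and retry on the length error, since the same input sometimes succeeds on a second try. A generic sketch under those assumptions; `call_with_length_retry` is a hypothetical helper, the 2,000-token cap is an arbitrary example value, and the OpenAI call in the comments only illustrates intended usage:

```python
import time
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


def call_with_length_retry(
    call: Callable[[int], T],
    length_error: Type[BaseException],
    max_attempts: int = 3,
    backoff_s: float = 0.5,
) -> Optional[T]:
    """Run call(attempt) and retry only when it raises `length_error`.

    Returns the first successful result, or None if every attempt hit the
    length limit. Any other exception propagates unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call(attempt)
        except length_error:
            if attempt < max_attempts:
                # Short linear backoff before the next try
                time.sleep(backoff_s * attempt)
    return None


# Intended usage (assumes `messages` is built as in the function above):
#
# from openai import LengthFinishReasonError
# payload = call_with_length_retry(
#     lambda attempt: openai_client.beta.chat.completions.parse(
#         model="gpt-4o-mini",
#         messages=messages,
#         response_format=WhatsAppPayload,
#         max_completion_tokens=2000,  # fail fast instead of running to 16,384
#     ).choices[0].message.parsed,
#     length_error=LengthFinishReasonError,
# )
```

The `attempt` argument lets the caller vary the request between tries (for example a slightly higher temperature), since retrying at temperature 0 is not guaranteed to take a different path.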

from pydantic import BaseModel, Field
from typing import List, Optional

"""WhatsApp Front-End Structure"""
#################################################################################
# Helper Models - Text, Image and Video models
class WhatsAppImage(BaseModel):
    image_url: Optional[str] = Field(None, description="URL of the image.")
    description: Optional[str] = Field(
        ..., description="Caption/description of the image."
    )

class WhatsAppVideo(BaseModel):
    video_url: Optional[str] = Field(None, description="URL of the video.")
    description: Optional[str] = Field(
        ..., description="Caption/description of the video."
    )

class WhatsAppMessage(BaseModel):
    type: str = Field(
        ..., description="The type of the message, either 'text', 'image' or 'video'."
    )
    text: Optional[str] = Field(
        ...,
        description="The textual content of the message, applicable if the type is 'text'.",
    )
    image: Optional[WhatsAppImage] = Field(
        ...,
        description="The image object containing image URL and description, applicable if the type is 'image'.",
    )
    video: Optional[WhatsAppVideo] = Field(
        ...,
        description="The video object containing video URL and description, applicable if the type is 'video'.",
    )
#################################################################################
# Main reference model for the WhatsApp payload
class WhatsAppPayload(BaseModel):
    messages: List[WhatsAppMessage] = Field(
        ..., description="The structure containing the list of WhatsApp messages."
    )
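Separately from the length issue, this schema leaves `type` as free text and does not tie it to which of `text`/`image`/`video` is filled. A sketch of a tighter model, assuming Pydantic v2: `Literal` turns `type` into an enum in the generated JSON schema, and a `model_validator` (checked locally after parsing, not sent to the API) rejects messages whose payload field does not match their type. Descriptions are shortened, and this is not a confirmed fix for the length error:

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, model_validator


class WhatsAppImage(BaseModel):
    image_url: Optional[str] = Field(None, description="URL of the image.")
    description: Optional[str] = Field(..., description="Caption/description of the image.")


class WhatsAppVideo(BaseModel):
    video_url: Optional[str] = Field(None, description="URL of the video.")
    description: Optional[str] = Field(..., description="Caption/description of the video.")


class WhatsAppMessage(BaseModel):
    # Literal -> "enum" in the JSON schema instead of an unconstrained string
    type: Literal["text", "image", "video"] = Field(..., description="The type of the message.")
    text: Optional[str] = Field(..., description="Text content, used when type is 'text'.")
    image: Optional[WhatsAppImage] = Field(..., description="Image object, used when type is 'image'.")
    video: Optional[WhatsAppVideo] = Field(..., description="Video object, used when type is 'video'.")

    @model_validator(mode="after")
    def payload_matches_type(self) -> "WhatsAppMessage":
        # The field named after `type` (text/image/video) must be filled in
        if getattr(self, self.type) is None:
            raise ValueError(f"type is {self.type!r} but the {self.type!r} field is null")
        return self


class WhatsAppPayload(BaseModel):
    messages: List[WhatsAppMessage] = Field(..., description="The list of WhatsApp messages.")
```

A mismatch then surfaces as a pydantic `ValidationError` when the response is parsed, which the caller can treat as another retryable failure.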

And here is the call that worked after I asked the AI, via the WhatsApp chat, to retry the query:

response_instructions: |
  (identical to the response_instructions of the first call above)
user_input: |-
  Thank you for your patience! Unfortunately, I couldn't find any properties that fully match your criteria for villas at the moment. 

  Could you please clarify which features are most important for you? Here are some aspects to consider:

  • *Location*: Do you have a specific area in mind?  
  • *Amenities*: Are there any specific amenities you want, like a gym, pool, or park?  
  • *Budget*: Would you like to adjust your budget range?  

  Your input will help me refine the search and find the best options for you! 😊
PydanticModel: <class 'models.whatsapp_model.WhatsAppPayload'>
temperature: 0
model: gpt-4o-mini
frequency_penalty: 0
presence_penalty: 0
timeout: 180

Output:

output:
  messages:
    - type: text
      text: Thank you for your patience! Unfortunately, I couldn't find any properties that fully match your criteria for villas at the moment.
    - type: text
      text: "Could you please clarify which features are most important for you? Here are some aspects to consider:"
    - type: text
      text: |-
        • *Location*: Do you have a specific area in mind?  
        • *Amenities*: Are there any specific amenities you want, like a gym, pool, or park?  
        • *Budget*: Would you like to adjust your budget range?  

        Your input will help me refine the search and find the best options for you! 😊

empideveloper · March 4, 2025, 1:23pm

Any helpful input would be appreciated.

confuseddev · March 25, 2025, 11:07am

I’ve been running into this as well for the past few days and haven’t managed to find the underlying cause.
