Structured Outputs sometimes failing due to "Could not parse response content as the length limit was reached"

Hi all!

I am facing a problem with API calls that use Structured Outputs in my application. I use them in several places, but the very last one, which creates a payload for a WhatsApp API request, sometimes fails because the length limit was reached. The weird thing is that the input is nowhere near the token limit, and sometimes the same input goes through fine. The issue is also really hard to reproduce because it happens only rarely, but it still blocks us from going to production.

Can someone tell me why this call sometimes fails?

Here is the reference:

LengthFinishReasonError('Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=1310, total_tokens=17694, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')

Traceback (most recent call last):
  File "/app/src/core/openai_chat_completion.py", line 188, in openai_chat_completion_structured
    response = openai_client.beta.chat.completions.parse(
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 160, in parse
    return self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1283, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 960, in request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1066, in _request
    return self._process_response(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1165, in _process_response
    return api_response.parse()
  File "/usr/local/lib/python3.10/site-packages/openai/_response.py", line 325, in parse
    parsed = self._options.post_parser(parsed)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 154, in parser
    return _parse_chat_completion(
  File "/usr/local/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 72, in parse_chat_completion
    raise LengthFinishReasonError(completion=chat_completion)
openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=1310, total_tokens=17694, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
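
A note on reading the numbers in that error: the usage reports completion_tokens=16384 against only 1310 prompt tokens, which suggests the generated output, not the input, ran into a cap. Below is a minimal sketch of that check. It assumes 16384 is the maximum output size for the model in use; the `Usage` dataclass and the `hit_output_cap` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Assumption: 16384 is the output-token cap of the model in use.
ASSUMED_MAX_OUTPUT_TOKENS = 16384


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the SDK's CompletionUsage object."""

    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def hit_output_cap(usage: Usage, cap: int = ASSUMED_MAX_OUTPUT_TOKENS) -> bool:
    """Return True when the completion alone used up the whole output budget."""
    return usage.completion_tokens >= cap


# Values copied from the error message above.
usage = Usage(prompt_tokens=1310, completion_tokens=16384, total_tokens=17694)

# The totals are consistent: prompt + completion == total.
assert usage.prompt_tokens + usage.completion_tokens == usage.total_tokens

print(hit_output_cap(usage))  # True
```

If this reading is right, the prompt size is not the problem; the model kept generating until it was cut off, so the JSON was never closed and could not be parsed.
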
response_instructions: |
  
  # Goal
  Generate an output based on the Pydantic Model to create WhatsApp payloads (including text, images and videos) for the WhatsApp API.

  # Rules
  - Strictly use the content provided in the input without adding or removing any information.
    - Maintain the order of the messages exactly as provided.
    - Preserve any emojis/icons that are part of the input.
    - Keep the message structure as given (e.g., if there is an introduction and an outro, both should be included in the output). 
    - Group relevant information in one message where possible. For example, if asking for property requirements to create a filter, put all relevant information in the same message.
    - If an image URL and a caption/description are provided, treat it as an "image" type and us the description as the description/caption of the image. **Do not include it as type text**.
      - If a video URL and a caption/description are provided, treat it as an "video" type and us the description as the description/caption of the video. **Do not include it as type text**.

      **Output Rules for Image:**
      - Strictly ensure that if a image_url is provided that you send that content as a message with the type "image"!!!

      - Overtake the image_url of the input for the output "image_url"
      - Take the description exactly as provided from the input fields, such as "property_description" and use it as the "image_description."
      - Strictly ensure that if a image_url is provided that you send that content as a message with the type "image"!!!

      **Output Rules for Video:**
       - Strictly ensure that if a video_url is provided that you send that content as a message with the type "video"!!!

       # Key "type"
       - Use the type "video" for the key "type"

       # Key "video_url"
       - Overtake the "video_url" of the input for the output "video_url"
         Never put the URL in the description! Always put it as value of the key "video_url" 

      # key "description"
       - Take the "community_information" and "community_response_message" in the description of the video.
         Put a paragraph between (two lines) the content of the "community_information" and the "community_response_message". The seperation should be clear and easy to understand for better readability.


  # WhatsApp-specific Rules
  - WhatsApp formats bold text with *<word>*. Therefore, if the input has **<word>**, convert it to *<word>* so that it appears correctly as bold in WhatsApp. Avoid leaving both ** and * around the text.
  - Do not use # for headlines. Instead, transform them into bold text.
  - Do not use a "-" for listing items if an emoji/icon is also provided.
  - If "-" is used for listing, replace it with "•".
  Example for thw WhatsApp-specific rules:
      Input: "### 5. **The Springs**
  - 🌊 *Lakes and Parks*: Beautiful lakes and landscaped gardens.
  - 🏊 *Amenities*: Community pools and sports facilities.
  - 🏫 *Schools*: Close to schools and nurseries."
      Output: "*5. The Springs*
   🌊 *Lakes and Parks*: Beautiful lakes and landscaped gardens.
   🏊 *Amenities*: Community pools and sports facilities.
   🏫 *Schools*: Close to schools and nurseries."

  # Instructions
  - Use the Pydantic Model to structure the output.
  - If a weblink/hyperlink is provided in the input also keep the weblink/hyperlink in the output. This is for example important to provide the link to book a meeting with the agent.

  # Output
  - If **"Fallback mode status" is true**, process the **"Fallback message"** instead of the main input.
user_input: |-
  Let's find the perfect project for you! To get started, could you please share a bit more about your preferences? Here are some questions to guide us:

  • *Property Type*: Are you interested in a villa, townhouse, or apartment?  
  • *Budget*: What is your budget range for the property?  
  • *Bedrooms*: How many bedrooms do you need?  

  Once I have this information, I can recommend some exciting projects that suit your needs! 😊
PydanticModel: <class 'models.whatsapp_model.WhatsAppPayload'>
temperature: 0
model: gpt-4o-mini
frequency_penalty: 0
presence_penalty: 0
timeout: 180
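
Side note: the WhatsApp-specific rules in the prompt above are mechanical enough to express in code, which can help when checking the model's output against them. The following is only a rough sketch under stated assumptions: `to_whatsapp` is a hypothetical helper, and "emoji/icon" is approximated as any character in the Unicode category "So" (Symbol, other), which covers the emojis in the example but not necessarily every icon.

```python
import re
import unicodedata


def _starts_with_symbol(text: str) -> bool:
    """Heuristic (assumption): an emoji/icon is a character of category 'So'."""
    return bool(text) and unicodedata.category(text[0]) == "So"


def to_whatsapp(markdown: str) -> str:
    """Apply the WhatsApp-specific formatting rules line by line."""
    out = []
    for line in markdown.splitlines():
        # Rule: no '#' headlines; turn them into bold text instead.
        heading = re.match(r"^#{1,6}\s+(.*)$", line)
        if heading:
            # Drop inner '**' so the result is not wrapped in both ** and *.
            title = heading.group(1).replace("**", "").strip()
            out.append(f"*{title}*")
            continue

        # Rule: Markdown bold (**word**) becomes WhatsApp bold (*word*).
        line = re.sub(r"\*\*(.+?)\*\*", r"*\1*", line)

        # Rule: drop '-' before an emoji/icon, otherwise replace it with a bullet.
        item = re.match(r"^-\s+(.*)$", line)
        if item:
            body = item.group(1)
            line = body if _starts_with_symbol(body) else f"• {body}"

        out.append(line)
    return "\n".join(out)


example = (
    "### 5. **The Springs**\n"
    "- 🌊 *Lakes and Parks*: Beautiful lakes and landscaped gardens.\n"
    "- Close to schools and nurseries."
)
print(to_whatsapp(example))
```

Unlike the example output in the prompt, this sketch does not keep a leading space in front of the emoji lines.
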
@traceable(run_type="llm", name="openai_chat_completion_structured")
def openai_chat_completion_structured(
    response_instructions: str,
    user_input: str,
    PydanticModel: Any,
    temperature: float = 0.0,
    model="gpt-4o-mini",
    frequency_penalty=0.0,
    presence_penalty=0.0,
    timeout=180,
    **kwargs,
) -> Union[Any, None]:
    """
    Function to generate a structured response using GPT-4o-mini and validate it using a Pydantic model.

    Args:
    - response_instructions: The system-level instructions for generating the response.
    - user_input: The user-specific input with all necessary information formatted.
    - PydanticModel: The Pydantic model to be used for response validation.
    - temperature: The temperature for the completion model.

    Returns:
    - The generated and validated message response as a Pydantic model instance.
    - A dict with "error" and "message" keys if an API error occurs or retries are exhausted.
    - None: If input validation fails.
    """
    try:
        # Log detailed information about the chat completion request
        log_chat_completion_details(
            system_prompt=response_instructions,
            user_input=user_input,
            model=model,  # "model" is a named parameter, so it never appears in kwargs
            temperature=temperature,
            additional_params={
                "response_model": PydanticModel.__name__,
                **{k: v for k, v in kwargs.items() if k != "model"},
            },
        )

        logger.info("Initiating chat completion request")

        # Input validation
        if not isinstance(response_instructions, str) or not response_instructions:
            logger.error("Invalid or missing response instructions.")
            return None
        if not isinstance(user_input, str) or not user_input:
            logger.error("Invalid or missing user_input.")
            return None
        if not isinstance(PydanticModel, type) or not issubclass(
            PydanticModel, BaseModel
        ):
            logger.error(
                "Invalid Pydantic model provided. It must be a subclass of BaseModel."
            )
            return None

        # Count tokens (but don't enforce limits)
        instruction_tokens, input_tokens = count_token_usage(
            response_instructions, user_input, model
        )

        max_retries = 3
        attempt = 0
        start_time = time.time()

        while attempt < max_retries:
            try:
                # Check for timeout
                if time.time() - start_time > timeout:
                    logger.error("Request timed out")
                    return {"error": "timeout", "message": "Request timed out"}

                # Making the API call to OpenAI
                response = openai_client.beta.chat.completions.parse(
                    model=model,
                    messages=[
                        {
                            "role": "system",
                            "content": f"{response_instructions}\nProvide your response in JSON format.",
                        },
                        {"role": "user", "content": user_input},
                    ],
                    temperature=temperature,
                    response_format=PydanticModel,
                )

                # Log the token usage
                if hasattr(response, "usage"):
                    logger.info(
                        f"Token usage - Instructions: {response.usage.prompt_tokens}, "
                        f"Response: {response.usage.completion_tokens}, "
                        f"Total: {response.usage.total_tokens}"
                    )

                # Process the response...
                response_content = response.choices[0].message.content

                # Log the raw response before parsing
                try:
                    # Convert response_content to dict if it's a string
                    if isinstance(response_content, str):
                        import json
                        response_dict = json.loads(response_content)
                    else:
                        response_dict = response_content
                    
                    # Create a multi-line string representation
                    flat_response = "\n".join([f"{k}: {v}" for k, v in response_dict.items()])
                    
                    console.print(
                        Panel(
                            flat_response,
                            title="[green]Raw Response[/green]",
                            border_style="green",
                            padding=(1, 2),
                        )
                    )
                except Exception as e:
                    logger.warning(f"Failed to format response for logging: {e}")

                return PydanticModel.parse_raw(response_content)

            except RateLimitError as e:
                logger.warning(f"Rate limit exceeded: {str(e)}")
                # Implement exponential backoff
                wait_time = (2**attempt) + random.uniform(0, 1)
                time.sleep(wait_time)
                attempt += 1
                continue

            except APITimeoutError as e:
                logger.error(f"Request timed out: {str(e)}")
                return {
                    "error": "timeout",
                    "message": "The request timed out. Please try again.",
                }

            except APIConnectionError as e:
                logger.error(f"Connection error: {str(e)}")
                return {
                    "error": "connection",
                    "message": "Failed to connect to the API. Please check your network connection.",
                }

            except AuthenticationError as e:
                logger.error(f"Authentication error: {str(e)}")
                return {
                    "error": "auth",
                    "message": "Authentication failed. Please check your API key.",
                }

            except BadRequestError as e:
                logger.error(f"Bad request error: {str(e)}")
                return {
                    "error": "bad_request",
                    "message": "The request was malformed. Please check your inputs.",
                }

            except PermissionDeniedError as e:
                logger.error(f"Permission denied: {str(e)}")
                return {
                    "error": "permission",
                    "message": "You don't have permission to access this resource.",
                }

            except InternalServerError as e:
                logger.error(f"OpenAI server error: {str(e)}")
                attempt += 1
                if attempt < max_retries:
                    time.sleep(2**attempt)  # Exponential backoff
                    continue
                return {
                    "error": "server",
                    "message": "OpenAI servers are experiencing issues. Please try again later.",
                }

            except APIError as e:
                logger.error(f"API error: {str(e)}")
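                if "length limit was reached" in str(e).lower():
                    # This error means the generated output was cut off at the
                    # completion-token limit, not that the input was too large.
                    return {
                        "error": "token_limit_exceeded",
                        "message": "The response hit the output token limit before the JSON was complete.",
                    }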
                attempt += 1
                if attempt >= max_retries:
                    return {"error": "api", "message": str(e)}

        return {
            "error": "max_retries",
            "message": "Maximum retry attempts reached. Please try again later.",
        }

    except Exception as e:
        logger.error(f"Error in chat completion: {str(e)}", exc_info=True)
        raise
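
One thing worth checking in the handler above: the traceback reports `openai.LengthFinishReasonError`, and as far as I can tell that class derives from `OpenAIError` rather than `APIError`. If so, the `except APIError` branch with the "length limit was reached" string check never sees it, and the error falls through to the outer `except Exception` and is re-raised. The sketch below reproduces that behaviour with stand-in classes. They only mirror the assumed hierarchy and are not the SDK's own classes; the two `handle_*` functions are hypothetical.

```python
# Stand-in exception classes mirroring the hierarchy assumed above.
# They are illustrative only and are not imported from the openai package.
class OpenAIError(Exception):
    pass


class APIError(OpenAIError):
    pass


class LengthFinishReasonError(OpenAIError):
    pass


def handle_without_dedicated_branch(exc: Exception) -> str:
    """Mimic the original handler: only APIError is inspected for the message."""
    try:
        raise exc
    except APIError as e:
        if "length limit was reached" in str(e).lower():
            return "token_limit_exceeded"
        return "api"
    except Exception:
        # Corresponds to the outer handler that logs and re-raises.
        return "unhandled"


def handle_with_dedicated_branch(exc: Exception) -> str:
    """Catch the length error by type before the generic APIError branch."""
    try:
        raise exc
    except LengthFinishReasonError:
        return "token_limit_exceeded"
    except APIError:
        return "api"
    except Exception:
        return "unhandled"


err = LengthFinishReasonError(
    "Could not parse response content as the length limit was reached"
)
print(handle_without_dedicated_branch(err))  # unhandled
print(handle_with_dedicated_branch(err))     # token_limit_exceeded
```

In the real function this would mean importing `LengthFinishReasonError` from `openai` and catching it explicitly. Passing a `max_tokens` value to the `parse` call should also make a runaway generation fail sooner, though that is a mitigation and not an explanation of why the model keeps generating.
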

from pydantic import BaseModel, Field
from typing import List, Optional

"""WhatsApp Front-End Structure"""
#################################################################################
# Helper Models - Text, Image and Video models
class WhatsAppImage(BaseModel):
    image_url: Optional[str] = Field(None, description="URL of the image.")
    description: Optional[str] = Field(
        ..., description="Caption/description of the image."
    )

class WhatsAppVideo(BaseModel):
    video_url: Optional[str] = Field(None, description="URL of the video.")
    description: Optional[str] = Field(
        ..., description="Caption/description of the video."
    )

class WhatsAppMessage(BaseModel):
    type: str = Field(
        ..., description="The type of the message, either 'text', 'image' or 'video'."
    )
    text: Optional[str] = Field(
        ...,
        description="The textual content of the message, applicable if the type is 'text'.",
    )
    image: Optional[WhatsAppImage] = Field(
        ...,
        description="The image object containing image URL and description, applicable if the type is 'image'.",
    )
    video: Optional[WhatsAppVideo] = Field(
        ...,
        description="The video object containing video URL and description, applicable if the type is 'video'.",
    )
#################################################################################
# Main reference model for the WhatsApp payload
class WhatsAppPayload(BaseModel):
    messages: List[WhatsAppMessage] = Field(
        ..., description="The structure containing the list of WhatsApp messages."
    )

And here is the call after I asked the AI over the WhatsApp chat to retry the query. This time it worked:

response_instructions: (identical to the response_instructions of the failing call above)
user_input: |-
  Thank you for your patience! Unfortunately, I couldn't find any properties that fully match your criteria for villas at the moment. 

  Could you please clarify which features are most important for you? Here are some aspects to consider:

  • *Location*: Do you have a specific area in mind?  
  • *Amenities*: Are there any specific amenities you want, like a gym, pool, or park?  
  • *Budget*: Would you like to adjust your budget range?  

  Your input will help me refine the search and find the best options for you! 😊
PydanticModel: <class 'models.whatsapp_model.WhatsAppPayload'>
temperature: 0
model: gpt-4o-mini
frequency_penalty: 0
presence_penalty: 0
timeout: 180

Output:

output:
  messages:
    - type: text
      text: Thank you for your patience! Unfortunately, I couldn't find any properties that fully match your criteria for villas at the moment.
    - type: text
      text: "Could you please clarify which features are most important for you? Here are some aspects to consider:"
    - type: text
      text: |-
        • *Location*: Do you have a specific area in mind?  
        • *Amenities*: Are there any specific amenities you want, like a gym, pool, or park?  
        • *Budget*: Would you like to adjust your budget range?  

        Your input will help me refine the search and find the best options for you! 😊

Any helpful input would be much appreciated 🙂