Debugging response failure for batch action


import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List


def rate_page_relevancy_llm3(df, question):
    api_key = X  # placeholder for the actual API key
    client = instructor.patch(OpenAI(api_key=api_key))

    class PageScore(BaseModel):
        score: int = Field(..., description="Numerical rating of the page's relevance to the question (0-10)")
        reasoning: str = Field(..., description="Up to 20 words")

    class BatchPageScore(BaseModel):
        scores: List[PageScore]

    urls = df['URL'].tolist()
    messages = [{
        "role": "system",
        "content": f"""
        You will be given a list of URLs. For each URL, assign a score (0-10) based on the likelihood that the page will contain the answer to the question: {question}.
        Also provide a brief reasoning (up to 20 words) for each score. Respond with a list of scores and reasonings for all URLs."""
    }, {"role": "user", "content": f"URLs: {urls}"}]

    response = client.chat.completions.create(
        model="gpt-4o",  # change to 4o-mini? test!
        response_model=BatchPageScore,
        messages=messages
    )
    
    # Unpack the structured response into parallel lists
    scores = [page_score.score for page_score in response.scores]
    reasonings = [page_score.reasoning for page_score in response.scores]

    # pandas will raise a length-mismatch error here if fewer scores come back than there are rows
    df['Score'] = scores
    df['Reasoning'] = reasonings

    return df

As you can see above, I am asking gpt-4o to rate a list of pages (contained in the variable urls) on how relevant each URL is to a particular input question.

In some cases, particularly when I analyse a list with lots of pages at once, the total number of scores returned is less than the number of pages in urls. My guess is that, for whatever reason, the LLM is failing to rate some of the pages and is therefore returning no results for them.

How can I debug this further, given that I am making a single API call (rather than multiple calls, each with its own response)? How can I figure out which pages the LLM is failing on, and why?

I tried adding try/except logic, but since the overall API call is not failing, this doesn't give me any error messages.
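
For reference, the only signal I get is a count mismatch, roughly like this (using urls and response from the function above):

    print(len(urls))             # number of pages sent
    print(len(response.scores))  # comes back smaller when some pages are skipped

    # Nothing in the response identifies which URLs were dropped,
    # so the score list can't be mapped back to rows in df reliably.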

Okay, it sounds like when you send a list of multiple URLs, it’s not rating all of them?

I might try sending a user/assistant pair with an example of what you want, i.e. a score for each URL…
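
Something along these lines, as a rough sketch (the example URLs and wording are just placeholders; the pair goes in between the system message and the real user message):

    few_shot = [
        {
            "role": "user",
            "content": "URLs: ['https://example.com/pricing', 'https://example.com/blog/announcement']"
        },
        {
            "role": "assistant",
            "content": (
                '{"scores": ['
                '{"score": 8, "reasoning": "Pricing page likely answers a cost question"}, '
                '{"score": 2, "reasoning": "Announcement post unlikely to contain the answer"}'
                ']}'
            )
        }
    ]

    # Insert the example pair between the system message and the real user message
    messages = [messages[0]] + few_shot + [messages[-1]]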


There is no way to debug what is happening with a single call.

More than a couple of URLs in a single call may become unreliable.

LLMs are not deterministic, nor are they able to look at and prioritise large amounts of information at one time. This comes down to "attention", which is best directed at a single task per API call.

If you get it working with a single URL, you might try 2 or 3, but much past that and you are in the land of chance.
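
Looping one URL per call also tells you exactly which page fails, since each URL gets its own response. A rough sketch, reusing the PageScore model, client, urls and question from your function (names are illustrative):

    results = []
    for url in urls:
        try:
            single = client.chat.completions.create(
                model="gpt-4o",
                response_model=PageScore,
                messages=[
                    {"role": "system", "content": f"Score (0-10) how likely this page is to answer: {question}. Give reasoning in up to 20 words."},
                    {"role": "user", "content": f"URL: {url}"},
                ],
            )
            results.append((url, single.score, single.reasoning))
        except Exception as e:
            # A failure is now tied to a specific URL
            print(f"Failed on {url}: {e}")
            results.append((url, None, None))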