ChatGPT gives much better responses than the GPT-3.5 Turbo API

I want to use GPT to rate the bias of news articles. The issue is that the responses I receive from ChatGPT 3.5 are much better than what I get when using the same prompt in the API.

With ChatGPT I get the following response in 9/10 cases:
The text appears to have a slight right-leaning bias with a rating of around 0.2. The authors of “Shattered” criticize the Clinton campaign, emphasizing its “petty bickering, foolish reasoning and sheer arrogance,” and highlighting Hillary Clinton’s weaknesses. The article also compares the Clinton campaign’s use of data to Trump’s more personal approach, which is portrayed as more successful. However, the text does not overtly promote any political party or candidate, so the bias is minimal.

With the API and the same copy-pasted prompt, the answer is often:
-0.5 (somewhat biased towards the Left)

The article is clearly right-leaning, but the API never gives me this response. I have tried different temperature and top_p values but can't figure out where the difference is coming from.

Here is the API call I am using:

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a highly knowledgeable political scientist."},
        {"role": "user", "content": """Give me a continuous rating between -1 and 1 measuring political bias in the US. -1 for Left, 0 for neutral and 1 for Right. If the text isn't political give it a rating of 0. Apply this to the following text: Hillary The UnreadyShattered: Inside Hillary Clinton's Doomed Campaign\n\nBy Jonathan Allen and Amie Parnes(Crown, 464 pages, $28)\n\n'This is too easy," Barack Obama is recorded as saying in "Shattered," an exhaustive account of Hillary Clinton's ill-fated 2016 presidential campaign. The president had just delivered a well-received speech in praise of Mrs. Clinton's candidacy; both Mr. Obama and Mrs. Clinton had derided Donald Trump, the Republican nominee, as a malignant nincompoop. John Podesta, the Clinton campaign's chairman, looked at the president doubtfully. Too easy? "All right, all right, all right," Mr. Obama playfully conceded. "There's just so much material."\n\nThe remark nicely captures the attitude not just of the Clinton campaign but of almost the entire Democratic establishment in the months before the election. "Shattered," by campaign reporters Jonathan Allen and Amie Parnes, narrates the petty bickering, foolish reasoning and sheer arrogance of a campaign that was never the sure thing that its leader and top staffers assumed. The authors, in a mostly successful attempt to get their sources to talk candidly, promised them that they wouldn't be identified.\n\nThat's more or less the method behind other hefty "insider" accounts of politicians and campaigns in recent years, especially "Game Change" (2010) and "Double Down" (2013), both by Mark Halperin and John Heilemann, as well as Ms. Parnes and Mr. Allen's own "HRC" (2014). (Messrs. Halperin and Heilemann will have their own account of the 2016 campaign coming next year.) 
The juicy quotes would mean more if they were on the record, but mostly it works: You can't pinpoint the identity of any one "top aide" or "close Hillary ally," but the authors' language leads you to believe they include the most senior Clinton advisers -- Mr. Podesta, longtime Clinton confidante Huma Abedin, campaign manager Robby Mook, speechwriter Dan Schwerin, policy adviser Jake Sullivan -- and probably the candidate herself.\n\nFor those few unhappy addicts who wish to relive the 2016 presidential campaign so soon, "Shattered" offers a number of gratifying revelations. Among them: Mrs. Clinton's tinkering with a certain computer server. Not that server -- a different one. After losing to Mr. Obama in the protracted 2008 primary, she was convinced that she had lost because some staffers -- she wasn't sure who -- had been disloyal. So she "instructed a trusted aide to access the campaign's server and download the [email] messages sent and received by top staffers." This tells us, first, that Mrs. Clinton possesses an almost Nixonian paranoia about treachery and, second, that her use of a private email server at the State Department was never the naive "mistake" she pretended it was. In fact, she didn't want anyone reading her emails the way she was reading those of her 2008 staffers.\n\nMr. Allen and Ms. Parnes stress two essential failures of the campaign, the first structural, the second political. The campaign's command structure, the authors write, was an "unholy mess, fraught with tangled lines of authority, petty jealousies, distorted priorities, and no sense of greater purpose." Mrs. Clinton herself was inaccessible to almost everyone but Ms. Abedin, whose role was never clarified, so top staffers broke off into mutually mistrustful tribes: the campaign data analysts, Mrs. 
Clinton's State Department coterie, Clinton Foundation staff, and the enthusiasts associated with the Ready for Hillary super-PAC.\n\nThis diffuse command structure was a consequence, the authors suggest, of the fact that Mrs. Clinton didn't know why she wanted to be president. At one point no fewer than 10 senior aides were working on her campaign announcement speech; not one had a clear understanding of why Americans should cast their votes for Mrs. Clinton and not someone else. The speech, when she finally delivered it, was a flop -- aimless, boring, devoid of much beyond bromides. (Compare that to Donald Trump's announcement: disjointed, funny, written by no one -- but the speaker knew why he wanted to be president and wanted you to know why, too.)\n\nThe Clinton campaign's other failure was rooted in a mistaken assumption about the nature of politics. The campaign relied too much on analyzing data and too little on getting the candidate in touch with actual people. Mrs. Clinton's young staffers came of age during Barack Obama's campaigns and thought they'd mastered the art of electoral politics. They failed to realize that Mr. Obama won for a variety of sociological and political reasons that had nothing to do with his campaign's analysis of data. Successful politicians must have a tacit sense of what voters want to hear and how they might be persuaded. Mrs. Clinton -- in stark contrast to her husband -- was never interested in that component of campaigning. You got the feeling she didn't like people all that much.\n\nMr. Mook's scientific "model" of how the campaign should run emphasized demographics, constituents' voting histories, regional electoral patterns, and so on. When staffers objected to his directives, the authors record, the response was always the same: "The data," as Mr. Mook at one point put it to former President Bill Clinton, "run counter to your anecdotes." So, for instance, when campaign staffers discussed ideal locations for Mrs. 
Clinton's first appearance with President Obama, several aides suggested Green Bay, Wis. Mr. Mook objected on the specious grounds that "voters there wanted change and Obama's presence would suggest to voters that Hillary was running for his third term." Of course, if the campaign had scheduled the event for Green Bay, Mrs. Clinton might have visited Wisconsin at least once. But she never went there, and Donald Trump won the state by less than 1%.\n\nSuch insights aside, "Shattered" is not a pleasure to read. The authors seem incapable of conveying a thought without the use of some tired metaphor or idiom, often two or three within the same sentence. Mrs. Clinton's "clear, dead aim was to box Biden out." Bernie Sanders "had flown in from off the political radar screen." The book is also too long: 400 pages of Clintonian self-aggrandizement, campaign malpractice and passive-aggressive blame-shifting are more than any ordinary reader can bear. Then again, there's just so much material.\n\n---\n\nMr. Swaim is the author of "The Speechwriter: A Brief Education in Politics."\n\nLicense this article from Dow Jones Reprint Service Rating (only number):"""} #query_framework.format(text=text)},
    ],
    # temperature=1,
    # top_p=0.5,
    max_tokens=200
)

Try temperature = 0 to reduce creativity. It will make the response less random from one request to the next.
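A minimal sketch of what that could look like, reusing the model name and system message from the question. The helper name build_request is hypothetical, and temperature=0 makes the output more repeatable but, as far as I know, not strictly identical on every call.

```python
def build_request(prompt: str, temperature: float = 0.0) -> dict:
    """Build keyword arguments for openai.ChatCompletion.create.

    build_request is a hypothetical helper; the model and system message
    are copied from the question above.
    """
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a highly knowledgeable political scientist."},
            {"role": "user", "content": prompt},
        ],
        # temperature=0 always favors the most likely token, so repeated
        # requests with the same prompt vary much less.
        "temperature": temperature,
        "max_tokens": 200,
    }


# Usage with the legacy client from the question (needs an API key):
# import openai
# response = openai.ChatCompletion.create(**build_request("...your prompt..."))
```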

The ChatGPT product has its own prompts as part of the product implementation, which might make it do better here.
You'll have to iterate on your prompt engineering to see how far you can get with the API.

I have had a similar experience where the responses differ between ChatGPT 3.5 and the GPT-3.5 API. I have tried temperatures of 0.2 and 0.7. This is going to take some more research. It would be nice to understand and quantify the differences between ChatGPT 3.5 and the GPT-3.5 API.

I have done a bit more experimentation, and it seems that making GPT elaborate on its answer produces better results. It is a bit like a human gathering their thoughts before making a final judgement. Has anyone had a similar experience?
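One way to sketch the "explain first, then rate" idea is to ask for a short explanation followed by a final line that is easy to parse. The instruction wording, the "Rating:" line convention, and the helper names build_messages and parse_rating are my own assumptions, not part of the original prompt.

```python
import re
from typing import Optional

# Hypothetical instruction: reasoning first, then one machine-readable line.
INSTRUCTION = (
    "Rate the political bias of the following text on a continuous scale "
    "from -1 (Left) through 0 (neutral) to 1 (Right). If the text isn't "
    "political, use 0. First explain your reasoning in a few sentences, "
    "then end with a final line of the form 'Rating: <number>'.\n\nText: "
)

# Matches a whole line such as "Rating: 0.2" or "Rating: -0.5".
RATING_LINE = re.compile(r"^\s*Rating:\s*([-+]?\d*\.?\d+)\s*$", re.MULTILINE)


def build_messages(text: str) -> list:
    """Chat messages that ask the model to reason before it rates."""
    return [
        {"role": "system", "content": "You are a highly knowledgeable political scientist."},
        {"role": "user", "content": INSTRUCTION + text},
    ]


def parse_rating(reply: str) -> Optional[float]:
    """Return the last 'Rating:' value in the reply, or None if it is
    missing or outside the range [-1, 1]."""
    matches = RATING_LINE.findall(reply)
    if not matches:
        return None
    value = float(matches[-1])
    return value if -1.0 <= value <= 1.0 else None
```

Note that with max_tokens=200 a longer explanation might be cut off before the rating line, in which case parse_rating returns None instead of a number.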

And could the token limit have something to do with it? That is, does the model take the token limit into account when answering, or does it just stop arbitrarily once the limit is reached?
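As far as I understand, max_tokens is not shown to the model; generation is simply cut off once the limit is reached, and the response reports this through finish_reason ("length" instead of "stop"). A small sketch for checking that, where was_truncated is a hypothetical helper and the dictionaries below only mimic the shape of a Chat Completions response:

```python
def was_truncated(response) -> bool:
    """True if the first choice ended because it hit max_tokens."""
    return response["choices"][0]["finish_reason"] == "length"


# Stand-in responses with the same shape as a Chat Completions result.
cut_off = {"choices": [{"message": {"content": "The text appears to"}, "finish_reason": "length"}]}
complete = {"choices": [{"message": {"content": "0.2"}, "finish_reason": "stop"}]}

print(was_truncated(cut_off))
print(was_truncated(complete))
```

If the check comes back true for your requests, raising max_tokens or asking for a shorter answer would be the first things to try.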