Inconsistent results between playground and API

I used the completions API to classify a few news articles into 1 of 4 classes. However, the API returned incorrect responses for half of the articles. When I copied the same prompt into the playground, it returned the correct response. I made sure the same model (text-davinci-003) was used in both environments with the same parameters (temperature=0.2, max_tokens=7). Why does this happen?

How do I achieve consistent results for the same prompt in both environments? Would the completions API be a better choice than the chat API for such a task?

Appreciate your input.

Can you please show your request?
Can you also include the parameter “echo”: true and show us the results of that?

As of now, chat completions is ideal for pretty much everything. I don’t think it’ll solve your current issue, though.

Not much else can be done without speculating.

Here is the output for one of the articles:

[
  {
    "model": "text-davinci-003",
    "prompt": "Decide what category a news article falls into: Economic or Financial crime, \nCorruption or Fraud, Organised Crime or Other.\n\nNews: Goldman-backed startup Circle launches no-fee foreign payments service\nPARIS/NEW YORK, June 15 Blockchain-based payments startup Circle Internet Financial on Thursday launched an international online money transfer service that allows people in the United States and Europe to send money to each other instantly and at no cost as it seeks to tear down borders in the payments world.\nCategory:"
  },
  {
    "id": "cmpl-72G1yAAoHGgtTcYgufDgScwZttK4M",
    "object": "text_completion",
    "created": 1680769790,
    "model": "text-davinci-003",
    "choices": [
      {
        "text": " Other",
        "index": 0,
        "logprobs": null,
        "finish_reason": "stop"
      }
    ],
    "usage": {
      "prompt_tokens": 105,
      "completion_tokens": 1,
      "total_tokens": 106
    }
  }
]

Where do I include the “echo”: true parameter? Is it in the input jsonl file along with “model” and “prompt”?

By “chat completions” are you referring to the endpoint chat/completions which uses gpt-3.5-turbo?

Yes. I would recommend that anyone using Completions use echo and run string comparisons during development, to ensure that what you are sending is what the model is seeing.

echo
boolean
Optional
Defaults to false
Echo back the prompt in addition to the completion
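In the input JSONL, “echo”: true goes alongside “model” and “prompt” in each request line. A minimal sketch of the request plus the string comparison (the response dict is mocked here to mirror the completions response shape; a real one comes back from the API, and the “News: …” body is a placeholder):

```python
import json

# The prompt exactly as you intend the model to see it, with real newlines.
prompt = ("Decide what category a news article falls into: "
          "Economic or Financial crime, Corruption or Fraud, "
          "Organised Crime or Other.\n\nNews: ...\nCategory:")

# "echo": true sits next to "model" and "prompt" in each JSONL line.
request = {
    "model": "text-davinci-003",
    "prompt": prompt,
    "echo": True,
    "temperature": 0.2,
    "max_tokens": 7,
}
line = json.dumps(request)  # one line of the input .jsonl file

# With echo on, the completion text begins with the prompt the model
# actually received. (Mocked below; a real response comes from the API.)
response = {"choices": [{"text": prompt + " Other"}]}
echoed = response["choices"][0]["text"]

# If this fails, what you sent is not what the model saw
# (e.g. newlines were escaped somewhere along the way).
assert echoed.startswith(prompt), "prompt was altered before reaching the model"
completion = echoed[len(prompt):]
print(completion)  # " Other"
```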

Update your prompt to make your string raw so Python doesn’t try to escape or process your newlines before sending it (simply add an r at the beginning). This may not help, but it’s good practice. It may also be the reason you are experiencing different results.

“prompt”: r"Decide what category a news article falls into: Economic or Financial crime, \nCorruption or Fraud, Organised Crime or Other.\n\nNews: Goldman-backed startup Circle launches no-fee foreign payments service\nPARIS/NEW YORK, June 15 Blockchain-based payments startup Circle Internet Financial on Thursday launched an international online money transfer service that allows people in the United States and Europe to send money to each other instantly and at no cost as it seeks to tear down borders in the payments world.\nCategory:"
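A quick check of what the raw prefix actually changes:

```python
cooked = "Category:\nOther"   # \n becomes one real newline character
raw = r"Category:\nOther"     # \n stays as two characters: backslash, n

print(len(cooked))     # 15
print(len(raw))        # 16
print("\n" in cooked)  # True
print("\n" in raw)     # False
```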

Yes.

Because gpt-3.5-turbo performs at a similar capability to text-davinci-003 but at 10% the price per token, we recommend gpt-3.5-turbo for most use cases.

You may also want to consider fine-tuning a lesser model for classification purposes.
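If you go that route, the (legacy) fine-tunes endpoint takes JSONL training data of prompt/completion pairs. A minimal sketch of one classification example (the “\n\nCategory:” separator and the leading space on the label follow the usual fine-tuning conventions, not anything specific to your data):

```python
import json

# One training example per JSONL line: the prompt ends with a fixed
# separator and the completion starts with a space, per the common
# fine-tuning conventions for classification.
example = {
    "prompt": ("Goldman-backed startup Circle launches no-fee "
               "foreign payments service\n\nCategory:"),
    "completion": " Other",
}
line = json.dumps(example)
print(line)
```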

Thanks for the suggestions. I will try them out.