Inconsistent results between playground and API

I used the completions API to classify a few news articles into 1 of 4 classes. However, the API returned incorrect responses for half of the articles. When I copied the same prompt into the playground, it returned the correct response. I made sure the same model (text-davinci-003) was used in both environments with the same parameters (temperature=0.2, max_tokens=7). Why does this happen?

How do I achieve consistent results for the same prompt in both environments? Would the completions API be a better choice than the chat API for such a task?

Appreciate your input.

Can you please show your request?
Can you also include the parameter “echo”: true and show us the results of that?

As of now, chat completions is ideal for pretty much everything. I don’t think it’ll solve your current issue, though.

Not much else can be done without speculating.

Here is the output for one of the articles:

[
  {
    "model": "text-davinci-003",
    "prompt": "Decide what category a news article falls into: Economic or Financial crime, \nCorruption or Fraud, Organised Crime or Other.\n\nNews: Goldman-backed startup Circle launches no-fee foreign payments service\nPARIS/NEW YORK, June 15 Blockchain-based payments startup Circle Internet Financial on Thursday launched an international online money transfer service that allows people in the United States and Europe to send money to each other instantly and at no cost as it seeks to tear down borders in the payments world.\nCategory:"
  },
  {
    "id": "cmpl-72G1yAAoHGgtTcYgufDgScwZttK4M",
    "object": "text_completion",
    "created": 1680769790,
    "model": "text-davinci-003",
    "choices": [
      {
        "text": " Other",
        "index": 0,
        "logprobs": null,
        "finish_reason": "stop"
      }
    ],
    "usage": {
      "prompt_tokens": 105,
      "completion_tokens": 1,
      "total_tokens": 106
    }
  }
]

Where do I include the “echo”: true parameter? Is it in the input jsonl file along with “model” and “prompt”?

By “chat completions” are you referring to the endpoint chat/completions which uses gpt-3.5-turbo?

Yes. I would recommend that anyone using Completions use echo and run string comparisons during development, to ensure that what you are sending is what the model is seeing.

echo
boolean
Optional
Defaults to false
Echo back the prompt in addition to the completion
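In the input JSONL, “echo”: true goes alongside “model” and “prompt” in each request line. A minimal sketch of the request plus the string comparison (the response dict is mocked here to mirror the completions response shape; a real one comes back from the API, and the “News: …” body is a placeholder):

```python
import json

# The prompt exactly as you intend the model to see it, with real newlines.
prompt = ("Decide what category a news article falls into: "
          "Economic or Financial crime, Corruption or Fraud, "
          "Organised Crime or Other.\n\nNews: ...\nCategory:")

# "echo": true sits next to "model" and "prompt" in each JSONL line.
request = {
    "model": "text-davinci-003",
    "prompt": prompt,
    "echo": True,
    "temperature": 0.2,
    "max_tokens": 7,
}
line = json.dumps(request)  # one line of the input .jsonl file

# With echo on, the completion text begins with the prompt the model
# actually received. (Mocked below; a real response comes from the API.)
response = {"choices": [{"text": prompt + " Other"}]}
echoed = response["choices"][0]["text"]

# If this fails, what you sent is not what the model saw
# (e.g. newlines were escaped somewhere along the way).
assert echoed.startswith(prompt), "prompt was altered before reaching the model"
completion = echoed[len(prompt):]
print(completion)  # " Other"
```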

Update your prompt to make your string raw so Python doesn’t try to escape or process your newlines before sending it (simply add an r at the beginning). This may not help, but it’s good practice. It may also be the reason you are experiencing different results.

“prompt”: r"Decide what category a news article falls into: Economic or Financial crime, \nCorruption or Fraud, Organised Crime or Other.\n\nNews: Goldman-backed startup Circle launches no-fee foreign payments service\nPARIS/NEW YORK, June 15 Blockchain-based payments startup Circle Internet Financial on Thursday launched an international online money transfer service that allows people in the United States and Europe to send money to each other instantly and at no cost as it seeks to tear down borders in the payments world.\nCategory:"
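A quick check of what the raw prefix actually changes:

```python
cooked = "Category:\nOther"   # \n becomes one real newline character
raw = r"Category:\nOther"     # \n stays as two characters: backslash, n

print(len(cooked))     # 15
print(len(raw))        # 16
print("\n" in cooked)  # True
print("\n" in raw)     # False
```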

Yes.

Because gpt-3.5-turbo performs at a similar capability to text-davinci-003 but at 10% the price per token, we recommend gpt-3.5-turbo for most use cases.

You may also want to consider fine-tuning a lesser model for classification purposes.
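If you go that route, the (legacy) fine-tunes endpoint takes JSONL training data of prompt/completion pairs. A minimal sketch of one classification example (the “\n\nCategory:” separator and the leading space on the label follow the usual fine-tuning conventions, not anything specific to your data):

```python
import json

# One training example per JSONL line: the prompt ends with a fixed
# separator and the completion starts with a space, per the common
# fine-tuning conventions for classification.
example = {
    "prompt": ("Goldman-backed startup Circle launches no-fee "
               "foreign payments service\n\nCategory:"),
    "completion": " Other",
}
line = json.dumps(example)
print(line)
```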

Thanks for the suggestions. I will try them out.