Hey, I created an assistant meant for a specific data extraction task. When I was testing it in the Playground, the results were very promising, at worst satisfactory.
Recently, I moved on to prototyping with node.js and realized that the quality of responses from the API is drastically worse. The results returned by the API are wrong or useless in more than 50% of cases.
I could only achieve comparable output by switching from gpt-4o-mini-2024-07-18 to gpt-4o-2024-08-06 in my API calls. This is not an optimal solution for my use case, both financially and because of the much longer inference time.
I made sure not to override any settings in the run call; a simplified sketch of my setup is below.
I am sending a stringified JSON object with the classified data to extract from, with the fields described and referred to in the system prompt and the json_schema.
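Roughly, the call looks like this (my prototype is in node.js, but the equivalent is sketched here in Python; the payload fields and the assistant id are placeholders, not my real data):

```python
import json
from openai import OpenAI

client = OpenAI()

# Placeholder payload; the real object holds the classified data to extract from.
payload = {"document_id": "doc-123", "text": "...classified content..."}

thread = client.beta.threads.create()

# The data goes in as a stringified JSON object; the fields are described
# in the assistant's system prompt and json_schema.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=json.dumps(payload),
)

# No instructions, model, temperature, etc. overridden here; the run uses
# whatever the assistant was configured with.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_placeholder",
)
```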
I went into Dashboard -> Threads. Both the run instructions and the messages are identical between the two calls.
So we really need to dig deep. There must be some very subtle difference here.
I am wondering why you are using this approach? It is not inherently wrong, but it does raise some concerns and can introduce bugs that are very hard to catch.
Let’s build a very simplified example of calling the OpenAI endpoint without keeping any state.
In fact, if you’re comfortable with it, let’s just use an easy-to-investigate language like Python in Jupyter. You can bootstrap an API call via the Assistants API quickly.
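Something along these lines should be enough as a starting point (the assistant id and the test payload are placeholders; it is just a sketch of a bare-bones call that relies entirely on the assistant's own default settings):

```python
import json
import time
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_placeholder"  # your assistant's id

# A fresh thread per call keeps the experiment stateless.
thread = client.beta.threads.create()

client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=json.dumps({"example_field": "example value"}),  # placeholder payload
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=ASSISTANT_ID,
)

# Poll until the run finishes.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Print the assistant's reply (the newest message in the thread).
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```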
Thank you for your help, the issue turned out to be a very subtle whitespace difference within the stringified JSON, which I missed at first. It turns out the model understands human-formatted JSON much better.
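Concretely, the difference came down to something like compact versus pretty-printed output (illustrated in Python with placeholder data; my actual payload differs):

```python
import json

payload = {"invoice_number": "INV-001", "total": 42.5}  # placeholder data

compact = json.dumps(payload)           # '{"invoice_number": "INV-001", "total": 42.5}'
pretty = json.dumps(payload, indent=2)  # human-formatted, one field per line

# Semantically identical JSON, but in my case the extraction quality differed
# noticeably depending on which formatting was sent to the assistant.
```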