API vs Playground assistant performance

Hey, I created an assistant meant for a specific data extraction task. When I was testing it in the Playground, the results were very promising, at worst satisfactory.

Recently, I moved on to prototyping with Node.js and realized that the quality of responses from the API is drastically worse. The results returned by the API are wrong or useless in more than 50% of cases.

I could only achieve comparable output by switching from gpt-4o-mini-07-18 to gpt-4o-08-06 in my API calls. This is not an optimal solution for my use case, both financially and because of the much longer inference time.

I made sure not to override any settings in the run call:

 const run = await this.openAi.beta.threads.runs.createAndPoll(
      this.myThread.id,
      {
        assistant_id: assistantId,
      },
    );

I am using a low temperature (0.2).
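
(For reference, an explicit per-run override would look roughly like the snippet below; I am not passing anything like this, so the value comes from the assistant settings. I believe the run call accepts a `temperature` parameter, but treat that as an assumption.)

  const run = await this.openAi.beta.threads.runs.createAndPoll(
    this.myThread.id,
    {
      assistant_id: assistantId,
      temperature: 0.2, // hypothetical explicit override; not used in my actual code
    },
  );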
I compared my threads one-to-one (API vs. Playground), and the only difference is the response.

Any ideas on how to make this right? I would be very grateful.


The Playground uses (basically) the same endpoints as the API for its calls, so any differences you see are usually an issue with the code.

What kind of messages are you sending? Sometimes things like formatting can creep in and cause quality issues.


I am sending a stringified JSON object with classified data to perform the extraction on, with the fields described and referenced in the system prompt and json_schema.
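
Roughly like this, simplified (the variable names here are just illustrative):

  // Send the classified data as a stringified JSON user message on the thread
  const payload = JSON.stringify(classifiedData); // classifiedData is a placeholder name
  await this.openAi.beta.threads.messages.create(this.myThread.id, {
    role: "user",
    content: payload,
  });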

I went into Dashboard -> Threads. Both the run instructions and the messages are identical between the two source calls.

Ok. Good to know.

So we need to really dig deep. There must be some very subtle differences here.

I am wondering why you are keeping the thread around in state (`this.myThread`)? Not inherently wrong, but it does raise some concerns and can introduce very hard-to-catch bugs.

Let’s build a very simplified example of calling the OpenAI endpoint without keeping any state.

In fact, if you’re comfortable, let’s just use an easy-to-investigate language like Python in Jupyter. You can bootstrap an Assistants API call rapidly.

from openai import OpenAI

client = OpenAI()

# Paste the exact stringified object you send from Node.js
my_obj = put_your_stringified_obj_here

# Create a fresh thread and run it against your assistant in one call
run = client.beta.threads.create_and_run(
    assistant_id="your_assistant_id_here",
    thread={
        "messages": [
            {"role": "user", "content": my_obj}
        ]
    },
)

Thank you for your help; the issue turned out to be a very subtle whitespace difference within the stringified JSON, which I missed at first. It turns out the model understands human-formatted JSON much better.
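
In my case it boiled down to compact vs. human-readable JSON.stringify output; a minimal sketch of the difference (the example object is made up):

  const data = { invoice_no: "A-123", total: 49.9 }; // made-up example object

  const compact = JSON.stringify(data);           // {"invoice_no":"A-123","total":49.9}
  const readable = JSON.stringify(data, null, 2); // indented, one field per line

  // Sending the indented version (the human-formatted shape I had been pasting
  // into the Playground) brought the API results back in line.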


Those are the worst. I’m glad you found it.

Happy coding.
