The assistant’s instructions are to accept a short phrase about business and to write a 250-500 word summary of that phrase. It is explicitly told in the instructions to respond only with written paragraphs, no bulleted or numbered lists. It is set to use the GPT-4 model.
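For context, a hypothetical sketch of that configuration if it were created through the Python SDK rather than the dashboard (the name and exact instruction wording here are illustrative, not the actual settings):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical re-creation of the dashboard-configured assistant
assistant = client.beta.assistants.create(
    model="gpt-4",
    name="Business Phrase Summarizer",  # illustrative name
    instructions=(
        "Accept a short phrase about business and write a 250-500 word "
        "summary of that phrase. Respond only with written paragraphs; "
        "never use bulleted or numbered lists."
    ),
)
```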
In the Playground, it performs flawlessly every time. However, when I provide the identical input via the API, I receive a woefully inadequate response that blatantly ignores my instructions, every single time. It almost always includes a numbered list, and often it isn’t even a summary at all, but something completely different.
I’m using Make to send the JSON for each step of the process (create thread, create and run message, retrieve message). There are no errors; every input is accepted and responded to accordingly. The issue is the quality of the response.
I’m hoping that there is a setting or something I can do to get responses via the API that are in alignment with those in Playground. Thank you in advance.
Thanks for the response, but I’m not providing instructions with any messages or runs. The only instructions are those configured for the Assistant itself within the OpenAI browser control panel.
I might have a solution for you. I was having a similar problem, found this thread, then later solved my problem. Here goes:
Using the Python API, I was thinking that a “run” is an iterative message-passing class. I made an assistant, made a thread, then made a run, then added messages to the thread in a loop. This did not work.
Instead, I made the assistant, made a thread, and then in a loop I (a) added a message, and then (b) created a run to get the assistant’s output for that one message. This works. Runs, as I now understand them, are a way to say “hey OpenAI, go!”
So why did I get any output in the first place? I think that when I create a run without any messages, GPT goes “well, I need to say something”, and so outputs something. Of course, that something isn’t based on any messages, so it’s useless.
1. Create an assistant: assistant = client.beta.assistants.create(model="gpt-4", instructions="...")
2. Create a thread: thread = client.beta.threads.create()
3. Add a message to the thread: message = client.beta.threads.messages.create(thread_id=thread.id, role="user", content="...")
4. Create a “Run”: run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id) (a full runnable sketch follows below)
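Putting those four steps together, here’s a minimal runnable sketch using the OpenAI Python SDK; the model choice, instruction text, and example phrases are illustrative:

```python
import time
from openai import OpenAI

client = OpenAI()

# 1. Create an assistant
assistant = client.beta.assistants.create(
    model="gpt-4",
    instructions="Summarize short business phrases in plain paragraphs.",
)

# 2. Create a thread
thread = client.beta.threads.create()

# Loop: add a message FIRST, then create a run to process it
for phrase in ["economies of scale", "first-mover advantage"]:
    # 3. Add a message to the thread
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=phrase,
    )

    # 4. Create a run -- this is the "hey OpenAI, go!" step
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id,
    )

    # Poll until the run finishes
    while run.status in ("queued", "in_progress"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id, run_id=run.id
        )

    # Messages are listed newest first, so the assistant's reply is at index 0
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
```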
The mistake that jack.isaak7 and I made was that we initially swapped steps 3 and 4. Our incorrect thinking was that a “Run” object can have messages dynamically added to it. Wrong! You need to create the messages first; the Run then bundles up the existing messages and fetches a reply.
The reason this error was hard to notice is that, if you create a Run without any messages, OpenAI still replies. The reply is nonsense, since it is not based on any message; hence this thread’s title, “Vastly Different Responses”.
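For contrast, the broken ordering looked roughly like this (client and assistant as in the sketch above):

```python
# Incorrect ordering: the run is created before any message exists
thread = client.beta.threads.create()
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
# The run still completes and the assistant still replies, but the reply
# is generated from an empty thread, so it has nothing to do with your input.
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="economies of scale"
)  # too late: this message is not included in the run created above
```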
The reason you get different messages in the Playground and your app is that you might be creating a new Thread every time you send a message via the API, whereas in the Playground you only create one Thread. You can confirm this in the Playground: send your message, clear the Thread, and send it again; you’ll see a different response, just as if you were making the API request from your app.
The solution would be either:
Prompt your assistant so that it generates the same message across different Thread IDs, or
Use a single Thread ID for your app (see the sketch below this list).
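A minimal sketch of the second option, persisting the thread ID to a local file purely for illustration (a real app would store it per user, e.g. in a database):

```python
import os
from openai import OpenAI

client = OpenAI()
THREAD_FILE = "thread_id.txt"  # illustrative; use a real per-user store

def get_or_create_thread_id() -> str:
    """Reuse one thread across requests instead of creating one per message."""
    if os.path.exists(THREAD_FILE):
        with open(THREAD_FILE) as f:
            return f.read().strip()
    thread = client.beta.threads.create()
    with open(THREAD_FILE, "w") as f:
        f.write(thread.id)
    return thread.id

thread_id = get_or_create_thread_id()  # same conversation on every call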
Also note that although you have selected the correct assistant ID in the Playground, the settings are not synced with Assistants > “Your assistant”. So you have to manually change the settings both in Playground > Assistants and in Assistants > “Your Assistant”: the model, temperature, files, etc.
1. Create a thread and send the user’s response to OpenAI to add it as a message to the thread.
2. Create a run and wait for it to complete.
3. Fetch the messages for that particular run ID (sketch below).
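In the Python SDK, those three steps look roughly like this; the assistant ID and prompt are placeholders, and the replies are filtered client-side by the run’s ID:

```python
import time
from openai import OpenAI

client = OpenAI()

# 1. Create a thread and add the user's response to it as a message
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="user's response here"
)

# 2. Create a run and wait for it to complete
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_...",  # placeholder: your assistant's ID
)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 3. Fetch the messages produced by that particular run
messages = client.beta.threads.messages.list(thread_id=thread.id)
replies = [m for m in messages.data if m.run_id == run.id]
for m in replies:
    print(m.content[0].text.value)
```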
Here, I’ve been adding custom instructions to the run, thinking they would be appended to the assistant’s instructions and generate a more relevant response, while maintaining the assistant’s core functionality and allowing different custom instructions for different use cases.
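If that’s the goal, note that the run-level instructions parameter replaces the assistant’s configured instructions for that run, while additional_instructions appends to them. A short sketch of the appending variant (the thread and assistant IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()
run = client.beta.threads.runs.create(
    thread_id="thread_...",   # placeholder: your existing thread
    assistant_id="asst_...",  # placeholder: your assistant
    # `instructions` here would REPLACE the assistant's configured
    # instructions for this run; `additional_instructions` is appended
    # to them instead, preserving the core behavior.
    additional_instructions="For this request, keep the tone formal.",
)
```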
Additionally, I’ve been making raw API calls rather than using the Node.js package. Also, the run takes quite a bit of time to finish, sometimes as long as 25-30 seconds. Any tips to reduce that? I tried the truncation_strategy and max_completion_tokens parameters, but they started giving incoherent responses in different languages.
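For reference, both parameters are passed at run creation. A sketch with fairly conservative values (the IDs are placeholders, and whether these settings cut latency without degrading quality will depend on your assistant):

```python
from openai import OpenAI

client = OpenAI()
run = client.beta.threads.runs.create(
    thread_id="thread_...",   # placeholder
    assistant_id="asst_...",  # placeholder
    # Cap the output length; too aggressive a cap can truncate replies
    # mid-thought, which may be what produced the incoherent output.
    max_completion_tokens=800,
    # Send only the last N thread messages to the model instead of the
    # whole history, which shrinks the prompt and can speed up the run.
    truncation_strategy={"type": "last_messages", "last_messages": 10},
)
```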