How exactly do I make a function that sends a prompt to an already created Assistant and gives back just the answer, say as a string, with no metadata or anything? And if I ask questions in a loop, will they all be in the same thread? If not, how do I make it so? And can I ask it to create a new thread?
Sorry if this is a dumb question or already has been asked. Thank you!
The OpenAI API only returns JSON response objects from language models over REST.
Assistants are multi-step: you create a thread and note its thread ID, then place a message in that thread. Next you submit a run with the assistant ID and the thread ID, and get back a run ID. You have to keep polling the run ID until its status shows it has completed. Finally, you retrieve the latest message from the thread and parse the response text out of the return object.
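The steps above can be sketched as one helper function. This is a minimal sketch assuming the official `openai` Python SDK's beta Assistants endpoints; the helper name `ask_assistant` and the polling interval are our own choices, and error handling is kept to the bare minimum.

```python
import time

def ask_assistant(client, assistant_id, prompt, thread_id=None):
    """Send `prompt` to an existing assistant; return (answer_text, thread_id).

    Reusing the returned `thread_id` on later calls keeps the conversation
    in the same thread; passing None starts a fresh one.
    """
    if thread_id is None:
        thread_id = client.beta.threads.create().id        # new conversation
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=prompt)  # add user message
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id)    # start a run
    # Keep polling the run until it reaches a terminal status.
    while run.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run.id)
    if run.status != "completed":
        raise RuntimeError(f"run ended with status {run.status}")
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    # The newest message comes first; pull just the text out of its content.
    return messages.data[0].content[0].text.value, thread_id
```

Calling it in a loop and passing the returned `thread_id` back in keeps every question in the same thread, e.g. `answer, tid = ask_assistant(client, "asst_x", question, thread_id=tid)` (where `asst_x` stands in for your real assistant ID).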
A thread is like a ChatGPT conversation: it keeps accumulating user messages and AI responses, at growing expense.
The documentation link on the sidebar has an example multi-step flow just for making one request and ending.
If you just want to send input and receive a response, the chat completions endpoint is the place for that, with only one object from which to extract the response for the user.
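For comparison, the single-request chat completions flow is this short. A minimal sketch, again assuming the `openai` Python SDK; `ask_once` is our own name:

```python
def ask_once(client, model, prompt):
    """One-shot question via chat completions: one call, one object to parse."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The answer text lives directly on the first choice's message.
    return resp.choices[0].message.content
```

Note there is no thread, run, or polling here; statelessness is the trade-off for the simpler flow, so you must resend any prior turns yourself if you want conversation memory.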
Yeah okay, I think I figured it out more or less, with the help of some other sample code. But I have another question: I've been testing it out and the costs are kind of crazy for just a few questions. For example, ~20 questions against gpt-3.5-turbo-1106 via the API, around ~15,000 tokens, is already about 13 cents. Am I doing something wrong, or is it supposed to be like this?
Is it because of thread stacking? By the way, not all 20 questions were in one go. Is there any way to fix this?
If you do not want to pay input tokens for everything you sent before and everything the AI said before, you can either use the API's truncation_strategy run parameter to limit the number of past chat turns taken from a thread, or you can abandon the thread.