Hey there, I'm pretty new to this API, so please bear with me.
I am currently writing an app that uses the ChatGPT API to do two things:
1. I pass the official title of a German law to the API and have it generate a good search query for finding articles about that law via a news API.
2. I query the news API, then pass the results it returns to ChatGPT and have it select which of them is most relevant to the law. This step may be repeated several times for a given law, since I query the news API for articles from various time periods and want ChatGPT to select the most relevant result for each period.
I am using an assistant for this so that I don't have to re-send the instructions on how to construct the query and how to evaluate the results every time I want to use either of those functions; instead, I save them in the system prompt.
Then I discovered the functions tool, which seemed like a good fit, because the response I want from ChatGPT always has the same structure (either a string containing the search query, or a string containing the index number of the best result). But I can't get it to work consistently. Despite describing what I want in some detail in the system prompt, and again, much more briefly, in the function definitions, the behaviour seems erratic. Specifically, when I send a list of search results to the assistant like so:
```python
run = client.beta.threads.runs.submit_tool_outputs_and_poll(
    thread_id=thread.id,
    run_id=run.id,
    tool_outputs=[
        {
            "tool_call_id": run.required_action.submit_tool_outputs.tool_calls[0].id,
            "output": json.dumps(candidates),
        }
    ],
)
```
the status of the run will sometimes switch to “completed”, even though I intended to send more results from different date ranges.
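For context, my function definitions look roughly like this (a simplified sketch; the names and descriptions are placeholders for my real ones):

```python
# Simplified sketch of the tools passed when creating the assistant;
# generate_search_query and select_best_result are placeholder names.
tools = [
    {
        "type": "function",
        "function": {
            "name": "generate_search_query",
            "description": "Return a news search query for the given German law.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "select_best_result",
            "description": "Return the index of the most relevant search result.",
            "parameters": {
                "type": "object",
                "properties": {
                    "index": {"type": "string", "description": "Index of the best result."}
                },
                "required": ["index"],
            },
        },
    },
]
```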
Here are my questions:
1. How can I make sure the status of the run does not change to "completed" after I submit a list of search results (because I may want to submit more lists for different date ranges)?
2. After I have fully processed a law (generated a search query and evaluated all search results), what am I supposed to do with the run? Should I send some message like "success" to indicate that the run is now completed? Should I cancel it? Should I simply send the title of the next law? (I don't think that last option would work, as the run would be expecting more search results, not a new title.)
3. More generally, does using an assistant with the functions tool and the two functions I outlined sound like a good fit for what I am trying to achieve, or should I just use regular chat completions instead?
I'm a little rusty on assistants specifically, because they don't solve any use case of mine, but I can give you some general advice.
IIRC, you add your messages to a thread; the run is just the execution. A completed run is what you want: that's when you would add additional messages. So that seems OK.
I think you can just leave it and make a new thread. There are discussions on how to delete threads (List and delete all threads), but if a run is completed, it shouldn't incur additional costs.
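If you do want to clean up, the Python SDK has a delete endpoint for threads; a minimal sketch (untested, assuming the beta client):

```python
# Minimal sketch: drop the finished thread and start a fresh one
# for the next law.
client.beta.threads.delete(thread.id)
thread = client.beta.threads.create()
```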
Regarding stuff I do know: I would recommend against putting multiple different things into the same thread, for three reasons:
1. Costs explode: you pay for all messages already submitted every time you generate a new response. If you have a gigantic thread, you'll pay for all of those tokens even though they do nothing for you.
2. Security: if multiple users use your service, it would be possible to extract previous users' queries and responses. Probably not a good idea.
3. Confusion: I always advocate keeping context as short and clean as possible (though there are other approaches, e.g. multi-shot), because anything in your context can be a source of confusion that gets integrated into your response and manifests as a kind of hallucination.
I would go with plain chat completions. If the assistant framework feels ergonomic to you and helps you iterate faster, there's absolutely nothing wrong with that. Under the hood (apart from document search and the Python interpreter), the Assistants API just uses, and is billed exactly like, the completions API.
> I am using an assistant for this, so I don't have to re-send the instructions about how to construct the query and how to evaluate the results each time
If you think this will save you on cost, it won't. As mentioned above, you will be billed for those tokens with every run anyway.
In terms of programming convenience, I can see it. But on the other hand, a system prompt is just a string; you can tack it onto your query like everything else in the context.
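To make that concrete, here's a minimal sketch of the stateless approach; the model name, SYSTEM_PROMPT, and make_query are placeholders, not your actual setup:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder for your query-construction instructions.
SYSTEM_PROMPT = "You generate news search queries for German laws."

def make_query(law_title: str) -> str:
    # Every call sends the full context; nothing is stored server-side,
    # so there is no thread or run to manage afterwards.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": law_title},
        ],
    )
    return response.choices[0].message.content
```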
I can see how it's easier to think about in terms of a conversation, but you're not really having conversations with a model; it's all just simulated.
> IIRC, you add your messages to a thread; the run is just the execution. A completed run is what you want: that's when you would add additional messages. So that seems OK.
That sounds sensible, but, strangely perhaps, I was able to submit_tool_outputs_and_poll the same run several times with various sets of search results. It just wasn't consistent: sometimes the run would respond with the index of the most relevant result and await the next set of results, and sometimes it would go into status = completed, for no discernible reason.
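For reference, the loop I'm running looks roughly like this (simplified; fetch_candidates stands in for my news API call):

```python
# Simplified sketch of my polling loop; fetch_candidates is a stand-in
# for the code that queries the news API for one date range.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
while run.status == "requires_action":
    tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
    candidates = fetch_candidates(next_date_range)
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=[
            {"tool_call_id": tool_call.id, "output": json.dumps(candidates)}
        ],
    )
# Sometimes this loops several times; sometimes the run completes after
# the first submission, and I can't tell why.
```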
But that aside, let's suppose I decide to forgo the assistant and the functions tool, or at least the latter. Like you, I am concerned about exploding costs from a backlog of messages that serve no purpose. What is the best way of avoiding that? Should I list all the messages in the thread and delete them one by one, like this?
```python
messages = client.beta.threads.messages.list(thread_id=thread.id)
for message in messages.data:
    client.beta.threads.messages.delete(message_id=message.id, thread_id=thread.id)
```
Or should I leave the messages alone, but delete the entire thread and start a new one for every call to the API?