I am working on a project where I need to extract specific items (e.g., edible products) from sentence-long strings.
I have a reasonably long system prompt to explain everything. Currently I'm calling the API with the full system prompt included every time, which is quite inefficient. I understand that API calls don't retain any context (why not? Wouldn't it make sense to implement something like a session? Or is that what Assistants are for?). But I wonder if at least the system prompt could be stored somewhere, in something like a 'session'.
My only goal here is to save tokens and hence $. I'm also using LangChain, in case they provide a solution I'm not aware of?
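To make the cost concern concrete, here is a back-of-envelope sketch of why resending a long system prompt hurts: its tokens are billed as input on every single call. All numbers below (token counts, price) are made-up round placeholders for illustration, not real figures from the post or from OpenAI's pricing page.

```python
# Hypothetical figures only -- adjust to your own prompt and pricing.
SYS_PROMPT_TOKENS = 800      # assumed length of the long system prompt
QUERY_TOKENS = 30            # one sentence-long string to classify
CALLS = 10_000               # number of extraction requests
PRICE_PER_1K_INPUT = 0.0005  # placeholder $/1K input tokens

total_input = CALLS * (SYS_PROMPT_TOKENS + QUERY_TOKENS)
prompt_share = CALLS * SYS_PROMPT_TOKENS / total_input
cost = total_input / 1000 * PRICE_PER_1K_INPUT

print(f"total input tokens:  {total_input}")
print(f"system prompt share: {prompt_share:.0%}")
print(f"input cost:          ${cost:.2f}")
```

With numbers like these, the system prompt accounts for the vast majority of the billed input, which is exactly the inefficiency being asked about.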
You can use the `instructions` field of an Assistant to achieve this. I basically took all my system prompts and rewrote them so they work towards one goal in the end (returning JSON with specified keys), and it worked quite well. Once the assistant is created, you can call it through the API and use threads to manage sessions.
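A minimal sketch of that setup, assuming the OpenAI Python SDK: the assistant is created once with the rewritten prompt as `instructions`, and a small helper checks that replies carry the agreed JSON keys. The assistant name, model, and key names here are placeholders I made up, not details from the post.

```python
import json

# Hypothetical keys the assistant is instructed to return.
EXPECTED_KEYS = {"product", "quantity"}

def create_food_assistant(client, instructions: str):
    """Create the assistant once; its `instructions` replace the
    per-call system prompt. `client` is an openai.OpenAI() instance."""
    return client.beta.assistants.create(
        name="food-extractor",   # placeholder name
        model="gpt-4o-mini",     # any chat model should work
        instructions=instructions,
    )

def validate_reply(reply_text: str) -> dict:
    """Parse the assistant's JSON reply and check the agreed keys."""
    data = json.loads(reply_text)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply missing keys: {missing}")
    return data
```

Per-conversation state then lives in a thread (`client.beta.threads.create()`), and each question becomes a message plus a run against the stored assistant, so the code never resends the long prompt itself.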
Thanks! But does using `instructions` mean it only counts toward the token count once? I was under the impression that even with an Assistant, the old thread and the instructions are fed to the model every time a new response is required.
I see what you mean… Good question! I don't know yet, to be honest. I can see the instructions coming up in the logs with every run. I'm going to keep track of the tokens used and check the dashboard.
Judging by the size of the input tokens I'm seeing, it looks like the instructions are submitted just like a system prompt, along with the rest of the prompt, on every run…