I’m building a chatbot for our client to provide tailored responses to customer queries. For that I need a few-shot prompt with a few chat examples to guide the bot’s responses. In the playground when I load 6 different few-shot chat examples it doesn’t spend too many tokens. But if I use the same in the API call on Jupyter Notebook, it spends all those tokens every time I run the completion. Now my free-trial quota has been exhausted and I cannot use it any further until unless I pay extra to get the extra quota.
Does the prompt need to be loaded every time? If not, then that should be clearly mentioned in the documentation. If yes, then I need feature request to not take prompt tokens into consideration for usage.
Also, I’m writing a script for the chatbot demonstration. I do not want to be penalized for writing buggy code that may not do the completion call correctly.