Given a Thread object, I’ve noticed in the Playground that GET-ing all messages from the given thread only includes the User message and Assistant message (after calling a function).
However, the Assistant does in fact remember the function outputs, based on my tests.
I am wondering whether the function outputs count towards the input tokens? As in, whether they are billed as a part of the input context?
Hi and welcome to the Developer Forum!
Yes, context produced by retrieval or the code interpreter is fed into the prompt as context and charged for as input tokens.
@Foxabilo Is there any way to minimize this? I uploaded one small JSON file of icons for retrieval to select from, and it spiked my usage tremendously!
It's not worth it if it costs $0.02 per message sent.
There is currently no way to tune the threshold or the amount of data fed to the AI. Your assistant will permanently know about the icons when chatting, because "create assistant" created it with a connection to the file IDs.
You would have to detach the files with "modify assistant" - a pointless call to make at runtime, since you can't know when your file will be relevant to a user's chat query.
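For reference, a minimal sketch of what that "modify assistant" call could look like. This assumes the Assistants API accepts a `file_ids` list on update and that an empty list disconnects every file; the assistant ID shown is hypothetical:

```python
# Hedged sketch: detaching all files from an assistant at runtime via
# "modify assistant". Assumes the API accepts a file_ids list on update
# and that an empty list disconnects every attached file.

def build_update_body(file_ids):
    """Build the JSON body for POST /v1/assistants/{assistant_id}."""
    return {"file_ids": list(file_ids)}

body = build_update_body([])  # empty list = no files attached
# With the official Python SDK this would be roughly (untested assumption):
#   client.beta.assistants.update("asst_abc123", **body)
print(body)
```

As noted above, though, there is no good moment to make this call, since you can't predict when the file will become relevant again.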
If a retrieval file or collection is uploaded and connected to an assistant, and the file is small enough to fit into the model's context length (over 10,000 words), it will be included in full every time. Otherwise it is chunked to fill the available context.
There is apparently no way to distinguish files meant for the code interpreter from retrieval files; they are simply attached to the assistant. If you meant a CSV to be processed by the code interpreter, it will also be fed into the AI's context.
About the only way you can moderate your expenses a bit is to select gpt-3.5-turbo-0613 as your assistant model, giving the assistant backend a smaller context window to experiment with. Documents over the available context are then chunked, and you are also charged less per token if the AI agent goes bonkers and iterates multiple times, repeats phrases, or produces nonsense for the remainder of the max_tokens limit (which you cannot specify).
Example input-context cost per internal call within a run, assuming a max_tokens limit (i.e. a cap on how far generation encroaches into the context length) similar to ChatGPT:

- gpt-3.5-turbo-1106: $0.0010 / 1K tokens × 14.5K tokens = $0.0145
- gpt-3.5-turbo-0613: $0.0015 / 1K tokens × 3.5K tokens = $0.00525
(For what specifying gpt-4-turbo could net you, multiply the input cost by 10× and the input length by 8×.)
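The arithmetic above can be sketched as a quick cost estimator. The prices are those quoted in this thread, and the context sizes (14.5K vs 3.5K tokens) follow the assumption that retrieval fills most of each model's input window:

```python
# Rough per-call input cost, using the figures quoted in the thread.
PRICE_PER_1K = {            # USD per 1K input tokens (as quoted above)
    "gpt-3.5-turbo-1106": 0.0010,
    "gpt-3.5-turbo-0613": 0.0015,
}
CONTEXT_K_TOKENS = {        # assumed input context filled, in K tokens
    "gpt-3.5-turbo-1106": 14.5,
    "gpt-3.5-turbo-0613": 3.5,
}

def input_cost(model: str) -> float:
    """Input-side cost in USD for one internal call within a run."""
    return PRICE_PER_1K[model] * CONTEXT_K_TOKENS[model]

print(round(input_cost("gpt-3.5-turbo-1106"), 5))  # 0.0145
print(round(input_cost("gpt-3.5-turbo-0613"), 5))  # 0.00525
```

Note that a single run may make several such internal calls, so the per-message bill can be a multiple of these figures.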
Thank you for your detailed response. Seems a little strange that they would have these assistants with their own private file bank but still need to constantly send this context each time.
Makes it very difficult to find value in using this tool, at least for the time being.
There’s a very simple solution for this: don’t use the OpenAI File functionality. Implement your own and provide it to the AI as a function it can call.
Meaning: prepare the files you need, chunk and embed them yourself, and create a function tool the AI can use to retrieve information from the embedded files. In the instructions, tell the assistant which files are available and, in brief, what information they contain, so that based on the conversation it knows when to call your function to retrieve the information.