Given a Thread object, I’ve noticed in the Playground that GET-ing all messages from the given thread only includes the User message and Assistant message (after calling a function).
However, the Assistant does in fact remember the function outputs, based on my tests.
I am wondering whether the function outputs count towards the input tokens, i.e. whether they are billed as part of the input context?
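For context, this is roughly how I'm pulling the messages (a minimal sketch with the openai Python SDK; the thread ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# List every message currently stored on the thread.
messages = client.beta.threads.messages.list(thread_id="thread_abc123")

for message in messages.data:
    # Each message has a role ("user" or "assistant") and a list of content
    # parts; the function/tool outputs never show up here as messages.
    for part in message.content:
        if part.type == "text":
            print(message.role, ":", part.text.value)
```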
@Foxabilo Is there any way to minimize this? I uploaded one small JSON file of icons for retrieval to select from, and it spiked my usage tremendously!
It's not worth it if it costs $0.02 per message sent.
There is currently no way to tune the threshold or the amount of data being fed to the AI. Your assistant will permanently know about the icons when chatting, because “create assistant” connected it to the file IDs.
You would have to detach the files with “modify assistant” - a pointless function to call at runtime, since you can’t know when your file will be relevant to a user’s chat query.
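If you did want to detach the files anyway, a minimal sketch of that “modify assistant” call, assuming the original Assistants beta where files are attached via file_ids (the assistant ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Passing an empty file_ids list disconnects all previously attached files,
# so retrieval no longer injects them into the context.
assistant = client.beta.assistants.update(
    "asst_abc123",
    file_ids=[],
)
print(assistant.file_ids)  # expect: []
```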
If a retrieval file or collection is uploaded and connected to an assistant, and the file is small enough to fit into the (>10,000 word) context length of the model used, it will be included in full every time. Otherwise it is chunked to fill the available context.
There is apparently no way to distinguish files meant for code interpreter from files meant for retrieval; they are just attached to the assistant. If you meant a CSV to be available for processing by code interpreter, it would also be fed into the AI context.
About the only way you can moderate your expenses a bit is to select gpt-3.5-turbo-0613 as your assistant model, giving the assistant backend less context window length to experiment with. That should cause the document to be chunked if it exceeds the available context, and it also means lower charges per token if the AI agent goes bonkers and iterates multiple times, or simply repeats phrases or produces nonsense for the remainder of the (unspecifiable) max_tokens.
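A minimal sketch of that model selection at assistant creation time, assuming the current beta Python SDK (the file ID and name are placeholders):

```python
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Icon helper",
    instructions="Answer questions about the attached icon catalogue.",
    model="gpt-3.5-turbo-0613",   # 4k context: less room for retrieval to fill
    tools=[{"type": "retrieval"}],
    file_ids=["file-abc123"],
)
```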
Example input context cost per internal call within a run:
gpt-3.5-turbo-1106 | $0.0010 / 1K tokens x 14.5K tokens = $0.0145
or
gpt-3.5-turbo-0613 | $0.0015 / 1K tokens x 3.5K tokens = $0.00525
assuming a max_tokens reservation (the limit on how far generation encroaches into the context length) similar to ChatGPT's.
(For gpt-4-turbo, multiply the input cost by 10x and the input length by 8x to see what specifying that model could net you.)
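The same arithmetic as a quick sketch, if you want to plug in your own numbers:

```python
def input_cost(price_per_1k_tokens: float, context_k_tokens: float) -> float:
    """Cost of one internal model call that fills the given context."""
    return price_per_1k_tokens * context_k_tokens

print(input_cost(0.0010, 14.5))  # gpt-3.5-turbo-1106 -> ~$0.0145 per call
print(input_cost(0.0015, 3.5))   # gpt-3.5-turbo-0613 -> ~$0.00525 per call
```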
Thank you for your detailed response. It seems a little strange that they would give these assistants their own private file bank but still need to constantly send this context each time.
That makes it very difficult to find value in using this tool, at least for the time being.
There’s a very simple solution for this. Simply don’t use the OpenAI File functionality; implement your own and provide it to the AI as a function it can call.
Meaning: prepare the files you need, chunk and embed them yourself, and create a function tool the AI can call to retrieve information from the embedded files. Then, in the instructions, state which files are available to the Assistant and, in brief, what information they contain, so the AI knows when, based on the conversation, to call your function to retrieve the information.
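A minimal sketch of what that could look like, assuming the current beta Python SDK; search_my_documents and its schema are hypothetical names for your own retrieval backend, not part of the OpenAI API:

```python
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Docs assistant",
    instructions=(
        "You can look things up in the 'icons' catalogue, which maps icon "
        "names to descriptions. Call search_my_documents only when the "
        "user's question needs that information."
    ),
    model="gpt-3.5-turbo-1106",
    tools=[{
        "type": "function",
        "function": {
            "name": "search_my_documents",
            "description": "Search the self-hosted embedded document store.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search text"},
                    "top_k": {"type": "integer", "description": "Chunks to return"},
                },
                "required": ["query"],
            },
        },
    }],
)

# When a run pauses with requires_action, run your own vector search for the
# requested query and return the matching chunks via submit_tool_outputs.
```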
Hi Foxabilo!
Do you know if there’s a way to get the number of tokens consumed when a request is made?
I need to know the token consumption of each user of my platform, where users can create their own OpenAI assistant and use it to create threads.
Assistants are not yet in a production-ready build. The only way currently to attempt that would be to use one API key per customer and track each key’s usage, but new keys cannot be generated under software control. Assistants are a demonstration of potential new services, and should not be sold as a product to end users at this stage.