My app involves a user in an ongoing conversation with an AI (using the Assistants API). The user holds a large amount of “state-data”, which we represent as a big JSON structure. The AI can manipulate this structure through function calls.
I’m trying to understand the best way (in terms of both token use and efficient use of the context window) to pass this “state-data” to the Assistant…
-
The obvious choice is “add it to the prompt”. But my concern is that this will quickly eat up tokens and fill the context window with largely redundant information, because on each turn of the conversation the AI (i.e. the underlying transformer) sees not only the current state-data, but also the state-data from every past turn.
-
Currently, we pass the state-data via “additional instructions”, since my understanding is that this way the AI (i.e. the underlying transformer) sees only one copy of the state-data on each turn, instead of one copy per past turn of the conversation.
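Concretely, what we do today looks roughly like this (a minimal sketch using the Python SDK; the IDs and the state_data contents are placeholders):

```python
import json

from openai import OpenAI

client = OpenAI()

# Our app's state-data; the contents here are placeholder values.
state_data = {"inventory": ["sword", "map"], "gold": 12}

# Thread and assistant IDs are placeholders.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    # Injected per-run rather than stored as a thread message, so the
    # context should carry only one copy of the structure per turn.
    additional_instructions="Current state-data:\n" + json.dumps(state_data),
)
```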
-
We could also send a summarized/truncated version of the state-data in the prompt, and make the full structure queryable via function calls.
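Something like this tool definition, where get_data_structure and its key parameter are names I’m inventing for illustration:

```python
# Hypothetical tool the Assistant could call to fetch the full structure
# (or one slice of it) on demand, instead of seeing all of it every turn.
get_data_structure_tool = {
    "type": "function",
    "function": {
        "name": "get_data_structure",
        "description": "Return the current state-data, optionally scoped "
                       "to a single top-level key.",
        "parameters": {
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": "Top-level key to fetch; omit to get "
                                   "the whole structure.",
                },
            },
        },
    },
}
```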
Specific questions:
-
I understand that token billing is somewhat opaque, but I’m trying to get a rough idea here. Suppose we have a conversation with average message size M, structure size S, T turns, and token limit L. Is my understanding correct that my input token use over the whole conversation will be
O(min(L, (S + M) * T) * T)
if I put my structure in the prompt, but just
O(min(L, S + M * T) * T)
if I put it in additional_instructions?
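To make the comparison concrete, this is the back-of-envelope arithmetic I have in mind (made-up numbers for M, S, T, L):

```python
# Cumulative input tokens over a conversation, under the two strategies.
M, S, T, L = 200, 2_000, 50, 128_000  # message, structure, turns, context limit

# Structure repeated in every user message: turn t replays t copies of (S + M).
in_prompt = sum(min(L, (S + M) * t) for t in range(1, T + 1))

# Structure sent once per run via additional_instructions: turn t sees S + M*t.
in_additional = sum(min(L, S + M * t) for t in range(1, T + 1))

print(in_prompt)      # 2,805,000
print(in_additional)  #   355,000
```
-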
Does the underlying transformer actually receive the additional instructions on every prompt? I’ve seen people claim that they get ignored.
-
Suppose the model can fetch the current structure by calling a get_data_structure function. Will the response to that call still be visible to the AI in future turns of the conversation?
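For context, this is the round trip I’m picturing, continuing from the sketch above (serve_data_structure is a stand-in for our app’s lookup):

```python
import json

def serve_data_structure(key=None):
    # Stand-in for our app's actual state lookup.
    return state_data if key is None else state_data.get(key)

# When the run pauses with status "requires_action", we answer the tool
# call; the output is recorded in the run's steps on the thread.
if run.status == "requires_action":
    tool_outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        if call.function.name == "get_data_structure":
            args = json.loads(call.function.arguments)
            tool_outputs.append({
                "tool_call_id": call.id,
                "output": json.dumps(serve_data_structure(args.get("key"))),
            })
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=run.thread_id,
        run_id=run.id,
        tool_outputs=tool_outputs,
    )
```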