How to limit the number of messages or tokens that are persisted in a thread to maintain context in OpenAI Assistants?

What you describe doesn’t ultimately give developers specification-based control over usage. It sounds like it would just throw an error or truncate output.

API developers can handle programming. If Assistants are ever going to be useful for production and products, you must target skilled developers. The trick is making every specification (or object grouping a set of specifications) optional, and allowing developers to incrementally layer control on top of sensible default values.
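As a client-side sketch of what “optional, incrementally overridable” could look like (every name and field here is hypothetical, not an actual API parameter):

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical control objects: every field is optional, so a developer can
# start from provider defaults (None) and override only what they need.
@dataclass
class RunBudget:
    max_steps: Optional[int] = None              # None = provider default
    total_completion_tokens: Optional[int] = None
    total_tool_tokens: Optional[int] = None

@dataclass
class RunControls:
    run_budget: RunBudget = field(default_factory=RunBudget)
    temperature: float = 1.0
    top_p: float = 1.0

    def override(self, **kwargs) -> "RunControls":
        # Incrementally build on control beyond default values.
        for key, value in kwargs.items():
            setattr(self, key, value)
        return self

# Only the knobs the developer cares about are touched; the rest stay default.
controls = RunControls().override(temperature=0.2)
controls.run_budget.max_steps = 8
```

The point of the pattern is that an unset field means “whatever the platform does today,” so adding new controls never breaks existing integrations.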

The below isn’t fully fleshed out (and can’t be, since we’re given a black box at this point), but it sketches where API developers might want controls:

run_budget {  # reaching a threshold disables all tools, forcing user-facing output
    max_steps                - number of internal iterations to allow
    total_completion_tokens  - total tokens across all internal generation steps to allow
    total_tool_tokens        - total accumulated tool input context across iterations
}
run_limits {  # reaching a threshold immediately terminates the run with an error
    total_completion_tokens
    total_input_tokens
}
step_context_budget {
    max_input   - limits context as if the model had a smaller input capacity
    max_tokens  - truncates the output of each internal generation
}
retrieval_injection {
    max_tokens            - token cap that terminates automatic knowledge injection
    similarity_threshold  - semantic threshold to block irrelevant content
}
retrieval_browser {
    search_max_return_items
    search_max_tokens
    search_similarity_threshold
    click_max_tokens
}
tool_context {
    tool_max  - truncates what the AI loads from python or tool returns
}
conversation_context {
    max_tokens
    max_turns
    favor_conversation  - 0-100 importance of maintaining chat history vs retrieval
}
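Absent server-side support, the run_budget semantics above can be approximated client-side in an agent loop. A minimal sketch (class name, counters, and thresholds are all hypothetical; no real API calls):

```python
# Hypothetical client-side enforcement of run_budget: once any threshold is
# crossed, tools are disabled, forcing the model to produce user-facing output.
class RunBudgetEnforcer:
    def __init__(self, max_steps=10,
                 total_completion_tokens=4000, total_tool_tokens=8000):
        self.max_steps = max_steps
        self.total_completion_tokens = total_completion_tokens
        self.total_tool_tokens = total_tool_tokens
        self.steps = 0
        self.completion_tokens = 0
        self.tool_tokens = 0

    def record_step(self, completion_tokens: int, tool_tokens: int = 0) -> None:
        # Accumulate usage after each internal iteration of the run.
        self.steps += 1
        self.completion_tokens += completion_tokens
        self.tool_tokens += tool_tokens

    @property
    def tools_allowed(self) -> bool:
        # All thresholds must still have headroom for tools to stay enabled.
        return (self.steps < self.max_steps
                and self.completion_tokens < self.total_completion_tokens
                and self.tool_tokens < self.total_tool_tokens)

budget = RunBudgetEnforcer(max_steps=3)
budget.record_step(completion_tokens=500, tool_tokens=1200)
budget.record_step(completion_tokens=700, tool_tokens=900)
allowed_mid = budget.tools_allowed   # still under every threshold here
budget.record_step(completion_tokens=600)
allowed_end = budget.tools_allowed   # max_steps reached: tools disabled
```

Doing this in the client is strictly worse than a platform feature (tokens are still spent before you can react), which is the argument for exposing these controls in the API itself.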

…and then also expose temperature and top_p. Or, for function-calling AI in general, even an immediate sampling override the moment the model produces a send-to-tool-recipient token.
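That sampling-override idea, sketched as a mock decoding loop (the token string and temperatures are illustrative; a real inference stack would hook the sampler internally):

```python
# Mock decoding loop: once the model emits a send-to-tool token, sampling
# immediately switches to near-greedy for the structured tool-call payload.
TOOL_TOKEN = "<|tool_call|>"

def effective_temperature(tokens_so_far,
                          base_temperature=0.9, tool_temperature=0.0):
    # Override sampling parameters the moment a tool-recipient token appears.
    if TOOL_TOKEN in tokens_so_far:
        return tool_temperature
    return base_temperature

stream = ["The", "weather", TOOL_TOKEN, '{"name":', '"get_weather"}']
temps = [effective_temperature(stream[:i]) for i in range(len(stream) + 1)]
```

The rationale: creative sampling settings that help conversational text tend to hurt the strict JSON a tool call requires, so the override costs nothing and prevents malformed arguments.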

Thanks for hearing my thoughtful rambling.
