How to limit the number of messages or tokens that are persisted in a thread to maintain context in OpenAI Assistants?

What you describe doesn’t ultimately give developers specification-based control over usage. It sounds like it would just throw an error or truncate output.

API developers can handle programming. If Assistants are ever going to be useful for production and products, you must target skilled developers. The trick is making every specification (or object grouping a set of specifications) optional, and allowing developers to incrementally layer control on top of sensible default values.
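As a client-side sketch of what “optional, incrementally overridable” could look like (every name and field here is hypothetical, not an actual API parameter):

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical control objects: every field is optional, so a developer can
# start from provider defaults (None) and override only what they need.
@dataclass
class RunBudget:
    max_steps: Optional[int] = None              # None = provider default
    total_completion_tokens: Optional[int] = None
    total_tool_tokens: Optional[int] = None

@dataclass
class RunControls:
    run_budget: RunBudget = field(default_factory=RunBudget)
    temperature: float = 1.0
    top_p: float = 1.0

    def override(self, **kwargs) -> "RunControls":
        # Incrementally build on control beyond default values.
        for key, value in kwargs.items():
            setattr(self, key, value)
        return self

# Only the knobs the developer cares about are touched; the rest stay default.
controls = RunControls().override(temperature=0.2)
controls.run_budget.max_steps = 8
```

The point of the pattern is that an unset field means “whatever the platform does today,” so adding new controls never breaks existing integrations.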

The below isn’t fully fleshed out (and can’t be, since we’re given a black box at this point), but it sketches where API developers might want controls:

run_budget {  # reaching a threshold disables all tools, forcing user-facing output
    max_steps                - number of internal iterations to allow
    total_completion_tokens  - total tokens across all internal generation steps to allow
    total_tool_tokens        - total accumulated tool input context across iterations
}
run_limits {  # reaching a threshold immediately terminates the run with an error
    total_completion_tokens
    total_input_tokens
}
step_context_budget {
    max_input   - limits context as if the model had a smaller input capacity
    max_tokens  - truncates the output of each internal generation
}
retrieval_injection {
    max_tokens            - token cap that terminates automatic knowledge injection
    similarity_threshold  - semantic threshold to block irrelevant content
}
retrieval_browser {
    search_max_return_items
    search_max_tokens
    search_similarity_threshold
    click_max_tokens
}
tool_context {
    tool_max  - truncates what the AI loads from python or tool returns
}
conversation_context {
    max_tokens
    max_turns
    favor_conversation  - 0-100 importance of maintaining chat history vs retrieval
}
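Absent server-side support, the run_budget semantics above can be approximated client-side in an agent loop. A minimal sketch (class name, counters, and thresholds are all hypothetical; no real API calls):

```python
# Hypothetical client-side enforcement of run_budget: once any threshold is
# crossed, tools are disabled, forcing the model to produce user-facing output.
class RunBudgetEnforcer:
    def __init__(self, max_steps=10,
                 total_completion_tokens=4000, total_tool_tokens=8000):
        self.max_steps = max_steps
        self.total_completion_tokens = total_completion_tokens
        self.total_tool_tokens = total_tool_tokens
        self.steps = 0
        self.completion_tokens = 0
        self.tool_tokens = 0

    def record_step(self, completion_tokens: int, tool_tokens: int = 0) -> None:
        # Accumulate usage after each internal iteration of the run.
        self.steps += 1
        self.completion_tokens += completion_tokens
        self.tool_tokens += tool_tokens

    @property
    def tools_allowed(self) -> bool:
        # All thresholds must still have headroom for tools to stay enabled.
        return (self.steps < self.max_steps
                and self.completion_tokens < self.total_completion_tokens
                and self.tool_tokens < self.total_tool_tokens)

budget = RunBudgetEnforcer(max_steps=3)
budget.record_step(completion_tokens=500, tool_tokens=1200)
budget.record_step(completion_tokens=700, tool_tokens=900)
allowed_mid = budget.tools_allowed   # still under every threshold here
budget.record_step(completion_tokens=600)
allowed_end = budget.tools_allowed   # max_steps reached: tools disabled
```

Doing this in the client is strictly worse than a platform feature (tokens are still spent before you can react), which is the argument for exposing these controls in the API itself.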

…and then also expose temperature and top_p. Or, for function-calling AI in general, even an immediate sampling override the moment the model produces a send-to-tool-recipient token.
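That sampling-override idea, sketched as a mock decoding loop (the token string and temperatures are illustrative; a real inference stack would hook the sampler internally):

```python
# Mock decoding loop: once the model emits a send-to-tool token, sampling
# immediately switches to near-greedy for the structured tool-call payload.
TOOL_TOKEN = "<|tool_call|>"

def effective_temperature(tokens_so_far,
                          base_temperature=0.9, tool_temperature=0.0):
    # Override sampling parameters the moment a tool-recipient token appears.
    if TOOL_TOKEN in tokens_so_far:
        return tool_temperature
    return base_temperature

stream = ["The", "weather", TOOL_TOKEN, '{"name":', '"get_weather"}']
temps = [effective_temperature(stream[:i]) for i in range(len(stream) + 1)]
```

The rationale: creative sampling settings that help conversational text tend to hurt the strict JSON a tool call requires, so the override costs nothing and prevents malformed arguments.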

Thanks for hearing my thoughtful rambling.
