How to set max_tokens, temperature parameters when using assistants API?

how to set max_tokens, temperature parameters when using assistants API? did not find any info in API reference.


max_tokens was certainly handy for keeping prompt+messages to a
rough length. I have had mixed results setting this using just the assistants API. This would mitigate the need for post response checks.


Thanks for the clarification.
To answer your question about the use cases, what if we want to make the responses more deterministic while using Assistant API in chatbots?
Example: I want the model to always respond in a certain way when asked about the uses of a specific product mentioned in the system prompt. While testing, it works fine, but every time the same question is asked, it returns a different response. The same is the case of using it for teaching-related chatbots. I am not getting into many details for the sake of brevity, but feel free to ask any follow-up questions.


This is a great use case the seed parameter which we do not currently support for the Assistants API but do for Chat Completions. In general, you should expect to have more control via Chat Completions than you would via the Assistants API.


Chat Completions does not provide threading or retrieval or code interpreter. The primary advantages of the Assistants are 1) threading, 2) retrievals. We use retrieval to make answers more deterministic, so having the ability to set Temp = 0 is important for the most basic use case for Assistants.


Let me chime in here - using the Assistants to execute ‘tasks’ that include things like reading and processing emails requires a high level of consistency. I’m don’t mind a different worded answer or summary of things - but a different outcome in terms of action IS a problem. Reading and processing and email with a 2 page long prompt should ideally always result in the same assessment / actions?
As an example - I end my prompt for one of my assistants this way:

Response format instructions:
The response should always, no matter what be a single JSON object with one attribute ‘Reasoning’ that has the text with your reasoning and one attribute ‘Assessment’ that has REJECTED, POSSIBLE or GREAT FIT. There will be no text or other characters outside the JSON string. Remember ALWAYS return a JSON OBJECT.

And the Assistnant STILL manages to occaisonally chew out a text only response.


I’m currently in need of this feature as my AI Assistant responses are way too long. I need to control the response length. When can we expect this feature to be released?

This feature is super important because in my case for example, it sometimes takes up to 30s for my assistant to generate a response. Which is unnecessary and provides horrible user experience. No user will sit around waiting 30s for a response from a chatbot.

Use openairetro/examples/temparature at main · icdev2dev/openairetro · GitHub to preserve the semantics of AssistantApi while using the mechanics of Chatcompletion, for the moment.

Ofc in your example, you would also need to include max_tokens in the defintion of ChatcompletionAssistant.

1 Like

Thanks I’ll check it out! :smiley: