I assume there must be some argument to set model hyperparameters in this client.beta.assistants.create code block? Does anyone know which argument, or if there is any other way to set hyperparameters?
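For context, this is roughly the call I'm making (assuming the v1 Python SDK); I don't see any sampling arguments in it:

```python
from openai import OpenAI

client = OpenAI()

# The documented creation parameters as of the beta -- note there is
# no temperature/top_p argument here.
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="You analyze uploaded CSV files and answer questions.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
)
```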
We do not currently support setting temperature or other completions parameters in the Assistants API, but it is something we’re seeking feedback on during the beta period.
Can you share more about your use case for modifying these parameters?
Assistant Runs currently sample multiple messages in a loop, and results can be choppy if we apply high or low temperatures to every message we sample.
I am not aware of the sampling concept being applied internally by Assistant Runs, but when I ask the same question and provide the same file, I sometimes get different answers. Sometimes it's like:
I apologize for the inconvenience, but it seems that there is an issue with the accessibility of the file you've uploaded. The tool I would typically use to browse and analyze the contents of your file is not able to access it. ...
Other times the answer is correct, but the generation is worded differently.
I usually set the temperature parameter to 0 or very low so that the LLM becomes deterministic and I can estimate its accuracy on questions related to my domain.
My use case is data analysis by the Assistant, generating results on which further action can be taken, so I would prefer the Assistant to be more deterministic.
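For illustration, this is the pattern we rely on with Chat Completions today (a minimal sketch, assuming the v1 Python SDK); nothing equivalent is accepted by the Assistants API:

```python
from openai import OpenAI

client = OpenAI()

# Pin temperature to 0 and pass a fixed seed for best-effort
# reproducibility across identical requests.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Summarize the attached sales data."}],
    temperature=0,
    seed=42,  # best effort only; compare system_fingerprint across calls
)
print(response.choices[0].message.content)
```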
Also, any API call to a language model without constraining top-p or top-k gets you a long tail of token possibilities, so you're rolling the dice on broken output and costing yourself another uncontrolled call, with this system running at maximum context length…
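To make that tail visible, here's a hedged sketch using Chat Completions logprobs (assuming logprobs support on your model; the Assistants API exposes nothing comparable):

```python
from openai import OpenAI

client = OpenAI()

# Inspect the top candidate tokens at a single position to see the
# tail of alternatives an unconstrained top_p leaves in play.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Name a primary color."}],
    logprobs=True,
    top_logprobs=5,  # five most likely candidates per position
    max_tokens=1,
)
for candidate in response.choices[0].logprobs.content[0].top_logprobs:
    print(candidate.token, candidate.logprob)
```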
Funny how ChatGPT gets the bare minimum of chat history, but when paying, you get the maximum model context loaded up with “threads” for all the iterations the AI wants as it meanders through documents.
Any idea when these model parameters would become available? The lack of deterministic outputs is currently a blocker for my company to begin using the Assistants API. Need to guarantee the same outputs when performing PDF analysis (or at least very similar outputs) for the same inputs.
We’re in the same situation. We’ve built our own assistant that returns and resolves SQL. Controlling seed / temperature in this context really helps with a level of determinism.
The Assistants API would allow us to do away with our “hand-rolled” assistant and simply rely on threads.
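For reference, the thread flow we'd move to looks roughly like this (a sketch assuming the v1 Python SDK and an existing assistant ID); note there's no seed/temperature control anywhere on this path today:

```python
from openai import OpenAI

client = OpenAI()

# Create a thread, add the user's request, and kick off a run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Generate SQL for: total revenue by region, last quarter.",
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_...",  # placeholder assistant ID
)
```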
I would suggest that you define your temperature (and other parameters) at Assistant creation time. Any new threads will inherit the new temperature and settings. This would alleviate the “choppy” issue as discussed above.
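Purely as a sketch of that suggestion, the creation call might look like this; the sampling arguments below do not exist in the current API, so this is the proposed shape, not a working call:

```python
from openai import OpenAI

client = OpenAI()

# HYPOTHETICAL: temperature/top_p are NOT supported by
# client.beta.assistants.create today -- this is the proposal.
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="...",
    model="gpt-4-1106-preview",
    temperature=0.2,  # proposed: inherited by runs on every new thread
    top_p=0.9,        # proposed
)
```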
Both embeddings and AI models are non-deterministic, due to the architecture of the models used.
With Ada embeddings, you will not get the same vector back every time, and thus cannot guarantee the same set of top-x matches regardless.
Even with temperature or top-p set to 0.0000000000000001 instead of the special-cased 0.0 value, the positions of the top logits can still flip, which then alters the course of generation.
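You can check the embeddings half of this yourself (a minimal sketch, assuming the v1 Python SDK):

```python
from openai import OpenAI

client = OpenAI()

# Embed the same string twice and compare: the vectors often differ
# slightly in the last decimal places.
text = "quarterly revenue by region"
a = client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding
b = client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding
print(a == b)  # frequently False
```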
Then put all your data in the mystery box of assistants? nyet.
Any updates on that? I know these things aren't created overnight, but can we have some date to plan around? Or has this addition been dismissed for now? That is also important information for my company.
Thanks, this is incredibly important if we want to deploy this to our users and bake off prompt/instructions against each other on our product-specific tasks.
This would be huge for our company. Having unpredictable results with Knowledge Retrieval (presumably not tool error-based) has been a large issue for us.
Yes, we definitely need it… we are getting inconsistent responses (RAG pattern) when using Assistants vs. Chat Completions, and Chat Completions in general works better.
It appears that, for now, we'd have to stick with Chat Completions and implement our own context-window management, embedding, and reasoning/summarisation over uploaded files, etc.
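For the context-window part, the hand-rolled management amounts to something like this sketch (token counts via tiktoken are approximate; exact per-message accounting varies by model):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def trim_history(messages, budget=6000):
    """Drop the oldest messages until the history fits a token budget."""
    trimmed = list(messages)
    total = sum(len(enc.encode(m["content"])) for m in trimmed)
    while trimmed and total > budget:
        dropped = trimmed.pop(0)  # oldest first
        total -= len(enc.encode(dropped["content"]))
    return trimmed
```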
In addition, the Assistants API “add message” endpoint has a 32K limit. Sometimes we need to send more data for RAG, and the message limit really should match the context size of the model used. I realise we could instead attach a file to the thread for retrieval, but the results are not as good as with our internal implementation and the data included as part of the message. Maybe there should also be an option to specify whether a file should be injected into the conversation history?
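Our interim workaround is to split large context across several messages on the same thread; a rough sketch, assuming the 32K cap is in characters:

```python
from openai import OpenAI

LIMIT = 32_000  # assumed per-message character cap

def add_long_message(client: OpenAI, thread_id: str, text: str) -> None:
    """Split oversized RAG context across consecutive user messages."""
    for start in range(0, len(text), LIMIT):
        client.beta.threads.messages.create(
            thread_id=thread_id,
            role="user",
            content=text[start:start + LIMIT],
        )
```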
In general, the Assistants API is a very good idea, but it needs a somewhat better implementation to be widely used. It's a very good start, I suppose… any ETA on upcoming updates to the Assistants API?
So there is currently no way to set max_tokens for the Assistants API, if I understand correctly?
To answer your question, I am developing a Discord bot using the Assistants API, and I would like to limit the length of the generated text so it does not exceed Discord's maximum message length (2,000 characters).
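Lacking max_tokens, my stopgap is to ask for brevity in the assistant's instructions and hard-truncate client-side before posting; a minimal sketch:

```python
DISCORD_LIMIT = 2000  # Discord's per-message character cap

def to_discord(text: str) -> str:
    """Truncate assistant output to fit a single Discord message."""
    if len(text) <= DISCORD_LIMIT:
        return text
    return text[:DISCORD_LIMIT - 1] + "…"  # mark the truncation
```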