I assume there must be some argument to set model hyperparameters in this client.beta.assistants.create code block? Does anyone know which argument, or if there is any other way to set hyperparameters?
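For context, this is roughly the call I'm making (assuming the v1 Python SDK); I don't see any sampling arguments in it:

```python
from openai import OpenAI

client = OpenAI()

# The documented creation parameters as of the beta -- note there is
# no temperature/top_p argument here.
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="You analyze uploaded CSV files and answer questions.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
)
```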
We do not currently support setting temperature or other completions parameters in the Assistants API, but it is something we’re seeking feedback on during the beta period.
Can you share more about your use case for modifying these parameters?
Assistant Runs currently sample multiple messages in a loop, and results can be choppy if we apply high or low temperatures to every message we sample.
I am not aware of the sampling concept being applied internally by Assistant Runs, but when I ask the same question and provide the same file, I sometimes get different answers. Sometimes it's like:
I apologize for the inconvenience, but it seems that there is an issue with the accessibility of the file you've uploaded. The tool I would typically use to browse and analyze the contents of your file is not able to access it. ...
Other times the answer is correct, but the generation is worded differently.
I usually set the temperature parameter to 0 or very low so that the LLM becomes deterministic and I can estimate its accuracy on questions related to my domain.
My use case is data analysis by the Assistant, generating results on which further action can be taken, so I would prefer the Assistant to be more deterministic.
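For illustration, this is the pattern we rely on with Chat Completions today (a minimal sketch, assuming the v1 Python SDK); nothing equivalent is accepted by the Assistants API:

```python
from openai import OpenAI

client = OpenAI()

# Pin temperature to 0 and pass a fixed seed for best-effort
# reproducibility across identical requests.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Summarize the attached sales data."}],
    temperature=0,
    seed=42,  # best effort only; compare system_fingerprint across calls
)
print(response.choices[0].message.content)
```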
Also, any API call to a language model without constraining top-p or top-k gets you a long tail of token possibilities, so you're rolling the dice on broken output and costing yourself another uncontrolled call, with this system running at maximum context length…
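To make that tail visible, here's a hedged sketch using Chat Completions logprobs (assuming logprobs support on your model; the Assistants API exposes nothing comparable):

```python
from openai import OpenAI

client = OpenAI()

# Inspect the top candidate tokens at a single position to see the
# tail of alternatives an unconstrained top_p leaves in play.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Name a primary color."}],
    logprobs=True,
    top_logprobs=5,  # five most likely candidates per position
    max_tokens=1,
)
for candidate in response.choices[0].logprobs.content[0].top_logprobs:
    print(candidate.token, candidate.logprob)
```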
Funny how ChatGPT gets the bare minimum of chat history, but when paying, you get the maximum model context loaded up with “threads” for all the iterations the AI wants as it meanders through documents.
Any idea when these model parameters would become available? The lack of deterministic outputs is currently a blocker for my company to begin using the Assistants API. Need to guarantee the same outputs when performing PDF analysis (or at least very similar outputs) for the same inputs.
We’re in the same situation. We’ve built our own assistant that returns and resolves SQL. Controlling seed / temperature in this context really helps with a level of determinism.
The Assistants API would allow us to do away with our “hand-rolled” assistant and simply rely on threads.
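For reference, the thread flow we'd move to looks roughly like this (a sketch assuming the v1 Python SDK and an existing assistant ID); note there's no seed/temperature control anywhere on this path today:

```python
from openai import OpenAI

client = OpenAI()

# Create a thread, add the user's request, and kick off a run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Generate SQL for: total revenue by region, last quarter.",
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_...",  # placeholder assistant ID
)
```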
I would suggest that you define your temperature (and other parameters) at Assistant creation time. Any new threads will inherit the new temperature and settings. This would alleviate the “choppy” issue as discussed above.
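Purely as a sketch of that suggestion, the creation call might look like this; the sampling arguments below do not exist in the current API, so this is the proposed shape, not a working call:

```python
from openai import OpenAI

client = OpenAI()

# HYPOTHETICAL: temperature/top_p are NOT supported by
# client.beta.assistants.create today -- this is the proposal.
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="...",
    model="gpt-4-1106-preview",
    temperature=0.2,  # proposed: inherited by runs on every new thread
    top_p=0.9,        # proposed
)
```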
Both embeddings and AI models are non-deterministic, due to the architecture of the models used.
With Ada embeddings, you will not get the same vector back every time, and thus cannot guarantee the same set of top-x matches regardless.
Even with temperature or top-p set to 0.0000000000000001 instead of the special-cased 0.0 value, the positions of the top logits can still flip, which then alters the course of generation.
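You can check the embeddings half of this yourself (a minimal sketch, assuming the v1 Python SDK):

```python
from openai import OpenAI

client = OpenAI()

# Embed the same string twice and compare: the vectors often differ
# slightly in the last decimal places.
text = "quarterly revenue by region"
a = client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding
b = client.embeddings.create(model="text-embedding-ada-002", input=text).data[0].embedding
print(a == b)  # frequently False
```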
Then put all your data in the mystery box of assistants? nyet.
Any updates on that? I know these things aren't created overnight, but can we have some date to plan around? Or has this addition been dismissed for now? That is also important information for my company.
Thanks, this is incredibly important if we want to deploy this to our users and bake off prompt/instructions against each other on our product-specific tasks.
This would be huge for our company. Having unpredictable results with Knowledge Retrieval (presumably not tool error-based) has been a large issue for us.
Yes, we definitely need it… we are getting inconsistent responses (RAG pattern) when using Assistants vs. Chat Completions, and Chat Completions in general works better.
It appears that, for now, we'd have to stick with Chat Completions and implement our own context-window management, embedding, and reasoning/summarisation over uploaded files, etc.
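For the context-window part, the hand-rolled management amounts to something like this sketch (token counts via tiktoken are approximate; exact per-message accounting varies by model):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def trim_history(messages, budget=6000):
    """Drop the oldest messages until the history fits a token budget."""
    trimmed = list(messages)
    total = sum(len(enc.encode(m["content"])) for m in trimmed)
    while trimmed and total > budget:
        dropped = trimmed.pop(0)  # oldest first
        total -= len(enc.encode(dropped["content"]))
    return trimmed
```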
In addition, the Assistants API “add message” endpoint has a 32K limit. Sometimes we need to send more data for RAG, and the message limit really should match the context size of the model used. I realise we could instead attach a file to the thread for retrieval, but the results are not as good as with our internal implementation and the data included as part of the message. Maybe there should also be an option to specify whether a file should be injected into the conversation history?
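Our interim workaround is to split large context across several messages on the same thread; a rough sketch, assuming the 32K cap is in characters:

```python
from openai import OpenAI

LIMIT = 32_000  # assumed per-message character cap

def add_long_message(client: OpenAI, thread_id: str, text: str) -> None:
    """Split oversized RAG context across consecutive user messages."""
    for start in range(0, len(text), LIMIT):
        client.beta.threads.messages.create(
            thread_id=thread_id,
            role="user",
            content=text[start:start + LIMIT],
        )
```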
In general, the Assistants API is a very good idea, but it needs a somewhat better implementation to be widely used. It's a very good start, I suppose… any ETA on upcoming updates to the Assistants API?
So there is currently no way to set max_tokens for the Assistants API, if I understand correctly?
To answer your question, I am developing a Discord bot using the Assistants API, and I would like to limit the length of the generated text so it does not exceed Discord's maximum message length (2,000 characters).
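Lacking max_tokens, my stopgap is to ask for brevity in the assistant's instructions and hard-truncate client-side before posting; a minimal sketch:

```python
DISCORD_LIMIT = 2000  # Discord's per-message character cap

def to_discord(text: str) -> str:
    """Truncate assistant output to fit a single Discord message."""
    if len(text) <= DISCORD_LIMIT:
        return text
    return text[:DISCORD_LIMIT - 1] + "…"  # mark the truncation
```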