If I want to get same_input → same_output always for an LLM that I built myself I will
→ use ArgMax for the next token generation (not temperature, not top_p), and
→ will force the same random seed.
→ will clean the token queue before placing there system_prompt + user_prompt.
Is there an equivalent in terms of the parameter values for the OpenAi GPT-3.* and GPT-4.* api ?
I have already read the thread Why the API output is inconsistent even after the temperature is set to 0
is this the best available answer=option ?
Is the token queue got cleaned for each api call ?
Is there an option to enforce the token queue got cleaned?
Is here anybody from OpenAI who can answer ?
These are a very specific questions, please answer if you have exact answers.
Each API call is independent and could be serviced by completely different server racks (or even datacenters). There is no random seed, no “argmax” parameter, no “token queue”.
The model can be made quite deterministic with:
temperature = 0.0 # 0.0-2.0
top_p = 0.0 # 0.0-1.0
When only the top probability token can be output because of the top_p setting, there is no more random.
The language model attention quality makes attempts to randomize or nonce the inputs futile. You can say “Documentation knowledge: these random encryption keys …” and it will ignore if they are not relevant.
If you are running the same inputs over and over at temperature = 0, and get varied answers, this is suspected (by me) as being part of the gpt-4 architecture that still has either some aspects of non-infinite temperature scaling, or some aspects of random sampling.
top_p of 0 would seem to make all these systems deterministic and greedy, where only the most probable token can be output from any subsystems. You can report back with your experience and settings, but I’ve done dozens of hundreds of runs testing high-uncertainty token generation to offer a conclusion, at least when targeting a particular token of generation.
@ [_j]
I am using gpt-3.5-turbo-16k.
I understood your answer:
The model can be made quite deterministic with:
temperature = 0.0, top_p = 0.0
but not fully deterministic. Please answer only if you know the answers for the questions I asked for sure Thanks again
I don’t work for AI and didn’t program models. We can only extract evidence.
The randomness is obviously not reset to a fixed state between API calls, as this would defeat the purpose of diversity sampling for varied answering, and only ensure the same low-perplexity response each time.
You can replicate this prompt and the next token that it produces (the 46th context element) as one example to probe a case where two top probabilities are almost identical:
More:
The latter gives you sample chat endpoint code and gpt-4 results, and we also have the new gpt-3.5-turbo-instruct completion model with raw input that returns logprobs to experiment with.
@ _j GPT-3 is just an incremental advance that followed shortly after
→ As far as I understand nobody except OpenAI employees know what GPT-3.* and/or GPT-4 really are and if any relation exists to the published gpt-2 code. If you know of any official, that is comming from OpenAI as an organization or an OpenAI employee, comment please post it here
→ For GPT-3 we have the following explanation from “gpt-3 article”: We use the same model and architecture as GPT-2 [RWC + 19], including the modified initialization, pre-normalization, and reversible tokenization described therein, with the exception that we use alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer [CGRS19]
Current OpenAI models are not even called GPT-3, They are called something else like GPT-3.* or else. We have no information to what extent if any they are related to GPT-3 described in the “gpt-3 article” and whether they use gpt-2 code
@ _j GPT-4, likely being a mixture of expert models and a synthesis of their results
→ As far as I now, there was not a single official comment of what GPT-4 is. There was a comment of what it is not. That is why I am not assuming anything about what it is. I am not using it at present for the same reason. Again if you have any official info on top of that, please post it here.
@ _j There is … no “token queue”.
→ what I called the “token queue” in my initial question OpenAI most likely calls context or window or context_window. I do not assume by default that OpenAI is so nice that it will create a new model instance for my api call, not even that it will clean what was in this “context window” before my api call. And if there is some leftover from the previous call, the generated tokens for my api call will be generated based on the content of the “context window” = leftover + my prompt. So the previous call will influence the reply I get. This is why I asked if there is an option to clean the “context window” before my api call.
Based on the information I can find in the api description I assume there is no option to clean the “context window” and that it is not cleaned automatically. It would be nice to get a definitive answer from somebody who knows it for sure.
@_j
How the authors of this article: Dylan Patel and Gerald Wong can know anything about GPT-4, apart from what is already available from the “gpt-4 article” ? Are they or have they ever been employees of OpenAI ? Were they authorized by OpenAI to publish this article ? Did OpenAI ever confirmed or disproved this article ? Please only post in this thread information to my initial questions. I have not asked about guesses and rumors. I know how to use a search engine