In top_p, the “p” stands for the probability mass cutoff applied to the ranked set of token probabilities. The inference model outputs a logit, a raw goodness score, for every token in the BPE encoding dictionary; the softmax layer then normalizes those logits into probabilities that add up to 1.0, or 100%.
A top_p value of 1.0 allows every token through to the next stage of biasing by temperature. A value of 0.5 is 50% of the probability mass: the most certain tokens are added in rank order until their combined probability reaches 0.5, and nothing after that. In most cases that cutoff leaves just a few tokens to choose from if the AI is pretty certain what to write.
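If it helps to see the normalization step, here is a minimal sketch of softmax followed by ranking. The four-token vocabulary and the logit values are made up for illustration; a real model scores its whole dictionary.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1.0."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits, not taken from any real model.
logits = {"true": 2.0, "false": 1.9, "True": 0.4, "Sure": -0.3}

probs = softmax(logits)
# Rank tokens from most to least probable; top_p walks this list from the top.
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

for token, p in ranked:
    print(f"{token!r}: {p:.1%}")
```

The ranked list is what the cutoff operates on: the higher a token sits, the sooner it is counted toward the probability mass.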
In your case, you told the AI the binary choice of what to produce, so the uncertainty beyond that specification comes from instruction-following and from how to write the output. You’ll likely get top values like “true”: 20%, “false”: 19%, “True”: 4%, “Sure”: 2%… and the listing continues for 200000 tokens, 20 of which you can observe with logprobs.
 Set top_p: 0.1 in that case, and only “true” can be output.
 Set top_p: 0.4, and the first three are kept (20% + 19% is still under 40%, so “True” is included too) and then chosen among at random according to their weights, as a discrete probability distribution.
 Set top_p: 0.0000001, and it becomes mathematically impossible for a second-rank token to be picked.
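The three settings above can be checked with a small sketch. It assumes the usual nucleus rule, where tokens are added in rank order and the one that crosses the threshold is still included; the function name `nucleus` is mine, and only the four top tokens from the example are listed, with the long tail left out.

```python
def nucleus(probs, top_p):
    """Return the tokens that survive a top_p cutoff, in rank order."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)       # the token that crosses the threshold is kept
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# Top of the example distribution; the remaining tokens are omitted.
top_tokens = {"true": 0.20, "false": 0.19, "True": 0.04, "Sure": 0.02}

print(nucleus(top_tokens, 0.1))        # ['true']
print(nucleus(top_tokens, 0.4))        # ['true', 'false', 'True']
print(nucleus(top_tokens, 0.0000001))  # ['true']
```

Whether a given API applies exactly this inclusion rule is an assumption on my part, but it matches the numbers in the example.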
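So, with a setting of 1, you are basically turning off any function of this API parameter; 1 is also the default when it is not specified.
Set it to 0 and you turn off sampling: you get the top-ranked token at every step, the model’s best production path for that run. The model itself is still not fully deterministic, so that path can differ between runs.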
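To see the two extremes side by side, here is a rough sketch of the sampling step. `sample_top_p` is a hypothetical helper, not an API call, and it only models the sampling; it says nothing about run-to-run variation in the logits themselves.

```python
import random

def sample_top_p(probs, top_p, rng):
    """Pick one token from the nucleus, weighted by probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens = [t for t, _ in kept]
    weights = [p for _, p in kept]  # choices() treats these as relative weights
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
probs = {"true": 0.20, "false": 0.19, "True": 0.04, "Sure": 0.02}

# top_p = 0: the nucleus is always just the top token, so nothing is random.
greedy = {sample_top_p(probs, 0.0, rng) for _ in range(1000)}
# top_p = 1: every listed token stays in play.
full = {sample_top_p(probs, 1.0, rng) for _ in range(1000)}

print(greedy)  # {'true'}
print(sorted(full))
```

With top_p at 0 the random draw has only one candidate, which is why the setting behaves like sampling being switched off.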
Three passes of rewriting through different models at top_p 0:
Understanding Nucleus Sampling (Top-p) in AI Language Models
Nucleus sampling, commonly referred to as top-p sampling, is a method used in AI language models to manage the diversity of generated text. The “p” in top-p denotes the cumulative probability threshold that determines which subset of tokens the model considers when generating the next word in a sequence.
In language models, each token is assigned a probability through a softmax layer, which converts the raw output scores (logits) from the model into a probability distribution that sums to 1.0, or 100%. The top-p parameter establishes a cutoff in this cumulative probability distribution. For example:

A top-p value of 1.0 includes all tokens, allowing the model to consider the entire probability distribution for the next word. This setting is equivalent to applying no cutoff and is typically chosen to maximize diversity in the generated text.

A top-p value of 0.5 means that the model considers only the most probable tokens until their cumulative probability reaches 50%. This usually results in a more focused selection of tokens, particularly when the model has high confidence in its predictions.
To illustrate, imagine a model tasked with generating a binary response. The probabilities for potential tokens might be: “yes” (20%), “no” (19%), “maybe” (4%), “possibly” (2%), etc. If top-p is set to 0.1, only the token “yes” would be selected, as its 20% alone already covers the 10% threshold.

Setting top-p to 0.4 would include the first three tokens (“yes”, “no”, and “maybe”), allowing the model to randomly select among them based on their probabilities.

A very low top-p value, such as 0.0000001, would restrict the model to the single most probable token, effectively eliminating diversity in the output.
When top-p is set to 1, the parameter is effectively neutralized, permitting the model to utilize the full probability distribution. Conversely, setting top-p to 0 would theoretically disable sampling, compelling the model to always select the most probable token, which can result in repetitive and predictable outputs.
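For readers who prefer notation, the cutoff can be written out as follows. This is the standard nucleus-sampling formulation, stated here as an assumption about what the parameter does rather than as a description of any particular API; the symbols z_i (logit of token i), V (vocabulary size) and k(p) (nucleus size) are introduced only for this sketch.

```latex
% z_i: logit of token i;  V: vocabulary size;  p: the top-p setting
P_i = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}}, \qquad
P_{(1)} \ge P_{(2)} \ge \dots \ge P_{(V)}

% Smallest number of top-ranked tokens whose probabilities reach p
k(p) = \min\Bigl\{ k \in \{1, \dots, V\} : \sum_{i=1}^{k} P_{(i)} \ge p \Bigr\}

% Renormalized distribution the next token is drawn from
\tilde{P}_{(i)} =
\begin{cases}
  \dfrac{P_{(i)}}{\sum_{j=1}^{k(p)} P_{(j)}} & i \le k(p) \\[1ex]
  0 & i > k(p)
\end{cases}
```

Because softmax probabilities are strictly positive, p = 1 gives k(p) = V, so nothing is cut. Any p no larger than P_{(1)} gives k(p) = 1, which leaves only the top-ranked token.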
Adjusting the top-p parameter enables users to finely tune the balance between creativity and precision in AI-generated text, making it possible to customize the output to meet specific needs and contexts. This flexibility is crucial for applications requiring a particular style or level of inventiveness in text generation.