Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API

Hello everyone!

Ok, I admit had help from OpenAi with this. But what I “helped” put together I think can greatly improve the results and costs of using OpenAi within your apps and plugins, specially for those looking to guide internal prompts for plugins… @ruv

I’d like to introduce you to two important parameters that you can use with OpenAI’s GPT API to help control text generation behavior: temperature and top_p sampling.

  • These parameters are especially useful when working with GPT for tasks such as code generation, creative writing, chatbot responses, and more.

Let’s start with temperature:

  • Temperature is a parameter that controls the “creativity” or randomness of the text generated by GPT-3. A higher temperature (e.g., 0.7) results in more diverse and creative output, while a lower temperature (e.g., 0.2) makes the output more deterministic and focused.
  • In practice, temperature affects the probability distribution over the possible tokens at each step of the generation process. A temperature of 0 would make the model completely deterministic, always choosing the most likely token.

Next, let’s discuss top_p sampling (also known as nucleus sampling):

  • Top_p sampling is an alternative to temperature sampling. Instead of considering all possible tokens, GPT-3 considers only a subset of tokens (the nucleus) whose cumulative probability mass adds up to a certain threshold (top_p).
  • For example, if top_p is set to 0.1, GPT-3 will consider only the tokens that make up the top 10% of the probability mass for the next token. This allows for dynamic vocabulary selection based on context.

Both temperature and top_p sampling are powerful tools for controlling the behavior of GPT-3, and they can be used independently or together when making API calls. By adjusting these parameters, you can achieve different levels of creativity and control, making them suitable for a wide range of applications.

To give you an idea of how these parameters can be used in different scenarios, here’s a table with example values:

Use Case Temperature Top_p Description
Code Generation 0.2 0.1 Generates code that adheres to established patterns and conventions. Output is more deterministic and focused. Useful for generating syntactically correct code.
Creative Writing 0.7 0.8 Generates creative and diverse text for storytelling. Output is more exploratory and less constrained by patterns.
Chatbot Responses 0.5 0.5 Generates conversational responses that balance coherence and diversity. Output is more natural and engaging.
Code Comment Generation 0.3 0.2 Generates code comments that are more likely to be concise and relevant. Output is more deterministic and adheres to conventions.
Data Analysis Scripting 0.2 0.1 Generates data analysis scripts that are more likely to be correct and efficient. Output is more deterministic and focused.
Exploratory Code Writing 0.6 0.7 Generates code that explores alternative solutions and creative approaches. Output is less constrained by established patterns.

I hope this introduction helps you understand the basics of temperature and top_p sampling in the context of OpenAI’s GPI API. If you have any questions or experiences to share, feel free to comment below!

Happy coding!

65 Likes

Wow! I really love that table. Thank you for sharing this!

2 Likes

Thank you for sharing this information. I had been overlooking the top_p parameter.

Additionally, you might want to consider the frequency_penalty, which controls the repetition of words, and presence_penalty, which influences the likelihood of introducing new topics.

With the assistance of GPT-4, I’ve developed a table outlining various values for different writing styles, each with conservative, balanced, and creative options.

The table is used by a Google Document add-on that integrates with the OpenAI API. It estimates the writing style based on the document’s title, but users can adjust it as needed. For instance, if the title includes the word “proposal,” the add-on will apply a proposal writing style with a higher temperature compared to a “technical document.”

I’m still experimenting, but this approach has been effective in generating diverse results, allowing users to interact with these parameters without needing in-depth knowledge. Users can simply choose a different document style or request more conservative, balanced, or creative outputs.

1 Like

Check out my latest Plugin, it makes it super easy to play with these settings.

2 Likes

Here’s a thought question:

Let’s say you want to write a creative brief. (just an example). Something you’d want to get full “randomness” into, ie. temp=1. But instead of sending each section of the brief in one by one, you need to send the whole brief template and get back the filled-in template. But you still want full randomness and temp=1. For the sake of argument, let’s say the template is Markdown, which utilizes all the various markdown styles. Of course, the simple answer is to just set temp=0.5, but … duh.

Does using temp=1 break this? Would you split off sections into individual prompts? If so, that changes the nature of how one has to plan explicitly creative outputs (things you want temp=1 if possible) but want them within a consistent template AND without sending multiple prompts.

Solve?

I’m not sure if i understand what you’re asking. The temp setting basically sets the creativity of the output. If you want more than one output you’d use the n=value to define how many different versions. Typically these are similar, unless the temp is set to something high like .8 or greater.

I’ve been struggling to understand this. the api docs generally recommend ‘adjust top_p or temp but not both’. I don’t really understand why they say that. Is it fair to say 'top_p sets guardrails on how far the next token can wander, and temp adjusts how ‘wiggly’ the path taken is within those guardrails? If that is (approximately) correct, then two questions:

  1. what happens when the single highest conditional probability next token has a probability > top_p? (eg, top_p is 0.1, but the highest probability next token is .2)?
  2. If my interpretation above is correct, then the lower the value for top_p, the less temp matters, since the guardrails get narrower and narrower.

Hmm. Maybe I’ll try the playground with some very low values of top_p and report back

2 Likes

Why not use 0 temp and top_p for code generation ?

1 Like

Indeed, why not? Perhaps that only works for problems very similar to the training set, and we need a bit more ‘creativity’ to synthesize solutions to ‘new’ problems? Haven’t tried this, just speculating…

I think you just need to play around with top_p vs. temperature to see the difference. Honesty, I mainly use temperature, but I can see the value of top_p in certain situations.

Here is an example where I have the AI to try to be “creative” and write a rap song. Basically, the temperature version is better than the top_p version, since the rhymes in the temperature version seem to make more sense and feel less forced. The top_p version limits the vocabulary, sounds way more clinical???. But this could be good in situations where you want it to write code, or maybe Q/A from your documents (I would need to test this to be sure). Or act like a lawyer/doctor (or whatever).

System: “You are MC Hammer!”
User: “Throw me some rhymes, yo!”

Limit it by top_p :-1:

Not sure I like this one over the next one, too clinical using top_p

temp = 1, top_p = 0.09, max_tokens = 320, model = gpt-4

(Verse 1)
Yo, it’s Hammer time, I’m back on the scene,
I’m the lyrical genius, the rhyme machine,
I got the moves, the style, the groove, no doubt,
When I step on the stage, everybody shout.

(Chorus)
Can’t touch this, I’m the one and only,
MC Hammer, never lonely,
I got the rhythm, the beat, the flow,
When I grab the mic, it’s time to go.

(Verse 2)
I’m the king of the dance floor, the master of the beat,
When I’m rockin’ the mic, you can’t stay in your seat,
I got the power, the energy, the vibe,
When I’m on the stage, you know it’s live.

(Bridge)
I’m untouchable, unstoppable, the legend in the game,
MC Hammer, remember the name,
I’m the pioneer, the trailblazer, the icon,
When I’m on the mic, you know it’s on.

(Chorus)
Can’t touch this, I’m the one and only,
MC Hammer, never lonely,
I got the rhythm, the beat, the flow,
When I grab the mic, it’s time to go.

(Outro)
So, there you have it, the rhymes I threw,
MC Hammer, still fresh and new,
I’ll keep on dancin’, keep on groovin’,
As long as the beat keeps on movin’.

Limit it by temperature :+1:

This one sounds better, and notice it keeps going and is cutoff by the token limit. I vote temperature > top_p for this task (see below).

temp = 0.09, top_p = 1, max_tokens = 320, model = gpt-4

(Verse 1)
Yo, it’s MC Hammer, back on the scene,
With the funky fresh rhymes, you know what I mean,
I’m the lyrical master, the king of the groove,
When I step on the mic, I make the whole room move.

(Chorus)
Can’t touch this, no, you can’t compete,
I’m the one and only, with the golden beat,
I’m the king of the dance floor, the lord of the rhyme,
When I’m on the mic, it’s Hammer Time!

(Verse 2)
I’m the man with the plan, the original don,
I’ve been rockin’ the mic since the break of dawn,
I’m the one who made it cool to wear parachute pants,
When I hit the stage, everybody starts to dance.

(Bridge)
I’m the master of the flow, the ruler of the game,
When I’m on the mic, you’ll remember my name,
MC Hammer, the legend, the icon, the star,
I’ve been breakin’ it down, near and far.

(Chorus)
Can’t touch this, no, you can’t compete,
I’m the one and only, with the golden beat,
I’m the king of the dance floor, the lord of the rhyme,
When I’m on the mic, it’s Hammer Time!

(Outro)
So, remember my name, and don’t you forget,
I’m the one and only, the best you’ll ever get,
I’m MC Hammer, the legend, the king,
When I >|cuttoff due to token limit|<

1 Like

You could chunk you’re output or stream the results for larger token counts.

Presence_penalty was only mentioned once as a comment. This is not a master cheat sheet. If somebody finds one please share the link.

temperature tells the model how much to weight the dice. It doesn’t say “work differently” or “do more work” or even “be more creative” – unless by “creative” you mean “roll dice which are less likely to roll 1.”

Another useful place to set temperature to 0 is in testing. If you have a test suite that test cases don’t break when you change the prompt, temperature 0 means it’s less likely to randomly break from dice rolling. (It will still occasionally break because of API overload, but that’s a different problem.)

2 Likes

This is great; I have to admit that I have largely neglected top_p sampling and have focused primarily on adjusting the temperature. The table of suggested temperature and top_p values for different use cases is very helpful.

Hey @ruv
OpenAI has updated the range of temperature to 0-2.
Does that affect this table? I’m having issues with the code generation part. this was working perfectly for me before but not anymore?

1 Like

I saw that and honestly thought this was a bug… 0 - 2? WTF… Why? Because 2 is one better then 1. If they made this change on purpose just stupid…

This has to be a bug… Ranking algorithms have been ranged from 0.0 - 1.0 for decades. It’s a percentage. Ranging from 0.0 to 2.0 makes absolutely zero sense. What’s the rational?

Will lower Top_p provide more deterministic response than higher Top_p or vice versa?

Thanks!

Hi,
I am interested in using this plugin, but I am unsure of how to begin the installation process. Could you please direct me to any beginner-friendly tutorials? I have been unable to locate any online.

I just opened the official documentation from openai OpenAI Platform, and they said to either play with the temperature or the top_p command, but not both simultaneously.
See Image below

It seems it takes a very low top-p parameter to actually constrain the logits in cases where there is some leeway of choice.

At temperature 2, using some very indeterminate leading language, I get sentences that are mostly from the top three when using a top-p 0.05
image

These low and similar percentages are of course a contrast to something like “training an AI using few-”, which gives 99.99% of five different ways of writing “shot”, with the top choice at 98%.

(note: model probabilities are returned before temperature and nucleus sampling, so we don’t get to see the changing distances of probabilities or the limited token choices)

So that makes a high temperature above 1 but quite low top-p an interesting case for unexpected writing, creative without going off-the-rails crazy, almost randomly picking from the few top choices when choices are made available.

So for example at temperature 2, top-p 0.1, it is only in line four of our poem that we finally get two second-place choices (here using untrained davinci GPT-3)

image

Temperate of 2 would otherwise produce nonsense.

Of course picking anything but the best is completely unsuitable for code or function-calling.

1 Like