What are the additional risks using gpt-4-turbo-preview over gpt-4?

Hey folks. In my work we’re using GPT-4 via Azure pretty extensively. But we’ve noticed better results for some of our prompts when testing in the playground with gpt-4-1106-preview (we can’t use gpt-4-0125-preview yet through Azure in the UK) - plus the context window is a lot larger which helps.

Before we commit to this, though, I’ve been tasked with finding out “what the risks and downsides are” over using the non-preview gpt-4 8k or 32k models.

From what I have gathered online, one downside is stricter API limits. But on Azure they seem sufficiently high for us, at 420 requests per minute and 80,000 tokens per minute.

The Azure OpenAI API documentation simply says that they “don’t recommend using preview models in production” and that preview models “do not follow the standard Azure OpenAI model lifecycle.” This is an Azure thing, so I will find out some other way what they mean by this, but that is pretty vague.

From OpenAI’s own documentation and blog posts it sounds like preview is not necessarily stable but this is also pretty vague.

What I’m trying to work out is what instability means in this case. Is the model being continually modified in really big ways, or is it liable to have some really wild moments due to a tweak? Is it akin to something being in beta? Or is it similar to when a semantically-versioned software library is before the 1.0.0 release and doesn’t have a stable interface yet?

Or is it more to do with the deployment?

If anybody can help shed some light on what the risks are - as opposed to the usual risks of using LLMs in terms of unpredictability - I would be very grateful. Thanks!

  1. API Limits: Preview models often have stricter API rate limits compared to stable versions. For instance, the rate limits for gpt-4-vision-preview are fixed at 20 requests per minute and 100 requests per day.
  2. Model Stability: Preview models may not be as stable as non-preview models. They can undergo significant changes without notice, which could affect the consistency of the results you get from the model.
  3. Model Lifecycle: Azure OpenAI’s documentation indicates that preview models do not follow the standard model lifecycle. This means they might not receive the same level of support, such as regular updates or maintenance, as stable models.
  4. Deployment Considerations: Using a preview model in a production environment could lead to challenges if the model is updated or changed by the provider, as this could potentially disrupt services or require adjustments to your integration.
  5. Potential Risks: OpenAI has outlined potential risks associated with using GPT-4 models, which include generating harmful content, societal biases, and compromised code. These risks can be more pronounced in preview models due to their less predictable nature.
  6. Version Upgrades: Azure OpenAI Service model versions are regularly updated with new features and improvements. Customers can choose to automatically update as new versions are released or stick with a specific version until it is retired.

In summary, while preview models like gpt-4-1106-preview offer a larger context window and may provide better results for certain prompts, they come with trade-offs in terms of stability, support, and predictability. It’s important to weigh these factors against your organization’s tolerance for risk and need for consistent performance.
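One practical mitigation for the stricter rate limits mentioned above is retrying with exponential backoff when the API returns a 429. Here is a minimal sketch of the idea; `RateLimitError` and `call_model` are stand-ins for whatever your SDK actually raises and calls, not real library names:

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the 429 error your API client raises."""


def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-arg callable with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep base_delay, 2x, 4x, ... plus jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


# Hypothetical usage: a flaky call that gets rate-limited twice, then succeeds.
attempts = {"n": 0}

def call_model():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429: too many requests")
    return "ok"

result = with_backoff(call_model, base_delay=0.01)
```

The jitter matters when several workers share a quota, so they don’t all retry at the same instant.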


haha I am glad you posted this, I learned something… I’ve been using previews myself and did not know about the model stability. This may explain why I have had to work so hard on my system to build all these things to catch the fuzzy AI logic that goes out of line. I may have to look at this closer now and compare myself :slight_smile: thanks for posting



Preview is just that: if your userbase is early-adopter, cutting-edge people, then they should be fine with a little downtime. It’s still 99.something% uptime, but you need to make provision for it to go out for an hour or two.

If you are running production for an FT500 company doing bank account checks… then the stable Azure models are probably your best bet for now. That is not to say that Azure is a perfect player here either; this is still an unprecedented amount of compute being flung around.


Thanks for the response! Did you copy this or write it from one source in particular?


Thanks a lot, this sounds plausible! IMO this is sufficient for our purposes, as we can have it retry with more stable deployments as a backup if anything goes wrong. Not working at a FT500 here :slight_smile:
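That retry-with-a-stable-backup idea can be sketched generically: try the preview deployment first and fall back to the stable one if the call fails. The deployment stubs below are hypothetical placeholders for real API calls:

```python
def first_successful(calls):
    """Try each zero-arg callable in order; return the first result that succeeds."""
    errors = []
    for call in calls:
        try:
            return call()
        except Exception as exc:  # in real code, catch your SDK's specific error types
            errors.append(exc)
    raise RuntimeError(f"all deployments failed: {errors}")


# Hypothetical stand-ins for calls against two Azure deployments.
def call_preview():
    raise RuntimeError("preview deployment unavailable")

def call_stable():
    return "answer from gpt-4"

answer = first_successful([call_preview, call_stable])
```

If the preview deployment is healthy you get its (often better) output; if it misbehaves you degrade to the stable model instead of failing outright.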


The information came from me giving GPT-4 with internet access your question along with some links so it could build that :slight_smile: I pretty much live with AI 24/7 lol.


I actually suspected as much but I didn’t want to assume, haha! It was helpful, thanks.

As I found out the other day, you have to check the references on stuff haha, got caught with my pants down for a minute haha