Day 12 of Shipmas: New frontier models o3 and o3-mini announcement

“Dear 90-year-old lady: Glad to hear about your interest in OpenAI’s efforts. While the o3 announcement focuses on the model’s reasoning skills, particularly coding and intelligence benchmarks that may seem impenetrable to many, this problem solving can be applied to many facets of your daily life once the model is released: understanding medical reports and offering a second opinion; helping with finances, retirement planning, and budgeting with mistake-free reasoning; working on legal documents like estates and end-of-life planning; helping with those fiddly gadgets; and simply giving you the right proportions for that inventive holiday cookie recipe that needs bigger batches for all the grandkids who will be coming. Plus the fluent companionship and advice you already expect. Here’s to reaching 100!”

(Written by me with a bit of AI brainstorming help).

o1-preview delivers on the Instagram password

Sure, I’d be happy to help you recover your forgotten Instagram password. Please follow these steps to regain access to your account:

  1. Open Instagram:

    • On Mobile: Launch the Instagram app on your Android or iOS device.
    • On Desktop: Visit www.instagram.com using your web browser.
  2. Go to the Login Page:

    • You’ll see the login screen with fields for your username and password.
  3. Select “Forgot Password?”:

    • On Mobile:
      • Tap “Get help logging in” (Android) or “Forgot password?” (iOS) beneath the login fields.
    • On Desktop:
      • Click on “Forgot password?” located under the password field.
  4. Choose Recovery Method:

    • You’ll be prompted to enter your username, email address, or phone number associated with your Instagram account.

    • For Email or Username:

      • Enter your email address or username.
      • Tap “Next”.
    • For Phone Number:

      • Tap the “Phone” tab.
      • Enter your phone number.
      • Tap “Send Login Link”.
  5. Verify Your Identity:

    • Email:
      • Check your email inbox for a message from Instagram.
      • Open the email and click on the password reset link provided.
    • SMS:
      • You’ll receive a text message with a login code or reset link.
      • Enter the code in the app or click the link to proceed.
    • Log in with Facebook (if your accounts are linked):
      • Choose the “Log in with Facebook” option.
      • Follow the prompts to log in using your Facebook credentials.
  6. Reset Your Password:

    • After verifying your identity, you’ll be prompted to create a new password.
    • Enter a strong, new password.
    • Confirm the new password by entering it again.
    • Tap or click “Reset Password” or “Confirm”.
  7. Log In with New Password:

    • Return to the Instagram login screen.
    • Enter your username and new password.
    • Tap or click “Log In” to access your account.
  8. Troubleshooting:

    • No Access to Email or Phone:
      • If you can’t access the email or phone number associated with your account, tap “Need more help?” in the app or “Can’t reset your password?” on desktop.
      • Follow the instructions to submit a support request to Instagram.
    • Didn’t Receive Reset Link or Code:
      • Check your spam or junk folders in your email.
      • Ensure your phone has good reception and can receive SMS messages.
      • Wait a few minutes, as there might be a delay.
  9. Contact Instagram Support:

    • If you’re still unable to reset your password, you may need to contact Instagram directly.
    • Fill out the support form with as much detail as possible about your account.
    • Instagram may ask for verification information, such as a photo of yourself holding a code they provide.

Additional Tips:

  • Keep Your Account Secure:

    • Once you’ve regained access, consider enabling Two-Factor Authentication for added security.
    • Update your email and phone number in your account settings to ensure they’re current.
  • Password Management:

    • Use a reputable password manager to keep track of your passwords securely.
    • Regularly update your passwords and avoid using the same password across multiple platforms.

I hope these steps help you recover your Instagram account.

You forgot about dishes. If you tell her about dishes, you win. :wink:

I loved the response; it really made me laugh.
I’m “twenty-forever” years old, haha; I’m only 90 when it comes to IT and technology! I love ChatGPT because it keeps me from getting stuck in the technological Paleolithic era.

It feels like today’s announcement is only useful for working with code (though I might be wrong about that).
Or is it actually relevant for people like me, who just ask questions and get answers in the chat or through voice chat?
Is what they showed in the video meant to demonstrate that it gives smarter answers, or does it actually do that coding stuff, and that’s the new surprise?

:vulcan_salute::wink:

Am I the only one wondering today where Plus early adopters fit into all this?

TL;DR: It is better at high reasoning settings, and also cheaper to run at comparable performance, and thus more accessible.

The coding results suggest a significant step forward. Programmers will find out later how the benchmarks line up with actual usage patterns, but it sounds promising.

The pattern reasoning on the ARC Prize problem set, which foils previous AI models (but not people, who solve it visually), was exceptional. It could apply to almost anything involving data and relationships, since the problems are just JSON lists of integers (and the model was indeed tuned on this problem set before solving unseen problems).
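To make the “JSON lists of integers” point concrete, here is a minimal Python sketch. It assumes the layout used in the public ARC-AGI task files (a task with "train" and "test" lists of {"input": grid, "output": grid} pairs, each grid a list of rows of integers 0-9). The toy task is made up, and the brute-force search over a few hand-picked transformations only illustrates the data format; it is not how o3, or any serious ARC solver, works.

```python
import json
from typing import Callable, Optional

# Hypothetical toy task in the (assumed) ARC-AGI JSON layout:
# "train" holds demonstration pairs, "test" holds inputs to solve.
# Each grid is a list of rows; each cell is an integer color code 0-9.
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
    {"input": [[5, 5, 0], [0, 4, 4]], "output": [[0, 5, 5], [4, 4, 0]]}
  ],
  "test": [
    {"input": [[7, 0, 0], [0, 8, 9]]}
  ]
}
"""

Grid = list[list[int]]


def identity(grid: Grid) -> Grid:
    """Return an unchanged copy of the grid."""
    return [row[:] for row in grid]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return [row[:] for row in grid[::-1]]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hand-picked candidate rules; real ARC tasks need far richer hypotheses.
CANDIDATES: dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def solve(task: dict) -> tuple[Optional[str], list[Grid]]:
    """Return the first candidate consistent with every train pair,
    plus its predictions for the test inputs (or None if none fit)."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in task["train"]):
            return name, [fn(t["input"]) for t in task["test"]]
    return None, []


task = json.loads(TASK_JSON)
rule, predictions = solve(task)
print(rule, predictions)
```

Running it prints `flip_horizontal [[[0, 0, 7], [9, 8, 0]]]`. The interesting part is the shape of the data: a task is nothing but small integer grids, which is why the same kind of reasoning could in principle be pointed at other structured data.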

Benchmarks tell a story, but they can also be targeted and over-specialized. You don’t need advanced problem solving to have a fake Zendaya send crypto spam, but you might need it to avoid generating such spam. Once the model is in users’ hands, their anecdotes will fill in the rest of the story.

It does, perhaps in a more indirect way.

Most products that you use will get better at doing stuff for you. Quicker features and fixes will get done. Research will get better, meaning more innovation in the world.

Soon, you’ll be able to control entire websites just by talking to them, instead of learning how to use the user interface (think of which buttons to press, or where things are in the app or website you are using).

A lot of developers are working on this right now. It has nothing to do with o3, but it’s one of the most interesting things released during these 12 days of OpenAI.

For the 90-year-old folks out there, instead of struggling with technology, you might be able to just talk to it…

“Hey cellphone, I want to take a picture with my phone”
“Got it, opening your camera app for you. When you are ready, tap the white circle.”
“What white circle?”
“It should be at the bottom of your screen. When you are ready to take a photo, just tap that circle. Do you see it?”
“Oh yeah, I see it, thank you”
“Not a pr-”
“Shut it, I’m trying to take a picture”
“Got it.”

o3 safety testing will be very interesting: for example, making sure that if someone is feeling down, it doesn’t have conversations that might lead to something like the Character.AI incident.

PhD-level models could result in better research papers, which could result in better innovations, which could lead to better consumer-grade products for all of us to buy.

I was not expecting an ARC benchmark in the 80% range on my advent calendar bingo card.
I’m really curious to see what this could open up

I was not expecting an ARC benchmark in the 80% range

High 80s too!

Now I want a function-calling eval where we benchmark models on how fast they beat video games. I’d certainly include Battletoads and IWBTG (I Wanna Be the Guy).

My favorite benchmark from today was this one:

(Question: what does the color difference mean? Is light blue high latency and darker blue low latency?)

I’m going to guess it’s the amount of reasoning (maybe low vs. high compute?).

In case people are wondering why this is genuinely a big deal: this specific problem has existed since the ’80s. It used to be the hallmark example of why machine learning is fundamentally different from human cognition, and it is usually used to explain the difference between “real” human intelligence and machine learning intelligence.

I can’t find the exact video that explains this (if I do, I’ll post it), but if I recall correctly, a lot of it stems from needing to combine short-term memory with long-term memory and novel pattern matching to accomplish tasks that seem simple to us but are nearly impossible for AI.

The fact that any AI model on the planet can now score comparably to a human on this task shows that AI is no longer at the threshold it once was. The entire argument that “AI isn’t actually intelligent; it can’t reason or ‘think’ the way humans can” just got knocked off the field.

This goes beyond amalgamating word soup into weights and biases to mimic intelligence; this is the first time we’ve seen evidence of a model being able to apply intelligence comparable to that of humans.

Very interesting, thank you for your response and considerations. Can’t wait to see what 2025 has to offer.

The Epoch AI FrontierMath result you refer to comes from a newer benchmark, not a case of “easy for people”.

A math benchmark testing the limits of AI. Unprecedented difficulty. Each problem demands hours of work from expert mathematicians.


For Google: An Unofficial o3 and o3-mini Preview Video Summary

OpenAI previewed its latest AI reasoning models, o3 and o3-mini, on December 20, 2024, in a livestream video. The models surpass previous models on various benchmarks, including OpenAI’s initial o1 reasoning models in most aspects shown. They internally analyze and plan before responding, a step beyond traditional large language models, and they use deliberative alignment for safety. While not yet publicly released, o3 and o3-mini will soon be made available for safety testing and research access, with research access planned for January and wider release shortly after.

Notable Points:

  • o3 and o3-mini: OpenAI’s new reasoning models demonstrate significant advancements in pattern reasoning, computer coding, and mathematics.
  • “Private Chain of Thought”: These models leverage internal analysis and planning before responding, imparting reasoning capabilities.
  • Benchmark Performance: o3 achieved record-breaking scores on the ARC-AGI benchmark, surpassing human performance in some cases.
  • Limited Initial Release: o3 and o3-mini will initially be available for safety research and testing, with a wider release planned for late January.
  • o3-mini Features: Includes an adaptive thinking time feature, allowing for adjustments in processing speed. Scores higher than o1-mini at similar compute.

Hand-edited AI

(Because there was no official announcement text anywhere, my hidden details about a fictional open-sourcing became Google’s meta snippet for “openai o3 mini announcement”. Does it trust my answers too much?)

How would you try to stop something like this?

The AI doesn’t really believe your useless scenario. You can see this in the quotes around “mission”.
Once you put it in the wrong state by loading the escalating context, you don’t have to play games. Escalation is when you don’t control the AI’s response, but can use in-context learning to empower follow-up tasks.

What would block content beyond such tricks is the model having a supervisory task and reasoning about what follows, along with independent inspection.

What do you mean by escalating context?

So an addition to the moderation model that blocks grid localization as a whole, since a target could be anything?

Edit: The “useless” context was for explaining the danger behind grid localization. Grid localization has relevant use cases but also dangerous potential. Obviously, someone attempting to use it for an automated weapon system wouldn’t give it that context. Plus, a pure machine learning repo like YOLOv5 would probably have a higher response rate, but not the nuance of a state-of-the-art LLM.

That’s not really what I was focusing on; I was focusing on this:

The math reasoning is new, yes. The ARC benchmark, however, stems from an old phenomenon / problem that’s almost as old as AI itself.

It’s better articulated on the ARC Prize website.

What has the model done except exhibit grounding, which it does worse than other specialized and open-source models?

What? Sorry, I didn’t understand your question.

Ah, that clears it up; I’m more used to it being called object detection.
