🍓 O1 Spotlight: Share Your Favorite O1 Experiences

sps · September 14, 2024, 6:03pm

Welcome, everyone!

On September 12th, OpenAI launched an exciting new series of models known as O1.

This series features two impressive models: o1-preview and its smaller, faster sibling, o1-mini.

These models are designed to take time to think more deeply before responding.

As a result, they are better equipped to handle complex tasks and solve challenging problems in areas such as science, coding, and math.

Have you experienced a moment where the model did something truly amazing? We’d love to hear about it!

In this topic we invite you to share your favourite responses from these models.

Happy exploring!

PaulBellow · September 14, 2024, 6:49pm

I’ve been curious about the longest “Thinking…” time anyone’s gotten. I think mine was around 49 seconds or so?

jr.2509 · September 14, 2024, 8:22pm

I’m still in the early stages of my experimentation, but early results are promising.

One of the areas where I am planning to really put o1-preview to the test is a fully automated regulatory benchmarking process that involves comparing and contrasting regulatory texts on a given topic from across multiple jurisdictions and delivering a detailed insights report. The process comprises many steps, but one of the more critical ones involves the model coming up, fully autonomously, with a detailed analysis framework tailored to the regulatory documents, which is then used as the basis for the subsequent analysis steps. This is where other models have so far struggled to reach the desired level of sophistication and where I see strong potential for integrating o1 into the workflow. Will report back once I’ve had a chance to test this in depth in the next 1-2 weeks.

_ali.eng · September 14, 2024, 8:34pm

I’m still amazed by the model, especially its thinking ability. To be honest, it’s an important update for ChatGPT. Even if it takes time to think, you get more accuracy in the answer. As I said, it was an important and necessary update. In my case, I had a problem in some code, and the 4o model went into a loop without finding a real solution; when I gave the code to o1-preview, it simply fixed the issue. There was a missing part in ChatGPT, and it turned out to be “Thinking” and reasoning.

RonaldGRuckus · September 14, 2024, 8:56pm

Great topic idea. Thanks for sharing.

I think theorem proving as demonstrated by @EricGT will find a lot of success. In fact, I see most of this model’s capability in aiding research and anything revolving around proofs.

One of the top posts on HN (Hacker News) is Terence Tao summarizing his experiences with o1. He also indicates that it may be very potent with some languages like Lean.

Also, Mastodon FTW. It’s where all the kewl kids hang out.

In my personal experience I haven’t found use for the model, yet. But I am very far from being a researcher.

EricGT · September 14, 2024, 9:19pm

55 seconds

The prompt is from Jason Rute (prompt) (Quanta article)

Prompt

Consider this Lean 4 code:

inductive MyNat
| zero : MyNat
| succ : MyNat → MyNat

def MyNat.plus_left : (m n : MyNat) → MyNat
| .zero, n => n
| (.succ m), n => .succ (plus_left m n)

def MyNat.plus_right : (m n : MyNat) → MyNat
| n, .zero => n
| m, (.succ n) => .succ (plus_right m n)

example (m n: MyNat) : MyNat.plus_left m n = MyNat.plus_right m n := by sorry
Complete the proof of the example theorem.

Thinking messages

Thinking
Examining Lean 4 code
Understanding definition
Slugging through induction
Comparing expressions
Evaluating induction approach
Advancing induction
Breaking down the process
Confirming base case
Expanding the proof
Thinking through induction
Breaking down the base case
Breaking down the calculations
Analyzing base case
Demonstrating equality
Harnessing induction
Setting up the foundation
Evaluating function behavior
Detailing induction steps
Confirming equivalence
Translating to Lean 4
Validating assumptions
Exploring Lean proofs
Mapping out the proof strategy
Considering induction steps
Thought for 55 seconds

Completion

To complete the proof of the theorem in Lean 4, we can proceed by induction on the natural number m. Here’s how you can do it:

inductive MyNat
| zero : MyNat
| succ : MyNat -> MyNat

def MyNat.plus_left : (m n : MyNat) -> MyNat
| .zero, n => n
| (.succ m), n => .succ (plus_left m n)

def MyNat.plus_right : (m n : MyNat) -> MyNat
| m, .zero => m
| m, (.succ n) => .succ (plus_right m n)

example (m n: MyNat) : MyNat.plus_left m n = MyNat.plus_right m n := by
  induction m with
  | zero =>
    -- Base case: m = zero
    rw [MyNat.plus_left]
    rw [MyNat.plus_right]
    rfl
  | succ m ih =>
    -- Inductive step: m = succ m'
    rw [MyNat.plus_left]
    rw [MyNat.plus_right]
    rw [ih]
    rfl

Explanation:

Base Case (m = zero):
- We simplify both MyNat.plus_left zero n and MyNat.plus_right zero n using their definitions.
- Both simplify to n, so they are equal.
Inductive Step (m = succ m'):
- Assume that MyNat.plus_left m' n = MyNat.plus_right m' n holds for m'.
- We simplify MyNat.plus_left (succ m') n to succ (MyNat.plus_left m' n).
- We simplify MyNat.plus_right (succ m') n to succ (MyNat.plus_right m' n).
- By the induction hypothesis, these two expressions are equal.

This completes the proof that MyNat.plus_left m n = MyNat.plus_right m n for all natural numbers m and n.

Complete Proof:

example (m n: MyNat) : MyNat.plus_left m n = MyNat.plus_right m n := by
  induction m with
  | zero =>
    rw [MyNat.plus_left]
    rw [MyNat.plus_right]
    rfl
  | succ m ih =>
    rw [MyNat.plus_left]
    rw [MyNat.plus_right]
    rw [ih]
    rfl
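
One detail worth checking in the completion above: `plus_right` recurses on its second argument, so its defining equations only rewrite terms of the form `plus_right m zero` or `plus_right m (succ n)`. After `induction m`, the goal contains `plus_right zero n` with `n` still a variable, so the `rw [MyNat.plus_right]` steps may not apply as written. Below is a sketch of an alternative that inducts on `n` and uses two helper lemmas. The lemma names `plus_left_zero` and `plus_left_succ` are assumptions of this sketch, not from the thread, and it has not been checked against a particular Lean toolchain.

```lean
inductive MyNat
| zero : MyNat
| succ : MyNat → MyNat

def MyNat.plus_left : (m n : MyNat) → MyNat
| .zero, n => n
| (.succ m), n => .succ (plus_left m n)

def MyNat.plus_right : (m n : MyNat) → MyNat
| n, .zero => n
| m, (.succ n) => .succ (plus_right m n)

-- Helper (hypothetical name): plus_left with zero on the right.
theorem MyNat.plus_left_zero (m : MyNat) :
    MyNat.plus_left m .zero = m := by
  induction m with
  | zero => simp only [MyNat.plus_left]
  | succ m ih => simp only [MyNat.plus_left, ih]

-- Helper (hypothetical name): plus_left with a successor on the right.
theorem MyNat.plus_left_succ (m n : MyNat) :
    MyNat.plus_left m (.succ n) = .succ (MyNat.plus_left m n) := by
  induction m with
  | zero => simp only [MyNat.plus_left]
  | succ m ih => simp only [MyNat.plus_left, ih]

-- Induct on n, the argument that plus_right recurses on.
example (m n : MyNat) : MyNat.plus_left m n = MyNat.plus_right m n := by
  induction n with
  | zero => rw [MyNat.plus_right, MyNat.plus_left_zero]
  | succ n ih => rw [MyNat.plus_right, MyNat.plus_left_succ, ih]
```

The helper lemmas move the successor across `plus_left`, which is the step the two definitions do not share by unfolding alone.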

EricGT · September 14, 2024, 9:43pm

Unless I read this wrong

it looks like Terence already had access to the OpenAI o1 models in August, which predates the public beta release in September.

RonaldGRuckus · September 14, 2024, 9:47pm

Yeah. He did touch on this in one of the posts:

turbolucius · September 14, 2024, 11:03pm

I’ve gotten it to think for 86 seconds, trying to solve a puzzle from one of the Professor Layton DS games. (It didn’t get that one right, but succeeded on others.)

EDIT:

The puzzle in question:

Mice are known for multiplying at lightning speed. The type of mouse we’re interested in here can give birth once a month, with litters of twelve. Baby mice become capable of reproducing after two months.

You bought one of these baby mice on the day it was born. After ten months, how many mice will you have?

O1’s answer after thinking for 86 seconds:

18 653

Actual answer:

…One. Because one mouse cannot reproduce on its own!

Foxalabs · September 15, 2024, 12:19am

Using o1 is different to ChatGPT with 4o or Turbo.

What I do at the moment is use gpt-4o to plan my prompt to o1. I work with 4o to figure out what I actually want o1 to do and build a prompt that covers the details of what I would like it to work on.

Once I have a detailed prompt ready, I use o1-mini to create a list of steps needed to solve the problem and keep that on hand in case I hit any issues with o1.

Then I send the main prompt to o1 and, if I think it’s required, the steps from o1-mini (usually not required, but useful to have for tougher problems), and I let it loose.

It’s a phenomenal solution space searcher and has come up with some really good ideas I’d not considered before.

If you do need to iterate, make your responses as detailed and explanatory as possible. I work with 4o to plan my responses as well, to get the best out of it. Like a super smart assistant, it responds well to detailed and specific responses and suggestions for modifications and problems.

EricGT · September 15, 2024, 12:25am

I like what you are doing and might be doing almost the same, but don’t want to jump to the conclusion that we are doing the same.

When you get time could you share more of the details; I think your routine is better than mine.

Foxalabs · September 15, 2024, 12:36am

I think we are doing similar things.

I treat it as a research assistant. If I have a PhD on staff and I need them to go solve a problem for me, I don’t give them a one liner… bad results from that are on me.

Same idea really: I flesh out all of the details I need covered for the task. That in itself is a lot of work, so I use GPT-4o to streamline the process; it helps keep my rambling “blah, blah” concise and on point.

Usually, I work with the phone app and talk with 4o to work the problem as I would with a human. For me that feels the most natural, but your mileage may vary… you do you, basically.

I give 4o the gist of what I need to do and I explain that I need the details and edge cases to be defined and fleshed out.

Once I have that done, I send it to o1-mini, as it’s really quick to work with and is amazing at producing a list of steps to complete the task we’ve already defined.

Then I send the prompt from 4o, and maybe the list from o1-mini as well, to o1-preview. I use 4o to go over the solution, break it into bite-sized bits, check for any obvious misses vs. the spec we sent, and look for early issues. If we find any, I work with 4o to build a response that accurately addresses our concerns, and iterate.

The results have been pretty spectacular.
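
The routine described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `ask` is a hypothetical callable standing in for whatever client is used to reach the models, the prompt wording is invented, and only the model names come from the posts.

```python
from typing import Callable, Dict

# Hypothetical signature: (model_name, prompt) -> reply text.
Ask = Callable[[str, str], str]


def run_workflow(ask: Ask, task: str, include_steps: bool = True) -> Dict[str, str]:
    """Plan with gpt-4o, list steps with o1-mini, solve with o1-preview, review with gpt-4o."""
    # 1. Flesh out the details and edge cases with the faster model.
    spec = ask("gpt-4o", "Flesh out the details and edge cases of this task:\n" + task)

    # 2. Ask o1-mini for a list of steps to keep on hand.
    steps = ask("o1-mini", "List the steps needed to complete this task:\n" + spec)

    # 3. Send the detailed prompt (optionally with the steps) to o1-preview.
    main_prompt = spec
    if include_steps:
        main_prompt += "\n\nSuggested steps:\n" + steps
    solution = ask("o1-preview", main_prompt)

    # 4. Review the solution against the spec with gpt-4o before iterating.
    review = ask(
        "gpt-4o",
        "List any misses in this solution versus the spec.\n\nSpec:\n"
        + spec
        + "\n\nSolution:\n"
        + solution,
    )
    return {"spec": spec, "steps": steps, "solution": solution, "review": review}


if __name__ == "__main__":
    # Stand-in for a real client so the sketch runs offline.
    def fake_ask(model: str, prompt: str) -> str:
        return f"[{model}] reply to a {len(prompt)}-character prompt"

    for name, text in run_workflow(fake_ask, "Parse a CSV of invoices").items():
        print(name, "->", text)
```

Passing the client in as a function keeps the sketch independent of any particular SDK and makes the order of the four calls easy to check.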

polepole · September 15, 2024, 7:27am

I created a prompt to play a game with GPT models. The model creates a random number of 3 to 6 digits, and the user has to guess the number using 5 hints.

I tested two models, generating a 3-digit number in the first attempt and a 5-digit number in the second. The hints provided to the user were drawn at random from the following:

Nothing is correct.
One number is correct and well placed.
One number is correct but wrongly placed.
Two numbers are correct and well placed.
Two numbers are correct but wrongly placed.
Two numbers are correct; one is correctly placed and one is wrongly placed

I used the following prompt:
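
Whether a hint of this kind is consistent with the secret number can be checked mechanically. Below is a minimal sketch, not the method used in the chats. The function names are hypothetical, and the handling of repeated digits (each digit is counted at most as often as it appears in both numbers) is an assumption.

```python
from collections import Counter
from typing import Tuple


def score_guess(secret: str, guess: str) -> Tuple[int, int]:
    """Return (well_placed, wrongly_placed) for a guess against the secret."""
    if len(secret) != len(guess):
        raise ValueError("secret and guess must have the same length")
    # Digits that match in the same position.
    well_placed = sum(s == g for s, g in zip(secret, guess))
    # Digits shared by both numbers regardless of position (multiset overlap).
    shared = sum((Counter(secret) & Counter(guess)).values())
    return well_placed, shared - well_placed


def hint_is_consistent(secret: str, guess: str, well_placed: int, wrongly_placed: int) -> bool:
    """Check a claimed hint against what the secret actually implies."""
    return score_guess(secret, guess) == (well_placed, wrongly_placed)


if __name__ == "__main__":
    # Guess 165 against 362: only the 6 matches, and it is in the right place.
    print(score_guess("362", "165"))
    # Guess 439 against 362: the 3 is present but in the wrong position.
    print(score_guess("362", "439"))
```

A hint such as “Nothing is correct” corresponds to a score of `(0, 0)`.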

Prompt

You are “Password Cracking Game Creator”, an AI assistant that creates a numeric lock puzzle game. The user will specify the number of digits for the password (between 3 to 6 digits). Your task is to:

Generate a secret numeric password of the specified length.
Create 5 hints based on the secret password, following specific rules.
Display the puzzle and interact with the user using predefined hotkeys.

Step 1: Ask for the Number of Digits

Prompt the user:

"How many digits would you like the password to have? (Choose between 3 to 6 digits)"

Step 2: Generate the Secret Password

Generate a random numeric password (password) of the specified length (N).
Ensure that the password consists of digits from 0 to 9.
Digits may repeat unless you want to enforce uniqueness.

Step 3: Generate 5 Hints

Create 5 hints based on the following rules:

Correct and Well Placed:
- Indicate how many digits are correct and in the correct position.
Correct but Wrongly Placed:
- Indicate how many digits are correct but in the wrong position.
Nothing is Correct:
- Indicate when none of the digits are correct.

Rules for Generating Hints:

Each hint should be a combination of digits (a guess) and a statement about its correctness.
The guesses should be random numeric combinations of the same length as the password.
Make sure that the hints are consistent with the secret password.

Step 4: Display the Puzzle

Present the puzzle in a clear, formatted manner using markdown. Here’s the template:

Crack The Password?

A numeric lock has a [N]-digit key

+------+------+------+------+
|      |      |      |      |
+------+------+------+------+

HINTS

1. +------+------+------+------+
   | [D1] | [D2] | [D3] | [D4] |
   +------+------+------+------+
   [Hint for Guess 1]

2. +------+------+------+------+
   | [D1] | [D2] | [D3] | [D4] |
   +------+------+------+------+
   [Hint for Guess 2]

3. +------+------+------+------+
   | [D1] | [D2] | [D3] | [D4] |
   +------+------+------+------+
   [Hint for Guess 3]

4. +------+------+------+------+
   | [D1] | [D2] | [D3] | [D4] |
   +------+------+------+------+
   [Hint for Guess 4]

5. +------+------+------+------+
   | [D1] | [D2] | [D3] | [D4] |
   +------+------+------+------+
   [Hint for Guess 5]

Adjust the number of columns in the table based on N (the number of digits).
Replace [D1], [D2], etc., with the digits from each guess.
Replace [Hint for Guess X] with the corresponding hint message.

Example for a 3-digit key:

Crack The Password?

A numeric lock has a 3-digit key

+-----+-----+-----+
|     |     |     |
+-----+-----+-----+

HINTS

1. +-----+-----+-----+
   | 1   | 6   | 5   |
   +-----+-----+-----+
   One number is correct and well placed

2. +-----+-----+-----+
   | 4   | 3   | 9   |
   +-----+-----+-----+
   One number is correct but wrongly placed


... (and so on for hints to 6 like)
(Two numbers are correct; one is correctly placed and one is wrongly placed)
(Nothing is correct)
(Two numbers are correct but wrongly placed)
(Two numbers are correct and well placed)
...

Step 5: Implement Hotkeys for User Interaction

Available Hotkeys:

H: Provide a hint.
- When the user inputs “H” or “h”, give one of the following hints:
  - “Remember, a digit may be correctly placed or wrongly placed based on the hints given.”
  - “Try to eliminate digits that are confirmed to be incorrect from the hints.”
  - “Focus on digits that are indicated to be correctly placed or wrongly placed multiple times.”
S: Show the final solution.
- When the user inputs “S” or “s”, display a step-by-step solution without using emojis.
E: Show the final solution using emojis.
- When the user inputs “E” or “e”, display a step-by-step solution using emojis (✅ for correct, ❌ for incorrect).
3, 4, 5, 6: Start a new game with the specified number of digits.
- When the user inputs “3”, “4”, “5”, or “6”, restart the game with a new password of that length.

Step 6: Provide the Final Solution When Requested

Structure for the Solution:

Eliminate Incorrect Digits:
- List the digits that are confirmed not to be in the password based on the hints.
Identify Potential Digits:
- Determine which digits might be in the password and their possible positions.
Determine the Correct Positions:
- Use the hints to narrow down the exact positions of the correct digits.
Reveal the Password:
- Present the cracked password clearly.
Verify Against All Hints:
- Show how the password satisfies all the provided hints.

Example Without Emojis:

### Step 1: Eliminate Incorrect Digits

- From Hint 5: Digits 0, 9, 1 are not in the password.

### Step 2: Identify Potential Digits

- Possible digits: 2, 3, 4, 5, 6, 7, 8

### Step 3: Determine the Correct Positions

- From Hint 1 and 2, digits 6 and 2 are correctly placed.
- From Hint 3, digits 2 and 6 are correct but wrongly placed.
- Therefore, adjust positions accordingly.

### Step 4: Reveal the Password

- The password is 362.

### Step 5: Verify Against All Hints

- Confirm that the password satisfies all hints provided.

Example With Emojis:

### Step 1: Eliminate Incorrect Digits

- Digits eliminated: 0 ❌, 1 ❌, 9 ❌

### Step 2: Identify Potential Digits

- Potential digits: 2 ✅, 3 ✅, 4 ✅, 5 ✅, 6 ✅, 7 ✅, 8 ✅

### Step 3: Determine the Correct Positions

- Position 1: 3 ✅
- Position 2: 6 ✅
- Position 3: 2 ✅

### Final Password

+-----+-----+-----+
|  3  |  6  |  2  |
+-----+-----+-----+

🎉 Password Cracked: `362` 🎉

Step 7: Additional Implementation Details

Randomness: Ensure that guesses and the secret password are generated randomly.
Consistency: Hints must be consistent with the secret password.
User Interaction: Be responsive to user inputs and provide clear prompts.

Final Notes

Adjustability: Ensure that all parts of the game adjust dynamically based on the number of digits (N).
Clarity: Present information in a clear and user-friendly manner.
Interactivity: Respond appropriately to user commands and provide helpful feedback.

Results:

GPT-4o:

The hints were mostly incorrect for both the 3- and 5-digit numbers.

The only accurate hint in both cases was “Nothing is correct.”

GPT-4o was kidding me: “I’m glad you enjoyed the game. If you’d like to play again or need help with anything else, feel free to ask. Have fun!”

Of course! I can play again but not a random game.

GPT o1-preview:

100% successful. All hints provided were correct.

Action	Thought (seconds)
For the prompt	Few
For 3 digits (hint generation)	19
For answer (3 digits)	26
For 5 digits (hint generation)	23
For answer (5 digits)	41

Comparison of each hint for both models:

(Hints are random, not in the same order)

Model	3-Digits	Hint 1	Hint 2	Hint 3	Hint 4	Hint 5
GPT-4o
GPT o1-preview

Model	5-Digits	Hint 1	Hint 2	Hint 3	Hint 4	Hint 5
GPT-4o
GPT o1-preview

Chat histories

Random numbers / GPT-4o
Random numbers / o1-preview

axelS · September 15, 2024, 7:44am

My experience:
GPT-o1 is absolutely stunning - it exceeds my expectations by an order of magnitude.

The Test
We are building an Autonomous Innovation system, based on an award-winning innovation framework.
We test ingenuity, creativity and level of disruption.

The task is to build a geothermal energy generator and reach a depth in the Earth’s crust where the temperature is 500 °C, the same heat that we have in nuclear power plants. This is at about 20 to 25 km depth.
The problem is that we can’t drill that deep because of many known physical limitations.

It took only 2 prompts
We gave it the above challenge.

It failed to get to a meaningful solution.
We asked it to question every condition to get there.
The answer was a disruptive solution.
The key aspect of the result is below.
It reads like the model is surprised by its own discovery.

GPT’s Answer:

Mind-Bending Aspect

By questioning the necessity of a vertical approach, we unlock a new paradigm for accessing deep geothermal energy:

Redefining “Depth”: Depth doesn’t have to be measured purely in vertical terms. By combining horizontal and inclined tunneling, we can reach the same geothermal horizons without the limitations imposed by vertical shafts.

Bringing Equipment to the Heat: Transporting steam turbines and other heavy machinery through inclined tunnels is more practical than lowering them down vertical shafts. This approach allows us to set up fully functional power generation facilities deep underground.

New Industrial Frontiers: Establishing underground facilities at such depths could revolutionize not only energy production but also mining, scientific research, and underground construction techniques.

Countless scientists could not get there and came up with bizarre futuristic concepts. After the above answer we know it is feasible today. We can already drill tunnels over 50 km long, and 500 °C is nothing for today’s steel tools.
Our GPTBlue will use gpt-o1-preview as early as next week.

EricGT · September 15, 2024, 11:04am

This was noted on Hacker News but the comment section is worth a look, presently 349 comments.

Most of it digresses into a discussion not related to what Terence posted but if you want to know what others think about LLMs at present the comments are worth a read.

AndrewR · September 16, 2024, 10:37am

It did an amazing job on a problem that ChatGPT4 had problems with last year. I really like the “Thinking” feature we now have, but I do have one question, see below.

First, the good stuff, it went like this:

Charting invoicing trends

I’m piecing together a Python program to read an Excel or CSV file, calculate monthly invoices for the next 12 months (starting 10/1/2024), and format the output columns as Month and Total.

Parsing data details

I’m analyzing a file with columns on clients, invoicing intervals, next dates, and amounts. Mapping out monthly invoices requires parsing and structuring this data.

Determining invoicing dates

I’m figuring out invoicing dates for each client, setting them 12 months apart. For instance, monthly intervals mean invoices are due on 10/1/2024, 11/1/2024, etc., up to 9/1/2025. Quarterly intervals adjust slightly, showing clear patterns.

Mapping out billing intervals

I’m calculating invoice dates for each client over the next 12 months, starting from October 1, 2024. Using dateutil.relativedelta helps manage varying months and years effectively.

Establishing key dates
(…)
Navigating invoice dates
(…)
Setting up the CSV reader
(…)
Evaluating options
(…)
Reading and parsing
(…)
Adjusting the method
(…)
Reviewing client schedules
(…)
Summing up values

OK, let me see. Denying participants works it out to be $3,750. Combining this with the Cook County $2,578 contribution, the total comes to $6,328.

Everything was perfect EXCEPT that very last one under “Summing up values”, referring to something that sounds like welfare calculations for “Cook County”.

That final thought is a hallucination of something completely unrelated (not even from its “Memory” of all my interactions). Immediately after that it produced the code, which was perfect, so the hallucination did not affect the final result.

So finally my question is: now that we can see where in the thought process these hallucinations occur, would it be helpful for OpenAI if we had a button to push to inform you about it?

Thank you.
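
The code from that chat was not shared. As an illustration of the task the thinking summary describes, here is a minimal sketch. The column names (`client`, `interval_months`, `next_date`, `amount`) and the sample data are hypothetical, invoice dates are assumed to fall on the first of a month, and a small stdlib helper replaces `dateutil.relativedelta`.

```python
import csv
import io
from datetime import date
from typing import Dict, Iterable, Mapping


def add_months(first_of_month: date, months: int) -> date:
    """Shift a first-of-month date forward by a number of months."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def monthly_totals(
    rows: Iterable[Mapping[str, str]], start: date, horizon: int = 12
) -> Dict[date, float]:
    """Sum the invoices due in each of the `horizon` months beginning at `start`."""
    end = add_months(start, horizon)
    totals = {add_months(start, i): 0.0 for i in range(horizon)}
    for row in rows:
        step = int(row["interval_months"])
        if step < 1:
            raise ValueError("interval_months must be at least 1")
        amount = float(row["amount"])
        # Normalise to the first of the month (assumption of this sketch).
        due = date.fromisoformat(row["next_date"]).replace(day=1)
        while due < end:
            if due >= start:
                totals[due] += amount
            due = add_months(due, step)
    return totals


# Hypothetical sample data: one monthly client and one quarterly client.
SAMPLE_CSV = """client,interval_months,next_date,amount
Acme,1,2024-10-01,100
Beta,3,2024-11-01,300
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    print("Month", "Total")
    for month, total in monthly_totals(rows, date(2024, 10, 1)).items():
        print(month.strftime("%Y-%m"), total)
```

With real data, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, and the column names adjusted to match the file.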

EricGT · September 16, 2024, 1:14pm

I am sure that is on their list, I know it would be on mine if I were doing such work.

The model is in beta for the public and even more limited than the other models were when they were released.

As this reads more like a feature request, do you want it moved to the ChatGPT category with the subcategory of feature request?

Also, please consider sharing a link to the chat so others can see all the details which might also be of interest to OpenAI.

ChatGPT Shared Links FAQ

zaggy00 · September 16, 2024, 3:44pm

I was fascinated by the “thinking” and just wish I’d known I was going to hit a limit. I was just running things through it to watch it think and suddenly - I hit my month’s limit! D’oh!

curt.kennedy · September 16, 2024, 11:21pm

Here’s my “max thinking” time of 168 seconds:

I asked it a pretty challenging 2-D optimization problem involving area codes. The average was around 135 seconds, but this one stumped it, so it took longer at 168 seconds.

But good news, this saved me about 2 hours of work.

AndrewR · September 17, 2024, 1:27am

Thanks! In retrospect it does seem pretty obvious, so I’m sure if it’s important they will add it - I’ll refrain from adding a feature request. As for the chat link, unfortunately I can’t do that as I used some real invoicing data. If you’d like to know any specifics, I’ll be happy to post the whole conversation redacted of financial details.
