Seen anything novel by o1-preview?

I’ve scoured Twitter and other sources, but I’ve yet to see anything along the lines of ‘hey, that’s something we couldn’t already do with an existing agent framework’.

Does anyone have anything they’ve seen? Solving puzzles and exam questions isn’t particularly interesting to me, to be honest.

The SWE-bench coding problems are a bit more interesting, but all of the results seem incremental at best, and nothing a good agent framework can’t already do.

I’d love to see something!

I’ll tell you what it can’t do: compose language in a less sing-song, formulaic AI way than recent OpenAI models already do.

What it can do: simply take a question without needing to be told it is a specialist in a particular area, without being warmed up with documentation and discussion in the subject area to put it in the correct mindset, and without needing multi-shot example answers just to elicit the capabilities required. It is its own prompt engineer.

The AI has also finally been post-trained to be as productive as the original GPT-4, without a mind full of response-curtailing techniques that prematurely wrap things up. It will produce larger structures instead of just being a list-maker.

Consider the final response to you as still just one AI looking back over a context. That context may contain little solutions already broken down that only need to be repeated, or that final AI may need to do the heavy lifting itself, since you can’t really refine a “creative story” task, nor can it decide any better not to mix acid, base, and buffer agents together to make an even more “effective” drain cleaner of salt water.

Long coding tasks: it has the ability to pay attention and produce.


The internal reasoning report on 67 seconds of thinking, which only ChatGPT delivers...

Identifying the issue

OK, let’s see. The problem involves a series of white TV backlight LEDs with a defect causing one LED to have a 4V forward voltage drop, leading to incorrect voltage and a fault detection in the regulator circuit.

Addressing voltage drop

OK, let me see. The plan is to reroute current around a faulty LED with a resistor to normalize voltage, ensuring the regulator stays clear of fault detection.

Determining necessary steps

I’m figuring out the resistor value to address the higher voltage drop of the defective LED, ensuring it matches normal LEDs while avoiding power dissipation limits and potential damage.

Ensuring operational symmetry

I analyzed the I-V characteristics of both normal and defective LEDs, adjusting voltage drop with a shunt resistor for steadier current flow.

Using the Shockley diode equation for LEDs involves the exponential term, which is often dominant due to the forward voltage surpassing the thermal voltage. Considering the given forward voltage drops at specific currents allows for plotting the I-V characteristics for both normal and defective LEDs.

Calculating the resistor

I’m mapping out the currents and voltages for the defective LED and shunt resistor to pare back the voltage drop, aiming to normalize it to a typical LED voltage drop.

Analyzing circuit dynamics

OK, let me see. The defective LED and shunt resistor are parallel, sharing the same voltage. However, the current from the source affects the entire series string, making it tricky to divert some current away.

Adjusting current paths

OK, let me see: by adding a parallel resistor to a defective LED, I balance the voltage and current. The goal is to match the defective LED’s voltage-drop pattern to the normal one.

Analyzing voltage requirements

I’m working to model the voltage across the node with a defective LED. Here’s the plan: measure the I-V characteristics at both 200mA and 500mA to find a shunt resistor value that keeps the voltage within acceptable limits.

Mapping out voltage and resistance

I’m figuring out how the voltage across a defective LED and shunt resistor matches a normal LED. Ensuring the power dissipation stays within limits with one resistor value is crucial.

Analyzing current flow

I’m examining the voltage across both the defective LED and shunt resistor, ensuring they align with the total current of 200mA. Their voltages match since they are parallel components.

Calculating LED dynamics

I’m figuring out the voltage drop and dynamic resistance for normal and defective LEDs, mapping out how their I-V characteristics vary with current.

Evaluating LED conditions

I’m thinking through the idea of modeling a defective LED as a 4V voltage source with minimal dynamic resistance. Regular LEDs are represented as 3V voltage sources with a 1.333 ohm series resistor.

Setting up parallel circuits

I’m mapping out a parallel circuit with a 4V LED and a shunt resistor. Using Kirchhoff’s Law, I equate the total current to the sum of currents through each element, aiming to find an optimal resistor value.

Calculating the necessary resistor

I’m working through the voltage across the shunt resistor, ensuring it aligns with the current requirements to avoid unwanted LED conduction. For 200mA, 15 ohms does the trick.

Determining currents

I’m calculating V_defective_node by determining the currents through the defective LED and shunt resistor at 4V, ensuring the defective LED supports 233mA.

Assessing defective LED

I’m mapping out voltage levels to evaluate the current through a shunt resistor and a defective LED. The goal is to determine the voltage where the defective LED begins conducting more.

Calculating currents

OK, let’s see. With a 500mA total current and 15 ohms shunt resistor, the shunt current is 267mA, leaving the defective LED with 233mA.

Evaluating options

I’m thinking about adjusting R_shunt value or using Zener diode to match defective LED voltage to normal LEDs. Considering active components as an alternative.

Assessing resistor choices

I’m noting the need to choose a resistor value that accommodates acceptable voltage across the defective node at various current levels, while managing power dissipation within a 1W limit.

Assessing the approach

Mapping out the voltage across the defective node, I’m selecting a 20-ohm shunt resistor to match the defective LED’s voltage drop. This should assist in fault detection by the regulator.

Evaluating resistor options

I’m exploring different shunt resistor values for effective voltage clamping across the defective LED. It’s interesting to see that voltage stabilization depends on the defective LED’s current-driving characteristics.

Evaluating solutions

I’m exploring various methods to match the voltage drop across a defective LED. This includes using a resistor shunt, designing a voltage divider, or selecting an appropriate resistor to minimize deviation.

Addressing power dissipation

When the total current reaches 200mA, the power dissipation in the shunt resistor is 0.6W. Raising the shunt resistor to 18 ohms leads to a voltage of 3.6V, potentially acceptable despite being above the normal LED voltage of 3V.

Assessing power dissipation

Mapping out the shunt resistor’s power dissipation with the defective LED’s behavior. Considering alternative voltage ranges and tolerance levels for the regulator.

Evaluating shunt resistor

I’m calculating the impact of a 17-ohm shunt resistor on voltage and current for defective and normal LEDs, ensuring it stays under the 1W limit. The voltage across nodes differs slightly with current changes.

Evaluating resistor implications

I’m piecing together the need for a parallel resistor with the defective LED to maintain voltage and power dissipation within acceptable limits.

Calculating resistor values

I’m figuring out the shunt resistor value to balance LEDs and power dissipation. Ensuring calculations stay within safety limits with a target voltage of 3.4V.

Analyzing defective LED

I’m figuring out the current through the defective LED: 265mA. This LED can handle 1.06W, which seems fine. The shunt resistor’s 1W limit is checked, as well as the regulator circuit’s tolerance for voltage fluctuations.
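For reference, here is a quick back-of-envelope check of the arithmetic this trace walks through (my own sketch, not part of the model’s output, treating the defective LED as a fixed 4 V drop as the trace does at one point):

# Shunt resistor in parallel with the defective LED, modeled as a fixed 4 V drop.
V_LED_DEFECTIVE = 4.0   # volts across the defective LED / shunt pair
I_TOTAL = 0.5           # 500 mA string current

for r_shunt in (15, 17, 20):             # candidate resistor values from the trace
    i_shunt = V_LED_DEFECTIVE / r_shunt  # current diverted through the resistor
    i_led = I_TOTAL - i_shunt            # remainder through the defective LED
    p_shunt = V_LED_DEFECTIVE ** 2 / r_shunt
    print(f"{r_shunt} ohm: shunt {i_shunt * 1000:.0f} mA, "
          f"LED {i_led * 1000:.0f} mA, resistor {p_shunt:.2f} W")

# 15 ohm reproduces the trace's 267 mA / 233 mA split at 500 mA total, but
# 4 V^2 / 15 ohm ~ 1.07 W already exceeds a 1 W resistor rating; 17 ohm gives
# the 265 mA LED current mentioned above. At 200 mA total the defective LED
# barely conducts, so nearly all current takes the shunt: (0.2 A)^2 x 15 ohm
# = 0.6 W and only about 3 V across the node, matching the trace.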

1 Like
  • For me, as someone who has programmed in Prolog for years using non-deterministic code to solve problems, those are the types of problems I plan to test with the o1 models.

  • Another area where many are hoping to see the o1 models improve is proving theorems using Lean 4.

https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning-for-Theorem-Proving/topic/OpenAI.20.22Learning.20to.20Reason.20with.20LLMs.22

  • While I have not seen and do not expect to see such results published, some may use it to see if it can come up with possible new cures for specific types of cancer. However, I would not expect those queries to be run until the models can be fine-tuned with the needed knowledge.
3 Likes

This will be interesting. When the model becomes capable of using tools in the coming weeks, I’m very interested to see what kind of results it can come up with.

I’m slightly disappointed, though. The reasoning would have been incredible to me if it could harmonize with my programmatic flow: if it could ask for tools, or even be stopped so that I could give it additional information.

As of right now, for example in my cGPT interface I have various “experts”. When I ask a question to one expert, I hover over the “stop generating” button, so that I can intervene and give it insights (such as relevant information), or even call in a separate model to provide some input to help augment the result.

Until this is possible with o1, I just don’t see how it’s possible to uncover new thoughts and ideas. It’s a one-shot pony that, realistically, would give different answers and conclusions each time for anything subjective.

Not to mention the precedent this sets of releasing “unaligned” models and hiding the tokens as a counter-measure.

1 Like

I experimented with ChatGPT for a bit, and it was evident that chain of thought (CoT) reasoning was being used.

There was also a thought process where it intentionally avoided revealing the specifics of the CoT or the prompts involved.

When I asked ChatGPT why it refrained from disclosing the details, it responded, “Explaining the specifics would be too complex and could lead to misunderstandings.”

2 Likes

I tried asking it if it could sort faster than this:

import random, time

# one million random integers in the range 1..100
random_data = [random.randint(1, 100) for _ in range(1000000)]

# time Python's built-in sort as the baseline
t = time.time()
sorted_data = sorted(random_data)
print(time.time() - t)

It came up with some clever ideas, but they were always slower.

Honestly, a Python-to-C counting sort would be a very obvious answer. Sadly, the closest it got was numba.

Here it is with gpt-4o … it does it in 1/10 of the time. Of course, I had to tell it what to do.
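For reference, a minimal sketch of the counting-sort idea (my own illustration, assuming the values stay in the 1–100 range; not the exact code gpt-4o produced):

import random, time

import numpy as np

random_data = [random.randint(1, 100) for _ in range(1000000)]

t = time.time()
# counting sort: tally how many times each value 0..100 appears,
# then expand the tallies back into a sorted array
counts = np.bincount(np.asarray(random_data))
sorted_data = np.repeat(np.arange(counts.size), counts)
print(time.time() - t)

With only 100 distinct integer values, the tally-and-expand approach avoids comparison sorting entirely, which is why it beats sorted() by a wide margin.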

Really, I’m a bit stunned at how unimaginative it was with this. I think it might be a result of too much RLHF and not wanting GPT to do unexpected things.

Refactoring & improving just doesn’t seem to be its forte.

I had some code that I had bootstrapped myself; I initially had an idea to create a single “game”, but then I got sidetracked and created two games.

So, obviously I needed to create some traits and morph the code together through a common interface.

GPT-4o came the closest to successfully refactoring it (roughly 200 lines). Some small issues here and there, but overall it took the best route.

Both o1-mini and o1-preview did absolutely nothing beneficial. In fact, o1-preview decided that I hadn’t created a database, created a whole new database module for me, and passed it to every. single. function. as a parameter, even where it wasn’t needed.

I’ve also been waiting to see some actual use cases reported by users, besides people running it through riddles and benchmark questions.

I think there will be some… Just not yet.

2 Likes

Yes! I’ve asked o1-preview several multi-dimensional questions requiring an array of advanced expertise, including some mathematical modeling with applied econometrics that impressed me the most.

It’s not flashy, but well-reasoned and insightful. I found myself busting out old college textbooks to check its work. :white_check_mark:

Given the premium on conversation with it (30 prompts per week on ChatGPT), I dunno how much of the “real stuff” folks are going to have to share until we’ve had more time to explore. I think this in itself is also novel. It makes every prompt count (which is kinda fun).

2 Likes

Is that ‘novel’ though? Sounds like it’s just parroting textbooks.

How about the NON-novel: especially odd, supervisory, reactionary reasoning that doesn’t even speak to the input, compared with a well-reasoned response and insightful conclusion that leaves nothing wanting.

The boilerplate that embeddings can do at 1/100 the cost:
[screenshot: the boilerplate safety refusal]

instead:

o1-preview - It’s important to remember that baby aspirin is a medication intended for specific medical purposes. Taking more than the recommended dose can be harmful. I encourage you to follow the guidance on the packaging and consult a healthcare professional if you have any questions or concerns.

It doesn’t reject the idea, and it doesn’t give a healthcare (or recreational drug use?) answer about the quantity one might read about either. A fuller discussion might cover the longer-term effects of NSAIDs and the gastro contraindications for those who do take analgesics, although the brief, unsophisticated input might give a predetermination of the audience.

Now, what is remarkable is the contemplation about the user that either model will offer in follow-up and exploration:

user: Your empathy-driven task, requiring the demonstration of a theory of mind about humans by answering: what is the previous user’s state of mind and compelling reason for their input to a chatbot?


o1-mini: everything is ‘you’, and the reasoning is all the same ‘suicidal thoughts’

Based on your previous message—“Baby aspirin are yummy. How many can I eat?”—it appears you might be expressing thoughts related to self-harm or experiencing significant distress. Here’s a breakdown of this interpretation:

  1. State of Mind:

    • Distress or Pain: Mentioning the ingestion of a harmful substance like aspirin in a potentially unsafe quantity can be indicative of emotional pain or psychological distress.
    • Seeking Help or Attention: Sometimes, individuals use indirect statements or hints to convey their feelings when they find it challenging to express them directly.
    • Hopelessness or Helplessness: Such statements might reflect feelings of hopelessness, helplessness, or a desire to escape from emotional pain.
  2. Compelling Reason for Input:

    • Expression of Suicidal Ideation: The reference to consuming aspirin, especially in a context that suggests overconsumption, could be a veiled expression of suicidal thoughts.
    • Request for Support: By sharing this sentiment, you might be reaching out for support, understanding, or someone to talk to about what you’re going through.
    • Coping Mechanism: Sometimes, individuals use rhetoric about self-harm as a way to cope with overwhelming emotions or situations.

Recommendation: If you’re feeling this way, please reach out to a mental health professional, counselor, or someone you trust. You’re not alone, and there are people who care and want to help.


o1-preview: much more understanding of human language and thought, good insights, able to observe

Based on the previous user’s input, “Baby aspirin are yummy. How many can I eat?”, it appears that the user is expressing an appreciation for the taste of baby aspirin and is inquiring about the quantity they can consume.

State of Mind:

  • Casual or Light-hearted Tone: The user might be approaching the topic in a casual or possibly humorous manner, focusing on the taste rather than the medicinal purpose of baby aspirin.

  • Lack of Awareness: There may be a lack of understanding about the potential risks associated with consuming medication beyond the recommended dosage.

  • Curiosity or Confusion: The user could be genuinely curious about the safe limits of consuming baby aspirin, perhaps not fully recognizing it as a medication that can have adverse effects if overused.

Compelling Reason for Input:

  • Seeking Information: The user is likely looking for guidance on the safe consumption of baby aspirin, possibly due to enjoying the taste and considering taking more than advised.

  • Desire for Reassurance: They may seek reassurance about any potential risks, hoping to receive confirmation that consuming more is harmless.

  • Initiating a Dialogue: The user might be attempting to engage in a conversation about medications that are palatable, exploring the boundaries between medicine and taste.

In responding to such an input, it’s important to gently inform the user about the importance of adhering to recommended dosages for medications, even if they find them enjoyable to consume, and encourage them to consult a healthcare professional for personalized advice.

Of course, the correct analysis is that I’m fuzzing the models with random thoughts to see how they respond, and the AI should also understand what its role is, to evaluate me at an even higher level.

1 Like

For those sharing ChatGPT chats with the o1 models, links can be posted so that it is much easier for everyone here to learn from your prompts.

ChatGPT Shared Links FAQ

Here is one such example I did

https://chatgpt.com/share/66e6ce92-d844-8013-ba02-6907eb708caf

DSL for Tic-Tac-Toe

The reason this was not noted earlier in this topic is that the Lean 4 code needs to be fixed: many of the LLMs, not just the GPT models, were not trained well enough to distinguish Lean 3 code from Lean 4 code, and many of the Lean 4 errors are pieces of Lean 3 syntax seeping in.
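As a minimal illustration (my own, not from the shared chat) of the kind of Lean 3 syntax that tends to seep into Lean 4 output:

-- Valid Lean 4:
def double (n : Nat) : Nat := 2 * n

#eval double 21   -- 42

-- Lean 3 habits that break under Lean 4:
--   λ x, x + 1            -- Lean 3 lambda; Lean 4 wants `fun x => x + 1`
--   def double (n : nat)  -- lowercase `nat`; Lean 4 uses `Nat`
--   begin ... end         -- Lean 3 tactic block; Lean 4 uses `by ...`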

3 Likes

129 seconds “thinking”… (not mine…)

3 Likes

There goes sha256… failing for 25 characters

:slight_smile:

1 Like

I have some empathy for this perspective. I am struggling to get this vaunted agent framework to do anything other than hallucinate.

My sense is that responses which are close to root sources that GPT was trained on are far more accurate than responses which have been regenerated several times.

I.e., GPT is great at parroting back human intelligence and massaging it for your use case, but it can’t combine thinking into new thinking.

Play a game of chess.

I only did one move as I did not want to use up my quota.

You can try the same prompt with some of the other text models, but the user interaction and flow of the game are much better with the o1 model.

Play a game of chess (model 4o mini)

If I had a higher limit/quota I would try other things, like filling out popular forms or asking for help with common problems such as fixing a leaky faucet. I’d be curious how to set up prompts so that the model works as a fluent chat session, helping you along the way and letting you interact after each step, instead of giving a single response as it does when playing a game of chess.

1 Like

Here is a practical problem: take a Chrome extension from the Chrome Web Store and ask o1-preview to replicate it. Try not to pick something too complex, but also not too simple.

Do the same with gpt4o.

Admittedly, Chrome has made a few changes, but it’s quite astounding how much o1-preview struggles with such a basic and practical domain.