Well, I found this on Reddit:
o1 at home
A Think-Respond pipe with an internal reasoning step and a second step that produces the final response based on that reasoning.
"""
title: Think-Respond Chain Pipe, o1 at home
author: latent-variable
github: https://github.com/latent-variable/o1_at_home
version: 0.3.0
Description: Think-Respond pipeline with an internal reasoning step and a second step that produces the final response based on that reasoning.
Now supports the OpenAI API along with Ollama, so you can mix and match models.
Instructions:
To use the o1 at home pipeline, follow these steps:
Add the Pipe Manifold:
Navigate to the Admin Panel and add the pipe to the list of available "Functions" using the '+'.
Note: this is not a "Pipeline"; make sure you are using the Functions tab.
If you are copying the code, you may need to give it a name and description.
Enable the Pipe Manifold:
After adding it, enable the pipeline to make it active.
Customize Settings:
Use the configuration menu (accessed via the settings cog) to tailor the pipeline to your needs:
Select Models: Choose your desired thinking model and response model.
Show Reasoning: Decide whether to display the reasoning process or keep it hidden.
Set Thinking Time: Specify the maximum time allowed for the reasoning model to process.
Save and Apply:
Once configured, save your settings to apply the changes.
You should now see "o1 at home" in your model drop-down.
These steps ensure the pipeline is set up correctly and functions according to your requirements.
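Example configuration (illustrative values only; use whatever model IDs your Ollama or OpenAI backend actually serves):
    THINKING_MODEL: deepseek-r1:14b    (USE_OPENAI_API_THINKING_MODEL off, served by Ollama)
    RESPONDING_MODEL: gpt-4o-mini      (USE_OPENAI_API_RESPONDING_MODEL on, served via the OpenAI API)
    ENABLE_SHOW_THINKING_TRACE: true
    MAX_THINKING_TIME: 120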
"""
import json
from time import time
from pydantic import BaseModel, Field
from dataclasses import dataclass
from typing import (
Dict,
List,
Optional,
Callable,
Awaitable,
Any,
AsyncGenerator
)
import asyncio
from open_webui.utils.misc import get_last_user_message
from open_webui.apps.openai import main as openai
from open_webui.apps.ollama import main as ollama
import logging
logger = logging.getLogger(__name__)
if not logger.handlers:
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.set_name("think_respond_chain_pipe")
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.propagate = False
@dataclass
class User:
id: str
email: str
name: str
role: str
class Pipe:
class Valves(BaseModel):
THINKING_MODEL: str = Field(
default="your_thinking_model_id_here",
description="Model used for the internal reasoning step.",
)
USE_OPENAI_API_THINKING_MODEL: bool = Field(
default=False,
description="Off will use Ollama, On will use any OpenAI API",
)
RESPONDING_MODEL: str = Field(
default="your_responding_model_id_here",
description="Model used for producing the final response.",
)
USE_OPENAI_API_RESPONDING_MODEL: bool = Field(
default=False,
description="Off will use Ollama, On will use any OpenAI API",
)
ENABLE_SHOW_THINKING_TRACE: bool = Field(
default=False,
description="Toggle show thinking trace.",
)
MAX_THINKING_TIME: int = Field(
default=120,
description="Maximum time in seconds the thinking model can run.",
)
def __init__(self):
self.type = "manifold"
self.valves = self.Valves()
self.start_thought_time = None
self.max_thinking_time_reached = False
self.__user__ = None
def pipes(self):
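        # Open WebUI calls pipes() on a manifold to discover the model entries it exposes;
        # this single entry is what appears as "o1 at home" in the model drop-down.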
return [{"name": "o1 at home", "id": "o1_at_home"}]
def get_chunk_content(self, chunk: bytes):
"""
Process a chunk of data from the API stream.
Args:
chunk (bytes): The raw byte content received from the API stream.
Yields:
str: The extracted content from the chunk, if available.
"""
chunk_str = chunk.decode("utf-8").strip()
# Split the chunk by double newlines (OpenAI separates multiple data entries with this)
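        # Each entry is an OpenAI-style SSE line, e.g. (illustrative):
        #   data: {"choices": [{"delta": {"content": "Hello"}}]}
        #   data: [DONE]
        # Ollama's generate_openai_chat_completion streams the same OpenAI-compatible
        # shape, which is why a single parser handles both backends.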
for part in chunk_str.split("\n\n"):
part = part.strip() # Remove extra whitespace
if part.startswith("data: "):
part = part[6:] # Remove "data: " prefix for OpenAI chunks
if not part or part == "[DONE]":
continue # Skip empty or end markers
try:
chunk_data = json.loads(part)
if "choices" in chunk_data and len(chunk_data["choices"]) > 0:
delta = chunk_data["choices"][0].get("delta", {})
content = delta.get("content", "")
if content: # Only yield non-empty content
yield content
except json.JSONDecodeError as e:
logger.error(f'ChunkDecodeError: unable to parse "{part[:100]}": {e}')
async def get_response(self, model: str, messages: List[Dict[str, str]], thinking: bool, stream: bool ):
"""
Generate a response from the appropriate API based on the provided flags.
Args:
model (str): The model ID to use for the API request.
messages (List[Dict[str, str]]): The list of messages for the API to process.
            thinking (bool): Whether this is the 'thinking' phase or the 'responding' phase.
            stream (bool): Whether to request a streaming response.
        Returns:
            The API response object (a streaming response when `stream` is True).
"""
# Determine which API to use based on the `thinking` flag and the corresponding valve
use_openai_api = (
self.valves.USE_OPENAI_API_THINKING_MODEL if thinking
else self.valves.USE_OPENAI_API_RESPONDING_MODEL
)
# Select the appropriate API and identify the source
if use_openai_api:
generate_completion = openai.generate_chat_completion
else:
generate_completion = ollama.generate_openai_chat_completion
# Generate response
response = await generate_completion({"model": model, "messages": messages, "stream": stream}, user=self.__user__)
return response
    async def get_completion(
        self,
        model: str,
        messages: list,
        __event_emitter__: Optional[Callable[[Any], Awaitable[None]]] = None,
    ):
response = None
try:
thinking = False
stream = False
response = await self.get_response(model, messages, thinking, stream)
return response["choices"][0]["message"]["content"]
except Exception as e:
await __event_emitter__({ "type": "status","data": {"description": f"Error: ensure {model} is a valid model option {e}", "done": True}})
finally:
if response and hasattr(response, 'close'):
await response.close()
async def stream_response(
self,
model: str,
messages: List[Dict[str, str]],
thinking:bool,
__event_emitter__: Optional[Callable[[Any], Awaitable[None]]] = None,
) -> AsyncGenerator[str, None]:
        response = None
        try:
            stream = True
            response = await self.get_response(model, messages, thinking, stream)
while True:
chunk = await response.body_iterator.read(1024)
if not chunk: # No more data
break
for part in self.get_chunk_content(chunk):
yield part
if thinking:
                    current_time = time()  # check whether the allowed thinking time has been exceeded
                    if (current_time - self.start_thought_time) > self.valves.MAX_THINKING_TIME:
                        logger.info("Max thinking time reached in stream_response of thinking model")
self.max_thinking_time_reached = True
break
except Exception as e:
            logger.error(f"Error in stream_response: {e}")
            await __event_emitter__({"type": "status", "data": {"description": f"Error: ensure {model} is a valid model option ({e})", "done": True}})
finally:
if response and hasattr(response, 'close'):
await response.close()
async def run_thinking(
self,
messages: list,
query: str,
__event_emitter__: Optional[Callable[[Any], Awaitable[None]]] = None,
) -> str:
await __event_emitter__({ "type": "status","data": {"description": "Thinking...", "done": False}})
# We will stream the reasoning steps. The reasoning prompt:
thinking_messages = {
"role": "user",
"content": f"You are a reasoning model.\nThink carefully about the user's request and output your reasoning steps.\nDo not answer the user directly, just produce a hidden reasoning chain.\nUser Query: {query}"
}
# replace last message
messages[-1] = thinking_messages
reasoning = ""
thinking = True
async for chunk in self.stream_response( self.valves.THINKING_MODEL.strip(), messages, thinking, __event_emitter__):
reasoning += chunk
if self.valves.ENABLE_SHOW_THINKING_TRACE:
# Emit chunk as a "thinking" type message
await __event_emitter__({ "type": "message","data": {"content": chunk, "role": "assistant-thinking"}})
await __event_emitter__({"type": "status","data": {"description": "Finished thinking.", "done": False}})
await asyncio.sleep(0.2)
return reasoning.strip()
async def run_responding(
self,
messages: list,
query: str,
reasoning: str,
__event_emitter__: Optional[Callable[[Any], Awaitable[None]]] = None
    ) -> None:
await __event_emitter__({"type": "status","data": {"description": "Formulating response...", "done": False}})
        responding_messages = {
            "role": "user",
            "content": f"Here is some internal reasoning to guide your response:\n<reasoning>{reasoning}<reasoning-end>\nUse this reasoning to respond in a concise and helpful manner to the user's query: {query}"
}
# replace last message
messages[-1] = responding_messages
response_text = "\n\n### Response:\n"
await __event_emitter__({ "type": "message", "data": {"content": response_text, "role": "assistant"}})
        thinking = False
        async for chunk in self.stream_response(self.valves.RESPONDING_MODEL.strip(), messages, thinking, __event_emitter__):
response_text += chunk
# Emit response chunks as assistant message
await __event_emitter__({ "type": "message", "data": {"content": chunk, "role": "assistant"}})
await asyncio.sleep(0.2)
async def pipe(
self,
body: dict,
__user__: dict,
__event_emitter__: Optional[Callable[[Any], Awaitable[None]]] = None,
__task__=None,
) -> str:
        # Get relevant info
self.start_thought_time = time()
self.__user__ = User(**__user__)
messages = body["messages"]
query = get_last_user_message(messages)
        if __task__ is None:  # only perform thinking when not a defined task like title generation
# Run the "thinking" step
reasoning = await self.run_thinking(messages, query, __event_emitter__)
thought_duration = int(time() - self.start_thought_time)
# Run the "responding" step using the reasoning
await self.run_responding(messages, query, reasoning, __event_emitter__)
if self.max_thinking_time_reached:
await __event_emitter__({ "type": "status", "data": {"description": f"Thought for max allowed time of {thought_duration} seconds", "done": True} })
else:
await __event_emitter__({ "type": "status", "data": {"description": f"Thought for only {thought_duration} seconds", "done": True} })
return ""
else:
# avoid thinking and just return a regular response or named task, like tags
return await self.get_completion(self.valves.RESPONDING_MODEL.strip(), messages, __event_emitter__)
Source: https://openwebui.com/f/latentvariable/o1_at_home
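If you just want the two-stage idea outside of Open WebUI, here is a minimal standalone sketch of the same think-then-respond chain against an OpenAI-compatible endpoint. The base URL, the model IDs, and the use of the openai Python client are illustrative assumptions on my part, not part of the pipe above; the two prompts are taken from run_thinking and run_responding.

# Minimal standalone sketch of the think-then-respond chain (illustrative only).
# Assumes an OpenAI-compatible endpoint; here Ollama's /v1 API with placeholder model IDs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def think_then_respond(query: str,
                       thinking_model: str = "deepseek-r1:14b",
                       responding_model: str = "llama3.1:8b") -> str:
    # Step 1: ask the thinking model for a hidden reasoning chain, not an answer.
    reasoning = client.chat.completions.create(
        model=thinking_model,
        messages=[{
            "role": "user",
            "content": "You are a reasoning model.\n"
                       "Think carefully about the user's request and output your reasoning steps.\n"
                       "Do not answer the user directly, just produce a hidden reasoning chain.\n"
                       f"User Query: {query}",
        }],
    ).choices[0].message.content

    # Step 2: hand the reasoning to the responding model to produce the final answer.
    return client.chat.completions.create(
        model=responding_model,
        messages=[{
            "role": "user",
            "content": "Here is some internal reasoning to guide your response:\n"
                       f"<reasoning>{reasoning}<reasoning-end>\n"
                       "Use this reasoning to respond in a concise and helpful manner "
                       f"to the user's query: {query}",
        }],
    ).choices[0].message.content

print(think_then_respond("Why is the sky blue?"))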