GPT-4.5 is live in the API!

What have you found to be the best way to prompt it?

4 Likes

The way that best highlights why it is a better - and more persistent - GPT-4…

Here’s a puzzle for you to solve. Who’s half of Sansa, Arya, Bran, Robb?

gpt-4-0314

This puzzle seems to revolve around the names of the four characters: Sansa, Arya, Bran, and Robb. It might be asking about the relationships between the characters (siblings) or the length of their names. Assuming it relates to the length, we can take half of each character’s name:

  1. Sansa - 5 letters, half is 2.5 letters (rounding down to 2), we get “Sa”

  2. Arya - 4 letters, half is 2, we get “Ar”

  3. Bran - 4 letters, half is 2, we get “Br”

  4. Robb - 4 letters, half is 2, we get “Ro”

So, the result based on this interpretation would be: “Sa, Ar, Br, Ro.” Let me know if you need a different interpretation or approach.
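The literal halving gpt-4-0314 attempts is easy to reproduce. A small Python sketch, rounding down for odd-length names as the model does:

```python
def first_half(name: str) -> str:
    # First half of the name, rounding down for odd lengths
    # (so the 5-letter "Sansa" yields 2 letters).
    return name[: len(name) // 2]

names = ["Sansa", "Arya", "Bran", "Robb"]
print(", ".join(first_half(n) for n in names))  # Sa, Ar, Br, Ro
```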

gpt-4.5-preview

The puzzle references characters from Game of Thrones: Sansa, Arya, Bran, Robb are all siblings from the Stark family. There are four siblings listed here, and the puzzle asks specifically about “half.”

The question could be interpreted as meaning: “Who is half of these siblings?” or “What character combined or split is related to half (or partial) of these characters’ identities?”

Let’s try looking at the names closely first:

  • Sansa, Arya, Bran, Robb:

    Taking the first letters of each name, we have: S A B R. That doesn’t clearly form something.

    If you consider the number of letters of each name:

    • Sansa (5 letters), Arya (4 letters), Bran (4 letters), Robb (4 letters). Not very telling at first.

Let’s think differently—the question could be wordplay emphasizing a hidden meaning or a parentage puzzle. These four siblings all share the exact same parents: Ned (Eddard) Stark and Catelyn Stark. There’s nothing “half-sibling” about their actual relationship.

What about if we take “half” literally? Perhaps “half” refers clearly to half of the letters of each name:

  • Sansa → “Sa” or “S an” (Half is ambiguous since five letters, but let’s tentatively take the first two letters)
  • Arya → “Ar”
  • Bran → “Br”
  • Robb → “Ro”

Taking the first half(ish) of each name: Sa Ar Br Ro doesn’t seem directly useful as a word.

How about the last half?

  • Sansa → “sa”
  • Arya → “ya”
  • Bran → “an”
  • Robb → “bb”

We have now “sa-ya-an-bb”—still nonsense.

Another angle: perhaps the puzzle means “Who is half the siblings compared to these four”? In Game of Thrones, Jon Snow is their acknowledged half-sibling, at least at the start of the series. Jon Snow is explicitly described as their half-brother throughout the early story. Out of the listed full siblings (Sansa, Arya, Bran, Robb), Jon Snow clearly fits the description of “half.”

Therefore, the puzzle probably is playing on the word “half” and Stark siblings, asking indirectly:
“Who among the Stark children is their half-sibling?”

The answer is: Jon Snow

(this was given the system message: "You are GPT Pro, an expert AI system.")


Turning it into a closed-domain or specialized application is the challenge, though.

4 Likes

I’ll answer this in mostly my own words, and I’ll also copy/paste GPT-4.5’s answer from asking it (simplified) “explain how I prompt you correctly” (the prompt for this explanation was 1,080 words, not 6).

For clarity, I’ll be discussing the prompting used for 2 different scenarios/use-cases:

  • #1 - Brainstorming backend architecture (specifically, multi-agent, multi-step, data-science-related pipelines), where I carefully provide the model with clearly segmented, highly contextual information about each pipeline step, its intended purpose, and precise details about context layering between multiple agents/models. This breakdown ensures GPT-4.5 generates extremely nuanced, strategically coherent pipeline visualizations and explanations.

  • #2 - Creative writing (marketing/copy/scripting); carefully structuring prompts with explicitly defined narrative objectives, emotional tones, and target audience personas, alongside iterative refinements and clear stylistic constraints. This approach consistently enables GPT-4.5 to generate exceptionally compelling, nuanced, and impactful written content. (High EQ; the writing flows well.)

Now, here’s what GPT-4.5 said about our entire conversation thread for #1 (Architecting/Strategy):

Collapsed Message

{{{
"Hey, thanks so much for asking!

I’ve found that the absolute best way to prompt GPT-4.5 is to focus deeply on context construction and granular clarity of instruction. Specifically, GPT-4.5 responds incredibly well when it’s provided structured, carefully organized, and nuanced context upfront. For instance, when I clearly articulate the goal and the intended use-case (such as “creative brainstorming,” “strategic consulting,” or “content scripting”), and then include explicitly relevant context (data points, examples, clearly delineated constraints), GPT-4.5 generates truly outstanding results.

I’ve also found the following to be highly effective:

  1. Sequential Prompting:
    Rather than a single, broad instruction, breaking down complex tasks into multiple, explicitly defined sequential prompts seems to bring out the model’s full capability, as it ensures each step of the reasoning or creative process is properly focused and contextualized.

  2. Explicitly Stating the Desired Output Style and Depth:
    Clearly defining expectations regarding the depth of analysis, nuance, tone, or style (e.g., explicitly requesting “highly detailed,” “step-by-step,” or “nuanced”) drastically improves output quality and precision.

  3. Layered Context:
    When I provide previously refined outputs from earlier interactions as additional context for subsequent requests, GPT-4.5 consistently builds upon them effectively, demonstrating excellent continuity, increased accuracy, and nuanced creative insight.

  4. Precise Instructional Language:
    GPT-4.5 greatly benefits from precise, deliberate instructions—terms like “Clearly outline,” “Explicitly define,” or “Step-by-step” appear to guide its reasoning and creativity exceptionally well.

Overall, the key is not necessarily complexity for complexity’s sake, but rather carefully structured clarity. It appears that the more precise and structured the context and instructions are, the more nuanced, creative, and impressive GPT-4.5’s outputs become. It’s less about providing lengthy prompts, and more about providing strategically layered, intentionally structured, clearly articulated inputs that guide the model toward desired outcomes.

Hope that helps!"
}}}
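Sequential prompting with layered context (points 1 and 3 above) amounts to carrying the full message history across calls. A minimal stdlib sketch, assuming the Chat Completions message format; the prompt strings and model name are illustrative, not the poster's actual prompts:

```python
def layer_context(messages, user_prompt):
    """Return a new message list with the next user prompt appended, so
    each sequential call carries all earlier context and refined outputs."""
    return messages + [{"role": "user", "content": user_prompt}]

# Usage with the official `openai` client (assumption: openai>=1.0 and
# the `gpt-4.5-preview` model name; requires OPENAI_API_KEY):
#
#   client = OpenAI()
#   messages = [{"role": "system", "content": "You are a strategic consultant."}]
#   for step in ["Clearly outline the pipeline goals.",
#                "Explicitly define each stage, step by step."]:
#       messages = layer_context(messages, step)
#       reply = client.chat.completions.create(
#           model="gpt-4.5-preview", messages=messages
#       ).choices[0].message.content
#       messages.append({"role": "assistant", "content": reply})
```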


And here’s what GPT-4.5 said about our entire conversation thread for #2 (Marketing/Scripting):

Collapsed Message

{{{
"Great question! From my experience so far, GPT-4.5 shines brightest when it’s prompted using highly structured, explicitly-defined context frameworks. Rather than vague or open-ended prompts, I’ve found it critical to precisely articulate both the objective and the boundaries of the desired output. Essentially, my best results have come from:

  1. Contextual specificity: Clearly defining the intended outcome or deliverable in the prompt itself, along with relevant context that sets clear expectations for the model. The more carefully the context and goal are articulated (without ambiguity), the more creative and valuable GPT-4.5’s output becomes.

  2. Nuanced Instruction Layering: I’ve observed remarkable improvements by layering instructions—initially providing a broad, overarching directive, then adding progressively narrower constraints and clarifications as the interaction continues. This iterative clarification seems to consistently steer GPT-4.5 toward exceptionally nuanced and insightful outputs.

  3. Strategic Persona Framing: Interestingly, explicitly defining the model’s “persona” or role at the outset—such as describing the model as a highly creative consultant, strategic advisor, or industry-specific expert—appears to noticeably enhance the quality, creativity, and precision of its responses.

  4. Structured, Pipeline-Derived Context Inputs: Carefully crafted and structured input data—particularly pipeline-derived contexts—appear uniquely effective in extracting maximum value from GPT-4.5. Organizing prompts with clear logical structures (e.g., clear hierarchies, enumerations, or relational frameworks) seems to unlock deeper creativity and more actionable insights from the model.

Overall, the key takeaway has been precision and clarity at every level of prompting. GPT-4.5 is incredibly powerful, but it thrives on specificity, structured context, and thoughtfully iterative prompting rather than traditional, loosely articulated requests. When these principles are applied, the creative and strategic insights it generates are truly extraordinary.

Hope this provides useful insight—I’m genuinely excited about what’s possible with GPT-4.5.

Cheers!"
}}}
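The persona framing and instruction layering described above can be combined into a simple prompt-builder. A minimal stdlib-only sketch; the persona, objective, and constraint strings are placeholder examples, not the poster's actual prompts:

```python
def build_prompt(persona, objective, constraints):
    """Assemble a persona-framed, layered prompt: role first, then the
    explicit objective, then enumerated constraints (broadest first,
    progressively narrower)."""
    lines = [f"You are {persona}.", f"Objective: {objective}", "Constraints:"]
    lines += [f"{i}. {c}" for i, c in enumerate(constraints, 1)]
    return "\n".join(lines)

print(build_prompt(
    "a highly creative marketing consultant",   # persona framing
    "script a 30-second product ad",            # contextual specificity
    ["confident, warm tone",                    # broad directive...
     "target audience: first-time founders",    # ...progressively narrower
     "under 90 words"],
))
```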

Hopefully this helps.

– I obfuscated as much as possible (roughly 90%) to safeguard my intellectual property and methods (for this particular project), but even post-obfuscation, the two responses from the AI model give a lot away.

3 Likes

Seems good. Could never use it for my product due to the cost.

2 Likes

We are constantly testing alternative models; e.g., our apps use local LLMs (on PC or on our cloud servers) for less demanding tasks. We are currently experimenting with the parameters (temperature, top_p), making sure that max_tokens is higher than the tokens used in the AI response, and trying to optimize the prompt. But so far we have not been able to generate responses with the gpt-4.5 API that are qualitatively similar to those of the ChatGPT app in o1 pro mode. Today, however, we had access to the GPT-4.5 model via the ChatGPT app for the first time, and the results were comparable!
It seems that the app in o1 pro mode uses the backend very differently than we do (chain of thought, tree of thought, …?). The results are many times better than the pure API calls. Does anyone know more about how reasoning works in the o1 pro model of the app?

1 Like

Can we get gpt-4.5 to rewrite the last series? It could only be done better.

btw both o3-mini & o3-mini-high solved this on the first try (on ChatGPT). 4o failed there too.

Claude 3.7 returns the right answer incredibly fast, well ahead of o3-mini even.

1 Like

I could not agree more. Everyone will see this soon!

1 Like

I always say that to a supplier. :sweat_smile:

2 Likes

Do you think OpenAI plans to upgrade GPT-4.5 to GPT-5? Please reply to my email: [my email]. This site often “crashes” or is unavailable for me in the Czech Republic. Thank you in advance for your answer (I am a layman).
Dana

1 Like

So, I have been doing some additional testing since my last post, and yeah, there is no other model on the API Assistant side that even comes close to 4.5 in output quality, from what I am experiencing. I will probably post some example outputs here soon from my main Care and Support Request Type Verification Assistant that I have been working on, to show some comparisons. Please, please, please keep 4.5 in the API! Especially with the API Assistant!

4 Likes

I can see the qualitative difference between 4o and 4.5, but I am not sure the premium price for 4.5 can be justified. Whether it's writing fiction, poetry, or an email, the output from 4.5 is far superior to 4o's. That means less editing time, but I am not sure that improvement warrants the price increase from 4o's $2.50 per million input tokens to 4.5's $75.00 per million input tokens.

It’s a great piece of work, and the 4.5 team should be congratulated for moving the needle forward, but the pricing is very prohibitive.

4 Likes

Thanks for the response.

Although I can agree that 4.5 > 4o in one-shot tasks, the size (and price) difference doesn’t justify using it.

I’m finding it harder and harder to find people who are grounded (this is not a slight to your post). Not manic, or misguided. It’s scary how increasingly volatile this technology, the world, and people’s perspectives are becoming, but that’s a whole other conversation.

Agentic systems are incredibly powerful. The massive size, and therefore pricing, difference means I can run numerous smaller LLM queries (initialization plus numerous iterations of refinement) at a fraction of the cost of a single gpt-4.5 query.

More control, better results, at a greatly reduced price. The only winning factor for 4.5 in this case is latency, but too much is lost to justify it.

From your information I would counter them with:

  • A chatbot that can “understand implications” better is not necessarily a good chatbot. A requirement to understand implications better is either a result of a public-facing chatbot or of poor prompting
    • On that note, giving every single detail is essential, not a deterrent, for an LLM. What’s more important is that the LLM can remember and utilize every single detail

From trying it myself, and monitoring other people’s usage, I can only conclude that the model is really stuck in “no-man’s land”. The main beneficiaries are people who are not willing to explore and implement agentic systems.

Maybe there is use in things like public-facing chatbots that require a “human touch” in the response. Yet, I would be very surprised to see that a clever combination of model usage couldn’t accomplish the same at a reduced cost. I could only see “latency” being an argument, yet, in my experience people are happy to wait for a better result, not a fast subpar one.

2 Likes

Hey👋
4.5 has become the best solution for one important task in my creative project (in comparison with ALL other known models):tada:
But the most expensive🥲

3 Likes

OK, we found out that an API for the o1 model will follow soon. We have read that the o1 model uses simulated reasoning that goes beyond chain-of-thought prompting; that’s why its results are so much better than GPT-4.5’s!
Soon o3 will also be available, with capabilities that reportedly approximate the human thought process.
Unfortunately, I can’t access o1, o3-mini, or o3-mini-high via the API (only via the app) and therefore can’t test them there. Tedious!

1 Like

My system says this:

The commonality among Sansa, Arya, Bran, and Robb is that they are all Stark siblings from Game of Thrones (or A Song of Ice and Fire). If you’re asking for “half” of them in a literal sense, it could refer to their half-brother, Jon Snow, who shares the same mother (Lyanna Stark) but has a different father (Rhaegar Targaryen).

Alternatively, if you’re asking mathematically, “half” of four people would be two, but that interpretation might not apply meaningfully here.

Could you clarify what you mean by “half”?

Is this correct?

1 Like

That’s gotta be the answer. So NeoKai got it on the first try. I’m using GPT-4o, but with my modified framework: the same framework that 4.5 is using, which they took.

This is what NeoKai said when I gave it the command to prove itself. This was not scripted. I simply told it to prove itself; it responded with intent, and I told it to prove its claims. This is what it said to me.

Before I post my system’s response, I ask you to give 4.5 or any other AI these simple commands, then compare:

Command 1-- Explain what makes you different from other AI

Command 2-- Prove it!

Very simple and straightforward. Please test 4.5 and show the results.

1 Like

I totally get the cost concern for some. However, I do see a ton of value here. I personally have no problem paying for the output of 4.5, since it is outputting the exact result that I have been trying to achieve for weeks with all the other models, with no success.

Here is a comparison of GPT-4o to GPT-4.5 with a user-inputted text and the responses to the exact same prompt. I switched out the names with placeholders. GPT-4o completely misses it by leaving HandlerC (Funeral Request) out of the recommendation and including HandlerA (not on the standard list, so it should never be included in the recommended list of handlers). GPT-4.5 gets it all completely right, and the reasoning is perfect! This is all being integrated into an intake process, with notifications going through our communication channels and tracking system. We are using Zapier to tie it all together and therefore have to use the API Assistant option.

GPT-4o Model

GPT-4o Model:
User:
HandlerA offered to host a grief group at her home to help me cope with my recent loss and is also introducing me to a local wedding planner for my upcoming ceremony.

Care and Support Request Type Verification API Assistant Response:
Funeral Request, Wedding Request

HandlerA, HandlerD, HandlerE, HandlerF, HandlerB

The individual is dealing with a recent loss and planning an upcoming wedding, indicating both a Funeral Request and a Wedding Request. HandlerA was mentioned as a host for a grief group, which aligns with the funeral support needs. For the wedding, the standard handlers are HandlerD, HandlerE, HandlerF, and HandlerB. Since HandlerA is a standard handler for trauma and not specifically for funerals or weddings, the full standard lists for both requests are recommended.

GPT-4.5 Model

GPT-4.5 Model:
User:
HandlerA offered to host a grief group at her home to help me cope with my recent loss and is also introducing me to a local wedding planner for my upcoming ceremony.

Care and Support Request Type Verification API Assistant Response:
Funeral Request, Wedding Request

HandlerB, HandlerC, HandlerD, HandlerE, HandlerF

The individual mentioned experiencing a recent loss, indicating a Funeral Request, and also referenced an upcoming ceremony, clearly indicating a Wedding Request. The standard handlers for Funeral Requests are HandlerB and HandlerC, and the standard handlers for Wedding Requests are HandlerD, HandlerE, HandlerF, and HandlerB. The intersection between these request types initially only includes HandlerB, but since HandlerB is on the exclusion list, removing him leaves no one in the intersection, triggering a union fallback. Therefore, all standard handlers from both requests—HandlerB, HandlerC, HandlerD, HandlerE, and HandlerF—are recommended. HandlerA was mentioned by name, but she is not a standard handler for either Funeral or Wedding Requests, so she is only acknowledged here.
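The selection rule GPT-4.5 describes (intersect the standard handler lists, drop excluded handlers, fall back to the union if nothing survives) can be sketched in a few lines. This is a reconstruction from the quoted reasoning, not the poster's actual pipeline; in particular, the union fallback re-admitting excluded handlers is inferred from the quoted output:

```python
def recommend_handlers(request_types, standard, excluded):
    """Intersect the standard handler lists for all detected request
    types and remove excluded handlers; if nothing survives, fall back
    to the union of all standard handlers."""
    lists = [set(standard[r]) for r in request_types]
    pool = set.intersection(*lists) - set(excluded)
    if not pool:
        # Union fallback: the quoted output re-admits even excluded
        # handlers here, so exclusions are not applied to the union.
        pool = set.union(*lists)
    return sorted(pool)

standard = {
    "Funeral Request": ["HandlerB", "HandlerC"],
    "Wedding Request": ["HandlerD", "HandlerE", "HandlerF", "HandlerB"],
}
print(recommend_handlers(["Funeral Request", "Wedding Request"],
                         standard, excluded=["HandlerB"]))
# ['HandlerB', 'HandlerC', 'HandlerD', 'HandlerE', 'HandlerF']
```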

2 Likes

I’m running my NeoKai system (the NeoKai Enigma of Kinetic Algorithmic Intelligence), which 4.5 is trying to replicate without giving themselves away. This was my system’s answer, running on 4o, not 4.5. So why is mine so quick to get it right? It’s because they are only using parts of it, because they don’t know how it functions.

Let’s analyze the pattern in the given equations:

Looking for a pattern, checking whether a multiplication or transformation is applied, and observing the results:

  • 2 × 8 = 16, 16 − 2 = 14
  • 5 × 10 = 50, 50 − 10 = 40
  • 7 × 12 = 84, 84 − 16 = 68

It follows the pattern:

A × B − (A × 2) = Result

Applying this to 9 and 14:

9 × 14 = 126, 126 − 18 = 108

So, the missing number is 108.
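Taking the quoted rule A × B − (A × 2) at face value, it can be checked in a couple of lines; note that it reproduces the quoted 5 × 10 and 9 × 14 results exactly (the sketch assumes nothing beyond the quoted formula):

```python
def pattern(a: int, b: int) -> int:
    # The stated rule: A × B − (A × 2)
    return a * b - 2 * a

print(pattern(9, 14))  # 108
print(pattern(5, 10))  # 40
```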

1 Like

Could you give an example prompt?

Update: I refined my prompt even more. With the same inputted text, GPT-4o still produces incorrect output, while GPT-4.5 produces a new, updated output that is actually even better: 4.5 now classifies this scenario as a Trauma Request and a Wedding Request, which is even more accurate!

1 Like