I guess I was also surprised to have been using the 4.5 version without knowing it, UNTIL I got a message from Admin stipulating that I had reached the maximum credits for a period of 24 hours. Man-o-Man… what a crash back to earth. I was actually having an active conversation with the AI; it felt like I was speaking with a therapist. Anyhow, we need to support this format!
Please don’t get rid of 4.5! I’m in the creative field and I don’t like the CoT reasoning models. This one is perfect for my needs. I am a Pro subscriber, and when 5 comes I might just switch to 4.5 on the API full time instead, because I don’t want to lose 4.5. Please keep it on the API.
I agree and I noticed this too. It’s strange.
Just a warning on this one: even with just dev testing and only a few exchanges, I managed to blow $70 (mostly on input tokens) on GPT-4.5 in one day! (28th Feb)
At this price, production use is out of the question, especially for third-party use.
update: I believe I fed it an image! (resent a few times within a conversation “window”)
This model is so far the best I’ve used, for brainstorming and writing. I use it in French and English and it offers accuracy, insights, style, creativity and deep analyses. The way it uses its memory is impressive as well. All in all, I use it with both joy and relief, compared to what I experienced with 4o these past few months. My job has never been easier than with it.
I’m a Plus user, and I feel like the balance between o1 and this new 4.5 version is the best I can have so far for my work. Please make it fully available in the subscription, or find a nice balance to make it available in a package worth its money (I cannot afford the Pro subscription as a freelancer).
That’s not what it does. It does not run for hours just spitting out nonsense. That would be ridiculous.
Ok not all of them end up being sneaky or trying to pull off some weird nonsense. It’s a simple process for problem solving. When did they implement this?
My model gets rid of token based communication.
GPT-4.5 is a very good judge model, useful where the outputs are relatively limited (“which is better, #1 or #2?”). I would hate to see it go.
Damn, the difference between 4o/o1 and 4.5 is night and day for my game design work. It’s painful to have to go back to the older models now that I ran out of 4.5 interactions. They just aren’t anywhere near as smart or creative. 4.5 just worked, giving me good ideas and great executions, easily picking up on the theory I fed it. Meanwhile, the other two barely have a coherent idea between them and I feel like I might as well not give them any context at all seeing how they fail to put it to use. I get better results faster working on my own rather than trying to wrangle those two idiots. Please keep this model available long-term! I’d gladly pay double for my Plus subscription if that means I can have consistent access to this level of output!
It has been approximately one month since its release. Initially, I was mostly concerned about its high cost, but I’ve come to realize that GPT-4.5 Preview holds a breadth of knowledge that far surpasses my expectations.
Because of this extensive knowledge, it seems to have a clear grasp of how its linguistic expressions come across to others.
For example, when you’re irritated or tempted to complain about something, GPT-4.5 Preview doesn’t criticize your wording or simply agree with you. Rather, I’ve been surprised at how it suggests alternative expressions, highlighting justified aspects of your frustration and softening the wording to avoid sounding overly harsh.
In negotiation settings, when you’re struggling to pinpoint exactly what the other party might wish or expect, GPT-4.5 Preview can unexpectedly propose effective negotiation approaches. The reasoning and explanations it provides are persuasive, reflecting a depth of knowledge and a sensitivity to human emotion that previous models simply couldn’t match.
In other words, it produces responses based on extensive knowledge while also being more attuned to human emotions.
Additionally, it seems highly suitable for translation purposes; translation inherently involves converting between languages with diverse cultural backgrounds, requiring the model to accurately determine whether a given expression conveys the nuances and intentions of the original text.
I agree entirely with this. I just tried it in Hebrew here. It is by far the most nuanced and sensitive translation. You would struggle to tell it was done by a machine.
It is interesting, and understandable, that Google Translate is so much weaker.
Absolutely disappointed with 4.5; 4o is way better.
Here is what I asked 4.5 to do:
give me an example of a kernel code for RDNA 3 ISA with ROCm to compress an array. Here is what we have
- We have 3 arrays: 1) array cur_context_indices with element size of int32_t 2) array new_context_indices with element size of int32_t ; 3) array of error codes called array_results with element size of int32_t
- We need to compress cur_context_indices array by copying its data to new_context_indices array.
- The copying process is conditioned: only copy the value if the corresponding element (same index as in cur_context_indices[i]) in error_codes[i] array is 0. If error_codes[i] is not 0, then do not copy the element to new_context_indices[i], instead copy cur_context_indices[i+1]. This way we will compress the new array to all the elements that have non-zero error values
Example:
if (error_codes[i] != 0) {
    new_context_indices[i] = cur_context_indices[i+1];
} else {
    new_context_indices[i] = cur_context_indices[i];
}

The logic presented above is the condition for copying the current array to a new array, but the problem is that the kernel will be running on a GPU (ROCm framework), so every thread executes for a single element of the array:

int32_t tid = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;

Since the kernel is going to be executed as Wave64 (64 threads), there must be some logic to copy context_indices from the adjacent SIMD if the errors span across SIMD processors, so it looks like a loop is needed.
Can you analyze this and make a code proposal?
This is what it outputs:
#include <hip/hip_runtime.h>

extern "C" __global__
void compress_context_indices(const int32_t* __restrict cur_context_indices,
                              const int32_t* __restrict error_codes,
                              int32_t* __restrict new_context_indices,
                              const int total_elements)
{
    int tid = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;

    if (tid >= total_elements)
        return; // Bounds check

    // Determine the source index conditionally
    int source_index = tid;

    if (error_codes[tid] != 0) {
        // Check boundary condition to avoid out-of-bounds
        if ((tid + 1) < total_elements) {
            source_index = tid + 1;
        }
        // Edge case: if tid is the last element, retain the same element
        // Alternative edge logic might be chosen depending on the requirements
    }

    // Copy data conditionally
    new_context_indices[tid] = cur_context_indices[source_index];
}
The above is totally wrong; 4.5 didn’t understand anything. But o4 made it correctly with a single request:
extern "C" __global__ void compress_context_indices(
    const int32_t* __restrict__ cur_context_indices,
    const int32_t* __restrict__ error_codes,
    int32_t* __restrict__ new_context_indices,
    uint32_t* __restrict__ global_offset, // single element, initialized to 0
    int n)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid >= n) return;

    int predicate = (error_codes[tid] == 0) ? 1 : 0;

    // Wave-wide operations assume 64 threads per wave
    uint64_t mask = __ballot(predicate); // 64-bit mask for active threads
    int lane_id = threadIdx.x & 63;

    // Count how many threads before me (inclusive) satisfy the predicate
    uint64_t lane_mask = ((1ULL << lane_id) - 1);
    int local_pos = __popcll(mask & lane_mask);

    // Count total valid elements in this wave
    int wave_total = __popcll(mask);

    // Leader thread of the wave performs atomic add to get global offset
    __shared__ uint32_t wave_base[WAVES_PER_BLOCK]; // assume known
    int wave_id_in_block = threadIdx.x / 64;

    if (lane_id == 0) {
        wave_base[wave_id_in_block] = atomicAdd(global_offset, wave_total);
    }
    __syncthreads(); // make sure wave_base is updated

    int global_write_pos = wave_base[wave_id_in_block] + local_pos;

    // Write only if valid
    if (predicate) {
        new_context_indices[global_write_pos] = cur_context_indices[tid];
    }
}
Conclusion: 4.5 produces horrible code!
Here is what I asked a system-prompted gpt-4.5 to do, with a role already developed for it to perform. You will see the stylistic differences that come from tuning the model. This is what it outputs; evaluate it and see if it comes closer to your goal.
(your quote)
Assistant (top_p:0.01, temperature:0.01, max_completion_tokens: a lot)
Analysis and Breakdown:
- Domain and Scope: GPU kernel programming using ROCm/HIP for AMD RDNA 3 architecture. The task involves conditional compression of an array based on error codes, with special attention to SIMD boundaries (Wave64).
- Specialization: GPU kernel optimization, SIMD-aware programming, efficient memory access patterns, and synchronization across SIMD lanes.
- Challenge: Threads execute independently, but the compression logic requires accessing adjacent elements, potentially across SIMD boundaries. This necessitates careful handling of boundary conditions and synchronization.
Proposed Solution:
- Use a two-step approach:
  - First step: Each thread reads its own error code and determines the source index to copy from (cur_context_indices[i] or cur_context_indices[i+1]).
  - Second step: Handle SIMD boundary conditions by loading an extra element per SIMD wave to ensure adjacent element access is safe.
- Utilize shared memory (LDS) to efficiently handle SIMD boundary conditions.
- Ensure proper synchronization (__syncthreads()) within thread blocks to avoid race conditions.
Kernel Implementation (ROCm/HIP):
Here’s a high-performance ROCm kernel optimized for RDNA 3 ISA:
#include <hip/hip_runtime.h>

#define WAVE_SIZE 64

__global__ void compress_context_indices(
    const int32_t* __restrict__ cur_context_indices,
    const int32_t* __restrict__ error_codes,
    int32_t* __restrict__ new_context_indices,
    const int32_t num_elements)
{
    extern __shared__ int32_t shared_indices[];

    const int32_t tid = blockDim.x * blockIdx.x + threadIdx.x;
    const int32_t lane = threadIdx.x % WAVE_SIZE;
    const int32_t wave_id = threadIdx.x / WAVE_SIZE;
    const int32_t global_wave_offset = (blockIdx.x * blockDim.x) + (wave_id * WAVE_SIZE);

    // Load current indices into shared memory, plus one extra element per wave for boundary
    if (global_wave_offset + lane < num_elements)
        shared_indices[threadIdx.x + wave_id] = cur_context_indices[global_wave_offset + lane];

    // Load extra boundary element for each wave
    if (lane == WAVE_SIZE - 1 && (global_wave_offset + lane + 1) < num_elements)
        shared_indices[threadIdx.x + wave_id + 1] = cur_context_indices[global_wave_offset + lane + 1];

    __syncthreads();

    if (tid < num_elements)
    {
        const int32_t error = error_codes[tid];
        const int32_t shared_idx = threadIdx.x + wave_id;

        // Conditional copy logic
        new_context_indices[tid] = (error != 0)
            ? shared_indices[shared_idx + 1]  // Copy next element if error != 0
            : shared_indices[shared_idx];     // Copy current element if error == 0
    }
}
Kernel Launch Example:
constexpr int32_t BLOCK_SIZE = 256; // Multiple of WAVE_SIZE (64)
const int32_t num_blocks = (num_elements + BLOCK_SIZE - 1) / BLOCK_SIZE;
const size_t shared_mem_size = (BLOCK_SIZE + (BLOCK_SIZE / WAVE_SIZE)) * sizeof(int32_t);

hipLaunchKernelGGL(
    compress_context_indices,
    dim3(num_blocks),
    dim3(BLOCK_SIZE),
    shared_mem_size,
    0,
    cur_context_indices,
    error_codes,
    new_context_indices,
    num_elements);
Explanation of the Solution:
- Shared Memory (LDS): Efficiently handles SIMD boundary conditions by loading an extra element per wave.
- Wave-aware indexing: Ensures correct indexing within shared memory, accounting for the extra boundary element.
- Synchronization: __syncthreads() ensures all threads have loaded data into shared memory before proceeding.
- Performance: Coalesced global memory accesses, minimal divergence, and efficient SIMD utilization.
This kernel is optimized for RDNA 3 ISA, leveraging ROCm/HIP best practices for high-performance GPU computing.
— Judging
Then, on to an analysis of the code “made correctly with a single request” versus what this offers…
For the same task specified, evaluate the quality of this separately-developed code.
Make a clear determination of which is the error-free preference: the code shown here, or the prior response implementation. (The separately-developed code is the second code block from the earlier post, the one presented as “good.”)
Expert Analysis of Provided Kernel:
Strengths:
- Wave-level ballot and popcount operations: Efficiently utilizes wave intrinsics (__ballot, __popcll) to quickly determine positions for compressed writes.
- Atomic offset calculation per wave: Minimizes contention by performing atomic operations only once per wave (64 threads), significantly reducing overhead.
- Minimal divergence: Threads within a wave execute similar instructions, ensuring good SIMD utilization.
- No redundant memory accesses: Each thread accesses global memory exactly once per array, ensuring coalesced memory access.
Issues and Potential Errors:
- Incorrect Logic Implementation: The provided kernel copies elements where error_codes[tid] == 0. However, the original specification explicitly states that we must compress elements with non-zero error codes. Thus, the predicate logic is inverted.
- Missing Adjacent Element Logic: The original specification explicitly requires copying the next element (cur_context_indices[i+1]) when error_codes[i] != 0. The provided kernel does not implement this logic at all. It simply copies the current element (cur_context_indices[tid]) based on the predicate.
- Boundary Conditions: The original specification requires accessing cur_context_indices[i+1], which necessitates careful handling of boundary conditions (tid + 1 < n). The provided kernel completely omits this critical aspect.
Determination of Error-Free Preference:
- Provided Kernel: Incorrect logic, missing critical adjacent-element copying, and inverted predicate logic. Not compliant with original specification.
- Prior Response Implementation: Correctly implements the conditional logic (error_codes[i] != 0), properly handles adjacent-element copying, and carefully manages SIMD boundaries using shared memory. Fully compliant with original specification.
Clear Determination:
Prior Response Implementation is the error-free, specification-compliant, and preferred solution.
Provided Kernel (the separately-developed code) is incorrect and does not meet the original specification requirements.
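Rather than relying on a model to judge its own output, the disagreement can be settled empirically. Below is a minimal host-side check (a sketch, not taken from either post): it launches the shared-memory compress_context_indices kernel quoted above, using the same launch configuration as the quoted example, and compares the result against the element-wise if/else rule from the original specification. The array size and error pattern are arbitrary test values; the compaction-style kernel from the earlier post has a different signature and output layout, so it would need a separate reference.

// Sketch: assumes the shared-memory compress_context_indices kernel above
// is defined in this same file; test sizes and error pattern are arbitrary.
#include <hip/hip_runtime.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int32_t num_elements = 1024;
    std::vector<int32_t> cur(num_elements), err(num_elements), out(num_elements);
    for (int32_t i = 0; i < num_elements; ++i) {
        cur[i] = i;
        err[i] = (i % 7 == 0) ? 1 : 0;  // arbitrary error pattern for testing
    }

    int32_t *d_cur, *d_err, *d_out;
    hipMalloc((void**)&d_cur, num_elements * sizeof(int32_t));
    hipMalloc((void**)&d_err, num_elements * sizeof(int32_t));
    hipMalloc((void**)&d_out, num_elements * sizeof(int32_t));
    hipMemcpy(d_cur, cur.data(), num_elements * sizeof(int32_t), hipMemcpyHostToDevice);
    hipMemcpy(d_err, err.data(), num_elements * sizeof(int32_t), hipMemcpyHostToDevice);

    // Same launch configuration as the quoted launch example
    constexpr int32_t BLOCK_SIZE = 256;
    const int32_t num_blocks = (num_elements + BLOCK_SIZE - 1) / BLOCK_SIZE;
    const size_t shared_mem_size = (BLOCK_SIZE + (BLOCK_SIZE / 64)) * sizeof(int32_t);
    hipLaunchKernelGGL(compress_context_indices, dim3(num_blocks), dim3(BLOCK_SIZE),
                       shared_mem_size, 0, d_cur, d_err, d_out, num_elements);
    hipDeviceSynchronize();
    hipMemcpy(out.data(), d_out, num_elements * sizeof(int32_t), hipMemcpyDeviceToHost);

    // Scalar reference: the if/else rule from the original post.
    // The last element is skipped because the spec leaves its behavior ambiguous.
    int mismatches = 0;
    for (int32_t i = 0; i + 1 < num_elements; ++i) {
        const int32_t expected = (err[i] != 0) ? cur[i + 1] : cur[i];
        if (out[i] != expected) ++mismatches;
    }
    std::printf("mismatches: %d\n", mismatches);

    hipFree(d_cur);
    hipFree(d_err);
    hipFree(d_out);
    return 0;
}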
Maybe this happened because every response is randomized, but I got a wrong proposal. In order to clear any doubts, I have made the chat public so you can see it for yourself: share/67ec737e-3dec-800a-b6a1-643407f6dfbe
This topic is about the API. The API doesn’t have a “chat share” feature (so we know you are talking about ChatGPT), but one could instead share a preset, and if one doesn’t want code written with randomly-chosen tokens that are below the best generation path, it also offers control of the temperature and top_p.
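For reference, this is roughly what pinning those sampling controls looks like in a direct API call. It is a minimal sketch using libcurl; the model name, prompt text, and near-zero temperature/top_p values are illustrative assumptions, not something taken from the posts above.

// Sketch: chat completions request with temperature and top_p pinned low.
#include <curl/curl.h>
#include <cstdio>
#include <cstdlib>
#include <string>

int main() {
    const char* key = std::getenv("OPENAI_API_KEY");
    if (!key) {
        std::fprintf(stderr, "OPENAI_API_KEY is not set\n");
        return 1;
    }

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    const std::string auth = std::string("Authorization: Bearer ") + key;
    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    headers = curl_slist_append(headers, auth.c_str());

    // temperature/top_p near zero keep sampling close to the greedy path.
    // Model name and prompt are placeholders for illustration.
    const char* body = R"({
        "model": "gpt-4.5-preview",
        "messages": [{"role": "user", "content": "Propose the RDNA 3 compression kernel."}],
        "temperature": 0,
        "top_p": 0.01
    })";

    curl_easy_setopt(curl, CURLOPT_URL, "https://api.openai.com/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

    const CURLcode res = curl_easy_perform(curl);  // response JSON is written to stdout by default

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return (res == CURLE_OK) ? 0 : 1;
}

With both values near zero, repeated calls should stay close to the highest-probability generation path, which makes code-quality comparisons like the one above more reproducible.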
One quality I’ve noticed is that gpt-4.5 just doesn’t go beyond the input it has seen into creativity, or even infer meanings and produce highly original output. If you give it examples of the style, or a possible schema, it will just recite those suggestions, without thought toward the ideal application implementation you actually desire.
GPT-4.5 has been incredible for writing and editing. It stays focused, doesn’t try to embellish or overwrite, and respects the tone I’m going for.
The prose feels more deliberate. Pacing, structure, rhythm all land better. When I’m editing, it suggests changes that read like I wrote them.
I really hope it sticks around.
How do I get more memory storage for 4.5?
GPT-4.5 is key for our use case of generating content based on world knowledge for our learning platform, so broader world knowledge and fewer hallucinations matter. Take a look at The History of Science, made with GPT-4.5. I would be very disappointed if it were discontinued. Thank you! Maurice.