Here's the exact use case, environment, and test setup I used, so anyone can reproduce it and see the difference themselves.
Goal: Compare gpt-5 (non-Pro) vs gpt-4.1 for generating maintainable Google Apps Script code.
Prompt:
Write a Google Apps Script function to find the first empty row in a sheet based on column A. The code should be simple, reliable, and easy to maintain.
Environment:
API endpoint: chat.completions
Models tested: “gpt-5” and “gpt-4.1”
Temperature: 0.2
Max tokens: default
No system message (pure API output)
Same Google Workspace environment for actual execution tests
Observed Results:
gpt-4.1 Output:
Returned a minimal solution: getLastRow() plus a straightforward for loop checking column A.
No unnecessary abstractions.
Ran successfully on first attempt.
Easy to read, easy to modify.
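For reference, the gpt-4.1 answer boiled down to something like the sketch below. This is from memory, not the verbatim model output, and the column values are passed in as a plain array (standing in for `getRange(...).getValues()`) so the logic can be exercised outside Apps Script:

```javascript
// Minimal gpt-4.1-style approach: scan column A top-down and return the
// first 1-based row whose cell is empty. `columnA` stands in for
// sheet.getRange("A1:A" + sheet.getLastRow()).getValues() flattened to
// a plain array, so this sketch runs outside Apps Script too.
function findFirstEmptyRow(columnA) {
  for (let i = 0; i < columnA.length; i++) {
    if (columnA[i] === "" || columnA[i] === null) {
      return i + 1; // Sheets rows are 1-based
    }
  }
  return columnA.length + 1; // no gap: first empty row is after the data
}

// In a real Apps Script project the wrapper would look roughly like:
// function findFirstEmptyRowInSheet(sheet) {
//   const values = sheet
//     .getRange("A1:A" + sheet.getLastRow())
//     .getValues()
//     .map(function (r) { return r[0]; });
//   return findFirstEmptyRow(values);
// }
```

No caching, no state, nothing to invalidate: the whole thing can be read and modified in one sitting, which is the point.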
gpt-5 Output:
Introduced a “UUID indexing” system — essentially generating and caching unique IDs for rows in CacheService.
Added unnecessary complexity such as parsing JSON from cache and “rebuilding” the UUID index if missing.
This over-engineered approach makes sense in large-scale enterprise DB contexts, but for a basic “find first empty row” task it’s not just overkill — it adds new failure points (cache expiry, parsing errors, unnecessary re-indexing).
The code was more brittle, harder to debug, and didn't execute correctly on the first run without adjustments.
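To make those failure points concrete, here is an illustrative reconstruction of the pattern gpt-5 produced. This is not its verbatim output: CacheService is mocked with a plain object (and Utilities.getUuid() with a counter) so the sketch runs outside Apps Script, but the shape — a per-row UUID index kept in cache, a rebuild branch on cache miss, and a JSON round-trip — matches what it generated:

```javascript
// Illustrative reconstruction of the gpt-5-style approach, NOT the verbatim
// output. The mock cache stands in for Apps Script's CacheService, which
// expires entries and only stores strings.
const mockCache = {
  store: {},
  get(key) { return key in this.store ? this.store[key] : null; },
  put(key, value) { this.store[key] = String(value); },
};

function findFirstEmptyRowViaUuidIndex(columnA, cache) {
  let index;
  const raw = cache.get("rowUuidIndex");
  if (raw === null) {
    // Failure point 1: cache expiry forces a full rebuild on any miss.
    index = columnA.map(function (value, i) {
      return {
        row: i + 1,
        uuid: "uuid-" + i, // stand-in for Utilities.getUuid()
        empty: value === "" || value === null,
      };
    });
    cache.put("rowUuidIndex", JSON.stringify(index));
  } else {
    // Failure point 2: JSON.parse throws on any corrupted cache entry.
    index = JSON.parse(raw);
    // Failure point 3: a stale index silently disagrees with the sheet.
  }
  const hit = index.find(function (entry) { return entry.empty; });
  return hit ? hit.row : columnA.length + 1;
}
```

Every extra moving part here (cache key, JSON round-trip, rebuild branch, staleness) is a place the simple loop above simply cannot fail, because it has no equivalent.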
Why This Matters:
UUID indexing is a method to assign globally unique identifiers to data items, often used in distributed systems. Here, GPT-5 decided to implement it inside a simple Google Sheet row search — solving a scalability problem that doesn’t exist in this context.
The result was more code, more complexity, and lower reliability compared to GPT-4.1’s pragmatic, just-enough approach.
Reproducibility:
Anyone with API access can run the exact same prompt above, swap the model name between “gpt-5” and “gpt-4.1”, and compare outputs line by line. You’ll see the same pattern: GPT-5 over-complicates, GPT-4.1 stays lean and effective.
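As a sketch of the harness, this builds the two request bodies with everything except the model name held constant, per the setup above. Request shapes only; actually POSTing them to the chat.completions endpoint needs your own API key and client of choice:

```javascript
// Builds the two chat.completions request bodies used in the comparison.
// Only the model name differs between the two.
const PROMPT =
  "Write a Google Apps Script function to find the first empty row in a " +
  "sheet based on column A. The code should be simple, reliable, and easy " +
  "to maintain.";

function buildRequest(model) {
  return {
    model: model,
    temperature: 0.2,
    // No system message, so the output is the model's default behavior.
    messages: [{ role: "user", content: PROMPT }],
  };
}

const requests = ["gpt-5", "gpt-4.1"].map(buildRequest);
// POST each body to the chat.completions endpoint with your API key,
// then diff the two responses line by line.
```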
This is why I'm raising it: it's not about "GPT-5 bad," it's about the right tool for the job. And for this job, GPT-4.1 wins.