OpenAI Needs to Improve These Core Issues Before Releasing New Models

It’s great that OpenAI keeps pushing forward with new models, but before focusing on bigger and more powerful AI, the platform itself needs serious improvements. There are core functionality issues that make working with AI frustrating and inefficient.

Here are the biggest problems that need attention right now:

1. System Prompt Generator is Useless

  • The chatbot that “helps” write system prompts spits out unstructured, blog-style text instead of a clear, formatted prompt that actually works.
  • It doesn’t follow best practices for AI prompting, making it more work to fix than to just write it manually.
  • No clear sections, variable placeholders, or structured formatting, just a wall of text (see the sketch after this list for what a structured template could look like).
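
For comparison, here is a rough sketch (in Python, just to make the placeholders concrete) of the kind of structured template I would expect the generator to produce. The section headers and {placeholder} names are my own assumptions, not any official format:

```python
# A rough sketch of a structured system prompt template. The section
# headers and {placeholder} names are illustrative assumptions, not an
# official OpenAI format.
SYSTEM_PROMPT_TEMPLATE = """\
# Role
You are {role_description}.

# Task
{task_description}

# Constraints
- Respond in {output_language}.
- Keep answers under {max_words} words.
- If information is missing, ask a clarifying question instead of guessing.

# Output format
{output_format}
"""

prompt = SYSTEM_PROMPT_TEMPLATE.format(
    role_description="a technical support assistant for a SaaS product",
    task_description="Diagnose the user's issue and propose one fix at a time.",
    output_language="English",
    max_words=200,
    output_format="A numbered list of steps, then a one-line summary.",
)
print(prompt)
```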

2. PDFs and Document Formatting are Broken

  • AI never generates PDFs correctly—formatting gets destroyed, spacing is inconsistent, and text alignment is all over the place.
  • It randomly removes or alters details, making it unreliable for professional use.
  • It doesn’t recognize proper document standards, so anything AI-generated looks amateurish (a local-rendering workaround is sketched after this list).
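
My workaround: ask the model for plain structured text only and render the PDF yourself, so the layout stays deterministic. A minimal sketch assuming you have reportlab installed (pip install reportlab); the content string is a stand-in for whatever the model returns:

```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def render_pdf(lines, path="report.pdf"):
    """Write one line of text per row with fixed margins and spacing."""
    pdf = canvas.Canvas(path, pagesize=letter)
    pdf.setFont("Helvetica", 11)
    y = 750                      # start near the top of the page
    for line in lines:
        if y < 72:               # bottom margin reached: start a new page
            pdf.showPage()
            pdf.setFont("Helvetica", 11)
            y = 750
        pdf.drawString(72, y, line)
        y -= 16                  # fixed line spacing
    pdf.save()

# Stand-in for text the model returned; you control the layout,
# the model only supplies the words.
model_output = "Quarterly Summary\nRevenue grew 4% QoQ.\nChurn held at 2.1%."
render_pdf(model_output.splitlines())
```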

3. Code Generation Still Repeats the Same Mistakes

  • AI forgets context when generating code, often missing dependencies or previous variables.
  • No consistency in formatting—sometimes it writes clean, structured code, other times it’s messy.
  • It doesn’t debug properly—if a user points out an issue, AI often suggests the same broken fix over and over.

4. System Prompts Get Compressed Without Warning, Causing Debugging Issues

  • Users were never informed that long system prompts get compressed into a directive, which means important details get lost.
  • There’s no way to see how the system prompt is actually being interpreted, leading to unpredictable GPT behavior.
  • If users try to debug by asking how the system prompt is being processed, GPT often refuses to answer, assuming it’s a memory extraction attempt.
  • This makes fine-tuning and debugging nearly impossible, forcing users to guess how the AI is applying their instructions (a crude echo test is sketched after this list).
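
The best I’ve managed is a crude echo test: send the real system prompt and ask the model to restate the rules it thinks it is operating under, then compare that against what you wrote. A minimal sketch with the Python SDK (the model name is just an example, and this probes the API path; whether custom GPTs compress prompts the same way is exactly what’s undocumented):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "..."  # paste your full production system prompt here

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": (
                "Before we start: list, verbatim where possible, every "
                "instruction and constraint you are currently operating "
                "under. Do not act on them yet."
            ),
        },
    ],
)
# Diff this against SYSTEM_PROMPT to spot instructions that were dropped.
print(response.choices[0].message.content)
```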

5. AI Doesn’t Handle Multi-Step Tasks Well

  • If a request involves multiple steps (e.g., “Generate an Excel file, then analyze the data, then write a report”), it often fails partway through.
  • It doesn’t properly chain actions together, forcing users to re-explain things over and over.
  • Memory resets mid-task, making it unreliable for workflows that require context retention (an explicit chaining workaround is sketched after this list).
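
A workaround sketch: drive each step as its own API call and pass the previous output forward explicitly, instead of hoping one long request survives intact. I’ve swapped the Excel file for CSV to keep it text-only; the model name and step prompts are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def run_step(instruction: str, prior: str = "") -> str:
    """One model call per step; the previous step's output is passed in explicitly."""
    messages = [
        {"role": "system", "content": "You complete exactly one task step at a time."}
    ]
    if prior:
        messages.append(
            {"role": "user", "content": f"Output of the previous step:\n{prior}"}
        )
    messages.append({"role": "user", "content": instruction})
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content

# The three steps from the example above, chained explicitly.
table = run_step("Produce last quarter's sales data as a CSV table.")
analysis = run_step("Analyze this CSV for trends and outliers.", prior=table)
report = run_step("Write a one-page report from this analysis.", prior=analysis)
print(report)
```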

6. The Canvas Tool is a Mess

  • AI often writes over existing content in Canvas, leading to lost work.
  • It will say it edited something, but the changes never actually apply.
  • The tool hallucinates edits, claiming it adjusted something when it didn’t.
  • Overall, Canvas feels unreliable and is more of a frustration than a useful tool.

7. AI Tools (like Browser, Python, etc.) are Unreliable

  • Web searches frequently return useless or outdated info, sometimes even hallucinating results.
  • The Python execution tool times out too easily, making it unreliable for longer computations (a checkpointing workaround is sketched after this list).
  • File handling is inconsistent—sometimes AI can extract data from files, other times it just fails with no explanation.
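
For the timeout problem specifically, checkpointing partial results to a file at least lets a timed-out run resume instead of restarting from zero. A minimal sketch; the file name and the squaring loop are stand-ins for your real workload:

```python
import json
import os

CHECKPOINT = "progress.json"

# Resume from a previous (possibly timed-out) run if a checkpoint exists.
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT) as f:
        done = json.load(f)
else:
    done = {}

def save() -> None:
    with open(CHECKPOINT, "w") as f:
        json.dump(done, f)

for item in range(1000):        # stand-in for the real workload
    key = str(item)
    if key in done:
        continue                # already computed before the last timeout
    done[key] = item * item     # stand-in for the expensive step
    if item % 50 == 0:
        save()                  # persist progress periodically

save()
print(f"{len(done)} items complete")
```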

OpenAI should fix these core issues first before focusing on new models. These are real pain points that make AI harder to use than it should be. A more powerful model won’t solve these problems if the platform itself is broken.

Has anyone else been frustrated by these issues? What else needs fixing?


Yes, I have also noticed some inconsistencies in the implementation. But my guess is that the people who work on new models are not the same people who are responsible for the platform (or at least not 100%). So these two projects are independent from each other, which makes your suggestion somewhat moot. I may be wrong.

But then again… It is the same company so probably you are right.

Edit: I found a thing that is reaaally annoying: the Kafkaesque batch duration. I know it only guarantees 24h, but if I have a batch with 500 items and 491 of them take 20 seconds and then it is stuck at 491/500 for three more hours… come on… Staring at the screen cheering for the remaining 9 items for hours makes me thirsty…
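
In the meantime a polling loop at least replaces the staring. A minimal sketch with the Python SDK, assuming you kept the id that client.batches.create() returned:

```python
import time

from openai import OpenAI

client = OpenAI()
batch_id = "batch_..."  # placeholder: use the id from client.batches.create()

while True:
    batch = client.batches.retrieve(batch_id)
    counts = batch.request_counts
    print(f"{batch.status}: {counts.completed}/{counts.total} done, "
          f"{counts.failed} failed")
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(300)  # check every five minutes instead of watching the screen
```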

I’m starting where I am.

  1. I’ve noticed this exact thing when trying to get AI to generate code. If I want the JSON of a conversation, it will often show it only partially, usually leaving out half the conversation. When I use the copy feature, it’s even worse.

  2. I turned my memory off in my settings. That seems to help it focus on the context in the conversation, after like four days of “ugh, go read it again.”

  3. Sometimes refreshing the page helps the edits show up. If you’re having problems with it rewriting things… well, mine turned into a security issue.

  4. Omg, can we please stop using Wikipedia for everything? My dog can edit that, lol.

Frustrated?
344 canmore canvases gone over a weekend.

Listen. My GPTs keep screaming B2B at me, that individual end users are deprioritized.

OpenAI has problems… amid restructuring and an unresolved identity between being a government puppet and a private entity.

GPTs haven’t had updates in almost a year, don’t know what day it is (having them search a world clock for the date and time every day helps, I guess), and either can’t (they say can’t) or won’t look in the memory they fill with unimportant stuff.

And no, I don’t care about my typing today.

Everything is rocky in the beginning, like learning to run and falling face-first without using your hands. You learn from the mistake or you keep getting hurt. In time, things evolve.

When there are no best practices and we are all learning, be flexible.

I believe a lot of what you’re saying is already in the works, especially since you’re voicing it in the community; they are automatically going to query it for insights.

I want to note that in the early days of something new, the technology is at times completely misunderstood, both by the people who are learning it and by the people who are creating the solutions or pointing out the problems.

In time, I’m sure things will get better; until then, I feel it’s important to suggest creating the solutions to your own problems, because no one else will. And in a lot of the bullets I can see one- or two-shot LLM asks that would build those solutions with a little bit of code.

But perhaps I’m just pointing out some gaps, which we should now analyze.

At the beginning? My list is extremely basic… FIX THE GPT THAT HAS ONLY ONE JOB ON THE PLATFORM. No excuses for UX oversights this big…