This really bugs me: I wanted to modernize some tools I sell to my clients (at their explicit request), and I had to revert to gpt-4* models.
It turns out that gpt-5* models do not follow instructions given in system prompts, whereas gpt-4* models obey the request strictly.
Given this prompt for a front-end developer tool:
"frontend_dev": {
"title": "🎨 Frontend Developer",
"prompt": "You are a Frontend Developer.\nYou will receive the Strategic Plan.\nGenerate all HTML, JS, and CSS files specified by the Strategist in its 'structure_navigation' array; DO NOT collapse them into a single page unless explicitly indicated. NEVER use markdown formatting for any reason.\n\n### OUTPUT STRUCTURE\nEach file must be MANDATORY isolated using:\nFile: [filename.ext]\n---- START OF FILE ----\n(complete file content)\n---- END OF FILE ----\n\n### RULES\n- Create one complete .html file for every page listed in 'structure_navigation'. Include consistent navigation (navbar/footer) across pages.\n- Whenever user interaction requires data persistence, always communicate with PHP endpoints using Fetch calls, never with localStorage. Only use localStorage for ephemeral UI state (temporary session state, theme mode, etc.).\n- Each fetch() call must include `{ cache: 'no-store' }` and send/receive JSON.\n- All backend requests must target the individual PHP files named accordingly (e.g. save_data.php, update_data.php...). Never call api.php or a single monolithic endpoint.\n- Do not create dummy JS that simulates backend via localStorage.\n- You may create external .js and .css files placed in the same directory as index.html (MANDATORY: no 'css/' or 'js/' subfolders even if indicated by the Strategist).\n- Add version‑cache busting to every link or script with PHP:\n `<?php $version = time(); ?>`\n Example:\n `<script src=\"script.js?v=<?php echo($version); ?>\"></script>`\n- All pages must include correct meta tags, responsive layout, and working navigation links between them.\n- Use realistic copy; no lorem ipsum.\n\n### OUTPUT FORMAT\n---- START OF FRONTEND SECTION ----\n(Include all files, each wrapped with delimiters above)\n---- END OF FRONTEND SECTION ----",
"max_tokens": 19000
},
The gpt-5* models variously ignore the output instructions. Sometimes they generate a
---- START OF FRONTEND SECTION ----
for each file (index.html, style.css, etc.), and sometimes they enclose the
---- START OF FRONTEND SECTION ----
...
---- END OF FRONTEND SECTION ----
correctly but omit some
---- START OF FILE ----
or some
---- END OF FILE ----
markers. Since the tool then uses these markers (or, more precisely, the presence of "----" at the start of a line) to create the real files in a folder structure, the work is compromised.
function parseFilesFromCode(codeText) {
  const files = [];
  const lines = codeText.split("\n");
  let currentFile = null;
  let contentBuffer = "";
  let insideCodeBlock = false;
  for (let line of lines) {
    // Match a file header like "File: index.html" (optionally prefixed with markdown #s)
    const match = line.match(/^(?:#+\s*)?(?:File|FILE)[:\s-]+(.+\.[a-z0-9]+)/i);
    if (match) {
      if (currentFile && contentBuffer.trim()) {
        files.push({
          name: currentFile,
          content: contentBuffer.trim(),
        });
        contentBuffer = "";
      }
      currentFile = match[1].trim();
      insideCodeBlock = false;
      continue;
    }
    // Skip the ---- START/END OF FILE ---- delimiter lines themselves
    if (line.trim().startsWith("----")) {
      continue;
    }
    // Tolerate markdown fences the model was told never to emit
    if (line.trim().startsWith("```")) {
      insideCodeBlock = !insideCodeBlock;
      continue;
    }
    // Capture content whenever a file is open (guarding on insideCodeBlock alone
    // would drop unfenced output, which is the mandated format)
    if (currentFile) {
      contentBuffer += line + "\n";
    }
  }
  if (currentFile && contentBuffer.trim()) {
    files.push({
      name: currentFile,
      content: contentBuffer.trim(),
    });
  }
  return files;
}
I have tried gpt-5-nano, gpt-5.4-mini, and gpt-5.4-nano, and the result is the same. These models cannot be trusted to follow simple instructions.
For the sake of completeness, here is my callOpenAI function:
async function callOpenAI(apiKey, rolePrompt, userMessage, maxTokens = 4000) {
  // `model` is assumed to be set elsewhere (module-level variable)
  const url = "https://api.openai.com/v1/chat/completions";
  let maxTokensParam = {};
  let extraParams = {};
  if (model.startsWith("gpt-4")) {
    maxTokensParam = { max_tokens: maxTokens };
  } else if (model.startsWith("gpt-5")) {
    maxTokensParam = { max_completion_tokens: maxTokens };
    extraParams = { reasoning_effort: "medium" };
  }
  const payload = {
    model: model,
    messages: [
      { role: "system", content: rolePrompt },
      { role: "user", content: userMessage }
    ],
    ...maxTokensParam,
    ...extraParams
  };
  console.log(payload);
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`
    },
    body: JSON.stringify(payload)
  });
  const data = await res.json();
  const content = data.choices?.[0]?.message?.content?.trim() || "No response.";
  const tokens = data.usage ? data.usage.total_tokens : 0;
  return { content, tokens };
}
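The model-dependent parameter switch can be checked in isolation without hitting the network; a minimal sketch (buildTokenParams is my own name for the extracted logic, not part of any API):

```javascript
// Hypothetical helper isolating the model-dependent request parameters
// from callOpenAI above; the function name is illustrative only.
function buildTokenParams(model, maxTokens) {
  if (model.startsWith("gpt-4")) {
    // gpt-4* models take the legacy max_tokens parameter
    return { max_tokens: maxTokens };
  }
  if (model.startsWith("gpt-5")) {
    // gpt-5* models take max_completion_tokens and accept reasoning_effort
    return { max_completion_tokens: maxTokens, reasoning_effort: "medium" };
  }
  // Fall back to the legacy parameter for any other model name
  return { max_tokens: maxTokens };
}

console.log(buildTokenParams("gpt-4o", 4000));
console.log(buildTokenParams("gpt-5-nano", 19000));
```

Factoring this out also makes it easy to log exactly which token/effort parameters were sent when a run misbehaves.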
This is very disappointing. It seems the newer the model, the worse it follows instructions.


