So, for the first time in several years, I can no longer say whether the original article, which I’ve adapted more and more over time, is still correct.
GPT-5.1 Architecture: What Changed and How to Build For It
The Core Shift
GPT-5.1 fundamentally changed when Custom GPT files are evaluated. Previously, files were loaded and merged before the first user message. Now they’re loaded after the model has already generated its initial response.
Old flow (GPT-4/4o/5):
System Instructions → Load Files → Merge → First Response
New flow (GPT-5.1):
System Instructions → First Response → Files inform subsequent behavior
This single change cascades through every Custom GPT limit and workaround. Understanding this is understanding GPT-5.1.
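As a toy model of the two orderings above (a sketch of the behavior described in this post, not OpenAI's actual pipeline; the stage names are made up for illustration), the difference comes down to whether file loading sits before or after the first response:

```python
from typing import List

# Stage orderings as described in this post. These names are illustrative
# assumptions, not real internals.
OLD_FLOW: List[str] = ["system_instructions", "load_files", "merge", "first_response"]
NEW_FLOW: List[str] = ["system_instructions", "first_response", "load_files"]

FILE_STAGE = "load_files"
RESPONSE_STAGE = "first_response"


def files_ready_for_first_response(flow: List[str]) -> bool:
    """Return True if files are loaded before the first response in this flow."""
    return flow.index(FILE_STAGE) < flow.index(RESPONSE_STAGE)


print(files_ready_for_first_response(OLD_FLOW))  # True
print(files_ready_for_first_response(NEW_FLOW))  # False
```

Anything that needs `files_ready_for_first_response` to be true is what the next section lists as broken.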
What This Breaks
Any pattern that relied on files being ready before the first response is now broken:
- Pre-conversation interceptors (access gates, token validation)
- Initialization logic that should run first
- Sequential instruction chains where order matters
- Context-restoration across chats (files load too late)
Why? Because GPT-5.1 optimizes for time to first token. File loading is deferred to reduce latency.
The Architecture: How to Build Now
There are three layers in a Custom GPT under GPT-5.1. Use them correctly and all limits become manageable.
Layer 1: System Prompt (Critical)
This runs before any response. Put the following here:
- Authentication logic
- Access control
- Behavioral constraints
- Output format rules
- Anything that must execute first
The system prompt is your hard guarantee. It’s the only layer that runs before the model responds.
Layer 2: Conversation Starter (Initialization)
This is prepended to the first user message, visible before response generation. Use it for:
- Complex initialization sequences
- Full API schemas
- Multi-step workflows
- Detailed behavioral instructions
The conversation starter has a 55,000-character limit. It’s your pre-response workspace.
Layer 3: Uploaded Files (Reference + Knowledge)
These load after the initial response. They’re useful for:
- Reference documentation
- Examples and training data
- Knowledge bases
- Non-critical context
Files are not for controlling behavior on first contact.
The Pattern: Three Rules
- Criticality lives in the system prompt. If it must happen before the first response, it goes here.
- Complexity lives in the conversation starter. If you have more than 8000 characters of instructions, use the conversation starter (55,000 char limit) instead of trying to split across files.
- Knowledge lives in files. Reference material, examples, and context can remain in uploaded files because they don’t need to execute first.
Violate these and you’ll see GPT-5.1’s file-loading behavior break your logic. Follow them and you’re working with the architecture, not against it.
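One way to apply the three rules mechanically is to tag each block of content and let a small function pick the layer. This is only a sketch: the `Block` fields and the `assign_layer` helper are hypothetical, and the rule that every non-critical, non-reference block goes to the conversation starter is a simplification of the rules above.

```python
from dataclasses import dataclass

INSTRUCTIONS_LIMIT = 8_000  # system prompt limit cited in this post
STARTER_LIMIT = 55_000      # conversation starter limit cited in this post


@dataclass
class Block:
    name: str
    text: str
    must_run_first: bool  # has to apply before the first response
    is_reference: bool    # docs, examples, knowledge bases


def assign_layer(block: Block) -> str:
    """Pick a layer: criticality first, then knowledge, then complexity."""
    if block.must_run_first:
        if len(block.text) > INSTRUCTIONS_LIMIT:
            raise ValueError(f"{block.name}: too long for the system prompt")
        return "system_prompt"
    if block.is_reference:
        return "files"
    if len(block.text) > STARTER_LIMIT:
        raise ValueError(f"{block.name}: too long for the conversation starter")
    return "conversation_starter"


blocks = [
    Block("access gate", "Ask for the access code before anything else.", True, False),
    Block("api routing", "x" * 20_000, False, False),
    Block("product faq", "Q: ... A: ...", False, True),
]
for b in blocks:
    print(b.name, "->", assign_layer(b))
```

The `ValueError` cases are deliberate: a critical block that does not fit in the system prompt has to be shortened, not moved to a later layer.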
Why Your Workarounds Are Breaking
Large GPTs with many files stop working because GPT-5.1’s file-merging logic kicks in when there’s significant content to process. The model defers file evaluation, your initialization logic never runs before the first response, and users see default behavior instead of your intended behavior.
Small GPTs still work because minimal content loads fast enough that files are available for the first response.
File consolidation fixes this because it reduces the merge workload, allowing files to load in time.
The Concrete Limits (Unchanged)
- Instructions: 8000 characters (move excess to conversation starter)
- Action slots: 10 (unchanged)
- API endpoints per slot: 30 (unchanged)
- File size: ≤1 MB (unchanged)
- Total file size: 512 MB (unchanged)
- File count: 20 (keep low to avoid merge delays)
- Chat length: ~500 KB (start a new chat when reached)
- Conversation starter: 55,000 characters (now your primary tool for complex logic)
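The limits in the list above can be checked before you paste anything into the GPT builder. The following is a rough sketch: the numbers are the ones quoted in this post, `GptConfig` and `check_limits` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption. Chat length is left out because it is a runtime property, not a configuration value.

```python
from dataclasses import dataclass, field
from typing import List

# Limits as quoted in this post; byte sizes assume 1 MB = 1024 * 1024 bytes.
LIMITS = {
    "instructions_chars": 8_000,
    "starter_chars": 55_000,
    "action_slots": 10,
    "endpoints_per_slot": 30,
    "file_bytes": 1 * 1024 * 1024,
    "total_file_bytes": 512 * 1024 * 1024,
    "file_count": 20,
}


@dataclass
class GptConfig:
    instructions: str = ""
    conversation_starter: str = ""
    endpoints_per_slot: List[int] = field(default_factory=list)  # one entry per action slot
    file_sizes: List[int] = field(default_factory=list)          # bytes, one entry per file


def check_limits(cfg: GptConfig) -> List[str]:
    """Return a human-readable list of limit violations (empty if none)."""
    problems: List[str] = []
    if len(cfg.instructions) > LIMITS["instructions_chars"]:
        problems.append("instructions too long: move excess to the conversation starter")
    if len(cfg.conversation_starter) > LIMITS["starter_chars"]:
        problems.append("conversation starter too long")
    if len(cfg.endpoints_per_slot) > LIMITS["action_slots"]:
        problems.append("too many action slots")
    for i, count in enumerate(cfg.endpoints_per_slot, start=1):
        if count > LIMITS["endpoints_per_slot"]:
            problems.append(f"action slot {i} has too many endpoints")
    if len(cfg.file_sizes) > LIMITS["file_count"]:
        problems.append("too many files")
    for i, size in enumerate(cfg.file_sizes, start=1):
        if size > LIMITS["file_bytes"]:
            problems.append(f"file {i} is larger than the per-file limit")
    if sum(cfg.file_sizes) > LIMITS["total_file_bytes"]:
        problems.append("total file size too large")
    return problems


print(check_limits(GptConfig(instructions="x" * 9_000, file_sizes=[2_000_000])))
```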
Context persistence between chats remains impossible without external storage (Pinecone, vector DB, SQL). GPT-5.1 changes how files load but not this fundamental limitation.
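For the SQL option, the storage side can be very small. Below is a minimal sketch using Python's built-in `sqlite3`; the table layout and the `open_store`, `save_note`, and `load_notes` helpers are assumptions for illustration. In practice a Custom GPT would reach this through an action that calls your own backend, which is not shown here.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the note store. ':memory:' is for demonstration only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "user_id TEXT NOT NULL, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "note TEXT NOT NULL)"
    )
    return conn


def save_note(conn: sqlite3.Connection, user_id: str, note: str) -> None:
    """Persist one piece of context for a user."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO notes (user_id, note) VALUES (?, ?)", (user_id, note))


def load_notes(conn: sqlite3.Connection, user_id: str, limit: int = 20) -> List[str]:
    """Return the most recent notes for a user, oldest first."""
    rows = conn.execute(
        "SELECT note FROM notes WHERE user_id = ? ORDER BY rowid DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in reversed(rows)]


conn = open_store()
save_note(conn, "user-1", "Prefers short answers")
save_note(conn, "user-1", "Working on invoice export")
print(load_notes(conn, "user-1"))
```

A new chat would start by loading these notes through the action and feeding them back into the conversation.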
Implementation
Step 1: Move critical logic to system prompt
- What must happen before the first response?
- Put it here in high-priority, unambiguous language.
Step 2: Use conversation starter for the rest
- Complex initialization that doesn’t fit the system prompt.
- Full API schemas and routing logic.
- Detailed behavioral trees.
- Anything that needs to be visible before response generation.
Step 3: Keep files minimal
- Consolidate into 2-3 larger files instead of 20 small ones.
- Use files for reference knowledge only.
- Do not use files to control behavior on first contact.
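Consolidation can be scripted. The sketch below concatenates plain-text files into as few bundles as fit under a size cap; the `consolidate` helper, the bundle file names, and the separator header are all hypothetical, and the 1,000,000-byte default is a conservative reading of the per-file limit mentioned earlier. It only makes sense for text files.

```python
from pathlib import Path
from typing import List

MAX_BYTES = 1_000_000  # conservative reading of the "≤1 MB" per-file limit


def consolidate(sources: List[Path], out_dir: Path, max_bytes: int = MAX_BYTES) -> List[Path]:
    """Concatenate text files into as few bundles as fit under max_bytes each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    bundles: List[Path] = []
    current: List[bytes] = []
    size = 0

    def flush() -> None:
        """Write the pending chunks to the next bundle file."""
        nonlocal current, size
        if current:
            target = out_dir / f"bundle_{len(bundles) + 1}.txt"
            target.write_bytes(b"".join(current))
            bundles.append(target)
            current, size = [], 0

    for src in sorted(sources):
        # A header keeps the original file name visible inside the bundle.
        header = f"\n===== {src.name} =====\n".encode("utf-8")
        chunk = header + src.read_bytes()
        if len(chunk) > max_bytes:
            raise ValueError(f"{src.name} alone exceeds {max_bytes} bytes")
        if size + len(chunk) > max_bytes:
            flush()
        current.append(chunk)
        size += len(chunk)
    flush()
    return bundles
```

Usage would look like `consolidate(list(Path("knowledge").glob("*.md")), Path("bundles"))`, where both directory names are placeholders.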
Step 4: Test small-to-large
- Build with system prompt only. Does it work?
- Add conversation starter. Still works?
- Add one file. Any behavior change?
- Add more files gradually. Where does it break?
This tells you the threshold where GPT-5.1’s file processing delays your logic.
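The small-to-large procedure is an incremental search, which can be written down as follows. This is a sketch under assumptions: `works` stands for whatever check you use (most likely opening a fresh chat by hand and noting whether the first response behaves as intended), and the stand-in check at the bottom is invented purely to show the loop running.

```python
from typing import Callable, List, Optional, Sequence


def first_breaking_step(
    steps: Sequence[str], works: Callable[[List[str]], bool]
) -> Optional[str]:
    """Add components one at a time; return the first one after which `works` fails."""
    active: List[str] = []
    for step in steps:
        active.append(step)
        if not works(list(active)):
            return step
    return None  # nothing broke


steps = ["system_prompt", "conversation_starter", "file_1", "file_2", "file_3", "file_4"]


def pretend_check(active: List[str]) -> bool:
    """Stand-in for a manual test: pretends behavior breaks at more than 3 files."""
    return sum(1 for s in active if s.startswith("file_")) <= 3


print(first_breaking_step(steps, pretend_check))  # file_4
```

Run the real check more than once per step, since a single chat may not be representative.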
Why This Thread Matters
@benjamin.jurg, @jochenschultz, and @srinimaverick were all running into the same problem from different angles. The concrete limits haven’t changed, but when they matter and how to work around them have.
The issue isn’t that Custom GPTs are broken. It’s that GPT-5.1 changed the execution model, and the old workarounds depended on the old execution model. Once you understand the new model, you can build around it reliably.
Resources
- Swagger Editor - Test API schemas
P.S.: Emotional or empathizing frameworks in Custom GPTs are more fragile in GPT-5.1 because initialization logic may not execute before the first response. Personality instructions that should anchor early behavior get deferred. If you’re building AI personas, account for this in your architecture.