Instruction-following quality: developer message placement, instructions, and prompt caching

Hi,

I'm using gpt-4.1 for a conversational chatbot (gpt-5 has been terrible for this purpose in my testing so far).
Until now I've been using the `instructions` parameter of the Responses API.
Then I noticed that when the conversation log gets long (40 messages, for example), it's sometimes impossible to give the model proper instructions: it completely ignores them and continues the flow of the conversation as it sees fit, based on the history so far.

That sent me researching the 'developer' message as a replacement for 'instructions'.

From my tests:
If I replace the instructions with a 'developer' message at the beginning of the conversation, the results are slightly better, but far from perfect.
If I place the 'developer' message at the END of the conversation, after the user's last message, I get really good results.
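A minimal sketch of the two payload shapes I compared, just building the `input` list for the Responses API (`FULL_PROMPT` and `history` here are placeholders for my actual prompt and conversation log):

```python
# Placeholders for the real long prompt and conversation log.
FULL_PROMPT = "You are ... (long prompt with all behavior rules)"
history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
    # ... many more turns ...
    {"role": "user", "content": "latest user message"},
]

dev_msg = {"role": "developer", "content": FULL_PROMPT}

# Variant A: developer message at the beginning -> cache-friendly,
# but partially ignored once the conversation gets long.
input_beginning = [dev_msg, *history]

# Variant B: developer message at the END, after the user's last
# message -> much better instruction-following in my tests.
input_end = [*history, dev_msg]
```

Either list is what gets passed as `input` to the Responses API call; only the position of the developer message changes.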

I'm fine with doing that constantly; it's trivial in code to put it at the end. The issue is caching:
if the developer message comes at the end, it can't be cached, because the cached prefix ends somewhere before it.

And that's a big cost for me, because my prompt is pretty long.
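The caching problem, concretely: prompt caching matches on the common prefix between consecutive calls, so anything appended after the newest user message can never be part of the cached prefix. A toy illustration (`common_prefix_len` is just a stand-in for the prefix matching, not the real caching logic):

```python
def common_prefix_len(prev, curr):
    """Toy stand-in for prefix matching in prompt caching."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n

DEV = {"role": "developer", "content": "long developer prompt"}
turn1 = [{"role": "user", "content": "hi"}]
turn2 = turn1 + [{"role": "assistant", "content": "hello"},
                 {"role": "user", "content": "next question"}]

# Developer message at the beginning: the matched prefix grows
# with the conversation, so the long prompt stays cached.
front = common_prefix_len([DEV, *turn1], [DEV, *turn2])  # 2

# Developer message at the end: the match stops at the first new
# turn, so the long prompt is re-processed (and re-billed) every call.
back = common_prefix_len([*turn1, DEV], [*turn2, DEV])  # 1
```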

Any ideas, anyone?

If I could put the developer message at the beginning and still get good results, that would be great, because it would be cached properly.

Or any other trick to help the model follow instructions properly in long conversations while preserving as much caching as possible.

Thanks!

Update: here's the improvised solution I've found to be sufficient.
I put the developer message at the beginning with the full prompt,
and on each call I also add a developer message at the end with only the short, critical points.
(I don't use the conversation objects; I build the call each time from the messages, so no matter where we are in the conversation, there's always one developer message at the beginning and another short one at the end.)
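In code the call-building looks roughly like this (a sketch; `FULL_PROMPT`, `SHORT_CRITICAL_RULES`, and the history are placeholders for my actual content):

```python
def build_input(history, full_prompt, short_rules):
    """Build the Responses API `input` for each call: the full developer
    prompt first (a stable prefix, so it stays cacheable), the conversation
    in the middle, and a short developer reminder at the very end."""
    return [
        {"role": "developer", "content": full_prompt},
        *history,
        {"role": "developer", "content": short_rules},
    ]

FULL_PROMPT = "Long prompt with all behavior rules ..."
SHORT_CRITICAL_RULES = "Reminder: stay in character; keep answers short."

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
    {"role": "user", "content": "latest user message"},
]

payload = build_input(history, FULL_PROMPT, SHORT_CRITICAL_RULES)
# payload[0] is the full developer prompt (cacheable prefix),
# payload[-1] is the short end-of-conversation reminder.
```

Only the short reminder at the end falls outside the cached prefix, so the cost of re-processing it each call is small.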