GPT assistant: its limitations, or failed instructions?

I’m testing many different approaches and various ideas for GPT assistants.
In my experience, GPT does not execute instructions as a whole. It sometimes starts to hallucinate, and sometimes adds output unrelated to the instructions, such as ethical remarks or anything related to OpenAI’s core guidelines. I’ve tried to work around this, which partially works.

My question is whether I’m chasing a mirage or my approach is wrong. I’m building a complex GPT assistant, with the instructions completely filled out and a knowledge base of roughly 200 pages.
All the instructions, links, and redirections to the knowledge base, including how and when to use it, could not be stated more accurately, clearly, or specifically. I tried to phrase everything so that GPT is left with little room for a speculative approach.
Still, GPT misses or ignores many of the instructions, starts to hallucinate, and adds content that was not requested (which seems related to OpenAI policies).
While I have managed to get amazing output, it completes far from 100% of its tasks.

To me it looks like the 128,000-token context window is not working as advertised, far from it.

In theory, GPT should be able to execute instructions, along with the knowledge base linked to them, spanning up to 300 pages of behavior instructions. In practice, it seems unable to handle even several pages of specific instructions accurately without messing up.

Hi Berus. It’s hard to provide feedback in the abstract, but when I built out my assistant I found it useful to follow an iterative approach. Start with a few instructions, a smaller knowledge base, and limited function calls to get a better feel for how the assistant performs and responds to changes within the context of your specific use case. Then iteratively add more complexity. Like you, I have tried to be very specific; however, I have noticed that the way you phrase something, or the sequence of steps you outline in your instructions, can have a significant impact on the assistant’s performance. Therefore lots of testing, including trial and error, is required, but in general I have found the assistant to perform fairly well (within the limitations that have been pointed out by others on this forum).
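
To make that concrete, here is a minimal sketch of such a starting point using the OpenAI Python SDK’s Assistants API. The assistant name, model name, and instruction text are illustrative placeholders, not recommendations; the point is to begin with a deliberately small configuration and only grow it once you have verified the behavior:

```python
# Minimal first iteration: a few core instructions, no knowledge base,
# no tools. Expand the configuration only after this behaves as expected.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

assistant = client.beta.assistants.create(
    name="My Assistant (v0)",   # placeholder name
    model="gpt-4-turbo",        # placeholder model
    instructions=(
        "You answer questions about topic X only. "
        "If the answer is not in the provided material, say you do not know."
    ),
)
print(assistant.id)

# Later iterations: attach a small slice of the knowledge base and a
# single tool (e.g. file search / retrieval), re-test, and only then
# expand the instruction set further.
```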


I agree, and I use the same method: starting with core goals and then expanding. I’ve done extensive testing and created assistants for specific tasks that function well. However, when it comes to handling complex behavior in certain contexts, GPT struggles. Its limited reasoning capabilities cannot handle complex instructions, even when they are clearly and strategically structured. This limitation also leads to a misapplication of OpenAI’s guidelines, resulting in skewed outputs such as unnecessary politeness or the mislabeling of content as sensitive. The current reasoning level of GPT doesn’t allow it to adapt properly to these guidelines, all of which is a significant limitation for my project’s needs.