We use gpt-4-0125-preview with several functions and retrieval. When the functions and retrieved content together exceed 4096 tokens, the model starts making things up: it invents information and calls functions with fictitious values instead of asking the user for the missing information. The documentation says you can provide up to 128,000 tokens as input and receive up to 4096 tokens in the response, but in practice it doesn't work that way.
@logankilpatrick, could you please help with this problem?
This phenomenon is known as hallucination.
It seems that the Assistant feature with GPT-4 Turbo is particularly prone to hallucinations.
The Assistant is still in beta and is not yet official. So, all feedback is valuable!
Understood. It's just strange that it starts hallucinating exactly at the 4096-token mark. It behaves as if it had 4096 tokens of context, not 128,000.
In general, language models, including GPT-4 Turbo, are more prone to hallucination when given longer contexts, because their attention is limited.
I know this well from using the Assistant feature and seeing GPT-4 Turbo start to create implausible stories as the context gets longer.
I have managed to get around this by instructing GPT-4 Turbo, through the Assistant feature, to use the code interpreter to store only the necessary parts of a long text in a Python dictionary or list and look them up as needed, instead of relying on the long context😅
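To make the idea concrete, here is a minimal sketch of the kind of code the code interpreter could run. It is only an illustration: the `extract_facts` helper, the keyword list, and the sample text are all hypothetical, and simple keyword matching is an assumption standing in for whatever selection logic you actually need.

```python
import re


def extract_facts(long_text: str, keywords: list[str]) -> dict[str, list[str]]:
    """Keep only the sentences that mention one of the keywords.

    Hypothetical helper: naive sentence splitting plus case-insensitive
    keyword matching, not a real retrieval method.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", long_text.strip())
    facts: dict[str, list[str]] = {kw: [] for kw in keywords}
    for sentence in sentences:
        lowered = sentence.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                facts[kw].append(sentence)
    return facts


# Made-up sample text, for illustration only.
long_text = (
    "The order was placed on Monday. Shipping takes five days. "
    "The weather was pleasant that week. Refunds are issued within 14 days."
)

facts = extract_facts(long_text, ["shipping", "refunds"])
print(facts["shipping"])
print(facts["refunds"])
```

The model can then look things up in `facts` when it needs them, so the full text does not have to stay in the conversation.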
Although GPT-4 Turbo can handle 128,000 tokens (input and output combined), I think it's best to keep in mind that its attention has limits and to use the context sparingly, without expecting to fill the whole context length (this also saves on token fees).
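One way to stay well under the limit is to cap how much retrieved text goes into each request. The sketch below is only a rough illustration: `estimate_tokens` and `select_chunks` are hypothetical names, the "about 4 characters per token" rule is a crude assumption for English text (a real tokenizer would give exact counts), and the chunks are assumed to be already sorted by relevance.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token for English text.
    # Use a real tokenizer if you need exact counts.
    return max(1, len(text) // 4)


def select_chunks(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in order until the estimated token budget is spent.

    Assumes `chunks` is already ranked from most to least relevant.
    """
    selected: list[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that would exceed the budget
        selected.append(chunk)
        used += cost
    return selected


# Three made-up chunks of about 100 estimated tokens each.
chunks = ["a" * 400, "b" * 400, "c" * 400]
kept = select_chunks(chunks, budget=250)
print(len(kept))
```

The budget value is yours to choose; the point is just to decide on it deliberately rather than sending everything the retrieval step returns.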