Hello everyone,
I’m currently using the Assistants API with specific instructions to always retrieve context to answer the user’s questions accurately.
I’ve noticed that when the question is the first message in the thread, retrieval works every time: GPT-4o fetches the necessary context from the vector store and provides the correct answer.
However, if the question isn’t the first message in the thread, the retrieval only works about 50% of the time. I’m puzzled by this inconsistency. Could it be that the initial exchange influences the model’s decision on whether or not to retrieve additional context? Does the model assume that the initial retrieval was sufficient for the subsequent messages?
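One way to test this hypothesis would be to force retrieval on every run and then check the run steps to see whether a file search call actually happened. Below is a minimal sketch, assuming the OpenAI Python SDK and the `tool_choice` run parameter. The helper names `build_run_kwargs` and `used_file_search` and the placeholder IDs are hypothetical, and the step dictionaries are assumed to come from calling `model_dump()` on the run step objects.

```python
from typing import Any


def build_run_kwargs(thread_id: str, assistant_id: str,
                     force_retrieval: bool = True) -> dict[str, Any]:
    """Build keyword arguments for creating a run (hypothetical helper)."""
    kwargs: dict[str, Any] = {"thread_id": thread_id, "assistant_id": assistant_id}
    if force_retrieval:
        # Ask for a file_search call on this run instead of letting the model decide.
        kwargs["tool_choice"] = {"type": "file_search"}
    return kwargs


def used_file_search(steps: list[dict[str, Any]]) -> bool:
    """Return True if any run step (as a plain dict) contains a file_search tool call."""
    for step in steps:
        details = step.get("step_details") or {}
        if details.get("type") != "tool_calls":
            continue
        for call in details.get("tool_calls") or []:
            if call.get("type") == "file_search":
                return True
    return False


# Usage sketch (needs the `openai` package and an API key; IDs are placeholders):
# from openai import OpenAI
# client = OpenAI()
# run = client.beta.threads.runs.create_and_poll(**build_run_kwargs("thread_...", "asst_..."))
# steps = client.beta.threads.runs.steps.list(thread_id="thread_...", run_id=run.id)
# print(used_file_search([s.model_dump() for s in steps]))
```

Comparing the result of `used_file_search` with and without `force_retrieval` on later messages in the thread should show whether the model is simply choosing to skip retrieval.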
Has anyone else experienced this or have any insights into why this might be happening?
Thank you!