One request costs 153491 input tokens

assistant: I’m sorry, I did not receive any images. << This response is strange: the conversation history shows it had already received two images and had compared the two inferences…
The “I’m sorry, I did not receive any images.” reply looks more like heavy overfitting, or even something hardcoded in the model's code.