When we perform the similarity search and isolate the top three candidates for delivering an answer, we average their dot products and compare that average to an “on-message” threshold. If the average of the top three falls below the threshold, we reject the query as not being a meaningful question.
We’ve been fooling around with several approaches to do this. Nothing is ideal so far, but the current threshold is about 0.749.
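Here is a rough sketch of that check in plain Python, just to make the logic concrete. The function names (`dot`, `top_k_mean`, `is_on_message`) are made up for illustration, and it assumes the query and candidate embeddings are scaled comparably (unit-normalized, say), so that one fixed threshold means the same thing from query to query. It is not our production code.

```python
from typing import Sequence

# Approximate threshold quoted in the text; treat it as a tunable value.
ON_MESSAGE_THRESHOLD = 0.749
TOP_K = 3


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k_mean(query: Sequence[float],
               candidates: Sequence[Sequence[float]],
               k: int = TOP_K) -> float:
    """Average the k highest dot products between the query and the candidates."""
    scores = sorted((dot(query, c) for c in candidates), reverse=True)
    top = scores[:k]
    if not top:
        raise ValueError("need at least one candidate")
    return sum(top) / len(top)


def is_on_message(query: Sequence[float],
                  candidates: Sequence[Sequence[float]],
                  threshold: float = ON_MESSAGE_THRESHOLD,
                  k: int = TOP_K) -> bool:
    """Accept the query only if the top-k average meets the threshold."""
    return top_k_mean(query, candidates, k) >= threshold
```

Calling `is_on_message(query_vec, candidate_vecs)` returns `False` for a query whose three best matches average below the threshold, which is the case we set aside.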
Maybe there’s a better way to set aside questions that are not in the wheelhouse, but I haven’t stumbled on it yet.