Setting score_threshold parameter for file search tool

Hi everyone, I am having trouble setting the score_threshold parameter when creating a run.


tools = [
    {
        "type": "file_search",
        "file_search": {
            "ranking_options": {
                "score_threshold": 1.0
            }
        }
    }
]

run = client.beta.threads.runs.create_and_poll(
    thread_id=openai_thread.id,
    assistant_id=assistant_id,
    tools=tools,
)

This is how I am setting it at the run level. The outcome is always a response from the assistant saying it can't find any results in the files, as if they didn't exist. I would like to set the score_threshold at the run or thread level, since I create the assistant through the UI, where this setting is not available.

Hello,

It makes sense. When file retrieval runs, each result it finds is assigned a relevancy score, and what I've noticed is that even the most relevant chunk never gets a score of 1 (the highest I've seen is around 0.9). So lowering the score_threshold should solve this. I personally use a score_threshold of 0.75, and so far it gives a good balance between not missing results and not including irrelevant ones.
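If it helps, this is roughly how 0.75 would fit into your run-level override (a sketch based on your snippet; "auto" is, as far as I know, the default ranker, so that line is optional):

tools = [
    {
        "type": "file_search",
        "file_search": {
            "ranking_options": {
                "ranker": "auto",         # let the API pick the ranker
                "score_threshold": 0.75   # only keep chunks scoring >= 0.75
            }
        }
    }
]

run = client.beta.threads.runs.create_and_poll(
    thread_id=openai_thread.id,
    assistant_id=assistant_id,
    tools=tools,
)

If you'd rather not override it on every run, I believe you can also apply the same tools block to the assistant itself with client.beta.assistants.update, since the UI doesn't expose this setting.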

I hope this helps!


Thanks for the reply. I just set it to 0.75 and it still responds with the same thing. I'm wondering if setting the file search options at the run level is somehow also resetting the vector store ID to none.
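One way I could check that (an untested sketch; in the v2 API both the assistant and thread objects should expose tool_resources, and the run object its tools) would be to retrieve everything after creating the run and print what's actually attached:

# Sketch: see where the vector store is attached and what the run used
assistant = client.beta.assistants.retrieve(assistant_id)
thread = client.beta.threads.retrieve(openai_thread.id)

print("assistant tool_resources:", assistant.tool_resources)
print("thread tool_resources:", thread.tool_resources)
print("run tools override:", run.tools)

If the vector store ID still shows up under the assistant's (or thread's) file_search tool_resources, the run-level override probably isn't what's detaching it.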

text-embedding-3-large at 256 dimensions returns significantly lower scores than that, although they can approach 1.0 for near-identical text. The AI is never going to write a short search query that reads like a document chunk, though.

Here are some results at the described parameters of file_search. The first entry, at index 0, is the search query itself; it matches itself at 1.0.

The last is the documentation for the just-announced predictions feature (the intended target), while the others are irrelevant phrases or a deceptive short phrase I threw in.

 == Cosine similarity comparisons ==
0:" How does the OpenAI prediction feature work on cha" -
 match score: fp32: 1.0000 /  fp8: 1.0000 /  int8: 1.0000
1:" Loved it! I would highly recommend!" -
 match score: fp32: 0.0880 /  fp8: 0.0916 /  int8: 0.0920
2:" Was pretty good. It met the specs and not much els" -
 match score: fp32: 0.1535 /  fp8: 0.1613 /  int8: 0.1596
3:" Chat Completions is an OpenAI endpoint for AI mode" -
 match score: fp32: 0.6751 /  fp8: 0.6707 /  int8: 0.6746
4:" ## Latency optimization Improve latency across a w" -
 match score: fp32: 0.5326 /  fp8: 0.5320 /  int8: 0.5347

You can see why 0.40 is going to be a good starting point: it keeps the genuinely related documentation while cutting the unrelated text.

(For the technical and curious: the last two results per line come from quantizing to a quarter-size vector database value, with a multiplier based on the client-side dimensional reduction to minimize quantization error, and with clipping of overflows in the ML-oriented float8 format, which has a limited dynamic range.)
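If you want to reproduce this kind of comparison yourself, here is a rough sketch (assuming the openai and numpy packages; the strings are placeholders standing in for the query and chunks above, the fp8 column is omitted, and the int8 step is a simplified version of the quantization just described):

import numpy as np
from openai import OpenAI

client = OpenAI()

texts = [
    "How does the OpenAI prediction feature work?",            # index 0: the search query
    "Loved it! I would highly recommend!",                     # an irrelevant phrase
    "Chat Completions is an OpenAI endpoint for AI models.",   # a deceptively related phrase
]

# 256-dimension embeddings from text-embedding-3-large
resp = client.embeddings.create(
    model="text-embedding-3-large", input=texts, dimensions=256
)
vectors = np.array([d.embedding for d in resp.data])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize

# int8 quantization: scale the largest component toward 127, round, then dequantize
scale = 127.0 / np.abs(vectors).max()
q = np.clip(np.round(vectors * scale), -128, 127).astype(np.int8)
deq = q.astype(np.float32) / scale

for i, text in enumerate(texts):
    fp32 = float(np.dot(vectors[0], vectors[i]))  # cosine similarity against the query
    int8 = float(np.dot(deq[0], deq[i]))          # same comparison after int8 round-trip
    print(f'{i}: "{text[:50]}" - fp32: {fp32:.4f} / int8: {int8:.4f}')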
