Assistants API v2. Maximum number of chunks limit

I saw in the file search documentation that the chunk limit can be 20 or fewer for answering a query. Do you know which assistant hyperparameter to tune to retrieve fewer chunks?

(It’s just “parameter”. It’s fun to say “hyperparameter”, but that is a term OpenAI uses for the machine learning settings applied when training AI models.)

The amount returned is not a setting that we have access to.

What OpenAI doesn’t say is whether file search retrieves fewer chunks when the query is not relevant to the documents that have been uploaded. Expect it to be maximized.

Typical RAG is automatic instead of a tool call, and uses a cutoff threshold so you don’t have to pay for lists of machine parts when you ask about puppies.
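
For illustration, a minimal sketch of such a cutoff in a self-managed pipeline; the embedding model, the top_k value, and the 0.35 floor are just assumptions for the example, not anything OpenAI documents:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    # text-embedding-3-small returns unit-length vectors, so a dot product is cosine similarity
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

def retrieve(query: str, chunks: list[str], chunk_vectors: np.ndarray,
             top_k: int = 20, min_similarity: float = 0.35) -> list[str]:
    # Rank by similarity, then drop anything below the floor instead of always paying for top_k chunks
    scores = chunk_vectors @ embed([query])[0]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:top_k] if score >= min_similarity]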

Since the return is out of your control, the only thing you can do is give the AI instructions describing what you have uploaded to the vector store, so that it knows when invoking the tool will be productive and when it will be a waste of your money.
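
Something along these lines in the assistant’s instructions, where the document description is of course invented for the example:

instructions = (
    "You answer questions about the ACME-900 industrial printer. "
    "The file_search tool searches a vector store that contains only the ACME-900 "
    "service manual and parts catalog. Only call file_search when the question is "
    "about that printer; answer everything else from your own knowledge."
)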


Thanks, got it. But the instructions provided to the AI are only useful for retrieving relevant chunks from the vector database, since the query is converted to a vector to retrieve the relevant chunks, and once the top 20 chunks are retrieved, OpenAI says they are reranked and all of them are used to summarise or answer the user query. Nowhere did I see anything about removing the chunks that are not relevant. Could you give me more info on that part? Thanks in advance.

In the second paragraph you quoted, I contrast a RAG vector database you would have programmed yourself optimally with what OpenAI has done, which is not documented.

Does OpenAI employ a minimum semantic relevance threshold to limit the search result count? One would have to inspect your token usage and see whether context tokens drop dramatically when you ask the AI to search your extensive single-topic files for something that you absolutely didn’t supply (or OpenAI staff could just answer, if they are proud of how it works…). If they don’t drop, we could say “no”.
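
A rough sketch of that experiment, comparing the reported prompt tokens of an on-topic and an off-topic run; the assistant ID and the questions are placeholders:

from openai import OpenAI

client = OpenAI()

def prompt_tokens_for(assistant_id: str, question: str) -> int:
    # A fresh thread per question, so earlier retrievals can't inflate the count
    thread = client.beta.threads.create(messages=[{"role": "user", "content": question}])
    run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant_id)
    return run.usage.prompt_tokens

on_topic = prompt_tokens_for("asst_...", "Summarize the troubleshooting section of the manual.")
off_topic = prompt_tokens_for("asst_...", "Tell me about koala farms in Antarctica.")
# If off_topic is not dramatically lower, no relevance floor is being applied to the search results.
print(on_topic, off_topic)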

If you really wanted to investigate the quality and length, and supply the forum an answer: it is rather difficult to ask the assistant how much text data it got, because it may receive 8x as much as it can output or repeat to code interpreter, and it is unlikely to produce an accurate measurement by language generation. You would instead have to ask it to report on the contents of an instructed search, for example whether a search for “koala farms in Antarctica” still returned your documents.


Thanks for the response. I have checked and found that if the query is specifically related to one document, the assistant’s summarization only cites that document, and when asked a generalized query it quotes more than one document. This indicates that OpenAI is only using the relevant chunks and not all 20 of them. Nevertheless, thank you for the insight; I will also confirm my intuition by checking the token usage.

How do I do this in Python? In which parameter do I specify the number of chunks returned? I see in the documentation it says file_search.max_num_results, but I don’t actually see where to put that Pythonically in the assistant initialization or in the run itself.

Since this older topic about Assistants and its file search tool was last active, OpenAI has added more controls that make it more tolerable:

  • chunk size (when adding to a vector store)
  • the maximum number of chunks that will be returned
  • a similarity threshold, below which chunks will not be returned

The latter two are part of the tools specification itself, which is set when creating or modifying an assistant, or can be overridden by the tools parameter of a run. The threshold is set inside what the API calls “ranking_options”, alongside a “ranker” choice.


First we need an API reference that has more details presented thoughtfully…


Tools Parameter Definition

tools (array of tool objects, Optional, Defaults to [])

A list of tools enabled on the assistant. This list can contain a maximum of 128 tool objects. Each tool object should specify the type and relevant configurations. Tools can be of the following types:

  • code_interpreter
  • file_search
  • function

Tool Object Structure

Each tool object in the tools array must contain the following properties:

  • type (string, Required)
    • Specifies the type of tool. Accepted values:
      • "code_interpreter"
      • "file_search"
      • "function"

Code Interpreter Tool Object

If type is set to "code_interpreter", no additional properties are required within this tool object. It does not affect our addition of file search.

File Search Tool Object

If type is set to "file_search", the tool object can include the following additional properties to customize its behavior:

  • file_search (object, Optional)
    • Specifies configuration options for the file search tool.

    • max_num_results (integer, Optional, Range: 1–50)

      • Defines the maximum number of results that the file search tool should return.
      • Defaults:
        • 20 for gpt-4* models
        • 5 for gpt-3.5-turbo models
      • Note: The tool may output fewer results than specified by this limit.
    • ranking_options (object, Optional)

      • Provides options for ranking search results. If not specified, the file search tool uses an auto ranker with a score_threshold of 0.

      • ranker (string, Optional, Default: "auto")

        • Specifies the ranking method to use for the file search.
        • If not specified, the default auto ranker is applied.
      • score_threshold (float, Required, Range: 0.0–1.0)

        • Defines the minimum score required for search results to be included.
        • Must be a floating-point value between 0 and 1.
        • Higher values represent stricter thresholds, resulting in fewer but more relevant results.

Example Tool Configuration with File Search

Here’s a Python representation for configuring the tools parameter with a file_search tool, specifying a maximum of 12 results and a score_threshold of 0.6.

tools_parameter = [
    {
        "type": "file_search",
        "file_search": {
            "max_num_results": 12,
            "ranking_options": {
                "ranker": "default_2024_08_21"
                "score_threshold": 0.6
            }
        }
    }
]

Then just use that as the “tools” value, or incorporate the technique into the overall API request.

Note: the separate tool_resources parameter is where vector store IDs are actually attached.
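
A sketch of putting both together when creating the assistant, where the model choice and the vector store ID are placeholders and tools_parameter is the list defined above:

from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4o",                # placeholder model
    instructions="Answer from the attached documents when relevant.",
    tools=tools_parameter,         # the file_search tool configuration shown above
    tool_resources={
        "file_search": {
            "vector_store_ids": ["vs_..."]   # placeholder: the vector store holding your files
        }
    },
)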

I hope that’s exactly what’s needed!


I’ve employed this strategy before, but I’m stuck on the results. Consider the following, given that I have a 0.2 score_threshold applied to my assistant:

run_step = client.beta.threads.runs.steps.retrieve(
    thread_id=thread_id,
    run_id=run.id,
    step_id=step_id,
    include=["step_details.tool_calls[*].file_search.results[*].content"]
)

This is currently resulting in the following simplified response when logging run_step.step_details:

{
    "tool_calls": [
        {
            "id": "call_UQBFB1JCqqVZZxVrR00EV6rF",
            "file_search": {
                "ranking_options": {
                    "ranker": "default_2024_08_21",
                    "score_threshold": 0.02
                },
                "results": [
                    {
                        "file_id": "{my_assistant_file_id}",
                        "file_name": "{my_file}",
                        "score": 0.01666666753590107,
                        "content": [
                            {
                                "text": "...",
                                "type": "text"
                            }
                        ]
                    },
                    {
                        "file_id": "{my_assistant_file_id}",
                        "file_name": "{my_file}",
                        "score": 0.01587301678955555,
                        "content": [
                            {
                                "text": "...",
                                "type": "text"
                            }
                        ]
                    },
                    {
                        "file_id": "{my_assistant_file_id}",
                        "file_name": "{my_file}",
                        "score": 0.015625,
                        "content": [
                            {
                                "text": "...",
                                "type": "text"
                            }
                        ]
                    } 
                ]
            }
        }
    ]
}

Note that each score in the included response is lower than the set threshold. Also note that some chunks with a score higher than the threshold are only sporadically included. It would be easier to overlook if it at least consistently provided all the chunks ranked by score, but it seems to respond with a set of chunks capped at max_num_results, and if the number of chunks in your vector store exceeds max_num_results, it processes a random subset of chunks and ranks those. I’m hoping I just have something misconfigured or am overlooking something here…
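
For completeness, this is roughly how I’m getting the step_id for the retrieve call above, in case something there is the culprit (identifiers are placeholders):

# List the run's steps and pick out the step containing the file_search tool call
steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run.id)
step_id = None
for step in steps.data:
    if step.type == "tool_calls":
        for call in step.step_details.tool_calls:
            if call.type == "file_search":
                step_id = step.id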


Returning random results would mean that file search was completely broken. Unlikely.

Are all chunks sorted for you by the score, like the first two shown? Can you read them and see the relevance difference yourself?

A similarity score that low is quite hard to get using the same embeddings model OpenAI says they use. Dot-product scores also jump around more when using fewer dimensions: with the full dimensionality it is almost impossible to hit 0, but at 256 dimensions the score can dip negative.

If you don’t have your vector store loaded up with Mongolian yak herding advice to explain such a low score, it may be that the threshold is subtracted from the score, or that the returned score is scaled or meaningless. Investigating serves no interest of mine, but you can try deliberately good and bad searches and see whether the threshold limits the count of returned chunks at all, at any value.
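
If you do want to poke at it, a rough sketch of that comparison, assuming an assistant already attached to a vector store; the assistant ID and the questions are placeholders:

from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder

def file_search_scores(question: str, threshold: float) -> list[float]:
    # Ask one question with a per-run override of the score threshold, then read back the scores
    thread = client.beta.threads.create(messages=[{"role": "user", "content": question}])
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=ASSISTANT_ID,
        tools=[{
            "type": "file_search",
            "file_search": {"ranking_options": {"ranker": "auto", "score_threshold": threshold}},
        }],
    )
    scores = []
    steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)
    for step in steps.data:
        if step.type != "tool_calls":
            continue
        detail = client.beta.threads.runs.steps.retrieve(
            thread_id=thread.id, run_id=run.id, step_id=step.id,
            include=["step_details.tool_calls[*].file_search.results[*].content"],
        )
        for call in detail.step_details.tool_calls:
            if call.type == "file_search":
                scores.extend(result.score for result in call.file_search.results)
    return scores

# If raising the threshold never shrinks the count for a nonsense query, the threshold is doing nothing.
for threshold in (0.0, 0.2, 0.8):
    print(threshold,
          len(file_search_scores("a question your documents actually answer", threshold)),
          len(file_search_scores("koala farms in Antarctica", threshold)))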