Since this older topic about Assistants and its file search tool was last active, OpenAI has added more controls that make it more tolerable:
- chunk size (when adding to a vector store)
- the maximum number of chunks that will be returned
- a similarity threshold, below which chunks will not be returned
The latter two are part of the tools specification itself, which is set when creating or modifying an assistant and can be overridden by the tools parameter of a run. The threshold is set under the ranking options, alongside the choice of “ranker”.
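For the first item, chunk size is chosen at the moment a file is added to a vector store. Here is a minimal sketch of that setting, assuming the static chunking_strategy shape from the vector store API reference. The helper name is made up for illustration, and the limits in it (100 to 4096 tokens per chunk, overlap no more than half the chunk size) are assumptions that should be checked against the current documentation:

```python
def static_chunking_strategy(max_chunk_size_tokens: int = 800,
                             chunk_overlap_tokens: int = 400) -> dict:
    """Build a chunking_strategy value (hypothetical helper, not part of any SDK)."""
    # Assumed limits; verify against the current API reference.
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens < 0 or chunk_overlap_tokens * 2 > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must not exceed half the chunk size")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


# Smaller chunks with a modest overlap, to be sent as "chunking_strategy"
# when attaching a file to a vector store.
strategy = static_chunking_strategy(max_chunk_size_tokens=400, chunk_overlap_tokens=100)
print(strategy)
```

Smaller chunks pair well with a lower max_num_results, since each returned result then costs fewer input tokens.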
First we need an API reference that has more details presented thoughtfully…
Tools Parameter Definition
tools (array of tool objects, Optional, Defaults to [])
A list of tools enabled on the assistant. This list can contain a maximum of 128 tool objects. Each tool object should specify the type and relevant configurations. Tools can be of the following types:
- code_interpreter
- file_search
- function
Tool Object Structure
Each tool object in the tools array must contain the following properties:
- type (string, Required)
  - Specifies the type of tool. Accepted values:
    - "code_interpreter"
    - "file_search"
    - "function"
Code Interpreter Tool Object
If type is set to "code_interpreter", no additional properties are required within this tool object. It does not affect our addition of file search.
File Search Tool Object
If type is set to "file_search", the tool object can include the following additional properties to customize its behavior:
- file_search (object, Optional)
  - Specifies configuration options for the file search tool.
  - max_num_results (integer, Optional, Range: 1–50)
    - Defines the maximum number of results that the file search tool should return.
    - Defaults: 20 for gpt-4* models, 5 for gpt-3.5-turbo models.
    - Note: The tool may output fewer results than specified by this limit.
  - ranking_options (object, Optional)
    - Provides options for ranking search results. If not specified, the file search tool uses an auto ranker with a score_threshold of 0.
    - ranker (string, Optional, Default: "auto")
      - Specifies the ranking method to use for the file search.
      - If not specified, the default auto ranker is applied.
    - score_threshold (float, Required, Range: 0.0–1.0)
      - Defines the minimum score required for search results to be included.
      - Must be a floating-point value between 0 and 1.
      - Higher values represent stricter thresholds, resulting in fewer but more relevant results.
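The ranges listed above can be checked locally before a request is sent. Below is a minimal sketch of such a check. The file_search_tool helper is hypothetical (not an SDK function) and only encodes the ranges from this reference:

```python
from typing import Optional


def file_search_tool(max_num_results: Optional[int] = None,
                     score_threshold: Optional[float] = None,
                     ranker: str = "auto") -> dict:
    """Build one file_search tool object (hypothetical helper, not an SDK function)."""
    config: dict = {}
    if max_num_results is not None:
        if not 1 <= max_num_results <= 50:
            raise ValueError("max_num_results must be in the range 1 to 50")
        config["max_num_results"] = max_num_results
    # ranking_options is only emitted when a threshold is given, because
    # score_threshold is marked Required inside ranking_options.
    if score_threshold is not None:
        if not 0.0 <= score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0.0 and 1.0")
        config["ranking_options"] = {
            "ranker": ranker,
            "score_threshold": score_threshold,
        }
    tool: dict = {"type": "file_search"}
    if config:
        tool["file_search"] = config
    return tool


tools = [file_search_tool(max_num_results=12, score_threshold=0.6)]
print(tools)
```

Calling file_search_tool() with no arguments yields just {"type": "file_search"}, which leaves every setting at the defaults described above.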
Example Tool Configuration with File Search
Here’s a Python representation for configuring the tools parameter with a file_search tool, specifying a maximum of 12 results and a score_threshold of 0.6.
tools_parameter = [
    {
        "type": "file_search",
        "file_search": {
            "max_num_results": 12,  # return at most 12 chunks
            "ranking_options": {
                "ranker": "default_2024_08_21",
                "score_threshold": 0.6,  # drop chunks scoring below 0.6
            },
        },
    }
]
Then just use that as the “tools” value, or build the same structure directly into the overall API request.
Note: the separate tool_resources parameter is where vector store IDs are actually attached.
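To show how the two parameters sit side by side, here is a sketch of the request bodies as plain dictionaries, with no network call. The IDs and the model name are placeholders invented for illustration, and the bodies would be sent with whichever HTTP client or SDK you already use:

```python
import json

# Placeholder IDs and model name, for illustration only.
VECTOR_STORE_ID = "vs_example123"
ASSISTANT_ID = "asst_example456"

tools = [
    {
        "type": "file_search",
        "file_search": {
            "max_num_results": 12,
            "ranking_options": {"ranker": "auto", "score_threshold": 0.6},
        },
    }
]

# Body for creating or modifying an assistant: search settings live in
# "tools", while the vector store is attached through "tool_resources".
assistant_body = {
    "model": "gpt-4o",
    "tools": tools,
    "tool_resources": {"file_search": {"vector_store_ids": [VECTOR_STORE_ID]}},
}

# Body for a run that overrides the assistant's tools with stricter settings.
run_body = {
    "assistant_id": ASSISTANT_ID,
    "tools": [
        {
            "type": "file_search",
            "file_search": {
                "max_num_results": 5,
                "ranking_options": {"ranker": "auto", "score_threshold": 0.8},
            },
        }
    ],
}

print(json.dumps(assistant_body, indent=2))
print(json.dumps(run_body, indent=2))
```

The run-level tools list replaces the assistant's settings for that run only, so the assistant keeps its own defaults for later runs.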
I hope that’s exactly what’s needed!