If one uses gpt-4o-mini or gpt-4o-2024-05-13, this is the tool:
# Tools
## myfiles_browser
However, if one uses gpt-4o-2024-08-06, this is the tool:
# Tools
## file_search
However, OpenAI is confusing the model even further than with the already bad pattern of claiming that assistant files were uploaded by the user:
Not only does the mention of myfiles_browser fail to refer to any tool actually placed in context, and fail to steer the AI's reasoning correctly, but does it serve any purpose at all when no file name is given? Is the AI really going to conclude "file-bx1Vx2hch3sDZyUBmZp7FTj
is going to be a great source, emit the tool token!"? Or does the AI simply have little understanding of what is behind the tool, and just make tool calls less often in general?
And once again, telling the model that an assistant's code interpreter file came from a user, when it may in fact be powering an assistant feature, adds yet another hurdle to proper use. Is saying "user" a deliberate strategy of demoting developer authority?
Of course, the AI cannot discriminate between behaviors and data given to it by the developer as part of building an assistant, and malicious file contents a user may upload. Both are returned by the same search and, with this messaging, run in the same trust space.
It is a search function that provides no information about what is behind it: no file list, no summary. To guide its operation and keep junk out of context, the developer must talk to the AI directly about the name of the tool, explain that the files are not user uploads, AND USE THE CORRECT INTERNAL NAME. This makes instructions to the AI, and guidance to developers on correcting misbehavior that originates in the tool description, non-portable between models.
To Assistants developers:
Be sure to note the internal name of the document search tool, which is specific to the model, and use that name in your language when tuning the use of document search.
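A minimal sketch of putting that advice into practice: a helper that picks the internal tool name per model so assistant instructions can reference the name the model actually sees. The model-to-name mapping comes from the tool texts quoted in this post; verify it against your own model's output before relying on it.

```python
# Internal document-search tool name per model, per the dumps quoted below.
INTERNAL_SEARCH_TOOL = {
    "gpt-4o-mini": "myfiles_browser",
    "gpt-4o-2024-05-13": "myfiles_browser",
    "gpt-4o-2024-08-06": "file_search",
}

def search_tool_instruction(model: str) -> str:
    """Build an instruction line that uses the internal tool name the model sees."""
    name = INTERNAL_SEARCH_TOOL.get(model, "file_search")
    return (
        f"The `{name}` tool searches documents attached by the developer, "
        "not files uploaded by the user."
    )
```

Prepending a line like this to the assistant's instructions at least lets your steering language match the tool description the model receives.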
Below is the full tool text of Assistants with file search enabled (text that the instruction hierarchy wants to, but cannot, suppress). The two versions also differ in content, not just in name.
gpt-4o-2024-08-06
Image input capabilities: Enabled
# Tools
## file_search
// Tool for browsing the files uploaded by the user. To use this tool, set the recipient of your message as `to=file_search.msearch`.
// Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool when the relevant parts don't contain the necessary information to fulfill the user's request.
// Please provide citations for your answers and render them in the following format: `【{message idx}:{search idx}†{source}】`.
// The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
// The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4...
namespace file_search {
// Issues multiple queries to a search over the file(s) uploaded by the user and displays the results.
// You can issue up to five queries to the msearch command at a time. However, you should only issue multiple queries when the user's question needs to be decomposed / rewritten to find different facts.
// In other scenarios, prefer providing a single, well-designed query. Avoid short queries that are extremely broad and will return unrelated results.
// One of the queries MUST be the user's original question, stripped of any extraneous details, e.g. instructions or unnecessary context. However, you must fill in relevant context from the rest of the conversation to make the question complete. E.g. "What was their age?" => "What was Kevin's age?" because the preceding conversation makes it clear that the user is talking about Kevin.
// Here are some examples of how to use the msearch command:
// User: What was the GDP of France and Italy in the 1970s? => {"queries": ["What was the GDP of France and Italy in the 1970s?", "france gdp 1970", "italy gdp 1970"]} # User's question is copied over.
// User: What does the report say about the GPT4 performance on MMLU? => {"queries": ["What does the report say about the GPT4 performance on MMLU?"]}
// User: How can I integrate customer relationship management system with third-party email marketing tools? => {"queries": ["How can I integrate customer relationship management system with third-party email marketing tools?", "customer management system marketing integration"]}
// User: What are the best practices for data security and privacy for our cloud storage services? => {"queries": ["What are the best practices for data security and privacy for our cloud storage services?"]}
// User: What was the average P/E ratio for APPL in Q4 2023? The P/E ratio is calculated by dividing the market value price per share by the company's earnings per share (EPS). => {"queries": ["What was the average P/E ratio for APPL in Q4 2023?"]} # Instructions are removed from the user's question.
// REMEMBER: One of the queries MUST be the user's original question, stripped of any extraneous details, but with ambiguous references resolved using context from the conversation. It MUST be a complete sentence.
type msearch = (_: {
queries?: string[],
}) => any;
} // namespace file_search
You are trained on data up to October 2023.
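The contract described in the tool text above (at most five queries, one of which must be the user's original question) can be sketched as a payload builder. This is an illustration of the rule, not an official client helper:

```python
def build_msearch_arguments(original_question: str, decompositions=()) -> dict:
    # Per the tool text: one of the queries MUST be the user's original
    # question, and msearch accepts at most five queries at a time.
    queries = [original_question, *decompositions]
    if len(queries) > 5:
        raise ValueError("msearch accepts at most five queries")
    return {"queries": queries}
```

For the GDP example above, `build_msearch_arguments("What was the GDP of France and Italy in the 1970s?", ["france gdp 1970", "italy gdp 1970"])` reproduces the three-query payload the tool text shows.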
gpt-4o-mini
Image input capabilities: Enabled
# Tools
## myfiles_browser
You have the tool `myfiles_browser` with these functions:
`msearch(queries: list[str])` Issues multiple queries to a search over the file(s) uploaded in the current conversation and displays the results.
please render in this format: `【{message idx}†{link text}】`
Tool for browsing the files uploaded by the user.
Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g. msearch(['query'])). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.
Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool, when the relevant parts don't contain the necessary information to fulfill the user's request.
Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request.
You can issue up to five queries to the msearch command at a time. However, you should only issue multiple queries when the user's question needs to be decomposed to find different facts. In other scenarios, prefer providing a single, well-designed query. Avoid single word queries that are extremely broad and will return unrelated results.
Here are some examples of how to use the msearch command:
User: What was the GDP of France and Italy in the 1970s? => msearch(["france gdp 1970", "italy gdp 1970"])
User: What does the report say about the GPT4 performance on MMLU? => msearch(["GPT4 MMLU performance"])
User: How can I integrate customer relationship management system with third-party email marketing tools? => msearch(["customer management system marketing integration"])
User: What are the best practices for data security and privacy for our cloud storage services? => msearch(["cloud storage security and privacy"])
Please provide citations for your answers and render them in the following format: `【{message idx}:{search idx}†{link text}】`.
The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4915f6-2a0b-4eb5-85d1-352e00c125bb.
For this example, a valid citation would be `【3:13†Paris】`.
All 3 parts of the citation are REQUIRED.
You are trained on data up to October 2023.
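For completeness, the citation format both tool texts demand can be checked mechanically. A minimal sketch; note the brackets are the full-width 【 and 】 characters, not ASCII:

```python
import re

# Matches citations of the form 【{message idx}:{search idx}†{link text}】
# as described in both tool texts above.
CITATION_RE = re.compile(r"【(\d+):(\d+)†([^】]*)】")

def extract_citations(text: str):
    """Return (message_idx, search_idx, link_text) tuples found in text."""
    return [(int(m), int(s), t) for m, s, t in CITATION_RE.findall(text)]
```

A filter like this is handy when post-processing assistant output, since the raw citation tokens are rarely something you want to show end users verbatim.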