New features in the Assistants API!

We are announcing a variety of new features and improvements to the Assistants API and moving our Beta to a new API version, OpenAI-Beta: assistants=v2. Here’s what’s new:

  • We’re launching an improved retrieval tool called file_search, which can ingest up to 10,000 files per assistant - 500x more than before. It is faster, supports parallel queries through multi-threaded searches, and features enhanced reranking and query rewriting.
  • Alongside file_search, we’re introducing vector_store objects in the API. Once a file is added to a vector store, it’s automatically parsed, chunked, and embedded, made ready to be searched. Vector stores can be used across assistants and threads, simplifying file management and billing.
  • You can now control the maximum number of tokens a run uses in the Assistants API, allowing you to manage token usage costs. You can also set limits on the number of previous / recent messages used in each run.
  • We’ve added support for the tool_choice parameter which can be used to force the use of a specific tool (like file_search, code_interpreter, or a function) in a particular run.
  • You can now create messages with the role assistant to create custom conversation histories in Threads.
  • Assistant and Run objects now support popular model configuration parameters like temperature, response_format (JSON mode), and top_p.
  • You can now use fine-tuned models in the Assistants API. At the moment, only fine-tuned versions of gpt-3.5-turbo-0125 are supported.
  • Assistants API now supports streaming.
  • We’ve added several streaming and polling helpers to our Node and Python SDKs.
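To make the new run-level controls above concrete, here is a minimal sketch that builds the request payload as a plain dict (no network call; a real request would go through the SDK or REST API with the OpenAI-Beta: assistants=v2 header, and "asst_example" is a placeholder ID):

```python
# Sketch of the v2 run-level controls: token caps, message truncation,
# and forcing a specific tool. Built locally as a payload dict so the
# shape of the parameters is visible without making an API call.
def build_run_payload(assistant_id: str) -> dict:
    return {
        "assistant_id": assistant_id,
        "max_prompt_tokens": 20000,       # cap on input tokens used by the run
        "max_completion_tokens": 1000,    # cap on generated tokens
        "truncation_strategy": {          # only carry the last 10 messages
            "type": "last_messages",
            "last_messages": 10,
        },
        "tool_choice": {"type": "file_search"},  # force the retrieval tool
    }

payload = build_run_payload("asst_example")
```

The same keys are what you would pass when creating a run; the dict form just makes the new knobs easy to scan.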

See our migration guide to learn more about how to migrate your tool usage to the latest version of the Assistants API. Existing integrations on the older version of this Beta (OpenAI-Beta: assistants=v1) will continue to be supported until the end of 2024.

Let us know if you have any feedback as you build on these new features :slight_smile:


Don’t forget to document for us the iterations the assistant makes when the file_search method goes exploring documents.

Additionally, the following interactive methods are provided for in-depth document exploration:

  • next(): Navigate to the next page of search results.
  • previous(): Return to the previous page of search results.
  • jump_to_result(index: int): Jump directly to a specific search result.
  • highlight_terms(terms: list[str]): Highlight specific terms in the search results for better visibility.
  • open_document(document_id: str): Open a document to view its contents in more detail.
  • close_document(): Close the currently open document.

(note: these are ChatGPT methods that explore documents, not the new API stuff!)

(clever dynamic loader of methods…)

Thanks for the top_p. This is a better parameter to use to eliminate the gamble that very unlikely tokens could be produced in a response.
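For intuition on why that works: top_p (nucleus sampling) keeps only the smallest set of tokens whose cumulative probability reaches p, so the long tail of very unlikely tokens can never be sampled at all. A toy sketch with made-up probabilities:

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    # Sort tokens by probability, keep the smallest set whose cumulative
    # mass reaches top_p, then renormalize. Tokens in the long tail are
    # excluded entirely, so they can never be produced in a response.
    kept, total = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        total += p
        if total >= top_p:
            break
    return {tok: p / total for tok, p in kept.items()}

dist = {"the": 0.60, "a": 0.25, "an": 0.10, "zyx": 0.05}
filtered = nucleus_filter(dist, top_p=0.9)  # the unlikely "zyx" is dropped
```

This is only an illustration of the sampling idea, not the model's actual implementation.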


Hi! We’ve actually made the model tool a lot simpler and all it does now is search. These functions in your snippet aren’t accurate (and may be a model hallucination in case you were trying to extract this from the tool prompts).


Great update! Thank you. So happy to be able to add my own Assistant messages now.

A use case here for me is to interrupt a function call and append an assistant message with the returned results verbatim. I’m hoping this would unlock the thread; will have to test.

I’m so happy to see some insights into the underlying embedding parameters & get some control back.

By default, the file_search tool uses the following settings:
  • Chunk size: 800 tokens
  • Chunk overlap: 400 tokens
  • Embedding model: text-embedding-3-large at 256 dimensions
  • Maximum number of chunks added to context: 20 (could be fewer)
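From those defaults, the maximum retrieval payload per search works out as a simple product (a back-of-the-envelope check; fewer chunks may actually be returned):

```python
# Maximum tokens of retrieved text that file_search can inject into
# context, using the stated defaults.
chunk_size = 800   # tokens per chunk
max_chunks = 20    # maximum chunks added to context
max_context_tokens = chunk_size * max_chunks  # 16,000 tokens at most
```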

The “default” part makes me happy, knowing that we can/will be able to alter these as well. The Playground is really going to be a tremendous web app lol


It looks like ChatGPT, answering about file_search, gave the bulk of myfiles_browser’s additional methods that are provided at different stages of document retrieval. I see that I prompted it into simply attributing the wrong tool name by suggesting it check whether the new tool was there too. :laughing:

Then it is indeed well-considered documentation about the degree to which retrieval will make multiple calls while carrying conversational context along with it. That would let one understand that token usage may be higher than expected from the API input alone, and that if the run token limit works correctly, it would also limit the capabilities of search.

Additionally: whose AI is rewriting for the hypothetical answer on the vector database? Is that assistant output instructed toward file_search by the function specification?

(How about document reassembly technology to put those chunks in order and merge overlaps…)


Finally, Assistant and Run objects now support popular model configuration parameters like temperature, response_format (JSON mode), and top_p. :pizza:


These are all awesome updates. Glad to see Vector Stores, which allow for data partitioning while using the same assistant across different stores.

I ran a quick test running v1 prompts against v2 (Playground) and got poor results with file_search. I’m guessing this is because of the mere 20 chunks used in context (x 800 tokens = 16K), while v1 would often consume about 100K tokens in my use case. So yeah, until these settings are configurable, I won’t be able to switch my client to v2.

Great work.


RIP :pray: RAG & Semantic Routing…

This is huge. I am building an app with the Assistants API with multiple tools for each user. Now I can really ramp up data collection, and each user can have an amazing assistant that powers tools to help their business succeed! Excited!

Great, but expensive! I asked two questions and paid $0.35.

I’ve been waiting for a way to set the temperature for an assistant and it’s finally here, but I don’t see a way to set the temperature, top_p and response_format in the Assistant Playground.

Can you add the new fields to the Assistant Playground?

we’ll be adding this to the playground very soon


I created a C# client library that supports file search, vector stores, and all the other endpoints.
NuGet: HigLabo.OpenAI
GitHub: higty/higlabo


So I just want to make sure I am correct on this:

The API added response_format with JSON, but you cannot use file_search when you want a JSON response?

Why are the two mutually exclusive? There are a lot of use cases for generating data from a file reference while wanting the response as JSON.

Agreed! This is something we’d love to add support for but couldn’t make happen in time. We plan on making this possible


Possible Feature - Assistants files

Files metadata, and operations using that metadata

Up to 10,000 files per assistant. 100 GB per organization. No hierarchy to the storage system, and no additional data fields beyond the original file name and upload date. Charged daily if left in a vector store.

{'data': [{'id': 'file-kNeCG3FLIP1OgzSD4apj', 'bytes': 2911, 'created_at': 1713555273, 'filename': '', 'object': 'file', 'purpose': 'assistants', 'status': 'processed', 'status_details': None}], 'object': 'list', 'has_more': False}

And one method: list them all.
And one method: delete a single one by ID.

One little glitch in the customer-side tracking system, which every organization member, project member, and chatbot must keep robust for every API call, and the organization is left with thousands of files of unknown source.

The benefit of a metadata layer allowing arbitrary developer keys and fields like "customer_id" or "job-id" seems obvious. Sub-listings and batch deletions then become possible. Structured or generated metadata such as expiration_date or project_id would allow query-like operations and deletions.
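Until such a metadata layer exists, one workaround is keeping a client-side registry that maps file IDs to your own fields. A minimal sketch (all names here are hypothetical and client-side only, not part of the API):

```python
class FileRegistry:
    """Client-side metadata for uploaded files (hypothetical workaround;
    the Files API itself stores only the file name and upload date)."""

    def __init__(self):
        self._meta: dict[str, dict] = {}

    def tag(self, file_id: str, **fields) -> None:
        # Attach arbitrary developer fields to a file ID.
        self._meta.setdefault(file_id, {}).update(fields)

    def find(self, **criteria) -> list[str]:
        # Query-like sub-listing the API lacks, e.g. find(customer_id="c42").
        return [fid for fid, meta in self._meta.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

    def ids_to_delete(self, **criteria) -> list[str]:
        # Batch deletion becomes "find matching IDs, then delete each via the API".
        return self.find(**criteria)

reg = FileRegistry()
reg.tag("file-abc", customer_id="c42", project_id="p1")
reg.tag("file-def", customer_id="c42", project_id="p2")
stale = reg.find(customer_id="c42", project_id="p2")  # ["file-def"]
```

The registry itself is the fragile part the post complains about: lose it, and you are back to thousands of files of unknown source.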

beta 3?


Good points. This really needs good file-management utilities and structure to head off the nightmare scenarios these badly needed features introduce. Otherwise, I’m loving it…

Note: all of my assistants just automatically flipped to v2, and it caused a 10x jump in token usage. Just one of my prompts ate up 107,000 context tokens. I need help understanding why this happened. I was about to launch publicly yesterday, but now I have major concerns and could use some help.


You can now somewhat control the number of past conversational turns, but not the length of the knowledge injection, the number of times the AI internally invokes its functions, or the number of times it invokes the wrong function or writes malformed function calls and gets errors.

For knowledge from file search, the AI, along with all the instructions, tool specifications, past chat, user input, listings of mounted files, etc., must make a minimum of one response not to you, but to its internal file-search function. What usefulness the search will return is a mystery to the AI without your tedious instructions.

OpenAI doesn’t state whether the v2 AI is also preloaded with arbitrary file text, which is how v1 worked in addition to its functions for searching files directly.

The vector database uses 800-token chunks with 400 tokens of overlap, meaning consecutive chunks share half their content rather than each chunk growing to 1,600 tokens. Up to 20 chunks are added to a conversation thread, so up to 20 x 800 = 16k tokens of vector return is possible.

If the AI doesn’t like what it got and/or calls a function again, the prior tool call and response are also part of the reused input context for the next tool call.

You “delegate” this use of your budget while you wait for a response when using assistants.