We are announcing a variety of new features and improvements to the Assistants API and moving our Beta to a new API version, OpenAI-Beta: assistants=v2. Here’s what’s new:
We’re launching an improved retrieval tool called file_search, which can ingest up to 10,000 files per assistant - 500x more than before. It is faster, supports parallel queries through multi-threaded searches, and features enhanced reranking and query rewriting.
Alongside file_search, we’re introducing vector_store objects in the API. Once a file is added to a vector store, it’s automatically parsed, chunked, and embedded, and made ready to be searched. Vector stores can be used across assistants and threads, simplifying file management and billing.
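A minimal sketch of wiring these together with the Python SDK, assuming a recent openai package where the beta Assistants surface lives under client.beta (the file names, assistant name, and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and upload files into it; upload_and_poll waits
# until parsing, chunking, and embedding have finished.
vector_store = client.beta.vector_stores.create(name="product-docs")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("handbook.pdf", "rb"), open("faq.md", "rb")],
)

# Attach the vector store to an assistant that uses the file_search tool.
assistant = client.beta.assistants.create(
    name="Docs helper",
    model="gpt-4-turbo",
    instructions="Answer questions using the attached documentation.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```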
You can now control the maximum number of tokens a run uses in the Assistants API, allowing you to manage token usage costs. You can also set limits on the number of previous / recent messages used in each run.
We’ve added support for the tool_choice parameter which can be used to force the use of a specific tool (like file_search, code_interpreter, or a function) in a particular run.
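Continuing from the sketch above, the run-level controls from the last two items might look like this (the specific limits are arbitrary placeholder values, not recommendations):

```python
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does the handbook say about expense reports?",
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    # Cap token usage for this run.
    max_prompt_tokens=2000,
    max_completion_tokens=500,
    # Only the last 5 messages of the thread are used to build the prompt.
    truncation_strategy={"type": "last_messages", "last_messages": 5},
    # Force the model to call file_search on this run.
    tool_choice={"type": "file_search"},
)
```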
We’ve added several streaming and polling helpers to our Node and Python SDKs.
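For instance, in the Python SDK the polling helper collapses the create-then-poll loop into one call, and streaming is exposed through an event handler (again a sketch, reusing the thread and assistant objects above):

```python
from openai import AssistantEventHandler

# Polling helper: create the run and block until it reaches a terminal state.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Streaming helper: print text deltas as they are generated.
class PrintHandler(AssistantEventHandler):
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=PrintHandler(),
) as stream:
    stream.until_done()
```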
See our migration guide to learn more about how to migrate your tool usage to the latest version of the Assistants API. Existing integrations on the older version of this Beta (OpenAI-Beta: assistants=v1) will continue to be supported until the end of 2024.
Let us know if you have any feedback as you build on these new features!
Hi! We’ve actually made the model tool a lot simpler and all it does now is search. These functions in your snippet aren’t accurate (and may be a model hallucination in case you were trying to extract this from the tool prompts).
Great update! Thank you. So happy to be able to add my own Assistant messages now.
A use-case here for me is to interrupt a function call and append the assistant message with the returned results verbatim. I’m hoping this would unlock the thread; will have to test.
I’m so happy to see some insights into the underlying embedding parameters & get some control back.
By default, the file_search tool uses the following settings (see the sketch after this list):
Chunk size: 800 tokens
Chunk overlap: 400 tokens
Embedding model: text-embedding-3-large at 256 dimensions
Maximum number of chunks added to context: 20 (could be fewer)
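To make those numbers concrete, here is a rough sliding-window sketch of how 800-token chunks with a 400-token overlap could be cut; this is only an illustration of the defaults above, not OpenAI’s actual chunker:

```python
import tiktoken


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 400) -> list[str]:
    """Cut text into token chunks of `chunk_size` with `overlap` tokens shared
    between consecutive chunks (illustrative only)."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap  # each new chunk starts 400 tokens after the last
    chunks = []
    for start in range(0, max(len(tokens), 1), step):
        window = tokens[start:start + chunk_size]
        if not window:
            break
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```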
The default part makes me happy knowing that we can/will alter it as well. The playground is really going to be a tremendous web app lol
Additional: whose AI is rewriting the query into a hypothetical answer for the vector database? The documentation talks about this, but it seems the AI will only write a short query unless the system message instructs it better.
(How about document reassembly technology to put those chunks in order and merge overlaps…)
Finally, Assistant and Run objects now support popular model configuration parameters like temperature, response_format (JSON mode), and top_p.
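These can be set at assistant creation time and overridden per run; a small sketch (note that JSON mode also expects the instructions to mention JSON):

```python
json_assistant = client.beta.assistants.create(
    name="Structured extractor",
    model="gpt-4-turbo",
    instructions="Extract fields from the user's text and reply in JSON.",
    temperature=0.2,
    top_p=0.9,
    response_format={"type": "json_object"},  # JSON mode
)

# The same parameters can be overridden on an individual run.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=json_assistant.id,
    temperature=0.0,
)
```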
These are all awesome updates. Glad to see Vector Stores, to allow for data partitioning while using same assistant across different stores.
I ran a quick test running v1 prompts against v2 (Playground) and got poor results with file_search. I am guessing this is because of the mere 20 chunks used in context (x 800 tokens = 16K), while v1 would often consume about 100K tokens in my use case. So yeah, until these settings are configurable, I won’t be able to switch my client to use v2.
This is huge. I am building an app with Assistant API with multiple tools for each user. Now, I can really ramp up data collection and each user can have an amazing assistant that powers tools to help their business succeed! Excited!
I’ve been waiting for a way to set the temperature for an assistant and it’s finally here, but I don’t see a way to set the temperature, top_p and response_format in the Assistant Playground.
Can you add the new fields to the Assistant Playground?
Files metadata, and operations using that metadata
Up to 10000 files per assistant. 100GB/organization. And no hierarchy to the storage system, and no additional data fields beyond the original file name and upload date. Being charged daily if left in a vector store.
And one method: list them all.
And one method: delete a single one by ID.
One little customer-side data glitch in that robust tracking system, which every organization and project member and chatbot has to maintain for every API call, and the organization ends up with thousands of files of unknown source.
The benefit of a metadata layer, allowing arbitrary developer keys and fields like "customer_id" or "job-id", seems obvious. Sublistings and batch deletions then become possible. Structured or generated metadata such as expiration_date or project_id would allow query-like operations and deletions.
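Until something like that exists, one workaround is a small client-side registry that maps uploaded file IDs to your own metadata. Everything below (the registry file, the customer_id/job_id fields, the helper names) is a hypothetical client-side convention, not an API feature:

```python
import json
import time
from pathlib import Path

from openai import OpenAI

client = OpenAI()
REGISTRY = Path("file_registry.json")  # hypothetical local metadata store


def upload_with_metadata(path: str, **metadata) -> str:
    """Upload a file for the Assistants API and record our own metadata locally."""
    uploaded = client.files.create(file=open(path, "rb"), purpose="assistants")
    registry = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    registry[uploaded.id] = {"uploaded_at": time.time(), **metadata}
    REGISTRY.write_text(json.dumps(registry, indent=2))
    return uploaded.id


def delete_by_metadata(**filters) -> None:
    """Delete every tracked file whose metadata matches the given fields."""
    registry = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    for file_id, meta in list(registry.items()):
        if all(meta.get(k) == v for k, v in filters.items()):
            client.files.delete(file_id)
            del registry[file_id]
    REGISTRY.write_text(json.dumps(registry, indent=2))


# e.g. upload_with_metadata("report.pdf", customer_id="acme", job_id="1234")
# then later: delete_by_metadata(customer_id="acme")
```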
Good points, this really needs good file management utilities and structure to circumvent the nightmare scenarios these badly needed features introduce. Otherwise I’m loving it …
Note: All of my assistants just automatically flipped to v2 and it caused a 10x jump in token usage. Just one of my prompts ate up 107,000 context tokens. Need help understanding why this happened. I was about to launch publicly yesterday, but now I have major concerns that I could use help with.
You can now somewhat control the number of past conversational turns, but not the length of the knowledge injection or the number of times the AI may internally invoke its functions, nor the number of times the AI invokes the wrong function or writes bad function language and gets errors.
For knowledge from file search, the AI, with all the instructions, tool specifications, past chat, user input, listings of mounted files, etc., must make a minimum of one response not to you, but to its internal function for file search. What usefulness the search will return is a mystery to the AI without your tedious instructions.
OpenAI doesn’t state whether the v2 AI is also preloaded with arbitrary file text, which is how v1 worked alongside its functions for searching files directly.
The vector database uses 800-token chunks, and they say 400 tokens of overlap, which if read one way may actually mean 1600-token chunks. Up to 20 are added to a conversation thread. That’s 1600 x 20 = 32k tokens of vector return possible.
If the AI doesn’t like what it got and/or calls a function again, the prior tool call and response are also part of the reused input context for the next tool call.
You “delegate” this use of your budget while you wait for a response when using assistants.