New features in the Assistants API!

Thanks for the feedback – this makes a lot of sense and we’ll look into it.


All objects in the Assistants API are accessible in both v1 and v2 – i.e., these are simply different ways of presenting the same underlying data.

If you’d like to continue using the v1 API, simply pass the v1 version header in your requests to the Assistants API as documented here: This should ensure that nothing changes.
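As a minimal sketch of what "passing the version header" looks like at the HTTP level – the `OpenAI-Beta` header selects the Assistants beta version per request, and the API key here is a placeholder:

```python
# Placeholder key; substitute your real API key (e.g. from an env var).
api_key = "sk-..."

# Every request to the Assistants endpoints carries the OpenAI-Beta header;
# sending "assistants=v1" keeps the request on v1 behavior.
headers = {
    "Authorization": f"Bearer {api_key}",
    "OpenAI-Beta": "assistants=v1",
    "Content-Type": "application/json",
}
```

Client libraries typically let you set this as a default header once at construction time instead of on every call.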

If you’d like to control your token usage, try limiting the maximum number of tokens using these parameters: We suggest you don’t decrease max_prompt_tokens below 20,000 when using the file_search (v2) tool, or below 32,000 when using the retrieval (v1) tool.
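A sketch of a run-creation payload using those limits – the assistant ID is a placeholder, and the exact parameter set should be checked against the run-creation docs:

```python
# Hypothetical run-creation parameters capping token usage per run.
run_params = {
    "assistant_id": "asst_abc123",  # placeholder ID
    # Suggested floor: 20,000 with file_search (v2),
    # 32,000 with retrieval (v1).
    "max_prompt_tokens": 20000,
    "max_completion_tokens": 1000,
}
```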


Is it possible to set the default to v1 on OpenAI’s side so that we can do testing on our end? These sorts of unexpected updates can have very difficult cost implications for businesses, and we need to do proper testing before implementing new versions. Instant overrides to the API are extremely dangerous.


Hi all,

Cool update - I am wondering: when passing a file directly on an Assistants thread, do I need to add it to a vector store separately?

For context, I don’t have a vector store set up; I attach a file to a thread and then ask for a response / start a run on the thread. From my testing, I tend to get this back in my response: No documents uploaded. Please upload the relevant documents to proceed.

I see later in the documentation that adding to a vector store is an async operation, so is it likely that the file upload isn’t complete when I run the thread directly after calling client.files.create? Do I need to treat it asynchronously as well when uploading a file specifically for a thread?
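Since ingestion is asynchronous, one way to handle this is to poll the file’s processing status before starting the run. A generic polling helper, sketched under the assumption that you can fetch a status string (e.g. a vector-store file’s `status` field) that ends in "completed" or "failed":

```python
import time

def wait_until_completed(fetch_status, interval=0.5, timeout=30.0):
    """Poll fetch_status() until it returns 'completed'.

    fetch_status is any zero-argument callable, e.g. a lambda that
    retrieves the file's processing status from the API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "completed":
            return True
        if status == "failed":
            raise RuntimeError("file processing failed")
        time.sleep(interval)
    raise TimeoutError("file was not processed in time")
```

You would call this with a lambda wrapping the relevant retrieve call, then start the run only after it returns.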


Perhaps you are using the Python library? It hides which endpoints it calls. You can pin an older version that doesn’t know about the new header, which would block any beta v2 calls with validation errors.

Yes – and, to be clear, there is no default and nothing has been automatically upgraded. You can use any version you’d like at any time. See:


@nikunj it might be worthwhile to show you what I am referring to. I have an application that generates a new assistant for every document uploaded by my clients. During my tests, I uploaded a document and the assistant popped up with v2 pre-selected, with the option to switch to v1, in the Assistants dashboard. If I am going crazy, maybe it would be worth an offline chat to discuss.

JSON formatted responses are a very good thing! Thanks. Also, I’m impressed with the 500x updated file_search :hugs:

This is talking about the Playground, then. I also noticed that the selection is not remembered as a Playground setting when switching between assistants. It is a header you send in the API request, not an assistant setting with any API “remember” mechanism.

When you leave “chat”, it doesn’t remember your settings or prompt either, so this is somewhat expected. The Playground is more of a demonstrator, and promoting the latest method is understandable. Maybe a “super-switch”, such as a drop-down between v1 and v2 giving two different Playground views for any assistant then selected, would have been a better approach than a little callout within.


When can we see GPTs powered by the new Assistants v2?

Scanned PDFs and images! What are you waiting for?

Cool, this seems promising. Going through the docs, it seems like we have more clarity on what parameters OpenAI uses for retrieval.

By default, the file_search tool uses the following settings:

  • Chunk size: 800 tokens
  • Chunk overlap: 400 tokens
  • Embedding model: text-embedding-3-large at 256 dimensions
  • Maximum number of chunks added to context: 20 (could be fewer)
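To make the chunking defaults concrete, here is a small sketch of how 800-token chunks with 400-token overlap tile a document (the stride between chunk starts is size − overlap; this is an illustration of the arithmetic, not OpenAI’s implementation):

```python
def chunk_spans(n_tokens, size=800, overlap=400):
    """Return (start, end) token spans for the stated defaults:
    800-token chunks with 400 tokens of overlap."""
    step = size - overlap  # stride of 400 tokens between chunk starts
    spans = []
    start = 0
    while start < n_tokens:
        spans.append((start, min(start + size, n_tokens)))
        if start + size >= n_tokens:
            break
        start += step
    return spans
```

So a 1,000-token file yields two overlapping chunks, (0, 800) and (400, 1000), and every token (except near the edges) appears in two chunks.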

Regarding VectorStore: does OpenAI have plans to enable developers to use file search directly, without the Assistants API? Or, to put it another way, how would I inspect the output of the file_search tool before it is passed to the LLM?


That may be something they would not offer at any price plan, similar to the current assistant functionality – though I have no internal insight, just inference.

Up to 1 GB of data in embeddings is free. That may be 200M to 1,000M tokens depending on the language; beyond that, storage is billed daily.

That is distinctly different from the cost of actually performing embedding on the API, where embedding your knowledge has a one-time charge: text-embedding-3-large at $0.13 / 1M tokens.
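Putting that rate into quick arithmetic – the one-time charge scales linearly with corpus size:

```python
PRICE_PER_MILLION = 0.13  # USD per 1M tokens, text-embedding-3-large

def embedding_cost(tokens):
    """One-time cost (USD) to embed a corpus of the given token count."""
    return tokens / 1_000_000 * PRICE_PER_MILLION

# e.g. embedding a 50M-token corpus once costs about $6.50
cost = embedding_cost(50_000_000)
```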

So they clearly plan on this feature being subsidized by the ongoing language-model calls and the data loaded into input (which is now relatively cheap in computation for the language AI, compared to output). The tool can return more tokens to the assistant AI than the AI can reproduce within its new 4k output limit, if you still wanted to pay for that output.


just doing some napkin math:

a 1536-dimension embedding is 6 KB (1536 × 4-byte floats)
1,000 embeddings would be 6 MB
1M rows would be 6 GB

1 GB would be about 160k rows, but at 256 dimensions probably a lot more. I am not sure what they were thinking with 256 when text-embedding-3-large can output 3072 dimensions.
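The napkin math above can be checked directly (raw float32 storage only, ignoring index overhead):

```python
def storage_bytes(n_vectors, dims, bytes_per_float=4):
    """Raw storage for float32 embeddings, ignoring index overhead."""
    return n_vectors * dims * bytes_per_float

GB = 1024**3

per_vec_1536 = storage_bytes(1, 1536)       # 6144 bytes, ~6 KB per vector
rows_per_gb_1536 = GB // per_vec_1536       # ~175k vectors per GiB
per_vec_256 = storage_bytes(1, 256)         # 1024 bytes, 6x smaller
rows_per_gb_256 = GB // per_vec_256         # ~1M vectors per GiB
```

At 256 dimensions, each vector takes a sixth of the space of a 1536-dimension one, which is presumably the trade-off behind the default.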

The main issue with the Assistants API is debuggability. If it returns a bad answer, you have no way of inspecting what was actually retrieved. Looking at their announcement, it seems like they want to get into the RAG business and offer a beginner-friendly product. That is interesting, because I thought Sam wanted to focus on LLMs.

Since you are the one paying for total embeddings storage (beyond the free threshold), they were likely thinking that 12,288 bytes of semantic metadata would be a lot of overhead for 800 tokens of language.

If it returns a bad answer, you also have no way of improving what is actually retrieved. As with Assistants up to now, expect lots of forum topics along the lines of “it doesn’t answer about my document”.

Perhaps down the line, if there are more parameters to configure for the database and chunking, you might be able to affect the quality or adapt it to the type of data.

So which one is cheaper: building your own RAG system with Chat Completions and Pinecone as your vector database, or using the Assistants API with the full package built in – keeping in mind that developing your own RAG system takes longer than using the Assistants API?

Have been waiting for this. In my opinion, the most powerful product in the making at OpenAI is the Assistants API. I find that it can encompass all innovations, existing and upcoming, becoming the primary delivery channel for advanced, functional intelligence across domains, industries, and scale. In short: big fan!

I love Assistants API v1 too, apart from the cost and the latency. I’m waiting for streaming support to come to the C# client library that I use in Unity. Really looking forward to playing with v2. Great work OpenAI well done!

Before this update, I had success uploading xlsx files to threads and getting useful responses out of it.

But now I get a "400 Files with extensions [.xlsx] are not supported for retrieval. See" from the files api (that link leads to a page not found, by the way).

This seems like a giant regression.


@nikunj The file-search tool is great, but I am running into one core issue - the file-search tool is functionally never used in practice. I can tell because annotations are never provided, even when I ask questions that are directly asked and answered in a document I provide.

In the Playground, there is an option to Require tool call - when I specify this and use the same prompt as above, the assistant properly annotates its response, implying usage of the file-search tool. Are you able to expose a similar parameter when calling openai.beta.threads.messages.create() or openai.beta.threads.create() to force usage of the file-search tool?

For context, I created the assistant and a vector store in OpenAI’s GUI. I then programmatically create a thread, specifying the vector_store_id in the tool_resources param, and when I stream messages I specify the assistant_id created in the GUI.
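For what it’s worth, run creation appears to accept a tool_choice parameter that mirrors the Playground’s “Require tool call” toggle. A sketch, assuming that parameter is available on your client version (the assistant ID is a placeholder):

```python
# Hypothetical run parameters requiring the file_search tool,
# analogous to the Playground's "Require tool call" option.
run_params = {
    "assistant_id": "asst_abc123",          # placeholder ID
    "tool_choice": {"type": "file_search"}, # force the file_search tool
}
```

Worth verifying against the run-creation reference before relying on it, since tool_choice is set per run rather than per message or per thread.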