Is there a limit to the number of Assistants per account? I need to create 40,000. Is that OK?

There are no stated limits in the docs. Could anyone please confirm it’s ok? Thank you.

Making 40000 assistants might take some careful thought.

  1. Can the user interfaces handle even listing that many per account so that you can manage them?
  2. Will they rate limit you for hitting the non-generating API endpoints when you are creating one assistant per second for 12 hours straight?
  3. Have you considered pricing:

How will Retrieval in the API be priced?

Retrieval is priced at $0.20/GB per assistant per day. If your application stores 1GB of files for one day and passes it to two Assistants for the purpose of retrieval (e.g., customer-facing Assistant #1 and internal employee Assistant #2), you’ll be charged twice for this storage fee (2 * $0.20 per day). This fee does not vary with the number of end users and threads retrieving knowledge from a given assistant.

What it also doesn’t say is whether this is billed in increments of $0.20, i.e. with a minimum of $0.20 per assistant per day. If you are proposing AIs that each access even a slice of common customized data, you could be looking at $8,000 a day (40,000 × $0.20).
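To make those numbers concrete, here is a quick sketch of the arithmetic. It assumes the worst case described above (a $0.20 minimum per assistant per day, which the docs do not confirm) and the one-per-second creation rate from point 2. The function and constant names are made up for illustration.

```javascript
// Back-of-the-envelope estimate. Both rates below are ASSUMPTIONS taken from
// this post, not confirmed pricing or rate-limit figures.
const ASSISTANT_COUNT = 40000;
const ASSUMED_MIN_FEE_USD_PER_ASSISTANT_PER_DAY = 0.2;
const ASSUMED_CREATIONS_PER_SECOND = 1;

function estimate(count, feeUsdPerDay, creationsPerSecond) {
  // Work in whole cents to avoid floating-point drift.
  const feeCents = Math.round(feeUsdPerDay * 100);
  const dailyCostUsd = (count * feeCents) / 100;
  // Time needed just to create the assistants, in hours.
  const creationHours = count / creationsPerSecond / 3600;
  return { dailyCostUsd, creationHours };
}

const result = estimate(
  ASSISTANT_COUNT,
  ASSUMED_MIN_FEE_USD_PER_ASSISTANT_PER_DAY,
  ASSUMED_CREATIONS_PER_SECOND
);

console.log(`Worst-case retrieval storage: $${result.dailyCostUsd} per day`);
console.log(`Creation time: about ${result.creationHours.toFixed(1)} hours`);
```

If the fee turns out to scale with actual storage rather than a per-assistant minimum, the daily figure would be far lower, so treat this as an upper bound only.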

Likely nobody thought about the buffer overflow at 32767 of them that could be out there…


Yes, if anyone can confirm that this is the pricing, that would be appreciated too: Assistants API Retrieval Pricing: how much does this cost?

Could anyone at OpenAI or elsewhere please confirm whether there is a limit to the number of assistants, what it is, and how they are priced?

There are currently no limits on the number of assistants a single organisation can have. However, rate limits may become a determining factor when you interact with assistants created in large numbers. It may be advisable to reach out to OpenAI about the specifics of your use case, either by email or through the support bot: click the icon in the bottom right corner and leave your contact details and a description of your proposal.

That’s a lot of assistants 😵‍💫


I recall this from a couple days ago.

Did you ever bother to see if you can upload 40,000 files?

Why not just try and create 40,000 assistants and see what happens?

You have been given all the answers, and more. Just do it please and let us know the results.

For context, this person wants a single assistant for each and every file that they have so the assistant’s file is unique to each and every single user. So the exact same assistant, with just a different file.

Instead of having a single source of truth with a file uploaded to the messages.

I assume the reason to silo all the files apart is for some kind of data protection compliance? Otherwise I struggle to understand the advantage of doing this over passing the correct file through a sorting program into the thread.

That’s what was suggested previously. The Files endpoint does not mention any sort of limit (besides a max total size of 100GB) and OP said they were going to try it out. I don’t think they bothered.

I don’t see how you would hit that limit anyway. Once a thread is no longer in use, the files saved to it are deleted, right?

Obviously I understand the confusion being a novice myself, but I think they’re overcomplicating it.


What OP is suggesting is creating and maintaining ~40,000 assistants that have the same prompts and functionality. Have to change a file? You need to find the right assistant and update it. Need to update the prompt? You need to make ~40,000 calls.

On the other hand they can have one assistant (one source of truth) with the file uploaded to the user’s thread/message and not have to worry about anything else.


I tried doing it via threads. Apart from vastly increasing the complexity of the call (you need to create a thread, add a file, create a message, run the thread, etc.; several steps, each time), the retrieval simply did not work with the thread + message approach. The file ID was correct, but it just didn’t do anything with it.

I only need one file per conversation, and that file depends on the topic. Think of it as if you had one assistant per Wikipedia article, and you want that assistant to only know about that article. That’s roughly the logic.

And you want it to (1) work and (2) not be complicated code.

That would be great, but it’s just far more complex. I tried it, and it simply did not work. Retrieval for a thread + message with file ID + assistant, all stitched together, took something like 6 calls for a simple “what’s up?” and still didn’t run any RAG.

What am I missing? In both cases you need to find the unique “something”, right? Either the unique assistant or the unique file.

If you try it you’ll see one way is one call and the other way is six calls.

I’m very confused. I don’t call for my assistant. I have the IDs mapped, and presumably I’d have the files already mapped to the users. Have you tried it?

I don’t call for my thread either. I’ll have to sit down to confirm but … no… it’s not 6 calls…

Ok I sat down.

No. It’s not 6 calls.


  • Create file for User
  • Create run (creating the thread at the same time) using a Message (with file ID) and attaching the pre-existing assistant
  • ???

Yours is

  • Create assistant for User with File
  • Create run (with assistant) using Message
  • ???

But. I must stop now. Last time I had discourse I was stripped of my status and silenced for a week. I wish you the best of luck. You seem very headstrong in your journey. Nobody else it seems has made 40,000 assistants before so maybe you will be the one to do it, for science.
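For anyone following along, the first flow above (one shared assistant, file attached to the message, thread and run created together) could be sketched roughly like this. It assumes the v1 beta Assistants API shape, where a message carries `file_ids`, and the Node SDK's `beta.threads.createAndRun` call. `buildRunRequest` and `startRunWithFile` are hypothetical helper names, and the IDs in the usage note are placeholders. Whether retrieval then actually reads the file is exactly what OP reports not working, so this only shows the call count, not a guaranteed fix.

```javascript
// Build the payload for a combined "create thread + create run" request.
// ASSUMPTION: v1 beta message shape with `file_ids`.
function buildRunRequest(assistantId, fileId, content) {
  return {
    assistant_id: assistantId, // the single shared assistant
    thread: {
      messages: [
        { role: 'user', content: content, file_ids: [fileId] }, // per-user file
      ],
    },
  };
}

// `client` is expected to be an OpenAI SDK instance (or anything exposing the
// same method), passed in so the helper can be exercised without network access.
async function startRunWithFile(client, assistantId, fileId, content) {
  return client.beta.threads.createAndRun(
    buildRunRequest(assistantId, fileId, content)
  );
}
```

Usage would be something like `await startRunWithFile(openai, 'asst_...', 'file_...', 'what’s up?')`, followed by polling the run as usual. That is one call to start the conversation once the file is uploaded.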

I don’t have time to check now but I was doing something like this:

import OpenAI from 'openai';

export default async function handler(req, res) {
  // The v4 SDK constructor takes an options object, not a bare key string
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  // Poll the run until it reaches a terminal state, then return the newest message
  const retrieveRun = async (threadId, runId) => {
    const terminalStatuses = ['completed', 'failed', 'cancelled', 'expired'];
    let run;
    do {
      await sleep(2000); // Wait for 2 seconds
      run = await openai.beta.threads.runs.retrieve(threadId, runId);
    } while (!terminalStatuses.includes(run.status));

    console.log('Completed run:', run);

    // Retrieve thread messages (the list is returned newest first by default)
    const threadMessages = await openai.beta.threads.messages.list(threadId);
    // Log all content of all messages in the threadMessages array of objects
    threadMessages.data.forEach((message) => console.log(message.content));

    const chatresponse = { value: '', annotations: [] };
    if (threadMessages.data.length > 0) {
      const lastMessageContent = threadMessages.data[0].content;
      if (lastMessageContent && lastMessageContent.length > 0) {
        chatresponse.value = lastMessageContent[0].text.value; // Assuming 'text' is an object with 'value'
        chatresponse.annotations = lastMessageContent[0].text.annotations; // Assuming 'text' has an 'annotations' array
      }
    }
    return chatresponse;
  };

  try {
    if (req.method === 'POST') {
      if (req.body.action === 'createThread') {
        const emptyThread = await openai.beta.threads.create();
        res.status(200).json(emptyThread); // Respond so the request does not hang
      } else if (req.body.action === 'createMessage') {
        const { threadId, content } = req.body;
        const messageResponse = await openai.beta.threads.messages.create(threadId, {
          role: 'user',
          content: content,
          file_ids: ['file-5sbhiDfnNPxh41y6SkFonYT8'],
        });

        const runResponse = await openai.beta.threads.runs.create(threadId, {
          assistant_id: 'asst_74Ed5d5vitJmpWtycKlWGD8j',
          // tools: [{ type: 'retrieval' }], // Tools are objects, not bare strings
        });

        // If run is immediately completed or failed, return it right away
        if (runResponse.status === 'completed' || runResponse.status === 'failed') {
          console.log('Run completed immediately:', runResponse);
          res.status(200).json({ messageResponse, runResponse });
        } else {
          // If run is not complete, start polling for its completion
          const completedRun = await retrieveRun(threadId, runResponse.id);
          res.status(200).json({ messageResponse, runResponse: completedRun });
        }
      }
    } else {
      res.setHeader('Allow', 'POST');
      res.status(405).end('Method Not Allowed');
    }
  } catch (error) {
    console.error('The API encountered an error:', error);
    res.status(500).json({ error: 'Internal Server Error', description: error.message });
  }
}

If you know why this doesn’t work (it doesn’t make use of RAG) or how to make it simpler than all these calls, that would be great.

The thing is there are 40k files and I need the gpt to have access to only one of those for each chat session or thread or assistant.

I recently had a definitive answer from support:

There is no limit on the total number of assistants you can make.

Cost calculation aside, you can manage your assistant allocation with a database.

If you are not familiar with databases, one of the models will help you plan one out.


Instead of hard-coding your assistant IDs, you run a database query that returns the appropriate assistant ID at the run level. I currently have the same concept live.

You can also manage assistant creation with the same logic.

Same goes for vector store allocation. See the theme here? You need a database and back-end infrastructure for large-scale projects like this.

I’m in the gym right now, but when I get home I will see what I can do about providing you with a rough schema.
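In the meantime, here is a minimal sketch of the lookup idea, assuming an in-memory `Map` as a stand-in for a real database table. The table layout, the key names, and the helpers `lookupAssistant` and `buildRunParams` are all hypothetical, and the IDs are placeholders, not real assistants or files.

```javascript
// Hypothetical stand-in for a database table: one row per topic (or user),
// holding the assistant and file that belong to it. In a real deployment this
// would be a query against your own database instead of a Map.
const assistantTable = new Map([
  ['topic-001', { assistantId: 'asst_example_001', fileId: 'file_example_001' }],
  ['topic-002', { assistantId: 'asst_example_002', fileId: 'file_example_002' }],
]);

// Return the row for a topic, or fail loudly if nothing is mapped.
function lookupAssistant(table, topicKey) {
  const row = table.get(topicKey);
  if (!row) {
    throw new Error(`No assistant mapped for "${topicKey}"`);
  }
  return row;
}

// Build the run parameters at request time instead of hard-coding the ID.
function buildRunParams(table, topicKey) {
  const { assistantId } = lookupAssistant(table, topicKey);
  return { assistant_id: assistantId };
}

console.log(buildRunParams(assistantTable, 'topic-001'));
```

The result would then be passed to the run call, e.g. `openai.beta.threads.runs.create(threadId, buildRunParams(assistantTable, topicKey))`. The same table can record which assistants have already been created, so a creation script can resume after hitting a rate limit.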