How to design SaaS architecture powered by Assistants

I’m having a tough time deciding how Assistants can be used in a SaaS application.

Convoluted example:

  • Let’s say I have a SaaS product that acts as a financial advisor.
  • You create an account on my app, upload a bunch of docs as knowledge. My app communicates with the Assistant on your behalf over API.
  • I have an Assistant with proprietary instructions using proprietary file formats to do the work.

Concerns:

  • If I have one Assistant under my own OpenAI org, then I am limited to only 20 files. How can my SaaS serve more than just a few people?
  • Or if I make the user provide their own OpenAI org ID, then I have to create an assistant in their own account. This exposes my proprietary instructions and file format.
  • Separately, but related, how does one manage billing? I feel like the only way to go about it is to charge the user for token packages through me, and then I pay OpenAI.

Anyone have additional constructs on how this might be achieved?

1 Like

You can have one Assistant, since you can (also) add files PER THREAD. So you start a new thread add files (to JUST that Thread) and then RUN.
Those files are in your account - but as the threads get discared the files can go to.
So you can have up to 20 files in your Assistant (that would be accessible for prompts in each thread/run but on top of that you would also have files per thread.
In terms of billing I would probably look for a model that bills per number of pages and / or number of runs?

If you are calling OpenAI through the API, then, yes, the only way to do billing well is to do it yourself.
If you are providing a plugin or “LLM” of some sort on the OpenAI side, and users find and use it through their site, then I believe billing has to happen through OpenAI. This is true even if the API uses a custom API of yours. Although you can also bill for your-side API usage, so the user essentially gets two bills. You then get some fraction of the money the user paid OpenAI as your “cut” of their usage.

That is interesting…

Is there a limit to the number of files you can upload to a thread? I tried googling it but not extensively yet.

Do you know how thread file retrieval fees are charged? For example, if a file is uploaded on day 1, but the user does not interact with the thread on day 2, is the $0.20/day/assistant fee charged? And actually, is that fee even charged at all for thread files since they seem to be charged per-token anyway?

So per my idea (not the convoluted one I provided) a user will actually have context at two levels. One is more of a global level, where the files apply to every thread. And the other is perfect for the thread level, where files can come and go along with the thread. So your recommendation helps here.

But this is still all per user, so I need a few global files per user. One idea is to create an assistant per user in my openai org, as I don’t think there’a limit to number of assistants. But someone brought up the point, what if you need to update the assistant instructions and you have millions of users for your saas? Probably not a feasible rounds of API calls to a million assistants.

I had another idea as I’m writing this, but how valuable is it to actually have the instructions sit at the assistant level rather than the thread level? (Obviously I can play with this idea myself and check the results) What if I had an assistant with the per-user global-files, but no default instructions, and provided all the instructions at request-time. Then the files can sit at the global level perpetually, and thread level instructions and files are passed in at request-time.

Not sure if this all makes sense to others that are reading this, but it sounds like it could be a reasonable path forward based on the state of the Assistants API at the moment.

A thread can have 10 files as per the documentation
https://platform.openai.com/docs/api-reference/messages/createMessage

Threads are automatically discarded after 60 days. I have yet to see any charges for files uploaded as part of threads.

Why would you create an Assistant per user? I would imagine the Assistant to really be ‘yours’ and contain the key global files AND global instructions. Remember an Assistant can hold 32k in instructions. That is quite detailed.
And then like you mentioned create threads with files plus thread specific (add on) instructions. So I think your ‘while writing idea’ makes a lot of sense.
You will probably end up with several global assistants that are best suited for specific tasks (or even work together chained).

1 Like

Thanks for the reference!

Continuing the example, the way I see it, there are 3 levels of knowledge/files required.

  1. The assistant level - global knowledge that applies to any account/user. Lives for the lifetime of the assistant.
  2. Account/user level files - knowledge that applies to any thread for this user but not to other users. Lives for the lifetime of the account.
  3. Thread level files, knowledge that only needs to live for the lifetime of the thread.

I talked myself out of the assistant per user idea. Even if I don’t provide any instructions, other functionality I may want to use someday, i.e code execution, function calling, etc, are all at the assistant level. It wouldn’t be good to have to update any more than a handful of assistants.

I can’t help but think there should be 3 clean layers in this solution, but given that only 2 exist, assistants and threads, I think I’ll just have to submit an ‘account level’ file with every thread.

I do like your idea of having different assistants for different functionalities in my app. Assuming they are separate enough tasks logically, there’s no reason to have one assistant to handle them all.

Thanks for bouncing the idea around.

Also don’t forget that files can live ‘by themselves’ inside OpenAI, wating to be connected to something. So you can achieve your three levels also by having you Account/User level files marked as such in your code - and (always) adding them to a user thread (or only for specific relevant Assistants) when a thread is created. Ie the thread will have your level 3) files+ whatever level 2 files you deemed needed. You control level two inside your software.

The latest Assistants update create some big changes with the the vector stores - making this scenario much easier!