Batching multiple calls to OpenAI in a serverless context

I am currently trying to batch multiple OpenAI calls in parallel to generate multiple results.

There are a lot of issues with doing that, but the main one is execution time.

Doing this with a cron job is a terrible idea: if I need to generate 1000 prompts of at least 1000 tokens each, it could take up to a day of generation, assuming nothing fails and no rate limit is hit. Even if I parallelize the jobs, it could still take 3-4 hours, and cron jobs usually have a 30-minute lifetime.

If I use a queuing service that calls a serverless function, the function's timeout becomes an issue, because most of the time a generation takes longer than 60 seconds.

Running plain machines with a queue could also cost me a lot of money, as I would need to scale my service up to the number of prompts to generate.

Even streaming the generation through an edge function isn't a good option for me, because I need to know when a job is done so I can tell my queue the message can be deleted.

Is there any way to speed this up, or another way of addressing the issue without it costing too much money?

Thanks a lot !


There’s no way to improve the API speed (other than making simpler requests with fewer tokens, but at some point that isn’t feasible).

Is there a reason you are looking to stay serverless? A simple $10/month virtual server with a basic database/SQLite queue could churn through a ton of requests.
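For what it's worth, a SQLite-backed queue like that can be tiny. This is a minimal sketch under my own assumptions: the `jobs` table schema and the `enqueue`/`claim_next`/`complete` names are illustrative, not from any library.

```python
import sqlite3

# A minimal job queue: one row per prompt, with a status column that
# moves pending -> running -> done. Use a file path instead of :memory:
# on a real server so the queue survives restarts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS jobs ("
    "  id INTEGER PRIMARY KEY,"
    "  prompt TEXT NOT NULL,"
    "  status TEXT NOT NULL DEFAULT 'pending',"
    "  result TEXT)"
)

def enqueue(prompt: str) -> int:
    cur = conn.execute("INSERT INTO jobs (prompt) VALUES (?)", (prompt,))
    conn.commit()
    return cur.lastrowid

def claim_next():
    # Grab the oldest pending job. With multiple worker processes you would
    # wrap this SELECT+UPDATE in an immediate transaction to avoid races.
    row = conn.execute(
        "SELECT id, prompt FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue drained
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row

def complete(job_id: int, result: str) -> None:
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (result, job_id),
    )
    conn.commit()
```

A worker would just loop: `claim_next()`, call the OpenAI API with the prompt, then `complete()` with the response; parallelism is however many worker processes you run against the same database file.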


You are right, but I wanted to avoid provisioning machines on providers like AWS just for this need if I could. I am also using Heroku, and it’s terrible for performance unless you pay a lot more. I might need to do what you say in the end. Thanks :ok_hand:

AWS Lambda will automatically spin up multiple instances of your function, up to your reserved concurrency (maximum of 1000 concurrent instances per account by default). So when I shoot 3 or 4 calls at once at the same Lambda, it clones itself out, sends the calls in parallel, and the answers return asynchronously to a database.

So if this scales up, you are probably looking at maybe 10 minutes, not hours. I’d be more afraid of hitting your API rate limits, not to mention your API bill with OpenAI. :heavy_dollar_sign:
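The fan-out described above can be triggered from the caller's side with asynchronous invocations. A sketch assuming boto3, where the function name `generate-prompt` is my own placeholder and the client is passed in as a parameter so it can be stubbed:

```python
import json

def fan_out(lambda_client, prompts, function_name="generate-prompt"):
    """Fire one asynchronous Lambda invocation per prompt.

    InvocationType='Event' makes invoke() return immediately; each
    instance then runs in parallel (up to the account concurrency limit)
    and is responsible for writing its own result to a database.
    """
    for prompt in prompts:
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # async: don't wait for the result
            Payload=json.dumps({"prompt": prompt}).encode(),
        )
    return len(prompts)
```

In production `lambda_client` would be `boto3.client("lambda")`; the loop itself finishes in milliseconds per invocation since it never waits for the generations.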


Amazon SQS free tier gives you 1 million messages a month for $0.

AWS Lambdas can run for up to 15 minutes.

So, I would push all your requests into SQS and have a bunch of Lambda functions consuming the messages (you can control how much concurrency you get) and calling the OpenAI API. They can write the results directly to a database or push them into another SQS queue.
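A sketch of the Lambda side of that setup, assuming the standard SQS event shape (`event["Records"]`, one message body per record). The OpenAI call is injected as a `generate` parameter purely to keep the sketch testable; a real SQS-triggered handler has the signature `handler(event, context)` and would call the API directly.

```python
import json

def handler(event, generate):
    """Consume one SQS-delivered batch of prompt messages.

    Each record's body is a JSON message pushed into the queue earlier.
    `generate` stands in for the OpenAI API call; in production this is
    where you'd make the chat-completion request, then write each result
    to a database or a second SQS queue.
    """
    results = []
    for record in event["Records"]:
        payload = json.loads(record["body"])
        results.append(
            {"id": payload["id"], "completion": generate(payload["prompt"])}
        )
    return results
```

When Lambda consumes from SQS, successfully returning from the handler is what deletes the batch from the queue, which also answers the "how do I know the job is done" concern from the original post.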

Depending on your use case, you may stay within the free tier for the Lambda functions: if all they are doing is calling an API, you won’t need to give them much RAM, and you get 400,000 GB-seconds of free run time per month.

So if you allocated 128 MB to each function and each call took 5 minutes to run, you should be able to do about 10,000 requests per month for free (assuming my maths is correct!).
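The arithmetic behind that estimate checks out:

```python
# AWS Lambda free tier: 400,000 GB-seconds of compute per month.
free_tier_gb_seconds = 400_000

memory_gb = 128 / 1024   # 128 MB expressed in GB = 0.125
duration_s = 5 * 60      # 5 minutes per call, in seconds

gb_seconds_per_call = memory_gb * duration_s            # 37.5 GB-seconds
free_calls_per_month = free_tier_gb_seconds / gb_seconds_per_call

print(round(free_calls_per_month))  # 10667, i.e. "about 10,000" per month
```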


Thanks a lot for all your answers :pray::fire:

Cloudflare Workers has worked great for me; I use it as a proxy: GitHub - sdan/plugin-proxy

If you pay $5/mo you get 30-second timeouts and 1M requests. However, in my case I don’t need more than 1 ms per request, so there’s another option where you get 10M requests for the same price, and I use that. It’s called “bundled” workers.