Best Practices for Reliable Embeddings Pipeline

I am building a system that needs to embed large volumes of data, and it needs to be robust to failure. Making concurrent API calls to OpenAI or Hugging Face is neither fast enough nor reliable enough for the volume of data we have. We have considered building an internal queue system, but we are not sure we want to maintain one. Has anyone used a service that handles this and can recommend it?

The OpenAI Embeddings API accepts batches of inputs in a single request. I can’t find a reference to the upper limit per batch, but I regularly use it for batches of 30+.
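
A minimal sketch of what that looks like with the current `openai` Python SDK; the model name and batch size here are illustrative, not recommendations:

```python
# Batched embeddings request: one API call, many inputs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

texts = [f"document {i}" for i in range(32)]  # placeholder batch

response = client.embeddings.create(
    model="text-embedding-3-small",  # assumed model; swap in your own
    input=texts,
)

# One embedding per input, returned in the same order as the request.
vectors = [item.embedding for item in response.data]
```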

And it doesn’t take much to wrap the request in a try/catch block and retry on failure. If it fails more than some retry limit, stash the request body in a failed-requests file for later inspection.
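
Something like the sketch below. The retry limit, backoff schedule, and dead-letter file path are all assumptions you would tune for your own pipeline:

```python
# Retry-then-stash pattern: retry a failed batch a few times with backoff,
# then append the request body to a failure file and move on.
import json
import time

from openai import OpenAI

client = OpenAI()
MAX_RETRIES = 3                        # assumed limit
FAILED_PATH = "failed_batches.jsonl"   # hypothetical dead-letter file


def embed_batch(texts: list[str]) -> list[list[float]] | None:
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = client.embeddings.create(
                model="text-embedding-3-small",  # assumed model
                input=texts,
            )
            return [item.embedding for item in response.data]
        except Exception as exc:
            if attempt == MAX_RETRIES:
                # Out of retries: record the request body for inspection.
                with open(FAILED_PATH, "a") as f:
                    f.write(json.dumps({"input": texts, "error": str(exc)}) + "\n")
                return None
            time.sleep(2 ** attempt)  # simple exponential backoff
    return None
```

The failure file gives you a replayable record: once the outage or bad input is diagnosed, you can re-feed those batches through the same function.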

Not sure what type of service would provide a simpler workflow than that.
