Parallelism/scaling in embedding endpoint

Yes, you're right, thank you for the additional context. I do need to investigate making parallel requests, though honestly this is something the API should handle for me. I noticed https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py, but I will have to port it to the language I am working with.
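In the meantime, here's the rough shape of what I'd be porting: a minimal sketch of fanning embedding requests out over a thread pool, in Python for illustration. `embed_batch` is a stand-in for the actual HTTP call to the embeddings endpoint (stubbed here so the snippet runs on its own); the real cookbook script also handles rate limits and retries, which this sketch does not.

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch):
    # Placeholder for the real embeddings API call; returns one fake
    # vector per input so the example is self-contained.
    return [[float(len(text))] for text in batch]

def embed_all(texts, batch_size=4, max_workers=8):
    # Split inputs into batches, then run batches concurrently.
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    vectors = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves batch order, so outputs line up with inputs.
        for result in pool.map(embed_batch, batches):
            vectors.extend(result)
    return vectors

if __name__ == "__main__":
    vecs = embed_all([f"doc {i}" for i in range(10)])
    print(len(vecs))  # 10
```

Threads are fine here since the work is I/O-bound; swapping the stub for a real request and adding backoff on 429s is where the cookbook script earns its keep.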

What are you referring to with language inference? Are we still talking about embeddings? :slight_smile: