Normally you have to scale out synchronous calls on your end to achieve this. If you are doing a bunch of async calls, I don’t think there is a way to maintain sync once the responses come back.
The API response doesn’t echo back much that you knew before making the call. The one exception is the input token count: you can use a library like tiktoken to count the input tokens yourself, and when the response comes back it will report that same count, so you can match requests to responses that way. You would have to disambiguate (or block) the case where two requests with the same input token count are in flight at the same time, so you can stay in sync. Maybe use other fields, like logprobs, to help with that if you don’t want to block.
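A rough sketch of that bookkeeping, assuming the response exposes its prompt token count (as OpenAI-style APIs do via `usage.prompt_tokens`). The `count_tokens` stub is hypothetical and stands in for a real tokenizer call like tiktoken’s `encoding_for_model(...).encode(...)`, just to keep the example self-contained:

```python
# Match async responses back to requests by input-token count.

def count_tokens(prompt: str) -> int:
    # Hypothetical stand-in for a real tokenizer (e.g. tiktoken's
    # len(enc.encode(prompt))); whitespace split is NOT accurate,
    # it just makes the sketch runnable.
    return len(prompt.split())

class ResponseMatcher:
    def __init__(self):
        # token_count -> request_id for in-flight requests
        self._pending = {}

    def register(self, request_id: str, prompt: str) -> int:
        n = count_tokens(prompt)
        if n in self._pending:
            # Ambiguous: another in-flight request has the same count.
            # Block here until it resolves, or use extra signals
            # (e.g. logprobs) to disambiguate instead of raising.
            raise RuntimeError(f"token count {n} already in flight")
        self._pending[n] = request_id
        return n

    def match(self, prompt_tokens: int):
        # Call when a response arrives, with its reported input token count.
        return self._pending.pop(prompt_tokens, None)

matcher = ResponseMatcher()
matcher.register("req-1", "summarize this document please")  # 4 tokens
matcher.register("req-2", "hello")                           # 1 token
assert matcher.match(4) == "req-1"
assert matcher.match(1) == "req-2"
```

The `register`/`match` pair is the whole trick; the hard part in practice is what `register` does on a collision, which is exactly the blocking-or-disambiguating choice described above.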
So without waiting synchronously, or letting your system scale out automatically the way cloud services will (which is how I solve it), you’re going to have to get creative.
I hope this will be improved in the future — it requires relatively little effort to maintain and pass through a request id on the server/provider side, and it provides a huge boost for those of us implementing async workflows.