While integrating the OpenAI API (gpt-3.5-turbo) into our workflow, we have run into several issues. When sending individual requests, we receive about 20 responses before the API stops responding for approximately 20 minutes; after that, we can send another 15 requests before it pauses again. This makes processing large volumes of data, specifically an Excel file with 20,000 rows, extremely slow, and the waiting makes deployment in a production environment impractical.
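For reference, the mitigation described in the documentation linked below is to retry rate-limited requests with randomized exponential backoff rather than pausing for a fixed time. A minimal sketch of how that could look, assuming the pre-1.0 `openai` Python package and the `tenacity` library (both assumptions; adapt to your setup):

```python
import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential

openai.api_key = "sk-..."  # placeholder; use your own key

# Retry failed calls with randomized exponential backoff:
# waits grow roughly 1s -> 2s -> 4s ... capped at 60s, for at most 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def chat_completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

response = chat_completion_with_backoff(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Classify the keyword: 'running shoes'"}],
)
print(response.choices[0].message.content)
```

The randomized wait spreads retries out, so repeated 429 errors do not hit the rate limiter in lockstep.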
To work around this, I tried sending requests in batches. However, I ran into another problem: I only receive a response for the last string in the batch instead of one response per keyword. This defeats the purpose of batching and significantly reduces the usefulness of the API for our needs.
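My understanding (an assumption about what is happening, not confirmed) is that the chat completions endpoint treats the whole `messages` list as a single conversation and returns one reply, which would explain why only the last keyword gets answered; the legacy completions endpoint accepts a list of prompts, but gpt-3.5-turbo is a chat model. A minimal sketch of a per-keyword loop instead, assuming a hypothetical `keywords.xlsx` file with a `keyword` column and reusing the backoff wrapper from the sketch above:

```python
import pandas as pd

df = pd.read_excel("keywords.xlsx")  # hypothetical file with a 'keyword' column

results = []
for keyword in df["keyword"]:
    # One request per keyword: the chat endpoint returns a single
    # conversation reply per call, so packing many keywords into one
    # messages list yields only one answer.
    response = chat_completion_with_backoff(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Classify the keyword: {keyword}"}],
    )
    results.append(response.choices[0].message.content)

df["response"] = results
df.to_excel("keywords_with_responses.xlsx", index=False)
```

With rate limits in play, a sequential loop with backoff may be unavoidable anyway; concurrency only helps up to the account's requests-per-minute limit.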
Do you have any experience with similar problems, or recommendations on how to resolve them? Have I overlooked something in the documentation, or is there a different approach to batch processing that I should consider? Any ideas or suggestions would be greatly appreciated.
For more information on batch processing and rate limits, I referred to this documentation: OpenAI Rate Limits and Error Mitigation.