Does anyone have any experience on how long does it take for OpenAI to increase rate limits, and if they do it at all? I have requested rate limit increase for GPT3.5 model on 10th of July, 9 days ago and I haven’t got any response yet. My usecase does require a lot of calls to be made at the same time - I am basically sending abstracts of few hundred papers to API asking if that paper is relevant to specific question, and if it is, then I summarize/extract main data from the paper, all using GPT3.5 (as the cost for GPT4 would be eye-watering… In fact it already is for GPT3.5). This doesn’t hit RPM limit, but it does hit TPM limit when I am doing it on my own, so in a system where there are 3 other people doing the same thing, this would be hitting all the limits.
Based on your description of the problem, I feel that using embeddings would be the best way to solve it.
Create an embedding of each paper's abstract using ada-002, and when a user's question comes in, create an embedding for it as well and match it against the abstracts. Play around with the matching threshold a bit so that the cosine score (used to compare the embeddings) is optimal for your needs. Then you can send the data from the matching papers to GPT to summarise.
This would drastically reduce your GPT calls, as you would only need one embedding call for the question as it comes in and one chat call to summarise the data. The embeddings for the papers can be computed once and stored locally or in a vector database.
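A minimal sketch of that pipeline in Python, assuming the official `openai` package (v1.x client, with `OPENAI_API_KEY` set in the environment) — the 0.8 threshold and the helper names are placeholders, not anything from this thread:

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine score between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def embed(texts):
    """Embed a batch of strings with ada-002 in a single API call."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    resp = OpenAI().embeddings.create(
        model="text-embedding-ada-002", input=texts
    )
    return [d.embedding for d in resp.data]


def relevant_abstracts(question, abstracts, abstract_embeddings, threshold=0.8):
    """Return the abstracts whose cosine score against the question clears
    the threshold. 0.8 is a placeholder -- tune it against your own papers."""
    q_emb = embed([question])[0]
    return [
        abstract
        for abstract, emb in zip(abstracts, abstract_embeddings)
        if cosine_similarity(q_emb, emb) >= threshold
    ]
```

You would call `embed(abstracts)` once up front, persist the result (a plain file or a vector DB), and then each incoming question costs one embedding call plus one summarisation call for the papers that pass the filter.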