Hello, to monitor my rate limits and control the speed at which I send requests, I use the headers found in the responses. When I send requests to non-fine-tuned models such as GPT-4-Turbo-Preview, I receive the headers. However, this is no longer the case with fine-tuned models. Does anyone know the cause of the problem and what I could do? I use the client.chat.completions.with_raw_response function.
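For reference, here is a minimal sketch of how I read those headers. The `extract_rate_limits` helper and its None fallback for missing headers are my own, not part of the OpenAI SDK; the commented-out call shows where `with_raw_response` fits in.

```python
# Sketch: pull the x-ratelimit-* headers out of a raw response.
# extract_rate_limits is a hypothetical helper, not an SDK function.

RATE_LIMIT_KEYS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-reset-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
)

def extract_rate_limits(headers):
    """Return the rate-limit headers that are present; absent ones map to None."""
    return {key: headers.get(key) for key in RATE_LIMIT_KEYS}

# Typical usage with the OpenAI Python SDK (requires a live API key):
# raw = client.chat.completions.with_raw_response.create(
#     model="gpt-4-turbo-preview",
#     messages=[{"role": "user", "content": "ping"}],
# )
# limits = extract_rate_limits(raw.headers)   # dict of the headers above
# completion = raw.parse()                    # the usual ChatCompletion object
```

With fine-tuned models, the dict comes back with every value set to None, which is how I noticed the headers are missing.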
They used to include the x-ratelimit-* headers in responses from fine-tuned models until Thursday, May 2nd, when this feature seems to have broken. I suspect this is a bug that OpenAI introduced with a release last week.